Introduction to Bioinformatics: Unlocking the Secrets of Genomes and Sequences

Discover the merging of mathematics and biology in computational management of biological information. Explore applications such as sequence assembly, gene prediction, and genome annotation. Unravel the functions of genes and study genome evolution with Structural Genomics techniques.


Presentation Transcript


  1. Bioinformatics ICES 2006 Introduction Revised 29/09/06

  2. Interesting book • Bioinformatics (Sequence And Genome Analysis) • Type : Paperback • Publisher : Cold Spring Harbor Laboratory Press • Publication date : 05/09/2001 • Weight : 1606 gr. • Pages : 560 • Format : 27.99 x 20.52 x 2.92 cm • Number of books : 1 • ISBN : 0879696087

  3. Introduction BIOINFORMATICS • What is Bioinformatics • What is annotation • What is a high throughput measurement • What is Systems Biology • What is the future of molecular biology and bioinformatics • What is the impact of bioinformatics on industry

  4. What is Bioinformatics Computational management of all kinds of biological information (computational biology) • Organization of biological information (databases) • Analyzing biological data • Heterogeneous research field with many subfields • Alignments • Phylogeny • Protein structure modeling,…

  5. What is Bioinformatics • The merging of mathematics and biology is not new • Phylogeny, molecular modeling, population genetics • Gained new attention since 1994, following the invention of the term “bioinformatics” • First usage of “bioinformatics”: Trends Biotechnol 1993; Ann N Y Acad Sci 1993

  6. Bioinformatics: driving force • Development of new technologies in molecular biology • Genetic entities are analyzed simultaneously at a high-throughput level (genomics, transcriptomics, translatomics, interactomics, metabolomics) • Information flow poses challenges: IT management, data integration/data mining • Impact on molecular biology: changes the way of biological thinking, as will be illustrated • Gave rise to different disciplines at the intersection between bioinformatics and molecular biology

  7. Subfields in bioinformatics Structural genomics Functional genomics Comparative genomics Molecular modeling (not in this course)

  8. Structural Genomics/Annotation Comparative Genomics/ evolutionary biology Functional genomics/ Systems Biology

  9. Structural Genomics • Input: raw sequence data • Applications: sequence assembly; gene, promoter, and splice site prediction • Biological goal: annotation

  10. Structural Genomics Genome Assembly Distinct methods to sequence a genome Based on a physical map (top down): • Cut genome into pieces • Subclone sequence in BACs • Automated laboratory procedure to screen for overlapping fragments (contigs) and to produce physical map • Identify unique overlapping clones and subclone • Sequence and assemble • Method used for complex genomes e.g. Human Genome Consortium

  11. Structural Genomics Top down sequencing 1. Genome fragmentation 2. BAC library 3. Physical map 4. Subclone library

  12. Structural Genomics Top down sequencing 5. Genome assembly

  13. Structural Genomics Genome Assembly Shot gun sequencing (bottom up) • Fragment genome (long (10000 bp) and short pieces (2000 bp)) • Generate plasmid libraries • Derive sequence from overlaps in large numbers of random sequences (500 bp from each end to create overlaps) • Assemble the sequences without using the guide of a physical map. Contigs are assembled based on an alignment of all possible sequence pairs in the computer • Method used by Celera Genomics (C. Venter)
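The bottom-up idea on this slide (assemble contigs from pairwise overlaps, without a physical map) can be illustrated with a minimal sketch. It assumes error-free reads from a single strand and uses a simple greedy merge of the best exact suffix/prefix overlap; the function names and the `min_len` parameter are hypothetical, and this is not the assembler used by Celera Genomics or any real project.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the best overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Greedily merge the pair of reads with the longest exact overlap.

    Toy assumptions: no sequencing errors, no reverse complements,
    no repeats longer than the overlaps. Returns the remaining contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_pair = 0, None
        # Compare all ordered pairs of sequences, as the slide describes.
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:
            return contigs  # no overlaps left to merge
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_k:])  # merge, keeping the overlap once


example_reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
example_contigs = greedy_assemble(example_reads)
```

Comparing every pair of reads costs quadratic time per merge round, which is one reason large shotgun projects posed a real computational challenge; production assemblers use indexed overlap detection and graph-based methods instead.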

  14. Shot Gun Sequencing 1. Genome fragmentation 2. Library 3. Sequences 4. Genome assembly Structural Genomics

  15. Structural Genomics Annotation Bioinformatics research as a whole aims at • unraveling functions of novel genes • studying the evolution of genomes As genomes are collected they need to be annotated. This means that we have to • identify the location of the genes on the genome (structural annotation) • assign a function to each of the potential genes (functional annotation)

  16. Structural Genomics Structural annotation

  17. Structural Genomics Structural annotation Ab initio gene prediction (cont.) • FEATURES EXTRACTION STEP: statistically significant features are extracted from the training set • MODEL CONSTRUCTION (HMM, SVM, Neural Networks): based on the extracted features a model is constructed to be used for the prediction in the next step • PREDICTION: genes are predicted according to the model obtained

  18. Structural Genomics Structural annotation Ab initio gene prediction Uses sequence properties only • Codon usage, splice site recognition
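As a rough illustration of how a sequence property such as codon usage can drive ab initio prediction, the sketch below follows the three steps of feature extraction, model construction, and prediction in their simplest form. It assumes a small training set of known coding sequences, a uniform background of 1/64 per codon, and a hypothetical pseudocount; real gene finders use far richer models (HMMs, SVMs, neural networks), and all names here are illustrative.

```python
import math
from collections import Counter

BASES = "ACGT"


def codons(seq: str):
    """Split seq into consecutive non-overlapping triplets, dropping any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def train_codon_model(coding_seqs, pseudocount: float = 1.0):
    """Feature extraction + model construction: codon frequencies from known genes."""
    counts = Counter(c for s in coding_seqs for c in codons(s))
    all_codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    total = sum(counts[c] for c in all_codons) + pseudocount * len(all_codons)
    return {c: (counts[c] + pseudocount) / total for c in all_codons}


def coding_score(seq: str, model) -> float:
    """Log-odds of seq under the codon model versus a uniform 1/64 background."""
    return sum(math.log(model[c] * 64) for c in codons(seq) if c in model)


def best_frame(seq: str, model):
    """Prediction: return the reading frame (0, 1 or 2) with the highest score."""
    scores = [coding_score(seq[f:], model) for f in range(3)]
    return max(range(3), key=scores.__getitem__), scores


# Hypothetical training set: two toy "genes" rich in the codon GCT.
toy_model = train_codon_model(["ATGGCTGCTGCTAAA", "ATGGCTGCTAAAGCT"])
frame, frame_scores = best_frame("AGCTGCTGCTGCT", toy_model)
```

A positive score means the triplets in that frame look more like the training genes than like random sequence; splice site recognition would add further signals on top of this.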

  19. Structural Genomics Chromosome mapping • Construction of chromosome maps (LocusLink) • Study of chromosome rearrangements

  20. Structural Genomics Case study 1 Sequencing projects

  21. Sequence data

  22. Sequence data • http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome • http://www.ensembl.org/index.html • http://www.cbs.dtu.dk/services/GenomeAtlas/ • http://www.ncbi.nlm.nih.gov/Genomes/index.html

  23. Human Genome Project 16/02/2001

  24. Structural Genomics Structural annotation: location of the genes, the introns, the exons, splice sites, the promoters, the repeated elements [Figure: organization of the human genome] • Human genome: nuclear genome (3300 Mb, ~80 000 genes) and mitochondrial genome (16.6 kb, 37 genes: two rRNA genes, 22 tRNA genes, 13 polypeptide-encoding genes) • Nuclear genome: ~25% genes and gene-related sequences, ~75% extragenic DNA • Genes and gene-related sequences: ~10% coding DNA, ~90% noncoding DNA (pseudogenes, gene fragments, introns, untranslated sequences, etc.) • Extragenic DNA: ~60% unique or low copy number, ~40% moderate to highly repetitive (tandemly repeated or clustered repeats, interspersed repeats)

  25. … also other organisms … [Figure: timeline of genome sequencing projects, 1998, 2000, 2002] • 2002: Rat & Rice • Genome of SARS, April 2003 (3 weeks!)

  26. Chimpanzee genome • The human and chimp genomes are about 98.8% identical • Where do the dramatic behavioural and phenotypic differences that have arisen since their divergence 7 million years ago come from? • Blood samples from a single chimp, called Clint, provided 98% of the genome data • Genes involved in smell and hearing are significantly different between humans and chimpanzees • Changes in regulatory binding sites might have contributed to the divergence between both species

  27. Chimpanzee genome Donaldson et al., 2006 Genome Biol

  28. Chimpanzee genome Donaldson et al., 2006 Genome Biol

  29. Structural Genomics/Annotation Comparative Genomics/ evolutionary biology Functional genomics/ Systems Biology

  30. Comparative Genomics • Input: annotated sequences • Applications: Blast, ClustalW, tree construction • Biological goal: explaining evolution, metagenomics

  31. Comparative Genomics Comparison of sequences between genomes Based on sequence alignment tools • Aid in gene prediction: extrinsic gene prediction • Homology based prediction of gene function • Study of protein families (evolutionary modeling, duplications, see later in course) • Phylogenetic footprinting (see later in course)
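Since everything on this slide rests on sequence alignment, a minimal sketch of pairwise global alignment may help. It uses the classic Needleman-Wunsch dynamic programming scheme with a linear gap penalty; the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions chosen for illustration, and tools such as Blast and ClustalW use heuristics and substitution matrices rather than this exact procedure.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        """Substitution score for aligning a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Share of alignment columns with identical residues (gaps never count)."""
    if not aligned_a:
        return 0.0
    same = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q and p != "-")
    return 100.0 * same / len(aligned_a)


nw_score, nw_a, nw_b = needleman_wunsch("GATTACA", "GATACA")
```

High identity between a query and an annotated sequence in a related organism is the kind of evidence that extrinsic gene prediction and homology-based function prediction build on.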

  32. Extrinsic gene prediction Fielden et al. 2002

  33. Comparative Genomics Extrinsic gene prediction Introns and splicing

  34. Comparative Genomics Extrinsic gene prediction

  35. Comparative Genomics Homology based function prediction: Primary sequence → Homologs in related organisms → Families of proteins → Multiple sequence alignment → Features characteristic for the protein family

  36. Comparative Genomics Study of evolution • Ancestral gene (Function 1) • Gene duplication: Copy 1 (Function 1), Copy 2 (Function 1) • Over time: Copy 1 (Function 1), Copy 2 (New function!)

  37. Comparative Genomics Study of evolution birds (reptiles) mammals amphibians Ray-finned fish 1 genome duplication ? Vertebrates 1, 2, 3 genome duplications ?

  38. Comparative Genomics Case study 1 Metagenomics

  39. Metagenomics Many species are difficult to study in isolation because they fail to grow in laboratory culture, depend on other organisms for critical processes, or have become extinct. Metagenomics: DNA can be isolated directly from living or dead cells in various contexts and directly sequenced (shot gun sequencing)

  40. Sargasso Sea Boston (04/16/04): This spring, J. Craig Venter is sailing around the French Polynesian Islands scooping up bucketfuls (figuratively) of seawater in an ambitious voyage to sample microbial genomes found in the world's oceans. His 95-foot yacht, Sorcerer II, has been outfitted with all manner of technical equipment to accommodate the task, as well as a few surfboards should that opportunity arise. • 5% of NCBI consists of Sargasso sequences (go to NCBI and type “sargasso”) • Posed challenges to genome assembly • Allows building environmental fingerprints • 70,000 entirely novel genes, from an estimated 1,800 genomic species, including 148 novel bacterial phylotypes

  41. Metagenomics

  42. Comparative genomics Case study 2 Genome evolution (Yves Van de Peer)

  43. Gene duplications • Ancestral gene (Function 1) • Gene duplication: Copy 1 (Function 1), Copy 2 (Function 1) • Over time: Copy 1 (Function 1), Copy 2 (New function!)

  44. Sub-, neo-, and ‘nonfunctionalization’ of duplicated genes (regulatory regions and protein coding domain) • Neofunctionalization (Ohno) • Subfunctionalization of the coding region • Loss of one subfunction: gene preservation by subfunctionalization • Gene loss by nonfunctionalization Van de Peer et al.

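  45. Genome scale duplications [Figure: fate of a duplicated segment over time] • Before duplication: segment A with genes 1 2 3 4 5 6 7 8 9 10 11 • After duplication: segments A and B, each with genes 1 2 3 4 5 6 7 8 9 10 11 • After gene loss, rearrangements, translocation, etc.: segment A retains 1 3 4 6 7 10 11 (gene 2 is drawn apart from the segment), segment B retains 1 2 4 6 7 8 9 11 • Retained homologs serve as anchor points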

  46. Gene duplication [Figure: phylogenetic tree of RARa genes; scale bar 0.050; taxa: Frog (283822), Frog (2119679), Chicken (2119682), Human (4160009), Mouse (133484), Zebrafish (215026), Zebrafish (704370); node support values between 90 and 100] Van de Peer et al.

  47. Impact of duplication on evolution birds (reptiles) mammals amphibians Ray-finned fish 1 genome duplication ? Vertebrates 1, 2, 3 genome duplications ? Van de Peer et al.

  48. Impact of duplication on evolution Duplicated genes and diversity of fishes: is there a correlation?

  49. Bioinformatics & Genome Evolution • Map-based approach • Gene Homology Matrix • Start from genome annotations • Represent chromosomes as sorted gene lists • Identify all homologous gene pairs between and within chromosomes (all-against-all BLAST) • Score pairs of homologs in matrix • Duplicated regions appear as diagonals • Test significance of a cluster [Figure: gene homology matrix of Segment A (4 1 3 6 7 10 11) against Segment B (1 2 4 6 7 8 9 11)]
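The map-based approach listed on this slide can be sketched in a few lines. The example below assumes that homology has already been decided (here two genes count as homologous when they carry the same number, standing in for an all-against-all BLAST result), represents each segment as a sorted gene list, fills a gene homology matrix, and then chains hits into the longest forward diagonal. The function names and the `max_step` gap tolerance are hypothetical, inverted diagonals are not handled, and the significance test named on the slide is left out.

```python
def homology_matrix(genes_a, genes_b, homologous):
    """Matrix with 1 where a gene of segment A is homologous to a gene of segment B."""
    return [[1 if homologous(x, y) else 0 for y in genes_b] for x in genes_a]


def longest_diagonal(matrix, max_step: int = 3):
    """Longest chain of hits that advances along both segments.

    Consecutive hits may be at most max_step positions apart on each axis,
    which tolerates a few lost or inserted genes between anchor points.
    """
    hits = [(i, j) for i, row in enumerate(matrix) for j, v in enumerate(row) if v]
    if not hits:
        return []
    best_len, prev = {}, {}
    for idx, h in enumerate(hits):  # hits are sorted by row, then column
        best_len[h], prev[h] = 1, None
        for g in hits[:idx]:
            di, dj = h[0] - g[0], h[1] - g[1]
            if 0 < di <= max_step and 0 < dj <= max_step and best_len[g] + 1 > best_len[h]:
                best_len[h], prev[h] = best_len[g] + 1, g
    # Walk back from the end of the best chain.
    end = max(hits, key=best_len.get)
    chain = []
    while end is not None:
        chain.append(end)
        end = prev[end]
    return chain[::-1]


# Retained gene content of the two duplicated segments, in sorted order.
segment_a = [1, 3, 4, 6, 7, 10, 11]
segment_b = [1, 2, 4, 6, 7, 8, 9, 11]
ghm = homology_matrix(segment_a, segment_b, lambda x, y: x == y)
diagonal = longest_diagonal(ghm)
anchor_points = [segment_a[i] for i, _ in diagonal]
```

With these toy segments the retained homologs line up as a single diagonal; in real data one would still need to ask whether a diagonal of that length could arise by chance, which is the significance test the slide refers to.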

  50. Bioinformatics & Genome Evolution
