
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT


madison

Presentation Transcript


  1. Comparative Genomics TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT CCCTGTTTCCAGGTTTGTTGTCCCAAAATAGTGACCATTTCATATGTATA

  2. Overview
     • I. Comparing genome sequences
       • Concepts and terminology
       • Methods
       • Whole-genome alignments
       • Quantifying evolutionary conservation (PhastCons, PhyloP)
       • Identifying conserved elements
       • Available datasets at UCSC
     • II. Comparative analyses of function
       • Evolutionary dynamics of gene regulation
       • Case studies
       • Insights into regulatory variation within and across species

  3. Distribution of evolutionary constraint in the human genome
     • 4.2% of the genome is putatively constrained
     • ~1 million putative regulatory elements
     • Lindblad-Toh et al. Nature 478:476 (2011)

  4. Goals of comparative genomics
     • Infer the course of past evolution using statistical models of sequence evolution
     • Identify sequence elements evolving more slowly or more rapidly than neutral
     • Evaluate the precise degree of constraint on specific positions
     • Predict the functional effects of nucleotide or amino acid mutations in constrained sequences

  5. Vertebrate genomes available for comparative studies
     [Figure labels: Primates, Mammals, Tetrapods, Vertebrates]

  6. Commonly used (and misused) terms
     • Mutation vs. Substitution
       • Mutations occur in individuals and segregate in populations
       • Substitutions are mutations that have become fixed
       • Mutations = within species; substitutions = between species
     • Conservation vs. Constraint
       • Conservation = an observation of sequence similarity
       • Constraint = a hypothesis about the effect of purifying selection
     • Homology, Orthology and Paralogy
       • Homologous sequences = derived from a common ancestor
       • Orthologous sequences = homologous sequences separated by a speciation event (e.g., human HOXA and mouse Hoxa)
       • Paralogous sequences = homologous sequences separated by gene duplication (e.g., human HOXA and human HOXB)
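
The orthology/paralogy definitions above can be read as a rule on a gene tree: two genes are orthologs if their last common ancestor is a speciation node, and paralogs if it is a duplication node. The sketch below is a minimal illustration of that rule only. The `Node` class, the toy tree, and the `mouse_Hoxb` leaf are assumptions added for the example; they are not from the slides and this is not how any particular database assigns orthologs.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Node:
    label: str                 # gene name for leaves, free text for internal nodes
    event: str = "leaf"        # "speciation", "duplication", or "leaf"
    children: list = field(default_factory=list)

def leaves(node):
    """Return the leaf labels below a node."""
    if not node.children:
        return [node.label]
    return [name for child in node.children for name in leaves(child)]

def classify_pairs(node, out=None):
    """Label each pair of leaves by the event at their last common ancestor."""
    if out is None:
        out = {}
    relation = "orthologs" if node.event == "speciation" else "paralogs"
    for left, right in combinations(node.children, 2):
        for a in leaves(left):
            for b in leaves(right):
                out[frozenset((a, b))] = relation
    for child in node.children:
        classify_pairs(child, out)
    return out

# Hypothetical toy gene tree: an ancestral duplication, then a human/mouse split in each copy.
hoxa = Node("HOXA clade", "speciation", [Node("human_HOXA"), Node("mouse_Hoxa")])
hoxb = Node("HOXB clade", "speciation", [Node("human_HOXB"), Node("mouse_Hoxb")])
tree = Node("ancestral duplication", "duplication", [hoxa, hoxb])

pairs = classify_pairs(tree)
for pair, relation in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), relation)
```

Note that under this rule human HOXA and mouse Hoxb are paralogs even though they come from different species, because the event separating them is the duplication.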

  7. Basic premises in comparative sequence analysis
     • Most sequence differences among genomes are neutral
       • Involve substitutions with minimal or no functional impact
       • Fixed by random genetic drift
       • For neutral mutations, the fixation (substitution) rate is equal to the mutation rate
       • Genomes become more dissimilar with greater phylogenetic distance
     • Most mutations that affect function are eliminated by purifying selection
       • Constrained elements have lower substitution rates than expected from the neutral rate
       • Contingent on the effect of the mutation and the degree of constraint on the function
       • Manifests as sequence conservation, even among distant species
     • Beneficial mutations may be driven to fixation by positive selection
       • May be detected as a “faster-than-neutral” substitution rate
       • Expected to be rare
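
The premise that the neutral fixation rate equals the mutation rate follows from a short standard argument. As a sketch, assume a diploid population of N individuals and a neutral mutation rate of μ per site per generation (both symbols are introduced here for illustration and do not appear in the slides). Each generation produces 2Nμ new neutral mutations at a site, and each one drifts to fixation with probability equal to its initial frequency, 1/(2N):

```latex
k \;=\; \underbrace{2N\mu}_{\text{new mutations per generation}}
   \times
   \underbrace{\frac{1}{2N}}_{\text{fixation probability}}
   \;=\; \mu
```

Population size cancels, so the neutral substitution rate k depends only on μ. This is why a local substitution rate well below the neutral rate can be read as evidence of constraint.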

  8. Phylogenies
     • Phylogenetic trees show two things:
       • Evolutionary relationships among species or sequences: branching order
       • Evolutionary distance (e.g., degree of similarity or divergence): branch length
     [Figure labels: Terminal node, Branch, Internal node]

  9. Phylogenies
     • Phylogenetic trees show two things:
       • Evolutionary relationships among species or sequences: branching order
       • Evolutionary distance (e.g., degree of similarity or divergence): branch length
     [Figure labels: Species tree, Gene tree]
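
To make the two properties concrete, the sketch below stores a small rooted tree as child-to-parent links with branch lengths and computes the evolutionary distance between two terminal nodes as the sum of branch lengths on the path joining them. The species, the `PARENT` table, and every branch length are illustrative assumptions, not real estimates and not taken from the slides.

```python
# Toy rooted tree: child -> (parent, branch length in substitutions per site).
# Names and lengths are made up for illustration.
PARENT = {
    "human": ("primates", 0.01),
    "chimp": ("primates", 0.01),
    "primates": ("root", 0.10),
    "mouse": ("root", 0.30),
}

def path_to_root(leaf):
    """Return [(node, cumulative distance from leaf)] walking from leaf up to the root."""
    path = [(leaf, 0.0)]
    node, dist = leaf, 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        dist += length
        path.append((parent, dist))
        node = parent
    return path

def tree_distance(a, b):
    """Sum of branch lengths on the path joining two nodes."""
    up_a = dict(path_to_root(a))
    for node, dist_b in path_to_root(b):
        if node in up_a:  # first shared ancestor on the way up from b
            return up_a[node] + dist_b
    raise ValueError("nodes are not in the same tree")

print(round(tree_distance("human", "chimp"), 2))
print(round(tree_distance("human", "mouse"), 2))
```

The branching order is encoded by which nodes share a parent; the branch lengths carry the distance.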

  10. Orthologs and paralogs in gene trees
      [Figure labels: HMGCS1, HMGCS2] Capra et al. 2013

  11. Orthologs and paralogs in gene trees
      [Figure labels: Paralogs, Orthologs, Duplication] Capra et al. 2013

  12. Orthologs and paralogs in gene trees
      [Figure labels: 1:1 Orthologs, Human HMGCS1, Human HMGCS2, 1:2] Capra et al. 2013

  13. Ortholog assignments at Ensembl

  14. Ortholog assignments at Ensembl

  15. Ortholog assignments at Ensembl

  16. Steps in sequence comparisons
      • Sequence alignment
        • Global vs. local
        • Whole-genome vs. genome segments (e.g., genes)
        • Identify sites that are homologous (not necessarily identical)
      • Measure similarity and divergence of sequences
        • Sequence similarity: level of conservation
        • Rates of change among sequences: divergence
      • Infer degree of evolutionary constraint
        • Are the sequences more conserved than expected from neutral evolution?
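
Once two sequences are aligned, the "measure similarity and divergence" step reduces to counting over homologous columns. The sketch below is a minimal illustration, assuming a gapped pairwise alignment given as two equal-length strings and skipping any column that contains a gap. The function name is hypothetical, the first sequence is the first ten bases of the title slide, and the second sequence is invented for the example.

```python
def compare_aligned(seq_a, seq_b, gap="-"):
    """Return (identity, divergence) over ungapped columns of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns have no homologous pair to compare
        if x == y:
            matches += 1
        else:
            mismatches += 1
    compared = matches + mismatches
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared, mismatches / compared

# One mismatch and one gapped column in ten aligned positions.
identity, divergence = compare_aligned("TGCAAACTCA", "TGCAGACT-A")
print(f"identity={identity:.3f} divergence={divergence:.3f}")
```

The raw divergence here is an observed proportion of differing sites; turning it into an evolutionary distance requires a substitution model.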

  17. Rates of sequence change are estimated using models of the substitution process Transition probabilities: 
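
The slide does not name a specific substitution model at this point. As one illustration, the sketch below assumes the Jukes-Cantor (JC69) model, the simplest case, in which all substitutions are equally likely. "Transition probability" is used in the Markov-chain sense: the probability that a site in state i is in state j after a branch of length d (expected substitutions per site). The function names are hypothetical.

```python
import math

def jc69_transition_probs(d):
    """JC69 probabilities over a branch of d expected substitutions per site.

    Returns (p_same, p_change): the probability that a base is unchanged, and
    the probability that it has become one specific other base.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_change = 0.25 - 0.25 * decay
    return p_same, p_change

def jc69_distance(p_observed):
    """Estimate d from the observed proportion of differing sites (JC69 correction)."""
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("JC69 distance is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)

p_same, p_change = jc69_transition_probs(0.2)
p_diff = 3.0 * p_change           # probability the site differs at all
print(p_same, p_diff, jc69_distance(p_diff))
```

The correction matters because multiple substitutions at the same site hide each other, so the observed proportion of differences underestimates the number of substitutions that occurred.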

  18. Substitution rates are calculated for each lineage in a sequence phylogeny
      [Figure: phylogeny]

  19. Conserved sequences identified by local reductions in substitution rate
      [Figure: substitution rate along aligned positions; labels: local, neut]

  20. Tools for quantifying evolutionary conservation across genomes
      • Alignment: Multiz
        • Generates a multiple-species alignment relative to a base genome
        • Constructed from pairwise alignments of individual genomes to the reference
        • 46-way and 100-way alignments to hg19; 30-way to mm9; 60-way to mm10

  21. 100-way Multiz alignment in hg19
      • Green = level of sequence similarity at each site

  22. Conservation of synteny: “net” alignments • Conservation of genome segments • Order and orientation of genes and regulatory sequences

  23. Conservation of synteny: “net” alignments • Synteny is frequently conserved on megabase scales

  24. Tools for quantifying evolutionary conservation across genomes
      • Alignment: Multiz
        • Generates a multiple-species alignment relative to a base genome
        • Constructed from pairwise alignments of individual genomes to the reference
        • 46-way and 100-way alignments to hg19; 30-way to mm9; 60-way to mm10
      • PhastCons
        • Estimates the probability that a nucleotide belongs to a conserved element
        • Sensitive to ‘runs’ of conserved sites: effective for identifying conserved blocks
        • For hg19, elements are calculated at three phylogenetic scopes (Vertebrate, Placental Mammal, Primate)
      • PhyloP
        • Measures conservation independently at individual positions
        • Provides per-base conservation scores (-log p-value under the hypothesis of neutrality)
        • Positive scores suggest constraint; negative scores suggest accelerated evolution

  25. Identifying conserved elements: PhastCons
      [Figure: PhastCons scores and PhastCons elements tracks; example element with lod: 882, Score: 694]
      • lod score: log probability under the conserved model minus log probability under the neutral model
      • Score: normalized lod score on a 0-1000 scale
      • Use scores to rank elements by estimated constraint
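
The lod definition above can be illustrated with a toy calculation. The sketch below assumes each alignment column contributes an independent likelihood under a "conserved" and a "neutral" model, and uses natural logs; both are simplifying assumptions for illustration. It is not the PhastCons implementation, which uses a phylogenetic hidden Markov model, and the 0-1000 normalization is deliberately left out because the slides do not give its formula. All names and numbers are hypothetical.

```python
import math

def lod_score(cols_conserved, cols_neutral):
    """Log-odds of an element: log P(data | conserved) - log P(data | neutral)."""
    if len(cols_conserved) != len(cols_neutral):
        raise ValueError("both models must score the same alignment columns")
    log_conserved = sum(math.log(p) for p in cols_conserved)
    log_neutral = sum(math.log(p) for p in cols_neutral)
    return log_conserved - log_neutral

# Hypothetical per-column likelihoods: (under conserved model, under neutral model).
elements = {
    "element_A": ([0.9, 0.8, 0.9], [0.3, 0.2, 0.3]),
    "element_B": ([0.5, 0.4], [0.4, 0.4]),
}
lods = {name: lod_score(c, n) for name, (c, n) in elements.items()}

# Rank elements from most to least strongly supported as conserved.
ranked = sorted(lods, key=lods.get, reverse=True)
print(ranked)
```

Because the log-odds accumulates over columns, longer runs of well-conserved sites earn higher scores, which matches the slide's advice to rank elements by score.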

  26. PhastCons elements estimated at 3 phylogenetic scopes
      [Figure labels: Primate, Placental, Vertebrate]

  27. Level of conservation decays with increasing evolutionary distance

  28. PhyloP: measuring basewise conservation
      [Figure: PhyloP scores track]
      • Scores are calculated independently for each base
      • Scores are -log p-values under the hypothesis of neutral evolution
      • Positive scores = constraint
      • Negative scores = acceleration

  29. Per-site phyloP conservation scores
      [Figure: example per-site scores 4.49, 1.77, -0.96]
      • Use PhastCons to identify conserved elements
      • Use phyloP to evaluate individual sites within elements
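
A per-site phyloP score can be converted back into a p-value and a direction. The sketch below assumes the scores are base-10 logs, so a score s corresponds to p = 10^(-|s|), with the sign indicating constraint (positive) or acceleration (negative). The function name is hypothetical, and the three inputs are the example scores shown on this slide.

```python
def interpret_phylop(score):
    """Return (p_value, direction) for a signed phyloP score, assuming -log10 scaling."""
    p_value = 10.0 ** (-abs(score))
    if score > 0:
        direction = "conserved"
    elif score < 0:
        direction = "accelerated"
    else:
        direction = "neutral"
    return p_value, direction

for s in (4.49, 1.77, -0.96):
    p, direction = interpret_phylop(s)
    print(f"score={s:+.2f}  p={p:.2e}  {direction}")
```

Under this reading, a score of 4.49 is strong evidence against neutrality at that base, while -0.96 corresponds to p of roughly 0.1 and is weak evidence of acceleration.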

  30. Accessing conservation data

  31. Multiple genome alignments and conservation metrics are calculated independently for each reference genome
      • Orthologous region in mouse: 30-way multiz alignment

  32. Conservation identifies critical binding sites in regulatory elements
      [Figure labels: Regulatory info (ENCODE), Conservation]
      • Important binding sites and variants that affect function are expected where regulatory annotation and conservation coincide
