1 / 33

Evolution of Young Gene Duplicates in the C. elegans Genome

Evolution of Young Gene Duplicates in the C. elegans Genome. Vaishali Katju Genomes & Genomic Analysis - Dr. M. Werner-Washburne September 26th, 2006. General Outline. Why bother about Gene Duplication ? Identifying young gene duplicates Early Evolutionary Dynamics

aldon
Download Presentation

Evolution of Young Gene Duplicates in the C. elegans Genome

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Evolution of Young Gene Duplicates in the C. elegans Genome Vaishali Katju Genomes & Genomic Analysis - Dr. M. Werner-Washburne September 26th, 2006

  2. General Outline • Why bother about Gene Duplication? • Identifying young gene duplicates • Early Evolutionary Dynamics  Degree of structural resemblance  Intron preservation  Genomic location and movement  Span of duplication • Conclusions

  3. Genome Size Plants Vertebrates Invertebrates Fungi Protists Eubacteria Archaea 105 106 107 108 109 1010 1011 Number Nucleotide Pairs per Haploid Genome

  4. Gene Content Plants Vertebrates Invertebrates Fungi Protists Eubacteria Archaea 10,000 20,000 30,000 40,000 50,000 60,000 Number of Genes

  5. "In a strict sense, nothing is evolution is created de novo. Each new gene must have arisen from an already existing gene."- OHNO 1970 Gene Duplication -- Why Bother? • Gene duplication plays a fundamental role in adaptive evolution and the origin of • organismal diversity •  Increase in genome size from primitive organisms to more complex ones. •  An increase in the number of functionally different genes over the course • of evolutionary history • The trend from simple to complex, and from unicellular to multicellular organisms is characterized by a parallel increase in the number of 'communication' proteins (Tamames et al. 1996) • Within higher organisms, most genes belong to multigene families or superfamilies (some with  1000 members)

  6. Duplication Nonfunctionalization Neofunctionalization (pseudogene) The Haldane Model for Fates of Duplicate Genes • Evolutionary importance of gene duplication recognized by evolutionary biologists and geneticists as early as the 1920s(Bridges, Haldane, Muller). • Gene duplication creates redundancy, which in turn enables functional diversification and adaptation.  One copy maintains the ancestral function.  The other copy is free to evolve and take on a new function.

  7. Bridges (1936) • first gene duplication observed as a characteristic • unpaired loop on polytene chromosomes of • Drosophila • Bar locus at region 16A of Chromosome X Wild type 16A 16A Bar eye 16A 16A 16A 16A Heterozygous Homozygous Double Bar eye 16A 16A 16A 16A 16A 16A

  8. Gene Duplication in the Pre-Genomic Era • Identification of young gene duplicates key to revealing mechanisms • of duplication and evolutionary forces responsible for fixation of a • duplicate copy. • Handful of known young duplicates represent a highly limited and • biased sample: •  serendipitously discovered •  copies of functionally important genes • (Maroni et al. 1987; Long & Langley 1993; Lootens et al. 1993; Lenormand et al. 1998) • Precludes statistically robust inferences about the early evolution of • gene duplicates.

  9. Genome-wide Identification of Duplicate Genes (Lynch and Conery 2000) • nine eukaryotic genomes • download complete set of putative amino acid sequences for each genome from • Genbank • Excluded: •  possible nonfunctional protein sequences that did not start with • methionine •  sequences annotated as known or suspected pseudogenes and • transposable elements. • identification of similar ORFs by BLAST (retained those pairs with < E-10) • Excluded multigene families with > 5 members

  10. Importance of Catching 'em Young • Examination of large, unbiased samples of young gene duplicates in the early stages of evolution is key to understanding the origin, divergence and preservation of new genes.  Early evolutionary dynamics of gene duplicates, may, to a large extent determine their future fate and functional role.  Provide clues to the mutational origin of duplicate genes.  Elucidate how a single extra copy in one individual comes to get fixed at the population- or species-level. Natural selection? Random Genetic Drift?

  11. Mutations Point Mutations Insertions Deletions • Mutations: Errors in DNA replication • Small-scale mutations affecting one or a few nucleotides include: Substitution of one Addition of one or Removal of one or nucleotide for another more nucleotides more nucleotides Point mutations occurring within a protein-coding region of the genes may be classified into three kinds, depending upon what the altered codon codes for. (i) Synonymous (or Silent) mutations: code for the same amino acid (ii) Nonsynonymous (or Missense) mutations: code for a different amino acid (iii) Nonsense mutations: code for a stop codon; can truncate the protein

  12. Degeneracy of the Genetic Code • 20 amino acids • 43 = 64 possible codons • Many codons are redundant; • two or more codons can • code for the same aa. • The degeneracy of the genetic code is what accounts for the existence of • synonymous/silent mutations.

  13. Mutations and the Molecular Clock CAATCGATCG 50 million years CAATTTATTT CAATTGATCG 25 million years CAATTTATCT CAATTTATCG Common Ancestor • Known rate for this sequence = 1 base substitution per 25 million years • Two sequences differ by four bases. • Two sequences differ by 100 millions years of evolution, so their common ancestor lived • 50 million years ago. • Molecular Clock Hypothesis: It holds that in any given DNA sequence, mutations accumulate at an approximately constant rate. The difference between the sequences of a DNA segment (or protein) in two species would be proportional to the time since the species diverged from a common ancestor (coalescence time).

  14. Synonymous Substitutions (KS) as a Proxy for Evolutionary Time: • Synonymous (silent) substitutions are thought to be largely neutral or • invisible to natural selection because they do not change the amino acid • sequence. • Fraction of synonymous substitutions (KS) for a pair of sequences approximates the time since divergence (in our case, duplication). Met Phe Arg Ser Pro Thr Duplicate Copy 1 AUG UUU CGA UCC CCG ACC Duplicate Copy 2 AUG UUU CGU UGC CCC ACC Met Phe Arg Cys Pro Thr • KS = Fraction of synonymous substitutions per synonymous site = 2/5 = 0.40 or 40% • KA = Fraction of nonsynonymous substitutions per nonsynonymous site =1/13 = 0.077 or 7.7%

  15. Biologically Relevant Questions Unanswered by Large-Scale Comparative Genomic Analysis • To what extent does a duplicate copy structurally resemble the progenitor copy at birth? • Are exon-intron boundaries maintained in the gene duplication process (role of reverse transcription)? • Where do duplicate copies reside at or close to inception and how does their location alter with time? • What is the approximate physical span of a duplication?

  16. Investigating Evolutionary Dynamics of a Genome-Wide Sample of Young Gene Duplicates in C. elegans Lynch & Conery 2000

  17. Methods • 290C. elegans gene-duplicate pairs with KS 0.10 • Classification of 290 gene duplicate pairs into two age-cohorts: • Cohort KS = 0…………… 68 pairs • Cohort 0 KS 0.1…………… 222 pairs • Retrieval of unspliced, spliced and UTR nucleotide sequences from Wormbase (http://www.wormbase.org) • Comparison of exons, introns, and flanking region sequences of both gene copies within a duplicate pair • Chromosomal location, strand orientation, cDNA data

  18. Biologically Relevant Questions Unanswered by Large-Scale Comparative Genomic Analysis • To what extent does a duplicate copy structurally resemble the progenitor copy at birth? • Are exon-intron boundaries maintained in the gene duplication process (role of reverse transcription)? • Where do duplicate copies reside at or close to inception and how does their location alter with time? • What is the approximate physical span of a duplication?

  19. Canonical Model for the Evolution of Functionally Novel Proteins After Gene Duplication Ohno 1970 Underlying Assumptions: • Gene duplications is complete  duplicate is structurally and functionally identical to ancestral copy • Neofunctionalization or nonfunctionalization of a duplicate copy due to  point mutations short indels.

  20. Canonical Model for the Evolution of Functionally Novel Proteins After Gene Duplication Ohno 1970 Complete Gene Duplication Functional Divergence by gradual accumulation of mutations

  21. New World Primates Old World Primates Positive selection of duplicated opsin to yield a shift in the absorption spectrum (15/348 sites with aa substitutions) Duplication of Red Opsin on Chr. X in common ancestor of OWM OWM-NWM split 35-40 mya Red opsin (X-linked) Blue opsin (autosomal) Diverged  500 mya 43% similar Duplication of X-Linked Visual Pigment Proteins in Old World Primates • OWM have trichromatic colour vision compared to dichromatic vision of NWM. • Three types of photoreceptors mediate colour vision in OWM: Red-Green- Blue • Red and Green opsins are 96% similar, but both only 43% similar to Blue opsins

  22. Three Categories of Structural Resemblance Between Duplicate Copies Complete Partial Chimeric

  23. Complete Partial Chimeric Early Presence of Structural Heterogeneity between Duplicate Copies

  24. Biologically Relevant Questions Unanswered by Large-Scale Comparative Genomic Analysis • To what extent does a duplicate copy structurally resemble the progenitor copy at birth? • Are exon-intron boundaries maintained in the gene duplication process (role of reverse transcription)? • Where do duplicate copies reside at or close to inception and how does their location alter with time? • What is the approximate physical span of a duplication?

  25. 50 bps W01D2.1 C54C6.1 C03A7.4 C03A7.7 B0035.2 C47A4.2 Minor Role of Reverse Transcription in the Origin of Gene Duplicates • 278/290 pairs with • intron in at least • one copy • Differential intron • retentionin only 3 • pairs • Introns preserved • in both copies in • ~ 99% of duplicate • pairs

  26. Biologically Relevant Questions Unanswered by Large-Scale Comparative Genomic Analysis • To what extent does a duplicate copy structurally resemble the progenitor copy at birth? • Are exon-intron boundaries maintained in the gene duplication process (role of reverse transcription)? • Where do duplicate copies reside at or close to inception and how does their location alter with time? • What is the approximate physical span of a duplication?

  27. The Majority of Gene-Duplicates within the KS = 0 Cohort Reside on the Same Chromosome Gadj= 24.6, P0.0001

  28. Lowered Survivorship of Duplicates in Genomic Proximity to Cognate Copy Explains Increase in Genomic Distance between Duplicates Over Time R2 = 0.165 108 106 104 102 100 0.00 0.02 0.04 0.06 0.08 0.10 Physical Distance between Gene Duplicates (bp) R2 = 0.0062 108 106 104 102 100 0.00 0.02 0.04 0.06 0.08 0.10 Substitutions per Silent-Site (KS) • 180 gene duplicate pairs with both copies on the same chromosome • Kendall’s  = 0.317; P < 0.0001 • Range : 0 - 14.8 Mb • Median Distance: 6.5 kb • No correlation apparent when KS = 0 age-cohort (55 Pairs) removed • Kendall’s  = 0.055; P = 0.366 • Median Distance: 44.9 kb

  29. Same Strand 0.8 Opposite Strand 0.6 Frequency of Duplicate Pairs 0.4 0.2 0.0 0 0.01 - 0.10 Substitutions/Synonymous Site Gadj= 4.2, P0.05 Two-Fold Excess of Duplicate Copies in Inverse Orientation within the KS = 0 Cohort

  30. Biologically Relevant Questions Unanswered by Large-Scale Comparative Genomic Analysis • To what extent does a duplicate copy structurally resemble the progenitor copy at birth? • Are exon-intron boundaries maintained in the gene duplication process (role of reverse transcription)? • Where do duplicate copies reside at or close to inception and how does their location alter with time? • What is the approximate physical span of a duplication?

  31. Predominance of Duplications Involving Short Sequence Tracts * Duplication Span = length of sequence homology between two duplicate copies Median duplication span ~ 1.4 kb 70% of all duplication spans < 2kb Average gene length ~ 2.5 kb

  32. Conclusions • Early presence of structural heterogeneity between duplicates •  Unique exon(s) in one or both copies • ~ 50% of all duplicates in KS = 0 cohort • ~ 64% of all duplicates in 0 KS 0 cohort •  Contrary to the common assumption that duplicates are • structurally and functionally identical to ancestral gene at birth • Distribution of duplication spans in highly biased towards • relatively short tracts that may not encompass entire genes •  Partial duplications are frequent

  33. Conclusions contd. • Mechanism of Origin of Gene Duplicates •  Small role of RNA-mediated transposition •  Illegitimate Recombination events leading to Inverted Duplications may be an underestimated mechanism of gene duplication • Gene duplicates reside close to the ancestral copy at inception and become physically distant with evolutionary age (further apart on same chromosome or located on different chromosomes) •  Data more compatible with a model of enhanced survival of duplicates distant from the ancestral locus at birth

More Related