Models of Molecular Evolution II. Level 3 Molecular Evolution and Bioinformatics Jim Provan. Page and Holmes: Sections 7.3 – 7.4. Isochore structure of vertebrate genomes.
Models of Molecular Evolution II Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections 7.3 – 7.4
Isochore structure of vertebrate genomes • Why do patterns of base composition – the frequencies of the four bases and of codons used to specify amino acids – differ between genomes? • Mean G + C content in bacteria ranges from 25% to 75%, but there is little intragenome variation • Genomes of vertebrates have a much greater range of G + C values: • Caused by continuous sections (> 300kb) each of which has a uniform G + C content (isochores) • G + C content of isochores also varies between species
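A uniform G + C content over a long stretch is what defines an isochore, so a windowed G + C profile is the natural way to look for one. The following is a minimal sketch, assuming a plain DNA string as input; the function names and the toy sequence are hypothetical, and the window here is far shorter than the > 300 kb scale of real isochores.

```python
def gc_content(seq):
    """Return the fraction of G and C among the A/C/G/T bases in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def windowed_gc(seq, window):
    """G + C content of consecutive, non-overlapping windows."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


# Hypothetical toy sequence: a GC-rich block followed by an AT-rich block.
toy = "GCGCGGCCGC" + "ATATTAATAT"
profile = windowed_gc(toy, 10)
print(profile)
```

The whole-sequence average hides the structure that the windowed profile exposes, which is why mean G + C alone cannot distinguish a uniform genome from one built of contrasting isochores.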
Properties of vertebrate isochores • G + C rich isochores: correlate with reverse Giemsa (R) bands; early replicating; high density of genes; SINEs present; CpG islands in genes; high G + C content at third codon position; high frequency of retroviral sequences; high frequency of chiasmata • A + T rich isochores: correlate with Giemsa (G) bands; late replicating; low density of genes (only tissue specific); LINEs present; no CpG islands; high A + T content at third codon position; low frequency of retroviral sequences; low frequency of chiasmata
Theories on the existence of isochores • Selectionist hypothesis of Bernardi et al. suggests that GC-rich isochores predominantly found in warm-blooded vertebrates are an adaptation to higher body temperature: • Extra hydrogen bond in G-C pair may lessen possibility of thermal damage to DNA • Desert plants also have higher GC contents • Evidence for independent occurrence of isochores since birds and mammals do not share an immediate ancestor • However, some thermophilic bacteria are AT-rich
Theories on the existence of isochores • Neutralist explanation for the existence of isochores is that they simply reflect variation in the process of mutation across the genome • Studies on argininosuccinate synthetase processed pseudogenes from anthropoid primates: • Pseudogenes were derived from same functional ancestral gene but then inserted into different parts of the genome • Despite their common ancestry, they now differ in base composition • Because pseudogenes are not subject to selection, differences in base composition must have been due to regional variation in mutation patterns
Why should mutation patterns vary across genomes? • Replication hypothesis suggests that genes which replicate earlier in the cell cycle are more GC-rich than those which replicate later: • Believed to be due to the fact that G and C precursor pools of dNTPs are larger at this time – errors are more likely to incorporate G or C • Repair hypothesis is based on assumption that efficiency of DNA repair varies across genome: • May be an outcome of transcriptionally active areas being repaired more efficiently • CpG islands are maintained by a special repair system – efficiency of DNA repair may be dependent on location
Why should mutation patterns vary across genomes? • Recombination hypothesis claims that isochore structure of vertebrate genomes is the outcome of differences in the pattern and frequency of recombination: • Low GC localities will be associated with regions of reduced recombination: • Genes with low rates of recombination have low GC values • The large, non-recombining region of the Y-chromosome has a low GC composition • Fact that recombination plays such a large part in the structuring of eukaryote genomes makes this an attractive hypothesis • Although the relative contributions of these hypotheses are still unclear, the neutralist interpretation seems more likely
Codon usage [Figure: codon usage for arginine (CGA, CGC, CGG, CGU, AGA, AGG) and leucine (CUA, CUC, CUG, CUU, UUA, UUG) compared between E. coli and human]
What determines codon usage? • Degeneracy of genetic code: • Null hypothesis is that all codons for a particular amino acid are used with equal frequency • Refuted when nucleotide sequences became available for a wide range of organisms • Selectionist argument: • Highly expressed genes show most codon bias because they require more translational efficiency: coevolution of tRNAs and codons • Also supports the neutralist prediction of a relationship between functional constraint and substitution rate
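The null hypothesis of equal usage can be checked directly by comparing observed codon counts with the count expected if every synonymous codon were used equally. The sketch below uses relative synonymous codon usage (RSCU), a standard measure that the slides do not name, so treat it as one possible way to quantify bias. The function name and the toy gene are hypothetical.

```python
from collections import Counter

# The six arginine codons (RNA alphabet).
ARG_CODONS = ["CGA", "CGC", "CGG", "CGU", "AGA", "AGG"]


def rscu(seq, synonymous_codons):
    """Observed count of each codon divided by the count expected
    if all codons in synonymous_codons were used equally."""
    seq = seq.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in synonymous_codons)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in synonymous_codons}
    expected = total / len(synonymous_codons)
    return {c: counts[c] / expected for c in synonymous_codons}


# Hypothetical coding sequence: four CGU codons and two CGC codons.
toy_gene = "CGU" * 4 + "CGC" * 2
values = rscu(toy_gene, ARG_CODONS)
for codon, value in values.items():
    print(codon, value)
```

Under the null hypothesis every value would be close to 1; values well above or below 1 indicate preferred or avoided codons.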
Gene expression and codon bias • Highly expressed genes: strong selection for translational efficiency; restricted tRNAs used; strong codon bias; low rate of synonymous substitution (few neutral mutations) • Lowly expressed genes: weak selection for translational efficiency; more tRNAs used; weak codon bias; high rate of synonymous substitution (many neutral mutations)
The molecular clock • Idea of a molecular clock is central to the neutralist theory, since it demonstrates the constancy of the underlying neutral mutation rate • Previous example of α-globin • Does not imply that all genes and proteins evolve at the same rate: • Great variation between proteins (fibrinopeptides vs. histones) • Variation in rate among genes and proteins is compatible with the neutral theory if the underlying cause is changes in selective constraint • Key question concerning the validity of a molecular clock is whether rates of substitution are constant within genes across evolutionary time
Neutral theory and the molecular clock • Rate of nucleotide substitution (fixation) at any site per year, k, in a diploid population of N individuals (2N gene copies) is equal to the number of new mutations (neutral, deleterious or advantageous) arising per year, 2Nm, where m is the mutation rate per gene copy per year, multiplied by their probability of fixation, u: k = 2N × m × u • For a neutral mutation, probability of fixation is the reciprocal of the number of gene copies: u = 1/(2N) • So substitution rate for a neutral mutation is: k = (2N)(1/(2N))m
Neutral theory and the molecular clock (continued) • Parameters for population size (2N) cancel out, leaving: k = m • One of the most important formulae in molecular evolution – means that the rate of substitution of neutral mutations depends only on the underlying mutation rate and is independent of other factors such as population size • Also holds for mutants with a very weak selective advantage, e.g. s < 1/(2Ne), where Ne is the effective population size
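The cancellation above rests on the neutral fixation probability being the reciprocal of the number of gene copies. A small simulation can illustrate this. The sketch assumes a Wright-Fisher model (random sampling of gene copies each generation), which is a standard idealisation rather than something stated in the slides; the function name, population size and replicate count are hypothetical.

```python
import random


def fixation_probability(copies, replicates, seed=1):
    """Estimate the chance that one new neutral mutation reaches fixation.

    copies is the number of gene copies in the population (2N in the slides).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = 1  # the mutation starts as a single copy
        while 0 < count < copies:
            freq = count / copies
            # Each copy in the next generation is drawn at random
            # from the current generation (no selection).
            count = sum(rng.random() < freq for _ in range(copies))
        if count == copies:
            fixed += 1
    return fixed / replicates


copies = 20  # hypothetical population of N = 10 diploid individuals
estimate = fixation_probability(copies, replicates=5000)
print(f"simulated u = {estimate:.3f}, expected 1/(2N) = {1 / copies:.3f}")
```

Because the simulation is stochastic, the estimate should only be expected to fall near 1/(2N), not to match it exactly.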
Substitution of selectively advantageous mutations • Probability of fixation is roughly twice the selection coefficient, s, scaled by the ratio of effective (Ne) to actual (N) population size: u = 2sNe/N • Substituting this into the original equation (k = 2N × m × u), we get: k = 4Ne × s × m • In this case, substitution rate for an advantageous mutation also depends on population size and magnitude of selective advantage • For natural selection to produce a molecular clock, it is necessary for Ne, s and m (combination of ecological, mutational and selective events) to be the same across evolutionary time – highly unlikely!
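The contrast between the two cases can be made concrete with a few lines of arithmetic. This is a minimal sketch using the formulae from the slides; the function names and all numerical values (mutation rate, population sizes, selection coefficient) are hypothetical and chosen only for illustration.

```python
import math


def neutral_rate(m, n):
    """k = 2N * m * u with u = 1/(2N); the population size cancels."""
    return 2 * n * m * (1 / (2 * n))


def advantageous_rate(m, n, ne, s):
    """k = 2N * m * u with u = 2*s*Ne/N, which simplifies to 4*Ne*s*m."""
    return 2 * n * m * (2 * s * ne / n)


m = 1e-9   # hypothetical mutation rate per gene copy per year
s = 0.01   # hypothetical selective advantage

for n in (1_000, 1_000_000):
    k_neutral = neutral_rate(m, n)
    k_selected = advantageous_rate(m, n, ne=n, s=s)  # assumes Ne = N
    print(f"N = {n}: neutral k = {k_neutral:.2e}, advantageous k = {k_selected:.2e}")
```

The neutral rate is the same for both population sizes, while the advantageous rate scales with Ne, which is why a selection-driven clock would need Ne, s and m to stay constant through time.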
Constancy of the molecular clock • Neutral theory predicted a molecular clock and first protein sequence data appeared to confirm this: led Kimura to cite this as the best evidence for neutrality • As more comparative sequence data became available, particularly from mammals, examples of rate variation began to appear • Debate arose concerning the constancy of the molecular clock
Testing the molecular clock • Dispersion index R(t): test whether there is more rate variation between lineages than expected under a Poisson process: • If the data fit a Poisson process, variance in number of substitutions between lineages should be no greater than the mean number • If the data fit a Poisson process then R(t) = 1.0, if not then R(t) > 1.0 and the clock is said to be overdispersed • A star phylogeny should be used, since any phylogenetic structure will complicate the calculations (e.g. placental mammals)
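In its simplest form the dispersion index is the ratio of the variance to the mean of the substitution counts across lineages. The sketch below computes that plain ratio, assuming a star phylogeny and using the sample variance. Published estimators apply further corrections, so this is an illustration of the idea rather than the exact statistic behind any reported values. The function name and the counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(substitutions):
    """Variance-to-mean ratio of substitution counts across lineages.

    Uses the sample variance (n - 1 denominator). A value near 1 is
    what a Poisson process predicts; values above 1 suggest overdispersion.
    """
    return variance(substitutions) / mean(substitutions)


# Hypothetical substitution counts for four lineages of a star phylogeny.
clock_like = [10, 12, 8, 10]
overdispersed = [4, 16, 10, 10]

print(dispersion_index(clock_like))
print(dispersion_index(overdispersed))
```

With only a handful of lineages the ratio is itself noisy, so a single value above 1 is weak evidence unless the number of lineages or substitutions is large.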
Testing the molecular clock
Protein          Species (n)   Amino acids   R(t)
Haemoglobin α    6             141           1.17
Haemoglobin β    6             146           3.04
Myoglobin        6             153           1.60
Cytochrome c     4             104           3.22
Ribonuclease     4             123           2.15
α-Crystallin     6             175           2.71
• Mammalian protein data presented a serious problem for neutralists • Problems most likely due to inaccuracies in phylogenies: • “Outlier” in data was guinea pig • Guinea pig is much more divergent than previously thought
The relative rate test • Compares the numbers of substitutions separating each of two closely related taxa (A and B) from a third, more distantly related outgroup (C) [Figure: tree of taxa A, B and C with an internal node X] • If A and B have evolved according to a molecular clock, both should be equidistant from C: dAC = dBC • A and B must be closest relatives and C must not be too far removed
The relative rate test [Figure: phylogeny of Old World monkey, New World monkey and human, with the lineages numbered 1, 2 and 3] • Synonymous sites in nine nuclear genes (3520 bp): • d12 = 6.7 • d13 – d23 = 2.3 ± 0.6 • ψη-globin pseudogene (1827 bp): • d12 = 7.9 • d13 – d23 = 1.5 ± 0.4 • Three introns (3376 bp): • d12 = 6.9 • d13 – d23 = 1.0 ± 0.5 • Two flanking regions (936 bp): • d12 = 7.9 • d13 – d23 = 3.1 ± 1.1
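Under a clock the difference d13 – d23 should be zero, so one way to read the figures above is to divide each difference by its uncertainty. The sketch below does this, assuming that the ± values are standard errors and that the ratio can be compared with a normal distribution (1.96 for a two-sided 5% test). Neither assumption is stated in the slides, and the names used are hypothetical.

```python
# (d13 - d23, assumed standard error) for each data set on the slide.
RESULTS = {
    "synonymous sites, nine nuclear genes": (2.3, 0.6),
    "globin pseudogene": (1.5, 0.4),
    "three introns": (1.0, 0.5),
    "two flanking regions": (3.1, 1.1),
}

CRITICAL_Z = 1.96  # two-sided 5% threshold of the standard normal


def z_score(difference, standard_error):
    """How many standard errors the difference lies from zero."""
    return difference / standard_error


z_scores = {name: z_score(d, se) for name, (d, se) in RESULTS.items()}
for name, z in z_scores.items():
    verdict = "rates differ" if abs(z) > CRITICAL_Z else "consistent with a clock"
    print(f"{name}: z = {z:.2f} ({verdict})")
```

On these assumptions every data set departs from equal rates, with the introns giving the weakest signal.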
Lineage effects and the molecular clock • Substitution rate varies with underlying neutral mutation rate: k = m • Three ways for rates to vary between species: • Differences in generation time • Differences in metabolic rate • Differences in efficiency of DNA repair • These are known as lineage effects: neutralists believe that lineage effects alone can account for all variation in molecular clock • Selectionists believe that genes also show rate variation due to other, selection-driven factors (residue effects)
Generation time and the molecular clock [Figure: diagram with a time axis]
Generation time and the molecular clock • At the molecular level, generation time (g) can be defined as time it takes for germ-line DNA to replicate i.e. from one gamete to the next • Since most mutations occur at this point, rate of substitution under neutral theory is a function of both mutation rate and generation time: k = m/g • General conclusion from molecular data is that the clock is generation time dependent at silent sites and in non-coding DNA: • Silent rates in orang-utan, gorilla and chimp are 1.3-, 2.2- and 1.2-fold faster than in humans, which matches differences in generation times
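The relation k = m/g can be illustrated with two species that share the same mutation rate per generation but differ in generation time. This is a minimal sketch, assuming m is measured per site per generation and g in years; the function name and all numbers are hypothetical and are not estimates for any real species.

```python
import math


def rate_per_year(m_per_generation, g_years):
    """Neutral substitution rate per year: k = m / g."""
    return m_per_generation / g_years


m = 1e-8  # hypothetical mutation rate per site per generation

k_long = rate_per_year(m, g_years=20.0)   # long generation time
k_short = rate_per_year(m, g_years=0.5)   # short generation time

print(f"long-generation species:  k = {k_long:.2e} per site per year")
print(f"short-generation species: k = {k_short:.2e} per site per year")
print(f"ratio = {k_short / k_long:.0f}")
```

The ratio of the two rates equals the inverse ratio of the generation times, which is the pattern expected at silent sites if the clock is generation time dependent.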
The metabolic rate hypothesis • In sharks, rate of silent change is five- to sevenfold lower than in primates and ungulates which have similar generation times: • Led to the hypothesis that differences in metabolic rate are a better explanation for differences in mutation rates than differences in generation time (metabolic rate hypothesis) • States that organisms with high metabolic rates have higher levels of DNA synthesis • Two pieces of mitochondrial DNA evidence support this: • Small bodied animals, which have higher metabolic rates, tend to have higher mutation rates • Warm-blooded animals also have higher mutation rates than cold-blooded animals
Relationship between body mass and sequence evolution [Figure: plot of % sequence divergence per Myr against body mass (kg), both on logarithmic axes, for rodents, dogs, horses, geese, primates, bears, tortoises, whales, newts, salmon, frogs, sharks and sea turtles]
DNA repair and mutation [Figure: flow diagram linking DNA, direct damage, replication errors, repair (correctly or incorrectly repaired) and mutation]
DNA repair and mutation • Repair mechanisms are extremely complex and there are many repair pathways • There is some evidence supporting the hypothesis that DNA repair influences mutation rate: • Evidence that highly transcribed genes are more efficiently repaired • Base composition and substitution rates at silent sites in mammalian genes tend to be gene- rather than species-specific: suggests that homologous genes are transcribed and repaired in a similar manner • Conversely, closely related species such as hominoid primates, which share very similar repair mechanisms, can exhibit greatly differing substitution rates