THE HUMAN GENOME SERIES MAMMALIAN GENES I. Conservation and Slow Evolution (today) II. Functional Innovation and Rapid Change (Feb 10)
Feb 3: SLOW. Feb 10: FAST.
Questions • Are we ‘just’ E. coli, except more so? • Where do new genes come from? • Do all genes evolve at the same rate? • Do all tissues & organs evolve at the same rate? • Where do we fit in the tree of life? • What specifies the differences between us and rodents, or us and chimps? • What specifies the elevated complexity of us versus other animals? • Can we understand sequence variation among humans? • How can gene function contribute to behaviour?
Theodosius Dobzhansky (1900-1975) “Nothing in Biology makes sense except in the light of Evolution”
Are we ‘just’ E. coli, except more so? "Tout ce qui est vrai pour le Colibacille est vrai pour l'éléphant" ("What is true for E. coli is true for the elephant") Jacques Monod (1972), 1965 Nobel laureate
"Tout ce qui est vrai pour le Colibacille est vrai pour l'éléphant ?" Genes 5.4k ~ 30k
Modes of Protein Evolution • De novo creation • Gene fusion / fission • Gene duplication • Rapid sequence change • Pseudogenisation
Genomes and Timelines • Archaea: 3000 Mya • Invertebrates: 1000 Mya • Rodents: 75 Mya • Chimpanzee: 5 Mya [Figure: time axis marked at 1000 Mya, 100 Mya, 10 Mya and 1 Mya]
THE ORIGIN AND EVOLUTION OF MODEL ORGANISMS Hedges, SB. Nature Reviews Genetics 3, 838-849 (2002)
Assembly DNA Repeats Gene Prediction Genome Comparison Gene Comparison Sequencing
Gene Number • Walter Gilbert [1980s] 100k • Antequera & Bird [1993] 70-80k • John Quackenbush et al. (TIGR) [2000] 120k • Ewing & Green [2000] 30k • Tetraodon analysis [2001] 35k • Human Genome Project (public) [2001] ~ 31k • Human Genome Project (Celera) [2001] 24-40k • Mouse Genome Project (public) [2002] 25k -30k • Lee Rowen [2003] 25,947
“Revealed: the secret of human behaviour. Environment, not genes, key to our acts” “We simply do not have enough genes for this idea of biological determinism to be right. The wonderful diversity of the human species is not hard-wired in our genetic code. Our environments are critical.” J Craig Venter February 10, 2001
Complexity? • Is ‘culture’ proportional to population size? • Is the complexity of the WWW proportional to its size? • Combinatorial argument • Genetic interactions; alternative splicing; non-genic regulation; post-transcriptional & post-translational modifications
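A rough sketch of the combinatorial argument, under an assumption that is ours rather than the slide's: that potential complexity tracks the number of possible pairwise interactions between genes, not the gene count itself. The gene counts are the 5.4k and ~30k figures quoted earlier in the deck; `pairwise_interactions` is a name introduced here purely for illustration.

```python
from math import comb


def pairwise_interactions(n_genes: int) -> int:
    """Number of distinct unordered gene pairs (n choose 2)."""
    return comb(n_genes, 2)


# Gene counts quoted in the slides (E. coli vs. a mammal).
small, large = 5_400, 30_000

gene_ratio = large / small
pair_ratio = pairwise_interactions(large) / pairwise_interactions(small)

print(f"gene count ratio:           {gene_ratio:.1f}x")
print(f"pairwise interaction ratio: {pair_ratio:.1f}x")
```

Under this simplified model, a roughly 5.6-fold increase in gene number gives a roughly 31-fold increase in possible gene pairs, before alternative splicing or regulation is even considered.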
Complexity of Protein Sequences • Architecture numbers in 4 eukaryotic proteomes • Data generated using SMART
Orthologues and Paralogues [Figure: gene tree descending from the cenancestor, with nodes SP1, SP2 and DP2 and genes A1, B1, C1, C2] • C1 and C2 are paralogues • A1, B1 and the pair (C1, C2) are orthologues
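The distinction can be sketched in code, assuming that SP nodes are speciation events and DP nodes are duplication events, and assuming the tree topology written below (the slide's figure is not recoverable, so the nesting is a guess made for illustration). Two genes are called orthologues if their last common ancestor node is a speciation, and paralogues if it is a duplication. All function names are hypothetical.

```python
# Gene tree as nested tuples: (event_label, left, right); leaves are gene names.
# ASSUMPTION: this topology is illustrative, not taken from the original figure.
TREE = ("SP1", "A1", ("SP2", "B1", ("DP2", "C1", "C2")))


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def last_common_event(node, g1, g2):
    """Label of the deepest internal node whose subtree contains both genes."""
    _, left, right = node
    for child in (left, right):
        if not isinstance(child, str) and {g1, g2} <= leaves(child):
            return last_common_event(child, g1, g2)
    return node[0]


def relationship(g1, g2, tree=TREE):
    """Classify a gene pair by the type of its last common ancestor node."""
    event = last_common_event(tree, g1, g2)
    return "orthologues" if event.startswith("SP") else "paralogues"


print(relationship("C1", "C2"))  # paralogues
print(relationship("A1", "B1"))  # orthologues
print(relationship("B1", "C2"))  # orthologues
```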
Only 1,195 human genes were found that had single orthologues in worm and fly. Approx 95% of human genes do not have obvious orthologues in fly and worm Data from Rich Copley and Peer Bork
Extracellular signalling proteins are among the most different between animals [Figure: Human, Drosophila and C. elegans compared; values shown: 220, 119, 12]
Antifreeze protein type III from Antarctic eel pout (Lycodichthys dearborni). Few sequence-based findings. For example … [359 residues]
Are we polyploid? Richard Copley
Segmental Duplication in the Human Genome Bailey et al. Science. 2002 297: 1003-7. Am J Hum Genet. 2003 73: 823-34
Horizontal Gene Transfer? • The claim: “113 of these genes are widespread among bacteria, but, among eukaryotes, appear to be present only in vertebrates. These genes [may have] entered the vertebrate (or prevertebrate) lineage by horizontal transfer from bacteria.”
The coral Acropora millepora shares a surprisingly large number of genes with vertebrates. Curr Biol. 2003 Dec 16; 13(24): 2190-5. Stanhope et al. Nature 2001 Jun 21; 411(6840): 940-4. “Phylogenetic analyses do not support horizontal gene transfers from bacteria to vertebrates.” Gene loss is a powerful force in shaping gene repertoire. "Tout ce qui est vrai pour le Colibacille est vrai pour l'éléphant"?
‘New Domains’ • 23 of 94 InterPro families: Defense and immunity, e.g. IL, interferons, defensins • 17 of 94 InterPro families: Peripheral nervous system, e.g. leptin, prion, ependymin • 4 of 94 InterPro families: Bone and cartilage: GLA, LINK, calcitonin, osteopontin • 3 of 94 InterPro families: Lactation: caseins (α, β, κ), somatotropin • 2 of 94 InterPro families: Vascular homeostasis: natriuretic peptide, endothelin • 5 of 94 InterPro families: Dietary homeostasis: glucagon, bombesin, colipase, gastrin, IGF-BP • 18 of 94 InterPro families: Other plasma factors: uteroglobin, FN2, RNase A, GM-CSF etc.
Pseudogenes • Two types: processed and non-processed • 70% processed vs 30% non-processed • ~ 20,000 Torrents et al. Genome Res. 2003 13: 2559-67.
SNPs • Single nucleotide polymorphisms (SNPs) are the most frequent type of DNA variation in human populations. • They occur at an average density of about 1 per 1,000 nucleotides. • Non-synonymous coding SNPs (nsSNPs) are the group of SNPs believed to have the highest impact on phenotype. • The same is believed of SNPs in regulatory regions. • Synonymous change: TTA (Leu) → TTG (Leu) • Non-synonymous change: TTA (Leu) → TTT (Phe)
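The synonymous / non-synonymous distinction can be checked mechanically against the standard genetic code. The sketch below builds the standard codon table and classifies the two codon changes given above; `classify_change` is a hypothetical helper, and the sketch assumes the change is already known to fall within a single in-frame codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_change(ref_codon: str, alt_codon: str):
    """Return (kind, ref_aa, alt_aa) for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


print(classify_change("TTA", "TTG"))  # ('synonymous', 'L', 'L')
print(classify_change("TTA", "TTT"))  # ('non-synonymous', 'L', 'F')
```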
What’s the difference between a mutation and a polymorphism? Frequency! A frequency value of 1% of the polymorphic allele is usually taken as a threshold between mutation and polymorphism.
An example of a polymorphic variant which disrupts a critical disulphide bond. Although this variant (260 Cys→Tyr) in HLA-H protein is strongly associated with hereditary haemochromatosis, its frequency is as high as 6% in Northern Europeans with up to 14% in Ireland. from Sunyaev et al. HMG 2001, Vol. 10, No. 6 591-597
Questions • Are we ‘just’ E. coli, except more so? NO. • Where do new genes come from? • Do all genes evolve at the same rate? • Do all tissues & organs evolve at the same rate? • Where do we fit in the tree of life? • What specifies the differences between us and rodents, or us and chimps? • What specifies the elevated complexity of us versus other animals? • Can we understand sequence variation among humans? • How can gene function contribute to behaviour?
Comparative Genomics: Humans vs Rodents. Human and mouse c-kit mutations show similar phenotypes. The utility of mouse as a biomedical model for human disease is enhanced when mutations in orthologous genes give similar phenotypes in both organisms. In a visually striking example of this, the same pattern of hypopigmentation is seen in (a) a patient with the piebald trait and (b) a mouse with dominant spotting, both resulting from heterozygous mutations of the c-kit proto-oncogene.
Rodents as models for human disease • All but a handful of human genes have orthologous counterparts in the mouse and rat genomes. • In general, disease genes are not under different selective constraints relative to all other genes. • Rodents are good model organisms for human disease
Mouse equivalents of human disease variants
Hs normal:  MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Hs variant: MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Mm normal:  MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL
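The comparison in this alignment can be sketched as a simple column scan: find each position where the human variant differs from human normal, then check whether the mouse wild-type residue equals the variant residue. The sketch assumes the three sequences are already aligned without gaps, as they are on the slide; the function name is hypothetical.

```python
HS_NORMAL = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
HS_VARIANT = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
MM_NORMAL = "MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL"


def variants_normal_in_mouse(hs_normal, hs_variant, mm_normal):
    """List (position, human_ref, human_alt, mouse_has_alt) per variant site.

    Positions are 1-based. Assumes an ungapped, equal-length alignment.
    """
    hits = []
    columns = zip(hs_normal, hs_variant, mm_normal)
    for pos, (ref, alt, mouse) in enumerate(columns, start=1):
        if ref != alt:
            hits.append((pos, ref, alt, mouse == alt))
    return hits


print(variants_normal_in_mouse(HS_NORMAL, HS_VARIANT, MM_NORMAL))
# [(30, 'P', 'L', True)]
```

Here the human disease-associated residue (Leu at position 30 of the fragment shown) is the normal residue in mouse.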
Equivalent disease variants? • 23 human disease-associated sequence variants whose variant amino acids are normal in the mouse. Including: • Breast Cancer (BRCA1 and BRCA2) • Cystic Fibrosis (CFTR) • Type 2D LGMD (SGCA) • Becker Muscular Dystrophy (DMD) • These variants are unlikely to be of value in understanding human disease.
Mouse vs Human • Do all genes evolve at the same rate? • Do all tissues & organs evolve at the same rate? • Where do we fit in the tree of life? • What specifies the differences between us and rodents?
More organisms … more comparisons …
~ 1000 more genes identified… Guigó, R. et al. PNAS (2003) 100, 1140-1145
Sequence conservation Figure 25. Sequence conservation between mouse and human genes Mouse genome paper Nature 420, 520-562
Slow Evolution The human spermidine synthase gene (SRM) and its mouse orthologue (Srm). The fifth exon in the mouse gene (green) is interrupted by an intron in the human orthologue.
Orthologues and Paralogues (recap) • C1 and C2 are paralogues • A1, B1 and the pair (C1, C2) are orthologues
Human and mouse “local synteny” “Syntenic” regions contain orthologues!
How do we link genomes & genes to evolution? • Do all genes evolve at the same rate? • Do all tissues & organs evolve at the same rate? • Where do we fit in the tree of life? • What specifies the differences between us and rodents?
Mouse-Human Orthologues: % Identity • sites not in domains: 64.4% • cSNP sites: 67.1% • all sites: 70.1% • sites in domains: 88.9% • disease sites: 90.3% Little selection at cSNP sites. Significant selection at functional sites.
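Figures like these come from computing percent identity over different subsets of alignment columns. A minimal sketch, assuming an ungapped pairwise alignment and a list of 0-based column indices per site class; the toy sequences and the "domain" range are invented for illustration and are not data from the study.

```python
def percent_identity(seq_a, seq_b, sites=None):
    """Percent of identical residues, over all columns or a chosen subset."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = list(range(len(seq_a)) if sites is None else sites)
    if not columns:
        raise ValueError("no sites to compare")
    matches = sum(seq_a[i] == seq_b[i] for i in columns)
    return 100.0 * matches / len(columns)


# Toy aligned sequences (hypothetical, not real orthologues).
human = "MKTAYIAKQR"
mouse = "MRSAYIAKQL"
domain_sites = list(range(3, 9))  # hypothetical annotated domain columns
other_sites = [i for i in range(len(human)) if i not in domain_sites]

print(percent_identity(human, mouse))                # all sites
print(percent_identity(human, mouse, domain_sites))  # sites in domains
print(percent_identity(human, mouse, other_sites))   # sites not in domains
```

In the toy data, as in the slide, columns inside the annotated domain are more conserved than the rest.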
A model of neutral evolution • KS: the number of synonymous substitutions per synonymous site • takes advantage of the redundant genetic code • 4D (fourfold degenerate) sites: GCx (Ala), CCx (Pro), TCx (Ser), ACx (Thr), CGx (Arg), GGx (Gly), CTx (Leu), GTx (Val) • “how much would a gene have changed if selection had not acted upon it?” Thomas et al., Nature 424, 788-793
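The list of 4D sites can be derived directly from the standard genetic code: a codon family is fourfold degenerate when all four third-position bases give the same amino acid. The sketch below derives those families and then counts third-position differences at 4D codons shared by two aligned coding sequences. This is only an uncorrected proportion of differences, not the KS estimate used in the cited work, which would also correct for multiple substitutions; the function name and toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Two-base prefixes for which any third base encodes the same amino acid.
FOURFOLD = {
    a + b
    for a, b in product(BASES, repeat=2)
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
}


def fourfold_divergence(cds_a: str, cds_b: str) -> float:
    """Uncorrected fraction of differing third positions at shared 4D codons.

    Assumes in-frame, ungapped, aligned coding sequences.
    """
    compared = differing = 0
    for i in range(0, min(len(cds_a), len(cds_b)) - 2, 3):
        ca, cb = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        if ca[:2] == cb[:2] and ca[:2] in FOURFOLD:
            compared += 1
            differing += ca[2] != cb[2]
    return differing / compared if compared else float("nan")


print(sorted(FOURFOLD))
# Toy sequences: codons GCT/GCC, CCA/CCA, GGA/GGG, TTT/TTC (last is not 4D).
print(fourfold_divergence("GCTCCAGGATTT", "GCCCCAGGGTTC"))
```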
Neutral rates vary. See also Hardison et al. Genome Res. 2003 13: 13-26.