Haplotype Trees: Using The Evolutionary History of Small DNA Regions To Investigate Common Diseases
Replication and Coalescence
Statistical Vs. Maximum Parsimony
A = AGCT
B = TGCT
C = TACT
D = AAGG
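Both parsimony approaches start from the mutational differences among haplotypes. As a minimal sketch (not the method used in the talk), the pairwise differences among the four example haplotypes A to D can be tabulated as follows. The function name `count_differences` is hypothetical, and the code only counts differences; it does not build a tree.

```python
from itertools import combinations

# The four example haplotypes shown on the slide.
haplotypes = {"A": "AGCT", "B": "TGCT", "C": "TACT", "D": "AAGG"}


def count_differences(seq1, seq2):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(seq1) != len(seq2):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(seq1, seq2))


# Pairwise difference counts for every unordered pair of haplotypes.
pairwise = {
    (h1, h2): count_differences(haplotypes[h1], haplotypes[h2])
    for h1, h2 in combinations(sorted(haplotypes), 2)
}

for (h1, h2), d in pairwise.items():
    print(f"{h1}-{h2}: {d} difference(s)")
```

A and B differ at one site, as do B and C, so a parsimonious network can join A, B, and C by single steps. D differs from each of the others at three or more sites, which is where the choice of connection is less certain.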
The Apo-protein E Haplotype Tree
[Figure: tree of haplotypes 1 to 31. Each branch is labeled with the site that mutated (73, 308, 471, 545, 560, 624, 832, 1163, 1522, 1575, 1998, 2440, 2907, 3106, 3673, 3701, 3937, 4036, 4075, 4951, 5229B, 5361); several site numbers, including 560 and 624, appear on more than one branch. The ε2, ε3, and ε4 alleles are marked.]
What Use Are Haplotype Trees?
• They Provide an Interpretive Framework When Integrated With Other Analyses
• Their Evolutionary History Generates Hypotheses About Current Significance
• They Provide a Powerful Tool For Detecting Current Genotype-Phenotype Associations
A Haplotype Tree Can Provide an Interpretive Framework When Integrated With Other Analyses
Hamon and Sing estimated interactions for all 53 pairs of ApoE sites for lnApoE variability in North Karelia, Females
[Figure: bar chart of R² × 100 (vertical axis, 0.0 to 8.0) for each pair of sites (horizontal axis). Labeled pairs: 560-1163**, 560-832**, 560-2440**, 832-1163**, and 3937-4075.]
Parallel Mutations At Site 560
Sites Identified By Hamon and Sing That “Interact” With Site 560
[Figure: the ApoE haplotype tree, with the branches for site 560 and for the interacting sites indicated.]
Evolutionary Hypothesis: Two Functional Mutations (Occurring On A Specific Haplotype Background) Have Created Three Allelic Clades For the Phenotype Of ln(ApoE): the Red, Blue and Black Clades
[Figure: the ApoE haplotype tree, with the three clades colored.]
The Red Clade Is Uniquely Defined By These Two Sites
The Blue Clade Is Uniquely Defined By These Two Sites
[Figure: the ApoE haplotype tree, with the two defining sites of each clade indicated.]
The Red Clade Is Not Uniquely Defined By These Two Sites Due to Homoplasy
[Figure: the ApoE haplotype tree, with the two sites indicated.]
The Apo-protein E Haplotype Tree
Sites 560 and 624 Fall into an Alu Repeat
[Figure: the ApoE haplotype tree, with the branches for sites 560 and 624 indicated.]
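Single SNP Analysis of lnApoE in North Karelia, Females
[Figure: map of the ApoE gene region with a 1 kb scale bar, Exons 1 to 4, and the polymorphic sites 73, 308, 471, 545, 560, 624, 832, 1163, 1522, 1575, 1998, 2440, 2907, 3106, 3673, 3937, 4036, 4075, 4951, 5229a, 5229b, and 5361. Three sites are marked with an asterisk.]
* Indicates a significant single-site effect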
The Single SNP Analysis Identifies Sites With A Weaker Phenotypic Association Because It Cannot Deal With Homoplasy At Site 560
[Figure: the ApoE haplotype tree.]
There is a deliberate attempt to find SNPs that are polymorphic in most or all populations and that have high heterozygosities; that is, SNPs just like the one at Site 560.
Linkage Disequilibrium Is Frequently Used in Association Studies, But Is Also Frequently Misinterpreted. Haplotype Trees Can Aid In Understanding The Proper Biological Interpretation
ApoE Gene
Stengård et al. (1996) showed in a longitudinal study that the amino acid replacement alleles at ApoE have a major impact on mortality due to CAD.
Apoprotein E Gene Region
[Figure: map of the gene region on a scale from 0 to 5.5, showing Exons 1 to 4 and the polymorphic sites 73, 308, 471, 545, 560, 624, 832, 1163, 1522, 1575, 1998, 2440, 2907, 3106, 3673, 3701, 3937, 4036, 4075, 4951, 5229A, 5229B, and 5361.]
These Two Sites Are in Disequilibrium
The Apo-protein E Haplotype Tree
[Figure: the ApoE haplotype tree, divided into two groups of haplotypes.]
These Haplotypes Are T at Site 832 & C At Site 3937
These Haplotypes Are G at Site 832 & T At Site 3937
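To make the disequilibrium between sites 832 and 3937 concrete, the sketch below computes the standard two-site measures D and r² from haplotype counts. The counts are hypothetical and are NOT taken from the ApoE data; the function name `ld_stats` is also an assumption made for illustration.

```python
def ld_stats(hap_counts, allele1, allele2):
    """Return (D, r2) for allele1 at the first site and allele2 at the second.

    hap_counts maps two-site haplotypes, e.g. ("T", "C"), to observed counts.
    """
    n = sum(hap_counts.values())
    if n == 0:
        raise ValueError("no haplotypes supplied")
    # Allele frequencies at each site and the joint haplotype frequency.
    p1 = sum(c for (s1, _), c in hap_counts.items() if s1 == allele1) / n
    p2 = sum(c for (_, s2), c in hap_counts.items() if s2 == allele2) / n
    p12 = hap_counts.get((allele1, allele2), 0) / n
    d = p12 - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d, d * d / denom


# Hypothetical counts: only the two haplotype classes named on the slide
# occur (832 = T with 3937 = C, and 832 = G with 3937 = T).
complete = {("T", "C"): 30, ("G", "T"): 70}
d_complete, r2_complete = ld_stats(complete, "T", "C")

# Hypothetical counts in which a few haplotypes carry the other combinations.
partial = {("T", "C"): 30, ("G", "T"): 60, ("T", "T"): 5, ("G", "C"): 5}
d_partial, r2_partial = ld_stats(partial, "T", "C")

print(f"complete: D={d_complete:.4f}, r2={r2_complete:.4f}")
print(f"partial:  D={d_partial:.4f}, r2={r2_partial:.4f}")
```

When only the two haplotype classes occur, knowing the state at one site fixes the state at the other, so any phenotypic association at one site is mirrored at the other regardless of the physical distance between them.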
Apoprotein E Gene Region
[Figure: the map of the ApoE gene region, scale 0 to 5.5, with Exons 1 to 4 and the polymorphic sites; one portion of the region is marked.]
Site 3937 Is An Amino Acid Polymorphism That Affects ApoE Function and CAD
Suppose Only This Portion Was Sequenced
Site 832 Would Appear to Have The Strongest Association with ApoE Function and CAD
Would You Infer From This Association That the Marker Closest to the Functional Site Was Here?
Haplotype Trees Estimate an Evolutionary History That Can Generate Hypotheses About The Current Significance of Genetic Variation
Detecting Recombination Events in LPL
(Sites 1 to 69, shown in blocks of 10)

a=3, b=5, k=3, p=0.0179, crossover between sites 13 and 29.
2JNR    CAGTTTCCCT CAGCACGATC GCAATTGCAC CTCAATGTAT AGTTGTAACC GAGTCCGCAT AACTATAGG
5NR     CAGTTTATCT CACCACGATA GCAATTGCAC CTCAATGTAT AGTTGTAACC GAGTCCGCAT AACTATAGG
Node a  CAGTTTATCT CACCACGATC GCAATTGCTC TTTAATGTAT AGTTGTAACC GAATCAGCAT AACTATAGG

a=2, b=7, k=2, p=0.0278, crossover between sites 16 and 19.
Node d  CAGTTTATCT CACCACGATC GCAACTGCTC TTTAATGTAT AGTTGTAACC GAATCAGCAT AACTATAGG
11J     CAGTATATCT CACCATGATC GCAACTGCTC TTTAATGTAT AGTTGTAACC GAATCAGCAT AACTATAGG
Node e  CAGTATATCT CACCATGAGC GCAATTGCAC TTTAA?GTAT AGTTGTAACC GAATCAGCAT CACTGGAGA

11J     CAGTATATCT CACCATGATC GCAACTGCTC TTTAATGTAT AGTTGTAACC GAATCAGCAT AACTATAGG
Node e  CAGTATATCT CACCATGAGC GCAATTGCAC TTTAA?GTAT AGTTGTAACC GAATCAGCAT CACTGGAGA
T-1     CAGTTTATCT CACCACGAGC GCAATTGCAC TTTAA?GTAT AGTTGTAACC GAATCAGCAT CACTGGAGA
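The first event can be retraced from the three sequences themselves. The sketch below rests on two assumptions that the slide does not state: that 5NR is the candidate recombinant with Node a and 2JNR as its parents, and that the p value is 1 / C(a+b, a), the chance that the a sites matching one parent form a single block at one end of the a+b sites at which the parents differ. That formula reproduces both reported p values (1/56 and 1/36), but the slide does not give the test statistic, and the role of k is not modeled here.

```python
from math import comb

# Sequences copied from the first alignment on the slide (69 sites).
seqs = {
    "2JNR": "CAGTTTCCCT CAGCACGATC GCAATTGCAC CTCAATGTAT AGTTGTAACC GAGTCCGCAT AACTATAGG",
    "5NR": "CAGTTTATCT CACCACGATA GCAATTGCAC CTCAATGTAT AGTTGTAACC GAGTCCGCAT AACTATAGG",
    "Node a": "CAGTTTATCT CACCACGATC GCAATTGCTC TTTAATGTAT AGTTGTAACC GAATCAGCAT AACTATAGG",
}
seqs = {name: s.replace(" ", "") for name, s in seqs.items()}

# ASSUMPTION: 5NR is the recombinant; Node a and 2JNR are the parental types.
left_parent, right_parent, recombinant = seqs["Node a"], seqs["2JNR"], seqs["5NR"]

# 1-based sites at which the two parents differ.
informative = [
    i + 1 for i, (x, y) in enumerate(zip(left_parent, right_parent)) if x != y
]
# Split those sites by which parent the recombinant matches.
left_like = [s for s in informative if recombinant[s - 1] == left_parent[s - 1]]
right_like = [s for s in informative if recombinant[s - 1] == right_parent[s - 1]]

a, b = len(left_like), len(right_like)
# A single crossover requires every left-parent site to precede every
# right-parent site.
single_switch = bool(left_like) and bool(right_like) and max(left_like) < min(right_like)

# ASSUMED test statistic (see the note above the code).
p_value = 1 / comb(a + b, a)

print("sites where parents differ:", informative)
print("recombinant matches Node a at:", left_like)
print("recombinant matches 2JNR at:", right_like)
print(f"a={a}, b={b}, single switch={single_switch}, p={p_value:.4f}")
```

Under these assumptions the switch falls between sites 13 and 29, matching the interval reported on the slide. Site 20, where 5NR differs from both other sequences, is not informative about the crossover and is ignored.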
[Figure panels:]
• Positive (Diversifying) Selection or Subdivision
• Positive (Directional) Selection or Bottleneck
• Neutral Genetic Drift, Expanding Population Size
• Neutral Genetic Drift, Stable Population Size
• Negative Selection
Evolutionary Inferences On LPL
• 5’ End Subject to Directional Selection, With A Selective Sweep Enhanced By Recombination
• 3’ End Subject to Diversifying Selection
• Implies That Most Current Polymorphisms With Functional Significance Are In 3’ End
Haplotype Trees Provide a Powerful Tool For Detecting Current Genotype-Phenotype Associations
• Nested Clade Analysis
• Tree Scanning
Nested Clade Analysis
• The Nested Clade Method Was Published In 1987 As A Way To Use A Haplotype Tree As A Tool For Discovering Gene/Phenotype Associations
• Nests The Haplotypes in the Tree Into Evolutionary Clades (Branches)
• The Resulting Nested Design Provides Asymptotic Independence And A Priori Contrasts For Detecting Phenotypic Associations.
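The idea of nested, a priori contrasts can be sketched with a toy example: clades at one level are compared only within the higher-level clade that contains them, and then the higher-level clades are compared with each other. Everything below is hypothetical (clade names, phenotype values, and the function `f_statistic`), and each contrast uses its own within-group error for simplicity; a full nested analysis of variance would handle the error terms differently.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of phenotype values."""
    values = [v for g in groups for v in g]
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if df_between < 1 or df_within < 1:
        raise ValueError("need at least two groups and replicated observations")
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / df_between) / (ss_within / df_within)


# HYPOTHETICAL data: phenotype values for individuals carrying haplotypes in
# 1-step clades, grouped by the 2-step clade that contains them.
nested = {
    "2-1": {"1-1": [4.1, 4.5, 4.3], "1-2": [4.2, 4.6, 4.4]},
    "2-2": {"1-3": [6.0, 6.4, 6.2], "1-4": [7.9, 8.3, 8.1]},
}

results = {}
# Contrast 1-step clades only within their own 2-step clade.
for parent, children in nested.items():
    results[parent] = f_statistic(list(children.values()))

# Contrast the 2-step clades, pooling all observations inside each one.
pooled = [[v for vals in children.values() for v in vals] for children in nested.values()]
results["total"] = f_statistic(pooled)

for level, f in results.items():
    print(f"F among clades nested in {level}: {f:.3f}")
```

Because the contrasts are fixed by the tree before the phenotypes are examined, the number of tests grows with the number of nesting clades rather than with the number of haplotype pairs.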
The Drosophila Adh Haplotype Tree
[Figure: the Adh haplotype tree nested into one-step clades 1-1 through 1-11.]
The Drosophila Adh Haplotype Tree
[Figure: the Adh haplotype tree nested into two-step clades 2-1 through 2-5.]
Results of Nested Analysis of Variance of Adh Activity Using The Adh Haplotype Tree
[Figure: significant contrasts are marked ** or ***.]
*** Significant at 0.1% Level
** Significant at 1% Level
Functional Allelic Categories from the Nested Analysis of Variance of Adh Activity
[Figure: categories separated by the contrasts marked ** or ***.]
Phenotypic Distributions Identified Through Nested Clade Analysis
Nested Clade Analyses
• Greater Statistical Power By Focusing On Fewer Comparisons
• Greater Biological Power In Detecting Mutations With Phenotypic Effects
• Deals With High Levels of Genetic Variation Through Pooling Into Clades
• Deals With Linkage Disequilibrium Through Haplotypes And Tree Branches
• Useful In Ultimately Identifying Causative Mutations
Nested Clade Analyses
• Although Nesting Is Common In Statistics and Evolutionary Biology, It Is Unfamiliar and Daunting To Others
• The Analysis Finds Phenotypic Associations With Haplotypes or Groups of Haplotypes: Does Not Deal Directly With Dominance Effects Or Genotypes.
• Is Inherently A Single Locus (Or Smaller) Analysis: Does Not Deal Directly With Epistasis
Tree Scanning: A New Method for Using Haplotype Trees At Candidate Loci To Investigate Genotype-Phenotype Associations.