HapMap: application in the design and interpretation of association studies
Mark J. Daly, PhD, on behalf of The International HapMap Consortium
Goals of this segment • Briefly summarize HapMap design and current status • Discuss the application of HapMap to all aspects of association study design, analysis and interpretation
HapMap Project A freely available public resource to increase the power and efficiency of genetic association studies of medical traits High-density SNP genotyping across the genome provides information about • SNP validation, frequency, assay conditions • correlation structure of alleles in the genome All data are freely available on the web for application in study design and analyses as researchers see fit
HapMap Samples • 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI) • 90 individuals (30 trios) of European descent from Utah (CEU) • 45 Han Chinese individuals from Beijing (CHB) • 45 Japanese individuals from Tokyo (JPT)
HapMap progress
• PHASE I – completed, described in Nature paper
  * 1,000,000 SNPs successfully typed in all 270 HapMap samples
  * ENCODE variation reference resource available
• PHASE II – data generation complete, data released this past Monday
  * >3,500,000 SNPs typed in total!
ENCODE-HAPMAP variation project • Ten “typical” 500kb regions • 48 samples sequenced • All discovered SNPs (and any others in dbSNP) typed in all 270 HapMap samples • Current data set – 1 SNP every 279 bp A much more complete variation resource by which the genome-wide map can be evaluated
Completeness of dbSNP Vast majority of common SNPs are contained in or highly correlated with a SNP in dbSNP
Recombination hotspots are widespread and account for LD structure [Figure: example region on 7q21]
Utility of LD in association study • “If I’m a causal variant, what is relevant to my detection in association studies is how well correlated I am with one of the SNPs or haplotypes examined in the study.”
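The correlation referred to here is usually measured as r2 between two SNPs. As a minimal sketch, assuming phased haplotypes are available as equal-length allele strings (the function name and the toy data are illustrative and not part of any HapMap tool), r2 can be computed from allele and haplotype frequencies:

```python
def r_squared(haplotypes, i, j):
    """r^2 between biallelic SNPs at string positions i and j.

    haplotypes: list of equal-length allele strings, one per chromosome.
    The allele seen on the first chromosome is used as the reference
    allele at each SNP; r^2 does not depend on that choice.
    """
    n = len(haplotypes)
    a = haplotypes[0][i]
    b = haplotypes[0][j]
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Toy data (made up): two SNPs in perfect LD, then two independent SNPs.
perfect = ["AG", "AG", "TC", "TC"]
independent = ["AG", "AC", "TG", "TC"]
print(r_squared(perfect, 0, 1), r_squared(independent, 0, 1))
```

An r2 of 1 means one SNP is a perfect proxy for the other; an r2 of 0 means typing one says nothing about the other.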
Coverage of Phase II HapMap (estimated from ENCODE data)
Panel      % r2 > 0.8   max r2
YRI        81%          0.90
CEU        94%          0.97
CHB+JPT    94%          0.97
• % r2 > 0.8: percentage of deeply ascertained common variants highly correlated with a HapMap SNP
• max r2: average maximum correlation between a deeply ascertained variant and a neighboring HapMap SNP
• Vast majority of common variation (MAF > .05) captured by Phase II HapMap
From Table 6 – “A Haplotype Map of the Human Genome”, Nature
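The two coverage statistics can be sketched in a few lines. This is an illustration under stated assumptions, not the consortium's pipeline: it assumes a precomputed matrix of pairwise r2 among all deeply ascertained SNPs, scores only the SNPs that are not on the map, and uses a strict "> 0.8" cutoff to match the column header. The function name and the toy matrix are made up.

```python
def coverage(r2, typed, threshold=0.8):
    """Return (fraction of untyped SNPs with max r2 > threshold, mean max r2).

    r2:    square list-of-lists of pairwise r^2 among all reference SNPs.
    typed: set of indices of the SNPs that are on the map.
    """
    untyped = [i for i in range(len(r2)) if i not in typed]
    if not untyped or not typed:
        raise ValueError("need at least one typed and one untyped SNP")
    # Best proxy on the map for each untyped SNP.
    best = [max(r2[i][j] for j in typed) for i in untyped]
    fraction = sum(b > threshold for b in best) / len(best)
    return fraction, sum(best) / len(best)


# Toy r2 matrix for four SNPs (values invented for illustration).
toy_r2 = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
]
fraction, mean_max = coverage(toy_r2, typed={0, 2})
print(fraction, mean_max)
```

In the toy example, SNP 1 is well captured by typed SNP 0 and SNP 3 is only weakly captured by typed SNP 2, so half of the untyped SNPs pass the threshold.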
Applying the HapMap • Study design - tagging • Study coverage evaluation • Study analysis - improving association testing • Study interpretation • Comparison of multiple studies • Connection to genes/genomic features • Integration with expression and other functional data • Other uses of HapMap data • Admixture, LOH, selection
Tagging from HapMap • Since HapMap describes the majority of common variation in the genome, choosing non-redundant sets of SNPs from HapMap offers considerable efficiency without power loss in association studies
Pairwise tagging [Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C), with groups of SNPs in high r2 marked] Tags: SNP 1, SNP 3, SNP 6 (3 in total) Test for association: SNP 1, SNP 3, SNP 6 After Carlson et al. (2004) AJHG 74:106
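A greedy selection in the spirit of pairwise tagging can be sketched as follows. It is a simplified illustration, not the Carlson et al. procedure or the Haploview/Tagger implementation: it assumes a precomputed pairwise r2 matrix and an r2 cutoff of 0.8, and the function name and toy matrix are invented.

```python
def greedy_pairwise_tags(r2, threshold=0.8):
    """Pick tags until every SNP has r2 >= threshold with some tag.

    r2: square list-of-lists of pairwise r^2 (diagonal assumed to be 1.0).
    Returns the chosen tag indices in the order they were picked.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Choose the SNP capturing the most still-untagged SNPs;
        # ties go to the lowest index.
        best = max(
            sorted(untagged),
            key=lambda s: sum(r2[s][t] >= threshold for t in untagged),
        )
        tags.append(best)
        captured = {t for t in untagged if r2[best][t] >= threshold}
        untagged -= captured
        untagged.discard(best)  # guarantees progress even if the diagonal is off
    return tags


# Toy matrix (made up): indices 0..5 stand for SNPs 1..6, with three
# high-r2 pairs: SNPs 1+2, SNPs 3+5, SNPs 4+6.
toy_r2 = [[1.0 if i == j else 0.1 for j in range(6)] for i in range(6)]
for i, j in [(0, 1), (2, 4), (3, 5)]:
    toy_r2[i][j] = toy_r2[j][i] = 0.9

print([t + 1 for t in greedy_pairwise_tags(toy_r2)])
```

With this toy input three tags are needed, one per high-r2 group. The tie-break picks SNP 4 rather than SNP 6 for the last group; either is an equivalent tag.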
Pairwise Tagging Efficiency Tag SNPs were picked to capture common SNPs in release 16c.1 for every 7,000 SNP bin using Haploview. Tagging Phase I HapMap offers 2-5x gains in efficiency
Use of haplotypes can improve genotyping efficiency [Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C)] • Pairwise tagging: Tags: SNP 1, SNP 3, SNP 6 (3 in total); test for association: SNP 1, SNP 3, SNP 6 • Multi-marker tagging: Tags: SNP 1, SNP 3 (2 in total); test for association: SNP 1 captures 1+2, SNP 3 captures 3+5, “AG” haplotype captures SNP 4+6 • Tags in a multi-marker test should be conditional on significance of LD in order to avoid overfitting
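The multi-marker idea can be illustrated with a small sketch. It assumes phased haplotypes as allele strings; the toy haplotypes, their counts, and the helper names are invented for illustration and are not the Tagger or Haploview implementation. A two-SNP haplotype is treated as a single predictor, and its r2 with an untyped SNP is compared with the r2 of each tag alone:

```python
def r2_binary(x, y):
    """r^2 between two 0/1 indicator vectors of equal length."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # one of the indicators is constant
    return (pxy - px * py) ** 2 / denom


def allele_indicator(haplotypes, pos, allele):
    """1 where the chromosome carries `allele` at position `pos`, else 0."""
    return [int(h[pos] == allele) for h in haplotypes]


def haplotype_indicator(haplotypes, positions, alleles):
    """1 where the chromosome carries all given alleles at the given positions."""
    return [
        int(all(h[p] == a for p, a in zip(positions, alleles)))
        for h in haplotypes
    ]


# Toy chromosomes (made up). Columns: tag A/T, tag G/C, untyped T/C.
# The untyped T allele occurs only on the "AG" tag haplotype.
toy = ["AGT"] * 3 + ["ACC"] * 2 + ["TGC"] * 2 + ["TCC"]
target = allele_indicator(toy, 2, "T")

r2_tag1 = r2_binary(allele_indicator(toy, 0, "A"), target)
r2_tag2 = r2_binary(allele_indicator(toy, 1, "G"), target)
r2_hap = r2_binary(haplotype_indicator(toy, (0, 1), "AG"), target)
print(r2_tag1, r2_tag2, r2_hap)
```

In this toy sample neither tag alone is a good proxy for the untyped SNP, while the "AG" haplotype predicts it exactly, which is why a haplotype test can replace a third genotyped tag.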
Efficiency and power • ~300,000 tag SNPs needed to cover common variation in whole genome in CEU [Figure: relative power (%) versus average marker density (per kb), for tag SNPs and random SNPs] P.I.W. de Bakker et al. (2005) Nat Genet Advance Online Publication 23 Oct 2005
How to pick tag SNPs? • What is the genetic hypothesis? Which variants do you want to test for a role in disease? • functional annotation (coding SNPs) • allele frequency (HapMap ascertainment) • previously implicated associations • Go to http://www.hapmap.org – DCC supported interactive tagging • Export HapMap data into tools such as Tagger, Haploview (www.broad.mit.edu/mpg)
Will tag SNPs picked from HapMap apply to other population samples? [Figure: three panels, each labeled CEU: Utah residents with European ancestry (CEPH); Whites from Los Angeles, CA; Botnia, Finland] Population differences add very little inefficiency Platform presentation: Paul de Bakker (#223: Sat 9.30)
Genome-wide association coverage • If genome-wide products are typed on the HapMap sample panel, the HapMap SNPs that are not on the product can be used to evaluate the product's coverage • ENCODE (deep ascertainment) • Phase II (dense, genome-wide)
Association tests with fixed markers [Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C); SNP 1 and SNP 3 are on the whole-genome product (~1 - 5% of common variation directly assayed); pairs in high r2 marked] Tests of association: SNP 1, SNP 3 SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5
Genome-wide products can capture most common variation Example: 500K data generated by Affymetrix and recently submitted to HapMap DCC
More on this topic • Platform presentations tomorrow morning 8 AM sharp: • Peer • Jorgenson • Lazarus • As well as several detailed posters!
Can incorporating tests of haplotypes of SNPs on the genome-wide product improve this coverage?
Improving association power using data from HapMap [Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C)] • Single-marker tests of association: SNP 1, SNP 3; SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5 • Adding the “AG haplotype” test: SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5, SNP 4, SNP 6
Applying the HapMap • Study design - tagging • Study coverage evaluation • Study analysis - improving association testing • Study interpretation • Connection to genes/genomic features • Comparison of multiple association studies • Integration with expression and other functional data • Other uses of HapMap data • Admixture, LOH, selection
Integration with genomic features • Positive association to a SNP on HapMap enables detailed interpretation: • How many other SNPs are in LD with this SNP? • What genes are in LD with this SNP? • What coding variants and putative functional variants are in LD with this SNP? Potential to improve power by modifying Bayesian priors of each association test based on this information
Example: Complement Factor H - AMD • Original SNP hit in Affy 100K experiment – rs380390 • Extent and structure of LD from HapMap aids in the fine mapping phase of project Klein et al Science 2005
Example: Complement Factor H - AMD rs380390
Meta-analysis of association studies • When different marker sets are used to study association (candidate gene or genome-wide), results can be readily integrated when all markers are typed on HapMap samples
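One way to picture this kind of integration is to place each study's reported alleles on a shared set of reference haplotypes and see whether they point to the same haplotype. The sketch below is only an illustration: the haplotypes, study names, and reported alleles are all hypothetical, and the functions are not taken from any HapMap tool.

```python
def carriers(haplotypes, reported):
    """Names of reference haplotypes carrying every reported allele.

    haplotypes: dict mapping haplotype name -> allele string over the
                markers typed on the reference samples.
    reported:   list of (position, allele) pairs from one study.
    """
    return {
        name
        for name, alleles in haplotypes.items()
        if all(alleles[pos] == allele for pos, allele in reported)
    }


def compare_studies(haplotypes, studies):
    """Map each study name to the set of haplotypes its alleles sit on."""
    return {
        study: carriers(haplotypes, reported)
        for study, reported in studies.items()
    }


# Hypothetical reference haplotypes over three markers.
reference = {"H1": "AGC", "H2": "ATC", "H3": "GTT"}

# Hypothetical reports: each study typed a different marker.
studies = {
    "study_A": [(0, "G")],
    "study_B": [(2, "T")],
    "study_C": [(1, "G")],
}

result = compare_studies(reference, studies)
print(result)
```

Here the hypothetical studies A and B typed different markers but both implicate haplotype H3, while study C points to H1, so A and B are mutually consistent and C is not.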
Example: DTNBP1 and schizophrenia • Multiple studies have described modest association to schizophrenia • Most studies have examined small numbers of non-overlapping sets of SNPs • HapMap data can be used to determine whether these association finding Derek Morris, Mousumi Mutsuddi (WCPG meeting)
Extensive LD across DTNBP1 Phase II HapMap - 186 SNPs 180 kb
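Phylogeny of DTNBP1 tag SNPs
Tag SNPs: 2 3 4 5 7 10
Haplotypes: AGGCCA AAGCCT AGGCCT AGGCCA AGATTA GGATCA
Branch labels: 4 (GA), 5 (CT); 10 (AT); 7 (CT); 2 (AG); 3 (GA)
[Figure: haplotype tree with one node labeled “Ancestral haplotype”; haplotype frequencies shown: 6%, 33%, 42%, 8%, 11%]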
Associated alleles reported
Tag SNPs: 2 3 4 5 7 10
Haplotypes: AGGCCA AAGCCT AGGCCT AGGCCA AGATTA GGATCA
[Figure: studies placed on the haplotype tree, added one build at a time: Straub 2002; Van den Oord 2003; Schwab 2003; Van den Bogaert 2003; Funke 2004; Williams 2004; Bray 2005; Kirov 2004]