Jeff Bailey S5-432 Analyzing Copy Number Variation in the Human Genome
Continuum of Genomic Variation Forms of genetic variation. Nucleotide • Single base-pair changes • Point mutations (1 per 800 bp) • Small insertions/deletions • Frameshift, microsatellite, minisatellite • Mobile elements • Retroelement insertions (300 bp-10 kb) • Large-scale genomic copy number variation (>10 kb) • Large-scale Deletions • Segmental Duplications • Local Rearrangements • Chromosomal variation • Translocation, inversion, fusion Copy Number Variation Structural Variants (SV) Cytogenetics
Method 1: Copy Number Variation: Array Comparative Genomic Hybridization • Two genomic surveys of normal individuals identified 76 and 255 CNV regions by array CGH (Sebat et al. Science 2004; Iafrate et al. Nat Genet 2004) • 30% of CNVs overlap duplicated regions (variant SD = CNV) (Sebat et al. Science 2004) (blue line in figure) [Figure: array CGH signal with labels Gain, Loss, >green, >red. Modified from Feuk et al. Nat Rev Genet 2006]
Segmental Duplications (SD) • 5.4% of the genome (>90% identity and >1 kb) • Properties: clustered; complex regions • Example on chr22: 99.1% identical over 180 kb (VCF/DiGeorge Syndrome in 1 in 3000 births) Bailey and Eichler (2006) Nat Rev Genet
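SDs predispose to copy number variation • Non-allelic Homologous Recombination (Lupski, 1999) between duplicated copies D and D' flanking an intervening sequence I • Gametes: one product carries a duplication of I, the other a deletion of I • Change in Dosage Sensitive Genes → phenotype or disease • Dynamic Regions: predisposed to further rearrangements [Figure: NAHR between D and D', drawn from centromere (Cen) to telomere (Tel)]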
Complex disease associations 1) Recurrent germline rearrangements causing congenital disease 2) Rare CNVs causing disease in a small proportion of affected individuals in a Mendelian fashion 3) Common CNVs that are responsible for a proportion of complex genetic risk in many individuals
Method 2: End-Sequence Pair (ESP) Analysis • ~1.1 million fosmid end-sequence pairs derived from a single donor (sequenced by MIT to help close gaps in the reference genome) • Fosmid insert size tightly distributed around mean (40 kb) • Compare optimal fosmid placements on the reference genome to detect deviations from expected • Concordant: span between 32 kb and 48 kb • Putative insertion within fosmid: span < 32 kb • Putative deletion within fosmid: span > 48 kb • Inversions Dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage) • 639,204 BEST fosmid pairs (8.8X genome coverage) Results: Tuzun*, Bailey*, Sharp* et al. Nat. Genet 2005
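The ESP comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the published pipeline: the `EndPair` record, the `classify` and `physical_coverage` names, the same-strand test for inversions, and the 2.9 Gb genome size are all assumptions made for the example. Only the 32 kb and 48 kb thresholds, the 40 kb mean insert, and the pair counts come from the slide.

```python
from dataclasses import dataclass

MIN_SPAN = 32_000  # below this: putative insertion within the fosmid (from the slide)
MAX_SPAN = 48_000  # above this: putative deletion within the fosmid (from the slide)


@dataclass
class EndPair:
    """Hypothetical record for one fosmid's two end placements on the reference."""
    chrom_a: str
    pos_a: int
    strand_a: str  # "+" or "-"
    chrom_b: str
    pos_b: int
    strand_b: str


def classify(pair: EndPair) -> str:
    """Label one end-sequence pair by its span and orientation on the reference."""
    if pair.chrom_a != pair.chrom_b:
        return "other"  # ends on different chromosomes; not covered by the slide
    # Assumption: a concordant fosmid has its ends on opposite strands,
    # so two ends on the same strand suggest an inversion breakpoint.
    if pair.strand_a == pair.strand_b:
        return "inversion"
    span = abs(pair.pos_b - pair.pos_a)
    if span > MAX_SPAN:
        return "deletion"
    if span < MIN_SPAN:
        return "insertion"
    return "concordant"


def physical_coverage(n_pairs: int, mean_insert: int = 40_000,
                      genome_size: float = 2.9e9) -> float:
    """Clone (physical) coverage; genome_size is an assumed round figure."""
    return n_pairs * mean_insert / genome_size


if __name__ == "__main__":
    print(classify(EndPair("chr1", 1_000_000, "+", "chr1", 1_055_000, "-")))
    print(round(physical_coverage(1_122_408), 1))
    print(round(physical_coverage(639_204), 1))
```

Under the assumed 2.9 Gb genome size, the coverage figures on the slide are consistent with clone coverage computed from a 40 kb mean insert. A real analysis would also check that the two ends face each other, and would require several independent fosmids to support a variant before calling it.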
Fosmid SV Project • Fosmid End Sequencing of 8 HapMap Individuals • 1695 structural variants • 525 novel insertion sequences (Kidd et al. 2008 Nature 453:56) Mechanisms: • NAHR: non-allelic homologous recombination • NHEJ: non-homologous end joining (repair of double-strand breaks) • VNTR: strand slippage • Retrotransposition: insertion of L1, SVA or Alu element
Method 3: Whole Genome Sequencing • Genome Resequencing Studies • SNPs: 3.2 M bases • Non-SNP: 9.1 M bases • Non-SNP variants: 22% of events, 74% of variant bases (Levy et al. PLoS Biol 2007:e266) • Read Depth, Mismapping Pairs • Future: Perfect Whole Genome Assembly
Summary of Human Genome Copy Number Variation (12/2006) • 20% of the human genome is CNV? • 3000+ genes with exons in these regions CNV? (Currently 30% of genome and 9473 genes)
How many genes are truly CNV? • Lack of Breakpoint Precision? • BACs: 150-250 kb clones of which only a part of the sequence may be CNV • False positives? • Multiple studies: Increase the proportion of false positives since true positives tend to overlap [Figure: a BAC spanning a gene and a smaller CNV; false positive (FP) and true positive (TP) calls across Study #1, #2, #3]
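The point about pooling studies can be illustrated with a toy model. Every number here is hypothetical (100 true sites, 80 true calls and 20 false calls per study), and the model takes the extreme case in which false positives never recur between studies; it shows the direction of the effect and is not an estimate for any real dataset.

```python
def study_calls(study_index, n_true_sites=100, n_true_called=80, n_false=20):
    """Calls from one hypothetical study: shared true sites plus private false ones."""
    # Each study recovers a different but heavily overlapping window of true sites.
    offset = study_index * 10
    true_calls = {("TP", (offset + i) % n_true_sites) for i in range(n_true_called)}
    # Assumption: false positives are specific to a study and never repeat.
    false_calls = {("FP", study_index, i) for i in range(n_false)}
    return true_calls | false_calls


def pooled_fp_fraction(n_studies):
    """Fraction of false positives in the union of calls from n_studies studies."""
    pooled = set()
    for study_index in range(n_studies):
        pooled |= study_calls(study_index)
    n_false = sum(1 for call in pooled if call[0] == "FP")
    return n_false / len(pooled)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(pooled_fp_fraction(n), 3))
```

In this toy setup the pooled true calls saturate at the 100 true sites while the false calls keep accumulating, so the false-positive fraction of the merged set rises with each added study.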
Design of Custom Oligonucleotide aCGH • Equal number of probes per exon (exon size 3 bp-10 kb) • Limitation: the NimbleGen algorithm creates equally spaced probes across a region • Steps: 1) Select genomic regions to target for probe design 2) Merge overlapping regions 3) Select oligonucleotide probe sequences (average 12/exon) and place on microarray Bailey et al. Cytogenet Genome Res 2008
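The "merge overlapping regions" step is a standard interval merge. The sketch below assumes regions on a single chromosome given as half-open `(start, end)` tuples; the function name and the choice to also merge abutting regions are assumptions for illustration, not details taken from the paper.

```python
def merge_regions(regions):
    """Merge overlapping or abutting (start, end) regions on one chromosome."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


if __name__ == "__main__":
    targets = [(400, 500), (100, 200), (150, 300)]
    print(merge_regions(targets))
```

Merging before probe selection avoids designing duplicate probes where target regions, such as exons from alternative transcripts, overlap.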
Partial-gene CNV Detection Method • Step #1: Seed • Step #2: Extension [Figure: exon structure (Exons 1-5), probe regions, and hybridization (log2 probe intensity); mean intensity difference per exon: -0.2 SD, +1.1 SD, +1.4 SD, +0.6 SD, +1.2 SD; labeled as a 4-exon call] Bailey et al. Cytogenet Genome Res 2008
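A seed-and-extension pass over per-exon scores might look like the following. The slide does not state the actual criteria, so the thresholds (`seed_thr`, `ext_thr`) and the minimum seed length (`min_seed`) are hypothetical values chosen for illustration, and the sketch handles gains only.

```python
def seed_and_extend(scores, seed_thr=1.0, ext_thr=0.5, min_seed=2):
    """Return half-open (start, end) exon index ranges called as gains.

    scores: mean intensity difference per exon, in SD units.
    All thresholds are hypothetical; the slide does not give the real ones.
    """
    calls = []
    n = len(scores)
    i = 0
    while i < n:
        # Step 1 (seed): find a run of consecutive exons above the seed threshold.
        j = i
        while j < n and scores[j] >= seed_thr:
            j += 1
        if j - i >= min_seed:
            start, end = i, j
            # Step 2 (extension): grow in both directions under a looser threshold.
            while start > 0 and scores[start - 1] >= ext_thr:
                start -= 1
            while end < n and scores[end] >= ext_thr:
                end += 1
            calls.append((start, end))
            i = end
        else:
            i = max(j, i + 1)
    return calls


if __name__ == "__main__":
    exon_scores = [-0.2, 1.1, 1.4, 0.6, 1.2]  # per-exon values shown on the slide
    print(seed_and_extend(exon_scores))
```

With these assumed thresholds, the per-exon values from the slide seed on exons 2-3 and extend through exon 5, giving a single call over exons 2-5. Losses could be handled by running the same function on the negated scores.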
CNV in RHD [Figure: chr1, 25,350-25,390 kb; tracks: Gene Model, Exons, Probe Regions, samples GM12878, GM18517, GM18507, GM18956, GM19129, GM12156, GM18502, GM19240, GM18555, and Segmental Duplications] Bailey et al. Cytogenet Genome Res 2008
Detecting CNVs >500 bp and >5% frequency • 8,599 CNV regions: 3.7% of genome (112.7 Mb) • 2 genomes: 1,098 CNVs, 0.78% (24 Mb) Conrad et al. 2009 Nature
Causal CNVs Conrad et al. 2009 Nature
Infectious Disease Genetics [Figure: Human Genome, Pathogen Genome, Vector Genome, and Environment] • Complex interplay that results in infectious disease phenotype • Potential host defense responses and pathogen virulence are encoded in their respective genomes • SD and CNV represent key mechanisms for adaptation and diversification of responses for both host and pathogen • The study of SD and CNV is necessary to fully understand the genetics and biology of infectious disease pathogenesis
Human CNV typing and association studies • Comprehensive CNV Typing Chip (1st generation) • Collaboration with the Eichler Lab • Preferentially targeting gene CNVs (5,000 CNVs → 1000 genic regions → 30% host defense) • Agilent and NimbleGen oligoarray platforms • Defining copy number responsive probes • Defining copy specific probes to remove cross-hybridization • Case-control studies to examine infectious disease and immune phenotypes for association with CNVs
Human Malaria • Malaria: 2-3 million deaths per year • “strongest known force for evolutionary selection in the recent history of the human genome” (Kwiatkowski 2005 Am J Hum Genet) • HbS, HbC, HbE, thalassemia, ABO, Duffy null, SE Asian ovalocytosis, IL-4, CR1, HLA-DRB ... • Hypothesis: Strong selection will have impacted CNVs • Testing case-control samples for CNV associations with resistance to infection and cerebral malaria.