Detection of Structural Variants
Structural Variants (SVs)
► Variants that change the landscape of chromosomes
► Copy-number variations (CNVs)
  • Microscopic
  • Sub-microscopic
  • Deletions/insertions/duplications, usually > 1 kb; unbalanced rearrangements
  • Large-scale CNVs ≥ 50 kb
► Other structural variants
  • Indels (insertions, deletions, usually ≤ 1 kb)
  • Balanced rearrangements (inversions, translocations)
  • Segmental duplications
  • Repeats
SVs - Functional Impacts
► Widespread in human genomes
► Roles in evolution, genetic diversity between individuals, and genetic diseases
► More significant impact on phenotypic variation than SNPs
► Higher de novo locus-specific mutation rate than SNPs
Figure: new mutation rate (Zhang et al. Annual Reviews. 2009)
Detecting Methods – Now & Then
Low-throughput
► Microscopy: Lejeune et al. Acad Sci 248 (1959), Down's syndrome (trisomy 21)
► In situ hybridization (ISH), fluorescently labeled probes (FISH)
► Southern blotting
► PCR-based methods
High-throughput
► Comparative genomic hybridization – array-CGH: R. Redon et al., Nature 444, 444 (2006)
  • DNA microarrays
  • Detection of CNVs
  • Resolution: low, > 50 kb
► Fosmid paired-end sequencing (FPES): E. Tuzun et al., Nat. Genet. 37, 727 (2005)
  • Low resolution: > 8 kb
  • Laborious
High-throughput: next-generation sequencing
► Read Depth of Coverage: Yoon et al. Genome Research 19 (2009)
  • Detection of SVs of large size and SVs in complex genomic regions (segmental duplication-rich)
► Paired-end mapping (PEM): Korbel et al. Science 318 (2007)
  • Detection of deletions < 1 kb (< insert size)
  • Breakpoints in a small region
Paired-end mapping (PEM) Korbel et al. Science 318, 420 (2007)
Flow chart
3 kb fragments -> Paired-end mapping to the reference human genome -> Analysis of the paired-end span distribution -> Determination of cutoffs -> Detection of SVs
Paired-end mapping (PEM) Korbel et al. Science 318, 420 (2007)
Signatures used to detect SVs
• Deletions: paired ends spanning a longer region than a specified cutoff
• Simple insertions: paired ends spanning a shorter region than a cutoff
• Mated insertions: sequences connected to a distal locus by paired ends
• Unmated insertions: sequences connected to a distal locus with only one predicted breakpoint
• Inversions: paired ends with different orientation
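The signatures above can be sketched as a small classifier over mapped read pairs. This is only an illustration, not the pipeline from Korbel et al.: the `ReadPair` type, the label names, the cutoff values, and the assumption that concordant ends map to opposite strands are all hypothetical, and a pair whose ends map to different chromosomes is used as a crude stand-in for a link to a distal locus.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical container for one mapped pair of ends."""
    chrom1: str
    pos1: int      # mapped start of the first end
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # mapped start of the second end
    strand2: str


def classify_pair(pair, min_span, max_span):
    """Return a coarse PEM signature label for one mapped pair.

    min_span / max_span are the span cutoffs, assumed to have been
    derived beforehand from the observed span distribution.
    """
    if pair.chrom1 != pair.chrom2:
        # Ends link two loci: candidate mated or unmated insertion.
        return "distal_link"
    if pair.strand1 == pair.strand2:
        # Assumption: concordant ends map to opposite strands.
        return "inversion"
    span = abs(pair.pos2 - pair.pos1)
    if span > max_span:
        return "deletion"
    if span < min_span:
        return "simple_insertion"
    return "concordant"


# Example with made-up cutoffs around a 3 kb fragment size.
pair = ReadPair("chr1", 1_000, "+", "chr1", 10_000, "-")
print(classify_pair(pair, min_span=2_000, max_span=4_500))
```

In practice a single discordant pair is weak evidence; callers usually require several pairs supporting the same signature at the same locus before reporting an SV.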
Paired-end mapping (PEM) Korbel et al. Science 318, 420 (2007)
Detected SVs
► SV size: simple insertions (2-3 kb); others (~3 kb or larger)
► Average breakpoint resolution: 644 bp (allowing validation by PCR)
Paired-end mapping (PEM) Korbel et al. Science 318, 420 (2007)
SV validation
► PCR (97%)
► Comparison with the Database of Genomic Variants (DGV) (60%)
► Comparison with an alternative human genome assembly ("Celera assembly") (12-22%)
► Array-CGH (65%)
► Fiber-FISH
► One-pass PCR for breakpoint junctions
CNVs by Read Depth of Coverage Yoon et al. Genome Research 19 (2009)
Pipeline
► Raw genome sequence data (.fastq files; shotgun sequencing)
► Mapping/alignment to the reference genome (.bam files; MAQ alignment)
► Filtering out reads of low mapping quality (SAMtools)
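The filtering step can be illustrated on SAM-format text, where the second column is the bitwise FLAG and the fifth is the mapping quality (MAPQ). The sketch below is a pure-Python stand-in rather than the SAMtools command used in the study; the function names and the `min_mapq=20` threshold are assumptions chosen for illustration.

```python
def keep_alignment(sam_line, min_mapq=20):
    """Return True if a SAM record is mapped and has MAPQ >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    is_unmapped = bool(flag & 0x4)
    return (not is_unmapped) and mapq >= min_mapq


def filter_sam(lines, min_mapq=20):
    """Yield header lines unchanged and only well-mapped alignment records."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, min_mapq):
            yield line


# Tiny made-up example: one header line, one good read, one low-MAPQ read.
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t101\t37\t36M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t205\t3\t36M\t*\t0\t0\t*\t*",
]
for line in filter_sam(records):
    print(line)
```

With SAMtools itself, the same kind of threshold is usually applied through the `-q` option of `samtools view`.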
CNVs by Read Depth of Coverage Yoon et al. Genome Research 19 (2009)
Estimation of Read Depth (RD)
► RD = read count = number of mapped reads in non-overlapping windows of 100 bp
► Each read is counted once, by its start position
► RD is adjusted for the deviation in coverage at a given GC content (GC content influences sequence coverage), giving the GC-corrected RD
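A minimal sketch of the two steps above: counting reads per 100 bp window by start position, then correcting for GC content. The slide does not spell out the correction formula, so the median-ratio scaling used here (each window multiplied by the overall median divided by the median of windows with the same GC percentage) is an assumption, as are the function names and the 0-based coordinates.

```python
from collections import defaultdict
from statistics import median


def window_counts(read_starts, chrom_length, window=100):
    """Count reads per non-overlapping window, each read once by its start."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for start in read_starts:          # 0-based start positions (assumed)
        counts[start // window] += 1
    return counts


def gc_correct(counts, gc_percent):
    """Scale each window by (overall median) / (median of its GC bin)."""
    overall = median(counts)
    by_gc = defaultdict(list)
    for count, gc in zip(counts, gc_percent):
        by_gc[gc].append(count)
    gc_median = {gc: median(values) for gc, values in by_gc.items()}
    corrected = []
    for count, gc in zip(counts, gc_percent):
        m_gc = gc_median[gc]
        # Leave the count unscaled if its GC bin has a zero median.
        corrected.append(count * overall / m_gc if m_gc > 0 else float(count))
    return corrected


counts = window_counts([0, 50, 99, 100, 250], chrom_length=300)
print(counts)
print(gc_correct([10, 10, 20, 20], [40, 40, 60, 60]))
```

In the second call, windows with 60% GC are covered twice as deeply as those with 40% GC; after correction all four windows sit at the same depth, so the GC effect no longer looks like a copy-number difference.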
CNVs by Read Depth of Coverage Yoon et al. Genome Research 19 (2009)
Event Detection – Event-wise testing
► GC-corrected RD = quantitative measurement of genome copy number
► Deletion = decrease, duplication = increase in coverage (across multiple consecutive windows)
► Event-wise testing (EWT) method as the CNV-calling algorithm
  • Based on significance testing
  • Searches for small events of statistical significance and clusters them into larger events
CNVs by Read Depth of Coverage Yoon et al. Genome Research 19 (2009)
Event-wise testing - Algorithm
► Conversion to Z-scores
  $Z_i = (RD_i - \overline{RD}) / \sigma_{RD}$, where $\overline{RD}$ is the mean RD and $\sigma_{RD}$ its standard deviation
  Upper-tail probability: $P_i^{upper} = P(Z > Z_i)$
  Lower-tail probability: $P_i^{lower} = P(Z < Z_i)$
► For an interval A of l consecutive windows
  If $\max\{P_i^{upper} \mid i \in A\} < (FPR / (L/l))^{1/l}$: duplication
  If $\max\{P_i^{lower} \mid i \in A\} < (FPR / (L/l))^{1/l}$: deletion
  FPR = the nominal false positive rate desired for the entire chromosome
  L = no. of windows in the chromosome
  l = no. of windows in the interval A
► Search first for 2-window events, then increase the event size by 1 window at a time
► Stop at event size N-1, where N is the first size for which $(FPR / (L/N))^{1/N} > 0.5$
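The rules on this slide can be turned into a short scan. The sketch below follows them literally and makes several simplifying assumptions that are not from the source: it uses the population standard deviation, tests every interval of every size exhaustively, reports overlapping calls without merging them, and takes `fpr=0.05` as an arbitrary default. Function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def tail_probabilities(rd):
    """Convert RD values to Z-scores and return (upper, lower) tail probabilities."""
    mu, sd = mean(rd), pstdev(rd)
    if sd == 0:
        raise ValueError("RD has zero variance; Z-scores are undefined")
    std_normal = NormalDist()
    upper, lower = [], []
    for value in rd:
        z = (value - mu) / sd
        lower.append(std_normal.cdf(z))    # P(Z < Z_i)
        upper.append(std_normal.cdf(-z))   # P(Z > Z_i), by symmetry
    return upper, lower


def ewt_threshold(fpr, n_windows, length):
    """Per-window threshold (FPR / (L / l)) ** (1 / l)."""
    return (fpr / (n_windows / length)) ** (1.0 / length)


def ewt_scan(rd, fpr=0.05):
    """Return (start, end, kind) calls, end exclusive, for all significant intervals."""
    upper, lower = tail_probabilities(rd)
    n_windows = len(rd)
    events = []
    length = 2                              # start with 2-window events
    while length <= n_windows:
        threshold = ewt_threshold(fpr, n_windows, length)
        if threshold > 0.5:                 # stopping rule from the slide
            break
        for start in range(n_windows - length + 1):
            interval = range(start, start + length)
            if max(upper[i] for i in interval) < threshold:
                events.append((start, start + length, "duplication"))
            elif max(lower[i] for i in interval) < threshold:
                events.append((start, start + length, "deletion"))
        length += 1
    return events


# Made-up data: noisy baseline around 100 with a 4-window gain at 40..43.
rd = [90.0 if i % 2 == 0 else 110.0 for i in range(100)]
for i in range(40, 44):
    rd[i] = 200.0
print(ewt_scan(rd))
```

Because the threshold relaxes as the interval grows, a run of moderately elevated windows can be called even when no single window is significant on its own; the overlapping calls produced here are what the subsequent merging step would consolidate.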
CNVs by Read Depth of Coverage Yoon et al. Genome Research 19 (2009) Event-wise testing - CNV calling results
CNVs by Read Depth of Coverage Yoon et al. Genome Research 19 (2009)
Call Results - Filtering
► Merging of clusters of small events with copy number change in the same direction
► Filtering out events whose median RD lies between 0.75× and 1.25× the overall mean RD
► Significance testing (one-sided Z-test) of merged events; significance filtering threshold $10^{-6}$
► Increased stringency but reduced sensitivity
Call Results - Simulations
► To test the FPRs and false negative rates of the EWT calls
► Pairwise comparison of RD among individuals to distinguish polymorphic from monomorphic events (t-test)
  • Polymorphic: t-test P-value < 0.001 and absolute difference between median read counts > 0.5
► Simulation of the obtained data sets
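The two filtering rules for merged events (the 0.75× to 1.25× median RD band and the one-sided Z-test at 10^-6) can be sketched as follows. The exact form of the Z-test is not given on the slide; comparing the event's mean RD with the overall mean, using the standard error sd/sqrt(n), is an assumption made here, and `filter_events` is a hypothetical name.

```python
from statistics import NormalDist, mean, median, pstdev


def filter_events(rd, events, p_cutoff=1e-6):
    """Keep merged events that pass the median-RD band and one-sided Z-test.

    rd     : GC-corrected read depth per window
    events : (start, end, kind) tuples, end exclusive,
             kind is 'duplication' or 'deletion'
    """
    mu, sd = mean(rd), pstdev(rd)
    std_normal = NormalDist()
    kept = []
    for start, end, kind in events:
        segment = rd[start:end]
        ratio = median(segment) / mu
        if 0.75 <= ratio <= 1.25:
            continue                        # too close to the overall mean RD
        # Assumed test statistic: event mean against the overall mean.
        z = (mean(segment) - mu) / (sd / len(segment) ** 0.5)
        # One-sided: upper tail for gains, lower tail for losses.
        p = std_normal.cdf(-z) if kind == "duplication" else std_normal.cdf(z)
        if p < p_cutoff:
            kept.append((start, end, kind))
    return kept


# Made-up data: baseline around 100 with a 4-window gain at 40..43.
rd = [90.0 if i % 2 == 0 else 110.0 for i in range(100)]
for i in range(40, 44):
    rd[i] = 200.0
print(filter_events(rd, [(40, 44, "duplication"), (10, 14, "duplication")]))
```

The second candidate sits in the baseline, so its median RD falls inside the 0.75× to 1.25× band and it is dropped before any test is run, which illustrates the trade of sensitivity for stringency noted on the slide.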
Comparison between RD and PEM
                        RD         PEM
Median size             1100 bp    414 bp
Simple repeats          less       more
Segmental duplications  more       less