SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data
Motivation • Chromosomal changes cause genetic diseases • Aneusomies: easy to detect • Copy-number changes of genes: not so easy to detect
Array CGH • Comparative Genome Hybridization (CGH) applied to DNA microarrays • Method for detecting copy-number changes • Data are analyzed using thresholds • Thresholds are not reliable for detecting single-copy gains or losses when large-insert clones are used as probes • High rates of false positives and false negatives • Inconsistent for probes from different chromosomal regions • Cannot be used for clinical diagnostic applications!
Data Adjustment • Normalization and correction • Reason: variations between probes • Compute control vs. control data ratios • Find their mean and SD • Divide the control vs. test ratios by that mean
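The adjustment above can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_probe` is hypothetical, and it assumes the mean and SD are computed separately for each probe from its control vs. control ratios, which the slide does not state explicitly.

```python
from statistics import mean, stdev

def normalize_probe(control_ratios, test_ratios):
    """Adjust one probe's control-vs-test ratios (per-probe handling is an assumption).

    control_ratios: control vs. control ratios measured for this probe
    test_ratios:    control vs. test ratios measured for this probe
    """
    m = mean(control_ratios)
    # SD is reported alongside the mean; it is not used in the division itself.
    sd = stdev(control_ratios) if len(control_ratios) > 1 else 0.0
    adjusted = [r / m for r in test_ratios]
    return adjusted, m, sd

adjusted, m, sd = normalize_probe([1.1, 0.9, 1.0], [0.55, 1.1])
```

After this step a probe with no copy-number change should sit near a ratio of 1, regardless of its own baseline behavior.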
Threshold method • Compare each data point from the control vs. test experiment to threshold values • Ratio below 0.8 = deletion • Ratio above 1.2 = polysomy
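As a minimal sketch, the threshold rule amounts to a three-way comparison per probe. The function name `classify_ratio` and the label "normal" for ratios between the two thresholds are assumptions made for illustration; the slide names only the two outer classes.

```python
def classify_ratio(ratio, low=0.8, high=1.2):
    """Classify one normalized control-vs-test ratio using fixed thresholds."""
    if ratio < low:
        return "deletion"
    if ratio > high:
        return "polysomy"
    return "normal"  # assumed label for ratios inside [low, high]

calls = [classify_ratio(r) for r in (0.5, 1.0, 1.5)]
```

Each probe is judged in isolation here, so neighboring probes along the chromosome contribute nothing to the call.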
SW-ARRAY • Smith-Waterman algorithm adapted for Array CGH • New way to analyze Array CGH data • Reason: • Log ratio data form a contiguous one-dimensional series in which locations of high values may indicate polysomic regions and locations of low values may indicate deletions
SW-ARRAY • Step 1: • Remove outlying probes • Outlier: log intensity ratio more than 2.5 MAD away from the median of the other probes in the array • MAD = Median Absolute Deviation • Robust estimate of the standard deviation
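Step 1 might look like the sketch below. It assumes MAD means the median absolute deviation about the median, used unscaled, and it simplifies the rule by taking the median over all probes in the array rather than over "the other probes". The names `mad` and `remove_outliers` are hypothetical.

```python
from statistics import median

def mad(values):
    """Median absolute deviation about the median (unscaled; an assumption)."""
    med = median(values)
    return median([abs(v - med) for v in values])

def remove_outliers(log_ratios, k=2.5):
    """Keep (index, value) pairs lying within k MADs of the array median."""
    med = median(log_ratios)
    spread = mad(log_ratios)
    return [(i, v) for i, v in enumerate(log_ratios)
            if abs(v - med) <= k * spread]

kept = remove_outliers([0.0, 0.1, -0.1, 0.05, -0.05, 3.0])
```

The original probe indices are kept so the surviving probes can still be ordered along the genome in later steps.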
SW-ARRAY • Step 2: • Subtract a threshold t0 from the log ratio data • Ensures that the mean of the adjusted data is negative • t0 = median + 0.2 × MAD
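A small sketch of Step 2, assuming MAD is the unscaled median absolute deviation about the median. The function name `adjust_scores` and the parameter name `c` (for the 0.2 multiplier) are hypothetical.

```python
from statistics import mean, median

def adjust_scores(log_ratios, c=0.2):
    """Subtract t0 = median + c * MAD from every log ratio."""
    med = median(log_ratios)
    spread = median([abs(v - med) for v in log_ratios])  # assumed MAD definition
    t0 = med + c * spread
    return [v - t0 for v in log_ratios], t0

scores, t0 = adjust_scores([-0.1, 0.0, 0.1, 0.0, -0.05])
average = mean(scores)
```

Shifting the data so that typical probes score slightly below zero means a long run of unremarkable probes lowers a running sum, and only a cluster of clearly elevated probes can build a positive one.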
SW-ARRAY • Step 3: • Search for high-scoring islands • Definition • Locally high-scoring segment: a positive-scoring segment whose score cannot be increased by shrinking or expanding the segment boundaries
SW-ARRAY • $T(p,q) = \sum_{i=p}^{q} X(i)$ = score of the segment from probe p to probe q • X(i) = score for the ith probe, ordered along the genome
SW-ARRAY • S(p) = score of the island ending at p • B(p) = beginning point of the island • S(0) = 0 • p > 0
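The slide gives only the definitions and the starting value, so the sketch below assumes the usual one-dimensional Smith-Waterman style recursion: S(p) = max(S(p-1) + X(p), 0) for p > 0, with B(p) carried over from B(p-1) while the island stays positive. The function name `island_scores` and the convention B(p) = 0 for "no island" are hypothetical.

```python
def island_scores(x):
    """Compute S(p) and B(p) for p = 0..n, where x[p - 1] holds X(p).

    Assumed recursion: S(p) = max(S(p - 1) + X(p), 0), with S(0) = 0.
    """
    n = len(x)
    S = [0.0] * (n + 1)
    B = [0] * (n + 1)  # 0 marks "no island ends here" (illustrative convention)
    for p in range(1, n + 1):
        candidate = S[p - 1] + x[p - 1]
        if candidate > 0:
            S[p] = candidate
            # Extend the current island, or start a new one at p.
            B[p] = B[p - 1] if S[p - 1] > 0 else p
    return S, B

S, B = island_scores([-1, 2, 3, -1, -5, 1])
```

The position with the largest S(p) marks the end of the best island, and B(p) at that position gives its start.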
SW-ARRAY • Iterate through the probes in genome order • Search where scores > 0 • Find the max-scoring island • Record its data • Set the island's scores to 0 • Find the next max-scoring island
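One possible reading of this loop is sketched below. It interprets "set island = 0" as zeroing the scores of the probes inside the recorded island before searching again, which is an assumption, and the names `best_island`, `find_islands`, and `max_islands` are hypothetical.

```python
def best_island(x):
    """Return (score, start, end) of the max-scoring segment, 0-based inclusive.

    Returns a score of 0 with start = end = -1 if no positive segment exists.
    """
    best = (0.0, -1, -1)
    running, start = 0.0, 0
    for i, v in enumerate(x):
        if running <= 0:
            running, start = v, i   # start a fresh candidate island at i
        else:
            running += v            # extend the current candidate island
        if running > best[0]:
            best = (running, start, i)
    return best

def find_islands(x, max_islands=10):
    """Repeatedly record the best island, zero it out, and search again."""
    scores = list(x)
    islands = []
    for _ in range(max_islands):
        score, start, end = best_island(scores)
        if score <= 0:
            break
        islands.append((score, start, end))
        for i in range(start, end + 1):
            scores[i] = 0.0  # assumed meaning of "set island = 0"
    return islands

islands = find_islands([-1, 2, 3, -1, -5, 1, 1, -2])
```

Islands come out in decreasing order of score, so the first entries are the strongest candidates for a copy-number change.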
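SW-ARRAY • Statistical Significance • Perform 1000 runs, each with the log ratios permuted among the probes • Find the score of the highest-scoring island in each run • Use the frequency of these scores to judge the significance of an observed island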
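A permutation test along these lines could be sketched as follows. The significance measure used here, the fraction of permuted runs whose top island scores at least as high as the observed top island, is an assumption about how the frequencies are used, and the names `max_island_score`, `permutation_p_value`, `runs`, and `seed` are hypothetical.

```python
import random

def max_island_score(x):
    """Score of the highest-scoring island in x (0 if no positive segment)."""
    best = running = 0.0
    for v in x:
        running = max(running + v, 0.0)
        best = max(best, running)
    return best

def permutation_p_value(x, runs=1000, seed=0):
    """Fraction of shuffled runs whose top island matches or beats the observed one."""
    rng = random.Random(seed)   # fixed seed only to make the sketch reproducible
    observed = max_island_score(x)
    shuffled = list(x)
    hits = 0
    for _ in range(runs):
        rng.shuffle(shuffled)
        if max_island_score(shuffled) >= observed:
            hits += 1
    return hits / runs

# Five strongly positive probes clustered inside a negative background.
p = permutation_p_value([-1] * 20 + [3] * 5 + [-1] * 20)
```

Shuffling destroys the genomic ordering while keeping the same set of values, so a small fraction suggests the observed clustering is unlikely to arise by chance.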
Experiment • Test Group • DNA from subjects with well-characterized monosomies • Control groups • Data analyzed using 2 methods • Threshold • SW-ARRAY
Experiment Results • Threshold Method • 78.1% correct identification of copy-number changes • SW-ARRAY • Identified 13/14 of the monosomic regions with high significance levels in the 14 blind tests
Ideal Conditions for SW-ARRAY • Numerous probes border the region of copy-number change • Long sequences, for which edge effects are minimized