Improved detection of differentially expressed genes in microarray experiments through multiple scanning and image integration
Romualdi Chiara
Genomic Research University of Padova, CRIBI Biotechnology Centre, Università di Padova, Italy
NETTAB 2003 Workshop: Bioinformatics for the management, analysis and interpretation of microarray data
Microarray variability
• Inter-experiment variability
  Gene probes deposited in replicates
  Replicates are deposited in different regions of the chip
• Intra-experiment variability
• Swap of Cy3 and Cy5
• Replicate of the experiment
• Hybridisation, labelling, amplification … variability
• Global, local and surface normalization
… and image variability?
[Diagram labels: SCAN, Laser, 16-bit TIFFs, Log2(ch1/ch2)]
Each microarray is scanned with a single laser run for each fluorochrome, and the intensity values of the spots are calculated.
If a single microarray undergoes multiple scanning runs, the DNA spot images obtained are not exactly superimposable.
DNA spot images obtained from multiple scanning runs are not exactly superimposable
[Figure: the same spots in scans I, II, III and IV]
Differences in pixel intensities
[Figure: spots A, B and C in serial scans I to X]
A = weakly expressed
B = moderately expressed
C = highly expressed
Image variability
• Pixel intensity differences
• Quantification output variability
• Different microarray results
4% FP
Probably only a portion of the fluorochromes is excitable by the laser beam and measurable by the photomultiplier, while the confocal scanning system detects the fluorescence of a spot subregion.
Novel software for image integration
http://muscle.cribi.unipd.it/microarrays/spot/
[Diagram: input images I1, I2, I3, I4]
1) Σpot superimposes n TIFF images (input microarray images)
   VP = (pixel11, pixel12, … , pixel1n)
2) Calculates for each pixel vector of the n images:
   - Pixel intensity mean (mean of VP)
   - Pixel intensity maximum, with exclusion of saturated pixels (max of VP)
3) Develops a virtual TIFF image that summarizes the n input ones
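The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Σpot source. Images are plain nested lists of 16-bit intensities rather than TIFF files. The names `integrate` and `SATURATION` are hypothetical. A pixel is treated as saturated when it equals 65535, the mean is truncated to an integer, and a pixel that is saturated in every scan keeps the saturation value.

```python
SATURATION = 65535  # assumed saturation level of a 16-bit image


def integrate(images, mode="mean", saturation=SATURATION):
    """Combine n equally sized images into one virtual image.

    images: list of images, each a list of rows of integer intensities.
    mode:   "mean" for the per-pixel mean of VP,
            "max" for the per-pixel maximum of the unsaturated values of VP.
    """
    if not images:
        raise ValueError("at least one image is required")
    rows, cols = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != rows or any(len(row) != cols for row in img):
            raise ValueError("all images must have the same dimensions")

    virtual = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # VP: the same pixel taken from each of the n scans
            vp = [img[r][c] for img in images]
            if mode == "mean":
                out_row.append(sum(vp) // len(vp))
            elif mode == "max":
                unsaturated = [v for v in vp if v < saturation]
                # Assumption: if every scan is saturated, keep the saturation value
                out_row.append(max(unsaturated) if unsaturated else saturation)
            else:
                raise ValueError("mode must be 'mean' or 'max'")
        virtual.append(out_row)
    return virtual
```

For example, `integrate([scan1, scan2, scan3], "max")` would return one virtual image with the same dimensions as the inputs.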
Resulting virtual image after ten serial scans
[Figure: spots A, B and C in serial scans I to X, with the resulting Max. and Mean virtual images]
A = weakly expressed
B = moderately expressed
C = highly expressed
Resulting virtual image after ten serial scans: entire microarray
Serial scans and image integration improve spot (A) and background (B) uniformity
Image uniformity improves spot detection and quantification
Serial scans and image integration improve reliability of microarray results
[Figure: two panels, labelled "4% False Positives" and "< 1% False Positives"]
Competitive hybridisation with the same mRNA
Two experiments in which two equal aliquots of skeletal muscle RNA (A) and heart muscle RNA (B) were labelled with Cy3 and Cy5 and challenged in competitive hybridisation.
In these cases, all the Cy3/Cy5 ratios of spot intensities should lie around 1. Due to experimental variability, a portion of the spot intensity ratios is far from 1.
The number of outliers decreases with image integration.
[Figure panels: mean, max]
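One way to quantify the outliers in such a same-versus-same hybridisation is to count the spots whose log2(Cy3/Cy5) lies far from 0, that is, whose ratio lies far from 1. The sketch below is illustrative only. The function name and the two-fold cut-off (|log2 ratio| > 1) are assumptions, since the slides do not state which threshold was used.

```python
import math


def outlier_fraction(cy3, cy5, max_abs_log2=1.0):
    """Fraction of spots whose |log2(Cy3/Cy5)| exceeds max_abs_log2.

    cy3, cy5: sequences of spot intensities, one value per spot.
    Spots with a non-positive intensity in either channel are skipped.
    """
    log_ratios = [
        math.log2(a / b) for a, b in zip(cy3, cy5) if a > 0 and b > 0
    ]
    if not log_ratios:
        return 0.0
    outliers = sum(1 for m in log_ratios if abs(m) > max_abs_log2)
    return outliers / len(log_ratios)
```

Computing this fraction once on a single scan and once on the integrated image of the same array gives a direct comparison of the two.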
Variation of spot signal intensity with incremental number of scans
[Figure legend: Amino-allyl; RT-labelling of total RNA; DNA dendrimer probe; RT-labelling of aRNA; TSA]
[Figure panels: spot intensity ~40,000 units; spot intensity ~500 units]
Quantification of the efficacy of the multi-scan approach in detecting differentially expressed genes
We performed and analysed two microarray experiments hybridised with a target made with RT labelling and with TSA methodology:
1 In the first experiment, we challenged RNAs of skeletal and heart muscle in competitive hybridisation.
2 In the second one, we compared RNAs of dystrophic (facioscapulohumeral muscular dystrophy) and normal muscle.
• 2 replicates for each experiment with dye swapping (4 spot replicates)
• SNOMAD web tool (global and local options) for data normalization
• SAM for identification of differentially expressed genes
Evaluation of the efficacy of the approach
1 Identify differentially expressed genes in 1-scan experiments
2 Integrate the first 2, 4, 6, 8 and 10 serial scans and, for each integration, find the differentially expressed genes
3 CFP: genes found to be differentially expressed with 1 scan but not with all the serial integrated images
  CFN: genes found to be differentially expressed with the serial integrated images but not with 1 scan
4 NFP: genes found to be differentially expressed with the i-th integration but not with all the subsequent ones
  NFN: genes found to be differentially expressed with the i-th integration but not with the previous ones
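These comparisons come down to set operations on gene lists. The sketch below is one possible reading, not the authors' code. FP and FN compare each integration with the 1-scan result. "Novel" is read as not already seen at an earlier integration. "Consistent" is read as also present at the immediately preceding integration. Both readings, and the name `classify`, are assumptions, because the slide definitions leave room for interpretation.

```python
def classify(one_scan, integrations):
    """Compare the 1-scan gene set with each integrated gene set.

    one_scan:     set of genes called differentially expressed with 1 scan.
    integrations: dict mapping number of integrated scans -> set of genes.
    Returns a dict: number of scans -> {"FP", "FN", "NFP", "NFN", "CFP", "CFN"}.
    """
    results = {}
    seen_fp, seen_fn = set(), set()
    prev_fp = prev_fn = None
    for n in sorted(integrations):
        found = integrations[n]
        fp = one_scan - found  # found with 1 scan, not confirmed by integration
        fn = found - one_scan  # found only after integration
        results[n] = {
            "FP": fp,
            "FN": fn,
            # Assumption: "novel" = not seen at any earlier integration
            "NFP": fp - seen_fp,
            "NFN": fn - seen_fn,
            # Assumption: "consistent" = also present at the previous integration
            "CFP": None if prev_fp is None else fp & prev_fp,
            "CFN": None if prev_fn is None else fn & prev_fn,
        }
        seen_fp |= fp
        seen_fn |= fn
        prev_fp, prev_fn = fp, fn
    return results
```

With gene identifiers as strings, `classify(genes_1_scan, {2: genes_2, 4: genes_4})` returns the six sets for each integration. The first integration has no consistent sets because there is no earlier integration to compare it with.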
Increase of the number of differentially expressed genes
Skeletal muscle vs. heart – RT labelling – 1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle
[Figure panels: Overexpressed genes; Underexpressed genes]
CFP and CFN (consistent false positives and negatives) = genes found to be differentially expressed with the integration of n scans and confirmed by all the n-1 ones
Increase of the number of differentially expressed genes
Skeletal muscle vs. heart – RT labelling – 1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle
[Figure panels: Overexpressed genes; Underexpressed genes]
NFP and NFN (novel false positives and negatives) = real improvement achieved by the inclusion of each additional serial microarray image
Increase of the number of differentially expressed genes
FSHD vs. normal – TSA – 1 scan: 149 overexpressed and 107 underexpressed in normal muscle
[Figure panels: Overexpressed genes; Underexpressed genes]
CFP and CFN (consistent false positives and negatives) = genes found to be differentially expressed with the integration of n scans and confirmed by all the n-1 ones
Increase of the number of differentially expressed genes
FSHD vs. normal – TSA – 1 scan: 149 overexpressed and 107 underexpressed in normal muscle
[Figure panels: Overexpressed genes; Underexpressed genes. Axis ticks: 2, 4, 6, 8, 10]
NFP and NFN (novel false positives and negatives) = real improvement achieved by the inclusion of each additional serial microarray image
Relationship between CFN and their spot intensities
The greatest improvement in differentially expressed genes revealed by the multi-scan approach concerns weakly expressed genes.
[Figure: histograms of frequency vs. spot intensity for Cy5 and Cy3. Panels: Dystrophic vs. normal muscle; Skeletal muscle vs. heart]
Σpot results validation with semi-quantitative RT-PCR
[Figure: RT-PCR gel, lanes: heart, sk. muscle]
CFN, overexpressed in sk. muscle
(1) myosin-binding protein C, fast type
(2) titin
(3) human DNA sequence
(4) human DNA sequence
(5) H. sapiens mRNA for striate muscle-specific hypothetical protein (ORF1), clone 00275
(6) human DNA sequence
(7) H. sapiens acetyl-coenzyme A transporter
(8) human autoantigen small nuclear ribonucleoprotein Sm-D
CFN, underexpressed in sk. muscle
(9) troponin T2, cardiac
(10) alpha-actin, cardiac muscle
(11) myosin-binding protein C, cardiac
(12) H. sapiens heat shock 90 kDa protein 1, alpha
(13) H. sapiens haplotype M*2 mitochondrion
(14) H. sapiens chromosome 5, BAC
(15) H. sapiens macrophage migration inhibitory factor (glycosylation-inhibiting factor)
(16) H. sapiens ring finger protein 28
CFP
(17) H. sapiens clone alpha_est218/52C1
(18) H. sapiens CD27-binding (Siva) protein transcript variant 1
(19) human skeletal muscle 1.3 kb mRNA for tropomyosin
(20) H. sapiens cathepsin H
Conclusions
RT-labelling:
• Many FP (~10% of differentially expressed genes found with 1 scan)
• Many FN (~ +50% of differentially expressed genes found with 1 scan)
TSA-labelling:
• Small number of FP
• Large increase in FN (~ +200%)
Maximum and mean results overlap for 80% of the FP and FN transcripts
4-6 scans seem to be the best number of scans required for a satisfactory improvement in detecting differentially expressed genes
Future work
Integration of Σpot into scanner software
Technical details
• Σpot is written in C language with libtiff libraries; it runs on UNIX systems
• SAM: http://www-stat.stanford.edu/~tibs/SAM/index.html
• SNOMAD: http://pevsnerlab.kennedykrieger.org/snomadinput.html
• Spotting device: GenePackArray 21 with 16 stealth micro pins
• Scanner: Perkin Elmer LITE dual confocal laser scanner with software Scan Array
• Image analysis software: QuantArray
• HumanMuscleArray: http://muscle.cribi.unipd.it/microarrays/human.html
Genomic Research University of Padova
http://grup.cribi.unipd.it/
Acknowledgements
Gerolamo Lanfranchi, project supervisor
Microarray Team: Silvia Trevisan, Barbara Celegato
Bioinformatics Team: Germano Costa, Micky Del Favero
Reference
Romualdi Chiara et al. (2003) Nucl. Acids Res. 31: e149.
Web sites
http://muscle.cribi.unipd.it/microarrays/
http://muscle.cribi.unipd.it/microarrays/spot/
Increase of the number of differentially expressed genes
Skeletal muscle vs. heart – RT labelling – 1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle
                2 scans             4 scans             6 scans             8 scans             10 scans
                Mean      Max       Mean      Max       Mean      Max       Mean      Max       Mean      Max
Over exp.   FP  26 (13)   19 (10)   21 (11)   21 (11)   24 (12)   19 (10)   20 (10)   30 (15)   24 (12)   24 (12)
            FN  18 (9)    41 (21)   37 (19)   36 (18)   36 (18)   53 (27)   50 (25)   18 (9)    34 (17)   29 (15)
Under exp.  FP  6 (19)    7 (22)    2 (6)     5 (16)    2 (6)     3 (9)     4 (13)    3 (9)     4 (13)    1 (3)
            FN  7 (22)    12 (38)   13 (41)   15 (47)   14 (44)   18 (56)   23 (72)   15 (47)   20 (63)   20 (63)
Values are gene counts; values in parentheses are approximate percentages of the genes found with 1 scan.
FP (false positives) = genes found to be differentially expressed with 1 scan but not confirmed with the integration of the others
FN (false negatives) = genes found to be differentially expressed with the integration of additional scans but not with 1 scan
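The parenthesised values in the overexpressed rows are consistent with each count expressed as a percentage of the 200 transcripts found with 1 scan, for example 26 of 200 gives 13. A minimal sketch of that arithmetic follows. The function name is hypothetical and half-up rounding is an assumption.

```python
def percent_of_one_scan(count, one_scan_total):
    """Count as a percentage of the genes found with 1 scan (rounded half up)."""
    return int(100 * count / one_scan_total + 0.5)


# Overexpressed genes, RT labelling: 200 transcripts found with 1 scan
fp_2_scans_mean = percent_of_one_scan(26, 200)
fn_2_scans_max = percent_of_one_scan(41, 200)
```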
Increase of the number of differentially expressed genes
Skeletal muscle vs. heart – RT labelling – 1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle
[Figure panels: Overexpressed genes; Underexpressed genes]
FP (false positives) = genes found to be differentially expressed with 1 scan but not confirmed with the integration of the others
FN (false negatives) = genes found to be differentially expressed with the integration of additional scans but not with 1 scan
Increase of the number of differentially expressed genes
Skeletal muscle vs. heart – RT labelling – 1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle
                 2 scans             4 scans             6 scans             8 scans             10 scans
                 Mean      Max       Mean      Max       Mean      Max       Mean      Max       Mean      Max
Over exp.   FP   26 (13)   19 (10)   21 (11)   21 (11)   24 (12)   19 (10)   20 (10)   30 (15)   24 (12)   24 (12)
            CFP  -         -         19        15        18        13        15        13        16        14
            FN   18 (9)    41 (21)   37 (19)   36 (18)   36 (18)   53 (27)   50 (25)   18 (9)    34 (17)   29 (15)
            CFN  -         -         15        20        22        20        24        17        34        15
Under exp.  FP   6 (19)    7 (22)    2 (6)     5 (16)    2 (6)     3 (9)     4 (13)    3 (9)     4 (13)    1 (3)
            CFP  -         -         2         4         2         3         2         2         2         1
            FN   7 (22)    12 (38)   13 (41)   15 (47)   14 (44)   18 (56)   23 (72)   15 (47)   20 (63)   20 (63)
            CFN  -         -         4         7         10        9         13        11        17        14
CFP and CFN (consistent false positives and negatives) = genes found to be differentially expressed with the integration of n scans and confirmed by all the n-1 ones
Increase of the number of differentially expressed genes
Skeletal muscle vs. heart – RT labelling – 1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle
                 2 scans             4 scans             6 scans             8 scans             10 scans
                 Mean      Max       Mean      Max       Mean      Max       Mean      Max       Mean      Max
Over exp.   NFP  26        19        2         6         3         2         1         7         2         2
            NFN  18        41        10        5         4         10        4         0         0         3
Under exp.  NFP  6         7         0         1         1         0         1         0         1         1
            NFN  7         12        9         8         3         8         7         3         2         3
NFP and NFN (novel false positives and negatives) = real improvement achieved by the inclusion of each additional serial microarray image
Increase of the number of differentially expressed genes
FSHD vs. normal – TSA – 1 scan: 149 overexpressed and 107 underexpressed in normal muscle
                2 scans               4 scans               6 scans               8 scans               10 scans
                Mean       Max        Mean       Max        Mean       Max        Mean       Max        Mean       Max
Over exp.   FP  0 (0)      0 (0)      0 (0)      0 (0)      1 (1)      1 (1)      0 (0)      1 (1)      0 (0)      0 (0)
            FN  110 (74)   90 (61)    154 (104)  131 (89)   184 (124)  137 (93)   198 (134)  158 (107)  207 (140)  169 (114)
Under exp.  FP  2 (2)      2 (2)      2 (2)      2 (2)      1 (1)      2 (2)      2 (2)      2 (2)      2 (2)      2 (2)
            FN  175 (164)  157 (147)  214 (200)  198 (185)  229 (214)  191 (179)  263 (246)  212 (198)  255 (238)  244 (228)
Values are gene counts; values in parentheses are approximate percentages of the genes found with 1 scan.
FP (false positives) = genes found to be differentially expressed with 1 scan but not confirmed with the integration of the others
FN (false negatives) = genes found to be differentially expressed with the integration of additional scans but not with 1 scan
Increase of the number of differentially expressed genes
FSHD vs. normal – TSA – 1 scan: 149 overexpressed and 107 underexpressed in normal muscle
[Figure panels: Overexpressed genes; Underexpressed genes. Axis ticks: 2, 4, 6, 8, 10]
FP (false positives) = genes found to be differentially expressed with 1 scan but not confirmed with the integration of the others
FN (false negatives) = genes found to be differentially expressed with the integration of additional scans but not with 1 scan
Increase of the number of differentially expressed genes
FSHD vs. normal – TSA – 1 scan: 149 overexpressed and 107 underexpressed in normal muscle
                 2 scans               4 scans               6 scans               8 scans               10 scans
                 Mean       Max        Mean       Max        Mean       Max        Mean       Max        Mean       Max
Over exp.   FP   0 (0)      0 (0)      0 (0)      0 (0)      1 (1)      1 (1)      0 (0)      1 (1)      0 (0)      0 (0)
            CFP  -          -          0          0          0          0          0          1          0          0
            FN   110 (74)   90 (61)    154 (104)  131 (89)   184 (124)  137 (93)   198 (134)  158 (107)  207 (140)  169 (114)
            CFN  -          -          107        85         152        117        175        131        189        150
Under exp.  FP   2 (2)      2 (2)      2 (2)      2 (2)      1 (1)      2 (2)      2 (2)      2 (2)      2 (2)      2 (2)
            CFP  -          -          2          2          0          2          1          2          1          2
            FN   175 (164)  157 (147)  214 (200)  198 (185)  229 (214)  191 (179)  263 (246)  212 (198)  255 (238)  244 (228)
            CFN  -          -          170        149        203        173        223        179        241        197
CFP and CFN (consistent false positives and negatives) = genes found to be differentially expressed with the integration of n scans and confirmed by all the n-1 ones
Increase of the number of differentially expressed genes
FSHD vs. normal – TSA – 1 scan: 149 overexpressed and 107 underexpressed in normal muscle
                 2 scans             4 scans             6 scans             8 scans             10 scans
                 Mean      Max       Mean      Max       Mean      Max       Mean      Max       Mean      Max
Over exp.   NFP  0         0         0         0         1         1         0         0         0         0
            NFN  110       90        47        46        31        17        21        16        14        14
Under exp.  NFP  2         2         0         0         0         0         0         0         0         0
            NFN  175       157       44        49        22        17        32        23        6         21
NFP and NFN (novel false positives and negatives) = real improvement achieved by the inclusion of each additional serial microarray image