Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002
Introduction to cDNA Microarray Experiments • Single-slide Design • Two mRNA samples (red/green) on the same slide • Multiple-slide Design • Two or more types of mRNA on different slides • Excluded here: time-course experiments
Examples of Multiple-slide Design • Apo AI • Treatment group: 8 mice with the apo AI gene knocked out • Control group: 8 C57Bl/6 mice • Cy5: cDNA from each of the 16 mice • Cy3: cDNA pooled from the 8 control mice • SR-BI • Treatment group: 8 SR-BI transgenic mice • Control group: 8 “normal” FVB mice • Microarray Setup • 6384 spots, 4×4 grids with 19×21 spots in each
Single-slide Methods • Two types • Based solely on intensity ratio R/G • Take into account overall transcript abundance measured by R*G • Historical Review • Fold increase/decrease cut-offs (1995-1996) • Probabilistic modeling based on distributional assumptions (1997-2000) • Consider R*G (2000-2001) e.g. Gamma-Gamma-Bernoulli
Summary of Single-slide Methods • Each method produces a model-dependent rule: drawing two curves in the (R,G) plane • Power (1 − Type II error rate) • False positive rate (Type I error rate) • Multiple testing • Replication is needed because gene expression data are too noisy
Image Analysis • “Raw” data: 16-bit TIFF files • Addressing • Within a batch, important characteristics are similar • Segmentation • Seeded region growing algorithm • Background adjustment • Morphological opening (a nonlinear filter) • Software package: Spot in R environment
Single-slide Data Display • Plot log2 R vs. log2 G • Variation is less dependent on absolute magnitude • Normalization is additive for logged intensities • Evens out highly skewed distributions • Gives a more realistic sense of variation • Plot M = log2(R/G) vs. A = [log2(RG)]/2 • More revealing for identifying spot artifacts and for normalization purposes
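The M and A quantities defined on this slide can be computed directly from the two channel intensities. The following is a minimal sketch; the function name `ma_values` and the intensities are hypothetical, chosen as powers of two so the logs are easy to check by hand.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities."""
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_vals.append(log_r - log_g)        # M = log2(R/G)
        a_vals.append((log_r + log_g) / 2)  # A = log2(R*G) / 2
    return m_vals, a_vals

# Hypothetical background-corrected intensities for four spots.
red = [1024.0, 512.0, 256.0, 4096.0]
green = [256.0, 512.0, 1024.0, 4096.0]

m_vals, a_vals = ma_values(red, green)
print(m_vals)  # [2.0, 0.0, -2.0, 0.0]
print(a_vals)  # [9.0, 9.0, 9.0, 12.0]
```

Plotting `m_vals` against `a_vals` gives the M-A plot; it is the log2 R vs. log2 G plot rotated by 45 degrees and rescaled, so intensity-dependent bias shows up as curvature around M = 0.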
Normalization • Identify and remove sources of systematic variation other than differential expression • Different labeling efficiencies and scanning properties for Cy3 and Cy5 • Different scanning parameters • Print-tip, spatial or plate effects • Red intensity is often lower than green intensity • The imbalance between R and G varies across spots and between arrays, depending on • Overall spot intensity A • Location on the array, plate origin, etc.
Normalization (Cont.) • Global normalization • Subtract the mean or median from all intensity log-ratios • More complex normalization • Robust locally weighted regression • Model M as a function of spot intensity A + location + plate origin • Use the print-tip group to represent spot location • log2(R/G) → log2(R/G) − l(A,j) • l(A,j): lowess fit of M on A within print-tip group j, using lowess in R (span 0.2 < f < 0.4) • Control sequences
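The two normalization schemes on this slide can be sketched as follows. This is an illustration under stated assumptions, not the slide author's implementation: `local_fit` is a simplified, non-robust stand-in for lowess (tricube-weighted local linear regression with no robustness iterations), all function names are hypothetical, and the demo data are synthetic with an exactly linear intensity bias in each print-tip group.

```python
import statistics

def global_normalize(m_vals):
    """Global normalization: subtract the median log-ratio from every spot."""
    center = statistics.median(m_vals)
    return [m - center for m in m_vals]

def local_fit(a_vals, m_vals, frac=0.3):
    """Tricube-weighted local linear fit of M on A, evaluated at each A.

    Simplified stand-in for lowess: no robustness iterations.
    """
    n = len(a_vals)
    k = min(n, max(2, round(frac * n)))  # neighbours used per local fit
    fitted = []
    for a0 in a_vals:
        dists = sorted(abs(a - a0) for a in a_vals)
        h = max(dists[k - 1], 1e-12)     # bandwidth: k-th nearest distance
        w = [(1 - min(1.0, abs(a - a0) / h) ** 3) ** 3 for a in a_vals]
        sw = sum(w)
        a_bar = sum(wi * a for wi, a in zip(w, a_vals)) / sw
        m_bar = sum(wi * m for wi, m in zip(w, m_vals)) / sw
        sxx = sum(wi * (a - a_bar) ** 2 for wi, a in zip(w, a_vals))
        sxy = sum(wi * (a - a_bar) * (m - m_bar)
                  for wi, a, m in zip(w, a_vals, m_vals))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(m_bar + slope * (a0 - a_bar))
    return fitted

def print_tip_normalize(m_vals, a_vals, groups, frac=0.3):
    """Subtract a separate intensity-dependent fit l(A, j) within each group j."""
    normalized = list(m_vals)
    for j in set(groups):
        idx = [i for i, g in enumerate(groups) if g == j]
        fit = local_fit([a_vals[i] for i in idx],
                        [m_vals[i] for i in idx], frac)
        for i, f in zip(idx, fit):
            normalized[i] = m_vals[i] - f
    return normalized

# Synthetic data: two print-tip groups, each with its own linear bias in A.
a_vals, m_vals, groups = [], [], []
for j, offset in enumerate([0.4, -0.3]):
    for i in range(20):
        a = 8 + 0.2 * i
        a_vals.append(a)
        m_vals.append(offset + 0.5 * (a - 10))
        groups.append(j)

normalized = print_tip_normalize(m_vals, a_vals, groups)
print(max(abs(m) for m in m_vals))      # largest bias before normalization
print(max(abs(m) for m in normalized))  # essentially zero afterwards
```

Because the synthetic bias is exactly linear, the local linear fit removes it completely. Real log-ratios are noisy and contain outliers (the differentially expressed genes), which is why a robust smoother such as lowess is preferred in practice.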
Graphical Display for Test Statistics (I) • Test statistics • Hj: no association between treatment and the expression level of gene j, j=1,…,m • Two-sided alternative • Two-sample Welch t-statistics • Replication is essential to assess the variability within the treatment and control groups • The joint null distribution of the test statistics is estimated by permuting the treatment/control labels, because the statistics need not follow a t-distribution
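For a single gene, the Welch t-statistic and its permutation p-value can be sketched as below. The function names and the log-ratio values are hypothetical. The sketch enumerates every relabeling of the pooled samples, which is only feasible for small groups, and it treats one gene at a time; applying the same relabelings to all genes at once is what yields an estimate of the joint distribution.

```python
import itertools
import math
import statistics

def welch_t(x, y):
    """Two-sample Welch t-statistic (unequal variances).

    Assumes the two groups do not both have zero sample variance.
    """
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

def permutation_p_value(treatment, control):
    """Two-sided p-value over all relabelings of treatment/control."""
    pooled = treatment + control
    n1 = len(treatment)
    observed = abs(welch_t(treatment, control))
    hits = total = 0
    for idx in itertools.combinations(range(len(pooled)), n1):
        x = [pooled[i] for i in idx]
        y = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        # Small tolerance so the observed labeling counts as a hit.
        if abs(welch_t(x, y)) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical normalized log-ratios for one gene, 4 mice per group.
treatment = [-1.2, -0.9, -1.5, -1.1]
control = [0.1, -0.2, 0.3, 0.0]

print(welch_t(treatment, control))
# Only the observed split and its mirror image are this extreme,
# so the p-value is 2 out of C(8,4) = 70 relabelings.
print(permutation_p_value(treatment, control))
```

With 8 mice per group, as in the Apo AI and SR-BI experiments, there are C(16,8) = 12870 relabelings, so full enumeration is still practical there.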
Graphical Display for Test Statistics (II) • Quantile-Quantile plots
Graphical Display for Test Statistics (III) • Plots vs. absolute expression levels
Multiple Hypothesis Testing: Adjusted p-values (I) • P-value: pj = Pr(|Tj| ≥ |tj| | Hj), j=1,…,m • Family-wise Type I Error Rate (FWER) • The probability of at least one Type I error in the family • Strong Control of the FWER • Control the FWER for any combination of true and false hypotheses • Weak Control of the FWER • Control the FWER only under the complete null hypothesis that all hypotheses in the family are true
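To see why the FWER needs explicit control, consider the idealized case where all m null hypotheses are true and the tests are independent, each run at level alpha. Independence is an assumption made here purely for illustration (gene expression levels are generally correlated), and the function name is hypothetical.

```python
def fwer_independent(alpha, m):
    """FWER for m independent tests of true nulls, each at level alpha."""
    return 1 - (1 - alpha) ** m

for m in (1, 10, 6384):
    print(m, fwer_independent(0.05, m))
```

At alpha = 0.05 the chance of at least one false positive is already about 0.40 for 10 tests and is indistinguishable from 1 for the 6384 spots on one of these arrays.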
Multiple Hypothesis Testing: Adjusted p-values (II) • Adjusted p-value for Hj • p̃j = inf{α: Hj is rejected at FWER = α} • Hj is rejected at FWER α if p̃j ≤ α • P-value adjustment approaches • Bonferroni • Šidák single-step • Holm step-down • Westfall and Young step-down minP
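The first three adjustments in this list depend only on the raw p-values and can be sketched in a few lines. The function names and raw p-values below are hypothetical. The Westfall and Young minP procedure is not shown because it additionally requires the permutation distribution of the p-values.

```python
def bonferroni(pvals):
    """Single-step Bonferroni: multiply each p-value by m, cap at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

def sidak(pvals):
    """Single-step Sidak: 1 - (1 - p)^m."""
    m = len(pvals)
    return [1 - (1 - p) ** m for p in pvals]

def holm(pvals):
    """Holm step-down: the k-th smallest p-value is multiplied by
    (m - k + 1), then a running maximum enforces monotonicity."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, j in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[j])
        running_max = max(running_max, candidate)
        adjusted[j] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values for 4 genes
print(bonferroni(raw))  # approximately [0.04, 0.16, 0.12, 0.02]
print(sidak(raw))
print(holm(raw))        # approximately [0.03, 0.06, 0.06, 0.02]
```

A hypothesis is rejected at FWER level alpha when its adjusted p-value is at most alpha. Holm's adjusted p-values are never larger than Bonferroni's, so the step-down procedure rejects at least as many hypotheses.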
Multiple Hypothesis Testing: Estimation of adjusted p-values (I)
Multiple Hypothesis Testing: Estimation of adjusted p-values (II)
Discussion • M-A plots • Normalization • Robust local regression, e.g. lowess • Q-Q plots & Plots vs. absolute expression level • False discovery rate (FDR) • Replication is necessary • Design issues • Factorial experiments • Joint behavior of genes • R package SMA