Presented by Nan Lin 13 October 2002

Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments


Presentation Transcript


  1. Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002

  2. Introduction to cDNA Microarray Experiment • Single-slide Design • Two mRNA samples (red/green) on the same slide • Multiple-slide Design • Two or more types of mRNA on different slides • Excluded here: time-course experiments

  3. Examples of Multiple-slide Design • Apo AI • Treatment group: 8 mice with the apo AI gene knocked out • Control group: 8 C57Bl/6 mice • Cy5: cDNA from each of the 16 mice • Cy3: reference sample pooled from the cDNA of the 8 control mice • SR-BI • Treatment group: 8 SR-BI transgenic mice • Control group: 8 “normal” FVB mice • Microarray Setup • 6384 spots: 4×4 grids with 19×21 spots in each (16 × 399 = 6384)

  4. Single-slide Methods • Two types • Based solely on intensity ratio R/G • Take into account overall transcript abundance measured by R*G • Historical Review • Fold increase/decrease cut-offs (1995-1996) • Probabilistic modeling based on distributional assumptions (1997-2000) • Consider R*G (2000-2001) e.g. Gamma-Gamma-Bernoulli

  5. Summary of Single-slide Methods • Each method produces a model-dependent rule, which amounts to drawing two curves in the (R,G) plane • Power (1 - Type II error rate) • False positive rate (Type I error rate) • Multiple testing • Replication is needed because gene expression data are too noisy

  6. Image Analysis • “Raw” data: 16-bit TIFF files • Addressing • Within a batch, important characteristics are similar • Segmentation • Seeded region growing algorithm • Background adjustment • Morphological opening (a nonlinear filter) • Software package: Spot in R environment

  7. Single-slide Data Display • Plot log2(R) vs. log2(G) • variation less dependent on absolute magnitude • normalization is additive for logged intensities • evens out highly skewed distributions • a more realistic sense of variation • Plot M = log2(R/G) vs. A = [log2(RG)]/2 • More revealing in terms of identifying spot artifacts and for normalization purposes
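
The M and A definitions on this slide can be sketched in a few lines of Python. The function name `ma_transform` and the three intensity pairs are hypothetical and chosen only for illustration; the intensities are assumed to be positive and already background-corrected.

```python
import math

def ma_transform(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    m = math.log2(red / green)          # M: log-ratio log2(R/G)
    a = 0.5 * math.log2(red * green)    # A: mean log-intensity [log2(RG)]/2
    return m, a

# Hypothetical (R, G) intensities for three spots, not real data.
spots = [(1024.0, 256.0), (500.0, 500.0), (64.0, 256.0)]
ma_values = [ma_transform(r, g) for r, g in spots]

for (r, g), (m, a) in zip(spots, ma_values):
    print(f"R={r:7.1f} G={g:7.1f} M={m:+.2f} A={a:.2f}")
```

A spot with R = G has M = 0 whatever its brightness, which is why systematic dye bias shows up in an M-A plot as a departure of the point cloud from the horizontal line M = 0.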

  8. Normalization • Identify and remove sources of systematic variation other than differential expression • Different labeling efficiencies and scanning properties for Cy3 and Cy5 • Different scanning parameters • Print-tip, spatial or plate effects • Red intensity is often lower than green intensity • The imbalance between R and G varies across spots and between arrays, depending on • Overall spot intensity A • Location on the array, plate origin, etc.

  9. An Example: Self-Self Experiment

  10. Normalization (Cont.) • Global normalization • subtract mean or median from all intensity log-ratios • More complex normalization • Robust locally weighted regression • M modeled as a function of spot intensity A, location, and plate origin • Use print-tip group to represent the spot locations • log2(R/G) → log2(R/G) - l(A,j) • l(A,j): lowess fit of M on A within print-tip group j, computed with lowess in R (span 0.2 < f < 0.4) • Control sequences
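
As a rough illustration of the two levels of normalization above, the sketch below centers log-ratios globally and then within print-tip groups. Note the substitution: the slide's l(A,j) is a lowess curve in A fitted per print-tip group, whereas this sketch uses the simpler per-group median (a constant in A) as a stand-in. The function names and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import median

def global_median_normalize(m_values):
    """Subtract the slide-wide median from every log-ratio M."""
    center = median(m_values)
    return [m - center for m in m_values]

def print_tip_median_normalize(m_values, tip_groups):
    """Subtract each print-tip group's own median from its log-ratios.

    Simplified stand-in for M - l(A, j): a lowess fit would let the
    correction vary with A inside each group; a median does not.
    """
    by_group = defaultdict(list)
    for m, group in zip(m_values, tip_groups):
        by_group[group].append(m)
    centers = {group: median(vals) for group, vals in by_group.items()}
    return [m - centers[group] for m, group in zip(m_values, tip_groups)]

# Hypothetical log-ratios for six spots printed by two print tips.
m_values = [0.5, 0.7, 0.6, -0.2, 0.0, -0.1]
tip_groups = [1, 1, 1, 2, 2, 2]

globally = global_median_normalize(m_values)
by_tip = print_tip_median_normalize(m_values, tip_groups)
print(globally)
print(by_tip)
```

In the toy data the two print tips have different offsets, so global centering leaves a visible gap between the groups, while per-group centering removes it.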

  11. Apo AI: Normalization

  12. Graphical Display for Test Statistics (I) • Test statistics • Hj: no association between treatment and the expression level of gene j, j=1,…,m. • Two-sided alternative • Two-sample Welch t-statistics • Replication is essential to assess the variability in the treatment and control groups • The joint distribution of the test statistics is estimated by a permutation procedure because their actual distribution is not a t-distribution
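
A minimal sketch of the Welch t-statistic and a permutation p-value for a single gene follows. It enumerates every relabeling of the pooled samples, which is feasible for small groups (8 vs. 8 gives 12870 relabelings). The slide refers to the joint distribution over all genes; this sketch handles one gene only, and the function names and log-ratios are hypothetical. It assumes the values are not all identical within both groups, since the standard error would then be zero.

```python
import math
from itertools import combinations
from statistics import mean, variance

def welch_t(x, y):
    """Two-sample Welch t-statistic (unequal variances)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

def permutation_p_value(x, y):
    """Two-sided p-value from all relabelings of the pooled samples."""
    pooled = list(x) + list(y)
    n, n_x = len(pooled), len(x)
    observed = abs(welch_t(x, y))
    hits = total = 0
    for idx in combinations(range(n), n_x):
        chosen = set(idx)
        perm_x = [pooled[i] for i in idx]
        perm_y = [pooled[i] for i in range(n) if i not in chosen]
        # Small tolerance so ties with the observed value are counted.
        if abs(welch_t(perm_x, perm_y)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total

# Hypothetical normalized log-ratios for one gene, 4 mice per group.
treatment = [-3.1, -2.8, -3.4, -2.9]
control = [0.1, -0.2, 0.3, 0.0]

t_obs = welch_t(treatment, control)
p_perm = permutation_p_value(treatment, control)
print(t_obs, p_perm)
```

With 4 mice per group there are only 70 relabelings, so the smallest attainable two-sided p-value is 2/70. This granularity is one reason replication matters for permutation-based inference.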

  13. Graphical Display for Test Statistics (II) • Quantile-Quantile plots

  14. Graphical Display for Test Statistics (III) • Plots vs. absolute expression levels

  15. Multiple Hypothesis Testing: Adjusted p-values (I) • P-value: pj = Pr(|Tj| ≥ |tj| | Hj), j=1,…,m. • Family-wise Type I Error Rate (FWER) • The probability of at least one Type I error in the family • Strong Control of the FWER • Control the FWER for any combination of true and false hypotheses • Weak Control of the FWER • Control the FWER only under the complete null hypothesis that all hypotheses in the family are true

  16. Multiple Hypothesis Testing: Adjusted p-values (II) • Adjusted p-value for Hj • p̃j = inf{α: Hj is rejected at FWER = α} • Hj is rejected at FWER α if p̃j ≤ α • P-value adjustment approaches • Bonferroni • Šidák single-step • Holm step-down • Westfall and Young step-down minP
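
The first three adjustments listed above depend only on the unadjusted p-values and can be sketched as follows, using their standard textbook definitions. The Westfall and Young step-down minP procedure is omitted because it needs the permutation distribution of the p-values. The function names and the four p-values are hypothetical.

```python
def bonferroni(p_values):
    """Single-step Bonferroni: min(1, m * p_j)."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

def sidak_single_step(p_values):
    """Single-step Sidak: 1 - (1 - p_j)^m."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]

def holm(p_values):
    """Holm step-down: running maximum of min(1, (m - k) * p_(k)),
    where k = 0, 1, ... indexes the p-values in ascending order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max   # enforces monotone adjusted p-values
    return adjusted

# Hypothetical unadjusted p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(sidak_single_step(raw))
print(holm(raw))
```

Holm's adjusted p-values are never larger than Bonferroni's, so Holm rejects at least as many hypotheses at the same FWER level.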

  17. Multiple Hypothesis Testing: Estimation of adjusted p-values (I)

  18. Multiple Hypothesis Testing: Estimation of adjusted p-values (II)

  19. Apo AI: Adjusted p-values (I)

  20. Apo AI: Adjusted p-values (II)

  21. Apo AI: Comparison with Single-slide Methods

  22. Discussion • M-A plots • Normalization • Robust local regression, e.g. lowess • Q-Q plots & Plots vs. absolute expression level • False discovery rate (FDR) • Replication is necessary • Design issues • Factorial experiments • Joint behavior of genes • R package SMA
