
Case Study I: Two-Sample Analysis

Ru-Fang Yeh. October 23, 2004. Genentech Hall Auditorium, Mission Bay, UCSF.


Presentation Transcript


  1. Case Study I: Two-Sample Analysis Ru-Fang Yeh October 23, 2004 Genentech Hall Auditorium, Mission Bay, UCSF

  2. Workflow diagram. Stages: Biological question; Experimental design; Microarray experiment; Quality measurement (outcomes: Failed, Pass); Image analysis; Preprocessing; Normalization; gene-by-sample data matrix (below); Analysis (Estimation, Testing, Annotation, Clustering, Discrimination); Biological verification and interpretation.

     | Gene | Sample/Condition 1 | 2 | 3 | 4 | … |
     |---|---|---|---|---|---|
     | 1 | 0.46 | 0.30 | 0.80 | 1.51 | … |
     | 2 | -0.10 | 0.49 | 0.24 | 0.06 | … |
     | 3 | 0.15 | 0.74 | 0.04 | 0.10 | … |
     | : | … | | | | |

     Microarrays: Case Studies and Advanced Analysis

  3. Image analysis: CEL, CDF files; gpr, gal files; UCSF spot file.
     • Quality assessment and pre-processing:
       • Short-oligonucleotide chip data: quality assessment, background correction, probe-level normalization, probe set summary.
       • Two-color spotted array data: quality assessment; diagnostic plots, background correction, array normalization.
       • Array CGH data: quality assessment; diagnostic plots, background correction, clones summary; array normalization.
     • Result: probes by sample matrix of log-ratios or log-intensities.
     • Analysis of expression data: identify D.E. genes, estimation and testing, clustering, discrimination, etc.

  4. Biological Question: Molecular Phenotypic Difference in Rat Alveolar Type I and Type II Cells. From “Freshly-isolated Rat Alveolar Type I Cells, Type II Cells, and Cultured Type II Cells Have Distinct Molecular Phenotypes” (to appear, A J Phys), by Robert Gonzalez, Yee Hwa Yang, Chandi Griffin, Lennell Allen, Zachary Tigue, and Leland Dobbs.

  5. Pulmonary Alveolar Epithelium (figure; labels: Type II Cells, Type I Cells)

  6. Alveolar Epithelial Type I and Type II Cells

     | | Type I Cells | Type II Cells |
     |---|---|---|
     | % Lung cells | ~8% | ~15% |
     | % Lung internal surface area | ~98% | ~2% |
     | Volume / cell | ~2000 µm³ | ~400 µm³ |
     | Surface area / cell | ~5300 µm² | ~100 µm² |

     Source: Stone, AJRCMB 1992.
     • Morphologic characteristics conserved across the entire range of mammals.
     • Known/possible functions, Type I cells: water and ion transport; host defense (oxidants & microorganisms); tumor suppression; matrix preservation.
     • Known/possible functions, Type II cells: surfactant metabolism; ion transport; produce immune effector molecules; progenitor cells for TI cells after oxidant injury (and in lung development).

  7. Alveolar Epithelial Cell Lineage Following Lung Injury
     • Diagram labels: Type II cell, Proliferation, Transdifferentiation, Type I cell (Evans, 1975; Adamson, 1975).
     • Transdifferentiation: the process by which one “stable” (differentiated) cellular phenotype changes into a different “stable” cellular phenotype.

  8. Study Objectives
     • Long term goals: increase understanding of
       • alveolar epithelial cell lineages;
       • the mechanisms that regulate alveolar epithelial development and differentiation.
     • Use microarrays to establish molecular profiles of TI and TII cells:
       • Identify differences in expression of single genes to
         • provide additional marker genes;
         • develop new hypotheses about cellular functions of each cell type.
       • Determine changing patterns of expression of groups of genes to
         • understand processes of (trans)-differentiation in vivo and in vitro;
         • identify candidate factors (transcription cascades) important in regulating differentiation.

  9. Gene Expression Experiment (diagram; labels: TII Cells, Cultured TII Cells, TI Cells)

  10. Freshly Isolated TI and TII Cells (figure; labels: TI cell fragment, TII cell; panels: TII CELLS, TI CELLS)

  11. Type II Cells in vitro
     • First group of culture conditions: matrix (TCP, fibronectin); apical surface covered by liquid; mechanical distention.
     • Second group of culture conditions: matrix (EHS, contracted collagen gels); soluble factors (e.g. KGF); apical surface exposed to air; mechanical contraction.

  12. Experimental design
     • Probe: Affymetrix Rat U34 chip A, with 8799 probe sets.
     • Target: 4 biological replicates of each cell type:
       • TID0: freshly isolated TI cells
       • TIID0: freshly isolated TII cells
       • TIID7: cultured TII cells (for 7 days) [traditionally used as a model for TI day 0 cells]
     • Cell purity criterion: < 2% cross-contamination

  13. Preparing mRNA samples: Dissection of tissue; RNA Isolation; Amplification; Probe labelling; Hybridization

  14. Preparing mRNA samples: Dissection of tissue (marked: Biological Replicate); RNA Isolation; Amplification; Probe labelling; Hybridization

  15. Preparing mRNA samples: Dissection of tissue; RNA Isolation (marked: Technical replicate); Amplification; Probe labelling; Hybridization

  16. Analysis Aims
     • Main questions: establish molecular profiles of TI and TII cells:
       • Identify differences in expression of single genes to
         • provide additional marker genes;
         • develop new hypotheses about cellular functions of each cell type.
       • Understand the process of (trans)-differentiation in vivo and in vitro.
       • Identify candidate factors (transcription cascades) important in regulating differentiation.
     • Approaches:
       • Identify differentially expressed (DE) genes between TID0 and TIID0.
       • Compare TID0 and TIID7: are they similar?
       • Find common regulatory elements (transcription factor binding sites) in groups of candidate co-regulated genes.

  17. Workflow diagram (repeated from slide 2).

  18. Preprocessing
     • Quality assessment.
     • Background subtraction.
     • Normalization.
     • Summarization of probe set values.

  19. High Density Oligonucleotide Arrays (Affymetrix)
     • Figure labels: GeneChip Probe Array (1.28 cm); hybridized probe cell (24 µm); single stranded, labeled RNA target; oligonucleotide probe; image of hybridized probe array.
     • Millions of copies of a specific oligonucleotide probe per probe cell.
     • ~500,000 probe cells on each chip.

  20. How Affymetrix Arrays Are Made. Figure from Lipshutz et al. Nat. Gen. 1999.

  21. For one gene (probe set): 16 probes/gene for Rat U34
     • mRNA reference sequence (ends labeled 3’ and 5’): …TCGTCTGTATCACAGACACAAAGTTGACTG…
     • PM: CAGACATAGTGTCTGTGTTTCAACT
     • MM: CAGACATAGTGTGTGTGTTTCAACT
     • Figure: fluorescent probe intensity for the PM and MM probes.

  22. Data flow diagram
     • Hybridization + Scanning: DAT File.
     • Image analysis: CEL File (used together with the CDF File).
     • Preprocessing (dChip, MAS, GCOS, RMA, GCRMA):
       0. Quality assessment.
       1. Background subtraction (B).
       2. Normalization (N).
       3. Summarization of probe set values (S).
     • Outputs: Text File (Probe ID + Log2 (Intensity)); CHP File (intensity value, Absent / Present call); Excel File; Report File (quality).

  23. Quantile Normalization, Bolstad et al (2003)
     • Quantile normalization is a method to make the distribution of probe intensities the same for every chip.
     • The normalization distribution is chosen by averaging each quantile across chips.
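The two bullets on quantile normalization can be sketched in a few lines of Python. This is a minimal illustration only, not the Bioconductor implementation: the function name `quantile_normalize` and the toy intensities are assumptions, and ties are broken by position for simplicity.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of chips (each a list of probe intensities).

    Assumes every chip has the same number of probes. Ties are broken by
    position, which is a simplification made for this sketch.
    """
    n_probes = len(chips[0])
    # Sort each chip so that position k holds its k-th smallest intensity.
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: average each quantile across chips.
    reference = [
        sum(s[k] for s in sorted_chips) / len(chips) for k in range(n_probes)
    ]
    normalized = []
    for chip in chips:
        # Probe indices ordered from smallest to largest intensity.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Replace each intensity by the reference value of the same rank.
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: two chips, three probes each.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
```

After normalization every chip contains exactly the same set of values (here 1.5, 3.5, 5.5); only the assignment of values to probes differs between chips.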

  24. Probe Set Summarization: Robust Multi-array Average, Irizarry et al (2003)
     • The RMA model assumes that each probe cell is made up of: log2(normalized(observed PM intensity − background)) = chip effect + probe-specific effect + error.
     • The chip effect is the RMA value: the log2 expression for chip i.
     • The expression level is estimated using a robust procedure (such as median polish or IRLS) to fit the above linear model.
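To make the "chip effect + probe-specific effect" model concrete, here is a small sketch of median polish on one probe set. It assumes the input is already background-corrected, normalized and log2-transformed; the function name `median_polish`, the fixed iteration count and the toy table are illustrative assumptions, not the RMA reference code.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Fit table[probe][chip] ~ overall + probe effect + chip effect.

    `table` holds log2 values for one probe set (rows: probes, columns: chips).
    Returns (expression per chip, probe effects, residuals).
    """
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep the median of each probe (row) out of the residuals.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep the median of each chip (column) out of the residuals.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    # Summarized expression for chip j: overall level plus chip effect.
    expression = [overall + c for c in col_eff]
    return expression, row_eff, resid


# Hypothetical probe set: 3 probes on 2 chips, exactly additive.
table = [[7.0, 9.0], [8.0, 10.0], [9.0, 11.0]]
expression, probe_effects, residuals = median_polish(table)
```

Because medians rather than means are swept out, a single aberrant probe changes the fitted chip values far less than it would in a least-squares fit.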

  25. Summarization Method Comparison: AffyComp http://affycomp.biostat.jhsph.edu/ (figure; annotations: average false positives if we use fold-change > 2 as a cut-off; median SD across replicates)

  26. Software
     • Affymetrix: MAS v5.1 or GCOS v1.0
     • RMA (Robust Multi-array Average) / GCRMA / PLM:
       • Bioconductor http://www.bioconductor.org
       • affylmGUI http://bioinf.wehi.edu.au/affylmGUI/
       • RMAExpress http://stat.www.berkeley.edu/~bolstad/RMAExpress/RMAExpress.html
       • Axon: Acuity (RMA only, commercial)
       • GeneTraffic (RMA only, commercial)
     • Li & Wong’s MBEI (Multiplicative Model-Based Expression Index):
       • dChip http://www.dchip.org/

  27. Qualitative Quality Assessment Using PLM (figure panels: Residuals, Weights). More QC examples: http://stat-www.berkeley.edu/users/bolstad/PLMImageGallery/index.html

  28. QC with affyPLM

  29. QC with boxplots

  30. RMAExpress

  31. affylmGUI

  32. Workflow diagram (repeated from slide 2).

  33. Analysis
     • Identify differentially expressed (DE) genes between TID0 and TIID0.
     • Compare TID0 and TIID7.
     • Beyond expression.

  34. DE by Average Fold-Change (M): Freshly Isolated TI vs TII Cells
     • Of ~8800 probe sets: 50 + 131 change by more than 4x (|M| > 2); 163 + 401 change by more than 2x.
     • Simple fold-change rules give no assessment of statistical significance.
     • → Need to construct test statistics incorporating variability estimates (from replicates).
     • Figure: plot of M against A, with 2x and 4x fold-change lines toward TI and toward TII.
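As a small worked example of the fold-change rule, the sketch below computes M for one probe set from replicate log2 expression values. The numbers are made up, and taking A as the mean of the two group averages is an assumption of this sketch, since the slide does not define A.

```python
from statistics import mean


def m_and_a(log2_group1, log2_group2):
    """Return (M, A) for one probe set from log2 expression values.

    M is the difference of group means (log2 fold-change).
    A is taken here as the average of the two group means (an assumption).
    """
    m = mean(log2_group1) - mean(log2_group2)
    a = (mean(log2_group1) + mean(log2_group2)) / 2
    return m, a


# Hypothetical log2 RMA values for one probe set, 4 replicates per cell type.
ti = [10.0, 10.5, 9.5, 10.0]
tii = [7.5, 8.0, 8.0, 8.5]

m, a = m_and_a(ti, tii)
fold_change = 2 ** abs(m)        # M = 2 corresponds to a 4x change
passes_2x_rule = abs(m) > 1      # the "> 2x" cut-off on the log2 scale
```

The rule looks only at the group means: the spread of the four replicates plays no role, which is the weakness the slide points out.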

  35. Two-sample t-statistic & p-value
     • The two-sample t-statistic is used to test equality of the group means $\mu_1$, $\mu_2$.
     • The p-value $p^*$ is the probability that, under the null hypothesis ($H_0$: $\mu_1 = \mu_2$), the test statistic is at least as extreme as the observed value $t^*$.
     • Figure: null distribution with tail areas $p^*/2$ below $-t^*$ and above $t^*$.

  36. More Two-Sample Statistics. Perform statistical tests on normalized, log-transformed data:
     • Standard t-test: assumes normally distributed data in each class (!), equal variances within classes.
     • Welch t-test: as above, but allows unequal variances.
     • Wilcoxon test: non-parametric, rank-based.
     • Permutation test: estimate the distribution of the test statistic under the null hypothesis by permutations of the sample labels.
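The statistics listed above can be sketched for a single gene as follows. This is an illustration under stated assumptions: the helper names and the data are hypothetical, and the permutation test enumerates every relabeling exactly, which is only feasible for small groups such as 4 vs 4.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Standard two-sample t-statistic (equal variances assumed)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))


def welch_t(x, y):
    """Welch t-statistic (unequal variances allowed)."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def permutation_p_value(x, y, statistic=welch_t):
    """Two-sided p-value from all relabelings of the pooled samples."""
    pooled = list(x) + list(y)
    observed = abs(statistic(x, y))
    n_extreme = n_total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_total += 1
        # Small tolerance so the mirrored labeling counts as equally extreme.
        if abs(statistic(gx, gy)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total


# Hypothetical log2 expression values for one gene.
ti = [10.1, 10.4, 9.8, 10.3]
tii = [8.0, 8.3, 7.9, 8.2]

t_pooled = pooled_t(ti, tii)
t_welch = welch_t(ti, tii)
p_perm = permutation_p_value(ti, tii)
```

With 4 replicates per group there are only 70 distinct relabelings, so the smallest two-sided permutation p-value is 2/70 ≈ 0.029, even for perfectly separated groups as in this example.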

  37. When there are few replicates…
     • (Fold-change) Averages can be driven by outliers.
     • T-statistics can be driven by tiny variances.
     • Solution: “robust” version of the t-statistic:
       • Replace mean by median.
       • Replace standard deviation by median absolute deviation.

  38. Alternative Test Statistics
     1. Penalized-t: tries to find a compromise between solely using t and solely using the mean. There are several similar solutions of the same form (the formula is not preserved in this transcript), in which a penalty constant a is added to s in the denominator, where s = standard deviation.
     Question: how to estimate a?
     - The 90th percentile of the standard deviations (s values). Efron et al (2000).
     - The value that minimizes the coefficient of variation (cv) of the absolute t-values (SAM). Tusher et al (2001).
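Since the slide's formula is missing, the sketch below assumes one common form, penalized t = mean difference / (a + s), and sets a to the 90th percentile of the s values as in the first option above. The function names, the nearest-rank percentile rule and the three-gene data are all illustrative assumptions.

```python
from math import ceil


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100] (one of several conventions)."""
    ordered = sorted(values)
    rank = max(1, ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def penalized_t(mean_diffs, sds, a):
    """Assumed form: mean difference divided by (a + s) for each gene."""
    return [m / (a + s) for m, s in zip(mean_diffs, sds)]


# Hypothetical genes: the second has a small effect but a tiny s.
mean_diffs = [2.0, 0.2, 1.0]
sds = [0.5, 0.01, 0.5]

ordinary = [m / s for m, s in zip(mean_diffs, sds)]
a = percentile(sds, 90)
penalized = penalized_t(mean_diffs, sds, a)

top_ordinary = max(range(3), key=lambda i: abs(ordinary[i]))
top_penalized = max(range(3), key=lambda i: abs(penalized[i]))
```

In this toy case the ordinary statistic ranks the tiny-variance gene first, while the penalized version ranks the gene with the largest mean difference first. As a grows the ranking approaches that of the mean alone, and as a shrinks to 0 it approaches that of t.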

  39. Other Statistics (cont.)
     2. Moderated t-statistics (G. Smyth 2004, limma): the gene-wise standard deviation in the t-statistic is replaced by a shrinkage estimate of the standard deviation (formula not preserved in this transcript), which combines the pooled sd from all genes with the sd for gene i. Estimation is done using an extension to the empirical Bayes method in Lönnstedt & Speed (2002).
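The moderated t formula is missing from the transcript. For reference, the form given in Smyth (2004) is reproduced below; the symbols ($d_0$, $s_0$, $d_i$, $v_i$) are that paper's notation rather than labels taken from the slide, and matching "pooled sd from all genes" to $s_0$ and "sd for gene i" to $s_i$ is an assumption.

```latex
\tilde{t}_i = \frac{\bar{M}_i}{\tilde{s}_i \sqrt{v_i}},
\qquad
\tilde{s}_i^{2} = \frac{d_0\, s_0^{2} + d_i\, s_i^{2}}{d_0 + d_i}
```

Here $\bar{M}_i$ is the estimated log-fold-change for gene $i$, $s_i^2$ its sample variance on $d_i$ degrees of freedom, $s_0^2$ and $d_0$ the prior variance and prior degrees of freedom estimated from all genes, and $v_i$ the unscaled variance factor (for a two-group comparison, $1/n_1 + 1/n_2$). A large $d_0$ pulls every gene toward the common variance; $d_0 = 0$ recovers the ordinary t-statistic.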

  40. Other Statistics (cont.)
     3. B-statistic: log posterior odds ratio, $B_i = \log \dfrac{\Pr(\text{gene } i \text{ is DE})}{\Pr(\text{gene } i \text{ is not DE})}$. Equivalent to moderated-t in terms of ranking genes.
     4. Single-channel methods modeling absolute gene-expression levels:
        - Newton et al 2001: log-intensities ~ Gamma
        - Wolfinger et al 2001: linear mixed model on log-intensities
     5. Composite methods: Differentially Expressed genes via Distance Synthesis (Yang et al 2004), which chooses genes that are extreme on all measures by defining a “distance” statistic based on measures of choice.

  41. DE by Fold Changes, (limma) Moderated-t, B (lods)

  42. Assessing Significance I: Diagnostic Plots

  43. Assessing Significance II: Testing
     • Univariate hypothesis testing: for a single gene, test the null hypothesis H0: the gene is NOT differentially expressed. The p-value can be generated via theory or permutation tests.
     • Is this p-value correct?
       • Yes, if only looking at ONE gene…
       • We expect 10000 * 0.01 = 100 genes with p-value < 0.01 among 10,000 non-DE genes! Clearly we can’t just use standard p-value thresholds (.05, .01)!
       • Need to adjust p-values for meaningful interpretation!
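The arithmetic on this slide can be checked with a short simulation. The sketch assumes that p-values of non-DE genes are independent and uniform on [0, 1]; the function name and the seed are arbitrary choices made here.

```python
import random


def count_false_positives(n_null_genes, alpha, seed=0):
    """Count null genes whose simulated p-value falls below alpha."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_null_genes))


n_genes, alpha = 10_000, 0.01

expected = n_genes * alpha                       # 10000 * 0.01
observed = count_false_positives(n_genes, alpha)
# Chance of at least one false positive if the tests are independent.
p_any = 1 - (1 - alpha) ** n_genes
```

The simulated count varies with the seed but stays near 100, and the chance of at least one false positive is essentially 1, which is why an unadjusted threshold is not usable at this scale.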

  44. (Unadjusted) p-values of moderated-t

  45. Multiple Hypothesis Testing (table of test outcomes under H0 and Ha)

  46. Type I Error Rates (False Positives)
     With V = number of false positives (type I errors) and R = number of rejected hypotheses:
     • Family-Wise Error Rate (FWER): Pr(V > 0) = Pr(at least one false positive).
     • False Discovery Rate (FDR): the FDR (Benjamini & Hochberg 1995) is the expected proportion of type I errors among the rejected hypotheses: FDR = E(Q), with Q = V/R if R > 0, and Q = 0 if R = 0.

  47. Multiple Testing: Controlling a Type I Error Rate. AIM: for a given type I error rate $\alpha$, use a procedure to select a set of “significant” genes that guarantees a type I error rate $\le \alpha$.

  48. Adjusted p-values: Controlling the FWER
     • The Bonferroni correction: $m\,p_g$; most conservative adjustment; assumes independence among genes.
     • Šidák: $1 - (1 - p_g)^m$
     • minP (Westfall & Young): estimated through permutation; allows dependency between genes.
     • maxT: replace $p_g$ by test statistics $T_g$, min by max. Less computationally intensive than minP.
     • Step-down
     • Step-up
     Choosing all genes with adjusted p-value $\le \alpha$ controls the FWER at level $\alpha$.
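The two closed-form adjustments above are easy to compute directly. The sketch below is illustrative: the function names and p-values are made up, and capping the Bonferroni value at 1 is a standard convention added here rather than something shown on the slide. The permutation-based minP and maxT procedures are not covered.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: m * p_g, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def sidak(p_values):
    """Sidak-adjusted p-values: 1 - (1 - p_g)^m."""
    m = len(p_values)
    return [1 - (1 - p) ** m for p in p_values]


# Hypothetical unadjusted p-values for m = 4 genes.
p_values = [0.0001, 0.004, 0.03, 0.2]
alpha = 0.05

adj_bonf = bonferroni(p_values)
adj_sidak = sidak(p_values)
# Genes called significant at FWER level alpha under Bonferroni.
significant = [g for g, p in enumerate(adj_bonf) if p <= alpha]
```

The Šidák value is never larger than the Bonferroni value for the same gene, and for small p-values the two are almost identical.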

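  49. Controlling the FDR (Benjamini/Hochberg)
     • Order unadjusted p-values: $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$.
     • To control FDR = E(V/R) at level $\alpha$, reject the hypotheses $H_{(1)}, \dots, H_{(j^*)}$, where $j^* = \max\{ j : p_{(j)} \le (j/m)\,\alpha \}$.
     • Adjusted p-values: $\tilde{p}_{(i)} = \min_{k = i, \dots, m} \min\{ (m/k)\, p_{(k)},\, 1 \}$.
     • Interpretation: expect 5% false positives among genes with FDR-adjusted p-values < 0.05.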
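The Benjamini/Hochberg step-up adjustment can be sketched as follows. The function name and p-values are illustrative; the code computes the standard BH adjusted p-value, the running minimum of m·p/rank taken from the largest p-value downward, and returns results in the original gene order.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Gene indices sorted by increasing unadjusted p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        gene = order[rank - 1]
        running_min = min(running_min, p_values[gene] * m / rank)
        adjusted[gene] = running_min
    return adjusted


# Hypothetical unadjusted p-values for m = 5 genes.
p_values = [0.01, 0.04, 0.03, 0.005, 0.5]
adjusted = benjamini_hochberg(p_values)
# Genes selected when controlling the FDR at 5% (small tolerance for rounding).
selected = [g for g, p in enumerate(adjusted) if p <= 0.05 + 1e-12]
```

Here four of the five genes are selected at an FDR of 5%, whereas Bonferroni at the same level would keep only the two smallest p-values (5 × 0.005 and 5 × 0.01).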

  50. Adjusted p-values (figure; cut-off marked at p = 0.01)
