
Michael D. Radmacher, Ph.D., Biometric Research Branch, National Cancer Institute



Presentation Transcript


  1. Class Prediction Based on Gene Expression Data: Issues in the Design and Analysis of Microarray Experiments
     Michael D. Radmacher, Ph.D., Biometric Research Branch, National Cancer Institute

  2. One Potential of Gene Expression Data
     • Specimens will be distinguishable by their gene expression profiles
     • NCI Director’s Challenge: Toward a Molecular Classification of Tumors
       • “This challenge is intended to lay the groundwork for changing the basis of tumor classification from morphological to molecular characteristics.”
       • Purpose is “...to define comprehensive profiles of molecular alterations in tumors that can be used to identify subsets of patients.”
     • So one important goal is: Classification

  3. What is meant by “Classification”? Two important and distinct answers:
     Class Discovery
     • Identification of previously unknown classes of specimens
     • Use of “unsupervised” methods
       • Hierarchical Clustering
       • k-means Clustering
       • Self-organizing maps (SOMs)
       • Others
     • Prevalent method used in the literature for analysis of gene expression data.
     Class Prediction
     • Assignment of specimens into known classes
     • Use of “supervised” methods
       • Logistic Regression
       • CART
       • Discriminant Analysis
       • Others
     • Class prediction is more powerful than class discovery for distinguishing specimens based on a priori defined classes.

  4. Example of Class Discovery: Distinct Types of Diffuse Large B-Cell Lymphoma
     • DLBCL is clinically heterogeneous
     • Specimens were clustered based on their expression profiles of GC B-cell associated genes.
     • Two subgroups were discovered:
       • GC B-like DLBCL
       • Activated B-like DLBCL
     (Figures and information taken from Alizadeh et al., Nature 403:503-11, 2000)

  5. What is meant by “classification”? Two important and distinct answers:
     Class Discovery
     • Identification of previously unknown classes of specimens
     • Use of “unsupervised” methods
       • Hierarchical Clustering
       • k-means Clustering
       • Self-organizing maps (SOMs)
       • Others
     • Prevalent method used in the literature for analysis of gene expression data.
     Class Prediction
     • Assignment of specimens into known classes
     • Use of “supervised” methods
       • Logistic Regression
       • CART
       • Discriminant Analysis
       • Others
     • Class prediction is more powerful than class discovery for distinguishing specimens based on a priori defined classes.

  6. Study of Gene Expression in Breast Tumors (NHGRI, J. Trent)
     • cDNA Microarrays: Parallel Gene Expression Analysis, 6526 genes/tumor
     • How similar are the gene expression profiles of BRCA1 and BRCA2 (+) and sporadic breast cancer patient biopsies?
     • Can we identify a set of genes that distinguish the different tumor types?
     • Tumors studied:
       • 7 BRCA1 +
       • 8 BRCA2 +
       • 7 Sporadic

  7. BRCA1 +/- and BRCA2 +/- Classification: Results from Hierarchical Clustering
     [Figure panels: BRCA1 Clustering; BRCA2 Clustering]

  8. Class Prediction Paradigm
     • Begin with a data set that can be separated into known groups.
     • Choose a method of class prediction.
     • Perform class prediction on the data set using “leave-one-out” cross-validation.
       • Leave one specimen out of the data set.
       • Build the class predictor using the remaining data.
       • Predict the class of the left-out specimen.
       • Repeat so that a prediction is made for every specimen.
     • Use a permutation test to determine if there is a significant difference in expression patterns between the groups.
       • Permute class labels among specimens.
       • Perform class prediction on the permuted data.
       • Repeat many times.
       • Report the % of permuted sets with an error rate equivalent to or less than that for the actual data set.
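The paradigm on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in the talk: the nearest-centroid rule in `nearest_centroid_predict` is a hypothetical stand-in for whichever class predictor is chosen, the function names and the toy data are assumptions, and the number of permutations is arbitrary. If the chosen predictor selects genes, that selection would need to happen inside `predict` so that it only ever sees the training specimens.

```python
import random

def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in predictor (an assumption): pick the class with the closest mean profile."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_errors(x, y, predict=nearest_centroid_predict):
    """Leave each specimen out once, rebuild the predictor, and count misclassifications."""
    errors = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) != y[i]:
            errors += 1
    return errors

def permutation_test(x, y, n_perm=200, seed=0, predict=nearest_centroid_predict):
    """Return (observed errors, fraction of permuted label sets doing at least as well)."""
    rng = random.Random(seed)
    observed = loocv_errors(x, y, predict)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # permute class labels among specimens
        if loocv_errors(x, shuffled, predict) <= observed:
            hits += 1
    return observed, hits / n_perm

# Toy log-ratio profiles (made up for illustration): 8 specimens, 2 genes.
toy_x = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2],
         [2.0, 2.1], [2.2, 1.9], [1.9, 2.0], [2.1, 2.2]]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]

observed_errors, p_value = permutation_test(toy_x, toy_y)
print(observed_errors, p_value)
```

The returned fraction corresponds to the last step on the slide: the share of permuted data sets whose cross-validated error is no worse than that of the real labels.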

  9. The Compound Covariate Predictor (CCP)
     • We consider only genes that are differentially expressed between the two groups (using a two-sample t-test with small α).
     • The CCP
       • Motivated by J. Tukey, Controlled Clinical Trials, 1993
       • Simple approach that may serve better than complex multivariate analysis
       • A compound covariate is built from the basic covariates (log-ratios): $CCP_i = \sum_j t_j x_{ij}$, where $t_j$ is the two-sample t-statistic for gene j, $x_{ij}$ is the log-ratio measure of sample i for gene j, and the sum is over all differentially expressed genes.
     • Threshold of classification: midpoint of the CCP means for the two classes.
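A small Python sketch of the compound covariate idea follows. It is an illustration under stated assumptions, not the speaker's implementation: the function names are hypothetical, the t-statistic uses pooled variance, genes are selected with a cutoff on |t| (`t_cutoff`) as a simple proxy for the "small α" significance rule, and the data are made up. Each class needs at least two specimens and nonzero within-class variance for the t-statistic to be defined.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Pooled-variance two-sample t-statistic for two lists of log-ratios."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def ccp_score(weights, row):
    """Compound covariate: sum over selected genes of t_j * x_ij."""
    return sum(t * row[j] for j, t in weights.items())

def fit_ccp(x, y, t_cutoff):
    """x: specimens x genes log-ratios, y: 0/1 labels. Returns (weights, threshold)."""
    g0 = [row for row, label in zip(x, y) if label == 0]
    g1 = [row for row, label in zip(x, y) if label == 1]
    weights = {}
    for j in range(len(x[0])):
        # Sign convention (an assumption): positive t means higher in class 1.
        t = t_statistic([r[j] for r in g1], [r[j] for r in g0])
        if abs(t) >= t_cutoff:  # keep only differentially expressed genes
            weights[j] = t
    m0 = mean(ccp_score(weights, r) for r in g0)
    m1 = mean(ccp_score(weights, r) for r in g1)
    return weights, (m0 + m1) / 2  # threshold: midpoint of the class means

def predict_ccp(weights, threshold, row):
    """Class 1 if the compound covariate exceeds the midpoint, else class 0."""
    return 1 if ccp_score(weights, row) > threshold else 0

# Toy data (made up): gene 0 is up in class 1, gene 1 is down, gene 2 is uninformative.
toy_x = [[0.1, 1.0, 0.3], [-0.2, 1.2, -0.1], [0.0, 0.9, 0.2],
         [1.1, -0.1, 0.1], [0.9, 0.1, 0.0], [1.2, 0.0, 0.3]]
toy_y = [0, 0, 0, 1, 1, 1]

weights, threshold = fit_ccp(toy_x, toy_y, t_cutoff=4.0)
print(sorted(weights), predict_ccp(weights, threshold, [1.0, 0.0, 0.2]))
```

With this sign convention the class 1 mean score is never below the class 0 mean, so a single "greater than the midpoint" rule suffices. When the predictor is evaluated by leave-one-out cross-validation, `fit_ccp` (including the gene selection) would be rerun on each training set.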

  10. BRCA1 +/- and BRCA2 +/- Classification: Results from Class Prediction with CCP

  11. Sample Size Considerations for Accurate Class Prediction

  12. Summary
      • Class discovery and prediction methods have distinct goals.
      • When class information is known, class prediction is a more powerful method for detecting differences.
      • BRCA1 and BRCA2 mutation positive tumors have distinguishable gene expression patterns.
      • BRCA1 distinction is stronger than BRCA2.
      • Some biological insight concerning misclassified specimens.
      • Not at level of clinical classification yet.
      • Sample size issues

  13. Collaborators
      • NCI: Richard Simon
      • NHGRI: Mike Bittner, Yidong Chen, David Duggan, Ingrid Hedenfalk, Jeff Trent
