
Correlating mRNA and protein abundance via genomic and proteomic characteristics

Dov Greenbaum, Gerstein Lab Thesis Seminar, April 21, 2004.



  1. Correlating mRNA and protein abundance via genomic and proteomic characteristics. Dov Greenbaum, Gerstein Lab Thesis Seminar, April 21, 2004

  2. Outline
     • Why analyze mRNA and protein correlations
     • Background
     • Disparate data sources
     • Correlating mRNA and protein
     • Results
     • Other analyses
     • Formalism: comparing genome, transcriptome and proteome in terms of broad categories
     • New data sets
     • Analysis via broad categories
     • Analysis of factors affecting correlations
     • Another reason to expect correlations: expression and protein interactions

  3. Why Correlate mRNA & Protein?
     • Harness mRNA and protein
     • Quantitative analysis of global mRNA levels is currently a preferred method for analyzing the state of cells and tissues.
     • mRNA level <= ? => protein level
     • Several methods that provide either absolute mRNA abundance or relative mRNA levels in comparative analyses are easy to apply: fast and very sensitive.
     • Look, we have so much mRNA data, so why even bother with the protein?

  4. Both mRNA and protein levels are necessary for a complete analysis. Shown mathematically in Hatzimanikatis et al., Biotechnology 1999. Combinations of RNA and protein detection approaches have recently aided in the identification of biomarkers in cancer (Hegde et al., Current Opinion in Biotech 2003).

  5. Relationship between mRNA and protein levels: $\frac{dP_i}{dt} = k_{s,i}\,\mathrm{mRNA}_i - k_{d,i}\,P_i$, where $k_{s,i}$ and $k_{d,i}$ are the protein synthesis and degradation rate constants, respectively. At steady state: $P_i = \frac{k_{s,i}\,\mathrm{mRNA}_i}{k_{d,i}}$. Unfortunately, the rate constants $k$ are difficult to determine, so let's look for correlations in the data instead of deriving them.
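
The rate equation on slide 5 can be checked numerically. The sketch below is only an illustration: the function names and all rate-constant values are hypothetical, chosen to show that two genes with the same mRNA level can differ 20-fold in protein level when only the degradation constant differs.

```python
def steady_state_protein(mrna, k_s, k_d):
    """Steady-state protein level P = k_s * mRNA / k_d (obtained by setting dP/dt = 0)."""
    if k_d <= 0:
        raise ValueError("k_d must be positive")
    return k_s * mrna / k_d


def simulate_protein(mrna, k_s, k_d, p0=0.0, dt=0.01, steps=10000):
    """Forward-Euler integration of dP/dt = k_s * mRNA - k_d * P."""
    p = p0
    for _ in range(steps):
        p += dt * (k_s * mrna - k_d * p)
    return p


# Hypothetical values: same mRNA level and synthesis rate, different degradation rates.
stable = steady_state_protein(mrna=10.0, k_s=2.0, k_d=0.05)
unstable = steady_state_protein(mrna=10.0, k_s=2.0, k_d=1.0)
fold_difference = stable / unstable  # ratio of the two degradation constants
```

With these made-up constants the protein levels differ by the ratio of the two degradation constants, which is why mRNA level alone cannot fix the protein level unless the rate constants are known.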

  6. Methods for determining mRNA expression: each has strengths and weaknesses

  7. Methods for determining protein abundance: 2D gel electrophoresis (2DE) (Klose, 1975; O’Farrell, 1975)
     • Multiple staining options
     • Small dynamic range
     • Limited in what it can detect

  8. Methods for determining protein abundance: ICAT
     • ICAT reagent: relative levels
     • Very big dynamic range
     • Cannot detect post-translational modifications
     • It requires proteins to contain cysteine residues, and these residues must be in the region of a peptide that is produced during proteolytic cleavage

  9. MudPIT • Really the only high-throughput (HT) method that can detect post-translational (PT) modifications

  10. Other methods for determining protein abundance
     • DIGE: e.g. Cy3 vs. Cy5 labeling; very big dynamic range
     • TAP tagging: Weissman & O’Shea (Oct 2003)
     • 2D electrophoresis

  11. Other Methods for determining protein abundance

  12. Same mRNA levels, yet protein levels varied > 20X. N ~100, r = 0.9. Protein quantification via measurement of radioactivity. Gygi et al., Molecular and Cellular Biology, 1999.

  13. Same mRNA levels, yet protein levels varied > 20X. Do some ORFs bias the results? 73 proteins (69%): R = 0.356
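
Slides 12 and 13 contrast a correlation of 0.9 over all ~100 proteins with 0.356 for a 73-protein subset. One way such a gap can arise is that a handful of very abundant genes dominates the Pearson coefficient. The sketch below illustrates that effect on made-up numbers; the data are hypothetical and are not taken from Gygi et al.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical abundances: eight low-abundance genes with a weak mRNA/protein
# relationship, plus two highly expressed genes.
low_mrna = [1, 2, 3, 4, 5, 6, 7, 8]
low_protein = [5, 1, 7, 2, 8, 3, 6, 4]
high_mrna = [50, 100]
high_protein = [60, 90]

r_low = pearson_r(low_mrna, low_protein)
r_all = pearson_r(low_mrna + high_mrna, low_protein + high_protein)
```

Here `r_low` is weak while `r_all` is strong, even though the two added genes say nothing about how well mRNA predicts protein among the low-abundance genes. Rank-based measures or log-transformed values are common ways to reduce this sensitivity.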

  14. mRNA vs protein: r = 0.74. Protein quantification via image analysis. Futcher et al., Molecular and Cellular Biology, 1999

  15. Jury is out… Gygi et al.: “This study revealed that transcript levels provide little predictive value with respect to the extent of protein expression.” Futcher et al.: “there is a good correlation between protein abundance and mRNA abundance for the proteins that we have studied”.

  16. mRNA vs protein: r = 0.67. Greenbaum et al., Bioinformatics 2001. While mine wasn’t the first, it was A) the largest at the time, integrating the previous two results, and B) the first to integrate diverse data for the analysis

  17. 3 genes in lung adenocarcinomas: Op18, Annexin IV, and GAPD. r = 0.025. Chen et al., Molecular & Cellular Proteomics, 2002.

  18. Murine hematopoietic precursor MPRO: change in expression, 0-72 hr

  19. Murine hematopoietic precursor MPRO: change in expression, 0-72 hr. R = 0.58. ~80% of the genes are located in the first and third quadrants
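
Genes in the first and third quadrants are those whose mRNA and protein levels change in the same direction. A minimal sketch of that count follows; the function name and the log-ratio values are hypothetical.

```python
def same_direction_fraction(mrna_changes, protein_changes):
    """Fraction of genes whose mRNA and protein changes share a sign
    (first or third quadrant). Genes with a zero change are skipped."""
    if len(mrna_changes) != len(protein_changes):
        raise ValueError("inputs must have the same length")
    same = 0
    total = 0
    for m, p in zip(mrna_changes, protein_changes):
        if m == 0 or p == 0:
            continue  # on an axis, so in no quadrant
        total += 1
        if (m > 0) == (p > 0):
            same += 1
    if total == 0:
        raise ValueError("no genes with non-zero changes")
    return same / total


# Hypothetical log-ratios (72 hr vs. 0 hr) for five genes.
mrna_log_ratio = [1.2, -0.8, 0.5, -1.5, 0.3]
protein_log_ratio = [0.9, -0.2, -0.4, -1.1, 0.6]
fraction = same_direction_fraction(mrna_log_ratio, protein_log_ratio)
```

A high same-direction fraction together with a moderate correlation means that the direction of change is predicted better than its magnitude.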

  20. Ratios of wt+gal to wt-gal: ICAT vs microarray. N ~290, r = 0.6. Ideker et al., Science, 2001

  21. Yeast growth under two different media: r = 0.45, but almost 1.0 for the same loci in the same pathway. Washburn et al., PNAS 2003

  22. Integrating multiple sources of information. “The challenge for computational biology is to provide methodologies for transforming high-throughput heterogeneous data sets into biological insights about the underlying mechanisms. Although high-throughput assays provide a global picture, the details are often noisy, hence conclusions should be supported by several types of observations. Integration of data from assays that examine cellular systems from different viewpoints (for instance, gene expression and protein-protein interactions) can lead to a more coherent reconstruction and reduce the effects of noise.” (Nir Friedman, Science 2004). Anyone who has worked with HT data knows that the noise is huge!

  23. Sources of Data

  24. Reference mRNA sets: Young, Church, Samson, SAGE

  25. Fitting Protein Data Original Set

  26. mRNA vs protein: r = 0.67. Greenbaum et al., Bioinformatics 2001. mRNA expression reference set: 3 Affy chip sets and SAGE, 6249 ORFs. Protein abundance reference set #1: two 2DE sets (all available data), 181 ORFs

  27. Outliers (2 STDEV from the mean). High protein: Metabolism (1), Energy (2). Low protein: Protein synthesis (5), Protein fate (6)

  28. Later, larger datasets concurred with these results. Generally:
     • Protein synthesis (~35% of all protein synthesis genes) and protein fate (folding, modification, destination) genes are more likely to have low protein relative to mRNA than the general population.
     • Amino acid (AA) metabolism and energy genes are 2X as likely to have high protein relative to mRNA as the general population.
     • Alcohol dehydrogenase is also a stress-induced protein in many organisms (Matton et al. 1990; An et al. 1991; Millar et al. 1994). Faster ramp-up?
     • Alternatively, it is possible to look into mRNA stability as a factor. Many structures within mRNA are currently thought to influence stability, including, among others, stem loops, UTRs, premature stops and uORFs (Klaff et al. 1996).

  29. Non-outliers. Generally: tight regulation by the cell.
     • Only 3% of transcription-associated genes (n = 441) have significantly uncorrelated mRNA and protein levels (2 STDEV from trendline).
     • Transcription-associated genes are 25% of the essential genes in yeast.
     • Essential genes as a group have higher correlations than the general yeast population.
     • 7% of cell-cycle-associated genes (n = 432) have significant non-correlation.
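
The "2 STDEV from trendline" criterion can be sketched as follows. This is an illustration under assumptions the slides do not state: an ordinary least-squares line is used as the trendline, the inputs are assumed to be log-transformed abundances, and the population standard deviation of the residuals sets the cutoff. All names and data are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are constant")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def trendline_outliers(xs, ys, n_sd=2.0):
    """Indices of points whose residual from the fitted line exceeds n_sd standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > n_sd * sd]


# Hypothetical log abundances: 20 genes near the line y = x, one with excess protein.
log_mrna = [float(i) for i in range(20)]
log_protein = [x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(log_mrna)]
log_protein[10] += 5.0
outliers = trendline_outliers(log_mrna, log_protein)
```

The fraction of outliers within a functional category (for example the 3% and 7% figures above) would then be the number of flagged genes in that category divided by the category size.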

  30. Quick Summary
     • Why correlate mRNA and protein levels?
     • Merged disparate data sets: distinct but complementary
     • Global correlations
     • Outliers are interesting: Metabolism & Energy have relatively high protein levels; Protein Synthesis & Protein Fate have low protein levels

  31. Data Set Size: ~170 ORFs (2DE-gel datasets); ~6,000 ORFs (5 Affymetrix GeneChips + SAGE data); ~6,000 ORFs

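  32. Enrichments: $\Delta(F,[v,S],[w,G]) = \frac{\mu(F,[v,S]) - \mu(F,[w,G])}{\mu(F,[w,G])}$, where $F$ is the feature, $\mu$ is the weighted average of the feature over a set, and $v$ and $w$ are the weights (expression levels) of sets $S$ and $G$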
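
A minimal sketch of the enrichment formalism on slide 32, assuming each term (F,[w,G]) denotes the weighted average of feature F over gene set G with weights w. The function names and the numbers are hypothetical.

```python
def weighted_mean(feature_values, weights):
    """Weighted average of a per-gene feature, e.g. alanine fraction weighted by expression."""
    if len(feature_values) != len(weights):
        raise ValueError("feature_values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(f * w for f, w in zip(feature_values, weights)) / total


def enrichment(feature_s, weights_s, feature_g, weights_g):
    """Relative difference of the weighted feature average in set S versus reference set G."""
    reference = weighted_mean(feature_g, weights_g)
    if reference == 0:
        raise ValueError("reference average is zero")
    return (weighted_mean(feature_s, weights_s) - reference) / reference


# Hypothetical two-gene example: the feature is a per-gene amino acid fraction.
feature = [0.05, 0.10]
genome_weights = [1, 1]         # every gene counted once
transcript_weights = [1, 3]     # weights are mRNA expression levels
delta = enrichment(feature, transcript_weights, feature, genome_weights)
```

In this toy case the more highly expressed gene carries more of the feature, so the enrichment of the expression-weighted set relative to the genome is positive; a negative value would indicate depletion.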

  33. Visual Formalism: ~170 ORFs vs. ~6,000 ORFs. Two different subsets of data because of limited size!

  34. Depletion of random coil secondary structure: STABILITY. Concurs with data from Perczel et al., Chemistry 2003, regarding the stability of specific secondary structures

  35. Enrichment of amino acids: STABILITY. Alanines, glycines and valines result in more compact structures. More compact = more stable (e.g. thermophilic enzymes tend to be very compact)

  36. Enrichment of amino acids. Simple story: the translatome is enriched in the same way as the transcriptome

  37. Enrichment of Molecular Weights/Biomass. Abundant proteins are smaller = reduces cost. Effect of transcription: the yeast cell favors the expression of shorter ORFs over longer ones (as opposed to long lightweight ORFs; see MW of aa). This selection is happening, for the most part, at the transcriptome level. Negative correlation between ORF length and mRNA expression (Jansen & Gerstein 2000), and to a lesser degree with protein abundance

  38. Enrichment of Molecular Weights/Biomass. Abundant proteins are smaller = reduces cost. Effect of transcription: CONCURS with experimental results from Akashi, Genetics 2003. See also Akashi, Genetics 1996 and Moriyama and Powell, NAR 1998, who hypothesize that this trend exists in S. cerevisiae, D. melanogaster and E. coli (although probably not in C. elegans)

  39. Enrichment of Functional Categories

  40. Depletion of Functional Categories: Transcription & Cell Growth. Molecular switches require only minimal expression

  41. Enrichment of localization: BIAS? (Drawid & Gerstein, 2000)

  42. Review
     • Formalism
     • Different gene sets because of limited data
     • Enrichments concur with experimental results

  43. Fitting Protein Data: Newer Set. The MudPIT data are fit first into mRNA space, then inverse fit back into protein space; then each of the data sets is fit via least squares onto the Aebersold data set
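
The final step, fitting a data set via least squares onto a reference set, might look like the sketch below. The slides do not state the functional form of the fit, so a linear map (scale and offset) over the ORFs shared by both sets is assumed here; the function name, ORF identifiers and values are hypothetical, and the mRNA-space and inverse-fit steps are not shown.

```python
def fit_onto_reference(dataset, reference):
    """Least-squares fit reference ~ a * dataset + b over shared ORFs.

    dataset, reference: dicts mapping ORF identifier -> abundance.
    Returns the rescaled dataset (all of its ORFs) and the pair (a, b).
    """
    shared = sorted(set(dataset) & set(reference))
    if len(shared) < 2:
        raise ValueError("need at least two shared ORFs")
    xs = [dataset[orf] for orf in shared]
    ys = [reference[orf] for orf in shared]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("dataset values are constant over the shared ORFs")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    rescaled = {orf: a * value + b for orf, value in dataset.items()}
    return rescaled, (a, b)


# Hypothetical abundances; "orf4" is measured only in the new data set.
reference_set = {"orf1": 3.0, "orf2": 5.0, "orf3": 7.0}
new_set = {"orf1": 1.0, "orf2": 2.0, "orf3": 3.0, "orf4": 4.0}
rescaled_set, (scale, offset) = fit_onto_reference(new_set, reference_set)
```

Fitting on the shared ORFs and then applying the map to every ORF is what lets a merged set cover more genes than the reference alone.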


  48. Global Correlation. mRNA set: 6249 ORFs. Protein set #2: two 2DE sets & two MudPIT sets, ~2000 ORFs

  49. Functional Categories: co-regulated proteins. High: ion transport, interaction with the cellular environment, cell fate. Low: metabolism, fate, cellular communication/signal transduction mechanism
