Computational Methods for Biomarker Discovery in Proteomics and Glycomics
Vijetha Vemulapalli
School of Informatics, Indiana University
Capstone Advisor: Dr. Haixu Tang
What are Biomarkers?
Outline: Problem Definition • Background • LC-MS (Method, Results) • CE (Method, Results) • Acknowledgements • References
• Substances present in increased or decreased amounts in body fluids or tissues that indicate exposure, disease, or susceptibility to disease.
Figure (word cloud): metabolites, glycoproteins, hormones, cancer, proteins, kinase, enzymes, asthma, PSA, mutations, gene amplification.
Some Uses of Biomarkers
• Biomarkers are increasingly being used for the following purposes:
  - Prognosis and diagnosis of disease
  - Monitoring response to medication
• With their high sensitivity and throughput, proteomics and glycomics are capable of identifying many potential biomarkers simultaneously.
More on Biomarkers
• Often the individual biomarkers have not been clearly identified. Even so, samples can be classified as healthy or diseased based on the signature pattern of their glycans and proteins.
What is Proteomics?
• Proteins: Chains of amino acids; they include hormones, enzymes, and antibodies.
• Proteome: All the proteins in a cell or bodily fluid at a given point in time under certain conditions.
• Proteomics: The study of proteins and proteomes using high-throughput technology.
Image sources: http://parasol.tamu.edu/groups/amatogroup/foldingserver/images/proteinL.gif and http://biology.clc.uc.edu/graphics/bio104/cell.jpg
What is Glycomics?
• Glycoproteins: Proteins with attached polysaccharides.
• Glycans: Polysaccharide chains attached to a protein.
• Glycome: The entire set of glycans present in a cell or bodily fluid at a given point in time under certain conditions.
• Glycomics: The study of the structure and function of oligosaccharides in a cell or organism.
Image source: http://www.glyfdis.org/images/bg_image.jpg
High Throughput Technologies to Identify Biomarkers
• Genome level: genome-scale scanning
• Transcriptome level: microarrays
• Proteome level: proteomics
• Glycome level: glycomics
Image source: http://phy.asu.edu/phy598-bio/D4%20Notes%2006_files/image002.jpg
Why the Focus on Proteomics and Glycomics?
Figure labels: information content; static; dynamic; genome; transcriptome; proteome; glycome.
Liquid Chromatography / Mass Spectrometry (LC/MS)
Workflow: Protein sample → Liquid Chromatography → Mass Spectrometry → Data
• Why LC/MS for analysis of proteomes?
  - LC spreads the complexity of the sample over time.
  - MS identifies ions based on their mass-to-charge (m/z) value.
  - Software already exists to identify the proteins in a sample using data from an LC-MS experiment.
Liquid Chromatography (LC)
• Liquid chromatography is a technique that separates ions or molecules dissolved in a solvent based on their size, adsorption, ion exchange, or other similar characteristics.
Image source: http://wwwlb.aub.edu.lb/~webcrsl/high_p3.jpg
What is a Mass Spectrometer?
• A mass spectrometer is an instrument that identifies ions based on their mass-to-charge ratio; mass spectrometry (MS) is the technique that uses it.
Source: http://www.chemguide.co.uk/analysis/masspec/howitworks.html & http://www.bmms.uu.se/ltq-ft.htm
Visualization of LC/MS Data: 2D Map
How Do We Find Biomarkers From LC-MS Data?
Workflow: Protein sample → Liquid Chromatography → Mass Spectrometry → Data → Identification software → Identified proteins and peptides → MSView → Quantities of peptides identified from the sample
How Do We Find Biomarkers From LC-MS Data? Continued…
• MSView quantifies each sample: Sample 1 → Quantification 1, Sample 2 → Quantification 2, Sample 3 → Quantification 3, …, Sample N → Quantification N.
• The quantifications are then analyzed together to find biomarkers.
MSView
• Component: Visualization. Purpose: Visual comparison / analysis.
• Component: Relative Quantification. Purpose: Further analysis for biomarker discovery.
Extracted Ion Chromatogram (XIC)
• A chromatogram created by taking the signal intensity observed at a chosen m/z value in each of a series of mass spectra and plotting it as a function of retention time.
Source: http://www.lcpackings.com/applications/Probot/images/dual_fract04B.png
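The XIC definition above can be sketched in a few lines of Python. This is a minimal illustration, not MSView's code: the scan layout (a retention time paired with a list of (m/z, intensity) peaks), the function name `extract_xic`, and the absolute m/z tolerance `tol` are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical scan layout: (retention time, [(m/z, intensity), ...]).
Scan = Tuple[float, List[Tuple[float, float]]]

def extract_xic(scans: List[Scan], target_mz: float,
                tol: float = 0.01) -> List[Tuple[float, float]]:
    """Return (retention_time, summed intensity) for peaks within target_mz +/- tol."""
    xic = []
    for rt, peaks in scans:
        # Sum every peak in this spectrum that falls inside the m/z window.
        intensity = sum((i for mz, i in peaks if abs(mz - target_mz) <= tol), 0.0)
        xic.append((rt, intensity))
    return xic

# Toy data: three spectra recorded at increasing retention times.
example_scans = [
    (10.0, [(500.25, 100.0), (612.40, 30.0)]),
    (10.1, [(500.26, 250.0), (700.10, 5.0)]),
    (10.2, [(612.41, 40.0)]),
]
print(extract_xic(example_scans, target_mz=500.25, tol=0.02))
```

Scans with no signal in the window contribute an intensity of 0.0, so the chromatogram keeps one point per spectrum.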
Visualization: XIC
Relative Quantification using Peptide Identification Results
Workflow: Data from LC-MS experiment + Identification of peptides → MSView: Extracted ion chromatogram of peptide → Peak selection → Area calculation
Quantification: Peak Selection Algorithm
Figure panels: actual data; after smoothing; selecting peaks by selecting local maxima and minima (points labeled Minima and Maxima).
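The steps named on this slide (smooth, find local maxima and minima, then compute peak area) could look roughly like the sketch below. The slides do not say which smoother or integration rule MSView uses, so the centered moving average, the trapezoid rule, and every function name here are assumptions chosen for illustration.

```python
from typing import List, Tuple

def moving_average(y: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out

def local_extrema(y: List[float]) -> Tuple[List[int], List[int]]:
    """Indices of strict local maxima and of local minima (endpoints count as minima)."""
    maxima, minima = [], [0]
    for i in range(1, len(y) - 1):
        if y[i - 1] < y[i] > y[i + 1]:
            maxima.append(i)
        elif y[i - 1] >= y[i] <= y[i + 1]:
            minima.append(i)
    minima.append(len(y) - 1)
    return maxima, minima

def quantify_peaks(t: List[float], y: List[float],
                   window: int = 3) -> List[Tuple[float, float]]:
    """Return (apex time, area) per peak; each peak spans the minima flanking its maximum."""
    smoothed = moving_average(y, window)
    maxima, minima = local_extrema(smoothed)
    results = []
    for apex in maxima:
        left = max(m for m in minima if m < apex)
        right = min(m for m in minima if m > apex)
        # Trapezoid rule on the raw (unsmoothed) intensities between the two minima.
        area = sum((t[k + 1] - t[k]) * (y[k] + y[k + 1]) / 2.0
                   for k in range(left, right))
        results.append((t[apex], area))
    return results

print(quantify_peaks([0, 1, 2, 3, 4, 5, 6], [0, 1, 5, 9, 4, 1, 0]))
```

Peak boundaries are taken from the smoothed trace, while the area is integrated over the raw intensities so that smoothing does not change the reported quantity; that split is a design choice of this sketch.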
Quantification: Sample Results
How does Capillary Electrophoresis (CE) work?
Image source: http://faculty.washington.edu/dovichi/UBUBTpage/research/Methods/CEintro/ceintro.GIF
Samples from different CE experiments: What does the data look like?
Biomarker Discovery using Glycomics – Overview
Workflow: Data from different samples → CE Analyze: Mapping areas corresponding to the same glycan from different samples → Quantification of mapped peaks → Analysis of quantification for identifying biomarkers
Direct Comparison: Dynamic Time Warping (DTW)
• The DTW algorithm aligns two time series that have similar curves but are stretched or shifted differently along the time axis.
Source: http://db-www.aist-nara.ac.jp/theme/bioinfo_kenji-h_dtw.png
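As a concrete illustration of DTW, here is a minimal textbook version in Python. It assumes the absolute intensity difference as the local cost and the standard three-way step pattern; the slides do not show CE Analyze's actual cost function, and the name `dtw` is just for this example.

```python
from typing import List, Tuple

def dtw(a: List[float], b: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the total alignment cost and the warping path as (index in a, index in b) pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match both points
                                 cost[i - 1][j],      # stretch b[j-1]
                                 cost[i][j - 1])      # stretch a[i-1]
    # Backtrack from the end to recover which points were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

# Same peak shape, but the second series starts one step later.
total, path = dtw([0, 1, 3, 1, 0], [0, 0, 1, 3, 1, 0])
print(total, path)
```

In the toy input the second series is the first one delayed by a single time step, so the alignment cost is zero and the path pairs the two peak apexes with each other.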
Direct Comparison: DTW continued…
• A Sakoe-Chiba band is used to reduce time and space complexity.
• Parameters used in DTW:
  - Band width
  - Peak extension penalty
  - Difference in peak intensities
  - Difference in peak direction
Stan Salvador and Philip Chan. FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space. KDD Workshop on Mining Temporal and Sequential Data, 2004.
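One way the listed parameters might enter a banded DTW is sketched below. How CE Analyze actually combines them is not stated on the slide, so this is an assumption: the local cost is the intensity difference plus `direction_weight` when the two signals move in opposite directions, non-diagonal steps pay `extension_penalty`, and only cells within `band` of the diagonal are evaluated. All names are hypothetical.

```python
from typing import List

def slope_sign(x: List[float], i: int) -> int:
    """Direction of the signal at index i: +1 rising, -1 falling, 0 flat or first point."""
    if i == 0:
        return 0
    d = x[i] - x[i - 1]
    return (d > 0) - (d < 0)

def banded_dtw(a: List[float], b: List[float], band: int,
               extension_penalty: float = 0.0,
               direction_weight: float = 0.0) -> float:
    """DTW cost restricted to a Sakoe-Chiba band of half-width `band`."""
    n, m = len(a), len(b)
    inf = float("inf")
    # A full matrix is kept for clarity; storing only the band would also save space.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit columns within `band` of the diagonal.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = abs(a[i - 1] - b[j - 1])            # difference in intensities
            if slope_sign(a, i - 1) != slope_sign(b, j - 1):
                d += direction_weight                # difference in direction
            best = min(cost[i - 1][j - 1],
                       cost[i - 1][j] + extension_penalty,
                       cost[i][j - 1] + extension_penalty)
            cost[i][j] = d + best
    return cost[n][m]

a = [0, 1, 3, 1, 0]
b = [0, 0, 1, 3, 1, 0]
print(banded_dtw(a, b, band=1))
print(banded_dtw(a, b, band=1, extension_penalty=0.5))
print(banded_dtw(a, b, band=0))
```

The band must be at least as wide as the difference in series lengths, otherwise no path reaches the final cell and the function returns infinity, as the last call shows.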
Method: Dynamic Time Warping
Workflow: Align to consensus sample → Consensus sample → Align next sample to consensus sample
Method continued…
Figure: Unaligned sample and aligned sample, with corresponding peaks marked; the area is calculated for each corresponding peak (e.g., Peak 1).
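Once a sample has been aligned, a peak's boundaries in the consensus can be carried over to the sample through the warping path and the area computed there. The sketch below assumes the path is a list of (consensus index, sample index) pairs and uses the trapezoid rule; both function names are hypothetical and the slides do not confirm this exact procedure.

```python
from typing import List, Tuple

def map_interval(path: List[Tuple[int, int]], start: int, end: int) -> Tuple[int, int]:
    """Map the consensus index range [start, end] to the sample indices aligned with it."""
    matched = [j for i, j in path if start <= i <= end]
    if not matched:
        raise ValueError("interval is not covered by the warping path")
    return min(matched), max(matched)

def trapezoid_area(t: List[float], y: List[float], start: int, end: int) -> float:
    """Area under y between indices start and end (inclusive) by the trapezoid rule."""
    return sum((t[k + 1] - t[k]) * (y[k] + y[k + 1]) / 2.0
               for k in range(start, end))

# Toy alignment: the sample's peak sits one step later than the consensus peak.
path = [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
sample_t = [0, 1, 2, 3, 4, 5]
sample_y = [0, 0, 1, 3, 1, 0]

lo, hi = map_interval(path, start=1, end=3)   # consensus peak spans indices 1..3
print(lo, hi, trapezoid_area(sample_t, sample_y, lo, hi))
```

Repeating this for every consensus peak and every sample gives one area per glycan per sample, which is the kind of table the later analysis step would compare.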
Results
Figure: Corresponding peaks.
Summary
• Proteomics - MSView: LC-MS data + Identified peptides → Quantification results for biomarker discovery
• Glycomics - CE Analyze: CE data → Quantification results for biomarker discovery
Acknowledgements
• Dr. Haixu Tang (my advisor)
• Dr. Randy J. Arnold
• Dr. Yehia Mechref
• Dr. Milos Novotny
• Dr. David E. Clemmer
• Dr. Sun Kim
• Dr. Jeong-Hyeon Choi
• Dr. Stephen J. Valentine
• Yin Wu
• Manolo D. Plasencia
• School of Informatics
• Funding: NIH/NCRR, MetaCyt Initiative @ Indiana University
References
[1] Higgs, R.E., Knierman, M.D., Gelfanova, V., Butler, J.P. and Hale, J.E. (2005) Comprehensive label-free method for the relative quantification of proteins from biological samples. J. Proteome Res., 4, 1442-1450.
[2] Linsen, L., Locherbach, J., Berth, M., Becher, D. and Bernhardt, J. (2006) Visual Analysis of Gel-Free Proteome Data. IEEE Transactions on Visualization and Computer Graphics, 12, 497-508.
[3] Prakash, A., Mallick, P., Whiteaker, J., Zhang, H., Paulovich, A., Flory, M., Lee, H., Aebersold, R. and Schwikowski, B. (2006) Signal maps for mass spectrometry-based comparative proteomics. Mol. Cell. Proteomics, 5, 423-432.
[4] Leptos, K.C., Sarracino, D.A., Jaffe, J.D., Krastins, B. and Church, G.M. (2006) MapQuant: open-source software for large-scale protein quantification. Proteomics, 6, 1770-1782.
[5] Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature, 422, 198-207.