XML Standards for Proteomics Data

Andrew Jones, Dr Jonathan Wastling and Dr Ela Hunt, Department of Computing Science and the Institute of Biomedical and Life Sciences, University of Glasgow.

Presentation Transcript


  1. XML Standards for Proteomics Data
     • Andrew Jones, Dr Jonathan Wastling and Dr Ela Hunt
     • Department of Computing Science and the Institute of Biomedical and Life Sciences, University of Glasgow

  2. Proteomics
     1. 2D-PAGE to separate proteins
     2. Image analysis to determine the volume of protein spots
     3. Mass spectrometry (MS) to characterise protein spots
     4. Database searches to identify proteins
     [Figure: workflow diagram, 2D-PAGE → Image Analysis → Mass Spectrometry → Database Search]

  3. Proteomics Data Issues
     • Many different instruments for data collection
     • Great variety of software used for analysis
     • Access to external databases
        • For protein identification
        • For protein characterisation after identification
     • High-throughput techniques generate very large data sets
     [Figure: Instruments (scanner, MS); Software (image analysis, MS viewer); Databases (genome, microarray, publications, more...)]

  4. A Standard Model for Proteomics
     • Improve management of laboratory workflows
     • Data integration: link local data to external data sources
     • Development of public databases, enabling:
        • Queries over protocols, raw data and analysis
        • Experiments to be reproduced or re-analysed by other research groups
        • Co-analysis of proteome data with genome, transcriptome and other resources

  5. Biological Collaborators
     • Parasitology research group: investigating the host-parasite response with Toxoplasma gondii
     • Ras/Raf pathway research at the Beatson Institute
     • Functional Genomics Facility at the IBLS
     Functional Genomics Facility - http://www.gla.ac.uk/departments/ibls/ASU/fgf/

  6. MAGE Model for Proteomics
     • The MAGE (MicroArray and Gene Expression) model has been developed to store microarray protocols, data and analysis
     • A similar model will facilitate integration between microarray and proteome data
     • Aspects of the model require few modifications to be applicable to proteomics
     • We are developing a new representation of 2D gel analysis and MS data

  7. Experimental Protocols in MAGE
     • The MAGE model is extensible
     • A protocol is represented as an ordered list of events, materials and hardware
     • Few changes are required to focus on protein extraction rather than mRNA production
     [Figure: MAGE packages Protocol, ArrayDesign, BioEvent, BioMaterial, BioAssay, Array]

  8. Experimental Protocols for 2D Gels
     • The MAGE model is extensible
     • A protocol is represented as an ordered list of events, materials and hardware
     • Few changes are required to focus on protein extraction rather than mRNA production
     [Figure: packages Protocol, 2D_PAGE_Setup, BioEvent, BioMaterial, BioAssay, 2D_PAGE]
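The ordered-list view of a protocol on slides 7 and 8 can be sketched in Python. This is a minimal illustration, not the authors' schema and not the MAGE-ML format: the element names (Protocol, Step, Event, Material, Hardware), the ProtocolStep class and the example steps are hypothetical, and serialisation uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ProtocolStep:
    """One entry in an ordered protocol: an event plus the materials and hardware it uses."""
    event: str
    materials: list[str] = field(default_factory=list)
    hardware: list[str] = field(default_factory=list)


def protocol_to_xml(name: str, steps: list[ProtocolStep]) -> ET.Element:
    """Serialise the steps in order; the 'order' attribute records each step's list position."""
    root = ET.Element("Protocol", name=name)
    for order, step in enumerate(steps, start=1):
        step_el = ET.SubElement(root, "Step", order=str(order))
        ET.SubElement(step_el, "Event").text = step.event
        for material in step.materials:
            ET.SubElement(step_el, "Material").text = material
        for device in step.hardware:
            ET.SubElement(step_el, "Hardware").text = device
    return root


# Hypothetical 2D gel set-up: protein extraction takes the place of mRNA production.
steps = [
    ProtocolStep("Protein extraction", materials=["cell sample", "lysis buffer"]),
    ProtocolStep("Isoelectric focusing (first dimension)",
                 materials=["IPG strip"], hardware=["IEF unit"]),
    ProtocolStep("SDS-PAGE (second dimension)",
                 materials=["polyacrylamide gel"], hardware=["electrophoresis tank"]),
]
protocol_xml = protocol_to_xml("2D_PAGE_Setup", steps)
print(ET.tostring(protocol_xml, encoding="unicode"))
```

Keeping the order explicit as an attribute means the sequence survives even if a consumer does not rely on document order.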

  9. Proteomics Data Model
     • Image analysis identifies spots observable on the gel
     • It is important to store the raw data and analysis from MS
     • A separate package handles cross-gel analysis, e.g. time series
     [Figure: packages 2D_PAGE, Protein_Spots, MS_Setup, MS_Data, BioSequence, Data_Analysis, Multiple_Analysis, with a link from Protocol]
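The cross-gel (time series) analysis mentioned on slide 9 can be illustrated with a small sketch. It assumes spots have already been matched across gels under shared IDs, and it normalises each spot as a percentage of the total spot volume on its gel; that normalisation is one common choice assumed here, not taken from the slides. Gel names, spot IDs and volumes are made-up values.

```python
# Hypothetical volumes for spots already matched across three gels (time points t0..t2).
gels = {
    "t0": {"spotA": 1200.0, "spotB": 800.0, "spotC": 2000.0},
    "t1": {"spotA": 1500.0, "spotB": 400.0, "spotC": 2100.0},
    "t2": {"spotA": 2400.0, "spotB": 300.0, "spotC": 1300.0},
}


def normalised_volumes(spots: dict[str, float]) -> dict[str, float]:
    """Express each spot volume as a percentage of the total spot volume on its gel."""
    total = sum(spots.values())
    return {spot_id: 100.0 * volume / total for spot_id, volume in spots.items()}


def time_series(gels: dict[str, dict[str, float]], baseline: str) -> dict[str, list[float]]:
    """Ratio of each spot's normalised volume to its baseline value, in time-point order.

    Only spots present on every gel are reported.
    """
    norm = {t: normalised_volumes(spots) for t, spots in gels.items()}
    shared = set.intersection(*(set(spots) for spots in norm.values()))
    return {
        spot_id: [norm[t][spot_id] / norm[baseline][spot_id] for t in gels]
        for spot_id in sorted(shared)
    }


for spot_id, ratios in time_series(gels, baseline="t0").items():
    print(spot_id, [round(r, 3) for r in ratios])
```

With these toy numbers, spotA doubles relative to t0 while spotB falls to 0.375 of its baseline.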

  10. Proteomics Model
      • Experimental protocol packages require few changes from MAGE
      • The new data model includes MS data and statistical analysis between gels
      • The model incorporates storage of external database searches
      [Figure: package diagram: Protocol, BioEvent, 2D_PAGE_Setup, BioMaterial, BioAssay, Data, 2D_PAGE, Experiment, Protein_Spots, Multiple_Analysis, Data_Analysis, MS_Setup, MS_Data, BioSequence, Annotation, Audit&Security, Common, BQS, Description, Measurement]
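To make the data model more concrete, here is a hypothetical XML instance linking a gel to its spots, MS data and stored database-search results, together with two queries of the kind slide 4 motivates. The tag and attribute names are assumptions loosely based on the package names; 2D_PAGE cannot itself be an XML element name, because XML names may not start with a digit, so the sketch uses Gel. Accessions and scores are placeholders, not real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; tag names loosely follow the model's packages.
EXPERIMENT_XML = """
<Experiment id="exp1">
  <Gel id="gel1" protocolRef="2D_PAGE_Setup">
    <Protein_Spots>
      <Spot id="s1" volume="1520.0">
        <MS_Data ref="ms1"/>
        <Identification database="ExampleDB" accession="ACC_A" score="87"/>
      </Spot>
      <Spot id="s2" volume="430.5">
        <MS_Data ref="ms2"/>
        <Identification database="ExampleDB" accession="ACC_B" score="42"/>
      </Spot>
      <Spot id="s3" volume="980.2"/>
    </Protein_Spots>
  </Gel>
</Experiment>
"""

root = ET.fromstring(EXPERIMENT_XML.strip())


def spots_identified_as(root: ET.Element, accession: str) -> list[str]:
    """IDs of spots whose stored database-search result matches the accession."""
    return [
        spot.get("id")
        for spot in root.iter("Spot")
        if spot.find(f"Identification[@accession='{accession}']") is not None
    ]


def unidentified_spots(root: ET.Element) -> list[str]:
    """IDs of spots that have no stored identification."""
    return [spot.get("id") for spot in root.iter("Spot") if spot.find("Identification") is None]


print(spots_identified_as(root, "ACC_A"))  # ['s1']
print(unidentified_spots(root))            # ['s3']
```

Storing the search result next to the spot it came from is what lets one query span image analysis, MS and identification.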

  11. Proteomics Database and Indexing Technology
      • A prototype database for proteomics has been developed
      • We have developed a specialised index structure for XML to improve query performance
      • The index has so far been tested on 800MB of protein data¹
      [Figure: Data Stores, XML Index, Data Path Tree (numbered nodes) and XML Dictionary (1 Experiment, 2 gelImage, 3 spots, 4 spot, … 9)]
      ¹ Protein Information Resource - http://pir.georgetown.edu/
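The index on slide 11 combines an XML dictionary, which encodes element names as integers, with a data path tree. The sketch below is a generic path index in that spirit, assuming a trie keyed by dictionary codes whose nodes hold postings into a data store; it is not the authors' actual structure, and the class names XMLDictionary, PathNode and PathIndex are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


class XMLDictionary:
    """Encodes each distinct element name as a small integer, assigned in order of first use."""

    def __init__(self) -> None:
        self.codes: dict[str, int] = {}

    def encode(self, name: str) -> int:
        if name not in self.codes:
            self.codes[name] = len(self.codes) + 1
        return self.codes[name]

    def lookup(self, name: str) -> Optional[int]:
        return self.codes.get(name)


@dataclass
class PathNode:
    """One node of the data path tree: a root-to-node sequence of dictionary codes."""
    children: dict[int, "PathNode"] = field(default_factory=dict)
    postings: list[int] = field(default_factory=list)  # positions in the data store


class PathIndex:
    def __init__(self) -> None:
        self.dictionary = XMLDictionary()
        self.root = PathNode()
        self.store: list[ET.Element] = []  # stand-in for the data stores

    def add_document(self, element: ET.Element) -> None:
        self._insert(element, self.root)

    def _insert(self, element: ET.Element, parent: PathNode) -> None:
        # Each element extends its parent's path by one dictionary code.
        code = self.dictionary.encode(element.tag)
        node = parent.children.setdefault(code, PathNode())
        node.postings.append(len(self.store))
        self.store.append(element)
        for child in element:
            self._insert(child, node)

    def query(self, path: str) -> list[ET.Element]:
        """Resolve an absolute path such as /Experiment/gelImage/spots/spot."""
        node = self.root
        for name in path.strip("/").split("/"):
            code = self.dictionary.lookup(name)
            if code is None or code not in node.children:
                return []
            node = node.children[code]
        return [self.store[i] for i in node.postings]


index = PathIndex()
index.add_document(ET.fromstring(
    "<Experiment><gelImage><spots><spot id='1'/><spot id='2'/></spots></gelImage></Experiment>"
))
print(index.dictionary.codes)  # {'Experiment': 1, 'gelImage': 2, 'spots': 3, 'spot': 4}
print([e.get("id") for e in index.query("/Experiment/gelImage/spots/spot")])  # ['1', '2']
```

A path query walks the tree once per step instead of scanning every document, which is the kind of saving such an index aims for; here the first-use numbering happens to match the dictionary shown on the slide.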

  12. Related Research
      • Databases: SWISS-2DPAGE, LIMS systems
      • Standards:
         • Proteomics Standards Initiative (PSI): standards for protein-protein interactions and mass spectrometry
         • PEDRo system with PEML (Proteomics Experiment Markup Language)
      PSI: http://psidev.sourceforge.net/

  13. Work in Progress
      • Work towards an XML standard for proteomics
      • Create standards for capturing statistical processing of large data sets
      • Developing XML indexing technology to improve data integration and query power
      • Developing a proteome database utilising XML indexing and a standard model

  14. Contact
      jonesa@dcs.gla.ac.uk
      Bioinformatics Research Centre - www.brc.dcs.gla.ac.uk
      Acknowledgements
      • Researchers in Jonathan Wastling's lab, for input into the model
      • Dr Ashwin Kotiwaliwale at the Beatson, for the collaboration on the prototype database
      • The Functional Genomics Facility is supported by a Wellcome Trust grant of £2.4M
      • My research is supported by an MRC Bioinformatics PhD studentship; Ela Hunt is supported by an MRC Fellowship

More Related