Improving Sensitivity by Combining Results from Multiple Search Methodologies

Brian C. Searle, Proteome Software Inc., Portland, OR. Brian.Searle@ProteomeSoftware.com. MBI workshop on Computational Proteomics and Mass Spectrometry (January 11-14, 2005).

Presentation Transcript


  1. Improving Sensitivity by Combining Results from Multiple Search Methodologies Brian C. Searle Proteome Software Inc. Portland, OR Brian.Searle@ProteomeSoftware.com MBI workshop on Computational Proteomics and Mass Spectrometry (January 11-14, 2005)

  2. The Analytical Challenge Biological Samples Control Experiments Q-TOF Unknown Spectra IonTrap

  3. The Analytical Challenge
     • Why can you only interpret half as much MS/MS data in experiments you actually care about?
     • What is going on with the remaining 90% unidentified spectra?

  4. The OpenSea Approach
     De Novo Sequence: YD[Cc]DD[220]GADHFTY[200]R
     OpenSea Alignment: Crystallin, S (CRBS_HUMAN)
     GRRYD(Cc)D(Cc)( D )(Cc)AD(FH)TY( LS )RCNS
     || | | X X X || | || | |
     YD(Cc)D(D )([220])(G )AD(HF)TY([200])R

  5. de novo Sequence YD[Cc]DD[220]GADHFTY[200]R 163-115-160-115-115-220-57-71-…
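
The mass ladder on this slide can be reproduced from the de novo string with nominal residue masses. The following is a minimal sketch, not OpenSea's own code: the function name `mass_ladder` is hypothetical, and it assumes that `[Cc]` denotes carbamidomethylated cysteine (160 Da) and that a bracketed number such as `[220]` is an uninterpreted mass gap.

```python
import re

# Nominal (integer) residue masses in Da for the 20 standard amino acids.
NOMINAL_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
# Assumption: "Cc" is carbamidomethylated cysteine (103 + 57 = 160).
MODIFIED_MASS = {"Cc": 160}


def mass_ladder(de_novo):
    """Convert a de novo string into a list of nominal masses (hypothetical helper)."""
    masses = []
    # Each match is either a bracketed token or a single residue letter.
    for bracket, letter in re.findall(r"\[([^\]]+)\]|([A-Z])", de_novo):
        if letter:
            masses.append(NOMINAL_MASS[letter])
        elif bracket.isdigit():
            masses.append(int(bracket))            # unexplained mass gap, e.g. [220]
        else:
            masses.append(MODIFIED_MASS[bracket])  # named modification, e.g. [Cc]
    return masses


ladder = mass_ladder("YD[Cc]DD[220]GADHFTY[200]R")
print("-".join(str(m) for m in ladder[:8]))  # prints 163-115-160-115-115-220-57-71
```

Working in masses rather than letters is what lets a gap such as `[220]` be compared against a run of database residues.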

  6. de novo Sequence: … YD[Cc]DD[220]GADHFTY[200]R
     163-115-160-115-115-220-57-71-…
     Database Sequence: … G-57 R-156 R-156 Y-163 D-115 C-160 D-115 C-160 D-115 C-160 A-71 …

  8. Auto-Interpretation of OpenSea Results
     OpenSea Alignment:
     GRRYD(Cc)D(Cc)( D )(Cc)AD(FH)TY( LS )RCNS
     || | | X X X || | || | |
     YD(Cc)D(D )([220])(G )AD(HF)TY([200])R
     +14 AMU on either cysteine or -43 AMU on aspartic acid…
     Modification lookup table suggests methylation of cysteine!
     Auto-Interpretation:
     GRRYD(Cc)D( CmDCc )AD(FH)TY( LS )RCNS
     || | | : || | || | |
     YD(Cc)D(D[220]G)AD(HF)TY([200])R
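
One reading of the arithmetic behind this slide: the unaligned de novo stretch D, [220], G sums to 115 + 220 + 57 = 392 Da, while the database stretch C, D, C sums to 435 Da if both cysteines are carbamidomethylated (a -43 Da discrepancy) or 378 Da if one cysteine is left unalkylated (a +14 Da discrepancy). The sketch below looks those differences up in a small table. The table contents and the name `suggest_modification` are assumptions for illustration, not OpenSea's actual lookup table.

```python
# Hypothetical lookup table of nominal mass shifts (Da); a real table would be much larger.
MOD_TABLE = {
    14: "methylation",
    16: "oxidation",
    42: "acetylation",
    80: "phosphorylation",
}


def suggest_modification(observed_mass, expected_mass):
    """Return (delta, modification name or None) for an unexplained mass difference."""
    delta = observed_mass - expected_mass
    return delta, MOD_TABLE.get(delta)


observed = 115 + 220 + 57        # de novo stretch: D, [220], G
both_alkylated = 160 + 115 + 160  # database stretch: Cc, D, Cc
one_free_cys = 103 + 115 + 160    # database stretch: C, D, Cc

print(suggest_modification(observed, both_alkylated))  # prints (-43, None)
print(suggest_modification(observed, one_free_cys))    # prints (14, 'methylation')
```

Only the +14 Da interpretation matches a table entry, which is consistent with the slide's conclusion that a cysteine is methylated.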

  9. Spectrum Identification Overlap Between Search Methods
     [Venn diagram of SEQUEST, X!Tandem, and OpenSea identifications. Region labels: 6%, 17%, 7%, 41%, 10%, 10%, 9%. Annotations: PTMs, polymorphisms.]

  10. Spectrum Identification Overlap Between Search Methods
     [Venn diagram of SEQUEST, X!Tandem, and OpenSea identifications. Region labels: 6%, 17%, 7%, 41%, 10%, 10%, 9%. Annotations: neutral losses, semi-tryptic, no ladder.]
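
Overlap percentages of this kind can be computed from the sets of spectrum identifiers each search method identifies. The following is a minimal sketch with made-up spectrum IDs; the function name `venn_regions` and the region labels are hypothetical, and it assumes at least one spectrum was identified overall.

```python
def venn_regions(sequest, tandem, opensea):
    """Percentage of identified spectra falling in each of the seven Venn regions."""
    a, b, c = set(sequest), set(tandem), set(opensea)
    regions = {
        "SEQUEST only": a - b - c,
        "X!Tandem only": b - a - c,
        "OpenSea only": c - a - b,
        "SEQUEST + X!Tandem": (a & b) - c,
        "SEQUEST + OpenSea": (a & c) - b,
        "X!Tandem + OpenSea": (b & c) - a,
        "all three": a & b & c,
    }
    total = len(a | b | c)  # spectra identified by at least one method
    return {name: 100.0 * len(ids) / total for name, ids in regions.items()}


# Made-up spectrum IDs, for illustration only.
overlap = venn_regions(sequest={1, 2, 3, 4}, tandem={3, 4, 5}, opensea={4, 6, 7, 8})
for name, pct in overlap.items():
    print(f"{name}: {pct:.1f}%")
```

The seven regions partition the union, so the percentages always sum to 100, as the seven labels on the slide do.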

  11. Scaffold Data Compiler
     • Combine SEQUEST, Mascot, X!Tandem, and OpenSea results
     • Utilize spectrum clustering and noise filters to remove uninteresting spectra
     • Export interesting, unidentified spectra for further analysis
     Goals: Search Wider, Drill Deeper, Remove Junk, Focus Efforts
     For All Spectra: Combine Database Searching IDs; Cluster Spectra to Previously IDs; Report Interesting, Unidentified Spectra; Filter Electronic Noise

  12. Combining SEQUEST and X!Tandem Scores
     [Scatter plot axes: X!Tandem -log(E-Value) Score; SEQUEST Discriminant Score (Peptide Prophet, ISB)]

  14. Peptide Prophet (ISB) Incorrect IDs p=50% Correct IDs
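
The p=50% marker on this slide sits where the two score distributions, weighted by their priors, are equally likely. A minimal sketch of that Bayesian calculation follows. The Gaussian shapes, their parameters, and the function names are assumptions chosen for illustration; Peptide Prophet fits its own distributions to the data.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def probability_correct(score, prior_correct=0.5,
                        correct=(3.0, 1.0), incorrect=(-1.0, 1.0)):
    """Posterior probability that an ID with this score is correct.

    `correct` and `incorrect` are (mean, sd) pairs for the two assumed
    score distributions; the defaults are made-up values.
    """
    pos = prior_correct * normal_pdf(score, *correct)
    neg = (1.0 - prior_correct) * normal_pdf(score, *incorrect)
    return pos / (pos + neg)


for score in (-1.0, 1.0, 3.0):
    print(score, round(probability_correct(score), 4))
```

With these symmetric made-up parameters the crossover falls at a score of 1.0, halfway between the two means.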

  15. Protein Prophet (ISB) Protein 1 Protein 7 Peptide 1 Protein 4 Peptide 2 Peptide 3 Protein 2 Protein 8 Peptide 4 Protein 5 Peptide 5 Protein 3 Peptide 6 Protein 6 Peptide 7
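
The core idea of going from peptide probabilities to a protein probability can be sketched as the chance that at least one of the protein's peptide identifications is correct. This is a simplified illustration under the assumption that peptide IDs are independent; it ignores how peptides shared between several proteins are apportioned and any sibling-peptide adjustment. The function name and the example values are hypothetical.

```python
import math


def protein_probability(peptide_probabilities):
    """Probability that at least one peptide ID assigned to the protein is correct."""
    return 1.0 - math.prod(1.0 - p for p in peptide_probabilities)


# Made-up peptide probabilities for a single protein.
print(round(protein_probability([0.85, 0.54]), 3))  # prints 0.931
```

A protein supported by several moderately confident peptides can therefore score higher than any one of its peptides alone.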

  17. Incorrect IDs p(NSP|-) Correct IDs p(NSP|+) Normalized Distribution For each spectrum… IDs with: high NSP--p Low NSP--p NSP Bin Number Log p(NSP|+)/p(NSP|-) Correct IDs have higher NSP Values
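
The ratio p(NSP|+)/p(NSP|-) on this slide can be used to update an identification's probability with Bayes' rule. The sketch below assumes the two NSP distributions have already been estimated per bin; the function name and the numeric values are made up for illustration.

```python
def adjust_for_nsp(p, nsp_given_correct, nsp_given_incorrect):
    """Update probability p using the likelihood of the observed NSP bin.

    nsp_given_correct   ~ p(NSP|+) for the ID's NSP bin
    nsp_given_incorrect ~ p(NSP|-) for the same bin
    """
    numerator = p * nsp_given_correct
    return numerator / (numerator + (1.0 - p) * nsp_given_incorrect)


# A bin that is more common among correct IDs raises the probability...
print(round(adjust_for_nsp(0.5, 0.6, 0.2), 3))  # prints 0.75
# ...and a bin that is more common among incorrect IDs lowers it.
print(round(adjust_for_nsp(0.5, 0.1, 0.4), 3))  # prints 0.2
```

When the two likelihoods are equal the log ratio is zero and the probability is left unchanged.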

  18. For Each Spectrum:
     Get SEQUEST IDs, Calculate SEQUEST Probability (Peptide Prophet)
     Get Mascot IDs, Calculate Mascot Probability
     Get X!Tandem IDs, Calculate X!Tandem Probability
     Get OpenSea IDs, Calculate OpenSea Probability
     …
     Calculate Combined Peptide Probability (Scaffold Merge Prophet)
     Calculate Protein Probabilities (Protein Prophet)

  19. Peptide 1 Get SEQUEST Identification p=85% p=76% Get Mascot Identification Peptide 2 For Each Spectrum Get X!Tandem Identification p=54% Peptide 3 Get OpenSea Identification

  20. Peptide 1 Get SEQUEST Identification Peptide 4 p=27% Get Mascot Identification Peptide 2 p=81% For Each Spectrum Peptide 5 Get X!Tandem Identification p=35% Peptide 3 Get OpenSea Identification

  21. Peptide 1 Peptide 7 Get SEQUEST Identification Peptide 4 Get Mascot Identification Peptide 2 Peptide 8 For Each Spectrum Peptide 5 Get X!Tandem Identification Peptide 3 Peptide 6 Get OpenSea Identification

  22. Protein Prophet’s NSP value (number of sibling peptides) becomes… Merge Prophet’s number of sibling programs
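
Counting sibling programs can be sketched as follows: for each spectrum, an identification's sibling count is the number of other search programs that reported the same peptide. This is an illustration of the idea only; the data layout, function name, and example identifications are hypothetical, and how Scaffold bins or weights the count is not stated on the slide.

```python
from collections import defaultdict


def sibling_program_counts(identifications):
    """Map (spectrum_id, program) to the number of other programs that agree.

    `identifications` is a list of (spectrum_id, program, peptide) tuples.
    """
    programs_by_hit = defaultdict(set)
    for spectrum_id, program, peptide in identifications:
        programs_by_hit[(spectrum_id, peptide)].add(program)
    return {
        (spectrum_id, program): len(programs_by_hit[(spectrum_id, peptide)]) - 1
        for spectrum_id, program, peptide in identifications
    }


# Made-up identifications for two spectra.
ids = [
    (1, "SEQUEST", "Peptide 1"),
    (1, "Mascot", "Peptide 1"),
    (1, "X!Tandem", "Peptide 2"),
    (2, "SEQUEST", "Peptide 3"),
]
counts = sibling_program_counts(ids)
print(counts[(1, "SEQUEST")], counts[(1, "X!Tandem")])  # prints 1 0
```

The count then plays the role that NSP plays at the protein level: agreement between programs is evidence that an identification is correct.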

  23. [Plot labels: Incorrect IDs p(NSP|-); Correct IDs p(NSP|+); Normalized Distribution; NSP Bin Number; Log p(NSP|+)/p(NSP|-).] For each spectrum… IDs with: high NSP--p Low NSP--p. Correct IDs have higher NSP Values.

  24. Accuracy of the Probability Combining Model Mascot X!Tandem Calculated Probability Combination SEQUEST Actual Probability
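
A calculated-versus-actual comparison like the one on this slide can be produced by binning identifications by their calculated probability and measuring the fraction that are truly correct in each bin, which is possible when the sample's true content is known, as in a control experiment. The sketch below is a generic calibration check under that assumption; the function name and data are made up.

```python
def calibration_table(probabilities, is_correct, n_bins=10):
    """Return (mean calculated probability, actual fraction correct, count) per non-empty bin."""
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # count, sum of p, number correct
    for p, ok in zip(probabilities, is_correct):
        i = min(int(p * n_bins), n_bins - 1)     # p == 1.0 goes in the last bin
        bins[i][0] += 1
        bins[i][1] += p
        bins[i][2] += 1 if ok else 0
    return [
        (p_sum / count, n_ok / count, count)
        for count, p_sum, n_ok in bins
        if count > 0
    ]


# Made-up data: four IDs scored 0.95 (three correct) and two scored 0.15 (none correct).
table = calibration_table(
    [0.95, 0.95, 0.95, 0.95, 0.15, 0.15],
    [True, True, True, False, False, False],
)
for calculated, actual, count in table:
    print(f"calculated={calculated:.2f} actual={actual:.2f} n={count}")
```

A well-calibrated model gives points close to the diagonal, where calculated and actual probability agree.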

  25. Percentage of QTOF Spectra Correctly Identified as Control Proteins Identified By SEQUEST (40%) Unknown Spectra (60%)

  26. Percentage of QTOF Spectra Correctly Identified as Control Proteins Identified By Scaffold (60%) Unknown Spectra (40%)

  27. Percentage of QTOF Spectra Correctly Identified as Control Proteins Identified By Scaffold (73%) Unknown Spectra (27%)

  33. Protein Prophet Find Spectra Similar to Previously Identified Report Interesting, Unidentified Spectra Calculate Combined Probability Calculate Protein Probabilities Filter Electronic Noise Scaffold Merge Prophet Scaffold Cluster Prophet

  34. Cluster Prophet Principle If an unidentified spectrum is 95% similar to a correctly identified spectrum… it is also considered to be identified.
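
The principle can be sketched as a similarity search against already identified spectra. Note the substitution: this sketch uses cosine similarity on unit-width m/z bins, which is a common stand-in and not the rank-based similarity score the talk refers to. All names, the bin width, and the peak lists are assumptions for illustration.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine of the angle between two binned spectra (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def transfer_identification(unknown_peaks, identified, threshold=0.95):
    """Return the peptide of the most similar identified spectrum, or None."""
    unknown = bin_spectrum(unknown_peaks)
    best_peptide, best_score = None, 0.0
    for peptide, peaks in identified:
        score = cosine_similarity(unknown, bin_spectrum(peaks))
        if score > best_score:
            best_peptide, best_score = peptide, score
    return best_peptide if best_score >= threshold else None


# Made-up peak lists.
library = [("Peptide 1", [(175.1, 100.0), (304.2, 50.0), (433.2, 80.0)])]
similar = [(175.1, 95.0), (304.2, 55.0), (433.2, 78.0)]
unrelated = [(200.1, 100.0), (500.3, 40.0)]
print(transfer_identification(similar, library))    # prints Peptide 1
print(transfer_identification(unrelated, library))  # prints None
```

The 0.95 threshold mirrors the 95% figure on the slide, applied here to a different similarity measure.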

  35. Rank-Based Cluster Similarity Score Incorrect IDs p=50% Correct IDs

  36. MS/MS Spectrum Filter
     • Dynamic range filter removes spectra from peptides with poor/no fragmentation
     • Signal-to-noise filter removes electronic noise
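
The slide does not define either filter, so the following is only one plausible sketch. It assumes signal-to-noise is the mean of the strongest peaks divided by the median intensity, and that the dynamic range test counts peaks within a fixed factor of the base peak. The function name, thresholds, and intensity lists are all hypothetical.

```python
from statistics import median


def keep_spectrum(intensities, min_snr=3.0, range_limit=20.0, min_peaks=6, n_top=3):
    """Return True if a spectrum passes both (assumed) quality filters."""
    if len(intensities) < min_peaks:
        return False

    # Signal-to-noise: strongest peaks relative to the median (noise) level.
    noise = median(intensities)
    top = sorted(intensities, reverse=True)[:n_top]
    snr = (sum(top) / len(top)) / noise if noise > 0 else float("inf")

    # Dynamic range: enough peaks must lie within range_limit of the base peak.
    base_peak = max(intensities)
    strong_peaks = sum(1 for i in intensities if i >= base_peak / range_limit)

    return snr >= min_snr and strong_peaks >= min_peaks


noise_like = [10, 11, 9, 10, 12, 10, 11, 9]          # flat, no real signal
poor_fragmentation = [1000, 5, 4, 6, 5, 4, 5, 6]     # one dominant peak only
good = [1000, 400, 300, 250, 200, 150, 100, 20, 15, 10, 12, 8, 9, 11, 10]
print(keep_spectrum(noise_like), keep_spectrum(poor_fragmentation), keep_spectrum(good))
# prints False False True
```

In this sketch the flat spectrum fails the signal-to-noise test, and the spectrum with a single dominant peak fails the dynamic range test.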

  37. Percentage of QTOF Spectra Correctly Identified as Control Proteins Identified By Scaffold (73%) Unknown Spectra (27%)

  38. Percentage of QTOF Spectra Correctly Identified as Control Proteins Identified By Scaffold (74%) Unknown Spectra (5%) Not Interesting (21%)

  39. Percentage of 2D-LC QTOF Spectra Correctly Identified as Lens Proteins Identified By Scaffold (48%) Unknown Spectra (21%) Not Interesting (31%)

  40. The Analytical Challenge Biological Samples Control Experiments IDed by SEQUEST IDed by SEQUEST Q-TOF Unknown Spectra Unknown Spectra IDed by SEQUEST IDed by SEQUEST IonTrap Unknown Spectra Unknown Spectra

  41. The Analytical Challenge
     Spectra IDed by Scaffold versus Unknown Spectra, for Biological Samples and Control Experiments
     Q-TOF: 85% more IDs, 95% comprehension; 336% more IDs, 79% comprehension
     IonTrap: 48% more IDs, 65% comprehension; 227% more IDs, 75% comprehension
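
Two of the Q-TOF figures appear to follow from the earlier pie charts, assuming "comprehension" means identified spectra plus spectra classed as not interesting. The sketch below checks that arithmetic; the function names are hypothetical and the interpretation is an assumption rather than something the slides state.

```python
def percent_more_ids(before_pct, after_pct):
    """Relative increase in identified spectra, in percent."""
    return 100.0 * (after_pct - before_pct) / before_pct


def comprehension(identified_pct, not_interesting_pct):
    """Assumed definition: share of spectra that are either identified or explained away."""
    return identified_pct + not_interesting_pct


# Q-TOF control proteins: SEQUEST alone 40%, Scaffold 74% plus 21% not interesting.
print(percent_more_ids(40, 74))  # prints 85.0
print(comprehension(74, 21))     # prints 95
# 2D-LC Q-TOF lens proteins: Scaffold 48% plus 31% not interesting.
print(comprehension(48, 31))     # prints 79
```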

  42. Conclusions
     • Using Scaffold technologies, you can drill deeper and search wider using multiple database searching approaches and MS/MS spectrum clustering
     • Scaffold and implementations of Peptide/Protein Prophet were written in platform-independent Java
     • Scaffold will be available at ASMS 2005

  43. Acknowledgements
     OpenSea Team (OHSU): Srinivasa Nagalla, Surendra Dasari, Ashok Reddy, Larry David, Phil Wilmarth, Ashley McCormack. Contact: nagallas@ohsu.edu
     Scaffold Team (Proteome Software Inc.): Mark Turner, James Brundege. Contact: Brian.Searle@ProteomeSoftware.com