Proteomics via Mass Spectrometry (a bioinformatics perspective)

Presentation Transcript


  1. Proteomics via Mass Spectrometry (a bioinformatics perspective) Vineet Bafna www.cse.ucsd.edu/~vbafna Bafna

  2. Nobel Citation 2002

  3. Nobel Citation, 2002

  4. Proteomics via MS • Enzymatic Digestion (Trypsin) + Fractionation • Q: Sufficient to identify peptides?

  5. Peptide MS • Instrument software usually detects peaks and computes features (peak, area, m/z…). • (Figure axis: m/z.)

  6. Single Stage MS • Mass Spectrometry

  7. MS versus Micro-array • (Figure labels: sample, cDNA; sample, Protein/Peptide?) • Unlike micro-array, peptide identification is not trivial at the end of the MS experiment! • Identification is an important part of pre-processing.

  8. MS-based proteomics • Identification: identify all the proteins in the proteome, specific organelles, specific pathways, complexes… • Quantitation: is a protein differentially expressed in certain conditions? • Others: protein 3D structure, protein-protein interactions, … • We will consider an informatics-centered perspective.

  9. Protein Identification • The preferred mode is through tandem mass spectrometry of peptides. • Is identifying peptides sufficient? • Rough probability for co-occurrence of a 15-aa peptide? • With higher-accuracy instruments, it may be possible to identify intact proteins as well.
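One rough way to approach the co-occurrence question on slide 9 is a back-of-the-envelope estimate. The sketch below assumes (this is not stated in the slides) that residues are independent and uniformly distributed over the 20 standard amino acids, so a fixed 15-aa peptide matches a given position with probability 20^-15. The database size `10 ** 9` is a hypothetical value chosen only for illustration.

```python
# Assumption: each position is one of 20 amino acids, independent and equally likely.
ALPHABET_SIZE = 20
PEPTIDE_LENGTH = 15

def chance_match_probability(length=PEPTIDE_LENGTH, alphabet=ALPHABET_SIZE):
    """Probability that a fixed peptide matches at one given position."""
    return 1 / alphabet ** length

def expected_chance_matches(db_positions, length=PEPTIDE_LENGTH):
    """Expected number of chance occurrences across db_positions start positions."""
    return db_positions * chance_match_probability(length)

p = chance_match_probability()
hits = expected_chance_matches(10 ** 9)  # hypothetical database of 1e9 positions
print(f"{p:.2e} per position, {hits:.2e} expected chance hits")
```

Real amino-acid frequencies are far from uniform and proteins share homologous regions, so this is only an order-of-magnitude argument for why a 15-aa peptide usually pins down its protein.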

  10. Tandem MS of peptides • (Figure labels: Ionized parent peptide; Secondary Fragmentation)

  11. The peptide backbone • The peptide backbone breaks to form fragments with characteristic masses. • Backbone, N-terminus to C-terminus: H…-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH, where R_{i-1}, R_i, R_{i+1} are the side chains of AA residues i-1, i, i+1.

  12. Ionization • The peptide backbone breaks to form fragments with characteristic masses. • Ionized parent peptide, N-terminus to C-terminus: H+ H…-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH

  13. Fragment ion generation • The peptide backbone breaks to form fragments with characteristic masses. • Ionized peptide fragment, N-terminus to C-terminus: H+ H…-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH

  14. Tandem MS for Peptide ID
      b ions:    88  145  292  405  534  663  778  907 1020 1166
      residue:    S    G    F    L    E    E    D    E    L    K
      y ions:  1166 1080 1022  875  762  633  504  389  260  147
      (Figure: spectrum with the [M+2H]2+ peak marked; y-axis: % Intensity, 0 to 100; x-axis: m/z, 0 to 1000.)

  15. Peak Assignment • Peak assignment implies Sequence (Residue tag) Reconstruction!
      b ions:    88  145  292  405  534  663  778  907 1020 1166
      residue:    S    G    F    L    E    E    D    E    L    K
      y ions:  1166 1080 1022  875  762  633  504  389  260  147
      (Figure: spectrum with peaks labeled b3 to b9, y2 to y9, and [M+2H]2+; y-axis: % Intensity, 0 to 100; x-axis: m/z, 0 to 1000.)

  16. Ion types, and offsets • Ionized peptide, N-terminus to C-terminus: H+ H…-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH • P = prefix residue mass • S = suffix residue mass • b-ions = P+1 (NH2-CHR-CO-…-NH-CHR-CO(+)) • y-ions = S+19 (NH3(+)-CHR-CO-…-NH-CHR-COOH) • a-ions = P-27, and so on.
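The offsets on slide 16 can be turned into a small calculation. The following is a minimal sketch, assuming nominal (integer) residue masses so that the numbers line up with the rounded values used in the slides; the table `RESIDUE_MASS` and the function `ion_ladders` are illustrative names, not part of the original deck.

```python
from itertools import accumulate

# Nominal residue masses in Da (an assumption: integer rounding, as in the slides).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

def ion_ladders(peptide):
    """Return (b_ions, y_ions) using b = P + 1 and y = S + 19."""
    prefixes = list(accumulate(RESIDUE_MASS[aa] for aa in peptide))
    total = prefixes[-1]
    b_ions = [p + 1 for p in prefixes]
    # The suffix left after removing a prefix of mass p has mass total - p.
    y_ions = [total - p + 19 for p in [0] + prefixes[:-1]]
    return b_ions, y_ions

b_ions, y_ions = ion_ladders("SGEK")
print("b:", b_ions)
print("y:", y_ions)
```

For SGEK this reproduces the b ladder 88, 145, 274, 402 and the y ladder 420, 333, 276, 147 used in the de novo example later in the deck.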

  17. MS Quiz: • Why aren’t all tandem MS peaks of the same intensity? • Do the intensities for a peptide vary from spectrum to spectrum?

  18. Database Searching for peptide ID • For every peptide from a database: reject if it has the wrong mass, else generate a hypothetical spectrum and compute a correlation between the hypothetical and experimental spectra. • Choose the best. • Database searching is very powerful and is the de facto standard for MS. • Sequest, Mascot, Inspect, and many others. • (Example database sequence: …SARLSQETFSDLWKLLPENNVLSPLP….)
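The loop on slide 18 can be sketched as follows. This is a toy illustration under stated assumptions, not how Sequest, Mascot, or Inspect score: it uses nominal residue masses, only b and y ions, and a shared-peak count as a stand-in for the correlation score. All names (`theoretical_peaks`, `search`, the toy database) are hypothetical.

```python
from itertools import accumulate

# Nominal residue masses in Da (assumption: integer rounding).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

def theoretical_peaks(peptide):
    """b (P + 1) and y (S + 19) ions for every internal cleavage site."""
    prefixes = list(accumulate(RESIDUE_MASS[aa] for aa in peptide))
    total = prefixes[-1]
    b_ions = {p + 1 for p in prefixes[:-1]}
    y_ions = {total - p + 19 for p in prefixes[:-1]}
    return b_ions | y_ions

def search(observed_peaks, parent_mass, database, tolerance=1):
    """Return (score, peptide) for the best candidate, or None."""
    observed = set(observed_peaks)
    best = None
    for peptide in database:
        mass = sum(RESIDUE_MASS[aa] for aa in peptide)
        if abs(mass - parent_mass) > tolerance:
            continue  # reject: wrong parent mass
        # Shared-peak count: a simple stand-in for a correlation score.
        score = len(theoretical_peaks(peptide) & observed)
        if best is None or score > best[0]:
            best = (score, peptide)
    return best

best = search([88, 145, 147, 276], 401, ["AAAA", "GSEK", "SGKE", "SGEK"])
print(best)
```

Note that GSEK and SGKE pass the parent-mass check and still share three of the four peaks, which is one reason the scoring and validation steps matter.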

  19. So what’s new? • This identification picture is very simplistic. Only 20-30% of spectra are conclusively identified. • Many reasons: spectra are noisy; databases are incomplete (sometimes we need to do a de novo interpretation); post-translational modifications; instrument performance is critical. • The algorithms for identification must be sensitive to these issues. • We present a systematic look at identification software.

  20. Modules for Peptide Id • (Figure labels: D, I/F, S, V, Db) • Interpretation (D): input: spectrum; output: all that can be extracted from the spectrum (peptides/tags/parent mass/charge). • Indexing/Filtering (I/F): input: Db (set of peptides); output: pre-processing of the database, peptide subset. • Scoring (S): input: peptide set, spectrum; output: ranked list of scores. • Validation (V): significance of the top hit.

  21. De novo interpretation of mass spectra • (Figure labels: D, I/F, S, V) • The so-called de novo algorithms focus exclusively on the D module. • There is no database (I/F). • Limited scoring and validation. • Important when no database exists! • Also important for db search.

  22. De Novo Interpretation: Example
      b-ions:    0   88  145  274  402
      residue:       S    G    E    K
      y-ions:  420  333  276  147    0
      (Figure: spectrum with peaks labeled b1, b2, y1, y2; x-axis: m/z, 100 to 500.)

  23. The simplest case • Suppose only (and all) the prefix ions were visible. Would identification be easy? • We have two problems: • There is a mix of b and y ions. Separating them is critical! • There are other ions besides b and y, including neutral losses, noise, and so on. We need to account for them.
      b-ions:    0   88  145  274  402
      residue:       S    G    E    K
      y-ions:  420  333  276  147    0
      (Figure labels: S 88, G 145, E 274, K 402.)

  24. • Separating b- and y-ions is solved using a combinatorial formulation (forbidden pairs). • Separating b,y from all others is solved using a statistical approach. • Together, they form the basis for a de novo sequencer.

  25. De Novo Interpretation: Example • Ion offsets: b = P+1, y = S+19 = M-P+19
      b-ions:    0   88  145  274  402
      residue:       S    G    E    K
      y-ions:  420  333  276  147    0
      (Figure: spectrum with peaks labeled b1, b2, y1, y2; x-axis: m/z, 100 to 500.)

  26. Computing possible prefixes • We know the parent mass M=401. • Consider a mass value 88. • Assume that it is a b-ion, or a y-ion. • If b-ion, it corresponds to a prefix of the peptide with residue mass 88-1 = 87. • If y-ion, y = M-P+19. Therefore the prefix has mass P = M-y+19 = 401-88+19 = 332. • Compute all possible Prefix Residue Masses (PRM) for all ions.

  27. Putative Prefix Masses • Only a subset of the prefix masses are correct. • The correct mass values form a ladder of amino-acid residues.
      Prefix masses for M = 401:
      peak   PRM if b   PRM if y
        88       87        332
       145      144        275
       147      146        273
       276      275        144
      Correct ladder: 0 -S- 87 -G- 144 -E- 273 -K- 401
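The table on slide 27 follows mechanically from the two offsets. A minimal sketch, with `putative_prms` as an illustrative name and integer masses assumed as in the slides:

```python
def putative_prms(peaks, parent_mass):
    """Map each peak to its two candidate prefix residue masses (PRMs).

    If the peak is a b-ion, P = peak - 1.
    If the peak is a y-ion, P = M - peak + 19.
    """
    return {p: (p - 1, parent_mass - p + 19) for p in peaks}

table = putative_prms([88, 145, 147, 276], parent_mass=401)
for peak, (as_b, as_y) in table.items():
    print(f"peak {peak}: PRM {as_b} if b-ion, PRM {as_y} if y-ion")
```

Each peak contributes two candidates, but at most one of them can be correct, which is what later motivates the forbidden-pairs formulation.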

  28. Spectral Graph • Each prefix residue mass (PRM) corresponds to a node. • Two nodes are connected by an edge if the mass difference is a residue mass. • (Example edge: 87 to 144, labeled G.)

  29. Spectral Graph • (Figure: nodes 0, 87, 144, 146, 273, 275, 332, 401 on a mass axis, with the path S, G, E, K.) • Each peak, when assigned to a prefix/suffix ion type, generates a unique prefix residue mass. • Spectral graph: each node u defines a putative prefix residue mass M(u); (u,v) is in E if M(v)-M(u) is the residue mass of an a.a. (tag) or 0. • Paths in the spectral graph correspond to an interpretation.
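The spectral graph described on slides 28 and 29 can be sketched directly from the PRM list. The block below assumes nominal residue masses (so L/I and K/Q share a label) and merges duplicate PRMs into one node rather than adding zero-weight edges; `spectral_graph` and `RESIDUE_BY_MASS` are illustrative names.

```python
# Nominal residue mass -> label (assumption: integer masses; ambiguous pairs merged).
RESIDUE_BY_MASS = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "L/I", 114: "N", 115: "D", 128: "K/Q", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}

def spectral_graph(prms):
    """Return (nodes, edges); an edge (u, v, label) joins PRMs one residue apart."""
    nodes = sorted(set(prms))
    edges = [
        (u, v, RESIDUE_BY_MASS[v - u])
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if (v - u) in RESIDUE_BY_MASS
    ]
    return nodes, edges

# PRMs from the slide 27 table, plus the endpoints 0 and M = 401.
nodes, edges = spectral_graph([0, 87, 332, 144, 275, 146, 273, 275, 144, 401])
for u, v, label in edges:
    print(f"{u} -> {v}  ({label})")
```

The true ladder 0, 87, 144, 273, 401 appears as one path, alongside spurious edges created by the incorrect PRMs.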

  30. Re-defining de novo interpretation • (Figure: nodes 0, 87, 144, 146, 273, 275, 332, 401 on a mass axis, with the path S, G, E, K; example edge 87 to 144 labeled G.) • Find a subset of nodes in the spectral graph s.t. • 0, M are included • Each peak contributes at most one node (interpretation) (*) • Each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass) • An appropriate objective function (ex: the number of peaks interpreted) is maximized.

  31. Two problems • (Figure: nodes 0, 87, 144, 146, 273, 275, 332, 401 on a mass axis, with the path S, G, E, K.) • Too many nodes. • A. Only a small fraction correspond to b/y ions (leading to true PRMs). • B. Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem). • In general, the forbidden pairs problem is NP-hard.

  32. However,.. • The b,y ions have a special non-interleaving property. • Consider pairs (b1,y1), (b2,y2). • Note that b1+y1 = b2+y2. • If (b1 < b2), then y1 > y2.

  33. Non-Intersecting Forbidden pairs • (Figure: mass axis from 0 to 400 with nodes 87 and 332 marked, and the path S, G, E, K.) • If we consider only b,y ions, ‘forbidden’ node pairs are non-intersecting. • The de novo problem can be solved efficiently using a dynamic programming technique.

  34. The forbidden pairs method • There may be many paths that avoid forbidden pairs. • We choose a path that maximizes an objective function, e.g. the number of peaks interpreted. • Here we assume a scoring function that assigns a score to each PRM. The score captures the likelihood that the PRM is correct.

  35. The forbidden pairs method • (Figure: mass axis from 0 to 400 with nodes 87 and 332 marked, and a node u with its forbidden pair f(u).) • Sort the PRMs according to increasing mass values. • For each node u, f(u) represents its forbidden pair. • Let m(u) denote the mass value of the PRM.

  36. D.P. for forbidden pairs • (Figure: mass axis from 0 to 400 with nodes 87 and 332 marked, and nodes u and v.) • Consider all pairs u,v with m[u] <= M/2, m[v] > M/2. • Define S(u,v) as the best score of a forbidden pair path from 0->u, v->M. • Is it sufficient to compute S(u,v) for all u,v?

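  37. D.P. for forbidden pairs • (Figure: mass axis from 0 to 400 with nodes 87 and 332 marked, and nodes u and v.) • Note that the best interpretation is given by the maximum of S(u,v) over all pairs u,v with (u,v) in E.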

  38. D.P. for forbidden pairs • (Figure: mass axis from 0 to 400 with nodes u, f(u), v, w.) • Denote the forbidden pair of node v by f(v). • What is f(f(v))? • Note that we have one of two cases: • Either u < f(v) (and f(u) > v) • Or, u > f(v) (and f(u) < v) • Case 1: extend v, do not touch f(u).

  39. The complete algorithm
      for all u /* increasing mass values from 0 to M/2 */
        for all v /* decreasing mass values from M to M/2 */
          if (u > f[v])
          else if (u < f[v])
          if (u,v) ∈ E /* maxI is the score of the best interpretation */
            maxI = max {maxI, S[u,v]}
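The update rules inside the two branches did not survive in the transcript, so the following is one possible reading of the dynamic program on slides 36 to 39, not the author's exact recurrence. It is a sketch under several assumptions: nominal residue masses, every PRM scores 1 (the "number of peaks interpreted" style objective), M > 18, and the two PRMs of one peak (p - 1 and M - p + 19) sum to M + 18, so the split point is (M + 18)/2 rather than M/2. The state S[(u, v)] holds the best pair of partial paths 0..u and v..M; whichever of u and v lies closer to the middle is treated as the node added last.

```python
# Nominal residue masses; I/L (113) and K/Q (128) are indistinguishable here.
RESIDUE_MASSES = {57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
                  128, 129, 131, 137, 147, 156, 163, 186}

def best_forbidden_pair_path(peaks, M):
    """Return (score, PRM path) for the best path from 0 to M that uses
    at most one of the two PRMs generated by each peak, or None."""
    total = M + 18  # b-PRM (p - 1) plus y-PRM (M - p + 19) of one peak
    prms = {p - 1 for p in peaks} | {M - p + 19 for p in peaks}
    nodes = {0, M} | {x for x in prms if 0 < x < M}
    left = sorted(x for x in nodes if 2 * x <= total)                  # increasing mass
    right = sorted((x for x in nodes if 2 * x > total), reverse=True)  # decreasing mass

    def is_edge(a, b):
        return (b - a) in RESIDUE_MASSES

    # S[(u, v)] = (score, left path 0..u, right path v..M); one point per PRM used.
    S = {(0, M): (0, (0,), (M,))}
    for u in left:
        for v in right:
            if (u, v) == (0, M) or u == total - v:  # start state, or u and v forbidden
                continue
            if u > total - v:   # u is innermost: it was added last, on the left path
                cands = [(S[(x, v)][0] + 1, S[(x, v)][1] + (u,), S[(x, v)][2])
                         for x in left if x < u and is_edge(x, u) and (x, v) in S]
            else:               # v is innermost: it was added last, on the right path
                cands = [(S[(u, y)][0] + 1, S[(u, y)][1], (v,) + S[(u, y)][2])
                         for y in right if y > v and is_edge(v, y) and (u, y) in S]
            if cands:
                S[(u, v)] = max(cands)

    # Join the two half-paths wherever u and v are themselves one residue apart.
    finals = [value for (u, v), value in S.items() if is_edge(u, v)]
    if not finals:
        return None
    score, left_path, right_path = max(finals)
    return score, list(left_path + right_path)

result = best_forbidden_pair_path([88, 145, 147, 276], M=401)
print(result)
```

On the four-peak example from slide 27 this recovers the ladder 0, 87, 144, 273, 401, that is S, G, E, K (or Q at nominal resolution). Because forbidden pairs are nested, the partner of the innermost node can never already be on either partial path, which is what keeps the table two-dimensional.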

  40. De Novo: Second issue • Given only b,y ions, a forbidden pairs path will solve the problem. • However, recall that there are MANY other ion types. • Typical length of peptide: 15 • Typical # peaks? 50-150? • #b/y ions? • Most ions are “Other”: a ions, neutral losses, isotopic peaks….

  41. De novo: Weighting nodes in Spectrum Graph • Factors determining if the ion is b or y: • Intensity • Support ions • b- and y-ions are the most likely ions to lose water/ammonia • Isotopic peaks

  42. Offset frequency function • b- and y-ions show offsets due to neutral losses.

  43. De novo: Weighting nodes • A probabilistic network to model support ions (Pepnovo)

  44. De Novo Interpretation Summary • The main challenge is to separate b/y ions from everything else (weighting nodes), and to separate the prefix ions from the suffix ions (Forbidden Pairs). • As always, the abstract idea must be supplemented with many details: noise peaks, incomplete fragmentation. • A PRM is first scored on its likelihood of being correct, and the forbidden pair method is applied subsequently.

  45. Db search versus de novo interpretation • (Figure labels: Db, 55M peptides; Filter; Score; Validation; De novo) • Traditional db search simply has the scoring module. • De novo is useful when the peptide is not in the database, but is not as accurate. It can be thought of as a database search over a much larger database. • PT modifications change the picture.

  46. Filtering • (Figure labels: Db, 55M peptides; Filter; Candidate Peptides (700); extension; Score; Validation; De novo) • Db indexing/filtering is a key mechanism for reducing the search space.

  47. Filtering • Define a filter as a computational tool that rapidly screens a database, removing much of it but retaining the true peptide. • Can you suggest commonly used filters? • Parent mass • Trypsin digested peptides

  48. Parent Mass filter • Sort all peptides in the database by their parent mass. • Search only the peptides that are within some mass tolerance. • The filter does not work when you have modifications.
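The parent mass filter on slide 48 amounts to a sort followed by a range query. A minimal sketch using Python's standard `bisect` module, with nominal residue masses and a toy peptide list as assumptions; `build_mass_index` and `candidates` are illustrative names.

```python
import bisect

# Nominal residue masses in Da (assumption: integer rounding).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

def build_mass_index(peptides):
    """Sort peptides by parent mass once, up front."""
    entries = sorted((sum(RESIDUE_MASS[aa] for aa in p), p) for p in peptides)
    masses = [mass for mass, _ in entries]
    return masses, entries

def candidates(index, parent_mass, tolerance):
    """Return peptides whose mass lies within parent_mass +/- tolerance."""
    masses, entries = index
    lo = bisect.bisect_left(masses, parent_mass - tolerance)
    hi = bisect.bisect_right(masses, parent_mass + tolerance)
    return [peptide for _, peptide in entries[lo:hi]]

index = build_mass_index(["SGEK", "AAAA", "KEGS", "SGFLEEDELK", "GSEK"])
hits = candidates(index, parent_mass=401, tolerance=1)
print(hits)
```

Each query costs two binary searches plus the size of the answer. A modified peptide has a shifted parent mass, so it falls outside the window unless the query is repeated for each allowed mass shift.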

  49. The dynamic nature of the proteome • The proteome of the cell is changing. • Various extra-cellular and other signals activate pathways of proteins. • A key mechanism of protein activation is PT modification. • These pathways may lead to other genes being switched on or off. • Mass Spectrometry is key to probing the proteome.

  50. Db search for putatively modified peptides • Ex: YFDSTDYNMAK (candidate modifications: oxidation, phosphorylation?) • 2^5 = 32 possibilities, with 2 types of modifications! • For each peptide, generate all mods. Score each modification. • In contrast, the de novo search space does not change significantly. • Is parent mass still a good filter?
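The count of 32 on slide 50 can be reproduced by enumerating the modifiable sites. The sketch below assumes phosphorylation on S, T, and Y (nominal shift +80) and oxidation on M (nominal shift +16); the slide names the two modification types but not the site rules, so the `MODS` table is an assumption.

```python
from itertools import product

# Assumed site rules and nominal mass shifts (Da).
MODS = {"S": 80, "T": 80, "Y": 80, "M": 16}  # phosphorylation: S/T/Y, oxidation: M

def modified_variants(peptide):
    """Yield (modified positions, total mass shift) for every on/off combination."""
    sites = [i for i, aa in enumerate(peptide) if aa in MODS]
    for choice in product((False, True), repeat=len(sites)):
        chosen = [s for s, on in zip(sites, choice) if on]
        yield chosen, sum(MODS[peptide[s]] for s in chosen)

variants = list(modified_variants("YFDSTDYNMAK"))
shifts = sorted({shift for _, shift in variants})
print(len(variants), "variants;", len(shifts), "distinct parent-mass shifts")
```

YFDSTDYNMAK has five candidate sites (Y, S, T, Y, M), giving 2^5 = 32 variants, but only ten distinct parent-mass shifts. That gap is one way to see why parent mass alone becomes a weaker filter once modifications are allowed.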
