
Gaurav Chadha Deepak Desore



Presentation Transcript


  1. Inferring Functional Information from Domain Co-evolution • Yohan Kim, Mehmet Koyuturk, Umut Topkara, Ananth Grama and Shankar Subramaniam • Gaurav Chadha, Deepak Desore

  2. Layout • Motivation • Computational Methods and Algorithms • Results • Conclusion • Questions

  3. Motivation (1 of 2) • Prior work • Focused on understanding protein function at the level of entire protein sequences • Assumption: the complete sequence follows a single evolutionary trajectory • It is well known that a domain can occur in various contexts, which invalidates this assumption for multi-domain protein sequences

  4. Motivation (2 of 2) • Our approach • Improves on the multiple-profile method • Constructs a co-evolutionary matrix to assign phylogenetic similarity scores to each protein pair • Identifies co-evolving regions using residue-level conservation

  5. Computational Methods & Algorithms • Constructing phylogenetic profiles • Protein (single) phylogenetic profiles • Segment (multiple) phylogenetic profiles • Residue phylogenetic profiles • Computing co-evolutionary matrices • Deriving phylogenetic similarity scores

  6. Protein phylogenetic profiles • A phylogenetic profile is a vector recording whether a protein is present in each genome • Let $P = \{P_1, P_2, \ldots, P_n\}$ be the set of proteins and $G = \{G_1, G_2, \ldots, G_m\}$ the set of genomes • In the protein-by-genome matrix, each row is the binary phylogenetic profile of one protein
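A minimal sketch of the binary profile matrix, assuming presence/absence calls per genome have already been made (for example by a BLAST cutoff, which the slide does not specify). The function name `binary_profiles` and the toy presence calls are hypothetical, not data from the paper.

```python
from typing import Dict, List, Set

def binary_profiles(proteins: List[str], genomes: List[str],
                    present_in: Dict[str, Set[str]]) -> List[List[int]]:
    """Return an n x m 0/1 matrix: row i is the profile of proteins[i],
    entry [i][j] is 1 if proteins[i] was detected in genomes[j]."""
    return [[1 if g in present_in.get(p, set()) else 0 for g in genomes]
            for p in proteins]

proteins = ["P1", "P2", "P3"]
genomes = ["G1", "G2", "G3", "G4"]
# Hypothetical presence calls, for illustration only.
present_in = {"P1": {"G1", "G3"}, "P2": {"G1", "G2", "G3"}, "P3": {"G1", "G3"}}
matrix = binary_profiles(proteins, genomes, present_in)
for p, row in zip(proteins, matrix):
    print(p, row)
```

Here P1 and P3 have identical rows, which is the kind of pattern profile-based methods treat as evidence of a functional link.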

  7. Protein phylogenetic profiles (contd.) • The single phylogenetic profile $\psi_i$ of protein $P_i$ is $\psi_i(j) = -\frac{1}{\log E_{ij}}$, $1 \le j \le m$, where $E_{ij}$ is the minimum BLAST E-value of the local alignments between $P_i$ and $G_j$ • Advantage: unlike a binary profile, it reflects the degree of sequence divergence
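A minimal sketch of the real-valued single profile $\psi_i(j) = -1/\log E_{ij}$. It assumes every minimum E-value lies strictly between 0 and 1; how the paper handles $E = 0$, $E \ge 1$, or genomes with no hit is not stated on the slide, so the sketch simply rejects those inputs. The function name `single_profile` is hypothetical.

```python
import math
from typing import List, Sequence

def single_profile(min_evalues: Sequence[float]) -> List[float]:
    """psi_i(j) = -1 / log(E_ij) for each genome j (natural log).

    Assumption: every E-value is in (0, 1); other values raise ValueError
    because the slide does not define them."""
    profile = []
    for e in min_evalues:
        if not 0.0 < e < 1.0:
            raise ValueError(f"E-value {e} outside (0, 1)")
        profile.append(-1.0 / math.log(e))
    return profile

# Hypothetical minimum E-values of one protein against three genomes.
print(single_profile([1e-50, 1e-10, 0.01]))
```

Because $-1/\log E$ increases monotonically with $E$ on $(0, 1)$, the minimum E-value and the minimum profile value pick out the same alignment.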

  8. Protein phylogenetic profiles (contd.) • Mutual information is defined as $I(X,Y) = H(X) + H(Y) - H(X,Y)$, where the Shannon entropy of $X$ is $H(X) = -\sum_{x \in X} p_x \log p_x$ with $p_x = P[X = x]$ • The phylogenetic similarity between $P_i$ and $P_j$ is $\mu_S(P_i,P_j) = I(\psi_i, \psi_j)$
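A minimal sketch of $I(X,Y) = H(X) + H(Y) - H(X,Y)$ estimated from empirical frequencies, with entries of a profile vector treated as samples over genomes. Real-valued profiles must be discretized first; the equal-width binning in `discretize`, the bin count, and the base-2 logarithm are assumptions, since the slide does not say how the paper estimates the probabilities.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) = -sum p_x log2 p_x, from empirical frequencies."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X,Y) = H(X) + H(Y) - H(X,Y), in bits."""
    if len(xs) != len(ys):
        raise ValueError("profiles must have one entry per genome")
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def discretize(values: Sequence[float], n_bins: int, lo: float, hi: float) -> List[int]:
    """Map real-valued profile entries to equal-width bin indices 0..n_bins-1."""
    width = (hi - lo) / n_bins
    return [max(0, min(int((v - lo) / width), n_bins - 1)) for v in values]

# Hypothetical real-valued profiles of two proteins over four genomes.
psi_i = [0.01, 0.21, 0.02, 0.25]
psi_j = [0.02, 0.22, 0.01, 0.30]
mu_S = mutual_information(discretize(psi_i, 2, 0.0, 0.4),
                          discretize(psi_j, 2, 0.0, 0.4))
print(mu_S)
```

Identical patterns give the full entropy of the profile, while independent patterns give zero, which is why high $\mu_S$ is read as shared evolutionary history.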

  9. Segment phylogenetic profiles • Single-profile methods can miss significant interactions • In the example, domain D12 of P2 follows an evolutionary trajectory similar to those of P1 and P3, which the single-profile method does not capture

  10. Segment phylogenetic profiles (contd.) • Each protein $P_i$ is divided into fixed-size segments $S_i^1, S_i^2, \ldots, S_i^k$ • The phylogenetic similarity between two proteins is $\mu_M(P_i,P_j) = \max_{s,t} I(\psi_i^s, \psi_j^t)$, where $\psi_i^s$ is the phylogenetic profile of segment $S_i^s$ of protein $P_i$
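A minimal sketch of the multiple-profile score $\mu_M$: split each protein into fixed-size segments, then take the maximum mutual information over all segment pairs. The segment profiles are taken as given and already discretized; the segment size, the helper names `fixed_segments` and `mu_multiple`, and the toy profiles are all hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Hashable, List, Sequence, Tuple

def _entropy(xs: Sequence[Hashable]) -> float:
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    return _entropy(xs) + _entropy(ys) - _entropy(list(zip(xs, ys)))

def fixed_segments(length: int, size: int) -> List[Tuple[int, int]]:
    """Split residues 1..length into consecutive segments of `size` residues
    (1-based, inclusive); the last segment may be shorter."""
    return [(start, min(start + size - 1, length))
            for start in range(1, length + 1, size)]

def mu_multiple(seg_profiles_i: Sequence[Sequence[Hashable]],
                seg_profiles_j: Sequence[Sequence[Hashable]]) -> float:
    """mu_M(P_i, P_j) = max over segment pairs (s, t) of I(psi_i^s, psi_j^t)."""
    return max(mutual_information(a, b)
               for a, b in product(seg_profiles_i, seg_profiles_j))

print(fixed_segments(10, 4))
# Hypothetical discretized segment profiles over four genomes.
print(mu_multiple([[0, 0, 1, 1], [0, 1, 0, 1]], [[1, 0, 1, 0]]))
```

The second segment of the first protein fully determines the other profile, so it alone sets the score; this is how a single co-evolving domain can surface even when the whole-protein profiles disagree.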

  11. Residue phylogenetic profiles • Problem with multiple phylogenetic profiles: • In the example, segment $S_2^2$ covers both domains, so its profile overrides their individual phylogenetic profiles • A significant local alignment is evidence about the residues covered by the alignment, not about the whole sequences

  12. Residue phylogenetic profiles (contd.) • $\mathcal{A}(P_i,G_j)$: the set of significant local alignments between protein $P_i$ and genome $G_j$ • $T(A) = [r_b, r_e]$: the interval of residues on $P_i$ covered by each alignment $A \in \mathcal{A}(P_i,G_j)$ • For each residue $r$ on $P_i$, the phylogenetic profile is $\psi_i^r(j) = \min_{A \in \mathcal{A}_r} -\frac{1}{\log E(A)}$, $1 \le j \le m$, where $\mathcal{A}_r = \{A \in \mathcal{A}(P_i,G_j) : r \in T(A)\}$ is the set of local alignments that contain $r$
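A minimal sketch of the residue-level profile: for each genome, every alignment contributes $-1/\log E(A)$ to the residues in its interval $T(A)$, and each residue keeps the minimum. Two assumptions are not from the slide: E-values lie strictly in $(0, 1)$, and a residue with an empty $\mathcal{A}_r$ is marked `None`, since the value the paper uses there is not stated. The function name `residue_profiles` and the toy alignments are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One significant local alignment A of protein P_i against genome G_j:
# (start, end, e_value), where [start, end] is T(A) on P_i (1-based, inclusive).
Alignment = Tuple[int, int, float]

def residue_profiles(length: int,
                     alignments_per_genome: Sequence[Sequence[Alignment]]
                     ) -> List[List[Optional[float]]]:
    """profiles[r - 1][j] = min over A in A_r of -1 / log(E(A)).

    Assumptions: E-values lie in (0, 1); residues covered by no alignment
    in genome j (empty A_r) are marked None."""
    m = len(alignments_per_genome)
    profiles: List[List[Optional[float]]] = [[None] * m for _ in range(length)]
    for j, alignments in enumerate(alignments_per_genome):
        for start, end, e_value in alignments:
            score = -1.0 / math.log(e_value)
            # Every residue in T(A) = [start, end] has A in its set A_r.
            for r in range(start, end + 1):
                current = profiles[r - 1][j]
                if current is None or score < current:
                    profiles[r - 1][j] = score
    return profiles

# Hypothetical toy input: a 5-residue protein, two genomes, no hit in the second.
toy = residue_profiles(5, [[(1, 3, 1e-10), (2, 5, 0.01)], []])
for r, row in enumerate(toy, start=1):
    print(r, row)
```

Residues 2 and 3 are covered by both alignments and take the more significant one, while residues 4 and 5 see only the weaker alignment, so neighbouring regions of the same protein can end up with different profiles.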

  13. Computing co-evolutionary matrices • For each protein pair $P_i$ and $P_j$ with lengths $l_i$ and $l_j$, the co-evolutionary matrix entry is $M_{ij}(r,s) = I(\psi_i^r, \psi_j^s)$, where $1 \le r \le l_i$ and $1 \le s \le l_j$ • The co-evolutionary matrix contains • information about which regions of the two proteins co-evolved • co-evolved domains appear as blocks of high mutual information scores in the matrix
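A minimal sketch of building $M_{ij}$ from residue profiles that have already been discretized. It computes every entry directly and does not apply the down-sampling factor mentioned in the Results; the function name `coevolution_matrix` and the toy profiles are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

def _entropy(xs: Sequence[Hashable]) -> float:
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    return _entropy(xs) + _entropy(ys) - _entropy(list(zip(xs, ys)))

def coevolution_matrix(res_profiles_i: Sequence[Sequence[Hashable]],
                       res_profiles_j: Sequence[Sequence[Hashable]]
                       ) -> List[List[float]]:
    """M_ij(r, s) = I(psi_i^r, psi_j^s) for residue r of P_i and s of P_j,
    stored 0-based: row r-1, column s-1."""
    return [[mutual_information(pr, ps) for ps in res_profiles_j]
            for pr in res_profiles_i]

# Hypothetical discretized residue profiles over four genomes.
P_i = [[0, 0, 1, 1], [0, 1, 0, 1]]
P_j = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
for row in coevolution_matrix(P_i, P_j):
    print([round(v, 3) for v in row])
```

The work grows with $l_i \cdot l_j$ times the number of genomes, so a full-resolution matrix is expensive for long proteins.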

  14. Deriving phylogenetic similarity scores • The phylogenetic similarity score between two proteins $P_i$ and $P_j$ is $\mu_C(P_i,P_j) = \max_{1 \le r \le l_i,\ 1 \le s \le l_j} \ \min_{r \le a \le r+W,\ s \le b \le s+W} M_{ij}(a,b)$, where $W$ is the window parameter that sets the minimum size of a region on a protein to be considered a conserved domain
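A minimal sketch of the max-min score $\mu_C$ over a matrix stored 0-based. Following the formula, each window spans $r \le a \le r+W$ and $s \le b \le s+W$; as an assumption, only windows that lie fully inside the matrix are scanned, since the slide does not say how edges are handled. The function name `mu_coevolution` is hypothetical.

```python
from typing import Sequence

def mu_coevolution(M: Sequence[Sequence[float]], W: int) -> float:
    """mu_C = max over (r, s) of min M[a][b] for r <= a <= r+W, s <= b <= s+W.

    Assumption: only (W+1) x (W+1) windows fully inside M are considered."""
    li, lj = len(M), len(M[0])
    if W < 0 or W >= li or W >= lj:
        raise ValueError("window must fit inside the matrix")
    best = float("-inf")
    for r in range(li - W):
        for s in range(lj - W):
            # Smallest entry in the window: high only if the whole block is high.
            block_min = min(M[a][b]
                            for a in range(r, r + W + 1)
                            for b in range(s, s + W + 1))
            best = max(best, block_min)
    return best

# Hypothetical matrix with one uniformly high 2 x 2 block in the top-left corner.
M = [[0.9, 0.8, 0.1],
     [0.7, 0.95, 0.1],
     [0.1, 0.1, 0.1]]
print(mu_coevolution(M, 1))
```

Taking the minimum inside each window means a single isolated high entry cannot dominate; the score is high only when a contiguous block, the signature of a co-evolved domain pair, is high throughout.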

  15. Results • Implemented and tested on 4311 E. coli proteins • 152 genomes (131 Bacteria, 17 Archaea, 4 Eukaryota) • Down-sampling factor f = 30, W = 2 • These values translate into overlapping segments 60 residues long • Homologous proteins were excluded from the analysis • The p-value is defined as a fraction of the non-homologous protein pairs (N)

  16. Results (contd.) • MIS: mutual information score • PP: number of predicted protein pairs • PPV (positive predictive value) = TP / (TP + FP) • For all μ*, coverage = TP + FP • TN and FN count the protein pairs whose scores do not meet the threshold
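A minimal sketch of the evaluation counts at one score threshold. It assumes each protein pair carries a label saying whether it is a known interacting pair (the reference set is not described on the slide), that pairs scoring at or above the threshold are the predictions, and that sensitivity uses the standard definition TP / (TP + FN). The function name `threshold_metrics` and the toy scores are hypothetical.

```python
from typing import Dict, Sequence, Tuple

def threshold_metrics(scored_pairs: Sequence[Tuple[float, bool]],
                      threshold: float) -> Dict[str, float]:
    """scored_pairs: (similarity score, is_known_interacting) per protein pair.
    Pairs scoring >= threshold are predicted; the rest are TN or FN."""
    tp = sum(1 for s, y in scored_pairs if s >= threshold and y)
    fp = sum(1 for s, y in scored_pairs if s >= threshold and not y)
    fn = sum(1 for s, y in scored_pairs if s < threshold and y)
    tn = sum(1 for s, y in scored_pairs if s < threshold and not y)
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "coverage": tp + fp,
        "PPV": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
    }

# Hypothetical scores and labels for five protein pairs.
pairs = [(0.9, True), (0.8, False), (0.7, True), (0.2, True), (0.1, False)]
print(threshold_metrics(pairs, 0.5))
```

Sweeping the threshold traces out the PPV-versus-coverage trade-off used to compare the single-profile and co-evolutionary-matrix methods.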

  17. Results (contd.) • At PPV = 0.7, the co-evolutionary matrix achieves 1.5 times the coverage of the single-profile method • For the same number of predicted pairs, the co-evolutionary matrix has higher PPV and sensitivity than the single-profile method

  18. Results (contd.): Mutual information score distribution for interacting and non-interacting protein pairs • At MIS = 0, SP (single profile) shows a peak while CM (co-evolutionary matrix) does not; in other words, at low MIS scores the SP distribution lies above the CM distribution

  19. Results (contd.) • p-values of the single-profile method vs. the co-evolutionary matrix method • The widely scattered circles show that the two methods can make very different predictions

  20. Results (contd.) – Phosphotransferase system • Domain IIA (residues 1-170) and domain IIB (residues 170-320) • The darker region shows where the domains co-evolved, so we can conclude that IIB co-evolved with IIC rather than with IIA • Top-20 predicted interacting partners of protein IIAB for both methods

  21. Results (contd.) – Chemotaxis • The N-terminus (residues 1-200) and C-terminus (residues 540-670) of CheA co-evolved with the C-terminal region of CheB (residues 170-340) • Top-20 predicted interacting partners of protein CheA using both methods

  22. Results (contd.) – Kdp System • N-terminal domain of KdpD (residues 1-395) co-evolved with KdpC • Top-10 predicted interacting partners of protein KdpD using both methods

  23. Conclusion • Results in this paper strongly suggest that co-evolution of proteins should be captured at the domain level • because domains with conflicting evolutionary histories can co-exist in a single protein sequence • The method can detect regions that are important for supporting both functional and physical interactions between proteins

  24. Questions Thank You !!
