
Storytelling and Clustering for Cellular Signaling Pathways

M. Shahriar Hossain, Monika Akbar, Nicholas F. Polys. Department of Computer Science, Virginia Tech, Blacksburg, VA 24061.


Presentation Transcript


  1. Storytelling and Clustering for Cellular Signaling Pathways M. Shahriar Hossain, Monika Akbar, Nicholas F. Polys Department of Computer Science, Virginia Tech, Blacksburg, VA 24061.

  2. Objective • STKE Dataset • Cell interactions through chemical signals • Discover relationships between the pathways • Graph structure • Subgraph discovery problem • Pathway relationships • Clustering • Storytelling

  3. Myocyte Adrenergic Pathway (CMP_9043)

  4. Dataset properties

  5. Design Pipeline [diagram] Components: STKE Dataset, Preprocessor, Pathway Graphs, Frequent Subgraph Discovery, Frequent Subgraphs, Clustering, NN, Storytelling

  6. Subsequent Candidate Generation [figure: two k-edge subgraphs over nodes l, m, n, o, p, q joined into a (k+1)-edge candidate] • Apriori – incremental approach [17] • FSG [2] • Generate a (k+1)-edge candidate subgraph by combining two k-edge subgraphs that share a common core subgraph of (k-1) edges. • The cost of comparing subgraphs (and core subgraphs) is reduced by using a hash code for each subgraph object.
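The join step described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes node labels are unique (so two subgraphs can be compared as plain edge sets, with no isomorphism test), and the names `edge`, `is_connected`, and `join_candidates` are hypothetical. Python's hashable `frozenset` stands in for the per-subgraph hash code mentioned above.

```python
from itertools import combinations


def edge(u, v):
    """Undirected edge as a canonical (sorted) tuple of node labels."""
    return tuple(sorted((u, v)))


def is_connected(edges):
    """Return True if the edge set forms a single connected component."""
    edges = list(edges)
    if not edges:
        return True
    seen = set(edges[0])
    remaining = edges[1:]
    changed = True
    while changed and remaining:
        changed = False
        for e in list(remaining):
            if seen & set(e):
                seen |= set(e)
                remaining.remove(e)
                changed = True
    return not remaining


def join_candidates(k_subgraphs):
    """Join every pair of k-edge subgraphs sharing a (k-1)-edge core.

    Returns the set of (k+1)-edge candidates and the number of pairwise
    comparisons performed.
    """
    candidates = set()
    pairs_compared = 0
    for a, b in combinations(k_subgraphs, 2):
        pairs_compared += 1
        core = a & b  # edges common to both subgraphs
        if len(a) == len(b) and len(core) == len(a) - 1:
            merged = a | b
            if is_connected(merged):
                candidates.add(merged)
    return candidates, pairs_compared


# Toy 2-edge subgraphs (hypothetical node labels).
s1 = frozenset({edge("l", "p"), edge("p", "m")})
s2 = frozenset({edge("p", "m"), edge("m", "n")})
s3 = frozenset({edge("p", "m"), edge("m", "o")})
candidates, pairs_compared = join_candidates([s1, s2, s3])
```

Every pair of subgraphs is compared here, which is the cost that grows quickly as the number of k-edge subgraphs increases.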

  7. Subsequent Candidate Generation [figure: candidate subgraphs produced by pairwise joins; some candidates are marked "Not generated"] • Instance: • Number of 5-edge subgraphs: 21 • Core subgraph comparisons for s1: 20

  8. Master Pathway Graph (MPG) • Total unique nodes: 1205 • Total relations: 1376

  9. SEG - Subgraph Extension Generation [figure: a subgraph over nodes l, m, n, o, p extended with neighboring nodes q, r, s] • Neighborhood Extension • Neighborhood list: {q, r, s} • Comparison is not required. • Subgraph is extended from physical evidence
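A minimal sketch of neighborhood extension, under the assumption that the Master Pathway Graph is available as an adjacency map and that a subgraph is a set of edges over uniquely labelled nodes. The function names and the toy graph are hypothetical and are not taken from the slides. Each extension adds one edge that actually exists in the MPG, so no pairwise subgraph comparison is involved.

```python
def edge(u, v):
    """Undirected edge as a canonical (sorted) tuple of node labels."""
    return tuple(sorted((u, v)))


def build_adjacency(edges):
    """Build an adjacency map {node: set(neighbors)} from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def neighborhood(subgraph, mpg_adjacency):
    """Nodes adjacent to the subgraph in the MPG but not yet part of it."""
    nodes = {n for e in subgraph for n in e}
    return {v for u in nodes for v in mpg_adjacency.get(u, ())} - nodes


def extend_subgraph(subgraph, mpg_adjacency):
    """Return all subgraphs obtained by adding one MPG edge to `subgraph`."""
    nodes = {n for e in subgraph for n in e}
    extensions = set()
    for u in nodes:
        for v in mpg_adjacency.get(u, ()):
            new_edge = edge(u, v)
            if new_edge not in subgraph:
                extensions.add(subgraph | {new_edge})
    return extensions


# Toy master graph (hypothetical): a 4-edge subgraph plus three outside neighbors.
mpg = build_adjacency([
    edge("l", "p"), edge("p", "m"), edge("m", "o"), edge("m", "n"),
    edge("l", "q"), edge("p", "r"), edge("o", "s"),
])
sub = frozenset({edge("l", "p"), edge("p", "m"), edge("m", "o"), edge("m", "n")})
neighbor_list = neighborhood(sub, mpg)
extensions = extend_subgraph(sub, mpg)
```

In this toy graph the neighborhood list is {q, r, s}, matching the shape of the slide's example, and each neighbor yields exactly one 5-edge extension.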

  10. Design Pipeline [diagram] Components: STKE Dataset, Preprocessor, Pathway Graphs, Frequent Subgraph Discovery, Frequent Subgraphs, Clustering, NN, Storytelling

  11. Subgraph Discovery • What is so novel about pruning edges? • min_sup = 2%

  12. ‘Importance Factor’ of a subgraph: sfipf • For the i-th subgraph and the j-th pathway: subgraph frequency and inverse pathway frequency (formulas shown on slide)
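The slide's formulas did not survive in the transcript. The name sfipf suggests an analogue of tf-idf, so the sketch below assumes that reading: subgraph frequency is the count of subgraph i in pathway j divided by the total subgraph count in pathway j, and inverse pathway frequency is the logarithm of the number of pathways divided by the number of pathways containing subgraph i. Both definitions, and the function name `sfipf`, are assumptions rather than the authors' stated formulas.

```python
import math


def sfipf(counts):
    """Compute tf-idf-style weights for subgraphs in pathways.

    counts[pathway][subgraph] = occurrences of the subgraph in the pathway.
    Returns scores[pathway][subgraph] = sf * ipf (assumed definitions).
    """
    n_pathways = len(counts)

    # Number of pathways that contain each subgraph at least once.
    pathway_freq = {}
    for row in counts.values():
        for sg, c in row.items():
            if c > 0:
                pathway_freq[sg] = pathway_freq.get(sg, 0) + 1

    scores = {}
    for pw, row in counts.items():
        total = sum(row.values())
        scores[pw] = {
            sg: (c / total) * math.log(n_pathways / pathway_freq[sg])
            for sg, c in row.items()
            if c > 0
        }
    return scores


# Toy counts (hypothetical): subgraph "A" occurs in both pathways, "B" in one.
toy_counts = {"p1": {"A": 1, "B": 1}, "p2": {"A": 1}}
toy_scores = sfipf(toy_counts)
```

Under these assumed definitions, a subgraph that appears in every pathway gets weight zero, while a subgraph confined to few pathways is weighted up.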

  13. Dataset Properties (sfipf) • Number of edges in MPG = 1376 • Total pathways = 50

  14. Subgraph Discovery

  15. Subgraph Discovery

  16. Subgraph Discovery • Overall attempts saved = 89.52% • Overall time saved = 99.39%

  17. Clustering • Hierarchical Agglomerative Clustering (HAC) • k-means • Unsupervised measure of clusters’ validity • Average Silhouette Coefficient (ASC) [19]
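The Average Silhouette Coefficient can be computed as in the sketch below. It uses the standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the silhouette is (b - a) / max(a, b). Euclidean distance over feature vectors and a silhouette of 0 for singleton clusters are assumptions made here; the slides do not say which distance the authors used.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_silhouette(points, labels, dist=euclidean):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: silhouette taken as 0
        # a: mean distance to the other points in the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): two well-separated groups of two points each.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
asc_good = average_silhouette(pts, [0, 0, 1, 1])
asc_bad = average_silhouette(pts, [0, 1, 0, 1])
```

A clustering that matches the two groups scores close to 1, while one that mixes them scores below 0, which is how the measure can compare HAC and k-means results without ground-truth labels.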

  18. Clustering

  19. Clustering

  20. Design Pipeline [diagram] Components: STKE Dataset, Preprocessor, Pathway Graphs, Frequent Subgraph Discovery, Frequent Subgraphs, Clustering, NN, Storytelling

  21. Pathway Relations (StoryTelling) [figure: start pathway S, target pathway T, and pathways p1, p2, p3, p7, p8, p9] • Bidirectional Search • Cover tree for NN
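A minimal sketch of bidirectional search for a story between two pathways, assuming each pathway is described by a set of subgraph identifiers and that two pathways may be chained when their Jaccard similarity reaches a threshold. The slides name a cover tree for nearest-neighbor lookup; this sketch swaps in a brute-force scan to stay self-contained. All function names, the similarity measure, and the threshold are hypothetical, and the search returns a valid chain without guaranteeing the shortest one.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbors(node, features, threshold):
    """Brute-force neighbor lookup (stand-in for a cover tree)."""
    return [
        other for other in features
        if other != node and jaccard(features[node], features[other]) >= threshold
    ]


def _expand(frontier, own_parents, other_parents, features, threshold):
    """Grow one search frontier by one step; report a meeting node if found."""
    next_frontier = []
    for node in frontier:
        for nb in neighbors(node, features, threshold):
            if nb in own_parents:
                continue
            own_parents[nb] = node
            if nb in other_parents:
                return next_frontier, nb
            next_frontier.append(nb)
    return next_frontier, None


def _build_path(meet, parents_f, parents_b):
    """Stitch the forward and backward half-paths together at `meet`."""
    path = []
    node = meet
    while node is not None:
        path.append(node)
        node = parents_f[node]
    path.reverse()
    node = parents_b[meet]
    while node is not None:
        path.append(node)
        node = parents_b[node]
    return path


def find_story(start, goal, features, threshold=0.4):
    """Return a chain of pathways from start to goal, or None if none exists."""
    if start == goal:
        return [start]
    parents_f, parents_b = {start: None}, {goal: None}
    frontier_f, frontier_b = [start], [goal]
    forward_turn = True
    while frontier_f and frontier_b:
        if forward_turn:
            frontier_f, meet = _expand(frontier_f, parents_f, parents_b, features, threshold)
        else:
            frontier_b, meet = _expand(frontier_b, parents_b, parents_f, features, threshold)
        if meet is not None:
            return _build_path(meet, parents_f, parents_b)
        forward_turn = not forward_turn
    return None


# Toy pathways (hypothetical): each is a set of subgraph identifiers.
toy_features = {
    "S": {"a", "b", "c"},
    "p1": {"b", "c", "d"},
    "p2": {"c", "d", "e"},
    "T": {"d", "e", "f"},
}
story = find_story("S", "T", toy_features, threshold=0.4)
no_story = find_story("S", "T", toy_features, threshold=0.9)
```

Alternating the two frontiers means each side explores roughly half the depth of the chain, which is the usual motivation for searching from both ends.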

  22. Day-to-day life example • From: Roman Holiday • To: Terminator 3

  23. Examples in STKE • http://people.cs.vt.edu/msh/infoviz/3/

  24. Pathway Relations (StoryTelling)

  25. Pathway Relations (StoryTelling)

  26. Pathway Relations (StoryTelling)

  27. Future Directions • Compare our SEG graph methods with text-based clustering and storytelling • Examine the costs and benefits of combining text and graph mining techniques

  28. References
      [1] Science Signaling, The Signal Transduction Knowledge Environment (STKE), "The Database of Cell Signaling", http://stke.sciencemag.org/cm/
      [2] Kuramochi, M. and Karypis, G., "An efficient algorithm for discovering frequent subgraphs", IEEE Transactions on KDE, Vol. 16(9), September 2004, pp. 1038-1051.
      [3] Breslin, T., Krogh, M., Peterson, C., and Troein, C., "Signal transduction pathway profiling of individual tumor samples", BMC Bioinformatics, June 29, 2005.
      [4] Kumar, D., Ramakrishnan, N., Helm, R. F., and Potts, M., "Algorithms for Storytelling", IEEE Transactions on KDE, Vol. 20(6), June 2008, pp. 736-751.
      [5] Ratprasartporn, N., Cakmak, A., and Ozsoyoglu, G., "On Data and Visualization Models for Signaling Pathways", 18th SSDBM, 2006, pp. 133-142.
      [6] Xu, X., and Yu, Y., "Modeling and Verifying WNT Signaling Pathway", 3rd Intl. Conf. on ICNC, 2007, Vol. 2, pp. 319-323.
      [7] Schreiber, F., "Comparison of metabolic pathways using constraint graph drawing", 1st Asia-Pacific bioinformatics Conf. on Bioinfo., Australia, Vol. 19, 2003, pp. 105-110.
      [8] Abello, J., van Ham, F., and Krishnan, N., "ASKGraphView: A Large Scale Graph Visualization System", IEEE Transactions on Visualization and Computer Graphics, Vol. 12(5), 2006, pp. 669-676.
      [9] Miyake, S., Tohsato, A., Takenaka, Y., and Matsuda, H., "A clustering method for comparative analysis between genomes and pathways", 8th Intl. Conf. on Database Systems for Advanced Applications, March 2003, pp. 327-334.

  29. References
      [10] Yan, X., and Han, J., "gSpan: graph-based substructure pattern mining", IEEE ICDM, 2002, pp. 721-724.
      [11] Moti, C., and Ehud, G., "Diagonally Subgraphs Pattern Mining", 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, 2004, pp. 51-58.
      [12] Ketkar, N., Holder, L., Cook, D., Shah, R., and Coble, J., "Subdue: Compression-based Frequent Pattern Discovery in Graph Data", ACM KDD Workshop on Open-Source Data Mining, August 2005, pp. 71-76.
      [13] Zhang, T., Ramakrishnan, R., and Livny, M., "BIRCH: An Efficient Data Clustering Method for Very Large Databases", ACM SIGMOD Intl. Conf. on Management of Data, Canada, 1996, pp. 103-114.
      [14] Wagstaff, K., Cardie, C., Rogers, S., and Schroedl, S., "Constrained K-means Clustering with Background Knowledge", ICML 2001, pp. 577-584.
      [15] Lin, F., and Hsueh, C. M., "Knowledge map creation and maintenance for virtual communities of practice", Intl. Journal of Information Processing and Management, ACM, Vol. 42(2), 2006, pp. 551-568.
      [16] Beygelzimer, A., Kakade, S., and Langford, J., "Cover trees for nearest neighbor", ICML 2006, pp. 97-104.
      [17] Agrawal, R., and Srikant, R., "Fast Algorithms for Mining Association Rules", Intl. Conf. on Very Large Data Bases, Santiago, Chile, September 1994, pp. 487-499.
      [18] Agrawal, R., Mehta, M., Shafer, J., Srikant, R., Arning, A., and Bollinger, T., "The Quest Data Mining System", KDD'96, USA, 1996, pp. 244-249.
      [19] Tan, P. N., Steinbach, M., and Kumar, V., "Introduction to Data Mining", Addison-Wesley, ISBN: 0321321367, April 2005, pp. 539-547.
      [20] http://people.cs.vt.edu/amonika/infoviz/

  30. Thank You
