
Bi-Clustering

Bi-Clustering. COMP 790-90 Seminar Spring 2011. Definition of OP-Cluster.





Presentation Transcript


  1. Bi-Clustering COMP 790-90 Seminar Spring 2011

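  2. Definition of OP-Cluster • Let I be a subset of genes in the database and let J be a subset of conditions. We say <I, J> forms an Order Preserving Cluster (OP-Cluster) if, for any pair of conditions in J, the same relationship holds for every gene in I: either all genes in I have a lower expression level under the first condition than under the second, or all of them have a higher one. In other words, the conditions in J can be arranged in a single order along which the expression level of every gene in I increases. [Figure: expression levels of genes under conditions A1, A2, A3, A4]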
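The definition can be checked directly. Below is a minimal Python sketch, assuming the expression matrix is a list of rows (one per gene) indexed by condition. The function name `is_op_cluster` and the example values are hypothetical, and treating exact ties as a third relation that all genes must share is a simplification of this sketch, not necessarily how the original slides handle equal values.

```python
from itertools import combinations
from typing import Sequence


def _sign(x: float) -> int:
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_op_cluster(matrix: Sequence[Sequence[float]],
                  genes: Sequence[int],
                  conditions: Sequence[int]) -> bool:
    """Check whether <genes, conditions> is order preserving.

    For every pair of conditions (a, b), all selected genes must agree on
    whether their value under a is lower than, equal to, or higher than
    their value under b. (Ties as a shared relation: sketch assumption.)
    """
    for a, b in combinations(conditions, 2):
        # Collect the relation each gene shows for this pair of conditions.
        relations = {_sign(matrix[g][a] - matrix[g][b]) for g in genes}
        if len(relations) > 1:  # genes disagree on the order of a and b
            return False
    return True
```

For example, with rows [1, 3, 2, 5], [2, 9, 4, 6] and [0.5, 2, 1, 7], all three genes agree on conditions {A1, A2, A3}, but adding A4 breaks the cluster because gene 1 has A2 above A4 (9 > 6) while gene 0 has A2 below A4 (3 < 5).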

  3. Problem Statement • Given a gene expression matrix, our goal is to find all statistically significant OP-Clusters. Significance is ensured by the minimum size thresholds nc and nr: a reported OP-Cluster must contain at least nc conditions and at least nr genes.
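To make the problem concrete, here is an exhaustive-search sketch in Python, assuming nr is the minimum number of genes (rows) and nc the minimum number of conditions (columns), and that no gene has tied values across conditions. The function name is hypothetical, and this brute force is only practical for tiny matrices; it is not the mining algorithm the seminar develops.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Sequence, Tuple


def brute_force_op_clusters(matrix: Sequence[Sequence[float]],
                            nr: int, nc: int) -> List[Tuple[tuple, tuple]]:
    """Enumerate OP-Clusters with at least nr genes and nc conditions.

    For every subset J of conditions with |J| >= nc, group the genes by
    the order in which they rank the conditions of J. Each group with at
    least nr genes is reported as (gene ids, shared condition order).
    """
    n_cond = len(matrix[0])
    results = []
    for size in range(nc, n_cond + 1):
        for cond_subset in combinations(range(n_cond), size):
            groups = defaultdict(list)
            for g, row in enumerate(matrix):
                # Ascending order of this gene's values on the chosen conditions.
                order = tuple(sorted(cond_subset, key=lambda c: row[c]))
                groups[order].append(g)
            for order, genes in groups.items():
                if len(genes) >= nr:
                    results.append((tuple(genes), order))
    return results
```

The sketch reports every qualifying (I, J) pair, including ones contained in larger clusters, and the number of condition subsets it tries grows exponentially with the number of conditions.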

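  4. Conversion to Sequence Mining Problem • Each gene is converted into a sequence: its conditions listed in ascending order of expression level. [Figure: expression levels of a gene under conditions A1, A2, A3, A4 and the resulting sequence]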
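The conversion can be sketched in Python as follows, assuming one row of expression values per gene and one label per condition; the function names are illustrative. Assuming no ties, a set of genes forms an OP-Cluster on a set of conditions exactly when those condition labels appear in the same relative order in every gene's sequence, which is what turns the search into a subsequence mining problem.

```python
from typing import List, Sequence


def to_sequence(row: Sequence[float], labels: Sequence[str]) -> List[str]:
    """Return the condition labels of one gene, sorted by ascending expression."""
    # Sort condition indices by this gene's expression level; sorted() is
    # stable, so tied values keep their original column order.
    order = sorted(range(len(row)), key=lambda j: row[j])
    return [labels[j] for j in order]


def matrix_to_sequences(matrix: Sequence[Sequence[float]],
                        labels: Sequence[str]) -> List[List[str]]:
    """Convert every gene (row) of the expression matrix into a sequence."""
    return [to_sequence(row, labels) for row in matrix]
```

For instance, a gene with levels 1, 3, 2, 5 under A1, A2, A3, A4 becomes the sequence A1, A3, A2, A4.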

  5. Mining OP-Clusters: A Naïve Approach
     • A naïve approach:
        • Enumerate all possible subsequences in a prefix tree.
        • For each subsequence, collect all genes that contain it.
     • Challenge:
        • The total number of distinct subsequences of n conditions is $\sum_{k=1}^{n} \frac{n!}{(n-k)!}$, which grows factorially with n.
     [Figure: a complete prefix tree with 4 items {a, b, c, d}]
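A direct Python rendering of the naïve approach, assuming each gene has already been converted to a sequence of distinct condition labels (single letters here) and that nr and nc are the minimum numbers of genes and conditions. The function names are hypothetical. Every node of the complete prefix tree is visited, whether or not any gene supports it.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def is_subsequence(sub: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the items of sub appear in seq in the same relative order."""
    it = iter(seq)
    # Each `in` test consumes the iterator up to the match, preserving order.
    return all(item in it for item in sub)


def naive_op_mining(sequences: Sequence[Sequence[str]], items: Sequence[str],
                    nr: int, nc: int) -> Tuple[List[Tuple[tuple, List[int]]], int]:
    """Visit every node of the complete prefix tree and keep supported ones."""
    found = []
    visited = 0
    for length in range(1, len(items) + 1):
        # All ordered arrangements of distinct items = nodes at this depth.
        for sub in permutations(items, length):
            visited += 1
            support = [i for i, seq in enumerate(sequences)
                       if is_subsequence(sub, seq)]
            if length >= nc and len(support) >= nr:
                found.append((sub, support))
    return found, visited
```

With the four items {a, b, c, d}, `visited` is 64 (4 + 12 + 24 + 24) regardless of the data.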

  6. Mining OP-Clusters: Prefix Tree
     • Goal:
        • Build a compact prefix tree that contains only the subsequences that actually occur in the original database.
     • Strategies:
        • Depth-first traversal.
        • Suffix concatenation: visit only subsequences that exist in the input sequences.
        • Apriori property: extend only subsequences that are sufficiently supported to derive longer subsequences.
     [Figure: prefix tree rooted at Root for three example sequences; each node is labeled item:IDs of the sequences containing that path, e.g. a:1,2,3, d:1,3, c:2]
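The strategies above can be sketched as a depth-first search that grows prefixes one item at a time. This Python sketch uses a PrefixSpan-style projected-suffix representation, which is one way to realize suffix concatenation and Apriori pruning; it is not claimed to be the exact tree construction from the OP-Cluster paper. The function name, the nr/nc parameters (minimum genes and conditions), and the string encoding of sequences are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def mine_op_clusters(sequences: Sequence[Sequence[str]], nr: int, nc: int):
    """Depth-first prefix-tree mining sketch.

    A node is a prefix plus, for each supporting sequence, the position
    just after where the prefix ends. Children come only from items in
    those suffixes (suffix concatenation); a branch stops as soon as its
    support drops below nr (Apriori: extensions cannot gain support).
    Returns (prefix, supporting sequence ids) for prefixes of length >= nc.
    """
    results: List[Tuple[Tuple[str, ...], List[int]]] = []

    def dfs(prefix: Tuple[str, ...], projections: List[Tuple[int, int]]) -> None:
        if len(prefix) >= nc:
            results.append((prefix, [sid for sid, _ in projections]))
        # Candidate extensions: items occurring in the remaining suffixes.
        # Each condition appears once per sequence, so each sequence adds
        # at most one projection per child.
        children: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for sid, start in projections:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                children[seq[pos]].append((sid, pos + 1))
        for item in sorted(children):
            child_proj = children[item]
            if len(child_proj) >= nr:  # Apriori pruning
                dfs(prefix + (item,), child_proj)

    dfs((), [(sid, 0) for sid in range(len(sequences))])
    return results
```

On the sequences acbd, acdb, acbd with nr = 2 and nc = 3, the branch a, c, d is never extended by b because only one sequence supports that path, while a, c, b, d is reported with sequences 0 and 2.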

  7. References
     • J. Yang, W. Wang, H. Wang, P. Yu, Delta-cluster: capturing subspace correlation in a large data set, Proceedings of the 18th IEEE International Conference on Data Engineering (ICDE), pp. 517-528, 2002.
     • H. Wang, W. Wang, J. Yang, P. Yu, Clustering by pattern similarity in large data sets, Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2002.
     • S. Yoon, C. Nardini, L. Benini, G. De Micheli, Enhanced pClustering and its applications to gene expression data, Proceedings of the IEEE Symposium on Bioinformatics and Bioengineering (BIBE), 2004.
     • J. Liu, W. Wang, OP-Cluster: clustering by tendency in high dimensional space, Proceedings of the IEEE International Conference on Data Mining (ICDM), 2003.
