
Mining Significant Graph Patterns by Leap Search

This presentation covers the mining of significant graph patterns by leap search under objective functions such as frequency, discriminative measures (information gain, Fisher score), and significance (G-test). The authors address the challenge that these objective functions are not anti-monotonic and propose a direct mining framework that supplies optimal patterns for graph clustering, classification, and indexing. They also report efficiency and scalability results and note that direct mining applies to itemsets, sequences, and trees.

Presentation Transcript


  1. Mining Significant Graph Patterns by Leap Search. Xifeng Yan (IBM T. J. Watson); Hong Cheng, Jiawei Han (UIUC); Philip S. Yu (UIC)

  2. Graph Patterns • Interestingness measures / Objective functions • Frequency: frequent graph pattern • Discriminative: information gain, Fisher score • Significance: G-test • …
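A minimal sketch of two of the measures named on this slide, computed from how often a pattern occurs in the positive and negative graph sets. The function names and the exact G-test form, 2m[p ln(p/q) + (1-p) ln((1-p)/(1-q))], are assumptions for illustration; the slides name the measures but do not give formulas.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(pos_hits, pos_total, neg_hits, neg_total):
    """Information gain of the binary feature 'graph contains the pattern'."""
    n = pos_total + neg_total
    hits = pos_hits + neg_hits
    misses = n - hits
    h_class = entropy([pos_total / n, neg_total / n])
    h_cond = 0.0
    if hits:
        h_cond += hits / n * entropy([pos_hits / hits, neg_hits / hits])
    if misses:
        h_cond += misses / n * entropy(
            [(pos_total - pos_hits) / misses, (neg_total - neg_hits) / misses]
        )
    return h_class - h_cond

def g_test(p, q, m):
    """Assumed G-test form: p is the positive frequency, q the negative
    frequency (must lie strictly between 0 and 1), m the sample size."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return 2 * m * (term(p, q) + term(1 - p, 1 - q))
```

A pattern that occurs in every positive graph and no negative graph gets the maximum information gain of 1 bit on a balanced dataset, while a pattern with equal frequency in both sets scores 0 under both measures.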

  3. Frequent Graph Pattern

  4. Optimal Graph Pattern (this work)

  5. Objective Functions. Challenge: not anti-monotonic.

  6. Challenge: Non Anti-Monotonic. Anti-monotonic: enumerate subgraphs from small size to large size. Non-monotonic: enumerate all subgraphs, then check their score?

  7. Frequent Pattern Based Mining Framework. Diagram: Graph Database → Frequent Patterns → Optimal Patterns → exploratory task, graph clustering, graph classification, graph index (SIGMOD’04, ’05) (ISMB’05, ’07). Problems: 1. Bottleneck: millions, even billions of patterns. 2. No guarantee of quality.

  8. Direct Pattern Mining Framework. Diagram: Graph Database → (direct) → Optimal Patterns → exploratory task, graph clustering, graph classification, graph index. How?

  9. Upper-Bound

  10. Upper-Bound: Anti-Monotonic (cont.). Rule of thumb: if the difference between a graph pattern's frequency in the positive dataset and its frequency in the negative dataset increases, the pattern becomes more interesting. We can recycle the existing graph mining algorithms to accommodate non-monotonic functions.
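The bounding idea can be sketched as follows. Any supergraph of a pattern has frequencies no higher than the pattern's own in both datasets, so for a score that follows the rule of thumb the best case is that one of the two frequencies drops to zero while the other stays put. The score function here (squared frequency difference) and all names are hypothetical stand-ins, not the measure used by the authors.

```python
def score(p, q):
    # Hypothetical stand-in score: squared difference between the positive
    # frequency p and the negative frequency q.
    return (p - q) ** 2

def upper_bound(p, q, f=score):
    """Bound on f over all (p2, q2) with 0 <= p2 <= p and 0 <= q2 <= q,
    assuming f grows as the two frequencies move apart."""
    return max(f(p, 0.0), f(0.0, q))

def can_prune(p, q, best_so_far, f=score):
    """True when no supergraph of the current pattern can beat best_so_far."""
    return upper_bound(p, q, f) <= best_so_far
```

Because frequency itself is anti-monotonic, the bound is too: it can only shrink as the pattern grows, which is what lets an existing small-to-large enumeration cut off whole branches.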

  11. Vertical Pruning Large <- small

  12. Horizontal Pruning: Structural Proximity

  13. Structural Proximity: Another Perspective. # of frequent patterns >> # of possible frequency pairs, so many patterns share the same score.
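The counting argument can be illustrated on a toy dataset. The sketch below uses itemsets as a stand-in for graph patterns (an assumption made to keep the example short) and groups every pattern by its pair of support counts in the positive and negative sets; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def frequency_pair_groups(pos, neg):
    """Group every occurring itemset by (positive support, negative support)."""
    items = sorted(set().union(*pos, *neg))
    groups = defaultdict(list)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            pair = (sum(pattern <= t for t in pos), sum(pattern <= t for t in neg))
            if pair != (0, 0):  # skip itemsets that occur nowhere
                groups[pair].append(pattern)
    return groups

pos = [{"a", "b", "c", "d"}, {"a", "b", "c"}]
neg = [{"a", "b"}]
groups = frequency_pair_groups(pos, neg)
n_patterns = sum(len(v) for v in groups.values())
max_pairs = (len(pos) + 1) * (len(neg) + 1)
```

With two positive and one negative transaction there can be at most 6 distinct support pairs, yet 15 itemsets occur. Any score that depends only on the pair must therefore repeat across patterns.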

  14. Frequency Envelope

  15. Structural Leap Search

  16. Frequency Association. Significant patterns often fall into the high quantile of frequency, so the search starts with the most frequent patterns.

  17. Descending Leap Mine: 1. Structural leap search with a frequency threshold. 2. Support-descending mining until F(g*) converges. 3. Structural leap search.
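The support-descending step might look like the loop below. This is a sketch under stated assumptions: the threshold is halved on each round, `mine_best` is a hypothetical callable that returns the best `(score, pattern)` among patterns with frequency at least `theta`, and the toy miner stands in for a real graph miner.

```python
def descending_mine(mine_best, start=1.0, floor=0.01, tol=1e-9):
    """Lower the frequency threshold until the best score stops improving."""
    theta = start
    best_score, best_pattern = mine_best(theta)
    while theta / 2 >= floor:
        theta /= 2
        score, pattern = mine_best(theta)
        if score <= best_score + tol:
            break  # F(g*) has converged at this threshold
        best_score, best_pattern = score, pattern
    return best_score, best_pattern, theta

# Toy data (made up for illustration): (frequency, score, name) per pattern.
TOY = [(0.9, 0.1, "g1"), (0.4, 0.5, "g2"), (0.2, 0.7, "g3"), (0.02, 0.9, "g4")]

def toy_mine_best(theta):
    eligible = [(s, name) for f, s, name in TOY if f >= theta]
    return max(eligible, default=(0.0, None))

result = descending_mine(toy_mine_best)
```

On the toy data the loop stops at threshold 0.0625 with `g3` and never sees the rare but higher-scoring `g4`. Convergence here is a heuristic stopping rule, which is why the slide follows it with a final structural leap search.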

  18. Results: NCI Anti-Cancer Screen Datasets. Chemical compounds: anti-cancer or not. # of vertices: 10 ~ 200. Link: http://pubchem.ncbi.nlm.nih.gov

  19. Efficiency: vertical pruning, horizontal pruning.

  20. Effectiveness (runtime): frequency descending vs. frequency descending + leap mine.

  21. Effectiveness (accuracy) slightly different

  22. Graph Classification (6x) (6x). *OA Kernel: Optimal Assignment Kernel; LEAP: LEAP search.

  23. Scalability Means Something! OA: ~200 sec; OA(6X): ~8000 sec (quadratic). LEAP: ~20 sec; LEAP(6X): ~100 sec (linear).
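As a rough check of the "quadratic" and "linear" labels, the growth exponent can be estimated from the two timings for each method. This assumes the reading that OA goes from about 200 sec to about 8000 sec and LEAP from about 20 sec to about 100 sec when the data grows 6x; the function name is hypothetical.

```python
import math

def empirical_exponent(t_small, t_large, size_ratio):
    """Exponent k such that t_large / t_small == size_ratio ** k."""
    return math.log(t_large / t_small) / math.log(size_ratio)

oa_exponent = empirical_exponent(200, 8000, 6)   # roughly 2, i.e. quadratic
leap_exponent = empirical_exponent(20, 100, 6)   # roughly 1, i.e. linear
```

Since the timings on the slide are approximate, the exponents are only indicative, but they line up with the labels: close to 2 for OA and close to 1 for LEAP.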

  24. Direct Pattern Mining Framework. Diagram: Graph Database → (direct) → Optimal Graph Patterns → exploratory task, graph clustering, graph classification, graph index.

  25. Beyond Graph Patterns. 1. Direct mining can be applied to itemsets, sequences, and trees. Diagram: itemset/sequence/tree Database → (direct) → Optimal Patterns → exploratory task, clustering, classification, index. • Existing algorithms can be recycled to mine patterns with sophisticated measures. • Pattern-based methods including indexing and classification are competitive.
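To make the itemset case concrete, here is a small branch-and-bound search that looks for the single best itemset directly instead of enumerating all frequent ones first. It is a sketch, not the authors' algorithm: the score |p - q| and the bound max(p, q) are assumptions chosen because the bound is easy to verify (a superset has p' <= p and q' <= q, so |p' - q'| <= max(p, q)).

```python
def best_itemset(pos, neg):
    """Depth-first search for the itemset maximizing |p - q|, pruning any
    branch whose bound max(p, q) cannot beat the best score so far."""
    items = sorted(set().union(*pos, *neg))
    best = {"score": -1.0, "itemset": frozenset(), "visited": 0}

    def freq(itemset, db):
        return sum(itemset <= t for t in db) / len(db)

    def grow(itemset, start):
        for i in range(start, len(items)):
            cand = itemset | {items[i]}
            p, q = freq(cand, pos), freq(cand, neg)
            best["visited"] += 1
            if abs(p - q) > best["score"]:
                best["score"], best["itemset"] = abs(p - q), cand
            if max(p, q) > best["score"]:  # otherwise prune all supersets
                grow(cand, i + 1)

    grow(frozenset(), 0)
    return best

pos = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
neg = [{"b", "c"}, {"c"}, {"b"}]
found = best_itemset(pos, neg)
```

On this toy data the search evaluates only the three single items out of seven possible itemsets, because once `{"a"}` reaches the maximum score every other branch is bounded below it.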

  26. Thank you. "Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree", SIGKDD’08 @ Las Vegas.

  27. Graph Classification: Kernel Approach • Kernel-based Graph Classification • Optimal Assignment Kernel (Fröhlich et al., ICML’05)
