From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns

From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns. Yabo Xu, Jeffrey Xu Yu, Guimei Liu, Hongjun Lu. Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02). Adviser: Jia-Ling Koh. Speaker: Yu-ting Kung.

Presentation Transcript


  1. From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns • Yabo Xu, Jeffrey Xu Yu, Guimei Liu, Hongjun Lu • Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02) • Adviser: Jia-Ling Koh • Speaker: Yu-ting Kung

  2. Introduction • In this paper, the main tasks (for a multi-user environment) are: • (1) Constructing an initial tree for a transactional database (in memory) • (2) Mining using the tree constructed in memory • (3) Converting the in-memory tree into a disk-based tree • (4) Loading a portion of the tree on disk into main memory for mining (the mining itself is the same as in task 2)

  3. Introduction (Cont.) • Data structure ─ PP-tree • A novel coded prefix-path tree • Two representations: • Memory-based PP-tree • Disk-based PP-tree • Mining algorithm ─ PP-Mine • Operates on the memory-based PP-tree • Outperforms FP-growth

  4. Transaction Database • Example (min_sup threshold = 2) • Item counts: a:3, b:1, c:3, d:3, e:3, f:1, g:2, h:1, i:1
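
The item counts on this slide are enough to derive the frequent 1-items. A minimal Python sketch, assuming ties in frequency are broken alphabetically (the slides do not state a tie-break rule); `frequent_items` is a hypothetical helper name, not something from the paper:

```python
# Item supports exactly as listed on the slide.
counts = {"a": 3, "b": 1, "c": 3, "d": 3, "e": 3, "f": 1, "g": 2, "h": 1, "i": 1}


def frequent_items(counts, min_sup):
    """Return the items with support >= min_sup, most frequent first.

    Ties are broken alphabetically (an assumption, not from the slides).
    """
    kept = [item for item, n in counts.items() if n >= min_sup]
    return sorted(kept, key=lambda item: (-counts[item], item))


F = frequent_items(counts, min_sup=2)
print(F, len(F))
```

With min_sup = 2 this keeps a, c, d, e and g, which agrees with the rank N = 5 quoted on the following slide.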

  5. A Coded Prefix-Path Tree • PP-tree: an ordered tree • F: the set of frequent 1-items in a total order (e.g., frequency order) • Node: labelled with a frequent item in F • Children of a node: listed following that order • The rank N of a PP-tree: the number of frequent 1-itemsets (N = 5 in the example)
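
The ordered-tree idea can be sketched as an ordinary prefix tree. This is an illustration only: it assumes each node's count is the number of transactions whose ordered frequent items pass through that node (as in other prefix trees), and the three transactions below are made up for the example because the slide's transaction table did not survive. `Node` and `build_tree` are hypothetical names.

```python
class Node:
    """One tree node: an item label, a count, and its children."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node


def build_tree(transactions, order):
    """Insert each transaction as a path of frequent items sorted by `order`."""
    rank = {item: i for i, item in enumerate(order)}
    root = Node()
    for t in transactions:
        # Drop infrequent items, then sort the rest by the total order F.
        path = sorted((x for x in set(t) if x in rank), key=rank.get)
        node = root
        for item in path:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def children_in_order(node, order):
    """List a node's children following the total order F."""
    return [node.children[i] for i in order if i in node.children]


F = ["a", "c", "d", "e", "g"]  # frequent items from the slide
toy = [["c", "a", "d"], ["a", "c", "e"], ["b", "d", "c"]]  # hypothetical data
root = build_tree(toy, F)
print([(n.item, n.count) for n in children_in_order(root, F)])
```

Transactions that share a prefix share nodes, so the two transactions starting with a, c contribute a single path a → c with count 2.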

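  6. A Complete Prefix-Path Tree • Complete tree (rank N): a PP-tree of rank N in which every possible prefix-path is present • Each node is encoded by its position in a pre-order traversal of the complete tree • Shaded subtree: a PP-tree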
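
A pre-order code can be computed arithmetically rather than by walking the complete tree. The sketch below assumes that the root (the empty path) receives code 0 and that items are identified by their rank 0..N-1 in the total order; neither detail is stated on the slides. `path_code` and `preorder_paths` are hypothetical names.

```python
def path_code(path, n):
    """Pre-order code of a prefix-path in a complete tree of rank n.

    `path` is a strictly increasing tuple of item ranks in 0..n-1.
    The root (empty path) gets code 0, which is an assumption of this sketch.
    """
    code, prev = 0, -1
    for j in path:
        if not prev < j < n:
            raise ValueError("item ranks must be strictly increasing and < n")
        # Skip the subtrees of the earlier siblings prev+1 .. j-1; the
        # subtree rooted at an item of rank k holds 2**(n-1-k) nodes.
        code += 1 + sum(2 ** (n - 1 - k) for k in range(prev + 1, j))
        prev = j
    return code


def preorder_paths(n, prefix=()):
    """Yield every prefix-path of the complete tree in pre-order."""
    yield prefix
    start = prefix[-1] + 1 if prefix else 0
    for j in range(start, n):
        yield from preorder_paths(n, prefix + (j,))


paths = list(preorder_paths(3))
print([(p, path_code(p, 3)) for p in paths])
```

The explicit traversal is there only to cross-check the formula: listing the paths in pre-order and coding each one should give consecutive integers.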

  7. PP-tree Representations • Memory-based representation ─ PPM-tree • Disk-based representation ─ PPD-tree • Represented as • T: the tree structure on disk • F: stores the N frequent 1-itemsets • I: an index giving the range of codes held in each disk page • the min_sup used to build the PPD-tree on disk • See Figure 3 (next page)
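
One way to picture the disk-based representation is as a record with these four parts, where the index I is used to decide which pages to read for a given range of codes. This is a sketch under stated assumptions, not the paper's on-disk layout: pages are modelled as in-memory lists, page contents are invented for illustration, and `PPDTree` and `pages_for_range` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class PPDTree:
    T: list       # pages; each page is a list of (code, count) sorted by code
    F: list       # the N frequent 1-items, in their total order
    I: list       # per page, the (lowest, highest) code stored on it
    min_sup: int  # threshold used when the tree was written


def pages_for_range(tree, lo, hi):
    """Return the indexes of pages whose code range overlaps [lo, hi]."""
    firsts = [r[0] for r in tree.I]
    lasts = [r[1] for r in tree.I]
    start = bisect_left(lasts, lo)    # first page ending at or after lo
    stop = bisect_right(firsts, hi)   # first page starting after hi
    return list(range(start, stop))


# Hypothetical contents, for illustration only.
tree = PPDTree(
    T=[[(1, 3), (2, 2)], [(5, 3), (6, 1)], [(9, 2)]],
    F=["a", "c", "d", "e", "g"],
    I=[(1, 2), (5, 6), (9, 9)],
    min_sup=2,
)
print(pages_for_range(tree, 2, 5))
```

Because the pages are ordered by code, two binary searches over I are enough to find the pages to load; no page has to be opened to answer the question.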

  8. PP-tree Representation (Figure 3) • Figure labels: item:count, range of codes, code:count

  9. How to build a PPD-tree? • Construction • A PPM-tree is built in memory (task 1) • Conversion • PPM-tree → PPD-tree • Using the coding scheme
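
The conversion step can be sketched as flattening the in-memory tree into (code, count) records and cutting them into pages. The sketch assumes pre-order codes with the root at code 0, a nested-dictionary tree, and a fixed number of records per page; the slides do not describe the actual page format, and `to_coded_records` and `paginate` are hypothetical names.

```python
def to_coded_records(children, order, code=0, prev=-1):
    """Flatten an in-memory tree into (code, count) records in pre-order.

    `children` maps item -> (count, children) for the nodes below the
    current node; `code` and `prev` are that node's code and item rank.
    """
    n = len(order)
    records = []
    for j in range(prev + 1, n):
        item = order[j]
        if item not in children:
            continue
        count, grandchildren = children[item]
        # Codes of absent earlier siblings' subtrees are skipped over.
        skipped = sum(2 ** (n - 1 - k) for k in range(prev + 1, j))
        child_code = code + 1 + skipped
        records.append((child_code, count))
        records.extend(to_coded_records(grandchildren, order, child_code, j))
    return records


def paginate(records, page_size):
    """Split records into pages and build the per-page code-range index."""
    pages = [records[i:i + page_size] for i in range(0, len(records), page_size)]
    index = [(p[0][0], p[-1][0]) for p in pages]
    return pages, index


order = ["a", "c", "d"]  # hypothetical rank-3 example
ppm = {"a": (2, {"c": (2, {"d": (1, {})})}), "c": (1, {"d": (1, {})})}
records = to_coded_records(ppm, order)
pages, index = paginate(records, page_size=2)
print(records, index)
```

Since the records come out in increasing code order, the item labels no longer need to be stored: a code together with F identifies the full prefix-path.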

  10. PP-Mine: Mining in Memory • Based on two properties (i_j, i_k: single-item prefix-paths; general prefix-paths may be empty) • Property 1 (push-down)

  11. PP-Mine (Cont.) • Property 2 (push-right) • Example: Figure 4 (next page)

  12. PP-Mine (Cont.)

  13. PP-Mine Algorithm: Example

  14. Experiment(1) • Data Source • Sparse dataset ─ T25I20D100K (10K items) • Dense dataset ─ T40I10D1K (101 items) • Three algorithms compared • PP-Mine • FP-growth • H-Mine • Only the mining phase is compared

  15. Experiment Result(1)

  16. Experiment Result(2) • Data Source ─ T40I10D100K (59 items) • min_sup = 50% • Two algorithms compared • PP-Mine • FP-growth • Compare • t(FP) ─ the time for FP-growth to construct an FP-tree • t(PP) ─ the time for PP-load to load a sub PPD-tree + the time to construct a small PPM-tree

  17. Experiment Result(2)

  18. Conclusion • The PP-Mine algorithm outperforms FP-growth • Reduces both I/O cost and CPU cost • The PP-Mine algorithm outperforms H-Mine • Minimizes counting cost

  19. Coverage • Definition: the coverage of a prefix-path is the set of all prefix-paths that contain it (including the prefix-path itself)
