1 / 14

Data Mining Association Rules

Data Mining Association Rules. Yao Meng Hongli Li 91.574 Database II Fall 2002 . Outline. Overview Apriori AprioriTid DIC Data Structure Experiment Environment Experiment Result and Analysis. Overview – Apriori Algorithm. Overview – AprioriTid. Overview – DIC . Read M transaction

bond
Download Presentation

Data Mining Association Rules

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Mining Association Rules Yao Meng Hongli Li 91.574 Database II Fall 2002

  2. Outline • Overview • Apriori • AprioriTid • DIC • Data Structure • Experiment Environment • Experiment Result and Analysis

  3. Overview – Apriori Algorithm

  4. Overview – AprioriTid

  5. Overview – DIC • Read M transaction • Increment those itemset that are current counting • If all the child of a itemset turned to large, begin to counting this itemset • If an itemset has been counted through all the transaction, remove it from the current counting list • If at the end of the DB, go to the first step • Stop if no itemset are need to counting

  6. Hypothesis of Performance Analysis Given a memory size • AprioriTid generally has better performance than Apriori due to I/O saving • DIC has better performance than Apriori in fairly homogenenous data environment. • DIC performance should approach that of Apriori while M approaches number of total transaction.

  7. Experiment Environment • Data Sets • IBM Synthetic Dataset Generation Code for Association Rules • Enviroments • Operating System: Microsoft Windows XP Professional • Computer • Intel Pentium III processor 550MHz • RAM 384 MB • Source code written in Java

  8. Data Structure • Apriori and DIC • Candidate Itemset stored in a hash-tree • Each internal node is are hashtables • The leaves stored the candidate itemset • AprioriTid • Use array to keep candidates

  9. Number of Items = 8 Avg transaction length = 5 M = 500 Size vs. Execution Time

  10. Support Threshold Size = 16410 transaction Number of Items = 8 Average Length per transaction = 5 M = 500

  11. DIC – Different M value Size = 12291 transaction Number of Items = 8 Average Length per transaction = 5

  12. DIC – “Non-Homogeneous” Dataset Size = 6000 transaction Number of Items = 8 M = 500

  13. Conclusions • AprioriTid is the best in our experiment • I/O saving • AprioriTid use small Data structure • Apriori and DIC are very similar • Apriori is Special Case of DIC • They use same data structure • DIC • Sensitive to data • M affects performance

  14. Reference • Rakesh Agrawal, Tomasz Imielinski, Arun Swami. Mining Association Rules between Sets of Items in Large Database. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993 • Rakesh Agrawal, Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf. Very Large Data Bases, VLDB, page 487-499. 1994 • Ashok Savasere, Edward Omiecinski, Shamkant Navathe. An Efficient Algorithm for Mining Association Rules in Large Databases. Proc. of the 21st VLDB Conf., pp. 432-444, 1995. • Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur. Dynamic Itemset Counting and Implication Rules for Market Basket Data. SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data. Tucson, Arizona, USA. 1997. • J. Hipp, U. Güntzer, G. Nakhaeizadeh. Mining Association Rules: Deriving a Superior Algorithm by Analysing Today's Approaches. Proceedings of the 4th European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD '00), Lyon, France. 2000. • Jochen Hipp, Ulrich Güntzer, Gholamreza Nakhaeizadeh. Algorithms for Association Rule Mining – A General Survey and Comparison. SIGKDD Explorations. 2(1): 58-64. 2000. • R. Srikant, R. Agrawal. Mining Generalized Association Rule. In Proc. of the VLDB Conference, September 1995

More Related