An Efficient Algorithm for Incremental Mining of Association Rules

Chin-Chen Chang, Yu-Chiang Li, Jung-San Lee. RIDE-SDMA ’05. Speaker: 董原賓. Advisor: 柯佳伶.

Presentation Transcript


  1. An Efficient Algorithm for Incremental Mining of Association Rules
     Chin-Chen Chang, Yu-Chiang Li, Jung-San Lee, RIDE-SDMA ’05
     Speaker: 董原賓; Advisor: 柯佳伶

  2. Introduction
     • Previous incremental mining algorithms
       • FUP (Fast Update Algorithm)
       • FUP2
       • Negative border
       ※ They all have to rescan the original database
     • Problem
       • Publication-like databases, e.g., publication databases and web log records
       • The original database is normally much larger than the incremental database
     • Solution
       • NFUP (New Fast Update Algorithm)

  3. Definition
     • DB: original database
     • db: the set of newly added transactions
     • $DB^+ = DB \cup db$
     • n, $P_1, \dots, P_n$: db is divided into n partitions, $db = P_1 \cup P_2 \cup \dots \cup P_{n-1} \cup P_n$
     • $db_{m,n} = P_m \cup P_{m+1} \cup \dots \cup P_{n-1} \cup P_n$
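
A minimal Python sketch of these definitions, assuming db is held in memory as a list of partitions (index 0 holding $P_1$) and each transaction is a frozenset of item names; the helper names `db_range` and `min_count` are hypothetical. The threshold follows the `count >= s × |D|` convention that the example slides apply within each partition.

```python
Transaction = frozenset[str]


def db_range(partitions: list[list[Transaction]], m: int, n: int) -> list[Transaction]:
    """Return the transactions of db_{m,n} = P_m ∪ P_{m+1} ∪ ... ∪ P_n (1-indexed)."""
    if not 1 <= m <= n <= len(partitions):
        raise ValueError("need 1 <= m <= n <= number of partitions")
    merged: list[Transaction] = []
    for part in partitions[m - 1:n]:
        merged.extend(part)
    return merged


def min_count(min_sup: float, size: int) -> float:
    """Support-count threshold s * |D|: an itemset is frequent if its count >= this value."""
    return min_sup * size


# Two hypothetical partitions of three transactions each.
P1 = [frozenset("BE"), frozenset("BCE"), frozenset("BCDE")]
P2 = [frozenset("ABCF"), frozenset("ABC"), frozenset("CDEF")]
parts = [P1, P2]
print(min_count(0.5, len(db_range(parts, 2, 2))))  # 1.5, the threshold in db_{2,2} = P2
print(min_count(0.5, len(db_range(parts, 1, 2))))  # 3.0, the threshold in db_{1,2}
```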

  4. Definition
     • α set: frequent itemsets in $DB^+$
     • β set: frequent in $db_{m,n}$ ($m \le n$) but infrequent in $db_{m-1,n}$
     • γ set: frequent in $db_{m,m}$ but infrequent in $db_{m+1,n}$
     • X.count: occurrence count
     • X.start: partition number when X becomes frequent
     • X.type: one of the three types α, β, and γ

  5. FUP (Fast Update Algorithm)
     • In case 2, the itemset's updated count is easily calculated
     • In case 3, FUP needs to rescan the original database
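
The slide does not list the cases themselves. The sketch below assumes FUP's usual four-way split of an itemset by whether it is frequent in DB and in db (case 1: both; case 2: DB only; case 3: db only; case 4: neither), which is consistent with the two bullets; the function names are hypothetical. It illustrates why only case 3 forces a rescan: a count in DB is stored only for itemsets that were already frequent in DB.

```python
from typing import Optional


def fup_case(freq_in_DB: bool, freq_in_db: bool) -> int:
    """Classify an itemset into FUP's four cases (numbering assumed, see lead-in)."""
    if freq_in_DB and freq_in_db:
        return 1  # frequent in DB and db, hence frequent in DB+
    if freq_in_DB:
        return 2  # count in DB is already stored: just add the count in db
    if freq_in_db:
        return 3  # count in DB was never stored: FUP must rescan DB
    return 4      # infrequent in both parts, hence infrequent in DB+


def count_without_rescan(case: int, count_DB: Optional[int], count_db: int) -> Optional[int]:
    """Return the itemset's count in DB+ if it is available without rescanning DB."""
    if case in (1, 2) and count_DB is not None:
        return count_DB + count_db
    return None  # case 3 needs a rescan of DB; case 4 is pruned without counting


print(fup_case(True, False), count_without_rescan(2, 40, 3))   # 2 43
print(fup_case(False, True), count_without_rescan(3, None, 9))  # 3 None
```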

  6. NFUP (New Fast Update Algorithm)
     • A backward method that only requires scanning the incremental database
     • An itemset that is frequent in the incremental database is still important even if it is infrequent in the updated database
     • Partition the incremental database (db) by time interval
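
The slide does not say how the time intervals are chosen. As a minimal sketch, assuming fixed-width intervals measured from a chosen start time, the hypothetical helper `partition_by_interval` below splits timestamped transactions into $P_1, \dots, P_n$ with $P_1$ the oldest, so that the backward scan begins with the most recent interval.

```python
from datetime import datetime, timedelta

Transaction = frozenset[str]


def partition_by_interval(
    records: list[tuple[datetime, Transaction]],
    start: datetime,
    interval: timedelta,
) -> list[list[Transaction]]:
    """Split db into P_1..P_n by fixed-width time intervals (P_1 is the oldest)."""
    buckets: dict[int, list[Transaction]] = {}
    for ts, items in records:
        if ts < start:
            raise ValueError("transaction is older than the start time")
        index = (ts - start) // interval  # timedelta // timedelta gives an int
        buckets.setdefault(index, []).append(items)
    if not buckets:
        return []
    # Intervals with no transactions become empty partitions.
    return [buckets.get(i, []) for i in range(max(buckets) + 1)]


day = timedelta(days=1)
t0 = datetime(2005, 1, 1)
db = [
    (datetime(2005, 1, 1, 9), frozenset("AB")),
    (datetime(2005, 1, 2, 10), frozenset("C")),
    (datetime(2005, 1, 2, 11), frozenset("AC")),
]
parts = partition_by_interval(db, t0, day)
print([len(p) for p in parts])  # [1, 2]
```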

  7. NFUP
     • The set of frequent itemsets of DB is known in advance
     • NFUP scans the partitions backward: the last partition is scanned first
     • Within each partition, the process is performed as in Apriori
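
Within one partition, the Apriori-style process alternates counting with candidate generation. A minimal sketch of the candidate-generation step (Apriori-gen) follows; the function name is an assumption, and it uses a simple pairwise join instead of Apriori's sorted-prefix join, which yields the same candidates once pruning is applied. Applied to the frequent 2-itemsets of $P_2$ from the example on slide 11, it produces only {ABC}.

```python
from itertools import combinations

Itemset = frozenset[str]


def apriori_gen(frequent_k: set[Itemset], k: int) -> set[Itemset]:
    """Generate candidate (k+1)-itemsets from the frequent k-itemsets.

    Join: union two frequent k-itemsets that share k-1 items.
    Prune: drop a candidate if any of its k-subsets is not frequent.
    """
    candidates: set[Itemset] = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != k + 1:
                continue  # a and b do not share exactly k-1 items
            if all(frozenset(sub) in frequent_k for sub in combinations(union, k)):
                candidates.add(union)
    return candidates


# Frequent 2-itemsets of P2 in the example (slide 11).
frequent_2 = {frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("CF")}
print(sorted("".join(sorted(c)) for c in apriori_gen(frequent_2, 2)))  # ['ABC']
```

ACF and BCF are joined but pruned, because AF and BF are not frequent in $P_2$.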

  8. NFUP

  9. • Scan from $P_n$ to $P_1$ and find the α, β, and γ itemsets in db
     • After $P_1$ is scanned, the occurrence counts are accumulated with those of the itemsets of DB

  10. • The latest partition is scanned first: initialize the variables and accumulate the occurrence counts
      • Still frequent in $P_m$ → accumulate the count
      • Still frequent in $db_{m,n}$ → accumulate the count
      • Only frequent in $db_{m+1,n}$ → remove from the α set and add into the β set
      • Belongs to no set and frequent in $P_m$ → check whether $P_m$ is the latest partition: yes → α set; no → γ set
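
The rules above can be sketched in Python as follows. This is a simplified reading of the slides, not the authors' implementation: all names are hypothetical; "frequent" is assumed to mean count ≥ s × size; "accumulate the count" is taken to mean that every itemset already in the α, β, or γ set adds its count in $P_m$; an α itemset that stays frequent has its start moved to m, as in the example; and the final accumulation with the counts of DB is left out.

```python
from dataclasses import dataclass
from itertools import combinations

Itemset = frozenset[str]


@dataclass
class Record:
    count: int  # X.count: occurrences accumulated so far
    start: int  # X.start: partition number when X became frequent
    kind: str   # X.type: "alpha", "beta" or "gamma"


def count_in(x: Itemset, part: list[Itemset]) -> int:
    return sum(1 for t in part if x <= t)


def frequent_in(part: list[Itemset], threshold: float) -> dict[Itemset, int]:
    """Apriori-style level-wise search for itemsets frequent in one partition."""
    level = {frozenset([i]) for t in part for i in t}
    found: dict[Itemset, int] = {}
    k = 1
    while level:
        freq = {}
        for c in level:
            n = count_in(c, part)
            if n >= threshold:
                freq[c] = n
        found.update(freq)
        k += 1
        # Apriori-gen: join frequent (k-1)-itemsets and prune by the Apriori property.
        level = {
            a | b
            for a in freq
            for b in freq
            if len(a | b) == k
            and all(frozenset(s) in freq for s in combinations(a | b, k - 1))
        }
    return found


def nfup_backward(partitions: list[list[Itemset]], min_sup: float) -> dict[Itemset, Record]:
    """Scan P_n, P_{n-1}, ..., P_1 and classify itemsets into the α, β and γ sets."""
    n = len(partitions)
    records: dict[Itemset, Record] = {}
    tail_size = 0  # |db_{m,n}|
    for m in range(n, 0, -1):  # the latest partition P_n is scanned first
        part = partitions[m - 1]
        tail_size += len(part)
        # Itemsets already in a set: accumulate their count in P_m.
        for x, rec in records.items():
            rec.count += count_in(x, part)
            if rec.kind == "alpha":
                if rec.count >= min_sup * tail_size:
                    rec.start = m       # still frequent in db_{m,n}
                else:
                    rec.kind = "beta"   # only frequent in db_{m+1,n}
        # Itemsets in no set that are frequent in P_m: α if P_m is the latest, else γ.
        for x, c in frequent_in(part, min_sup * len(part)).items():
            if x not in records:
                records[x] = Record(c, m, "alpha" if m == n else "gamma")
    return records


# Hypothetical data: P1 = [{C}, {C}], P2 = [{A, B}, {A}], minimum support 50%.
# Expected: A stays α (start 1); B and AB move to β (start 2); C enters γ (start 1).
result = nfup_backward([[frozenset("C")] * 2, [frozenset("AB"), frozenset("A")]], 0.5)
for itemset, rec in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(itemset)), rec)
```

Keeping all three sets in one dictionary, with `kind` standing in for X.type, turns a move from the α set to the β set into a single field update.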

  11. Example: min sup = 50%; threshold in $P_2$: 3 × 0.5 = 1.5
      • Scan $P_2$ for 1-itemsets: {A: 2} {B: 2} {C: 3} {D: 1} {E: 1} {F: 2}
      • Run Apriori-gen and scan $P_2$ for 2-itemsets: {AB: 2} {AC: 2} {AF: 1} {BC: 2} {BF: 1} {CF: 2}
      • Run Apriori-gen and scan $P_2$ for 3-itemsets: {ABC: 2}
      • For each itemset: check whether it belongs to the α set; else check that it belongs to no set and that its count ≥ 1.5, then check whether $P_2$ is the latest partition: yes → α set; no → γ set
      • Result (start, count): α set: {A} (2, 2), {B} (2, 2), {C} (2, 3), {F} (2, 2), {AB} (2, 2), {AC} (2, 2), {BC} (2, 2), {CF} (2, 2), {ABC} (2, 2); β set: empty; γ set: empty
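
The first step of the example can be checked with a few lines of Python. The counts are copied from the slide, |P2| = 3 is read off the slide's "3 x 0.5 = 1.5", and the helper name `classify_new` is hypothetical. Since no itemset is in any set yet and $P_2$ is the latest partition, every itemset whose count reaches 1.5 enters the α set with start 2.

```python
# Counts of the itemsets scanned in P2 (slide 11).
p2_counts = {
    "A": 2, "B": 2, "C": 3, "D": 1, "E": 1, "F": 2,
    "AB": 2, "AC": 2, "AF": 1, "BC": 2, "BF": 1, "CF": 2,
    "ABC": 2,
}
min_sup = 0.5
p2_size = 3
threshold = p2_size * min_sup  # 3 x 0.5 = 1.5


def classify_new(
    counts: dict[str, int], threshold: float, m: int, n: int
) -> dict[str, tuple[str, int, int]]:
    """Itemsets in no set that are frequent in P_m: α if m == n (latest partition), else γ."""
    kind = "alpha" if m == n else "gamma"
    return {x: (kind, m, c) for x, c in counts.items() if c >= threshold}


# D, E, AF and BF fall below 1.5 and are dropped.
alpha = classify_new(p2_counts, threshold, m=2, n=2)
for itemset, (kind, start, count) in alpha.items():
    print(itemset, kind, start, count)
```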

  12. Example (continued): min sup = 50%; threshold in $P_1$: 3 × 0.5 = 1.5
      • Scan $P_1$ for 1-itemsets: {A: 1} {B: 3} {C: 2} {D: 1} {E: 3} {F: 0}
      • Run Apriori-gen and scan $P_1$ for 2-itemsets: {AB: 1} {AC: 0} {BC: 2} {BE: 3} {CE: 2}
      • If the itemset belongs to the α set → accumulate its count; if count < s × |$db_{m,n}$| = 0.5 × 6 = 3 → β set
      • Else, if the itemset belongs to no set and its count ≥ 1.5 → check whether $P_1$ is the latest partition: yes → α set; no → γ set
      • α set, β set, γ set (start, count): {A} 2 1 3 2 {F} 2 2 {E} 1 3 {B} 1 2 2 5 {AC} 2 2 {BE} 1 3 {C} 2 1 5 3 {CF} 2 2 {CE} 1 2 {F} 2 2 {ABC} 2 2 {AB} 1 2 3 2 {AC} 2 2 {BC} 2 1 2 4 {CF} 2 2 {ABC} 2 2

  13. Example α set start count β set start count γ set start count {A} 1 0 3 7 {F} 2 2 {E} 1 3 {B} 0 1 5 8 {AC} 2 2 {BE} 1 3 {C} 0 1 9 5 {CF} 2 2 {CE} 1 2 {AB} 1 3 {ABC} {AB} 2 1 2 3 {AE} 0 3 {BC} 1 4 {BC} 1 4 {ABC} 2 2

  14. Experiment
      • Intel Pentium IV 1.5 GHz CPU, 640 MB main memory
      • Microsoft Windows 2000 Professional
      • Synthetic datasets:

  15. Experiment

  16. Experiment

  17. Experiment
