An Efficient Algorithm for Incremental Mining of Association Rules

Chin-Chen Chang, Yu-Chiang Li, Jung-San Lee • RIDE-SDMA '05 • Speaker: 董原賓 • Advisor: 柯佳伶

Introduction
• Previous incremental mining algorithms
  • FUP (Fast Update Algorithm)
  • FUP2
  • Negative border
  ※ They all have to rescan the original database
• Problem
  • Publication-like databases, e.g. publication databases, web log records
  • The original database is normally much larger than the incremental database
• Solution
  • NFUP (New Fast Update Algorithm)

Definition
• DB: the original database
• db: the set of newly added transactions
• DB+: DB ∪ db
• n, Pn: db is divided into n partitions, db = P1 ∪ P2 ∪ … ∪ Pn
• db^{m,n} = Pm ∪ P(m+1) ∪ … ∪ Pn
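The suffix notation db^{m,n} can be illustrated with a minimal sketch. The function name `suffix_db` and the transactions are hypothetical, chosen only to show that db^{m,n} is the union of partitions Pm through Pn.

```python
def suffix_db(partitions, m):
    """Return db^{m,n}: all transactions of partitions P_m..P_n (m is 1-indexed)."""
    return [t for part in partitions[m - 1:] for t in part]


# Hypothetical incremental database split into n = 2 partitions.
partitions = [
    [{"A", "B"}, {"B", "C"}, {"C"}],   # P1
    [{"A", "C"}, {"A"}, {"B", "C"}],   # P2
]

db_2_2 = suffix_db(partitions, 2)  # only the latest partition P2
db_1_2 = suffix_db(partitions, 1)  # the whole incremental database db
```

With n = 2, db^{2,2} is just P2 and db^{1,2} is all of db.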

Definition
• α set: itemsets that are frequent in DB+
• β set: itemsets that are frequent in db^{m,n} (m ≤ n) but infrequent in db^{m-1,n}
• γ set: itemsets that are frequent in db^{m,m} but infrequent in db^{m+1,n}
• X.count: the occurrence count of itemset X
• X.start: the partition number at which X becomes frequent
• X.type: which of the three types (α, β, or γ) X belongs to
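One possible way to hold the three attributes of an itemset is a small record type. The class name `ItemsetInfo` and the string labels for the types are assumptions made for illustration; the slides only name the fields X.count, X.start, and X.type.

```python
from dataclasses import dataclass


@dataclass
class ItemsetInfo:
    count: int   # X.count: occurrence count
    start: int   # X.start: partition number at which X becomes frequent
    type: str    # X.type: "alpha", "beta", or "gamma"


# Hypothetical entry: itemset {A, B} first found frequent in partition 2.
table = {frozenset("AB"): ItemsetInfo(count=2, start=2, type="alpha")}
info = table[frozenset("AB")]
```

Keying the table by `frozenset` makes the lookup independent of item order.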

FUP (Fast Update Algorithm)
• In case 2, the itemset's count is easily calculated
• In case 3, FUP needs to rescan the original database

NFUP (New Fast Update Algo.)
• A backward method that only requires scanning the incremental database
• An itemset that is frequent in the incremental database is also important, even if it is infrequent in the updated database
• The incremental database (db) is partitioned by time interval
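Partitioning db by time interval might look like the sketch below. The integer timestamps, the fixed interval width, and the function name `partition_by_interval` are all assumptions; the slides do not say how the intervals are chosen.

```python
from collections import defaultdict


def partition_by_interval(transactions, interval):
    """Group (timestamp, itemset) pairs into partitions P1..Pn, oldest first."""
    buckets = defaultdict(list)
    for timestamp, items in transactions:
        buckets[timestamp // interval].append(items)
    return [buckets[key] for key in sorted(buckets)]


# Hypothetical transactions with integer timestamps.
transactions = [(0, {"A"}), (1, {"B"}), (5, {"C"}), (7, {"A", "C"})]
parts = partition_by_interval(transactions, interval=5)
latest = parts[-1]  # Pn, the partition NFUP scans first
```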

NFUP
• The set of frequent itemsets of DB is known in advance
• NFUP scans the partitions backward: the last partition is scanned first
• Within each partition, processing proceeds as in Apriori

NFUP

Scan from Pn to P1 and find the α, β, and γ itemsets in db. After P1 is scanned, the occurrence counts are accumulated with those of the frequent itemsets of DB.

• The latest partition is scanned first: initialize the variables and accumulate the occurrence counts
• Itemset still frequent in Pm: accumulate its count
• Itemset still frequent in db^{m,n}: accumulate its count
• Itemset frequent only in db^{m+1,n}: remove it from the α set and add it to the β set
• Itemset that belongs to no set and is frequent in Pm: check whether Pm is the latest partition
  • Yes → α set
  • No → γ set

Example: scanning P2 (the latest partition)
• Min sup = 50%, so the count threshold for P2 is 3 × 0.5 = 1.5
• Scan P2, 1-itemsets: {A: 2} {B: 2} {C: 3} {D: 1} {E: 1} {F: 2}
• Run Apriori-gen, then scan P2, 2-itemsets: {AB: 2} {AC: 2} {AF: 1} {BC: 2} {BF: 1} {CF: 2}
• Scan P2, 3-itemsets: {ABC: 2}
• For each itemset:
  • Check whether it belongs to the α set
  • Otherwise, if it belongs to no set, check whether its count >= 1.5
  • If so, check whether P2 is the latest partition: yes → α set, no → γ set
• Sets after P2 is scanned (itemset: start, count)
  • α set: {A}: 2, 2; {B}: 2, 2; {C}: 2, 3; {F}: 2, 2; {AB}: 2, 2; {AC}: 2, 2; {BC}: 2, 2; {CF}: 2, 2; {ABC}: 2, 2
  • β set: empty
  • γ set: empty
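The candidate lists in the P2 scan are consistent with the standard Apriori-gen join-and-prune step. The sketch below is a generic version of that step, not the authors' code. It assumes single-character item names so that an itemset can be written as a string such as "AB".

```python
from itertools import combinations


def apriori_gen(frequent):
    """Join frequent (k-1)-itemsets sharing a (k-2)-prefix, then prune
    candidates that have an infrequent (k-1)-subset."""
    frequent = sorted(tuple(sorted(x)) for x in frequent)
    known = set(frequent)
    candidates = []
    for a, b in combinations(frequent, 2):
        if a[:-1] == b[:-1]:
            cand = a + (b[-1],)
            if all(sub in known for sub in combinations(cand, len(cand) - 1)):
                candidates.append("".join(cand))
    return candidates


min_count = 0.5 * 3  # min sup 50% of the 3 transactions in P2

# 1-itemset counts from the P2 scan.
c1 = {"A": 2, "B": 2, "C": 3, "D": 1, "E": 1, "F": 2}
l1 = [x for x, c in c1.items() if c >= min_count]
c2 = apriori_gen(l1)  # candidate 2-itemsets

# 2-itemset counts from the P2 scan.
c2_counts = {"AB": 2, "AC": 2, "AF": 1, "BC": 2, "BF": 1, "CF": 2}
l2 = [x for x, c in c2_counts.items() if c >= min_count]
c3 = apriori_gen(l2)  # candidate 3-itemsets
```

The 2-itemset candidates are exactly the six pairs counted in the scan, and {ABC} is the only 3-itemset candidate because {AF} and {BF} are infrequent.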

Example: scanning P1
• Min sup = 50%, so the count threshold for P1 is 3 × 0.5 = 1.5
• Scan P1, 1-itemsets: {A: 1} {B: 3} {C: 2} {D: 1} {E: 3} {F: 0}
• Run Apriori-gen, then scan P1, 2-itemsets: {AB: 1} {AC: 0} {BC: 2} {BE: 3} {CE: 2}
• For each itemset:
  • If it belongs to the α set, accumulate its count; if the accumulated count < s × |db^{m,n}| = 0.5 × 6 = 3, move it to the β set
  • Otherwise, if it belongs to no set and its count >= 1.5, check whether P1 is the latest partition: yes → α set, no → γ set
• Sets after P1 is scanned (itemset: start, count)
  • α set: {A}: 1, 3; {B}: 1, 5; {C}: 1, 5; {AB}: 1, 3; {BC}: 1, 4
  • β set: {F}: 2, 2; {AC}: 2, 2; {CF}: 2, 2; {ABC}: 2, 2
  • γ set: {E}: 1, 3; {BE}: 1, 3; {CE}: 1, 2
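The two scans can be replayed from the per-partition counts given in the example. This is a simplified sketch of the classification rules only, under several assumptions: candidate generation is skipped and the counts are fed in directly, an α itemset that has no count listed for a partition is taken to occur 0 times there, and an itemset moved to the β set keeps the start and count it had over db^{m+1,n}. The function name `scan_partition` and the dictionary layout are illustrative.

```python
def scan_partition(table, counts, m, n, part_size, suffix_size, min_sup):
    """Apply the alpha/beta/gamma rules for partition P_m (scanned backward)."""
    local_min = min_sup * part_size      # threshold within P_m
    suffix_min = min_sup * suffix_size   # threshold within db^{m,n}
    # Itemsets already in the alpha set: accumulate, or demote to beta.
    for itemset, info in list(table.items()):
        if info["type"] != "alpha":
            continue
        total = info["count"] + counts.get(itemset, 0)  # assumption: missing = 0
        if total >= suffix_min:
            info["count"], info["start"] = total, m
        else:
            info["type"] = "beta"  # assumption: keeps its old start and count
    # Itemsets in no set that are frequent in P_m.
    for itemset, c in counts.items():
        if itemset not in table and c >= local_min:
            kind = "alpha" if m == n else "gamma"
            table[itemset] = {"count": c, "start": m, "type": kind}
    return table


def members(table, kind):
    return {x: (i["start"], i["count"]) for x, i in table.items() if i["type"] == kind}


p2_counts = {"A": 2, "B": 2, "C": 3, "D": 1, "E": 1, "F": 2,
             "AB": 2, "AC": 2, "AF": 1, "BC": 2, "BF": 1, "CF": 2, "ABC": 2}
# CF and ABC are not listed for P1; F: 0 and AC: 0 imply both are 0.
p1_counts = {"A": 1, "B": 3, "C": 2, "D": 1, "E": 3, "F": 0,
             "AB": 1, "AC": 0, "BC": 2, "BE": 3, "CE": 2}

table = {}
scan_partition(table, p2_counts, m=2, n=2, part_size=3, suffix_size=3, min_sup=0.5)
scan_partition(table, p1_counts, m=1, n=2, part_size=3, suffix_size=6, min_sup=0.5)

alpha = members(table, "alpha")
beta = members(table, "beta")
gamma = members(table, "gamma")
```

Under these assumptions, the resulting (start, count) pairs agree with the counts shown for the example after P1 is scanned. The final accumulation with the frequent itemsets of DB is not covered here.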

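Example: after P1 is scanned, the counts are accumulated with the frequent itemsets of DB (itemset: start, count)
• α set: {A}: 0, 7; {B}: 0, 8; {C}: 0, 9
• β set: {F}: 2, 2; {AC}: 2, 2; {CF}: 2, 2; {ABC}: 2, 2
• γ set: {E}: 1, 3; {BE}: 1, 3; {CE}: 1, 2
• Further entries in the table (set column not recoverable): {AB}: 1, 3; {BC}: 1, 4; {AE}: 0, 3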

Experiment
• Intel Pentium IV 1.5 GHz CPU, 640 MB main memory
• Microsoft Windows 2000 Professional
• Synthetic datasets

Experiment
