Efficient Mining of High Utility Itemsets from Large Datasets. Alva Erwin, Department of Computing, Raj P. Gopalan, and N.R. Achuthan, Department of Mathematics and Statistics, Curtin University of Technology, Kent St., Bentley, Western Australia. PAKDD08.
Outline • Introduction • Definition • Method: Compressed Transaction Utility-Prol • Experiments • Conclusions
Introduction • Frequent itemset mining finds items that co-occur in a transaction database more often than a user-given frequency threshold, without considering the quantity or the weight (such as profit) of the items. • TwoPhase, based on Apriori, is suitable for sparse data sets with short patterns; CTU-Mine, based on pattern growth, is suitable for dense data.
Definition • u(3 4, t1) = $60 • u(3 4, t3) = $60 • u(3 4) = u(3 4, t1) + u(3 4, t3) = $120
Definition • Transaction utility (tu): • Transaction-weighted utility (twu): • tu(1) = 80 • twu(3 4) = $190
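The quantities on the two Definition slides can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions from the high utility mining literature: an item's utility in a transaction is its quantity times its per-unit profit, tu sums the utilities of all items in a transaction, and twu sums tu over the transactions that contain the itemset. The transactions, profits and function names below are hypothetical and are not the table from the slides.

```python
# Hypothetical toy data, NOT the table used on the slides.
# Each transaction maps item -> purchased quantity.
transactions = {
    "t1": {1: 2, 3: 1, 4: 2},
    "t2": {2: 1, 3: 3},
    "t3": {3: 2, 4: 1, 5: 1},
}
# Hypothetical per-unit profit (external utility) of each item.
profit = {1: 5, 2: 10, 3: 20, 4: 15, 5: 30}


def item_utility(item, tid):
    """Quantity of the item in the transaction times its per-unit profit."""
    return transactions[tid].get(item, 0) * profit[item]


def itemset_utility_in(itemset, tid):
    """u(X, t): utility of itemset X in transaction t, 0 if t lacks any item of X."""
    if not all(i in transactions[tid] for i in itemset):
        return 0
    return sum(item_utility(i, tid) for i in itemset)


def itemset_utility(itemset):
    """u(X): sum of u(X, t) over all transactions."""
    return sum(itemset_utility_in(itemset, tid) for tid in transactions)


def transaction_utility(tid):
    """tu(t): total utility of every item in transaction t."""
    return sum(item_utility(i, tid) for i in transactions[tid])


def twu(itemset):
    """twu(X): sum of tu(t) over the transactions that contain all of X."""
    return sum(
        transaction_utility(tid)
        for tid, t in transactions.items()
        if all(i in t for i in itemset)
    )


print(itemset_utility((3, 4)), twu((3, 4)))
```

Because twu counts every item in the supporting transactions, it is never smaller than the itemset's own utility, which is why it can serve as a safe upper bound for pruning.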
Compressed Transaction Utility-Prol • 99 < min_Utility (129.9)
Compressed Utility Pattern-Tree • Parallel projection of transaction database
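As a rough illustration of what a parallel projection does in pattern-growth mining generally, the sketch below copies each transaction into one sub-database per item it contains, so that each sub-database can later be mined on its own. It is not the authors' implementation: the data and the function name are hypothetical, utilities are left out for brevity, and keeping the items that precede each item in the global order (rather than those that follow) is an assumed convention.

```python
from collections import defaultdict

# Hypothetical transactions: item indices already sorted in a fixed global order.
transactions = [[1, 2, 3], [2, 3, 4], [2, 5], [3, 4, 5]]


def parallel_projection(transactions):
    """Build one projected sub-database per item in a single pass.

    For every item in a transaction, the items that precede it are appended
    to that item's projection (assumed convention, see lead-in).
    """
    projections = defaultdict(list)
    for t in transactions:
        for pos, item in enumerate(t):
            prefix = t[:pos]
            if prefix:  # an empty prefix carries nothing to mine
                projections[item].append(prefix)
    return dict(projections)


projections = parallel_projection(transactions)
for item in sorted(projections):
    print(item, projections[item])
```

Each projection depends only on its own list of prefixes, which is what allows the sub-databases to be written out and processed independently.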
CUP-tree • Traverse index 1 (110) from 5, index 2 (310) from (2,3,4), • index 3 (195) from 2, and index 4 (190) from (3,5)
ProCUP-tree • Index 1 (110) from 5 is dropped, because 110 < min_Utility (129.9) • Index 2 (310) from (2,3,4), index 3 (195) from 2, and index 4 (190) from (3,5) remain
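The threshold check on this slide can be written as a short filter. The index values and the 129.9 threshold are taken from the slide. The function name is hypothetical, and keeping values equal to the threshold (`>=` rather than `>`) is an assumption.

```python
MIN_UTILITY = 129.9

# index -> value shown in parentheses on the slide
index_values = {1: 110, 2: 310, 3: 195, 4: 190}


def prune(values, min_utility):
    """Keep only the indices whose value reaches the minimum utility."""
    return {idx: v for idx, v in values.items() if v >= min_utility}


kept = prune(index_values, MIN_UTILITY)
print(kept)  # index 1 (110) falls below 129.9 and is dropped
```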
ProCUP-tree • Utility = oriUtility * itemQuantity + proUtility * proQuantity • 35*2 + 25*2 = 120 • 150*1 + 25*1 = 175 • 10*5 + 25*3 = 125 • High_Utility_Itemset = (3,2), (3,2,1)
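The arithmetic on this slide can be reproduced with a small helper. The parameter names follow the slide's own terms, and the three rows are the numbers shown there. The function name and the tuple layout are hypothetical, and the sketch does not attempt to map each row to a particular itemset.

```python
def utility(ori_utility, item_quantity, pro_utility, pro_quantity):
    """Utility = oriUtility * itemQuantity + proUtility * proQuantity."""
    return ori_utility * item_quantity + pro_utility * pro_quantity


# (oriUtility, itemQuantity, proUtility, proQuantity), as listed on the slide
rows = [(35, 2, 25, 2), (150, 1, 25, 1), (10, 5, 25, 3)]

totals = [utility(*row) for row in rows]
print(totals)
```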
Conclusion • The CTU-Pro algorithm mines the complete set of high utility itemsets from both sparse and relatively dense datasets, with short or longer high utility patterns. • The algorithm adapts to large data by constructing parallel subdivisions on disk that can be mined independently.