
TEMPORAL ASSOCIATION RULE MINING

TEMPORAL ASSOCIATION RULE MINING. Prepared by : Ajit Padukone , Komal Kapoor. Outline. Association Rule Mining Applications Temporal Association Rule Mining Existing Techniques and their Limitations Problem Statement Proposed Approach Finding Maximal Valid Time Intervals



Presentation Transcript


  1. TEMPORAL ASSOCIATION RULE MINING Prepared by: Ajit Padukone, Komal Kapoor

  2. Outline • Association Rule Mining • Applications • Temporal Association Rule Mining • Existing Techniques and their Limitations • Problem Statement • Proposed Approach • Finding Maximal Valid Time Intervals • Finding All Temporally Frequent Itemsets • Future Work

  3. Motivation Association Rule Mining {onion, potatoes} => {burgers} {bread, milk} => {butter} Transaction Data Frequent itemsets : {onion,potatoes,burgers}, {bread,milk,butter}

  4. Applications • Retail Data Analysis • Web Usage Mining • Intrusion Detection • Bioinformatics

  5. Spatial Association Rule Mining • Extract spatial predicates • Find all frequent patterns/predicates/sets • Generate strong rules E.g. {contains(Port), crosses(WaterBody)} Source: Vania Bogorny, Enhancing Spatial Association Rule Mining in Geographic Databases, 2006 - lume.ufrgs.br

  6. Temporal Association Rule Mining Chapter 10 of the reference book defines two types of temporal references: • Transaction Time • Valid Time Time attribute for association rules can also be defined in an analogous way.

  7. Existing Technique – Apriori Algorithm • The Apriori algorithm finds the frequent itemsets in a set of transactions, i.e. the itemsets that satisfy the minimum support threshold. • The support of an itemset is the proportion of transactions in the data set that contain the itemset. Algorithm: • Find all k-itemsets whose transaction support is above the minimum support (frequent k-itemsets) • Generate candidate (k+1)-itemsets from the frequent k-itemsets • Prune the candidate (k+1)-itemsets to obtain the frequent (k+1)-itemsets, i.e. those whose transaction support is above the minimum support • If the number of frequent (k+1)-itemsets is greater than 0, repeat with k+1
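The level-wise loop on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not the presenters' code: the function name `apriori`, the join by pairwise union, and the choice to treat support equal to the threshold as frequent (matching "30 % (3 transactions)" on the next slide) are all assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Assumption: support equal to the threshold counts as frequent.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Proportion of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Frequent 1-itemsets.
    level = {}
    for item in {i for t in transactions for i in t}:
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            level[single] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        # Generate candidate (k+1)-itemsets by joining frequent k-itemsets.
        joined = {a | b for a, b in combinations(list(level), 2)
                  if len(a | b) == k + 1}
        # Prune candidates that have a non-frequent k-subset.
        candidates = [c for c in joined
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k))]
        # Keep only the candidates that meet the support threshold.
        next_level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                next_level[c] = s
        level = next_level
        k += 1
    return frequent
```

Each pass scans the whole transaction list once per candidate, which is the simplest form of support counting; real implementations usually count all candidates in a single scan.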

  8. Apriori Algorithm (contd.) Universal Set of Items = { A, B, C, D, E, F, G } Minimum support = 30 % (3 transactions) Table 1: Transaction Database Step 1: 1-itemsets. Those not struck out are frequent. Step 2: 2-itemsets. All 2-itemsets with { D } or { E } as a subset are pruned. Those not struck out are frequent. Step 3: 3-itemsets. All 3-itemsets with a non-frequent 2-itemset as a subset have been pruned. Those not struck out are frequent.

  9. Limitation • The Apriori algorithm finds the itemsets that satisfy the minimum support threshold over the entire transaction database. • What about itemsets that are highly frequent over a limited period of time but not over the entire set of transactions? E.g. Turkey -> Pumpkin Pie (Thanksgiving) • Conversely, the itemsets extracted by the Apriori algorithm might not be valid for the entire period over which association rule mining was performed.

  10. Related Work • X. Chen and I. Petrounias, Mining Temporal Features in Association Rules, Proc. Third European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD '99). • Yingjiu Li, Peng Ning, X. Sean Wang, Sushil Jajodia, Discovering Calendar-based Temporal Association Rules, Data & Knowledge Engineering, Special issue: Temporal representation and reasoning, Volume 44, Issue 2, February 2003. • Kang et al., Discovering Flow Anomalies: A SWEET Approach, Eighth IEEE International Conference on Data Mining (ICDM), 2008.

  11. Temporal Association Rule Mining The book also defines ‘time instants’ or ‘time intervals’, ‘chronon’ and ‘duration’, e.g. 12th Dec-2009, 11:20:24 {bread, milk, butter, cheese, chips} 12th Dec-2009, 11:27:04 {onion, capsicum, potatoes, burgers} 12th Dec-2009, 12:05:44 {soap, shampoo, comb, toothbrush} 12th Dec-2009, 11th hr {{bread, milk, butter, cheese, chips}, {onion, capsicum, potatoes, burgers}} 12th Dec-2009, 12th hr {{soap, shampoo, comb, toothbrush}}

  12. Temporal Association Rule Mining The book also defines ‘time instants’ or ‘time intervals’, ‘chronon’ and ‘duration’, e.g. 12th Dec-2009, 11:20:24 {bread, milk, butter, cheese, chips} 12th Dec-2009, 11:27:04 {onion, capsicum, potatoes, burgers} 12th Dec-2009, 12:05:44 {soap, shampoo, comb, toothbrush} 12th Dec-2009, 11th hr {{bread, milk, butter, cheese, chips}, {onion, capsicum, potatoes, burgers}} 12th Dec-2009, 12th hr {{soap, shampoo, comb, toothbrush}} Time Unit (chronon)
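Grouping timestamped transactions into time units, as in the hourly example on this slide, might look like the sketch below. The function name `group_by_time_unit` and the use of a `strftime` pattern to pick the chronon are illustrative assumptions; the slides do not prescribe a representation.

```python
from collections import defaultdict
from datetime import datetime

def group_by_time_unit(timestamped, unit_format="%Y-%m-%d %H"):
    """Bucket (timestamp, items) pairs by time unit.

    unit_format is a strftime pattern that names the chronon; the default
    buckets by hour. Returns {unit_label: [transaction, ...]} ordered by label.
    """
    units = defaultdict(list)
    for ts, items in timestamped:
        units[ts.strftime(unit_format)].append(frozenset(items))
    # With zero-padded year-first labels, string order is chronological order.
    return dict(sorted(units.items()))

# The three transactions from the slide.
slide_data = [
    (datetime(2009, 12, 12, 11, 20, 24),
     {"bread", "milk", "butter", "cheese", "chips"}),
    (datetime(2009, 12, 12, 11, 27, 4),
     {"onion", "capsicum", "potatoes", "burgers"}),
    (datetime(2009, 12, 12, 12, 5, 44),
     {"soap", "shampoo", "comb", "toothbrush"}),
]
hourly = group_by_time_unit(slide_data)
```

Here `hourly` has two keys, one for the 11th hour holding two transactions and one for the 12th hour holding one, mirroring the grouping shown on the slide.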

  13. Problem Statement Definitions: • Support of an itemset I over the interval (ti,tj) = frequency of I in the interval (ti,tj) / total number of transactions during the interval (ti,tj) • Valid Time Interval for itemset I: a time interval over which the support of I is greater than a threshold (lmin_sup) • Maximal Valid Time Interval: a valid time interval for an itemset I which is not contained in any other valid time interval for I. • Temporally Frequent Itemset: an itemset which has at least one valid time interval associated with it. Figure: Valid Time Intervals (lmin_sup = 0.5)

  14. Problem Statement Definitions: • Support of an itemset I over the interval (ti,tj) = frequency of I in the interval (ti,tj) / total number of transactions during the interval (ti,tj) • Valid Time Interval for itemset I: a time interval over which the support of I is greater than a threshold (lmin_sup) • Maximal Valid Time Interval: a valid time interval for an itemset I which is not contained in any other valid time interval for I. • Temporally Frequent Itemset: an itemset which has at least one valid time interval associated with it. Figure: Maximal Valid Time Intervals (lmin_sup = 0.5)
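The support and validity definitions above can be written down directly. In this sketch D is assumed to be a list of time units, each a list of transactions, and an interval is a pair of inclusive 0-based indices into D; those representation choices and the name `interval_support` are assumptions made for illustration.

```python
def interval_support(itemset, D, i, j):
    """Support of itemset over time units D[i]..D[j] (inclusive).

    Returns 0.0 when the interval contains no transactions.
    """
    itemset = frozenset(itemset)
    hits = total = 0
    for unit in D[i:j + 1]:
        total += len(unit)
        # Count the transactions that contain every item of the itemset.
        hits += sum(itemset <= frozenset(t) for t in unit)
    return hits / total if total else 0.0

def is_valid_time_interval(itemset, D, i, j, lmin_sup):
    """An interval is valid when the support is strictly greater than lmin_sup."""
    return interval_support(itemset, D, i, j) > lmin_sup
```

Note that support is computed over all transactions in the interval, so a long interval with many non-supporting transactions can be invalid even if it contains valid sub-intervals.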

  15. Problem Statement (contd.) Given: Transaction data D in the format (TU, {T1, T2, …, Tk}), where TU is a time unit and each Ti is a transaction. Find: All temporally frequent itemsets along with their maximal valid time intervals.

  16. Problem Statement (contd.) So now, along with finding the frequent itemsets, we have to find the maximal valid time intervals for each frequent itemset. Complexity of the naive approach for finding the maximal valid time intervals of each frequent itemset: O(n^2), where n = |D|
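One way to realise the naive quadratic search is sketched below. It assumes D is a list of time units (each a list of transactions) with 0-based inclusive interval indices, and it uses prefix sums so that each interval is checked in constant time; the function name and these details are assumptions, not taken from the slides.

```python
def naive_maximal_valid_intervals(itemset, D, lmin_sup):
    """Return the maximal valid time intervals of itemset as (start, end) pairs."""
    itemset = frozenset(itemset)
    n = len(D)

    # Prefix sums of matching and total transaction counts per time unit.
    hits, totals = [0], [0]
    for unit in D:
        hits.append(hits[-1] + sum(itemset <= frozenset(t) for t in unit))
        totals.append(totals[-1] + len(unit))

    def valid(i, j):
        total = totals[j + 1] - totals[i]
        return total > 0 and (hits[j + 1] - hits[i]) > lmin_sup * total

    maximal = []
    furthest_end = -1  # largest end of any maximal interval found so far
    for i in range(n):
        # Longest valid interval starting at i: scan ends from the right.
        for j in range(n - 1, i - 1, -1):
            if valid(i, j):
                # It is maximal unless an earlier start already reaches j.
                if j > furthest_end:
                    maximal.append((i, j))
                    furthest_end = j
                break
    return maximal
```

The two nested loops examine O(n^2) intervals in the worst case, which is the cost the later slides try to reduce by looking only at supporting time units.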

  17. Finding Maximal Valid Time Intervals Definitions: • Valid/Supporting Time Unit for I: a time unit during which the support of I is greater than lmin_sup. • Non-valid/Non-Supporting Time Unit for I: a time unit during which the support of I is not greater than lmin_sup.

  18. Finding Maximal Valid Time Intervals Lemma 1: Each valid time interval (TUi,TUj) must contain at least one valid/supporting time unit for I. Lemma 2: If an interval (TUi,TUj) is not valid for I, then the interval (TUi,TUj+1), where TUj+1 is a non-valid time unit, cannot be valid. Lemma 3: If an interval (TUi,TUj) is valid for I, then the interval (TUi,TUj+1), where TUj+1 is a valid time unit, is also valid. Using Lemma 3, collapse contiguous runs of supporting time units into one unit with the average density
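The collapsing step based on Lemma 3 could be sketched as below. It assumes each time unit has already been reduced to a pair (hits, total), i.e. the number of transactions containing I and the number of all transactions; summing these pairs over a run gives the run's average density. The function name and the tuple layout of the result are hypothetical.

```python
def collapse_supporting_runs(counts, lmin_sup):
    """Merge contiguous supporting time units into single blocks.

    counts: list of (hits, total) per time unit.
    Returns a list of (start, end, hits, total, supporting) tuples, where
    start and end are inclusive 0-based time-unit indices.
    """
    blocks = []
    for idx, (hits, total) in enumerate(counts):
        supporting = total > 0 and hits > lmin_sup * total
        if supporting and blocks and blocks[-1][4]:
            # Extend the current run: summed counts keep the average density.
            start, _, run_hits, run_total, _ = blocks[-1]
            blocks[-1] = (start, idx, run_hits + hits, run_total + total, True)
        else:
            blocks.append((idx, idx, hits, total, supporting))
    return blocks
```

Because every unit in a run is itself above the threshold, the merged block is above the threshold too, so treating the run as one supporting unit loses no valid interval endpoints that matter for maximality.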

  19. Finding Maximal Valid Time Intervals (contd.) Given: Itemset I, transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup
      Part 1 (uses Lemmas 1 and 3):
      Find_maximal_valid_time_intervals(I, D, lmin_sup)
        Find STU = {TUa1, TUa2, …, TUan} such that each TUak is a supporting time unit for I
        For i = 1 to n
          For j = n down to i+1
            If is_valid_time_interval(TUai, TUaj, D, lmin_sup)
              break
            End
          End
        End

  20. Finding Maximal Valid Time Intervals (contd.) Given: Itemset I, transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup
      Part 2 (uses Lemma 2):
        start = TUa(i-1) + 1, finish = TUa(j+1) - 1
        low = start, high = TUaj
        While low <= TUai and high <= finish
          If is_valid_time_interval(low, high)
            high = high + 1
          Else
            low = low + 1
          End
        End

  21. Finding Maximal Valid Time Intervals (contd.)

  22. Finding Maximal Valid Time Intervals (contd.) Further iterations… Complexity: O(n'^2 + n)

  23. Finding All Temporally Frequent Itemsets Given: Transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup, UI (universal itemset)
        C <- Generate_1-item_candidate_sets(UI, D)
        Interval = (1, |D|)
        While (|C| > 0)
          For each candidate set c in C
            max_valid_intervals <- find_maximal_valid_time_interval(c, D, lmin_sup)
            If |max_valid_intervals| > 0
              temp_freq_sets.add(<c, max_valid_intervals>)
            End
          End
          If |temp_freq_sets| > 0
            C <- generate_new_candidate_sets(temp_freq_sets, D, lmin_sup)
          Else
            C <- null
          End
        End
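A rough Python rendering of this level-wise loop is given below. It is a sketch under several assumptions: D is a list of time units (each a list of transactions), intervals are inclusive 0-based index pairs, the interval search is a plain quadratic scan rather than the two-part procedure from the earlier slides, and candidate generation uses Apriori-style subset pruning (sound here because any subset of an itemset is at least as frequent over the same interval). All function names are hypothetical.

```python
from itertools import combinations

def maximal_valid_intervals(itemset, D, lmin_sup):
    """Quadratic scan for the maximal valid intervals of itemset."""
    n = len(D)
    hits, totals = [0], [0]
    for unit in D:
        hits.append(hits[-1] + sum(itemset <= frozenset(t) for t in unit))
        totals.append(totals[-1] + len(unit))
    result, furthest_end = [], -1
    for i in range(n):
        for j in range(n - 1, i - 1, -1):
            total = totals[j + 1] - totals[i]
            if total and (hits[j + 1] - hits[i]) > lmin_sup * total:
                if j > furthest_end:  # not contained in an earlier interval
                    result.append((i, j))
                    furthest_end = j
                break
    return result

def temporally_frequent_itemsets(D, lmin_sup):
    """Return {itemset: maximal valid intervals} for all temporally frequent itemsets."""
    items = sorted({item for unit in D for t in unit for item in t})
    candidates = [frozenset([item]) for item in items]
    found = {}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            intervals = maximal_valid_intervals(c, D, lmin_sup)
            if intervals:
                level[c] = intervals
        found.update(level)
        # Join temporally frequent k-itemsets into (k+1)-candidates, then
        # drop any candidate with a k-subset that is not temporally frequent.
        joined = {a | b for a, b in combinations(list(level), 2)
                  if len(a | b) == k + 1}
        candidates = [c for c in joined
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k))]
        k += 1
    return found
```

A tighter pruning rule would also require the subsets' valid intervals to overlap; that refinement is left out here because the slides do not spell out the pruning step.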

  24. Pruning in Candidate Set Generation

  25. Future Work • Find cyclic valid time intervals • Identify interesting maximal valid time intervals

  26. Questions?
