
Pattern Directed Mining Of Sequence Data

By Valery Guralnik, Duminda Wijesekera, Jaideep Srivastava. Presenter: Jyothsna R Nayak.


Presentation Transcript


  1. Pattern Directed Mining Of Sequence Data. By Valery Guralnik, Duminda Wijesekera, Jaideep Srivastava. Presenter: Jyothsna R Nayak

  2. Contents • Introduction • Sequential Patterns • Data Structure and Algorithm • Experimental Evaluation • SP Tree Optimization • Conclusions • References

  3. Introduction • Sequence data: each event has an associated time of occurrence • An episode is a collection of events • Frequent episodes: episodes occurring with a frequency above a given threshold

  4. Steps involved in mining of frequent episodes • Present a language for specifying episodes of interest • Describe a data structure: Sequential Pattern Tree • Mining algorithm to generate frequent episodes • Optimize SP Tree

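  5. Sequential Patterns • Pattern language • Attributes: A = {A1, A2, ..., Am} • Domains: D1, D2, ..., Dm, where Di is the domain of Ai • An event e over A is an (m + 2)-tuple (a1, a2, ..., am, tbegin, tend), where each ai is a value from Di and tbegin and tend are the begin and end times of the event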
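
As a minimal sketch of the event definition on this slide, the snippet below stores an event as m attribute values plus a begin and an end time. The attribute names in `ATTRIBUTES`, the class name `Event`, and the check that the begin time does not exceed the end time are assumptions made for illustration; they are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical attribute set A for a stock-market domain (m = 3 here).
ATTRIBUTES = ("comp_type", "comp_name", "movement")


@dataclass(frozen=True)
class Event:
    values: Tuple[str, ...]  # (a1, ..., am), one value per attribute in ATTRIBUTES
    tbegin: float            # begin time of the event
    tend: float              # end time of the event

    def __post_init__(self):
        if len(self.values) != len(ATTRIBUTES):
            raise ValueError("expected one value per attribute")
        # Assumption: an event cannot end before it begins.
        if self.tbegin > self.tend:
            raise ValueError("tbegin must not exceed tend")

    def get(self, attribute: str) -> str:
        """Return the value of the named attribute."""
        return self.values[ATTRIBUTES.index(attribute)]

    def as_tuple(self) -> tuple:
        """Return the event as the (m + 2)-tuple (a1, ..., am, tbegin, tend)."""
        return (*self.values, self.tbegin, self.tend)


e = Event(("computer", "Microsoft", "down"), tbegin=1.0, tend=1.0)
```

Here `e.as_tuple()` has m + 2 = 5 components, matching the tuple form on the slide.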

  6. Example of Events in the Stock Market Domain
      Event ID | Date     | Comp Type | Comp Name | Movement | Volatility | Activeness
      e1       | 01/02/91 | Computer  | Microsoft | Down     | High       | Low
      e2       | 01/03/91 | Computer  | Microsoft | Up       | High       | Medium
      e3       | 01/02/91 | Computer  | Microsoft | NoMovmt  | Low        | High
      e4       | 01/03/91 | Computer  | Microsoft | Down     | High       | High

  7. Definitions • Ordering constraint • Serial occurrence: e -> f, where e.tend < f.tbegin • Parallel occurrence: e || f • Attribute constraint • Selection constraint: e.type = 'computer' • Join constraint: e.name = f.name
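
The four constraint kinds above can be read as simple predicates over events. The sketch below is illustrative only: the `Event` record and its field names are hypothetical, and the parallel case is assumed to impose no temporal order between the two events, which the slide does not spell out.

```python
from collections import namedtuple

# Hypothetical event record; only the fields needed by the constraints.
Event = namedtuple("Event", ["type", "name", "tbegin", "tend"])


def serial(e, f):
    """Ordering constraint e -> f: e must end strictly before f begins."""
    return e.tend < f.tbegin


def parallel(e, f):
    """Ordering constraint e || f: assumed here to impose no temporal order."""
    return True


def selection(e):
    """Selection constraint on a single event: e.type = 'computer'."""
    return e.type == "computer"


def join(e, f):
    """Join constraint relating two events: e.name = f.name."""
    return e.name == f.name


e = Event("computer", "Microsoft", tbegin=1, tend=1)
f = Event("computer", "Microsoft", tbegin=2, tend=2)
```

With these two events, `serial(e, f)` holds while `serial(f, e)` does not; `selection(e)` and `join(e, f)` both hold.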

  8. Event specification • Partial specification of a single event: e[(e.type = 'computer' v e.type = 'electronic') ^ e.movement_direction = 'down'] • Pattern comparing characteristics of two events: e[e.movement_direction = 'up'] -> [e.name = f.name] f[f.movement_direction = 'down']
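
To make the two specifications above concrete, here is a small sketch that evaluates them over a hand-made list of events. The event data, the dictionary layout, and the function names are all hypothetical; times are plain integers standing in for dates.

```python
# Hypothetical events; company names A, B, C are placeholders.
events = [
    {"name": "A", "type": "computer", "movement_direction": "up", "tbegin": 1, "tend": 1},
    {"name": "A", "type": "computer", "movement_direction": "down", "tbegin": 2, "tend": 2},
    {"name": "B", "type": "electronic", "movement_direction": "down", "tbegin": 3, "tend": 3},
    {"name": "C", "type": "retail", "movement_direction": "down", "tbegin": 3, "tend": 3},
]


def partial_spec(e):
    """e[(e.type = 'computer' v e.type = 'electronic') ^ e.movement_direction = 'down']"""
    return (e["type"] == "computer" or e["type"] == "electronic") and \
        e["movement_direction"] == "down"


def matches_pattern(e, f):
    """e[up] -> [e.name = f.name] f[down]: selection, serial order, then join."""
    return (e["movement_direction"] == "up"
            and f["movement_direction"] == "down"
            and e["tend"] < f["tbegin"]
            and e["name"] == f["name"])


selected = [e["name"] for e in events if partial_spec(e)]
pairs = [(e["name"], f["name"]) for e in events for f in events if matches_pattern(e, f)]
```

On this data, `selected` contains the falling computer and electronic stocks, and `pairs` contains the single case where the same company moves up and later moves down.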

  9. Data Structure • A leaf node represents an event • An interior node represents an ordering constraint • If op is the ordering constraint (-> or ||) labeling some interior node, and e and f are the left and right children of that node, then e op f is a sequential pattern • Associated with each node is a table of matching events • Attached to each node is a Boolean expression tree representing its attribute constraints
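
One possible in-memory layout for such a tree is sketched below. It is an assumption-laden illustration, not the authors' structure: the class name `SPNode` and its fields are invented here, and a plain Python callable stands in for the Boolean expression tree of attribute constraints.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SPNode:
    label: str                          # event variable for a leaf; "->" or "||" for an interior node
    constraint: Callable[..., bool]     # stand-in for the Boolean expression tree
    left: Optional["SPNode"] = None
    right: Optional["SPNode"] = None
    matches: List[tuple] = field(default_factory=list)  # table of matching events/episodes

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def pattern(self) -> str:
        """Render the sequential pattern rooted at this node."""
        if self.is_leaf():
            return self.label
        return f"({self.left.pattern()} {self.label} {self.right.pattern()})"


# Tree for e[e.mvmt = 'up'] -> [e.name = f.name] f[f.mvmt = 'down']
leaf_e = SPNode("e", lambda ev: ev["mvmt"] == "up")
leaf_f = SPNode("f", lambda ev: ev["mvmt"] == "down")
root = SPNode("->", lambda e, f: e["name"] == f["name"], leaf_e, leaf_f)
```

Calling `root.pattern()` renders the pattern as `(e -> f)`; the `matches` tables start empty and would be filled by the mining algorithm.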

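  10. SP Tree [Figure: SP tree for the user-specified pattern e[e.mvmt = 'up'] -> [e.name = f.name] f[f.mvmt = 'down']. The root is an ordering node over e and f with join constraint e.name = f.name and a table of matching episodes; the leaves e (e.mvmt = 'up') and f (f.mvmt = 'down') each have a table of matching events.]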

  11. Bottom-up algorithm
      Initialize queue Q to empty
      for (each leaf l in T) do begin
          generate events from S that match the constraints of l
          if (the parent p of l is not already in Q) then put p in Q
      end
      while (Q is not empty) do begin
          remove node n from Q
          Generate_Events(n)
          if (another child of n's parent p was already processed) then put p in Q
      end
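
The queue-driven traversal can be sketched as follows. This is a simplified reading under stated assumptions, not the presenter's code: the `Node` class and callback names are hypothetical, interior nodes are assumed to have exactly two children, and a parent is enqueued only once both of its children have been processed, which is one interpretation of the readiness test on the slide.

```python
from collections import deque


class Node:
    """Hypothetical binary SP-tree node; interior nodes have two children."""

    def __init__(self, name, left=None, right=None):
        self.name, self.left, self.right = name, left, right
        self.parent = None
        self.done = False
        for child in (left, right):
            if child is not None:
                child.parent = self


def leaves(node):
    """Collect leaves left to right."""
    if node.left is None and node.right is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def bottom_up(root, process_leaf, process_interior):
    queue = deque()

    def mark_done(node):
        node.done = True
        p = node.parent
        # Enqueue the parent only when both children are finished.
        if p is not None and p.left.done and p.right.done:
            queue.append(p)

    for leaf in leaves(root):
        process_leaf(leaf)        # match events from the sequence against the leaf
        mark_done(leaf)
    while queue:
        n = queue.popleft()
        process_interior(n)       # combine the children's episodes
        mark_done(n)


# Tree for (e -> f) || g
inner = Node("e->f", Node("e"), Node("f"))
root = Node("root", inner, Node("g"))
order = []
bottom_up(root, lambda n: order.append(n.name), lambda n: order.append(n.name))
```

The recorded `order` lists every leaf before any interior node, and each interior node after both of its children.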

  12. Generate-events algorithm
      for (each episode e from left child l of n) do begin
          for (each episode f from right child r of n) do begin
              if (node n is serial) then
                  if (e.tend >= f.tbegin) then continue
              if (events in e and f match the join constraint) then
                  form new episode g from the events of e and f
          end
      end
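
A runnable sketch of this nested-loop combination step is given below. Episodes are represented as tuples of event dictionaries, and an episode's begin and end times are assumed to be the earliest begin and latest end of its events; both choices, along with the function names and sample data, are assumptions for illustration.

```python
def span(episode):
    """Assumed episode interval: earliest tbegin and latest tend of its events."""
    return (min(ev["tbegin"] for ev in episode),
            max(ev["tend"] for ev in episode))


def generate_episodes(left_episodes, right_episodes, is_serial, join):
    """Combine episodes of the two children of an ordering node."""
    result = []
    for e in left_episodes:
        for f in right_episodes:
            # Serial node: skip pairs where e does not end before f begins.
            if is_serial and span(e)[1] >= span(f)[0]:
                continue
            if join(e, f):
                result.append(e + f)  # new episode g from the events of e and f
    return result


# Hypothetical single-event episodes for the two leaves.
ups = [({"name": "A", "tbegin": 1, "tend": 1},),
       ({"name": "B", "tbegin": 5, "tend": 5},)]
downs = [({"name": "A", "tbegin": 2, "tend": 2},),
         ({"name": "B", "tbegin": 3, "tend": 3},)]

same_name = lambda e, f: e[0]["name"] == f[0]["name"]
serial_result = generate_episodes(ups, downs, True, same_name)
parallel_result = generate_episodes(ups, downs, False, same_name)
```

With the serial ordering only company A qualifies, because B's rise comes after its fall; without the ordering test both companies produce an episode.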

  13. Experimental evaluation • Results • window size variation • data set size • number of event specifications • attribute constraints

  14. [Figure: running time (seconds) vs. window size (days); minimum frequency = 0.8]

  15. [Figure: running time (seconds) vs. number of event specifications; minimum frequency = 0.8, window size = 11]

  16. [Figure: running time (seconds) vs. number of constraints; minimum frequency = 0.8, window size = 5]

  17. [Figure: running time (seconds) vs. number of events in the data set; minimum frequency = 0.7, window size = 5]

  18. SP Tree Optimization • If two event nodes represent the same event, only one of them needs to be kept. • If two ordering nodes have the same join constraints, and their left and right children represent the same events, then one such node is sufficient.
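
One way to realize this kind of sharing is to intern nodes by a canonical key, so that building the same sub-pattern twice returns the same node. The sketch below is an illustration under assumptions: the `NodePool` class is hypothetical, two specifications count as "the same" only when their text is identical, and the ordering operator is included in the key alongside the join constraint.

```python
class NodePool:
    """Hypothetical pool that stores each distinct SP-tree node once."""

    def __init__(self):
        self._by_key = {}

    def _intern(self, key):
        # Reuse the existing node for this key, or create it on first use.
        if key not in self._by_key:
            self._by_key[key] = {"key": key, "matches": None}
        return self._by_key[key]

    def leaf(self, spec):
        """Event node, identified by the text of its specification."""
        return self._intern(("leaf", spec))

    def ordering(self, op, join, left, right):
        """Ordering node, identified by operator, join constraint, and children."""
        return self._intern((op, join, left["key"], right["key"]))

    def __len__(self):
        return len(self._by_key)


pool = NodePool()
first = pool.ordering("->", "e.name = f.name",
                      pool.leaf("mvmt = 'up'"), pool.leaf("mvmt = 'down'"))
second = pool.ordering("->", "e.name = f.name",
                       pool.leaf("mvmt = 'up'"), pool.leaf("mvmt = 'down'"))
```

Both calls return the same node object, and the pool holds three nodes rather than six, so matching work for the shared sub-pattern would be done once.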

  19. Conclusions. The approach is: • Robust • Flexible • Efficient • Able to express complex patterns • Good in performance

  20. References • Mannila, H., Toivonen, H., and Verkamo, A. I., "Discovering frequent episodes in sequences" • Agrawal, R., and Srikant, R., "Mining sequential patterns" • Mannila, H., and Toivonen, H., "Discovering generalized episodes using minimal occurrences" • Agrawal, R., and Srikant, R., "Mining generalized association rules"
