Identifying Follow-correlation Itemset-pairs Shichao Zhang, Jilian Zhang, Xiaofeng Zhu, Zifang Huang Department of Computer Science, Guangxi Normal University, China Made in ICDM'06
Outline • Introduction • Definition P3.1 (FCIP) • Algorithm • Conclusion
Introduction • The new patterns are denoted P3.1 itemset pairs, or Follow-Correlation Itemset-Pairs (FCIP); they are defined in detail below. • This paper proposes this new kind of interesting pattern and aims to develop techniques for mining it.
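Definition 1. • Item occurring sequence.
$S_I = \langle I_1, I_2, I_3, \ldots, I_t, \ldots, I_T \rangle$, where $I_t \in \{0,1\}$ and $t \in [1,T]$
$S_I^1 = \langle I_m, \ldots, I_n \rangle$, where $I_t = 1$ for $t = m, \ldots, n$ and $1 \le m \le n \le T$
$S_I^0 = \langle I_m, \ldots, I_n \rangle$, where $I_t = 0$ for $t = m, \ldots, n$ and $1 \le m \le n \le T$
$Len(S_I^1) = n - m + 1$, $Len(S_I^0) = n - m + 1$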
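Definition 2. • Follow-Correlation Itemset-Pairs $\langle C, A \rangle$
$C = S_C^1 = \langle C_m, \ldots, C_n \rangle$, $1 \le m \le n \le T$
$A = S_A^1 = \langle A_k, \ldots, A_l \rangle$, where $k \in \{n, n+1\}$ and $k \le l \le T$
and $C_{m-1} = C_{n+1} = 0$ if $1 < m \le n < T$; $A_{k-1} = A_{l+1} = 0$ if $1 < k \le l < T$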
Definition 2.(cont.) • The pair $\langle C, A \rangle$ is called a Lag Follow-Correlation Itemset-Pair (LFCIP) if $k = n+1$ • It is called a Strong Follow-Correlation Itemset-Pair (SFCIP) if $k = n$
Definition 2.(cont.) • For the sequences A = '101010101010' and B = '010101010101', $\langle A^1, B^1 \rangle$ and $\langle B^1, A^1 \rangle$ are different FCIPs • $\langle A^1, B^1 \rangle$ is an LFCIP with frequency 6, whereas $\langle B^1, A^1 \rangle$ has frequency 5
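The counts on this slide can be checked with a short sketch. It assumes that the superscripts denote the lengths of the maximal runs of 1s, which is inferred from the examples rather than stated on the slides; the function names `runs_of_ones` and `fcip_frequencies` are illustrative and do not come from the paper.

```python
from collections import Counter

def runs_of_ones(bits):
    """Return (start, end) pairs, 1-indexed and inclusive, for each maximal run of '1'."""
    runs = []
    start = None
    for t, b in enumerate(bits, start=1):
        if b == "1" and start is None:
            start = t                      # a new run of 1s begins at time t
        elif b != "1" and start is not None:
            runs.append((start, t - 1))    # the run ended at the previous time point
            start = None
    if start is not None:                  # a run that reaches the end of the sequence
        runs.append((start, len(bits)))
    return runs

def fcip_frequencies(c_bits, a_bits):
    """Count pairs keyed by (kind, length of C run, length of A run).

    kind is 'SFCIP' when the A run starts at k = n and 'LFCIP' when it
    starts at k = n + 1, where n is the end of the C run.
    """
    a_runs = dict(runs_of_ones(a_bits))    # maps start k to end l
    freq = Counter()
    for m, n in runs_of_ones(c_bits):
        for kind, k in (("SFCIP", n), ("LFCIP", n + 1)):
            if k in a_runs:
                l = a_runs[k]
                freq[(kind, n - m + 1, l - k + 1)] += 1
    return freq

A = "101010101010"
B = "010101010101"
print(fcip_frequencies(A, B)[("LFCIP", 1, 1)])  # frequency of <A^1, B^1>
print(fcip_frequencies(B, A)[("LFCIP", 1, 1)])  # frequency of <B^1, A^1>
```

The last run of B ends at the final time point, so no run of A can follow it; this is why the two directions differ by one.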
Definition 4. • Longest P3.1 pattern • P =<C, A> mk , k n
Example 1. • Consider a given database D. Let A and B be two items in D.
Example 1.(cont.) • Using our method we can identify an interesting follow-correlation itemset-pair $\langle A^1, B^1 \rangle$ with a frequency of 10.
Example 2. • Consider the same database D. Using the support-confidence framework, we can obtain the association rule $A \rightarrow B$ with confidence 0.333.
Example 2.(cont.) • Using our method we can discover interesting follow-correlation itemset-pairs: $\langle A^3, B^1 \rangle$ and $\langle A^4, B^2 \rangle$
Example 3.
• IDIIIODDDIIIIODDD for stock A
• IODDOIDODODDODODD for stock B
• I: Increase, representing more than 20% of the daily value
• D: Decrease, representing more than 10% of the daily value
• O: Other kinds of changes
• A is 10111 00001 11100 00
• B is 10000 10000 00000 00
• Omit those zero values:
• A is 111101111
• B is 100010000
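The encoding in this example can be reproduced with a minimal sketch. It assumes that "omit those zero values" means dropping every time point at which both stocks are 0, which matches the numbers on the slide but is not stated there; the helper names `encode` and `drop_all_zero_columns` are illustrative.

```python
def encode(changes, symbol="I"):
    """Map a string of daily changes to a 0/1 string: 1 where the change equals `symbol`."""
    return "".join("1" if ch == symbol else "0" for ch in changes)

def drop_all_zero_columns(*rows):
    """Remove the time points at which every row is '0' (assumed reading of the slide)."""
    kept = [col for col in zip(*rows) if "1" in col]
    return tuple("".join(col[i] for col in kept) for i in range(len(rows)))

stock_a = "IDIIIODDDIIIIODDD"
stock_b = "IODDOIDODODDODODD"

a_bits = encode(stock_a)   # 1 marks an Increase day for stock A
b_bits = encode(stock_b)   # 1 marks an Increase day for stock B
a_short, b_short = drop_all_zero_columns(a_bits, b_bits)

print(a_bits, b_bits)      # 10111000011110000 10000100000000000
print(a_short, b_short)    # 111101111 100010000
```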
Example 5. Given a customer transactional database of a supermarket: • $\langle d^2, c^2 \rangle$: we call it a Strong Follow-Correlation Itemset-Pair (SFCIP). • $\langle d^2, c^3 \rangle$: we call it a Lag Follow-Correlation Itemset-Pair (LFCIP).
Example 5.(cont.) 2 3 2 2 • <{d ,e }, c > and <{d ,e }, c > This kind of P3.1 pattern contains more than one item.
Example 5.(cont.) • For ease of discussion, this paper considers only the situation in which there is a single item in the Action itemset of a P3.1 pattern. <f ,a > <f ,a > <a ,f > <a ,f > <g ,b > <b ,g > <b ,g > <b ,g > 1 3 1 1 1 1 1 3 3 1 3 1 1 3 1 2
Algorithm step 1 • 'S', 'E' and 'P' denote the Start position, the End position, and the successive Pointer to the next node, respectively.
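A node with these three fields could look like the following sketch. The slide only names the fields; that each node stores one maximal run of 1s of an item occurring sequence, and that the nodes are chained in time order, are assumptions made here for illustration. `Node`, `build_run_list`, and `to_pairs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    S: int                       # Start position of the run (1-indexed)
    E: int                       # End position of the run (inclusive)
    P: Optional["Node"] = None   # successive Pointer to the next node

def build_run_list(bits):
    """Build a linked list with one node per maximal run of '1' (assumed construction)."""
    head = None
    tail = None
    start = None
    # A trailing "0" sentinel closes a run that reaches the end of the sequence.
    for t, b in enumerate(bits + "0", start=1):
        if b == "1" and start is None:
            start = t
        elif b != "1" and start is not None:
            node = Node(S=start, E=t - 1)
            if tail is None:
                head = node
            else:
                tail.P = node
            tail = node
            start = None
    return head

def to_pairs(head):
    """Walk the list through P and collect (S, E) pairs."""
    pairs = []
    node = head
    while node is not None:
        pairs.append((node.S, node.E))
        node = node.P
    return pairs

print(to_pairs(build_run_list("10111000011110000")))  # [(1, 1), (3, 5), (10, 13)]
```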
Conclusion • The method makes it trivial to find interesting patterns.