Advisor: Prof. R. C. T. Lee Reporter: Z. H. Pan

Bit-parallel approximate string matching algorithms with transposition Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229. Advisor: Prof. R. C. T. Lee Reporter: Z. H. Pan.


  1. Bit-parallel approximate string matching algorithms with transposition. Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229. Advisor: Prof. R. C. T. Lee. Reporter: Z. H. Pan.

  2. This paper builds on “Fast text searching allowing errors”, which supports three edit operations: insertion, deletion and substitution. This paper adds a fourth operation, transposition. We will use these operations to solve the approximate string matching problem with bit-parallelism. Transposition: swapping two adjacent characters in the string.
     Example:
     Insertion: cat → cast
     Deletion: cat → at
     Substitution: cat → car
     Transposition: cta → cat

  3. We first apply the three operations insertion, deletion and substitution to the approximate string matching problem. Transposition is introduced after these three operations.

  4. Our approximate string matching problem is defined as follows: We are given a text $T_{1,n}$, a pattern $P_{1,m}$ and an error bound $k$. We want to find all positions $i$, $1 \le i \le n$, such that there exists a suffix $A$ of $T_{1,i}$ with $d(A,P) \le k$, where $d(x,y)$ is the edit distance (or difference) between $x$ and $y$. Example: T=deaabeg, P=aabac and k=2. For i=5, $T_{1,5}$=deaab. There exists a suffix A=aab of $T_{1,5}$ such that d(A,P)=d(aab,aabac)=2.
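
The problem statement above can be checked against a plain dynamic-programming reference. The following is a minimal sketch, not the method of the paper: it keeps, for each text position, the smallest edit distance between each pattern prefix and any suffix of the text read so far, using insertion, deletion and substitution only. The function name `approx_end_positions` is hypothetical.

```python
def approx_end_positions(text, pattern, k):
    """Return the 1-based positions i such that some suffix of text[:i]
    is within edit distance k of pattern (no transposition)."""
    m = len(pattern)
    # col[j] = min distance between pattern[:j] and any suffix of text[:i].
    col = list(range(m + 1))          # i = 0: only the empty suffix exists
    hits = []
    for i, t in enumerate(text, start=1):
        prev_diag = col[0]            # old col[j-1], needed for the diagonal
        col[0] = 0                    # empty suffix vs. empty pattern prefix
        for j in range(1, m + 1):
            cost = 0 if pattern[j - 1] == t else 1
            new = min(prev_diag + cost,   # match or substitution
                      col[j] + 1,         # extra text character
                      col[j - 1] + 1)     # missing pattern character
            prev_diag, col[j] = col[j], new
        if col[m] <= k:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Slide 4 example: position 5 is reported for k = 2.
    print(approx_end_positions("deaabeg", "aabac", 2))
```

This reference costs O(mn) time; the bit-parallel formulation in the following slides computes the same yes/no information with word-level operations.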

  5. Our approach is based upon the following observation. Case 1: Let S be a substring of T. Write S = S1 S2 and P = P1 P2, where S2 is a suffix of S and P2 is a suffix of P. If d(S2, P2) = 0 and d(S1, P1) ≤ k, we have d(S, P) ≤ k. (Figure: S = S1 S2 inside T, aligned with P = P1 P2.)

  6. Example: A=addcd and B=abcd. k=2. We may decompose A and B as follows: A=add+cd, B=ab+cd. d(cd,cd)=0 and d(add,ab)=2. Thus d(A,B)≤2.

  7. Case 2: Let S be a substring of T. Write S = S1 S2 and P = P1 P2, where S2 is a suffix of S and P2 is a suffix of P. If d(S2, P2) = 1 and d(S1, P1) ≤ k-1, we have d(S, P) ≤ k. (Figure: S = S1 S2 inside T, aligned with P = P1 P2.)

  8. Example: A=addb and B=dc. k=3. We may decompose A and B as follows: A=add+b, B=d+c. d(b,c)=1 and d(add,d)=2. Thus d(A,B)≤3.

  9. To solve our approximate string matching problem, we use a table called $R^k$, where $1 \le i \le n$ and $1 \le j \le m$. $R^k(i,j) = 1$ if there exists a suffix $A$ of $T_{1,i}$ such that $d(A,P_{1,j}) \le k$; $R^k(i,j) = 0$ otherwise. Example: T=aabaacaabacab, P=aabac and k=1. Consider i=9, j=4: $T_{1,9}$=aabaacaab, $P_{1,4}$=aaba, A=$T_{7,9}$=aab and d(A,$P_{1,4}$)=d(aab,aaba)=1, so $R^1(9,4)=1$. The table $R^1$:

      i        1  2  3  4  5  6  7  8  9 10 11 12 13
      t_i      a  a  b  a  a  c  a  a  b  a  c  a  b
      j=1 (a)  1  1  1  1  1  1  1  1  1  1  1  1  1
      j=2 (a)  1  1  1  1  1  1  1  1  1  1  1  1  1
      j=3 (b)  0  1  1  1  1  1  0  1  1  1  0  0  1
      j=4 (a)  0  0  1  1  1  0  1  0  1  1  1  0  0
      j=5 (c)  0  0  0  1  1  1  0  0  0  1  1  1  0

  10. It will be shown later that $R^k$ is obtained from $R^{k-1}$. Thus, it is important for us to know how to find $R^0$. Later, $R^0$ will be obtained by the bit-parallel Shift-And algorithm; for now, we explain it with dynamic programming [S80] (String Matching with Errors). Example: T=aabaacaab, P=aabac. $R^0$ can be found by dynamic programming:
      $R^0(i,0) = 1$ for all $i$.
      $R^0(0,j) = 0$ for all $j \ge 1$.
      $R^0(i,j) = 1$ if $R^0(i-1,j-1) = 1$ and $t_i = p_j$.
      $R^0(i,j) = 0$ otherwise.
      From the above, we can see that the $i$th column of $R^0$ can be obtained from the $(i-1)$th column of $R^0$, $t_i$ and the $p_j$. (Figure: the table for this example with the boundary row j=0 filled with 1s and the boundary column i=0 filled with 0s.)
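
The recurrence on this slide can be written down directly. The sketch below assumes the boundary values read off the slide figure (row j=0 all 1s, column i=0 all 0s for j≥1); the function name `exact_table` is hypothetical.

```python
def exact_table(text, pattern):
    """Build R^0 as a list of columns: R[i][j] for 0 <= i <= n, 0 <= j <= m."""
    n, m = len(text), len(pattern)
    R = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        R[i][0] = 1                      # empty pattern prefix always matches
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # R^0(i,j) = 1 iff R^0(i-1,j-1) = 1 and t_i = p_j
            if R[i - 1][j - 1] == 1 and text[i - 1] == pattern[j - 1]:
                R[i][j] = 1
    return R


if __name__ == "__main__":
    R = exact_table("aabaacaab", "aabac")
    for i in (8, 9):
        print(i, R[i][1:])               # columns 8 and 9, rows j = 1..5
```

Column i depends only on column i-1 and on which pattern positions equal $t_i$, which is what makes the later bit-vector formulation possible.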

  11. Example: T=aabaacaabacab, P=aabac and k=0. The table $R^0$:

      i        1  2  3  4  5  6  7  8  9 10 11 12 13
      t_i      a  a  b  a  a  c  a  a  b  a  c  a  b
      j=1 (a)  1  1  0  1  1  0  1  1  0  1  0  1  0
      j=2 (a)  0  1  0  0  1  0  0  1  0  0  0  0  0
      j=3 (b)  0  0  1  0  0  0  0  0  1  0  0  0  0
      j=4 (a)  0  0  0  1  0  0  0  0  0  1  0  0  0
      j=5 (c)  0  0  0  0  0  0  0  0  0  0  1  0  0

      For instance, the 9th column is (0,0,1,0,0) and the 8th column is (1,1,0,0,0). $R^0(9,3)=1$ because $R^0(8,2)=1$ and $t_9=p_3$=‘b’. $R^0(9,2)=0$ because, although $R^0(8,1)=1$, $t_9 \ne p_2$. However, we do not use the dynamic programming approach, because we want to use the bit-parallel approach.

  12. Example: T=aabaacaabacab, P=aabac and k=0 (same $R^0$ table). Let us denote the $i$th column of $R^0$ by $R^0_i$, a vector of length $m$. For instance, $R^0_4$=(1,0,0,1,0) and $R^0_5$=(1,1,0,0,0). To find $R^0_i$ we consider two cases:
      Case 1: j=1. If $t_i=p_1$, $R^0(i,1)=1$; otherwise, $R^0(i,1)=0$.
      Case 2: j>1. If $R^0(i-1,j-1)=1$ and $t_i=p_j$, $R^0(i,j)=1$; otherwise, $R^0(i,j)=0$.

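  13. We now define a right shift operation >>i on a vector $A=(a_1,a_2,\dots,a_m)$ as follows: $A \gg i = (0,\dots,0,a_1,\dots,a_{m-i})$, with $i$ leading zeros. That is, we delete the last $i$ elements and add $i$ 0’s at the front. For instance, A=(1,0,0,1,1,1). Then A>>1=(0,1,0,0,1,1). We use & and v to represent the AND and OR operations. We further use $10^{m-1}$ to denote the vector $(1,0,\dots,0)$ of length $m$. For example, $10^4$=(1,0,0,0,0).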
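
To make the vector notation concrete, here is a small sketch that mirrors the slide's conventions with Python tuples (position 1 is the leftmost entry). The helper names `shift_right`, `vec_and`, `vec_or` and `one_zeros` are hypothetical and only serve this illustration.

```python
def shift_right(vec, i=1):
    """The slide's A >> i: drop the last i entries, prepend i zeros."""
    n = len(vec)
    i = min(i, n)
    return (0,) * i + tuple(vec[:n - i])

def vec_and(a, b):
    """Element-wise AND, written & on the slides."""
    return tuple(x & y for x, y in zip(a, b))

def vec_or(a, b):
    """Element-wise OR, written v on the slides."""
    return tuple(x | y for x, y in zip(a, b))

def one_zeros(m):
    """A leading 1 followed by m-1 zeros."""
    return (1,) + (0,) * (m - 1)


if __name__ == "__main__":
    print(shift_right((1, 0, 0, 1, 1, 1), 1))
    print(vec_or(shift_right((0, 0, 1, 0, 0)), one_zeros(5)))
```

Tuples are used here only for readability; a real bit-parallel implementation packs the vector into a machine word.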

  14. Example: T=aabaacaabacab, P=aabac and k=0 (same $R^0$ table). Consider $R^0_9$=(0,0,1,0,0), m=5. Then $(R^0_9 \gg 1) \lor 10^4$=(0,0,0,1,0) v (1,0,0,0,0)=(1,0,0,1,0).

  15. Given a character α of the alphabet and a pattern P, we need to know where α appears in P. This information is contained in a vector Σ(α), which is defined as follows: Σ(α)[i]=1 if $p_i$=α, and Σ(α)[i]=0 otherwise. Example: P=aabac. Σ(a)=(1,1,0,1,0), Σ(b)=(0,0,1,0,0), Σ(c)=(0,0,0,0,1).
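
As an illustration, the Σ vectors can be precomputed once per pattern. This sketch uses tuples in the slide's left-to-right order; `build_sigma` and `occurrence_vector` are hypothetical names, and characters that do not occur in the pattern map to the all-zero vector (the `*` column on later slides).

```python
def build_sigma(pattern):
    """Map each pattern character c to its occurrence vector."""
    return {c: tuple(1 if p == c else 0 for p in pattern)
            for c in set(pattern)}

def occurrence_vector(sigma, ch, m):
    """Look up a character; anything not in the pattern gives all zeros."""
    return sigma.get(ch, (0,) * m)


if __name__ == "__main__":
    sigma = build_sigma("aabac")
    for ch in "abcz":
        print(ch, occurrence_vector(sigma, ch, 5))
```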

  16. Example: T=aabaacaabacab, P=aabac and k=0 (same $R^0$ table). Σ(a)=(1,1,0,1,0) and $p_4=t_{10}$=‘a’. Since $R^0(9,3)=1$ and $p_4=t_{10}$, we have $R^0(10,4)=1$. Since $p_1=t_{10}$, we have $R^0(10,1)=1$. This uses the following rule:
      Case 1: j=1. If $t_i=p_1$, $R^0(i,1)=1$; otherwise, $R^0(i,1)=0$.
      Case 2: j>1. If $R^0(i-1,j-1)=1$ and $t_i=p_j$, $R^0(i,j)=1$; otherwise, $R^0(i,j)=0$.

  17. Example: T=aabaacaabacab, P=aabac and k=0 (same $R^0$ table), with Σ(a)=11010, Σ(b)=00100, Σ(c)=00001 and Σ(*)=00000 for any other character. $R^0_{10} = ((R^0_9 \gg 1) \lor 10^4) \mathbin{\&} \Sigma(t_{10})$ =((0,0,0,1,0)v(1,0,0,0,0))&(1,1,0,1,0) =(1,0,0,1,0). Similarly, we can conclude that $R^0_{11} = ((R^0_{10} \gg 1) \lor 10^4) \mathbin{\&} \Sigma(t_{11})$ =((0,1,0,0,1)v(1,0,0,0,0))&(0,0,0,0,1) =(0,0,0,0,1).

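  18. Shift-And: In general, we update the $i$th column of $R^0$, which is $R^0_i$, by using the formula $R^0_i = ((R^0_{i-1} \gg 1) \lor 10^{m-1}) \mathbin{\&} \Sigma(t_i)$ for each new text character $t_i$. If $R^0_i[m]=1$, report a match at position $i$.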
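
A minimal Shift-And sketch with Python integers follows. One assumption about representation: pattern position j is stored in bit j-1 (least significant bit first), so the slide's ">> 1" becomes a left shift `<< 1` in code and OR-ing $10^{m-1}$ becomes `| 1`. The helper `fmt` prints a column in the slide's left-to-right order. All function names are hypothetical.

```python
def fmt(v, m):
    """Render an int as the slide's vector, position j=1 leftmost."""
    return "".join("1" if (v >> j) & 1 else "0" for j in range(m))

def build_sigma(pattern):
    """Bit j-1 of sigma[c] is set when p_j == c."""
    sigma = {}
    for j, c in enumerate(pattern):
        sigma[c] = sigma.get(c, 0) | (1 << j)
    return sigma

def shift_and(text, pattern):
    """Return (columns R^0_1..R^0_n as strings, 1-based match end positions)."""
    m = len(pattern)
    sigma = build_sigma(pattern)
    r = 0                                   # R^0_0: nothing matched yet
    columns, hits = [], []
    for i, t in enumerate(text, start=1):
        # R^0_i = ((R^0_{i-1} >> 1) v 10^{m-1}) & Sigma(t_i)
        r = ((r << 1) | 1) & sigma.get(t, 0)
        columns.append(fmt(r, m))
        if (r >> (m - 1)) & 1:              # R^0_i[m] = 1
            hits.append(i)
    return columns, hits


if __name__ == "__main__":
    cols, hits = shift_and("aabaacaabacab", "aabac")
    print(cols)
    print(hits)
```

The sketch assumes the pattern fits in one integer, which Python always allows; in a fixed-width language the pattern length would be bounded by the word size w.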

  19. The Shift-And formula: $R^0_i = ((R^0_{i-1} \gg 1) \lor 10^{m-1}) \mathbin{\&} \Sigma(t_i)$. Example: T=aabaacaabacab, P=aabac, with Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000. When i=1 ($t_1$=a): $(R^0_0 \gg 1) \lor 10^4$=10000, and 10000 & Σ(a)=10000 & 11010=10000, so $R^0_1$=10000. The “&” operation checks, for every j at once, whether $t_i$ equals $p_j$.

  20. The Shift-And formula, same example. When i=2 ($t_2$=a): $(R^0_1 \gg 1) \lor 10^4$=11000, and 11000 & Σ(a)=11000 & 11010=11000, so $R^0_2$=11000.

  21. The Shift-And formula, same example. When i=3 ($t_3$=b): $(R^0_2 \gg 1) \lor 10^4$=11100, and 11100 & Σ(b)=11100 & 00100=00100, so $R^0_3$=00100.

  22. The Shift-And formula, same example. When i=4 ($t_4$=a): $(R^0_3 \gg 1) \lor 10^4$=10010, and 10010 & Σ(a)=10010 & 11010=10010, so $R^0_4$=10010.

  23. The Shift-And formula, same example. When i=5 ($t_5$=a): $(R^0_4 \gg 1) \lor 10^4$=11001, and 11001 & Σ(a)=11001 & 11010=11000, so $R^0_5$=11000.

  24. Example: T=aabaacaabacab, P=aabac, with Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000. Continuing in the same way, the columns $R^0_1,\dots,R^0_{13}$ are 10000, 11000, 00100, 10010, 11000, 00000, 10000, 11000, 00100, 10010, 00001, 10000, 00000. Since $R^0_{11}[5]=1$, the pattern aabac occurs in the text ending at position 11.

  25. Approximate String Matching: The approximate string matching problem is the exact string matching problem with errors allowed. We can compute the table $R^k$ in a Shift-And-like manner. Here we introduce the three operations of approximate string matching: insertion, deletion and substitution. $R^k(i,j)=1$ if there exists a suffix $A$ of $T_{1,i}$ such that $d(A,P_{1,j}) \le k$, and $R^k(i,j)=0$ otherwise, where $1 \le i \le n$ and $1 \le j \le m$. Let $R^k_I$, $R^k_D$ and $R^k_S$ denote the $R^k$ related to insertion, deletion and substitution respectively. $R^k_I(i,j)=1$, $R^k_D(i,j)=1$ and $R^k_S(i,j)=1$ indicate that we can perform an insertion, deletion and substitution respectively without violating the error bound $k$.

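  26. If $t_i=p_j$ and $R^k(i-1,j-1)=1$, then $R^k(i,j)=1$. Otherwise, $R^k(i,j)=1$ only if an insertion, a deletion or a substitution can be applied within the error bound. Example: T=aabaacaabacab, P=aabac and k=1. Consider i=6 and j=5. S=$T_{1,6}$=aabaac and $P_{1,5}$=aabac. Because $t_6=p_5$=‘c’ and $R^1(5,4)=1$, there exists a suffix A of S such that d(A,P)≤1. (Figure: T=aabaa|c at positions i-1, i aligned with P=aaba|c at positions j-1, j.)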

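  27. Insertion: for $t_i \ne p_j$, $R^k_I(i,j)=1$ if $R^{k-1}(i-1,j)=1$, and $R^k_I(i,j)=0$ otherwise. Example (k=1): the text ends with aabacb, where $t_i$=b, and P=aabac. $T_{1,i-1}$ ends with an exact occurrence of $P_{1,j}$=aabac, and $t_i$=b is treated as an inserted character, so the occurrence still ends at position $i$ with one error. (Figure: T=aabac|b at positions i-1, i aligned with P=aabac at position j.)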

  28. Insertion: for $t_i \ne p_j$, $R^k_I(i,j)=1$ if $R^{k-1}(i-1,j)=1$, and $R^k_I(i,j)=0$ otherwise. Example: T=aabaacaabacab, P=aabac, k=1. (1) When i=6 and j=4: $t_6$=‘c’≠$p_4$=‘a’ and $R^0(5,4)=0$, so $R^1_I(6,4)=0$. (2) When i=11 and j=4: $t_{11}$=‘c’≠$p_4$=‘a’ and $R^0(10,4)=1$, so $R^1_I(11,4)=1$. With the $R^0$ table of this example, the table $R^1_I$ is:

      i        1  2  3  4  5  6  7  8  9 10 11 12 13
      t_i      a  a  b  a  a  c  a  a  b  a  c  a  b
      j=1 (a)  1  1  1  1  1  1  1  1  1  1  1  1  1
      j=2 (a)  0  1  1  1  1  1  1  1  1  1  0  1  0
      j=3 (b)  0  0  1  1  0  0  0  0  1  1  0  0  1
      j=4 (a)  0  0  0  1  1  0  0  0  0  1  1  0  0
      j=5 (c)  0  0  0  0  0  1  0  0  0  0  1  1  0

  29. The insertion formula, with $R^k_{I,i}$ denoting the $i$th column of $R^k_I$: $R^k_{I,i} = (((R^k_{I,i-1} \gg 1) \lor 10^{m-1}) \mathbin{\&} \Sigma(t_i)) \lor R^{k-1}_{i-1}$. Example: T=aabaacaabacab, P=aabac and k=1, with Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000.
      (1) i=4 ($t_4$=a, text read so far aaba): $(R^1_{I,3} \gg 1) \lor 10^4$=11110; 11110 & Σ(a)=11010; 11010 v $R^0_3$=11010 v 00100=11110.
      (2) i=6 ($t_6$=c, text read so far aabaac): $(R^1_{I,5} \gg 1) \lor 10^4$=11101; 11101 & Σ(c)=00001; 00001 v $R^0_5$=00001 v 11000=11001.

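  30. Deletion: for $t_i \ne p_j$, $R^k_D(i,j)=1$ if $R^{k-1}(i,j-1)=1$, and $R^k_D(i,j)=0$ otherwise. Example (k=1): the text ends with aabac at position $i$, and the pattern prefix is aabacb with $p_j$=b. $T_{1,i}$ ends with an exact occurrence of $P_{1,j-1}$=aabac, and $p_j$=b is deleted, so $P_{1,j}$ matches at position $i$ with one error. (Figure: T=aabac at position i aligned with P=aabac|b at positions j-1, j.)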

  31. Deletion: for $t_i \ne p_j$, $R^k_D(i,j)=1$ if $R^{k-1}(i,j-1)=1$, and $R^k_D(i,j)=0$ otherwise. Example: T=aabaacaabacab, P=aabac, k=1. (1) When i=6 and j=4: $t_6$=‘c’≠$p_4$=‘a’ and $R^0(6,3)=0$, so $R^1_D(6,4)=0$. (2) When i=3 and j=4: $t_3$=‘b’≠$p_4$=‘a’ and $R^0(3,3)=1$, so $R^1_D(3,4)=1$. With the $R^0$ table of this example, the table $R^1_D$ is:

      i        1  2  3  4  5  6  7  8  9 10 11 12 13
      t_i      a  a  b  a  a  c  a  a  b  a  c  a  b
      j=1 (a)  1  1  1  1  1  1  1  1  1  1  1  1  1
      j=2 (a)  1  1  0  1  1  0  1  1  0  1  0  1  0
      j=3 (b)  0  1  1  0  1  0  0  1  1  0  0  0  1
      j=4 (a)  0  0  1  1  0  0  0  0  1  1  0  0  0
      j=5 (c)  0  0  0  1  0  0  0  0  0  1  1  0  0

  32. The deletion formula, with $R^k_{D,i}$ denoting the $i$th column of $R^k_D$: $R^k_{D,i} = (((R^k_{D,i-1} \gg 1) \lor 10^{m-1}) \mathbin{\&} \Sigma(t_i)) \lor ((R^{k-1}_i \gg 1) \lor 10^{m-1})$. Example: T=aabaacaabacab, P=aabac and k=1, with Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000.
      (1) i=3 ($t_3$=b): $(R^1_{D,2} \gg 1) \lor 10^4$=11110; 11110 & Σ(b)=00100; $(R^0_3 \gg 1) \lor 10^4$=10010; 00100 v 10010=10110.
      (2) i=13 ($t_{13}$=b): $(R^1_{D,12} \gg 1) \lor 10^4$=11100; 11100 & Σ(b)=00100; $(R^0_{13} \gg 1) \lor 10^4$=10000; 00100 v 10000=10100.

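  33. Substitution: for $t_i \ne p_j$, $R^k_S(i,j)=1$ if $R^{k-1}(i-1,j-1)=1$, and $R^k_S(i,j)=0$ otherwise. Example (k=1): the text ends with aabacb, where $t_i$=b, and the pattern prefix is aabaca with $p_j$=a. $T_{1,i-1}$ ends with an exact occurrence of $P_{1,j-1}$=aabac, and $p_j$ is substituted by $t_i$, so $P_{1,j}$ matches at position $i$ with one error. (Figure: T=aabac|b at positions i-1, i aligned with P=aabac|a at positions j-1, j.)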

  34. Substitution: for $t_i \ne p_j$, $R^k_S(i,j)=1$ if $R^{k-1}(i-1,j-1)=1$, and $R^k_S(i,j)=0$ otherwise. Example: T=aabaacaabacab, P=aabac, k=1. (1) When i=6 and j=4: $t_6$=‘c’≠$p_4$=‘a’ and $R^0(5,3)=0$, so $R^1_S(6,4)=0$. (2) When i=5 and j=5: $t_5$=‘a’≠$p_5$=‘c’ and $R^0(4,4)=1$, so $R^1_S(5,5)=1$. With the $R^0$ table of this example, the table $R^1_S$ is:

      i        1  2  3  4  5  6  7  8  9 10 11 12 13
      t_i      a  a  b  a  a  c  a  a  b  a  c  a  b
      j=1 (a)  1  1  1  1  1  1  1  1  1  1  1  1  1
      j=2 (a)  0  1  1  1  1  1  1  1  1  1  1  1  1
      j=3 (b)  0  0  1  0  0  1  0  0  1  0  0  0  1
      j=4 (a)  0  0  0  1  0  0  1  0  0  1  0  0  0
      j=5 (c)  0  0  0  0  1  0  0  0  0  0  1  0  0

  35. The substitution formula, with $R^k_{S,i}$ denoting the $i$th column of $R^k_S$: $R^k_{S,i} = (((R^k_{S,i-1} \gg 1) \lor 10^{m-1}) \mathbin{\&} \Sigma(t_i)) \lor ((R^{k-1}_{i-1} \gg 1) \lor 10^{m-1})$. Example: T=aabaacaabacab, P=aabac and k=1, with Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000.
      (1) i=13 ($t_{13}$=b): $(R^1_{S,12} \gg 1) \lor 10^4$=11100; 11100 & Σ(b)=00100; $(R^0_{12} \gg 1) \lor 10^4$=11000; 00100 v 11000=11100.
      (2) i=5 ($t_5$=a): $(R^1_{S,4} \gg 1) \lor 10^4$=11101; 11101 & Σ(a)=11000; $(R^0_4 \gg 1) \lor 10^4$=11001; 11000 v 11001=11001.
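
The three single-operation tables for k=1 can be reproduced with one small routine. This is a sketch under assumptions: bit j-1 of an integer stores pattern position j (so the slides' ">> 1" is `<< 1` in code), the column before the first text character is taken as all zeros, and the names `single_op_columns`, `fmt` and `build_sigma` are hypothetical.

```python
def fmt(v, m):
    """Render an int as the slide's vector, position j=1 leftmost."""
    return "".join("1" if (v >> j) & 1 else "0" for j in range(m))

def build_sigma(pattern):
    sigma = {}
    for j, c in enumerate(pattern):
        sigma[c] = sigma.get(c, 0) | (1 << j)
    return sigma

def single_op_columns(text, pattern, op):
    """Columns of the one-error table when only `op` is allowed."""
    m = len(pattern)
    full = (1 << m) - 1
    sigma = build_sigma(pattern)
    r0 = 0      # exact column R^0_{i-1}
    r1 = 0      # one-error column at i-1 (assumed all zeros at i = 0)
    out = []
    for t in text:
        s = sigma.get(t, 0)
        new_r0 = ((r0 << 1) | 1) & s             # Shift-And step for R^0_i
        match = ((r1 << 1) | 1) & s              # extend by a matching char
        if op == "insertion":
            extra = r0                            # R^0_{i-1}
        elif op == "deletion":
            extra = ((new_r0 << 1) | 1) & full    # (R^0_i >> 1) v 10^{m-1}
        elif op == "substitution":
            extra = ((r0 << 1) | 1) & full        # (R^0_{i-1} >> 1) v 10^{m-1}
        else:
            raise ValueError("unknown operation: " + op)
        r1 = match | extra
        r0 = new_r0
        out.append(fmt(r1, m))
    return out


if __name__ == "__main__":
    for op in ("insertion", "deletion", "substitution"):
        print(op, single_op_columns("aabaacaabacab", "aabac", op))
```

Printing the three lists column by column gives the same 0/1 entries as the $R^1_I$, $R^1_D$ and $R^1_S$ tables of the running example.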

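  36. The algorithm that allows k errors combines insertion, deletion and substitution. We simply combine the three formulas using the “or” operation as follows:
      Initially (the case $t_i=p_j$ with no error), $R^0_i = ((R^0_{i-1} \gg 1) \lor 10^{m-1}) \mathbin{\&} \Sigma(t_i)$, and for $k \ge 1$
      $R^k_i = ((R^k_{i-1} \gg 1) \mathbin{\&} \Sigma(t_i)) \lor R^{k-1}_{i-1} \lor (R^{k-1}_i \gg 1) \lor (R^{k-1}_{i-1} \gg 1) \lor 10^{m-1}$,
      where the terms $R^{k-1}_{i-1}$, $R^{k-1}_i \gg 1$ and $R^{k-1}_{i-1} \gg 1$ account for insertion, deletion and substitution respectively.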
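
The combined update can be sketched as follows. Assumptions, none of which are stated on the slide: bit j-1 stores pattern position j (so ">> 1" is `<< 1` in code), and the columns before the first text character are initialised to $R^d_0 = 1^d0^{m-d}$, the usual convention that the empty text matches a prefix of length d with d errors. The name `approx_columns` is hypothetical.

```python
def fmt(v, m):
    """Render an int as the slide's vector, position j=1 leftmost."""
    return "".join("1" if (v >> j) & 1 else "0" for j in range(m))

def build_sigma(pattern):
    sigma = {}
    for j, c in enumerate(pattern):
        sigma[c] = sigma.get(c, 0) | (1 << j)
    return sigma

def approx_columns(text, pattern, k):
    """Return (columns of R^k as strings, 1-based end positions with <= k errors)."""
    m = len(pattern)
    full = (1 << m) - 1
    sigma = build_sigma(pattern)
    R = [((1 << d) - 1) & full for d in range(k + 1)]   # assumed R^d_0
    cols, hits = [], []
    for i, t in enumerate(text, start=1):
        s = sigma.get(t, 0)
        old = R[0]                          # R^{d-1}_{i-1} for the next level
        R[0] = ((old << 1) | 1) & s         # exact Shift-And step
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = (((cur_old << 1) & s)    # match
                    | old                   # insertion:    R^{d-1}_{i-1}
                    | (R[d - 1] << 1)       # deletion:     R^{d-1}_i >> 1
                    | (old << 1)            # substitution: R^{d-1}_{i-1} >> 1
                    | 1) & full             # 10^{m-1}
            old = cur_old
        cols.append(fmt(R[k], m))
        if (R[k] >> (m - 1)) & 1:
            hits.append(i)
    return cols, hits


if __name__ == "__main__":
    cols, hits = approx_columns("aabaacaabacab", "aabac", 1)
    print(cols)
    print(hits)
```

Each text character costs a constant number of word operations per error level, which is where the dependence on k in the running time comes from.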

  37. Example: T=aabaacaabacab, P=aabac, k=1, with Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000 and the $R^0$ table of this example. So far $R^1_1,R^1_2,R^1_3$=11000, 11100, 11110. Computing $R^1_4$ ($t_4$=a):
      $(R^1_3 \gg 1) \mathbin{\&} \Sigma(a)$ = 01111 & 11010 = 01010
      $R^0_3$ = 00100 (insertion)
      $R^0_3 \gg 1$ = 00010 (substitution)
      $R^0_4 \gg 1$ = 01001 (deletion)
      $10^4$ = 10000
      OR of all terms: $R^1_4$ = 11111.
      The bit j=5 comes from the deletion term: $T_{1,4}$=aaba matches P=aabac with the last character c deleted.

  38. Same example, k=1. Computing $R^1_5$ ($t_5$=a):
      $(R^1_4 \gg 1) \mathbin{\&} \Sigma(a)$ = 01111 & 11010 = 01010
      $R^0_4$ = 10010 (insertion)
      $R^0_4 \gg 1$ = 01001 (substitution)
      $R^0_5 \gg 1$ = 01100 (deletion)
      $10^4$ = 10000
      OR of all terms: $R^1_5$ = 11111.
      The insertion term sets the bit j=4: $T_{1,5}$=aabaa matches $P_{1,4}$=aaba with the last text character a inserted.

  39. Same example, k=1. Computing $R^1_6$ ($t_6$=c):
      $(R^1_5 \gg 1) \mathbin{\&} \Sigma(c)$ = 01111 & 00001 = 00001
      $R^0_5$ = 11000 (insertion)
      $R^0_5 \gg 1$ = 01100 (substitution)
      $R^0_6 \gg 1$ = 00000 (deletion)
      $10^4$ = 10000
      OR of all terms: $R^1_6$ = 11101.
      The substitution term sets the bit j=3: the suffix aac of $T_{1,6}$ matches $P_{1,3}$=aab with c substituted for b.

  40. Example: T=aabaacaabacab, P=aabac, k=1, with Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000. The complete table $R^1$, given as columns $R^1_1,\dots,R^1_{13}$: 11000, 11100, 11110, 11111, 11111, 11101, 11010, 11100, 11110, 11111, 11011, 11001, 11100. $R^1_i[5]=1$ for i=4, 5, 6, 10, 11 and 12, so the pattern matches with at most one error ending at these positions.

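  41. Transposition. Example: A=badec and B=bdace. If transposition is allowed, we may transform B into A by transposing adjacent characters: bdace → badce → badec. Define $R^k_T$ for transposition: $R^k_T(i,j)=1$ if $R^{k-1}(i-2,j-2)=1$ and $t_i=p_{j-1}$ and $t_{i-1}=p_j$; $R^k_T(i,j)=0$ otherwise.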

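  42. We combine the operations of insertion, deletion, substitution and transposition for k errors. The formula is as follows:
      Initially (the case $t_i=p_j$ with no error), $R^0_i = ((R^0_{i-1} \gg 1) \lor 10^{m-1}) \mathbin{\&} \Sigma(t_i)$, and for $k \ge 1$
      $R^k_i = ((R^k_{i-1} \gg 1) \mathbin{\&} \Sigma(t_i)) \lor R^{k-1}_{i-1} \lor (R^{k-1}_i \gg 1) \lor (R^{k-1}_{i-1} \gg 1) \lor 10^{m-1} \lor ((R^{k-1}_{i-2} \gg 2) \mathbin{\&} \Sigma(t_{i-1}) \mathbin{\&} (\Sigma(t_i) \gg 1))$,
      where the terms after the first account for insertion, deletion, substitution, the always-allowed first position, and transposition respectively.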
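
The formula with the transposition term can be sketched in the same style. Assumptions not stated on the slide: bit j-1 stores pattern position j (so ">> 1" and ">> 2" become `<< 1` and `<< 2` in code), the initial columns are $R^d_0 = 1^d0^{m-d}$, and no transposition is attempted at the first text character. The name `approx_columns_transposition` is hypothetical.

```python
def fmt(v, m):
    """Render an int as the slide's vector, position j=1 leftmost."""
    return "".join("1" if (v >> j) & 1 else "0" for j in range(m))

def build_sigma(pattern):
    sigma = {}
    for j, c in enumerate(pattern):
        sigma[c] = sigma.get(c, 0) | (1 << j)
    return sigma

def approx_columns_transposition(text, pattern, k):
    """Return (columns of R^k as strings, 1-based end positions with <= k errors)."""
    m = len(pattern)
    full = (1 << m) - 1
    sigma = build_sigma(pattern)
    prev1 = [((1 << d) - 1) & full for d in range(k + 1)]  # columns at i-1
    prev2 = [0] * (k + 1)                                  # columns at i-2
    prev_s = 0                  # Sigma(t_{i-1}); zero disables the term at i=1
    cols, hits = [], []
    for i, t in enumerate(text, start=1):
        s = sigma.get(t, 0)
        cur = [((prev1[0] << 1) | 1) & s]
        for d in range(1, k + 1):
            # (R^{d-1}_{i-2} >> 2) & Sigma(t_{i-1}) & (Sigma(t_i) >> 1)
            transp = (prev2[d - 1] << 2) & prev_s & (s << 1)
            cur.append((((prev1[d] << 1) & s)   # match
                        | prev1[d - 1]          # insertion
                        | (cur[d - 1] << 1)     # deletion
                        | (prev1[d - 1] << 1)   # substitution
                        | 1                     # 10^{m-1}
                        | transp) & full)
        prev2, prev1, prev_s = prev1, cur, s
        cols.append(fmt(cur[k], m))
        if (cur[k] >> (m - 1)) & 1:
            hits.append(i)
    return cols, hits


if __name__ == "__main__":
    for k in (1, 2):
        print(k, approx_columns_transposition("abcbcaccbb", "acbcb", k))
```

Compared with the three-operation version, the only extra state is the column from two steps back and the Σ vector of the previous text character.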

  43. Example: T=abcbcaccbb, P=acbcb and k=1, with Σ(a)=10000, Σ(b)=00101, Σ(c)=01010, Σ(*)=00000. Columns $R^0_1,\dots,R^0_{10}$: 10000, 00000, 00000, 00000, 00000, 10000, 01000, 00000, 00000, 00000. Columns $R^1_1,\dots,R^1_{10}$: 11000, 11100, 11110, 10101, 11010, 11000, 11100, 11110, 10111, 10001. Computing $R^1_3$ (text read so far: abc, $t_2$=b, $t_3$=c):
      $(R^1_2 \gg 1) \mathbin{\&} \Sigma(c)$ = 01110 & 01010 = 01010
      $R^0_2$ = 00000, $R^0_2 \gg 1$ = 00000, $R^0_3 \gg 1$ = 00000, $10^4$ = 10000
      Transposition: $(R^0_1 \gg 2) \mathbin{\&} \Sigma(b) \mathbin{\&} (\Sigma(c) \gg 1)$ = 00100 & 00101 & 00101 = 00100
      OR of all terms: $R^1_3$ = 11110.
      Transposing “bc” into “cb” turns the text abc into acb=$P_{1,3}$: zero errors for a (k≤0) plus one transposition. So we can handle this within k≤1.

  44. Same example, k=1. Computing $R^1_9$ (text suffix: accb, $t_8$=c, $t_9$=b):
      $(R^1_8 \gg 1) \mathbin{\&} \Sigma(b)$ = 01111 & 00101 = 00101
      $R^0_8$ = 00000, $R^0_8 \gg 1$ = 00000, $R^0_9 \gg 1$ = 00000, $10^4$ = 10000
      Transposition: $(R^0_7 \gg 2) \mathbin{\&} \Sigma(c) \mathbin{\&} (\Sigma(b) \gg 1)$ = 00010 & 01010 & 00010 = 00010
      OR of all terms: $R^1_9$ = 10111.
      Transposing “cb” into “bc” turns the text suffix accb into acbc=$P_{1,4}$: zero errors for ac (k≤0) plus one transposition. So we can handle this within k≤1.

  45. Example: T=abcbcaccbb, P=acbcb and k=2, with Σ(a)=10000, Σ(b)=00101, Σ(c)=01010, Σ(*)=00000. Columns $R^1_1,\dots,R^1_{10}$: 11000, 11100, 11110, 10101, 11010, 11000, 11100, 11110, 10111, 10001. Columns $R^2_1,\dots,R^2_{10}$: 11100, 11110, 11111, 11111, 11111, 11111, 11110, 11111, 11111, 11111. Computing $R^2_3$ (text read so far: abc, $t_2$=b, $t_3$=c):
      $(R^2_2 \gg 1) \mathbin{\&} \Sigma(c)$ = 01111 & 01010 = 01010
      $R^1_2$ = 11100, $R^1_2 \gg 1$ = 01110, $R^1_3 \gg 1$ = 01111, $10^4$ = 10000
      Transposition: $(R^1_1 \gg 2) \mathbin{\&} \Sigma(b) \mathbin{\&} (\Sigma(c) \gg 1)$ = 00110 & 00101 & 00101 = 00100
      OR of all terms: $R^2_3$ = 11111.
      Transposing “bc” into “cb”: at most one error for a (k≤1) plus one transposition. So we can handle this within k≤2.

  46. Same example, k=2. Computing $R^2_4$ (text read so far: abcb, $t_3$=c, $t_4$=b):
      $(R^2_3 \gg 1) \mathbin{\&} \Sigma(b)$ = 01111 & 00101 = 00101
      $R^1_3$ = 11110, $R^1_3 \gg 1$ = 01111, $R^1_4 \gg 1$ = 01010, $10^4$ = 10000
      Transposition: $(R^1_2 \gg 2) \mathbin{\&} \Sigma(c) \mathbin{\&} (\Sigma(b) \gg 1)$ = 00111 & 01010 & 00010 = 00010
      OR of all terms: $R^2_4$ = 11111.
      Transposing “cb” into “bc”: ab matches ac with one error (k≤1), plus one transposition. So we can handle this within k≤2.

  47. Same example, k=2. Computing $R^2_5$ (text suffix: cbc, $t_4$=b, $t_5$=c):
      $(R^2_4 \gg 1) \mathbin{\&} \Sigma(c)$ = 01111 & 01010 = 01010
      $R^1_4$ = 10101, $R^1_4 \gg 1$ = 01010, $R^1_5 \gg 1$ = 01101, $10^4$ = 10000
      Transposition: $(R^1_3 \gg 2) \mathbin{\&} \Sigma(b) \mathbin{\&} (\Sigma(c) \gg 1)$ = 00111 & 00101 & 00101 = 00101
      OR of all terms: $R^2_5$ = 11111.
      Bit j=3 of the transposition term: c matches a with one error (k≤1), and transposing “bc” into “cb” gives $P_{2,3}$. So we can handle this within k≤2.

  48. Same example, k=2, the same computation of $R^2_5$ viewed for the whole text read so far, abcbc. Bit j=5 of the transposition term 00101: abc matches acb with one transposition (k≤1), and transposing the following “bc” into “cb” gives $P_{4,5}$, so abcbc matches acbcb. So we can handle this within k≤2.

  49. Same example, k=2. Computing $R^2_9$ ($t_8$=c, $t_9$=b):
      $(R^2_8 \gg 1) \mathbin{\&} \Sigma(b)$ = 01111 & 00101 = 00101
      $R^1_8$ = 11110, $R^1_8 \gg 1$ = 01111, $R^1_9 \gg 1$ = 01011, $10^4$ = 10000
      Transposition: $(R^1_7 \gg 2) \mathbin{\&} \Sigma(c) \mathbin{\&} (\Sigma(b) \gg 1)$ = 00111 & 01010 & 00010 = 00010
      OR of all terms: $R^2_9$ = 11111.
      Transposing “cb” into “bc”: at most one error before the transposed pair (k≤1) plus one transposition. So we can handle this within k≤2.

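  50. Time complexity: $O(k \lceil m/w \rceil n)$, since each of the n text characters updates k+1 bit vectors of $\lceil m/w \rceil$ machine words each, where k is the error bound, m is the length of the pattern, n is the length of the text and w is the word size of the computer.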
