
Tree Pattern Matching to Subset Matching in Linear Time



Presentation Transcript


  1. Tree Pattern Matching to Subset Matching in Linear Time R. Cole and R. Hariharan

  2. Tree Pattern Matching • Input: An ordered binary tree T, |T| = n, and an ordered binary tree P, |P| = m. • Output: All nodes in T where P matches. (Figure: the pattern P and the text T.)
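
The problem statement above can be made concrete with a brute-force matcher, roughly the O(nm) baseline that the history slide starts from. This is only a sketch under assumptions the slides do not spell out: nodes are unlabeled, "P matches at v" means that the shape of P embeds at v with left and right children preserved, and the names `Node`, `matches_at` and `all_matches` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # eq=False keeps identity-based comparison of nodes
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def matches_at(t: Optional[Node], p: Optional[Node]) -> bool:
    """True if the shape of pattern p embeds at text node t (assumed definition)."""
    if p is None:          # nothing left to match
        return True
    if t is None:          # pattern needs a node that the text lacks
        return False
    return matches_at(t.left, p.left) and matches_at(t.right, p.right)


def all_matches(t_root: Optional[Node], p_root: Node) -> List[Node]:
    """Return every text node where the pattern matches, in preorder."""
    found: List[Node] = []
    stack = [t_root]
    while stack:
        v = stack.pop()
        if v is None:
            continue
        if matches_at(v, p_root):
            found.append(v)
        stack.append(v.right)  # push right first so left is visited first
        stack.append(v.left)
    return found
```

Each call to `matches_at` costs O(m) and is made once per text node, so this sketch runs in O(nm) time.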

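  3. Subset Matching • Input: A set-string T and a set-string P (each position holds a set of characters). • Output: All occurrences of P in T, i.e. all alignments where every set of P is a subset of the aligned set of T. (Figure: an example set-string T and pattern P over the characters a, b, c, e, f.)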
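
A direct, quadratic-time check of the subset matching definition may help fix the notation. It is a naive sketch, not the near-linear algorithms listed in the history slide; representing a set-string as a Python list of sets and the name `subset_match` are assumptions made for illustration.

```python
from typing import List, Set


def subset_match(text: List[Set[str]], pattern: List[Set[str]]) -> List[int]:
    """Return all positions i where pattern[j] is a subset of text[i + j] for every j."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(pattern[j] <= text[i + j] for j in range(m))  # <= is the subset test
    ]


if __name__ == "__main__":
    text = [{"a", "b"}, {"c"}, {"a", "c"}, {"b", "c"}]
    pattern = [{"a"}, {"c"}]
    print(subset_match(text, pattern))
```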

  4. History • Hoffmann and O'Donnell, 1982, O(nm). • Kosaraju, 1989, O(n m^0.75 log m). • Dubiner, Galil, and Magen, 1994, O(n m^0.5 log m). • Cole and Hariharan, 1997, randomized O(n log^3 m). • Indyk, 1998, randomized O(n log n). • Cole, Hariharan, and Indyk, 1999, O(n log^3 m). • Cole and Hariharan, 2002, O(n log^2 m).

  5. Period • Def: The period of a string s is the smallest positive integer j such that s[i] = s[i+j] for every i for which both positions exist. Ex: S = 0 0 1 0 0 1 0 0 1 0 0 1, j = 3.

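  6. Informal terminology (1) • In the following slides, saying that a string has period θ means that it starts with θ and that its period is |θ|. • |θ| is sometimes abbreviated to θ. Let θ = 0 0 1, |θ| = 3. S = 0 0 1 0 0 1 0 0 1 0 0 1: yes, S has period θ. S = 0 1 0 0 1 0 0 1 0 0 1 0: no.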

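  7. Classical Lemma (1) • s: a string with period θ. If s is cut into two parts and the distance from the start of s to the cut is not an integer multiple of |θ|, then the second part does not start with θ. Ex: s = 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1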
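
The lemma can be checked by brute force on the running example: list every offset at which the remainder of s starts with θ and observe that all of them are multiples of |θ|. This is only an illustration on one string, not a proof; the function name `offsets_starting_with` is hypothetical.

```python
from typing import List


def offsets_starting_with(s: str, theta: str) -> List[int]:
    """Return every cut position d such that s[d:] starts with theta."""
    k = len(theta)
    return [d for d in range(len(s) - k + 1) if s[d:d + k] == theta]


if __name__ == "__main__":
    s, theta = "001001001001", "001"
    offsets = offsets_starting_with(s, theta)
    # Every surviving offset is a multiple of |theta|, as the lemma predicts.
    print(offsets, all(d % len(theta) == 0 for d in offsets))
```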

  8. Period in linear time • Exercise 1: Design an algorithm to compute the period in linear time.
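
One possible approach to Exercise 1, offered as a sketch rather than the intended solution: compute the Knuth-Morris-Pratt failure (prefix) function and use the fact that the period equals the string length minus the length of its longest proper border. The function name `period` is an assumption.

```python
def period(s: str) -> int:
    """Smallest j >= 1 with s[i] == s[i + j] wherever both indices exist (0 for empty s)."""
    n = len(s)
    if n == 0:
        return 0
    # fail[i] = length of the longest proper border of s[:i + 1]
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = fail[k - 1]      # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    return n - fail[-1]


if __name__ == "__main__":
    print(period("001001001001"))
```

The inner `while` loop only undoes earlier increments of `k`, so the total work is O(|s|).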

  9. θ-Path • Def: A path p is a θ-path if its string representation has period θ. Ex: Let θ = 0 0 1. A path p with string representation s = 0 0 1 0 0 1 is a θ-path.
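
A small check of the θ-path condition on an explicit label sequence, as a sketch. It assumes that the string representation of a path is its sequence of edge labels (0 for a step to the left child, 1 for a step to the right child), that the path has at least |θ| edges, and that "period θ" means the labels repeat θ from the first edge on; the name `is_theta_path` is hypothetical.

```python
from typing import Sequence


def is_theta_path(labels: Sequence[int], theta: Sequence[int]) -> bool:
    """True if labels starts with theta and keeps repeating it (assumed reading)."""
    k = len(theta)
    if k == 0 or len(labels) < k:
        return False
    # Position i must carry the same label as position i mod |theta| of theta.
    return all(labels[i] == theta[i % k] for i in range(len(labels)))


if __name__ == "__main__":
    print(is_theta_path([0, 0, 1, 0, 0, 1], [0, 0, 1]))
    print(is_theta_path([0, 1, 0, 0, 1, 0], [0, 0, 1]))
```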

  10. Maximal θ-Path • Def: A θ-path p is maximal if it cannot be extended. Ex: θ = 0 0 1. (Figure: example tree; two of the highlighted paths are annotated "not".)

  11. Maximal θ-Paths in linear time • Exercise 2: Design a linear-time algorithm to find all maximal θ-paths in a tree.

  12. Informal terminology (2) • The size of a node = the size of the subtree rooted at that node. (Figure: a tree whose nodes have sizes 7, 4, 2, 1, 2, 1, 1.)
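
Node sizes in this sense can be computed for the whole tree in one bottom-up pass. The sketch below uses a hypothetical `Node` class with a `size` field and an iterative traversal so that deep trees do not hit the recursion limit.

```python
from typing import List, Optional


class Node:
    def __init__(self, left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.left, self.right, self.size = left, right, 0


def compute_sizes(root: Optional[Node]) -> int:
    """Set v.size = number of nodes in the subtree rooted at v; return the root's size."""
    order: List[Node] = []
    stack = [root]
    while stack:                      # collect nodes so that parents precede descendants
        v = stack.pop()
        if v is None:
            continue
        order.append(v)
        stack.append(v.left)
        stack.append(v.right)
    for v in reversed(order):         # children are finished before their parent
        v.size = 1 + (v.left.size if v.left else 0) + (v.right.size if v.right else 0)
    return root.size if root else 0
```

For a seven-node tree shaped `Node(Node(Node(), Node(Node())), Node(Node()))`, the root gets size 7 and its two children get sizes 4 and 2.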

  13. Centroid of a tree • Ex: m = 19, ⌊m/2⌋ = 9.

  14. Spine of a Tree • Spine = an extended version of the centroid.

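  15. Spine of a Tree • Link node = the last node of the centroid = the last node on the spine whose size is ≥ m/2. (Figure: a spine with edge labels 0 1 0 1 0 1 0; spine nodes up to the link node have size ≥ m/2, the nodes after it have size < m/2.)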
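
The slides do not define the spine precisely, so the following is a sketch under an assumed reading: the spine starts at the root and always steps to the child with the larger subtree (ties go left), edges are labeled 0 for left and 1 for right, and the link node is the last spine node of size ≥ m/2. The names `Node`, `subtree_sizes` and `spine_and_link` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self, left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.left, self.right = left, right


def subtree_sizes(root: Node) -> Dict[Optional[Node], int]:
    """Map every node to the size of the subtree rooted at it."""
    order: List[Node] = []
    stack: List[Optional[Node]] = [root]
    while stack:
        v = stack.pop()
        if v is None:
            continue
        order.append(v)
        stack += [v.left, v.right]
    sizes: Dict[Optional[Node], int] = {}
    for v in reversed(order):                 # descendants first
        sizes[v] = 1 + sizes.get(v.left, 0) + sizes.get(v.right, 0)
    return sizes


def spine_and_link(root: Node) -> Tuple[List[int], Node]:
    """Return (edge labels along the assumed spine, link node)."""
    sizes = subtree_sizes(root)
    m = sizes[root]
    labels: List[int] = []
    link = v = root
    while v.left or v.right:
        go_left = sizes.get(v.left, 0) >= sizes.get(v.right, 0)
        labels.append(0 if go_left else 1)
        v = v.left if go_left else v.right
        if 2 * sizes[v] >= m:                 # still at least m/2 nodes below v
            link = v
    return labels, link
```

Sizes only shrink along the path, so the last node that passes the `2 * sizes[v] >= m` test is the link node in this reading.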

  16. A Special Case of Tree Pattern Matching • Input: An ordered binary tree T, |T| = n, and an ordered binary tree P, |P| = m. • Output: All nodes in T where P matches. • Additional constraint: T has only one maximal θ-path, where θ is the period of the spine of P.

  17. A Special Case of Tree Pattern Matching (Figure: the pattern P and a text T with a single maximal θ-path.)

  18. Reduce to Subset Matching (1) (Figure: the pattern P with spine labels 0 0 1 0 0 1 0; two parts hanging off the spine are labeled B2 and B6.)

  19. Reduce to Subset Matching (2) (Figure: B2 and B6 drawn with nodes labeled a to f, and the corresponding character sets placed along the spine string of P.)

  20. Reduce to Subset Matching (3) (Figure: the text T with its θ-path labeled 0 0 1 0 0 1 0 0 1 0 0; the ribs rib_2, rib_5 and rib_6 hang off the path.)

  21. Reduce to Subset Matching (4) (Figure: rib_2 drawn with nodes labeled a to f, and the corresponding character set placed on the path 0 0 1 0 0 1 0 0 1 0 0.) Time: O(min{m, |rib_2|})

  22. Reduce to Subset Matching (5) • Total time: ∑_i O(min{m, |rib_i|}) = O(n)

  23. How about the general case? • How do we reduce when more than one maximal θ-path is found? • Brute force: reduce every maximal θ-path to a subset matching problem with the method just described. • Time?

  24. Where does the intuition come from? • Truncation lemma: If the first |θ| edges of each maximal θ-path are removed, then the truncated paths are disjoint.

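  25. Truncation Lemma (1) • The distance from u to the point where the overlap begins cannot be an integer multiple of |θ|. (Figure: two paths p and p', with u marked and the path divided into consecutive copies of θ.)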

  26. Truncation Lemma (2) • By the Classical Lemma: (Figure: the paths p and p', with a stretch annotated "< |θ|".)

  27. Warm up is over! • What comes next: Part 1: Prove that the truncated maximal θ-paths can be reduced to subset matching in linear time. Part 2: Work out how to handle the parts that were cut off.

  28. Step 1: Find all maximal θ-paths (Figure: T and P; θ and the link node are marked.)

  29. Step 2: Filtering (1) • Filter out the maximal θ-paths that do not satisfy the following three properties. • Property 1: (figure annotation) ≥ m

  30. Step 2: Filtering (2) • Property 2: (figure annotation) ≥ m/2, ∵ in P: ≥ m/2

  31. Step 2: Filtering (3) • Property 3: (figure annotations) ≥ θ and ≥ m/2, ∵ in P: ≥ θ and ≥ m/2

  32. Step 3: Truncation • Remove the first |θ| edges from each maximal θ-path that survives the filtering.

  33. Step 4: Filtering again • Filter the truncated maximal θ-paths once more; the paths that remain are simply called truncated paths from here on.

  34. Step 5: Reduce the truncated paths to subset matching one by one. Time: ∑_{truncated paths} ∑_i O(min{m, |rib_i|})

  35. Analysis of Step 5 (1) ∑_{truncated paths} ∑_i O(min{m, |rib_i|}) = ∑_{ribs} O(min{m, |rib|})

  36. Analysis of Step 5 (2) ∑_{ribs} O(min{m, |rib|}) (large rib = a rib of size ≥ m, small rib = a rib of size < m) = ∑ O(min{m, |large rib|}) + ∑ O(min{m, |small rib|}) = O(m · (# large ribs)) + ∑ O(|small rib|)

  37. Analysis of Step 5 (3) O(m · (# large ribs)) + ∑ O(|small rib|). It remains to prove: Part 1. (# large ribs) = O(n/m). Part 2. ∑ O(|small rib|) = O(n).
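
Assuming both parts hold, the O(n) bound for Step 5 follows in one chain. In the sketch below a large rib is a rib of size ≥ m and a small rib is one of size < m, as on the previous slide; the symbols R, R_large and R_small (all ribs on truncated paths, the large ones, the small ones) are notation introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
\sum_{r \in R} O(\min\{m, |r|\})
  &= \sum_{r \in R_{\text{large}}} O(m) + \sum_{r \in R_{\text{small}}} O(|r|) \\
  &= O\!\left(m \cdot |R_{\text{large}}|\right) + O(n)
     && \text{(Part 2)} \\
  &= O\!\left(m \cdot \tfrac{n}{m}\right) + O(n) = O(n)
     && \text{(Part 1)}
\end{align*}
```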

  38. Analysis of Step 5 (4) • Part 2: ∑ O(|small rib|) = O(n), ∵ the small ribs are disjoint. (Figure: truncated paths with ribs marked large (≥ m) or small (< m).)

  39. Marked nodes • Def: A node in T is marked if its left and right subtrees both contain ≥ m nodes.

  40. # marked nodes is O(n/m) (Figure: an example with m = 2; the subtrees with ≥ m nodes are annotated.)

  41. # marked nodes is O(n/m) ∵ (# external nodes) · m ≤ n ∴ # external nodes ≤ n/m ⇒ # marked nodes = # internal nodes ≤ n/m − 1. (Figure: the same tree, with each subtree of ≥ m nodes treated as an external node.)
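
The bound can be sanity-checked on small trees by counting marked nodes directly from the definition. The sketch below is an empirical check on complete binary trees, not a proof; `Node`, `complete_tree` and `count_marked` are hypothetical names, and the recursion is only suitable for shallow trees.

```python
from typing import Optional, Tuple


class Node:
    def __init__(self, left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.left, self.right = left, right


def complete_tree(levels: int) -> Optional[Node]:
    """Build a complete binary tree with the given number of levels (2**levels - 1 nodes)."""
    if levels == 0:
        return None
    return Node(complete_tree(levels - 1), complete_tree(levels - 1))


def count_marked(root: Optional[Node], m: int) -> Tuple[int, int]:
    """Return (number of marked nodes, n), where a node is marked if both of its
    subtrees contain at least m nodes."""
    marked = 0

    def size(v: Optional[Node]) -> int:
        nonlocal marked
        if v is None:
            return 0
        left, right = size(v.left), size(v.right)
        if left >= m and right >= m:
            marked += 1
        return 1 + left + right

    n = size(root)
    return marked, n


if __name__ == "__main__":
    for m in (2, 3, 4):
        marked, n = count_marked(complete_tree(4), m)
        print(m, marked, n, marked <= n / m - 1)
```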

  42. Analysis of Step 5 (5) • Part 1. (# large ribs) = O(n/m). If a truncated path carries k > 1 large ribs, then it contains k − 1 marked nodes. (Figure: a truncated path with several large ribs.)

  43. Analysis of Step 5 (6) • The large ribs on truncated paths that carry k > 1 large ribs number O(n/m) in total. • Remaining question: how many truncated paths carry exactly k = 1 large rib?

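  44. Analysis of Step 5 (7) • There are O(n/m) of them. (Figure: a truncated path with several small ribs and one large rib; one part is annotated ≥ m/2.)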

  45. An observation • There are only O(n/m) truncated paths with k > 1 large ribs. • There are only O(n/m) truncated paths with k = 1 large rib. • There are only O(n/m) truncated paths with k = 0 large ribs. • Hence there are only O(n/m) truncated paths.

  46. Disjoint Lemma • Let C be a set of disjoint θ-paths that satisfy Properties 1 to 3. Then C contains O(n/m) θ-paths. • Pf: • There are only O(n/m) θ-paths with k > 1 large ribs. • There are only O(n/m) θ-paths with k = 1 large rib. • There are only O(n/m) θ-paths with k = 0 large ribs.

  47. Review • Step 1: Finding all maximal θ-paths • Step 2: Filtering • Step 3: Truncation • Step 4: Filtering again • Step 5: Reduce to subset matching

  48. How about the removed parts? (Figure: P with its spine divided into consecutive copies of θ.) Time: O(m)

  49. The Last Job • Step 1: Finding all maximal θ-paths • Step 2: Filtering (only O(n/m) paths are left) • Step 3: Truncation • Step 4: Filtering again • Step 5: Reduce to subset matching

  50. Tail Lemma • The tail of a path is not touched by any other path.
