660 likes | 923 Views
Tree Pattern Matching to Subset Matching in Linear Time. R. Cole and R. Hariharan. Tree Pattern Matching. Input: An ordered binary tree T, |T| = n. An ordered binary tree P, |P| = m. Output: All nodes in T where P matches. p. t. Subset Matching.
E N D
Tree Pattern Matching to Subset Matching in Linear Time R. Cole and R. Hariharan
Tree Pattern Matching • Input: An ordered binary tree T, |T| = n. An ordered binary tree P, |P| = m. • Output: All nodes in T where P matches. p t
Subset Matching • Input: A set-string T and a set-string P . • Output: All occurrences of P in T. a b c a c b c a c c e f b f b T = a c c b P =
History • Hoffman and O’Donell, 1982, O(nm). • Kosaraju, 1989, O(nm0.75logm). • Dubiner, Galil, and Magen, 1994,O(nm0.5logm). • Cole and Hariharan, 1997, randomized O(nlog3m). • Indyk, 1998, randomized O(nlogn). • Cole, Hariharan, and Indyk, 1999, O(nlog3m). • Cole and Hariharan, 2002, O(nlog2m).
Period • Def : The period of a string s is the smallest number j such that s[i]=s[i+j]. S = 0 0 1 0 0 1 0 0 1 0 0 1 j = 3
非正式用語(1) • 後面的投影片如果說週期為 θ,意思是以 θ `` 開頭 ”,並且週期為 | θ |。 • | θ | 有時會省略為 θ。 Let θ = 0 0 1, |θ| = 3. S = 0 0 1 0 0 1 0 0 1 0 0 1 Yes θ S = 0 1 0 0 1 0 0 1 0 0 1 0 No
Classical Lemma (1) • s: a string with periodθ s = 把s切兩半,如果切的地方距離開頭不是 θ的整數倍, 則後面那一半的開頭不會是θ。 Ex: s= 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1
Period in linear time • Exercise 1: Design an algorithm to compute the period in linear time.
θ-Path • Def: A path p is a θ-path if its string representation has period θ. Let θ = 0 0 1. p s= 0 0 1 0 0 1 p is a θ-path.
Maximalθ-Path • Def: A θ-path p is maximal if it can not be extended. θ = 0 0 1. not not
Maximalθ-Paths in linear time • Exercise 2: Design an linear time algorithm to find all maximalθ-Paths in a tree.
非正式用語(2) • 一個node的大小 = 以這個node為root的subtree的大小。 7 4 2 1 2 1 1
Centroid of a tree m = 19 m/2 = 9
Spine of a Tree • Spine = centroid 加強版
Spine of a Tree 0 1 0 1 0 1 0 Link node = Centroid 的最後一個node = Spine上最後一個 ≥ m/2的點 < m/2 ≥ m/2
A Special Case of Tree Pattern Matching • Input: An ordered binary tree T, |T| = n. An ordered binary tree P, |P| = m. • Output: All nodes in T where P matches Additional constraint: T has only one maximal θ-path, where θis the period of the spine of P.
Reduce to Subset Matching (1) P 0 0 1 0 0 1 0 B2 B6
Reduce to Subset Matching (2) B6 B2 a b e c d f 0 0 0 a b c e 1 0 0 1 1 a b d e f 0
Reduce to Subset Matching (3) T 肋2 0 0 1 0 0 1 0 0 1 0 0 肋5 肋6
Reduce to Subset Matching (4) 肋2 a a b e b e c f f c d 0 a b c e f 0 0 1 0 0 1 0 0 1 0 0 Time: O( min{ m, 肋2} )
Reduce to Subset Matching (5) • Total Time: ∑i O( min{m, 肋i} ) = O(n)
How about the general case? • 如果找出來的 maximaθ-paths不只一條該怎麼 reduce呢? • 暴力法: 對每一條maximal θ-paths都用剛才的方法reduce成subset matching problem。 • Time?
Where is the intuition come from? • Truncation lemma: If the first |θ| edges of eachmaximal θ-paths are removed, then those truncated paths are disjoint。
Truncation Lemma(1) • 重疊開始的地方和u的距離不可能是θ的整數倍。 u p’ p θ θ θ θ θ
Truncation Lemma(2) • By Classical Lemma: p p’ < |θ|
Warm up is over! • 接下來要做的事: Part 1: 證明 truncated maximal θ-paths 可以在linear time reduce 成 subset matching。 Part 2: 考慮被砍掉的部分該如何解決。
Step 1: Find all maximal θ-paths T P θ Link node
Step 2: Filtering(1) • 把不符合以下三個property的maximal θ-pahts 過濾掉。 • Property 1: ≥ m
Step 2: Filtering (2) Propety 2: ∵ P ≥ m/2 ≥ m/2
Step 2: Filtering (3) • Propety 3: ∵ P ≥θ ≥θ ≥ m/2 ≥ m/2
Step 3: Truncation • 將過濾後每一條maximal θ-paths開頭的θ條edges去掉。
Step 4: Filtering again • 把 truncated maximal θ-paths 再過濾一遍,剩下的這些paths在之後將簡稱為truncated paths.
Step 5: 一條一條reduce成 subset matching Time: ∑truncated paths ∑iO( min{m, 肋i} )
Analysis of Step 5 (1) ∑truncated paths ∑iO( min{m, 肋i} ) = ∑O( min{m, 肋} )
Analysis of Step 5 (2) ∑O( min{m, 肋} ) (大肋 = 大於或等於 m 的肋骨,小肋 = 小於m的肋骨) =∑O( min{m, 大肋} ) + ∑O( min{m, 小肋} ) = O(m * (#大肋)) + ∑O(小肋)
Analysis of Step 5 (3) O(m * (#大肋)) + ∑O(小肋) 剩下只需證明 Part 1. (#大肋) = O( n/m ) Part 2. ∑O(小肋) = n
Analysis of Step 5 (4) < m • Part 2: ∑O(小肋) = n ∵小肋骨 are disjoint 大 小 ≥ m 小 小 小 大 大 小
Marked nodes • Def: A node in t is marked if its left and right subtrees both contain ≥ m nodes.
# marked nodes is O(n/m) m = 2 ≥m ≥m ≥m ≥m ≥m ≥m ≥m
# marked nodes is O(n/m) ≥m ≥m ≥m ≥m ≥m ≥m ≥m ∵(# external nodes) * m ≤ n ∴# external nodes≤ n/m ⇒ # marked nodes = # internal nodes≤ n/m - 1
Analysis of Step 5 (5) • Part 1. (#大肋) = O( n/m ) 一條truncated path上如果有k > 1根大肋骨, 則有k-1 個maked nodes。 大 大 大 大 大
Analysis of Step 5 (6) • 擁有 k > 1根大肋骨的truncated paths上的大肋骨全部加起來是O(n/m)。 • 剩下的問題: 有多少條擁有 k = 1根大肋骨的truncated paths?
Analysis of Step 5 (7) • O(n/m) 條 小 小 ≥ m/2 小 小 大
An observation • 擁有 k > 1根大肋骨的truncated paths只有O(n/m)條。 • 擁有 k = 1根大肋骨的truncated paths只有O(n/m)條。 • 擁有 k = 0根大肋骨的truncated paths只有O(n/m)條。 • 所以truncated paths只有 O(n/m)條。
Disjoint Lemma • Let C be a set of disjoint θ-paths and these θ-paths satisfy property 1~3. Then there are O(n/m) θ- paths in C. • Pf: • 擁有 k > 1根大肋骨的θ-paths只有O(n/m)條。 • 擁有 k = 1根大肋骨的θ-paths只有O(n/m)條。 • 擁有 k = 0根大肋骨的θ-paths只有O(n/m)條。
Review • Step 1: Finding all maximal θ-paths • Step 2: Filtering • Step 3: Truncation • Step 4: Filtering again • Step 5: Reduce to subset mathching
How about the removed parts? P θ θ θ θ Time: O(m)
The Last Job • Step 1: Finding all maximal θ-paths • Step 2: Filtering only O(n/m) paths left. • Step 3: Truncation • Step 4: Filtering again • Step 5: Reduce to subset mathching
Tail Lemma • path的尾巴不會被其他path碰到。