More Specialized Data Structures
• String data structures
• Spatial data structures
Cpt S 223. School of EECS, WSU
String Data Structures
String Operations
• String indexing
• Pattern matching
  • Find pattern P in text T
  • Find common substrings among a set of strings
• Application domains
  • Bioinformatics
  • Google search!
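A simplified hash table for strings
0. Build a lookup table of size |Σ|^w for all w-length words in D
• Σ = {A, C, G, T}, w = 2, so the lookup table has 4^2 (= 16) entries
• Input strings (positions 1 to 7):
  position: 1 2 3 4 5 6 7
  S1:       C A G T C C T
  S2:       C G T T C G C
• Lookup table (an entry "Si,j" means the word starts at position j of string Si):
  AA: (empty)
  AC: (empty)
  AG: S1,2
  AT: (empty)
  CA: S1,1
  CC: S1,5
  CG: S2,1  S2,5
  CT: S1,6
  GA: (empty)
  GC: S2,6
  GG: (empty)
  GT: S1,3  S2,2
  TA: (empty)
  TC: S1,4  S2,4
  TG: (empty)
  TT: S2,3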
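The lookup table above can be reproduced with a short script. This is a minimal sketch, not the course's implementation: the function name `build_lookup_table` and the use of a Python dict in place of a fixed array of |Σ|^w slots are assumptions made for illustration.

```python
from itertools import product

def build_lookup_table(strings, alphabet, w):
    """Map every w-length word over `alphabet` to its (string, position) hits.

    Positions and string numbers are 1-based, matching the slide's "Si,j".
    """
    # One entry per possible word: |alphabet| ** w entries in total.
    table = {"".join(word): [] for word in product(alphabet, repeat=w)}
    for s_idx, s in enumerate(strings, start=1):
        for pos in range(len(s) - w + 1):
            table[s[pos:pos + w]].append((s_idx, pos + 1))
    return table

# Hypothetical driver using the two strings from the slide.
table = build_lookup_table(["CAGTCCT", "CGTTCGC"], "ACGT", 2)
for word in sorted(table):
    print(word, table[word])
```

Looking up a w-length word is then a single dictionary access; longer patterns would need to be verified against the text from each reported position.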
PATRICIA trees
• “Practical Algorithm to Retrieve Information Coded in Alphanumeric”
• Compacted trie of a set of strings
• Dictionary searches made easy
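To make "compacted trie" concrete, here is a minimal sketch of a character-level compacted trie (radix tree) supporting insertion and dictionary lookup. It is an illustration under assumptions: the classic PATRICIA structure branches on individual bits, whereas this version branches on characters, and the names `Node`, `insert`, and `contains` are hypothetical.

```python
class Node:
    def __init__(self):
        # first character of an edge label -> (full edge label, child node)
        self.children = {}
        self.is_word = False

def insert(root, word):
    node = root
    while word:
        edge = node.children.get(word[0])
        if edge is None:
            # No edge starts with this character: hang the rest as one edge.
            leaf = Node()
            leaf.is_word = True
            node.children[word[0]] = (word, leaf)
            return
        label, child = edge
        # Length of the common prefix of the edge label and the remaining word.
        k = 0
        while k < len(label) and k < len(word) and label[k] == word[k]:
            k += 1
        if k < len(label):
            # Split the edge so that the shared prefix ends at a new node.
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            node.children[word[0]] = (label[:k], mid)
            child = mid
        node, word = child, word[k:]
    node.is_word = True

def contains(root, word):
    node = root
    while word:
        edge = node.children.get(word[0])
        if edge is None or not word.startswith(edge[0]):
            return False
        label, node = edge
        word = word[len(label):]
    return node.is_word

# Hypothetical dictionary for illustration.
root = Node()
for w in ["window", "wind", "indigo", "india"]:
    insert(root, w)
```

Because chains of single-child nodes are merged into one labeled edge, a lookup touches at most one node per branching point rather than one node per character.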
Suffix Tree
• Compacted trie of all suffixes of a string
  position: 1 2 3 4 5 6
  string:   B A N A N A
• Find pattern: “ANAN”
• Think: how would you implement Google search?
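The pattern search on BANANA can be sketched as follows. Note the simplification: this builds an uncompacted suffix trie (one node per character, quadratic space) rather than a true compacted suffix tree, and `TrieNode`, `build_suffix_trie`, and `find_pattern` are names assumed for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        # 1-based start positions of all suffixes that pass through this node
        self.starts = []

def build_suffix_trie(text):
    """Insert every suffix of `text` into a trie, character by character."""
    root = TrieNode()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, TrieNode())
            node.starts.append(start + 1)
    return root

def find_pattern(root, pattern):
    """Return the 1-based positions at which a non-empty `pattern` occurs."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.starts

trie = build_suffix_trie("BANANA")
print(find_pattern(trie, "ANAN"))  # [2]
print(find_pattern(trie, "ANA"))   # [2, 4]
```

The key idea carries over to the compacted version: every occurrence of a pattern is a prefix of some suffix, so matching the pattern from the root takes time proportional to the pattern length, independent of the text length.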
Generalized Suffix Tree (GST)
• Input strings (positions 1 to 7): string 1 = WINDOW$, string 2 = INDIGO$
[Figure: generalized suffix tree of WINDOW$ and INDIGO$. Each leaf is labeled (string number, suffix start position), from (1, 1) to (1, 7) for WINDOW$ and from (2, 1) to (2, 7) for INDIGO$.]
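One use of a generalized suffix structure is finding substrings common to all input strings. The sketch below is an assumption-laden illustration: it uses an uncompacted generalized suffix trie in which each node records which strings pass through it (so the `$` terminators are not needed), and the names `GNode`, `build_generalized_trie`, and `longest_common_substring` are hypothetical.

```python
class GNode:
    def __init__(self):
        self.children = {}
        self.owners = set()  # numbers of the strings containing this substring

def build_generalized_trie(strings):
    """Insert every suffix of every string, tagging nodes with string numbers."""
    root = GNode()
    for sid, s in enumerate(strings, start=1):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, GNode())
                node.owners.add(sid)
    return root

def longest_common_substring(strings):
    """Return a longest substring shared by all strings (ties broken arbitrarily)."""
    root = build_generalized_trie(strings)
    everyone = set(range(1, len(strings) + 1))
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        # Only descend into nodes whose substring occurs in every string.
        for ch, child in node.children.items():
            if child.owners == everyone:
                stack.append((child, path + ch))
    return best

print(longest_common_substring(["WINDOW", "INDIGO"]))  # IND
```

In a real generalized suffix tree the same query is answered by finding the deepest internal node whose subtree contains leaves from every string.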
Spatial Data Structures
Spatial Data Structures
[Figure: points in 2-D enclosed by a bounding rectangle.]
Quad trees (4-way trees)
• Technique for spatial domain decomposition
[Figure: recursive bisection of a 2-D region and the corresponding quad tree, drawn from the root down.]
Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005
Compacted Quad-trees (for 2D data)
• Each node has exactly 4 children (for 4 quadrants)
• For 3D data, the corresponding tree is called an oct-tree
[Figure: a 2D space with data, its quad-tree decomposition, and the compacted tree in which a path is compacted into a single edge.]
Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005
Range Queries on Quad-trees
[Figure: a 2D space with origin (0,0); the range query result is the set of points inside the query rectangle with corners (a1,b1) and (a2,b2).]
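A range query on a quad-tree can be sketched as below: subtrees whose square does not intersect the query rectangle are skipped entirely. This is a minimal illustration, not the slides' implementation; the class name `QuadTree`, the leaf capacity of one point, the half-open cell convention, and the `MAX_DEPTH` cap are all assumptions.

```python
class QuadTree:
    """Region quad-tree over the half-open square [x0, x1) x [y0, y1)."""

    MAX_DEPTH = 16  # assumed cap so duplicate points cannot split forever

    def __init__(self, x0, y0, x1, y1, capacity=1, depth=0):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.points = []      # points stored here while this node is a leaf
        self.children = None  # four sub-quadrants once the node is split

    def insert(self, p):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= p[0] < x1 and y0 <= p[1] < y1):
            return False  # point lies outside this node's square
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.MAX_DEPTH:
                self.points.append(p)
                return True
            self._split()
        return any(child.insert(p) for child in self.children)

    def _split(self):
        # Bisect the square in both dimensions and push points down.
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1)
        self.children = [
            QuadTree(x0, y0, mx, my, *args),
            QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args),
            QuadTree(mx, my, x1, y1, *args),
        ]
        for q in self.points:
            any(child.insert(q) for child in self.children)
        self.points = []

    def range_query(self, a1, b1, a2, b2):
        """Return all points p with a1 <= p.x <= a2 and b1 <= p.y <= b2."""
        x0, y0, x1, y1 = self.bounds
        if a2 < x0 or a1 >= x1 or b2 < y0 or b1 >= y1:
            return []  # query rectangle misses this square: prune the subtree
        found = [p for p in self.points
                 if a1 <= p[0] <= a2 and b1 <= p[1] <= b2]
        for child in self.children or []:
            found.extend(child.range_query(a1, b1, a2, b2))
        return found

# Hypothetical data set on an 8 x 8 domain anchored at (0, 0).
tree = QuadTree(0, 0, 8, 8)
for pt in [(1, 1), (2, 5), (5, 2), (6, 6), (7, 1), (3, 3)]:
    tree.insert(pt)
result = sorted(tree.range_query(2, 2, 6, 6))
print(result)  # [(2, 5), (3, 3), (5, 2), (6, 6)]
```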
Oct-Trees (for 3D data)
• Issue: what happens if the data is unevenly (i.e., non-uniformly) distributed?
  • Most of the levels in the tree will be empty
• Solution: “Compacted Oct-trees”
k-d trees (for k dimensions)
• Maintain a combined binary search tree for all dimensions
• Recursively bisect each dimension, alternating dimensions at each level of the tree
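The alternating bisection can be sketched as follows. This is an illustrative sketch under assumptions: it splits at the median point along the current dimension (one common choice; the slides do not specify the split rule), stores nodes as plain dicts, and the names `build_kdtree` and `range_search` are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree by splitting at the median, cycling through dimensions."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k  # dimension used to split at this level
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),       # coordinates <= split
        "right": build_kdtree(pts[mid + 1:], depth + 1),  # coordinates >= split
    }

def range_search(node, lo, hi):
    """Return all points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if node is None:
        return []
    p, axis = node["point"], node["axis"]
    found = []
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        found.append(p)
    # Visit a side only if the query range can overlap it along the split axis.
    if lo[axis] <= p[axis]:
        found += range_search(node["left"], lo, hi)
    if hi[axis] >= p[axis]:
        found += range_search(node["right"], lo, hi)
    return found

# Hypothetical 2-D data set.
tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
hits = sorted(range_search(tree, (3, 1), (8, 5)))
print(hits)  # [(5, 4), (7, 2), (8, 1)]
```

Unlike a quad-tree, each node splits along one dimension only, so the tree stays binary regardless of k.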