
Fast Trie Data Structures




Presentation Transcript


  1. Fast Trie Data Structures Seminar On Advanced Topics In Data Structures Jacob Katz December 1, 2001 Dan E. Willard, 1981, “New Trie Data Structures Which Support Very Fast Search Operations”

  2. Agenda • Problem statement • Existing solutions and motivation for a new one • P-Fast tries & their complexity • Q-Fast tries & their complexity • X-Fast tries & their complexity • Y-Fast tries & their complexity

  3. Problem statement • Let S be a set of N records with distinct integer keys in the range [0, M], with the following operations: • MEMBER(K) – does the key K belong to the set? • SUCCESSOR(K) – find the least element whose key is greater than K • PREDECESSOR(K) – find the greatest element whose key is less than K • SUBSET(K1, K2) – produce a list of elements whose keys lie between K1 and K2 • The problem: an efficient data structure supporting this definition
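The four queries above can be pinned down with a naive baseline: a plain sorted list searched with binary search. This is a reference sketch for the semantics only (the class name is ours, not the paper's); the trie structures in the following slides aim to beat its O(log N) per-query cost for integer keys in [0, M].

```python
import bisect

class SortedKeySet:
    """Naive baseline: a sorted list, O(log N) per query via binary search."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))

    def member(self, k):
        i = bisect.bisect_left(self.keys, k)
        return i < len(self.keys) and self.keys[i] == k

    def successor(self, k):
        # least stored key strictly greater than k, or None
        i = bisect.bisect_right(self.keys, k)
        return self.keys[i] if i < len(self.keys) else None

    def predecessor(self, k):
        # greatest stored key strictly less than k, or None
        i = bisect.bisect_left(self.keys, k)
        return self.keys[i - 1] if i > 0 else None

    def subset(self, k1, k2):
        # all stored keys in [k1, k2], in order
        lo = bisect.bisect_left(self.keys, k1)
        hi = bisect.bisect_right(self.keys, k2)
        return self.keys[lo:hi]
```

Any candidate structure should return the same answers as this baseline; only the costs differ.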

  4. Existing solutions • AVL trees and 2-3 trees use O(N) space and O(log N) time in the worst case • With no restriction on the keys, better performance is impossible • Expected O(log log N) time is possible when keys are uniformly distributed • Stratified trees use O(M * log log M) space and O(log log M) time in the worst case for integer keys in the range [0, M] • Disadvantage: O(M * log log M) space is much larger than O(N) when M >> N

  5. Motivation for another solution • A more space-efficient data structure is wanted for restricted (integer) keys, one that still maintains the time efficiency…

  6. The way to the solution • We first define the P-Fast Trie • O(√(log M)) time; O(N * √(log M) * 2^√(log M)) space • Then show the Q-Fast Trie • an improvement of the space requirement to O(N) • Then show the X-Fast Trie • O(log log M) time; O(N * log M) space; no dynamic operations • Then show the Y-Fast Trie • O(log log M) time; O(N) space; no dynamic operations

  7. What's a Trie • [Figure: example trie with h = 2, b = 10 storing the keys 20, 22, 24, 31, 32, 42, 43] • A trie of size (h, b) is a tree of height h and branching factor b • All keys can be regarded as integers in the range [0, b^h − 1] • Each key K can be represented as an h-digit number in base b: K1K2K3…Kh • Keys are stored at the leaf level; the path from the root corresponds to the decomposition of the key into digits

  8. Trivial Trie • In each node, store a vector of branches • MEMBER(K) – O(h) • visits O(h) nodes, spending O(1) time in each • SUCCESSOR(K)/PREDECESSOR(K) – O(h*b) • visits O(h) nodes, spending O(b) time in each node • this is too much time • Observation: increasing b (the base of the key representation, the branching factor) decreases h (the number of digits required to represent a key, the height of the tree) and vice versa
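The trivial trie of the slide can be sketched in a few lines; this is an illustrative toy (class and method names are ours), with each node literally a vector of b branch slots and keys stored at depth h:

```python
class TrivialTrie:
    """Toy (h, b) trie: each node is a vector of b branches; keys at depth h."""

    def __init__(self, h, b):
        self.h, self.b = h, b
        self.root = [None] * b

    def digits(self, k):
        # h-digit representation of k in base b, most significant digit first
        d = []
        for _ in range(self.h):
            d.append(k % self.b)
            k //= self.b
        return d[::-1]

    def insert(self, k):
        ds = self.digits(k)
        node = self.root
        for d in ds[:-1]:
            if node[d] is None:
                node[d] = [None] * self.b
            node = node[d]
        node[ds[-1]] = k  # last level stores the key itself

    def member(self, k):
        # O(h): one digit lookup per level
        ds = self.digits(k)
        node = self.root
        for d in ds[:-1]:
            if node[d] is None:
                return False
            node = node[d]
        return node[ds[-1]] == k
```

A SUCCESSOR here would additionally have to scan up to b branch slots at each of the h nodes on the way, which is exactly the O(h*b) cost the slide complains about.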

  9. Example for worst-case complexity • [Figure: a trie in which the search path passes through digit b−1 at every level, ending at the leaf b^h − 1]

  10. P-Fast Trie Idea • Improve SUCCESSOR(k)/PREDECESSOR(k) time by overcoming the linear search in every intermediate node

  11. P-Fast Trie • Each internal node v has additional fields: • LOWKEY(v) – leaf node containing the smallest key descending from v • HIGHKEY(v) – leaf node containing the largest key descending from v • INNERTREE(v) – binary tree of worst-case height O(log b) representing the set of digits directly descending from v • Each leaf node points to its immediate neighbors on the left and on the right • CLOSEMATCH(K) – query returning the node with key K if it exists in the trie; returning PREDECESSOR(K) or SUCCESSOR(K) otherwise

  12. CLOSEMATCH(K) Algorithm, Intuitively • Starting from the root, look for K = K1K2…Kh • If found, return it • If not, let v be the node at depth j from which there is no way further down: Kj ∉ INNERTREE(v) • Looking for Kj in INNERTREE(v), find D – an existing digit in INNERTREE(v) that is either: • the least digit greater than Kj, or • the greatest digit less than Kj • If D > Kj, return LOWKEY(D's child of v); else if D < Kj, return HIGHKEY(D's child of v)
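A minimal sketch of this algorithm, with two simplifications that are ours and not the paper's: each node's INNERTREE is emulated by a sorted digit list searched with bisect (same O(log b) bound), and LOWKEY/HIGHKEY are plain fields maintained during sorted insertion. All function names are hypothetical.

```python
import bisect

def pfast_build(keys, h, b):
    """Build a toy P-fast-trie; a sorted digit list stands in for INNERTREE."""
    root = {'children': {}, 'digits': []}
    for k in sorted(keys):
        node = root
        for j in range(h):
            d = (k // b ** (h - 1 - j)) % b
            if d not in node['children']:
                bisect.insort(node['digits'], d)
                node['children'][d] = {'children': {}, 'digits': []}
            node = node['children'][d]
            node.setdefault('lowkey', k)  # first key routed here is smallest
            node['highkey'] = k           # last key routed here is largest
        node['key'] = k
    return root

def pfast_closematch(root, k, h, b):
    """Return k if stored, otherwise PREDECESSOR(k) or SUCCESSOR(k)."""
    node = root
    for j in range(h):
        d = (k // b ** (h - 1 - j)) % b
        if d in node['children']:
            node = node['children'][d]
            continue
        # no way down: find the nearest existing digit D at this node
        i = bisect.bisect_left(node['digits'], d)
        if i < len(node['digits']):                                # D > d
            return node['children'][node['digits'][i]]['lowkey']
        return node['children'][node['digits'][i - 1]]['highkey']  # D < d
    return node['key']
```

The descent costs O(h) and at most one bisect costs O(log b), matching the O(h + log b) bound on the next slide.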

  13. P-Fast Trie Complexities • CLOSEMATCH(K) time complexity is O(h + log b) • Other queries require an O(1) addition to the CLOSEMATCH(K) complexity • Space complexity of such a trie is O(h*b*N) • Representing the input keys in base b = 2^√(log M) requires h = √(log M) digits; with such h and b the desired complexities are achieved

  14. Q-Fast Trie Idea • Improve space by splitting the set of keys into subsets • How to split is the problem: • To preserve the time complexity • To decrease the space complexity

  15. Q-Fast Trie • Let S' denote an ordered list of keys from S: 0 = K1 < K2 < K3 < … < KL < M • Define: Si = {K ∈ S | Ki ≤ K < Ki+1} for i < L; SL = {K ∈ S | K ≥ KL} • S' is a c-partition of S iff each Si has cardinality in the range [c, 2c−1] • A Q-Fast Trie of size (h, b, c) is a two-level structure: • Upper part: a p-fast trie T of size (h, b) representing the set S', which is a c-partition of S • Lower part: a forest of 2-3 trees, where the ith tree represents Si • The leaves of the 2-3 trees are connected to form an ordered list

  16. Example of a Q-Fast Trie • [Figure: upper-part separators 0, 35, 71 over 2-3 trees holding the keys 10, 17, 33 | 35, 70 | 77, 81, 95, 99]

  17. CLOSEMATCH(K) Algorithm, Intuitively • Look for D = PREDECESSOR(K) in the upper part • O(h + log b) • Then search D's 2-3 tree for K • O(log c)

  18. Q-Fast Trie Complexities • CLOSEMATCH(K) time complexity is O(h + log b + log c) • Other queries require an O(1) addition to the CLOSEMATCH(K) complexity • Space complexity is O(N + N*h*b/c) • By choosing h = √(log M), b = 2^√(log M), c = h*b, the desired complexities – O(√(log M)) time and O(N) space – are achieved

  19. P/Q-Fast Trie Insertion/Deletion • P-fast trie • Use AVL trees for the INNERTREEs • O(h + log b) for insertion/deletion • Q-fast trie • O(h + log b + log c) for insertion/deletion • Maintenance of the c-partition property through tree splitting/merging in O(log c) time

  20. X-Fast Trie Idea • The P/Q-Fast trie uses a top-down search to get to the wanted level, performing a binary search in each node on the way • Thus, the P/Q-Fast trie relies on the balance between the height of the tree and the branching factor • X-Fast trie idea: use binary search for the wanted level itself • This requires that the wanted node can be found from its level alone, without a top-down pass • For worst-case complexity, the branching factor no longer matters, since it only affects the base of the log

  21. X-Fast Trie • Part 1: a trie of height h and branching factor 2 (representing all keys in binary) • Each node v has an additional field DESCENDANT(v): • If v has only a right branch, it points to the smallest leaf descending from v (through the right branch) • If v has only a left branch, it points to the largest leaf descending from v (through the left branch) • All leaves form a doubly-linked list • A node v at height j may have descending leaves only in the range [(i−1)*2^j + 1, i*2^j] for some integer i; this i is called ID(v) • A node v at height j is called an ancestor of key K if ⌈K/2^j⌉ = ID(v) • BOTTOM(K) is the lowest ancestor of K

  22. X-Fast Trie • Part 2: h+1 Level Search Structures (LSS), each of which uses perfect hashing as we have seen in the first lecture: • Linear space & constant time

  23. BOTTOM(K) Algorithm, Intuitively • Perform a binary search among the h+1 different LSSs • Searching each LSS takes O(1) • h = log M, therefore a binary search over h+1 LSSs takes O(log log M)
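The binary search over levels works because ancestors of K are exactly its binary prefixes, and prefixes nest: if an ancestor exists at level j, one exists at every level above j. A minimal sketch, where plain Python sets stand in for the perfect-hash LSSs (function names are ours):

```python
def build_lss(keys, h):
    """LSS[j] holds the level-j prefixes (k >> j) of all stored keys."""
    return [{k >> j for k in keys} for j in range(h + 1)]

def bottom(lss, k, h):
    """Binary search for the height of the lowest ancestor of k.
    O(log h) = O(log log M) constant-time set lookups."""
    lo, hi = 0, h  # invariant: an ancestor of k exists at height hi (the root)
    while lo < hi:
        mid = (lo + hi) // 2
        if (k >> mid) in lss[mid]:
            hi = mid       # ancestor at mid exists: a lower one may too
        else:
            lo = mid + 1   # no ancestor at mid: none below it either
    return lo
```

If bottom returns 0, the key itself is stored; otherwise the returned height identifies the node whose DESCENDANT field (next slide) yields a neighbor of k.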

  24. X-Fast Trie Complexities • BOTTOM(K) is O(log log M) • All queries require an O(1) addition to BOTTOM(K), with the assistance of the DESCENDANT field and the doubly-linked list: • BOTTOM(K) is either the leaf K itself, or its DESCENDANT points to PREDECESSOR(K)/SUCCESSOR(K) • Space is O(N * log M) • No more than h * N nodes in the trie (h = log M) • log M LSSs, each using O(N) space

  25. Y-Fast Trie Idea • Apply the same partitioning technique used to move from the P-Fast trie to the Q-Fast trie: a c-partition of all the keys into L subsets, each containing [c, 2c−1] keys • Upper part: an X-Fast trie representing S' • Lower part: a forest of balanced binary trees of height O(log c)

  26. Y-Fast Trie Complexities • Upper part can be searched within O(log log M) time and occupies no more than O((N/c) * log M) space • Each binary tree can be searched within O(log c) and they all together occupy O(N) space • Choosing c=log M: O(N) space; O(log log M) time
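The composition with c = log M can be sketched end to end; as before, the stand-ins are ours: a sorted separator list emulates the upper X-fast trie and a sorted list per bucket emulates a balanced binary tree of height O(log c). Function names are hypothetical.

```python
import bisect
from math import log2

def yfast_build(keys, M):
    """Buckets of c = log M keys; each bucket's least key is its representative."""
    c = max(1, int(log2(M)))
    ks = sorted(keys)
    buckets = [ks[i:i + c] for i in range(0, len(ks), c)]
    reps = [b[0] for b in buckets]
    return reps, buckets

def yfast_successor(reps, buckets, k):
    # upper search: O(log log M) in the real X-fast trie over the reps
    i = bisect.bisect_right(reps, k) - 1
    # the successor lives in bucket i or the one after it, so this loop
    # does useful work at most twice
    for b in buckets[max(i, 0):]:
        j = bisect.bisect_right(b, k)  # O(log c) = O(log log M)
        if j < len(b):
            return b[j]
    return None
```

With N/c representatives, the upper structure takes O((N/c) * log M) = O(N) space, and the buckets together hold exactly the N keys, giving the slide's O(N) total.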

  27. X/Y-Fast Trie Insertion/Deletion • LSSs have practically uncontrolled time complexity for dynamic operations • At least at the time the article was presented • Therefore, X/Y-Fast tries inherit this limitation
