Learn about Splay Trees, invented by Sleator and Tarjan, for fast data access and management. Explore their amortized analysis, operations like insert, find, delete, and the benefits of splaying. Discover their applications in practical scenarios and their importance in optimizing data management tasks like word counting, range queries, and more.
CSE 326: Data Structures Trees Lecture 8: Friday, Jan 24, 2003
Today: Splay Trees • Fast both in worst-case amortized analysis and in practice • Are used in the kernel of NT for keeping track of process information! • Invented by Sleator and Tarjan (1985) • Details: • Weiss 4.5 (basic splay trees) • 11.5 (amortized analysis) • 12.1 (better “top down” implementation)
Basic Idea “Blind” rebalancing – no height info kept! • Worst-case time per operation is O(n) • Worst-case amortized time is O(log n) • Insert/find always rotates node to the root! • Good locality: • Most commonly accessed keys move high in tree – become easier and easier to find
Since you’re down there anyway, fix up a lot of deep nodes! Idea: move n to the root by a series of zig-zag and zig-zig rotations, followed by a final single rotation (zig) if necessary [Figure: “You’re forced to make a really deep access”, a tree with keys 10, 17, 5, 2, 9, 3]
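Zig-Zag* [Figure: before, g is on top, p is its child, and n is p’s child on the opposite side; after, n is on top with children p and g. Subtrees W, X, Y, Z hang below. Legend: helped, unchanged, hurt. Depth labels on the slide: up 2, down 1, down 1, up 1] *This is just a double rotation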
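Zig-Zig [Figure: before, g is on top, p is its child, and n is p’s child on the same side; after, n is on top, p is its child, and g is p’s child. Subtrees W, X, Y, Z hang below]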
Why Splaying Helps • Node n and its children are always helped (raised) • Except for last step, nodes that are hurt by a zig-zag or zig-zig are later helped by a rotation higher up the tree! • Result: • shallow nodes may increase depth by one or two • helped nodes decrease depth by a large amount • If a node n on the access path is at depth d before the splay, it’s at about depth d/2 after the splay • Exceptions are the root, the child of the root, and the node splayed
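One way to make the zig, zig-zig, and zig-zag steps concrete is a bottom-up splay with parent pointers. The sketch below is only an illustration, not the lecture's or Weiss's code; the names `Node`, `rotate_up`, `splay`, `find`, `right_chain`, and `shape` are assumptions made for this example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(n):
    """Single rotation: lift n above its parent p, keeping BST order."""
    p, g = n.parent, n.parent.parent
    if p.left is n:                      # n is a left child: rotate right
        p.left, n.right = n.right, p
        if p.left:
            p.left.parent = p
    else:                                # n is a right child: rotate left
        p.right, n.left = n.left, p
        if p.right:
            p.right.parent = p
    p.parent, n.parent = n, g
    if g:                                # reconnect n under the old grandparent
        if g.left is p:
            g.left = n
        else:
            g.right = n

def splay(n):
    """Move n to the root by zig-zig / zig-zag steps and a final zig."""
    while n.parent:
        p, g = n.parent, n.parent.parent
        if g is None:
            rotate_up(n)                         # zig
        elif (g.left is p) == (p.left is n):
            rotate_up(p)                         # zig-zig: parent first,
            rotate_up(n)                         # then the node itself
        else:
            rotate_up(n)                         # zig-zag: a double rotation
            rotate_up(n)
    return n

def find(root, key):
    """Search for key, then splay the last node visited; returns the new root."""
    n, last = root, None
    while n:
        last = n
        if key == n.key:
            break
        n = n.left if key < n.key else n.right
    return splay(last) if last else None

def right_chain(keys):
    """Helper for the demo: a degenerate tree where each node is a right child."""
    root = prev = None
    for k in keys:
        node = Node(k)
        if prev:
            prev.right, node.parent = node, prev
        else:
            root = node
        prev = node
    return root

def shape(n):
    """Nested (key, left, right) tuples, handy for inspecting a tree."""
    return None if n is None else (n.key, shape(n.left), shape(n.right))

root = find(right_chain(range(1, 7)), 6)   # the Find(6) walk-through
root = find(root, 4)                       # followed by Find(4)
print(shape(root))
```

Rotating the parent before the node in the zig-zig case is what distinguishes splaying from simply rotating the accessed node to the root one level at a time.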
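Splaying Example: Find(6) [Figure: before, the keys 1 to 6 form a chain in which each node is the right child of the previous one; after the first zig-zig, 6 is the right child of 3, 5 is the left child of 6, and 4 is the left child of 5]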
Still Splaying [Figure: a second zig-zig lifts 6 to just below 1; 6 has left child 3, 3 has children 2 and 5, and 4 is the left child of 5]
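Almost There, Stay on Target [Figure: a final zig makes 6 the root with left child 1; 1 has right child 3, 3 has children 2 and 5, and 4 is the left child of 5]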
Splay Again: Find(4) [Figure: a zig-zag lifts 4 above 3 and 5; 4 becomes the right child of 1 with children 3 and 5, and 2 stays the left child of 3]
Example Splayed Out [Figure: a final zig-zag makes 4 the root with children 1 and 6; 1 has right child 3, 3 has left child 2, and 6 has left child 5]
Locality • “Locality” – if an item is accessed, it is likely to be accessed again soon • Why? • Assume m ≥ n accesses in a tree of size n • Total worst-case time is O(m log n) • O(log n) per access amortized time • Suppose only k distinct items are accessed in the m accesses • Time is O(n log n + m log k) • The n log n term pays for getting those k items near the root; after that, those k items are all at the top of the tree • Compare with O(m log n) for an AVL tree
Splay Operations: Insert • To insert, could do an ordinary BST insert • but would not fix up tree • A BST insert followed by a find (splay)? • Better idea: do the splay before the insert! • How?
Split Split(T, x) creates two BSTs L and R: • All elements of T are in either L or R • All elements in L are ≤ x • All elements in R are ≥ x • L and R share no elements Then how do we do the insert?
Split Split(T, x) creates two BSTs L and R: • All elements of T are in either L or R • All elements in L are ≤ x • All elements in R are > x • L and R share no elements Then how do we do the insert? Insert x as the root, with children L and R
Splitting in Splay Trees • How can we split? • We have the splay operation • We can find x or the parent of where x would be if we were to insert it as an ordinary BST • We can splay x or the parent to the root • Then break one of the links from the root to a child
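Split [Figure: split(x) first splays T; the new root could be x, or what would have been the parent of x. If the root is ≤ x: L is the root with its left subtree (keys ≤ x) and R is the right subtree (keys > x). OR, if the root is > x: L is the left subtree (keys < x) and R is the root with its right subtree (keys ≥ x)]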
Back to Insert Insert(x): • Split on x • Join subtrees using x as root [Figure: split(x) produces L (keys ≤ x) and R (keys > x); x becomes the new root with children L and R]
Insert Example: Insert(5) [Figure: a tree with keys 1, 2, 4, 6, 7, 9; split(5) produces L with keys 1, 2, 4 and R with keys 6, 7, 9; 5 becomes the new root with children 4 and 6]
Splay Operations: Delete [Figure: find(x) splays x to the root, with subtrees L (keys < x) and R (keys > x); deleting x leaves the two trees L and R] Now what?
Join • Join(L, R): given two trees such that L < R, merge them • Splay on the maximum element in L, then attach R [Figure: after the splay, the root of L has no right child, so R becomes its right subtree]
Delete Completed [Figure: find(x) splays x to the root of T; deleting x leaves L (keys < x) and R (keys > x); Join(L, R) produces T - x]
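Delete Example: Delete(4) [Figure: find(4) splays 4 to the root of a tree with keys 1, 2, 4, 6, 7, 9; removing 4 leaves L with keys 1, 2 and R with keys 6, 7, 9; “find max” splays 2 to the root of L; attaching R gives a tree with root 2 and children 1 and 6]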
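The split, insert, join, and delete slides can be sketched together in a few lines. This is a hedged illustration, not the lecture's implementation: it uses a compact recursive splay without parent pointers, and every name here (`Node`, `rot_left`, `rot_right`, `splay`, `split`, `insert`, `join`, `delete`, `shape`, `example_tree`) is an assumption. The shape of `example_tree` is also assumed; only its keys come from the example slides.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rot_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rot_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, key):
    """Bring key, or the last node on its search path, to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rot_right(root)
        elif key > root.left.key:                    # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right:
                root.left = rot_left(root.left)
        return root if root.left is None else rot_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                         # zig-zig (mirror)
        root.right.right = splay(root.right.right, key)
        root = rot_left(root)
    elif key < root.right.key:                       # zig-zag (mirror)
        root.right.left = splay(root.right.left, key)
        if root.right.left:
            root.right = rot_right(root.right)
    return root if root.right is None else rot_left(root)

def split(t, x):
    """Return (L, R): splay on x, then break one link below the root."""
    if t is None:
        return None, None
    t = splay(t, x)
    if t.key <= x:                 # root goes to L, right subtree is R
        r, t.right = t.right, None
        return t, r
    l, t.left = t.left, None       # root goes to R, left subtree is L
    return l, t

def insert(t, x):
    l, r = split(t, x)
    if l is not None and l.key == x:   # assumption: ignore duplicate keys
        l.right = r
        return l
    return Node(x, l, r)               # x becomes the root over L and R

def join(l, r):
    """Assumes every key in l is smaller than every key in r."""
    if l is None:
        return r
    m = l
    while m.right:                     # locate the maximum of L
        m = m.right
    l = splay(l, m.key)                # max at the root: no right child
    l.right = r
    return l

def delete(t, x):
    t = splay(t, x)                    # find(x)
    if t is None or t.key != x:
        return t                       # x not present
    return join(t.left, t.right)

def shape(n):
    return None if n is None else (n.key, shape(n.left), shape(n.right))

def example_tree():
    """Keys 1, 2, 4, 6, 7, 9 as in the examples; the shape is assumed."""
    return Node(6, Node(1, None, Node(4, Node(2))), Node(9, Node(7)))

print(shape(insert(example_tree(), 5)))
print(shape(delete(example_tree(), 4)))
```

Because the splay leaves either x or one of its in-order neighbors at the root, breaking a single link is enough to separate the keys on either side of x.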
Splay Trees, Summary • Splay trees are arguably the most practical kind of self-balancing trees • If number of finds is much larger than n, then locality is crucial! • Example: word-counting • Also supports efficient Split and Join operations – useful for other tasks • E.g., range queries
Dictionary & Search ADTs • Dictionary ADT (aka map ADT): stores values associated with user-specified keys • keys may be any (homogeneous) comparable type • values may be any (homogeneous) type • Search ADT (aka Set ADT): stores keys only
Dictionary & Search ADTs • create : → dictionary • insert : dictionary × key × values → dictionary • find : dictionary × key → values • delete : dictionary × key → dictionary [Example: insert(kohlrabi, upscale tuber); find(kreplach) returns kreplach: tasty stuffed dough]
Dictionary Implementations • Arrays: • Unsorted • Sorted • Linked lists • BST • Random • AVL • Splay
The last dictionary we discuss: B-Trees • Suppose we want to store the data on disk • A disk access is a lot more expensive than one CPU operation • Example • 1,000,000 entries in the dictionary • An AVL tree requires log(1,000,000) ≈ 20 disk accesses – this is expensive • Idea in B-Trees: • Increase the fan-out, decrease the height • Make 1 node = 1 block
B-Trees Basics • All keys are stored at leaves • Nonleaf nodes have guidance keys, to help the search • Parameter d = the degree (the book uses the order M = 2d+1) • Rules for keys: • The root is either a leaf, or has between 1 and 2d keys • All other nodes (except the root) have between d and 2d keys • Rule for number of children: • Each node (except leaves) has one more child than keys • Balance rule: • The tree is perfectly balanced!
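B-Trees Basics • A non-leaf node: [Figure: guidance keys 30, 120, 240 with four child pointers, leading to keys k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k] • A leaf node: [Figure: keys 40, 50, 60, each pointing to the record with that key, plus a pointer to the next leaf; the tree is then called a B+ tree]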
B+Tree Example d = 2 (M = 5) • Find the key 40 [Figure: the search follows 40 ≤ 80, then 20 < 40 ≤ 60, then 30 < 40 ≤ 40; leaves: 10 15 18 | 20 30 40 50 | 60 65 | 80 85 90]
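A lookup like the one on this slide can be sketched as a loop that follows guidance keys down to a leaf. This is a minimal illustration under stated assumptions: the classes `Inner` and `Leaf` and the function `bplus_find` are hypothetical, the leaves match the slide, but the single inner node with guidance keys 20, 60, 80 is invented for the demo because the upper levels of the slide's tree did not survive extraction. Routing uses the convention from the non-leaf node slide (k < 30 goes left, 30 <= k goes right).

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        # Stand-in records; a real leaf would hold pointers to disk records.
        self.records = [f"record {k}" for k in keys]

class Inner:
    def __init__(self, keys, children):
        assert len(children) == len(keys) + 1   # one more child than keys
        self.keys, self.children = keys, children

def bplus_find(node, key):
    """Return the record stored under key, or None if the key is absent."""
    while isinstance(node, Inner):
        # bisect_right sends k < keys[0] to child 0, keys[0] <= k < keys[1]
        # to child 1, and so on.
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.records[node.keys.index(key)]
    return None

leaves = [Leaf([10, 15, 18]), Leaf([20, 30, 40, 50]), Leaf([60, 65]), Leaf([80, 85, 90])]
root = Inner([20, 60, 80], leaves)     # hypothetical upper level
print(bplus_find(root, 40))
```

Each iteration of the loop corresponds to one node, and so to one block read when 1 node = 1 block.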
B+Tree Design • How large is d? • Example: • Key size = 4 bytes • Pointer size = 8 bytes • Block size = 4096 bytes • 2d × 4 + (2d+1) × 8 <= 4096 • d = 170
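The inequality on this slide can be solved for d directly. The helper below is a small sketch; the function name `max_degree` and its parameter names are assumptions, while the sizes are the ones on the slide.

```python
def max_degree(block_size, key_size, pointer_size):
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    # Rearranged: 2d*(key_size + pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))

d = max_degree(block_size=4096, key_size=4, pointer_size=8)
print(d)   # 170, matching the slide
```

With d = 170 a full node uses 340 × 4 + 341 × 8 = 4088 bytes, and d = 171 would need 4112 bytes, which no longer fits in one block.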
B+ Trees Depth • Assume d = 170 • How deep is the B-tree? • Depth = 0 (just the root): at least 170 keys • Depth = 1: at least 170 + 170×171 ≈ 30×10^3 keys • Depth = 2: 170 + 170×171 + 170×171^2 ≈ 5×10^6 keys • Depth = 3: 170 + ... + 170×171^3 ≈ 860×10^6 keys • Depth = 4: 170 + ... + 170×171^4 ≈ 147×10^9 keys • Nobody has more keys! With a B-tree we can find any data item with at most 5 disk accesses!
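The sums on this slide are easy to recompute. The sketch below follows the slide's own counting (every node holds at least d keys and has d+1 children); the function name `min_keys` is an assumption. The exact totals come out slightly below the slide's rounded figures at depths 3 and 4.

```python
def min_keys(d, depth):
    """Slide's estimate: d keys per node, (d+1)**level nodes at each level."""
    return sum(d * (d + 1) ** level for level in range(depth + 1))

for depth in range(5):
    print(depth, f"{min_keys(170, depth):,}")
```

The growth factor of 171 per level is why five levels already exceed a hundred billion keys.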
Insertion in a B+ Tree Insert (K, P) • Find leaf where K belongs, insert • If no overflow (2d keys or less), halt • If overflow (2d+1 keys), split node, insert in parent: • If leaf, keep K3 too in right node • When root splits, new root has 1 key only [Figure: the overflowing node splits in two and its middle key K3 moves up into the parent]
Insertion in a B+ Tree Insert K=19 [Figure: leaves 10 15 18 | 20 30 40 50 | 60 65 | 80 85 90]
Insertion in a B+ Tree After insertion [Figure: leaves 10 15 18 19 | 20 30 40 50 | 60 65 | 80 85 90]
Insertion in a B+ Tree Now insert 25 [Figure: leaves 10 15 18 19 | 20 30 40 50 | 60 65 | 80 85 90]
Insertion in a B+ Tree After insertion [Figure: leaves 10 15 18 19 | 20 25 30 40 50 | 60 65 | 80 85 90]
Insertion in a B+ Tree But now have to split! [Figure: leaves 10 15 18 19 | 20 25 30 40 50 | 60 65 | 80 85 90]
Insertion in a B+ Tree After the split [Figure: leaves 10 15 18 19 | 20 25 | 30 40 50 | 60 65 | 80 85 90]
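The leaf-level part of the insertion rule can be sketched on plain sorted lists. This is an illustration only: the function `insert_into_leaf` and its return convention are assumptions, record pointers are left out, and inserting the separator into the parent (which may split in turn) is not shown.

```python
from bisect import insort

def insert_into_leaf(keys, key, d):
    """Insert key into a leaf's sorted key list.

    Returns (left, separator, right). Without overflow, separator and
    right are None. With overflow (2d+1 keys), the leaf splits and the
    separator must be inserted into the parent.
    """
    keys = list(keys)
    insort(keys, key)
    if len(keys) <= 2 * d:
        return keys, None, None
    left, right = keys[:d], keys[d:]   # right keeps the separator too (leaf rule)
    return left, right[0], right

print(insert_into_leaf([10, 15, 18], 19, d=2))       # no overflow
print(insert_into_leaf([20, 30, 40, 50], 25, d=2))   # overflow: split
```

For d = 2 the overflowing leaf K1 ... K5 splits into K1, K2 and K3, K4, K5, with K3 copied up as the separator, which is the "keep K3 too in right node" rule.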
Deletion from a B+ Tree Delete 30 [Figure: leaves 10 15 18 19 | 20 25 | 30 40 50 | 60 65 | 80 85 90]
Deletion from a B+ Tree After deleting 30 [Figure: leaves 10 15 18 19 | 20 25 | 40 50 | 60 65 | 80 85 90; the guidance key 30 may change to 40, or not]
Deletion from a B+ Tree Now delete 25 [Figure: leaves 10 15 18 19 | 20 25 | 40 50 | 60 65 | 80 85 90]
Deletion from a B+ Tree After deleting 25: need to rebalance, rotate [Figure: leaves 10 15 18 19 | 20 | 40 50 | 60 65 | 80 85 90]
Deletion from a B+ Tree Now delete 40 [Figure: leaves 10 15 18 | 19 20 | 40 50 | 60 65 | 80 85 90]
Deletion from a B+ Tree After deleting 40: rotation not possible, need to merge nodes [Figure: leaves 10 15 18 | 19 20 | 50 | 60 65 | 80 85 90]
Deletion from a B+ Tree Final tree [Figure: leaf keys 10 15 18 19 20 50 60 65 80 85 90]
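The rotate-or-merge decision in this deletion example can be sketched at the leaf level. Several things here are assumptions: the function `delete_key`, the simplification that all listed leaves share one parent, the preference for the left sibling when merging, and the omission of guidance-key updates in the parent.

```python
def delete_key(leaves, i, key, d):
    """Delete key from leaves[i]; returns (action, new_leaves).

    leaves is a list of sorted key lists, treated as siblings.
    """
    leaves = [list(leaf) for leaf in leaves]
    leaves[i].remove(key)
    if len(leaves[i]) >= d or len(leaves) == 1:
        return "ok", leaves                          # no underflow
    if i > 0 and len(leaves[i - 1]) > d:             # rotate from the left sibling
        leaves[i].insert(0, leaves[i - 1].pop())
        return "rotate", leaves
    if i + 1 < len(leaves) and len(leaves[i + 1]) > d:   # rotate from the right
        leaves[i].append(leaves[i + 1].pop(0))
        return "rotate", leaves
    j = i - 1 if i > 0 else i                        # no sibling can spare a key
    leaves[j:j + 2] = [leaves[j] + leaves[j + 1]]    # merge two siblings
    return "merge", leaves

leaves = [[10, 15, 18, 19], [20, 25], [30, 40, 50], [60, 65], [80, 85, 90]]
for key, i in [(30, 2), (25, 1), (40, 2)]:
    action, leaves = delete_key(leaves, i, key, d=2)
    print(key, action, leaves)
```

A merged leaf holds (d-1) + d = 2d-1 keys, so a merge never causes an overflow; in a full implementation the parent loses one guidance key and may itself need rebalancing.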