CSE 326: Data Structures, Trees. Lecture 8: Friday, Jan 24, 2003
Today: Splay Trees • Fast both in worst-case amortized analysis and in practice • Used in the kernel of Windows NT to keep track of process information! • Invented by Sleator and Tarjan (1985) • Details: • Weiss 4.5 (basic splay trees) • Weiss 11.5 (amortized analysis) • Weiss 12.1 (better “top-down” implementation)
Basic Idea: “blind” rebalancing, with no height information kept! • Worst-case time per operation is O(n) • Worst-case amortized time is O(log n) • Insert/find always rotates the accessed node to the root! • Good locality: • The most commonly accessed keys move high in the tree and become easier and easier to find
Since you’re down there anyway, fix up a lot of deep nodes! • Idea: move n to the root by a series of zig-zag and zig-zig rotations, followed by a final single rotation (zig) if necessary [Figure: you’re forced to make a really deep access.]
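Zig-Zag* [Figure: n is the inner grandchild of g: its parent p is one child of g, and n is the opposite-side child of p. After the rotation, n takes g’s place with p and g as its children; the subtrees W, X, Y, Z keep their left-to-right order. n moves up 2, n’s two subtrees move up 1, g and its other subtree move down 1, and p and its outer subtree are unchanged. Legend: helped, unchanged, hurt.] *This is just a double rotation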
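Zig-Zig [Figure: g has subtree W and child p; p has subtree X and child n on the same side; n has subtrees Y and Z. After the rotation, n takes g’s place with child p and subtree Z; p has child g and subtree Y; g has subtrees W and X.]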
Why Splaying Helps • Node n and its children are always helped (raised) • Except for the last step, nodes that are hurt by a zig-zag or zig-zig are later helped by a rotation higher up the tree! • Result: • shallow nodes may increase their depth by one or two • helped nodes decrease their depth by a large amount • If a node on the access path is at depth d before the splay, it’s at about depth d/2 after the splay • Exceptions are the root, the child of the root, and the node splayed
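Splaying Example [Figure: Find(6) in a tree that is a chain 1, 2, 3, 4, 5, 6, each node the right child of the previous one. First step, zig-zig on 6, 5, 4: 6 takes 4’s place as the right child of 3, with 5 as its left child and 4 as 5’s left child.]
Still Splaying [Figure: zig-zig on 6, 3, 2: 6 becomes the right child of 1, with left child 3; 3 has children 2 and 5, and 4 is the left child of 5.]
Almost There, Stay on Target [Figure: final zig on 6 and 1: 6 becomes the root with left child 1; 3 is the right child of 1, with children 2 and 5, and 4 is the left child of 5.]
Splay Again [Figure: Find(4). zig-zag on 4, 5, 3: 4 takes 3’s place as the right child of 1, with children 3 and 5; 2 is the left child of 3.]
Example Splayed Out [Figure: zig-zag on 4, 1, 6: 4 becomes the root; its left child is 1, whose right child is 3 (with left child 2); its right child is 6, whose left child is 5.]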
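The zig, zig-zig, and zig-zag steps can be sketched in Python as follows. This is a minimal sketch, assuming each node stores a parent pointer (the slides do not fix a representation); `Node`, `rotate`, `splay`, `find`, and `shape` are illustrative names. The usage lines replay the Find(6) and Find(4) example above.

```python
class Node:
    """BST node with a parent pointer (needed for bottom-up splaying)."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def rotate(x):
    """Single rotation: lift x above its parent p, preserving BST order."""
    p = x.parent
    g = p.parent
    if x is p.left:
        p.left, x.right = x.right, p      # x's right subtree moves under p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p       # x's left subtree moves under p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:                     # hang x where p used to hang
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Rotate x to the root; returns x (the new root)."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate(x)                     # zig: x is a child of the root
        elif (x is p.left) == (p is g.left):
            rotate(p)                     # zig-zig: rotate p over g first,
            rotate(x)                     # then x over p
        else:
            rotate(x)                     # zig-zag: rotate x twice
            rotate(x)                     # (a double rotation)
    return x


def find(root, key):
    """BST search, then splay the node found (or the last node visited)."""
    node, last = root, None
    while node is not None:
        last = node
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return splay(last) if last is not None else root


def shape(node):
    """Tree as nested (key, left, right) tuples, for inspection."""
    if node is None:
        return None
    return (node.key, shape(node.left), shape(node.right))


root = Node(1, right=Node(2, right=Node(3, right=Node(4, right=Node(5, right=Node(6))))))
root = find(root, 6)   # zig-zig, zig-zig, zig: 6 is now the root
root = find(root, 4)   # zig-zag, zig-zag: 4 is now the root
print(shape(root))
# (4, (1, None, (3, (2, None, None), None)), (6, (5, None, None), None))
```

Rotating p before x in the zig-zig case is what separates splaying from rotating x to the root one single rotation at a time; the latter does not give the O(log n) amortized bound.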
Locality • “Locality”: if an item is accessed, it is likely to be accessed again soon • Why? • Assume m ≥ n accesses in a tree of size n • Total worst-case time is O(m log n) • O(log n) amortized time per access • Suppose only k distinct items are accessed in the m accesses • Time is O(n log n + m log k): the n log n term pays for getting those k items near the root, after which those k items are all at the top of the tree • Compare with O(m log n) for an AVL tree
Splay Operations: Insert • To insert, we could do an ordinary BST insert • but that would not fix up the tree • A BST insert followed by a find (splay)? • Better idea: do the splay before the insert! • How?
Split • Split(T, x) creates two BSTs L and R: • All elements of T are in either L or R • All elements in L are ≤ x • All elements in R are > x • L and R share no elements • Then how do we do the insert? Insert x as the root, with children L and R
Splitting in Splay Trees • How can we split? • We have the splay operation • We can find x, or the parent of where x would be if we were to insert it as in an ordinary BST • We can splay x or that parent to the root • Then break one of the links from the root to a child
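Split [Figure: split(x) splays T on x; the new root could be x, or what would have been the parent of x. If the root is ≤ x, break its right link: L is the root with its left subtree (≤ x) and R is its right subtree (> x). If the root is > x, break its left link: L is its left subtree (< x) and R is the root with its right subtree (> x).]
Back to Insert • Insert(x): split on x, then join the subtrees using x as the root [Figure: split(x) gives L and R; x becomes the root with L (≤ x) as its left subtree and R (> x) as its right subtree.]
Insert Example [Figure: Insert(5) into the tree with root 6, whose left child 1 has right child 4 (with left child 2), and whose right child 9 has left child 7. split(5) splays 4, the would-be parent of 5, to the root, giving L = 4 with left subtree 1 (right child 2) and R = 6 with right subtree 9 (left child 7). Finally 5 becomes the root with subtrees L and R.]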
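Split and the split-based insert can be sketched in Python as below. This is a minimal sketch, assuming nodes with parent pointers and a bottom-up splay (included so the block runs on its own); `splay_to_key`, `split`, and `insert` are hypothetical names, and duplicate keys are not handled.

```python
class Node:
    """BST node with a parent pointer, as needed for bottom-up splaying."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def rotate(x):
    """Single rotation lifting x above its parent."""
    p, g = x.parent, x.parent.parent
    if x is p.left:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Bottom-up splay: zig-zig / zig-zag steps, then a final zig if needed."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate(x)                          # zig
        elif (x is p.left) == (p is g.left):
            rotate(p)                          # zig-zig
            rotate(x)
        else:
            rotate(x)                          # zig-zag
            rotate(x)
    return x


def splay_to_key(root, x):
    """Splay x, or the would-be parent of x if x is absent; return the new root."""
    node, last = root, None
    while node is not None:
        last = node
        if x == node.key:
            break
        node = node.left if x < node.key else node.right
    return splay(last) if last is not None else None


def split(root, x):
    """Return (L, R): every key in L is <= x, every key in R is > x."""
    root = splay_to_key(root, x)
    if root is None:
        return None, None
    if root.key <= x:                          # break the link to the right child
        right, root.right = root.right, None
        if right is not None:
            right.parent = None
        return root, right
    left, root.left = root.left, None          # root > x: break the left link
    if left is not None:
        left.parent = None
    return left, root


def insert(root, x):
    """Split on x, then make x the root with children L and R (no duplicate check)."""
    left, right = split(root, x)
    return Node(x, left, right)


def shape(node):
    return None if node is None else (node.key, shape(node.left), shape(node.right))


t = Node(6, Node(1, right=Node(4, Node(2))), Node(9, Node(7)))   # the slide's tree
t = insert(t, 5)
print(shape(t))
# (5, (4, (1, None, (2, None, None)), None), (6, None, (9, (7, None, None), None)))
```

The splayed root is always x itself or the predecessor or successor of x, which is why cutting a single link from the root leaves every key on the correct side.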
Splay Operations: Delete [Figure: find(x) splays x to the root; deleting x leaves two subtrees, L (< x) and R (> x).] • Now what?
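Join • Join(L, R): given two trees such that every key in L is smaller than every key in R, merge them • Splay on the maximum element in L, then attach R [Figure: after the splay, the root of L has no right child, since nothing in L is larger, so R becomes its right subtree.]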
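Delete Completed [Figure: find(x) splays x to the root of T; deleting x leaves L (< x) and R (> x); Join(L, R) gives T − x.]
Delete Example [Figure: Delete(4) from the tree with root 6, whose left child 1 has right child 4 (with left child 2), and whose right child 9 has left child 7. find(4) splays 4 to the root, leaving L = 1 (right child 2) and R = 6 (right child 9, whose left child is 7). Find max: splaying 2, the maximum of L, makes it the root with left child 1; then R is attached as its right subtree.]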
Splay Trees, Summary • Splay trees are arguably the most practical kind of self-balancing trees • If the number of finds is much larger than n, then locality is crucial! • Example: word counting • They also support efficient Split and Join operations, useful for other tasks • E.g., range queries
Dictionary & Search ADTs • Dictionary ADT (aka Map ADT): stores values associated with user-specified keys • keys may be any (homogeneous) comparable type • values may be any (homogeneous) type • Search ADT (aka Set ADT): stores keys only
Dictionary & Search ADTs • create : → dictionary • insert : dictionary × key × values → dictionary • find : dictionary × key → values • delete : dictionary × key → dictionary • Example: insert(kohlrabi, upscale tuber); find(kreplach) returns kreplach: tasty stuffed dough
Dictionary Implementations • Arrays: • Unsorted • Sorted • Linked lists • BST • Random • AVL • Splay
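To make the ADT operations concrete, here is a minimal sketch of one listed implementation, the sorted array, using Python's standard `bisect` module. The class name `SortedArrayDict` is hypothetical; it mutates in place rather than returning a new dictionary as the signatures above suggest, and inserting an existing key overwrites its value (an assumption; the slides do not say).

```python
from bisect import bisect_left


class SortedArrayDict:
    """Dictionary ADT on a sorted array: find is O(log n) by binary search,
    insert and delete are O(n) because later entries must shift."""

    def __init__(self):                    # create : -> dictionary
        self._keys = []
        self._values = []

    def _index(self, key):
        """Position of key if present, else None."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return i
        return None

    def insert(self, key, value):          # insert : dictionary x key x value
        i = self._index(key)
        if i is not None:
            self._values[i] = value        # existing key: overwrite (assumption)
            return
        i = bisect_left(self._keys, key)
        self._keys.insert(i, key)          # shifts later entries right: O(n)
        self._values.insert(i, value)

    def find(self, key):                   # find : dictionary x key -> value
        i = self._index(key)
        return None if i is None else self._values[i]

    def delete(self, key):                 # delete : dictionary x key
        i = self._index(key)
        if i is not None:
            del self._keys[i]              # shifts later entries left: O(n)
            del self._values[i]


d = SortedArrayDict()
d.insert("kohlrabi", "upscale tuber")
d.insert("kreplach", "tasty stuffed dough")
print(d.find("kreplach"))   # tasty stuffed dough
```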
The last dictionary we discuss: B-Trees • Suppose we want to store the data on disk • A disk access is a lot more expensive than one CPU operation • Example • 1,000,000 entries in the dictionary • An AVL tree requires about log₂(1,000,000) ≈ 20 disk accesses, which is expensive • Idea in B-Trees: • Increase the fan-out, decrease the height • Make 1 node = 1 block
B-Trees Basics • All keys are stored at leaves • Nonleaf nodes have guidance keys, to help the search • Parameter d = the degree (the book uses the order M = 2d+1) • Rules for keys: • The root is either a leaf, or has between 1 and 2d keys • All other nodes (except the root) have between d and 2d keys • Rule for number of children: • Each node (except leaves) has one more child than keys • Balance rule: • The tree is perfectly balanced!
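B-Trees Basics • A non-leaf node: [Figure: guidance keys 30, 120, 240 with child pointers for keys k < 30, 30 ≤ k < 120, 120 ≤ k < 240, and 240 ≤ k.] • A leaf node: [Figure: keys 40, 50, 60 pointing to the records with keys 40, 50, 60, plus a pointer to the next leaf.] • Then called a B+ tree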
B+ Tree Example, d = 2 (M = 5) • Find the key 40 [Figure: search path 40 ≤ 80, then 20 < 40 ≤ 60, then 30 < 40 ≤ 40; leaf keys 10 15 18 20 30 40 50 60 65 80 85 90.]
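A search like Find(40) can be sketched as below. The two-level tree is an assumed layout (d = 2) over the slide's leaf keys, not a copy of the slide's tree; `BPlusNode` and `find` are hypothetical names. `bisect_right` picks the child i with keys[i-1] ≤ k < keys[i], matching the key ranges of the non-leaf node described earlier.

```python
from bisect import bisect_left, bisect_right


class BPlusNode:
    def __init__(self, keys, children=None):
        self.keys = keys          # guidance keys (internal node) or data keys (leaf)
        self.children = children  # None for a leaf, else len(keys) + 1 subtrees


def find(node, k):
    """Return (found, path), where path lists the child index taken at each level."""
    path = []
    while node.children is not None:
        i = bisect_right(node.keys, k)   # child i covers keys[i-1] <= k < keys[i]
        path.append(i)
        node = node.children[i]
    j = bisect_left(node.keys, k)        # binary search inside the leaf (one block)
    return (j < len(node.keys) and node.keys[j] == k), path


# Hypothetical two-level tree, d = 2, over the leaf keys of the slide's example.
root = BPlusNode([20, 60, 80], [BPlusNode([10, 15, 18]), BPlusNode([20, 30, 40, 50]),
                                BPlusNode([60, 65]), BPlusNode([80, 85, 90])])
print(find(root, 40))   # (True, [1]): 20 <= 40 < 60, so follow child 1
```

Each loop iteration corresponds to one node, and hence one disk block read, so the number of disk accesses equals the depth plus one.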
B+ Tree Design • How large is d? • Example: • Key size = 4 bytes • Pointer size = 8 bytes • Block size = 4096 bytes • 2d × 4 + (2d+1) × 8 ≤ 4096, i.e. 24d + 8 ≤ 4096 • d = 170
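The same calculation as a small helper. `max_degree` is a hypothetical name, and the layout (2d keys plus 2d+1 pointers per block, no per-node header) is the slide's simplifying assumption.

```python
def max_degree(key_size, pointer_size, block_size):
    """Largest d such that a node with 2d keys and 2d + 1 pointers fits in a block:
    2d * key_size + (2d + 1) * pointer_size <= block_size."""
    # Rearranged: d * (2 * key_size + 2 * pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (2 * key_size + 2 * pointer_size)


d = max_degree(key_size=4, pointer_size=8, block_size=4096)
print(d)   # 170, since 4088 // 24 = 170
```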
B+ Trees Depth • Assume d = 170 • How deep is the B-tree? • Depth = 0 (just the root): at least 170 keys • Depth = 1: at least 170 + 170·171 ≈ 30·10³ keys • Depth = 2: 170 + 170·171 + 170·171² ≈ 5·10⁶ keys • Depth = 3: 170 + ... + 170·171³ ≈ 855·10⁶ keys • Depth = 4: 170 + ... + 170·171⁴ ≈ 146·10⁹ keys • Nobody has more keys! • With a B-tree we can find any data item with at most 5 disk accesses!
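The counts above follow a geometric series: with every node holding d keys and d + 1 children, the total through depth h is d(1 + (d+1) + ... + (d+1)^h) = (d+1)^(h+1) − 1. A short check, using the hypothetical helper `keys_if_full`:

```python
def keys_if_full(d, depth):
    """Number of keys when every node holds exactly d keys (so d + 1 children),
    summed over levels 0..depth, as in the slide's count."""
    total, nodes = 0, 1
    for _ in range(depth + 1):
        total += d * nodes      # d keys in each node on this level
        nodes *= d + 1          # every node has d + 1 children
    return total


for depth in range(5):
    # Equals the closed form (d + 1) ** (depth + 1) - 1.
    print(depth, keys_if_full(170, depth))
# prints 170, 29240, 5000210, 855036080, 146211169850 for depths 0..4
```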
Insertion in a B+ Tree • Insert (K, P) • Find the leaf where K belongs, insert • If no overflow (2d keys or less), halt • If overflow (2d+1 keys), split the node and insert its middle key K3 in the parent: • If it is a leaf, keep K3 in the right node too • When the root splits, the new root has only 1 key [Figure: the overflowing node splits in two, and K3 moves up to the parent.]
Insertion in a B+ Tree • Insert K = 19 [Figure: leaf keys 10 15 18 20 30 40 50 60 65 80 85 90]
Insertion in a B+ Tree • After insertion [Figure: leaf keys 10 15 18 19 20 30 40 50 60 65 80 85 90]
Insertion in a B+ Tree • Now insert 25 [Figure: leaf keys 10 15 18 19 20 30 40 50 60 65 80 85 90]
Insertion in a B+ Tree • After insertion [Figure: leaf keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90]
Insertion in a B+ Tree • But now we have to split! [Figure: leaf keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90]
Insertion in a B+ Tree • After the split [Figure: leaf keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90]
Deletion from a B+ Tree • Delete 30 [Figure: leaf keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90]
Deletion from a B+ Tree • After deleting 30 (the guidance key 30 may change to 40, or not) [Figure: leaf keys 10 15 18 19 20 25 40 50 60 65 80 85 90]
Deletion from a B+ Tree • Now delete 25 [Figure: leaf keys 10 15 18 19 20 25 40 50 60 65 80 85 90]
Deletion from a B+ Tree • After deleting 25 • Need to rebalance: rotate [Figure: leaf keys 10 15 18 19 20 40 50 60 65 80 85 90]
Deletion from a B+ Tree • Now delete 40 [Figure: leaf keys 10 15 18 19 20 40 50 60 65 80 85 90]
Deletion from a B+ Tree • After deleting 40 • Rotation not possible • Need to merge nodes [Figure: leaf keys 10 15 18 19 20 50 60 65 80 85 90]
Deletion from a B+ Tree • Final tree [Figure: leaf keys 10 15 18 19 20 50 60 65 80 85 90]
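Deletion can be sketched the same way, repairing a child after it drops below d keys: rotate a key in from a sibling that has more than d keys, otherwise merge with a sibling. This is a minimal sketch with d = 2 under stated assumptions: the starting tree is an assumed layout holding the keys from the insertion example, stale guidance keys are left in place (the "or not" option), and a merge prefers the left sibling, so the final leaf grouping may differ from the slide's figure. `BPlusNode`, `delete`, and `leaves` are hypothetical names.

```python
from bisect import bisect_right

D = 2  # degree d of the slides' example


class BPlusNode:
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted keys
        self.children = children  # None for a leaf


def _fix_underflow(parent, i):
    """Child i has d - 1 keys: rotate from a sibling with spare keys, else merge."""
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    is_leaf = child.children is None
    if left is not None and len(left.keys) > D:          # rotate from the left
        if is_leaf:
            child.keys.insert(0, left.keys.pop())
            parent.keys[i - 1] = child.keys[0]
        else:
            child.keys.insert(0, parent.keys[i - 1])
            parent.keys[i - 1] = left.keys.pop()
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > D:      # rotate from the right
        if is_leaf:
            child.keys.append(right.keys.pop(0))
            parent.keys[i] = right.keys[0]
        else:
            child.keys.append(parent.keys[i])
            parent.keys[i] = right.keys.pop(0)
            child.children.append(right.children.pop(0))
    else:                                                # rotation not possible: merge
        j = i - 1 if left is not None else i             # merge children j and j + 1
        a, b = parent.children[j], parent.children[j + 1]
        if is_leaf:
            a.keys.extend(b.keys)
        else:                                            # the separator comes down
            a.keys.extend([parent.keys[j]] + b.keys)
            a.children.extend(b.children)
        del parent.keys[j]
        del parent.children[j + 1]


def _delete(node, k):
    if node.children is None:
        if k in node.keys:
            node.keys.remove(k)
        return
    i = bisect_right(node.keys, k)
    _delete(node.children[i], k)
    if len(node.children[i].keys) < D:
        _fix_underflow(node, i)
    # Otherwise guidance keys stay as they are: a stale 30 still separates correctly.


def delete(root, k):
    """Delete k and return the (possibly new) root."""
    _delete(root, k)
    if root.children is not None and not root.keys:      # root emptied by a merge
        return root.children[0]
    return root


def leaves(node):
    if node.children is None:
        return [node.keys]
    return [leaf for child in node.children for leaf in leaves(child)]


# Hypothetical tree (d = 2) holding the keys after the insertion example.
root = BPlusNode([20, 30, 60, 80],
                 [BPlusNode([10, 15, 18, 19]), BPlusNode([20, 25]),
                  BPlusNode([30, 40, 50]), BPlusNode([60, 65]), BPlusNode([80, 85, 90])])
root = delete(root, 30)   # leaf keeps 2 keys; guidance key 30 is left as is
root = delete(root, 25)   # leaf underflows: rotate 19 in from the left sibling
root = delete(root, 40)   # no sibling can spare a key: merge with the left sibling
print(root.keys, leaves(root))
# [19, 60, 80] [[10, 15, 18], [19, 20, 50], [60, 65], [80, 85, 90]]
```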