
Designing Concurrent Search Structure Algorithms

Designing Concurrent Search Structure Algorithms. Dennis Shasha. What is a Search Structure? Data structure (typically a B tree, hash structure, R-tree, etc.) that supports a dictionary. Operations are insert key-value pair, delete key-value pair, and search for key-value pair.


Presentation Transcript


  1. Designing Concurrent Search Structure Algorithms Dennis Shasha

  2. What is a Search Structure? • Data structure (typically a B tree, hash structure, R-tree, etc.) that supports a dictionary. • Operations are insert key-value pair, delete key-value pair, and search for key-value pair.

  3. How to make a search structure algorithm concurrent • Naïve approach: use two-phase locking (but then every operation at least read-locks the root, so lock conflicts are frequent). • Semi-naïve algorithm: use hierarchical tree locking: lock the root; afterwards lock node n only if you hold a lock on the parent of n. (Still tends to hold locks high in the tree.)
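The hierarchical rule above can be sketched as hand-over-hand locking on a binary search tree. This is only an illustration under assumptions: the slide states the acquisition rule but not when the parent lock is released, so releasing it right after the child is locked is a choice made here. The names `Node`, `coupled_search`, and `build_example_tree` are hypothetical.

```python
import threading


class Node:
    """BST node with its own lock (hypothetical layout, for illustration)."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.lock = threading.Lock()


def coupled_search(root, key):
    """Search while locking a node only when its parent's lock is held."""
    if root is None:
        return False
    root.lock.acquire()              # rule: lock the root first
    node = root
    while True:
        if key == node.key:
            node.lock.release()
            return True
        child = node.left if key < node.key else node.right
        if child is None:
            node.lock.release()
            return False
        child.lock.acquire()         # parent lock is still held here
        node.lock.release()          # assumption: release parent afterwards
        node = child


def build_example_tree():
    """Small tree: 50 at the root, children 10 and 70, and 35 under 10."""
    root = Node(50)
    root.left = Node(10)
    root.right = Node(70)
    root.left.right = Node(35)
    return root
```

Every search still passes through the root lock, which is why this scheme tends to create contention high in the tree.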

  4. How can we do better: fundamental insight • In a search structure algorithm, all that we really care about is that we implement the dictionary operations correctly. • Operations on structure need not even be serializable provided they maintain certain constraints.

  5. Train Your Intuition: parable of the library • Imagine a library with books. • It’s a little old-fashioned, so there are still card catalogues that identify the shelf where a book is held. • Bob wants to get a book B. • Alice is reorganizing the library by moving books from shelf to shelf and then changing the card catalogue.

  6. Parable of the library: interleaving of ops • Bob 1: look up book B in catalogue. • Bob 2: read “go to shelf S”. • Bob 3: start walking but see a friend. • Alice 1: move several books from S to S’, leaving a note. • Alice 2: change catalogue so B maps to S’. • Bob 4: go to S, follow note to S’.
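The interleaving above can be replayed deterministically in a few lines. This is a toy sketch, not part of the original slides: the author names, the `Shelf` class, and the representation of the note as a pair (first moved author prefix, destination shelf) are all assumptions made for illustration.

```python
class Shelf:
    """A shelf holds books (by author) and an optional forwarding note."""

    def __init__(self, books):
        self.books = set(books)
        self.note = None  # (first moved author prefix, destination shelf name)


def run_interleaving(wanted="Shasha"):
    """Replay Bob 1-3, Alice 1-2, Bob 4 and return (shelves visited, found)."""
    shelves = {"S": Shelf({"Sagan", "Shasha"}), "S'": Shelf(set())}
    catalog = {"Sagan": "S", "Shasha": "S"}

    # Bob 1-2: look up the book and read the shelf name.
    target = catalog[wanted]
    # Bob 3: Bob is delayed, so Alice runs next.

    # Alice 1: move authors from "Sh" onward to S', leaving a note on S.
    moved = {b for b in shelves["S"].books if b >= "Sh"}
    shelves["S'"].books |= moved
    shelves["S"].books -= moved
    shelves["S"].note = ("Sh", "S'")
    # Alice 2: update the catalogue.
    for book in moved:
        catalog[book] = "S'"

    # Bob 4: go to the (now stale) shelf and follow notes if needed.
    visited = []
    shelf_name = target
    while True:
        visited.append(shelf_name)
        shelf = shelves[shelf_name]
        if wanted in shelf.books:
            return visited, True
        if shelf.note is not None and wanted >= shelf.note[0]:
            shelf_name = shelf.note[1]
        else:
            return visited, False
```

Calling `run_interleaving()` shows Bob visiting both S and S’ and still finding the book, even though he read the catalogue before Alice changed it.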

  7. Parable of the library: observations • Not conflict-preserving serializable: Bob → Alice (Bob reads the catalog, then Alice changes it) and Alice → Bob (Alice modifies S before Bob reads it). • Indeed, in no serial execution would Bob go to two shelves. • Yet the execution is completely ok!

  8. Parable of the library: what’s going on? • All we care about is that 1. the structure is ok after Alice finishes; 2. Bob gets his book if it’s there. • We want to find a general theory for this. • Ref: the Vossen and Weikum book and “Concurrent Search Structure Algorithms,” D. Shasha and N. Goodman, ACM Transactions on Database Systems, vol. 13, no. 1, pp. 53-90, March 1988.

  9. Good Structure for any Dictionary Data Structure • Dictionary holds a set of key-value pairs. Values don’t matter for our theory so consider just the set of keys that could be present, denoted keyspace. Example: all natural numbers. • From the root (in general, any root), must be able to navigate to a node n such that n either has a key being sought or no node has that key.

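  10. Example: binary search tree • [Figure: tree with root 50; 50 has left child 10 and right child 70; 10 has right child 35.] • Node 50: Inset = Keyspace • Node 10: Inset = {x | x < 50} • Node 70: Inset = {x | x > 50} • Node 35: Inset = {x | x < 50 and x > 10}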

  11. Inset, Outset, Keyset • Inset(n) is the subset of Keyspace whose members are either in n or could be reachable (according to the rules of the structure) from n. • Edgeset(n,n’) is the subset of Keyspace directed to descendant n’ of n. The union of all edgesets with source n is outset(n). • Keyset(n) = Inset(n) – Outset(n). The set of keys that are in node n or nowhere.

  12. Notes • Inset(n) = union over all edges (m,n) of inset(m) ∩ edgeset(m,n). • Note that Edgeset(n,n’) need not always be a subset of Inset(n). You’ll see why this is good later.

  13. Example: binary search tree (Keyspace is all integers) • [Figure: tree with root 50; 50 has left child 10 and right child 70; 10 has right child 35.] • Node 50: Inset = Keyspace; Keyset = {50}; Outset = {x | x != 50} • Node 10: Inset = {x | x < 50}; edgeset(node 10, node 35) = {x | x > 10}; Keyset = Inset – {x | x > 10} = {x | x <= 10} • Node 70: Inset = {x | x > 50} = edgeset(node 50, node 70); Keyset = Inset • Node 35: Inset = {x | x < 50 and x > 10}; Keyset = Inset
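The insets and keysets of this example can be recomputed mechanically from the edgesets, using the rule inset(n) = union over edges (m,n) of inset(m) ∩ edgeset(m,n). The sketch below assumes a finite stand-in keyspace of 0..99 instead of all integers so that ordinary Python sets can be used; the function names are hypothetical.

```python
# Assumption: a finite stand-in for "all integers" so sets can be enumerated.
KEYSPACE = set(range(100))

# Edgesets of the example tree, keyed by (source node, destination node).
EDGESETS = {
    (50, 10): {x for x in KEYSPACE if x < 50},
    (50, 70): {x for x in KEYSPACE if x > 50},
    (10, 35): {x for x in KEYSPACE if x > 10},
}

ROOT = 50
ORDER = [50, 10, 70, 35]  # parents listed before their children


def compute_insets():
    """inset(root) = keyspace; otherwise union of inset(m) & edgeset(m, n)."""
    inset = {}
    for n in ORDER:
        if n == ROOT:
            inset[n] = set(KEYSPACE)
            continue
        inset[n] = set()
        for (m, child), edgeset in EDGESETS.items():
            if child == n:
                inset[n] |= inset[m] & edgeset
    return inset


def compute_outset(n):
    """Union of all edgesets whose source is n."""
    out = set()
    for (m, _child), edgeset in EDGESETS.items():
        if m == n:
            out |= edgeset
    return out


def compute_keysets():
    """keyset(n) = inset(n) - outset(n)."""
    inset = compute_insets()
    return {n: inset[n] - compute_outset(n) for n in ORDER}
```

With this stand-in keyspace, `compute_keysets()` gives {50} for the root, 0..10 for node 10, 11..49 for node 35, and 51..99 for node 70, matching the slide.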

  14. Structure Goodness Conditions • The keysets of the nodes partition the keyspace. So U {Keyset(n) | n is a node} = Keyspace, and if n != n’ then keyset(n) is disjoint from keyset(n’). • Edgesets leaving node n are disjoint. • Let Existkeys(n) be the keys actually present at node n. Existkeys(n) is a subset of keyset(n).

  15. Structure Goodness Conditions (applies to each root) • In the library, suppose that initially inset(shelf S) = {books | author names begin with “S”}. Afterwards, outset(S) = {books | author names begin with “Sh” or later}. • At the end, keyset(S) = books whose author names begin with “Sa” through “Sg”. Inset(S’) = books whose author names begin with “Sh” through “Sz”.
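The three goodness conditions can be checked directly on a small finite example. The following is a minimal sketch, assuming a stand-in keyspace of 0..99 and the keysets and edgesets of the binary search tree example; `check_goodness` and the constant names are hypothetical.

```python
from itertools import combinations

# Assumption: finite stand-in keyspace and the BST example's sets.
KEYSPACE = set(range(100))
KEYSETS = {
    50: {50},
    10: set(range(0, 11)),
    35: set(range(11, 50)),
    70: set(range(51, 100)),
}
EDGESETS = {
    (50, 10): set(range(0, 50)),
    (50, 70): set(range(51, 100)),
    (10, 35): set(range(11, 100)),
}
EXISTKEYS = {50: {50}, 10: {10}, 35: {35}, 70: {70}}


def check_goodness(keyspace, keysets, edgesets, existkeys):
    """Return a list of violated conditions (empty list means 'good')."""
    problems = []

    # Condition 1: the keysets partition the keyspace.
    if set().union(*keysets.values()) != keyspace:
        problems.append("keysets do not cover the keyspace")
    for a, b in combinations(keysets, 2):
        if keysets[a] & keysets[b]:
            problems.append(f"keysets of {a} and {b} overlap")

    # Condition 2: edgesets leaving the same node are disjoint.
    for e1, e2 in combinations(edgesets, 2):
        if e1[0] == e2[0] and edgesets[e1] & edgesets[e2]:
            problems.append(f"edgesets leaving {e1[0]} overlap")

    # Condition 3: keys present at a node lie inside its keyset.
    for node, keys in existkeys.items():
        if not keys <= keysets[node]:
            problems.append(f"node {node} stores keys outside its keyset")

    return problems
```

For the example data the call `check_goodness(KEYSPACE, KEYSETS, EDGESETS, EXISTKEYS)` reports no problems; placing key 60 at node 35 would trigger the third check.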

  16. Example: library at beginning • [Figure: catalog (Cat) with edges to shelves A, …, S.] • Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {} • Shelf A: Inset = {x | x begins with “A”} = edgeset(Cat, A) • Shelf S: Inset = {x | x begins with “S”} = edgeset(Cat, S); Keyset = Inset

  17. Example: library after reshelving • [Figure: catalog (Cat) with edges to shelves A, …, S; a note leads from S to S’.] • Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {} • Shelf A: Inset = {x | x begins with “A”} • Shelf S: Inset = {x | x begins with “S”} = edgeset(Cat, S); Outset = {x | x begins with “Sh” or greater} • Shelf S’: Inset = {x | x begins with “Sh” .. “Sz”}; Keyset = Inset

  18. Example: library after reshelving and catalog change • [Figure: catalog (Cat) with edges to shelves A, …, S, S’; a note leads from S to S’.] • Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {} • Shelf A: Inset = {x | x begins with “A”} • Shelf S: Inset = {x | x begins with “S” through “Sg”} = edgeset(Cat, S); Outset = {x | x begins with “Sh” or greater} • Shelf S’: Inset = {x | x begins with “Sh” .. “Sz”} = edgeset(Cat, S’); Keyset = Inset

  19. Observe • Without the note from S to S’, there would be keys on S’ yet S’ would have a null inset and hence a null keyset. • This violates the Existkeys part of the structural condition. • Note also that we can’t eliminate the note from S to S’ even after the catalog is updated. Why?

  20. Execution Goodness • For a search for an item B beginning at node m, the following invariant holds: • After any operation of any process, if the search for item B is at node x, then B is in keyset(x) or there is a path from x to node y such that B is in keyset(y) and every edge E along that path has B in its edgeset.

  21. Execution Goodness Proof Sketch • Provided the search reaches the node having B in its keyset, the search will find B there or will find it nowhere. • The invariant ensures that the search will not end its search anywhere else.

  22. Execution Goodness Proof • Why is Bob fine even though the concurrent execution of Bob and Alice is not equivalent to any serial execution? • Because even when Bob is at shelf S, the book B that Bob is looking for is in edgeset(S,S’) and B is in keyset(S’).
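The invariant can be tested on a snapshot of the library taken after reshelving but before the catalogue change. The sketch below is illustrative only: the three author names are made up, the keysets and edgesets are written out by hand for that snapshot, and `invariant_holds` is a hypothetical helper that follows only edges whose edgeset contains the sought key.

```python
# Assumption: a tiny keyspace of three authors, snapshot after reshelving.
KEYSETS = {
    "Cat": set(),
    "A": {"Adams"},
    "S": {"Sagan"},
    "S'": {"Shasha"},
}
EDGESETS = {
    ("Cat", "A"): {"Adams"},
    ("Cat", "S"): {"Sagan", "Shasha"},  # catalogue not yet updated
    ("S", "S'"): {"Shasha"},            # the note Alice left on shelf S
}


def invariant_holds(x, key, keysets, edgesets):
    """True if key is in keyset(x), or some path from x reaches a node y
    with key in keyset(y) where every edge on the path has key in its edgeset."""
    seen = set()
    stack = [x]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if key in keysets.get(node, set()):
            return True
        for (src, dst), edgeset in edgesets.items():
            if src == node and key in edgeset:
                stack.append(dst)
    return False
```

Here `invariant_holds("S", "Shasha", KEYSETS, EDGESETS)` is true because the note edge carries the key to S’. A search for the same book that had somehow wandered to shelf A would fail the check, since no suitable edge leaves A.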

  23. Practical Applications • Most sophisticated database management systems use some version of the library parable in their B-trees, hash structures, etc. • Reason: locks need not be held as long and can be held lower in the tree. • B trees for example have links at the leaf level. So a split looks like this:

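  24. B tree simplified (two vals per node) • [Figure: root node (50) with a left leaf (1, 7) and a right leaf (70).] • Root (50): Inset = {x | 0 <= x <= 90}; Keyset = {}; Outset = Inset • Leaf (1, 7): Inset = {x | x < 50}; Keyset = Inset • Leaf (70): Inset = {x | x > 50 and x <= 90} = edgeset(node 50, node 70); Keyset = Inset

  25. B tree insert(32): split left leaf at 15. Only the (1, 7) node needs to be locked • [Figure: root node (50); left leaf (1, 7) with a link to a new leaf (32); right leaf (70).] • Root (50): Inset = {x | 0 <= x <= 90}; Keyset = {}; Outset = Inset • Leaf (1, 7): Inset = {x | x < 50}; Keyset = Inset – {x | x > 15} = {x | x <= 15}; Edgeset of the link to leaf (32) = {x | x > 15} • Leaf (70): Inset = {x | x > 50 and x <= 90} = edgeset(node 50, node 70); Keyset = Inset

  26. Readjust parent (so lock it briefly) • [Figure: root node (15, 50); left leaf (1, 7) with a link to leaf (32); right leaf (70).] • Root (15, 50): Inset = {x | 0 <= x <= 90}; Keyset = {}; Outset = Inset • Leaf (1, 7): Inset = {x | x < 50}; Keyset = Inset – {x | x > 15} = {x | x <= 15}; Edgeset of the link to leaf (32) = {x | x > 15} • Leaf (70): Inset = {x | x > 50 and x <= 90} = edgeset(node 50, node 70); Keyset = Inset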
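The split with a leaf-level link can be sketched as follows. This is a single-threaded illustration under assumptions, not the Lehman and Yao algorithm itself: locking is omitted, the `Leaf` class with an inclusive `high` bound and a `right` link is a hypothetical layout, and the initial bound 49 stands for the integer inset {x | x < 50}.

```python
class Leaf:
    """Leaf with an inclusive upper bound and a link to its right sibling."""

    def __init__(self, keys, high=None):
        self.keys = sorted(keys)
        self.high = high    # largest key in this leaf's keyset (None = unbounded)
        self.right = None   # link whose edgeset is {x | x > high}


def search_leaf(leaf, key):
    """Follow right links while the key lies beyond the leaf's keyset."""
    while leaf.high is not None and key > leaf.high and leaf.right is not None:
        leaf = leaf.right
    return key in leaf.keys


def split_insert(leaf, key, split_at):
    """Insert key by splitting leaf: keys > split_at go to a new right sibling.

    In a concurrent setting only this leaf would be locked during the split.
    """
    all_keys = sorted(leaf.keys + [key])
    new_leaf = Leaf([k for k in all_keys if k > split_at], high=leaf.high)
    new_leaf.right = leaf.right
    leaf.right = new_leaf                       # install the link
    leaf.keys = [k for k in all_keys if k <= split_at]
    leaf.high = split_at                        # keyset shrinks to {x | x <= split_at}
    return new_leaf
```

For example, after `left = Leaf([1, 7], high=49)` and `split_insert(left, 32, split_at=15)`, a search that was routed to `left` by the not-yet-updated parent still finds 32 by following the link.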

  27. Can Generalize Using Model • Above algorithm is due to Lehman and Yao and is called the B-link algorithm. Long journal article to present and prove. • Now can generalize to any structure. Ensure structure works and invariant holds on execution. • Also possible to invent a new algorithm making direct use of the model.

  28. High Concurrency Without Links: Give-up algorithm • Explicitly record the description of the inset of each node in the node. • Search(B) descends. If B is ever not in the inset of the current node, then give up and start over. • Happens rarely enough that performance is as good as B-link for searches. Less work for deletions. • Proof is immediate.
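A give-up search might look like the sketch below. It is only an illustration under assumptions: insets are recorded as half-open ranges `lo <= key < hi`, routing uses separator keys, locking and concurrent updates are left out, and the names `Node`, `GiveUp`, `descend`, `search`, and `build_after_split` are hypothetical.

```python
import bisect


class GiveUp(Exception):
    """Raised when the sought key is outside the current node's inset."""


class Node:
    def __init__(self, lo, hi, keys=None, seps=None, children=None):
        self.lo, self.hi = lo, hi            # recorded inset: lo <= key < hi
        self.keys = set(keys or ())          # keys stored here (leaves)
        self.seps = list(seps or ())         # separator keys (internal nodes)
        self.children = list(children or ())


def descend(node, key):
    """Walk down from node, giving up if key is ever outside the inset."""
    while True:
        if not (node.lo <= key < node.hi):
            raise GiveUp()
        if not node.children:
            return key in node.keys
        node = node.children[bisect.bisect_right(node.seps, key)]


def search(root, key, max_restarts=10):
    """Retry from the root whenever a descent gives up."""
    for _ in range(max_restarts):
        try:
            return descend(root, key)
        except GiveUp:
            continue
    raise RuntimeError("too many restarts")


def build_after_split():
    """State after a left leaf covering 0..49 was split at 16."""
    left = Node(0, 16, keys={1, 7})
    mid = Node(16, 50, keys={32})
    right = Node(50, 91, keys={70})
    root = Node(0, 91, seps=[16, 50], children=[left, mid, right])
    return root, left
```

A search that picked up a pointer to the left leaf before the split and resumes there afterwards, as in `descend(left, 32)`, raises `GiveUp`, while `search(root, 32)` starting from the root finds the key.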

  29. Conclusion • Simple framework for all search structures. Handful of concepts: keyspace, inset, edgeset, outset, keyset. • Can be a guide to coding.

  30. Exercise • When can Alice remove the note directing those seeking certain books to go from S to S’? • Try to design a merge algorithm for a B-tree in the give-up setting. Lock as little and as low as possible.
