
Effect of Node Size on the Performance of Cache-Conscious B+ Trees


Presentation Transcript


  1. Effect of Node Size on the Performance of Cache-Conscious B+ Trees Written by: R. Hankins and J. Patel Presented by: Ori Calvo

  2. Introduction • Who cares about cache improvement? • Traditional databases are designed to reduce I/O accesses. But… • Chips are cheap. • Chips are big. • Why not store the entire database in memory? • Reducing main-memory accesses is the next challenge.

  3. Objectives • Introduction to cache-conscious B+Trees. • Provide a model to analyze the effect of node size. • Examine “real-life” results against our model’s conclusions.

  4. B+Tree Refresher • An order-d B+Tree has between d and 2d keys in each node. • The root has between 1 and 2d keys. • Every node must be at least half full. • 2*(d+1)^(h-1) <= N <= (2d+1)^h • The fill percentage is usually ln 2 ≈ 69%.
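The size bounds above can be sanity-checked numerically; the sketch below is illustrative (the helper names are mine, not from the presentation):

```c
#include <stdint.h>

/* N bounds for an order-d B+Tree of height h, per the refresher:
 * 2*(d+1)^(h-1) <= N <= (2d+1)^h. */
static uint64_t ipow(uint64_t base, unsigned exp) {
    uint64_t r = 1;
    while (exp-- > 0) r *= base;
    return r;
}

/* Minimum keys: every node at least half full (d keys, d+1 children). */
uint64_t btree_min_keys(unsigned d, unsigned h) {
    return 2 * ipow(d + 1, h - 1);
}

/* Maximum keys: every node completely full (2d keys, 2d+1 children). */
uint64_t btree_max_keys(unsigned d, unsigned h) {
    return ipow(2 * d + 1, h);
}
```

For example, with d = 7 and h = 3 the tree holds between 128 and 3375 keys.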

  5. B+Tree Refresher (Cont…) • Good search performance. • Good incremental performance. • Better cache behavior than a T-Tree. • What is the optimal node size?

  6. Improving B+Tree Question: Assuming node size = cache line size, how can we make the B+Tree algorithm utilize the cache better? Hint: Locality!

  7. Pointer Elimination • Node size = cache line size. • Only half of a node is used for storing keys. • Get rid of pointers and store more keys. • Instead of pointers to child nodes, use offsets.

  8. Introducing CSB+Tree • Balanced search tree. • Each node contains m keys, where d <= m <= 2d and d is the order of the tree. • All child nodes are put into a node group. • Nodes within a node group are stored contiguously. • Each node holds: • pFirstChild - pointer to first child • nKeys - number of keys • arrKeys[2d] - array of keys
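As a sketch, the node layout above might look like this in C (field names follow the slide; the concrete order D and the struct details are assumptions):

```c
#define D 7  /* tree order: each node holds d..2d keys (assumed value) */

/* CSB+Tree node per slide 8: one pointer to the first child, a key
 * count, and the key array. All children of a node form a contiguous
 * "node group", so per-child pointers are unnecessary. */
struct CSBNode {
    struct CSBNode *pFirstChild;  /* first node of the child group */
    int nKeys;                    /* number of keys in use */
    int arrKeys[2 * D];           /* up to 2d keys */
};

/* Child i is reached by pointer arithmetic, not a stored pointer. */
static inline struct CSBNode *csb_child(const struct CSBNode *n, int i) {
    return n->pFirstChild + i;
}
```

The single child pointer is what frees up cache-line space for extra keys in the comparison on slide 10.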

  9. CSB+Tree [Diagram: node groups stored contiguously; each node laid out as child pointer (P), key count (N), and keys (K1, K2).]

  10. CSB+Tree vs. B+Tree • Assuming node size = 64B: • B+Tree: 7 keys + 8 pointers + 1 counter • CSB+Tree: 1 pointer + 1 counter + 14 keys • Results: • A cache line can satisfy almost one more level of comparisons. • The fan-out is larger → less space.
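The key counts on this slide follow from simple arithmetic, assuming 4-byte keys, 4-byte pointers, and a 4-byte counter (the 32-bit layout of the Pentium 3 target); a quick check:

```c
/* How many 4-byte keys fit in one cache line of line_bytes bytes.
 * B+Tree: k keys need k+1 child pointers plus one counter:
 *   4k + 4(k+1) + 4 <= line_bytes  =>  k = (line_bytes - 8) / 8 */
unsigned bplus_keys_per_line(unsigned line_bytes) {
    return (line_bytes - 8) / 8;
}

/* CSB+Tree: one child-group pointer and one counter, rest is keys:
 *   k = (line_bytes - 8) / 4 */
unsigned csb_keys_per_line(unsigned line_bytes) {
    return (line_bytes - 8) / 4;
}
```

With a 64-byte line this reproduces the slide's 7 vs. 14 keys.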

  11. CSS Tree • Can we do more elimination?

  12. Shaking our foundations • Should node size be equal to cache line size? • What about instruction count? • How can we measure the effect of node size on overall performance?

  13. Building an Execution Time Model • We need to take into account: • Instructions executed. • Data cache misses. • Instruction cache misses (only 0.5%). • Mis-predicted branches. • Model the above during an equality search. • Should be independent of implementation and platform details, but…

  14. Execution Time Model T = I*cpi + M*miss_latency + B*pred_penalty
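The model is a straight weighted sum, so it translates directly to code; the parameter values in the usage below are placeholders, not measurements from the paper:

```c
/* Execution time model from slide 14:
 *   T = I*cpi + M*miss_latency + B*pred_penalty
 * I = instructions executed, M = data cache misses,
 * B = mispredicted branches; each weighted by its
 * processor-specific cost (PSV). */
double exec_time(double I, double cpi,
                 double M, double miss_latency,
                 double B, double pred_penalty) {
    return I * cpi + M * miss_latency + B * pred_penalty;
}
```

Because the model is linear, halving cache misses at the cost of extra instructions pays off exactly when M*miss_latency shrinks by more than I*cpi grows, which is the trade-off the node-size analysis explores.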

  15. CPI – 0.63? • Can be extracted from a processor's design manual, but… • Modern processors are very complex. • Some instructions require more time to retire than others. • On the Pentium 3, CPI ranges from 0.33 to 14.

  16. Other PSVs (processor-specific values) – where do they come from? • miss_latency • Same problems as CPI. • pred_penalty • The manual provides tight upper and lower bounds.

  17. PSV Experiment

    for (i = 0; i < Queries; i++) {
        address = origin + random_offset;
        val = *address;
        for (j = 0; j < Instructions; j++) {
            /* computation involving "val" */
        }
    }

  18. PSV Results

  19. Calculate I • I depends on the actual implementation of the CSB+Tree. • Two main components: • I_search - searching inside a node • I_trav - node traversals • Analyzing the code leads to the following conclusions: • I_search ~ 5 • I_trav ~ 30

  20. Calculate I_Search

    BinarySearch:
        middle = (p1 + p2) / 2;
        comp   *middle, key;
        jle    less;
        p1 = middle;
    less:
        p2 = middle;
        jump BinarySearch;

  21. Calculate I_Trav

    Node *Find(Node *pNode, int key)
    {
        int *pKeysBegin = pNode->Keys;                                  (1)
        int *pKeysEnd   = pNode->Keys + pNode->nKeys;                   (3)
        int *pFoundKey, foundKey;
        pFoundKey = BinarySearch(pKeysBegin, pKeysEnd, key);            (8)
        if (pFoundKey < pKeysEnd) { foundKey = *pFoundKey; }            (3,1)
        else                      { foundKey = INFINITE; }              (1)
        int offset = (int)(pFoundKey - pKeysBegin);                     (2)
        Node *pChild = NULL;
        if (key < foundKey) { pChild = pNode->pChilds + offset; }       (4,1)
        else                { pChild = pNode->pChilds + offset + 1; }   (3)
        return pChild;                                          --------
    }                                                           (23-25)

  22. Calculate I (Finishing) • h - Height of the tree • f - Fill percentage • e - Max number of keys in a node

  23. Calculate M • M_node – cache misses while searching inside a node, where L is the number of cache lines in a node.

  24. Calculate M (Cont…) • Cache misses per tree traversal are bounded by: TreeHeight * M_node • What about q traversals?

  25. Calculate M for q traversals • Let’s assume there are no cache conflicts and no capacity misses • On first traversal there are M_node cache misses per node access • On subsequent traversals • Nodes near the root will have high probability of being found in the cache • Leaf nodes will have substantially lower probability

  26. Calculate M for q traversals (Cont…) • Suppose: • q is the number of queries • b is the number of blocks • Then the number of unique blocks (UB) that are visited is:

  27. Calculate M for q traversals (Finishing) • Assuming M_node cache misses are incurred per node access, M is the sum of UB at each level of the tree:

  28. Calculate B • h - Height of the tree • f - Fill percentage • e - Max number of keys in a node

  29. Mid-year evaluation • We built a simple model: • T = I*cpi + M*miss_latency + B*pred_penalty • Now we want to use it.

  30. Our model's prediction • We want to look at the performance behavior our model predicts on a Pentium 3. • The following parameters are used: • 10,000,000 items • Number of queries = 10,000 • Fill percentage = 67% • Cache line size = 32 bytes

  31. Effect of node size on cache misses count

  32. Effect of node size on instructions count

  33. Effect of node size on execution time

  34. Numbers • Best cache utilization at small node sizes: 64-256 bytes. • For larger node sizes there are fewer instructions executed; the minimum is reached at 1632 bytes. • The optimal node size is 1632 bytes, performing 26% faster than a node size of 32 bytes.

  35. Our Model's Conclusions • Conventional wisdom suggests: node size = cache line size. • We show: using a large node size can result in better search performance.

  36. Experimental Setup • Pentium 3 • 768MB of main memory • 16KB L1 data cache • 512KB L2 data/instruction cache • 4-way set-associative • 32-byte cache line • Linux, kernel version 2.4.13 • 10,000,000 entries in the database • The database is queried 10,000 times

  37. Effect of node size on cache misses count

  38. Effect of node size on instructions count

  39. Effect of node size on execution time

  40. Final Conclusions • We investigated the performance of the CSB+Tree. • We introduced first-order analytical models. • We showed that cache misses and instruction count must be balanced. • A node size of 512 bytes performs well. • Larger node sizes suffer from poor insert performance.
