1 / 24

I/O-Algorithms

I/O-Algorithms. Thomas M ølhave Spring 2012 February 9 , 2012. Massive Data. Pervasive use of computers and sensors Increased ability to acquire/store/process data → Massive data collected everywhere Society increasingly “data driven” → Access/process data anywhere any time

gittel
Download Presentation

I/O-Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. I/O-Algorithms Thomas Mølhave Spring 2012 February 9, 2012

  2. Massive Data • Pervasive use of computers and sensors • Increased ability to acquire/store/process data → Massive data collected everywhere • Society increasingly “data driven” → Access/process data anywhere any time Nature/Science special issues • 2/06,9/08, 2/11 • Scientific data size growing exponentially, while quality and availability improving • Paradigm shift: Science will be about mining data • Obviously not only in sciences: • Economist 02/10: • From 150 Billion Gigabytes five years ago • to 1200 Billion today • Managing data deluge difficult; doing so • will transform business/public life

  3. I/O-Algorithms Example: LIDAR Terrain Data • Massive (irregular) point sets (~1m resolution) • Becoming relatively cheap and easy to collect • Sub-meter resolution using mobile mapping

  4. ~ 1,5 m betweenmeasurements ~1,2 km ~ 280 km/h at 1500-2000m I/O-Algorithms Example: LIDAR Terrain Data

  5. I/O-Algorithms Example: LIDAR Terrain Data • ~2 million points at 30 meter (<1GB) • ~18 billion points at 1 meter (>1TB)

  6. Example: Detailed Data Essential • Mandø with 2 meter sea-level raise 80 meter terrain model 2 meter terrain model

  7. I/O-Algorithms Random Access Machine Model • Standard theoretical model of computation: • Infinite memory • Uniform access cost • Simple model crucial for success of computer industry R A M

  8. R A M L 1 L 2 I/O-Algorithms Hierarchical Memory • Modern machines have complicated memory hierarchy • Levels get larger and slower further away from CPU • Data moved between levels using large blocks

  9. read/write head read/write arm track magnetic surface I/O-Algorithms Slow I/O • Disk access is 106 times slower than main memory access “The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer) • Disk systems try to amortize large access time transferring large contiguous blocks of data (8-16Kbytes) • Important to store/access data to take advantage of blocks (locality)

  10. running time data size I/O-Algorithms Scalability Problems • Most programs developed in RAM-model • Run on large datasets because OS moves blocks as needed • Moderns OS utilizes sophisticated paging and prefetching strategies • But if program makes scattered accesses even good OS cannot take advantage of block access  Scalability problems!

  11. I/O-Algorithms External Memory Model N = # of items in the problem instance B = # of items per disk block M = # of items that fit in main memory T = # of items in output I/O: Move block between memory and disk D Block I/O M P

  12. I/O-Algorithms Scalability Problems: Block Access Matters • Example: Traversing linked list (List ranking) • Array size N = 10 elements • Disk block size B = 2 elements • Main memory size M = 4 elements (2 blocks) • Large difference between N and N/B large since block size is large • Example: N = 256 x 106, B = 8000 , 1ms disk access time N I/Os take 256 x 103 sec = 4266 min = 71 hr N/B I/Os take 256/8 sec = 32 sec 1 5 2 6 3 8 9 4 10 1 2 10 9 5 6 3 4 7 7 8 Algorithm 1: N=10 I/Os Algorithm 2: N/B=5 I/Os

  13. I/O-algorithms Fundamental Bounds Internal External • Scanning: N • Sorting: N log N • Searching: • Note: • Linear I/O: O(N/B) • B factor VERY important: • Cannot sort optimally with search tree

  14. I/O-algorithms External Search Trees • Binary search tree: • Standard method for search among N elements • Search traces at least one root-leaf path • If nodes stored arbitrarily on disk • Search in I/Os

  15. I/O-algorithms External Search Trees • BFS blocking: • Block height • Output elements blocked •  • Rangesearch in I/Os • Optimal: O(N/B) space and query

  16. I/O-algorithms External Search Trees • Maintaining BFS blocking during updates? • Balance normally maintained in search trees using rotations • Seems very difficult to maintain BFS blocking during rotation x y y x

  17. I/O-algorithms B-trees • BFS-blocking naturally corresponds to tree with fan-out • B-trees balanced by allowing node degree to vary • Rebalancing performed by splitting and merging nodes

  18. I/O-algorithms (a,b)-tree • T is an (a,b)-tree (a≥2 and b≥2a-1) • All leaves on the same level and contain between a and b elements • Except for the root, all nodes have degree between a and b • Root has degree between 2 and b (2,4)-tree • (a,b)-tree uses linear space and has height •  • Choosing a,b = each node/leaf stored in one disk block •  • O(N/B) space and query

  19. I/O-algorithms (a,b)-Tree Insert • Insert: Search and insert element in leaf v DO v has b+1 elements/children Splitv: make nodes v’ and v’’ with and elements insert element (ref) in parent(v) (make new root if necessary) v=parent(v) • Insert touch nodes v v’ v’’

  20. I/O-algorithms (2,4)-Tree Insert

  21. I/O-algorithms (a,b)-Tree Delete • Delete: Search and delete element from leaf v DO v has a-1 elements/children Fusev with sibling v’: move children of v’ to v delete element (ref) from parent(v) (delete root if necessary) If v has >b (and ≤ a+b-1<2b) children split v v=parent(v) • Delete touch nodes v v

  22. I/O-algorithms (2,4)-Tree Delete

  23. I/O-algorithms Summary/Conclusion: B-tree • B-trees: (a,b)-trees with a,b = • O(N/B) space • O(logB N) query • O(logB N) update • B-trees with elements in the leaves sometimes called B+-tree • Construction in I/Os • Sort elements and construct leaves • Build tree level-by-level bottom-up

  24. I/O-algorithms Merge Sort • Merge sort: • Create N/M memory sized sorted runs • Merge runs together M/B at a time  phases using I/Os each • Distribution sort similar (but harder – partition elements)

More Related