
Parallel Database System: The Future of High Performance Database Systems

Parallel Database System: The Future of High Performance Database Systems. Presented by: Suresh Babu L. Outline: Why parallel Databases? Scale up and Speedup. Parallel DB’s Architectures. Parallel Data Flow. Data Partitioning. Parallelism with Relational Operators. The State of the Art.



Presentation Transcript


  1. Parallel Database System: The Future of High Performance Database Systems Presented by: Suresh Babu L

  2. Outline • Why parallel Databases? • Scale up and Speedup • Parallel DB’s Architectures • Parallel Data Flow • Data Partitioning • Parallelism with Relational Operators • The State of the Art

  3. Why Parallel Databases? Edgar F. Codd

  4. Parallel Access to Data • Parallelism: divide a big problem into many smaller ones to be solved in parallel. • 1 Terabyte at 10 MB/s bandwidth: 1.2 days to scan • 1 Terabyte, 1,000 x parallel, at 10 GB/s bandwidth: 100 second scan
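
The two scan times on this slide follow from dividing the data size by the available bandwidth. A minimal sketch of that arithmetic, assuming decimal units (1 Terabyte = 10^12 bytes); the function name `scan_seconds` is illustrative and not from the slides:

```python
# Decimal units are an assumption; the slide does not say which it uses.
TERABYTE = 10**12  # bytes
MB = 10**6
GB = 10**9

def scan_seconds(size_bytes, bandwidth_bytes_per_s):
    """Time to read size_bytes sequentially at the given bandwidth."""
    return size_bytes / bandwidth_bytes_per_s

serial = scan_seconds(TERABYTE, 10 * MB)    # one device at 10 MB/s
parallel = scan_seconds(TERABYTE, 10 * GB)  # 1,000 devices at 10 MB/s each

print(f"serial scan: {serial:.0f} s, about {serial / 86400:.1f} days")
print(f"1,000-way parallel scan: {parallel:.0f} s")
```

The aggregate 10 GB/s figure is simply 1,000 devices each delivering 10 MB/s, which is why the scan time drops by the same factor of 1,000.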

  5. Parallel DBMS: Intro • Pipeline parallelism: [Figure: Any Sequential Program -> Any Sequential Program] • Partition parallelism: outputs split N ways, inputs merge M ways [Figure: copies of Any Sequential Program running side by side]

  6. Pipelined and Partitioned Parallelism • Both are natural in DBMS! • Pipeline parallelism • Partitioned data allows partitioned parallelism [Figure: Source Data -> Scan -> Sort pipelines over partitioned data, combined by Merge]
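
The Scan, Sort, and Merge boxes in this slide's figure can be sketched as follows. This is an illustration under assumptions, not the presenter's implementation: threads stand in for separate processors, in-memory lists stand in for Source Data partitions, and the names `scan`, `scan_and_sort`, and `parallel_sort` are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def scan(partition):
    """Scan operator: stream tuples out of one source partition."""
    for row in partition:
        yield row

def scan_and_sort(partition):
    """Pipeline on one partition: the Scan output feeds the Sort."""
    return sorted(scan(partition))

def parallel_sort(partitions):
    """Run Scan -> Sort on every partition concurrently, then Merge."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_runs = list(pool.map(scan_and_sort, partitions))
    # Merge: combine the sorted runs into a single ordered stream.
    return list(heapq.merge(*sorted_runs))

source_data = [[9, 2, 7], [4, 8, 1], [6, 3, 5]]  # three partitions
result = parallel_sort(source_data)
print(result)
```

Each partition is processed independently (partitioned parallelism), while within a partition the scan streams rows into the sort (pipeline parallelism, limited here because a sort must see all of its input before emitting output).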

  7. Scale-Up and Speed-Up • Speedup: [Figure labels: 100GB, 100GB] • Scale-up: [Figure labels: 100GB, 1TB]
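
The slide names the two metrics without defining them. A minimal sketch using the usual definitions from the parallel database literature (an assumption about what the slide's figure illustrates): speedup compares elapsed times for the same problem on a small and a large system, and scaleup compares a small problem on a small system with a proportionally larger problem on a larger system. The timings below are invented for illustration.

```python
def speedup(small_system_time, big_system_time):
    """Same problem on both systems. Linear speedup: N times the hardware gives N."""
    return small_system_time / big_system_time

def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """Problem grows with the system. Linear scaleup: the ratio stays at 1."""
    return small_system_small_problem_time / big_system_big_problem_time

# Hypothetical timings in seconds, not measurements from the slides.
s = speedup(1000.0, 100.0)   # same 100GB job, 10x the hardware
c = scaleup(1000.0, 1000.0)  # 100GB on 1x hardware vs 1TB on 10x hardware

print(f"speedup = {s}, scaleup = {c}")
```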

  8. Barriers to Achieving Linear Speedup and Scaleup • A Bad Speedup Curve: 3 Factors • Startup • Interference • Skew [Figure axis: Processors & Discs]

  9. Architectures for Parallel DBs • Shared memory: IBM/370, Sequent, SGI, Sun • Shared disks: VMScluster, Sysplex

  10. Architectures for Parallel DBs (contd.) • Shared Nothing: Tandem, Teradata, SP2

  11. Architectures (contd.) • Shared Nothing: Teradata 400 nodes; 80x12 nodes; Tandem 110 nodes; IBM / SP2 / DB2 128 nodes; Informix/SP2 100 nodes; ATT & Sybase 8x14 nodes • Shared Disk: Oracle 170 nodes; Rdb 24 nodes • Shared Memory: Informix 9 nodes; RedBrick ? nodes

  12. Parallel Data Flow and Relational Systems [Figure: four Source Data partitions, each feeding Scan -> Sort, combined by Merge]

  13. Data Partitioning • Three main techniques: • Round Robin • Hash Partitioning • Range partitioning

  14. Round Robin Partitioning [Figure: partitions P1, P2, …, Pn]
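
Round robin partitioning deals rows out to the partitions in turn, so the i-th row lands in partition i mod n. A minimal sketch, with `round_robin_partition` as a hypothetical helper name:

```python
def round_robin_partition(rows, n):
    """Assign row i to partition i mod n, spreading rows evenly."""
    partitions = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        partitions[i % n].append(row)
    return partitions

parts = round_robin_partition(list(range(10)), 3)
print(parts)
```

The partition sizes differ by at most one row, which suits full sequential scans, but a lookup on a particular value has to search every partition.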

  15. Hash Partitioning [Figure: partitions P1, P2, …, Pn]
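
Hash partitioning picks the partition by applying a hash function to a key attribute of each row. A minimal sketch, assuming rows are dictionaries; `zlib.crc32` is used only because it is deterministic across runs, and the helper name `hash_partition` is hypothetical:

```python
import zlib

def hash_partition(rows, key, n):
    """Place each row in partition hash(row[key]) mod n."""
    partitions = [[] for _ in range(n)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[h % n].append(row)
    return partitions

rows = [{"id": i, "name": f"user{i}"} for i in range(8)]
parts = hash_partition(rows, "id", 3)
for i, p in enumerate(parts):
    print(i, [r["id"] for r in p])
```

Rows with equal key values always land in the same partition, so an equality lookup or an equi-join on the key only needs to visit one partition per value.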

  16. Range Partitioning [Figure: partitions P1, P2, …, Pn holding key ranges a…c, d…g, …, w…z]
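
Range partitioning assigns each row to the partition whose key range contains the row's key. A minimal sketch using the first letter of a name as the key. The split points are hypothetical: the slide shows only the ranges a…c, d…g, and w…z, so the single h to v partition here merely stands in for whatever middle ranges the figure elides.

```python
from bisect import bisect_right

# Hypothetical split points: a-c | d-g | h-v (stand-in) | w-z
SPLIT_POINTS = ["d", "h", "w"]

def range_partition(names, split_points):
    """Place each name in the partition whose key range holds its first letter."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for name in names:
        idx = bisect_right(split_points, name[0].lower())
        partitions[idx].append(name)
    return partitions

names = ["alice", "carol", "dave", "gina", "mallory", "walter", "zoe"]
parts = range_partition(names, SPLIT_POINTS)
print(parts)
```

Because neighbouring key values sit in the same partition, range queries touch few partitions, at the cost of possible skew when keys are unevenly distributed.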

  17. Parallelism with Relational Operators • Two basic operations: • Merge • Split

  18. Merge Operation
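
The slide gives only the title, so the following is a sketch under the assumption that a merge operator combines several parallel input streams into one output stream. Two variants are shown, one that just concatenates the streams and one that preserves sort order; both function names are hypothetical.

```python
import heapq
from itertools import chain

def merge_unordered(*streams):
    """Combine input streams into one stream, one input after another."""
    return chain(*streams)

def merge_sorted(*streams):
    """Combine already-sorted input streams into one sorted stream."""
    return heapq.merge(*streams)

a, b, c = [1, 4, 7], [2, 5, 8], [3, 6, 9]
unordered = list(merge_unordered(a, b))
ordered = list(merge_sorted(a, b, c))
print(unordered)
print(ordered)
```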

  19. Split Operation • Split • Used to partition or replicate the stream produced by a relational operator
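
The two uses of split named on this slide, partitioning and replicating an operator's output stream, can be sketched as below. The routing function and both helper names are assumptions made for illustration.

```python
def split_partition(stream, n, route):
    """Send each row to exactly one of n outputs, chosen by route(row) mod n."""
    outputs = [[] for _ in range(n)]
    for row in stream:
        outputs[route(row) % n].append(row)
    return outputs

def split_replicate(stream, n):
    """Send a copy of every row to all n outputs."""
    rows = list(stream)
    return [list(rows) for _ in range(n)]

partitioned = split_partition(range(6), 2, route=lambda r: r)
replicated = split_replicate(range(3), 2)
print(partitioned)
print(replicated)
```

Replication is the natural choice when every consumer needs the full stream, for example a small table joined against each partition of a large one; partitioning is used when each consumer should handle a disjoint share of the rows.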

  20. Example of Parallelizing Relational Operators [Figure: SCAN of A and SCAN of B feed a JOIN, whose output is INSERTed into C]

  21. Example (contd.)
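
A sketch of how the SCAN, JOIN, and INSERT operators in this example might be parallelized, assuming the plan joins A and B and inserts the result into C. The column names `x` and `y`, the integer join keys, the use of threads in place of processors, and all function names are assumptions; the slides do not give the query.

```python
from concurrent.futures import ThreadPoolExecutor

def split_by_key(rows, key, n):
    """Split: route each row to partition (join key mod n). Integer keys assumed."""
    outputs = [[] for _ in range(n)]
    for row in rows:
        outputs[row[key] % n].append(row)
    return outputs

def local_join(a_rows, b_rows):
    """Join one partition of A with the matching partition of B on A.x = B.y."""
    index = {}
    for b in b_rows:
        index.setdefault(b["y"], []).append(b)
    return [{**a, **b} for a in a_rows for b in index.get(a["x"], [])]

def parallel_join(A, B, n=2):
    a_parts = split_by_key(A, "x", n)  # scan A, split on the join key
    b_parts = split_by_key(B, "y", n)  # scan B, split the same way
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(local_join, a_parts, b_parts))
    C = []                             # merge partial results, insert into C
    for part in results:
        C.extend(part)
    return C

A = [{"x": 1, "a": "a1"}, {"x": 2, "a": "a2"}, {"x": 3, "a": "a3"}]
B = [{"y": 2, "b": "b2"}, {"y": 3, "b": "b3"}, {"y": 3, "b": "b3'"}, {"y": 4, "b": "b4"}]
C = parallel_join(A, B)
print(C)
```

Splitting both inputs with the same function guarantees that rows with matching keys meet in the same partition, so each local join can run without talking to the others.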

  22. The State of the Art • Teradata • Tandem NonStop SQL • Gamma • The Super Database Computer • Bubba

  23. Specialized Parallel Relational Operators • Algorithms for traditional relational operators written to improve their parallel execution, to better handle data and execution skew. • Look at join • Sort merge • Hash join
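
Of the two join methods named on this slide, sort merge can be sketched on a single node as follows. This is a textbook-style illustration rather than the specialized parallel algorithm the slide refers to, and it does nothing about skew; the function name and the tuple layout are assumptions.

```python
def sort_merge_join(left, right, key_left, key_right):
    """Equi-join two lists by sorting both on the join key and merging."""
    left = sorted(left, key=key_left)
    right = sorted(right, key=key_right)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_left(left[i]), key_right(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and key_right(right[j_end]) == kl:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and key_left(left[i]) == kl:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out

left = [(3, "c"), (1, "a"), (2, "b"), (2, "b2")]
right = [(2, "x"), (4, "w"), (2, "y"), (3, "z")]
joined = sort_merge_join(left, right, lambda t: t[0], lambda t: t[0])
print(joined)
```

A hash join would instead build a hash table on one input and probe it with the other, avoiding the two sorts when neither input is already ordered.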

  24. CONCLUSION

  25. THANK YOU QUESTIONS ?
