CS 540 Database Management Systems Parallel Databases
Announcements • Attendance quiz for the last two lectures • Advanced topics not covered in assignments • Short questions • 2% of final grade • Due on 7/13 • Project reports and presentations • Due on 7/11 for all groups.
Database Machines • 80s: specialized HW to run RDBMSs faster • Extra processors (parallelism) • Smart and fast storage • Much like a graphics accelerator for graphics applications. • Examples: • processor per disk track: eliminate seeks • processor per head: eliminate bulk I/O for selection • 90s and afterward: the idea failed • Not much help with expensive operators: join, … • No economical scalability • Google versus Altavista!
Database Machines • Interest has recently picked up • Data management as a HW target • Novel research at the HW architecture/DB boundary • One idea was never forgotten • Parallel database systems
Parallel vs. Distributed Databases • Parallel database system • Improves performance through a parallel implementation • Fully integrated system • No notion of site autonomy • Centralized schema • All queries start at a well-defined host • Distributed database systems • Data is stored across several sites, each site managed by a DBMS capable of running independently • Not covered in this class
Parallel RDBMSs • Goal • Improve performance by executing multiple operations in parallel • Key benefit • Cheaper to scale than relying on a single increasingly more powerful processor • Key challenge • Ensure overhead and contention do not kill performance
Parallel RDBMSs: performance metrics • Speedup • More processors => the same task finishes faster • Individual queries must run faster • More transactions per second (TPS) • Scaleup • More processors => can process more data in the same time • Batch scaleup • The same query over proportionally larger input data should take the same time
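A hedged formalization of these two metrics, following the standard definitions from the parallel-database literature (e.g., DeWitt and Gray); the notation is mine, not the lecture's:
```latex
\mathrm{speedup} = \frac{\text{elapsed time of a task on the small system}}{\text{elapsed time of the same task on the big system}}
\qquad
\mathrm{scaleup} = \frac{\text{elapsed time of 1 unit of work on 1 processor}}{\text{elapsed time of } N \text{ units of work on } N \text{ processors}}
```
Linear speedup means N times the hardware yields a speedup of N; linear (constant) scaleup means the ratio stays at 1 as data and hardware grow together.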
Challenges to linear speedup and constant scaleup • Startup cost • Cost of starting an operation on many processors. • Interference • Contention for shared resources. • Skew • Slowest processor becomes the bottleneck.
Architectures of Parallel Databases • [Figure: the parallel architectures (shared memory, shared disk, shared nothing), from the Greenplum Database whitepaper]
Shared Nothing • Most scalable architecture • Minimize interference by minimizing resource sharing • Can use commodity hardware • Processor = server = node • But most difficult to program and manage. • Gamma’s choice • HW scalability
Parallel Query Evaluation • Inter-query parallelism • Each query runs on one processor • Inter-operator parallelism • A query runs on multiple processors • Each operator runs on one processor
Parallel Query Evaluation • Intra-operator parallelism • A single operator runs on multiple processors • Requires data partitioning • More scalable than the other methods • Main focus in Gamma
Horizontal Data Partitioning • Relation R is split into P chunks R0, ..., RP-1, stored at the P nodes • Round robin: tuple ti goes to chunk (i mod P) • Hash based on attribute A: tuple t goes to chunk h(t.A) mod P • Range based on attribute A: tuple t goes to chunk i if vi-1 < t.A < vi
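A minimal sketch of the three schemes, assuming tuples are Python dicts keyed by attribute name; the function names and the boundary convention are illustrative, not Gamma's:
```python
# Illustrative sketch of the three partitioning schemes (helper names are mine, not Gamma's).
def round_robin_chunk(i, P):
    """The i-th tuple in arrival order goes to chunk i mod P."""
    return i % P

def hash_chunk(t, attr, P):
    """Tuple t goes to chunk h(t.A) mod P."""
    return hash(t[attr]) % P

def range_chunk(t, attr, boundaries):
    """boundaries = [v1, ..., v(P-1)], sorted; tuple t goes to the first chunk
    whose upper bound exceeds t.A (the last chunk catches the rest)."""
    for i, v in enumerate(boundaries):
        if t[attr] < v:
            return i
    return len(boundaries)

# Example: range-partition on attribute "A" into P = 3 chunks
print(range_chunk({"A": 37}, "A", [25, 75]))   # -> 1
```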
Parallel Selection: σA=v(R), σv1<A<v2(R) • Cost is B(R) in a centralized RDBMS • Round robin: all servers do the work • Parallel time = B(R)/P; total work = B(R) • Good load balance, but needs to read all the data • Hash based: • σA=v(R): parallel time = total work = B(R)/P • σv1<A<v2(R): parallel time = B(R)/P; total work = B(R) • Range based: • Parallel time = total work = B(R) • Works well for range predicates but suffers from data skew
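A quick worked example with illustrative numbers (not from the slides): let B(R) = 1,000 blocks and P = 10 servers. • Round robin: every server scans ~100 blocks, so parallel time ≈ 100 but total work = 1,000, for either predicate. • Hash on A, σA=v(R): only the server holding h(v) scans its ~100 blocks, so parallel time ≈ total work ≈ 100. • Hash on A, σv1<A<v2(R): qualifying values hash everywhere, so all 10 servers scan: parallel time ≈ 100, total work = 1,000. • Range on A: in the worst (skewed) case one server holds essentially all the relevant data, so parallel time can approach 1,000.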
Horizontal Data Partitioning • Round robin • query: cannot direct a query to specific chunks • load: uniform distribution • Hash based on attribute A • query: can direct equality predicates, e.g., equijoin • load: roughly randomized by the hash function • Range based on attribute A • query: range queries, equijoin, group by • load: depends on the data distribution and the query's range of interest
Indexing • Similar to single machine • Indexes are created at all sites • Primary index records where a tuple resides
Query Execution • Query manager • parse and optimize query, generate operator tree. • Send to site (if a single site query), or dispatcher • Dispatcher • give query to a scheduler (simple load balancing) • Scheduler • pass pieces to operator processes at sites • Site query processor with query processes • results sent through scheduler to query manager.
Selection • σ(R) = σ(R1) ∪ … ∪ σ(Rn) • Initiate a selection operator at each relevant site • If the predicate is on the partitioning attribute (range or hash): send the operator only to the overlapping sites • Otherwise send it to all sites
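A small routing sketch for the range-partitioned case; the interval representation and the function name are assumptions for illustration:
```python
# Which sites must run sigma_{v1 < A < v2}(R)?  Illustrative sketch, assuming
# range partitioning on A with a known (low, high) interval stored at each site.
def sites_for_range(v1, v2, site_ranges):
    """Return the sites whose interval overlaps the open predicate range (v1, v2).
    With no partitioning information on A, every site must be contacted instead."""
    return [i for i, (low, high) in enumerate(site_ranges) if low < v2 and v1 < high]

# Example: 4 sites covering [0,25), [25,50), [50,75), [75,100)
site_ranges = [(0, 25), (25, 50), (50, 75), (75, 100)]
print(sites_for_range(30, 60, site_ranges))    # -> [1, 2]
```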
Hash-join: centralized • Partition relations R and S: R tuples in bucket i will only match S tuples in bucket i. • Read in a partition of R; scan the matching partition of S, searching for matches.
[Figure: the partitioning phase hashes R and S into M-1 partitions on disk using M main-memory buffers; the joining phase reads each bucket Ri (< M-1 pages) into memory and probes it with the matching Si, writing the join result.]
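A compact Python sketch of the two phases; the bucket count, dict-based tuples, and list-based "disk" buckets are simplifications for illustration, not Gamma's implementation:
```python
from collections import defaultdict

def hash_join(R, S, key_r, key_s, n_buckets=8):
    """Phase 1: partition both relations with the same hash function.
    Phase 2: for each bucket i, build a hash table on R_i and probe it with S_i."""
    r_buckets, s_buckets = defaultdict(list), defaultdict(list)
    for r in R:                                    # partition R
        r_buckets[hash(r[key_r]) % n_buckets].append(r)
    for s in S:                                    # partition S with the same hash
        s_buckets[hash(s[key_s]) % n_buckets].append(s)
    for i in range(n_buckets):                     # bucket i of R only matches bucket i of S
        table = defaultdict(list)                  # in-memory hash table on R_i
        for r in r_buckets[i]:
            table[r[key_r]].append(r)
        for s in s_buckets[i]:                     # probe with the matching S_i
            for r in table.get(s[key_s], ()):
                yield {**r, **s}

# Example
R = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
S = [{"id": 2, "y": "c"}]
print(list(hash_join(R, S, "id", "id")))           # -> [{'id': 2, 'x': 'b', 'y': 'c'}]
```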
Hybrid Hash Join Algorithm • Partition R into k buckets: t buckets R1, …, Rt stay in memory; k-t buckets Rt+1, …, Rk go to disk • Partition S into k buckets • The first t buckets join immediately with R • The remaining k-t buckets go to disk • Finally, join the k-t pairs of buckets: (Rt+1, St+1), (Rt+2, St+2), …, (Rk, Sk) • Obvious choice: t = 1 (Gamma) • Saves some I/O
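A sketch of the t = 1 variant in the same style as the centralized sketch above; all names and the list-based "disk" are illustrative:
```python
def hybrid_hash_join(R, S, key_r, key_s, k=8):
    """t = 1 (Gamma's choice): bucket 0 never touches disk; bucket-0 S tuples
    are joined immediately, the other k-1 bucket pairs are joined afterwards."""
    in_mem = {}                                    # hash table on R's bucket 0
    spilled_r = [[] for _ in range(k)]
    for r in R:                                    # partition R
        b = hash(r[key_r]) % k
        if b == 0:
            in_mem.setdefault(r[key_r], []).append(r)
        else:
            spilled_r[b].append(r)                 # R_1 .. R_{k-1} go to "disk"
    spilled_s = [[] for _ in range(k)]
    for s in S:                                    # partition S; bucket 0 joins immediately
        b = hash(s[key_s]) % k
        if b == 0:
            for r in in_mem.get(s[key_s], ()):
                yield {**r, **s}
        else:
            spilled_s[b].append(s)
    for b in range(1, k):                          # finally, join the k-1 spilled pairs
        table = {}
        for r in spilled_r[b]:
            table.setdefault(r[key_r], []).append(r)
        for s in spilled_s[b]:
            for r in table.get(s[key_s], ()):
                yield {**r, **s}
```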
Parallel Hybrid Hash-Join
[Figure: the building relation R, stored across K disk sites, is split by a partitioning split table into buckets R1, …, RN; each bucket's tuples are routed through a joining split table to the M joining processors (probing with S comes later).]
• Partitioning split table: chosen so that each bucket Ri fits into the memory of the M joining processors
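Roughly, a split table maps join-attribute values to destination processors so that matching R and S tuples end up at the same joining processor. A minimal sketch; the names are mine, and Gamma's split tables are routing tables handed to operator processes rather than closures:
```python
# Illustrative split table: a hash of the join attribute decides which of the
# M joining processors receives a tuple, so matching R and S tuples co-locate.
def make_split_table(m_processors):
    return lambda join_value: hash(join_value) % m_processors

route = make_split_table(m_processors=4)
# A building tuple of R and a probing tuple of S with the same join value
# are always routed to the same joining processor:
assert route("cust_42") == route("cust_42")
print(route("cust_42"))
```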
Parallel Join: sort-merge join? • Read and sort R and S into runs of M blocks • Merge the runs of R and S
[Figure: external sort-merge: pass 1 reads R and S and writes sorted runs of M blocks to disk; pass 2 merges the runs using M input buffers and one output buffer in main memory.]
Aggregate Operations • Aggregate functions: Count, Sum, Avg, Max, Min, MaxN, MinN, Median • select Sum(sales) from Sales group by timeID • Each site computes its piece in parallel • Final results are combined at a single site • Example: Average(R) • What should each Ri return? • How do we combine the pieces? • Can we always compute aggregates "piecewise"?
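For Average, a site cannot just return its local average; it should return the pair (sum, count) and let the combining site finish. A minimal sketch (function names are mine):
```python
def local_avg_piece(values):
    """Run at each site Ri: return the partial aggregate (sum, count) for Average."""
    return (sum(values), len(values))

def combine_avg(pieces):
    """Run at the single combining site."""
    total = sum(s for s, _ in pieces)
    count = sum(c for _, c in pieces)
    return total / count

# Example: two sites holding disjoint partitions of R
print(combine_avg([local_avg_piece([10, 20]), local_avg_piece([30])]))   # -> 20.0
```
Median is the usual counterexample to "piecewise" computation: a constant-size piece per site is not enough, unlike Sum, Count, Avg, Max, and Min.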
Updates • Updates: replace, delete, append tuples • Like centralized RDBMS • May require moving a tuple • when?
Performance Results • Almost linear speedup and constant scaleup! • Close to perfect, which is almost expected: • little startup overhead • no interference among disjoint data/operations • major operations (equijoin) are insensitive to data skew
Research Issues Missing • Query optimization for parallel execution • Load balancing • Skew handling
What You Should Know • Performance metrics: speedup vs. scaleup • Methods of parallelism in RDBMSs • Methods of partitioning • Parallel query execution • Dealing with sub-system failures
Lessons • Scalability and performance both count! • Incomplete but working system prototypes are useful! • Gamma supports only equi-join