Parallel Processing with Autonomous Databases in a Cluster System
Stéphane Gançarski (1), Hubert Naacke (1), Esther Pacitti (2), Patrick Valduriez (3)
(1) LIP6, University Paris 6, Paris, France, FirstName.LastName@lip6.fr
(2) IRIN, Nantes, France, Esther.Pacitti@irin.univ-nantes.fr
(3) IRIN, Nantes, France, Patrick.Valduriez@inria.fr
ASP (Application Service Provider) context: user sites connect over TCP/IP to the provider's cluster of PCs; each cluster node hosts applications (app1, app2, ...) on top of a DBMS and its database (DB).
Potential benefits • For the users • No system administration • High availability • Security • For the provider • Centralized management of apps and databases • Use of a cluster => reduced cost • Economy of scale • new services used by every app.
Challenge for the ASP • Exploit the cluster architecture • To obtain good performance/cost through parallelism • Apps can be update-intensive (≠ search engine) • Without hurting app and database autonomy • Apps and databases should remain unchanged These are conflicting objectives
Solutions readily available • Solution 1: TP monitor • Replicate databases at multiple nodes to increase parallelism • Needs to interface applications to TP monitor • Solution 2: Parallel DBMS (shared disk, shared cache or shared nothing) • Requires heavy migration • Hurts database autonomy
Our (optimistic) approach • Trade consistency for performance • Capture apps profile and consistency requirements • Replicate apps and databases at multiple nodes • Without any change • Use consistency requirements to perform load balancing • Detect and repair inconsistencies • Using database logs
Outline • Cluster architecture • Replication model • Transaction model • Execution model • Future work
Conceptual architecture: a request (user, app) is first authenticated and authorized against the Directory, then a connection is opened to app 1, which in turn connects to the DBMS and queries the DB.
Cluster architecture (components): requests come from the Internet to an application load balancer and the apps (app 1 … app n); a transaction load balancer, using the Directory, routes their transactions; a preventive replication manager and a conflicts manager maintain the replicas across the DBMS nodes, each holding its own copy of the DB.
Outline • Cluster architecture • Replication model • Transaction model • Execution model • Future work
Symmetric replication: the EMP table is replicated at two master nodes, each client can read or update its local master, and updates are replicated between the masters • Increases performance and availability • But may introduce inconsistencies
Update propagation to replicas • Synchronous: all replicas are updated within the same transaction (2PC) • Replicas are always consistent • But does not scale up • Asynchronous: each replica is updated (refreshed) in a separate transaction; we support 2 variants: • Preventive (new solution): transactions are slightly delayed and ordered by their timestamps so that no conflicts occur (a sketch follows below) • Optimistic: most efficient, but can create conflicts to resolve and divergence to control
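To make the preventive variant concrete, here is a minimal Python sketch (hypothetical names, not the actual LIP6 implementation): refresh transactions are buffered, delayed by an assumed bound on network delay and clock drift, and applied at each replica in global timestamp order, so every replica applies the same sequence and conflicts cannot arise.

import heapq
import itertools
import time

MAX_DELAY = 0.2  # assumed upper bound on network delay + clock drift (seconds)

class PreventiveRefresher:
    """Buffers incoming refresh transactions and applies them in global
    timestamp order once the delay bound has elapsed."""

    def __init__(self, apply_fn):
        self.apply_fn = apply_fn   # callback that runs a refresh transaction locally
        self.pending = []          # min-heap keyed by (timestamp, origin, seq)
        self._seq = itertools.count()

    def receive(self, timestamp, origin_node, refresh_txn):
        # Buffer the transaction instead of applying it right away:
        # the global ordering, not eagerness, is what prevents conflicts.
        heapq.heappush(self.pending, (timestamp, origin_node, next(self._seq), refresh_txn))

    def tick(self):
        # Apply every buffered transaction whose delay bound has expired,
        # strictly in timestamp order, so all replicas apply the same sequence.
        now = time.time()
        while self.pending and self.pending[0][0] + MAX_DELAY <= now:
            _, _, _, txn = heapq.heappop(self.pending)
            self.apply_fn(txn)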
Outline • Cluster architecture • Replication model • Transaction model • Execution model • Future work
Example, Case 1: T1 and T2 run at different nodes • T1 and T2 are data independent or commutative • T1's changes are sent to N2 • T2's changes are sent to N1
Example, Case 2: T1 and T2 perform conflicting updates at different nodes • Conflict prevention • Conflict detection and resolution • Priority-based • Resolution mode: dirty read from Q1, then abort or compensation
Execution rules • Request profile (for any query or transaction) • stored procedure + parameter values • user id, priority, access control rules • Transaction profile • conflict class : data it may read or write • compatibility with other trans. (disjoint or commutative) • Integrity constraints • Max-table-change : {(Rel, max-#tuple)} • Max-tuple-change : {(Rel, {(att, max-value)})} • Query requirements • precision level, tolerated divergence, …
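A minimal sketch, assuming hypothetical names rather than the authors' actual schema, of how the profiles and rules above could be represented; the compatibility test captures the rule that transactions with disjoint conflict classes, or declared commutative, may run on different replicas without conflict.

from dataclasses import dataclass, field

@dataclass
class TransactionProfile:
    name: str
    conflict_class: set        # relations (or fragments) the transaction may read or write
    commutes_with: set = field(default_factory=set)   # transactions it is declared commutative with

@dataclass
class IntegrityBounds:
    max_table_change: dict     # {relation: max number of tuples that may change}
    max_tuple_change: dict     # {relation: {attribute: max value change}}

def compatible(t1: TransactionProfile, t2: TransactionProfile) -> bool:
    """T1 and T2 may run on different replicas without conflict if their
    conflict classes are disjoint or they are declared commutative."""
    disjoint = not (t1.conflict_class & t2.conflict_class)
    commutative = t2.name in t1.commutes_with or t1.name in t2.commutes_with
    return disjoint or commutative

# Example: Decr transactions on Stock commute with each other, but not with Q
decr = TransactionProfile("Decr", {"Stock"}, commutes_with={"Decr"})
q = TransactionProfile("Q", {"Stock"})
assert compatible(decr, decr) and not compatible(decr, q)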
Transaction processing: a transaction arriving from the app, together with the execution rules, is used to generate a run-time policy (the transaction execution plan, choosing preventive or optimistic replication); the transaction is then executed taking the current data placement and load into account.
Example: relation Stock(item, quantity, threshold)
• Decrease item id by q units, procedure Decr(id, q):
  UPDATE Stock SET quantity = quantity - q WHERE item = id;
• How many items to renew? query Q:
  SELECT count(item) FROM Stock WHERE quantity < threshold;
• Both nodes start with Stock[1, 30, 10]
• (1) Decr(1, 15) runs at node 1 (Stock becomes [1, 15, 10]) while Decr(1, 10) runs at node 2 (Stock becomes [1, 20, 10]): commutative updates, processed in parallel
• (2) Before synchronization, Q returns n - 1 at either node: acceptable if the query tolerates imprecision
• (3) After synchronization, both nodes hold [1, 5, 10] and Q returns n: required for a query with 100% precision
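The divergence in this example can be illustrated with a small simulation (illustrative only, not the actual system): two replicas apply the commutative decrements independently, the renewal count is off by one item before synchronization, and correct afterwards.

THRESHOLD = 10

def items_to_renew(stock):
    # query Q: count items whose quantity has fallen below the threshold
    return sum(1 for qty in stock.values() if qty < THRESHOLD)

node1 = {1: 30}   # item -> quantity, replicas start out identical
node2 = {1: 30}

node1[1] -= 15    # Decr(1, 15) executed at node 1
node2[1] -= 10    # Decr(1, 10) executed at node 2

print(items_to_renew(node1), items_to_renew(node2))   # 0 0: item 1 not yet counted (Q = n - 1)

node1[1] -= 10    # synchronization: each node applies the other's update
node2[1] -= 15    # (safe because the decrements commute)

print(items_to_renew(node1), items_to_renew(node2))   # 1 1: item 1 now below threshold (Q = n)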
Outline • Cluster architecture • Replication model • Transaction model • Execution model • Future work
Execution model • Problem statement: given the cluster’s load and data placement, and transaction T’s execution plan, find the optimal node to run T • Cost function includes the cost of synchronizing replicas • Step 1: select data access method and replication mode (preventive or optimistic) • Step 2: select best node among those supporting the access method selected at step 1 and run T
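A minimal sketch of these two steps, with assumed data structures rather than the authors' API: step 1 fixes the replication mode allowed by the transaction's plan, step 2 ranks the nodes supporting it by an estimated cost that includes replica synchronization.

def choose_node(txn, nodes):
    """txn: {'strong_consistency': bool, 'relations': set, 'exec_time': float};
    nodes: [{'modes': set, 'relations': set, 'load': float, 'divergence': dict}, ...]"""
    # Step 1: data access method / replication mode allowed by the plan.
    mode = "preventive" if txn["strong_consistency"] else "optimistic"

    # Step 2: among nodes supporting that mode and holding the data,
    # pick the one with the lowest estimated cost.
    candidates = [n for n in nodes
                  if mode in n["modes"] and txn["relations"] <= n["relations"]]
    return min(candidates, key=lambda n: estimated_cost(txn, n))

def estimated_cost(txn, node):
    # Cost of the refresh work needed before txn can run on this node,
    # plus its normalized execution time weighted by the node's load.
    sync = sum(node["divergence"].get(rel, 0) for rel in txn["relations"])
    return sync + txn["exec_time"] * (1.0 + node["load"])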
Load balancing with optimistic replication • Choice of the node is based on • data placement • node consistency: {(Rel, Δ#tuple_max, {(att, Δvalue_max)})} • synchronization cost to meet consistency requirements: apply refresh transactions such that the node's consistency after applying them satisfies the requirements • transaction execution cost: normalized estimated response time • node load: (load-avg, {(running T's, elapsed-time)})
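The node consistency state above could be tracked roughly as follows (field names are assumptions): a request may run on a node without synchronization only if the node's divergence stays within the request's tolerated divergence; otherwise the relations to refresh determine the synchronization cost used in the previous sketch.

from dataclasses import dataclass, field

@dataclass
class NodeConsistency:
    tuple_div: dict = field(default_factory=dict)   # Rel -> number of tuples changed but not yet propagated
    value_div: dict = field(default_factory=dict)   # Rel -> {att: accumulated value change not yet propagated}

    def meets(self, tolerated):
        # tolerated: {Rel: max divergence the request accepts}
        return all(self.tuple_div.get(rel, 0) <= bound for rel, bound in tolerated.items())

    def refreshes_needed(self, tolerated):
        # Relations that must be refreshed before the request may run here;
        # their number (or size) drives the synchronization cost.
        return [rel for rel, bound in tolerated.items()
                if self.tuple_div.get(rel, 0) > bound]

A query requiring 100% precision sets every tolerated bound to 0, forcing a refresh of all diverged relations it reads, which corresponds to the Q-after-synchronization case in the Stock example.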
Execution example: transactions T1, T2 and queries Q1, Q2 must be routed over two initially consistent nodes N1 and N2 • T1 executes on one node, leaving the other with imprecision = 1 and T1 as the transaction to synchronize • T2 executes on the other node, which symmetrically has imprecision = 1 and T2 to synchronize • Q1 is then routed to one of the nodes • finally Q2 is routed, along with the synchronization of T1
Experiments • Implementation • LIP6 cluster (Oracle8i/Linux) • benchmarking with TPC-C: 500 MB to 2 GB • Interconnection network: 1 GB/s • 5 nodes • Objectives • measure benefit on transaction response time • measure benefit on load balancing for transactions with low consistency requirements
Validation for hot spot load • Incoming load with periodic hot spot: • 10 simultaneous transaction requests • Each request lasts T/4 within each period of length T
Hot spot load: results • X: number of nodes, from 1 to 4 • Y: avg response time during hot spot • Benefit on response time: factor of 2 (with 4 nodes) • Even better • if sync starts earlier: improve low-load detection (hot spot end) • if sync is faster than the original trans: use the log to get the update set of a trans.
Future work • Validation by simulation up to 64 nodes • measure scale-up • measure directory access contention • Implement divergence control • capture user & transaction profile (semi-automatic) • generate execution rules (by inference or statistics) • improve node precision (n dimensions) • Implement conflicts resolution/detection