
Distributed and Parallel Databases


donagh

Presentation Transcript


  1. Distributed and Parallel Databases

  2. Distributed Databases • Distributed Systems goal: • to offer local DB autonomy at geographically distributed locations • Multiple CPUs – each has a DBMS, but the data is distributed • Loosely coupled • homogeneous - same DBMS at every site • heterogeneous - different DBMSs - need ODBC, standard SQL

  3. Advantages of DDBs • distributed nature of some DB applications (e.g., bank branches) • increased reliability and availability when a site fails, especially if data is replicated at more than one site • data sharing but also local control • improved performance - smaller DBs exist at each site • easier expansion

  4. Client-Server • Client-Server (b) in figure • Client sends a request for service to the server (strict, fixed roles) • 3-tier architecture • Presentation tier • Logic tier • Data tier

  5. Distributed DBSs (DDBS) • Distributed DB (c) in figure • WAN • Multiple CPUs – each has a DBMS, but the data is distributed • lower communication rates • Heterogeneous machines • Homogeneous DDBS • same DBMS at every site • Heterogeneous DDBS • different DBMSs - need ODBC, standard SQL

  6. Heterogeneous distributed DBSs (HDDBs) • Data is distributed and each site has its own DBMS, e.g., Oracle at one site, DB2 at another • need ODBC, standard SQL • usually a transaction manager is responsible for cooperation among sites • must coordinate distributed transactions • need data conversion and the ability to access data at other sites

  7. P2P • Every site can act as a server that stores part of the DB and as a client that requests service

  8. Federated DB - FDBS • a federated DB is a multidatabase whose component DBs are autonomous (a) in figure • collection of cooperating DBSs that are heterogeneous • preexisting DBs form a new database • Each DB specifies an import/export schema (view) • keeps a partial view of the total schema • Each DB has its own local users, local transparency and DBA • appears centralized to local autonomous users • appears distributed to global users

  9. DDBS • Issues in DDBSs are covered in the slides that follow

  10. Replication • Full vs. partial replication • Which copy to access • Improves performance for global queries, but makes updates a problem • Ensure consistency of replicated copies of data
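The tension between fast reads and awkward updates can be sketched with a read-one/write-all (ROWA) policy. The slides do not name a replication protocol; ROWA is assumed here as one simple, common choice, and the class and method names are hypothetical. Any single copy can answer a read, but a write must reach every copy, so one unavailable site blocks updates.

```python
class ReplicatedItem:
    """Hypothetical sketch: one logical data item copied at several sites,
    kept consistent with a read-one/write-all (ROWA) policy (assumed)."""

    def __init__(self, sites, value):
        # every site starts with an identical copy
        self.copies = {site: value for site in sites}
        self.up = set(sites)  # sites currently reachable

    def read(self, preferred_site):
        # a read needs only ONE available copy, ideally the local one
        if preferred_site in self.up:
            return self.copies[preferred_site]
        available = sorted(self.up)
        if not available:
            raise RuntimeError("no copy available")
        return self.copies[available[0]]

    def write(self, value):
        # a write must update EVERY copy, otherwise the copies diverge;
        # under strict ROWA it is refused while any replica site is down
        if self.up != set(self.copies):
            raise RuntimeError("cannot update: a replica site is unavailable")
        for site in self.copies:
            self.copies[site] = value

    def fail(self, site):
        # simulate a site failure
        self.up.discard(site)
```

After `fail("s3")`, reads still succeed from another copy, but `write` raises, which is the update problem the slide points to.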

  11. Data fragments • Can store a whole relation at one site, or • Data fragments • logical units of the DB assigned for storage at various sites • horizontal fragmentation - subset of the tuples in the relation (select) • vertical fragmentation - keeps only certain attributes of the relation (project); each fragment needs the PK
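The two kinds of fragment correspond to the relational select and project operations. A minimal sketch, assuming a relation is a list of Python dicts; the EMPLOYEE sample data and the attribute names `ssn`, `name`, `salary`, `dno` are hypothetical.

```python
# Hypothetical sample EMPLOYEE relation (one dict per tuple).
EMPLOYEE = [
    {"ssn": "111", "name": "Smith", "salary": 3000, "dno": 4},
    {"ssn": "222", "name": "Wong", "salary": 4000, "dno": 5},
    {"ssn": "333", "name": "Zelaya", "salary": 2500, "dno": 4},
]

def horizontal_fragment(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def vertical_fragment(relation, attributes, pk="ssn"):
    """Projection: keep only some attributes, always including the PK
    so the vertical fragments can later be joined back together."""
    keep = [pk] + [a for a in attributes if a != pk]
    return [{a: t[a] for a in keep} for t in relation]
```

For example, `horizontal_fragment(EMPLOYEE, lambda t: t["dno"] == 4)` yields the Smith and Zelaya tuples, and `vertical_fragment(EMPLOYEE, ["salary"])` yields `ssn`/`salary` pairs.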

  12. Fragments cont’d • Horizontal fragments: • disjoint - each tuple is a member of only one fragment (e.g., fragment condition salary < 5000 and dno = 4) • complete - the fragment conditions together include every tuple • Complete vertical fragmentation with attribute lists $L_1, \dots, L_n$: $L_1 \cup L_2 \cup \dots \cup L_n = \mathrm{attrs}(R)$ and $L_i \cap L_j = PK(R)$ for $i \neq j$
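The disjointness and completeness conditions can be checked mechanically. The sketch below is hypothetical (sample data and function names are illustrative). It checks horizontal conditions against one relation instance rather than proving them for all possible tuples. It also checks the vertical conditions on the attribute lists and shows that a natural join on the PK rebuilds the relation.

```python
# Hypothetical sample EMPLOYEE relation.
EMPLOYEE = [
    {"ssn": "111", "name": "Smith", "salary": 3000, "dno": 4},
    {"ssn": "222", "name": "Wong", "salary": 4000, "dno": 5},
    {"ssn": "333", "name": "Zelaya", "salary": 6000, "dno": 4},
]

def is_complete(relation, predicates):
    """Every tuple satisfies at least one fragment condition."""
    return all(any(p(t) for p in predicates) for t in relation)

def is_disjoint(relation, predicates):
    """Every tuple satisfies at most one fragment condition."""
    return all(sum(1 for p in predicates if p(t)) <= 1 for t in relation)

def is_complete_vertical(attr_lists, all_attrs, pk):
    """Union of all Li equals attrs(R), and Li & Lj == {PK} for i != j."""
    union = set().union(*attr_lists)
    pairwise = all(
        set(a) & set(b) == {pk}
        for i, a in enumerate(attr_lists)
        for b in attr_lists[i + 1:]
    )
    return union == set(all_attrs) and pairwise

def project(relation, attrs):
    """Build one vertical fragment."""
    return [{a: t[a] for a in attrs} for t in relation]

def join_on_pk(left, right, pk):
    """Natural join of two vertical fragments on the shared primary key."""
    by_key = {t[pk]: t for t in right}
    return [{**l, **by_key[l[pk]]} for l in left if l[pk] in by_key]
```

With the conditions `salary < 5000 and dno = 4` and its negation, the fragments are both disjoint and complete. By contrast, `dno = 4` and `salary < 5000` overlap on Smith. The PK in every vertical fragment is what makes the join back to $R$ possible.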

  13. Example replication/fragmentation • Example of fragments for company DB: • site 1 - company headquarters gets the entire DB • sites 2, 3 - horizontal fragments based on dept. no.
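The company example mixes replication and fragmentation. Every tuple stored at site 2 or 3 also has a copy at headquarters. A hypothetical allocation schema is sketched below. The slide does not say which department goes to which site, so dno 5 for site 2 and dno 4 for site 3 are assumptions, as is the sample data.

```python
# Hypothetical sample EMPLOYEE tuples.
EMPLOYEE = [
    {"ssn": "111", "name": "Smith", "dno": 5},
    {"ssn": "222", "name": "Wallace", "dno": 4},
    {"ssn": "333", "name": "Borg", "dno": 1},
]

# Hypothetical allocation schema: site -> condition for the tuples it stores.
ALLOCATION = {
    "site1": lambda t: True,           # headquarters: entire relation
    "site2": lambda t: t["dno"] == 5,  # assumed department for site 2
    "site3": lambda t: t["dno"] == 4,  # assumed department for site 3
}

def allocate(relation, allocation):
    """Return the tuples each site stores under the allocation schema."""
    return {site: [t for t in relation if stores(t)]
            for site, stores in allocation.items()}

def copies_of(tuple_, allocation):
    """Sites holding a copy of one tuple; more than one means replicated."""
    return sorted(site for site, stores in allocation.items() if stores(tuple_))
```

A department-4 employee is stored at both `site1` and `site3`, while a headquarters employee exists only at `site1`. A catalog like this is also what the system must consult to "keep track of data and replication".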

  14. Increased complexity Additional functions needed: • global vs. local queries • keep track of data and replication • execution strategies if data at > 1 site • which copy to access • maintain consistency of copies

  15. To process a query • Must use a data dictionary that includes info on data distribution among servers • Ensure atomicity • Parse the user query • decompose it into independent site queries • send each site query to the appropriate server site • each site processes its local query and sends the result to the result site • the result site combines the results of the subqueries
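The read path of these steps can be sketched for a simple select-project query over a horizontally fragmented relation. This is a hypothetical in-process simulation: `SITE_DATA`, `DATA_DICTIONARY` and the function names are illustrative, and it does not cover atomicity, which needs a distributed commit protocol.

```python
# Hypothetical sites, each holding one disjoint horizontal fragment of EMPLOYEE.
SITE_DATA = {
    "site2": [{"ssn": "222", "name": "Wong", "salary": 4000, "dno": 5}],
    "site3": [
        {"ssn": "111", "name": "Smith", "salary": 3000, "dno": 4},
        {"ssn": "333", "name": "Zelaya", "salary": 2500, "dno": 4},
    ],
}

# Hypothetical data dictionary: relation name -> sites holding its fragments.
DATA_DICTIONARY = {"EMPLOYEE": ["site2", "site3"]}

def run_site_query(site, predicate, attrs):
    """Executed at a server site: local selection and projection."""
    return [{a: t[a] for a in attrs} for t in SITE_DATA[site] if predicate(t)]

def process_query(relation, predicate, attrs):
    """Decompose a select-project query into one site query per fragment."""
    partial_results = []
    # consult the data dictionary, then send each site its own subquery
    for site in DATA_DICTIONARY[relation]:
        partial_results.append(run_site_query(site, predicate, attrs))
    # result site: the fragments are disjoint, so a plain union is the answer
    return [row for part in partial_results for row in part]
```

`process_query("EMPLOYEE", lambda t: t["salary"] > 2800, ["name"])` returns Wong from site 2 and Smith from site 3. If the fragments overlapped, the result site would also have to remove duplicates.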

  16. Architectures • Distributed Systems goal:  to offer local DB autonomy at geographically distributed locations versus • Parallel Systems goal:  to construct a faster centralized computer • Improve performance through parallelization • Distribution of data governed by performance • Processing, I/O simultaneously

  17. Parallel DBSs • Shared-memory multiprocessor • ideally N times as much work with N CPUs accessing the same memory • MIMD, SIMD - equal access to same data, massively parallel • Parallel shared nothing • data split among sites; each has its own CPU and memory, the work for transactions is divided among them, and they communicate over high-speed networks • LANs - homogeneous machines • CPU + memory - called a site
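In a shared-nothing system, the data is split so that each site owns a disjoint part. The slide does not fix a partitioning rule. Hash partitioning on an integer key, with the simple hash $h(k) = k \bmod N$, is assumed below as one common choice. The function names are hypothetical.

```python
def hash_partition(relation, key, n_sites):
    """Assign each tuple to exactly one site using h(k) = k mod n_sites."""
    sites = [[] for _ in range(n_sites)]
    for t in relation:
        sites[t[key] % n_sites].append(t)
    return sites

def site_for(key_value, n_sites):
    """An equality lookup on the partitioning key needs only one site."""
    return key_value % n_sites
```

Partitioning tuples with ids 1 to 7 over 3 sites gives fragments of sizes 2, 3 and 2, with no tuple stored twice. A lookup on `id = 4` goes only to `site_for(4, 3) == 1`, while a query on any other attribute must visit every site.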

  18. Query Parallelism • Decompose a query into parts that can be executed in parallel at several sites • Intra-query parallelism • If shared nothing & horizontally fragmented: SELECT name, phone FROM account WHERE age > 65 • Decompose into K different queries • The result site accepts all partial results and puts them together (ORDER BY, COUNT) • What if the query is a join and the tables are fragmented?
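The slide's query can be sketched as K concurrent subqueries whose results are combined at a result site. The fragments and phone numbers below are hypothetical sample data. Threads from `concurrent.futures` stand in for sites, so this shows the structure of intra-query parallelism and gives no real CPU speedup in CPython.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ACCOUNT relation, horizontally fragmented over K = 3 sites.
FRAGMENTS = [
    [{"name": "Ann", "phone": "555-0101", "age": 70},
     {"name": "Bob", "phone": "555-0102", "age": 40}],
    [{"name": "Cid", "phone": "555-0103", "age": 66}],
    [{"name": "Dee", "phone": "555-0104", "age": 81},
     {"name": "Eve", "phone": "555-0105", "age": 30}],
]

def local_query(fragment):
    """SELECT name, phone FROM account WHERE age > 65, on one fragment only."""
    return [(t["name"], t["phone"]) for t in fragment if t["age"] > 65]

def parallel_select(fragments):
    # one subquery per site, run concurrently (threads simulate the sites)
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(local_query, fragments))
    # result site: union the partial results, then ORDER BY name and COUNT
    rows = sorted(row for part in partials for row in part)
    return rows, len(rows)
```

ORDER BY cannot be finished at any single site, so the result site sorts the union. A COUNT alone could instead be computed as per-site counts that are summed. Joins are harder, because matching tuples may sit in fragments at different sites.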

  19. Other issues • Distributed concurrency control using locking • New models • Cloud computing
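The slide only names locking for distributed concurrency control. One simple scheme is the primary-site technique, where a single site runs the lock manager for all data items, wherever they are stored. That scheme is assumed in the sketch below. The class, its methods and the shared ("S") / exclusive ("X") mode names are hypothetical.

```python
class PrimarySiteLockManager:
    """Hypothetical sketch: every site sends its lock requests to one
    primary site, which keeps the lock table for all data items."""

    def __init__(self):
        self.locks = {}  # item -> (mode, set of transactions holding it)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            # shared locks are compatible with each other
            holders.add(txn)
            return True
        if holders == {txn}:
            # the sole holder may keep its lock or upgrade S -> X
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[item] = (new_mode, holders)
            return True
        return False  # conflict with another transaction

    def release(self, txn, item):
        _, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]
```

Centralizing the lock table keeps the logic close to a single-site DBMS. The cost is that the primary site becomes a bottleneck and a single point of failure.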
