DDBMS
• Distributed Database Management Systems
• Fragmentation
• Data Allocation
Distributed Database
• Distributed Database: a collection of multiple, logically interrelated databases distributed over a computer network
• Distributed Database Management System: a software system that supports the transparent creation, access, and manipulation of interrelated data located at different sites of a computer network
Centralized vs Distributed
• Differences between the two approaches:
  • Centralized: achieves efficiency through local optimization, using complex physical data structures
  • Distributed: achieves efficiency through global optimization of processing, including the cost of network communication and local processing
DDBMS Benefits
• Transparency
  • Separation of high-level semantics from low-level implementation issues
  • Extension of the data independence concept in centralized databases
• Basic Concepts
  • Fragmentation
  • Allocation
  • Replication
Transparency (layers, from outermost to innermost)
• Language Transparency
• Fragmentation Transparency
• Replication Transparency
• Network Transparency
• Data Independence
• Data
Transparency
• Network
  • Protects the user from the operational details of the network
  • Users do not have to specify where the data is located
  • Location transparency; naming transparency
• Replication
  • Replicas (copies) are created for performance and reliability reasons
  • Replication causes difficult update problems
  • Users should be made unaware of the existence of these copies
Transparency
• Fragmentation
  • Basic fragments are parts of a relation
  • Vertical: subset of columns; horizontal: subset of rows
  • Fragments are also created for performance and reliability reasons
  • Users should be made unaware of the existence of fragments
DDBMS Benefits: PERFORMANCE
• Greater throughput due to:
  • Data localization: reduces communication overhead
  • Parallelism: inter-query and intra-query parallelism
DDBMS ARCHITECTURAL ALTERNATIVES
• Autonomy: the degree to which each DBMS can operate independently (distribution of control, not data)
• Distribution: describes where the data is physically located
• Heterogeneity: indicates the uniformity of the DBMSs with respect to data (e.g., data models and query languages)
DDBMS ARCHITECTURAL ALTERNATIVES
[Figure: the alternatives plotted along three orthogonal axes — Distribution, Autonomy, and Heterogeneity]
DDBMS ARCHITECTURAL ALTERNATIVES
• Autonomy (A)
  • A0: Tight integration
  • A1: Semi-autonomous
  • A2: Total isolation
• Distribution (D)
  • D0: Non-distributed
  • D1: Client-server
  • D2: Peer-to-peer
• Heterogeneity (H)
  • H0: Homogeneous
  • H1: Heterogeneous
• 3 * 3 * 2 = 18 alternatives; some alternatives are meaningless or not practical!
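A minimal sketch of where the count of 18 comes from: the cross product of the three dimensions. The labels are just the taxonomy above, not an API.

```python
from itertools import product

# The three orthogonal dimensions from the slide above.
AUTONOMY      = ["A0 tight integration", "A1 semi-autonomous", "A2 total isolation"]
DISTRIBUTION  = ["D0 non-distributed", "D1 client-server", "D2 peer-to-peer"]
HETEROGENEITY = ["H0 homogeneous", "H1 heterogeneous"]

alternatives = list(product(AUTONOMY, DISTRIBUTION, HETEROGENEITY))
print(len(alternatives))   # 3 * 3 * 2 = 18
```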
Major DDBMS Architectures
• Client-server
• Peer-to-peer
• Multidatabase
CLIENT-SERVER
• (Ax, D1, Hy)
• Distributes the functionality between client and server to better manage the complexity of the DBMS
• Two-level architecture
CLIENT-SERVER
• Typical scenario:
  1. The client parses a query, decomposes it into independent site queries, and sends each to the appropriate server.
  2. Each server processes its local query and sends the result relation to the client.
  3. The client combines the results of all the sub-queries.
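A minimal sketch of this three-step scenario, with relations represented as Python lists of dicts. LocalServer and the predicate-style sub-queries are hypothetical stand-ins for real site servers and SQL decomposition.

```python
class LocalServer:
    def __init__(self, rows):
        self.rows = rows                       # the data stored at this site

    def execute(self, predicate):
        # Step 2: the server runs its local query and returns a result relation.
        return [r for r in self.rows if predicate(r)]

def run_query(site_queries):
    # Step 1 is assumed already done: the query has been decomposed into
    # (server, predicate) pairs, one per site.
    partials = [server.execute(pred) for server, pred in site_queries]
    # Step 3: the client combines the sub-query results (a union here).
    return [row for part in partials for row in part]

s1 = LocalServer([{"dept": "CSE", "sal": 3500}])
s2 = LocalServer([{"dept": "EE",  "sal": 2800}])
print(run_query([(s1, lambda r: r["sal"] > 3000),
                 (s2, lambda r: r["sal"] > 3000)]))
# [{'dept': 'CSE', 'sal': 3500}]
```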
PEER-TO-PEER
• (A0, D2, H0)
• Global users submit requests via an external schema (ES) defined over a global conceptual schema (GCS), which is system-maintained and is the union of all local conceptual schemas (LCSs)
• The global system has no control over local data and processing.
MULTIDATABASE
• (A2, Dx, Hy)
• The global conceptual schema (GCS) either exists as a union of only some of the local conceptual schemas (LCSs), or does not exist at all.
Three orthogonal dimensions for data distribution
• Level of sharing
• Access pattern behaviour
• Level of knowledge (of access pattern behaviour)
[Figure: the three dimensions as orthogonal axes — Level of Sharing (no sharing, data, data+program), Access Pattern Behaviour (static, dynamic), Level of Knowledge (no, partial, or complete information)]
The three orthogonal dimensions
• Level of sharing
  • No sharing: each application and its data execute at one site
  • Data sharing: programs are replicated at each site; data is not replicated, but is moved to a site as needed
  • Data+program sharing: both data and programs may be moved to a site as needed
The three orthogonal dimensions
• Access pattern behaviour
  • Static: access patterns to data do not change over time
  • Dynamic: access patterns to data change over time (how dynamic?)
• Level of knowledge (of access pattern behaviour)
  • No knowledge: not a likely scenario
  • Partial knowledge: accesses can be predicted, but users may deviate significantly
  • Complete knowledge: accesses can be predicted, with no major changes
DISTRIBUTED DATABASE DESIGN
• Top-Down Approach
  • Have a database
  • How to partition and/or replicate it across sites?
• Bottom-Up Approach
  • Have existing databases at different sites
  • How to integrate them together and deal with heterogeneity and autonomy?
DISTRIBUTED DATABASE DESIGN
• Top-Down
  • Typical approach to database design
  • Information regarding the distribution of accesses among sites and the nature of access to the database at each site needs to be gathered during requirements analysis
  • After conceptual schema design, design the local conceptual schemas by distributing the entities over the sites of the distributed system
  • The distribution activity consists of:
    • Data fragmentation: split the schema into pieces
    • Data allocation: assign the schema pieces to sites
DISTRIBUTED DATABASE DESIGN
• Bottom-Up Approach
  • Necessary when databases already exist and we need to integrate them into one database
  • Similar to view integration, but the databases may be heterogeneous
Top-Down Design
[Figure: top-down design process — Requirements Analysis produces System Requirements; View Design yields External Schema Definitions and Access Information; Conceptual Design yields the Global Conceptual Schema; Distribution Design (with User Input) produces Local Conceptual Schemas; Physical Design produces the Physical Schema; Observation and Monitoring feeds back into the process]
Reasons for Fragmentation
• A relation is not an appropriate unit of distribution:
  • Application views are usually subsets of relations
  • Application access is local to subsets of relations
  • Applications at different sites may require different parts of the same relation
    • Store once → high remote access costs
    • Duplicate → high update costs
  • Concurrency of access is more limited
Reasons for Fragmentation
• BUT, some applications may suffer from data fragmentation:
  • Their required data is located in two or more fragments
  • It may be harder to check integrity constraints
• Careful design of data fragmentation is required!
Correctness Rules for Fragmentation
• Completeness: if a relation R is decomposed into fragments R1, R2, ..., Rn, each tuple/attribute that can be found in R can also be found in one or more of the Ri's
• Reconstruction: if a relation R is decomposed into fragments FR = {R1, R2, ..., Rn}, it should be possible to define a relational operator ∇ such that R = ∇ Ri, ∀ Ri ∈ FR (union for horizontal fragmentation, join for vertical fragmentation)
• Disjointness: if a relation R is horizontally decomposed into fragments R1, R2, ..., Rn and a data item di is in Rj, it is not in any other fragment Rk (k ≠ j)
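A minimal sketch (illustrative names, not from the slides) that checks the three rules for a horizontal fragmentation, representing relations as Python sets of tuples; reconstruction here is set union.

```python
def check_horizontal_fragmentation(R, fragments):
    union = set().union(*fragments)
    complete        = R <= union   # every tuple of R appears in some fragment
    reconstructible = R == union   # R = union of the Ri
    # Pairwise disjoint iff no tuple is counted twice across fragments.
    disjoint = sum(len(f) for f in fragments) == len(union)
    return complete, reconstructible, disjoint

R = {("e1", "CSE"), ("e2", "EE")}
print(check_horizontal_fragmentation(R, [{("e1", "CSE")}, {("e2", "EE")}]))
# (True, True, True)
```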
Types of Fragmentation
• Horizontal
• Vertical
• Hybrid
HORIZONTAL DATA FRAGMENTATION
• What is it?
  • Partitions a relation along its tuples, so that each fragment contains a subset of the tuples of the relation
• Types of horizontal fragmentation
  • Primary: based on a predicate Pi that selects tuples from a relation R
  • Derived: based on the partitioning of a relation due to predicates defined on another relation (related according to foreign keys)
HORIZONTAL DATA FRAGMENTATION: Primary
• Each fragment, Ri, is a selection on a relation R using a predicate Pi: Ri = σPi(R)
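A minimal sketch of primary horizontal fragmentation, with relations as lists of dicts. The EMP data and predicates are hypothetical examples, not from the lecture.

```python
def primary_fragments(R, predicates):
    # One fragment per predicate: Ri = the selection of R by Pi.
    return [[t for t in R if p(t)] for p in predicates]

EMP = [
    {"eno": 1, "dept": "CSE", "sal": 3500},
    {"eno": 2, "dept": "EE",  "sal": 2800},
]
# The predicates should be complete and mutually exclusive so the
# correctness rules of the previous slide hold.
is_cse     = lambda t: t["dept"] == "CSE"
is_not_cse = lambda t: t["dept"] != "CSE"
EMP1, EMP2 = primary_fragments(EMP, [is_cse, is_not_cse])
```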
HORIZONTAL FRAGMENTATION: INFORMATION REQUIREMENTS
• Database Information
  • The relations in the database and the relationships between them, especially join relationships
  • [Figure: Project (owner) linked to WorksOn (member)]
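A minimal sketch of derived horizontal fragmentation over the owner/member link above: WorksOn (the member) is partitioned by semijoin with the fragments of Project (the owner), following the foreign key. The attribute names and data are hypothetical.

```python
def semijoin(member, owner_fragment, key):
    # Keep the member tuples whose foreign key matches the owner fragment.
    owner_keys = {t[key] for t in owner_fragment}
    return [t for t in member if t[key] in owner_keys]

PROJ1 = [{"pno": "P1", "city": "Ankara"}]   # a fragment of Project (owner)
PROJ2 = [{"pno": "P2", "city": "Izmir"}]
WORKS_ON = [{"eno": 1, "pno": "P1"},        # WorksOn (member)
            {"eno": 2, "pno": "P2"}]

WO1 = semijoin(WORKS_ON, PROJ1, "pno")      # follows PROJ1's partitioning
WO2 = semijoin(WORKS_ON, PROJ2, "pno")      # follows PROJ2's partitioning
```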
HORIZONTAL FRAGMENTATION: INFORMATION REQUIREMENTS
• Application Information: examine the query predicates of the most important applications (80/20 rule)
  • Simple predicates: p1: DEPT = 'CSE'; p2: SAL > 3000
  • Conjunctive (minterm) predicates: m1 = p1 AND p2, e.g., (DEPT = 'CSE') AND (SAL > 3000)
  • Minterm selectivity: the number of tuples returned by a given minterm
  • Access frequency: the access frequency of user queries
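A minimal sketch of minterm fragments and minterm selectivity for the two simple predicates above. Each minterm is a conjunction of the predicates or their negations; selectivity is how many tuples it returns. The EMP data is hypothetical.

```python
from itertools import product

EMP = [
    {"dept": "CSE", "sal": 3500},
    {"dept": "CSE", "sal": 2500},
    {"dept": "EE",  "sal": 4000},
]
p1 = lambda t: t["dept"] == "CSE"   # simple predicate p1
p2 = lambda t: t["sal"] > 3000      # simple predicate p2

# Each sign vector chooses pi or NOT pi; their conjunction is one minterm.
for signs in product([True, False], repeat=2):
    fragment = [t for t in EMP
                if (p1(t) == signs[0]) and (p2(t) == signs[1])]
    print(signs, "selectivity:", len(fragment))
```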
Predicates & Minterms
• Aspects of simple predicates:
  • Completeness: a set of simple predicates is said to be complete iff there is an equal probability of access by every application to any tuple belonging to any minterm fragment defined according to the set
  • Example: if the only application that accesses Project selects tuples by department, then a set of simple predicates on the department attribute is complete.
Predicates & Minterms
• Minimality: if a predicate influences how fragmentation is performed (i.e., causes a fragment to be further broken into fi and fj), there should be at least one application that accesses fi and fj differently.
VERTICAL FRAGMENTATION
• Produces fragments R1, R2, R3, ..., Rn of a relation R
• Each fragment contains a subset of R's attributes, as well as the primary key of R
• Divides a relation "vertically" by columns (attributes)
• The objective is to obtain fragments such that applications only need to access one fragment
  • We want to minimize the execution time of applications
• Inherently more complicated than horizontal data fragmentation, due to the total number of alternatives available
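A minimal sketch of vertical fragmentation: each fragment keeps a subset of the attributes plus the primary key, and R is reconstructed by joining the fragments on that key. The relation and attribute names are hypothetical.

```python
def vertical_fragment(R, attrs, key="eno"):
    # Keep the primary key plus the chosen subset of attributes.
    return [{k: t[k] for k in (key, *attrs)} for t in R]

def reconstruct(F1, F2, key="eno"):
    # Reconstruction rule for vertical fragments: join on the primary key.
    index = {t[key]: t for t in F2}
    return [{**t, **index[t[key]]} for t in F1]

EMP = [{"eno": 1, "name": "Ali", "sal": 3500}]
F1 = vertical_fragment(EMP, ["name"])   # (eno, name)
F2 = vertical_fragment(EMP, ["sal"])    # (eno, sal)
assert reconstruct(F1, F2) == EMP
```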
ALLOCATION
• Given:
  • a set of fragments F = {F1, F2, ..., Fn}
  • a set of sites S = {S1, S2, ..., Sm}
  • a set of transactions T = {T1, T2, ..., Tp}
• Find an "optimal" distribution of F to S
ALLOCATION
• 1. Minimize cost:
  • the cost of storing each Fi at Sj
  • the cost of querying Fi at Sj
  • the cost of updating Fi at all sites where it is stored
  • the cost of data communication
• 2. Maximize performance:
  • minimize response time at each site
  • maximize system throughput at each site
• This problem is NP-hard!
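Because the exact problem is NP-hard, practical designs use heuristics. A minimal greedy sketch under a toy cost model: place each fragment at the site that minimizes storage plus remote-access communication cost. All numbers and the cost model itself are illustrative assumptions, not part of the lecture.

```python
def greedy_allocate(fragments, sites, access_freq, comm_cost, storage_cost):
    """access_freq[f][s]: how often site s accesses fragment f."""
    allocation = {}
    for f in fragments:
        def cost(home):
            # Remote traffic: every other site's accesses cross the network.
            remote = sum(access_freq[f][s] * comm_cost[s][home]
                         for s in sites if s != home)
            return storage_cost[home] + remote
        allocation[f] = min(sites, key=cost)
    return allocation

sites = ["S1", "S2"]
fragments = ["F1"]
access_freq  = {"F1": {"S1": 10, "S2": 1}}
comm_cost    = {"S1": {"S1": 0, "S2": 5}, "S2": {"S1": 5, "S2": 0}}
storage_cost = {"S1": 1, "S2": 1}
print(greedy_allocate(fragments, sites, access_freq, comm_cost, storage_cost))
# {'F1': 'S1'} — F1 goes where it is accessed most often
```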