
Data Management for Peer-to-Peer Computing: A Vision


Presentation Transcript


  1. Data Management for Peer-to-Peer Computing: A Vision
     Ali Rahbari

  2. Outline
     • P2P Data Networks
     • Why P2P Databases Are Different
     • A P2P Database Scenario
     • A Logic for P2P Databases
     • Propagation Strategy
     • Architecture and Implementation Issues

  3. P2P Data Networks: Basic Notions
     • Node
       • Database, file system, etc.
     • P2P network
       • Indexed nodes with equal participant rights
     • Services
       • Query answering
       • Query, result, and update propagation
     • Locality
       • No global schema, no centralized control
       • Nodes have only a partial view of the world
     • Autonomy
       • Nodes are largely independent in their choice of language, content, etc.

  4. Roles for P2P DBs?
     • Peers come and go, but must still be able to interoperate.
     • To us, the big question is how to cope with DBs that
       • are incomplete, overlapping, and mutually inconsistent
       • dynamically appear and disappear
       • have limited connectivity.
     • Scenario
       • Databases of medical patients
       • Complete integration is likely to be infeasible
       • But dynamic integration of the DBs relevant to one patient could have high value.

  5. A Model for P2P Databases
     • Each peer is a node with a database. It exchanges data and services with its acquaintances (i.e., other peers).
     • The set of acquaintances changes often, due to
       • site availability
       • changing usage patterns
     • Peers are fully autonomous.
     • No global control or central server.

  6. A Motivating Scenario
     [Figure: peers D (Doctor), P (Pharmacist), H (Hospital), and a Ski Clinic]
     • A patient may be described in several DBs.
     • But the databases can use different patient ID formats, disease descriptions, etc.
     • When a patient is admitted to the hospital, H becomes acquainted with D.
       • The acquaintance is dropped when treatment is over.
     • When the doctor prescribes a drug, D becomes acquainted with P.
     • A patient is injured skiing, so more DBs get involved.
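The changing acquaintances in this scenario can be sketched in a few lines of Python. This is only an illustration: the `Peer` class and its `acquaint` and `drop` methods are hypothetical names, not part of the presentation, and the sketch assumes that an acquaintance is recorded on both sides without any central registry.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Peer:
    """A peer with a changing set of acquaintances (hypothetical model)."""
    name: str
    acquaintances: Set[str] = field(default_factory=set)

    def acquaint(self, other: "Peer") -> None:
        # Both sides record the acquaintance; no central server is involved.
        self.acquaintances.add(other.name)
        other.acquaintances.add(self.name)

    def drop(self, other: "Peer") -> None:
        # Dropping is also local to the two peers concerned.
        self.acquaintances.discard(other.name)
        other.acquaintances.discard(self.name)


doctor, pharmacist, hospital = Peer("D"), Peer("P"), Peer("H")

hospital.acquaint(doctor)    # a patient is admitted to the hospital
doctor.acquaint(pharmacist)  # the doctor prescribes a drug
hospital.drop(doctor)        # treatment is over
```

After these three events only the D and P peers remain acquainted, which mirrors how the set of acquaintances follows usage rather than a fixed design.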

  7. Proposal: Local Relational Model (LRM)
     • A logic for P2P data integration
     • Instead of a global schema, each peer has
       • coordination formulas: each specifies semantic interdependencies between two acquaintances
       • binary domain relations: each specifies how symbols in one database translate to symbols in an acquaintance’s database.
     • Each expression in a coordination formula is relative to just one participating database.
     • Coordination formulas and domain relations are used for query and update processing.

  8. A Coordination Formula
     • p: pharmacist DB
       medication(PrescriptionID, PatientID, Prod)
     • d: doctor DB
       treatment(TreatmentID, PatientID, Description, Type), where Type ∈ {“hospital”, “home”}
     • (∀i:x).A(x) means that for all x in the domain of database i, A(x) is true.
     • A coordination formula:
       (∀p:y).(∀p:z).(p: (∃x).medication(x, y, z) → d: (∃w).treatment(w, y, z, “home”))
       “There’s a row in treatment in the doctor DB for each row in medication in the pharmacist DB”
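As a rough illustration of what it means for two databases to satisfy this coordination formula, the following Python sketch checks it over two toy relations. The rows are invented, the function names `violations` and `satisfied` are hypothetical, and the sketch assumes the identity domain relation between p and d (patient IDs and product names are spelled the same in both databases), which the model itself does not require.

```python
# Toy instances; all rows are invented for illustration.
medication = {   # pharmacist DB p: (PrescriptionID, PatientID, Prod)
    ("rx1", "pat7", "aspirin"),
    ("rx2", "pat9", "ibuprofen"),
}
treatment = {    # doctor DB d: (TreatmentID, PatientID, Description, Type)
    ("t1", "pat7", "aspirin", "home"),
    ("t2", "pat9", "ibuprofen", "hospital"),
}


def violations(medication, treatment):
    """Return (PatientID, Prod) pairs in p that lack a matching "home" row in d."""
    home = {(pid, desc) for (_tid, pid, desc, typ) in treatment if typ == "home"}
    prescribed = {(pid, prod) for (_rx, pid, prod) in medication}
    return sorted(prescribed - home)


def satisfied(medication, treatment):
    """True when every medication row has a corresponding home treatment row."""
    return not violations(medication, treatment)


missing = violations(medication, treatment)
repaired = treatment | {("t3", "pat9", "ibuprofen", "home")}
```

Here `missing` lists the prescription for `pat9`, whose only treatment row has type "hospital"; adding a home treatment row, as in `repaired`, makes the formula hold.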

  9. Domain Relation
     • A row <d1, d2> in domain relation r_ik specifies that value d1 in DB_i corresponds to value d2 in DB_k.
     • r_ik may be partial.
     • r_ik and r_ki need not be symmetric.
     • Example: DB_i contains lengths in meters and DB_k in kilometers (total but not symmetric)
       • r_ik(x) = roundToClosestK(x): r_ik(653) = 1, r_ik(453) = 0
       • r_ki(x) = x*1000: r_ki(1) = 1000
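The meters and kilometers example can be written out as a small Python sketch. The function names follow the slide's r_ik and r_ki; treating exact halves as rounding up is an assumption, since the slide does not say how roundToClosestK handles ties.

```python
def r_ik(meters: int) -> int:
    """Map a length in meters (DB_i) to the closest whole kilometer (DB_k).

    Assumption: exact halves (e.g. 500 m) round up.
    """
    return (meters + 500) // 1000


def r_ki(kilometers: int) -> int:
    """Map a length in kilometers (DB_k) back to meters (DB_i)."""
    return kilometers * 1000


# The same relation viewed as rows <d1, d2>, for two sample values.
rows_ik = {(d1, r_ik(d1)) for d1 in (453, 653)}

# Not symmetric: translating to DB_k and back does not return the original value.
round_trip = r_ki(r_ik(653))
```

Both functions are total, yet `round_trip` is 1000 rather than 653, which is the sense in which the two relations are not symmetric.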

  10. Queries
     • A query is a coordination formula of the form A(x) → i: q(x), where
       • A(x) is a coordination formula
       • x has n variables
       • i is the database against which the query is posed
       • q is a new n-ary predicate symbol
     • A relational space is a pair <db, r>, where db is a set of DBs and r associates an r_ik with each pair of DBs.
     • <db, r> ⊨ f means that the relational space <db, r> satisfies the coordination formula f.
     • The answer to a query: {d ∈ dom_i | <db, r> ⊨ ((∃i:x).A(x) ∧ i: x = d)}

  11. Interpreting a Query
     • A query: ((i: P(x) ∧ j: R(y)) ∨ k: S(x, y)) → h: q(x, y)
     • Evaluate P, R, S in i, j, k (respectively).
     • Map these results via r_ih, r_jh, r_kh to sets s_i, s_j, s_k.
     • Then compute ((s_i × s_j) ∪ s_k).
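The three steps (evaluate locally, map through the domain relations, combine) can be sketched in Python. Everything concrete here is an assumption made for illustration: the local answers, the domain relations r_ih, r_jh, r_kh, and the helper names are invented, and the combination step assumes that P and R are joined by conjunction (a cross product, since x and y are different variables) and that S is added by disjunction (a union).

```python
from itertools import product


def translate(values, relation):
    """Map a set of values through a domain relation given as (d1, d2) rows."""
    return {d2 for (d1, d2) in relation if d1 in values}


def translate_pairs(pairs, relation):
    """Map each component of (x, y) pairs through the same domain relation."""
    return {
        (x2, y2)
        for (x, y) in pairs
        for (a, x2) in relation if a == x
        for (b, y2) in relation if b == y
    }


# Step 1: local answers in databases i, j, k (invented data).
P_i = {1, 2}
R_j = {"a"}
S_k = {(3, "c")}

# Hypothetical domain relations from i, j, k into the queried database h.
r_ih = {(1, 10), (2, 20)}
r_jh = {("a", "A")}
r_kh = {(3, 30), ("c", "C")}

# Step 2: map the local answers into h's vocabulary.
s_i = translate(P_i, r_ih)
s_j = translate(R_j, r_jh)
s_k = translate_pairs(S_k, r_kh)

# Step 3: combine, assuming conjunction then disjunction.
answer = set(product(s_i, s_j)) | s_k
```

A value with no row in a domain relation simply drops out in step 2, which is how a partial domain relation narrows the answer.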

  12. P2P Databases: Proposed Solution
     Coordinate query and update exchange between autonomous DBs using:
     • Coordination Formulas
       • Specify semantic interdependencies between data from two nodes
         table to table: Cust → Customer
         column to column: name(Cust) → nm(Customer)
     • Binary Domain Relations
       • Specify how the symbols used in one database translate to symbols used in another database
         ‘one’ → ‘uno’
         CAN$1.00 → US$0.65
     • Keep AUTONOMY and COORDINATION, as much as possible
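A minimal Python sketch of how these two kinds of mappings might be applied to a single row follows. The dictionary layout and the names `translate_row` and `cad_to_usd` are hypothetical; the table, column, word, and currency correspondences are the ones on the slide, and the exchange rate is the slide's illustrative figure rather than a real rate.

```python
# Schema-level correspondences (from the coordination formulas on the slide).
TABLE_MAP = {"Cust": "Customer"}
COLUMN_MAP = {("Cust", "name"): "nm"}

# Value-level correspondences (from the binary domain relations on the slide).
VALUE_MAP = {"one": "uno"}
CAD_TO_USD = 0.65  # illustrative rate taken from the slide


def translate_row(table, row):
    """Rename the table and columns, then translate values that have a mapping.

    Columns and values without a mapping are passed through unchanged.
    """
    target = TABLE_MAP[table]
    out = {}
    for column, value in row.items():
        new_column = COLUMN_MAP.get((table, column), column)
        out[new_column] = VALUE_MAP.get(value, value)
    return target, out


def cad_to_usd(amount):
    """Translate a Canadian dollar amount into US dollars, rounded to cents."""
    return round(amount * CAD_TO_USD, 2)


translated = translate_row("Cust", {"name": "one", "city": "Trento"})
```

Keeping the schema mappings and the value mappings in separate structures reflects the slide's split between coordination formulas and domain relations.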

  13. What’s New in the Solution?
     • No global schema, no central registry, no form of control
     • No need for system restructuring when new nodes arrive and old ones leave
     • We do not integrate, we COORDINATE.
       • Integration is built at design time
       • Coordination happens at runtime

  14. Propagation Strategy: Basic Notions
     • Acquaintance
       • Pair of nodes that have coordination formulas and binary domain relations with respect to each other
       • Acquaintances can exchange data and services
     • Interest Group
       • Set of nodes with related content that are linked to one another by acquaintances
     • Group Manager (GM)
       • Node of an Interest Group that is dedicated to group and query propagation management
       • The GM has higher stability requirements and must be permanently active
     • Query Scope
       • Set of nodes that are supposed to answer a given query; the Query Scope is defined by the Group Manager

  15. Query Propagation Strategy
     • User submits query Q ()
     • Node defines query topic
     • Node sends the Group Manager (GM) a request to define the Query Scope (QS)
     • GM computes and sends back QS
     • Node 1 sends the query to its acquaintances in QS, and reports this fact to GM
     • Nodes 2 and 4 send answers to node 1
     • Nodes propagate the query to their acquaintances in QS and report this fact to GM
     • And so on…
     • Nodes that do not propagate any further report this fact to GM
     • Propagation stops when “no more propagation” has been received from all boundary nodes
     [Figure: network of nodes 1 to 11 and the GM, annotated with the messages
       1. Q ()
       2. Q (, topic)
       3. QS (, topic) = ?
       4. QS (, topic) = (2, 4, 6, 8, 9, 11)
       5. “nodes 2 and 4 are reached”
       “node 6 is reached”, “node 8 is reached”
       “no more propagation from 8”, “no more propagation from 9”
       ←Res2, ←Res4]
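The propagation steps can be simulated with a short Python sketch. It is a sequential, single-process approximation: the acquaintance graph below is hypothetical (the slide's figure does not spell out its edges), the function name `propagate` is invented, and the log entries stand in for the reports that nodes would send to the GM. Only the query scope (2, 4, 6, 8, 9, 11) and the originating node 1 come from the slide.

```python
from collections import deque

# Hypothetical acquaintance graph over nodes 1 to 11.
ACQUAINTANCES = {
    1: [2, 3, 4], 2: [1, 6, 10], 3: [1], 4: [1, 5, 8], 5: [4],
    6: [2, 9, 11], 7: [], 8: [4], 9: [6], 10: [2], 11: [6],
}
QUERY_SCOPE = {2, 4, 6, 8, 9, 11}  # as returned by the GM on the slide


def propagate(origin, scope, graph):
    """Flood a query through acquaintances that lie inside the query scope.

    Returns (reached, log): the nodes that received the query, in order,
    and the reports the GM would receive.
    """
    reached, log = [], []
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        # Forward only to in-scope acquaintances that have not seen the query.
        targets = [n for n in graph[node] if n in scope and n not in seen]
        if targets:
            seen.update(targets)
            reached.extend(targets)
            queue.extend(targets)
            log.append(f"{node} reached {targets}")
        else:
            # A boundary node: it has nobody left to forward the query to.
            log.append(f"no more propagation from {node}")
    return reached, log


reached, log = propagate(1, QUERY_SCOPE, ACQUAINTANCES)
```

The loop ends once every boundary node has reported "no more propagation", which corresponds to the stopping condition in the last step; nodes outside the query scope, such as 3, 5, and 10 in this graph, never receive the query.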

  16. Implementation Architecture
     • A classic multi-database system, with
       • A protocol for adding/dropping acquaintances
       • LRM query processing (domain mapping logic) that can cope with chains of acquaintances
       • A dynamic approach to materialized view creation
       • Tools to help a user establish an acquaintance

  17. Architecture
     • P2P Layer
       • Add-on that provides the P2P functionality
     • Local Data Source (LDS)
       • Database
       • File system
     • User Interface
       • User queries
       • Results
     • Query Manager (QM) and Update Manager (UM)
       • Responsible for query and update propagation
       • Manage coordination and correspondence rules, acquaintances, and interest groups
     • Wrapper
       • Provides a translation layer between the QM and UM on one side and the LDS on the other

  18. Summary
     • Why P2P databases are different
     • A P2P database scenario
     • A logic for P2P databases (LRM)
       • Coordination formulas and domain relations
       • Query semantics
     • Architecture and implementation issues

