
Composing Mappings among Data Sources

Jayant Madhavan, Alon Halevy (University of Washington)


Presentation Transcript


  1. Composing Mappings among Data Sources
     Jayant Madhavan, Alon Halevy, University of Washington

  2. Mappings in data sharing architectures
     Data Integration System
     • Sources with mappings to a single mediated schema …, [Lenzerini, PODS ’02]
     Peer Data Management System
     • Network of pair-wise mappings
     • [Piazza, UW], [Hyperion, Toronto], [PeerDB, Singapore], [LRM, Trento], [Edutella, Hannover], [Semantic Gossiping, EPFL], [Raccoon, Irvine], [Orchestra, Penn]
     [Diagram labels: Mediated Schema; ACM, DBLP, CiteSeer, Humboldt; ACM, DBLP, UW, CiteSeer]

  3. Peer Data Management System (Piazza)
     • Peer Schema: R_UW: u1(x,y), …; R_CiteSeer: c1(x,y,z), …
     • Mapping: u1(x,y), u2(y,z) ⊇ c1(x,y,z), …
     • Mapping Formula (Q1 ⊇ Q2)
     [Diagram: peers Humboldt, DBLP, ACM, UW, CiteSeer]
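A minimal sketch of how one such mapping formula could be checked against two peer instances. It assumes the connective in the formula is a containment: every tuple produced by the CiteSeer-side query must also be produced by the UW-side query. The relation contents and the helper names `eval_cq` and `satisfies` are illustrative and are not taken from Piazza.

```python
def eval_cq(head, body, db):
    """Evaluate a conjunctive query.

    head: tuple of variable names to return
    body: list of (relation_name, tuple_of_variable_names)
    db:   dict mapping relation_name -> set of tuples
    """
    bindings = [{}]
    for rel, args in body:
        extended = []
        for binding in bindings:
            for row in db.get(rel, set()):
                candidate = dict(binding)
                # setdefault binds a fresh variable or returns its earlier value
                if all(candidate.setdefault(v, c) == c for v, c in zip(args, row)):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


def satisfies(head, lhs_body, rhs_body, lhs_db, rhs_db):
    """True if the right-hand query's answers are contained in the left-hand ones."""
    return eval_cq(head, rhs_body, rhs_db) <= eval_cq(head, lhs_body, lhs_db)


head = ("x", "y", "z")
uw_query = [("u1", ("x", "y")), ("u2", ("y", "z"))]
citeseer_query = [("c1", ("x", "y", "z"))]

# Made-up instances for the two peers
uw = {"u1": {("p1", "t1"), ("p2", "t2")}, "u2": {("t1", 2003), ("t2", 2002)}}
citeseer = {"c1": {("p1", "t1", 2003)}}

print(satisfies(head, uw_query, citeseer_query, uw, citeseer))
```

Adding a `c1` tuple with no counterpart in the UW relations makes the check fail, which is the sense in which the formula constrains the pair of instances.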

  4. Mapping Composition
     [Diagram: peers Humboldt, ACM, DBLP, UW, CiteSeer]

  5. Query Answering
     • Iterative rewriting by chaining mappings
     • Transitive closure of all relevant mappings
     • Eliminating redundancies from rewritings (optimization)
     • [Piazza: ICDE’03, WWW’03, VLDBJ’03]
     [Diagram: peers Humboldt, ACM, DBLP, UW, CiteSeer; queries Q, Q’, Q1, Q2, Q3, Q4, Q5]
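The "transitive closure of all relevant mappings" can be pictured as plain graph reachability over the peer network. The sketch below is only that reachability step, not Piazza's rewriting algorithm. The peer names come from the slide, but the edges between them are a made-up topology, since the transcript does not preserve the diagram.

```python
from collections import deque


def relevant_peers(start, mappings):
    """Return the peers reachable from `start` by chaining pairwise mappings (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for neighbour in mappings.get(peer, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}


# Hypothetical topology: an entry "A": ["B"] means a mapping lets A's queries use B's data
mappings = {
    "UW": ["DBLP", "CiteSeer"],
    "DBLP": ["ACM"],
    "CiteSeer": ["ACM"],
    "ACM": ["Humboldt"],
    "Humboldt": [],
}

print(sorted(relevant_peers("UW", mappings)))
```

A query posed at UW would, under this topology, have to be rewritten along every chain that ends in one of the returned peers, which is the runtime cost the next slide calls out.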

  6. Composition in a PDMS
     Potential inefficiency
     • Expensive rewriting + optimization at runtime for each query
     Optimization
     • Pre-compute or compose paths to relevant peers
     • But, composition must be independent of Q
     • Side-benefit: robustness to information loss
     • Dead intermediate peers will not semantically partition the network
     [Diagram: peers Humboldt, ACM, DBLP, UW, CiteSeer; queries Q, Q’, Q1, Q3, Q5]

  7. Composition: Meta-data operation
     • Mappings are integral to all data sharing architectures
       • Message passing
       • Data exchange [Fagin, Kolaitis & Miller, ICDT’03]
       • …
     Composition is a natural problem in many of these
     • Fundamental operator to meta-data management
       • Model Management operators: Match, Merge, Compose, …
       • [Bernstein, Halevy, Pottinger, SIGMOD Record ’01]
       • [Melnik, Bernstein, Rahm, SIGMOD ’03]
     • Formal treatment for a particular mapping language

  8. Problem Definition
     • Relational Schema: a1,…,am
     • MAB: QA1 ⊇ QB1, …, QAn ⊇ QBn (GLAV formulas)
     • MAC: Q’A1 ⊇ QC1, …, Q’Ak ⊇ QCk
     • MAC ≡ MAB ∘ MBC w.r.t. Query Language L
     • For all queries Q ∈ L, Q given MAC = Q given MAB, MBC
     [Diagram: schemas A, B, C connected by MAB and MBC, with MAC from A to C and query Q]
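One way to write the condition on this slide more explicitly is given below. The notation is an assumption introduced here, not the authors': $\mathrm{ans}(Q, D, M)$ stands for the answers to $Q$ that can be obtained from stored data $D$ under the set of mappings $M$.

```latex
\[
M_{AC} \equiv M_{AB} \circ M_{BC} \ \text{w.r.t.}\ \mathcal{L}
\quad\Longleftrightarrow\quad
\forall Q \in \mathcal{L},\ \forall D:\;
\mathrm{ans}(Q, D, M_{AC}) \;=\; \mathrm{ans}(Q, D, M_{AB} \cup M_{BC})
\]
```

Read this way, the composition is defined relative to the query language: a set of formulas only has to be indistinguishable from the two original mappings for queries in $\mathcal{L}$.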

  9. Overview of Contributions
     • Surprising: Composition of finite mappings can be infinite!!!
     • Good news: Composition computable for powerful practical query languages
       • CQk: finite, or infinite but encoded finitely
     • Composition algorithm that
       • on termination computes all the formulas in the composition
       • terminates if composition finite, and also for many infinite compositions
     • Rewriting algorithm to exploit infinite formulas
       • Extension of results from answering queries using views
     • Complexity results

  10. Outline
      • Composition is interesting and important
      • Problem definition and results overview
      • Finite and infinite composition
      • Results and Composition Algorithm
      • Summary, current and future work

  11. Composition Example
      • Graph G; schemas A: bbba(x,y), B: b(x,y), C: bbc(x,y)
      • MAB: bbba(x,y) ⊇ b(x,t1), b(t1,t2), b(t2,y)
      • MBC: b(x,t), b(t,y) ⊇ bbc(x,y)

  12. Composition Example (2)
      • Q(x) :- bbba(x,x1)
      • MAB: b(x,t1), b(t1,t2), b(t2,x1)
      • MBC: bbc(x,y1), bbc(y1,y2)
      • bbba(x,x1) ⊇ bbc(x,y1), bbc(y1,y2)
      [Diagram: schemas A: bbba(x,y), B: b(x,y), C: bbc(x,y); query Q; variable labels x, y1, y2]

  13. Composition Example (3)
      MAC:
      • bbba(x,x1) ⊇ bbc(x,y1), bbc(y1,y2)
      • bbba(x1,y) ⊇ bbc(y1,y2), bbc(y2,y)
      • bbba(x,x1), bbba(x1,y) ⊇ bbc(x,y1), bbc(y1,y2), bbc(y2,y)
      [Diagram: schemas A: bbba(x,y), B: b(x,y), C: bbc(x,y); variable labels x, y]
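The composed formulas can be sanity-checked on a concrete instance. The sketch below reads every formula as "left side contains right side": A must hold at least every length-3 path of b, while C may hold only some of the length-2 paths. The graph and the deliberately incomplete and over-full relations are made up for illustration. Passing on one instance shows consistency only and is not a proof.

```python
def join(r, s):
    """Relational composition: pairs (a, d) such that r(a, m) and s(m, d)."""
    return {(a, d) for (a, m) in r for (m2, d) in s if m == m2}


# B instance: a directed cycle on 7 nodes
b = {(i, (i + 1) % 7) for i in range(7)}
paths2 = join(b, b)
paths3 = join(paths2, b)

# C is allowed to miss 2-paths; A is allowed to hold extra tuples
bbc = paths2 - {(0, 2)}
bbba = paths3 | {(0, 0)}

# bbba(x,x1), bbba(x1,y) versus bbc(x,y1), bbc(y1,y2), bbc(y2,y), projected on (x, y)
lhs_xy = join(bbba, bbba)
rhs_xy = join(join(bbc, bbc), bbc)
ok_two_atoms = rhs_xy <= lhs_xy

# bbba(x,x1) versus bbc(x,y1), bbc(y1,y2), projected on x only
lhs_x = {x for (x, _) in bbba}
rhs_x = {x for (x, _) in join(bbc, bbc)}
ok_one_atom = rhs_x <= lhs_x

print(ok_two_atoms, ok_one_atom)
```

The one-atom formula keeps only x because four b-edges on the C side overshoot the three b-edges of bbba; the two-atom formula can keep both x and y because six edges line up on both sides.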

  14. Infinite composition
      • Graph G; schemas A: rbba(x,y), bba(x,y); B: r(x,y), b(x,y); C: rbc(x,y), bbc(x,y)
      • MAB: rbba(x,y) ⊇ r(x,t1), b(t1,t2), b(t2,y)
             bba(x,y) ⊇ b(x,t), b(t,y)
      • MBC: r(x,t), b(t,y) ⊇ rbc(x,y)
             b(x,t), b(t,y) ⊇ bbc(x,y)
      [Diagram: paths x, t1, t2, y and x, t, y]

  15. Infinite Composition (2)
      • Q(x) :- rbba(x,x1)
      • MAB: r(x,t1), b(t1,t2), b(t2,x1)
      • MBC: rbc(x,y1), bbc(y1,y2)
      • MAC: rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
      [Diagram: schemas A: rbba(x,y), bba(x,y); B: r(x,y), b(x,y); C: rbc(x,y), bbc(x,y)]

  16. Infinite Composition (3)
      • rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
      • rbba(x,x1), bba(x1,x2) ⊇ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
      [Diagram labels: bba(x,y), bbc(x,y); path lengths 2n and 2n+1; x]
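The two formulas shown suggest a family indexed by n: one rbba atom followed by n-1 bba atoms on the A side, against one rbc atom followed by n bbc atoms on the C side. The sketch below extrapolates that pattern, which is an assumption rather than something the slide states, and checks each member on a small made-up instance. Formulas are read as "left side contains right side". In this reading the A side spans 2n b-edges and the C side 2n+1, which is presumably what the 2n and 2n+1 labels refer to.

```python
def join(r, s):
    """Relational composition: pairs (a, d) such that r(a, m) and s(m, d)."""
    return {(a, d) for (a, m) in r for (m2, d) in s if m == m2}


def nth_formula(n):
    """Return (A-side atoms, C-side atoms) of the n-th family member, n >= 1."""
    a_vars = ["x"] + [f"x{i}" for i in range(1, n + 1)]
    c_vars = ["x"] + [f"y{i}" for i in range(1, n + 2)]
    a_side = [f"rbba({a_vars[0]},{a_vars[1]})"]
    a_side += [f"bba({a_vars[i]},{a_vars[i + 1]})" for i in range(1, n)]
    c_side = [f"rbc({c_vars[0]},{c_vars[1]})"]
    c_side += [f"bbc({c_vars[i]},{c_vars[i + 1]})" for i in range(1, n + 1)]
    return a_side, c_side


def head_answers(n, rbba, bba, rbc, bbc):
    """Values of x produced by each side of the n-th formula."""
    a = rbba
    for _ in range(n - 1):
        a = join(a, bba)
    c = rbc
    for _ in range(n):
        c = join(c, bbc)
    return {x for (x, _) in a}, {x for (x, _) in c}


# Made-up B instance: one r-edge into a directed b-cycle on 5 nodes
r = {("s", 0)}
b = {(i, (i + 1) % 5) for i in range(5)}
bb = join(b, b)
rbc, bbc = join(r, b), bb          # C holds exactly the 2-paths
rbba, bba = join(join(r, b), b), bb  # A holds exactly the required paths

checks = []
for n in range(1, 5):
    a_heads, c_heads = head_answers(n, rbba, bba, rbc, bbc)
    checks.append(c_heads <= a_heads)

print(nth_formula(2))
print(checks)
```

Because the edge counts on the two sides never coincide, no member of the family can expose an endpoint other than x, so none of them collapses into a combination of shorter ones by joining on a shared endpoint.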

  17. Main Result
      Composition computable for interesting query languages
      • CQk: queries with localized variable interactions
      • Includes most queries in practice, e.g. cycle_n(x) ∈ CQ3
          cycle_n(x) :- b(x,y), path_{n-1}(y,x)
          path_{n-1}(x,y) :- b(x,z), path_{n-2}(z,y)
          …
          path_1(x,y) :- b(x,y)
      Composition w.r.t. CQk is computable and is either a finite number of GLAV formulas, or a finite encoding of infinite GLAV formulas
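The cycle_n program can be evaluated directly by unfolding its rules. In the sketch below each helper mirrors one rule from the slide, and each rule mentions at most three variables (x, y, z), which fits the slide's placement of cycle_n in CQ3. The example graph and the function names `path` and `cycle` are illustrative.

```python
def join(r, s):
    """Relational composition: pairs (a, d) such that r(a, m) and s(m, d)."""
    return {(a, d) for (a, m) in r for (m2, d) in s if m == m2}


def path(n, b):
    """path_1(x,y) :- b(x,y);  path_k(x,y) :- b(x,z), path_{k-1}(z,y)."""
    result = set(b)
    for _ in range(n - 1):
        result = join(b, result)
    return result


def cycle(n, b):
    """cycle_n(x) :- b(x,y), path_{n-1}(y,x).  Requires n >= 2."""
    p = path(n - 1, b)
    return {x for (x, y) in b if (y, x) in p}


# A directed triangle: closed walks exist only for lengths divisible by 3
triangle = {(0, 1), (1, 2), (2, 0)}
print(cycle(3, triangle), cycle(2, triangle), cycle(6, triangle))
```

Note that the program tests for closed walks of length n, so a node on a triangle also satisfies cycle_6.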

  18. Composition Algorithm
      • Minimal formulas
        • Formulas that have to be present in the composition
        • Larger minimal formulas are extensions of smaller ones
      • Residues of minimal formulas
        • Signatures that capture information on extensions
        • Isomorphic residues ⇒ isomorphic extensions
      • Query Rewrite Graphs
        • Encoding of all minimal formulas in the composition
        • Cycles can be used to encode infinite number of formulas

  19. Minimal Mapping Formulas
      • Formulas that cannot be constructed from smaller formulas
      • rbba(x,x1), bba(x1,x2) ⊇ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
      • Theorem: Sufficient to compute all minimal formulas
      [Diagram: variables x, x1, x2 above x, u1, y1, u2, y2, u3, y3; callouts "Join variable", "Internally existential variable", "Not visible on right side"]

  20. Incremental Construction
      • Lemma: If QA ⊇ QC is a minimal formula ⇒ ∃ minimal formula Q’A ⊇ Q’C
        • Q’A has one atom less than QA
      • rbba(x,x1), bba(x1,x2) ⊇ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
      • rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
      • Incremental algorithm: try all one atom extensions; complete formulas
      [Diagram: Q’A ⊇ Q’C extended to QA ⊇ QC via candidate atoms Q’A1, …, Q’Ai, …, Q’Am]

  21. Residues in Formulas
      • rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
      • Residue: b(u2,y2), {u2}, {y2}, {x1→u2}
      • Residues capture all extension information
      • Null residues ⇒ No extensions
      [Diagram: variables x, x1 above x, u1, y1, u2, y2, extended by x2, u3, y3; callouts "Potential Join variable", "Internally existential variable"]

  22. Isomorphic Residues
      • Isomorphic residues ⇒ Isomorphic extensions
      • rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
      • rbba(x,x1), bba(x1,x2) ⊇ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
      [Diagram: x, x1 above x, u1, y1, u2, y2, marked isomorphic to x, x1, x2 above x, u1, y1, u2, y2, u3, y3]

  23. Query Rewrite Graphs
      • Paths from roots encode minimal mapping formulas
      • Cycles encode infinite formulas
      • Query Nodes: Q1: bba(x,y); Q2: rbba(x,x1); Q3: bba(x1,x2)
      • Rewrite Nodes: R1: bbc(x,y); R2: rbc(x,y1), bbc(y1,y2); R3: bbc(y2,y1)
      • Theorem: QRG construction on termination encodes the composition
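How a cycle can stand for infinitely many formulas is easiest to see by enumerating root-to-node paths up to a bound. The sketch below is a toy illustration. The node labels come from the slide, but the edges are a guess, since the transcript does not preserve them. The path-to-formula reading used here (query-node labels on one side, rewrite-node labels on the other) is a simplification, and the variable renaming a real construction would need when a cycle is traversed more than once is omitted.

```python
def paths_from(root, edges, max_nodes):
    """Yield paths that start at root, end at a rewrite node, and have at most max_nodes nodes."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        if path[-1].startswith("R"):
            yield path
        if len(path) < max_nodes:
            for nxt in edges.get(path[-1], []):
                stack.append(path + [nxt])


def to_formula(path, labels):
    """Split a path into (query-side atoms, rewrite-side atoms); no renaming is applied."""
    a_side = [labels[n] for n in path if n.startswith("Q")]
    c_side = [labels[n] for n in path if n.startswith("R")]
    return a_side, c_side


labels = {
    "Q1": "bba(x,y)", "Q2": "rbba(x,x1)", "Q3": "bba(x1,x2)",
    "R1": "bbc(x,y)", "R2": "rbc(x,y1), bbc(y1,y2)", "R3": "bbc(y2,y1)",
}
# Hypothetical edges; the loop Q3 -> R3 -> Q3 is the cycle
edges = {"Q1": ["R1"], "Q2": ["R2"], "R2": ["Q3"], "Q3": ["R3"], "R3": ["Q3"]}

found = list(paths_from("Q2", edges, 6))
for p in found:
    print(p, to_formula(p, labels))
```

Raising the bound yields one more formula per trip around the cycle, so the finite graph describes an unbounded family.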

  24. Other Results
      • Algorithm to exploit infinite formulas
        • Cyclic QRG can be represented by a pair of recursive datalog programs
        • Extension of earlier results in answering queries using infinite views [Levy, Rajaraman & Ullman, PODS’96]
      • Complexity Results
        • Upper-bound: composition verification is in [complexity class missing from transcript]
        • Lower-bound: composition verification w.r.t. finite sized query languages is [complexity class missing from transcript]-hard

  25. Related Work
      • GLAV
        • [Millstein, Friedman & Halevy, AAAI’99], [Lenzerini, PODS’02], [Fagin, Kolaitis & Miller, ICDT’03]
        • Generalization of LaV and GaV
        • Leads to infinite composition
      • Reasoning w.r.t. Query Languages
        • View containment [Li, Ullman & Bawa, ICDT’01]
        • Makes the problem hard

  26. Summary
      • Mapping composition
        • Can be infinite for simple GLAV mappings
        • Can be constructed completely for interesting query languages
      • QRG encodes valid formulas in composition
        • QRG can also encode infinite formulas
        • Can be exploited for query answering even when infinite
      [Diagram: schemas A, B, C connected by MAB and MBC, with MAC and query Q]

  27. Current and Future Work
      • Composition in a PDMS
        • Choosing paths to pre-compute
        • Manipulating infinite compositions
      • Semi-automatic construction of mappings
        • Learning from a corpus of related schemas
        • Exploiting past mapping experience
        • [Halevy, Madhavan & Bernstein, DeBull’03 to appear]
        • [Madhavan, Bernstein, Chen, Halevy & Shenoy, IIW@IJCAI ’03]
      More information: http://www.cs.washington.edu/homes/jayant
