
Answering queries across mappings

Grigoris Karvounarakis, University of Pennsylvania. WPE-II Presentation.


Presentation Transcript


  1. Answering queries across mappings. Grigoris Karvounarakis, University of Pennsylvania. WPE-II Presentation

  2. Data integration [Figure: a query Q is posed over a virtual global mediated schema T, which mappings M1, …, Mn relate to heterogeneous data sources S1, …, Sn with instances I1, …, In]

  3. Data exchange: J is a data exchange solution if: • ⟨I,J⟩ ⊨ M • J ⊨ Σ_T [Figure: source instance I of schema S, mapping M, target instance J of schema T with target constraints Σ_T]

  4. Query answering (basic problem setting) [Figure: source S with instance I, mapping M, target T, query Q] • Given source and target schemas (S, T), a mapping M, source instance(s) I and a query Q_T over the target, evaluate Q using data from I • Query reformulation: compute a reformulation Q' of Q that refers only to source relations • Data exchange: compute a data exchange solution J such that Q can be evaluated directly on J

  5. Outline • Preliminaries • Mapping languages • Semantics of query answering • Query reformulation • Query answering using data exchange • Comparison

  6. Mapping languages • Two approaches: • Containment between conjunctive queries • Dependencies (logical assertions)

  7. Query containment • Definition: a query Q1 is contained in a query Q2, denoted Q1 ⊑ Q2, if Q1(I) ⊆ Q2(I) for all database instances I. • Two queries Q1 and Q2 are equivalent if Q1 ⊑ Q2 and Q2 ⊑ Q1. • When Q1 and Q2 are over different schemas related through a mapping M: • M ⊨ Q1 ⊑ Q2 if ∀I,J with ⟨I,J⟩ ⊨ M: Q1(I) ⊆ Q2(J)

  8. Containment mappings • General form (GLAV): • Q_S(x,y) ⊑ Q_T(x,z) (sound: Open World Assumption) • Q_S(x,y) ≡ Q_T(x,z) (exact: Closed World Assumption) • Q_S, Q_T are conjunctions of relational atoms over S, T respectively • Special cases: • GAV (global-as-view): the target is specified as a view of the source(s) • Q_S(x,y) ⊑ T(x) (sound, OWA) • Q_S(x,y) ≡ T(x) (exact, CWA) • LAV (local-as-view): sources are specified as views of the virtual mediated schema • S(x) ⊑ Q_T(x,y) (sound, OWA) • S(x) ≡ Q_T(x,y) (exact, CWA)

  9. Dependencies • Tuple-generating dependencies (tgds): ∀x,z φ(x,z) → ∃y ψ(x,y) (where φ, ψ are conjunctions of relational atoms and x, y, z are vectors of variables) • Equality-generating dependencies (egds): ∀x φ(x) → x_i = x_j

  10. Data exchange schema mappings • Source-to-target tgds: ∀x,z φ(x,z) → ∃y ψ(x,y) • φ is a conjunction of atoms over S and ψ is a conjunction of atoms over T • Target tgds • both φ and ψ are conjunctions of atoms over T • Target egds: ∀x φ(x) → x_i = x_j • φ is a conjunction of atoms over T

  11. Containment mappings vs. source-to-target tgds • A source-to-target tgd of the form ∀x,z Q_S(x,z) → ∃y Q_T(x,y) is equivalent to the sound GLAV mapping Q_S(x,z) ⊑ Q_T(x,y) • Sound GAV and LAV mappings can also be expressed by source-to-target tgds. • But exact mappings also include a target-to-source direction: • E.g., S(x,z) ≡ T1(x,y), T2(y,z) is equivalent to ∀x,z S(x,z) → ∃y T1(x,y) ∧ T2(y,z) (source-to-target) and ∀x,y,z T1(x,y) ∧ T2(y,z) → S(x,z) (target-to-source)

  12. Incompleteness • Mappings do not specify the target instance completely • E.g., ∀x,z S(x,z) → ∃y T(x,y) ∧ T(y,z) does not specify the values of y • E.g., if I = {S(a,b)}: J1 = {T(a,a), T(a,b)}, J2 = {T(a,b), T(b,b)}, J3 = {T(a,X), T(X,b)}, J4 = {T(a,X), T(X,b), T(a,Y), T(Y,b)}, … [Figure: source instance I of S mapped by M to many possible target instances J1, J2, J3, … of T]

  13. Semantics of query answering: What do we expect as answers to queries over the target schema? • "Possible worlds" semantics: for every instance I of S, consider all possible instances J of the target schema T such that ⟨I,J⟩ ⊨ M • Convention: certain answers certain_{M,I}(Q_T) = ⋂_{J: ⟨I,J⟩ ⊨ M} Q_T(J)
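To make the definition concrete, the sketch below intersects the answers of Q(x) :- T(x,y) over the sample solutions J1 to J4 of the incompleteness example. It is only an illustration under an assumed encoding: the certain answers intersect over all (in general infinitely many) solutions, so intersecting over a finite sample can only over-approximate them. Here the sample yields {a}, which is indeed the certain answer, since every solution must contain some T(a,y).

```python
# Sample solutions for I = {S(a,b)}; T-facts are pairs and "X", "Y" are labeled nulls.
solutions = [
    {("a", "a"), ("a", "b")},                          # J1
    {("a", "b"), ("b", "b")},                          # J2
    {("a", "X"), ("X", "b")},                          # J3
    {("a", "X"), ("X", "b"), ("a", "Y"), ("Y", "b")},  # J4
]

def q(J):
    """Q(x) :- T(x, y), i.e. the first column of T."""
    return {x for (x, _) in J}

def intersect_over(solutions, query):
    """Intersection of the query answers over the given (finite) set of solutions."""
    answers = set(query(solutions[0]))
    for J in solutions[1:]:
        answers &= query(J)
    return answers

print(intersect_over(solutions, q))  # {'a'}
```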

  14. Outline • Preliminaries • Mapping languages • Semantics of query answering • Query reformulation • Query answering using data exchange • Comparison

  15. Equivalent reformulation. Definition: Q'_S is an equivalent reformulation of Q_T across M (denoted M ⊨ Q_T ≡ Q'_S) if, for every pair of instances I, J of S, T s.t. ⟨I,J⟩ ⊨ M: Q'_S(I) = Q_T(J)

  16. Equivalent reformulations may not exist • Mapping ∀x S(x) ↔ T(x,x), query Q(x) :- T(x,y) • Any reformulation over S can only return values v such that T(v,v) holds • But there are instances J such that T contains tuples T(a,b) in which a ≠ b • … even if the mapping is exact [Figure: a source instance containing S(c) and a target instance containing T(a,b)]

  17. Contained reformulation. Definition: Q'_S is a contained reformulation of Q_T across M (denoted M ⊨ Q'_S ⊑ Q_T) if, for every pair of instances I, J of S, T s.t. ⟨I,J⟩ ⊨ M: Q'_S(I) ⊆ Q_T(J)

  18. Maximally-contained reformulation • Definition: Q_S^max is a maximally-contained reformulation of Q_T across M if: • M ⊨ Q_S^max ⊑ Q_T, and • Q'_S ⊑ Q_S^max for every Q'_S s.t. M ⊨ Q'_S ⊑ Q_T • The union of all contained reformulations is a maximally-contained reformulation: Q_S^max ≡ reform_M(Q_T) ≡ ⋃_{Q'_S: M ⊨ Q'_S ⊑ Q_T} Q'_S

  19. Maximally-contained reformulations compute certain answers. Proposition ([AD98], [FKMP03], [T05]): Let certain_M(Q) = λI. certain_{M,I}(Q). Then certain_M(Q) ≡ reform_M(Q) (i.e., ∀I, reform_M(Q)(I) = certain_{M,I}(Q)) • Note that the above holds for any mapping (i.e., not necessarily conjunctive)

  20. Reformulation algorithms (GAV) • Sound/exact GAV mappings: e.g., Q_S(x,y) ⊑ T(x) • Reformulation: • for every relation T_i(x) of the target schema, let r_i be the set of rules with T_i in their head (possibly more than one) • let Q_{T_i}(x) be the union of the conjunctive queries in the bodies of the rules in r_i • substitute every T_i(x) atom in Q by Q_{T_i}(x)
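A minimal Python sketch of the GAV unfolding steps above, assuming a hypothetical encoding in which each target relation maps to its list of rules r_i and rule head variables are distinct. Variables that occur only in a rule body are renamed apart with a numeric suffix; substituting each atom by its union Q_{T_i} and distributing the conjunction yields a union of conjunctive queries over the sources.

```python
from itertools import count, product

# Hypothetical encoding: an atom is (relation, args); rules[R] lists the GAV rules
# for target relation R as (head_vars, body), body being a list of source atoms.

def unfold_atom(atom, rules, fresh):
    """All source conjunctions that can replace one target atom (one per rule in r_i)."""
    rel, args = atom
    options = []
    for head_vars, body in rules[rel]:
        n = next(fresh)
        sub = dict(zip(head_vars, args))  # head variables -> the query's arguments
        options.append([
            # body-only variables get a fresh suffix so different rules do not clash
            (r, tuple(sub.setdefault(v, f"{v}_{n}") for v in a))
            for r, a in body
        ])
    return options

def gav_reformulate(query_body, rules):
    """Substitute every T_i atom by Q_{T_i} and distribute: a union of conjunctions."""
    fresh = count()
    per_atom = [unfold_atom(atom, rules, fresh) for atom in query_body]
    return [sum(choice, []) for choice in product(*per_atom)]

rules = {
    "T": [(("x",), [("S1", ("x", "y"))]), (("x",), [("S2", ("x",))])],  # T(x) :- S1(x,y) | S2(x)
    "U": [(("x",), [("S3", ("x",))])],                                   # U(x) :- S3(x)
}
for conj in gav_reformulate([("T", ("u",)), ("U", ("u",))], rules):
    print(conj)
# [('S1', ('u', 'y_0')), ('S3', ('u',))]
# [('S2', ('u',)), ('S3', ('u',))]
```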

  21. Reformulation algorithms (LAV/GLAV) • Sound LAV/GLAV mappings: r: S1(x,y), …, Sn(x,y) ⊑ T1(x,z), …, Tm(x,z) (note: the Ti are not necessarily distinct relational atoms) (equivalent tgd: ∀x,y S1(x,y) ∧ … ∧ Sn(x,y) → ∃z T1(x,z) ∧ … ∧ Tm(x,z)) • Inverse rules ([DG97]): • For every rule r and every i ∈ [1..m], define a rule: Ti(x, f_{r,z1}(x,y), …, f_{r,zk}(x,y)) :- S1(x,y), …, Sn(x,y) (tgd: ∀x,y S1(x,y) ∧ … ∧ Sn(x,y) → Ti(x, f_{r,z1}(x,y), …, f_{r,zk}(x,y)), i.e., skolemization of the existential variables z1, …, zk)

  22. Inverse rules: Example • r: S1(x,y), S2(y,w) ⊑ T1(x,z), T1(z,w) • Inverse rules: • T1(x, f_{r,z}(x,y,w)) :- S1(x,y), S2(y,w) • T1(f_{r,z}(x,y,w), w) :- S1(x,y), S2(y,w) • Observe that the same skolem term f_{r,z}(x,y,w) represents the common existential variable z of the two atoms

  23. Query reformulation using inverse rules • Create a logic program P_Q composed of: • the query Q • the inverse rules of all mappings M • Let P(I) be the result of evaluating a logic program P over a set of facts I • Theorem ([DG97, AD98]): Let P_Q^+ be a logic program s.t., for every set of facts I, P_Q^+(I) is obtained from P_Q(I) by discarding all tuples that contain skolem terms. Then: • P_Q^+ is a maximally-contained reformulation • P_Q^+(I) = certain_{M,I}(Q)

  24. Peer Data Management Systems • LAV source-to-peer mappings • P2P mappings: inclusion (sound) or equality (exact) GLAV, plus definitional (GAV) • Queries can be issued at any peer • Every peer can be both source and target w.r.t. different mappings • Pairs of peers may be indirectly connected (by paths of mappings) [Figure: peers P1, P2, P3, …, Pn connected by peer mappings M12, M23, M31, …, Mn3; each peer Pi has a local source Si with instance Ii]

  25. Simple PDMS example • Q(n1,n2) :- SameProj(n1,n2,p), Author(n1,w), Author(n2,w) • r0: SameProj(n1,n2,p) = ProjMem(n1,p), ProjMem(n2,p) • r1: S1(n,p,a) ⊆ ProjMem(n,p), Area(p,a) • r2: S2(n1,n2) ⊆ Author(n1,p), Author(n2,p) [Figure: peers P1 and P2 holding relations ProjMem, SameProj, Area and Author; sources S1, S2 with instances I1, I2]

  26. Mapping Graph • r0a: SameProj(n1,n2,p) ⊇ ProjMem(n1,p), ProjMem(n2,p) • r0b: SameProj(n1,n2,p) ⊆ ProjMem(n1,p), ProjMem(n2,p) • r1: S1(n,p,a) ⊆ ProjMem(n,p), Area(p,a) • r2: S2(n1,n2) ⊆ Author(n1,p), Author(n2,p) [Figure: mapping graph connecting relations ProjMem, SameProj, Area, Author of peers P1, P2 and sources S1, S2 (instances I1, I2) through edges labeled r0a, r0b, r1, r2]

  27. Query answering in PDMS. Theorem ([HIST05]): • In general, query answering in a PDMS is undecidable • Reason: cycles in the mapping graph • For an acyclic mapping graph, query answering is in PTIME • It is still in PTIME for a limited form of cycles (i.e., exact mappings with some restrictions) • This allows chains of sound ("LAV") mappings and exact ("GAV") mappings without projections • Piazza reformulation algorithm: • sound and complete for acyclic mapping graphs and the limited form of cycles • sound in general (computes a subset of the certain answers)

  28. Piazza reformulation algorithm (1) [Figure: rule-goal tree rooted at q with goals SameProj(n1,n2,p), Author(n1,w), Author(n2,w); SameProj expands via r0 into ProjMem(n1,p), ProjMem(n2,p), which expand via ir1a into S1(n1,p,_), S1(n2,p,_); each Author goal expands via ir2a or ir2b into S2(n1,n2) or S2(n2,n1)] • q: Q(n1,n2) :- SameProj(n1,n2,p), Author(n1,w), Author(n2,w) • r0: SameProj(n1,n2,p) :- ProjMem(n1,p), ProjMem(n2,p) • r1: S1(n,p,a) ⊆ ProjMem(n,p), Area(p,a) • ir1a: ProjMem(n,p) :- S1(n,p,a) • r2: S2(n1,n2) ⊆ Author(n1,p), Author(n2,p) • ir2a: Author(n1,f(n1,n2)) :- S2(n1,n2) • ir2b: Author(n2,f(n1,n2)) :- S2(n1,n2)

  29. Piazza reformulation algorithm (2) [Figure: the rule-goal tree for Q(n1,n2)] Q(n1,n2) :- (S1(n1,p,_) ∧ S1(n2,p,_)) ∧ (S2(n1,n2) ∪ S2(n2,n1)) ∧ (S2(n2,n1) ∪ S2(n1,n2))

  32. Piazza reformulation algorithm (2) [Figure: the rule-goal tree for Q(n1,n2)] Q(n1,n2) :- (S1(n1,p,_) ∧ S1(n2,p,_)) ∧ (S2(n1,n2) ∪ S2(n2,n1)) ∧ (S2(n2,n1) ∪ S2(n1,n2)) ≡ (S1(n1,p,_) ∧ S1(n2,p,_) ∧ S2(n1,n2)) ∪ (S1(n1,p,_) ∧ S1(n2,p,_) ∧ S2(n2,n1))
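The final step above, distributing ∧ over ∪ and removing redundant disjuncts, can be sketched in Python as follows. Atoms are treated as opaque strings (an assumption, which also keeps the anonymous variables _ of different atoms distinct), and only absorption, A ∪ (A ∧ B) ≡ A, is applied; this is weaker than full conjunctive-query minimization. Of the four combinations of the S2 choices, the two mixed ones coincide and are then absorbed, leaving the two disjuncts on the slide.

```python
from itertools import product

# Hypothetical encoding: each conjunct of the Piazza output is a list of alternative
# atoms (a union); a resulting conjunctive query is a frozenset of atom strings.

def expand(conjuncts):
    """Distribute ∧ over ∪, then drop conjunctions absorbed by a smaller one."""
    disjuncts = {frozenset(choice) for choice in product(*conjuncts)}
    # Absorption: A ∪ (A ∧ B) ≡ A, so a strict superset of another disjunct is redundant.
    return [d for d in disjuncts if not any(other < d for other in disjuncts)]

piazza_output = [
    ["S1(n1,p,_)"],
    ["S1(n2,p,_)"],
    ["S2(n1,n2)", "S2(n2,n1)"],
    ["S2(n2,n1)", "S2(n1,n2)"],
]
for conj in sorted(expand(piazza_output), key=sorted):
    print(" ∧ ".join(sorted(conj)))
# S1(n1,p,_) ∧ S1(n2,p,_) ∧ S2(n1,n2)
# S1(n1,p,_) ∧ S1(n2,p,_) ∧ S2(n2,n1)
```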

  33. Outline • Preliminaries • Mapping languages • Semantics of query answering • Query reformulation • Query answering using data exchange • Comparison

  34. Universal solutions • Data exchange setting S, T, M, instance I of S • An instance J of T is a universal solution of the data exchange setting above if it has homomorphisms to all other solutions • Solutions contain constants (i.e., values that appear in I) and variables (labeled nulls) • Homomorphism h: J1 → J2 between target instances: • h(c) = c, for every constant c • if R(a1, …, am) is in J1, then R(h(a1), …, h(am)) is in J2

  35. Universal solutions [Figure: source instance I of S mapped by M to the solutions J1, J2, J3, … over T; the universal solution J has homomorphisms h1, h2, h3 to each of them]

  36. Universal solutions example • M: ∀x,z S(x,z) → ∃y T(x,y) ∧ T(y,z) • I = {S(a,b)} • Solutions: • J1 = {T(a,a), T(a,b)} is not universal • J2 = {T(a,b), T(b,b)} is not universal • J3 = {T(a,X), T(X,b)} is universal • J4 = {T(a,X), T(X,b), T(a,Y), T(Y,b)} is universal • J5 = {T(a,X), T(X,b), T(Y,Y)} is not universal • …

  37. Computing universal solutions: apply the chase procedure to the joint instance ⟨I,∅⟩ • Source-to-target dependencies only: terminates in PTIME and produces a joint instance ⟨I,J⟩, where J is a universal solution (chase(I)) • Target dependencies: not guaranteed to terminate • If it terminates, it computes a universal solution • If it fails, no universal solution exists

  38. Example chase sequence • d1: ∀x,y,z S(x,y) ∧ S(y,z) → ∃w T(x,z,w) • ⟨{S(a,b), S(b,c), S(b,d), S(a,e), S(e,c)}, ∅⟩ ⇒_{h1} ⟨{S(a,b), S(b,c), S(b,d), S(a,e), S(e,c)}, {T(a,c,X1)}⟩ • h1: x → a, y → b, z → c, extended to h1': w → X1

  39. Example chase sequence • d1: ∀x,y,z S(x,y) ∧ S(y,z) → ∃w T(x,z,w) • ⟨{S(a,b), S(b,c), S(b,d), S(a,e), S(e,c)}, ∅⟩ ⇒_{h1} ⟨{S(a,b), S(b,c), S(b,d), S(a,e), S(e,c)}, {T(a,c,X1)}⟩ ⇒_{h2} ⟨{S(a,b), S(b,c), S(b,d), S(a,e), S(e,c)}, {T(a,c,X1), T(a,d,X2)}⟩ • h1: x → a, y → b, z → c, extended to h1': w → X1 • h2: x → a, y → b, z → d, extended to h2': w → X2

  40. Example chase sequence • d1: ∀x,y,z S(x,y) ∧ S(y,z) → ∃w T(x,z,w) • ⟨{S(a,b), S(b,c), S(b,d), S(a,e), S(e,c)}, ∅⟩ ⇒_{h1} ⟨{S(a,b), S(b,c), S(b,d), S(a,e), S(e,c)}, {T(a,c,X1)}⟩ ⇒_{h2} ⟨{S(a,b), S(b,c), S(b,d), S(a,e), S(e,c)}, {T(a,c,X1), T(a,d,X2)}⟩ • h1: x → a, y → b, z → c, extended to h1': w → X1 • h2: x → a, y → b, z → d, extended to h2': w → X2 • h3: x → a, y → e, z → c already extends to h3': w → X1 (T(a,c,X1) is present), so this step is not applicable!
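The chase sequence above can be reproduced by a sketch hard-wired to d1 (the function name is hypothetical). Because d1 is source-to-target, newly created target facts never produce new triggers, so a single pass over all matches of S(x,y) ∧ S(y,z) suffices; a trigger is skipped when some T(x,z,w') already exists, which is exactly the "not applicable" case of h3.

```python
from itertools import count

def chase_d1(S):
    """Standard chase of <S, ∅> with d1: ∀x,y,z S(x,y) ∧ S(y,z) → ∃w T(x,z,w).

    S is a list of source pairs, kept in order so that null names are deterministic.
    """
    T = []
    fresh = count(1)
    for (x, y) in S:
        for (y2, z) in S:
            if y2 != y:
                continue  # not a match of S(x,y) ∧ S(y,z)
            if any(t[0] == x and t[1] == z for t in T):
                continue  # h extends to an existing T(x,z,w'): step not applicable
            T.append((x, z, f"X{next(fresh)}"))  # fire: w becomes a fresh labeled null
    return T

S = [("a", "b"), ("b", "c"), ("b", "d"), ("a", "e"), ("e", "c")]
print(chase_d1(S))  # [('a', 'c', 'X1'), ('a', 'd', 'X2')]
```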

  41. Universal solutions and query answering. Theorem ([FKMP]): • If Q is a conjunctive query, I is a source instance and J is a universal solution, then Q(J)↓ = certain_{M,I}(Q), where Q(J)↓ is the set of tuples of Q(J) that contain no labeled nulls • Any solution J for which the above holds for every conjunctive query is universal

  42. Outline • Preliminaries • Mapping languages • Semantics of query answering • Query reformulation • Query answering using data exchange • Comparison

  43. Using inverse rules to compute universal solutions • For every relation Ti of T, let P_{M,Ti} be the reformulation of the query Q(x) :- Ti(x) using the inverse rules algorithm, keeping the tuples that contain skolem terms (these play the role of labeled nulls). Proposition: ⋃_i P_{M,Ti}(I) is homomorphically equivalent to chase(I) • Crux: every step of a chase sequence corresponds to a step in the evaluation of the logic program using SLD resolution. Corollary: ⋃_i P_{M,Ti}(I) is a universal solution
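A small check of this correspondence on the chase example d1 (slides 38 to 40), under an assumed encoding in which the skolem term f_{d1,w}(x,y,z) is a Python tuple. The skolemized rule T(x,z,f(x,y,z)) :- S(x,y), S(y,z) fires once per match, so it yields three T-facts where the standard chase produced two (the chase result is copied from slide 40, not recomputed). The two instances nonetheless map homomorphically into each other, which is why the skolemized result is a universal solution as well.

```python
from itertools import product

# Source instance of the chase example and the standard-chase result from slide 40.
S = [("a", "b"), ("b", "c"), ("b", "d"), ("a", "e"), ("e", "c")]
chased = {("a", "c", "X1"), ("a", "d", "X2")}

# Inverse rule for d1, skolemizing w (hypothetical tuple encoding of f_{d1,w}(x,y,z)):
#   T(x, z, f(x, y, z)) :- S(x, y), S(y, z)
inverse = {(x, z, ("f", x, y, z)) for (x, y) in S for (y2, z) in S if y == y2}

def is_null(v):
    """Skolem terms and the chase's X-nulls both play the role of labeled nulls."""
    return isinstance(v, tuple) or v.startswith("X")

def maps_into(A, B):
    """Brute force: is there an h fixing constants with h(A) ⊆ B?"""
    nulls = sorted({v for t in A for v in t if is_null(v)}, key=repr)
    values = sorted({v for t in B for v in t}, key=repr)
    for image in product(values, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        if all(tuple(h.get(v, v) for v in t) in B for t in A):
            return True
    return False

print(len(inverse), len(chased))                               # 3 2
print(maps_into(inverse, chased), maps_into(chased, inverse))  # True True
```

The brute-force search is exponential in the number of nulls and is meant only for toy instances like this one.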

  44. Applying data exchange in GAV/LAV settings [Figure: sources S1, …, Sn with instances I1, …, In (jointly S with instance I), mappings M1, …, Mn, target instances J1, …, Jn and J over T, query Q]

  45. Performance tradeoffs: Data exchange - requires computing a solution (polynomial in the size of the instance I) - needs to propagate source updates - may require recomputing the whole universal solution + but then query evaluation is easy and efficient + if the query load is large, the cost of computing the solution may be amortized

  46. Performance tradeoffs: Reformulation + No "startup" cost + No need to propagate updates - Adds overhead to query processing (although reformulations for "common" queries can be precomputed/cached) - Requires a distributed query evaluation engine (but there is room for optimization, e.g., adaptive query processing) - Generated reformulations are generally not minimal

  47. Conclusions • Two approaches for answering queries across mappings • Reformulation (data integration) • Universal solutions (data exchange) • Different problems • Data exchange is concerned with other aspects, e.g., identifying the appropriate solution to materialize • Same answers (certain answers) • Performance tradeoffs • Tight relationship between chase and inverse rules techniques
