Time to Leave the Trees: From Syntactic to Conceptual Querying of XML
Bertram Ludäscher, Ilkay Altintas, Amarnath Gupta
San Diego Supercomputer Center, U.C. San Diego
XMLDM'02, Prague
Overview
• Motivating example:
  • querying XML w/o and w/ conceptual-level information
  • “syntactic” vs. “conceptual” querying of XML
• Distilling conceptual-level information:
  • MXS (abstract Model of XML Schema)
• XPathT:
  • incorporating conceptual-level information into XPath
Motivating Example
• Example: “Books DB” (yes, more complex examples exist... ;)
  • elements: <myDB> ... <book> ... <price> ... <author> ...
• Sample queries:
  • Q1: Which <book>s have a <price> below $80?
  • Q2: What are the count and average <price> of <book>s?
• (Nice) try:
  • Q1: myDB//book[price<80]
  • Q2: N := count(myDB//book); S := sum(myDB//book/price); Avg := S/N;
• But what about ...
  • ... <book>s with multiple <price>s? (price<80 holds if any one of them is below 80, and S sums all of them)
  • ... <awe> (award-winning exemplar) elements (a subtype of book with an additional <award> subelement): we forgot those!
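A minimal Python sketch of the two naive queries, using the standard-library `xml.etree.ElementTree` module on a small invented instance (titles and prices are hypothetical, not from the slides). It reproduces both pitfalls: book B qualifies for Q1 because an XPath comparison against a node-set is existential, Q2 divides three prices by two books, and the <awe> element is silently skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: element names follow the slides, data values are invented.
DOC = """
<myDB>
  <book><title>A</title><price>60</price></book>
  <book><title>B</title><price>95</price><price>70</price></book>
  <awe><title>C</title><price>50</price><award>X</award></awe>
</myDB>
"""

root = ET.fromstring(DOC)

def q1_naive(root):
    # myDB//book[price<80]: true if ANY <price> child is below 80
    return [b.findtext("title") for b in root.findall(".//book")
            if any(float(p.text) < 80 for p in b.findall("price"))]

def q2_naive(root):
    # N := count(myDB//book); S := sum(myDB//book/price); Avg := S/N
    n = len(root.findall(".//book"))
    s = sum(float(p.text) for p in root.findall(".//book/price"))
    return n, s / n

print(q1_naive(root))  # ['A', 'B']   (the <awe> book C is missing)
print(q2_naive(root))  # (2, 112.5)   (three prices averaged over two books)
```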
Schema Information to the Rescue!
• XML & semistructured data model:
  • labeled ordered trees
  • “instance contains its own schema information”
• XML instances and DTDs carry very little “schema info”:
  • tag names (aka element “types”) = attribute names
  • element nesting = object (“slot”) structure
  • no data types, constraints, classes, class hierarchy, ...
• Schemas are good for you!
  • link to conceptual models/DB design
  • query formulation, validation
  • storage layout and query processing (optimization), ...
• XML Schema
Motivating Example (Cont’d)
• Q1 after studying <myDB> and/or its XML Schema:
  • there is a type hierarchy below type bookT
  • tag names are bound to those types
  • but XPath doesn’t know this => use syntactic queries:
    //*[self::book or self::tbook or self::cbook or ... or self::awe][price<80]
  • tedious and error-prone (do it yourself: Appendix A)
  • e.g., you overlooked <publication xsi:type=“bookT”>! (schema info is usually not contained in the XML instance)
  • small changes in the schema (e.g., adding a new subtype) require rewriting your query...
From Syntactic to Conceptual XML Queries
1. Distill conceptual information from the XML Schema
  • Abstract Model of XML Schema (MXS)
2. Incorporate MXS information into the query language
  • XPathT (“XPath with types/classes”)
  • turn the syntactic XML query
    //*[self::book or self::tbook or self::cbook or ... or self::awe][price<80]
  • into a more adequate conceptual XML query:
    //*[ts(bookT)][price<80] /* works for any subtype of bookT */
  • more robust w.r.t. schema changes
  • new opportunities for semantic query optimization
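The conceptual-to-syntactic direction can be sketched as a plain string rewrite. This Python sketch assumes the set of element names bound to bookT or one of its subtypes has already been resolved from the schema; the helper `to_syntactic_xpath` and the name sets are hypothetical illustrations, not part of XPathT. When a subtype is added, only the resolved name set changes, while the conceptual query stays the same.

```python
def to_syntactic_xpath(element_names, qualifier):
    """Expand a resolved type test into an XPath 1.0 disjunction of self:: tests.

    element_names: names bound to the requested type or any of its subtypes
    (assumed to be precomputed from the schema).
    """
    test = " or ".join(f"self::{e}" for e in sorted(element_names))
    return f"//*[{test}][{qualifier}]"

# Hypothetical resolution of ts(bookT) against some schema.
book_names = {"book", "tbook", "cbook", "awe"}
print(to_syntactic_xpath(book_names, "price<80"))
# //*[self::awe or self::book or self::cbook or self::tbook][price<80]

# After a (hypothetical) new subtype bound to <ebook> is added to the schema:
print(to_syntactic_xpath(book_names | {"ebook"}, "price<80"))
```

Sorting the names keeps the generated query deterministic, which makes rewrites easy to compare across schema versions.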
Abstract Model of XML Schema (MXS)
• Basic ideas:
  • formal abstract model (never mind the XML Schema syntax!), inspired by the Model Schema Language (MSL) [Brown-Fuchs-Robie-Wadler-WWW10-2001]
  • “types as classes”
• XML Schema names:
  • T: type names
  • E: element names
  • A: attribute names
• XML instances...
  • ... usually contain only element names (tags) E and attribute names A (exception: “xsi:type = ...”)
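The xsi:type exception means a conceptual query cannot rely on tag names alone. A minimal Python sketch, assuming a hypothetical `BIND` map from element names to declared types: an explicit `xsi:type` attribute in the instance takes precedence, otherwise the declared binding is used. ElementTree exposes the attribute under its expanded name `{http://www.w3.org/2001/XMLSchema-instance}type`; resolving prefixed QName values is deliberately left out.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical bind relation: element name -> declared type.
BIND = {"book": "bookT", "awe": "aweT", "publication": "publicationT"}

DOC = """<myDB xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <book/><awe/><publication xsi:type="bookT"/>
</myDB>"""

def effective_type(elem):
    # An explicit xsi:type in the instance overrides the declared binding.
    # (Simplification: the value is used as-is, without QName resolution.)
    if XSI_TYPE in elem.attrib:
        return elem.attrib[XSI_TYPE]
    return BIND.get(elem.tag)

root = ET.fromstring(DOC)
print([(e.tag, effective_type(e)) for e in root])
# [('book', 'bookT'), ('awe', 'aweT'), ('publication', 'bookT')]
```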
Abstract Model of XML Schema (MXS)
• MXS names
  • T: types, E: elements, A: attributes
• Kinds of types
  • simple vs. complex: $T_s$, $T_c$
  • abstract vs. concrete: $T_a$, $T_{na}$
• Type hierarchy
  • $\mathit{restrict} \subseteq (T_s \times T_s) \cup (T_c \times T_c)$
    • restricts the possible instances, keeping the structure
  • $\mathit{extend} \subseteq (T_s \cup T_c) \times T_c$
    • adds “slots” (elements and attributes)
  • $\mathit{subtype} = \mathit{extend} \cup \mathit{restrict}$
    • extend and restrict are the subtyping mechanisms
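The kind constraints on restrict and extend can be checked mechanically. A minimal Python sketch, assuming derivation steps are stored as (supertype, subtype) pairs; the concrete types and edges below are hypothetical, and the function name `violations` is illustrative only.

```python
# Hypothetical type kinds: "s" = simple (T_s), "c" = complex (T_c).
KIND = {
    "decimal": "s", "expPriceT": "s",
    "publicationT": "c", "bookT": "c", "textBookT": "c",
}

# Hypothetical derivation steps as (supertype, subtype) pairs.
RESTRICT = {("decimal", "expPriceT")}
EXTEND = {("publicationT", "bookT"), ("bookT", "textBookT")}

def violations(restrict, extend, kind):
    """Return a message for every pair that breaks the MXS kind constraints."""
    errors = []
    for sup, sub in sorted(restrict):
        # restrict ⊆ (T_s × T_s) ∪ (T_c × T_c): the kind must be preserved
        if kind[sup] != kind[sub]:
            errors.append(f"restrict({sup}, {sub}) mixes simple and complex types")
    for sup, sub in sorted(extend):
        # extend ⊆ (T_s ∪ T_c) × T_c: the derived type must be complex
        if kind[sub] != "c":
            errors.append(f"extend({sup}, {sub}) does not yield a complex type")
    return errors

# subtype = extend ∪ restrict
SUBTYPE = EXTEND | RESTRICT

print(violations(RESTRICT, EXTEND, KIND))  # []
```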
Type (Class) Hierarchy in XML Schema
• Convention: user-defined type names end with “T”
  • authorT, publicationT, bookT, ...
Inheritance in XML Schema (I)
[Figure: derivation diagram with edges labeled EXTEND, SUBTYPE, RESTRICT]
• expTextBookT ::= SUBTYPE(bookT) that RESTRICTs <price> to expPriceT and EXTENDs it with <recommended_for>
Inheritance in XML Schema (II)
[Figure: single inheritance vs. multiple inheritance]
• 19thcenturyTextBookType ::= SUBTYPE{textBookT, c19bookT}
• The XML Schema type system does not know that the two are equivalent!
Framework for Conceptual Queries in XML
• Binding types to elements
  • $\mathit{bind} \subseteq (E \times (T_s \cup T_c)) \cup (A \times T_s)$
    • binds element names to simple or complex types
    • binds attribute names to simple types
• Syntactic XML instance: D
  • root(NodeId), child(NodeId, Integer, NodeId), tag(NodeId, Tagname), data(NodeId, Data)
• Conceptual XML instance: D+
  • restrict(T, T), extend(T, T), subtype(T, T),
  • $\mathit{bind}(E \cup A, T)$
  • ...
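The syntactic instance D is just a relational encoding of the tree. A minimal Python sketch that derives the four relations from a document with `xml.etree.ElementTree`; it assumes pre-order integer node ids and 1-based child positions (the slides fix neither), and it ignores attributes, tail text, and mixed content.

```python
import xml.etree.ElementTree as ET
from itertools import count

def to_facts(xml_text):
    """Encode an XML document as the relations root, child, tag, data of D."""
    facts = {"root": set(), "child": set(), "tag": set(), "data": set()}
    ids = count()  # node ids assigned in document (pre-)order

    def visit(elem):
        nid = next(ids)
        facts["tag"].add((nid, elem.tag))
        text = (elem.text or "").strip()
        if text:  # simplification: only text before the first child is kept
            facts["data"].add((nid, text))
        # child(parent, position, child); 1-based positions are an assumption
        for pos, kid in enumerate(elem, start=1):
            facts["child"].add((nid, pos, visit(kid)))
        return nid

    facts["root"].add(visit(ET.fromstring(xml_text)))
    return facts

D = to_facts("<myDB><book><price>60</price></book></myDB>")
print(sorted(D["child"]))  # [(0, 1, 1), (1, 1, 2)]
```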
XPathT: Incorporating Type (Class) Information in XPath
• XPath patterns p and qualifiers q: p[q] returns the matches of p that satisfy q
• New XPathT patterns:
  • r(t), e(t), s(t): types that restrict, extend, or are subtypes of type t
  • tr(t), te(t), ts(t): transitive versions
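Semantics of XPathT
• Example: “transitive subtype”:
  $\mathit{SEM}(ts(t)) := \{\, t' \mid \mathit{subtype}^*(t, t') \,\}$, where $\mathit{subtype}^*$ is the reflexive-transitive closure of subtype
• From types to element names:
  $\mathit{SEM}([T]) := \{\, e \mid \mathit{bind}(e, t),\ t \in T \,\}$
  $\mathit{SEM}([ts(\mathit{bookT})]) = \{\mathit{book}, \mathit{ebook}, \mathit{tbook}, \ldots\}$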
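To make the two SEM definitions concrete, here is a naive fixpoint evaluation in Python over relations stored as sets of pairs. The subtype and bind facts (including the type names eBookT and textBookT and the element name etbook) are hypothetical; subtype pairs are written (supertype, subtype) and bind pairs (element, type). A real system would more likely use a Datalog engine or an indexed closure than this quadratic loop.

```python
# Hypothetical conceptual facts.
SUBTYPE = {("bookT", "eBookT"), ("bookT", "textBookT"), ("textBookT", "expTextBookT")}
BIND = {("book", "bookT"), ("ebook", "eBookT"), ("tbook", "textBookT"),
        ("etbook", "expTextBookT"), ("author", "authorT")}

def reflexive_transitive_closure(pairs, nodes):
    """subtype*: iterate composition until no new pairs appear."""
    closure = {(n, n) for n in nodes} | set(pairs)
    changed = True
    while changed:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        changed = not new <= closure
        closure |= new
    return closure

def sem_ts(t, subtype, types):
    # SEM(ts(t)) = { t' | subtype*(t, t') }
    star = reflexive_transitive_closure(subtype, types)
    return {t2 for (t1, t2) in star if t1 == t}

def sem_type_test(type_set, bind):
    # SEM([T]) = { e | bind(e, t), t in T }
    return {e for (e, t) in bind if t in type_set}

TYPES = {t for pair in SUBTYPE for t in pair} | {t for _, t in BIND}
print(sorted(sem_ts("bookT", SUBTYPE, TYPES)))
# ['bookT', 'eBookT', 'expTextBookT', 'textBookT']
print(sorted(sem_type_test(sem_ts("bookT", SUBTYPE, TYPES), BIND)))
# ['book', 'ebook', 'etbook', 'tbook']
```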
Conceptual(-level) XML Queries in XPathT
• Which books have a price below $80?
  //*[ts(bookT)][price<80]
• Semantics-aware equivalent rewriting:
  //*[ts(bookT)][not(ts(expTextBookT))][price<80]
• Logical XPathT query plan:
  [Figure: query plan combining conceptual information with tree structure information]
Summary
• Complex domains require conceptual-level modeling and querying capabilities beyond just tree structure
• Status quo: XML Schema is a simple “conceptual model” with many ad-hoc “design decisions”/restrictions
• Abstract Model of XML Schema (MXS)
• XPathT: a first step towards “conceptual” or “semantic” XML query language extensions
  • more concise, intuitive, flexible, and robust queries
  • the system maps conceptual to syntactic queries, not the programmer/query designer!
Next Steps & Outlook
• Extend MXS to include more conceptual information
• Develop formal semantics
• XPathT, extensions: XPathC, XQueryC
• Research problems:
  • mapping: XPathC queries => equivalent XPath queries
    • formalize equivalence; is it always possible? If so, conventional XML query processors can be used!
  • “proxy XML Schema doc”: instead of rewriting into XPath over the original instance, can one materialize some conceptual information as a “proxy XML doc” such that conceptual queries become conventional queries against the proxy...
  • semantic query optimization: equivalent rewritings given the conceptual-level constraints