1 / 30

Efficiently Publishing Relational Data as XML Documents

Efficiently Publishing Relational Data as XML Documents. Jayavel Shanmugasundaram et al. Proceedings -VLDB 2000, Cairo. What drove them? . No…it wasn’t the chaffeur… XML rapidly emerging as a global standard Large amount of data stored in RDBMS and needs to be exchanged. Primary Issues.

sun
Download Presentation

Efficiently Publishing Relational Data as XML Documents

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Efficiently Publishing Relational Data as XML Documents Jayavel Shanmugasundaram et al. Proceedings -VLDB 2000, Cairo.

  2. What drove them? • No…it wasn’t the chaffeur… • XML rapidly emerging as a global standard • Large amount of data stored in RDBMS and needs to be exchanged

  3. Primary Issues • Language specification • Implementation – what method works best? • Adding TAG and STRUCTURE - when do you do these operations?

  4. Roadmap • Language specification • Implementation • Early tagging, structuring • Late tagging, structuring • Early structure, late tagging • Performance Evaluation

  5. Our little sample… <customer id=C1> <name> Dilys Thomas </name> <accounts> <account id=A1>34552211233</account> <account id=A2>98653412223</account> <! Dilys – we know where your money is…> </accounts> <porders> <porder id=PO1 acct=A1> <items> <item id=I1>Gift for Consulate VISA Woman</item> <item id=I1>Traveller’s cheques</item> </items> <payments> <payment id=P1> due Feb 12 </payment> <!that’s today!!> </payments> </porder> </porders> </customer> • Note the • Elements • Names/Tags • ID Refs

  6. Underlying tables Customer(id int, name varchar) PurchOrder(id int, custID int, acctID varchar, date varchar) Account(id varchar, custID int, acctnum int) Item(id int, poID int, desc varchar) Payment(id int, poID int, desc varchar)

  7. SQL-based language spec. Sqlfunctions: Define XMLConstruct ITEM(id int, desc varchar) AS { <item id=$id>$desc </item> } Sqlaggregates: Select XMLAGG(ITEM(id, desc)) FromItem // returns an XML aggregation of items

  8. Sample query Select cust.name, CUST(cust.id, cust.name, (Select XMLAGG(ACCT(acct.id, acct.acctnum)) From Account acct Where acct.custId=cust.id), (Select XMLAGG(PORDER(porder.id, porder.acct, porder.date, (Select XMLAGG(ITEM(item.id, item.desc)) From Item item Where item.poid=porder.id) (Select XMLAGG(PAYMENT(pay.id,pay.desc)) From Payment pay, Where pay.poid=porder.id))) From PurchOrder porder Where porder.custID=cust.id)) From Customer cust Constructs XML from the relational tables.

  9. Roadmap • Language specification • Implementation • Early tagging, structuring • Late tagging, structuring • Early structure, late tagging • Performance Evaluation

  10. Implementation alternatives • Late/early tagging • Late/early structuring • (No late structuring+early tagging) TAG STRUCTURE DB Result

  11. Early tagging and structuring • Stored Procedure • Explicitly issue nested queries • Get corresponding nested data using other queries • Done outside relational engine. Tag/str as soon as results are available. • Too many queries per tuple. • Fixed order (nested loop join)

  12. Contd… • Correlated CLOB • Push queries into the engine • Plug in XMLAGG, XMLCONSTRUCTsupport into engine • Have to handle huge CLOBS in the engine • Fixed join order • Decorrelated CLOB • Decorrelate and use Outer Joins – no longer fixed order • Still carry around CLOBs (due to early tagging!)

  13. Roadmap • Language specification • Implementation • Early tagging, structuring • Late tagging, structuring • Early structure, late tagging • Performance Evaluation

  14. Late tagging and Structuring 2 phases -> Content creation + Tagging/Structuring • Redundant Relation: • Blindly join all constituent tables • ‘Parent’ data repeated • Unsorted Outer Union: • Decorrelate query, compute common subexpressions and use Outer Joins • Take an Outer Union of result tables • Columns grow with width/depth of XML doc. - Path Outer Union

  15. Contd… • Alternatively, don’t repeat Parent node at every child. • Feed parent into Outer Union and only keep parent Ids with children. – Node Outer Union • Greatly increases no of tuples generated

  16. Outer Union • Note: Outer Joins to retain parents. • OU • Separate column in result for each column of input • Unused cols set to NULL • Type column added for each row.

  17. Contd… 2 phases -> Content creation + Tagging/Structuring • Inside the Engine • XMLAGG, XMLCons support required • Final step after content generated • CLOBs not carried around • Outside the Engine • GROUP Siblings • Eliminate Duplicates • Extract info and TAG

  18. Grouping data HASH! Every row in the final table has a column with name of element with all parents (a.b.c.d.e) • Check if <a.b.c.d> hashes true, TAG accordingly and add as another child at that level • Else check if <a.b.c> hashes true, add <a.b.c.d> and then <a.b.c.d.e> • And so on…till you either find hash or hit root element. • Tuples can come in any order. Sufficient mem required!

  19. Roadmap • Language specification • Implementation • Early tagging, structuring • Late tagging, structuring • Early structure, late tagging • Performance Evaluation

  20. Late tagging, Early Structuring • As before…only, now SORT the outer union • Ensure • Parent info comes before child • Info about node and desc. completed before any other node info starts • Ordering follows user-def condns

  21. Sort and Tag Sort on Pkeys • Define an order on Pkeys (CustID, AcId, POId, ItemId, PaymentID) – based on structure of XML Doc. • Parent tuples will have filled values for first few cols and null for the later ones • Nulls sort low Tag in constant memory • Maximum amount of info to be stored is proportional only to the depth of the XML Doc.

  22. Roadmap • Language specification • Implementation • Early tagging, structuring • Late tagging, structuring • Early structure, late tagging • Performance Evaluation

  23. Modeling transformations • Query fanout • Query nesting depth • # Root nodes (tuples in the root table) • # Leaf tuples (tuples corresp to all leaf nodes) Structure Result size

  24. Results (graph time!!) • “Inside the engine” versions are about 3 times as fast as “outside” counterparts

  25. Inside –vs- Outside the engine • Query Execution • Bind out • Tag/Structure • Write XML to file • Bind-out time not required for Inside Engine approaches

  26. Query Fan out • Increasing QFO -> Greater Joins -> More time • CorrCLOB has to use Nested Loop Join Order – bad performance • Unsorted OU better than Sorted OU. Sorting cost > Cost of complex tagging • DecorrCLOB – optimized by DB2 engine. CLOBs retained in memory (low fanout)

  27. Query Depth • DecorrCLOB – huge increase! • Complexity of queries increases. • Engine makes bad choices (sorting after XMLAGG etc)

  28. Number of Roots • Outer Union approaches not affected • CorrCLOB at #root=1 equiv to just 2 queries!

  29. Number of tuples, Memory • If sufficient memory, no great changes! • If not, Unsorted OU which requires large space for tagging, fails. Overflow! • Sorted OU – based on scalable sorting. Adapt to large size and less mem better.

  30. Roadmap • Language specification • Implementation • Early tagging, structuring • Late tagging, structuring • Early structure, late tagging • Performance Evaluation • Quo vadis?

More Related