Trees, semistructured data, and other strange ways to go beyond tables. Serge Abiteboul, INRIA & ENS Cachan. PODS 30th Anniversary, 2011. Another one of these No-SQL talks?

Presentation Transcript


  1. Trees, semistructured data, and other strange ways to go beyond tables
     Serge Abiteboul, INRIA & ENS Cachan
     PODS 30th Anniversary, 2011
     Another one of these No-SQL talks?
     IMS, hierarchical model, V-relations, Jacobs’s calculus, Hardgrave’s broom, nested relations, format model, complex objects, logical data model, object databases, lambda calculus, regular trees, F-logic, NF1F, NF2, COL, IFO, LDL, IQL, SGML, HTML, ASN.1, XML, YAML, JSON…
     Luc Véro

  2. Introduction
     Knowledge lives in trees
     • "But of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die." Genesis 2:17
     • Theorem: Information lives in trees and not in relations
     • Proof: the Bible does not say « But of the two dimensional table of knowledge of good and evil … »
     Trees are useless
     • "A tree is a tree. How many more do you have to look at?" Ronald Reagan, governor of California, opposing the expansion of Redwood National Park (1966)
     • "We don’t need anything beyond relations. These things are useless. Reject!" Anonymous referee (circa 1990)

  3. Organization (more or less chronological)
     • Introduction
     • Hierarchical data model (60s)
     • Nested relations (80s)
     • Complex objects (early 90s)
     • Semistructured data & unranked labeled trees (late 90s)
     • Unranked labeled ordered trees, aka XML (early 00s)
     • Evolving trees, aka Active XML (mid 00s)
     • Cycles (90s to now)
     • Conclusion

  4. For lack of time, we will ignore IMS and the hierarchical model
     • The language was purely navigational anyway
     • We will also ignore early works such as Makinouchi, Jacobs or Hardgrave
     • We will start with N1NF
     • François Bancilhon in France
     • Hans Schek in Germany
     • PhD thesis of Nicole Bidoit

  5. Non-First-Normal-Form (N1NF)
     • DB101: a quarter on tables. Now what?
     • Data live in 1NF relations
     • Data would prefer to live in infamous nested relations, aka V-relations, aka N1NF relations, aka NF2 relations
     • Trees!

  6. The devil is in the details
     [Figure: V-relations compared with N1NF-relations]
     • A is a key: no new power
     • A is not a key: the size is now possibly exponential in the size of the domain

  7. Complex object model
     Tuple and set constructors used freely
     [Figure: a Families complex object drawn as a tree of set (*) and tuple nodes. Family tuples carry a Name, a Children set (tuples with Name and Sex), and a Cars set (tuples with Name and Year). Values shown: Peter, Mimi, Toto, Zaza, F, M, 2CV, BMW, 1976, 2010.]

  8. A logic and algebra for complex objects
     • Logic: main novelty is set variables (not first-order)
     • Example: AbouBanat query
       { T.Father | Families(T) ∧ ∀X ∈ T.Children ( X.Sex = F ) }
     • Algebra: powerset operation, unnest/nest
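
The query and the algebra operations on this slide can be sketched in Python. This is only an illustration: the `families` instance (including the names Paul and Lulu), the field names, and the helpers `abou_banat`, `unnest`, `nest`, and `powerset` are assumptions made for the example, not notation from the talk.

```python
from itertools import chain, combinations

# Hypothetical complex-object instance: a set of family tuples,
# each with a set-valued Children attribute.
families = [
    {"Father": "Peter", "Children": [{"Name": "Zaza", "Sex": "F"},
                                     {"Name": "Mimi", "Sex": "F"}]},
    {"Father": "Paul", "Children": [{"Name": "Toto", "Sex": "M"},
                                    {"Name": "Lulu", "Sex": "F"}]},
]

def abou_banat(families):
    """{ T.Father | Families(T) and for all X in T.Children: X.Sex = F }"""
    return {t["Father"] for t in families
            if all(x["Sex"] == "F" for x in t["Children"])}

def unnest(families):
    """Flatten the set-valued attribute: one flat (1NF) row per child."""
    return [(t["Father"], x["Name"], x["Sex"])
            for t in families for x in t["Children"]]

def nest(rows):
    """Group flat rows back into one tuple per father."""
    grouped = {}
    for father, name, sex in rows:
        grouped.setdefault(father, []).append({"Name": name, "Sex": sex})
    return [{"Father": f, "Children": c} for f, c in grouped.items()]

def powerset(items):
    """All subsets of items; the result has 2**len(items) elements."""
    items = list(items)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]
```

On this instance `nest(unnest(families))` gives back `families`, because Father acts as a key and no Children set is empty; in general the two operations are not inverses of each other.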

  9. Results
     • Equivalence theorem: algebra and logic have the same expressive power
     • Remark: one can compute TC using algebra/logic (wow! Cool!)
     • Also studied: fixpoint, datalog, while…
     • Complexity: each new level of nesting introduces one more exponential: 2^n, 2^(2^n), …
     • Need to control the use of powerset
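
One way to see both the remark about transitive closure (TC) and the cost of powerset is the Python sketch below. It assumes the textbook characterization of TC as the intersection of all transitive relations over the domain that contain R. The function names and the sample relation are invented for illustration, and this is not claimed to be the construction used in the talk.

```python
from itertools import chain, combinations, product

def powerset(items):
    """Lazily enumerate all subsets of items as tuples."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))

def is_transitive(rel):
    """True if (a, b) and (b, d) in rel always imply (a, d) in rel."""
    return all((a, d) in rel
               for (a, b) in rel for (c, d) in rel if b == c)

def tc_via_powerset(rel):
    """Transitive closure without recursion: intersect every transitive
    superset of rel drawn from the powerset of domain x domain."""
    domain = sorted({x for pair in rel for x in pair})
    all_pairs = list(product(domain, repeat=2))
    closure = set(all_pairs)               # the full relation is transitive
    for candidate in powerset(all_pairs):  # 2 ** (n * n) candidates
        s = set(candidate)
        if rel <= s and is_transitive(s):
            closure &= s
    return closure
```

For `tc_via_powerset({(1, 2), (2, 3)})` the loop inspects 2**9 = 512 candidate relations over a 3-element domain; a 5-element domain would already need 2**25, which is the kind of blow-up that motivates controlling the use of powerset.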

  10. From complex objects to semistructured data
      [Figure: the same Families tree of set (*) and tuple nodes, with Name, Children, Cars, Sex, and Year attributes and the values Peter, Mimi, Toto, Zaza, F, M, 2CV, BMW, 1976, 2010.]

  11. Revolution 1: more flexibility
      [Figure: the same Families tree, now with an extra Annotations node holding the value Trash.]

  12. Revolution 2: Remove some nodes; name all
      [Figure: the Families tree with every node labeled: Families, Family, Name, Children, Child, Cars, Car, Sex, Year, Ann. Values shown: Peter, Toto, Zaza, M, F, 2CV, BMW, 1976, 2010, Trash.]

  13. Unranked labeled trees
      [Figure: the fully labeled Families tree from the previous slide, with nodes Families, Family, Name, Children, Child, Cars, Car, Sex, Year, Ann. and values Peter, Toto, Zaza, M, F, 2CV, BMW, 1976, 2010, Trash.]

  14. This is better adapted to a Web context
      • Self-describing data: no separation between schema and data
      • Flexibility
      • Not such a big deal
      • Maybe the main contribution is the format?
      • <Families><Family><Name>Peter</Name><Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars><Children><Child> …
      Plus ça change, plus c’est la même chose (the more things change, the more they stay the same)
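
To make "self-describing" concrete, here is a minimal Python sketch that walks a document with no schema and discovers the labels from the data itself. The document `DOC` is a hypothetical, completed variant of the fragment on the slide: the part elided there is filled with values borrowed from the earlier figures, and that arrangement is an assumption. The helper `paths` is likewise invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete document; the slide's fragment is truncated.
DOC = """
<Families>
  <Family>
    <Name>Peter</Name>
    <Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars>
    <Children><Child><Name>Toto</Name><Sex>M</Sex></Child></Children>
  </Family>
</Families>
"""

def paths(elem, prefix=""):
    """Yield (path, text) for every leaf, reading labels from the data."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
    for child in children:
        yield from paths(child, path)

root = ET.fromstring(DOC.strip())
leaves = list(paths(root))
```

Note that XML tag names are case-sensitive, so an opening `<name>` closed by `</Name>` would be rejected by the parser.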

  15. What else? The trees are unbounded
      [Figure: a tree with root r and long branches of nodes labeled a and b, with $ markers.]
      • Like nested relations, trees are unbounded in width
      • Unlike nested relations, they are unbounded in depth
      • One can simulate 2-counter machines with 2 branches
      • Do applications simulate 2-counter machines with XML documents?
      • I am still looking for one
      • XML documents are rarely deep
      • But even for bounded trees there are fun questions: e.g., is the equivalence of monadic datalog decidable for bounded data trees?
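
The remark about 2-counter machines can be illustrated with a small Python sketch. The encoding is an assumption made for this example, not taken from the talk: a root `r` with two unary branches of `a` nodes ending in `$`, where each counter's value is the depth of its branch. The instruction format (`inc`, `jzdec`) and all function names are hypothetical.

```python
def branch(n):
    """A unary chain of n 'a' nodes ending in a '$' leaf."""
    tree = ("$", [])
    for _ in range(n):
        tree = ("a", [tree])
    return tree

def depth(b):
    """Counter value = number of 'a' nodes on the branch."""
    n = 0
    while b[0] == "a":
        b = b[1][0]
        n += 1
    return n

def run(program, c1, c2, max_steps=1000):
    """program: list of ('inc', i, next) or ('jzdec', i, next_if_zero, next).
    The machine halts when the program counter leaves the program."""
    root = ("r", [branch(c1), branch(c2)])
    pc = 0
    for _ in range(max_steps):            # bounded: halting is undecidable
        if pc >= len(program):
            break
        instr = program[pc]
        i = instr[1]
        left, right = root[1]
        b = (left, right)[i]
        if instr[0] == "inc":
            b, pc = ("a", [b]), instr[2]  # grow the branch by one node
        elif b[0] == "$":                 # zero test: branch is just '$'
            pc = instr[2]
        else:
            b, pc = b[1][0], instr[3]     # shrink the branch by one node
        root = ("r", [b, right] if i == 0 else [left, b])
    return depth(root[1][0]), depth(root[1][1])

# Move the first counter into the second: c2 := c1 + c2, c1 := 0.
ADD = [("jzdec", 0, 2, 1), ("inc", 1, 0)]
```

With this encoding, `run(ADD, 3, 4)` ends with the first branch empty and the second branch seven nodes deep. The point of the slide is that such a simulation needs unbounded depth, which bounded trees do not provide.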

  16. What else? The trees are ordered
      Unranked labeled ordered trees = XML
      • Ignore order: classical optimization
      • Respect order: totally new ball game; bring in tree automata
      • Order is often painful for optimization
      • Reconcile
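
The difference between ignoring and respecting order can be shown with a short Python sketch. Trees are represented here as `(label, children)` pairs, and the two canonical-form helpers are invented for the example; this is an illustration of the distinction, not a technique from the talk.

```python
def ordered_key(tree):
    """Canonical form that respects sibling order."""
    label, children = tree
    return (label, tuple(ordered_key(c) for c in children))

def unordered_key(tree):
    """Canonical form that ignores sibling order by sorting subtrees."""
    label, children = tree
    return (label, tuple(sorted(unordered_key(c) for c in children)))

# Two Car subtrees that differ only in the order of their children.
t1 = ("Car", [("Name", [("BMW", [])]), ("Year", [("2010", [])])])
t2 = ("Car", [("Year", [("2010", [])]), ("Name", [("BMW", [])])])

same_if_unordered = unordered_key(t1) == unordered_key(t2)
same_if_ordered = ordered_key(t1) == ordered_key(t2)
```

When order is ignored, the two trees are equal and an optimizer is free to reorder siblings; when order is respected they are different documents, so that freedom disappears.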

  17. Selling argument is the Web…
      • The move from relations to trees is interesting
      • But the move from centralized to distributed is interesting as well, and much less investigated
      • Where the fun is:
        • Scale is beyond what we thought was thinkable
        • Machines are totally autonomous
        • Schema replaced by numerous ontologies
        • True/false logic replaced by inconsistency, probabilities, trust, belief…

  18. And the trees are evolving (aka Active XML)
      • An old idea from object databases: mix data and computation
      [Figure: a Resorts tree with a Resort node (State Colorado, Name Aspen) whose snowcond and hotels children embed service calls: !Unisys.com/snow(“Aspen”), with snow data Unit Meter and Depth 1, and !Yahoo.com/GetHotels(<city name=“Aspen”/>).]
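
The idea of mixing data and computation in one tree can be sketched in Python as follows. This is a toy model under stated assumptions: the `Node` and `Call` classes, the `SERVICES` table, and `materialize` are invented for the example, the services are local functions standing in for remote calls, and the hotel data they return is made up. It is not the Active XML system itself.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

@dataclass
class Call:
    service: str      # e.g. "Unisys.com/snow"
    argument: str

# Local stand-ins for remote services; the returned data is invented.
SERVICES = {
    "Unisys.com/snow": lambda city: [Node("Unit", [Node("Meter")]),
                                     Node("Depth", [Node("1")])],
    "Yahoo.com/GetHotels": lambda city: [Node("hotel", [Node(f"{city} Lodge")])],
}

def materialize(node):
    """Return a copy of the tree where each call is replaced by its result."""
    out = Node(node.label)
    for child in node.children:
        if isinstance(child, Call):
            out.children.extend(SERVICES[child.service](child.argument))
        else:
            out.children.append(materialize(child))
    return out

resort = Node("Resort", [
    Node("Name", [Node("Aspen")]),
    Node("snowcond", [Call("Unisys.com/snow", "Aspen")]),
    Node("hotels", [Call("Yahoo.com/GetHotels", "Aspen")]),
])

snapshot = materialize(resort)
```

The original `resort` tree keeps its calls, so it can be materialized again later and may then yield different data; that is the sense in which the tree evolves. In this sketch a service result is inserted as-is, so calls nested inside a result would not be evaluated.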

  19. And there are cycles
      • For lack of time, I will not mention the network model [Codasyl 1969]
      • The language was purely navigational anyway
      • If I added references to XML, I would get cycles
      • Lots of models for graph data, e.g., IQL
      • Some fun results: e.g., a copy elimination problem when trying to obtain Chandra-Harel completeness for IQL
      • Similar issue for unordered trees [recent result with Vianu]
      [Figure: two Person nodes, one with Name Adam and one with Name Eve, each with a Spouse edge pointing to the other.]
      Paris C. Kanellakis
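
How references turn a tree into a graph with cycles can be seen in a few lines of Python. The document below and its `id` and `ref` attribute names are assumptions made for the example, loosely modeled on the Adam and Eve figure; `spouse_chain` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the 'id' and 'ref' attributes act as references.
DOC = """
<People>
  <Person id="p1"><Name>Adam</Name><Spouse ref="p2"/></Person>
  <Person id="p2"><Name>Eve</Name><Spouse ref="p1"/></Person>
</People>
"""

root = ET.fromstring(DOC.strip())
by_id = {p.get("id"): p for p in root.iter("Person")}

def spouse_chain(start_id):
    """Follow Spouse references until a person repeats.

    Returns the names visited and the id that closed the cycle."""
    seen, names, current = set(), [], start_id
    while current not in seen:
        seen.add(current)
        person = by_id[current]
        names.append(person.findtext("Name"))
        current = person.find("Spouse").get("ref")
    return names, current

names, closing_id = spouse_chain("p1")
```

The parsed document is still a tree; the cycle only appears once the references are followed, which is why the traversal has to remember the nodes it has already seen.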

  20. Conclusion
      • Is this a good time to do research on trees in databases?
      • "The best time to plant a tree was 20 years ago. The next best time is now." Chinese proverb

  21. Advertisement
      Book on Web data management, to appear at Cambridge University Press
      http://webdam.inria.fr/Jorge