Managing XML and Semistructured Data
Lecture 4: Path Expressions
Prof. Dan Suciu
Spring 2001
In this lecture
• Path expressions
• Regular path expressions
• Evaluation techniques
Resources: Data on the Web (Abiteboul, Buneman, Suciu), section 4.1
Path Expressions
Examples:
• Bib.paper
• Bib.book.publisher
• Bib.paper.author.lastname
Given an OEM instance, the answer of a path expression p is a set of objects.
Path Expressions
Examples:
DB = [Figure: OEM instance rooted at &o1, reached by the Bib edge; edge labels include paper, book, references, author, title, year, page, http, publisher, firstname, and lastname; leaf values include “Serge”, “Abiteboul”, “Victor”, “Vianu”, 1997, 122, 133]
• Bib.paper = {&o12, &o29}
• Bib.book.publisher = {&o51}
• Bib.paper.author.lastname = {&o71, &206}
Answer of a Path Expression
Simple evaluation algorithm for Answer(P, DB):
• Runs in PTIME in size(P) and size(DB)
Answer(P, DB) = f(P, root(DB))
Where:
f(ε, x) = {x}
f(L.P, x) = ∪ {f(P, y) | (x, L, y) ∈ edges(DB)}
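
The recursive definition of f can be sketched in Python as a label-by-label traversal. This is only an illustration: the graph encoding (a dict from node to outgoing (label, child) pairs), the function name `answer`, and the object ids are assumptions made for this example and do not reproduce the OEM instance from the slides.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge-labeled graph: node -> list of (label, child).
Graph = Dict[str, List[Tuple[str, str]]]

DB: Graph = {
    "root": [("Bib", "b")],
    "b": [("paper", "p1"), ("paper", "p2"), ("book", "k1")],
    "p1": [("author", "a1"), ("title", "t1")],
    "p2": [("author", "a2")],
    "k1": [("publisher", "pub1"), ("author", "a1")],
    "a1": [("lastname", "n1")],
    "a2": [("firstname", "n2")],
}


def answer(path: str, db: Graph, root: str) -> Set[str]:
    """Evaluate a simple path expression such as 'Bib.paper.author'."""
    labels = path.split(".") if path else []
    current = {root}  # base case: the empty path yields {x}
    for label in labels:
        # One step of f(L.P, x): follow every L-labeled edge out of the
        # current set, then continue with the rest of the path.
        current = {y for x in current for (l, y) in db.get(x, []) if l == label}
    return current


print(sorted(answer("Bib.paper", DB, "root")))                  # ['p1', 'p2']
print(sorted(answer("Bib.book.publisher", DB, "root")))         # ['pub1']
print(sorted(answer("Bib.paper.author.lastname", DB, "root")))  # ['n1']
```

Working on a whole set of nodes per label, rather than recursing per node, keeps the work at each step bounded by the number of edges in the graph.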
Regular Path Expressions
R ::= label | _ | R.R | (R|R) | R* | R+ | R?
Examples:
• Bib.(paper|book).author
• Bib.book.author.lastname?
• Bib.book.(references)*.author
• Bib.(_)*.zip
Applications of Regular Path Expressions
• Navigating uncertain structure:
  • Bib.book.author.lastname?
• Syntactic substitute for inheritance:
  • Bib.(paper|book).author
  • Better: Bib.publication.author, but we don’t have inheritance
Applications of Regular Path Expressions
• Computing transitive closure:
  • Bib.(_)*.zip = every zip object accessible from Bib, at any depth
  • Bib.book.(references)*.author = every author accessible from a book via a chain of references
• Some regular expressions of doubtful practical use:
  • (references.references)* = a path with an even number of references
  • (_._)* = paths of even length
  • (_._._.(_)?)* = paths of length 3m + 4n for some m, n ≥ 0
• But they make great examples for illustration
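
The length claim for (_._._.(_)?)* can be checked with a short Python sketch. It assumes each edge is encoded as one character, so the wildcard `_` becomes `.` in Python's `re` syntax; the names `PATTERN` and `matches` are made up for this illustration.

```python
import re

# (_._._.(_)?)* with one character per edge: three edges, optionally a fourth,
# repeated any number of times.
PATTERN = re.compile(r"(?:...(?:.)?)*")


def matches(length: int) -> bool:
    """True if some path of this length belongs to the language."""
    return PATTERN.fullmatch("x" * length) is not None


# Lengths up to 20 that are NOT of the form 3m + 4n.
unmatched = [k for k in range(21) if not matches(k)]
print(unmatched)  # [1, 2, 5]
```

Every length from 6 upward can be written as 3m + 4n, so only the lengths 1, 2, and 5 are excluded.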
Answer of a Regular Path Expression
Recall:
• Lang(R) = the set of words P generated by R
Answer of regular path expressions:
• Answer(R, DB) = ∪ {Answer(P, DB) | P ∈ Lang(R)}
Lang(R) may be infinite, so we need an evaluation algorithm that copes with cycles.
Regular Path Expressions
Recall: each regular expression can be translated into an NDFA
Example: R = (a.a)*.a.b
A = [Figure: automaton with states s1, s2, s3, s4 and edges labeled a, a, b]
states(A) = {s1, s2, s3, s4}
initial(A) = s1
terminal(A) = {s4}
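
One possible NDFA for R = (a.a)*.a.b can be written down as a plain edge list and simulated in Python. The transition set below is an assumption chosen for this sketch (it keeps the slide's state names, initial state, and terminal state, but is not necessarily the automaton drawn in the figure), and `accepts` is a hypothetical helper.

```python
from typing import List, Set, Tuple

# Assumed transitions (state, label, state) for (a.a)*.a.b.
# s1 has two outgoing a-edges, which is what makes the automaton nondeterministic.
EDGES: List[Tuple[str, str, str]] = [
    ("s1", "a", "s2"),  # first a of an (a.a) pair
    ("s2", "a", "s1"),  # second a of the pair, back to the start
    ("s1", "a", "s3"),  # the final a
    ("s3", "b", "s4"),  # the final b
]
INITIAL = "s1"
TERMINAL: Set[str] = {"s4"}


def accepts(word: List[str]) -> bool:
    """Simulate the NDFA by tracking the set of states reachable so far."""
    states = {INITIAL}
    for label in word:
        states = {t for (s, l, t) in EDGES if s in states and l == label}
    return bool(states & TERMINAL)


print(accepts(["a", "b"]))            # True
print(accepts(["a", "a", "b"]))       # False
print(accepts(["a", "a", "a", "b"]))  # True
```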
Regular Path Expressions
Canonical Evaluation Algorithm
• Answer(R, DB):
  • construct A from R
  • construct the product automaton G = A × DB:
    • nodes(G) = states(A) × nodes(DB)
    • edges(G) = {((s,x), L, (s',x')) | (s,L,s') ∈ edges(A), (x,L,x') ∈ edges(DB)}
    • root(G) = (initial(A), root(DB))
  • compute Gacc = set of nodes accessible from root(G)
  • return {x | ∃ s ∈ terminal(A) s.t. (s,x) ∈ Gacc}
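
The steps above can be sketched in Python as a breadth-first search over (state, node) pairs. This is a minimal illustration under stated assumptions, not the lecture's implementation: the automaton and the database are hypothetical, edges are stored as (source, label, target) triples, product nodes are generated lazily so only Gacc is ever materialized, and treating an automaton edge labeled `_` as matching any database label is an addition to the edges(G) formula, which requires equal labels.

```python
from collections import deque
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, label, target)


def answer_regular(nfa_edges: List[Edge], initial: str, terminal: Set[str],
                   db_edges: List[Edge], root: str) -> Set[str]:
    """Return the nodes x such that (s, x) is accessible for a terminal state s."""
    start = (initial, root)              # root(G)
    seen = {start}                       # accessible part Gacc, built on the fly
    queue = deque([start])
    while queue:
        s, x = queue.popleft()
        for (s_from, label, s_to) in nfa_edges:
            if s_from != s:
                continue
            for (x_from, db_label, x_to) in db_edges:
                # Assumption: "_" on an automaton edge matches any label.
                if x_from == x and (label == "_" or label == db_label):
                    pair = (s_to, x_to)
                    if pair not in seen:  # each pair is visited once,
                        seen.add(pair)    # so cycles in DB cannot loop forever
                        queue.append(pair)
    return {x for (s, x) in seen if s in terminal}


# Assumed NDFA for (a.a)*.a.b and a small hypothetical DB with a cycle x1 <-> x2.
NFA = [("s1", "a", "s2"), ("s2", "a", "s1"), ("s1", "a", "s3"), ("s3", "b", "s4")]
DB = [("x1", "a", "x2"), ("x2", "a", "x1"), ("x2", "b", "x3"), ("x1", "b", "x4")]

print(answer_regular(NFA, "s1", {"s4"}, DB, "x1"))  # {'x3'}
```

Because the `seen` set holds at most |states(A)| × |nodes(DB)| pairs, the search terminates even though the database contains a cycle.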
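Regular Path Expressions
Example: R = _.(_._)*.a
A = [Figure: automaton with states s1, s2, s3 and edges labeled _ and a]
DB = [Figure: graph with nodes &o1, &o2, &o3, &o4 and edges labeled a and b]
Answer of R on DB = {&o2, &o3}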
Compute Product Automaton G
[Figure: product automaton G with one node (s, x) for each s in {s1, s2, s3} and each x in {&o1, &o2, &o3, &o4}; edges labeled _, a, and b]
Compute Accessible Part Gacc
[Figure: the product automaton G over states s1, s2, s3 and nodes &o1, &o2, &o3, &o4, with the part accessible from the root (s1, &o1) marked]
Answer(R, DB) = {&o2, &o3}
Complexity of Regular Path Expressions
• The evaluation algorithm runs in PTIME in size(R) and size(DB)
• Even when there are cycles in DB