
XPath




  1. XPath By Laouina Marouane

  2. Outline • Introduction • Data Model • Expression • Patterns • Location Paths • Example • XPath 2.0 • Practice • Conclusion

  3. What is XPath? • A scheme for locating documents and identifying sub-structures within them. • A language designed to be used by both XSL Transformations (XSLT) and XPointer. • Provides common syntax and semantics for functionality shared between XSLT and XPointer. • Primary purpose: Address ‘parts’ of an XML document, and provide basic facilities for manipulation of strings, numbers and booleans. • W3C Recommendation. November 16, 1999 • Latest version: http://www.w3.org/TR/xpath

  4. Why XPath? • Unique identifiers are not sufficient: • Assigning a unique identifier to every element is a burden • The identity of an element may be unknown • Identifiers cannot handle ranges of text • It may be inconvenient to identify a large number of objects by listing their identifiers

  5. Introduction • XPath uses a compact, string-based syntax rather than an XML element-based one. • Operates on the abstract, logical structure of an XML document (a tree of nodes) rather than on its surface syntax. • Uses a path notation (like URLs) to navigate through this hierarchical tree structure, from which it got its name. • A subset of it can be used for matching, i.e. testing whether or not a node matches a pattern. • Models an XML document as a tree of nodes of several types, including element, attribute and text nodes. • Supports Namespaces: the name of a node is an expanded name, a pair consisting of a local part and a namespace URI. • Example of an XPath expression: /bib/book/publisher

  6. Data Model • Treats an XML document as a logical tree • This tree contains 7 types of nodes: • Root node – the root of the document, not the document element • Element nodes – one for each element in the document; an element may have a unique ID • Attribute nodes • Namespace nodes • Processing instruction nodes • Comment nodes • Text nodes • The tree is ordered: document order runs from top to bottom and left to right, in the order in which nodes appear in the XML text

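  7. Data Model (figure: tree for the bib document, showing the root node with a processing instruction, a comment and the root element bib as children; bib has book children, and a book has publisher and author children whose text is Addison-Wesley and Serge Abiteboul)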

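  8. Example • For this simple doc: <doc> <?Pub Caret?> <para>Some <em>emphasis</em> here. </para> <para>Some more stuff.</para> </doc> • Might be represented as: root → <doc> → children <?Pub Caret?>, <para>, <para>; the first <para> has the children text, <em>, text, and <em> has one text child; the second <para> has one text child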
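To make the tree on slide 8 concrete, here is a minimal sketch that parses the same document with Python's standard xml.dom.minidom and prints one line per node. The DOM is close to, but not identical with, the XPath data model; the Document node stands in for the XPath root node, and the `outline` helper and its labels are illustrative names, not part of any XPath API. The document is written without whitespace between tags so that no extra whitespace-only text nodes appear.

```python
from xml.dom import Node, minidom

DOC = ('<doc><?Pub Caret?><para>Some <em>emphasis</em> here. </para>'
       '<para>Some more stuff.</para></doc>')

# The DOM Document node plays the role of the XPath root node.
LABELS = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
    Node.COMMENT_NODE: "comment",
}


def outline(node, depth=0):
    """Return (depth, label) pairs for node and its descendants, in document order."""
    label = LABELS.get(node.nodeType, "other")
    if node.nodeType == Node.ELEMENT_NODE:
        label += f" <{node.tagName}>"
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        label += f" <?{node.target} {node.data}?>"
    elif node.nodeType == Node.TEXT_NODE:
        label += f" {node.data!r}"
    rows = [(depth, label)]
    for child in node.childNodes:
        rows.extend(outline(child, depth + 1))
    return rows


for depth, label in outline(minidom.parseString(DOC)):
    print("  " * depth + label)
```

The printed indentation mirrors the slide: the processing instruction and both para elements are children of doc, and the first para holds text, em, text.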

  9. Expressions • A text string that selects elements, attributes, processing instructions, or text • The primary syntactic construct in XPath • An expression is evaluated to yield an object, which has one of the following four basic types: • node-set (an unordered collection of nodes without duplicates) • boolean (true or false) • number (a double-precision IEEE 754 floating-point number) • string (a sequence of UCS characters)
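The four basic types convert into each other by fixed rules, and the boolean() and number() conversions are where surprises usually come from. A small sketch of those two rules in Python, assuming Python values stand in for XPath ones (a list for a node-set); the function names are hypothetical:

```python
import math
import re

# XPath 1.0 Number syntax: optional minus, digits with optional fraction, and
# optional XML whitespace around it (no exponents, no "inf"/"nan").
_NUMBER = re.compile(r"[ \t\r\n]*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_boolean(value):
    """boolean() conversion; Python types stand in for XPath ones (list = node-set)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true if and only if it is neither zero nor NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, (str, list)):
        # A string or node-set is true if and only if it is non-empty.
        return len(value) > 0
    raise TypeError(f"not an XPath 1.0 value: {value!r}")


def xpath_number(value):
    """number() conversion for booleans, numbers and strings."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, str):
        return float(value) if _NUMBER.fullmatch(value) else math.nan
    return float(value)
```

Note that the string "false" converts to the boolean true, because only the empty string is false.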

  10. Element Context • The meaning of an element can depend upon its context • <book><title>…</title></book><person><title>…</title></person> • We may want to search for, e.g., the title of a book, not the title of a person • XPath exploits the sequential and hierarchical context of XML to specify elements by their context (i.e. their location in the hierarchy) • Compare the paths title, book/title and person/title
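Python's standard xml.etree.ElementTree understands a limited subset of XPath in its find/findall methods, which is enough to show how context disambiguates title. In this sketch the <library> wrapper is a hypothetical addition, since the two elements on the slide are not a well-formed document on their own:

```python
import xml.etree.ElementTree as ET

DOC = ("<library>"
       "<book><title>Foundations of Databases</title></book>"
       "<person><title>Dr.</title></person>"
       "</library>")

root = ET.fromstring(DOC)  # context node: the hypothetical <library> element

all_titles = [t.text for t in root.findall(".//title")]         # any title below the context
book_titles = [t.text for t in root.findall("book/title")]      # book/title
person_titles = [t.text for t in root.findall("person/title")]  # person/title
```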

  11. Context • Expression evaluation occurs with respect to a context. • The context consists of: • a node (the context node) • a pair of non-zero positive integers (the context position and the context size) • a set of variable bindings • a function library • the set of namespace declarations in scope for the expression

  12. More on context types • The context position is always less than or equal to the context size • The variable bindings consist of a mapping from variable names to variable values • The function library consists of a mapping from function names to functions. Each function takes zero or more arguments and returns a single result • The namespace declarations consist of a mapping from prefixes to namespace URIs

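  13. Patterns • A pattern is an expression used not to find objects, but to establish whether a specific object matches certain criteria • Very important in the XSLT specification (e.g. in template match attributes) • The '|' symbol is used to specify alternative patterns for matching • note|warning|/book/intro matches a note element, a warning element, or an intro child of a book document element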

  14. Location Paths • One important kind of expression is a location path (a special case of an expression) • The result of evaluating a location path is the node-set containing the nodes selected by the location path • Location paths can recursively contain expressions that are used to filter sets of nodes • A location path (the most important construct) describes a path from one point to another • Analogy: a set of street directions, e.g. “second store on the left after the third light” • Two types of paths: relative and absolute • Composed of a series of one or more steps, each with optional predicates

  15. Relative Paths • A relative location path consists of a sequence of one or more location steps separated by / • The steps are composed from left to right: each step selects a set of nodes relative to a context node, and each node in that set is used as a context node for the following step • E.g. para selects the children of the context node that are named para • <chapter> //Context node <title>…</title> <para>…</para> //Selected <note> <para>…</para> //Not selected: a child of note, not of chapter </note></chapter> • The unabbreviated expression is child::para

  16. Absolute Paths • An absolute location path consists of / optionally followed by a relative location path • A / by itself selects the root node of the document containing the context node

  17. Location Steps • A location step has three parts: • an axis, which specifies the tree relationship between the nodes selected by the location step and the context node, • a node test, which specifies the node type and expanded-name of the nodes selected by the location step, and • zero or more predicates, which use arbitrary expressions to further refine the set of nodes selected by the location step.

  18. Location Steps parts explained • Axes • 13 axes are defined in XPath • ancestor, ancestor-or-self • attribute • child • descendant, descendant-or-self • following • preceding • following-sibling, preceding-sibling • namespace • parent • self • Node test • Identifies the type (and, for a name test, the name) of the nodes to select; evaluates to true/false for each node on the axis • Can be a name, or a node-type test such as text() or node() • Predicate • XPath expressions in square brackets following the basis (axis and node test); the result is converted to a boolean, except that a number n means position() = n

  19. Location Steps in syntax • The syntax for a location step is the axis name and node test separated by a double colon, followed by zero or more expressions each in square brackets. • For example, in child::para[position()=1], child is the name of the axis, para is the node test and [position()=1] is a predicate

  20. Abbreviated Syntax • child:: can be omitted from a location step (child is the default axis): div/para is equivalent to child::div/child::para • attribute:: can be abbreviated to @ • // is short for /descendant-or-self::node()/ • A location step of . is short for self::node(), e.g. .//para is short for self::node()/descendant-or-self::node()/child::para • A location step of .. is short for parent::node()
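The abbreviations on slide 20 are purely textual, so they can be undone mechanically. A minimal sketch of such an expander in Python; `expand` is a hypothetical helper, and it only handles steps made of names, *, ., .., @name and already-explicit axis::test steps (no predicates):

```python
def expand(path: str) -> str:
    """Rewrite an abbreviated XPath 1.0 location path into unabbreviated syntax.

    Sketch only: predicates and other expressions are not supported, and
    malformed input such as a trailing '/' is not rejected.
    """
    if path == "/":
        return "/"  # the root node on its own
    steps = path.split("/")
    prefix = ""
    if steps[0] == "":            # a leading '/' makes the path absolute
        prefix = "/"
        steps = steps[1:]
    out = []
    for step in steps:
        if step == "":            # an empty step comes from '//'
            out.append("descendant-or-self::node()")
        elif step == ".":
            out.append("self::node()")
        elif step == "..":
            out.append("parent::node()")
        elif step.startswith("@"):
            out.append("attribute::" + step[1:])
        elif "::" in step:        # already unabbreviated
            out.append(step)
        else:                     # child is the default axis
            out.append("child::" + step)
    return prefix + "/".join(out)
```

For example, expand(".//para") yields self::node()/descendant-or-self::node()/child::para, the expansion given on the slide.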

  21. Wildcards • Sometimes we don't or can't know element names • The wildcard '*' matches any single element • book/intro/title and book/chapter/title are both matched by book/*/title (but so is book/appendix/title) • The unabbreviated form is child::* • Multiple asterisks can match several levels, e.g. book/*/*/title • But you must know the exact level, and make sure that inappropriate matches won't be made
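A short check of the wildcard behaviour using ElementTree's XPath subset, which supports '*'. The document, including the <docs> wrapper that serves as the context node, is invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; <docs> is an invented wrapper so that book/... paths
# can be evaluated with <docs> as the context node.
docs = ET.fromstring(
    "<docs><book>"
    "<intro><title>Intro</title></intro>"
    "<chapter><title>Chapter 1</title>"
    "<section><title>Section 1.1</title></section></chapter>"
    "<appendix><title>Appendix A</title></appendix>"
    "</book></docs>")

one_level = [t.text for t in docs.findall("book/*/title")]     # book/*/title
two_levels = [t.text for t in docs.findall("book/*/*/title")]  # book/*/*/title
```

book/*/title picks up the appendix title too, while the section title is reachable only with the second asterisk.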

  22. Descendants • Rather than using wildcards, recursively search through the descendants • chapter//para goes through the chapter hierarchy and selects any para elements • <chapter> //Starting node <title>…</title> <para>…</para> //Selected <note> <para>…</para> //Selected </note></chapter> • The unabbreviated form is child::chapter/descendant-or-self::node()/child::para
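Slides 15 and 22 use the same chapter to contrast child and descendant selection. A minimal sketch with ElementTree, where the chapter content is invented and the context node is the chapter element itself, so chapter//para becomes .//para:

```python
import xml.etree.ElementTree as ET

chapter = ET.fromstring(
    "<chapter><title>T</title><para>p1</para>"
    "<note><para>p2</para></note></chapter>")

children = [p.text for p in chapter.findall("para")]         # para (child::para)
descendants = [p.text for p in chapter.findall(".//para")]   # .//para
```

Only the first para is a child of chapter; the para inside note is found once the descendant step is added.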

  23. Ancestors • To signify the parent of the context node: • '..' • parent::node() • To find all 'title' elements that share the parent of the context node: • ../title • parent::node()/child::title
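ElementTree elements do not keep a reference to their parent, so emulating ../title means building a child-to-parent map first. A sketch under that assumption; `sibling_titles` is a hypothetical helper and the document is invented:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<chapter><title>Main</title><para>text</para>"
    "<note><title>Aside</title></note></chapter>")

# ElementTree has no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def sibling_titles(node):
    """Emulate ../title: the title children of the context node's parent."""
    parent = parent_of.get(node)
    return [] if parent is None else parent.findall("title")


para = root.find("para")
```

sibling_titles(para) returns only the chapter's own title; the title inside note is a grandchild of chapter and is not selected.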

  24. Other Relationships • We can move among the siblings of the context node: • preceding-sibling:: • following-sibling::

  25. Other Relationships (2) • We can access all ancestors and descendants of the context node: • ancestor:: • descendant:: • These axes select neither siblings nor the context node itself

  26. Other Relationships (3) • To include the context node itself along with its ancestors or descendants: • ancestor-or-self:: • descendant-or-self:: • These axes don't select siblings either

  27. Other Relationships (4) • We can access all nodes that come entirely before or after the context node in document order: • preceding:: (excludes ancestors) • following:: (excludes descendants) • We can access attributes: • attribute::
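The axes on slides 24 to 27 can be sketched over ElementTree elements using a parent map and the document order. This is a simplified model that covers elements only (no attribute, text or namespace nodes); the tree and helper names are invented for illustration. A useful sanity check is that self, ancestor, descendant, preceding and following together cover every element exactly once.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")

# ElementTree has no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}
order = list(root.iter())  # all elements in document order


def ancestors(n):
    out = []
    while n in parent_of:
        n = parent_of[n]
        out.append(n)
    return out  # nearest ancestor first


def descendants(n):
    return list(n.iter())[1:]  # iter() yields n itself first


def siblings(n):
    kids = list(parent_of[n]) if n in parent_of else [n]
    i = kids.index(n)
    return kids[:i], kids[i + 1:]  # (preceding-sibling, following-sibling)


def following(n):
    # Everything after n in document order, minus n's own descendants.
    skip = set(descendants(n))
    return [m for m in order[order.index(n) + 1:] if m not in skip]


def preceding(n):
    # Everything before n in document order, minus n's ancestors.
    # (preceding is a reverse axis; this list is simply in document order.)
    skip = set(ancestors(n))
    return [m for m in order[:order.index(n)] if m not in skip]


def tags(nodes):
    return [m.tag for m in nodes]


e = root.find("e")
```

With e as the context node, preceding yields b, c and d, while following yields only g.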

  28. Predicate Filters • Location paths are indiscriminate: they may select more items than we want • A predicate filter is used to filter the selected list • The filter is written between '[ ]' • The simplest is a position() predicate • exon[position() = 1] //1st exon • intron[2] //2nd intron • Tests can be combined with 'and' and 'or'
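ElementTree's XPath subset accepts numeric positions and last() in predicates, which is enough to reproduce the positional filters above (it does not support and/or inside predicates). The transcript content is invented:

```python
import xml.etree.ElementTree as ET

transcript = ET.fromstring(
    "<transcript><exon>E1</exon><intron>I1</intron><exon>E2</exon>"
    "<intron>I2</intron><exon>E3</exon></transcript>")

first_exon = transcript.find("exon[1]").text       # exon[position() = 1]
second_intron = transcript.find("intron[2]").text  # intron[2]
last_exon = transcript.find("exon[last()]").text   # exon[last()]
```

Positions count only the nodes selected by the step (here, exon or intron children), not all children of transcript.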

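  29. Position Tests • The last() function • Returns the context size, so exon[last()] selects the last exon in the list • The count() function • Returns the number of nodes in its node-set argument • child::transcript[count(child::intron) = 1] • The id() function • Selects the element whose ID attribute (an attribute declared with type ID, e.g. in a DTD) has the given value • id("ENS0001")[self::transcript] selects that element only if it is a transcript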
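ElementTree's XPath subset has no count(), so a count-based predicate has to be evaluated in Python. A sketch with an invented gene document; the id attributes here are ordinary attributes used as labels, not DTD-declared IDs, so they say nothing about how id() behaves:

```python
import xml.etree.ElementTree as ET

gene = ET.fromstring(
    "<gene>"
    "<transcript id='t1'><exon/><intron/><exon/></transcript>"
    "<transcript id='t2'><exon/><intron/><exon/><intron/><exon/></transcript>"
    "<transcript id='t3'><exon/></transcript>"
    "</gene>")

# child::transcript[count(child::intron) = 1], filtered in Python
one_intron = [t.get("id") for t in gene.findall("transcript")
              if len(t.findall("intron")) == 1]

n_transcripts = len(gene.findall("transcript"))           # count(child::transcript)
last_id = gene.find("transcript[last()]").get("id")       # transcript[last()]
```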

  30. Attribute Tests • Attributes can be selected • feature/@type • Elements can be selected depending on an attribute's value • feature[@type="exon"]
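ElementTree can filter on attributes ([@type] and [@type='exon'] are in its subset) but cannot return attribute nodes, so feature/@type is emulated by reading the attribute values. The sequence document is invented:

```python
import xml.etree.ElementTree as ET

seq = ET.fromstring(
    "<sequence><feature type='exon' start='1'/><feature type='intron' start='120'/>"
    "<feature type='exon' start='300'/><feature start='400'/></sequence>")

# feature/@type: read the attribute value from each feature that has one
types = [f.get("type") for f in seq.findall("feature") if "type" in f.attrib]

# feature[@type="exon"]
exon_starts = [f.get("start") for f in seq.findall("feature[@type='exon']")]

# feature[@type]: features that have a type attribute at all
typed = len(seq.findall("feature[@type]"))
```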

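  31. Functions • Functions and node tests in XPath: • text() is a node test that matches text nodes • node() is a node test that matches any node (on the child axis: elements, text, comments and processing instructions) • name() returns the qualified name of the context node (or of the first node in its node-set argument)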

  32. Booleans • A boolean can only have two values: true or false • The following operators produce booleans: • or • and • =, != • <=, <, >=, >

  33. Example • Operators perform boolean tests on conditions • exon[not(position() = 1)] • transcript[not(exon)] • intron[position() != last()] • exon[position() > 2] • exon[position() >= 3] • exon[position() = 1 or position() = last()]
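A predicate is evaluated once per node, with position() bound to the node's 1-based context position and last() to the context size. This sketch models that with a hypothetical `select` helper and plain strings standing in for exon nodes; it also shows why writing `position() = 1 or last()` selects everything: the number last() converts to true whenever the list is non-empty.

```python
def select(nodes, predicate):
    """Apply an XPath-style predicate to a list of nodes.

    predicate(pos, last) receives the 1-based context position and the
    context size, i.e. the values of position() and last().
    """
    size = len(nodes)
    return [n for pos, n in enumerate(nodes, start=1) if predicate(pos, size)]


exons = ["E1", "E2", "E3", "E4"]

not_first = select(exons, lambda pos, last: not pos == 1)      # exon[not(position() = 1)]
after_second = select(exons, lambda pos, last: pos > 2)        # exon[position() > 2]
first_or_last = select(exons, lambda pos, last: pos == 1 or pos == last)
# exon[position() = 1 or last()]: boolean(last()) is true for any non-zero
# context size, so every exon passes.
always = select(exons, lambda pos, last: pos == 1 or last != 0)
```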

  34. Numbers • A number represents a floating-point number • The numeric operators convert their operands to numbers • Operators include: • +, -, *, div, mod • Since XML allows - in names, the - operator typically needs to be preceded by whitespace • Example: 5 mod 2 returns 1
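XPath's mod is the remainder of truncating division, so its sign follows the dividend, unlike Python's % operator; div is IEEE 754 division, so dividing by zero yields an infinity or NaN instead of an error. A sketch of both operators under those rules; the function names are hypothetical:

```python
import math


def xpath_mod(a: float, b: float) -> float:
    """XPath 'mod': remainder of truncating division; the sign follows the dividend."""
    if b == 0 or math.isinf(a):
        return math.nan  # IEEE 754 remainder is NaN here (math.fmod would raise)
    return math.fmod(a, b)


def xpath_div(a: float, b: float) -> float:
    """XPath 'div': IEEE 754 division, so x div 0 does not raise."""
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0 or math.isnan(a):
            return math.nan
        # The sign of the infinity depends on both operands, including -0.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
```

For example, -5 mod 2 is -1 in XPath, whereas (-5) % 2 is 1 in Python.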

  35. Strings • Strings consist of a sequence of zero or more characters • A character is as defined in the XML Recommendation

  36. Example • Strings can be tested for characters and substrings • <note>hello there</note> • note[contains(text(), "hello")] • <note><b>hello</b> there</note> • note[contains(., "hello")] • Here text() would only see the note's own first text child (" there"), whereas '.' is the context node, whose string-value concatenates the text of all its descendants
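The difference between text() and '.' on slide 36 can be reproduced with ElementTree, where text before the first child lives in elem.text and text after each child lives in that child's tail. A sketch with two hypothetical helpers: one returns the string-value, the other the first text-node child, which is what contains(text(), ...) converts to a string.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """The XPath string-value of an element: all descendant text, concatenated."""
    return "".join(elem.itertext())


def first_text_child(elem):
    """The first text-node child of elem, or '' if there is none."""
    texts = [elem.text] + [child.tail for child in elem]
    return next((t for t in texts if t), "")


plain = ET.fromstring("<note>hello there</note>")
bold = ET.fromstring("<note><b>hello</b> there</note>")
```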

  37. Example (2) • starts-with(string, pattern) • note[starts-with(., "hello")] • string(exp) converts its argument to a string • note[contains(., string(2))] • substring-after(string, terminator) • substring-before(string, terminator) • substring(string, offset, length), with 1-based positions

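  38. Example (3) • normalize-space(string) • Removes leading and trailing whitespace and collapses internal runs of whitespace into a single space • translate(string, source, replace) • translate(., ";+", ",") replaces ; with , and deletes + • concat(string, string, ...) • string-length(string)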
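The string functions of slides 37 and 38 have precise rules that differ from Python's slicing: positions are 1-based, arguments to substring() are rounded with XPath's round(), and translate() deletes characters that have no counterpart. A sketch of these rules in Python, with hypothetical function names and finite numeric arguments assumed:

```python
import math
import re
from typing import Optional


def xpath_round(x: float) -> int:
    # XPath round(): halves go toward positive infinity (Python's round() uses
    # round-half-to-even). Finite arguments only in this sketch.
    return math.floor(x + 0.5)


def substring(s: str, start: float, length: Optional[float] = None) -> str:
    """substring(): keeps characters at 1-based positions p with
    round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    end = math.inf if length is None else first + xpath_round(length)
    return "".join(c for p, c in enumerate(s, start=1) if first <= p < end)


def substring_before(s: str, t: str) -> str:
    i = s.find(t)
    return s[:i] if i >= 0 else ""


def substring_after(s: str, t: str) -> str:
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""


def normalize_space(s: str) -> str:
    # Collapse runs of XML whitespace (space, tab, CR, LF), then trim the ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def translate(s: str, src: str, repl: str) -> str:
    out = []
    for c in s:
        i = src.find(c)          # the first occurrence in src decides
        if i == -1:
            out.append(c)        # not in src: copied unchanged
        elif i < len(repl):
            out.append(repl[i])  # replaced by the character at the same index
        # otherwise c is in src but has no counterpart in repl, so it is removed
    return "".join(out)
```

With these rules substring("12345", 0, 3) gives "12", matching the core-library example on slide 39, and translate("a;b+c", ";+", ",") gives "a,bc".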

  39. Core Function Library • XPath defines a core set of functions and operators • All implementations of XPath must implement the core function library • Node-set functions: list/item[position() mod 2 = 1] selects all odd-numbered items of a list; id("foo")/child::para[position()=5] selects the 5th para child of the element with the unique ID foo • String functions: substring("12345", 0, 3) returns "12" • Boolean functions: true() returns the boolean true • Number functions: sum(node-set) returns the sum of the nodes' string-values converted to numbers

  40. Example for XPath Queries <bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><book price="55"> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book> </bib>

  41. Example summary • bib matches a bib element • * matches any element • / matches the root node • /bib matches a bib element under the root • bib/paper matches a paper in bib • bib//paper matches a paper in bib, at any depth • //paper matches a paper at any depth • paper|book matches a paper or a book • @price matches a price attribute • bib/book/@price matches the price attribute of a book in bib • bib/book[@price<"55"]/author/last-name matches…
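Several of these queries can be run against the bib document from slide 40 with ElementTree, taking the bib element as the context node. Its subset has no '<' comparison, so the price test is emulated in Python; as in XPath 1.0, a book without a price attribute yields an empty node-set and the comparison is false. The `texts` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
<book><publisher> Addison-Wesley </publisher>
  <author> Serge Abiteboul </author>
  <author><first-name> Rick </first-name><last-name> Hull </last-name></author>
  <author> Victor Vianu </author>
  <title> Foundations of Databases </title><year> 1995 </year></book>
<book price="55"><publisher> Freeman </publisher>
  <author> Jeffrey D. Ullman </author>
  <title> Principles of Database and Knowledge Base Systems </title>
  <year> 1998 </year></book>
</bib>"""

bib = ET.fromstring(BIB)  # the context node is the <bib> element


def texts(elems):
    return [e.text.strip() for e in elems]


titles = texts(bib.findall("book/title"))                   # bib/book/title
all_authors = bib.findall(".//author")                      # bib//author
last_names = texts(bib.findall(".//last-name"))             # bib//last-name
priced = texts(bib.findall("book[@price]/title"))           # bib/book[@price]/title
prices = [b.get("price") for b in bib.findall("book[@price]")]  # bib/book/@price values

# bib/book[@price < "55"]: both sides are compared as numbers.
cheap = [b for b in bib.findall("book")
         if b.get("price") is not None and float(b.get("price")) < 55]
```

The only priced book costs exactly 55, so cheap is empty.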

  42. XPath 2.0 • Latest version: • http://www.w3.org/TR/xpath20/ • W3C Working Draft 22 August 2003 • Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages

  43. XPath 2.0 (2) • XPath 2.0 is a much more powerful language that operates on a much larger domain of data types • A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents • driving forces behind XPath 2.0 include not only the XPath 2.0 Requirements document but also many of the XML Query language requirements. • XPath 2.0 is a strict syntactic subset of XQuery 1.0

  44. XPath 2.0 (3) • XPath 2.0 introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, URIs, etc. • In addition, a number of functions and operators are provided for processing and constructing these different data types

  45. XPath 2.0 (4) • Everything is a sequence • sequences are ordered • In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets. • In XPath 2.0, the concept of the node-set has been generalized and extended. • sequences may contain simple-typed values as well as nodes • “for” expression enables iteration over sequences

  46. XPath 2.0 (5) • sum(for $x in /order/item return $x/price * $x/quantity) • Conditional expression: • if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2 • Quantifiers: • some $x in /students/student/name satisfies $x = "Fred" • every $x in /students/student/name satisfies $x = "Fred"
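Python's standard library has no XPath 2.0 engine, but the three constructs on slide 46 have close Python analogues: a generator expression for "for ... return", any()/all() for "some"/"every", and a conditional expression for "if ... then ... else". A sketch using invented order and students documents, with plain dicts standing in for the widget elements:

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<order><item><price>2.50</price><quantity>4</quantity></item>"
    "<item><price>10</price><quantity>1</quantity></item></order>")
students = ET.fromstring(
    "<students><student><name>Fred</name></student>"
    "<student><name>Ann</name></student></students>")

# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(float(x.findtext("price")) * float(x.findtext("quantity"))
            for x in order.findall("item"))

names = [n.text for n in students.findall("student/name")]
some_fred = any(n == "Fred" for n in names)   # some $x in ... satisfies $x = "Fred"
every_fred = all(n == "Fred" for n in names)  # every $x in ... satisfies $x = "Fred"


def cheaper(w1, w2):
    # if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2
    return w1 if w1["unit-cost"] < w2["unit-cost"] else w2
```

As with XPath's every, all() over an empty sequence is true.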

  47. XPath 2.0 (6) • Intersections, differences, unions: • The except operator selects all nodes of a given sequence except certain nodes • @* except @exc:foo (all attributes of the context node except exc:foo) • The intersect operator selects the nodes that occur in both operands • $x intersect /foo/bar
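In XPath 2.0, union, intersect and except compare nodes by identity, remove duplicates, and return their result in document order. A sketch of those semantics over ElementTree elements; the document and helper names are invented, and except_ carries an underscore because except is a Python keyword:

```python
import xml.etree.ElementTree as ET

foo = ET.fromstring("<foo><bar n='1'/><baz/><bar n='2'/><qux/></foo>")
order = list(foo.iter())  # document order


def in_doc_order(nodes):
    wanted = set(nodes)  # identity-based, so duplicates collapse
    return [n for n in order if n in wanted]


def union(a, b):
    return in_doc_order(list(a) + list(b))


def intersect(a, b):
    b_set = set(b)
    return in_doc_order(n for n in a if n in b_set)


def except_(a, b):
    b_set = set(b)
    return in_doc_order(n for n in a if n not in b_set)


children = list(foo)        # foo/*
bars = foo.findall("bar")   # foo/bar
```

For example, except_(children, bars) yields baz and qux, and union(bars, [foo.find("qux")]) yields both bars followed by qux.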

  48. Some Practice • Try XPath Visualizer. • You can download it from: http://www.vbxml.com/downloads/files/xpathvisualiserseptember.zip • It can help you with: • Learning and playing with XPath expressions. • Composing and visually verifying the exact XPath expression when designing an XSLT stylesheet. • Obtaining the quantitative characteristics of an xml document, counts, sums, arithmetical and relational results, strings, substrings, etc.

  49. Conclusion • XPath provides a concise and intuitive way to address parts of XML documents • Standard part of the XSLT and XPointer specifications • Using XPath mainly requires learning the abbreviated syntax of location path expressions and the functions of the core library

  50. References • http://www.w3.org/TR/xpath • http://www.w3.org/TR/xpath20/ • http://www.vbxml.com/xpathvisualizer/default.asp • http://www.xml.com/pub/a/2002/03/20/xpath2.html • XML in a Nutshell
