Querying XML
From Suhrid.net Wiki
Intro
- Not as mature as querying relational databases
- No underlying algebra
- XPath : Path expressions and conditions
- XSLT : XPath + Transformations, output processing
- XQuery : XPath + full featured query language
- XLink, XPointer : Use XPath as a component
XPath
- Think of XML as a tree
- Expressions in XPath as navigations down/across the tree with conditions
- / : selects the root element at the start of a path and separates steps. A step can be an element name, * (wildcard) or @name (attribute)
- // - any descendant of the current element including self
- Conditions go in square brackets, e.g. [price < 50]. A number in square brackets selects by position, e.g. [2] is the second matching element.
- Many built-in functions : e.g. contains(s1, s2) : true/false. name() : returns element tag name
- Navigation axes : e.g. parent, following-sibling, descendant
- XPath queries operate on and return a sequence of elements, whether the input is an XML document or an XML stream
- Sometimes result of XPath query can be expressed in XML, but not always
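A rough way to try the navigation steps above without an XQuery processor is Python's standard xml.etree.ElementTree, which supports a limited subset of XPath. This is only a sketch: the tiny Bookstore document and its titles are invented for illustration, and attributes are read with .get() because ElementTree does not return attribute nodes from a path.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the Bookstore document used in these notes.
BOOKSTORE_XML = """
<Bookstore>
  <Book ISBN="ISBN-1" Price="85">
    <Title>Book A</Title>
    <Authors>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
  <Book ISBN="ISBN-2" Price="100">
    <Title>Book B</Title>
    <Authors>
      <Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>
    </Authors>
    <Remark>Great reference</Remark>
  </Book>
  <Magazine>
    <Title>Magazine M</Title>
  </Magazine>
</Bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)  # root is the <Bookstore> element

# Child steps separated by "/": titles of all books.
book_titles = [t.text for t in root.findall("./Book/Title")]

# "*" wildcard: titles of any child element (books and magazines).
all_titles = [t.text for t in root.findall("./*/Title")]

# "//" descendant step: every Last_Name anywhere below the root.
last_names = [n.text for n in root.findall(".//Last_Name")]

# Attribute values: read with .get() instead of an "@ISBN" step.
isbns = [b.get("ISBN") for b in root.findall("./Book")]

# Existence condition in square brackets: books that have a Remark child.
remarked = [b.findtext("Title") for b in root.findall("./Book[Remark]")]

# Positional predicate: the second Author of each Authors element.
second_authors = [a.findtext("Last_Name")
                  for a in root.findall(".//Authors/Author[2]")]

print(book_titles)     # ['Book A', 'Book B']
print(remarked)        # ['Book B']
print(second_authors)  # ['Widom']
```

Numeric conditions such as [@Price < 90] and functions such as contains() are outside ElementTree's subset, so those would need a full XPath/XQuery processor or a filter written in Python.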
Sample queries
 
* doc("Bookstore.xml")/Bookstore/Book/Title - returns titles of all books
* doc("Bookstore.xml")/Bookstore/(Book | Magazine)/Title - titles of all books or magazines
* doc("Bookstore.xml")/Bookstore/*/Title - wildcard
* doc("Bookstore.xml")//Title - any Title element anywhere in the tree - Double slash
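* doc("Bookstore.xml")//* - Returns every element: the whole tree for the root, then the subtree of each child, and so on
* doc("Bookstore.xml")/Bookstore/Book/data(@ISBN) - ISBN values of all books. The data() function is needed to extract the attribute value, since a bare attribute generally cannot be output on its own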
* doc("Bookstore.xml")/Bookstore/Book[@Price < 90] - Condition, price < 90 : Will print the whole book
* doc("Bookstore.xml")/Bookstore/Book[@Price < 90]/Title - Above, but return only title.
* doc("Bookstore.xml")/Bookstore/Book[Remark]/Title - Existence condition, Book must have a remark element
* doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author/Last_Name = "Ullman" and Authors/Author/First_Name = "Jennifer"]/Title : Bigger condition. Each author comparison is its own "there exists", so the last name and the first name may be matched by different authors. The and does not tie them to one author.
* doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author[Last_Name = "Ullman" and First_Name = "Jennifer"]]/Title : This is the correct one. Both conditions are applied to the same Author element.
* doc("Bookstore.xml")//Authors/Author[2] - Return the second author element of each Authors subelement
* doc("Bookstore.xml")/Bookstore/Book[contains(Remark, "Great")]/Title : contains function
* doc("Bookstore.xml")//Magazine[Title=doc("Bookstore.xml")//Book/Title] : Self-join. Condition is satisfied if there is SOME element that meets it. Implicit existential quantification.
* doc("Bookstore.xml")/Bookstore//*[name(parent::*) != 'Bookstore' and name(parent::*) != 'Book'] : All elements whose parent element is neither Bookstore nor Book. The * after parent:: matches a parent with any tag.
* doc("Bookstore.xml")/Bookstore/(Book | Magazine)[Title = following-sibling::*/Title] : All books and magazines that have a non-unique title.  Similarly, preceding-sibling.
* doc("Bookstore.xml")/Bookstore/(Book | Magazine)[Title = following-sibling::Book/Title] : Instead of * after the axis, we name an element, so only following Book siblings are compared.
* doc("Bookstore.xml")//Book[count(Authors/Author[contains(First_Name, 'J')]) = count(Authors/Author/First_Name)] - Universal quantification (for all), simulated with count. Every author's First_Name contains 'J'.
* doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author/Last_Name = "Ullman" and count(Authors/Author[First_Name = "Jennifer"]) = 0] : Similar trick, simulating "and no author has First_Name 'Jennifer'"
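The difference between the two "Ullman"/"Jennifer" queries above can be checked by hand. The following Python sketch spells out both readings with any(). The single-book document is invented for illustration, and the price test is done in Python because ElementTree's XPath subset has no numeric predicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one cheap book whose two authors are
# Jeffrey Ullman and Jennifer Widom (there is no single "Jennifer Ullman").
doc = ET.fromstring("""
<Bookstore>
  <Book Price="85">
    <Title>Sample Book</Title>
    <Authors>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
</Bookstore>
""")

def cheap(book):
    # Mirrors [@Price < 90].
    return float(book.get("Price")) < 90

def loose_match(book):
    # Mirrors  Authors/Author/Last_Name = "Ullman"
    #      and Authors/Author/First_Name = "Jennifer":
    # each comparison is its own "there exists", possibly over different authors.
    authors = book.findall("Authors/Author")
    return (any(a.findtext("Last_Name") == "Ullman" for a in authors)
            and any(a.findtext("First_Name") == "Jennifer" for a in authors))

def strict_match(book):
    # Mirrors  Authors/Author[Last_Name = "Ullman" and First_Name = "Jennifer"]:
    # both comparisons must hold for the same Author element.
    return any(a.findtext("Last_Name") == "Ullman"
               and a.findtext("First_Name") == "Jennifer"
               for a in book.findall("Authors/Author"))

books = doc.findall("Book")
loose_titles = [b.findtext("Title") for b in books if cheap(b) and loose_match(b)]
strict_titles = [b.findtext("Title") for b in books if cheap(b) and strict_match(b)]

print(loose_titles)   # ['Sample Book']
print(strict_titles)  # []
```

The loose form returns the book even though no author is named Jennifer Ullman, which is why the nested predicate on Author is the one to use.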
XQuery
- XQuery is an expression language, also known as a compositional language.
- Like relational algebra, it is closed: an expression over one type of data produces an answer of the same type.
- In relational model, type of data is relations. In XML, the type is "sequence of elements".
- Sequence can come from XML Document, XML Stream.
- XQuery uses XPath. Every XPath expression is an XQuery expression.
- Commonly used XQuery expression is the FLWOR expression :
for $var in expr
let $var := expr
where condition
order by expr
return expr
- Every clause except return is optional (strictly, the XQuery grammar also requires at least one for or let clause)
- The for and let clauses can be repeated multiple times and interleaved.
- Possible to mix query language with hardcoded XML that we want in the result.
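As a loose analogy only, the clauses of a FLWOR expression map onto an ordinary loop in a general-purpose language. The Python sketch below is not XQuery; the function name cheap_books, the Cheap wrapper element and the three-book document are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, deliberately not sorted by price.
doc = ET.fromstring("""
<Bookstore>
  <Book Price="100"><Title>Book C</Title></Book>
  <Book Price="85"><Title>Book B</Title></Book>
  <Book Price="40"><Title>Book A</Title></Book>
</Bookstore>
""")

def cheap_books(root, limit):
    rows = []
    for b in root.findall("Book"):        # for $b in .../Bookstore/Book
        price = float(b.get("Price"))     # let $p := $b/@Price
        if price < limit:                 # where $p < limit
            rows.append((price, b))
    rows.sort(key=lambda row: row[0])     # order by $p
    result = []
    for _, b in rows:                     # return <Cheap>{ $b/Title }</Cheap>
        wrapper = ET.Element("Cheap")     # the "hardcoded XML" part of the result
        wrapper.append(b.find("Title"))   # the evaluated part
        result.append(wrapper)
    return result

cheap = cheap_books(doc, 90)
print([ET.tostring(e, encoding="unicode") for e in cheap])
# ['<Cheap><Title>Book A</Title></Cheap>', '<Cheap><Title>Book B</Title></Cheap>']
```

The result is again a sequence of elements, which is the closure property the notes compare to relational algebra.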
XQuery examples
- Variable b is bound to each of the Book elements in a loop.
for $b in doc("BookstoreQ.xml")/Bookstore/Book
where $b/@Price < 90 and $b/Authors/Author/Last_Name = "Ullman"
return $b/Title
- The for clause is an iterator, the let clause is an assignment. Find all Price attributes in the DB and assign them to the plist variable as a list.
<Average>
  { let $plist := doc("BookstoreQ.xml")/Bookstore/Book/@Price
    return avg($plist) }
</Average>
- Anything in the return block that should be evaluated rather than copied literally must go in curly brackets, e.g. {$n}
for $n in distinct-values(doc("BookstoreQ.xml")//Last_Name)
return <Last_Name> {$n} </Last_Name>
- Existential quantification :
for $b in doc("BookstoreQ.xml")/Bookstore/Book
where some $fn in $b/Authors/Author/First_Name
         satisfies contains($b/Title, $fn)
return <Book>
          { $b/Title }
          { $b/Authors/Author/First_Name }
       </Book>
- Universal quantification :
for $b in doc("BookstoreQ.xml")/Bookstore/Book
where every $fn in $b/Authors/Author/First_Name
         satisfies contains($fn, "J")
return $b
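The some/every quantifiers behave much like Python's any() and all(). The sketch below mirrors the two queries above on a small invented document. The titles and the helper first_names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data chosen so the two quantifiers pick different books.
doc = ET.fromstring("""
<Bookstore>
  <Book>
    <Title>Databases</Title>
    <Authors>
      <Author><First_Name>Jeffrey</First_Name></Author>
      <Author><First_Name>Jennifer</First_Name></Author>
    </Authors>
  </Book>
  <Book>
    <Title>Notes by Hector</Title>
    <Authors>
      <Author><First_Name>Hector</First_Name></Author>
      <Author><First_Name>Jeffrey</First_Name></Author>
    </Authors>
  </Book>
</Bookstore>
""")

def first_names(book):
    # Corresponds to $b/Authors/Author/First_Name.
    return [fn.text for fn in book.findall("Authors/Author/First_Name")]

books = doc.findall("Book")

# some $fn in ... satisfies contains($b/Title, $fn)
some_titles = [b.findtext("Title") for b in books
               if any(fn in b.findtext("Title") for fn in first_names(b))]

# every $fn in ... satisfies contains($fn, "J")
# Like XQuery's "every", all() is true when there are no first names at all.
every_titles = [b.findtext("Title") for b in books
                if all("J" in fn for fn in first_names(b))]

print(some_titles)   # ['Notes by Hector']
print(every_titles)  # ['Databases']
```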
- Self-join
for $b1 in doc("BookstoreQ.xml")/Bookstore/Book
for $b2 in doc("BookstoreQ.xml")/Bookstore/Book
where $b1/Authors/Author/Last_Name = $b2/Authors/Author/Last_Name (: existential quantification :)
return
   <BookPair>
      <Title1> { data($b1/Title) } </Title1>
      <Title2> { data($b2/Title) } </Title2>
   </BookPair>
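To see what the self-join produces, here is a Python sketch of the same nested iteration over an invented three-book document. The "=" between two sequences of last names is modelled as "the two sets share at least one name". As written, the join also pairs each book with itself and returns every pair in both orders. The extra title comparison at the end is one possible way to keep each pair once, not part of the query above.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical sample data: A and B share the author Ullman, C shares nobody.
doc = ET.fromstring("""
<Bookstore>
  <Book>
    <Title>Book A</Title>
    <Authors>
      <Author><Last_Name>Ullman</Last_Name></Author>
      <Author><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
  <Book>
    <Title>Book B</Title>
    <Authors><Author><Last_Name>Ullman</Last_Name></Author></Authors>
  </Book>
  <Book>
    <Title>Book C</Title>
    <Authors><Author><Last_Name>Garcia-Molina</Last_Name></Author></Authors>
  </Book>
</Bookstore>
""")

def last_names(book):
    return {n.text for n in book.findall("Authors/Author/Last_Name")}

books = doc.findall("Book")

# Two "for" clauses over the same sequence, then the existential "=" test.
pairs = [(b1.findtext("Title"), b2.findtext("Title"))
         for b1, b2 in product(books, repeat=2)
         if last_names(b1) & last_names(b2)]

# Optional extra condition (an assumption, not in the query above):
# keep each unordered pair once and drop self-pairs.
distinct_pairs = [(t1, t2) for t1, t2 in pairs if t1 < t2]

print(pairs)
print(distinct_pairs)  # [('Book A', 'Book B')]
```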
