Querying XML

From Suhrid.net Wiki

Intro

  • Not as mature as querying relational databases
  • No underlying algebra
  • XPath : Path expressions and conditions
  • XSLT : XPath + Transformations, output processing
  • XQuery : XPath + full featured query language
  • XLink, XPointer : Use XPath as a component

XPath

  • Think of XML as a tree
  • Expressions in XPath as navigations down/across the tree with conditions
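  • / : at the start of a path selects the document root, elsewhere it separates steps. An element name matches that tag, * is a wildcard, @ selects an attribute
  • // : any descendant of the current element, including the element itself
  • A condition goes in square brackets, e.g. [price < 50]. Square brackets around a number, e.g. [2], select by position, like array access.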
  • Many built-in functions : e.g. contains(s1, s2) : true/false. name() : returns element tag name
  • Navigation axes : e.g. parent, following-sibling, descendant
  • XPath queries operate on, and return, a sequence of elements. The input can be an XML document or an XML stream
  • Sometimes the result of an XPath query can be expressed as XML, but not always
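
As a rough illustration of the tree navigation described above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It supports only a subset of XPath (no doc(), no numeric comparisons, no functions such as contains), so it is not a full XPath engine. The sample data is hypothetical; only the element and attribute names follow the queries in these notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the tag and attribute names come from the notes.
XML = """
<Bookstore>
  <Book ISBN="ISBN-1" Price="85">
    <Title>A First Course in Database Systems</Title>
    <Authors>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
  <Book ISBN="ISBN-2" Price="100">
    <Title>Database Systems: The Complete Book</Title>
    <Authors>
      <Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>
    </Authors>
    <Remark>Great reference</Remark>
  </Book>
  <Magazine Month="January" Year="2009">
    <Title>National Geographic</Title>
  </Magazine>
</Bookstore>
"""

root = ET.fromstring(XML)  # root is the <Bookstore> element

# Child steps separated by "/": Book/Title relative to the root
book_titles = [t.text for t in root.findall("Book/Title")]

# Wildcard step: */Title matches Title under any child element
all_child_titles = [t.text for t in root.findall("*/Title")]

# Descendant search: .//Last_Name finds the element at any depth
last_names = [n.text for n in root.findall(".//Last_Name")]

# Attribute values are read through the Python API rather than an @ step
isbns = [b.get("ISBN") for b in root.findall("Book")]

# Existence condition in square brackets: Book[Remark]
remarked = [b.findtext("Title") for b in root.findall("Book[Remark]")]

# Positional access in square brackets: the second Author of each Authors
second_authors = [a.findtext("Last_Name")
                  for a in root.findall(".//Authors/Author[2]")]

# A numeric condition like [@Price < 90] is outside ElementTree's subset,
# so the filter is written in Python instead.
cheap = [b.findtext("Title") for b in root.findall("Book")
         if float(b.get("Price")) < 90]
```

A full XPath processor would evaluate all of these as path expressions; the Python filter in the last step is only a stand-in for the bracketed condition.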

Sample queries

 
  • doc("Bookstore.xml")/Bookstore/Book/Title - returns titles of all books
  • doc("Bookstore.xml")/Bookstore/(Book | Magazine)/Title - titles of all books or magazines
  • doc("Bookstore.xml")/Bookstore/*/Title - wildcard
  • doc("Bookstore.xml")//Title - any Title element anywhere in the tree - Double slash
  • doc("Bookstore.xml")//* - every element in the tree: the root with its whole subtree, then each child with its subtree, and so on
  • doc("Bookstore.xml")/Bookstore/Book/data(@ISBN) - data() is needed to extract the attribute values; a bare @ISBN step yields attribute nodes, which a processor may refuse to output on their own
  • doc("Bookstore.xml")/Bookstore/Book[@Price < 90] - Condition, price < 90 : Will print the whole book
  • doc("Bookstore.xml")/Bookstore/Book[@Price < 90]/Title - Above, but return only title.
  • doc("Bookstore.xml")/Bookstore/Book[Remark]/Title - Existence condition, Book must have a remark element
  • doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author/Last_Name = "Ullman" and Authors/Author/First_Name = "Jennifer" ]/Title : Bigger condition. Each path comparison is a separate "there exists" test, so the last name and the first name may match different authors of the same book. This is not the intended AND on a single author.
  • doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author[Last_Name = "Ullman" and First_Name = "Jennifer"]]/Title : This is the correct one. Both name conditions apply to the same Author element.
  • doc("Bookstore.xml")//Authors/Author[2] - Return the second author element of each Authors subelement
  • doc("Bookstore.xml")/Bookstore/Book[contains(Remark, "Great")]/Title : contains function
  • doc("Bookstore.xml")//Magazine[Title=doc("Bookstore.xml")//Book/Title] : Self-join. The condition is satisfied if there is SOME element that meets it. Implicit existential quantification.
  • doc("Bookstore.xml")/Bookstore//*[name(parent::*) != 'Bookstore' and name(parent::*) != 'Book'] : All elements whose parent element is neither Bookstore nor Book. The * after parent:: matches a parent with any tag.
  • doc("Bookstore.xml")/Bookstore/(Book | Magazine)[Title = following-sibling::*/Title] : All books and magazines that have a non-unique title. Similarly, preceding-sibling.
  • doc("Bookstore.xml")/Bookstore/(Book | Magazine)[Title = following-sibling::Book/Title] : Instead of star in the axes, we specify an element.
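  • doc("Bookstore.xml")//Book[count(Authors/Author[contains(First_Name, 'J')]) = count(Authors/Author/First_Name)] - Universal quantification (for all). The number of authors whose first name contains J equals the total number of first names, so every author's First_Name contains J.
  • doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author/Last_Name = "Ullman" and count(Authors/Author[First_Name = "Jennifer"]) = 0] : Similar trick, simulating "and first_name != 'Jennifer'". The count of 0 requires that no author of the book has First_Name "Jennifer", which a plain != comparison would not guarantee.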
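
The difference between the two Ullman/Jennifer queries, and the count tricks for "for all" and "none", can be sketched in plain Python. This is only an illustration of the quantifier logic, not how an XPath engine evaluates it. The data is hypothetical and the helper names (authors, loose, strict and so on) are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical data chosen so that the four conditions give different answers.
XML = """
<Bookstore>
  <Book><Title>A First Course in Database Systems</Title>
    <Authors>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
  <Book><Title>Hypothetical Title</Title>
    <Authors>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Ullman</Last_Name></Author>
    </Authors>
  </Book>
  <Book><Title>Database Systems: The Complete Book</Title>
    <Authors>
      <Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
    </Authors>
  </Book>
</Bookstore>
"""
books = ET.fromstring(XML).findall("Book")


def authors(book):
    """Return (first_name, last_name) pairs for one Book element."""
    return [(a.findtext("First_Name"), a.findtext("Last_Name"))
            for a in book.findall("Authors/Author")]


def loose(book):
    # Two independent "there exists" tests: they may hit different authors.
    names = authors(book)
    return (any(last == "Ullman" for _, last in names)
            and any(first == "Jennifer" for first, _ in names))


def strict(book):
    # Both tests inside one Author[...] condition: same author must satisfy both.
    return any(first == "Jennifer" and last == "Ullman"
               for first, last in authors(book))


def every_first_name_contains_j(book):
    # Count trick for "for all": matching count equals total count.
    firsts = [first for first, _ in authors(book)]
    return sum("J" in first for first in firsts) == len(firsts)


def ullman_but_no_jennifer(book):
    # Count trick for "none": the number of Jennifers must be 0.
    names = authors(book)
    return (any(last == "Ullman" for _, last in names)
            and sum(first == "Jennifer" for first, _ in names) == 0)


def titles(pred):
    return [b.findtext("Title") for b in books if pred(b)]


loose_titles = titles(loose)
strict_titles = titles(strict)
all_j_titles = titles(every_first_name_contains_j)
no_jennifer_titles = titles(ullman_but_no_jennifer)
```

With this data, loose accepts the first book even though its Ullman is Jeffrey and its Jennifer is Widom, while strict accepts only the book whose single author is Jennifer Ullman.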

XQuery

  • XQuery is an expression language, also described as a compositional language.
  • Like relational algebra: an expression applied to a type of data returns an answer of the same type, so expressions can be nested.
  • In relational model, type of data is relations. In XML, the type is "sequence of elements".
  • Sequence can come from XML Document, XML Stream.
  • XQuery uses XPath. Every XPath expression is an XQuery expression.
  • Commonly used XQuery expression is the FLWOR expression :

for $var in expr
let $var := expr
where condition
order by expr
return expr

  • Every clause is optional except return, although a FLWOR expression needs at least one for or let clause to start it
  • For and let clause can be repeated multiple times and interleaved.
  • Possible to mix query language with hardcoded XML that we want in the result.
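
For readers more used to a general-purpose language, the roles of the FLWOR clauses can be sketched as a Python loop. This is only an analogy using the standard library's xml.etree.ElementTree, not XQuery itself. The data, the <Cheap> wrapper element and the price threshold are all assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; only the Book, Title and Price names come from the notes.
XML = """
<Bookstore>
  <Book Price="85"><Title>First Course</Title></Book>
  <Book Price="100"><Title>Complete Book</Title></Book>
  <Book Price="50"><Title>Hypothetical Cheap Book</Title></Book>
</Bookstore>
"""
root = ET.fromstring(XML)

matches = []
for b in root.findall("Book"):          # for $b in .../Bookstore/Book
    price = float(b.get("Price"))       # let $p := $b/@Price
    if price < 90:                      # where $p < 90
        matches.append((price, b))
matches.sort(key=lambda pair: pair[0])  # order by $p

# return <Cheap> ... </Cheap>: a hardcoded wrapper element with computed content
result = []
for price, b in matches:
    cheap = ET.Element("Cheap")
    title = ET.SubElement(cheap, "Title")
    title.text = b.findtext("Title")
    result.append(ET.tostring(cheap, encoding="unicode"))
```

The last loop corresponds to mixing hardcoded XML with query results: the Cheap tag is fixed, and only the Title text is computed.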

XQuery examples

  • Variable b is bound to each of the Book elements in a loop.
for $b in doc("BookstoreQ.xml")/Bookstore/Book
where $b/@Price < 90 and $b/Authors/Author/Last_Name = "Ullman"
return $b/Title
  • The for clause is an iterator; the let clause is an assignment. Here let binds $plist once to the sequence of all Price attributes in the document.
<Average>
  { let $plist := doc("BookstoreQ.xml")/Bookstore/Book/@Price
    return avg($plist) }
</Average>
  • Anything in the return block that should be evaluated, rather than copied literally, must go in curly brackets: {$n}
for $n in distinct-values(doc("BookstoreQ.xml")//Last_Name)
return <Last_Name> {$n} </Last_Name>
  • Existential quantification :
for $b in doc("BookstoreQ.xml")/Bookstore/Book
where some $fn in $b/Authors/Author/First_Name
         satisfies contains($b/Title, $fn)
return <Book>
          { $b/Title }
          { $b/Authors/Author/First_Name }
       </Book>
  • Universal quantification :
for $b in doc("BookstoreQ.xml")/Bookstore/Book
where every $fn in $b/Authors/Author/First_Name
         satisfies contains($fn, "J")
return $b
  • Self-join
for $b1 in doc("BookstoreQ.xml")/Bookstore/Book
for $b2 in doc("BookstoreQ.xml")/Bookstore/Book
where $b1/Authors/Author/Last_Name = $b2/Authors/Author/Last_Name (: existential quantification: true if any pair of last names matches :)
return
   <BookPair>
      <Title1> { data($b1/Title) } </Title1>
      <Title2> { data($b2/Title) } </Title2>
   </BookPair>
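
The quantifiers and the self-join above can be mimicked in Python to make their semantics concrete: some and every behave like any and all, and the = comparison between two sequences is true if any pair of values matches. This is a sketch under assumptions, using hypothetical data and a made-up helper last_names, not an XQuery implementation.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical data; tag names follow the queries in these notes.
XML = """
<Bookstore>
  <Book><Title>First Course</Title>
    <Authors>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
  <Book><Title>Complete Book</Title>
    <Authors>
      <Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
    </Authors>
  </Book>
  <Book><Title>Jennifer's Hints</Title>
    <Authors>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
</Bookstore>
"""
books = ET.fromstring(XML).findall("Book")


def last_names(book):
    """Set of author last names for one Book element."""
    return {n.text for n in book.findall("Authors/Author/Last_Name")}


# some $fn in $b/Authors/Author/First_Name satisfies contains($b/Title, $fn)
some_titles = [
    b.findtext("Title") for b in books
    if any(fn.text in b.findtext("Title")
           for fn in b.findall("Authors/Author/First_Name"))
]

# every $fn in $b/Authors/Author/First_Name satisfies contains($fn, "J")
every_titles = [
    b.findtext("Title") for b in books
    if all("J" in fn.text for fn in b.findall("Authors/Author/First_Name"))
]

# Self-join: two nested for clauses become a cross product, and the "="
# between two sequences is true when the sets share at least one value.
pairs = [
    (b1.findtext("Title"), b2.findtext("Title"))
    for b1, b2 in product(books, books)
    if last_names(b1) & last_names(b2)
]
```

As in the XQuery version, the join pairs every book with itself and returns each matching pair in both orders; an extra condition on the two titles would be needed to remove those.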