Difference between revisions of "Querying XML"
From Suhrid.net Wiki
Jump to navigationJump to search|  (Created page with "= Intro =  * Not as mature as querying relational databases * No underlying algebra * XPath : Path expressions and conditions * XSLT : XPath + Transformations, output processing ...") | |||
| (24 intermediate revisions by the same user not shown) | |||
| Line 12: | Line 12: | ||
| * Think of XML as a tree | * Think of XML as a tree | ||
| * Expressions in XPath as navigations down/across the tree with conditions | * Expressions in XPath as navigations down/across the tree with conditions | ||
| + | * / - root element + separator, Element name, * is wildcard, @ for attribute | ||
| + | * // - any descendant of the current element including self | ||
| + | * condition in square bracket. [price < 50]. Also [] used as array access. | ||
| + | * Many built-in functions : e.g. contains(s1, s2) : true/false. name() : returns element tag name | ||
| + | * Navigation axes : e.g. parent, following-sibling, descendants  | ||
| + | * XPath queries operate on and return sequence of elements for XML document & XML stream | ||
| + | * Sometimes result of XPath query can be expressed in XML, but not always | ||
| + | |||
| + | == Sample queries == | ||
| + | |||
| + | <syntaxhighlight lang="xml">  | ||
| + | doc("Bookstore.xml")/Bookstore/Book/Title - returns titles of all books | ||
| + | doc("Bookstore.xml")/Bookstore/(Book | Magazine)/Title - titles of all books or magazines | ||
| + | doc("Bookstore.xml")/Bookstore/*/Title - wildcard | ||
| + | doc("Bookstore.xml")//Title - any Title element anywhere in the tree - Double slash | ||
| + | doc("Bookstore.xml")//* - Will print the whole tree for the root, then subtree for the child etc | ||
| + | doc("Bookstore.xml")/Bookstore/Book/data(@ISBN) - Data operator needs to be specified | ||
| + | </syntaxhighlight> | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | doc("Bookstore.xml")/Bookstore/Book[@Price < 90] - Condition, price < 90 : Will print the whole book | ||
| + | doc("Bookstore.xml")/Bookstore/Book[@Price < 90]/Title - Above, but return only title. | ||
| + | doc("Bookstore.xml")/Bookstore/Book[Remark]/Title - Existence condition, Book must have a remark element | ||
| + | doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author/Last_Name = "Ullman" and Authors/Author/First_Name = "Jennifer" ]/Title  : Bigger condition. The second part is actually  a "there exists". So actually not doing an AND.  | ||
| + | doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author[Last_Name = "Ullman" and First_Name = "Jennifer" ]/Title : This is the correct one. | ||
| + | </syntaxhighlight> | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | doc("Bookstore.xml")//Authors/Author[2] - Return the second author element of each Authors subelement | ||
| + | doc("Bookstore.xml")/Book/[contains(Remark, "Great")]/Title : contains function | ||
| + | doc("Bookstore.xml")//Magazine[Title=doc("Bookstore.xml")//Book/Title] : Self-join. Condition is satisified if there is SOME element that meets it. Implicit existential quantification. | ||
| + | doc("Bookstore.xml")/Book//*[name(parent::*) != 'Bookstore' and name(parent::*) != 'Book'] : All elements whose parent element is not bookstore or book.  * after parent:: says match any tag of the parent. | ||
| + | doc("Bookstore.xml")/Bookstore/(Book | Magazine)[Title = following-sibling::*/Title] : All books and magazines that have a non-unique title.  Similarly, preceding-sibling. | ||
| + | doc("Bookstore.xml")/Bookstore/(Book | Magazine)[Title = following-sibling::Book/Title] : Instead of star in the axes, we specify an element. | ||
| + | doc("Bookstore.xml")//Book[count(Authors/Author[contains(First_Name, 'J')]) = count(Authors/Author/First_Name)] - Universal quantification (for all). Every author's first_name equals J. | ||
| + | doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author[Last_Name = "Ullman" and count(Authors/Author[First_Name = "Jennifer"] = 0] : Similar trick, simulating "and first_name != 'Jennifer'" | ||
| + | </syntaxhighlight> | ||
| + | |||
| + | = XQuery = | ||
| + | |||
| + | * Xquery is an expression language also known as a compositional language. | ||
| + | * Like a relational algebra - expression on a type of data will be an answer in the same type of data. | ||
| + | * In relational model, type of data is relations. In XML, the type is "sequence of elements". | ||
| + | * Sequence can come from XML Document, XML Stream. | ||
| + | * XQuery uses XPath. Every XPath expression is an XQuery expression.  | ||
| + | * Commonly used XQuery expression is the FLWOR expression : | ||
| + | |||
| + | For $var in expr Let $var := expr Where condition Order By expr Return expr | ||
| + | |||
| + | * Everything is optional except the return statement | ||
| + | * For and let clause can be repeated multiple times and interleaved. | ||
| + | * Possible to mix query language with hardcoded XML that we want in the result. | ||
| + | |||
| + | == Xquery examples == | ||
| + | |||
| + | * Variable b is bound to each of the Book elements in a loop. | ||
| + | <syntaxhighlight lang="xml"> | ||
| + | for $b in doc("BookstoreQ.xml")/Bookstore/Book where $b/@Price < 90 and $b/Authors/Author/Last_Name = "Ullman" return $b/Title  | ||
| + | </syntaxhighlight> | ||
| + | |||
| + | * For clause is an iterator, let clause is an assignment. Find all price attr's in the DB and assign to plist variable as a list. | ||
| + | <syntaxhighlight lang="xml"> | ||
| + | <Average> | ||
| + |   { let $plist := doc("BookstoreQ.xml")/Bookstore/Book/@Price | ||
| + |     return avg($plist) } | ||
| + | </Average> | ||
| + | </syntaxhighlight> | ||
| + | |||
| + | * If we want something in the return block to be evaluated, then we need to put it in curly brackets - ${n} | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | for $n in distinct-values(doc("BookstoreQ.xml")//Last_Name) | ||
| + | return <Last_Name> {$n} </Last_Name> | ||
| + | </syntaxhighlight> | ||
| + | |||
| + | * Existential quantification : | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | for $b in doc("BookstoreQ.xml")/Bookstore/Book | ||
| + | where some $fn in $b/Authors/Author/First_Name | ||
| + |          satisfies contains($b/Title, $fn) | ||
| + | return <Book> | ||
| + |           { $b/Title } | ||
| + |           { $b/Authors/Author/First_Name } | ||
| + |        </Book> | ||
| + | </syntaxhighlight> | ||
| + | |||
| + | * Universal quantification : | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | for $b in doc("BookstoreQ.xml")/Bookstore/Book | ||
| + | where every $fn in $b/Authors/Author/First_Name | ||
| + |          satisfies contains($fn, "J") | ||
| + | return $b | ||
| + | </syntaxhighlight> | ||
| + | |||
| + | * Self-join | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | for $b1 in doc("BookstoreQ.xml")/Bookstore/Book | ||
| + | for $b2 in doc("BookstoreQ.xml")/Bookstore/Book | ||
| + | where $b1/Authors/Author/Last_Name = $b2/Authors/Author/Last_Name <!-- EXISTENTIAL QUANTIFICATION --> | ||
| + | return | ||
| + |    <BookPair> | ||
| + |       <Title1> { data($b1/Title) } </Title1> | ||
| + |       <Title2> { data($b2/Title) } </Title2> | ||
| + |    </BookPair> | ||
| + | |||
| + | </syntaxhighlight> | ||
| + | |||
| + | = XSLT = | ||
| + | |||
| + | * First XSL (Extensible Stylesheet Language) was introduced and was extended to included Transformations : XSLT. | ||
| + | * XSLT processor takes the input XML doc and an XSLT specification (which is specified in XML) and transforms the data into an XML document or String. This is similar to DB query processing where we input data & query to the query processor to get the result.  | ||
| + | * However, the query paradigm is quite different - based on the notion of transforming the data. Transformation occurs with rules. | ||
| + | * Matching a template and replacing it - Write an expression that find portion of the XML tree based on template matching. Then, replace subtree with what is in the template. The template is specified using XPath. | ||
| + | * Templates can be recursively matched | ||
| + | * Extract values, for-each, conditionals | ||
| + | |||
| + | == Queries == | ||
| + | |||
| + | * Returns only BookTitle & MagazineTitle elements in the output XML.  | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | |||
| + | <xsl:template match="Book"> | ||
| + |   <BookTitle> <xsl:value-of select="Title" /> </BookTitle> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="Magazine"> | ||
| + |   <MagazineTitle> <xsl:value-of select="Title" /> </MagazineTitle> | ||
| + | </xsl:template> | ||
| + | |||
| + | |||
| + | </syntaxhighlight> | ||
| + | |||
| + | * Copies all books - "." is current element. i.e. retains them in the output XML. | ||
| + | * Second template - replaces the text() elements with nothing. This is done because XSLT returns leaf-values (text) for elements that are NOT matched by the condition.  | ||
| + | <syntaxhighlight lang="xml"> | ||
| + | |||
| + | <xsl:template match="Book[@Price < 90]"> | ||
| + |    <xsl:copy-of select="." /> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="text()" /> | ||
| + | |||
| + | </syntaxhighlight> | ||
| + | |||
| + | * When two templates match, the later template gets priority. However, when one template is more specific than the other, the specific template gets the higher priority.  | ||
| + | |||
| + | * Recursively apply templates : to copy the entire document | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | |||
| + | <xsl:template match="*|@*|text()"> | ||
| + |    <xsl:copy> | ||
| + |       <xsl:apply-templates select="*|@*|text()" /> | ||
| + |    </xsl:copy> | ||
| + | </xsl:template> | ||
| + | |||
| + | </syntaxhighlight> | ||
| + | |||
| + | * Use the above template & add exception templates to make specific changes | ||
| + | * e.g. ISBN attribute is converted to an ISBN element, Author element is converted to an author attribute | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | |||
| + | <xsl:template match="*|@*|text()"> | ||
| + |    <xsl:copy> | ||
| + |       <xsl:apply-templates select="*|@*|text()" /> | ||
| + |    </xsl:copy> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="@ISBN"> | ||
| + |    <ISBN><xsl:value-of select="." /></ISBN> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="@Price"> | ||
| + |    <Price><xsl:value-of select="." /></Price> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="@Edition"> | ||
| + |    <Edition><xsl:value-of select="." /></Edition> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="@Month"> | ||
| + |    <Month><xsl:value-of select="." /></Month> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="@Year"> | ||
| + |    <Year><xsl:value-of select="." /></Year> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="Author"> | ||
| + |   <Author LN="{Last_Name}" FN="{First_Name}" /> | ||
| + | </xsl:template> | ||
| + | |||
| + | </syntaxhighlight> | ||
| + | |||
| + | * Translating XML to HTML  | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | |||
| + | <xsl:template match="/"> | ||
| + |    <html> | ||
| + |    <table border="1"> | ||
| + |       <th>Book</th> | ||
| + |       <th>Cost</th> | ||
| + |       <xsl:for-each select="Bookstore/Book"> | ||
| + |       <xsl:sort select="@Price" /> | ||
| + |          <xsl:if test="@Price < 90"> | ||
| + |             <tr> | ||
| + |             <td><i><xsl:value-of select="Title" /></i></td> | ||
| + |             <td><xsl:value-of select="@Price" /></td> | ||
| + |             </tr> | ||
| + |          </xsl:if> | ||
| + |       </xsl:for-each> | ||
| + |    </table> | ||
| + |    </html> | ||
| + | </xsl:template> | ||
| + | |||
| + | </syntaxhighlight> | ||
| + | |||
| + | * Take out First_Name elements that have Jennifer and replace Last_Name Widom elements with Name elements instead | ||
| + | * Cant use First_Name = "Jennifer", since then . will represent parent and the parent will be removed, so have to use the data function instead.  | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | |||
| + | <xsl:template match="*|@*|text()"> | ||
| + |    <xsl:copy> | ||
| + |       <xsl:apply-templates select="*|@*|text()" /> | ||
| + |    </xsl:copy> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="First_Name[data(.) = 'Jennifer']"> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="Last_Name[data(.) = 'Widom']"> | ||
| + |   <Name>Ms. Widom</Name> | ||
| + | </xsl:template> | ||
| + | |||
| + | |||
| + | </syntaxhighlight> | ||
| + | |||
| + | * Above two templates can be replaced by a single template : | ||
| + | |||
| + | <syntaxhighlight lang="xml"> | ||
| + | |||
| + | <xsl:template match="*|@*|text()"> | ||
| + |    <xsl:copy> | ||
| + |       <xsl:apply-templates select="*|@*|text()" /> | ||
| + |    </xsl:copy> | ||
| + | </xsl:template> | ||
| + | |||
| + | <xsl:template match="Author[First_Name[data(.) = 'Jennifer']]"> | ||
| + |   <Author> | ||
| + |     <Name>Ms. Widom</Name> | ||
| + |   </Author> | ||
| + | </xsl:template> | ||
| + | |||
| + | </syntaxhighlight> | ||
Latest revision as of 23:48, 14 February 2014
Intro
- Not as mature as querying relational databases
- No underlying algebra
- XPath : Path expressions and conditions
- XSLT : XPath + Transformations, output processing
- XQuery : XPath + full featured query language
- XLink, XPointer : Use XPath as a component
XPath
- Think of XML as a tree
- Expressions in XPath as navigations down/across the tree with conditions
- / - root element + separator, Element name, * is wildcard, @ for attribute
- // - any descendant of the current element including self
- condition in square bracket. [price < 50]. Also [] used as array access.
- Many built-in functions : e.g. contains(s1, s2) : true/false. name() : returns element tag name
- Navigation axes : e.g. parent, following-sibling, descendants
- XPath queries operate on and return sequence of elements for XML document & XML stream
- Sometimes result of XPath query can be expressed in XML, but not always
Sample queries
 
doc("Bookstore.xml")/Bookstore/Book/Title - returns titles of all books
doc("Bookstore.xml")/Bookstore/(Book | Magazine)/Title - titles of all books or magazines
doc("Bookstore.xml")/Bookstore/*/Title - wildcard
doc("Bookstore.xml")//Title - any Title element anywhere in the tree - Double slash
doc("Bookstore.xml")//* - Will print the whole tree for the root, then subtree for the child etc
doc("Bookstore.xml")/Bookstore/Book/data(@ISBN) - Data operator needs to be specified
doc("Bookstore.xml")/Bookstore/Book[@Price < 90] - Condition, price < 90 : Will print the whole book
doc("Bookstore.xml")/Bookstore/Book[@Price < 90]/Title - Above, but return only title.
doc("Bookstore.xml")/Bookstore/Book[Remark]/Title - Existence condition, Book must have a remark element
doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author/Last_Name = "Ullman" and Authors/Author/First_Name = "Jennifer" ]/Title  : Bigger condition. The second part is actually  a "there exists". So actually not doing an AND. 
doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author[Last_Name = "Ullman" and First_Name = "Jennifer" ]/Title : This is the correct one.
doc("Bookstore.xml")//Authors/Author[2] - Return the second author element of each Authors subelement
doc("Bookstore.xml")/Book/[contains(Remark, "Great")]/Title : contains function
doc("Bookstore.xml")//Magazine[Title=doc("Bookstore.xml")//Book/Title] : Self-join. Condition is satisified if there is SOME element that meets it. Implicit existential quantification.
doc("Bookstore.xml")/Book//*[name(parent::*) != 'Bookstore' and name(parent::*) != 'Book'] : All elements whose parent element is not bookstore or book.  * after parent:: says match any tag of the parent.
doc("Bookstore.xml")/Bookstore/(Book | Magazine)[Title = following-sibling::*/Title] : All books and magazines that have a non-unique title.  Similarly, preceding-sibling.
doc("Bookstore.xml")/Bookstore/(Book | Magazine)[Title = following-sibling::Book/Title] : Instead of star in the axes, we specify an element.
doc("Bookstore.xml")//Book[count(Authors/Author[contains(First_Name, 'J')]) = count(Authors/Author/First_Name)] - Universal quantification (for all). Every author's first_name equals J.
doc("Bookstore.xml")/Bookstore/Book[@Price < 90 and Authors/Author[Last_Name = "Ullman" and count(Authors/Author[First_Name = "Jennifer"] = 0] : Similar trick, simulating "and first_name != 'Jennifer'"
XQuery
- Xquery is an expression language also known as a compositional language.
- Like a relational algebra - expression on a type of data will be an answer in the same type of data.
- In relational model, type of data is relations. In XML, the type is "sequence of elements".
- Sequence can come from XML Document, XML Stream.
- XQuery uses XPath. Every XPath expression is an XQuery expression.
- Commonly used XQuery expression is the FLWOR expression :
For $var in expr Let $var := expr Where condition Order By expr Return expr
- Everything is optional except the return statement
- For and let clause can be repeated multiple times and interleaved.
- Possible to mix query language with hardcoded XML that we want in the result.
Xquery examples
- Variable b is bound to each of the Book elements in a loop.
for $b in doc("BookstoreQ.xml")/Bookstore/Book where $b/@Price < 90 and $b/Authors/Author/Last_Name = "Ullman" return $b/Title
- For clause is an iterator, let clause is an assignment. Find all price attr's in the DB and assign to plist variable as a list.
<Average>
  { let $plist := doc("BookstoreQ.xml")/Bookstore/Book/@Price
    return avg($plist) }
</Average>
- If we want something in the return block to be evaluated, then we need to put it in curly brackets - ${n}
for $n in distinct-values(doc("BookstoreQ.xml")//Last_Name)
return <Last_Name> {$n} </Last_Name>
- Existential quantification :
for $b in doc("BookstoreQ.xml")/Bookstore/Book
where some $fn in $b/Authors/Author/First_Name
         satisfies contains($b/Title, $fn)
return <Book>
          { $b/Title }
          { $b/Authors/Author/First_Name }
       </Book>
- Universal quantification :
for $b in doc("BookstoreQ.xml")/Bookstore/Book
where every $fn in $b/Authors/Author/First_Name
         satisfies contains($fn, "J")
return $b
- Self-join
for $b1 in doc("BookstoreQ.xml")/Bookstore/Book
for $b2 in doc("BookstoreQ.xml")/Bookstore/Book
where $b1/Authors/Author/Last_Name = $b2/Authors/Author/Last_Name <!-- EXISTENTIAL QUANTIFICATION -->
return
   <BookPair>
      <Title1> { data($b1/Title) } </Title1>
      <Title2> { data($b2/Title) } </Title2>
   </BookPair>
XSLT
- First XSL (Extensible Stylesheet Language) was introduced and was extended to included Transformations : XSLT.
- XSLT processor takes the input XML doc and an XSLT specification (which is specified in XML) and transforms the data into an XML document or String. This is similar to DB query processing where we input data & query to the query processor to get the result.
- However, the query paradigm is quite different - based on the notion of transforming the data. Transformation occurs with rules.
- Matching a template and replacing it - Write an expression that find portion of the XML tree based on template matching. Then, replace subtree with what is in the template. The template is specified using XPath.
- Templates can be recursively matched
- Extract values, for-each, conditionals
Queries
- Returns only BookTitle & MagazineTitle elements in the output XML.
<xsl:template match="Book">
  <BookTitle> <xsl:value-of select="Title" /> </BookTitle>
</xsl:template>
<xsl:template match="Magazine">
  <MagazineTitle> <xsl:value-of select="Title" /> </MagazineTitle>
</xsl:template>
- Copies all books - "." is current element. i.e. retains them in the output XML.
- Second template - replaces the text() elements with nothing. This is done because XSLT returns leaf-values (text) for elements that are NOT matched by the condition.
<xsl:template match="Book[@Price < 90]">
   <xsl:copy-of select="." />
</xsl:template>
<xsl:template match="text()" />
- When two templates match, the later template gets priority. However, when one template is more specific than the other, the specific template gets the higher priority.
- Recursively apply templates : to copy the entire document
<xsl:template match="*|@*|text()">
   <xsl:copy>
      <xsl:apply-templates select="*|@*|text()" />
   </xsl:copy>
</xsl:template>
- Use the above template & add exception templates to make specific changes
- e.g. ISBN attribute is converted to an ISBN element, Author element is converted to an author attribute
<xsl:template match="*|@*|text()">
   <xsl:copy>
      <xsl:apply-templates select="*|@*|text()" />
   </xsl:copy>
</xsl:template>
<xsl:template match="@ISBN">
   <ISBN><xsl:value-of select="." /></ISBN>
</xsl:template>
<xsl:template match="@Price">
   <Price><xsl:value-of select="." /></Price>
</xsl:template>
<xsl:template match="@Edition">
   <Edition><xsl:value-of select="." /></Edition>
</xsl:template>
<xsl:template match="@Month">
   <Month><xsl:value-of select="." /></Month>
</xsl:template>
<xsl:template match="@Year">
   <Year><xsl:value-of select="." /></Year>
</xsl:template>
<xsl:template match="Author">
  <Author LN="{Last_Name}" FN="{First_Name}" />
</xsl:template>
- Translating XML to HTML
<xsl:template match="/">
   <html>
   <table border="1">
      <th>Book</th>
      <th>Cost</th>
      <xsl:for-each select="Bookstore/Book">
      <xsl:sort select="@Price" />
         <xsl:if test="@Price < 90">
            <tr>
            <td><i><xsl:value-of select="Title" /></i></td>
            <td><xsl:value-of select="@Price" /></td>
            </tr>
         </xsl:if>
      </xsl:for-each>
   </table>
   </html>
</xsl:template>
- Take out First_Name elements that have Jennifer and replace Last_Name Widom elements with Name elements instead
- Cant use First_Name = "Jennifer", since then . will represent parent and the parent will be removed, so have to use the data function instead.
<xsl:template match="*|@*|text()">
   <xsl:copy>
      <xsl:apply-templates select="*|@*|text()" />
   </xsl:copy>
</xsl:template>
<xsl:template match="First_Name[data(.) = 'Jennifer']">
</xsl:template>
<xsl:template match="Last_Name[data(.) = 'Widom']">
  <Name>Ms. Widom</Name>
</xsl:template>
- Above two templates can be replaced by a single template :
<xsl:template match="*|@*|text()">
   <xsl:copy>
      <xsl:apply-templates select="*|@*|text()" />
   </xsl:copy>
</xsl:template>
<xsl:template match="Author[First_Name[data(.) = 'Jennifer']]">
  <Author>
    <Name>Ms. Widom</Name>
  </Author>
</xsl:template>
