XQuery – A Language for Querying XML Documents: XQuery – A Language for Querying XML Documents Werner Nutt
Requirements for an XML Query Language: Requirements for an XML Query Language David Maier, W3C XML Query Requirements:
Closedness: output must be XML
Composability: wherever a set of XML elements is required, a subquery is allowed as well
Support for key operations:
selection
extraction, projection
restructuring
combination, join
fusion of elements
Requirements for an XML Query Language (cntd.): Requirements for an XML Query Language (cntd.) Can benefit from a schema, but should also be applicable without
Retains the order of nodes
Formal semantics:
structure of results should be derivable from query
defines equivalence of queries
Queries should be representable in XML
documents can have embedded queries
How Does One Design a Query Language?: How Does One Design a Query Language? In most query languages, there are two aspects to a query:
Retrieving data (e.g., from … where … in SQL)
Creating output (e.g., select … in SQL)
Retrieval consists of
Pattern matching (e.g., from … )
Filtering (e.g., where … )
… although these cannot always be clearly distinguished
XQuery Principles: XQuery Principles Data Model identical with the XPath data model
documents are ordered, labeled trees
nodes have identity
nodes can have simple or complex types (defined in XML Schema)
Xquery can be used without schemas, but can be checked against DTDs and XML schemas
XQuery is a functional language
no statements
evaluation of expressions
A Query over the Recipes Document:
{for $r in doc("recipes.xml")//recipe
return $r/title}
returns
Beef Parmesan with Garlic Angel Hair Pasta
Ricotta Pie
…
A Query over the Recipes Document
Query Features:
{for $r in doc("recipes.xml")//recipe
return $r/title}
Query Features doc(String) returns input document Sequence of results, one for each variable binding
Features: Summary: Features: Summary The result is a new XML document
A query consists of parts that are returned as is
... and others that are evaluated (everything in {...} )
Calling the function doc(String) returns an input document
XPath is used to retrieve nodes sets and values
Iteration over node sets: let binds a variable to all nodes in a node set
Variables can be used in XPath expressions
return returns a sequence of results, one for each binding of a variable
XPath is a Fragement of XQuery: XPath is a Fragement of XQuery doc("recipes.xml")//recipe[1]/title
returns
Beef Parmesan with Garlic Angel Hair Pasta
doc("recipes.xml")//recipe[position()Beef Parmesan with Garlic Angel Hair Pasta,
Ricotta Pie,
Linguine Pescadoro
an element a list of elements
Beware: XPath Attributes: Beware: XPath Attributes doc("recipes.xml")//recipe[1]/ingredient[1] /@name
→ attribute name {"beef cube steak"}
string(doc("recipes.xml")//recipe[1] /ingredient[1]/@name)
→ "beef cube steak"
a constructor for an attribute node a value of type string
XPath Attributes (cntd.): XPath Attributes (cntd.) {string(doc("recipes.xml")//recipe[1] /ingredient[1]/@name)}
→ beef cube steak an element with string content
XPath Attributes (cntd.): XPath Attributes (cntd.) {doc("recipes.xml")//recipe[1] /ingredient[1]/@name}
→
an element with an attribute Note: The XML that we write down is only the surface structure of the data model that is underlying XQuery
XPath Attributes (cntd.): XPath Attributes (cntd.) Beef
→
Beef
Iteration with the For-Clause: Iteration with the For-Clause Syntax: for $var in xpath-expr
Example: for $r in doc("recipes.xml")//recipe return string($r)
The expression creates a list of bindings for a variable $var
If $var occurs in an expression exp, then exp is evaluated for each binding
For-clauses can be nested: for $r in doc("recipes.xml")//recipe for $v in doc("vegetables.xml")//vegetable return ...
Nested For-clauses: Example: Nested For-clauses: Example
{for $r in doc("recipes.xml")//recipe
return
{for $i in $r//ingredient
return
{string($i/@name)}
}
}
Returns my-recipes with titles as attributes and my-ingredients
with names as text content
The Let Clause: The Let Clause Syntax: let $var := xpath-expr
binds variable $var to a list of nodes, with the nodes in document order
does not iterate over the list
allows one to keep intermediate results for reuse (not possible in SQL)
Example:
let $ooreps := doc("recipes.xml")//recipe [.//ingredient/@name="olive oil"]
Let Clause: Example: Let Clause: Example {let $ooreps := doc("recipes.xml")//recipe [.//ingredient/@name="olive oil"] for $r in $ooreps return {$r/title/text()} {": "} {string($r/nutrition/@calories)} }
Calories of recipes
with olive oil Note the implicit
string concatenation
Let Clause: Example (cntd.): Let Clause: Example (cntd.) The query returns:
Beef Parmesan: 1167
Linguine Pescadoro: 532
The Where Clause: The Where Clause Syntax: where
occurs before return clause
similar to predicates in XPath
comparisons on nodes:
"=" for node equality
">" for document order
Example:
for $r in doc("recipes.xml")//recipe
where $r//ingredient/@name="olive oil"
return ...
Quantifiers: Quantifiers Syntax: some/every $var in satisfies
$var is bound to all nodes in
Test succeeds if is true for some/every binding
Note: if is empty, then “some” is false and “all” is true
Quantifiers (Example): Quantifiers (Example) Recipes that have some compound ingredient
Recipes where every ingredient is non-compound
for $r in doc("recipes.xml")//recipe
where some $i in $r/ingredient
satisfies $i/ingredient
Return $r/title for $r in doc("recipes.xml")//recipe
where every $i in $r/ingredient
satisfies not($i/ingredient)
Return $r/title
Element Fusion: Element Fusion “To every recipe, add the attribute calories!”
{let $rs := doc("recipes.xml")//recipe
for $r in $rs return
{$r/nutrition/@calories}
{$r/title}
}
Element Fusion (cntd.): Element Fusion (cntd.) The query result:
Beef Parmesan with Garlic Angel Hair Pasta
Ricotta Pie
Linguine Pescadoro
Zuppa Inglese
Cailles en Sarcophages
Join: Join “Pair every ingredient with the recipes where it is used!”
let $rs := doc("recipes.xml")//recipe
for $i in $rs//ingredient
for $r in $rs
where $r//ingredient/@name=$i/@name
return
{$i/@name}
{$r/title}
Join (cntd.): Join (cntd.) The query result:
Beef Parmesan with Garlic Angel Hair Pasta
,
Beef Parmesan with Garlic Angel Hair Pasta
,
Beef Parmesan with Garlic Angel Hair Pasta
,
Restructuring: “For every ingredient, return all the recipes where it is used!”
{let $rs := doc("recipes.xml")//recipe
for $i in $rs//ingredient
return
{$i/@*}
{$rs[.//ingredient/@name=$i/@name]/title}
}
Restructuring
Restructuring (cntd.): Restructuring (cntd.) The query result:
Zuppa Inglese
…
Beef Parmesan with Garlic Angel Hair Pasta Linguine Pescadoro
…
Eliminating Duplicates : Eliminating Duplicates The function distinct-values(Node Set)
extracts the values of a sequence of nodes
creates a duplicate free sequence of values
Note the coercion: nodes are cast as values!
Example:
let $rs := doc("recipes.xml")//recipe return distinct-values($rs//ingredient/@name)
yields
xdt:untypedAtomic("beef cube steak"),
xdt:untypedAtomic("onion, sliced into thin rings"),
...
Avoiding Multiple Results in a Join: Avoiding Multiple Results in a Join We want that every ingredient is listed only once:
Eliminate duplicates using distinct-values!
{let $rs := doc("recipes.xml")//recipe for $in in distinct-values( $rs//ingredient/@name) return {$rs[.//ingredient/@name=$in]/title} }
Avoiding Multiple Results (cntd.): Avoiding Multiple Results (cntd.) The query result:
Beef Parmesan with Garlic Angel Hair Pasta
Beef Parmesan with Garlic Angel Hair Pasta ... Linguine Pescadoro Cailles en Sarcophages
...
The Order By Clause: The Order By Clause Syntax: order by expr [ ascending | descending ]
for $iname in doc("recipes.xml")//@name order by $iname descending return string($iname)
yields
"whole peppercorns", "whole baby clams", "white sugar", ...
The Order By Clause (cntd.): The Order By Clause (cntd.) The interpreter must be told whether the values should be regarded as numbers or as strings (alphanumerical sorting is default)
for $r in $rs order by number($r/nutrition/@calories) return $r/title
Note:
The query returns titles ...
but the ordering is according to calories, which do not appear in the output
Not possible in SQL!
FLWOR Expresssions (pronounced “flower”): FLWOR Expresssions (pronounced “flower”) We have now seen the main ingredients of XQuery:
For and Let clauses, which can be mixed
a Where clause imposing conditions
an Order by clause, which determines the order of results
a Return clause, which constructs the output.
Combination these yields FLWOR expressions.
Conditionals: Conditionals if (expr) then expr else expr
Example
let $is := doc("recipes.xml")//ingredient for $i in $is[not(ingredient)] let $u := if (not($i/@unit)) then attribute {"unit"} {"pieces"}
else ()
creates an attribute unit="pieces" if none exists and an empty nodelist otherwise
Conditionals (cntd.): We use the conditional to construct variants of ingredients:
let $is := doc("recipes.xml")//ingredient for $i in $is[not(ingredient)] let $u := if (not($i/@unit)) then attribute {"unit"} {"pieces"} else () return {$i/@* | $u}
Conditionals (cntd.)
Conditionals (cntd.): Conditionals (cntd.) The query result:
,
...
, …
Grouping and Aggregation: Grouping and Aggregation Aggregation functions count, sum, avg, min, max
Example: The number of simple ingredients per recipe
for $r in doc("recipes.xml")//recipe return {attribute {"title"} {$r/title/text()}} {count($r//ingredient[not(ingredient)])}
Grouping and Aggregation (cntd.): Grouping and Aggregation (cntd.) The query result:
11,
12,
15,
8,
30
Nested Aggregation: Nested Aggregation “The recipe with the maximal number of calories!” let $rs := doc("recipes.xml")//recipe let $maxCal := max($rs//@calories) for $r in $rs where $r//@calories = $maxCal return string($r/title)
returns
"Cailles en Sarcophages"
Running Queries with Galax: Running Queries with Galax Galax is an open-source implementation of XQuery (http://www.galaxquery.org/)
The main developers have taken part in the definition of XQuery
Galax has is installed on the Linux machines of the department. To use it, you have to adjust your paths
If you run a c-shell, add to .cshrc
setenv GALAXHOME /usr/local/galax
setenv PATH /usr/local/galax/bin:${PATH}
If you run a bash shell, add to your .profile
export GALAXHOME /usr/local/galax
export PATH=$PATH:$GALAXHOME/bin
Running Queries with Galax (cntd.): Running Queries with Galax (cntd.) Write your query in a file .xq
Call galax-run .xq in your shell
The answer will be returned in the shell
… or an error message
More info on Galax can be found in the manual on the Galax website
Exercises: Exercises Write queries that produce
A list, containing for every recipe the recipe's title element and an element with the number of calories
The same, ordered according to calories
The same, alphabetically ordered according to title
The same, ordered according to the fat content
The same, with title as attribute and calories as content.
A list, containing for every recipe the top level ingredients, dropping the lower level ingredients
Sample Solution: Sample Solution {for $r in doc("recipes.xml")//recipe return {attribute {"title"} {$r/title} {for $i in $r/ingredient return if (not($i/ingredient)) then $i else {$i/@*} } }
More Exercises: More Exercises The file pods98.xml contains a list of all the papers published at the database conference PODS’98. It follows the DTD:
More Exercises (cntd.): More Exercises (cntd.) This is the beginning of the document:
17. PODS 1998: Seattle, Washington Ronald Fagin Fuzzy Queries in Multimedia Database Systems 1 10 Frank Neven Jan Van den Bussche Expressiveness of Structured Document Query Languages Based on Attribute Grammars 11 17
…
More Exercises (cntd.): More Exercises (cntd.) Write queries in XQuery to produce:
For each paper title the number of authors
The average number of authors per paper
For each author, the list of papers to which they contributed
A list of authors
ordered according to
the number of papers they have at the conference
and alphabetically if the have the same number of papers
with author name, paper titles and the total number of pages their papers occupy in the proceedings