XML2


Query Languages for XML: 


Slide2: 

Why a query language? Extracting, restructuring, integration, browsing…
XML-QL
    http://www.w3.org/TR/NOTE-xml-ql
    http://db.cis.upenn.edu/XML-QL/
XPath (part of a query language)
    http://www.w3.org/TR/xpath
XSLT
    http://www.w3.org/TR/xslt
    http://www.mulberrytech.com/quickref/XSLTquickref.pdf
Quilt
    http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
    http://db.cis.upenn.edu/Kweelt/

XML-QL (XML Query Language): 

W3C proposal, August 1998. Authors:
    Mary Fernandez (AT&T)
    Dana Florescu (INRIA)
    Alon Levy (Univ. of Washington)
    Dan Suciu (AT&T)
    Alin Deutsch (Univ. of Pennsylvania)

Address Book Revisited: 

<addrBook>
  <person SSN="111-22-3333">
    <name> Caesar </name>
    <greet> Caesar Imperator </greet>
    <addr> The Capitol </addr>
    <addr> Rome, OH 98765 </addr>
    <tel> (321) 786 2543 </tel>
    <fax> (321) 786 2543 </fax>
    <tel> (321) 786 2543 </tel>
    <email> jc@forum.rome.org </email>
  </person>
</addrBook>

XML-QL: Pattern Matching: 

Data extraction. Find Caesar's e-mail address:
    where <addrBook>
            <person>
              <name>Caesar</name>
              <email>$e</email>
            </person>
          </addrBook>
          in "http://db.cis.upenn.edu/~peter/address.xml"
    construct $e
Result:
    <XML>jc@forum.rome.org</XML>
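For readers without an XML-QL engine, the same extraction can be approximated in Python. This is only a sketch using the standard library's xml.etree.ElementTree, not XML-QL itself. The helper name emails_for and the inline copy of the address book are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Inline, shortened copy of the address book slide (assumption: one person only).
ADDRESS_BOOK = """
<addrBook>
  <person SSN="111-22-3333">
    <name> Caesar </name>
    <greet> Caesar Imperator </greet>
    <email> jc@forum.rome.org </email>
  </person>
</addrBook>
"""

def emails_for(root, wanted_name):
    """Return the e-mail text of every <person> that has a matching <name>."""
    results = []
    for person in root.findall("person"):
        names = [(n.text or "").strip() for n in person.findall("name")]
        if wanted_name in names:
            # Plays the role of binding $e for each <email> child.
            results.extend((e.text or "").strip() for e in person.findall("email"))
    return results

root = ET.fromstring(ADDRESS_BOOK)
print(emails_for(root, "Caesar"))
```

Unlike the XML-QL pattern, the sketch strips surrounding whitespace before comparing names, because the sample data pads its text content with spaces.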

XML-QL: Constructing New XML Data: 

Data restructuring. Whom can we contact electronically?
    where <addrBook>
            <person>
              <greet>$g</greet>
              <email>$e</email>
            </person>
          </addrBook>
          in "http://..."
    construct <e-contact>
                <who>$g</who>
                <where>$e</where>
              </e-contact>
Result:
    <XML>
      <e-contact>
        <who>Caesar Imperator</who>
        <where>jc@forum.rome.org</where>
      </e-contact>
      <e-contact>
        <who>Brutus</who>
        <where>mb@philippi.com</where>
      </e-contact>
      ...
    </XML>
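The construct clause emits one e-contact element per binding of ($g, $e). A rough Python equivalent, assuming a small inline address book, might look as follows. The third person, who has no e-mail address, is hypothetical and is there only to show that people without an email produce no output. The function name e_contacts is likewise an assumption.

```python
import xml.etree.ElementTree as ET

ADDRESS_BOOK = """<addrBook>
  <person><greet>Caesar Imperator</greet><email>jc@forum.rome.org</email></person>
  <person><greet>Brutus</greet><email>mb@philippi.com</email></person>
  <person><greet>Cleopatra</greet></person>
</addrBook>"""

def e_contacts(root):
    """Build <XML> with one <e-contact> per (greet, email) pair of a person."""
    out = ET.Element("XML")
    for person in root.findall("person"):
        for g in person.findall("greet"):
            for e in person.findall("email"):
                contact = ET.SubElement(out, "e-contact")
                ET.SubElement(contact, "who").text = g.text
                ET.SubElement(contact, "where").text = e.text
    return out

result = e_contacts(ET.fromstring(ADDRESS_BOOK))
print(ET.tostring(result, encoding="unicode"))
```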

XML-QL: Joins: 

Which of our contacts was involved in a movie? The two patterns are joined through the shared variable $g:
    where <addrBook>
            <person>
              <greet>$g</greet>
              <email>$e</email>
            </person>
          </addrBook>
          in "http://…address.xml",
          <movie>
            <title>$t</>
            <character>$g</>
          </movie>
          in "http://www.imdb.com"
    construct <cine-contact>
                <who>$g</who>
                <movie>$t</movie>
                <where>$e</where>
              </cine-contact>

XML-QL: Joins (cont’d): 

Data integration. Result of the join:
    <XML>
      <cine-contact>
        <who>Caesar Imperator</who>
        <where>jc@forum.rome.org</where>
        <movie>Asterix and Cleopatra</movie>
      </cine-contact>
      <cine-contact>
        <who>Dr. Strangelove</who>
        <where>strangelov@love.the.bomb</where>
        <movie>Dr. Strangelove or How I Stopped ...</movie>
      </cine-contact>
      ...
    </XML>
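The join on $g can be pictured as a hash join between the two documents. The sketch below is an illustration in Python, not how any XML-QL engine is known to evaluate the query. The movies wrapper element and the function name cine_contacts are assumptions, and only the first greet of each person is used.

```python
import xml.etree.ElementTree as ET

ADDRESSES = """<addrBook>
  <person><greet>Caesar Imperator</greet><email>jc@forum.rome.org</email></person>
  <person><greet>Brutus</greet><email>mb@philippi.com</email></person>
</addrBook>"""

# Assumption: the movie source wraps its <movie> elements in a <movies> root.
MOVIES = """<movies>
  <movie><title>Asterix and Cleopatra</title><character>Caesar Imperator</character></movie>
</movies>"""

def cine_contacts(addr_root, movie_root):
    """Join persons and movies on greet == character (the shared variable $g)."""
    titles_by_character = {}
    for movie in movie_root.iter("movie"):
        title = movie.findtext("title")
        for character in movie.findall("character"):
            titles_by_character.setdefault(character.text, []).append(title)
    rows = []
    for person in addr_root.iter("person"):
        greet = person.findtext("greet")
        for email in person.findall("email"):
            for title in titles_by_character.get(greet, []):
                rows.append({"who": greet, "movie": title, "where": email.text})
    return rows

rows = cine_contacts(ET.fromstring(ADDRESSES), ET.fromstring(MOVIES))
print(rows)
```

Brutus is dropped because no movie in the sample lists him as a character, which is the inner-join behaviour of the original query.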

XML-QL Data Model: 

- Directed, labeled graph
- Tags represented as edge labels
- Sets of attribute name-value pairs as node labels
- Two models: ordered and unordered

XML-QL Data Model (cont’d): 

<person SSN="111-22-3333">
  <name> Caesar </name>
  <greet> Caesar Imperator </greet>
  <addr> The Capitol </addr>
  <addr> Rome, OH 98765 </addr>
  <tel> (321) 786 2543 </tel>
  <fax> (321) 786 2543 </fax>
  <tel> (321) 786 2543 </tel>
  <email> jc@forum.rome.org </email>
</person>

XML-QL Semantics: Variable Bindings: 

[Slide content not preserved in the transcript.]

XML-QL Semantics: XML Output: 

    construct <e-contact>
                <who>$n</who>
                <where>$e</where>
              </e-contact>
[Figure: output tree. The root XML has two e-contact children, each with a who and a where child: (Caesar, jc@forum.rome.org) and (Strangelove, strangelov@love.the.bomb).]

Advanced XML-QL: 

Schema browsing. Find tags of person subelements:
    where <addrBook.person.$tag></>
          in "http://db.cis.upenn.edu/~peter/address.xml"
    construct <childOfPerson>$tag</>
Find all email addresses and fax numbers:
    where <addrBook._*.(email | fax)>$eORf</>
          in "http://db.cis.upenn.edu/~peter/address.xml"
    construct <emailOrFax>$eORf</>
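Tag variables and regular path expressions have no direct counterpart in Python's ElementTree, but their effect on a small document can be imitated by walking the tree. In this sketch the function names and the shortened sample document are assumptions, and child_tags removes duplicate tags, which is a choice of the sketch rather than something the slide specifies.

```python
import xml.etree.ElementTree as ET

DOC = """<addrBook>
  <person SSN="111-22-3333">
    <name>Caesar</name>
    <tel>(321) 786 2543</tel>
    <fax>(321) 786 2543</fax>
    <email>jc@forum.rome.org</email>
  </person>
</addrBook>"""

def child_tags(root):
    """Imitates addrBook.person.$tag: distinct child tags in first-seen order."""
    seen = []
    for person in root.findall("person"):
        for child in person:
            if child.tag not in seen:
                seen.append(child.tag)
    return seen

def email_or_fax(root):
    """Imitates addrBook._*.(email | fax): matching elements at any depth."""
    return [el.text for el in root.iter() if el.tag in ("email", "fax")]

root = ET.fromstring(DOC)
print(child_tags(root))
print(email_or_fax(root))
```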

More Advanced XML-QL: 

Schema browsing. Find attributes of person elements:
    where <_*.person $attrName=$attrVal></>
          in "http://db.cis.upenn.edu/~peter/address.xml"
    construct <personAttribute>
                <name>$attrName</>
                <value>$attrVal</>
              </>
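Binding variables to attribute names and values corresponds, loosely, to iterating over an element's attribute dictionary. A minimal sketch, with person_attributes as an assumed helper name:

```python
import xml.etree.ElementTree as ET

DOC = '<addrBook><person SSN="111-22-3333"><name>Caesar</name></person></addrBook>'

def person_attributes(root):
    """Return (name, value) pairs for the attributes of every person element."""
    pairs = []
    for person in root.iter("person"):      # any depth, like _*.person
        for attr_name, attr_val in person.attrib.items():
            pairs.append((attr_name, attr_val))
    return pairs

pairs = person_attributes(ET.fromstring(DOC))
print(pairs)
```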

XPath: 

- Reasonably widely adopted: used in XML Schema and in query languages.
- Neither more expressive nor less expressive than regular path expressions (for example, it cannot express (ab)*).
- Primary goal: to select nodes from a given document.
- Main construct: axis navigation.
- An XPath path consists of one or more navigation steps, separated by /.
- A navigation step is a triplet: axis + node-test + list of predicates.
Examples:
    /descendant::node()/child::author
    /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
XPath also offers some shortcuts:
    no axis means child
    // ≡ /descendant-or-self::node()/
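To make the "axis + node-test + predicates" triplet concrete, here is a toy splitter in Python. It is not an XPath parser: it assumes unabbreviated paths with single slashes, ignores the difference between absolute and relative paths, and would be confused by square brackets inside string literals. The function names are made up for this sketch.

```python
def split_steps(path):
    """Split a path on '/' characters that are outside square brackets."""
    steps, depth, current = [], 0, ""
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            if current:
                steps.append(current)
            current = ""
        else:
            current += ch
    if current:
        steps.append(current)
    return steps

def parse_step(step):
    """Return (axis, node_test, predicates) for one navigation step."""
    bracket = step.find("[")
    head = step if bracket < 0 else step[:bracket]
    tail = "" if bracket < 0 else step[bracket:]
    axis, sep, test = head.partition("::")
    if not sep:                       # no axis written: child is the default
        axis, test = "child", head
    preds, depth, start = [], 0, 0
    for i, ch in enumerate(tail):
        if ch == "[":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                preds.append(tail[start:i])
    return axis, test, preds

PATH = '/descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]'
for step in split_steps(PATH):
    print(parse_step(step))
```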

XPath- child axis navigation: 

author is shorthand for child::author. Examples (the numbers in parentheses presumably refer to numbered nodes in a tree diagram that is not preserved in the transcript):
    aaa      -- all the child nodes labeled aaa (1,3)
    aaa/bbb  -- all the bbb children of the aaa children (4)
    */bbb    -- all the bbb children of any child (4,6)
    .        -- the context node
    /        -- the root node

XPath- child axis navigation (cont): 

    /doc     -- all the doc children of the root
    ./aaa    -- all the aaa children of the context node (equivalent to aaa)
    text()   -- all the text children of the context node
    node()   -- all the children of the context node (includes text nodes, but not attribute nodes, which are reached only through the attribute axis)
    ..       -- parent of the context node
    .//      -- the context node and all its descendants
    //       -- the root node and all its descendants
    //para   -- all the para nodes in the document
    //text() -- all the text nodes in the document
    @font    -- the font attribute node of the context node
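Several of these abbreviated forms can be tried in Python, whose xml.etree.ElementTree supports a limited subset of XPath. The sample document below is hypothetical. text() and node() are not part of that subset, and attributes are read with get() rather than selected as nodes, so this sketch covers only the element-selecting forms.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; its root element <doc> serves as the context node.
DOC = """<doc>
  <aaa><bbb>one</bbb></aaa>
  <ccc><bbb>two</bbb><para font="serif">three</para></ccc>
  <aaa/>
</doc>"""
root = ET.fromstring(DOC)

aaa = root.findall("aaa")                    # aaa, i.e. child::aaa
aaa_bbb = root.findall("aaa/bbb")            # bbb children of aaa children
any_bbb = root.findall("*/bbb")              # bbb children of any child
paras = root.findall(".//para")              # para descendants of the context node
fonts = [p.get("font") for p in paras]       # value of @font on each para
with_font = root.findall(".//*[@font]")      # elements that carry a font attribute
parents = root.findall(".//para/..")         # .. steps up to the parent element

print(len(aaa), [b.text for b in any_bbb], fonts, [p.tag for p in parents])
```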

Predicates: 

    [2]       -- the second child node of the context node
    chapter[5] -- the fifth chapter child of the context node
    [last()]  -- the last child node of the context node
    chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all the text in descendant text nodes)
    person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value "joe"
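Positional and value predicates of this kind can be tried in Python's ElementTree, which supports tag[position], tag[last()] and tag[child='text']. The descendant test in the last example is not supported there, so the sketch filters in Python instead. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<book>
  <chapter><title>introduction</title></chapter>
  <chapter><title>data model</title></chapter>
  <chapter><title>queries</title></chapter>
  <person><name><firstname>joe</firstname></name></person>
  <person><name><firstname>ann</firstname></name></person>
</book>"""
root = ET.fromstring(DOC)

second = root.findall("chapter[2]")                     # positions are 1-based
last = root.findall("chapter[last()]")
intro = root.findall("chapter[title='introduction']")

# Stand-in for person[.//firstname = "joe"], done with a Python filter.
joes = [p for p in root.findall("person")
        if any(f.text == "joe" for f in p.iter("firstname"))]

print(second[0].findtext("title"), last[0].findtext("title"), len(intro), len(joes))
```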

Unions of Path Expressions: 

employee | consultant -- the union of the employee and consultant nodes that are children of the context node
For some reason, person/(employee|consultant), as in regular path expressions, is not allowed.
However, person/node()[boolean(employee|consultant)] is allowed! Note that this selects the children of person that themselves have an employee or consultant child; person/*[self::employee or self::consultant] selects the employee and consultant children of person directly.
From the XPath specification, the boolean function converts its argument to a boolean as follows:
- a number is true if and only if it is neither positive or negative zero nor NaN
- a node-set is true if and only if it is non-empty
- a string is true if and only if its length is non-zero
- an object of a type other than the four basic types is converted to a boolean in a way that is dependent on that type
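The conversion rules quoted from the specification translate almost line by line into code. In the Python sketch below, node-sets are modelled as plain lists, which is an assumption of the sketch. union_children imitates employee | consultant by filtering children in document order, and the address and manager tag names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

def xpath_boolean(value):
    """Mimic XPath 1.0 boolean() for booleans, numbers, strings and node-sets."""
    if isinstance(value, bool):              # checked first: bool is a subclass of int
        return value
    if isinstance(value, (int, float)):
        return not (value == 0 or math.isnan(value))   # -0.0 == 0 is True in Python
    if isinstance(value, str):
        return len(value) > 0
    if isinstance(value, list):              # a node-set, modelled as a list
        return len(value) > 0
    raise TypeError("conversion of other object types is type-dependent")

def union_children(context, *tags):
    """Children of context whose tag is one of tags, in document order."""
    return [child for child in context if child.tag in tags]

person = ET.fromstring("<person><employee/><address/><consultant/></person>")
print([c.tag for c in union_children(person, "employee", "consultant")])
print(xpath_boolean(union_children(person, "manager")))
```

Note that the non-empty string "false" converts to true under these rules, since only the length of the string matters.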

Axis navigation: 

So far, nearly all our expressions have moved us down the tree by moving to child nodes. The exceptions were:
    .   -- stay where you are
    /   -- go to the root
    //  -- all descendants of the root
    .// -- all descendants of the context node
All other expressions have been abbreviations for child::…; for example, para abbreviates child::para. child:: is an example of an axis.
XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self.
Some of these (self, parent) describe single nodes; others describe sequences of nodes.

XPath Navigation Axes (merci, Arnaud Sahuguet): 

[Figure: diagram of the XPath navigation axes relative to a context node: ancestor, descendant, following, preceding, following-sibling, preceding-sibling, child, attribute, namespace, self.]
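A few of the axes can be illustrated in Python. ElementTree elements do not keep a pointer to their parent, so the sketch first builds a child-to-parent map. The tiny document and the function names are assumptions. The functions return elements only, and preceding_siblings returns them in document order, whereas XPath treats preceding-sibling as a reverse axis.

```python
import xml.etree.ElementTree as ET

DOC = "<a><b><d/><e/></b><c><f/></c></a>"
root = ET.fromstring(DOC)

# Map every element to its parent (the root has no entry).
PARENT = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """ancestor axis, nearest ancestor first."""
    out = []
    while node in PARENT:
        node = PARENT[node]
        out.append(node)
    return out

def descendants(node):
    """descendant axis, in document order."""
    return [n for n in node.iter() if n is not node]

def following_siblings(node):
    parent = PARENT.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    parent = PARENT.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)]

d = root.find("b/d")
print([n.tag for n in ancestors(d)], [n.tag for n in following_siblings(d)])
```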

XPath abbreviated syntax: 

Abbreviation   Expansion
(nothing)      child::
@              attribute::
//             /descendant-or-self::node()/
.              self::node()
.//            self::node()/descendant-or-self::node()/
..             parent::node()
/              (document root)

Quilt: 

Proposed by Chamberlin, Robie and Florescu. Design goals (from the authors' slides):
- Leverage the most effective features of several existing and proposed query languages
- Design a small, clean, implementable language
- Cover the functionality required by all the XML Query use cases in a single language
- Write queries that fit on a slide
- Design a quilt, not a camel

Quilt = XPath + “comprehension” syntax: 

[Figure: side-by-side comparison of an XML-QL query and the corresponding Quilt query; the queries themselves are not preserved in the transcript.]

Examples of Quilt (from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt ): 

Relational data -- two DTDs:
    <?xml version="1.0" ?>
    <!DOCTYPE items [
      <!ELEMENT items (item_tuple*)>
      <!ELEMENT item_tuple (itemno, description, offered_by, start_date?, end_date?, reserve_price?)>
      <!ELEMENT itemno (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT offered_by (#PCDATA)>
      <!ELEMENT start_date (#PCDATA)>
      <!ELEMENT end_date (#PCDATA)>
      <!ELEMENT reserve_price (#PCDATA)>
    ]>

    <?xml version="1.0" ?>
    <!DOCTYPE bids [
      <!ELEMENT bids (bid_tuple*)>
      <!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>
      <!ELEMENT userid (#PCDATA)>
      <!ELEMENT itemno (#PCDATA)>
      <!ELEMENT bid (#PCDATA)>
      <!ELEMENT bid_date (#PCDATA)>
    ]>

The data: 

    <items>
      <item_tuple>
        <itemno>1001</itemno>
        <description>Red Bicycle</description>
        <offered_by>U01</offered_by>
        <start_date>1999-01-05</start_date>
        <end_date>1999-01-20</end_date>
        <reserve_price>40</reserve_price>
      </item_tuple>
      <item_tuple>
        <itemno>1002</itemno>
        <description>Motorcycle</description>
        <offered_by>U02</offered_by>
        <start_date>1999-02-11</start_date>
        <end_date>1999-03-15</end_date>
        <reserve_price>500</reserve_price>
      </item_tuple>
      …
    </items>

    <bids>
      <bid_tuple>
        <userid>U02</userid>
        <itemno>1001</itemno>
        <bid>35</bid>
        <bid_date>99-01-07</bid_date>
      </bid_tuple>
      <bid_tuple>
        <userid>U04</userid>
        <itemno>1001</itemno>
        <bid>40</bid>
        <bid_date>99-01-08</bid_date>
      </bid_tuple>
      …
    </bids>

Query Q1: 

    FUNCTION date() { "1999-02-01" }
    <result>
      (
        FOR $i IN document("items.xml")//item_tuple
        WHERE $i/start_date LEQ date()
          AND $i/end_date GEQ date()
          AND contains($i/description, "Bicycle")
        RETURN
          <item_tuple>
            $i/itemno ,
            $i/description
          </item_tuple>
        SORTBY (itemno)
      )
    </result>
Notes:
- XPath expressions (shown in orange on the original slide)
- Simple function definitions
- Dates are formatted so that lexicographic ordering gives the right result

Output from Q1: 

    <?xml version="1.0" ?>
    <result>
      <item_tuple>
        <itemno> 1003 </itemno>
        <description> Old Bicycle </description>
      </item_tuple>
      <item_tuple>
        <itemno> 1007 </itemno>
        <description> Racing Bicycle </description>
      </item_tuple>
    </result>
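The logic of Q1 (filter by date range and description, then sort by item number) can be mirrored in Python as a rough cross-check. Items 1001 and 1002 are taken from the data slide. The dates given for item 1003 are assumed for illustration, since the slides elide that part of the data. The function name q1 is also an assumption.

```python
import xml.etree.ElementTree as ET

ITEMS = """<items>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description>
    <start_date>1999-01-05</start_date><end_date>1999-01-20</end_date></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description>
    <start_date>1999-02-11</start_date><end_date>1999-03-15</end_date></item_tuple>
  <!-- Assumed dates: the slides do not show the data for item 1003. -->
  <item_tuple><itemno>1003</itemno><description>Old Bicycle</description>
    <start_date>1999-01-10</start_date><end_date>1999-02-20</end_date></item_tuple>
</items>"""

def q1(items_root, today="1999-02-01"):
    """Bicycles on offer on the given day, sorted by item number."""
    hits = []
    for item in items_root.iter("item_tuple"):
        start = item.findtext("start_date")
        end = item.findtext("end_date")
        if start is None or end is None:     # both dates are optional in the DTD
            continue
        description = item.findtext("description", "")
        # ISO-style dates compare correctly as plain strings.
        if start <= today <= end and "Bicycle" in description:
            hits.append((item.findtext("itemno"), description))
    return sorted(hits)

print(q1(ET.fromstring(ITEMS)))
```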

Query Q2: 

For all bicycles, list the item number, description, and highest bid (if any), ordered by item number.
    <result>
      (
        FOR $i IN document("items.xml")//item_tuple
        LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
        WHERE contains($i/description, "Bicycle")
        RETURN
          <item_tuple>
            $i/itemno ,
            $i/description ,
            IF ($b)
            THEN <high_bid> NumFormat("#####.##", max(-1, $b/bid)) </high_bid>
            ELSE ""
          </item_tuple>
        SORTBY (itemno)
      )
    </result>
Notes:
- Use of a variable in XPath
- Lots of coercion

Output from Q2: 

    <result>
      <item_tuple>
        <itemno> 1001 </itemno>
        <description> Red Bicycle </description>
        <high_bid> 55 </high_bid>
      </item_tuple>
      <item_tuple>
        <itemno> 1003 </itemno>
        <description> Old Bicycle </description>
        <high_bid> 20 </high_bid>
      </item_tuple>
      <item_tuple>
        <itemno> 1007 </itemno>
        <description> Racing Bicycle </description>
        <high_bid> 225 </high_bid>
      </item_tuple>
      <item_tuple>
        <itemno> 1008 </itemno>
        <description> Broken Bicycle </description>
      </item_tuple>
    </result>
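Q2 is essentially a left outer join of items with bids followed by a per-item maximum. The Python sketch below runs on the excerpt of the data shown on the slides, so item 1001 gets a highest bid of 40 rather than the 55 of the full output, which evidently draws on bids elided from the slide. Item 1008 is included without bids, matching the output slide. The function name q2 and the use of None for "no bid" are assumptions.

```python
import xml.etree.ElementTree as ET

ITEMS = """<items>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description></item_tuple>
  <item_tuple><itemno>1008</itemno><description>Broken Bicycle</description></item_tuple>
</items>"""

BIDS = """<bids>
  <bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid></bid_tuple>
  <bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid></bid_tuple>
</bids>"""

def q2(items_root, bids_root):
    """(itemno, description, highest bid or None) for every bicycle."""
    rows = []
    for item in items_root.iter("item_tuple"):
        description = item.findtext("description", "")
        if "Bicycle" not in description:
            continue
        itemno = item.findtext("itemno")
        # Plays the role of LET $b := //bid_tuple[itemno = $i/itemno]
        bids = [float(b.findtext("bid"))
                for b in bids_root.iter("bid_tuple")
                if b.findtext("itemno") == itemno]
        rows.append((itemno, description, max(bids) if bids else None))
    return sorted(rows, key=lambda row: row[0])

print(q2(ET.fromstring(ITEMS), ET.fromstring(BIDS)))
```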

Query Q3: 

Find cases where a user with a rating worse than "C" (that is, alphabetically greater than "C") offers an item with a reserve price of more than 1000.
    <result>
      (
        FOR $u IN document("users.xml")//user_tuple,
            $i IN document("items.xml")//item_tuple
        WHERE $u/rating GT 'C'
          AND $i/reserve_price GT 1000
          AND $i/offered_by = $u/userid
        RETURN
          <warning>
            <user_name>$u/name/text()</user_name>,
            <user_rating>$u/rating/text()</user_rating>,
            <item_description>$i/description/text()</item_description>,
            $i/reserve_price
          </warning>
      )
    </result>
Notes:
- The comparisons compare sets with singletons. Are the rules the same as in XPath?
- In this case the DTD guarantees uniqueness.
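The question about comparing sets with singletons refers to the fact that, in XPath 1.0, a comparison involving a node-set holds if some node in the set satisfies it. The Python sketch below models only that existential rule, with node-sets represented as lists of values. It does not model XPath 1.0's conversion of both operands to numbers for ordering operators such as >, and whether Quilt's GT follows the same rules is left open on the slide. The function name general_compare is an assumption.

```python
import operator

def general_compare(left, right, op):
    """True if SOME pair (a, b) drawn from the two operands satisfies op."""
    lefts = left if isinstance(left, list) else [left]
    rights = right if isinstance(right, list) else [right]
    return any(op(a, b) for a in lefts for b in rights)

ratings = ["A", "D"]     # a hypothetical user with two rating children
print(general_compare(ratings, "C", operator.gt))
print(general_compare(ratings, "A", operator.eq), general_compare(ratings, "A", operator.ne))
```

With a set of more than one value, both = and != can hold at once, and every comparison against an empty set is false. When the DTD guarantees exactly one rating per user, the set is a singleton and these surprises disappear.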

Conclusions: 

XML is a data format for which there is an increasing number of useful tools for:
- Constructing schemas
- Programming
- Querying
Although it is likely that a query language will soon emerge as a standard, there is less agreement or understanding on how to store XML data efficiently.
Many other database issues remain to be resolved before XML is useful for manipulating large amounts of data.