XMLDB M6 2005

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Module 6 XML and RDBMSs (Shredding, SQL / XML, Products): 

Module 6 XML and RDBMSs (Shredding, SQL / XML, Products)

Why bother?: 

Why bother? Most (structured) data stored in RDBMS Seemless integration of XML and rel. Data Exploit capabilities of RDBMS Scalability, Availability, Performance, ... XML - RDBMS Applications Legal documents, decision support on XML When not to use RDBMS Streaming data (RSS, SOAP), IR (Google) -> Fine line, often a difficult decision!

XML-RDB Overview (Kernfach): 

XML-RDB Overview (Kernfach) Store XML as a BLOB in relational database index by materializing indexed expressions in separate columns Plus: store XML in parsed and validated form Minus: proprietary solution (blob is a black box) Minus: replicate data for indexing Model-driven Shredding (Florescu, Kossmann 99) Edge, binary approach + alternatives Corresponds to generic APIs for Java Plus: very general; integrates well with relational data Minus: poor performance Schema-based Shredding (Shanmugasundaram et al. 99) Map XML Schema / DTD to SQL DDL Plus: integrates well with relational data Minus:  missing tools, complicated  SQL / XML (latest product developments 2005) Extend SQL with XML data type Plus: integrates well with relational data Minus: not clear how it integrates with application, odd „marriage“

Overview: SQL / XML: 

Overview: SQL / XML Store XML Data in Relational Databases Publish relational data as XML Query XML data in RDBMS using XQuery Literature: C. Türker: SQL:1999&SQL:2003. Dpunkt, 2003. A. Chaudhuri, A. Rashid, R. Zicari: XML Data Management. Addison Wesley, 2003. A. Eisenberg, J. Melton: Advancements in SQL /XML. ACM SIGMOD Record, 2004.

History of SQL / XML: 

History of SQL / XML First edition part of SQL:2003 Part 14 of the SQL standard Pre-dates XQuery standard!!! Limited functionality - storage and publishing Second edition: work in progress More complete integration of XQuery + XQuery Data Model Advanced Query capabilities Expected to be published in 2006

XML Type in SQL: 

XML Type in SQL A new type (like varchar, date, numeric) SQL:2003 - XML type restricted to XML document or XML element or Sequence of XML elements SQL / XML, 2nd edition Full support of XQuery Data Model XML(SEQUENCE), XML(ANY CONTENT), ... N.B. XML in different rows is incomparable

Example (SQL:2003): 

Example (SQL:2003) create table books( title varchar(20), authors XML); No schema validation, no typing!

Publishing Rel. Data as XML: 

Publishing Rel. Data as XML SQL / XML provides ways to publish relational data as XML. RDBMS looks like an XML Database XML view on relational data Enables XML Data Integration Idea: Map SQL Types to XML Schema Types Encode special characters: e.g., „<“ -> „&lt;“ Provide functions to construct XML Data (XML data can be published 1:1)

Publishing Rel. Data as XML: 

Publishing Rel. Data as XML <Phantasy-People> <row> <Id>4711</Id> <Name>Wutz</Name> </row> <row> <Id>4711</Id> <Name>Wutz</Name> </row> </Phantasy-People> Phantasy-People

XML Schema of Publishing Data: 

XML Schema of Publishing Data <xsd:complexType name=„ROW.Phantasy-People“> <xsd:sequence> <xsd:element name=„Id“ type=„xsd:integer“/> <xsd:element name=„Name“ type=„varchar20“/> </xsd:sequence> </xsd:complexType> <xsd:complexType name=„TABLE.Phantasy-People“> <xsd:sequence> <xsd:element name=„row“ type=„ROW.Ph...“/> </xsd:sequence> </xsd:complexType> <xsd:element name=„Phantasy-People“ type=„TABLE.Pha...“/> create table Phantasy-People(Id integer, Name varchar(20));

XML Schema for CHAR(20): 

XML Schema for CHAR(20) <xsd:simpleType name=„char20“> <xsd:annotation> <xsd:appinfo> <sqlxml:sqltype kind=„PREDEFINED“ name=„CHAR“ length=„20“ characterSetName=„LATIN1“ collation=„DEUTSCH“/> </xsd:appinfo> <xsd:annotation> <xsd:restriction base=„xsd:string“> <xsd:length value=„20“/> </xsd:restriction> </xsd:simpleType>

Publishing Functions: 

Publishing Functions XMLGEN Generate an XML Document using Xquery XMLELEMENT, XMLATTRIBUTES, XMLFOREST, XMLCONCAT Simplified versions of XMLGEN for typical cases XMLAGG Aggregate XML Elements into a group

Example: XML View on Rel. Data: 

Example: XML View on Rel. Data Phantasy-People SELECT XMLGEN( <Person id = „{$Id}“> {$Name} </Person>) as Person FROM Phantasy-People

Example: XML View on XML Data: 

Example: XML View on XML Data SELECT Title, XMLGEN(<pa>{$Authors[1]/text()}</pa>) as PrimA FROM MyAuthors;

XMLAGG: 

XMLAGG SELECT Product, XMLAGG( XMLELEMENT(NAME „S“, Sales)) AS AllSales FROM SalesTable GROUP BY Product; SalesTable

SQL / XML: 2nd Edition: 

SQL / XML: 2nd Edition XML datatype will support XQuery data model XML(UNTYPED CONTENT) – old XML infoset model XML(SEQUENCE) – holds heterogeneous sequences ... (other parameterized types; validated data possible! Non well-formed XML data possible, too.) Full XML Schema support and validation XMLQuery() function create XML content using XQuery XMLTable() function Shred XML to rel. Data using Xquery Mapping between SQL & XQuery data model XMLCAST between XML and SQL types

XMLExists: 

XMLExists SELECT Title FROM books WHERE XMLEXISTS(Authors, //author = „et al.“); Explicit PASSING also possible (see XMLQuery)

VALID Predicate: 

VALID Predicate Check whether XML data validates to an XML data type. XML Schemas must be registered by admin Otherwise security problems xml-value-expression IS [NOT] VALID [identity-constraint-option] [validity-target] Identity constraings: deal with ID/IDREFs Validity target: Typically, URI of Schema

XMLQuery expression: 

XMLQuery expression SQL Expression – use in select for constructing XML select XMLQuery( ‘for $i in ./PurchaseOrder where $i/PoNo = $j/val return $i//Item ‘ passing p.pocol , xmlelement(“val”,2100) as “j” returning content) from purchaseorder p <Item itemno=“21”><Quantity>200</Quantity>..</Item> <Item itemno=“22”><Quantity>22</Quantity>..</Item> Pocol maps to default item XMLElement value maps to $j

XML Query: 

XML Query XMLQuery( Xquery-expression PASSING { BY REF | BY VALUE} (value-expression AS identifier [BY REF | BY VALUE])* RETURNING { CONTENT|SEQUENCE } { BY REF|BY VALUE} If PASSING value has no identifier, then that is context node BY REF - preserves Id (of an XML type) BY VALUE - creates a copy of the data

XMLCast: 

XMLCast XMLCAST (value-expression AS type) Cast XML (or SQL) value into one of the XML types (SEQUENCE, ANY CONTENT, ...)

XMLTable construct: 

XMLTable construct Used in FROM clause: translate XML into relational data Splits up result into SQL columns, passing always BY REF select items.pos, items.itemno, items.quantity from purchaseorder p, XMLTable(‘for $i in /PurchaseOrder//Items where $i/Quantity > 200 return $i’ passing p.pocol columns pos for ordinality, itemno number path ‘ItemNo’ quantity number DEFAULT 0 path ‘Quantity’ ) items; POS ITEMNO QUANTITY ------ ----------- ------------ 1 21 21 2 22 0 Default value is used If path does not return value Ordinality returns sequential position Relational columns returned in result

XMLTable (ctd.): 

XMLTable (ctd.) Can be chained (master-detail drill down) select items.pono, items.itemno, locs.city from purchaseorder p, XMLTable(‘for $i in /PurchaseOrder//Items where $i/Quantity > 200 return $i’ passing p.pocol columns pono number path ‘../PurchaseOrder/@PoNo’ itemno number path ‘ItemNo’ locs xmltype path ‘Locations’ ) items, XMLTable(‘for $i in /Location where $i/State = “CA” return $i’ passing items.locs columns city varchar2(20) path ‘City’ ) locs; PONO ITEMNO CITY ------- ----------- ------------ 100 21 Redwood City 100 21 Foster City 100 22 Berkeley 101 30 Los Angeles

Microsoft SQL Server 2005: 

Microsoft SQL Server 2005 XML Parser XML Validation XML datatype (binary XML) Schema Collection XML Relational XML Schemata OpenXML/nodes() FOR XML with TYPE directive Rowsets query() modify() Node Table PATH Index PROP Index VALUE Index PRIMARY XML INDEX query()

Microsoft: Indexing: 

Microsoft: Indexing Create XML index on XML column CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc) Creates secondary indexes on tags, values, paths Speeds up queries Results can be served directly from index Entire query is optimized Same award winning cost based optimizer Indexes are used as available

Microsoft: Updates: 

Microsoft: Updates Insert, update, & delete a la XUpdate E.g.: Add a new section after Section 1 UPDATE docs SET xDoc.modify( 'insert <section num=''2''> <heading>Background</heading> </section> after (/doc/section[@num=1])[1]')

XQuery Support in Oracle: 

XQuery Support in Oracle XMLDB integrated database engine SQL / XML standard support Optimized queries – rewrite to relational Standalone Java query engine 100% Java Integrated into Oracle App Server -XDS Interoperates with XSLT/XPath First relational database to ship an XQuery implementation !

XQuery database support: 

XQuery database support Production in Oracle Database 10gr2 Supports XMLQuery and XMLTable construct Native compilation into SQL /XML structures Returns XMLType(Content) Can query over relational, O-R, XMLType data fn:doc - Maps to XDB Repository on server SQLPlus provides xquery command to execute XQuery XSL-T will also get compiled to XQuery

Architecture: 

Architecture XQuery XSL-T Parser Compiler Rewrite to SQLX Normalization XQuery Type check SQLX rewrite Execution engine XQueryX Compiled XQuery Tree SQL Metadata XMLSchema Repository Statically Type checked Tree Normalized Tree (casts, treat ) SQL/XML Operand Tree Relational Optimizer SQL Operand Tree XML Indexes, Text Indexes XQuery F&O Execution Structures S Q L XQUERY

Example SQL integration: 

Example SQL integration SQL tables (get all 200K+ employees) select XMLQuery(‘ for $i in ora:view(“SCOTT”,”EMP”)/ROW where $i/SALARY > 200000 return $i/EMPNO’ returning content) from dual; <EMPNO>2100</EMPNO> <EMPNO>344</EMPNO> rich employee

Example: XMLType Integration: 

Example: XMLType Integration XMLType tables as input select p.XQuery(‘<PO pono=“{./PoNo}”/>’) from purchaseorder p where p.XQuery(‘@ShipAddr/City=“Fresno”]‘) is not null; After rewrite select XMLElement(“PO”, XMLAttributes(p.xmldata.”PONo”)) from purchaseorder p where p.xmldata.”ShipAddr”.”City” = ‘Fresno’

Example: Repository Integration: 

Example: Repository Integration Query files from repository doc() queries files from repository (rewrite) collection() maps to directories select XQuery(‘ for $i in doc(“/public/foo.xml”) return $i’) from dual; <FOO> ....</FOO>

Java Layer Architecture: 

DB Adapter Java Execution engine Java Layer Architecture XQJ API Dr i ver XQuery XQueryX XPath XQJ API (Java) conn = datasrc.getConnection(); expr = conn.prepareExpression( ‘for $i in doc(“xxx”) return $i’); res = expr.executeQuery(); while (res.next()) { … } Push down query select * from XMLTable(‘for $i in doc(“xx”) return $i); User XMLDB SQL + XQuery or XQueryX XQuery aware SQL engine Rewrite to SQLX Normalization XQuery Type check SQL Compiler XMLQuery, XMLTable SQL/XML Rel optimizer, Execution engine Compiler XQuery Parser SQLX rewrite Query Pushdown Normalization Type check Compiler Parser XDS data sources, files Push down XQuery Java Engine

XQuery Java implementation: 

XQuery Java implementation XQuery or XQueryX input Extensible function implementation Compiles into rowsource like structures Optimization – push XQuery to XMLDB XQJ API driver – for accessing mid tier/backend Shared data model with XSL/XPath Shared F&O – pre-defined & external Standard Function implementation interfaces Write Java func once – use it in XQuery/XSLT

IBM DB2: 

IBM DB2 Hybrid SQL/XQuery Compiler XML Navigation Query Evaluation Run-time SQL/XML Parser XQuery Parser Table Storage XML Indexes XML Storage Native XML storage High-performance XPath processing Unified internal representation XML node-level, text indexes SQL extended for XML Dynamic function dispatch XML data type in SQL XQuery top-level parser Languages Compose See System RX in Session 10

Design Point: 

Design Point XML Parsing is slow Thus: Native storage of the XQuery Data Model (QDM) Direct access to nodes From node to node From indexes

The XML Schema repository: 

The XML Schema repository register xmlschema ‘http://ibm.com/xsr/recipe-v1.xsd’ from ‘recipe-v1.xsd’ as recipe1 complete Stable, fast access to schemas stored in the database Schemas in many documents Recommended, but not required: Schemas have unique URIs for each schema version Reuse target namespace, except for drastic changes Documents specify the schema URI in xsi:schemaLocation schema URI local file DB2 id only one doc

A Recipe in XML (v1): 

A Recipe in XML (v1) <Recipe xmlns=“http://ibm.com/xsr/recipe” xmlns:xsi=“http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=“http://ibm.com/xsr/recipe http://ibm.com/xsr/recipe-v1.xsd” TimeToPrepare="5" CookMethod="Grill" Difficulty="Easy" Serves="1" Category="Entrees"> <Title>Veggie Dog with Onions</Title> <Ingredients> <Ingredient Name=“veggie sausage" Amount="1" /> <Ingredient Name=“hot dog bun" Amount="1" /> <Ingredient Name=“sliced onion" Amount="1" Unit=“ounce" /> <Ingredient Name=“mustard" Amount=“1" Unit="teaspoon" /> <Ingredient Name=“relish" Amount="2" Unit="teaspoon" /> </Ingredients> … <Comment>Some like spicy mustard, or ketchup instead</Comment> </Recipe>

A Recipe in XML (v2): 

A Recipe in XML (v2) <Recipe xmlns=“http://ibm.com/xsr/recipe” xmlns:xsi=“http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=“http://ibm.com/xsr/recipe http://ibm.com/xsr/recipe-v2.xsd” TimeToPrepare="5" CookMethod="Grill" Difficulty="Easy" Serves="1" Category="Entrees"> <Title>Veggie Dog with Onions</Title> <Ingredients> <Ingredient Name=“veggie sausage" Amount="1" /> … </Ingredients> <Nutrition> <Calories>300</Calories> <FatGrams>18.5</FatGrams> <CarboGrams>12</CarboGrams> <ProteinGrams>9.5</ProteinGrams> </Nutrition> </Recipe> Schema requires ‘Nutrition’ Incompatible schema change Very few applications affected Changing namespace or table means all applications affected

A Recipe in XML (v3): 

A Recipe in XML (v3) <Recipe xmlns=“http://ibm.com/xsr/recipe” xmlns:xsi=“http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=“http://ibm.com/xsr/recipe http://ibm.com/xsr/recipe-v3.xsd” TimeToPrepare=“5” CookMethod=“Grill” Difficulty=“Easy” Serves=“1” Category=“Entrees”> <Title>Veggie Dog with Onions</Title> <Ingredients> <Ingredient Name=“veggie sausage” > <Amount Unit=‘item’>1</Amount> </Ingredient> <Ingredient Name=“sliced onion” /> <Amount Unit=“ounce">1</Amount> <Amount Unit=“grams">30</Amount> </Ingredient> <Ingredient Name=“mustard” /> <Amount Unit=“teaspoon">1</Amount> <Amount Unit=“ml">5</Amount> … Internationalization: Amount in multiple units Incompatible schema change Applications using Amount affected, but not many others. Changing namespace or table means all applications affected

Insert rows with XML: 

Insert rows with XML insert into recipes (id,recipe) values (1, xmlParse(document ‘<?xml …’)) Schema derived from the document itself Schema (or not) per document xmlValidate() accepts computed XML too Including xmlValidate( xmlParse( … )) Parse XML from varchar, clob, or blob into QDM without validation insert into recipes (id,recipe) values (2, xmlValidate(?)) Validate XML from a parameter marker using xsi:schemaLocation

Design Point: 

Design Point Queries need to work with conflicting schemas Thus: Document-level validation No strict static typing

There and back in SQL /XML: 

There and back in SQL /XML xmlElement Create an element from relational data xmlForest Create elements from relational data xmlAttributes Create attributes from relational data xmlNamespaces Declare namespaces xmlSerialize Convert XML data to textual form xml2Clob Convert XML data to textual form xmlCast Convert XML to/from relational data

Manipulating XML in SQL /XML: 

Manipulating XML in SQL /XML xmlQuery Invoke XQuery to produce a sequence Takes SQL arguments, including XML Returns an XML sequence xmlTable Invoke XQuery to produce a table Takes SQL+XML, returns SQL+XML xmlExists Test XQuery result is nonempty Takes SQL+XML, returns Boolean xmlAgg Aggregate XML into a sequence xmlConcat Concatenate an XML sequence

xmlExists example: 

xmlExists example Return recipes that are both easy and tasty Both XML and relational predicates select recipe from recipes where xmlExists(‘ $r[Recipe/@Difficulty = “easy”] ’ passing by ref recipe as “r”) and rating > 3

xmlQuery example: 

xmlQuery example Return the title of each recipe select xmlQuery(‘ $r/Recipe/Title ’ passing by ref recipe as “r” returning sequence) from recipes

xmlTable example: 

xmlTable example Return the title, servings, and ingredients of each easy recipe select R.id, X.title, X.howMany, X.stuff from recipes as R, xmlTable(‘$r/Recipe[@Difficulty = “easy”]’ passing R.recipe as “r” columns title varchar(50) path ‘./Title’, howMany int path ‘./@Serves’, stuff xml by ref path ‘./Ingredients/Ingredient’ returning sequence) as X

XML data type is a XQuery Data Model sequence: 

XML data type is a XQuery Data Model sequence DB2 always uses XML(sequence) vs. XML(content), … pass by ref vs. by value returning sequence vs. content Other flavors cause construction and copying Construction has side-effects: Node identity changed Simple values smashed together and lose types … Interferes with query optimization

Returning sequences: 

Returning sequences Return a list of ingredients select id, xmlQuery(‘ $r/ Recipe/ Ingredients/ Ingredient/ @Name/ data(.) ’ passing by ref recipe as “r” returning sequence) as stuff from recipes

Design Point: 

Design Point Cross-language optimization is necessary Thus: Single unified model Avoid copying as much as possible

XQuery join example: 

XQuery join example Return the recipes that the family likes xquery for $r in db2-fn:xmlcolumn(“RECIPES.RECIPE)/ Recipe for $yummyOne in db2-fn:sqlquery( “select xmlForest(title) from familyLikes”) where $r/ Title = $yummyOne return $r

Same join in SQL /XML: 

Same join in SQL /XML Return the recipes that the family likes select R.recipe from familyLikes as L, recipies as R where xmlExists(‘ $r/ Recipe[ Title = $yummyOne ] ’ passing by ref R.recipe as “r”, L.title as “yummyOne”)

Design Point: 

Design Point SQL is better than XQuery XQuery is better than SQL Thus: Support both languages Support language composition Give XQuery access to database data

XML Indexes and query matching: 

XML Indexes and query matching Indexing every node in a document is very expensive Slow insertion time Large space, log requirements Fast queries Index value of nodes returned by a simple XQuery ( (‘/’ | ‘//’) (axis)? (name-test | kind-test) )+ Given a query Q and an index I Determine if the index is applicable Build index pushdown QI : Q = QC (QI)

Shredding XML into tables: 

Shredding XML into tables Some applications require relational data Annotated schemas Can break document into XML fragments <xsd:element name=“phone” type=“xsd:string” sql:relation=“employee_tab” sql:field=“phone_col” />