09structuralrecursion

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Managing XML and Semistructured Data: 

Managing XML and Semistructured Data Lecture 9: Query Languages - StruQL and XSL Prof. Dan Suciu Spring 2001

In this lecture: 

In this lecture Website management with Strudel Background on skolem functions Skolem functions in StruQL Structural recursion XSL Resources Catching the boat with Strudel VLDBJ 2001 UnQL: A Query Language and Algebra for Semistructured Data Based on Structural Recursion Buneman, Fernandez, Suciu.VLDBJ 2000 Data on the Web Abiteboul, Buneman, Suciu : sections 5.2, 6.4, 6.5

Strudel and StruQL: 

Strudel and StruQL Strudel = a Website management tool Idea: separate the following three tasks Management of data use some database Management of the site’s structure use StruQL Management of the site’s presentation use HTML templates (this was before XML...)

Example: Bibliography Data: 

Example: Bibliography Data {Bib: { paper: { author: “Jones”, author: “Smith”, title: “The Comma”, year: 1994 }, paper: { author: “Jones”, title: “The Dot”, year: 1998 }, paper: { author: “Mark”, .... } . . . } } Input data: Bib paper paper paper author author title year “Jones” “Smith” “The Comma” .....

Simple Website Definition in StruQL: 

Simple Website Definition in StruQL WHERE Root -> “Bib.paper.author” -> A CREATE Root(), HomePage(A) LINK Root() -> “person” -> HomePage(A), HomePage(A) -> “name” -> A HomePage(A) -> “home” -> Root() StruQL query: Result: Root(), HomePage(A) = Skolem Functions (more later) “Smith” “Jones” “Mark” name name name home home home

Complex Website Definition in StruQL: 

Complex Website Definition in StruQL WHERE Root -> “Bib” -> X, X -> “paper” -> P, P -> “author” -> A, P -> “title” -> T, P -> “year” -> Y CREATE Root(), HomePage(A), YearPage(A,Y), PubPage(P) LINK Root() -> “person” -> HomePage(A), HomePage(A) -> “yearentry” -> YearPage(A,Y), YearPage(A,Y) -> “publication” -> PubPage(P), PubPage(P) -> “author” -> HomePage(A), PubPage(P) -> “title” -> T

Example: A Complex Web Site: 

Example: A Complex Web Site “The Comma” “The Dot”

Skolem Functions: 

Skolem Functions Maier, 1986 in OO systems Kifer et al, 1989 F-logic Hull and Yoshikawa, 1990 deductive db (ILOG) Papakonstantinou et al., 1996 semistructured db (MSL)

Skolem Functions in Logic: 

Skolem Functions in Logic Origins: First Order Logic The Satisfiability problem given a formula , does it have a model ?

Skolem Functions in Logic: 

Skolem Functions in Logic Example: does  have a model ? Skolem functions: replace  with functions, drop  Fact:  has a model iff ’ “has a model”

Skolem Functions in Databases: 

Skolem Functions in Databases Recall Datalog: Means:

Skolem Functions in Databases: 

Skolem Functions in Databases Now consider: I want to “create a new object x”. What meaning ?

Skolem Functions in Databases: 

Skolem Functions in Databases Better: use Skolem functions directly in Datalog Choices:

Skolem Functions in StruQL: 

Skolem Functions in StruQL StruQL’s semantics: Input graph: (Node, Edge) Output graph:(Node’, Edge’) Example: WHERE Root -> “Bib.paper.author” -> A CREATE Root(), HomePage(A) LINK Root() -> “person” -> HomePage(A), HomePage(A) -> “name” -> A HomePage(A) -> “home” -> Root() Node’(Root()) :- Node’(HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y),Edge(Y,author,A) Edge’(Root,person,HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y),Edge(Y,author,A) Edge’(HomePage(A),person, A) :- Edge(Root,Bib,X), Edge(X,paper,Y),Edge(Y,author,A) Edge’(HomePage(A),home,Root()) :- Edge(Root,Bib,X), Edge(X,paper,Y),Edge(Y,author,A)

A Different Paradigm: Structural Recursion: 

A Different Paradigm: Structural Recursion Data as sets with a union operator: {a:3, a:{b:”one”, c:5}, b:4} = {a:3} U {a:{b:”one”,c:5}} U {b:4}

Structural Recursion: 

Structural Recursion f($T1 U $T2) = f($T1) U f($T2) f({$L: $T}) = f($T) f({}) = {} f($V) = if isInt($V) then {result: $V} else {} Example: retrieve all integers in the data

Structural Recursion: 

Structural Recursion What does this do ? f($T1 U $T2) = f($T1) U f($T2) f({$L: $T}) = if $L=a then {b:f($T)} else {$L:f($T)} f({}) = {} f($V) = $V Returns the same tree with a-edges replaced by b-edges

Structural Recursion: 

Structural Recursion What does this do ? f($T1 U $T2) = f($T1) U f($T2) f({$L: $T}) = {$L:{$L:f($T)}} f({}) = {} f($V) = $V Input = tree with n nodes Output = tree with 2n nodes (every edge is doubled)

Structural Recursion: 

Structural Recursion Example: increase all engine prices by 10%

Structural Recursion: 

Structural Recursion Retrieve all subtrees reachable by (a.b)*.a a b a

Structural Recursion: General Form: 

Structural Recursion: General Form f1($T1 U $T2) = f1($T1) U f1($T2) f1({$L: $T}) = E1($L, f1($T),...,fk($T), $T) f1({}) = { } f1($V) = { } fk($T1 U $T2) = fk($T1) U fk($T2) fk({$L: $T}) = Ek($L, f1($T),...,fk($T), $T) fk({}) = { } fk($V) = { } . . . . Each of E1, ..., Ek consists only of {_ : _}, U, if_then_else_

Evaluating Structural Recursion: 

Evaluating Structural Recursion Recursive Evaluation: Compute the functions recursively, starting with f1 at the root Termination is guaranteed. How efficiently can we evaluate this ?

Structural Recursion: 

Structural Recursion Consider this: f($T1 U $T2) = f($T1) U f($T2) f({$L: $T}) = {$L:f($T)}, $L:f($T)} f({}) = {} f($V) = $V

Naive Recursive Evaluation: 

Naive Recursive Evaluation a b c d a a b b b b c c c c c c c c Input tree = n nodes Output tree = 2n+1 – 1 nodes

Efficient Recursive Evaluation: 

Efficient Recursive Evaluation Recursive Evaluation with function memorization. PTIME complexity. f($T1 U $T2) = f($T1) U f($T2) f({$L: $T}) = {$L:f($T)}, $L:f($T)} f({}) = {} f($V) = $V Alternatively: apply the function in parallel to each input edge  Bulk Evaluation

Bulk Evaluation: 

Bulk Evaluation Sometimes f doesn’t return anything  use  edges f($T1 U $T2) = f($T1) U f($T2) f({$L: $T}) = if $L=c then $T else f($T) f({}) = {} f($V) = $V

Epsilon Edges: 

Epsilon Edges Meaning of  edges:  = a b c d a b c d d c

Epsilon Edges: 

Epsilon Edges Note: union becomes easy to draw with  edges: Example: a b c d  e U = a b c d e  = a b c d e T1 T2 U = T1 T2  

Bulk Evaluation: 

Bulk Evaluation Idea: “apply” E1, ..., Ek independently on each edge, then connect with  edges  PTIME

Bulk Evaluation: 

Bulk Evaluation Recall (a.b)*.a: a b a a a b b a c b a a a b b a c b b a c a d d d d b c

Structural Recursion: 

Structural Recursion Can evaluate in two ways: Recursively: memorize functions’ results Bulk: apply all functions on all edges, in parallel, connect, eliminate what is useless Complexity: PTIME More precisely: NLOGSPACE Works on graphs with cycles too !

XSL: 

XSL XSLT 1.0 (a recommendation) http://www.w3.org/TR/xslt.html XSLT 1.1: (a working draft) http://www.w3.org/TR/xslt11/ In commercial products (e.g. IE5.0)

XSL: 

XSL Purpose: stylesheet specification language: stylesheet: XML -> HTML in general: XML -> XML Uses XPath

XSL Program: 

XSL Program XSL program = template-rule ... template-rule template-rule = match pattern + template <xsl:template match = “* | /”> <xsl:apply-templates/> </xsl:template> <xsl:template match = “/bib/*/title”> <result> <xsl:value-of select = “.” /> </result> </xsl:template> Example: Retrieve all book titles:

Simple XSL Program: 

Simple XSL Program Copy the input: <xsl:template match = “/”> <xsl:apply-templates/> </xsl:template> <xsl:template match = “text()”> <xsl:value-of select=“.”/> </xsl:template> <xsl:template match = “*”> <xsl:element name=“name(.)”> <xsl:apply-templates/> </xsl:element> </xsl:template>

Flow Control in XSL: 

Flow Control in XSL <xsl:template match = “* | /”> <xsl:apply-templates/> </xsl:template> <xsl:template match=“a”> <A><xsl:apply-templates/></A> </xsl:template> <xsl:template match=“b”> <B><xsl:apply-templates/></B> </xsl:template> <xsl:template match=“c”> <C><xsl:value-of/></C> </xsl:template>

Slide37: 

<a> <e> <b> <c> 1 </c> <c> 2 </c> </b> <a> <c> 3 </c> </a> </e> <c> 4 </c> </a> <A> <B> <C> 1 </C> <C> 2 </C> </B> <A> <C> 3 </C> </A> <C> 4 </C> </A>

XSL is Structural Recursion: 

XSL is Structural Recursion Equivalent to: f(T1 U T2) = f(T1) U f(T2) f({L: T}) = if L= c then {C: t} else L= b then {B: f(t)} else L= a then {A: f(t)} else f(t) f({}) = {} f(V) = V XSL query = single function XSL query with modes = multiple function (next)  <xsl:template match = “* | /”>  <xsl:template match=“c”>  <xsl:template match=“b”>  <xsl:template match=“a”>

Modes in XSLT: 

Modes in XSLT <xsl:template match = “/”> <xsl:apply-templates mode=“f”/> </xsl:template> <xsl:template match=“*” mode=“f”/> <xsl:template match=“a” mode=“f”> <result> <xsl:copy-of match=“.”/> </result> <xsl:apply-templates mode=“g”/> </xsl:template> <xsl:template match=“*” mode=“g”> <xsl:template match=“b” mode=“g”> <xsl:apply-templates mode=“f”/> </xsl:template> f(T1 U T2) = f(T1) U f(T2) f({a: T}) = {result:T} U g(T) f({}) = {} f(V) = V g(T1 U T2) = g(T1) U g(T2) g({b: T}) = f(T) g({}) = {} g(V) = V <xsl:copy-of ... > copies the input to the output Compute the path (a.b)* : ignoring modes, this computes (a|b)*

Modes in XSLT: 

Modes in XSLT Mode = a name for a group of template rules No mode = empty mode Same as having multiple recursive functions

Conflict Resolution for Template Rules: 

Conflict Resolution for Template Rules If several template rules match, choose that with highest “priority”. Explicit priority: <xsl:template match=“abc” priority=“1.41”> Computing implicit priority: ad-hoc rules given by the W3C, based on match match=“P1 | P2 | ...”  transform to a set of template rules. match=“abc”  the priority is 0. match=“[... some namespace name... ]”  the priority is -0.25. match=“node()”  the priority is -0.5. Otherwise, the priority is 0.5 It is an error if this leaves more than one matching template rule.

Built-in Template Rules: 

Built-in Template Rules Keeps us going: <xsl:template match = “* | /”> <xsl:apply-templates/> </xsl:template> there is one such rule for each mode Copies what we forgot: <xsl:template match = “text()|@*”><xsl:value-of select=“.”/> </xsl:template> there is only one rule, for the empty mode Lowest priorities among all rules; hence, can be easily overridden

XSL Template: 

XSL Template <xsl:template match = “expression” mode = “name” priority = “number” name = “name” > Body </xsl:template> Default: mode = “” priority = (computed as explained earlier) name = when no match, no mode Body = XML constructors: <myTag>...</myTag> <b> ... </b> ... XSL instructions: <xsl:apply-templates> (= recursive call) <xsl:value-of> (= copy the value) <xsl:copy> (= shallow copy) <xsl:copy-of> (=deep copy) <xsl:element> (= more flexible than XML constructors) <xsl:attribute> (= add an attribute to the element) <xsl:if> (= conditional) <xsl:for-each> Instructions for variables

XSL Apply Templates: 

XSL Apply Templates <xsl:apply-templates select = “expression” mode = “name” > Body </xsl: apply-templates> Default select = “*” (children) mode = “” (empty mode) Body: “Sort” instructions “Paramemter” instructions

XSL Variables: 

XSL Variables Declaring a variable <xsl:variable name = “vname” select = “value”> value </xsl:variable> Value = either in select, or in body Either in <xsl:template> ... </xsl:template> or at top level Declaring a parameter: <xsl:param select = “value”> value </xsl:param> In <xsl:template> ... </xsl:template>, at the beginning Passing a paramemter <xsl:with-param select = “value”> value </xsl:param> In <xsl:apply-templates> ... </xsl:apply-templates > Using variables: {$vname}

XSL and Structural Recursion: 

XSL and Structural Recursion XSL: mainly on trees may loop Structural Recursion: arbitrary graphs always terminates <xsl:template match = “e”> <xsl:apply-patterns select=“/”/> </xsl:template> stack overflow on IE 5.0 add the following rule: