Semantics for Valid XML Documents
Harold Boley
Dagstuhl Seminar 01021: Semantics in Databases
Jan. 7-12, 2001
Overview
Introduction of valid XML documents to establish a ‘grammar-typed’ syntax for Web data
Survey of some practical (Web) aspects of three semantics:
Transformational semantics (incl. proof-theoretic ~)
Model-theoretic semantics
Metadata semantics
Study of (Web) applicability and combination of the three semantics
Systems using or implementing such semantics
Running example: address-document processing
Address Example: HTML to XML
HTML Markup:
  Xaver M. Linde
  Wikingerufer 7
  10555 Berlin
XML Markup:
  <address>
    <name>Xaver M. Linde</name>
    <street>Wikingerufer 7</street>
    <town>10555 Berlin</town>
  </address>
XML tags are chosen for content-structuring needs, while not conveying any formal semantics.
Address Example: XML to XML
XML Markup 1:
  Xaver M. Linde
  Wikingerufer 7
  10555 Berlin
XML Markup 2:
  Xaver M. Linde
  Wikingerufer 7
  10555 Berlin
XML stylesheets are usable to transform XML elements, e.g., for data interoperation.
Address Example: XML to XML
XML Markup 1:
  <address>
    <name>Xaver M. Linde</name>
    <street>Wikingerufer 7</street>
    <town>10555 Berlin</town>
  </address>
XML Markup 2:
  <address>
    <name>Xaver M. Linde</name>
    <place>
      <street>Wikingerufer 7</street>
      <town>10555 Berlin</town>
    </place>
  </address>
XML stylesheets are usable to transform XML elements, e.g., for a kind of normalization.
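As a rough illustration of such a normalization, the following Python sketch uses the standard library's ElementTree in place of an actual XSLT stylesheet. It assumes the nested target shape in which street and town are grouped under a place element, as in the nested address DTD and fact base used later in this talk; the function name normalize is hypothetical.

```python
import xml.etree.ElementTree as ET

FLAT = """<address>
  <name>Xaver M. Linde</name>
  <street>Wikingerufer 7</street>
  <town>10555 Berlin</town>
</address>"""


def normalize(address: ET.Element) -> ET.Element:
    """Build a nested copy of a flat address: street and town move under place."""
    nested = ET.Element("address")
    ET.SubElement(nested, "name").text = address.findtext("name")
    place = ET.SubElement(nested, "place")
    for tag in ("street", "town"):
        ET.SubElement(place, tag).text = address.findtext(tag)
    return nested


result = normalize(ET.fromstring(FLAT))
print(ET.tostring(result, encoding="unicode"))
```

Printing the result gives the compact nested form `<address><name>Xaver M. Linde</name><place><street>Wikingerufer 7</street><town>10555 Berlin</town></place></address>`. An XSLT stylesheet would state the same mapping declaratively as a template.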
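Address Example: XML Queries
XML Markup:
  <address>
    <name>Xaver M. Linde</name>
    <street>Wikingerufer 7</street>
    <town>10555 Berlin</town>
  </address>
XML Query (XML-QL):
  WHERE <address>
          <name>Xaver M. Linde</name>
          <street>$s</street>
          <town>$t</town>
        </address>
  CONSTRUCT $s $t
XML queries can select subelements of XML elements; here the street and town subelements of the address element are bound to $s and $t:
  Wikingerufer 7
  10555 Berlin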
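XML-QL itself is not available in Python's standard library, so the sketch below only emulates what the WHERE pattern does: find address elements whose name is "Xaver M. Linde" and bind their street and town contents to $s and $t. The enclosing `<addresses>` root and the function name where_construct are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<addresses>
  <address>
    <name>Xaver M. Linde</name>
    <street>Wikingerufer 7</street>
    <town>10555 Berlin</town>
  </address>
</addresses>"""


def where_construct(root: ET.Element, name: str) -> list[tuple[str, str]]:
    """Emulate WHERE <address><name>name</name><street>$s</street><town>$t</town></address>
    CONSTRUCT $s $t by returning one ($s, $t) pair per matching address element."""
    results = []
    for addr in root.iter("address"):
        s, t = addr.findtext("street"), addr.findtext("town")
        if addr.findtext("name") == name and s is not None and t is not None:
            results.append((s, t))
    return results


print(where_construct(ET.fromstring(DOC), "Xaver M. Linde"))
```

The call prints `[('Wikingerufer 7', '10555 Berlin')]`; a name with no matching address yields an empty list, just as an XML-QL pattern without matches constructs nothing.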
Address Example: Prolog Queries
Prolog Term:
  address(
    name("Xaver M. Linde"),
    street("Wikingerufer 7"),
    town("10555 Berlin")
  )
Prolog Query:
  address(
    name("Xaver M. Linde"),
    street(S),
    town(T)
  )
Prolog queries can select substructures of Prolog structures; here the query binds:
  S = "Wikingerufer 7"
  T = "10555 Berlin"
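The Prolog query works by matching the query structure against the term and binding its variables. The Python sketch below implements one-way matching, the special case of Prolog's unification that suffices when the term is ground (variable-free). Terms are modeled as nested tuples; the Var class and the match function are hypothetical helpers, not part of any Prolog system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logic variable such as S or T."""
    name: str


def match(pattern, term, bindings):
    """Match pattern against a ground term; return the extended bindings or None."""
    if isinstance(pattern, Var):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


S, T = Var("S"), Var("T")
term = ("address", ("name", "Xaver M. Linde"),
        ("street", "Wikingerufer 7"), ("town", "10555 Berlin"))
query = ("address", ("name", "Xaver M. Linde"), ("street", S), ("town", T))
answer = match(query, term, {})
print({v.name: value for v, value in answer.items()})
```

This prints `{'S': 'Wikingerufer 7', 'T': '10555 Berlin'}`, the same bindings the Prolog query yields. Full unification would also allow variables inside the term.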
Address Example: The Element Tree
Node-labeled, (left-to-right) ordered element tree: (figure: the address element as a tree whose subtrees are its name, street, and town subelements)
Address Example: Document Type Definition and Tree (1)
Document Type Definition (DTD):
  <!ELEMENT address (name, street, town)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT town (#PCDATA)>
Extended Backus-Naur Form (EBNF):
  address ::= name street town
  name ::= PCDATA
  street ::= PCDATA
  town ::= PCDATA
Document Type Tree: address, with the ordered children name, street, and town, each over a PCDATA leaf
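Address Example: Document Type Definition and Tree (2)
Document Type Definition (DTD):
  <!ELEMENT address (name, place)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT place (street, town)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT town (#PCDATA)>
Document Type Tree: address, with the ordered children name and place; place, with the ordered children street and town; name, street, and town each over a PCDATA leaf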
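Checking a document against such a DTD amounts to checking each element's child sequence against its grammar rule. The Python sketch below does this for the two address grammars, a flat one (address ::= name street town) and a nested one with place grouping street and town. It is a hypothetical, simplified checker: it handles only fixed sequences and #PCDATA, not the `?`, `*`, `+`, or `|` operators of real DTD content models.

```python
import xml.etree.ElementTree as ET

# Each element maps either to the exact sequence of its child element names
# or to "#PCDATA" (text only).
DTD_FLAT = {"address": ["name", "street", "town"],
            "name": "#PCDATA", "street": "#PCDATA", "town": "#PCDATA"}
DTD_NESTED = {"address": ["name", "place"], "place": ["street", "town"],
              "name": "#PCDATA", "street": "#PCDATA", "town": "#PCDATA"}


def is_valid(elem: ET.Element, dtd: dict) -> bool:
    """Recursively check elem against sequence-only content models."""
    model = dtd.get(elem.tag)
    if model is None:
        return False  # undeclared element
    children = list(elem)
    if model == "#PCDATA":
        return not children  # text only, no child elements
    # Element-only content: no non-whitespace text around the children.
    if (elem.text or "").strip() or any((c.tail or "").strip() for c in children):
        return False
    if [c.tag for c in children] != model:
        return False  # children must appear exactly in the declared order
    return all(is_valid(c, dtd) for c in children)


flat = ET.fromstring("<address><name>Xaver M. Linde</name>"
                     "<street>Wikingerufer 7</street><town>10555 Berlin</town></address>")
print(is_valid(flat, DTD_FLAT), is_valid(flat, DTD_NESTED))  # True False
```

The same flat document is valid for one grammar and invalid for the other, which is exactly the sense in which validity is relative to a DTD.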
Well-Formedness and Validity
XML principles for a document being well-formed:
Open and close all tags
Empty tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entities
Only the five predefined entity references are used
XML principle for a document being valid with respect to a DTD:
Matches the type-like constraints listed in the DTD (or, can be generated from the DTD as a linearized CF-grammar derivation tree)
Checked by validators such as http://www.stg.brown.edu/service/xmlvalid/
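Well-formedness, unlike validity, needs no DTD: any conforming XML parser rejects documents that break the principles above. The Python sketch below uses the standard library parser as a well-formedness checker on a few hypothetical samples. Note that ElementTree does not validate against a DTD, so validity needs a separate validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the XML parser accepts text, i.e., the document is well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "ok":          "<address><name>Xaver M. Linde</name></address>",
    "overlapping": "<address><name>Xaver</address></name>",
    "two roots":   "<name>Xaver</name><town>Berlin</town>",
    "unquoted":    "<address id=1/>",
    "bare &":      "<name>R&D</name>",
}
for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample is accepted; the others violate, in turn, the rules on overlapping elements, a unique root, quoted attribute values, and the use of `&`.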
Practical Semantics Need: Web(-Page) Transformations, Models, and Metadata
Up to now: XML with Document Type Definitions (DTDs) or XML Schemas as the syntactic basis
Practical need for Web semantics:
1) Getting meaning from XML Web pages through translation results
2) Modeling formal XML elements by constructing their extensions (finite or infinite sets)
3) Annotating arbitrary Web objects in RDF/XML for semantic retrieval
Practical Semantics Techniques: Web Transformations, Models, and Metadata
Corresponding semantic techniques:
1) Transformational semantics translates XML into other XML or HTML documents via XSLT stylesheets (e.g., using the Cocoon engine)
2) Model-theoretic semantics explicates rule consequences by generating Herbrand models for XML knowledge bases of relations and functions
3) Metadata semantics in XML-based RDF (Resource Description Framework) and RDF Schema enables high-precision search engines for Berners-Lee’s "Semantic Web"
Address Document: Transformational Semantics via an XSLT Stylesheet
Input (flat addresses):
  Me2XML, 96 Hyper Road, Boston
  RDF4All, 2001 Broadway, New York
  XML4You, 96 Hyper Road, Boston
Output (nested addresses):
  Me2XML, 96 Hyper Road, Boston
  RDF4All, 2001 Broadway, New York
  XML4You, 96 Hyper Road, Boston
Prolog fact base for the nested addresses:
% start fact base for addresses
address(
  name("Me2XML"),
  place(
    street("96 Hyper Road"),
    town("Boston")
  )
).
address(
  name("RDF4All"),
  place(
    street("2001 Broadway"),
    town("New York")
  )
).
address(
  name("XML4You"),
  place(
    street("96 Hyper Road"),
    town("Boston")
  )
).
% end fact base for addresses
(Figure: XSLT template)
Address Document: XSLT Stylesheet Template as a Tree-Transforming Rule
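Viewing an XSLT template as a tree-transforming rule means pairing a match test with a constructor and applying rules recursively down the tree. The Python sketch below mimics that scheme with ElementTree; it is an illustration, not XSLT. The names apply_templates, is_flat_address, and nest_address and the `<addresses>` root are hypothetical, and unlike XSLT's built-in rules, which copy only text, the default here copies unmatched elements unchanged (an identity-style default).

```python
import xml.etree.ElementTree as ET

DOC = ("<addresses>"
       "<address><name>Me2XML</name><street>96 Hyper Road</street><town>Boston</town></address>"
       "<address><name>RDF4All</name><street>2001 Broadway</street><town>New York</town></address>"
       "<address><name>XML4You</name><street>96 Hyper Road</street><town>Boston</town></address>"
       "</addresses>")


def apply_templates(elem, rules):
    """Apply the first rule whose match test accepts elem; otherwise copy elem
    and recurse into its children (identity-style default rule)."""
    for matches, construct in rules:
        if matches(elem):
            return construct(elem, lambda child: apply_templates(child, rules))
    out = ET.Element(elem.tag, elem.attrib)
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(apply_templates(child, rules))
    return out


def is_flat_address(elem):
    # Match part of the rule: an address element without a place child.
    return elem.tag == "address" and elem.find("place") is None


def nest_address(addr, recurse):
    # Construct part of the rule: regroup street and town under place.
    out = ET.Element("address")
    out.append(recurse(addr.find("name")))
    place = ET.SubElement(out, "place")
    place.append(recurse(addr.find("street")))
    place.append(recurse(addr.find("town")))
    return out


RULES = [(is_flat_address, nest_address)]
nested = apply_templates(ET.fromstring(DOC), RULES)
print(ET.tostring(nested, encoding="unicode"))
```

Adding or swapping (match, construct) pairs changes the transformation without touching apply_templates, which mirrors how a stylesheet is a set of independent template rules.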
Colocation Rule: Model-Theoretic Semantics via Consequence Generation
Derived colocated pair: Me2XML, XML4You
% start fact base for addresses
address(
  name("Me2XML"),
  place(
    street("96 Hyper Road"),
    town("Boston")
  )
).
address(
  name("RDF4All"),
  place(
    street("2001 Broadway"),
    town("New York")
  )
).
address(
  name("XML4You"),
  place(
    street("96 Hyper Road"),
    town("Boston")
  )
).
% end fact base for addresses
Horn rule:
% start rule base for colocated
colocated(name(N1), name(N2)) :-
  address(name(N1), place(P)),
  address(name(N2), place(P)),
  lexiless(N1, N2).
% end rule base for colocated
Derived fact:
% start fact base for colocated
colocated(
  name("Me2XML"),
  name("XML4You")
).
% end fact base for colocated
The least Herbrand model of the rule and the addresses is the set of the colocated and address ground facts.
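The model can be generated bottom-up: start from the address facts, apply the rule to everything derived so far, and stop when nothing new appears (a fixpoint). The Python sketch below does this with tuples standing in for Prolog terms. It assumes lexiless is a built-in lexicographic "less than" on names, so each colocated pair is derived once and no name is paired with itself; the function names are hypothetical.

```python
# Ground address facts mirroring the nested fact base: ("address", (name, (street, town))).
FACTS = {
    ("address", ("Me2XML", ("96 Hyper Road", "Boston"))),
    ("address", ("RDF4All", ("2001 Broadway", "New York"))),
    ("address", ("XML4You", ("96 Hyper Road", "Boston"))),
}


def immediate_consequences(model):
    """One application of the colocated rule: join address facts on the same
    place P and keep the pairs with N1 < N2 (assumed meaning of lexiless)."""
    addresses = [args for pred, args in model if pred == "address"]
    return {
        ("colocated", (n1, n2))
        for n1, p1 in addresses
        for n2, p2 in addresses
        if p1 == p2 and n1 < n2
    }


def least_model(facts):
    """Apply the rule repeatedly, starting from the base facts, until nothing new is derived."""
    model = set(facts)
    while True:
        new = immediate_consequences(model) - model
        if not new:
            return model
        model |= new


model = least_model(FACTS)
for fact in sorted(f for f in model if f[0] == "colocated"):
    print(fact)
```

This prints `('colocated', ('Me2XML', 'XML4You'))`, and the model holds that fact plus the three address facts. The rule is not recursive, so one productive round suffices, but the same loop terminates for recursive rules over any finite fact base.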
Linked Address Documents: Metadata Semantics via RDF Annotations
http://addr.flat.com (Shape: flat)
  Me2XML, 96 Hyper Road, Boston
  RDF4All, 2001 Broadway, New York
  . . .
http://addr.nest.com (Shape: nested)
  Me2XML, 96 Hyper Road, Boston
  . . .
Practical Semantics Combination: Metadata Transformation Model
Consider the following problem of (inferential, XML) data mining with report generation for findings:
Generate the finite model containing all colocated facts derivable from given flat-address base facts, with inference rules available only for nested facts.
This problem can be divided into three subproblems:
1) Navigate the metadata, starting from the flat-address URL, for an available nested-address version (alternatively, use a semantic search engine with Shape = nested)
2) If none is available, transform the flat-address facts into nested addresses via the URL’s XSLT stylesheet
3) Apply the colocated rule to the nested-address base facts to generate the finite model of colocated facts
(1) Check Metadata via RDF Annotations
http://addr.flat.com (Shape: flat)
  Me2XML, 96 Hyper Road, Boston
  RDF4All, 2001 Broadway, New York
  . . .
(Figure: RDF annotations with the properties ConvertsTo and Shape, relating http://addr.flat.com and http://addr.nest.com)
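Step (1) can be pictured as queries over RDF-style (subject, property, object) triples. The Python sketch below models the annotations as a plain list of triples. The Shape values come from the slides, but the ConvertsTo triple is a hypothetical assumption (that it links the flat document to its nested counterpart), since the original RDF did not survive; the function names are also hypothetical.

```python
# Hypothetical RDF-style metadata as (subject, property, object) triples.
TRIPLES = [
    ("http://addr.flat.com", "Shape", "flat"),
    ("http://addr.nest.com", "Shape", "nested"),
    # Assumption: ConvertsTo links the flat document to its nested counterpart.
    ("http://addr.flat.com", "ConvertsTo", "http://addr.nest.com"),
]


def values(triples, subject, prop):
    """All objects o such that (subject, prop, o) is a triple."""
    return [o for s, p, o in triples if s == subject and p == prop]


def find_nested_version(triples, start):
    """Follow one link from start to a resource annotated with Shape = nested."""
    for s, _, o in triples:
        if s == start and "nested" in values(triples, o, "Shape"):
            return o
    return None  # none found: fall back to the XSLT transformation (step 2)


def search_by_shape(triples, shape):
    """The search-engine alternative: all resources with the given Shape."""
    return [s for s, p, o in triples if p == "Shape" and o == shape]


print(find_nested_version(TRIPLES, "http://addr.flat.com"))
print(search_by_shape(TRIPLES, "nested"))
```

Both calls lead to http://addr.nest.com here; without the linking triple, find_nested_version returns None and the pipeline would continue with the stylesheet transformation.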
(2) Transform via the XSLT Stylesheet
Flat addresses:
  Me2XML, 96 Hyper Road, Boston
  RDF4All, 2001 Broadway, New York
  XML4You, 96 Hyper Road, Boston
Resulting nested addresses as a Prolog fact base:
% start fact base for addresses
address(
  name("Me2XML"),
  place(
    street("96 Hyper Road"),
    town("Boston")
  )
).
address(
  name("RDF4All"),
  place(
    street("2001 Broadway"),
    town("New York")
  )
).
address(
  name("XML4You"),
  place(
    street("96 Hyper Road"),
    town("Boston")
  )
).
% end fact base for addresses
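Once the addresses are nested, each XML element maps one-to-one onto a Prolog fact. The Python sketch below shows that correspondence by reading a nested address document and printing Prolog-style facts. The `<addresses>` root and the function name to_prolog_facts are hypothetical, and names containing double quotes would need escaping, which this sketch omits.

```python
import xml.etree.ElementTree as ET

NESTED = """<addresses>
  <address><name>Me2XML</name>
    <place><street>96 Hyper Road</street><town>Boston</town></place></address>
  <address><name>RDF4All</name>
    <place><street>2001 Broadway</street><town>New York</town></place></address>
  <address><name>XML4You</name>
    <place><street>96 Hyper Road</street><town>Boston</town></place></address>
</addresses>"""


def to_prolog_facts(root):
    """Render each nested address element as a Prolog address/2 fact."""
    facts = []
    for addr in root.iter("address"):
        facts.append('address(name("{}"), place(street("{}"), town("{}"))).'.format(
            addr.findtext("name"),
            addr.findtext("place/street"),
            addr.findtext("place/town")))
    return facts


for fact in to_prolog_facts(ET.fromstring(NESTED)):
    print(fact)
```

The first line printed is `address(name("Me2XML"), place(street("96 Hyper Road"), town("Boston"))).`, the single-line form of the first fact above.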
(3) Generate Model as Rule Consequences
Derived colocated pair: Me2XML, XML4You
% start fact base for addresses
address(
  name("Me2XML"),
  place(
    street("96 Hyper Road"),
    town("Boston")
  )
).
address(
  name("RDF4All"),
  place(
    street("2001 Broadway"),
    town("New York")
  )
).
address(
  name("XML4You"),
  place(
    street("96 Hyper Road"),
    town("Boston")
  )
).
% end fact base for addresses
Horn rule:
% start rule base for colocated
colocated(name(N1), name(N2)) :-
  address(name(N1), place(P)),
  address(name(N2), place(P)),
  lexiless(N1, N2).
% end rule base for colocated
Data findings report:
Me2XML and XML4You might be the same organization
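The report can also be produced without an explicit pairwise join: group the addresses by place, then pair the names within each group. The Python sketch below takes that hash-grouping route, a swapped-in evaluation strategy that computes the same pairs as the colocated rule under the assumption that lexiless means lexicographic order. The function name findings_report and the sentence template are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ADDRESSES = [
    ("Me2XML", ("96 Hyper Road", "Boston")),
    ("RDF4All", ("2001 Broadway", "New York")),
    ("XML4You", ("96 Hyper Road", "Boston")),
]


def findings_report(addresses):
    """Group names by place and report every ordered pair sharing a place."""
    by_place = defaultdict(list)
    for name, place in addresses:
        by_place[place].append(name)
    lines = []
    for names in by_place.values():
        # Sorted distinct names give N1 < N2, like lexiless; no self-pairs.
        for n1, n2 in combinations(sorted(set(names)), 2):
            lines.append(f"{n1} and {n2} might be the same organization")
    return lines


print("\n".join(findings_report(ADDRESSES)))
```

The output is the single finding "Me2XML and XML4You might be the same organization". Grouping touches each address once per place instead of comparing all pairs, which matters only for larger fact bases.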
Model-Theoretic Semantics
Practically usable only for finite models (such as in the address example)
Still theoretically interesting to formalize semantics of XML-based inference systems (such as RFML, RuleML, DAML, or OIL)
Even when finite, not practical for highly distributed and highly dynamic fact bases (such as the ever-changing geographic data scattered over the Web)
Perhaps to be replaced or augmented by semantics characterizing a new logic for the Web, which is open, uncertain, and paraconsistent
Transformational Semantics
Practically usable for all declarative programs, e.g., for normalization or interoperation (such as in the address example)
XSLT stylesheet engines such as Cocoon flourish, and XSLT will probably be built directly into most Web browsers
XSLT with variables and parameter passing recently shown to be relationally complete
The emerging XML query algebra permits similar transformations, incl. certain functional programs
The Rule Markup Initiative will provide a lattice of XML DTDs (Schemas) for RuleML subsets containing inference and/or transformation rules
Metadata Semantics
Practically usable for describing/localizing all possible (Web) objects, whose internals need not be accessible (unlike in the address example)
RDF can be formalized logically and its expressive power may be generalized via logic programming (and hypergraphs): metadata combined with rules
Metadata complemented by subsumption semantics for XML tags to better integrate XML and RDF
RDF extensible by subClassOf/subPropertyOf vocabularies (cf. sorted logics), as in RDF Schema, and, further, by full ontologies, as in DAML or OIL
SubPropertyOf Example: An Illustrative Hierarchy of Properties
(Figure: property hierarchy with the labels Body, Surface, Composition, Color, Shape, Texture, Density, Hardness, nested, flat, . . .)
Property inheritance is handled as, e.g., for description logic roles:
RDF Schema
OIL and DAML
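In RDF Schema, property inheritance follows one rule: if a resource has a value for property p and p is a subPropertyOf q, it has the same value for q, transitively up the hierarchy. The Python sketch below applies that rule. Because the hierarchy figure did not survive, the subPropertyOf edges, the resource "ball", and the value "red" are hypothetical and only illustrate the mechanism.

```python
# Hypothetical subPropertyOf edges (child, parent); illustrative only.
SUB_PROPERTY_OF = {("Color", "Surface"), ("Texture", "Surface"), ("Surface", "Body")}


def super_properties(prop, edges):
    """All properties reachable from prop by following subPropertyOf transitively."""
    found, frontier = set(), [prop]
    while frontier:
        current = frontier.pop()
        for child, parent in edges:
            if child == current and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def entail(triples, edges):
    """Apply the subPropertyOf rule: (s, p, o) and p subPropertyOf q give (s, q, o)."""
    closed = set(triples)
    for s, p, o in triples:
        for q in super_properties(p, edges):
            closed.add((s, q, o))
    return closed


print(sorted(entail({("ball", "Color", "red")}, SUB_PROPERTY_OF)))
```

Starting from a single Color triple, the closure adds the inherited Surface and Body triples; the `parent not in found` check keeps the traversal finite even if the hierarchy contains a cycle.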
Conclusions
Identified and exemplified three complementary semantics for XML data:
Transformations
Models
Metadata
For data distributed in the Web, models are of limited use, while transformations and metadata are being widely applied
Further semantics will be needed for the Web, e.g.:
SQL semantics: Contributions to this Dagstuhl Seminar
“URI-deictic” logics: Berners-Lee’s “pointing as proving”
Already the different usage of transformations and metadata would suggest: