Open Conceptual Data Models
© 2008 OpenLink Software, All rights reserved
Making the Conceptual Layer Real via RDF Linked Data
Conceptual Data Models in the Linked Data Web
Linked Data Vision:
The transition of the Web
from a Web of linked documents
to a Web of interlinked structured data items (aka: entities, data objects, resources)
Concurrent trend in the IT industry:
A recognition of the benefits of conceptual data models vs logical data models
The Big Question:
To what extent does Linked Data support conceptual-level data models?
Open Conceptual Data Models
Topics:
Conceptual & Logical Data Models
Conceptual Models for the Semantic Web
Realizing Conceptual Models through Ontologies & Linked Data
Virtuoso RDF Views
ADO.NET Data Services & the Entity Data Model
Conceptual & Logical Data Models
Describe a software system’s target problem space
Typically, in today’s database-driven applications
Three levels of data model
Physical
How data is physically represented on disk
Logical (aka logical schema)
Expresses problem domain in terms of data management technology (tables / columns)
e.g. relational schema
Conceptual (aka conceptual schema)
Purely semantic description of problem space
Describes things (entities), their characteristics (attributes) & associations between things (relationships)
Logical Data Model
Most prominent of the three data model types
Main focus of database applications
Due to pervasiveness of SQL in application code
Weaknesses
Impedance mismatch
Loss of semantics during development process
Heterogeneous databases & interoperability
Logical Data Model Weaknesses
Impedance Mismatch
SQL expresses queries in terms of tables / views
=> targets logical schema
Normalization fragments the data model
Entities & their attributes may be split across several tables
Navigation between objects requires relational joins over two or more tables
Table rows must be reconstituted into higher level conceptual entities
Conceptual level data model is desirable to:
Remove impedance mismatch
Isolate application from changes to logical data model
Provide framework for human level interaction
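The impedance mismatch described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the three-table schema, the column names and the sample rows are assumptions made for the example, not part of the original slides. It shows one conceptual "album" entity being reassembled from rows spread across normalized tables.

```python
import sqlite3

# Hypothetical normalized schema: one conceptual Album entity is split over three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album  (album_id INTEGER PRIMARY KEY, title TEXT,
                         artist_id INTEGER REFERENCES artist(artist_id));
    CREATE TABLE track  (track_id INTEGER PRIMARY KEY, title TEXT,
                         album_id INTEGER REFERENCES album(album_id));
    INSERT INTO artist VALUES (1, 'John Lennon');
    INSERT INTO album  VALUES (10, 'Imagine', 1);
    INSERT INTO track  VALUES (100, 'Imagine', 10), (101, 'Jealous Guy', 10);
""")


def load_album(conn, album_id):
    """Reassemble one conceptual Album entity from rows in three tables."""
    rows = conn.execute(
        """
        SELECT album.title, artist.name, track.title
        FROM album
        JOIN artist ON artist.artist_id = album.artist_id
        JOIN track  ON track.album_id = album.album_id
        WHERE album.album_id = ?
        ORDER BY track.track_id
        """,
        (album_id,),
    ).fetchall()
    if not rows:
        return None
    # Every row repeats the album and artist columns; only the track column varies.
    return {
        "title": rows[0][0],
        "artist": rows[0][1],
        "tracks": [row[2] for row in rows],
    }


album = load_album(conn, 10)
print(album)
```

The application code has to know the table names, the join columns and how to fold repeated rows back into a single object, which is the coupling to the logical schema that a conceptual model aims to remove.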
Logical Data Model Weaknesses
Loss of Semantics During Development
Process:
Develop conceptual model (E-R modelling)
Transform to logical model for implementation
Derive physical model from logical model
Problems:
Each move to a lower level model discards meaning
Higher level model typically not retained
Model semantics fragmented across schema / business rules / application code
Application must know logical data model
Must be hardcoded or inferred (imperfectly) from system tables
Logical Data Model Weaknesses
Heterogeneous Databases & Interoperability
Logical data model
Describes problem domain in terms of tables/columns
Requires SQL to navigate model
Application
Exposed to specifics of a particular vendor’s RDBMS
In heterogeneous database environment, must handle
Different SQL dialects
Different schemas
No explicit data model. No explicit semantics.
Interoperability/integration = perpetual problem for IT depts
Conceptual Models for the Semantic Web
Growing recognition in the industry of the benefits of a conceptual, rather than logical, model for data-centric applications
e.g. Microsoft’s Entity Data Model / Entity Framework
Semantic Web technologies provide powerful tools for this paradigm shift
Benefits of Conceptual Models
How the Semantic Web benefits
More faithfully represents human view of domain of interest
Conceptual model & semantics
Explicit & available globally
Not implicit & fragmented across business logic / UI etc
Better / explicit semantics promises better search engines
Much easier heterogeneous data integration
Data on the Web is inherently heterogeneous
Application Areas – Present & Future
Social networking, e-commerce, collaborative working
Require shareable, standards-based, cross-platform conceptual views of data
Data portability
Needed as Web users maintain multiple points of presence – blogs, social network accounts etc.
Open business models
Require exchange & integration of large amounts of data
Scientific research – sharing of knowledge & findings
Requires transparent access to distributed heterogeneous data
Requires database integration using global schema
Autonomous intelligent agents
Free humans from large-volume information processing
Semantic Web Technology Benefits
What Semantic Web technologies bring:
Ontologies
Can represent common semantics
Spanning databases, applications, enterprises, on-line communities
Act as a shared conceptual model
Provide common models (FOAF, SIOC etc)
Common Semantics (Ontologies) & Common Data Representation (RDF)
Enable cross data source querying using SPARQL
Content from several sites can be combined / explored
Querying using proprietary APIs unnecessary
Brute force data merging unnecessary
Open Data Formats, Platform Independence, Common Models
Allow data portability and data integration
Realizing Conceptual Models
Ontologies
Provide the building blocks of Semantic Web conceptual models
Define the concepts and their relationships in a domain of interest
Describing Classes & Properties – Ontology Languages
RDFS
Introduces the notions of concepts (classes) & instances
OWL
Adds more vocabulary for describing:
relations between classes
cardinality
richer typing of properties, etc.
Goodness of Fit
RDF was designed from the ground up as a metadata data model
RDF / RDFS / OWL work directly at the level of conceptual models
Conceptual model terminology matches RDF/OWL terminology
Concepts, entities, attributes, relationships
A natural fit!
RDF lends itself naturally to describing conceptual models
Semantic Expressivity
DDL-based Relational Model
Relationship between two entities isn’t explicit
Foreign key relating two rows in separate tables doesn’t express the nature of the relationship
Semantics must often be inferred from table definitions
RDF-based Conceptual Model
Relationship between two entities is stated explicitly by predicate in subject-predicate-object triple
Semantic expressivity of RDF/RDFS/OWL is much better than DDL
Has richer semantic content than equivalent DDL-based logical/relational model
RDF Conceptual Model – Artist / Records / Tracks
Global Granular Information Sharing
Traditional Logical/Relational Data Model
Schema described by DDL is internal to DBMS
Primary keys identifying an individual table row (i.e. entity instance) not globally unique, not easily usable outside host DBMS
Gives rise to ‘data silos’
RDF’s use of HTTP-based URLs
Externalises the data and schema
Makes both globally accessible & scalable
Provides globally unique IDs for entities/relations/classes
A vehicle for granular, global information sharing down to the equivalent of the record level
Linked Data – What is It?
A method for exposing, sharing & connecting data on the Web
A term coined by Tim Berners-Lee that describes HTTP-based Data Access by Reference for the Web
Open Data Access & Connectivity mechanism for the Web
A richer linking mechanism for the Web that takes us from Hypertext Links (Document to Document) to Hyperdata Links (across things that documents are about)
Linked Data – Why Is It Important
It exposes the compound nature of Web Resources
Information resources (Containers) are uniquely identified & referenceable
Entities within Containers are uniquely identified & referenceable
It provides an Open Data Access & Connectivity mechanism for the Web
It delivers a powerful mechanism for meshing disparate and heterogeneous data sources
Linked Data Model
Changes the focus from linked documents to linked entities
The document as a data container becomes less relevant
Hyperdata Links Between Data Objects
Linked Data Benefits – Natural Navigation
Natural Navigation Through Typed Links
RDF entities are identified by dereferenceable URIs (URLs)
Navigating from one data item to another is easy
One click to dereference in Semantic Web Browser
e.g. OpenLink Data Explorer
URI of object in an RDF statement is a typed link
Link’s “type” is defined by the statement predicate
Relational/Logical Model
Cumbersome
Requires SQL joins + typically Object-Relational mapping
e.g. in C#: track = lennonAlbum.Tracks["Imagine"]
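Typed-link navigation, by contrast, can be sketched as a walk over subject-predicate-object triples. This is an illustrative sketch only: the example.org namespace, the predicate names (made, hasTrack, title) and the in-memory list standing in for dereferenceable URIs are all assumptions, not taken from the slides.

```python
EX = "http://example.org/music/"  # hypothetical namespace

# Each triple is (subject URI, predicate URI, object).
# A URI-valued object is a typed link; the predicate is the link's type.
TRIPLES = [
    (EX + "artist/john_lennon", EX + "schema#made", EX + "record/imagine"),
    (EX + "record/imagine", EX + "schema#hasTrack", EX + "track/imagine"),
    (EX + "record/imagine", EX + "schema#hasTrack", EX + "track/jealous_guy"),
    (EX + "track/imagine", EX + "schema#title", "Imagine"),
    (EX + "track/jealous_guy", EX + "schema#title", "Jealous Guy"),
]


def follow(triples, subject, predicate):
    """Return the objects reached from `subject` via links of type `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def navigate(triples, start, *path):
    """Follow a chain of typed links, one predicate per hop."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in follow(triples, node, predicate)]
    return frontier


titles = navigate(
    TRIPLES,
    EX + "artist/john_lennon",
    EX + "schema#made",
    EX + "schema#hasTrack",
    EX + "schema#title",
)
print(titles)
```

Each hop names the relationship it follows, so no join conditions or object-relational mapping layer are needed to move from artist to record to track.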
Linked Data Benefits - Aggregatable Data
Often desirable to have an integrated view of all the data available about an item or topic
Database Realm
Integration problematic, difficult to combine logical schemas
Semantic Web
Data aggregation is easy: every resource has a unique URI
Individual items can be linked
Conceptual models can be linked
Cross-domain links enrich domain knowledge
Different facets of the same entity may be described by different URIs minted by different authors
Can be linked. e.g. owl:sameAs, rdf:type predicates
May expose facts not directly represented in any one source
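Aggregation through owl:sameAs can be sketched as follows. The two sources, their URIs and their property names are hypothetical, and a production system would rely on a reasoner or the triple store's own sameAs handling rather than this hand-rolled closure.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two hypothetical sources describe the same artist under different URIs.
SOURCE_A = [
    ("http://a.example/artist/42", "http://a.example/schema#name", "John Lennon"),
]
SOURCE_B = [
    ("http://b.example/people/lennon", "http://b.example/schema#birthPlace", "Liverpool"),
    ("http://b.example/people/lennon", OWL_SAME_AS, "http://a.example/artist/42"),
]


def same_as_closure(triples, uri):
    """Collect every URI linked to `uri` by owl:sameAs, in either direction."""
    seen, pending = {uri}, [uri]
    while pending:
        current = pending.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    pending.append(b)
    return seen


def describe(triples, uri):
    """Merge the (predicate, object) pairs stated about any alias of `uri`."""
    aliases = same_as_closure(triples, uri)
    return {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}


merged = describe(SOURCE_A + SOURCE_B, "http://a.example/artist/42")
print(sorted(merged))
```

Neither source alone states both the name and the birthplace; the combined description only appears once the two URIs are linked.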
Linked Data – Data Aggregation
Linked Data Benefits - Self Describing Data
RDF
A technology for creating self-describing Web resources
Entity’s type definition ‘accompanies’ it using rdf:type
An RDF dataset can be queried using SPARQL without knowing anything beforehand about the data
Provides the basis for powerful data exploration tools
Logical / Relational Schema
Users / applications need a detailed understanding of the schema to use and navigate the data
Application’s knowledge of the schema typically hardcoded
Ad-hoc end-user data exploration potentially error prone
Linked Data Benefits - SPARQL
If a user agent has no built-in knowledge of a particular RDF subject, predicate or object, it can use the URI to retrieve the information
The Power of SPARQL
Discover what sorts of things a data source contains
select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }
Determine all the properties of an entity class
select * where { <URI> ?property ?hasValue }
Determine all the properties and values of an entity instance
DESCRIBE
No prior knowledge of the RDF data source is needed
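The schema-free discovery queries above can be approximated in a few lines of Python. This is a toy single-pattern matcher over an in-memory list, not a SPARQL engine; the example.org data and the match helper are assumptions made for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace

TRIPLES = [
    (EX + "lennon", RDF_TYPE, EX + "Artist"),
    (EX + "imagine", RDF_TYPE, EX + "Record"),
    (EX + "imagine", EX + "title", "Imagine"),
    (EX + "lennon", EX + "made", EX + "imagine"),
]


def match(triples, pattern):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must bind to the same value everywhere it occurs.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Roughly: select distinct ?ObjectType where { ?URI a ?ObjectType }
types = sorted({b["?ObjectType"] for b in match(TRIPLES, ("?URI", RDF_TYPE, "?ObjectType"))})

# Roughly: all properties and values of one entity, given only its URI
props = {
    (b["?property"], b["?hasValue"])
    for b in match(TRIPLES, (EX + "imagine", "?property", "?hasValue"))
}

print(types)
print(sorted(props))
```

Both queries are written without any knowledge of the vocabulary used by the data: the classes and properties come back as results rather than being hardcoded in the query.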
Virtuoso - Linked Data Generation Options
Conceptual layer insulates Linked Data consumers from RDFization infrastructure & data source heterogeneity
Virtuoso RDF Views
Expose relational data as RDF
Provide the means to move from a logical model view to a conceptual model view
Available for querying through SPARQL or SPASQL (SPARQL embedded in SQL)
No physical regeneration of relational data
RDF Views =
Virtuoso RDF Meta-Schema +
Meta-Schema Language
MSL =
A domain specific, declarative language for mapping a logical SQL data model to a conceptual RDF data model
Northwind Demo Database: RDF View Definition Extract
prefix northwind:
…
create iri class northwind:Customer (in customer_id varchar not null)
…
alter quad storage virtrdf:DefaultQuadStorage
…
from Demo.demo.Customers as customers
from Demo.demo.Orders as orders
…
{
create virtrdf:NorthwindDemo as graph iri ("http://^{URIQADefaultHost}^/Northwind")
{
…
northwind:Customer(customers.CustomerID) a foaf:Organization as virtrdf:Customer-CustomerID ;
northwind:companyName customers.CompanyName as … ;
…
northwind:fax customers.Fax as virtrdf:Customer-fax .
…
northwind:Customer(orders.CustomerID) northwind:has_order northwind:Order(orders.OrderID) as virtrdf:Order-has_order .
}
}
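The idea behind such a view, mapping table rows to triples on demand without copying the data, can be sketched in Python. This is not Virtuoso's implementation or API: the namespace URI, the IRI layout and the sample rows are assumptions made for the example (the slide elides the real prefix URI).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"
NORTHWIND = "http://example.org/schemas/northwind#"  # hypothetical prefix URI
BASE = "http://example.org/Northwind/"               # hypothetical graph host

# Illustrative rows standing in for Demo.demo.Customers and Demo.demo.Orders.
customers = [{"CustomerID": "ALFKI", "CompanyName": "Alfreds Futterkiste", "Fax": "030-0076545"}]
orders = [{"OrderID": 10643, "CustomerID": "ALFKI"}]


def customer_iri(customer_id):
    """Plays the role of the northwind:Customer IRI class (layout is assumed)."""
    return f"{BASE}Customer/{customer_id}#this"


def order_iri(order_id):
    """Plays the role of the northwind:Order IRI class (layout is assumed)."""
    return f"{BASE}Order/{order_id}#this"


def view_triples(customers, orders):
    """Generate triples from the rows on demand; nothing is copied or stored."""
    for c in customers:
        subject = customer_iri(c["CustomerID"])
        yield (subject, RDF_TYPE, FOAF_ORGANIZATION)
        yield (subject, NORTHWIND + "companyName", c["CompanyName"])
        yield (subject, NORTHWIND + "fax", c["Fax"])
    for o in orders:
        # The foreign key becomes an explicit, named relationship.
        yield (customer_iri(o["CustomerID"]), NORTHWIND + "has_order", order_iri(o["OrderID"]))


triples = list(view_triples(customers, orders))
for triple in triples:
    print(triple)
```

The primary key turns into a globally addressable IRI and the foreign key into a named predicate, which is the step from the logical model to the conceptual one.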
Northwind Demo Database: Customer Table to RDF Entity Mapping
Figure label: Orders Table
LinqToRdf + Virtuoso
LinqToRdf to MusicBrainz - Conceptual Model Veneer
ADO.NET Data Services & Entity Data Model
A framework for exposing ‘pure data’ service over HTTP
No support for RDF
Gains none of RDF’s inherent benefits
Lack of platform independence & standards compliance
Supports REST-style interfaces
Supports Atom, JSON and XML payloads
But
Server-side: Windows only
Consuming ADO.NET Data Services (codename Astoria) at a higher level requires a Windows .NET client or a Silverlight-supported browser
ADO.NET Data Services & Entity Data Model
Server-side only conceptual model
Powerful URL addressing to query/navigate/sort/filter etc
Customers collection: http://myserver/data.svc/Customers
Customer ALFKI: http://myserver/data.svc/Customers('ALFKI')
Customer ALFKI's orders: http://myserver/data.svc/Customers('ALFKI')/Orders
But
Client must know conceptual schema
e.g. to construct above URIs
Lack of Dereferenceable Entity IDs
Ability to discover entities and dereference their descriptions (attributes/relations) is confined to the facilities offered by .NET
cf. SPARQL’s ability to handle unknown data sources
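The point that the client must already know the conceptual schema can be seen in a small URL-building sketch. The service root and the Customers / Orders names come from the slide; the helper functions are hypothetical, handle string keys only, and the percent-encoding of keys is an assumption rather than the framework's documented escaping rule.

```python
from urllib.parse import quote

SERVICE_ROOT = "http://myserver/data.svc"  # service root as shown on the slide


def entity_set_url(entity_set):
    """URL of a whole collection, e.g. all customers."""
    return f"{SERVICE_ROOT}/{entity_set}"


def entity_url(entity_set, key):
    """URL of one entity addressed by a string key."""
    return f"{SERVICE_ROOT}/{entity_set}('{quote(key, safe='')}')"


def navigation_url(entity_set, key, navigation_property):
    """URL of the entities related to one entity via a named relationship."""
    return f"{entity_url(entity_set, key)}/{navigation_property}"


print(entity_set_url("Customers"))
print(entity_url("Customers", "ALFKI"))
print(navigation_url("Customers", "ALFKI", "Orders"))
```

Every argument ("Customers", "ALFKI", "Orders") has to be known to the caller in advance; nothing in the URL scheme itself tells a client which entity sets or relationships exist.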
ADO.NET Data Services & Entity Data Model
No Support for Non-SQL Data Sources
Astoria is aimed exclusively at making relational data Web accessible
cf. Semantic Web & Linked Data
Recognize that vast amounts of data reside in unstructured and semi-structured data sources
Support for embedding RDF into existing (X)HTML
RDFa, GRDDL, eRDF
Emerging tools for converting non-RDF data to RDF
Emerging tools for exposing SQL data as RDF
Astoria lacks scalability & scope of Semantic Web technologies