Data and Applications Security: Developments and Directions
Lecture #22: Trustworthy Semantic Webs
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
March 24, 2008
Outline
Semantic web
XML and XML security
RDF and RDF security
Ontologies
Rules
Applications
References:
A Semantic Web Primer, Antoniou and van Harmelen, MIT Press, 2003
Chapter 25 of the textbook
Building Trustworthy Semantic Webs, Thuraisingham, CRC Press, 2007 (to appear)
From Today’s Web to the Semantic Web
Today’s web
High recall, low precision: searches return too many web pages, many of them not relevant
Sometimes low recall
Results are sensitive to vocabulary: different words that mean the same thing do not return the same web pages
Results are single web pages, not linked web pages
Semantic web
Machine understandable web pages
Activities on the web such as searching with little or no human intervention
Technologies for knowledge management, e-commerce, interoperability
Solutions to the problems faced by today’s web
Knowledge Management and Personal Agents
Knowledge Management
Corporate needs: searching, extracting and maintaining information, uncovering hidden dependencies, viewing information
Semantic web for knowledge management: organizing knowledge, automated tools for maintaining knowledge, question answering, querying multiple documents, controlling access to documents
Personal Agent
John is the president of a company. He needs surgery for a serious but not critical illness. With the current web he has to check each web page for relevant information and make decisions based on the information provided
With the semantic web, the agent will retrieve all the relevant information, synthesize it, ask John for input if needed, and then present the various options and make recommendations
E-Commerce
Business to Consumer
Users shop on the web; wrapper technology is used to extract information about user preferences etc. and display the products to the user
Use of semantic web: develop software agents that can interpret privacy requirements, pricing and product information and display timely and correct information to the user; agents can also provide information about the reputation of shops
Business to Business
Organizations work together and carry out transactions such as collaborating on a product, supply chains etc. Today’s web lacks standards for data exchange
Use of semantic web: XML is a big improvement, but organizations still need to agree on vocabulary. The future will be the use of ontologies to agree on meanings and interpretations
Semantic Web Technologies
Explicit metadata:
Metadata is data about data; Need metadata to be explicitly specified so that different groups and organizations will know what is on the web
Metadata specification languages include XML and RDF
Ontologies
An explicit and formal specification of a conceptualization; it describes a domain of discourse and the relationships within it
Ontology languages include XML, RDF, OWL
Logic
Logic can be used to specify facts as well as rules; new facts are derived from existing facts based on the inference rules
Description logic is the type of logic that has been developed for semantic web applications
Layered Approach: Tim Berners-Lee’s Vision (www.w3c.org)
What is XML all about?
XML is needed due to the limitations of HTML and the complexities of SGML
It is an extensible markup language specified by the W3C (World Wide Web Consortium)
Designed to make the interchange of structured documents over the Internet easier
Key to XML used to be Document Type Definitions (DTDs)
Defines the role of each element of text in a formal model
XML schemas have now become critical to specify the structure
XML schemas are also XML documents
XML Elements
XML Statement
John Smith is a Professor in Texas
This can be expressed as follows:
<Professor>
<name> John Smith </name>
<state> Texas </state>
</Professor>
XML Elements
Now suppose this data can be read by anyone; then we can augment the XML statement with an additional element called access, as follows.
<Professor>
<name> John Smith </name>
<state> Texas </state>
<access> All, Read </access>
</Professor>
XML Elements
If only HR can update this XML statement, then we have the following:
<Professor>
<name> John Smith </name>
<state> Texas </state>
<access> HR department, Write </access>
</Professor>
XML Elements
We may not wish for everyone to know that John Smith is a professor, but we can give out the information that this professor is in Texas.
This can be expressed as:
<Professor>
<name> John Smith, Govt-official, Read </name>
<state> Texas, All, Read </state>
<access> HR department, Write </access>
</Professor>
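The element-level labelling on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag names Professor, name and state, the access attribute, and the readable_view helper are hypothetical, and the label is stored in an attribute rather than in the element text so that it is easy to parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: each element carries a "subject,mode" label
# in an `access` attribute (an assumption made for this sketch).
DOC = """
<Professor access="HR department,Write">
  <name access="Govt-official,Read">John Smith</name>
  <state access="All,Read">Texas</state>
</Professor>
"""

def readable_view(xml_text, subject):
    """Return {tag: text} for the child elements that `subject` may read."""
    root = ET.fromstring(xml_text)
    view = {}
    for child in root:
        who, mode = [part.strip() for part in child.get("access").split(",")]
        # An element is readable if it is labelled Read for everyone
        # or Read for this particular subject.
        if mode == "Read" and who in ("All", subject):
            view[child.tag] = child.text
    return view

print(readable_view(DOC, "Student"))        # only the state is visible
print(readable_view(DOC, "Govt-official"))  # name and state are visible
```

A subject without the Govt-official label sees only that the professor is in Texas, which matches the intent described on the slide.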
XML Attributes
Suppose we want to specify access based on attribute values.
One way to specify such access is given below.
XML DTD
DTDs essentially specify the structure of XML documents.
Consider the following DTD for Professor with elements Name and State.
This will be specified as:
XML Schema
While DTDs were the early attempts to specify structure for XML documents, XML schemas are a far more elegant way to specify structures.
Unlike DTDs, XML schemas essentially use XML syntax for specification.
Consider the following example:
XML Namespaces
Namespaces are used for disambiguation
xmlns:CountryX="http://www.CountryX.edu/Instution DTD"
xmlns:USA="http://www.USA.edu/Instution DTD"
xmlns:UK="http://www.UK.edu/Instution DTD"
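To see how namespaces disambiguate, the following Python sketch parses a document in which two institutions use the same element name. The namespace URIs are adapted from the slide, and the professor element and its placeholder text are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same local name "professor" appears twice,
# once in each institution's namespace.
DOC = """
<institutions xmlns:USA="http://www.USA.edu/InstitutionDTD"
              xmlns:UK="http://www.UK.edu/InstitutionDTD">
  <USA:professor>USA definition</USA:professor>
  <UK:professor>UK definition</UK:professor>
</institutions>
"""

# Prefix-to-URI map used when searching; the prefixes are local shorthand.
NS = {
    "USA": "http://www.USA.edu/InstitutionDTD",
    "UK": "http://www.UK.edu/InstitutionDTD",
}

root = ET.fromstring(DOC)
usa = root.find("USA:professor", NS)
uk = root.find("UK:professor", NS)

# The parser expands each prefix to its URI, so the two tags differ.
print(usa.tag, "->", usa.text)
print(uk.tag, "->", uk.text)
```

The two elements share a local name but resolve to different qualified names, so a program can tell them apart.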
Federations/Distribution
Site 1 document:
111
John Smith
Texas
Site 2 document:
111
60K
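A federated view over the two sites can be sketched as a join on the shared identifier. The tag names (Professor, ID, name, state, salary) and the merge_by_id helper are assumptions; only the values 111, John Smith, Texas and 60K come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the two site documents.
SITE1 = "<Professor><ID>111</ID><name>John Smith</name><state>Texas</state></Professor>"
SITE2 = "<Professor><ID>111</ID><salary>60K</salary></Professor>"

def merge_by_id(*documents):
    """Combine the fields of documents that share the same ID."""
    merged = {}
    for text in documents:
        root = ET.fromstring(text)
        record = merged.setdefault(root.findtext("ID"), {})
        for child in root:
            if child.tag != "ID":
                record[child.tag] = child.text
    return merged

combined = merge_by_id(SITE1, SITE2)
print(combined)
```

Once merged, the name held at Site 1 is linked to the salary held at Site 2, so access policies may need to cover the combined view as well as each site's document.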
Credentials in XML
Alice Brown
University of X
CS
Security
John James
University of X
CS
Senior
Policies in XML
Explanation: CS professors are entitled to access all the patents of their department. They are entitled to see only the short descriptions and authors of patents of the EE department
Access Control Strategy
Subjects request access to XML documents under two modes: browsing and authoring
With browsing access, a subject can read/navigate documents
Authoring access is needed to modify, delete, or append to documents
The access control module checks the policy base and applies the policy specs
Views of the document are created based on credentials and policy specs
In case of conflict, the least access privilege rule is enforced
Works for Push/Pull modes
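The strategy above can be sketched as a small policy check. Everything here is an assumption made for illustration: the tuple layout of the policy base, the privilege ordering none < browse < author, the claims element, and the choice to resolve conflicts by taking the lowest matching privilege.

```python
# Hypothetical privilege ordering used to apply the least-privilege rule.
LEVELS = {"none": 0, "browse": 1, "author": 2}

# Hypothetical policy base:
# (subject role, subject dept, document dept, element, privilege)
POLICY_BASE = [
    ("professor", "CS", "CS", "*", "author"),
    ("professor", "CS", "CS", "claims", "browse"),  # conflicts with the "*" rule
    ("professor", "CS", "EE", "short-description", "browse"),
    ("professor", "CS", "EE", "authors", "browse"),
]

def privilege(credential, doc_dept, element):
    """Return the privilege for one element; conflicts resolve to the lowest."""
    matches = [
        level
        for (role, dept, target_dept, target_elem, level) in POLICY_BASE
        if (role, dept, target_dept) == (credential["role"], credential["dept"], doc_dept)
        and target_elem in ("*", element)
    ]
    if not matches:
        return "none"  # no applicable policy means no access
    return min(matches, key=LEVELS.get)

def browsing_view(credential, doc_dept, document):
    """Build the view of a document that the subject is allowed to browse."""
    return {
        element: text
        for element, text in document.items()
        if LEVELS[privilege(credential, doc_dept, element)] >= LEVELS["browse"]
    }

alice = {"name": "Alice Brown", "role": "professor", "dept": "CS"}
patent = {"short-description": "text", "authors": "text", "claims": "text"}

print(browsing_view(alice, "EE", patent))  # short description and authors only
print(privilege(alice, "CS", "claims"))    # two policies match; the lower one wins
```

For an EE patent the view contains only the short description and authors, and for a CS patent the conflicting claims policies resolve to browse rather than author.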
System Architecture for Access Control
[Figure: architecture diagram with the components User (Pull/Query, Push/result), X-Access, X-Admin, Admin Tools, Policy base, Credential base, and XML Documents]
Third-Party Architecture
[Figure: diagram with Owner, Publisher, User/Subject, XML Source, policy base, and credential base; labels: Query, Reply document, SE-XML, credentials]
The Owner is the producer of information; it specifies access control policies
The Publisher is responsible for managing (a portion of) the Owner information and answering subject queries
Goal: Untrusted Publisher with respect to Authenticity and Completeness checking
XML Databases
Data is presented as XML documents
Query language: XML-QL
Query optimization
Managing transactions on XML documents
Metadata management: XML schemas/DTDs
Access methods and index strategies
XML security and integrity management
Inference/Privacy Control
[Figure: Interface to the Semantic Web; Inference Engine/Rules Processor using Policies, Ontologies, and Rules; XML Database holding XML Documents, Web Pages, and Databases; Technology by UTD]
Why RDF?
XML cannot be used to specify semantics
Example:
Professor is a subclass of Academic Staff
Professor inherits all properties of Academic Staff
RDF was specified so that the inadequacies of XML could be handled
RDF uses XML Syntax
Additional constructs are needed for RDF
RDF
The Resource Description Framework (RDF) is the essence of the semantic web
Adds semantics with the use of ontologies; uses XML syntax
RDF Concepts
Basic Model
Resources, Properties and Statements
Container Model
Bag, Sequence and Alternative
RDF Basics
Resource: Everything is a resource
Person, Vehicle, etc.
Property: properties describe relationships between resources
E.g., Invented
Statement: (Object, Property, Value) Triple
Berners-Lee invented the Semantic Web
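As a minimal sketch, a statement can be held as a three-field tuple and a set of statements queried by object and property. The Statement class, the values_of helper, and the second statement are hypothetical; this is plain Python, not an RDF library.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """One statement; field names follow the (Object, Property, Value) triple."""
    obj: str
    prop: str
    value: str

graph = {
    Statement("Berners-Lee", "invented", "Semantic Web"),
    # Hypothetical second statement, added only to make the lookup non-trivial.
    Statement("Berners-Lee", "type", "Person"),
}

def values_of(graph, obj, prop):
    """Return every value recorded for the given object and property."""
    return {s.value for s in graph if s.obj == obj and s.prop == prop}

print(values_of(graph, "Berners-Lee", "invented"))
```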
RDF Container Model
Bag: Unordered container, may contain multiple occurrences
rdf:Bag
Seq: Ordered container, may contain multiple occurrences
rdf:Seq
Alt: a set of alternatives
rdf:Alt
RDF Specification
Professor
semantic web
RDF Specification
RDF specifications have been given for Attributes, Types, Nesting, Containers, etc.
How can security policies be included in the specification?
Example: consider the statement “Berners-Lee is the Author of the book Semantic Web”
Do we allow access to the connection between author and book? Do we allow access to the connection but not to the author name and book name?
RDF Policy Specification
Professor
Level = L1
semantic web
Level = L2
RDF Schema
Need RDF Schema to specify statements such as professor is a subclass of academic staff
The class of Professors
All professors are Academic Staff Members.
RDF Schema: Security Policies
How can security policies be specified?
The class of Professors
All professors are Academic Staff Members.
Level = L
RDF Axiomatic Semantics
First order logic to specify formulas and inferencing
Built-in functions (First) and predicates (Type)
Modus Ponens
From A and If A then B, deduce B
Example: All containers are Resources
Type(?c, Container) → Type(?c, Resource)
If we have Type(A, Container) then we can infer Type(A, Resource)
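The container example can be written as a single application of modus ponens over a set of facts. The tuple encoding of Type(...) and the function name are assumptions made for this sketch.

```python
# Facts are encoded as (predicate, subject, class) tuples (an assumed encoding).
facts = {("Type", "A", "Container")}

def apply_container_rule(facts):
    """Apply the rule Type(?c, Container) -> Type(?c, Resource) once."""
    derived = {
        ("Type", c, "Resource")
        for (pred, c, cls) in facts
        if pred == "Type" and cls == "Container"
    }
    # Inference only adds facts; the originals are kept.
    return facts | derived

inferred = apply_container_rule(facts)
print(("Type", "A", "Resource") in inferred)
```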
RDF Inferencing
While first order logic provides a proof system, inference in it is computationally infeasible in general
As a result, Horn clause logic was developed for logic programming; this is still computationally expensive
RDF uses If-then rules
IF E contains the triples (?u, rdfs:subClassOf, ?v)
and (?v, rdfs:subClassOf, ?w)
THEN
E also contains the triple (?u, rdfs:subClassOf, ?w)
That is, if u is a subclass of v, and v is a subclass of w, then u is a subclass of w
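The subclass rule can be applied repeatedly until no new triples appear. The sketch below is a naive fixpoint loop in plain Python, not an RDF library; the class Staff is a hypothetical addition that gives the rule something to derive.

```python
SUBCLASS = "rdfs:subClassOf"

def subclass_closure(triples):
    """Add (u, subClassOf, w) whenever (u, subClassOf, v) and (v, subClassOf, w) hold."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until a full pass derives nothing new
        changed = False
        for (u, p1, v) in list(closed):
            for (v2, p2, w) in list(closed):
                if p1 == p2 == SUBCLASS and v == v2 and (u, SUBCLASS, w) not in closed:
                    closed.add((u, SUBCLASS, w))
                    changed = True
    return closed

E = {
    ("Professor", SUBCLASS, "AcademicStaff"),
    ("AcademicStaff", SUBCLASS, "Staff"),  # hypothetical class for illustration
}

closure = subclass_closure(E)
print(("Professor", SUBCLASS, "Staff") in closure)
```

The nested loop is quadratic per pass, which is acceptable for a sketch but illustrates why inference cost is a concern on larger triple sets.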
RDF Query
One can query RDF using XML, but this will be very difficult as RDF is much richer than XML
Is there an analogy between say XQuery and a query language for RDF?
RQL – an SQL-like language has been developed for RDF
Select from “RDF document” where some “condition”
Policies in RDF
How can policies be specified?
Should policies be specified as shown in the examples, extensions to RDF syntax?
Should policies be specified as RDF documents?
Is there an analogy to XPath expressions for RDF policies?
Ontology
Common definitions for any entity, person or thing
Several ontologies have been defined and are available for use
Defining common ontology for an entity is a challenge
Mappings have to be developed for multiple ontologies
Specific languages have been developed for ontologies
Why is RDF not sufficient?
RDF was developed because XML is not sufficient to specify semantics
E.g., class/subclass relationship
RDF also has limitations
Cannot express several other properties such as union, intersection, relationships, etc.
Need a richer language
Ontology languages were developed by the semantic web community for this purpose
Essentially RDF is not sufficient to specify ontologies
Security and Ontology
Ontologies used to specify security policies
Example: OWL to specify security policies
Choice between XML, RDF, OWL, RuleML, etc.
Security for Ontologies
Access control on Ontologies
Give access to certain parts of the Ontology
OWL: Background
It is a language for ontologies and relies on RDF
DARPA (Defense Advanced Research Projects Agency) developed the early language DAML (DARPA Agent Markup Language)
Europeans developed OIL (Ontology Inference Layer)
DAML+OIL combines both and was the starting point for OWL
OWL was developed by W3C
OWL Features
Subclass relationship
Class membership
Equivalence of classes
Classification
Consistency (e.g., x is an instance of A, A is a subclass of B, x is not an instance of B)
Three types of OWL: OWL-Full, OWL-DL, OWL-Lite
Automated tools for managing ontologies
Ontology engineering
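The consistency example in the feature list (x is an instance of A, A is a subclass of B, x is not an instance of B) can be checked with a short sketch. This is plain Python rather than an OWL reasoner, and the data layout and function names are assumptions.

```python
def superclasses(cls, subclass_of):
    """Collect every class reachable from `cls` through subclass links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def find_inconsistencies(subclass_of, instance_of, not_instance_of):
    """Return (individual, class) pairs that are both implied and denied."""
    clashes = []
    for individual, cls in instance_of:
        implied = {cls} | superclasses(cls, subclass_of)
        for other, denied in not_instance_of:
            if other == individual and denied in implied:
                clashes.append((individual, denied))
    return clashes

subclass_of = {"A": ["B"]}        # A is a subclass of B
instance_of = [("x", "A")]        # x is an instance of A
not_instance_of = [("x", "B")]    # x is asserted not to be an instance of B

print(find_inconsistencies(subclass_of, instance_of, not_instance_of))
```

The check reports the pair (x, B) because membership in B follows from the first two assertions and is denied by the third.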
OWL Specification (e.g., Classes)
Faculty and Academic Staff Member are the same
Associate Professor is not a professor
Associate professor is not an Assistant professor
OWL Specification (e.g., Property)
Courses are taught by Academic staff members
OWL Specification (e.g., Property Restriction)
All first year courses are taught only by professors
Policies in OWL
How can policies be specified?
Should policies be specified as shown in the examples, extensions to OWL syntax?
Should policies be specified as OWL documents?
Is there an analogy to XPath expressions for OWL policies?
Policies in OWL: Example
Level = L1
Level = L2
Logic and Inference
First order predicate logic
High level language to express knowledge
Well understood semantics
Logical consequence - inference
Proof systems exist
Sound and complete
OWL is based on a subset of logic: description logic
Why Rules?
RDF is built on XML and OWL is built on RDF
We can express subclass relationships in RDF; additional relationships can be expressed in OWL
However, reasoning power is still limited in OWL
Hence the need for rules, and subsequently for a markup language for rules so that machines can understand them
Example Rules
Studies(X,Y), Lives(X,Z), Loc(Y,U), Loc(Z,U) → HomeStudent(X)
i.e., if John studies at UTDallas, John lives on Campbell Road, and both Campbell Road and UTDallas are located in Richardson, then John is a home student
Note that
Person(X) → Man(X) or Woman(X) is not a rule in predicate logic
That is, if X is a person then X is either a man or a woman. This can be expressed in OWL
However, we can have a rule of the form
Person(X) and Not Man(X) → Woman(X)
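The home student rule is a conjunction of four conditions that share variables, so evaluating it amounts to a join over the facts. The sketch below uses plain Python sets; the facts about Mary are hypothetical and are included only to show a case where the rule does not fire.

```python
facts = {
    ("Studies", "John", "UTDallas"),
    ("Lives", "John", "Campbell Road"),
    ("Loc", "UTDallas", "Richardson"),
    ("Loc", "Campbell Road", "Richardson"),
    # Hypothetical counter-example: Mary lives in a different city.
    ("Studies", "Mary", "UTDallas"),
    ("Lives", "Mary", "Main Street"),
    ("Loc", "Main Street", "Plano"),
}

def home_students(facts):
    """Studies(X,Y), Lives(X,Z), Loc(Y,U), Loc(Z,U) -> HomeStudent(X)."""
    studies = {(x, y) for (p, x, y) in facts if p == "Studies"}
    lives = {(x, z) for (p, x, z) in facts if p == "Lives"}
    loc = {(a, u) for (p, a, u) in facts if p == "Loc"}
    return {
        x
        for (x, y) in studies
        for (x2, z) in lives if x2 == x       # same person X
        for (y2, u) in loc if y2 == y         # location U of the school Y
        if (z, u) in loc                      # the home Z is in the same U
    }

print(home_students(facts))
```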
Monotonic Rules
Mother(X,Y)
Mother(X,Y) → Parent(X,Y)
If Mary is the mother of John, then Mary is the parent of John
Syntax: Facts and Rules
A rule is of the form:
B1, B2, ..., Bn → A
That is, if B1, B2, ..., Bn hold then A holds
Logic Programming
Deductive logic programming is based on deduction
i.e., deduce new data from existing data and rules
e.g., the father of a father is a grandfather; John is the father of Peter and Peter is the father of James; therefore John is the grandfather of James
Inductive logic programming induces rules from the data
e.g., John is the father of Peter, Peter is the father of James, John is the grandfather of James, James is the father of Robert, Peter is the grandfather of Robert
From the above data, induce that the father of a father is a grandfather
Popular in Europe and Japan
Nonmonotonic Rules
If we have X and NOT X, we do not treat them as inconsistent, as we would in monotonic reasoning.
For example, consider an apartment that is acceptable to John. In general John is prepared to rent an apartment unless the apartment has fewer than two bedrooms, does not allow pets, etc. This can be expressed as follows:
→ Acceptable(X)
Bedroom(X,Y), Y<2 → NOT Acceptable(X)
NOT Pets(X) → NOT Acceptable(X)
Note that there could be a contradiction. But with nonmonotonic reasoning this is allowed.
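One common way to resolve such a contradiction is to let the more specific exception rules override the default. The sketch below assumes that priority, which the slide does not state, and the dictionary fields bedrooms and pets are hypothetical.

```python
# Exception rules; each one, when it fires, concludes NOT Acceptable(X).
EXCEPTIONS = [
    lambda apt: apt["bedrooms"] < 2,   # Bedroom(X,Y), Y<2 -> NOT Acceptable(X)
    lambda apt: not apt["pets"],       # NOT Pets(X) -> NOT Acceptable(X)
]

def acceptable(apartment):
    """Default: acceptable. Assumption: any exception that fires overrides it."""
    return not any(rule(apartment) for rule in EXCEPTIONS)

print(acceptable({"bedrooms": 3, "pets": True}))   # no exception fires
print(acceptable({"bedrooms": 1, "pets": True}))   # too few bedrooms
print(acceptable({"bedrooms": 2, "pets": False}))  # pets not allowed
```

The reasoning is nonmonotonic in the sense that learning more about an apartment, for example that it has one bedroom, withdraws a conclusion that held by default.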
Rule Markup
The various components of logic are expressed in the Rule Markup Language (RuleML)
Both monotonic and nonmonotonic rules can be represented
Example representation of the fact P(a): a is a parent
p
a
Policies in RuleML
p
a
Level = L
An Application: Horizontal Information Products at Elsevier
Elsevier is a publishing company based in Amsterdam
E.g., publisher of the journal Computer Standards and Interfaces, which has papers on all kinds of computer-related standards
Currently the journals and books are grouped by topics such as operating systems, databases, etc. (or at a higher level, Biology, Chemistry, etc.)
Where do we then put the journal Computer Standards and Interfaces?
Need horizontal groupings also
Horizontal Information Products at Elsevier
Semantic web technologies are being used by Elsevier
RDF for document representation
RDF for ontologies
Query language based on RDF to query the documents and the ontologies
E.g. Life Science Thesaurus EMTREE
Other publishing companies are following in Elsevier’s direction
Common Threads and Challenges
Common Threads
Building Ontologies for Semantics
XML for Syntax
Challenges
Scalability, Resolvability
Security policy specification, Securing the documents and ontologies
Developing applications for secure semantic web technologies
Automated tools for ontology management
Creating, maintaining, evolving and querying ontologies