Ontologies and the Grid : Ontologies and the Grid
Professor Carole Goble
University of Manchester UK
carole@cs.man.ac.uk
Professor Nigel Shadbolt
University of Southampton UK
nrs@ecs.soton.ac.uk
Acknowledgements : Acknowledgements We are grateful for material from the following collaborators:
Alexander Maedche, Steffen Staab, University of Karlsruhe *.
Natasha Noy Friedman, Deborah McGuinness, Stanford.
Robert Meersman, Vrije University of Brussels.
Mike Uschold, Boeing Corp.
Dieter Fensel, Vrije University of Amsterdam.
Terry Payne, Katia Sycara, CMU.
Asun Gomez-Perez, University of Madrid.
Judith Blake, The Jackson Laboratory.
The AKT team, University of Southampton.
Clive Embrey, Paul Smart, Epistemics Ltd.
Bertram Ludäscher, San Diego Super Computing Centre.
Alan Rector, Ian Horrocks, Chris Wroe, Angus Roberts, Sean Bechhofer, Norman Paton, Jeremy Rogers, University of Manchester.
The myGrid team.
* Especially grateful to Alex and Steffen.
Roadmap : Roadmap What is an ontology?
How should I represent them?
What are they used for?
Do I have any?
How do I get one?
Methodologies & Communities
Ontology lifecycle management
What are the issues?
Where do I go for more information?
Tools threaded throughout
Part I What is an Ontology? : Part I What is an Ontology? Definitions
Examples
Issues
Ontology : Ontology Semantics – the meaning of meaning.
Philosophical discipline, branch of philosophy that deals with the nature and the organisation of reality.
Science of Being (Aristotle, Metaphysics, IV,1)
What is being?
What are the features common to all beings?
The art of ranking things in genera and species is of no small importance and very much assists our judgment as well as our memory. You know how much it matters in botany, not to mention animals and other substances, or again moral and notional entities as some call them. Order largely depends on it, and many good authors write in such a way that their whole account could be divided and subdivided according to a procedure related to genera and species. This helps one not merely to retain things, but also to find them. And those who have laid out all sorts of notions under certain headings or categories have done something very useful.
Gottfried Wilhelm Leibniz, New Essays on Human Understanding
In computer science … : In computer science … An ontology is an explicit specification of a conceptualization [Gruber93]
An ontology is a shared understanding of some domain of interest. [Uschold, Gruninger96]
There are many definitions:
a formal specification (EXECUTABLE)
of a conceptualization of a domain (COMMUNITY)
of some part of the world that is of interest (APPLICATION)
Defines
A common vocabulary of terms
Some specification of the meaning of the terms
A shared understanding for people and machines
Why develop an ontology? : Why develop an ontology? To make domain assumptions explicit
Easier to change domain assumptions
Easier to understand and update legacy data
To separate domain knowledge from operational knowledge
Re-use domain and operational knowledge separately
A community reference for applications
To share a consistent understanding of what information means.
Ontologies: made for sharing : Ontologies: made for sharing Interoperating resources, be it by people or systems, requires a consistent shared understanding of what the information contained means
“... people [and machines] can’t share knowledge if they don’t speak a common language” [Davenport]
Disparate modeling paradigms, languages and software tools limit:
=> Interoperability
=> Knowledge sharing & reuse
Sharing info Sharing meaning : Sharing info Sharing meaning Metadata
Data describing the content and meaning of resources and services.
But everyone must speak the same language…
Terminologies
Shared and common vocabularies
For search engines, agents, curators, authors and users
But everyone must mean the same thing…
Ontologies
Shared and common understanding of a domain
Essential for search, exchange and discovery
Origin and History : Origin and History Humans require words (or at least symbols) to communicate efficiently. The mapping of words to things is only indirectly possible: we do it by creating concepts that refer to things.
The relation between symbols and things has been described in the form of the meaning triangle:
Human and machine communication : Human and machine communication
[Figure: the meaning triangle (Symbol, Concept, Things) applied to human agents (HA1, HA2) and machine agents (MA1, MA2). Human agents exchange symbols, e.g. "JAGUAR", via natural language and interpret them with internal models. Machine agents exchange symbols via protocols and interpret them with formal models. All agents commit to an ontology (an ontology description with formal semantics) for a specific domain, e.g. animals.] [Maedche et al., 2002]
Human and machine communication : Human and machine communication
An explicit description of a domain : An explicit description of a domain Concepts (class, set, type, predicate)
event, gene, gammaBurst, atrium, molecule, cat
Properties of concepts and relationships between them (slot)
Taxonomy: generalisation ordering among concepts: isA, partOf, subProcess
Relationship, Role or Attribute: functionOf, hasActivity, location, eats, size
Concepts : Concepts Primitive concepts:
properties are necessary
Globular protein must have hydrophobic core, but a protein with a hydrophobic core need not be a globular protein
Defined concepts:
properties are necessary + sufficient
Eukaryotic cells must have a nucleus. Every cell that contains a nucleus must be Eukaryotic.
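The distinction between primitive and defined concepts can be sketched in code. This is a minimal illustration, not a description logic reasoner: the class `Individual`, its fields and the helper names are hypothetical, and it assumes everything known about an individual is listed (a closed-world simplification).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    name: str
    asserted: frozenset  # concepts stated explicitly for this individual
    has: frozenset       # features the individual is known to have


def is_eukaryotic_cell(x: Individual) -> bool:
    # Defined concept: being a cell with a nucleus is necessary AND sufficient,
    # so membership can be inferred from the description.
    return "Cell" in x.asserted and "nucleus" in x.has


def is_globular_protein(x: Individual) -> bool:
    # Primitive concept: membership has to be asserted; a hydrophobic core
    # alone is not enough to infer it.
    return "GlobularProtein" in x.asserted


def breaks_necessary_condition(x: Individual) -> bool:
    # A necessary condition only lets us flag asserted members that lack it.
    return is_globular_protein(x) and "hydrophobic core" not in x.has


yeast_cell = Individual("yeast cell", frozenset({"Cell"}), frozenset({"nucleus"}))
some_protein = Individual(
    "some protein", frozenset({"Protein"}), frozenset({"hydrophobic core"})
)

print(is_eukaryotic_cell(yeast_cell))     # True: inferred from the definition
print(is_globular_protein(some_protein))  # False: never inferred, only asserted
```

The cell is classified automatically, while the protein with a hydrophobic core stays unclassified until someone asserts that it is globular.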
What is a concept? : What is a concept? Different communities have different notions of what a concept means:
Formal concept analysis (see http://www.math.tu-dresden.de/~ganter/fba.html) talks about formal concepts
Description Logics (see http://dl.kr.org/) talk about concept labels
ISO-704:2000 – Terminology Work: (see http://www.iso.ch/)
Often the classical notion of a frame in AI or a class in OO modeling is seen as equivalent to a concept.
An explicit description of a domain : An explicit description of a domain Constraints or axioms on properties and concepts:
value: integer
domain: cat
cardinality: at most 1
range: 0 <= X <= 100
oligonucleotides < 20 base pairs
cows are larger than dogs
cats cannot eat only vegetation
cats and dogs are disjoint
Values or concrete domains
integer, strings
20, tryptophan-synthetase
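Several of the constraint kinds listed above (value type, domain, cardinality, range, disjointness) can be checked mechanically. The sketch below is one possible encoding under stated assumptions: `PropertySpec`, `violations` and the property name "age" are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySpec:
    name: str
    domain: str           # concept the property applies to
    value_type: type      # concrete domain of the values
    max_cardinality: int  # "at most N" values per individual
    low: int              # inclusive lower bound of the range
    high: int             # inclusive upper bound of the range


# Disjointness axiom from the slide: cats and dogs are disjoint.
DISJOINT = [frozenset({"cat", "dog"})]

# Hypothetical property combining the slide's value, domain, cardinality
# and range constraints.
AGE = PropertySpec("age", "cat", int, 1, 0, 100)


def violations(concepts, values, spec):
    """Return the constraint violations for one individual.

    concepts: set of concept names the individual belongs to
    values:   list of values asserted for the property spec.name
    """
    problems = []
    for pair in DISJOINT:
        if pair <= concepts:
            problems.append("disjointness: " + " and ".join(sorted(pair)))
    if values and spec.domain not in concepts:
        problems.append(f"domain: {spec.name} applies only to {spec.domain}")
    if len(values) > spec.max_cardinality:
        problems.append(f"cardinality: at most {spec.max_cardinality}")
    for v in values:
        if not isinstance(v, spec.value_type):
            problems.append(f"value: {v!r} is not {spec.value_type.__name__}")
        elif not spec.low <= v <= spec.high:
            problems.append(f"range: {v} outside {spec.low}..{spec.high}")
    return problems


print(violations({"cat"}, [7], AGE))              # []
print(violations({"cat", "dog"}, [7, 120], AGE))  # three violations
```

Axioms such as "cows are larger than dogs" relate concepts to each other and would need a richer language than this per-property check.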
An explicit description of a domain : An explicit description of a domain Individuals or Instances
sulphur, trpA Gene, felix
Nominals
Concepts that cannot have instances
Instances that are used in conceptual definitions
ItalianDog = Dog bornIn Italy
Instances
An ontology = concepts+properties+axioms+values+nominals
A knowledge base = ontology+instances
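The two equations above suggest a simple data structure. The sketch below is an assumption-laden illustration: the classes `Ontology` and `KnowledgeBase`, the individuals "fido" and "rex", and the triple format are hypothetical, and axioms and values are left out for brevity. It shows the nominal Italy being used in the definition ItalianDog = Dog bornIn Italy.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set = field(default_factory=set)
    properties: set = field(default_factory=set)
    nominals: set = field(default_factory=set)  # individuals used in definitions
    # defined concept -> (parent concept, property, filler)
    definitions: dict = field(default_factory=dict)


@dataclass
class KnowledgeBase:
    ontology: Ontology
    types: dict = field(default_factory=dict)  # individual -> asserted concepts
    facts: set = field(default_factory=set)    # (subject, property, object)

    def instances_of(self, concept):
        # Start with individuals asserted to belong to the concept ...
        found = {i for i, cs in self.types.items() if concept in cs}
        # ... then add those that satisfy the concept's definition, if any.
        if concept in self.ontology.definitions:
            parent, prop, filler = self.ontology.definitions[concept]
            found |= {
                i for i in self.instances_of(parent)
                if (i, prop, filler) in self.facts
            }
        return found


onto = Ontology(
    concepts={"Dog", "ItalianDog"},
    properties={"bornIn"},
    nominals={"Italy"},
    definitions={"ItalianDog": ("Dog", "bornIn", "Italy")},
)
kb = KnowledgeBase(
    ontology=onto,
    types={"fido": {"Dog"}, "rex": {"Dog"}},
    facts={("fido", "bornIn", "Italy")},
)

print(kb.instances_of("ItalianDog"))  # {'fido'}
```

The ontology part can be shared and reused unchanged while each application supplies its own instances.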
Light and Heavy expressivity : Light and Heavy expressivity Lightweight
Concepts, atomic types
Is-a hierarchy
Relationships between concepts
Heavyweight
Metaclasses
Type constraints on relations
Cardinality constraints
Taxonomy of relations
Reified statements
Axioms
Semantic entailments
Expressiveness
Inference systems
A matter of rigour and representational expressivity
So what is an ontology? : So what is an ontology? A spectrum of increasing expressivity [Deborah McGuinness, Stanford]:
Catalog/ID
Terms/glossary
Thesauri
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value restrictions
Disjointness, inverse, part-of
General logical constraints
Example ontologies placed along the spectrum: Gene Ontology, Mouse Anatomy, EcoCyc, PharmGKB, TAMBIS, Arom
A semantic continuum : A semantic continuum [Mike Uschold, Boeing Corp]
Implicit: shared human consensus
Informal (explicit): text descriptions
Formal (for humans): semantics hardwired; used at runtime
Formal (for machines): semantics processed and used at runtime
Examples: Pump: “a device for moving a gas or liquid from one place or container to another”; (pump has (superclasses (…))
Further to the right means:
Less ambiguity
More likely to have correct functionality
Better inter-operation
Less hardwiring
More robust to change
More difficult
EcoCyc : EcoCyc
Gene Ontology http://www.geneontology.org : Gene Ontology http://www.geneontology.org “a dynamic controlled vocabulary that can be applied to all eukaryotes”
Built by the community for the community.
Three organising principles:
Molecular function, Biological process, Cellular component
Isa and Part of taxonomy – but not good!
~10,000 concepts
Lightweight ontology, Poor semantic rigour. Ok when small and used for annotation. Obstacle when large, evolving and used for mining.
Controlled vocabulary : Controlled vocabulary AGROVOC: Agricultural Vocabulary
Thesauri : Thesauri AAT: Art & Architecture Thesaurus
Thesauri & Classification : Thesauri & Classification UNSPSC: Product Classification
A comprehensive list is provided at
http://www.lub.lu.se/metadata/subject-help.html
Thesauri act as a good starting point for developing an ontology
UMLS (Unified Medical Language System) http://umlsks.nlm.nih.gov/ : UMLS (Unified Medical Language System) http://umlsks.nlm.nih.gov/ National Library of Medicine (NLM) database of medical terminology. Terms from several medical databases (MEDLINE, SNOMED International, Read Codes, etc.) are unified so that different terms are identified as the same medical concept.
Metathesaurus provides the concordance of medical concepts: 730,000 concepts, 1.5 million concept names in different source vocabularies
Specialist lexicon provides word synonyms, derivations, lexical variants, and grammatical forms of words used in Metathesaurus terms: 130,000 entries.
Semantic Network codifies the relationships (e.g. causality, "is a", etc.) among medical terms: 134 semantic types, 54 relationships.
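The idea of a concordance, where different source terms are identified as the same concept, can be sketched as a lookup table. This is a toy illustration only: the concept identifiers and the helper names are hypothetical and do not come from UMLS.

```python
# Hypothetical concept identifiers, each with the names used for the concept
# in different source vocabularies.
CONCEPT_NAMES = {
    "C-EXAMPLE-1": ["myocardial infarction", "heart attack"],
    "C-EXAMPLE-2": ["hypertension", "high blood pressure"],
}


def build_index(concept_names):
    """Invert concept -> names into a case-insensitive name -> concept index."""
    index = {}
    for concept_id, names in concept_names.items():
        for name in names:
            index[name.lower()] = concept_id
    return index


INDEX = build_index(CONCEPT_NAMES)


def concept_of(term):
    # Returns None for terms that are not in any source vocabulary.
    return INDEX.get(term.strip().lower())


def same_concept(a, b):
    ca, cb = concept_of(a), concept_of(b)
    return ca is not None and ca == cb


print(same_concept("Heart attack", "myocardial infarction"))  # True
print(same_concept("heart attack", "hypertension"))           # False
```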
KA2 Ontology : KA2 Ontology Ontology that models the knowledge acquisition community (its researchers, topics, products, etc.)
Small, application specific ontology:
73 concepts
124 relations
50 rules
Available at:
http://www.aifb.uni-karlsruhe.de/WBS/broker/ka-onto.onto
Application: Semantic Community Web Portals: http://ka2portal.aifb.uni-karlsruhe.de
Successor ontology: SWRC/OntoWeb community ontology [Staab et al., 00]
[Decker et al, 98]
The KA ontology : The KA ontology
Web-KB project at CMU : Web-KB project at CMU http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/ [Craven et al, 98]
Meta-Ontologies : Meta-Ontologies Meta-ontologies describe metadata about ontologies and their associated elements.
Examples:
Interoperability issues between two ontologies, e.g. Semantic Translation Ontology or RDFT Ontology
Capturing changes supporting ontology evolution using an evolution ontology
Taxonomy remark 1 : Taxonomy remark 1 The world is not a tree, it’s a lattice
[Figure: a lattice over the concepts animal, rodent, cow, cat, mouse, dog, domestic, vermin, wild, pet, working]
Taxonomy remark 2 : Taxonomy remark 2 What does the taxonomy mean?
Concept A is a parent of concept B iff every instance of B is also an instance of A
Superset/subset
ICONCLASS
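The two taxonomy remarks can be combined in a small sketch: concepts may have several parents (a lattice, not a tree), and a parent's instance set must contain each child's instance set. The concept names come from the lattice slide, but the edges between them are assumptions, since the original figure is not reproduced here; the individual "jerry" is also hypothetical.

```python
# Assumed parent links; a concept may have more than one parent.
PARENTS = {
    "rodent": {"animal"},
    "domestic": {"animal"},
    "wild": {"animal"},
    "pet": {"domestic"},
    "working": {"domestic"},
    "vermin": {"wild"},
    "cow": {"domestic"},
    "cat": {"pet"},
    "dog": {"pet", "working"},
    "mouse": {"rodent", "vermin"},
}

# Most specific concept asserted for each individual.
ASSERTED = {"felix": "cat", "jerry": "mouse"}


def ancestors(concept):
    """All concepts reachable by following parent links upwards."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(a, b):
    """True if a is b itself or an ancestor of b."""
    return a == b or a in ancestors(b)


def extension(concept):
    """Individuals that are instances of the concept, directly or by inheritance."""
    return {i for i, c in ASSERTED.items() if subsumes(concept, c)}


print(sorted(ancestors("mouse")))            # ['animal', 'rodent', 'vermin', 'wild']
print(extension("cat") <= extension("pet"))  # True: subset relation holds
```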
Classification trickiness : Classification trickiness "On those remote pages it is written that animals are divided into:
a. those that belong to the Emperor
b. embalmed ones
c. those that are trained
d. suckling pigs
e. mermaids
f. fabulous ones
g. stray dogs
h. those that are included in this classification
i. those that tremble as if they were mad
j. innumerable ones
k. those drawn with a very fine camel's hair brush
l. others
m. those that have just broken a flower vase
n. those that resemble flies from a distance"
The Celestial Emporium of Benevolent Knowledge, Borges
Classification is task and culture specific : Classification is task and culture specific Dyirbal classification of objects in the universe:
Bayi: men, kangaroos, possums, bats, most snakes, most fishes, some birds, most insects, the moon, storms, rainbows, boomerangs, some spears, etc.
Balan: women, anything connected with water or fire, bandicoots, dogs, platypus, echidna, some snakes, some fishes, most birds, fireflies, scorpions, crickets, the stars, shields, some spears, some trees, etc.
Balam: all edible fruit and the plants that bear them, tubers, ferns, honey, cigarettes, wine, cake.
Bala: parts of the body, meat, bees, wind, yamsticks, some spears, most trees, grass, mud, stones, noises, language, etc.
Ontology desiderata : Ontology desiderata
Precision: formal, unambiguous, high fidelity
Systematic: control, quality, clarity
Explicitness: clarity, commitment, reuse
Flexibility: expressivity, evolution
Ontology description space : Ontology description space
Coverage: upper, domain general, domain specific
Expressivity: taxonomy, relationships, axioms
Knowledge representation languages and models: words, OO, frames, logics
Inference mechanisms: classification, coherency
Coverage : Coverage [Guarino, 98]
[Figure labels: top-level (upper) ontology, domain ontology, task & problem-solving ontology, application ontology, Grid service ontology]
Top-level ontologies describe very general concepts like space, time, event, which are independent of a particular problem or domain. It seems reasonable to have unified top-level ontologies for large communities of users.
Domain ontologies describe the vocabulary related to a generic domain by specializing the concepts introduced in the top-level ontology.
Task ontologies describe the vocabulary related to a generic task or activity by specializing the top-level ontologies.
Application ontologies are the most specific ontologies. Concepts in application ontologies often correspond to roles played by domain entities while performing a certain activity.
Specific ontologies : Specific ontologies Domain-oriented
Domain-specific
Medicine => cardiology => rhythm disorders
E. coli,
Domain generalizations
components, organs, documents, gene function
Task-oriented
task specific
configuration design, instruction, planning, annotation analysis
task generalisations
problem solving methods
e.g. UPML http://www.ibrow.org/
Upper Ontologies : Upper Ontologies Top Level ontologies
WordNet
EuroWordNet
CyC
SENSUS
Sowa Top Level
GUM
Etc…
A.k.a. core, generic or reference
Common high level concepts
“Physical”, “Abstract”, “Structure”, “Substance”
Useful for ontology re-use
Important when generating or analysing natural language expressions
Example upper ontologies : Example upper ontologies Sowa’s upper ontology http://www.bestweb.net/~sowa/ontology
Example upper ontologies : Example upper ontologies Generalised Upper Model 2.0 http://www.darmstadt.gmd.de/publish/komet/gen-um/newUM.html
WordNet (Miller et al.) : WordNet (Miller et al.) http://www.cogsci.princeton.edu/~wn/
WordNet : WordNet
Problems with current lexicons : Problems with current lexicons In WordNet: clear that news_item is-a item
Maybe acceptable that news_item is-a part
But what of news_item is-a relation !?
depends on context, role played…
But: “role” and “context” knowledge is missing
Also: some lexicographer’s bias is present
CYC (Lenat & Guha) : CYC (Lenat & Guha) © CYCORP, Inc. http://www.cyc.com/
DAML-S http://www.daml.org : DAML-S http://www.daml.org US DARPA Agent Markup Language – Services
An upper ontology for Services
Multi-Classification & Multi-Perspective : Multi-Classification & Multi-Perspective phrase-based classification
ID =GO:0005469 (decommissioned concept)
succinate (cytosol) to fumarate (mitochondrial) transporter is a kind of transporter
but it should also be classified on the basis of its…
location in the mitochondrial membrane
orientation of the transporter
molecules transported
relationships to biological processes e.g. metabolism
Need to express these things and get the multi-axial classification sorted
Pre-enumeration vs Post-coordination : Pre-enumeration vs Post-coordination Pre-enumeration – an attempt to identify and organise all the concepts pre-hoc
Enumerating the noun phrases of the English language
Thesauri, object models
Post-coordination – controlled combination of terms when needed
A vocabulary and a grammar.
The International Statistical Classification of Diseases and Related Health Problems, 10th revision : The International Statistical Classification of Diseases and Related Health Problems, 10th revision
The exploding bicycle : The exploding bicycle Number of codes in each coding scheme:
ICD-9 (E826): 8
READ-2 (T30..): 81
READ-3: 87
ICD-10 (V10-19): 587
V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
Defusing the exploding bicycle: 500 codes in pieces : Defusing the exploding bicycle: 500 codes in pieces 10 things to hit…
Pedestrian / cycle / motorbike / car / HGV / train / unpowered vehicle / a tree / other
5 roles for the injured…
Driving / passenger / cyclist / getting in / other
5 activities when injured…
resting / at work / sporting / at leisure / other
2 contexts…
In traffic / not in traffic
V12.24 Pedal cyclist injured in collision with two- or three-wheeled motor vehicle, unspecified pedal cyclist, nontraffic accident, while resting, sleeping, eating or engaging in other vital activities
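The arithmetic behind "500 codes in pieces" can be made concrete: a small vocabulary per axis plus a rule for combining terms replaces a long pre-enumerated list. The sketch below uses the terms listed above. Note that only nine of the ten "things to hit" are named, so this enumeration yields 9 × 5 × 5 × 2 = 450 combinations rather than 500. The axis names and the `compose` helper are hypothetical.

```python
from itertools import product

# Controlled vocabulary, one short list per axis.
AXES = {
    "hit": ["pedestrian", "cycle", "motorbike", "car", "HGV", "train",
            "unpowered vehicle", "a tree", "other"],
    "role": ["driving", "passenger", "cyclist", "getting in", "other"],
    "activity": ["resting", "at work", "sporting", "at leisure", "other"],
    "context": ["in traffic", "not in traffic"],
}


def compose(hit, role, activity, context):
    """Post-coordination: build one description from a term per axis."""
    choice = {"hit": hit, "role": role, "activity": activity, "context": context}
    for axis, term in choice.items():
        if term not in AXES[axis]:
            raise ValueError(f"{term!r} is not in the {axis} vocabulary")
    return (hit, role, activity, context)


# Pre-enumeration: every combination listed up front as its own code.
pre_enumerated = list(product(*AXES.values()))

# Post-coordination only needs the terms themselves.
vocabulary_size = sum(len(terms) for terms in AXES.values())

print(len(pre_enumerated))  # 450
print(vocabulary_size)      # 21

# Roughly the situation described by the V12.24 example above.
accident = compose("motorbike", "cyclist", "resting", "not in traffic")
print(accident in pre_enumerated)  # True
```

Adding an eleventh thing to hit costs one new term under post-coordination, but 50 new codes under pre-enumeration.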
Goodbye to picking lists… : Goodbye to picking lists… A Cycling Accident is described by combining:
What you hit
Your Role
Activity
Location
Coordination: Conceptual Lego : Coordination: Conceptual Lego acute chronic ischaemic deletion bacterial polymorphism cell protein gene expression
Conceptual Lego : Conceptual Lego “SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…” “Hand which is anatomically normal”
FAQ : FAQ What's the difference between a database schema and an ontology?
Is there only one ontology?
Is development one off?
Do I need to first get the ontology right before I use it?
How do I represent an ontology?
Current ontology standardization initiatives : Current ontology standardization initiatives SUO (SUO consortium proposal) http://suo.ieee.org/
Global WordNet Consortium
ISO SC4
eCommerce standards (UCEC, ebXML,…)
Cultural repositories standards (Harmony, CIDOC)
CEN/ISSS EC WG (MULECO)
DAML (especially DAML-S) http://www.daml.org/
W3C Web Ontology Working Group
http://www.w3.org/2001/sw/WebOnt/
Projects
OntoWeb http://www.ontoweb.org/
WonderWeb http://wonderweb.semanticweb.org/
Further Reading : Further Reading