Ontologies & Machine Learning : Marko Grobelnik
Blaz Fortuna
Jozef Stefan Institute, Slovenia
Aim of the talk : Aim of the talk
What areas of research are we trying to target? : Text-Mining, Link-Analysis and other analytic techniques, which deal mainly with extracting and aggregating information from raw data
…they maximize the quality of the extracted information
Semantic Web, which deals mainly with the integration and representation of the given data
…it maximizes the reusability of the given information
The two areas are highly complementary, and both are necessary for operational information engineering
Ontologies : Ontologies
What is an Ontology? : Ontologies are the main formal objects within the Semantic Web and, more recently, within Text Analytics
Ontologies originate in philosophy, but within computer science an ontology is a data model that represents a domain and is used to reason about the objects in that domain and the relations between them
…their main aim is to describe and represent an area of knowledge in a formal way
What is an Ontology? : Formal, explicit specification of a shared conceptualisation.
Formal: machine processable
Explicit: concepts, properties, relations, functions
Shared: consensual knowledge
Conceptualisation: abstract model of some domain
Frank van Harmelen, 2003: http://seminars.ijs.si/sekt
Which elements represent an ontology? : An ontology typically consists of the following elements:
Instances – the basic or “ground level” objects
Classes – sets, collections, or types of objects
Attributes – properties, features, characteristics, or parameters that objects can have and share
Relations – ways that objects can be related to one another
Analogies between ontologies and relational databases:
Instances correspond to records
Classes correspond to tables
Attributes correspond to record fields
Relations correspond to relations between the tables
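The analogy above can be made concrete with a minimal Python sketch. All class, attribute, and instance names below (`OntClass`, `Person`, `alice`, `worksAt`, and so on) are hypothetical and chosen only for illustration; they do not come from any particular ontology language or toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class OntClass:
    """A class: a set or type of objects (analogous to a table)."""
    name: str
    attributes: list  # attribute names (analogous to record fields)


@dataclass
class Instance:
    """A ground-level object (analogous to a record)."""
    name: str
    cls: OntClass
    values: dict = field(default_factory=dict)


@dataclass
class Relation:
    """A named link between two instances (analogous to a relation between tables)."""
    name: str
    source: Instance
    target: Instance


# Hypothetical example data.
person = OntClass("Person", ["name", "birth_year"])
institute = OntClass("Institute", ["name", "country"])

alice = Instance("alice", person, {"name": "Alice"})
lab = Instance("lab", institute, {"name": "Example Lab", "country": "Slovenia"})
works_at = Relation("worksAt", alice, lab)


def as_row(inst):
    """Flatten an instance into a table-like row, one column per class attribute."""
    return {attr: inst.values.get(attr) for attr in inst.cls.attributes}
```

Calling `as_row(alice)` yields a record with one entry per attribute of `Person`, with `None` for attributes that were not filled in, which mirrors a row with a NULL field.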
Levels of Semantic-Web formalisms : The W3C “Semantic Web Layer Cake” shows representation levels and related technologies
Figure annotations: character-level encoding; addressing the information; infrastructure; different levels of semantic abstraction; higher level of representation and reasoning (OWL, RIF)
Top-down modeling of knowledge : Cyc system
Cyc …a little bit of historical context : Older AI-ers know about Cyc:
…one of the boldest attempts in AI history to encode common sense knowledge in one KB
The project started in 1984 at Stanford as the US response to Japan’s project on “5th Generation Computer Systems”
In 1994 the company Cycorp was established (in Austin, TX)
In 2005 the Cyc KB was opened and made available for research
OpenCyc (http://www.opencyc.org/)
ResearchCyc (http://research.cyc.com/)
In 2006 Cyc-Europe was established (in Ljubljana, Slovenia)
By 2006, about $80M had been spent on constructing the KB
The Cyc Ontology : Cycorp © 2006
Figure labels: general knowledge about various domains; specific data, facts, and observations
…part of Cyc Ontology on Human Beings : …part of Cyc Ontology on Human Beings
Structure of Cyc Ontology : Knowledge base layers
Upper Ontology: abstract concepts, e.g. THING, INDIVIDUAL, TEMPORAL-THING, EVENT
Core Theories: space, time, causality, …, e.g. “For all events a and b, a causes b implies a precedes b”
Domain-Specific Theories, e.g. “For any mammal m and any anthrax bacteria a, m’s being exposed to a causes m to be infected by a.”
Facts: instances, e.g. “John is a person infected by anthrax.”
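To illustrate how rules from different layers combine with instance-level facts, here is a small forward-chaining sketch in Python. It is not Cyc's inference engine: the tuple encoding, the predicate names (`isa`, `exposedTo`, `infectedBy`, `causes`, `precedes`), and the individuals (`John`, `B1`, `Exposure1`, `Infection1`) are simplifying assumptions made for this example, and the domain rule is reduced to a direct "exposed implies infected" step.

```python
# Facts are 3-tuples: (predicate, arg1, arg2). All names are illustrative.
FACTS = {
    ("isa", "John", "Mammal"),
    ("isa", "B1", "AnthraxBacteria"),
    ("exposedTo", "John", "B1"),
    ("causes", "Exposure1", "Infection1"),
}


def causes_implies_precedes(facts):
    """Core-theory rule: a causes b implies a precedes b."""
    return {("precedes", a, b) for (p, a, b) in facts if p == "causes"}


def exposure_implies_infection(facts):
    """Domain-specific rule (simplified): a mammal exposed to anthrax gets infected."""
    mammals = {x for (p, x, c) in facts if p == "isa" and c == "Mammal"}
    anthrax = {x for (p, x, c) in facts if p == "isa" and c == "AnthraxBacteria"}
    return {
        ("infectedBy", m, a)
        for (p, m, a) in facts
        if p == "exposedTo" and m in mammals and a in anthrax
    }


def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts are derived."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


derived = forward_chain(FACTS, [causes_implies_precedes, exposure_implies_infection])
```

After the call, `derived` contains the original facts plus the conclusions contributed by the core-theory rule and the domain-specific rule.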
Cyc KB Extended w/Domain Knowledge : Cycorp © 2006
Figure labels: general knowledge about terrorism; specific data, facts, and observations about terrorist groups and activities
An example of Psychoanalyst’s Cyc taxonomic context :
#$Psychoanalyst (lexical representation: “psychoanalyst”, “psychoanalysts”)
specialization-of #$MedicalCareProfessional
| specialization-of #$HealthProfessional
| specialization-of #$Professional-Adult
| specialization-of #$Professional
specialization-of #$Psychologist
| specialization-of #$Scientist
| specialization-of #$Researcher
| | specialization-of #$PersonWithOccupation
| | | specialization-of #$Person
| | | | specialization-of #$HomoSapiens
| | | | | instance-of #$BiologicalSpecies
| | | | | | specialization-of #$BiologicalTaxon
| | | | | instance-of #$SomeSampleKindsOfMammal-Biology-Topic
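A taxonomic context like this one can be queried by following the specialization-of links transitively. The Python sketch below assumes one reading of the indentation above (each `|` level is taken as a child of the closest preceding line one level up); the extraction may have flattened some levels, so the exact edges are an assumption. Only specialization-of links are encoded, and the dictionary name `GENLS` and the function name are illustrative.

```python
# specialization-of edges, under the assumed reading of the indentation above.
GENLS = {
    "Psychoanalyst": ["MedicalCareProfessional", "Psychologist"],
    "MedicalCareProfessional": ["HealthProfessional", "Professional-Adult", "Professional"],
    "Psychologist": ["Scientist", "Researcher"],
    "Researcher": ["PersonWithOccupation"],
    "PersonWithOccupation": ["Person"],
    "Person": ["HomoSapiens"],
}


def all_generalizations(concept, genls=GENLS):
    """Return every concept reachable via specialization-of, breadth first."""
    seen = []
    queue = list(genls.get(concept, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(genls.get(current, []))
    return seen


ancestors = all_generalizations("Psychoanalyst")
```

With these edges, a question such as "is a psychoanalyst a person?" reduces to checking whether `"Person"` appears in `ancestors`.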
Example Vocabulary: Senses of ‘In’ relation (1/3) : Cycorp © 2006
Can the inner object leave by passing between members of the outer group?
Yes -- Try #$in-Among
Example Vocabulary: Senses of ‘In’ relation (2/3) : Cycorp © 2006
Does part of the inner object stick out of the container?
None of it. -- Try #$in-ContCompletely
Yes -- Try #$in-ContPartially
If the container were turned around could the contained object fall out?
No -- Try #$in-ContClosed
Yes -- Try #$in-ContOpen
Example Vocabulary: Senses of ‘In’ relation (3/3) : Cycorp © 2006
Can it be removed by pulling, if enough force is used, without damaging either object?
No -- Try #$in-Snugly or #$screwedIn
Cyc’s front-end: “Cyc Analytic Environment” – querying (1/2) : Screenshot labels: text query; query (semi-)automatically translated into First Order Logic; answers to the query
Cyc’s front-end: “Cyc Analytic Environment” – justification (2/2) : Screenshot labels: query and answer justification; sources for reasoning and justification
Slide26 : Document Tagging
Annotating the document with CycKB : Annotating the document with CycKB
Probabilistic Concept Tagging : “The plants that produced the cranes that NASA deployed in space in the 1990s are in Canada.”
The plants (#$FactoryBuildingComplex 0.8817 #$Plant 0.0967) that produced (#$Production-Generic 0.6017) the cranes (#$Crane-MotorizedDevice 0.9387 #$Crane-Bird 0.0408) that NASA (#$NASA) deployed (#$DeployingMaterial) in space (#$OuterSpace 0.51 #$SpaceInAHOC 0.1473 #$ReservedSpaceRegion 0.0459 #$Area 0) in the 1990s ((#$DecadeFn 199)) are in Canada (#$Canada 1).
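One simple way to consume such a tagging is to keep, for each ambiguous word, the concept with the highest score, and to abstain when no candidate is confident enough. The sketch below copies the scores from the example sentence; the function name `best_sense` and the `threshold` parameter are assumptions for illustration and are not part of the Cyc tagger.

```python
# Candidate concepts and scores copied from the tagged sentence above.
TAGGED = {
    "plants": {"FactoryBuildingComplex": 0.8817, "Plant": 0.0967},
    "cranes": {"Crane-MotorizedDevice": 0.9387, "Crane-Bird": 0.0408},
    "space": {
        "OuterSpace": 0.51,
        "SpaceInAHOC": 0.1473,
        "ReservedSpaceRegion": 0.0459,
        "Area": 0.0,
    },
}


def best_sense(candidates, threshold=0.5):
    """Return the top-scoring concept, or None if its score is below the threshold."""
    concept, score = max(candidates.items(), key=lambda item: item[1])
    return concept if score >= threshold else None


resolved = {word: best_sense(senses) for word, senses in TAGGED.items()}
```

Raising the threshold trades coverage for precision: with `threshold=0.6` the word "space" would be left untagged, because its best candidate scores only 0.51.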
Slide30 : Knowledge Template Induction
Slide32 : Learning Facts by Search
Learning Facts by Search : Query: “What are symptoms of Whooping Cough?”
(symptomOfAilment WhoopingCough ?SYMP)
Partial English sentences:
“A symptom of whooping cough is ___”
“Whooping cough can cause ___”
“A symptom of Pertussis Bordetella is ___”
“Symptoms (such as ____) of whooping cough”
Parsing Results : “… symptoms of pertussis such as fever and a dry cough …”
Looking for something that matches the argument constraints on the predicate…
Parse back into existing CycL concepts:
(symptomOfAilment WhoopingCough Fever)
(symptomOfAilment WhoopingCough Coughing-AilmentCondition)
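The two steps, turning a query into partial English sentences and parsing a retrieved sentence back into candidate assertions, can be sketched in Python as follows. This is a toy stand-in, not the Cyc pipeline: the templates are modelled on the partial sentences shown for the whooping cough query, while the regular expression and the two small lexicons (`AILMENTS`, `SYMPTOMS`) are hypothetical placeholders for Cyc's natural-language lexicon.

```python
import re

# Templates modelled on the partial English sentences for the query.
TEMPLATES = [
    "A symptom of {ailment} is ___",
    "{ailment} can cause ___",
    "Symptoms (such as ___) of {ailment}",
]

# Hypothetical lexicons mapping English phrases to CycL concept names.
AILMENTS = {"whooping cough": "WhoopingCough", "pertussis": "WhoopingCough"}
SYMPTOMS = {"fever": "Fever", "dry cough": "Coughing-AilmentCondition"}

PATTERN = re.compile(
    r"symptoms of (?P<ailment>[\w ]+?) such as (?P<items>[\w ,]+)", re.IGNORECASE
)


def search_strings(ailment_names):
    """Instantiate every template for every known name of the ailment."""
    return [t.format(ailment=name) for name in ailment_names for t in TEMPLATES]


def extract_facts(sentence):
    """Parse one retrieved sentence back into candidate symptomOfAilment facts."""
    match = PATTERN.search(sentence)
    if not match:
        return []
    ailment = AILMENTS.get(match.group("ailment").strip().lower())
    if ailment is None:
        return []
    facts = []
    for item in re.split(r",|\band\b", match.group("items")):
        phrase = re.sub(r"^(a|an|the)\s+", "", item.strip().lower())
        concept = SYMPTOMS.get(phrase)  # phrases without a known concept are skipped
        if concept:
            facts.append(("symptomOfAilment", ailment, concept))
    return facts


facts = extract_facts("… symptoms of pertussis such as fever and a dry cough …")
```

Phrases that do not map to a known concept are silently dropped here; in the approach described on the slides, the argument constraints of the predicate play that filtering role.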
KB Consistency Check : Throw out provably wrong answers
Explicitly: perform one step of inference to throw out facts inconsistent with KB
Implicitly: don’t even look at things that don’t match argument constraints
Skip already known (provably right) knowledge
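The filtering steps above amount to a triage of each candidate fact. The following Python sketch shows one possible shape for that triage. It is heavily simplified: the type table, the argument constraints, and the sets of known-true and known-false facts are hypothetical stand-ins for the KB, `SymptomX` is a placeholder concept, and the one-step inference is reduced to a set lookup.

```python
# Hypothetical stand-ins for what the KB knows.
ARG_TYPES = {"symptomOfAilment": ("Ailment", "Symptom")}
ISA = {
    "WhoopingCough": "Ailment",
    "Fever": "Symptom",
    "Coughing-AilmentCondition": "Symptom",
    "SymptomX": "Symptom",
    "Canada": "Country",
}
KNOWN_TRUE = {("symptomOfAilment", "WhoopingCough", "Coughing-AilmentCondition")}
KNOWN_FALSE = {("symptomOfAilment", "WhoopingCough", "SymptomX")}


def triage(fact):
    """Classify a candidate fact as ill-typed, already-known, inconsistent, or novel."""
    predicate, *args = fact
    expected = ARG_TYPES.get(predicate)
    # Implicit check: arguments must match the predicate's argument constraints.
    if expected is None or len(args) != len(expected):
        return "ill-typed"
    if any(ISA.get(arg) != typ for arg, typ in zip(args, expected)):
        return "ill-typed"
    # Skip knowledge the KB already holds.
    if fact in KNOWN_TRUE:
        return "already-known"
    # Explicit check: stands in for one step of inference against the KB.
    if fact in KNOWN_FALSE:
        return "inconsistent"
    return "novel"
```

Only facts classified as `"novel"` would move on to further verification.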
Initial Results : Figure labels: 348 queries; 817 searches; 1016 sentences found; 388 sentences rejected; 3474 searches; 566 rejected; 61 sentences asserted
Total Queries: 348
Web Searches: 4290
Initial: 817
verification: 3474
Sentences Found: 1014
Rejected Results: 954
inconsistent with KB: 4
already known to the KB: 384
rejected using Google: 566
Novel formulæ: 61
Slide37 : Microtheory (context) Suggestion
Automatic Ontology Placement : Cyc’s knowledge is contextualized into internally consistent Microtheories (Mts). New knowledge is inserted into that hierarchy manually by ontologists. An Mt Suggestor recommends the microtheories (contexts) in which new knowledge is best placed
MT Suggestor Approach : Problem is similar to hierarchical text classification
Much less data per instance
Very rich (deep) structure
Approaches:
Generative Bayes model
Multiclass SVM classification
Inputs:
Each assertion is broken into atomic terms
Each unique term is given an index
Each assertion is a list of term indices (as few as 3 for binary assertions, as many as 180 for complex rules)
Training examples are indexed identically
Outputs:
SVM Classification: outputs the index of the best Mt
Bayes model: outputs probability of fit for each Mt
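As a rough illustration of the Bayes variant, the sketch below indexes the terms of each assertion and trains a multinomial Naive Bayes model with add-one smoothing that returns a probability of fit for each Mt. The slides only say "generative Bayes model", so the choice of Naive Bayes, the smoothing, the Mt names (`MedicineMt`, `GeographyMt`), and the four training assertions are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def index_terms(assertions):
    """Give each unique term an index; return the vocabulary and the indexed assertions."""
    vocab = {}
    indexed = [[vocab.setdefault(term, len(vocab)) for term in terms] for terms in assertions]
    return vocab, indexed


class NaiveBayesMt:
    """Multinomial Naive Bayes over term indices (a stand-in for the generative model)."""

    def fit(self, indexed, mts, vocab_size):
        self.vocab_size = vocab_size
        self.mt_counts = Counter(mts)
        self.term_counts = defaultdict(Counter)
        for terms, mt in zip(indexed, mts):
            self.term_counts[mt].update(terms)
        return self

    def predict_proba(self, terms):
        """Return a probability of fit for each Mt, normalised to sum to 1."""
        total = sum(self.mt_counts.values())
        log_scores = {}
        for mt, count in self.mt_counts.items():
            denom = sum(self.term_counts[mt].values()) + self.vocab_size
            score = math.log(count / total)
            for term in terms:
                score += math.log((self.term_counts[mt][term] + 1) / denom)
            log_scores[mt] = score
        top = max(log_scores.values())
        weights = {mt: math.exp(s - top) for mt, s in log_scores.items()}
        norm = sum(weights.values())
        return {mt: w / norm for mt, w in weights.items()}


# Hypothetical training assertions, already broken into atomic terms.
TRAIN = [
    (["isa", "Fever", "Symptom"], "MedicineMt"),
    (["symptomOfAilment", "WhoopingCough", "Fever"], "MedicineMt"),
    (["isa", "Canada", "Country"], "GeographyMt"),
    (["capitalCity", "Canada", "Ottawa"], "GeographyMt"),
]

vocab, indexed = index_terms([terms for terms, _ in TRAIN])
model = NaiveBayesMt().fit(indexed, [mt for _, mt in TRAIN], len(vocab))

# A new assertion is indexed identically; terms never seen in training are dropped.
new_assertion = ["symptomOfAilment", "Flu", "Fever"]
probs = model.predict_proba([vocab[t] for t in new_assertion if t in vocab])
suggested = max(probs, key=probs.get)
```

An SVM-based variant would replace `predict_proba` with a decision function and return only the best Mt, which matches the difference in outputs listed above.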
Results : 89,000 assertions, 64,000 distinct terms, 28 Mts
10-fold cross-validation
Figure: Precision, Recall, and F1 Score
Slide41 : Induction of new rules with ILP
Learning Higher-Order Knowledge : Learning Rules with Inductive Logic Programming
Integrating ALEPH ILP system into Cyc
Verification (asking or experimenting)
Asking a human directly
Natural language processing of text
Probabilistic analysis
“All the mothers I know about are female… Maybe all mothers are female?”
Performing Induction in Cyc : Integrate Cyc and Aleph
FOL-ify CycL and export to Aleph
Produce ILP learning bias from background knowledge
Based on semantic content of predicate knowledge
CycL-ify, review, and assert ILP-produced rules
Figure labels: Facts & Background; First-orderized Facts; Background Knowledge; Perform Induction; Induced Rules; Evaluate Results; Good Rules
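The flavour of rule induction ("all the mothers I know about are female, so maybe all mothers are female?") can be shown with a toy Python sketch. This is not Aleph and not full ILP: it only proposes unary rules of the form body(X) implies head(X) and scores them by how many known individuals satisfy them. The predicates, the individuals, and the `min_support` and `min_confidence` parameters are hypothetical.

```python
from itertools import permutations

# Hypothetical facts: predicate -> set of individuals known to satisfy it.
FACTS = {
    "mother": {"Ann", "Beth", "Cara"},
    "female": {"Ann", "Beth", "Cara", "Dana"},
    "parent": {"Ann", "Beth", "Cara", "Ed"},
}


def induce_unary_rules(facts, min_support=2, min_confidence=1.0):
    """Propose rules body(X) -> head(X) that hold for the known individuals."""
    rules = []
    for body, head in permutations(sorted(facts), 2):
        covered = facts[body]
        if len(covered) < min_support:
            continue  # too few examples to generalise from
        confidence = len(covered & facts[head]) / len(covered)
        if confidence >= min_confidence:
            rules.append((body, head, confidence))
    return rules


rules = induce_unary_rules(FACTS)
```

Induced rules remain hypotheses, which is why the verification step (asking a human, text processing, or probabilistic analysis) follows induction.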
Sample Rules Produced :
(implies
  (and
    (cyclistPrimaryProject ?KE ?PROJECT)
    (projectTasks ?PROJECT ?TASK)
    (requestedEffortPercent ?TASK ?KE ?X))
  (assignedEffortPercent ?TASK ?KE ?X))
(implies
  (and
    (projectManagers ?PROJECT ?AGENT))
  (projectParticipants ?PROJECT ?AGENT))
(implies
  (and
    (primarySupervisor ?AGENT ?AGENT-1)
    (requestedEffortPercent ?TASK ?AGENT ?X)
    (projectManagers ?PROJECT ?AGENT-1)
    (projectTasks ?PROJECT ?TASK))
  (assignedEffortPercent ?TASK ?AGENT ?X))
Sample Rules Produced : If someone’s time has been requested for a task by that person’s primary project, the time will be assigned.
People participate in the projects they manage. (One hopes!)
People are assigned to tasks requested of them by projects managed by that person’s direct supervisor.
These are only patterns, not always guaranteed to be true – but they’re useful and common-sensical.
Bottom-up modeling of knowledge : OntoGen system
Underlying concepts : Underlying concepts
Main Features : Main Features
Ontology management : Screenshot labels: concept hierarchy; list of suggested sub-concepts; ontology visualization; selected concept
Concept management : Screenshot labels: concept’s details; concept’s instance management; selected concept; keywords; selected instance
Active Learning for concept learning : Active learning algorithm based on distance to the SVM hyperplane
The first few labelled documents are bootstrapped from a query search
Instances for the final concept are selected using the final SVM model
Figure labels: Query; SVM; New Concept
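The selection step of hyperplane-distance active learning can be sketched in a few lines of Python: given the current linear model, ask the user to label the unlabelled document closest to the decision boundary, since that is where the model is least certain. The weight vector, bias, document identifiers, and feature vectors below are made up for illustration, and training the SVM itself is outside the scope of this sketch.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Geometric distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def next_to_label(w, b, unlabeled):
    """Pick the unlabelled document whose feature vector lies closest to the hyperplane."""
    return min(unlabeled, key=lambda doc_id: distance_to_hyperplane(w, b, unlabeled[doc_id]))


# Hypothetical current model and unlabelled pool (doc id -> feature vector).
w, b = [1.0, -1.0], 0.0
pool = {
    "doc1": [3.0, 0.5],
    "doc2": [1.0, 0.9],
    "doc3": [0.2, 2.0],
}

query = next_to_label(w, b, pool)
```

After the user labels the selected document, the model is retrained and the selection repeats; documents far from the hyperplane are left to the classifier.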
Multiple views of the same data : Reuters news articles are used in the example, with two different sets of categories: topics, or the list of countries that appear in the news articles.
Each set of categories offers a different view on the data.
An SVM-based method detects the importance of keywords for each view.
Concept’s instances visualization : Instances are visualized as points on a 2D map. The distance between two instances on the map corresponds to their similarity.
Characteristic keywords are shown for all parts of the map.
The user can select groups of instances on the map to create sub-concepts.
Ontology population : The system uses a one-vs-all linear SVM trained on the created ontology to classify new instances into concepts.
Users can finalize the classifications using an interactive user interface.
Screenshot labels: new documents; selected document; classification of selected document
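At prediction time, a one-vs-all linear classifier keeps one weight vector per concept and ranks concepts by their decision scores. The Python sketch below shows only that scoring step; the concept names, keyword weights, and biases are invented for illustration, and in practice they would come from SVMs trained on the instances already assigned to each concept.

```python
# Hypothetical per-concept linear models: concept -> (keyword weights, bias).
MODELS = {
    "Economy": ({"market": 1.2, "bank": 0.8}, -0.5),
    "Sport": ({"match": 1.5, "team": 0.9}, -0.5),
}


def rank_concepts(doc, models=MODELS):
    """Score a document (keyword -> weight) against every concept, best first."""
    scores = {
        concept: sum(weights.get(term, 0.0) * value for term, value in doc.items()) + bias
        for concept, (weights, bias) in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


new_document = {"market": 1.0, "team": 0.2}
ranking = rank_concepts(new_document)
```

The ranked list is what a user interface could present as suggestions, leaving the final assignment to the user.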