mmdss07 grobelnik oml 01

Added: March 18, 2008
Presentation Transcript

Ontologies & Machine Learning : Marko Grobelnik and Blaz Fortuna, Jozef Stefan Institute, Slovenia


Aim of the talk


What areas of research are we trying to target? :
- Text mining, link analysis, and other analytic techniques deal mainly with extracting and aggregating information from raw data; they maximize the quality of the extracted information.
- The Semantic Web deals mainly with the integration and representation of the given data; it maximizes the reusability of the given information.
Both areas are complementary, and both are necessary for operational information engineering.


Ontologies


What is an Ontology? : Ontologies are the main formal objects within the Semantic Web and, more recently, within text analytics. Ontologies originate in philosophy, but within computer science an ontology is a data model that represents a domain and is used to reason about the objects in that domain and the relations between them. Their main aim is to describe and represent an area of knowledge in a formal way.


What is an Ontology? : “Formal, explicit specification of a shared conceptualisation.”
Diagram annotations: machine processable; concepts, properties, relations, functions; consensual knowledge; abstract model of some domain.
Source: Frank van Harmelen, 2003: http://seminars.ijs.si/sekt


Which elements represent an ontology? : An ontology typically consists of the following elements:
- Instances: the basic or “ground level” objects
- Classes: sets, collections, or types of objects
- Attributes: properties, features, characteristics, or parameters that objects can have and share
- Relations: ways that objects can be related to one another
Analogies between ontologies and relational databases:
- Instances correspond to records
- Classes correspond to tables
- Attributes correspond to record fields
- Relations correspond to relations between the tables
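The four element types and the database analogy can be sketched in Python roughly as follows. This is only an illustration: the names `OntClass`, `Instance`, `Relation`, and `as_record`, and all example data, are assumptions made here and do not come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class OntClass:
    """A class: a set or type of objects (analogous to a table)."""
    name: str
    attributes: list  # attribute names shared by instances (record fields)


@dataclass
class Instance:
    """A ground-level object (analogous to a record)."""
    name: str
    cls: OntClass
    values: dict = field(default_factory=dict)


@dataclass
class Relation:
    """A named link between two instances (analogous to a relation between tables)."""
    name: str
    source: Instance
    target: Instance


def as_record(inst):
    """Flatten an instance into a dict shaped like a database row."""
    return {"table": inst.cls.name, "id": inst.name, **inst.values}


# Hypothetical example data, for illustration only.
person = OntClass("Person", ["birth_year"])
institute = OntClass("Institute", ["country"])
alice = Instance("alice", person, {"birth_year": 1970})
lab = Instance("example_lab", institute, {"country": "Slovenia"})
works_at = Relation("worksAt", alice, lab)
```

Calling `as_record(alice)` yields a row-like dictionary, which makes the "instance corresponds to record" analogy concrete.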


Levels of Semantic-Web formalisms : The W3C “Semantic Web Layer Cake” shows representation levels and related technologies.
[Figure labels: character-level encoding; addressing the information; infrastructure; different levels of semantic abstraction; higher level of representation and reasoning (OWL, RIF).]


Top-down modeling of knowledge: the Cyc system


Cyc …a little bit of historical context : Older AI-ers know about Cyc: one of the boldest attempts in AI history to encode common-sense knowledge in one KB.
- The project started in 1984 at Stanford as the US response to Japan’s project on “5th Generation Computer Systems”.
- In 1994 the company Cycorp was established (in Austin, TX).
- In 2005 the Cyc KB was opened and made available for research: OpenCyc (http://www.opencyc.org/) and ResearchCyc (http://research.cyc.com/).
- In 2006 Cyc-Europe was established (in Ljubljana, Slovenia).
- By 2006, about $80M had been spent on construction of the KB.


The Cyc Ontology : [Figure, Cycorp © 2006. Labels: general knowledge about various domains; specific data, facts, and observations.]


…part of Cyc Ontology on Human Beings


Structure of Cyc Ontology : Knowledge base layers:
- Upper Ontology (abstract concepts): EVENT ⊆ TEMPORAL-THING ⊆ INDIVIDUAL ⊆ THING
- Core Theories (space, time, causality, …): For all events a and b, a causes b implies a precedes b.
- Domain-Specific Theories: For any mammal m and any anthrax bacteria a, m’s being exposed to a causes m to be infected by a.
- Facts (instances): John is a person infected by anthrax.


Cyc KB Extended w/Domain Knowledge : [Figure labels: general knowledge about terrorism; specific data, facts, and observations about terrorist groups and activities.]




An example of Psychoanalyst’s Cyc taxonomic context :
#$Psychoanalyst (lexical representation: “psychoanalyst”, “psychoanalysts”)
specialization-of #$MedicalCareProfessional
| specialization-of #$HealthProfessional
| specialization-of #$Professional-Adult
| specialization-of #$Professional
specialization-of #$Psychologist
| specialization-of #$Scientist
| specialization-of #$Researcher
| | specialization-of #$PersonWithOccupation
| | | specialization-of #$Person
| | | | specialization-of #$HomoSapiens
| | | | | instance-of #$BiologicalSpecies
| | | | | | specialization-of #$BiologicalTaxon
| | | | | instance-of #$SomeSampleKindsOfMammal-Biology-Topic
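A taxonomic context like the one above amounts to following specialization-of links upward. The sketch below collects every generalization of a concept with a simple graph traversal. The edge table is one reading of the nesting on the slide and should be treated as an assumption; `all_generalizations` is a hypothetical helper, not a Cyc API call.

```python
# specialization-of edges as read off the slide's tree (an assumed reading
# of the nesting; instance-of links are left out on purpose).
SPECIALIZATION_OF = {
    "Psychoanalyst": ["MedicalCareProfessional", "Psychologist"],
    "MedicalCareProfessional": [
        "HealthProfessional",
        "Professional-Adult",
        "Professional",
    ],
    "Psychologist": ["Scientist", "Researcher"],
    "Researcher": ["PersonWithOccupation"],
    "PersonWithOccupation": ["Person"],
    "Person": ["HomoSapiens"],
}


def all_generalizations(concept, edges=SPECIALIZATION_OF):
    """Return every concept reachable by following specialization-of links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


ancestors = all_generalizations("Psychoanalyst")
```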


Example Vocabulary: Senses of ‘In’ relation (1/3) : Can the inner object leave by passing between members of the outer group?
  Yes -- Try #$in-Among


Example Vocabulary: Senses of ‘In’ relation (2/3) :
Does part of the inner object stick out of the container?
  None of it -- Try #$in-ContCompletely
  Yes -- Try #$in-ContPartially
If the container were turned around could the contained object fall out?
  No -- Try #$in-ContClosed
  Yes -- Try #$in-ContOpen


Example Vocabulary: Senses of ‘In’ relation (3/3) : Can it be removed by pulling, if enough force is used, without damaging either object?
  No -- Try #$in-Snugly or #$screwedIn


Cyc’s front-end: “Cyc Analytic Environment” – querying (1/2) : [Screenshot callouts: text query; query (semi-)automatically translated into first-order logic; answers to the query.]


Cyc’s front-end: “Cyc Analytic Environment” – justification (2/2) : [Screenshot callouts: query and answer; justification; sources for reasoning and justification.]


Document Tagging


Annotating the document with CycKB


Probabilistic Concept Tagging : “The plants that produced the cranes that NASA deployed in space in the 1990s are in Canada.”
Tagged phrases, with candidate concepts and scores:
- The plants: #$FactoryBuildingComplex 0.8817, #$Plant 0.0967
- produced: #$Production-Generic 0.6017
- the cranes: #$Crane-MotorizedDevice 0.9387, #$Crane-Bird 0.0408
- NASA: #$NASA
- deployed: #$DeployingMaterial
- in space: #$OuterSpace 0.51, #$SpaceInAHOC 0.1473, #$ReservedSpaceRegion 0.0459, #$Area 0
- in the 1990s: (#$DecadeFn 199)
- Canada: #$Canada 1
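Given per-phrase candidate scores like those in the example, the simplest way to settle on one concept per phrase is to take the highest-scoring candidate. The sketch below does exactly that; the scores are copied from the slide, while the `best_sense` helper and the choice of a plain argmax are assumptions, since the talk does not say how the final sense is selected.

```python
# Candidate concepts and scores copied from the slide's example sentence.
CANDIDATES = {
    "plants": {"FactoryBuildingComplex": 0.8817, "Plant": 0.0967},
    "cranes": {"Crane-MotorizedDevice": 0.9387, "Crane-Bird": 0.0408},
    "space": {
        "OuterSpace": 0.51,
        "SpaceInAHOC": 0.1473,
        "ReservedSpaceRegion": 0.0459,
        "Area": 0.0,
    },
}


def best_sense(phrase, candidates=CANDIDATES):
    """Return the candidate concept with the highest score for a phrase."""
    scores = candidates[phrase]
    return max(scores, key=scores.get)


tags = {phrase: best_sense(phrase) for phrase in CANDIDATES}
```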


Knowledge Template Induction


Learning Facts by Search


Learning Facts by Search : Query: “What are symptoms of Whooping Cough?”, in CycL: (symptomOfAilment WhoopingCough ?SYMP)
Partial English sentences generated for the search:
- “A symptom of whooping cough is ___”
- “Whooping cough can cause ___”
- “A symptom of Pertussis Bordetella is ___”
- “Symptoms (such as ____) of whooping cough”
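One way to produce such partial sentences is to fill a few templates with every known name of the concept. The sketch below assumes this template-filling approach; the template wording follows the slide, but `TEMPLATES` and `search_strings` are hypothetical names, and the actual Cyc generation step is not described in the transcript.

```python
# Sentence templates modelled on the slide's examples.
TEMPLATES = [
    "A symptom of {ailment} is ___",
    "{ailment} can cause ___",
    "Symptoms (such as ___) of {ailment}",
]


def search_strings(names):
    """Fill every template with every known name of the ailment."""
    return [t.format(ailment=n) for n in names for t in TEMPLATES]


queries = search_strings(["whooping cough", "Pertussis Bordetella"])
```

Each resulting string can then be sent to a web search engine, with the blank marking where an answer is expected.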


Parsing Results : “… symptoms of pertussis such as fever and a dry cough …”
Looking for something that matches the argument constraints on the predicate, then parsing back into existing CycL concepts:
  (symptomOfAilment WhoopingCough Fever)
  (symptomOfAilment WhoopingCough Coughing-AilmentCondition)
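The parse-back step can be approximated with a pattern match followed by a lexicon lookup. This is a deliberately small sketch: the regular expression, the `LEXICON` table, and `extract_candidates` are assumptions for illustration, whereas the real system relies on Cyc's own natural-language lexicon and argument constraints.

```python
import re

# Hypothetical lexicon mapping English phrases to CycL concept names.
LEXICON = {
    "fever": "Fever",
    "a dry cough": "Coughing-AilmentCondition",
    "dry cough": "Coughing-AilmentCondition",
}


def extract_candidates(sentence, predicate="symptomOfAilment",
                       ailment="WhoopingCough"):
    """Turn the list after 'such as' into candidate (pred, arg1, arg2) facts."""
    match = re.search(r"such as ([^.\u2026]+)", sentence)
    if match is None:
        return []
    phrases = re.split(r",\s*|\s+and\s+", match.group(1))
    facts = []
    for phrase in phrases:
        concept = LEXICON.get(phrase.strip().lower())
        if concept is not None:  # skip phrases with no known concept
            facts.append((predicate, ailment, concept))
    return facts


found = extract_candidates(
    "symptoms of pertussis such as fever and a dry cough."
)
```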


KB Consistency Check :
- Throw out provably wrong answers:
  - Explicitly: perform one step of inference to throw out facts inconsistent with the KB.
  - Implicitly: don’t even look at things that don’t match argument constraints.
- Skip already known (provably right) knowledge.
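The three filters can be sketched as a single pass over candidate facts. In this sketch plain Python sets stand in for the KB: `known`, `contradicted`, and `allowed_args` are hypothetical inputs, and a set lookup replaces the one step of inference that Cyc actually performs.

```python
def filter_candidates(candidates, known, contradicted, allowed_args):
    """Keep candidate facts that are novel and not provably wrong.

    candidates: list of (predicate, arg1, arg2) tuples.
    """
    novel = []
    for fact in candidates:
        if fact[2] not in allowed_args:
            continue  # implicit check: argument constraint not met
        if fact in contradicted:
            continue  # explicit check: inconsistent with the KB
        if fact in known:
            continue  # already known, nothing new to assert
        novel.append(fact)
    return novel


# Hypothetical data, for illustration only.
P, A = "symptomOfAilment", "WhoopingCough"
candidates = [
    (P, A, "Fever"),
    (P, A, "Coughing-AilmentCondition"),
    (P, A, "Canada"),
    (P, A, "Sneezing"),
]
result = filter_candidates(
    candidates,
    known={(P, A, "Fever")},
    contradicted={(P, A, "Sneezing")},
    allowed_args={"Fever", "Coughing-AilmentCondition", "Sneezing"},
)
```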


Initial Results :
Pipeline figure: 348 queries; 817 searches; 1016 sentences found; 388 sentences rejected; 3474 searches; 566 rejected; 61 sentences asserted.
- Total queries: 348
- Web searches: 4290 (initial: 817; verification: 3474)
- Sentences found: 1014
- Rejected results: 954 (inconsistent with KB: 4; already known to the KB: 384; rejected using Google: 566)
- Novel formulæ: 61


Microtheory (context) Suggestion


Automatic Ontology Placement : Cyc’s knowledge is contextualized into internally consistent microtheories (MTs). New knowledge is inserted into that hierarchy manually by ontologists. An MT Suggestor recommends placement of new knowledge into the appropriate microtheories (contexts).


MT Suggestor Approach : The problem is similar to hierarchical text classification, but with much less data per instance and a very rich (deep) structure.
Approaches:
- Generative Bayes model: outputs the probability of fit for each MT.
- Multiclass SVM classification: outputs the index of the best MT.
Inputs:
- Each assertion is broken into atomic terms.
- Each unique term is given an index.
- Each assertion is a list of term indices (as few as 3 for binary assertions, as many as 180 for complex rules).
- Training examples are indexed identically.
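The input encoding and the Bayes variant can be sketched in plain Python. The sketch below uses a multinomial naive Bayes model with add-one smoothing as a stand-in for the talk's generative Bayes model, whose exact form is not given; it returns log-scores rather than normalized probabilities. The microtheory names and assertions are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(assertions):
    """Give each unique term an integer index, in order of first appearance."""
    index = {}
    for terms in assertions:
        for term in terms:
            index.setdefault(term, len(index))
    return index


def encode(terms, index):
    """Turn an assertion into a list of term indices, skipping unseen terms."""
    return [index[t] for t in terms if t in index]


def train_bayes(encoded, labels, vocab_size):
    """Count term occurrences per microtheory."""
    term_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    for ids, mt in zip(encoded, labels):
        term_counts[mt].update(ids)
    return term_counts, label_counts, vocab_size


def score(ids, model):
    """Return a smoothed log-probability score for each microtheory."""
    term_counts, label_counts, vocab_size = model
    total = sum(label_counts.values())
    scores = {}
    for mt, n in label_counts.items():
        denom = sum(term_counts[mt].values()) + vocab_size
        logp = math.log(n / total)
        for i in ids:
            logp += math.log((term_counts[mt][i] + 1) / denom)
        scores[mt] = logp
    return scores


# Hypothetical training assertions and microtheory names.
train = [
    (["isa", "Dog", "Mammal"], "BiologyMt"),
    (["isa", "Cat", "Mammal"], "BiologyMt"),
    (["projectManagers", "ProjectX", "Alice"], "ProjectMt"),
    (["projectTasks", "ProjectX", "Task1"], "ProjectMt"),
]
index = build_index([terms for terms, _ in train])
model = train_bayes(
    [encode(terms, index) for terms, _ in train],
    [mt for _, mt in train],
    len(index),
)
scores = score(encode(["isa", "Horse", "Mammal"], index), model)
suggested = max(scores, key=scores.get)
```

Ranking the scores instead of taking only the maximum would give an ordered list of suggestions for an ontologist to review.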


Results : 89,000 assertions, 64,000 distinct terms, 28 MTs; 10-fold cross-validation.
[Results table with columns precision, recall, and F1 score; the values are not preserved in the transcript.]


Induction of new rules with ILP


Learning Higher-Order Knowledge : Learning rules with Inductive Logic Programming (ILP)
- Integrating the ALEPH ILP system into Cyc.
- Example: “All the mothers I know about are female… Maybe all mothers are female?”
- Verification (asking or experimenting): asking a human directly; natural language processing of text; probabilistic analysis.


Performing Induction in Cyc : Integrate Cyc and Aleph:
- FOL-ify CycL and export to Aleph.
- Produce an ILP learning bias from background knowledge, based on the semantic content of predicate knowledge.
- CycL-ify, review, and assert ILP-produced rules.
[Figure labels: facts & background; first-orderized facts; background knowledge; perform induction; induced rules; evaluate results; good rules.]


Sample Rules Produced :
(implies
  (and (cyclistPrimaryProject ?KE ?PROJECT)
       (projectTasks ?PROJECT ?TASK)
       (requestedEffortPercent ?TASK ?KE ?X))
  (assignedEffortPercent ?TASK ?KE ?X))
(implies
  (and (projectManagers ?PROJECT ?AGENT))
  (projectParticipants ?PROJECT ?AGENT))
(implies
  (and (primarySupervisor ?AGENT ?AGENT-1)
       (requestedEffortPercent ?TASK ?AGENT ?X)
       (projectManagers ?PROJECT ?AGENT-1)
       (projectTasks ?PROJECT ?TASK))
  (assignedEffortPercent ?TASK ?AGENT ?X))


Sample Rules Produced : In English, the rules read:
- If someone’s time has been requested for a task by that person’s primary project, the time will be assigned.
- People participate in the projects they manage. (One hopes!)
- People are assigned to tasks requested of them by projects managed by their direct supervisor.
These are only patterns, not guaranteed to always be true, but they are useful and common-sensical.
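To see what asserting such a rule does, the second rule (project managers are project participants) can be applied to a small fact set. The sketch below represents facts as Python tuples; the fact data and the `apply_manager_rule` helper are hypothetical, and this is not how Cyc's inference engine is implemented.

```python
def apply_manager_rule(facts):
    """Apply (implies (projectManagers ?P ?A) (projectParticipants ?P ?A)).

    facts: set of (predicate, project, agent) tuples.
    Returns the original facts plus everything the rule derives.
    """
    derived = {
        ("projectParticipants", project, agent)
        for (predicate, project, agent) in facts
        if predicate == "projectManagers"
    }
    return facts | derived


# Hypothetical facts, for illustration only.
facts = {
    ("projectManagers", "ProjectX", "Alice"),
    ("projectParticipants", "ProjectX", "Bob"),
}
closed = apply_manager_rule(facts)
```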


Bottom-up modeling of knowledge: the OntoGen system


Underlying concepts


Main Features


Ontology management : [Screenshot callouts: concept hierarchy; list of suggested sub-concepts; ontology visualization; selected concept.]


Concept management : [Screenshot callouts: concept’s details; concept’s instance management; selected concept; keywords; selected instance.]


Active Learning for concept learning :
- Active learning algorithm based on distance from the SVM hyperplane.
- The first few labelled documents are bootstrapped from a query search.
- Instances for the final concept are selected using the final SVM model.
[Figure labels: query; SVM; new concept.]
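A common form of hyperplane-distance active learning asks the user to label the documents closest to the current decision boundary, since the model is least certain about them. The sketch below assumes that variant and a linear model given as a weight vector and bias; the function names and example vectors are hypothetical, and OntoGen's actual selection rule may differ.

```python
def margin_distance(x, weights, bias):
    """Unnormalized signed distance of x from the hyperplane w.x + b = 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def most_uncertain(unlabeled, weights, bias, k=1):
    """Pick the k documents closest to the hyperplane to show to the user.

    unlabeled: dict mapping document id -> feature vector.
    """
    ranked = sorted(
        unlabeled,
        key=lambda doc_id: abs(margin_distance(unlabeled[doc_id], weights, bias)),
    )
    return ranked[:k]


# Hypothetical two-feature documents and model.
docs = {"d1": [0.9, 0.1], "d2": [0.5, 0.45], "d3": [0.1, 0.8]}
to_label = most_uncertain(docs, weights=[1.0, -1.0], bias=0.0, k=2)
```

After the user labels the selected documents, the SVM would be retrained and the selection repeated.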


Multiple views of the same data : The example uses Reuters news articles with two different sets of categories: topics, or the countries that appear in the articles. Each set of categories offers a different view of the data. An SVM-based method detects the importance of keywords for each view.


Concept’s instances visualization : Instances are visualized as points on a 2D map. The distance between two instances on the map corresponds to their similarity. Characteristic keywords are shown for all parts of the map. The user can select groups of instances on the map to create sub-concepts.


Ontology population : The system uses a one-vs-all linear SVM, trained on the created ontology, to classify new instances into concepts. Users can finalize the classifications using an interactive user interface.
[Screenshot callouts: new documents; selected document; classification of selected document.]
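One-vs-all classification with linear models reduces to scoring a document against each concept's hyperplane and keeping the best score. The sketch below assumes the per-concept weights have already been trained; the concept names, weights, and the `classify` helper are hypothetical.

```python
def classify(x, models):
    """Score x against each concept's linear model and return the best one.

    models: dict mapping concept name -> (weights, bias).
    """
    scores = {
        concept: sum(w * xi for w, xi in zip(weights, x)) + bias
        for concept, (weights, bias) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical trained models over two features.
models = {
    "Sports": ([1.0, 0.0], -0.2),
    "Politics": ([0.0, 1.0], -0.2),
}
concept, scores = classify([0.1, 0.9], models)
```

Returning the full score dictionary alongside the winner lets a user interface show the runner-up concepts when the user wants to correct a classification.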