KAT PRESENTATION

Prior and Tacit Knowledge: Knowledge Acquisition from Text Using Language Computer’s Jaguar Tools

Lowell Boggs
Language Computer Corporation
1701 N. Collins
Richardson, Texas 75080

Automatically building ontologies from documents: 

Language Computer Corporation’s Jaguar tools automatically build ontologies from text documents, given a specification of the domain of interest. To build an ontology, perform the following steps:
1. Select a set of “seed” words that define the base concepts in the domain of interest.
2. Select a set of documents that discuss the concepts of interest.
3. Use the Jaguar tool to extract concepts from the documents and organize them into IS-A hierarchies.
Generated ontologies can be used directly, edited, or automatically merged with other ontologies.

Seed Concepts: 

Jaguar builds ontologies around the seed words that define the domain of interest:
- Seed words are used to select the sentences in the input documents that are part of the domain.
- Seed words are included in the generated ontology.
- Other relevant concepts appearing in sentences that contain seeds are also included in the ontology.
- These other words can optionally be used to select additional sentences.
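
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not Jaguar’s implementation: the regular-expression tokenizer, the `STOPWORDS` list, the example sentences, and the function names are all hypothetical, and a single extra pass stands in for the optional expansion step.

```python
import re

# Hypothetical stop list; a real system would take related concepts from
# semantic relations rather than from raw words.
STOPWORDS = {"a", "an", "the", "at", "on", "of", "in", "and", "to", "was", "is"}


def tokens(sentence):
    """Lowercased word tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def select_sentences(sentences, seeds, expand=False):
    """Return the sentences that mention at least one seed word.

    With expand=True, non-stopword terms from the selected sentences act as
    additional selectors, and one more pass is made over the remaining ones.
    """
    seeds = {s.lower() for s in seeds}
    selected = [s for s in sentences if tokens(s) & seeds]
    if expand:
        related = set().union(*(tokens(s) for s in selected)) - STOPWORDS
        selected += [s for s in sentences
                     if s not in selected and tokens(s) & related]
    return selected


docs = [
    "Anthrax spores can be dispersed as an aerosol.",
    "An aerosol sprayer was found in the truck.",
    "The weather was mild on Thursday.",
]
print(select_sentences(docs, ["anthrax"]))               # first sentence only
print(select_sentences(docs, ["anthrax"], expand=True))  # adds the "aerosol" sentence
```

With expansion turned on, the second sentence is picked up through the shared word “aerosol”, while the unrelated third sentence is still skipped.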

Selecting Seeds: 

Seed words can be selected either manually or automatically:
- Selecting seed words manually requires a good understanding of the terminology in the domain of interest.
- Automatic selection compares the words in a small set of documents known to be specific to the domain against those in a larger corpus of more general text. Statistical methods can then be used to select suitable seeds.
Language Computer is currently working on tools to aid in seed selection.
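
The presentation does not say which statistic is used. As one hedged example, the sketch below scores each word by the log ratio of its relative frequency in a small domain sample to its add-one-smoothed relative frequency in a general corpus. The function name, the smoothing, the `min_count` cutoff, and the toy corpora are all assumptions.

```python
import math
from collections import Counter


def rank_seed_candidates(domain_tokens, general_tokens, top_n=5, min_count=2):
    """Rank words that are over-represented in the domain sample.

    score(w) = log(P_domain(w) / P_general(w)), where P_general uses
    add-one smoothing so that words unseen in the general corpus still
    get a finite score. Words seen fewer than min_count times are skipped.
    """
    dom, gen = Counter(domain_tokens), Counter(general_tokens)
    n_dom, n_gen = sum(dom.values()), sum(gen.values())
    vocab = len(set(dom) | set(gen))
    scores = {}
    for word, count in dom.items():
        if count < min_count:
            continue
        p_dom = count / n_dom
        p_gen = (gen[word] + 1) / (n_gen + vocab)
        scores[word] = math.log(p_dom / p_gen)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


domain = "anthrax spores anthrax toxin the the spores".split()
general = "the the the weather the market spores".split()
print(rank_seed_candidates(domain, general, top_n=2))  # ['anthrax', 'spores']
```

Common words such as “the” score low because they are just as frequent in general text, which is the property that makes a frequency comparison useful for seed selection.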

How Jaguar builds an ontology: 

Jaguar builds an ontology using the following steps:
1. Find sentences in the input documents that contain seed words.
2. Parse those sentences and extract semantic relations.
3. Save the IS-A relations in the ontology being produced.
4. Investigate the noun phrases in the parsed sentences to discover compound nouns, such as “SCUD missile”, and store them in the candidate ontology.
5. If desired, revisit the unprocessed sentences to see whether they contain concepts related to the seed words through other semantic relations.
6. Finally, use the hypernymy information in WordNet to classify all concepts against one another, detecting and correcting classification errors and building an IS-A hierarchy in the process.
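
Steps 4 and 6 can be illustrated with a toy hierarchy. This sketch reflects assumptions, not Jaguar’s documented behavior. A compound noun is linked IS-A to its head word (“SCUD missile” IS-A “missile”). A small hand-written list of hypothetical hypernym pairs stands in for WordNet. One simple kind of classification error, a link that would close a cycle, is detected and rejected.

```python
def ancestors(parents, concept):
    """Return every concept reachable from `concept` by following IS-A links upward."""
    seen, stack = set(), [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def add_isa(parents, child, parent):
    """Record `child` IS-A `parent` unless the link would close a cycle."""
    if child == parent or child in ancestors(parents, parent):
        return False  # treat the cycle as a classification error and reject it
    parents.setdefault(child, set()).add(parent)
    return True


def head_noun_links(compounds):
    """Heuristic (assumed): a compound noun IS-A its head (last) word."""
    return [(c, c.split()[-1]) for c in compounds if " " in c]


hierarchy = {}  # concept -> set of direct hypernyms
links = head_noun_links(["SCUD missile", "cargo van"])
# Hypothetical hypernym pairs standing in for WordNet; the last one is deliberately wrong.
links += [("missile", "weapon"), ("van", "vehicle"), ("weapon", "missile")]
for child, parent in links:
    if not add_isa(hierarchy, child, parent):
        print("rejected:", child, "IS-A", parent)
print(sorted(ancestors(hierarchy, "SCUD missile")))
```

Running it reports `rejected: weapon IS-A missile` and then lists `missile` and `weapon` as the ancestors of “SCUD missile”.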

Sentence parsing: 

Sentences are parsed by Jaguar using LCC’s “RELU” software. Take the following sentence:
Rantissi fired a b14 SCUD missile at an Israeli cargo van on Thursday

The sentence above is parsed into the following tree:

[TOP, 1-70]
  ^ [S, 1-70]
    [NP, 1-8]
      ^ [Rantissi/NNP/0, 1-8 / ]
    ^ [VP, 10-69]
      ^ [fired/VBD/0, 10-14 / ]
      [NP, 16-57]
        ^ [NP, 16-33]
          [a/DT/0, 16-16 / ]
          [B14 SCUD/NNP/0, 18-25 / ]
          ^ [missile/NN/0, 27-33 / ]
        [PP, 35-57]
          ^ [at/IN/0, 35-36 / ]
          [NP, 38-57]
            [an/DT/0, 38-39 / ]
            [Israeli/JJ/0, 41-47 / ]
            [cargo/NN/0, 49-53 / ]
            ^ [van/NNP/0, 55-57 / ]
      [PP, 59-69]
        ^ [on/IN/0, 59-60 / ]
        [NP, 62-69]
          ^ [Thursday/NNP/0, 62-69 / ]

Compound Nouns
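
The tree can be held in a small data structure. Every span above is consistent with 1-based, inclusive character offsets into the sentence, so this sketch assumes that reading. It rebuilds the two base noun phrases and pulls out compound nouns with a simple hypothetical rule: take the run of noun tags that ends the phrase. The `Node` class and the `compound_noun` function are illustrative names, not part of RELU.

```python
from dataclasses import dataclass, field

SENTENCE = "Rantissi fired a B14 SCUD missile at an Israeli cargo van on Thursday"


@dataclass
class Node:
    label: str   # phrase label (NP, PP, ...) or part-of-speech tag
    start: int   # assumed 1-based, inclusive character offset
    end: int     # assumed 1-based, inclusive character offset
    children: list = field(default_factory=list)  # child Nodes

    def text(self):
        """Slice the covered characters out of the sentence."""
        return SENTENCE[self.start - 1:self.end]


def compound_noun(np):
    """Join the run of noun tokens (NN, NNP, ...) that ends at the NP's last word.

    Returns None unless that run spans more than one token.
    """
    run = []
    for child in reversed(np.children):
        if not child.label.startswith("NN"):
            break
        run.append(child)
    if len(run) < 2:
        return None
    return " ".join(node.text() for node in reversed(run))


# The two base noun phrases from the tree above.
np1 = Node("NP", 16, 33, [Node("DT", 16, 16), Node("NNP", 18, 25), Node("NN", 27, 33)])
np2 = Node("NP", 38, 57, [Node("DT", 38, 39), Node("JJ", 41, 47),
                          Node("NN", 49, 53), Node("NNP", 55, 57)])

print(compound_noun(np1))  # B14 SCUD missile
print(compound_noun(np2))  # cargo van
```

The adjective “Israeli” stops the run in the second phrase, so only “cargo van” is proposed as a compound noun.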

Semantic Relations: 

Language Computer’s “Polaris Semantic Parser” technology is used to extract semantic relations from parsed sentences. LCC supports 35 semantic relations (meaning fragments), but Jaguar is currently focused primarily on IS-A relations. Jaguar will eventually classify other transitive semantic relations and include them in its output knowledge base, of which the ontology is a key part. The following relations will be the focus in the near future:
- Part/Whole
- Kinship
- Locative
- Temporal
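
Because the relations Jaguar will add are transitive, each one can be closed the same way IS-A is. This is a minimal sketch, assuming a relation is stored as a set of (part, whole) pairs. The facts below are hypothetical examples, not extracted data.

```python
def transitive_closure(pairs):
    """Close a binary relation under transitivity: (a, b) and (b, c) imply (a, c)."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:  # nothing new: the relation is closed
            return closure
        closure |= derived


# Hypothetical Part/Whole facts, written as (part, whole).
part_of = {("fuse", "warhead"), ("warhead", "missile")}
print(sorted(transitive_closure(part_of)))
```

The loop repeats until no new pair is derived, so it also handles chains longer than two links.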

Discovered semantic relations: 

The sentence from the previous slide,
Rantissi fired a b14 SCUD missile at an Israeli cargo van on Thursday
contains the following semantic relations:
AGENT(fired, rantissi)
THEME(fired, b14 SCUD missile)
TEMPORAL(fired, Thursday)
LOCATIVE(fired, at an Israeli cargo van)
Because these words appear in semantic relations in a processed sentence, all of them will appear in the generated ontology.
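
Here is a minimal sketch of how the relations above could feed the ontology. It assumes each relation is stored as a (name, head, argument) triple. The `Relation` type and the `candidate_concepts` function are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    name: str  # relation type, e.g. AGENT or THEME
    head: str  # the predicate the relation attaches to
    arg: str   # the related phrase


relations = [
    Relation("AGENT", "fired", "rantissi"),
    Relation("THEME", "fired", "b14 SCUD missile"),
    Relation("TEMPORAL", "fired", "Thursday"),
    Relation("LOCATIVE", "fired", "at an Israeli cargo van"),
]


def candidate_concepts(relations):
    """Every head and argument of every relation becomes a candidate concept."""
    concepts = set()
    for rel in relations:
        concepts.update((rel.head, rel.arg))
    return concepts


print(sorted(candidate_concepts(relations)))
```

A set is used so that a head shared by several relations, such as “fired”, is recorded only once.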

Evaluating the Generated Ontology: 

Input 1: 5.67 MB of raw input text from the CNS collection in the domain of chemical/biological weapons.
Input 2: 158 manually selected seeds describing concepts associated with the domain of biological weapons.
Output: an ontology (BW-Jaguar) whose concepts are
- semantically related to the seeds, and
- expressed in the document collection.
Baselines:
- BW-manual: Teknowledge’s manually constructed WMD ontology, with the content about chemical and nuclear weapons removed.
- BW-manual-filtered: BW-manual with all concepts removed that do not appear in the CNS document collection.

Evaluating generated ontologies: precision: 

896 / … = 19.02%
756 / … = 93.22%

Evaluating Generated Ontologies: conceptual recall: 

896 / … = 593.38%
896 / … = 1,317.64%

Evaluating Generated Ontologies: subsumption recall: 

Subsumption recall = Number of correct subsumption links in BW-Jaguar / Number of subsumption links in BW-manual
756 / … = 363.46%

Evaluating Generated Ontologies: 

Number of relevant, unsubsumed concepts in BW-Jaguar: 85 / 896 = 9.49%
(896 - 158) / 158 = 467.09%
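
The two fractions on this slide can be checked directly. Based on how the numbers combine on these slides, this sketch assumes that 896 is the number of relevant concepts in BW-Jaguar and that 158 is the number of seeds. Reading the second ratio as the “conceptual expansion” mentioned in the summary is also an assumption.

```python
SEEDS = 158       # manually selected seeds (Input 2)
RELEVANT = 896    # assumed: relevant concepts in BW-Jaguar
UNSUBSUMED = 85   # relevant, unsubsumed concepts in BW-Jaguar


def pct(numerator, denominator):
    """Percentage rounded to two decimals, as on the slides."""
    return round(100 * numerator / denominator, 2)


unsubsumed_rate = pct(UNSUBSUMED, RELEVANT)  # share of relevant concepts left unlinked
expansion = pct(RELEVANT - SEEDS, SEEDS)     # assumed conceptual expansion: new concepts relative to seeds
print(unsubsumed_rate, expansion)            # 9.49 467.09
```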

Ontology Quality: 

Where we are:
- Both conceptual recall and subsumption precision are very high.
- We’re finding the concepts that are relevant to the domain.
- Extracted subsumption relations are almost always correct.
- There are few unlinked concepts.
- There is significant conceptual expansion.
Needing improvement:
- Domain relevance needs to be improved.
- We are currently working to improve the filtering of complex concepts and the handling of conjunctions.
- We are currently working on tools to help select seeds.

Example Jaguar Runs: 

See the following web page for example Jaguar runs: http://209.136.88.26/demo
Select the “Jaguar” demo.