KAT PRESENTATION
Roxie
Added: November 21, 2007

Presentation Transcript

Prior and Tacit Knowledge
Knowledge Acquisition from Text using Language Computer’s Jaguar Tools
Lowell Boggs
Language Computer Corporation
1701 N. Collins, Richardson, Texas, 75080

Automatically building ontologies from documents:
Language Computer Corporation’s Jaguar tools automatically build ontologies from text documents, given a specification of the domain of interest. To build an ontology, perform the following steps:
1. Select a set of “seed” words that define base concepts in the domain of interest.
2. Select a set of documents that discuss the concepts of interest.
3. Use the Jaguar tool to extract concepts from the documents and to organize them into IS-A hierarchies.
Generated ontologies can be used directly, edited, or automatically merged with other ontologies.

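To make the result of these steps more concrete, the following is a minimal sketch of an IS-A hierarchy and of a merge between two such hierarchies. It is not Jaguar's data format or merge algorithm, neither of which is described in this presentation. The class `IsAHierarchy`, its methods, the "missile IS-A weapon" link, and the naive union-style merge are all assumptions made for illustration; only the compound noun "SCUD missile" comes from the slides.

```python
from collections import defaultdict


class IsAHierarchy:
    """Toy IS-A hierarchy: maps each concept to the set of its direct parents."""

    def __init__(self):
        self.parents = defaultdict(set)

    def add(self, child, parent):
        """Record that `child` IS-A `parent`."""
        self.parents[child].add(parent)
        self.parents[parent]  # touching the key registers the parent as a concept

    def ancestors(self, concept):
        """Return every concept reachable from `concept` by following IS-A links."""
        seen = set()
        stack = list(self.parents.get(concept, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return seen

    def merge(self, other):
        """Naive merge (an assumption): the union of both hierarchies' IS-A links."""
        merged = IsAHierarchy()
        for source in (self, other):
            for child, parents in source.parents.items():
                merged.parents[child] |= parents
        return merged


generated = IsAHierarchy()
generated.add("SCUD missile", "missile")  # compound noun mentioned in the slides

other = IsAHierarchy()
other.add("missile", "weapon")            # hypothetical link from a second ontology

merged = generated.merge(other)
print(sorted(merged.ancestors("SCUD missile")))  # ['missile', 'weapon']
```

Storing only direct parents keeps the structure small. Because IS-A is transitive, indirect ancestors can be derived by traversal whenever they are needed.
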
Seed Concepts:
Jaguar builds ontologies around the seed words that define the domain of interest:
- Seed words are used to select sentences from the input documents that are part of the domain.
- Seed words are included in the generated ontology.
- Other relevant concepts appearing in sentences containing seeds are included in the ontology.
- These other words can optionally be used to select additional sentences.

Selecting Seeds:
Seed words can be selected either manually or automatically.
- Selecting seed words manually requires a good understanding of the terminology in the domain of interest.
- Automatic selection of seed words can be accomplished by comparing the words in a small set of documents known to be specific to the domain against those in a larger corpus of more general information. Statistical methods can be used to select desirable seeds.
- Language Computer is currently working on tools to aid in seed selection.

How Jaguar builds an ontology:
Jaguar builds an ontology using the following steps:
1. Find sentences in the input documents that contain seed words.
2. Parse those sentences and extract semantic relations.
3. Save the IS-A relations in the ontology being produced.
4. Investigate the noun phrases in the parsed sentences to discover compound nouns, such as “SCUD missile”, and store them in the candidate ontology.
5. If desired, revisit the unprocessed sentences to see whether they contain concepts related to the seed words through other semantic relations.
6. Finally, use the hypernymy information found in WordNet to classify all concepts against one another, detecting and correcting classification errors and building an IS-A hierarchy in the process.

Sentence parsing:
Sentences are parsed by Jaguar using LCC’s “RELU” software.
The following sentence:
  Rantissi fired a b14 SCUD missile at an Israeli cargo van on Thursday

Sentence Parsing:
is parsed into the following tree:
  [TOP, 1-70]
    ^ [S, 1-70]
        [NP, 1-8]
          ^ [Rantissi/NNP/0, 1-8 / ]
      ^ [VP, 10-69]
          ^ [fired/VBD/0, 10-14 / ]
            [NP, 16-57]
              ^ [NP, 16-33]
                    [a/DT/0, 16-16 / ]
                    [B14 SCUD/NNP/0, 18-25 / ]
                  ^ [missile/NN/0, 27-33 / ]
                [PP, 35-57]
                  ^ [at/IN/0, 35-36 / ]
                    [NP, 38-57]
                        [an/DT/0, 38-39 / ]
                        [Israeli/JJ/0, 41-47 / ]
                        [cargo/NN/0, 49-53 / ]
                      ^ [van/NNP/0, 55-57 / ]
            [PP, 59-69]
              ^ [on/IN/0, 59-60 / ]
                [NP, 62-69]
                  ^ [Thursday/NNP/0, 62-69 / ]
Slide annotation: Compound Nouns

Semantic Relations:
Language Computer’s “Polaris Semantic Parser” technology is used to extract semantic relations from parsed sentences.
- LCC supports 35 semantic relations (meaning fragments), but Jaguar is primarily focusing on IS-A relations at this time.
- Jaguar will eventually classify other transitive semantic relationships and include them in its output knowledge base, of which the ontology is a key part.
- The following relations will be the focus in the near future: Part/Whole, Kinship, Locative, Temporal.

Discovered semantic relations:
The previous sentence:
  Rantissi fired a b14 SCUD missile at an Israeli cargo van on Thursday
contains the following semantic relations:
  AGENT(fired, rantissi)
  THEME(fired, b14 SCUD missile)
  TEMPORAL(fired, Thursday)
  LOCATIVE(fired, at an Israeli cargo van)
Because they appear in semantic relations in a processed sentence, all of these words will appear in the generated ontology.

Evaluating the Generated Ontology:
Input 1: 5.67 MB of raw input text from the CNS collection in the domain of chemical/biological weapons.
Input 2: 158 manually selected seeds describing concepts associated with the domain of biological weapons.
Output: Ontology (BW-Jaguar) whose concepts are
- semantically related to the seeds
- expressed in the document collection
Baselines:
- Teknowledge’s manually constructed WMD ontology with content about chemical and nuclear weapons removed (BW-manual)
- BW-manual with all concepts removed that do not appear in the CNS document collection (BW-manual-filtered)

Evaluating generated ontologies: precision
  896 / … = 19.02%
  756 / … = 93.22%

Evaluating Generated Ontologies: conceptual recall
  896 / … = 593.38%
  896 / … = 1,317.64%

Evaluating Generated Ontologies: subsumption recall
  (Number of correct subsumption links in BW-Jaguar) / (Number of subsumption links in BW-manual)
  756 / … = 363.46%

Evaluating Generated Ontologies:
  Number of relevant, unsubsumed concepts in BW-Jaguar: 85 / 896 = 9.49%
  (896 - 158) / 158 = 467.09%

Ontology Quality:
Where we are:
- Both conceptual recall and subsumption precision are very high.
- We’re finding the concepts that are relevant to the domain.
- Extracted subsumption relations are almost always correct.
- There are few unlinked concepts.
- There is significant conceptual expansion.
Needing improvement:
- We need to improve domain relevance.
- We are currently working to improve the filtering of complex concepts and the handling of conjunctions.
- We are currently working on tools to help select seeds.

Example Jaguar Runs:
See the following web page for example Jaguar runs: http://209.136.88.26/demo
Select the “Jaguar” demo.

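As a check on the evaluation figures above, the short sketch below recomputes the two ratios whose terms can all be found in the slides: the share of unsubsumed concepts (85 out of 896) and the expansion relative to the seed set. Treating 896 as the BW-Jaguar concept count and dividing the expansion by the 158 seeds are assumptions based on the surviving numbers, and the helper name `ratio_percent` is hypothetical. The other ratios are not recomputed because one of their terms is missing from the transcript.

```python
def ratio_percent(numerator, denominator):
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


jaguar_concepts = 896  # concept count the slides use for BW-Jaguar (assumed meaning)
seed_count = 158       # manually selected seeds (Input 2)
unsubsumed = 85        # relevant, unsubsumed concepts in BW-Jaguar

unsubsumed_share = ratio_percent(unsubsumed, jaguar_concepts)
expansion = ratio_percent(jaguar_concepts - seed_count, seed_count)

print(f"unsubsumed share: {unsubsumed_share:.2f}%")  # unsubsumed share: 9.49%
print(f"expansion over seeds: {expansion:.2f}%")     # expansion over seeds: 467.09%
```

Both results match the percentages on the slide, which supports reading the second fraction as growth relative to the 158 seeds.
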