LCC02062007 (Added: November 20, 2007)

Presentation Transcript

Extracting Rich Knowledge from Text: John D. Prange, President. 410-964-0179, john.prange@languagecomputer.com, www.languagecomputer.com

Our Company: Language Computer Corporation (LCC). Human Language Understanding Research and Development. Founded 11 years ago in Dallas, Texas; established a second office in Columbia, MD in mid-2006. ~70 research scientists and engineers. Research funding primarily from DTO, NSF, AFRL, DARPA and several individual Government Agencies. Technology has been transferred to individual Government Organizations, Defense contractors and, more recently, Commercial Customers.

Outline of Talk: Three Lines of Research & Development within LCC that impact Semantic-Level Understanding:
Information Extraction: CiceroLite and other Cicero Products.
Extracting Rich Knowledge from Text: Polaris: Semantic Parser; XWN KB: Extended WordNet Knowledge Base; Jaguar: Knowledge Extraction from Text; Context and Events: Detection, Recognition & Extraction.
Cogex: Reasoning and Inferencing over Extracted Knowledge: Semantic Parsing & Logical Forms; Lexical Chains & On-Demand Axioms; Logic Prover.

LCC’s Areas of Research: Information Extraction. Given an entire corpus of documents, extracting every instance of some particular kind of
information.
Named Entity Recognition: extraction of entities such as person, location and organization names.
Event-based Extraction: extraction of real-world events such as bombings, deaths, court cases, etc.

CiceroLite & Cicero-ML: Named Entity Recognition Systems

Two High-Performance NER Systems:
CiceroLite: Accurate and customizable NE Recognition for English. Classifies 8 high-frequency NE classes with over 90% precision and recall. Currently extended to detect over 150 different NE classes. A non-deterministic Finite-State Automata (FSA) framework resolves ambiguities in text and performs precise classification.
CiceroLite-ML: Machine Learning-based NER for multiple languages. A statistical machine learning-based framework allows rapid extension to new languages. Currently deployed for Arabic, German, English, and Spanish. Arabic: classifies 18 NE classes with an average of nearly 90% F.

CiceroLite: Designed specifically for English, CiceroLite categorizes 8 high-frequency NE classes with over 90% precision and recall. But it is capable of much more: as currently deployed, CiceroLite can categorize up to 150 different NE classes, including: Over 100 more!

CiceroLite-ML (Arabic): CiceroLite-ML currently detects a total of 18 different classes of named entities for Arabic, with F-measures between 80% and 90%.
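The precision, recall and "F" figures quoted for the Cicero systems can be related with a small worked example. This is a minimal sketch, assuming "F" means the balanced F-measure (the harmonic mean of precision and recall) and that entities are scored by exact match on text and class; the slides do not state the scoring protocol, and the entity tuples below are hypothetical.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score(gold: set, predicted: set) -> tuple:
    """Exact-match scoring of predicted (text, class) pairs against gold pairs."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical annotations, used only to exercise the functions.
gold = {
    ("John Prange", "PERSON"),
    ("Dallas", "LOCATION"),
    ("Texas", "LOCATION"),
    ("LCC", "ORGANIZATION"),
}
predicted = {
    ("John Prange", "PERSON"),
    ("Dallas", "LOCATION"),
    ("LCC", "PERSON"),  # right span, wrong class: counts as an error
}

precision, recall, f1 = score(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, a system needs both high precision and high recall to reach a high F value.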
Other Cicero Products:
CiceroLite-ML (Mandarin Chinese): Similar scope and depth to the Arabic version shown on the previous slide.
CiceroCustom: User-customizable event extraction system using a variant of supervised learning called “active learning”.
TASER (Temporal & Spatial Normalization System): Recognizes 8 different types of time expressions and over 50 types of spatial expressions; normalizes time using ISO 8601; exact Lat/Long for ~8M place names.
Under Contractual Development (With Deliveries in 2007):
CiceroRelation: Relation detection based upon ACE 2007 specifications.
CiceroCoref: Entity coreference utilizing CiceroLite NER; to include cross-document entity tracking.
CiceroDiscourse: Extracts discourse structure & topic semantics.

LCC’s Areas of Research: Extracting Rich Knowledge From Text. Explicit knowledge. Implicit knowledge: implicatures, humor, sarcasm, deceptions, etc. Other textual phenomena: negation, modality, quantification, coreference resolution. Lexical Level & Syntax; Semantic Relations; Contexts; Events & Event Properties; Meta-Events; Event Relations.

Extracting Rich Knowledge from Text: Innovations. A rich and flexible representation of textual semantics. Extract concepts, semantic relations between concepts, and rich event structures. Extract event properties; extend events using event relations. Handle textual phenomena such as negation and modality. Mark implicit knowledge and capture the meaning suggested by it whenever possible.

Four-Layered Representation:
Syntax Representation: Syntactically link words in sentences; apply Word Sense Disambiguation (WSD).
Semantic Relations: Provide deeper semantic understanding of relations between words.
Context Representation: Place boundaries around knowledge that is not universal.
Event Representation: Detect events, extract their properties, extend using event relations.

Hierarchical Representation

Polaris: Semantic Parser

Polaris Semantic Relations

Propbank vs. Polaris Relations

Example: Polaris on Treebank: We're talking about years ago before anyone heard of asbestos having any questionable properties.

XWN KB: Extended WordNet Knowledge Base

XWN Knowledge Base (1/2):
WordNet® (free from Princeton University): A large lexical database of English, developed by Professor George Miller, Princeton Univ.; now under the direction of Christiane Fellbaum. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
eXtended WordNet (free from UTD): Glosses are parsed, word sense disambiguated, and transformed into logic forms.
XWN Knowledge Base (done at LCC): Glosses are converted into semantic relations (using the Polaris semantic parser) and represented in a Knowledge Base. Uses: reasoning tool; axiom generator; lexical chain facilitator.

XWN Knowledge Base (2/2):
Summary: The rich definitional glosses from WordNet are processed through LCC’s Knowledge Acquisition System (Jaguar) to produce a semantically rich upper ontology.
The Clusters: Noun glosses are transformed into sets of semantic relations, which are then arranged into individual semantic units called clusters, with one cluster per gloss.
The Hierarchy: The clusters (representing one noun synset each) are arranged in a hierarchy similar to that of WordNet.
The Knowledge Base: The generated KB has not only the hierarchy of WordNet, but also a rich semantic representation of each entry in the hierarchy (based on the definitional gloss).

Example: WordNet Gloss: Tennis is a game played with rackets by two or four players who hit a ball back and
forth over a net that divides the court.
ISA (Tennis, game)
AGT (two or four players, play)
THM (game, play)
INS (rackets, play)
MEA (two or four, players)
AGT (two or four players, hit)
THM (a ball, hit)
MNR (back and forth, hit)
LOC (over a net that divides the court, hit)
AGT (a net, divides)
THM (the court, divides)

Semantic Cluster of a WordNet Gloss: Synset ID: 00457626. Name: tennis, lawn_tennis.
tennis: ISA game
player: MEA two or four
play: AGT player; THM game; INS racket
hit: AGT player; THM ball; MNR back and forth; LOC over a net
divide: AGT net; THM court

Hierarchy (as in WordNet):
athletic game > court game > tennis, basketball, squash
athletic game > outdoor game > golf, croquet

Jaguar: Knowledge Extraction From Text

Jaguar: Knowledge Extraction: Automatically generate ontologies and structured knowledge bases from text. Ontologies form the framework or “skeleton” of the knowledge base. A rich set of semantic relations forms the “muscle” that connects concepts in the knowledge base.

Automatically Building the Ontology: Jaguar builds an ontology using the following steps:
Seed words are selected either manually or automatically.
Find sentences in the input documents that contain seed words.
Parse those sentences and extract semantic relations, focusing on selected relations such as IS-A, Part-Whole, Kinship, Locative and Temporal.
Integrate the selected semantic relations into the ontology being produced.
Investigate the noun phrases in the parsed sentences to discover compound nouns, such as “SCUD missile”, and store them in the candidate ontology.
If
desired, revisit the unprocessed sentences to see whether they contain concepts related to the seed words through other semantic relations.
Finally, use the hyponymy information found in Extended WordNet to classify all concepts against one another, detecting and correcting classification errors and building an IS-A hierarchy in the process.

Result: Jaguar Knowledge Base: anthrax biological weapon

Context & Events: Detection, Classification & Extraction

Types of Context:
Temporal: It rained on July 7th
Spatial: It rained in Dallas
Report: John said “It rains”
Belief: John thinks that it rains
Volitional: John wants it to rain
Planning: It is scheduled to rain
Conditional: If it’s cloudy, it will rain
Possibility: It might rain

Events in Text: Basic Definition: X is an Event if X is a possible answer to the question “What happened?” Applying the definition to verbs and nouns: Verb V is an Event if the sentence “Someone/something V-ed (someone/something)” is an answer to the question “What happened?” Noun N is an Event if the sentence “There was/were (a/an) N” is an answer to the question “What happened?”

Events in Text: Most adjectives are not potential Events. Verbal 'adjectives' are treated as verbs, e.g.
'lost', 'admired'. Factatives ('Light' Verbs) are not separate events: Suffer a Loss; Take a Test; Perform an Operation. Aspectual markers can combine with a wide range of events, e.g., Stop, Completion, Start, Continue, Fail, Succeed, Try. Modalities are not separate events: Possibility, Necessity, Prescription, Suggestion, Optative.

Event Detection: Approach for Event Detection: Annotate WordNet synsets that are Event concepts (annotation completed for the Noun and Verb hierarchies). Detect events by lexical lookup for concepts in the annotated WordNet. Project Status: Prototype implemented for Event detection. Benchmarks run: Precision 93%, Recall 79%. Currently tuning performance.

Event Extraction – Future: Event Structures for Modelling Discourse: Aspect (Start, Complete, Continue, Succeed, Fail, Try); Modality (Possibility, Necessity, Optativity); Event Participants (Actors, Undergoers, Instruments); Context (Spatial, Temporal, Intensional); Event Relations (Causation, Partonomy, Similarity, Contrast). Event Taxonomy/Classification. Event Composition.

LCC’s Areas of Research: Cogex: Reasoning & Inferencing Over Extracted Knowledge

Reasoning & Inferences: Example Tasks that Require Both

TREC Question Answering Track: The TREC Question Answering Track has been held annually since its inception in TREC-8 (1999). Main Task, TREC-2006 QA Track: AQUAINT Corpus of English News Text, http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T31 Newswire text data in English, drawn from three sources: Xinhua News Service (People's Republic of China), New York Times News Service, and Associated Press Worldstream News Service.
Roughly 3 GBytes of text; million+ documents. Test Set: 75 series of questions, each organized around a common target, where the target is a Person, Organization, Event or Thing. Each series contains 6-9 questions: 4-7 Factoid, 1-2 List, and 1 Other. Total: 403 Factoid Questions; 89 List Questions; 75 Other Questions.

TREC-2006 Question Answering Track:
145. Target (Event): John Williams convicted of Murder
145.1 Factoid: How many non-white members of the jury were there?
145.2 Factoid: Who was the foreman for the jury?
145.3 Factoid: Where was the trial held?
145.4 Factoid: When was King convicted?
145.5 Factoid: Who was the victim of the murder?
145.6 List: What defense and prosecution attorneys participated in the trial?
145.7 Other

Textual Entailment: Textual Entailment Recognition is a generic task that captures major semantic inference needs across many natural language processing applications, such as Question Answering (QA), Information Retrieval (IR), Information Extraction (IE), and (multi-)document summarization. Task definition: T entails H, denoted by T → H, if the meaning of H can be inferred from the meaning of T. PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning) RTE (Recognizing Textual Entailment) Challenge: RTE-1 (2004-05), RTE-2 (2005-06) and RTE-3 (2006-07). http://www.pascal-network.org/Challenges/RTE/ The Question Answering task can be interpreted as a Textual Entailment task as follows: given a Question Q and a possible Answer Text Passage A, the QA task is one of applying semantic inference to the pair (Q, A) to infer whether or not A contains the answer to Q.

RTE-2: Example TH Pairs:
Entailment?: “Yes”
T: Tibone estimated diamond production at four mines operated by Debswana – Botswana’s 50-50 joint venture with DeBeers – could reach 33 million carats this year.
H: Botswana is a business partner of DeBeers.
Entailment?: “Yes”
T: The EZLN differs from most revolutionary groups by having stopped military action after the initial uprising in the first two weeks of 1994.
H: EZLN is a revolutionary group.
Entailment?: “No”
T: Two persons were injured in dynamite attacks perpetrated this evening against two bank branches in this Northwestern Colombian city.
H: Two persons perpetrated dynamite attacks in a Northwestern Colombian city.
Entailment?: “No”
T: Such a margin of victory would give Abbas a clear mandate to renew peace talks with Israel, rein in militants and reform the corruption-riddled Palestinian Authority.
H: The new Palestinian president combated corruption and revived the Palestinian economy.

Cogex: Logic Prover

Semantically Enhanced COGEX: [Architecture diagram. Labels: Q/T; A/H; Semantic Parser; Logic Forms (Q/T LF, A/H LF); Context; Axiom Building; Lex Chains; XWN KBase; Linguistic Axioms; World K Axioms; Temporal Axioms; Semantic Calculus; Logic Prover; Relaxation; Answer or Entailment Ranking; Answer / Entailment; NL Justification.]

Output of Semantic Parser:
Question: What is the Muslim Brotherhood's goal?
The output of the semantic parser: PURPOSE(x, Muslim Brotherhood)
Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.
The output of the semantic parser:
AGENT(Muslim Brotherhood, advocate)
PURPOSE(turning Egypt into a strict Muslim state, advocate)
TEMPORAL(1928, establish)
TEMPORAL(1992, took up arms)
PROPERTY(strict, Muslim state)
MEANS(political means, turning Egypt into a strict Muslim state)
SYNONYMY(Muslim Brotherhood, Egypt's biggest fundamentalist group)

Generation of Logical Forms:
Question: What is the Muslim Brotherhood's goal?
Question Logical Form (QLF): (exists x0 x1 x2 x3 (Muslim_NN(x0) & Brotherhood_NN(x1) & nn_NNC(x2,x0,x1) & PURPOSE_SR(x3,x2))).
Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.
Answer Logical Form (ALF): (exists e1 e2 e3 e4 e5 e6 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 (Muslim_NN(x1) & Brotherhood_NN(x2) & nn_NNC(x3,x1,x2) & Egypt_NN(x4) & _s_POS(x5,x4) & biggest_JJ(x5) & fundamentalist_JJ(x5) & group_NN(x5) & SYNONYMY_SR(x3,x5) & establish_VB(e1,x20,x5) & in_IN(e1,x6) & 1928_CD(x6) & TEMPORAL_SR(x6,e1) & advocate_VB(e2,x5,x21) & AGENT_SR(x5,e2) & PURPOSE_SR(e3,e2) & turn_VB(e3,x5,x7) & Egypt_NN(x7) & into_IN(e3,x8) & strict_JJ(x15,x14) & Muslim_NN(x8) & state_NN(x13) & nn_NNC(x14,x8,x13) & PROPERTY_SR(x15,x14) & by_IN(e3,x9) & political_JJ(x9) & means_NN(x9) & MEANS_SR(x9,e3) & set_VB(e5,x5,x5) & itself_PRP(x5) & apart_RB(e5) & from_IN(e5,x10) & militant_JJ(x10) & group_NN(x10) & take_VB(e6,x10,x12) & up_IN(e6,x11) & arms_NN(x11) & in_IN(e6,x12) & 1992_CD(x12) & TEMPORAL_SR(x12,e6))).

Lexical Chains & Axioms: On Demand Input into Cogex

Lexical Chains from XWN: Lexical chains establish connections between semantically related concepts, i.e. WordNet synsets.
(Note: concepts, not words, which means Word Sense Disambiguation is necessary.) Concepts and relations along the lexical chain explain the semantic connectivity of the end concepts. Lexical chains start by using WordNet relations (ISA, Part-Whole) and gloss co-occurrence (a weak relation). The XWN Knowledge Base then adds more meaningful (precise) relations.
Gloss: “Tennis is a game played with rackets by two or four players…”
Prior to XWN-KB: ‘tennis’ – ‘two or four’ (gloss co-occurrence)
With XWN-KB: ‘tennis’ –ISA– ‘game’ –THM– ‘play’ –AGT– ‘player’ –MEA– ‘two or four’

Examples of Lexical Chains:
Question: How were biological agents acquired by bin Laden?
Answer: On 8 July 1998, the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in
Lexical Chain: ( V - buy#1, purchase#1 ) – HYPERNYM – ( V - get#1, acquire#1 )
Question: How did Adolf Hitler die?
Answer: … Adolf Hitler committed suicide …
Lexical Chain: ( N - suicide#1, self-destruction#1, self-annihilation#1 ) – GLOSS – ( V - kill#1 ) – GLOSS – ( V - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 )

Propagating syntactic structures along the chain: The goal is to filter out unacceptable chains, and to improve the ranking of chains when multiple chains can be established.
Example 1:
Q: Who did Floyd Patterson beat to win the title? (Floyd Patterson: AGENT)
WA: He saw Ingemar Johanson knock down Floyd Patterson seven times there in winning the title. (Floyd Patterson: PATIENT)
V - beat#2 – entail – V - hit#4 – derivation – N - hitting#1, striking#2 – derivation – V - strike#2 – hyponym – V - knock-down#2
Example 2:
S1: John bought a cowboy hat for $50. (AGENT, THEME, MEASURE)
S2: John paid $50 for a cowboy hat. (AGENT, MEASURE, THEME)
V - buy#1 – entail – V - pay#1

Axioms on Demand (1/3): Extract world knowledge, in the form of axioms, from text or other resources automatically and “on demand”. When the logic prover runs out of rules to use, it can request one from external knowledge sources; it will ask for a rule connecting two concepts. Axioms are generated on the fly from multiple knowledge sources:
WordNet and eXtended WordNet: glosses and lexical chains
Instantiation of NLP rules
Open text from a trusted source (dictionary, encyclopedia, textbook on a relevant topic, etc.)
An automatically built knowledge base

Axioms on Demand (2/3):
eXtended WordNet axiom generator. Question: What can a ‘player’ do? Look at all contexts with ‘player’ as AGT. Gloss of ‘tennis’: a ‘player’ can ‘hit’ (a ball), ‘play’ (a game). Gloss of ‘squash’: a ‘player’ can ‘strike’ (a ball), etc.
Connect related concepts:
kidnap_VB(e1,x1,x2) -> kidnapper_NN(x1)
asian_JJ(x1,x2) -> asia_NN(x1) & _continent_NE(x1)
World Knowledge axioms (WordNet glosses):
jungle_cat_NN(x1) -> small_JJ(x2,x1) & Asiatic_JJ(x3,x1) & wildcat_NN(x1)
NLP axioms (linguistic rewriting rules):
Gilda_NN(x1) & Flores_NN(x2) & nn_NNC(x3,x1,x2) -> Flores_NN(x3)

Axioms on Demand (3/3):
Semantic Relation Calculus: Combine two or more local semantic relations to establish broader semantic relations and increase the semantic connectivity.
Mike is a rich man → Mike is rich.
ISA_SR(Mike,man) & PAH_SR(man,rich) -> PAH_SR(Mike,rich)
John lives in Dallas, Texas → John lives in Texas.
LOC(John,Dallas) & PW(Dallas,Texas) -> LOC(John,Texas)
Temporal Axioms:
Time transitivity of events: during_CTMP(e1,e2) & during_CTMP(e2,e3) -> during_CTMP(e1,e3)
Dates entail more general times: October 2000 → year 2000

Contextual Knowledge Axioms: Examples:
If someone boards a plane and the flight takes 3 hours, then that person travels for 3 hours.
The person leaves at the same time and arrives at the same time as the traveling plane.
If the departure of a vehicle has a destination and the vehicle arrives at the destination, then the arrival is located at the destination.
If something is exactly located somewhere, then nothing else is exactly located in the same place.
If a Process is located in an area, then all sub-Processes of the Process are located in the same area.

Logic Prover: The Heart of Cogex

Logic Prover (1/2): A first-order logic, resolution-style theorem prover. Inference rule sets are based on hyperresolution and paramodulation. Transform the two text fragments into 4-layered logic forms based upon LCC’s Syntactic, Semantic, Contextual and Event Processing and Analysis. Automatically create “Axioms on Demand” to be used during the proof:
Lexical Chains axioms
World Knowledge axioms
Linguistic transformation axioms
Contextual / Temporal axioms

Logic Prover (2/2):
Load COGEX’s SOS (Set of Support) with the Candidate Answer Passage(s) A and Question Q, and its USABLE list of clauses with the generated axioms, semantic and temporal axioms.
Search for a proof by iteratively removing clauses from the SOS and searching the USABLE list for possible inferences until a refutation is found.
If no contradiction is detected: relax arguments; drop entire predicates from H.
Compute a “Proof Score” for each candidate.
Select the best result and generate an NL justification.

Reasoning & Inference: How Well Does LCC Do?

Evaluations: QA (TREC-06): LCC’s PowerAnswer Question Answering (QA) system finished 1st on Factoid Questions and Overall Combined Score. A second LCC QA system, Chaucer, finished 2nd in both categories in the TREC QA 2006 evaluation. An LCC QA system has finished 1st every year that the TREC QA Evaluation has been conducted (annually since TREC-8 in 1999). [Chart: Mean: 18.5%; Top Score: 57.8%]

Evaluations: PASCAL RTE-2: LCC’s Groundhog system finished 1st overall at the Second PASCAL Recognizing Textual Entailment Challenge (RTE-2) and LCC’s COGEX system finished 2nd. (http://www.pascal-network.org/Challenges/RTE/)

Contact Information:
Home Office: 1701 N. Collins Boulevard, Suite 2000, Richardson, TX 75080. 972-231-0052 (Voice), 972-231-0012 (Fax)
Maryland Office: 6179 Campfire, Columbia, MD 21045. 410-715-0777 (Voice), 410-715-0774 (Fax), 443-878-8894 (Cell)

Slide60: June Sunrise over Kirkwall Bay in the Orkney Islands of Scotland. Your Questions & Comments
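The Semantic Relation Calculus and temporal transitivity axioms shown in the Axioms on Demand (3/3) slide can be illustrated with a small forward-chaining sketch. This is an assumption-laden illustration, not Cogex's implementation: the rule table holds only the three compositions that appear in the slides, the relation name DURING stands in for during_CTMP, and the function and variable names are hypothetical.

```python
# Each rule (R1, R2) -> R3 reads: R1(a, b) & R2(b, c) -> R3(a, c).
# Only the three compositions shown on the slides are included.
COMPOSITION_RULES = {
    ("ISA", "PAH"): "PAH",           # Mike is a rich man -> Mike is rich
    ("LOC", "PW"): "LOC",            # lives in Dallas, part of Texas -> lives in Texas
    ("DURING", "DURING"): "DURING",  # time transitivity of events
}


def close_relations(facts):
    """Apply the composition rules repeatedly until no new triple appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rel1, a, b in list(known):
            for rel2, b2, c in list(known):
                if b != b2:
                    continue  # the two relations must share the middle argument
                derived_rel = COMPOSITION_RULES.get((rel1, rel2))
                if derived_rel is None:
                    continue
                triple = (derived_rel, a, c)
                if triple not in known:
                    known.add(triple)
                    changed = True
    return known


# Facts are (relation, first argument, second argument) triples from the slides.
facts = {
    ("ISA", "Mike", "man"),
    ("PAH", "man", "rich"),
    ("LOC", "John", "Dallas"),
    ("PW", "Dallas", "Texas"),
    ("DURING", "e1", "e2"),
    ("DURING", "e2", "e3"),
}

for triple in sorted(close_relations(facts) - facts):
    print(triple)
```

Running the closure to a fixed point is the simplest way to show how local relations combine into broader ones; a prover that requests axioms on demand would instead generate only the compositions needed for the proof at hand.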
LCC02062007 parker Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 90 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 20, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Extracting Rich Knowledge from Text: Extracting Rich Knowledge from Text John D. Prange President 410-964-0179 john.prange@languagecomputer.com www.languagecomputer.com Our Company: Our Company Language Computer Corporation (LCC) Human Language Understanding Research and Development Founded 11 years ago in Dallas, Texas; Established a second office in Columbia, MD in mid-2006 ~70 research scientists and engineers Research funding primarily from DTO, NSF, AFRL, DARPA and several individual Government Agencies Technology has been transferred to individual Government Organizations, Defense contractors and more recently to Commercial Customers Outline of Talk: Outline of Talk Three Lines of Research & Development within LCC that impact Semantic-Level Understanding Information Extraction CiceroLite and other Cicero Products Extracting Rich Knowledge from Text Polaris: Semantic Parser XWN KB: Extended WordNet Knowledge Base Jaquar: Knowledge Extraction from Text Context and Events: Detection, Recognition & Extraction Cogex: Reasoning and Inferencing over Extracted Knowledge Semantic Parsing & Logical Forms Lexical Chains & On-Demand Axioms Logic ProverLCC’s Areas of Research: Information Extraction Given an entire corpus of documents Extracting every instance of some particular kind of information Named Entity 
Recognition – extraction of entities such as person, location and organization names Event-based Extraction – extraction of real world events such as bombings, deaths, court cases, etc. LCC’s Areas of ResearchCiceroLite & Cicero-ML: Named Entity Recognition Systems: CiceroLite & Cicero-ML: Named Entity Recognition SystemsTwo High-Performance NER Systems: Two High-Performance NER Systems Accurate and customizable NE Recognition for English Classifies 8 high-frequency NE classes with over 90% precision and recall Currently extended to detect over 150 different NE classes Non-deterministic Finite-State Automata (FSA) framework resolves ambiguities in text, performs precise classification Machine Learning-based NER for multiple languages Statistical machine learning- based framework makes for rapid extension to new languages Currently deployed for Arabic, German, English, and Spanish Arabic: Classifies 18 NE classes with an average of nearly 90% F CiceroLite CiceroLite-MLCiceroLite: CiceroLite Designed specifically for English, CiceroLite categorizes 8 high-frequency NE classes with over 90% precision and recall. But it’s capable of much much more: as currently deployed, CiceroLite can categorize up to 150 different NE classes, including: Over 100 more!CiceroLite-ML (Arabic): CiceroLite-ML (Arabic) CiceroLite-ML currently detects a total 18 different classes of named entities for Arabic with between 80% - 90% F. 
Other Cicero Products: Other Cicero Products CiceroLite-ML (Mandarin Chinese) Similar scope and depth of Arabic Version shown on previous slide CiceroCustom User customizable event extraction system using a variant of supervised learning called “active learning” TASER (Temporal & Spatial Normalization System) Recognize 8 different types of time expressions and over 50 types of spatial expressions; Normalizies time using ISO8601; Exact Lat/Long for ~8M place names Under Contractual Development (With Deliveries in 2007) CiceroRelation Relation Detection based upon ACE 2007 specifications CiceroCoref Entity coreference utilizing CiceroLite NER; to include cross document entity tracking CiceroDiscourse Extract discourse structure & topic semanticsLCC’s Areas of Research: Extracting Rich Knowledge From Text Explicit knowledge Implicit knowledge: implicatures, humor, sarcasm, deceptions, etc. Other textual phenomena: negation, modality, quantification, coreference resolution Lexical Level & Syntax Semantic Relations Contexts Events & Event Properties Meta-Events Event Relations LCC’s Areas of Research Skip BackExtracting Rich Knowledge from Text: Extracting Rich Knowledge from Text Innovations A rich and flexibility representation of textual semantics Extract concepts and semantic relations between concepts, rich event structures Extract event properties; extend events using event relations Handle textual phenomena such as negation and modality Mark implicit knowledge and capture meaning suggested by it whenever possibleFour-Layered Representation: Four-Layered Representation Syntax Representation Syntactically link words in sentences; Apply Word Sense Disambiguation (WSD) Semantic Relations Provide deeper semantic understanding of relations between words Context Representation Place boundaries around knowledge that is not universal Event Representation Detect events, extract their properties, extend using event relationsHierarchical Representation: Hierarchical 
RepresentationPolaris: Semantic Parser: Polaris: Semantic ParserPolaris Semantic Relations: Polaris Semantic RelationsPropbank vs. Polaris Relations: Propbank vs. Polaris RelationsExample: Polaris on Treebank: Example: Polaris on Treebank We're talking about years ago before anyone heard of asbestos having any questionable properties.XWN KB: Extended WordNet Knowledge Base: XWN KB: Extended WordNet Knowledge BaseXWN Knowledge Base (1/2): XWN Knowledge Base (1/2) WordNet® - free from Princeton University A large lexical database of English, developed by Professor George Miller, Princeton Univ; now under the direction of Christiane Fellbaum. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. eXtended WordNet - free from UTD Glosses: parsed; word sense disambiguated; transformed into logic forms XWN Knowledge Base - done at LCC Glosses: converted into semantic relations (using Polaris semantic parser) Represented in a Knowledge Base Reasoning tool Axiom generator Lexical chain facilitator XWN Knowledge BaseXWN Knowledge Base (2/2): XWN Knowledge Base (2/2) Summary: The rich definitional glosses from WordNet are processed through LCC’s Knowledge Acquisition System (Jaguar) to produce a semantically rich upper ontology The Clusters: Noun glosses are transformed into sets of semantic relations, which are then arranged into individual semantic units called clusters, with one cluster per gloss The Hierarchy: The clusters (representing one noun synset each) are arranged in a hierarchy similar to that of WordNet The Knowledge Base: The generated KB has not only the hierarchy of WordNet, but also a rich semantic representation of each entry in the hierarchy (based on the definitional gloss)Example: WordNet Gloss: Example: WordNet Gloss Tennis is a game played with rackets by two or four players who hit a ball back and 
forth over a net that divides the court ISA (Tennis, game) AGT (two or four players, play) THM (game, play) INS (rackets, play) MEA (two or four, players) AGT (two or four players, hit) THM (a ball, hit) MNR (back and forth, hit) LOC (over a net that divides the court, hit) AGT (a net, divides) THM (the court, divides)Semantic Cluster of a WordNet Gloss: Semantic Cluster of a WordNet Gloss tennis ISA game player MEA two or four play AGT player THM INS game racket hit AGT player THM MNR ball back and forth LOC over a net divide AGT net THM court Synset ID: 00457626 Name: tennis, lawn_tennisHierarchy (as in WordNet): Hierarchy (as in WordNet) tennis basketball squash court game athletic game outdoor game golf croquet Jaguar: Knowledge Extraction From Text: Jaguar: Knowledge Extraction From TextJaguar: Knowledge Extraction: Jaguar: Knowledge Extraction Automatically generate ontologies and structured knowledge bases from text Ontologies form the framework or “skeleton” of the knowledge base Rich set of semantic relations form the “muscle” that connects concepts in the knowledge baseJaguar : Knowledge Extraction: Automatically generate ontologies and structured knowledge bases from text Ontologies form the framework or “skeleton” of the knowledge base Rich set of semantic relations form the “muscle” that connects concepts in the knowledge base Jaguar : Knowledge ExtractionAutomatically Building the Ontology: Automatically Building the Ontology Jaguar builds an ontology using the following steps Seed words selected either manually or automatically Find sentences in the input documents that contain seed words Parse those sentences and extract semantic relations; focusing on selected relations such as IS-A; Part-Whole; Kinship; Locative; Temporal Integrate the selected semantic relations into the ontology being produced Investigate the noun phrases in the parsed sentences to discover compound nouns, such as “SCUD missile”, and store them in the candidate ontology If 
    desired, revisit the unprocessed sentences to see whether they contain concepts related to the seed words through other semantic relations
    Finally, use the hyponymy information found in Extended WordNet to classify all concepts against one another, detecting and correcting classification errors and building an IS-A hierarchy in the process

Result: Jaguar Knowledge Base
  [Diagram; labels shown: anthrax, biological weapon]

Context & Events: Detection, Classification & Extraction

Types of Context
  Temporal: It rained on July 7th
  Spatial: It rained in Dallas
  Report: John said “It rains”
  Belief: John thinks that it rains
  Volitional: John wants it to rain
  Planning: It is scheduled to rain
  Conditional: If it’s cloudy, it will rain
  Possibility: It might rain

Events in Text
  Basic Definition: X is an Event if X is a possible answer to the question “What happened?”
  Applying the Definition to Verbs and Nouns
    Verb V is an Event if the sentence “Someone/something V-ed (someone/something)” is an answer to the question “What happened?”
    Noun N is an Event if the sentence “There was/were (a/an) N” is an answer to the question “What happened?”

Events in Text
  Most Adjectives are not potential Events
    Verbal 'adjectives' are treated as verbs, e.g.
    'lost', 'admired'
  Factitives ('Light' Verbs) are not separate events
    Suffer-a Loss; Take-a Test; Perform-an Operation
  Aspectual Markers can combine with a wide range of Events
    e.g., Stop, Completion, Start, Continue, Fail, Succeed, Try
  Modalities are not separate events
    Possibility, Necessity, Prescription, Suggestion, Optative

Event Detection
  Approach for Event Detection
    Annotate WordNet synsets that are Event concepts
    Annotation completed for Noun and Verb hierarchies
    Detect events by lexical lookup for concepts in annotated WordNet
  Project Status
    Prototype implemented for Event detection
    Benchmarks run: Precision 93%, Recall 79%
    Currently tuning performance

Event Extraction – Future
  Event Structures for Modelling Discourse
  Aspect (Start, Complete, Continue, Succeed, Fail, Try)
  Modality (Possibility, Necessity, Optativity)
  Event Participants (Actors, Undergoers, Instruments)
  Context (Spatial, Temporal, Intensional)
  Event Relations (Causation, Partonomy, Similarity, Contrast)
  Event Taxonomy/Classification
  Event Composition

LCC’s Areas of Research
  Cogex: Reasoning & Inferencing Over Extracted Knowledge

Reasoning & Inferences: Example Tasks that Require Both

TREC Question Answering Track
  The TREC Question Answering Track has been held annually since its inception in TREC-8 (1999)
  Main Task, TREC-2006 QA Track
  AQUAINT Corpus of English News Text
    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T31
    Newswire text data in English, drawn from three sources: Xinhua News Service (People's Republic of China), New York Times News Service, and Associated Press Worldstream News Service.
    Roughly 3 GBytes of text; more than a million documents
  Test Set: 75 sets of questions, each organized around a common target, where the target is a Person, Organization, Event or Thing
    Each series contains 6-9 questions: 4-7 Factoid, 1-2 List, and 1 Other
    Total: 403 Factoid Questions; 89 List Questions; 75 Other Questions

TREC-2006 Question Answering Track
  145. Target (Event): John William King convicted of murder
  145.1 Factoid: How many non-white members of the jury were there?
  145.2 Factoid: Who was the foreman for the jury?
  145.3 Factoid: Where was the trial held?
  145.4 Factoid: When was King convicted?
  145.5 Factoid: Who was the victim of the murder?
  145.6 List: What defense and prosecution attorneys participated in the trial?
  145.7 Other

Textual Entailment
  Textual Entailment Recognition is a generic task that captures major semantic inference needs across many natural language processing applications, such as Question Answering (QA), Information Retrieval (IR), Information Extraction (IE), and (multi-)document summarization.
  Task definition: T entails H, denoted by T → H, if the meaning of H can be inferred from the meaning of T
  PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning) RTE (Recognizing Textual Entailment) Challenge
    RTE-1 (2004-05); RTE-2 (2005-06) and RTE-3 (2006-07)
    http://www.pascal-network.org/Challenges/RTE/
  The Question Answering task can be interpreted as a Textual Entailment task as follows: given a Question Q and a possible Answer Text Passage A, the QA task is one of applying semantic inference to the pair (Q, A) to infer whether or not A contains the answer to Q.

RTE-2: Example TH Pairs
  Entailment?: “Yes”
    T: Tibone estimated diamond production at four mines operated by Debswana – Botswana’s 50-50 joint venture with DeBeers – could reach 33 million carats this year.
    H: Botswana is a business partner of DeBeers.
  Entailment?: “Yes”
    T: The EZLN differs from most revolutionary groups by having stopped military action after the initial uprising in the first two weeks of 1994.
    H: EZLN is a revolutionary group.
  Entailment?: “No”
    T: Two persons were injured in dynamite attacks perpetrated this evening against two bank branches in this Northwestern Colombian city.
    H: Two persons perpetrated dynamite attacks in a Northwestern Colombian city.
  Entailment?: “No”
    T: Such a margin of victory would give Abbas a clear mandate to renew peace talks with Israel, rein in militants and reform the corruption-riddled Palestinian Authority.
    H: The new Palestinian president combated corruption and revived the Palestinian economy.

Cogex: Logic Prover

Semantically Enhanced COGEX
  [Architecture diagram; labels shown: Q/T; A/H; Semantic Parser; Logic Forms; Q/T LF; A/H LF; Context; Axiom Building; Axioms; Lex Chains; XWN KBase; Semantic Calculus; Temporal Axioms; Linguistic Axioms; World K Axioms; Logic Prover; Relaxation; Answer or Entailment Ranking; Answer / Entailment; NL Justification]

Output of Semantic Parser
  Question: What is the Muslim Brotherhood's goal?
  The output of the semantic parser:
    PURPOSE(x, Muslim Brotherhood)
  Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.
  The output of the semantic parser:
    AGENT(Muslim Brotherhood, advocate)
    PURPOSE(turning Egypt into a strict Muslim state, advocate)
    TEMPORAL(1928, establish)
    TEMPORAL(1992, took up arms)
    PROPERTY(strict, Muslim state)
    MEANS(political means, turning Egypt into a strict Muslim state)
    SYNONYMY(Muslim Brotherhood, Egypt's biggest fundamentalist group)

Generation of Logical Forms
  Question: What is the Muslim Brotherhood's goal?
  Question Logical Form (QLF):
    (exists x0 x1 x2 x3 (Muslim_NN(x0) & Brotherhood_NN(x1) & nn_NNC(x2,x0,x1) & PURPOSE_SR(x3,x2))).
  Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.
  Answer Logical Form (ALF):
    (exists e1 e2 e3 e4 e5 e6 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 (Muslim_NN(x1) & Brotherhood_NN(x2) & nn_NNC(x3,x1,x2) & Egypt_NN(x4) & _s_POS(x5,x4) & biggest_JJ(x5) & fundamentalist_JJ(x5) & group_NN(x5) & SYNONYMY_SR(x3,x5) & establish_VB(e1,x20,x5) & in_IN(e1,x6) & 1928_CD(x6) & TEMPORAL_SR(x6,e1) & advocate_VB(e2,x5,x21) & AGENT_SR(x5,e2) & PURPOSE_SR(e3,e2) & turn_VB(e3,x5,x7) & Egypt_NN(x7) & into_IN(e3,x8) & strict_JJ(x15,x14) & Muslim_NN(x8) & state_NN(x13) & nn_NNC(x14,x8,x13) & PROPERTY_SR(x15,x14) & by_IN(e3,x9) & political_JJ(x9) & means_NN(x9) & MEANS_SR(x9,e3) & set_VB(e5,x5,x5) & itself_PRP(x5) & apart_RB(e5) & from_IN(e5,x10) & militant_JJ(x10) & group_NN(x10) & take_VB(e6,x10,x12) & up_IN(e6,x11) & arms_NN(x11) & in_IN(e6,x12) & 1992_CD(x12) & TEMPORAL_SR(x12,e6)).

Lexical Chains & Axioms: On Demand Input into Cogex

Lexical Chains from XWN
  Lexical chains establish connections between semantically related concepts, i.e. WordNet synsets.
  (Note: concepts, not words, which means Word Sense Disambiguation is necessary)
  Concepts and relations along the lexical chain explain the semantic connectivity of the end concepts
  Lexical chains start by using WordNet relations (ISA, Part-Whole) and gloss co-occurrence (a weak relation)
  The XWN Knowledge Base then adds more meaningful (precise) relations
  “Tennis is a game played with rackets by two or four players…”
    Prior to XWN-KB: ‘tennis’ - ‘two or four’ (gloss co-occurrence)
    With XWN-KB: ‘tennis’ - ‘game’ - ‘play’ - ‘player’ - ‘two or four’ (relations: ISA, AGT, THM, MEA)

Examples of Lexical Chains
  Question: How were biological agents acquired by bin Laden?
  Answer: On 8 July 1998, the Italian newspaper Corriere della Sera indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in
  Lexical Chain: (V - buy#1, purchase#1) – HYPERNYM (V - get#1, acquire#1)
  Question: How did Adolf Hitler die?
  Answer: … Adolf Hitler committed suicide …
  Lexical Chain: (N - suicide#1, self-destruction#1, self-annihilation#1) – GLOSS (V - kill#1) – GLOSS (V - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25)

Propagating syntactic structures along the chain
  The goal is to filter out unacceptable chains and to improve the ranking of chains when multiple chains can be established
  Example 1:
    Q: Who did Floyd Patterson [AGENT] beat to win the title?
    WA: He saw Ingemar Johanson knock down Floyd Patterson [PATIENT] seven times there in winning the title.
    V - beat#2 – entail V - hit#4 – derivation N - hitting#1, striking#2 – derivation V - strike#2 – hyponym V - knock-down#2
  Example 2:
    S1: John [AGENT] bought a cowboy hat [THEME] for $50 [MEASURE].
    S2: John [AGENT] paid $50 [MEASURE] for a cowboy hat [THEME].
    V - buy#1 – entail V - pay#1

Axioms on Demand (1/3)
  Extract world knowledge, in the form of axioms, from text or other resources automatically and “on demand”
    When the logic prover runs out of rules to use, it can request one from external knowledge sources
    It will ask for a rule connecting two concepts
  Generate axioms on the fly from multiple knowledge sources
    WordNet and eXtended WordNet: glosses and lexical chains
    Instantiation of NLP rules
    Open text from a trusted source (dictionary, encyclopedia, textbook on a relevant topic, etc.)
    An automatically built knowledge base

Axioms on Demand (2/3)
  eXtended WordNet axiom generator
    Question: What can a ‘player’ do?
    Look at all contexts with ‘player’ as AGT
      Gloss of ‘tennis’: a ‘player’ can ‘hit’ (a ball), ‘play’ (a game)
      Gloss of ‘squash’: a ‘player’ can ‘strike’ (a ball), etc.
    Connect related concepts
      kidnap_VB(e1,x1,x2) -> kidnapper_NN(x1)
      asian_JJ(x1,x2) -> asia_NN(x1) & _continent_NE(x1)
  World Knowledge axioms
    WordNet glosses
      jungle_cat_NN(x1) -> small_JJ(x2,x1) & Asiatic_JJ(x3,x1) & wildcat_NN(x1)
  NLP axioms
    Linguistic rewriting rules
      Gilda_NN(x1) & Flores_NN(x2) & nn_NNC(x3,x1,x2) -> Flores_NN(x3)

Axioms on Demand (3/3)
  Semantic Relation Calculus
    Combine two or more local semantic relations to establish broader semantic relations
    Increase the semantic connectivity
    Mike is a rich man → Mike is rich
      ISA_SR(Mike,man) & PAH_SR(man,rich) → PAH_SR(Mike,rich)
    John lives in Dallas, Texas → John lives in Texas.
      LOC(John,Dallas) & PW(Dallas,Texas) -> LOC(John,Texas)
  Temporal Axioms
    Time transitivity of events
      during_CTMP(e1,e2) & during_CTMP(e2,e3) -> during_CTMP(e1,e3)
    Dates entail more general times
      October 2000 → year 2000

Contextual Knowledge Axioms
  Examples
    If someone boards a plane and the flight takes 3 hours, then that person travels for 3 hours
    The person leaves at the same time and arrives at the same time as the traveling plane
    If the departure of a vehicle has a destination and the vehicle arrives at the destination, then the arrival is located at the destination
    If something is exactly located somewhere, then nothing else is exactly located in the same place
    If a Process is located in an area, then all sub-Processes of the Process are located in the same area

Logic Prover: The Heart of Cogex

Logic Prover (1/2)
  A first-order logic, resolution-style theorem prover
  Inference rule sets are based on hyperresolution and paramodulation
  Transform the two text fragments into 4-layered logic forms based upon LCC’s Syntactic, Semantic, Contextual and Event Processing and Analysis
  Automatically create “Axioms on Demand” to be used during the proof
    Lexical Chains axioms
    World Knowledge axioms
    Linguistic transformation axioms
    Contextual / Temporal axioms

Logic Prover (2/2)
  Load COGEX’s SOS (Set of Support) with the Candidate Answer Passage(s) A and Question Q, and its USABLE list of clauses with the generated axioms, semantic and temporal axioms
  Search for a proof by iteratively removing clauses from SOS and searching USABLE for possible inferences until a refutation is found
  If no contradiction is detected
    Relax arguments
    Drop entire predicates from H
  Compute a “Proof Score” for each Candidate
  Select the best result and generate an NL justification

Reasoning & Inference:
How Well Does LCC Do?

Evaluations: QA (TREC-06)
  LCC’s PowerAnswer Question Answering (QA) system finished 1st on Factoid Questions and on Overall Combined Score.
  A second LCC QA system, Chaucer, finished 2nd in both categories in the TREC QA 2006 evaluation.
  An LCC QA system has finished 1st every year that the TREC QA Evaluation has been conducted (annually since TREC-8 in 1999)
  Mean: 18.5%; Top Score: 57.8%

Evaluations: PASCAL RTE-2
  LCC’s Groundhog system finished 1st overall at the Second PASCAL Recognizing Textual Entailment Challenge (RTE-2) and LCC’s COGEX system finished 2nd.
  (http://www.pascal-network.org/Challenges/RTE/)

Contact Information
  Home Office
    1701 N. Collins Boulevard, Suite 2000
    Richardson, TX 75080
    972-231-0052 (Voice)
    972-231-0012 (Fax)
  Maryland Office
    6179 Campfire
    Columbia, MD 21045
    410-715-0777 (Voice)
    410-715-0774 (Fax)
    443-878-8894 (Cell)

Your Questions & Comments
  [Photo: June Sunrise over Kirkwall Bay in the Orkney Islands of Scotland]
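
Supplementary sketch: the composition rules shown on the "Axioms on Demand (3/3)" slide (ISA with PAH, LOC with PW, and the transitivity of during) all have the shape r1(a,b) & r2(b,c) -> r_out(a,c), so they can be illustrated with a small forward-chaining loop. This is only a minimal sketch under that assumption; the tuple encoding, the `RULES` table, and the function name `close_relations` are hypothetical names chosen for illustration and are not part of Cogex or any LCC API.

```python
from typing import Iterable

# A fact is (relation, first argument, second argument), e.g. ("LOC", "John", "Dallas").
Fact = tuple[str, str, str]
# A rule (r1, r2, r_out) stands for: r1(a, b) & r2(b, c) -> r_out(a, c).
Rule = tuple[str, str, str]

# Hypothetical rule table mirroring the three examples on the slides.
RULES: list[Rule] = [
    ("ISA", "PAH", "PAH"),           # Mike is a rich man -> Mike is rich
    ("LOC", "PW", "LOC"),            # lives in Dallas, Dallas part of Texas -> lives in Texas
    ("DURING", "DURING", "DURING"),  # time transitivity of events
]


def close_relations(facts: Iterable[Fact], rules: list[Rule] = RULES) -> set[Fact]:
    """Apply the composition rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for r1, r2, r_out in rules:
            # Iterate over snapshots so the set can grow while we scan it.
            for rel_a, a, b in list(known):
                if rel_a != r1:
                    continue
                for rel_b, b2, c in list(known):
                    if rel_b == r2 and b2 == b:
                        derived = (r_out, a, c)
                        if derived not in known:
                            known.add(derived)
                            changed = True
    return known


if __name__ == "__main__":
    facts = [
        ("ISA", "Mike", "man"), ("PAH", "man", "rich"),
        ("LOC", "John", "Dallas"), ("PW", "Dallas", "Texas"),
        ("DURING", "e1", "e2"), ("DURING", "e2", "e3"),
    ]
    for fact in sorted(close_relations(facts) - set(facts)):
        print(fact)
```

The loop is a naive fixed-point computation; a real prover would index facts by relation and argument, but the sketch is enough to show how local relations combine into broader ones.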