Questioning and Answering
Alison Huettner
CLARITECH Corporation
April 11, 2000

Why question answering?
  Focussed information needs
  Lack of time to read through documents
  Inexperienced searchers
  High expectations

Fragment of AV query log
  who invented surf music?
  how to make stink bombs
  where are the snowdens of yesteryear?
  how to do a research paper
  which english translation of the bible is used in official catholic liturgies?
  how to do clayart
  how to copy psx
  ceramicsweb
  how to chat?
  where is silhouettes catelog?
  how to build a pyramid
  walleye fishing
  how tall is the sears tower?

What techniques are available?
  Ordinary document search techniques
  Electronic dictionaries, encyclopedias, atlases
  Hand indexing
  “Clickthrough” information
  Knowledge-base intense systems
  Forums for users to exchange information
  Desire for a truly open-ended QA system

Existing resources may be adequate...
  What is a codling?
  Who wrote The Complete Book of Running?
  When was the saxophone invented?
  Where can I find out information about West German beer steins?
  Show me all cases referencing Robbins vs. State of Florida.

...or answers may be elusive
  How much is a ton of asphalt?
  What percentage of Americans have children?
  What state has the most Republicans?
  Who on Wall Street has been found guilty of insider trading since 1982?

Text REtrieval Conference (TREC8)
  Standardized/judged question answering evaluation - what is the state of the art?
  198 short-answer, fact-based questions
  250- or 50-byte answers (five answers in order of confidence)
  Scoring by mean reciprocal rank
  20 groups submitted out of 23 participating

CLARITECH’s approach
  NLP-based information retrieval (IR)
  Named entity (NE) extraction
  Question analysis
  Question/answer matching
  Answer ranking
  deeper NLP

Basic CLARIT IR
  Shallow parsing to detect candidate noun phrases (NPs)
  Indexing on NPs, attested subphrases, constituent words
  Subdocuments of 8-10 sentences
  Optional thesaurus extraction and feedback

CLARIT IR adapted for QA
  Requires some modifications
    Retain/index verbs, adjectives, adverbs
    Retrieve smaller subdocuments (1-3 sentences)
    Prefer subdocs with more of the search terms
  With modifications, already a reasonable strategy for the 250-byte task
  Narrows the problem significantly for the 50-byte task

Basic CLARIT NE extraction
  Technology developed for populating DBs and supporting relationship discovery
  Exploits semantic types – both lists and naming patterns
  Can index entities by type as part of IR
  Serendipity for answer identification

Preliminary question analysis
  Question word cues
    Who, when, where, how, why
  Head noun cues
    What city, which country, what year...
    Which astronaut, what blues band, ...
  Scalar adjective cues
    How long, how fast, how far, how old, ...
  Focus cues
    What is the smallest country in Europe?
    What is the major export from Thailand?

Existing general NE extractors
  Person: Mr.
    Hubert J. Smith, Adm. McInnes, Grace Chan
  Title: Chairman, Vice President of Technology, Undersecretary of State
  Country: USSR, France, Haiti, Haitian Republic
  City: New York, Rome, Paris, Birmingham, Seneca Falls
  Province: Kansas, Yorkshire, Uttar Pradesh
  Business: GTE Corporation, FreeMarkets Inc., Ralston-Purina Co.
  University: Bryn Mawr College, University of Iowa
  Organization: Allen Art Museum, Boys and Girls Club, Irish Republican Army
  Currency: 400 yen, $100, DM450,000

Additional extractors for QA
  Linear: 10 feet, 100 miles, 15 centimeters
  Area: a square foot, 15 acres
  Volume: 6 cubic feet, 100 gallons
  Weight: 10 pounds, half a ton, 100 kilos
  Duration: 10 day, five minutes, 3 years, a millennium
  Frequency: daily, biannually, 5 times, 3 times a day
  Speed: 6 miles per hour, 15 feet per second, 5 kph
  Age: 3 weeks old, 10-year-old, 50 years of age

CLARIT NE adapted for QA
  But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991.
  The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989.
  The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.
  Who won the Nobel Peace Prize in 1991?

Limitations
  Not all questions contain semantic cues
    What caused the decline in India’s tiger population?
  Not all cues lend themselves to NE
    What actor was the first to be named a British peer?
  Passages may contain no entities, the wrong entities, or multiple entities
    Hisako Takahashi, a former director general of the labour ministry, has been named as Japan’s first female supreme court justice, writes Emiko Terazono.
  Approach is blind to structural cues
    Who shot Lee Harvey Oswald?

NLP revisited
  Deeper NLP is expensive over large databases, but feasible on short passages
  Linear order and structural information can
    Identify some answers in default of obvious semantic cues
    Differentiate among competing answers
    Rule out prominent but incorrect answers

Basic CLARIT NLP
  Deterministic part-of-speech tagging
  Normalization
  Non-hierarchical, “chunking” parser
  Discards function words
  Biassed towards nouns and NPs

CLARIT NLP adapted for QA
  Improved, context-sensitive part-of-speech tagging
  CLARIT entity extraction
  “Greedy” complex noun phrase (CNP) construction
  Hierarchical representation capturing both syntactic and semantic information

Question analysis
  Question and answer patterns may reference individual words (e.g., who), extraction entities (e.g., xcity), or any constituent above the tag level (e.g., NP, CNP).
  Who commanded British troops at Dunkirk?

Question/answer matching
  Question representation is compared with several hundred “sketchy patterns”
  A match on a sketchy pattern
    Associates the question with a question type
    Identifies and indexes the most important elements in a question of this type
    Indicates the possible locations of the answer with respect to the indexed question elements
    Indicates the semantic type of the answer, wherever possible, and requires an element of that type in retrieved subdocuments

Question typing
  Who discovered radium?
  who AVerb1 CNP1 (xperson)
  ANSWERS
    (xperson) []* AVerb1 CNP1                 # X discovered radium
    CNP1 []* PVerb1 []* by (xperson)          # radium was discovered by X
    CNP1 []* Rel (xperson) []* AVerb1 !CNP    # radium, which X discovered

Matching heuristics
  Elements of a question which are not indexed are treated as “bonus matches”
    Their structural position is unspecified
    They need not appear in a candidate answer passage, but the candidate answer is ranked higher when they do
  Dates are always treated as bonus matches
  Elements of a given question type may be explicitly declared to be bonus matches

CNP matching heuristics
  Complex noun phrases (CNPs) may cross constituent boundaries
    Who wrote the Complete Book of Running?
    but
    Who commanded British troops at Dunkirk?
  A complete CNP match is given extra points, but
  A match on the head noun phrase is sufficient
    British troops were commanded by...

Additional matching heuristics
  Second-choice named entity matches
    Who first patented a DNA sequence?
    Who won the Boer War?
  Overlap (but not identity) with query term
    Who was President of the United States in 1982?
    President Ronald Reagan...

Answer ranking
  Rank of retrieved subdocument
  Entity type match
  Number of elements matched in sketchy answer pattern
  Goodness of sketchy pattern match
  Number of bonus matches found
  Exactness of CNP matches
  Number of times retrieved

Data flow

Data flow, cont.

Additional factors
  Ontology
    Which astronaut, what mineral, ...
  Quoted strings
    Who wrote “Afternoon On A Hill”?
  Paraphrase
    Wording of answer may not match question
  Plausibility
    Stonehenge is 14 inches high

Sample TREC questions
  1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
  2. What was the monetary value of the Nobel Peace Prize in 1989?
  3. What does the Peugeot company manufacture?
  4. How much did Mercury spend on advertising in 1993?
  5. What is the name of the managing director of Apricot Computer?
  6. Why did David Koresh ask the FBI for a word processor?
  7. What debts did Qintex group leave?
  8. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?

Overview of TREC strategies
  NE  POS  Syn.Str.  Ont.  VSyn.  Score
  Cymfony    50        .660
  SMU        250/50    .646/.555
  AT&T       250/250   .545
  GePenn     250       .510
  MultiText  250       .471
  RMIT       250       .453
  Xerox      250       .453
  NTTData    250       .439
  MITRE      250       .434
  IBM        250/250   .430/.395
  UMass      250       .383  ?

Conclusions
  Baseline performance is better than expected
  Viable question answering systems are on the horizon
  Good IR is necessary but not sufficient
  Minimal NLP is both helpful and feasible

The End
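
The TREC8 slide names mean reciprocal rank as the scoring measure, and the Score column in the overview is presumably reported on that scale. With up to five ranked answers per question, a question earns 1/rank for the first correct answer and 0 if none of the five is correct; the score is the average over all questions. The sketch below is a minimal illustration under those assumptions. The function names and the judgments are made up for the example and are not TREC data.

```python
from typing import Sequence


def reciprocal_rank(judgments: Sequence[bool], max_answers: int = 5) -> float:
    """Return 1/rank of the first correct answer in the top max_answers, else 0."""
    for rank, correct in enumerate(judgments[:max_answers], start=1):
        if correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[bool]], max_answers: int = 5) -> float:
    """Average the per-question reciprocal ranks over all questions."""
    if not runs:
        raise ValueError("at least one question is required")
    return sum(reciprocal_rank(j, max_answers) for j in runs) / len(runs)


# Illustrative judgments only: one list per question, one flag per ranked answer.
example = [
    [True, False, False, False, False],   # correct at rank 1
    [False, False, True, False, False],   # correct at rank 3
    [False, False, False, False, False],  # no correct answer
    [False, True, False, False, False],   # correct at rank 2
]
score = mean_reciprocal_rank(example)
print(round(score, 3))
```

Because only the first correct answer counts, a system that puts the right answer second on every question scores .500, which gives a feel for the range of scores in the overview.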
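
The answer patterns on the Question typing slide can be read as sequences of typed constituents in which `[]*` skips any number of constituents and `(xperson)` marks the slot that holds the answer. The sketch below is one hypothetical way to match such patterns, not CLARITECH's implementation. It assumes passages have already been chunked and tagged by hand, uses invented names throughout, and covers only the first two patterns from the slide (the relative-clause pattern with `!CNP` is left out).

```python
from typing import List, Optional, Tuple, Union

Chunk = Tuple[str, str]                      # (tag, text) for a passage constituent
Element = Union[str, Tuple[str, Optional[str]]]
WILDCARD = "[]*"                             # skips zero or more constituents


def match(pattern: List[Element], chunks: List[Chunk],
          captured: Optional[str] = None) -> Optional[str]:
    """Match pattern against a prefix of chunks; return the captured answer or None."""
    if not pattern:
        return captured
    head, rest = pattern[0], pattern[1:]
    if head == WILDCARD:
        # Try skipping 0, 1, 2, ... constituents before the rest of the pattern.
        for skip in range(len(chunks) + 1):
            result = match(rest, chunks[skip:], captured)
            if result is not None:
                return result
        return None
    if not chunks:
        return None
    tag, text = head
    chunk_tag, chunk_text = chunks[0]
    if tag != chunk_tag:
        return None
    if text is None:
        captured = chunk_text                # answer slot: keep whatever fills it
    elif text.lower() != chunk_text.lower():
        return None
    return match(rest, chunks[1:], captured)


def find_answer(patterns: List[List[Element]], chunks: List[Chunk]) -> Optional[str]:
    """Try each pattern at each start position and return the first answer found."""
    for pattern in patterns:
        for start in range(len(chunks)):
            answer = match(pattern, chunks[start:])
            if answer is not None:
                return answer
    return None


def answer_patterns(verb: str, cnp: str) -> List[List[Element]]:
    """Build answer patterns for a 'who AVerb1 CNP1' question (hypothetical helper)."""
    return [
        [("xperson", None), WILDCARD, ("AVerb", verb), ("CNP", cnp)],
        [("CNP", cnp), WILDCARD, ("PVerb", verb), WILDCARD, ("by", "by"), ("xperson", None)],
    ]


# "Who discovered radium?" with hand-tagged, made-up passages.
patterns = answer_patterns("discovered", "radium")
active = [("xperson", "Marie Curie"), ("AVerb", "discovered"), ("CNP", "radium")]
passive = [("CNP", "Radium"), ("PVerb", "discovered"), ("PP", "in 1898"),
           ("by", "by"), ("xperson", "Marie Curie")]
unrelated = [("CNP", "radium"), ("AVerb", "glows"), ("PP", "in the dark")]

print(find_answer(patterns, active))
print(find_answer(patterns, passive))
print(find_answer(patterns, unrelated))
```

Requiring the `xperson` tag in the slot mirrors the slide's point that a sketchy pattern fixes the semantic type of the answer as well as its position relative to the indexed question elements.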