The Informative Role of WordNet in Open-Domain Question Answering
Marius Paşca and Sanda M. Harabagiu
(NAACL 2001)
Presented by Shauna Eggers
CS 620 February 17, 2004
Introduction: Information Extraction: not just for keywords anymore!
Massive document collections (databases, webpages) require more sophisticated search techniques than keyword matching
Need a way to focus and narrow the search to improve precision
One solution: Open-Domain Q/A
Find answers to natural language questions from large document collections
Examples:
“What city is the capital of the United Kingdom?”
“Who is the first private citizen to fly in space?”
Text Retrieval Conferences (TREC) evaluate entered systems; show that this sort of task can be performed with “satisfactory accuracy” (Voorhees, 2000)
Q/A: Previous Approach: Captures the semantics of the question by recognizing
expected answer type (i.e., its semantic category)
relationship between the answer type and the question concepts/keywords
The Q/A process:
Question processing – Extract concepts/keywords from question
Passage retrieval – Identify passages of text relevant to query
Answer extraction – Extract answer words from passage
Relies on standard IR and IE Techniques
Proximity-based features
Answer often occurs in text near to question keywords
Named-entity Recognizers
Categorize proper names into semantic types (persons, locations, organizations, etc)
Map semantic types to question types (“How long”, “Who”, “What company”)
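The three steps above can be sketched in a few lines. This is only a toy illustration, not the system described in the paper: the stem-to-type table, the stopword list, and the function names are all assumptions made for the example, and the passage is assumed to arrive already tokenized and tagged by some named-entity recognizer.

```python
import re

# Hypothetical mapping from question stems to named-entity types.
QUESTION_TYPES = [
    ("what company", "ORGANIZATION"),
    ("how long", "DURATION"),
    ("who", "PERSON"),
    ("where", "LOCATION"),
]

# Tiny illustrative stopword list (assumption).
STOPWORDS = {"the", "is", "of", "a", "an", "to", "in", "was", "did",
             "what", "who", "how", "long", "where", "company"}


def expected_answer_type(question):
    """Question processing: map the question stem to a semantic type."""
    q = question.lower()
    for stem, answer_type in QUESTION_TYPES:
        if q.startswith(stem):
            return answer_type
    return None


def keywords(question):
    """Question processing: keep the non-stopword tokens as keywords."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [t for t in tokens if t not in STOPWORDS]


def extract_answer(question, passage_tokens, entity_tags):
    """Answer extraction: return the entity of the expected type that lies
    closest to any question keyword (the proximity-based feature).

    entity_tags is a list parallel to passage_tokens holding an NE label
    or None for each token.
    """
    wanted = expected_answer_type(question)
    kws = set(keywords(question))
    kw_positions = [i for i, tok in enumerate(passage_tokens)
                    if tok.lower() in kws]
    candidates = [i for i, tag in enumerate(entity_tags) if tag == wanted]
    if not kw_positions or not candidates:
        return None
    best = min(candidates,
               key=lambda i: min(abs(i - k) for k in kw_positions))
    return passage_tokens[best]


# Made-up passage and tags, for illustration only.
tokens = ["Acme", "was", "founded", "by", "Jane_Doe", ";",
          "later", "John_Roe", "joined"]
tags = ["ORGANIZATION", None, None, None, "PERSON", None,
        None, "PERSON", None]
print(extract_answer("Who founded Acme?", tokens, tags))
```

Because the stem table only covers a handful of named-entity types, a question such as "What kind of flowers did Van Gogh paint?" gets no answer type at all in this sketch.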
Problems: NE recognition assumes all answers are named entities
Oversimplifies the generative power of language!
What about: “What kind of flowers did Van Gogh paint?”
Does not account well for morphological, lexical, and semantic alternations
Question terms may not exactly match answer terms; connections between alternations of Q and A terms often not documented in flat dictionary
Example: “When was Berlin’s Brandenburger Tor erected?” There is no guarantee that erected will match built in the answer text
Recall suffers
WordNet to the rescue!: WordNet can be used to inform all three steps of the Q/A process
1. Answer-type recognition (Answer Type Taxonomy)
2. Passage Retrieval (“specificity” constraints)
3. Answer extraction (recognition of keyword alternations)
Using WN’s lexico-semantic info: Examples
“What kind of flowers did Van Gogh paint?”
Answer-type recognition: need to know (a) answer is a kind of flower, and (b) sense of the word flower
WordNet encodes 470 hyponyms of flower sense #1, flowers as plants
Nouns from retrieved passages can be searched against these hyponyms
“When was Berlin’s Brandenburger Tor erected?”
Semantic alternation: erect is a hyponym of sense #1 of build
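The flower example can be illustrated with a small hyponym lookup. The table below is a hand-written toy, not real WordNet data (WordNet itself lists far more hyponyms), and the function names are hypothetical; the point is only to show how nouns from a retrieved passage could be checked against the hyponyms of the answer-type concept.

```python
# Toy hyponym table: concept -> direct hyponyms.
# Assumption for illustration; not extracted from WordNet.
HYPONYMS = {
    "flower": ["sunflower", "iris", "rose"],
    "rose": ["tea_rose"],
    "painting": ["still_life"],
}


def all_hyponyms(concept, table=HYPONYMS):
    """Return every word reachable below `concept` in the hyponym table."""
    found = set()
    stack = list(table.get(concept, []))
    while stack:
        word = stack.pop()
        if word not in found:
            found.add(word)
            stack.extend(table.get(word, []))
    return found


def candidate_answers(passage_nouns, answer_type):
    """Keep only the passage nouns that are a kind of `answer_type`."""
    allowed = all_hyponyms(answer_type)
    return [noun for noun in passage_nouns if noun in allowed]


print(candidate_answers(["painter", "sunflower", "canvas", "iris"], "flower"))
```

A real system would also have to choose the right sense of flower before collecting hyponyms, which this sketch sidesteps by having a single entry per word.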
Interactions between WN and Q/A: [Diagram: a Question passes through Question Processing, Document Processing (Index and Passage Retrieval over the Documents), and Answer Processing (Answer Extraction) to yield the Answer(s); WordNet is linked to the Expected Answer Type and the Keyword Alternations.]
WN in Answer-type Recognition: Answer Type Taxonomy
a taxonomy of answer types that incorporates WN information
Acts as an “ontological resource” that can be searched to identify a semantic category (representing answer type)
Used to associate found semantic categories with a named entity extractor
So, still using an NE recognizer, but not bound to proper nouns; have found a way to map NEs to more general semantic categories
Developed on principles conceived for the Q/A environment (rather than on general ontological principles)
Principle 1: Different parts of speech specialize the same answer type
Principle 2: Selected word senses are considered
Principle 3: Completeness of the top hierarchy
Principle 4: Conceptual average of answer types
Principle 5: Correlating the Answer Type Taxonomy with NEs
Principle 6: Mining WordNet for additional knowledge
Answer Type Taxonomy (example)
WN in Passage Retrieval: Identify relevant passages from text
Extract keywords from the question, and
Pass them to the retrieval module
“Specificity” – filtering question concepts/keywords
Focuses search, improves performance and precision
Question keywords can be omitted from the search if they are too general
Specificity calculated by counting the hyponyms of a given keyword in WordNet
Count ignores proper names and same-headed concepts
Keyword is thrown out if count is above a given threshold (currently 10)
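The specificity filter can be sketched as follows. Only the threshold of 10 and the two exclusions come from the slide; everything else is an assumption for illustration, including the way proper names are detected (a leading capital) and the way same-headed concepts are detected (the last underscore-separated part equals the keyword). The hyponym lists are made up.

```python
THRESHOLD = 10  # threshold quoted on the slide


def specificity_count(keyword, hyponyms):
    """Count hyponyms of `keyword`, skipping proper names and
    same-headed concepts (both detected heuristically here)."""
    count = 0
    for hyponym in hyponyms:
        if hyponym[:1].isupper():               # proper name (assumed heuristic)
            continue
        if hyponym.split("_")[-1] == keyword:   # same head word, e.g. inner_city
            continue
        count += 1
    return count


def filter_keywords(keywords, hyponym_table, threshold=THRESHOLD):
    """Drop keywords whose hyponym count is above the threshold."""
    return [k for k in keywords
            if specificity_count(k, hyponym_table.get(k, [])) <= threshold]


# Made-up hyponym lists, not taken from WordNet.
table = {
    "thing": ["item%d" % i for i in range(12)],
    "city": ["London", "Paris", "boom_town", "inner_city"],
}
print(filter_keywords(["thing", "city", "tower"], table))
```

Here "thing" is dropped as too general, while "city" survives because its proper-name and same-headed hyponyms are not counted.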
WN in Answer Extraction: If keywords alone cannot find an acceptable answer, look for alternations in WordNet!
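A minimal sketch of this fallback, assuming a hand-written alternation table and passages that are already lowercased and lemmatized. The erect/build link is the one mentioned earlier in the slides; the function names, the matching rule (every keyword must be present), and the sample passages are hypothetical.

```python
# Toy alternation table (assumption): keyword -> related words.
ALTERNATIONS = {"erect": ["build"]}


def matches(passage_words, keyword_options):
    """True if, for every keyword, at least one of its options is present."""
    return all(any(option in passage_words for option in options)
               for options in keyword_options)


def retrieve(keywords, passages, alternations=ALTERNATIONS):
    """Try the original keywords first; fall back to alternations
    only if no passage matches."""
    exact = [[k] for k in keywords]
    hits = [p for p in passages if matches(set(p.split()), exact)]
    if hits:
        return hits
    expanded = [[k] + alternations.get(k, []) for k in keywords]
    return [p for p in passages if matches(set(p.split()), expanded)]


passages = ["workers build the brandenburger tor", "the tor be a gate"]
print(retrieve(["brandenburger", "tor", "erect"], passages))
```

Trying the exact keywords first keeps the original query when it already works, so the alternations only widen the search when recall would otherwise suffer.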
Evaluation: Paşca/Harabagiu approach measured against TREC-8 and TREC-9 test collections
WN contributions to Answer Type Recognition
Count number of questions for which acceptable answers were found; 3GB text collection, 893 questions
Evaluation (2): WN contributions to Passage Retrieval
Impact of keyword alternations
Impact of specificity knowledge
Conclusions: Massive lexico-semantic information must be incorporated into the Q/A process
Using such information encoded in WN improved system precision by 147% (qualitative analysis)
Visions for future:
Extend WN so that online resources like encyclopedias can link to WN concepts
Answer questions like: “Which classic rock group first performed live in Albuquerque?”
Further improve Q/A precision with WN extension projects
E.g., “finding keyword morphological alternations could benefit from derivational morphology, a project extension of WordNet” (Harabagiu et al., 1999)