Question Answering Techniques and Systems
Mihai Surdeanu
TALP Research Center
Dep. Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
surdeanu@lsi.upc.edu
What is the Catalan language?
What is the longest ruling dynasty of Japan? : I don’t want to learn Boolean logic:
(dynasty AND (Japan OR Japanese) AND (NOT tempura))
Overview : What is Question Answering?
Definition, evaluation, classes of questions
Generic architectures
Other relevant approaches and systems
Problem of Question Answering : What is the nationality of Pope John Paul II?
… stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…
Natural language question, not keyword queries
Short text fragment, not URL list
Beyond Document Retrieval : Document Retrieval
Users submit queries corresponding to their information needs.
System returns (voluminous) list of full-length documents.
It is the responsibility of the users to find information of interest within the returned documents.
Open-Domain Question Answering (QA)
Users ask questions in natural language.
What is the highest volcano in Europe?
System returns list of short answers.
… Under Mount Etna, the highest volcano in Europe, perches the fabulous town …
Often more useful for specific information needs.
Evaluating QA Systems : The National Institute of Standards and Technology (NIST) organizes the yearly Text Retrieval Conference (TREC), which has had a QA track for the past 7 years, from 1999 to 2005. More recently, the Cross-Language Evaluation Forum (CLEF) has organized a similar evaluation in Europe.
The document set
Newswire documents from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.: over 1M documents now.
Well-formed lexically, syntactically and semantically (reviewed by professional editors).
The questions
Hundreds of new questions every year; the total is about 2,500 across all TRECs.
Task
Initially, extract at most 5 answers: long (250 bytes) and short (50 bytes).
Now, extract only one exact answer.
Several other sub-tasks added later: definition, list, context.
Metrics
Mean Reciprocal Rank (MRR): each question is assigned the reciprocal rank of its first correct answer. If the first correct answer is at position k, the score is 1/k.
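The MRR metric can be sketched in a few lines. This is a minimal illustration: the function name and the convention that a question with no correct answer contributes 0 are assumptions, not details stated on the slide.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/k over all questions, where k is the 1-based rank of the
    first correct answer. None means no correct answer was returned
    (assumed here to score 0)."""
    if not first_correct_ranks:
        raise ValueError("need at least one question")
    total = sum(1.0 / k if k else 0.0 for k in first_correct_ranks)
    return total / len(first_correct_ranks)


# Four questions: correct at rank 1, rank 3, never, rank 2.
example_mrr = mean_reciprocal_rank([1, 3, None, 2])
```

With the ranks above, the per-question scores are 1, 1/3, 0 and 1/2, so the mean is 11/24.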
Classes of QA Systems (1/2) : Capable of processing factual questions
Exact answer exists in text snippets
Answers extracted through keyword manipulations, maybe some morphological operations
With simple reasoning mechanisms
Exact answer exists in text snippets
Some inference required to link answer to question
World and domain knowledge (ontologies) necessary
'How did Socrates die?' '… was poisoned…'
Classes of QA Systems (2/2) : Capable of answer fusion
Exact answer does not exist in a single text fragment, but is scattered across multiple documents
'How do I assemble a bicycle?'
'Who was president of the United States during the recession?'
Interactive systems
User-system dialog interaction
Require discourse processing, coreference resolution etc.
'What ocean is between Europe and US?' 'How wide?'
Capable of analogical reasoning
Can answer speculative questions where the answer is not explicit in the documents
Systems extract pieces of evidence and use analogical reasoning
'Is the US out of recession?'
Overview : What is Question Answering?
Generic architectures
For factual questions
For definitional questions
For complex, temporal questions
Other relevant approaches and systems
QA Block Architecture : [Figure: block diagram. Q → Question Processing → Passage Retrieval → Answer Extraction → A. Labeled data flows: Keywords, Passages, Question Semantics. Supporting components: Document Retrieval, WordNet, NER, Parser.]
Question Processing : Two tasks
Understand the expected answer type (passed to Answer Extraction)
Extract and prioritize question keywords (passed to Passage Retrieval)
Question Stems and Answer Type Examples : Identify the semantic category of expected answers
Other question stems: Who, Which, Name, How hot...
Other answer types: Country, Number, Product...
Lexical Terms Examples : Questions approximated by sets of unrelated words (lexical terms)
Similar to bag-of-words IR models
Detecting the Expected Answer Type : In some cases, the question stem is sufficient to indicate the answer type (AT)
Why → REASON
When → DATE
In many cases, the question stem is ambiguous
What was the name of Titanic’s captain?
What U.S. Government agency registers trademarks?
What is the capital of Kosovo?
Question reformulations are hard to handle with manually crafted rules
What tourist attractions are there in Barcelona?
What are the names of the tourist attractions in Barcelona?
What do most tourists visit in Barcelona?
Detecting the Expected Answer Type: SMU’s Approach : Converts the question parse to a graph-like 'semantic representation'
The node with the highest connectivity is the 'question focus word' (QFW)
What was the name of Titanic’s captain?
What U.S. Government agency registers trademarks?
What is the capital of Kosovo?
All hyponyms of certain question focus words are assigned the same class
Building the Question Representation : Built from the question parse tree by a bottom-up traversal with a set of propagation rules:
assign labels to non-skip leaf nodes
propagate the label of the head child node to the parent node
link the head child node to the other children nodes
Example: Why/WRB did/VBD David/NNP Koresh/NNP ask/VB the/DT FBI/NNP for/IN a/DT word/NN processor/NN
[Figure: parse tree of the example, with phrase nodes WHADVP, NP, NP, NP, PP, VP, SQ, SBARQ]
Resulting question representation: nodes David Koresh, ask, FBI, word processor; answer type REASON
AT Detection Algorithm : Select the question focus word from the question representation:
Select the word(s) connected to the question. Some content-free words are skipped (e.g. 'name').
From the previous set select the word with the highest connectivity in the question representation.
Map the AT word onto a previously built AT hierarchy
The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g. 'writer' → PERSON.
Select the AT(s) from the first hypernym(s) associated with a semantic category.
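The mapping step can be sketched as a walk up the hypernym chain until a concept marked with a semantic category is reached. This is a toy illustration under stated assumptions: the `HYPERNYM` and `CATEGORY` tables are invented for the example (they are not WordNet data), each word has a single parent, and the category name CITY is hypothetical.

```python
from typing import Dict, Optional

# Toy single-parent hypernym links (hypothetical; real WordNet synsets
# can have several hypernyms and several senses per word).
HYPERNYM: Dict[str, str] = {
    "novelist": "writer",
    "writer": "communicator",
    "communicator": "person",
    "captain": "leader",
    "leader": "person",
    "capital": "city",
}

# Concepts marked by hand with a semantic category, as in 'writer' -> PERSON.
CATEGORY: Dict[str, str] = {"writer": "PERSON", "person": "PERSON", "city": "CITY"}


def detect_answer_type(focus_word: str) -> Optional[str]:
    """Return the category of the first marked concept on the hypernym chain."""
    node: Optional[str] = focus_word
    visited = set()
    while node is not None and node not in visited:
        if node in CATEGORY:
            return CATEGORY[node]
        visited.add(node)  # guard against accidental cycles in the toy data
        node = HYPERNYM.get(node)
    return None  # the word is not covered by the hierarchy
```

For example, `detect_answer_type("captain")` climbs captain → leader → person and returns PERSON, while a word missing from the table returns `None`, which corresponds to the coverage problem raised for words that do not appear in WordNet.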
Answer Type Hierarchy : [Figure: answer type hierarchy; the only surviving node label is PERSON]
Evaluation of Answer Type Hierarchy : Controlled variation of the number of WordNet synsets included in the answer type hierarchy.
Test on 800 TREC questions.
Hierarchy coverage    Precision score (50-byte answers)
0%                    0.296
3%                    0.404
10%                   0.437
25%                   0.451
50%                   0.461
The derivation of the answer type is the main source of unrecoverable errors in the QA system
Discussion (SMU Approach) : Advantages
Robust, handles a large variety of paraphrases.
Can be easily customized: just add AT categories to WordNet synsets.
Disadvantages
Mapping from answer types to WordNet synsets constructed entirely by hand.
Without robust Word Sense Disambiguation (WSD), which synsets should be marked in WordNet?
Example: the noun 'plant' has 4 WordNet senses. The first two are: 'industrial plant' and 'flora'. Obviously, they point to distinct ATs.
What about words that do not appear in WordNet?
Does not handle ambiguity too well: if a QFW maps to more than one answer type, all get the same priority.
Detecting the Expected Answer Type: UIUC’s Approach : Treats the problem as a typical machine learning (ML) classification task:
A taxonomy of question classes is defined offline;
Training data for each question class is annotated;
Classifiers are trained for each question class using a rich set of features.
Answer Type Taxonomy (1/2) : Two-layered taxonomy
6 coarse classes
ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC_VALUE
50 fine classes
HUMAN: group, individual, title, description
ENTITY: animal, body, color, currency…
LOCATION: city, country, mountain…
Answer Type Taxonomy (2/2)
Answer Type Examples : [Table with columns 'coarse' and 'fine'; rows not recovered]
The Ambiguity Problem : The classification of a specific question can be quite ambiguous
Examples
'What is bipolar disorder?' → definition OR disease
'What do bats eat?' → food OR plant OR animal
'What is the PH scale?' → numeric_value OR definition
Solution: allow assignment of multiple class labels for a single question!
Classifier Features : Lexical features
The question words ('When' → date)
N-grams of question words ('How_long' → measure)
Syntactic features
Part-of-speech tags of all question words (unigrams, N-grams)
Sequence of phrases in the question (unigrams, N-grams)
Head words of question phrases ('captain' → individual). Unigrams and N-grams of head words.
Semantic features
Named entities in the question
Semantically related words ('away' related to 'distance' → measure)
Dekang Lin’s proximity-based database (http://www.cs.ualberta.ca/~lindek/).
A small semantic database developed in-house.
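The lexical features above (question words and their N-grams) can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and an underscore-joined feature naming that mirrors 'How_long'; the function name and the `max_n` parameter are hypothetical, and the syntactic and semantic features are not covered.

```python
from typing import List


def lexical_features(question: str, max_n: int = 2) -> List[str]:
    """Return word unigrams up to max_n-grams of the question as feature strings."""
    tokens = question.rstrip("?").split()
    features: List[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append("_".join(tokens[i:i + n]))
    return features


example_features = lexical_features("How long is the Nile?")
```

For the five-token example this yields five unigrams and four bigrams, including `How_long`, which a classifier could associate with the measure class.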
Classification Results : [Figure: results chart; surviving axis label: 'more features']
Trained on (only) 5500 questions.
Tested on 500 questions.
Used a perceptron-based ML system.
Top 5 classes used. If only the best class is considered, accuracy is ~88%.
Discussion (UIUC Approach) : Advantages
Performs better than the numbers reported by SMU (informal communication)
Elegant framework to handle the ambiguity problem
Easy to train, no linguistic experience necessary
Not (so) sensitive to WSD ambiguities, because it makes its decisions based on a larger context.
Disadvantages
Somewhat harder to customize: need ~100 question examples for each new class
Question Processing : Two tasks
Understand the expected answer type (passed to Answer Extraction)
Extract and prioritize question keywords (passed to Passage Retrieval)
Keyword Selection : The AT indicates what the question is looking for, but provides insufficient context to locate the answer in a very large document collection
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context
Keyword Selection Algorithm : Heuristics in order of application, each followed by the priority given to the keywords it selects
Select all non-stop words in quotations: 10
Select all NNP words in recognized named entities: 9
Select all complex nominals with their adjectival modifiers: 8
Select all other complex nominals: 7
Select all adjectival modifiers: 6
Select all other nouns: 5
Select all verbs: 4
Select all adverbs: 3
Select the QFW word (which was skipped in all previous steps): 2
Select all other words: 1
Walk-through Example : Who coined the term 'cyberspace' in his novel 'Neuromancer'?
Selected keywords with priorities: cyberspace/10, Neuromancer/10, term/7, novel/7, coined/4
Keyword Selection Examples : What researcher discovered the vaccine against Hepatitis-B?
Hepatitis-B, vaccine, discover, researcher
What is the name of the French oceanographer who owned Calypso?
Calypso, French, own, oceanographer
What U.S. government agency registers trademarks?
U.S., government, trademarks, register, agency
What is the capital of Kosovo?
Kosovo, capital
Passage Retrieval : [Figure: QA block architecture diagram, repeated: Question Processing, Passage Retrieval, Answer Extraction, with Document Retrieval, WordNet, NER, and Parser]
Passage Retrieval Architecture : [Figure: flow diagram. Components: Document Retrieval, Passage Extraction, Passage Quality (Yes/No decision), Keyword Adjustment, Passage Scoring, Passage Ordering. Data: Keywords, Documents, Passages, Ranked Passages.]
Annotations: 'Adjust the query to retrieve more/fewer passages'; 'We may want/need to eliminate some passages…'
Passage Extraction Loop : Passage Extraction Component
Extracts passages that contain all selected keywords
Passage size is dynamic.
Start position is dynamic.
If the passage size or offset is static, the passage may end in the middle of the answer or of the answer context!
Passage quality and keyword adjustment
In the first iteration, use the first 6 keyword selection heuristics
If the number of passages is lower than a threshold → the query is too strict → drop a keyword
If the number of passages is higher than a threshold → the query is too relaxed → add a keyword
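The adjustment loop can be sketched as follows. This is a minimal illustration, not the system's implementation: the `retrieve` callback, the thresholds `low` and `high`, the iteration cap, and the guard against oscillating between two query sizes are all assumptions. The initial cutoff (priority 5 or higher) stands for the first 6 keyword selection heuristics on the 10-to-1 priority scale, and the toy corpus and keyword priorities are invented.

```python
from typing import Callable, List, Sequence, Tuple


def adjust_keywords(
    keywords: Sequence[Tuple[str, int]],
    retrieve: Callable[[List[str]], List[str]],
    low: int,
    high: int,
    max_iterations: int = 10,
) -> Tuple[List[str], List[str]]:
    """Drop or add keywords until the passage count falls within [low, high]."""
    ranked = sorted(keywords, key=lambda kp: -kp[1])  # highest priority first
    if not ranked:
        return [], []
    # First iteration: keywords from the first 6 heuristics (priority >= 5).
    n_active = max(1, sum(1 for _, priority in ranked if priority >= 5))
    tried = set()
    active: List[str] = []
    passages: List[str] = []
    for _ in range(max_iterations):
        active = [word for word, _ in ranked[:n_active]]
        passages = retrieve(active)
        tried.add(n_active)
        if len(passages) < low and n_active > 1:
            candidate = n_active - 1  # query too strict: drop a keyword
        elif len(passages) > high and n_active < len(ranked):
            candidate = n_active + 1  # query too relaxed: add a keyword
        else:
            break
        if candidate in tried:  # assumed guard: stop instead of oscillating
            break
        n_active = candidate
    return active, passages


# Toy corpus and retrieval function: a passage matches if it contains all keywords.
CORPUS = [
    "a researcher discovered the hepatitis-b vaccine",
    "the hepatitis-b vaccine is widely used",
    "a hepatitis-b outbreak was reported",
]


def toy_retrieve(active: List[str]) -> List[str]:
    return [p for p in CORPUS if all(k in p.split() for k in active)]


KEYWORDS = [("hepatitis-b", 9), ("vaccine", 5), ("discover", 4), ("researcher", 2)]
```

With `low=3`, the initial query (hepatitis-b, vaccine) returns only two passages, so the lowest-priority active keyword is dropped and the relaxed query returns all three.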
Passage Scoring (1/2) : Passages are scored based on keyword windows
For example, if a question has the set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, four windows are built (Window 1 to Window 4).
[Figure: four panels, Window 1 to Window 4, each over the matched keywords k1 k2 k3 k2 k1]
Passage Scoring (2/2) : Passage ordering is performed using a radix sort over three scores: largest SameWordSequenceScore, smallest DistanceScore, smallest MissingKeywordScore.
SameWordSequenceScore
Computes the number of words from the question that are recognized in the same sequence in the window.
Intuition: passages with the same keyword order as the question are better.
DistanceScore
The number of words that separate the most distant keywords in the window.
Intuition: passages with denser keywords are better.
MissingKeywordScore
The number of unmatched keywords in the window
Intuition: passages with fewer missing keywords are better.
Essentially an optimization step: drop passages below a certain threshold → speed improvement!
If the top 1,000 passages are kept, more than 80% of the questions have at least one correct passage.
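The three window scores and the resulting ordering can be sketched as below. This is one possible reading, with assumptions labeled: SameWordSequenceScore is interpreted as the longest common subsequence between the question keyword order and the matched keywords in the window, windows are plain token lists, and a lexicographic sort on a three-part key stands in for the radix sort.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def score_window(window: Sequence[str], keywords: Sequence[str]) -> Tuple[int, int, int]:
    """Return (SameWordSequenceScore, DistanceScore, MissingKeywordScore)."""
    keyword_set = set(keywords)
    positions = [i for i, tok in enumerate(window) if tok in keyword_set]
    matched = [window[i] for i in positions]
    same_sequence = lcs_length(keywords, matched)  # assumed interpretation
    # Words separating the two most distant keyword matches in the window.
    distance = positions[-1] - positions[0] - 1 if len(positions) > 1 else 0
    missing = len(keyword_set - set(matched))
    return same_sequence, distance, missing


def rank_windows(windows: List[List[str]], keywords: Sequence[str]) -> List[List[str]]:
    """Order by largest same-sequence, then smallest distance, then fewest missing."""
    def sort_key(window: List[str]) -> Tuple[int, int, int]:
        same_sequence, distance, missing = score_window(window, keywords)
        return (-same_sequence, distance, missing)
    return sorted(windows, key=sort_key)


QUESTION_KEYWORDS = ["captain", "titanic"]
W1 = "the captain of the titanic".split()
W2 = "titanic sank and its captain died".split()
W3 = "the titanic sank".split()
ranked_windows = rank_windows([W2, W3, W1], QUESTION_KEYWORDS)
```

Here `W1` ranks first because it matches both keywords in question order. `W3` precedes `W2` because the tie on the first score is broken by distance before missing keywords are considered, which shows why the order of the three sort keys matters.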
Answer Extraction : [Figure: QA block architecture diagram, repeated: Question Processing, Passage Retrieval, Answer Extraction, with Document Retrieval, WordNet, NER, and Parser]
Ranking Candidate Answers : Q066: Name the first private citizen to fly in space.
Answer type: Person
Text passage: 'Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in 'Raiders of the Lost Ark', plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...'
Best candidate answer: Christa McAuliffe
Features for Answer Ranking : relNMW – number of question terms matched in the answer passage
relSP – number of question terms matched in the same phrase as the candidate answer
relSS – number of question terms matched in the same sentence as the candidate answer
relFP – flag set to 1 if the candidate answer is followed by a punctuation sign
relOCTW – number of question terms matched, separated from the candidate answer by at most three words and one comma
relSWS – number of terms occurring in the same order in the answer passage as in the question
relDTW – average distance from candidate answer to question term matches
Robust heuristics that work on unrestricted text!
Answer Ranking based on Machine Learning : A relative relevance score is computed for each pair of candidates (answer windows):
relPAIR = wSWS × relSWS + wFP × relFP + wOCTW × relOCTW + wSP × relSP + wSS × relSS + wNMW × relNMW + wDTW × relDTW + threshold
If relPAIR is positive, the first candidate of the pair is more relevant.
A perceptron model is used to learn the weights.
Published by Marius Pasca, SIGIR 2001.
MRR scores are in the 50% range for short answers (50 bytes) and in the 60% range for long answers (250 bytes).
MRR – reciprocal rank of the first correct answer, e.g. 1/3 if the first correct answer is in the third position.
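The pairwise comparison can be sketched as follows. This is a hedged illustration, not the published model: it assumes each rel feature of a pair is the difference between the two candidates' feature values, ranks candidates by the number of pairwise comparisons they win, and uses invented weights and feature values (nothing here is learned by a perceptron).

```python
from typing import Dict, List

FEATURES = ("SWS", "FP", "OCTW", "SP", "SS", "NMW", "DTW")


def rel_pair(first: Dict[str, float], second: Dict[str, float],
             weights: Dict[str, float], threshold: float = 0.0) -> float:
    """Weighted sum of feature differences plus threshold (assumed pair encoding)."""
    return sum(weights[f] * (first[f] - second[f]) for f in FEATURES) + threshold


def rank_candidates(candidates: Dict[str, Dict[str, float]],
                    weights: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Order candidates by how many pairwise comparisons they win."""
    names = list(candidates)
    wins = {name: 0 for name in names}
    for a in names:
        for b in names:
            if a != b and rel_pair(candidates[a], candidates[b], weights, threshold) > 0:
                wins[a] += 1
    return sorted(names, key=lambda name: -wins[name])


# Hypothetical weights: every feature counts positively except the distance.
WEIGHTS = {"SWS": 1.0, "FP": 1.0, "OCTW": 1.0, "SP": 1.0, "SS": 1.0, "NMW": 1.0, "DTW": -1.0}

# Invented feature values for the three PERSON candidates of the Q066 passage.
CANDIDATES = {
    "Christa McAuliffe": {"SWS": 3, "FP": 1, "OCTW": 2, "SP": 3, "SS": 4, "NMW": 5, "DTW": 2.0},
    "Karen Allen": {"SWS": 0, "FP": 1, "OCTW": 0, "SP": 0, "SS": 0, "NMW": 5, "DTW": 12.0},
    "Mike Smith": {"SWS": 0, "FP": 0, "OCTW": 0, "SP": 0, "SS": 0, "NMW": 5, "DTW": 25.0},
}
ranking = rank_candidates(CANDIDATES, WEIGHTS)
```

Counting wins avoids relying on the pairwise relation being transitive, which a learned comparator with a non-zero threshold does not guarantee.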
Evaluation on the Web : test on 350 questions from TREC (Q250-Q600)
extract 250-byte answers
Overview : What is Question Answering?
Generic architectures
For factual questions
For definitional questions
For complex, temporal questions
Other relevant approaches and systems
System Extension: Definition Questions : Definition questions ask about the definition or description of a concept:
Who is John Galt?
What is anorexia nervosa?
Many 'information nuggets' are acceptable answers
Who is George W. Bush?
… George W. Bush, the 43rd President of the United States…
George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas…
Scoring
Any information nugget is acceptable
Precision score over all information nuggets
Answer Detection with Pattern Matching
Answer Detection with Concept Expansion : Problem: lexico-syntactic patterns tend to over-match → additional semantic constraints are needed
Solution:
Favor patterns where the AP is semantically related to the phrase to define
WordNet hypernyms (more general concepts)
Evaluation on Definition Questions : Determine the impact of answer type detection with pattern matching and concept expansion
test on the Definition questions from TREC-9 and TREC-10 (approx. 200 questions)
extract 50-byte answers
Results
precision score: 0.56
questions with a correct answer among top 5 returned answers: 0.67
Overview : What is Question Answering?
Generic architectures
For factual questions
For definitional questions
For complex, temporal questions
Other relevant approaches and systems
Simple and Complex Temporal Questions : The previous factual QA system can answer simple temporal questions, where the AT is a date or the question includes a simple temporal expression
'When did Bob Marley die?'
'Who won the U.S. Open in 1999?'
This system cannot answer more complex questions that require detection of temporal properties or event ordering:
'Who was spokesman of the Soviet Embassy in Baghdad during the invasion of Kuwait?'
'Is Bill Clinton currently the President of the United States?'
Temporal Question Taxonomy (1/2) : Simple Temporal Questions:
Type 1: Single-event temporal questions without temporal expressions (TE).
'When did Jordan close the port of Aqaba to Kuwait?'
Type 2: Single-event temporal questions with temporal expressions.
'Who won the 1998 New Hampshire Republican primary?'
Temporal Question Taxonomy (2/2) : Complex temporal questions
Type 3: Multiple-event temporal questions with temporal expression.
'What did George Bush do after the U.N. Security Council ordered a global embargo on trade with Iraq in August 1990?'
Temporal signal: 'after'
Temporal constraint: 'between 1/8/1990 and 31/8/1990'
Type 4: Multiple-event temporal questions without temporal expression.
'What happened to oil prices after the Iraqi annexation of Kuwait?'
Temporal signal: 'after'
Approach Overview : Decompose the question into simpler factual questions
'Who was spokesman of the Soviet Embassy in Baghdad during the invasion of Kuwait?'
'Who was spokesman of the Soviet Embassy in Baghdad?'
'When did the invasion of Kuwait occur?'
Look for all possible answers to the first question.
Look for all possible answers to the second question.
Give as final answer one of the answers to the first question whose associated date is consistent with the answer to the second question.
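The final recomposition step can be sketched as a date-consistency filter. This is a minimal illustration under assumptions: every candidate answer to the first question carries a (start, end) date span, the second question yields one span, and the interval tests chosen for each temporal signal are one possible interpretation. The candidate names and all dates below are placeholders, not facts from the document collection.

```python
from datetime import date
from typing import List, Tuple

Span = Tuple[date, date]


def is_consistent(answer_span: Span, event_span: Span, signal: str) -> bool:
    """Check an answer's date span against the event span for a temporal signal."""
    a_start, a_end = answer_span
    e_start, e_end = event_span
    if signal == "during":
        return a_start <= e_end and e_start <= a_end  # the spans overlap
    if signal == "after":
        return a_start >= e_end
    if signal == "before":
        return a_end <= e_start
    raise ValueError(f"unsupported temporal signal: {signal}")


def recompose(q1_answers: List[Tuple[str, Span]], event_span: Span, signal: str) -> List[str]:
    """Keep the answers to Q1 whose dates are consistent with the answer to Q2."""
    return [text for text, span in q1_answers if is_consistent(span, event_span, signal)]


# Placeholder data: two hypothetical candidates and an illustrative event span.
Q1_ANSWERS = [
    ("candidate A", (date(1988, 1, 1), date(1991, 6, 30))),
    ("candidate B", (date(1992, 1, 1), date(1994, 12, 31))),
]
EVENT_SPAN = (date(1990, 8, 2), date(1990, 8, 4))
final_answers = recompose(Q1_ANSWERS, EVENT_SPAN, "during")
```

Only the candidate whose span overlaps the event span survives the 'during' filter.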
Architecture of the Temporal QA System
Decision Tree for Type Identification : Temporal signals: After, When, Before, During, While, For…
Algorithm for Question Splitting
Question Splitting Examples : 'Where did Bill Clinton study before going to Oxford University?'
Temporal signal: 'before'
Q1: 'Where did Bill Clinton study?'
Q2: 'When did Bill Clinton go to Oxford University?'
'What did George Bush do after the U.N. Security Council ordered a global embargo on trade with Iraq in August 1990?'
Temporal signal: 'after'
Temporal expression: 'in August 1990'
Q1: 'What did George Bush do?'
Q2: 'When did the U.N. Security Council order a global embargo on trade with Iraq in August 1990?'
Question Decomposition Evaluation
Overview : What is Question Answering?
Generic architectures
For factual questions
For definitional questions
For complex, temporal questions
Other relevant approaches and systems
LCC’s PowerAnswer + COGEX
IBM’s PIQUANT
CMU’s Javelin
ISI’s TextMap
BBN’s AQUA
PowerAnswer + COGEX (1/2) : Automated reasoning for QA: A → Q, using a logic prover. Facilitates both answer validation and answer extraction.
Both question and answer(s) are transformed into logic forms. Example:
Heavy selling of Standard & Poor’s 500-stock index futures in Chicago relentlessly beat stocks downward.
Heavy_JJ(x1) & selling_NN(x1) & of_IN(x1,x6) & Standard_NN(x2) & &_CC(x13,x2,x3) & Poor(x3) & 's_POS(x6,x13) & 500-stock_JJ(x6) & index_NN(x4) & futures(x5) & nn_NNC(x6,x4,x5) & in_IN(x1,x8) & Chicago_NNP(x8) & relentlessly_RB(e12) & beat_VB(e12,x1,x9) & stocks_NN(x9) & downward_RB(e12)
PowerAnswer + COGEX (2/2) : World knowledge from:
WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project (http://www.utdallas.edu/~moldovan)
Lexical chains
game:n#3 → HYPERNYM → recreation:n#1 → HYPONYM → sport:n#1
Argentine:a#1 → GLOSS → Argentina:n#1
NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.
… Barcelona, the capital of Catalonia, …
Capital AND Catalonia → Barcelona
Named-entity recognizer
John Galt → HUMAN
A relaxation mechanism is used to iteratively uncouple predicates, remove terms from LFs. The proofs are penalized based on the amount of relaxation involved.
PowerAnswer: Discussion : Advantages
Elegant, formal mechanism for QA
Proves if an answer is correct or not, rather than offering answer ranking
Disadvantages
Requires many NLP tools: complete syntactic analysis, WSD → prone to errors, slow
Cannot handle non-monotonic language constructs
Monotonicity of QA: if A answers Q, then adding more words to A does not change the fact that A is still a correct answer.
Not true for: negations, non-factive verbs (claim, think), numeric mismatches, etc.
Where is Barcelona located?
Barcelona is not located in France.
I think Barcelona is in France.
IBM’s Piquant : Question processing conceptually similar to SMU, but a series of different strategies ('agents') available for answer extraction. For each question type, multiple agents might run in parallel.
Reasoning engine and general-purpose ontology from Cyc used as sanity checker.
Answer resolution: remaining answers are normalized and a voting strategy is used to select the 'correct' (meaning most redundant) answer.
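The redundancy-based voting idea can be sketched as below. This is a simplified illustration only: the normalization rule (lowercasing, dropping periods, collapsing whitespace) and the function names are assumptions, and it ignores agent confidence scores.

```python
from collections import Counter
from typing import List, Optional


def normalize(answer: str) -> str:
    """Crude canonical form: lowercase, no periods, single spaces (assumed rule)."""
    return " ".join(answer.lower().replace(".", "").split())


def vote(agent_answers: List[str]) -> Optional[str]:
    """Return the normalized answer proposed most often, or None if there are none."""
    counts = Counter(normalize(a) for a in agent_answers)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


selected = vote(["Mt. Etna", "mt etna", "Vesuvius"])
```

Two of the three agent answers normalize to the same string, so that answer is selected as the most redundant one.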
Piquant QA Agents : Predictive annotation agent
'Predictive annotation' = the technique of indexing named entities and other NL constructs along with lexical terms. Lemur has built-in support for this now.
General-purpose agent, used for almost all question types.
Statistical Query Agent
Derivation from a probabilistic IR model, also developed at IBM.
Also general-purpose.
Description Query
Generic descriptions: appositions, parenthetical expressions.
Applied mostly to definition questions.
Structured Knowledge Agent
Answers from WordNet/Cyc.
Applied whenever possible.
Pattern-Based Agent
Looks for specific syntactic patterns based on the question form.
Applied when the answer is expected in a well-structured form.
Dossier Agent
For 'Who is X?' questions.
A dynamic set of factual questions used to learn 'information nuggets' about persons.
Pattern-based Agent : Motivation: some questions (with or without AT) indicate that the answer might be in a structured form
What does Knight Rider publish? → transitive verb, missing object.
Knight Rider publishes X.
Patterns generated:
From a static pattern repository, e.g. birth and death dates recognition.
Dynamically from the question structure.
Matching of the expected answer pattern with the actual answer text is not at word level, but at a higher linguistic level based on full parse trees.
Informal communication: about 5% of TREC questions are answered by this agent.
Dossier Agent : Addresses 'Who is X?' questions.
Initially generates a series of generic questions:
When was X born?
What was X’s profession?
Future iterations dynamically decided based on the previous answers:
If X’s profession is 'writer' the next question is: What did X write?
A static ontology of biographical questions used.
Cyc Sanity Checker : Post-processing component that
Rejects insane answers
'How much does a grey wolf weigh?'
'300 tons'
A grey wolf IS-A wolf. Weight of a wolf known in Cyc.
Cyc returns: SANE, INSANE, or DON’T KNOW.
Boosts answer confidence when the answer is SANE.
Typically called for numerical answer types:
What is the population of Maryland?
How much does a grey wolf weigh?
How high is Mt. Hood?
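The sanity check can be sketched as a range lookup along IS-A links. This does not use the Cyc API: the `IS_A` and `KNOWN_RANGES` tables, the attribute names, and the numeric range for a wolf's weight are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

SANE, INSANE, DONT_KNOW = "SANE", "INSANE", "DON'T KNOW"

# Hypothetical knowledge: IS-A links and plausible value ranges (illustrative numbers).
IS_A: Dict[str, str] = {"grey wolf": "wolf"}
KNOWN_RANGES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("wolf", "weight_kg"): (20.0, 80.0),
}


def sanity_check(entity: str, attribute: str, value: float) -> str:
    """Climb IS-A links until a known range for the attribute is found."""
    node: Optional[str] = entity
    visited = set()
    while node is not None and node not in visited:
        visited.add(node)
        value_range = KNOWN_RANGES.get((node, attribute))
        if value_range is not None:
            low, high = value_range
            return SANE if low <= value <= high else INSANE
        node = IS_A.get(node)
    return DONT_KNOW  # nothing known about this entity/attribute pair


verdict_300_tons = sanity_check("grey wolf", "weight_kg", 300_000.0)
```

An answer of 300 tons (300,000 kg) falls far outside the assumed range inherited from wolf, so it is rejected. An entity with no known range yields DON'T KNOW rather than a rejection.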
Answer Resolution : Called when multiple agents are applied to the same question. Distribution of agents: the predictive-annotation and the statistical agents are by far the most common.
Each agent provides a canonical answer (e.g. normalized named entity) and a confidence score.
Final confidence for each candidate answer is computed by an SVM-based ML model, which takes the answers and confidences provided by the individual agents as input.
CMU’s Javelin : Architecture combines SMU’s and IBM’s approaches.
Question processing close to SMU’s approach.
Passage retrieval loop conceptually similar to SMU’s, but with an elegant implementation.
Multiple answer strategies similar to IBM’s system. All of them are based on ML models (K nearest neighbours, decision trees) that use shallow-text features (close to SMU’s).
Answer voting, similar to IBM’s, used to exploit answer redundancy.
Javelin’s Retrieval Strategist : Implements passage retrieval, including the passage retrieval loop.
Uses the Inquery IR system, probably Lemur by now.
The retrieval loop initially uses all keywords in close proximity of each other (stricter than SMU).
Subsequent iterations relax the following query terms:
Proximity for all question keywords: 20, 100, 250, AND
Phrase proximity for phrase operators: less than 3 words or PHRASE
Phrase proximity for named entities: less than 3 words or PHRASE
Inclusion/exclusion of AT word
Discarding other keywords
Accuracy for TREC-11 queries: how many questions had at least one correct document in the top N documents:
Top 30 docs: 80%
Top 60 docs: 85%
Top 120 docs: 86%
ISI’s TextMap: Pattern-Based QA : Examples
Who invented the cotton gin?
<who> invented the cotton gin
<who>'s invention of the cotton gin
<who> received a patent for the cotton gin
How did Mahatma Gandhi die?
Mahatma Gandhi died <how>
Mahatma Gandhi drowned
<who> assassinated Mahatma Gandhi
Patterns generated from the question form (similar to IBM), learned using a pattern discovery mechanism, or added manually to a pattern repository
The pattern discovery mechanism performs a series of generalizations from annotated examples:
Babe Ruth was born in Baltimore, on February 6, 1895.
PERSON was born *g* on DATE
TextMap: QA as Machine Translation : In machine translation, one collects translation pairs (s, d) and learns a model of how to transform the source s into the destination d.
QA is redefined in a similar way: collect answer-question pairs (a, q) and learn a model that computes the probability that a question is generated from the given answer: p(q | parsetree(a)). The correct answer maximizes this probability.
Only the subsets of answer parse trees where the answer lies are used as training (not the whole sentence).
An off-the-shelf machine translation package (Giza) used to train the model.
TextMap: Exploiting the Data Redundancy : Additional knowledge resources are used whenever applicable
WordNet glosses
What is a meerkat?
www.acronymfinder.com
What is ARDA?
Etcetera
The 'known' answers are then simply searched in the document collection together with question keywords
Google is used for answer redundancy
TREC and Web (through Google) are searched in parallel.
Final answer selected using a maximum entropy ML model.
IBM introduced redundancy for QA agents, ISI uses data redundancy.
BBN’s AQUA : Factual system converts both question and answer to a semantic form (close to SMU’s)
Machine learning used to measure the similarity of the two representations.
Was ranked best at the TREC definition pilot organized before TREC-12
Definition system conceptually close to SMU’s
Had pronominal and nominal coreference resolution
Used a (probably) better parser (Charniak)
Post-ranking of candidate answers using a tf * idf model
References (1/2) : Marius Paşca. High-Performance, Open-Domain Question Answering from Large Text Collections, Ph.D. Thesis, Computer Science and Engineering Department, Southern Methodist University, Defended September 2001, Dallas, Texas
Marius Paşca. Open-Domain Question Answering from Large Text Collections, Center for the Study of Language and Information (CSLI Publications, series: Studies in Computational Linguistics), Stanford, California, Distributed by the University of Chicago Press, ISBN (Paperback): 1575864282, ISBN (Cloth): 1575864274. 2003
Dan Moldovan, Sanda Harabagiu, Marius Paşca, Rada Mihalcea, Richard Goodrum, Roxana Girju, and Vasile Rus. LASSO: A Tool for Surfing the Answer Net, Text Retrieval Conference (TREC-8), 1999
References (2/2) : E. Nyberg, T. Mitamura, J. Carbonell, J. Callan, K. Collins-Thompson, K. Czuba, M. Duggan, L. Hiyakumoto, N. Hu, Y. Huang, J. Ko, L.V. Lita, S. Murtagh, V. Pedro, D. Svoboda. The JAVELIN Question Answering System at TREC 2002, Text Retrieval Conference, 2002
Xin Li and Dan Roth. Learning Question Classifiers: The Role of Semantic Information, Natural Language Engineering, 2004
E. Saquete, P. Martinez-Barco, R. Munoz, J.L. Vicedo. Splitting Complex Temporal Questions for Question Answering Systems, ACL 2004
End : Thank you!