Question Answering Tutorial: Question Answering Tutorial John M. Prager
IBM T.J. Watson Research Center
jprager@us.ibm.com
Tutorial Overview: Tutorial Overview Ground Rules
Part I - Anatomy of QA
A Brief History of QA
Terminology
The essence of Text-based QA
Basic Structure of a QA System
NE Recognition and Answer Types
Answer Extraction
Part II - Specific Approaches
By Genre
By System
Part III - Issues and Advanced Topics
Evaluation
No Answer
Question Difficulty
Dimensions of QA
Relationship questions
Decomposition/Recursive QA
Constraint-based QA
Cross-Language QA
References
Ground Rules: Ground Rules Breaks
Questions
Topics
Focus on English Text
TREC & AQUAINT & beyond
General Principles
Tricks-of-the-Trade
State-of-the-Art Methodologies
My own System vs. My own Research
Caution
Caution: Caution Nothing in this Tutorial is true universally
Part I - Anatomy of QA: Part I - Anatomy of QA
A Brief History of QA
Terminology
The Essence of Text-based QA
Basic Structure of a QA System
NE Recognition and Answer Types
Answer Extraction
A Brief History of QA: A Brief History of QA NLP front-ends to Expert Systems
SHRDLU (Winograd, 1972)
User manipulated, and asked questions about, blocks world
First real demo of combination of syntax, semantics, and reasoning
NLP front-ends to Databases
LUNAR (Woods,1973)
User asked questions about moon rocks
Used ATNs and procedural semantics
LIFER/LADDER (Hendrix et al. 1977)
User asked questions about U.S. Navy ships
Used semantic grammar; domain information built into grammar
NLP + logic
CHAT-80 (Warren & Pereira, 1982)
NLP query system in Prolog, about world geography
Definite Clause Grammars
“Modern Era of QA”
MURAX (Kupiec, 1993)
NLP front-end to Encyclopaedia
NLP + hand-coded annotations to sources
AskJeeves (www.ask.com)
START (Katz, 1997)
Started with text, extended to multimedia
IR + NLP
TREC-8 (1999) (Voorhees & Tice, 2000)
Today – all of the above
Some “factoid” questions from TREC8-9: Some “factoid” questions from TREC8-9 9: How far is Yaroslavl from Moscow?
15: When was London's Docklands Light Railway constructed?
22: When did the Jurassic Period end?
29: What is the brightest star visible from Earth?
30: What are the Valdez Principles?
73: Where is the Taj Mahal?
134: Where is it planned to berth the merchant ship, Lane Victory, which Merchant Marine veterans are converting into a floating museum?
197: What did Richard Feynman say upon hearing he would receive the Nobel Prize in Physics?
198: How did Socrates die?
199: How tall is the Matterhorn?
200: How tall is the replica of the Matterhorn at Disneyland?
227: Where does dew come from?
269: Who was Picasso?
298: What is California's state tree?
Terminology: Terminology Question Type
Answer Type
Question Focus
Question Topic
Candidate Passage
Candidate Answer
Authority File/List
Terminology – Question Type: Terminology – Question Type Question Type: an idiomatic categorization of questions for purposes of distinguishing between different processing strategies and/or answer formats
E.g. TREC2003
FACTOID: “How far is it from Earth to Mars?”
LIST: “List the names of chewing gums”
DEFINITION: “Who is Vlad the Impaler?”
Other possibilities:
RELATIONSHIP: “What is the connection between Valentina Tereshkova and Sally Ride?”
SUPERLATIVE: “What is the largest city on Earth?”
YES-NO: “Is Saddam Hussein alive?”
OPINION: “What do most Americans think of gun control?”
CAUSE&EFFECT: “Why did Iraq invade Kuwait?”
…
Terminology – Answer Type: Terminology – Answer Type Answer Type: the class of object (or rhetorical type of sentence) sought by the question. E.g.
PERSON (from “Who …”)
PLACE (from “Where …”)
DATE (from “When …”)
NUMBER (from “How many …”)
…
but also
EXPLANATION (from “Why …”)
METHOD (from “How …”)
…
Answer types are usually tied intimately to the classes recognized by the system’s Named Entity Recognizer.
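The mapping from question word to answer type can be sketched as an ordered rule table. This is only a minimal illustration: the rule list, the function name `answer_type`, and the `UNKNOWN` fallback are assumptions made here, and a real system would need many more rules (for example, "How far" should not map to METHOD).

```python
import re

# Hypothetical rule table: question-opening pattern -> answer type label.
# Order matters: "how many" must be tried before the bare "how".
ANSWER_TYPE_RULES = [
    (r"^who\b", "PERSON"),
    (r"^where\b", "PLACE"),
    (r"^when\b", "DATE"),
    (r"^how many\b", "NUMBER"),
    (r"^why\b", "EXPLANATION"),
    (r"^how\b", "METHOD"),
]


def answer_type(question: str) -> str:
    """Return the label of the first rule that matches, else UNKNOWN."""
    q = question.strip().lower()
    for pattern, label in ANSWER_TYPE_RULES:
        if re.search(pattern, q):
            return label
    return "UNKNOWN"
```

For instance, `answer_type("When did the Jurassic Period end?")` yields `DATE`, while a "What ..." question falls through to `UNKNOWN` and would need focus analysis.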
Terminology – Question Focus: Terminology – Question Focus Question Focus: The property or entity that is being sought by the question.
E.g.
“In what state is the Grand Canyon?” (focus: state)
“What is the population of Bulgaria?” (focus: population)
“What colour is a pomegranate?” (focus: colour)
Terminology – Question Topic: Terminology – Question Topic Question Topic: the object (person, place, …) or event that the question is about. The question might well be about a property of the topic, which will be the question focus.
E.g. “What is the height of Mt. Everest?”
height is the focus
Mt. Everest is the topic
Terminology – Candidate Passage: Terminology – Candidate Passage Candidate Passage: a text passage (anything from a single sentence to a whole document) retrieved by a search engine in response to a question.
Depending on the query and kind of index used, there may or may not be a guarantee that a candidate passage has any candidate answers.
Candidate passages will usually have associated scores, from the search engine.
Terminology – Candidate Answer: Terminology – Candidate Answer Candidate Answer: in the context of a question, a small quantity of text (anything from a single word to a sentence or bigger, but usually a noun phrase) that is of the same type as the Answer Type.
In some systems, the type match may be approximate, if there is the concept of confusability.
Candidate answers are found in candidate passages
E.g.
50
Queen Elizabeth II
September 8, 2003
by baking a mixture of flour and water
Terminology – Authority List: Terminology – Authority List Authority List (or File): a collection of instances of a class of interest, used to test a term for class membership.
Instances should be derived from an authoritative source and be as close to complete as possible.
Ideally, the class is small and easily enumerated, and its members have a limited number of lexical forms.
Good:
Days of week
Planets
Elements
Good statistically, but difficult to get 100% recall:
Animals
Plants
Colours
Problematic
People
Organizations
Impossible
All numeric quantities
Explanations and other clausal quantities
Essence of Text-based QA: Essence of Text-based QA Need to find a passage that answers the question.
Find a candidate passage (search)
Check that semantics of passage and question match
Extract the answer (Single source answers)
Essence of Text-based QA: Essence of Text-based QA Search
For a very small corpus, can consider every passage as a candidate, but this is not interesting
Need to perform a search to locate good passages.
If search is too broad, have not achieved that much, and are faced with lots of noise
If search is too narrow, will miss good passages
Two broad possibilities:
Optimize search
Use iteration
Essence of Text-based QA: Essence of Text-based QA Match
Need to test whether semantics of passage match semantics of question
Count question words present in passage
Score based on proximity
Score based on syntactic relationships
Prove match
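The two cheapest tests in this list (counting question words and scoring by proximity) can be sketched as below. The function names and the choice of "smallest covering window" as the proximity measure are assumptions for illustration, not a description of any particular system.

```python
def keyword_overlap(question_terms, passage_tokens):
    """Fraction of distinct question terms that occur in the passage."""
    q = set(question_terms)
    if not q:
        return 0.0
    return len(q & set(passage_tokens)) / len(q)


def smallest_window(question_terms, passage_tokens):
    """Width (in tokens) of the shortest span containing every question
    term that occurs in the passage; None if no term occurs at all."""
    present = set(question_terms) & set(passage_tokens)
    if not present:
        return None
    best = None
    last_seen = {}  # term -> most recent position
    for i, tok in enumerate(passage_tokens):
        if tok in present:
            last_seen[tok] = i
            if len(last_seen) == len(present):
                # Shortest window ending at i starts at the oldest "last seen".
                width = i - min(last_seen.values()) + 1
                if best is None or width < best:
                    best = width
    return best
```

A higher overlap and a smaller window both suggest a better match; how to combine them is a design choice left open here.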
Essence of Text-based QA: Essence of Text-based QA Answer Extraction
Find candidate answers of same type as the answer type sought in question.
Has implications for size of type hierarchy
Where/when/whether to consider subsumption
Consider later
Basic Structure of a QA-System: Basic Structure of a QA-System See for example Abney et al., 2000; Clarke et al., 2001; Harabagiu et al.; Hovy et al., 2001; Prager et al. 2000
[Figure: block diagram with components Question Analysis, Search (over Corpus or Web) and Answer Extraction; labelled data flows: Question, Query, Documents/passages, Answer Type, Answer]
Essence of Text-based QA: Essence of Text-based QA High-Level View of Recall
Have three broad locations in the system where expansion takes place, for purposes of matching passages
Where is the right trade-off?
Question Analysis.
Expand individual terms to synonyms (hypernyms, hyponyms, related terms)
Reformulate question
In Search Engine
Generally avoided for reasons of computational expense
At indexing time
Stemming/lemmatization
Essence of Text-based QA: Essence of Text-based QA High-Level View of Precision
Have three broad locations in the system where narrowing/filtering/matching takes place
Where is the right trade-off?
Question Analysis.
Include all question terms in query
Use IDF-style weighting to indicate preferences
Search Engine
Possibly store POS information for polysemous terms
Answer Extraction
Reward (penalize) passages/answers that (don’t) pass test
Particularly attractive for temporal modification
Answer Types and Modifiers: Answer Types and Modifiers “Name 5 French Cities”
Most likely there is no type for “French Cities”
So will look for CITY
include “French/France” in bag of words, and hope for the best
include “French/France” in bag of words, retrieve documents, and look for evidence (deep parsing, logic)
use high-precision Language Identification on results
If you have a list of French cities, could either
Filter results by list
Use Answer-Based QA (see later)
Use longitude/latitude information of cities and countries
Answer Types and Modifiers: Answer Types and Modifiers “Name a female figure skater”
Most likely there is no type for “female figure skater”
Most likely there is no type for “figure skater”
Look for PERSON, with query terms {figure, skater}
What to do about “female”? Two approaches.
Include “female” in the bag-of-words.
Relies on logic that if “femaleness” is an interesting property, it might well be mentioned in answer passages.
Does not apply to, say “singer”.
Leave out “female” but test candidate answers for gender.
Needs either an authority file or a heuristic test.
Test may not be definitive.
Named Entity Recognition: Named Entity Recognition BBN’s IdentiFinder (Bikel et al. 1999)
Hidden Markov Model
Sheffield GATE (http://www.gate.ac.uk/)
Development Environment for IE and other NLP activities
IBM’s Textract/Resporator (Byrd & Ravin, 1999; Wacholder et al. 1997; Prager et al. 2000)
FSMs and Authority Files
+ others
Inventory of semantic classes recognized by NER related closely to set of answer types system can handle
Named Entity Recognition: Named Entity Recognition
Probabilistic Labelling (IBM): Probabilistic Labelling (IBM) In Textract, a Proper name can be one of the following
PERSON
PLACE
ORGANIZATION
MISC_ENTITY (e.g. names of Laws, Treaties, Reports, …)
However, NER needs another class (UNAME) for any proper name it can’t identify.
In a large corpus, many entities end up being UNAMEs.
If, for example, a “Where” question seeks a PLACE, and similarly for the others above, then is being classified as UNAME a death sentence? How will a UNAME ever be searched for?
Probabilistic Labelling (IBM): Probabilistic Labelling (IBM) When entity is ambiguous or plain unknown, use a set of disjoint special labels in NER, instead of UNAME
Assumes NER is able to rule out some possibilities, at least sometimes.
Annotate with all remaining possibilities
Use these labels as part of answer type
E.g.
UNP could be a PERSON
UNL could be a PLACE
UNO could be an ORGANIZATION
UNE could be a MISC_ENTITY
So
{UNP UNL} could be a PERSON or a PLACE
This would be a good label for Beverly Hills
Probabilistic Labelling (IBM): Probabilistic Labelling (IBM) So “Who” questions that would normally generate {PERSON} as answer type, now generate {PERSON UNP}
Question: “Who is David Beckham married to?”
Answer Passage: “David Beckham, the soccer star engaged to marry Posh Spice, is being blamed for England's World Cup defeat.”
“Posh Spice” gets annotated with {UNP UNO}
Match occurs, answer found. Crowd erupts!
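The matching step in this example amounts to a set intersection between the labels wanted by the question and the labels annotated on each entity. A minimal sketch follows; the toy question-word table and the annotation dictionary are made up for illustration and only echo the labels used on these slides.

```python
def answer_type_labels(question):
    """Toy mapping: only 'who' and 'where' are handled in this sketch."""
    first = question.split()[0].lower()
    table = {"who": {"PERSON", "UNP"}, "where": {"PLACE", "UNL"}}
    return table.get(first, set())


def is_candidate(entity_labels, wanted_labels):
    """An entity is a candidate answer if it shares any label with the answer type."""
    return bool(set(entity_labels) & set(wanted_labels))


# Hypothetical NER output for the answer passage.
annotations = {
    "Posh Spice": {"UNP", "UNO"},  # unknown: could be a PERSON or an ORGANIZATION
    "England": {"PLACE"},
}

wanted = answer_type_labels("Who is David Beckham married to?")
candidates = [e for e, labels in annotations.items() if is_candidate(labels, wanted)]
```

Here `candidates` contains only "Posh Spice", because UNP is shared between the question's answer type and the entity's annotation.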
Issues with NER: Issues with NER Coreference
Should referring terms (definite noun phrases, pronouns) be labelled the same way as the referent terms?
Nested Noun Phrases (and other structures of interest)
What granularity?
Partly depends on whether multiple annotations are allowed
Subsumption and Ambiguity
What label(s) to choose?
Probabilistic labelling
How to Annotate?: How to Annotate? “… Baker will leave Jerusalem on Saturday and stop in Madrid on the way home to talk to Spanish Prime Minister Felipe Gonzales.”
What about: “The U.S. ambassador to Spain, Ed Romero”?
Answer Extraction: Answer Extraction Also called Answer Selection/Pinpointing
Given a question and candidate passages, the process of selecting and ranking candidate answers.
Usually, candidate answers are those terms in the passages which have the same answer type as that generated from the question
Ranking the candidate answers depends on assessing how well the passage context relates to the question
3 Approaches:
Heuristic features
Shallow parse fragments
Logical proof
Answer Extraction using Features: Answer Extraction using Features Heuristic feature sets (Prager et al. 2003+). See also (Radev et al. 2000)
Calculate feature values for each CA, and then calculate linear combination using weights learned from training data.
Ranking criteria:
Good global context:
the global context of a candidate answer evaluates the relevance of the passage from which the candidate answer is extracted to the question.
Good local context
the local context of a candidate answer assesses the likelihood that the answer fills in the gap in the question.
Right semantic type
the semantic type of a candidate answer should either be the same as or a subtype of the answer type identified by the question analysis component.
Redundancy
the degree of redundancy for a candidate answer increases as more instances of the answer occur in retrieved passages.
Answer Extraction using Features (cont.): Answer Extraction using Features (cont.) Features for Global Context
KeywordsInPassage: the ratio of keywords present in a passage to the total number of keywords issued to the search engine.
NPMatch: the number of words in noun phrases shared by both the question and the passage.
SEScore: the ratio of the search engine score for a passage to the maximum achievable score.
FirstPassage: a Boolean value which is true for the highest ranked passage returned by the search engine, and false for all other passages.
Features for Local Context
AvgDistance: the average distance between the candidate answer and keywords that occurred in the passage.
NotInQuery: the number of words in the candidate answers that are not query keywords.
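Three of these features and the linear combination can be sketched as follows. The exact definitions (for example, measuring distance in tokens) and the weights are assumptions for illustration; in the approach described, weights are learned from training data rather than set by hand.

```python
def keywords_in_passage(keywords, passage_tokens):
    """Ratio of query keywords present in the passage to all query keywords."""
    kw = set(keywords)
    return len(kw & set(passage_tokens)) / len(kw) if kw else 0.0


def avg_distance(candidate_pos, keywords, passage_tokens):
    """Mean token distance from the candidate to each keyword occurrence.
    Falls back to the passage length when no keyword occurs (an assumption)."""
    kw = set(keywords)
    dists = [abs(i - candidate_pos) for i, t in enumerate(passage_tokens) if t in kw]
    return sum(dists) / len(dists) if dists else float(len(passage_tokens))


def not_in_query(candidate_tokens, keywords):
    """Number of candidate-answer words that are not query keywords."""
    kw = set(keywords)
    return sum(1 for t in candidate_tokens if t not in kw)


def linear_score(features, weights):
    """Weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())


tokens = "in 1971 amtrak began operations".split()
keywords = ["amtrak", "began", "operations"]
features = {
    "KeywordsInPassage": keywords_in_passage(keywords, tokens),
    "AvgDistance": avg_distance(1, keywords, tokens),  # candidate "1971" at index 1
    "NotInQuery": not_in_query(["1971"], keywords),
}
# Hand-picked, hypothetical weights; distance is penalized.
weights = {"KeywordsInPassage": 1.0, "AvgDistance": -0.1, "NotInQuery": 0.5}
score = linear_score(features, weights)
```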
Answer Extraction using Relationships: Answer Extraction using Relationships Computing Ranking Scores –
Linguistic knowledge to compute passage & candidate answer scores
Perform syntactic processing on question and candidate passages
Extract predicate-argument & modification relationships from parse
Question: “Who wrote the Declaration of Independence?”
Relationships: [X, write], [write, Declaration of Independence]
Answer Text: “Jefferson wrote the Declaration of Independence.”
Relationships: [Jefferson, write], [write, Declaration of Independence]
Compute scores based on number of question relationship matches
Passage score: consider all instantiated relationships
Candidate answer scores: consider relationships with variable
Answer Extraction using Relationships (cont.): Answer Extraction using Relationships (cont.) Example: When did Amtrak begin operations?
Question relationships
[Amtrak, begin], [begin, operation], [X, begin]
Compute passage scores: passages and relationships
In 1971, Amtrak began operations,…
[Amtrak, begin], [begin, operation], [1971, begin]…
“Today, things are looking better,” said Claytor, expressing optimism about getting the additional federal funds in future years that will allow Amtrak to begin expanding its operations.
[Amtrak, begin], [begin, expand], [expand, operation], [today, look]…
Airfone, which began operations in 1984, has installed air-to-ground phones…. Airfone also operates Railfone, a public phone service on Amtrak trains.
[Airfone, begin], [begin, operation], [1984, operation], [Amtrak, train]…
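The Amtrak example can be replayed with a small sketch that treats relationships as pairs. The function names are hypothetical, and the relationship lists are copied from the slide (the trailing "…" items are omitted). Passage score counts fully instantiated question relationships; candidate fillers come from relationships that match the one containing the variable X.

```python
VAR = "X"  # placeholder for the sought answer in question relationships


def passage_score(question_rels, passage_rels):
    """Count question relationships without the variable that occur in the passage."""
    concrete = {r for r in question_rels if VAR not in r}
    return len(concrete & set(passage_rels))


def candidate_fillers(question_rels, passage_rels):
    """Terms that could bind X, skipping relationships already stated in the question."""
    concrete = {r for r in question_rels if VAR not in r}
    fillers = []
    for q_left, q_right in question_rels:
        if VAR not in (q_left, q_right):
            continue
        for p in passage_rels:
            if p in concrete:
                continue
            p_left, p_right = p
            if q_left == VAR and p_right == q_right:
                fillers.append(p_left)
            elif q_right == VAR and p_left == q_left:
                fillers.append(p_right)
    return fillers


QUESTION = [("Amtrak", "begin"), ("begin", "operation"), ("X", "begin")]
PASSAGES = [
    [("Amtrak", "begin"), ("begin", "operation"), ("1971", "begin")],
    [("Amtrak", "begin"), ("begin", "expand"), ("expand", "operation"), ("today", "look")],
    [("Airfone", "begin"), ("begin", "operation"), ("1984", "operation"), ("Amtrak", "train")],
]
scores = [passage_score(QUESTION, p) for p in PASSAGES]
```

The first passage scores highest and yields the filler "1971". The third passage yields "Airfone", which an answer-type check (the question seeks a date) would then reject.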
Answer Extraction using Logic: Answer Extraction using Logic Logical Proof
Convert question to a goal
Convert passage to set of logical forms representing individual assertions
Add predicates representing subsumption rules, real-world knowledge
Prove the goal
See section on LCC later
Question Answering Tutorial Part II: Question Answering Tutorial Part II John M. Prager
IBM T.J. Watson Research Center
jprager@us.ibm.com
Part II - Specific Approaches: Part II - Specific Approaches
By Genre
Statistical QA
Pattern-based QA
Web-based QA
Answer-based QA (TREC only)
By System
SMU
LCC
USC-ISI
Insight
Microsoft
IBM Statistical
IBM Rule-based
Approaches by Genre: Approaches by Genre By Genre
Statistical QA
Pattern-based QA
Web-based QA
Answer-based QA (TREC only)
Web-based QA
Database-based QA
Considerations
Effectiveness by question-type
Precision and recall
Expandability to other domains
Ease of adaptation to CL-QA
Statistical QA: Statistical QA Use statistical distributions to model likelihoods of answer type and answer
E.g. IBM (Ittycheriah, 2001) – see later section
Pattern-based QA: Pattern-based QA For a given question type, identify the typical syntactic constructions used in text to express answers to such questions
Typically very high precision, but a lot of work to get decent recall
Web-Based QA: Web-Based QA Exhaustive string transformations
Brill et al. 2002
Learning
Radev et al. 2001
Answer-Based QA: Answer-Based QA Problem: Sometimes it is very easy to find an answer to a question using resource A, but the task demands that you find it in resource B.
Solution: First find the answer in resource A, then locate the same answer, along with original question terms, in resource B.
Artificial problem, but real for TREC participants.
Answer-Based QA: Answer-Based QA Web-Based solution:
When a QA system looks for answers within a relatively small textual collection, the chance of finding strings/sentences that closely match the question string is small. However, when a QA system looks for strings/sentences that closely match the question string on the web, the chance of finding a correct answer is much higher.
Hermjakob et al. 2002
Why this is true:
The Web is much larger than the TREC Corpus (3,000 : 1)
TREC questions are generated from Web logs, and the style of language (and subjects of interest) in these logs is more similar to the Web content than to newswire collections.
Answer-Based QA: Answer-Based QA Database/Knowledge-base/Ontology solution:
When question syntax is simple and reliably recognizable, can express as a logical form
Logical form represents entire semantics of question, and can be used to access structured resource:
WordNet
On-line dictionaries
Tables of facts & figures
Knowledge-bases such as Cyc
Having found answer
construct a query with original question terms + answer
Retrieve passages
Tell Answer Extraction the answer it is looking for
Approaches of Specific Systems: Approaches of Specific Systems SMU Falcon
LCC
USC-ISI
Insight
Microsoft
IBM
Note: Some of the slides and/or examples in these sections are taken from papers or presentations from the respective system authors
SMU Falcon: SMU Falcon Harabagiu et al. 2000
SMU Falcon: SMU Falcon From question, dependency structure called question semantic form is created
Query is Boolean conjunction of terms
From answer passages that contain at least one instance of answer type, generate answer semantic form
3 processing loops:
Loop 1
Triggered when too few or too many passages are retrieved from search engine
Loop 2
Triggered when question semantic form and answer semantic form cannot be unified
Loop 3
Triggered when unable to perform abductive proof of answer correctness
SMU Falcon: SMU Falcon Loops provide opportunities to perform alternations
Loop 1: morphological expansions and nominalizations
Loop 2: lexical alternations – synonyms, direct hypernyms and hyponyms
Loop 3: paraphrases
Evaluation (Pasca & Harabagiu, 2001). Increase in accuracy in 50-byte task in TREC9
Loop 1: 40%
Loop 2: 52%
Loop 3: 8%
Combined: 76%
LCC: LCC Moldovan & Rus, 2001
Uses Logic Prover for answer justification
Question logical form
Candidate answers in logical form
XWN glosses
Linguistic axioms
Lexical chains
Inference engine attempts to verify answer by negating question and proving a contradiction
If proof fails, predicates in question are gradually relaxed until proof succeeds or associated proof score is below a threshold.
LCC: Lexical Chains: LCC: Lexical Chains Q:1518 What year did Marco Polo travel to Asia?
Answer: Marco Polo divulged the truth after returning in 1292 from his travels, which included several months on Sumatra
Lexical Chains:
(1) travel_to:v#1 -> GLOSS -> travel:v#1 -> RGLOSS -> travel:n#1
(2) travel_to:v#1 -> GLOSS -> travel:v#1 -> HYPONYM -> return:v#1
(3) Sumatra:n#1 -> ISPART -> Indonesia:n#1 -> ISPART -> Southeast_Asia:n#1 -> ISPART -> Asia:n#1
Q:1570 What is the legal age to vote in Argentina?
Answer: Voting is mandatory for all Argentines aged over 18.
Lexical Chains:
(1) legal:a#1 -> GLOSS -> rule:n#1 -> RGLOSS -> mandatory:a#1
(2) age:n#1 -> RGLOSS -> aged:a#3
(3) Argentine:a#1 -> GLOSS -> Argentina:n#1
LCC: Logic Prover: LCC: Logic Prover Question
Which company created the Internet Browser Mosaic?
QLF: (_organization_AT(x2) ) & company_NN(x2) & create_VB(e1,x2,x6) & Internet_NN(x3) & browser_NN(x4) & Mosaic_NN(x5) & nn_NNC(x6,x3,x4,x5)
Answer passage
... Mosaic , developed by the National Center for Supercomputing Applications ( NCSA ) at the University of Illinois at Urbana - Champaign ...
ALF: ... Mosaic_NN(x2) & develop_VB(e2,x2,x31) & by_IN(e2,x8) & National_NN(x3) & Center_NN(x4) & for_NN(x5) & Supercomputing_NN(x6) & application_NN(x7) & nn_NNC(x8,x3,x4,x5,x6,x7) & NCSA_NN(x9) & at_IN(e2,x15) & University_NN(x10) & of_NN(x11) & Illinois_NN(x12) & at_NN(x13) & Urbana_NN(x14) & nn_NNC(x15,x10,x11,x12,x13,x14) & Champaign_NN(x16) ...
Lexical Chains: develop -> make and make -> create
exists x2 x3 x4 all e2 x1 x7 (develop_vb(e2,x7,x1) <-> make_vb(e2,x7,x1) & something_nn(x1) & new_jj(x1) & such_jj(x1) & product_nn(x2) & or_cc(x4,x1,x3) & mental_jj(x3) & artistic_jj(x3) & creation_nn(x3)).
all e1 x1 x2 (make_vb(e1,x1,x2) <-> create_vb(e1,x1,x2) & manufacture_vb(e1,x1,x2) & man-made_jj(x2) & product_nn(x2)).
Linguistic axioms
all x0 (mosaic_nn(x0) -> internet_nn(x0) & browser_nn(x0))
USC-ISI: USC-ISI Textmap system
Ravichandran and Hovy, 2002
Hermjakob et al. 2003
Use of Surface Text Patterns
When was X born ->
Mozart was born in 1756
Gandhi (1869-1948)
Can be captured in expressions
<NAME> was born in <ANSWER>
<NAME> ( <ANSWER> -
These patterns can be learned
USC-ISI TextMap: USC-ISI TextMap Use bootstrapping to learn patterns.
For an identified question type (“When was X born?”), start with known answers for some values of X
Mozart 1756
Gandhi 1869
Newton 1642
Issue Web search engine queries (e.g. “+Mozart +1756” )
Collect top 1000 documents
Filter, tokenize, smooth etc.
Use suffix tree constructor to find best substrings, e.g.
Mozart (1756-1791)
Filter
Mozart (1756-
Replace query strings with e.g. <NAME> and <ANSWER>
Determine precision of each pattern
Find documents with just question term (Mozart)
Apply patterns and calculate precision
USC-ISI TextMap: USC-ISI TextMap Finding Answers
Determine Question type
Perform IR Query
Do sentence segmentation and smoothing
Replace question term by question tag
i.e. replace Mozart with <NAME>
Search for instances of patterns associated with question type
Select words matching <ANSWER>
Assign scores according to precision of pattern
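The answer-finding steps can be sketched with regular expressions standing in for learned surface patterns. The two patterns and their precision values below are invented for illustration; in the approach described they would be learned by bootstrapping from the Web.

```python
import re

# Hypothetical patterns for the "When was X born?" question type.
# "{name}" is replaced by the question term; group 1 captures the answer.
PATTERNS = [
    (r"{name} \((\d{{4}})-", 0.85),
    (r"{name} was born in (\d{{4}})", 0.60),
]


def find_answers(name, sentences):
    """Apply every pattern to every sentence; score each answer by the
    precision of the best pattern that extracted it."""
    scores = {}
    for template, precision in PATTERNS:
        regex = re.compile(template.format(name=re.escape(name)))
        for sentence in sentences:
            for match in regex.finditer(sentence):
                answer = match.group(1)
                scores[answer] = max(scores.get(answer, 0.0), precision)
    return sorted(scores.items(), key=lambda kv: -kv[1])


ranked = find_answers(
    "Mozart",
    ["Mozart (1756-1791) was a composer.", "Some say Mozart was born in 1755."],
)
```

Taking the maximum precision per answer is one possible choice; summing over supporting patterns would reward redundancy instead.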
Insight: Insight Soubbotin, 2002. Soubbotin & Soubbotin, 2003.
Performed very well in TREC10/11
Comprehensive and systematic use of “Indicative patterns”
E.g.
cap word; paren; 4 digits; dash; 4 digits; paren
matches
Mozart (1756-1791)
The patterns are broader than named entities
“Semantics in syntax”
Patterns have intrinsic scores (reliability), independent of question
Insight: Insight Patterns with more sophisticated internal structure are more indicative of answer
2/3 of their correct entries in TREC10 were answered by patterns
E.g.
a == {countries}
b == {official posts}
w == {proper names (first and last)}
e == {titles or honorifics}
Patterns for “Who is the President (Prime Minister) of a given country?”
abeww
ewwdb,a
b,aeww
Definition questions: (A is primary query term, X is answer)
For: “Moulin Rouge, a cabaret”
For: “naturally occurring gas called methane”
For: “Michigan’s state flower is the apple blossom”
Insight: Insight Emphasis on shallow techniques, lack of NLP
Look in vicinity of text string potentially matching pattern for “zeroing” – e.g. for occupational roles:
Former
Elect
Deputy
Negation
Comments:
Relies on redundancy of large corpus
Works for factoid question types of TREC-QA – not clear how it extends
Not clear how they match questions to patterns
Named entities within patterns have to be recognized
Microsoft: Microsoft Data-Intensive QA. Brill et al. 2002
“Overcoming the surface string mismatch between the question formulation and the string containing the answer”
Approach based on the assumption/intuition that someone on the Web has answered the question in the same way it was asked.
Want to avoid dealing with:
Lexical, syntactic, semantic relationships (between Q & A)
Anaphora resolution
Synonymy
Alternate syntax
Indirect answers
Take advantage of redundancy on Web, then project to TREC corpus (Answer-based QA)
Microsoft AskMSR: Microsoft AskMSR Formulate multiple queries – each rewrite has intrinsic score. E.g. for “What is relative humidity?”
[“+is relative humidity”, LEFT, 5]
[“relative +is humidity”, RIGHT, 5]
[“relative humidity +is”, RIGHT, 5]
[“relative humidity”, NULL, 2]
[“relative” AND “humidity”, NULL, 1]
Get top 100 documents from Google
Extract n-grams from document summaries
Score each n-gram by summing the scores of the rewrites it came from
Use tiling to merge n-grams
Search for supporting documents in TREC corpus
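The n-gram scoring and tiling steps can be sketched as below. The data structures, the function names, and the rule of counting an n-gram once per snippet are assumptions for illustration; filtering by answer type and removing question words are left out.

```python
from collections import defaultdict


def ngrams(tokens, max_n=3):
    """Yield all n-grams (as tuples) of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def score_ngrams(snippets_by_rewrite, max_n=3):
    """snippets_by_rewrite: list of (rewrite_weight, [snippet, ...]).
    Each n-gram accumulates the weight of the rewrite that retrieved it,
    once per snippet in which it occurs."""
    scores = defaultdict(int)
    for weight, snippets in snippets_by_rewrite:
        for snippet in snippets:
            for gram in set(ngrams(snippet.lower().split(), max_n)):
                scores[gram] += weight
    return scores


def tile(a, b):
    """Merge two n-grams when a suffix of a equals a prefix of b; else None."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    return None


scores = score_ngrams([
    (5, ["relative humidity is the ratio"]),
    (2, ["Relative humidity measures moisture"]),
])
```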
Microsoft AskMSR: Microsoft AskMSR Question is: “What is the rainiest place on Earth?”
Answer from Web is: “Mount Waialeale”
Passage in TREC corpus is: “… In misty Seattle, Wash., last year, 32 inches of rain fell. Hong Kong gets about 80 inches a year, and even Pago Pago, noted for its prodigious showers, gets only about 196 inches annually. (The titleholder, according to the National Geographic Society, is Mount Waialeale in Hawaii, where about 460 inches of rain falls each year.) …”
Very difficult to imagine getting this passage by other means
IBM Statistical QA (Ittycheriah, 2001): IBM Statistical QA (Ittycheriah, 2001) ATM predicts, from the question and a proposed answer, the answer type they both satisfy
Given a question, an answer, and the predicted answer type, ASM seeks to model the correctness of this configuration.
Distributions are modelled using a maximum entropy formulation
Training data = human judgments
For ATM, 13K questions annotated with 31 categories
For ASM, ~5K questions from TREC plus trivia
$p(c \mid q,a) = \sum_e p(c,e \mid q,a) = \sum_e p(c \mid e,q,a)\, p(e \mid q,a)$
where q = question, a = answer, c = “correctness”, e = answer type
p(e|q,a) is the answer type model (ATM)
p(c|e,q,a) is the answer selection model (ASM)
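The decomposition is a marginalization over answer types: the ATM weights each type, and the ASM says how likely the answer is correct under that type. A toy computation follows, with probabilities invented for illustration (the actual models are maximum entropy models trained on the data described above).

```python
def p_correct(atm, asm):
    """Sum over answer types e of p(c|e,q,a) * p(e|q,a).

    atm: dict mapping answer type e -> p(e|q,a)
    asm: dict mapping answer type e -> p(c|e,q,a)
    """
    return sum(p_e * asm[e] for e, p_e in atm.items())


# Hypothetical model outputs for one (question, answer) pair.
atm = {"PERSON": 0.7, "ORGANIZATION": 0.2, "PLACE": 0.1}
asm = {"PERSON": 0.9, "ORGANIZATION": 0.4, "PLACE": 0.05}
prob = p_correct(atm, asm)  # 0.7*0.9 + 0.2*0.4 + 0.1*0.05
```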
IBM Statistical QA (Ittycheriah): IBM Statistical QA (Ittycheriah) Question Analysis (by ATM)
Selects one out of 31 categories
Search
Question expanded by Local Context Analysis
Top 1000 documents retrieved
Passage Extraction: Top 100 passages that:
Maximize question word match
Have desired answer type
Minimize dispersion of question words
Have similar syntactic structure to question
Answer Extraction:
Candidate answers ranked using ASM
IBM Rule-based: IBM Rule-based Predictive Annotation (Prager 2000, Prager 2003)
Want to make sure passages retrieved by search engine have at least one candidate answer
Recognize that a candidate answer is of the correct answer type, which corresponds to one (or several) of the labels generated by the Named Entity Recognizer
Annotate entire corpus and index semantic labels along with text
Identify answer types in questions and include corresponding labels in queries
IBM PIQUANT: IBM PIQUANT Predictive Annotation –
E.g.: Question is “Who invented baseball?”
“Who” can map to PERSON$ or ORGANIZATION$
Suppose we assume only people invent things (it doesn’t really matter).
So “Who invented baseball?” -> {PERSON$ invent baseball}
Consider text “… but its conclusion was based largely on the recollections of a man named Abner Graves, an elderly mining engineer, who reported that baseball had been "invented" by Doubleday between 1839 and 1841. ”
IBM PIQUANT: IBM PIQUANT Predictive Annotation –
Previous example
“Who invented baseball?” -> {PERSON$ invent baseball}
However, same structure is equally effective at answering
“What sport did Doubleday invent?” -> {SPORT$ invent Doubleday}
IBM Rule-Based: IBM Rule-Based Handling Subsumption & Disjunction
If an entity is of a type which has a parent type, then how is annotation done?
If a proposed answer type has a parent type, then what answer type should be used?
If an entity is ambiguous then what should the annotation be?
If the answer type is ambiguous, then what should be used?
Guidelines:
Subsumption & Disjunction: Subsumption & Disjunction Consider New York City – both a CITY and a PLACE
To answer “Where did John Lennon die?”, it needs to be a PLACE
To answer “In what city is the Empire State Building?”, it needs to be a CITY.
Do NOT want to do subsumption calculation in search engine
Two scenarios
1. Expand Answer Type and use most specific entity annotation
1A { (CITY PLACE) John_Lennon die} matches CITY
1B {CITY Empire_State_Building} matches CITY
Or
2. Use most specific Answer Type and multiple annotations of NYC
2A {PLACE John_Lennon die} matches (CITY PLACE)
2B {CITY Empire_State_Building} matches (CITY PLACE)
Case 2 preferred for simplicity, because disjunction in #1 should contain all hyponyms of PLACE, while disjunction in #2 should contain all hypernyms of CITY
Choice #2 suggests can use disjunction in answer type to represent ambiguity:
“Who invented the laser” -> {(PERSON ORGANIZATION) invent laser}
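Scenario 2 can be sketched as indexing each entity under its own label plus all hypernym labels, so the query needs only the most specific answer type and the search engine does no subsumption reasoning. The tiny type hierarchy, the `$` suffix on labels in index terms, and the function names are assumptions for illustration.

```python
# Hypothetical type hierarchy: child label -> parent label.
PARENT = {"CITY": "PLACE", "COUNTRY": "PLACE"}


def with_hypernyms(label):
    """Return the label followed by all of its ancestors."""
    labels = [label]
    while label in PARENT:
        label = PARENT[label]
        labels.append(label)
    return labels


def index_sentence(tokens, entity_types):
    """Index terms for a sentence: its words plus '$'-suffixed semantic
    labels (including hypernyms) for each recognized entity type."""
    terms = set(tokens)
    for etype in entity_types:
        terms.update(label + "$" for label in with_hypernyms(etype))
    return terms


def matches(query_terms, indexed_terms):
    """Plain conjunctive match: every query term must be indexed."""
    return set(query_terms) <= indexed_terms


sentence = index_sentence(["John_Lennon", "die", "New_York_City"], ["CITY"])
```

With this index, both `{PLACE$ John_Lennon die}` and a CITY$ query match the same sentence, while a COUNTRY$ query does not.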
Clausal classes: Clausal classes Any structure that can be recognized in text can be annotated.
Quotations
Explanations
Methods
Opinions
…
Any semantic class label used in annotation can be indexed, and hence used as a target of search:
What did Karl Marx say about religion?
Why is the sky blue?
How do you make bread?
What does Arnold Schwarzenegger think about global warming?
…
Named Entity Recognition: Named Entity Recognition
IBM: IBM Predictive Annotation – Improving Precision at no cost to Recall
E.g.: Question is “Where is Belize?”
“Where” can map to (CONTINENT$, WORLDREGION$, COUNTRY$, STATE$, CITY$, CAPITAL$, LAKE$, RIVER$ … ).
But we know Belize is a country.
So “Where is Belize?” -> {(CONTINENT$ WORLDREGION$) Belize}
Belize occurs 1068 times in TREC corpus
Belize and PLACE$ co-occur in only 537 sentences
Belize and CONTINENT$ or WORLDREGION$ co-occur in only 128 sentences
Virtual Annotation (Prager 2001): Virtual Annotation (Prager 2001) Use WordNet to find all candidate answers (hypernyms)
Use corpus co-occurrence statistics to select “best” ones
Rather like approach to WSD by Mihalcea and Moldovan (1999)
Parentage of “nematode”: Parentage of “nematode”
Parentage of “meerkat”: Parentage of “meerkat”
Natural Categories: Natural Categories “Basic Objects in Natural Categories” Rosch et al. (1976)
According to psychological testing, these are categorization levels of intermediate specificity that people tend to use in unconstrained settings.
What is this?: What is this?
What can we conclude?: What can we conclude? There are descriptive terms that people are drawn to use naturally.
We can expect to find instances of these in text, in the right contexts.
These terms will serve as good answers.
Virtual Annotation (cont.): Virtual Annotation (cont.) Find all parents of query term in WordNet
Look for co-occurrences of query term and parent in text corpus
Expect to find snippets such as: “… meerkats and other Y …”
Many different phrasings are possible, so we just look for proximity, rather than parse.
Scoring:
Count co-occurrences of each parent with search term, and divide by level number (only levels >= 1), generating Level-Adapted Count (LAC).
Exclude very highest levels (too general).
Select parent with highest LAC plus any others with LAC within 20%.
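The scoring steps above can be sketched as follows. The co-occurrence counts, level numbers and the `max_level` cut-off are made up for illustration, and reading "within 20%" as "at least 80% of the best LAC" is an assumption.

```python
def select_parents(cooccurrence, levels, max_level, margin=0.20):
    """Pick the parent(s) with the best Level-Adapted Count (LAC).

    cooccurrence: parent term -> number of co-occurrences with the query term
    levels:       parent term -> number of levels above the query term
    max_level:    levels above this are treated as too general (assumed cut-off)
    """
    lac = {}
    for parent, count in cooccurrence.items():
        level = levels[parent]
        if level < 1 or level > max_level:
            continue  # only levels >= 1, and not the very highest ones
        lac[parent] = count / level
    if not lac:
        return []
    best = max(lac.values())
    chosen = [p for p, score in lac.items() if score >= best * (1 - margin)]
    return sorted(chosen, key=lambda p: -lac[p])


# Hypothetical numbers for the query term "meerkat".
levels = {"mongoose": 1, "carnivore": 3, "mammal": 5, "animal": 8, "entity": 12}
counts = {"mongoose": 4, "carnivore": 3, "mammal": 10, "animal": 28, "entity": 40}
print(select_parents(counts, levels, max_level=10))
```

With these invented numbers the LACs are 4.0, 1.0, 2.0 and 3.5, so both "mongoose" and "animal" are kept.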
Parentage of “nematode”: Parentage of “nematode”
Parentage of “meerkat”: Parentage of “meerkat”
Sample Answer Passages: Sample Answer Passages “What is a nematode?” ->
“Such genes have been found in nematode worms but not yet in higher animals.”
“What is a meerkat?” ->
“South African golfer Butch Kruger had a good round going in the central Orange Free State trials, until a mongoose-like animal grabbed his ball with its mouth and dropped down its hole. Kruger wrote on his card: "Meerkat."”
Use Answer-based QA to locate answers
Use of Cyc as Sanity Checker: Use of Cyc as Sanity Checker Cyc: Large Knowledge-base and Inference engine (Lenat 1995)
A post-hoc process for
Rejecting “insane” answers
How much does a grey wolf weigh?
300 tons
Boosting confidence for “sane” answers
Sanity checker invoked with
Predicate, e.g. “weight”
Focus, e.g. “grey wolf”
Candidate value, e.g. “300 tons”
Sanity checker returns
“Sane”: within ±10% of value in Cyc
“Insane”: outside of the reasonable range
Plan to use distributions instead of ranges
“Don’t know”
Confidence score highly boosted when answer is “sane”
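A minimal sketch of the three-way outcome is given below. It does not call Cyc: `KNOWN_VALUES` is a hypothetical stand-in for a knowledge-base lookup, and it simplifies by treating every known value outside the ±10% band as "insane", whereas the slide speaks of a separate reasonable range.

```python
# Hypothetical stand-in for a knowledge-base lookup (not the Cyc API).
KNOWN_VALUES = {("population", "Maryland"): 5_296_486}


def sanity_check(predicate, focus, candidate, tolerance=0.10):
    """Return "sane", "insane" or "don't know" for a numeric candidate value."""
    known = KNOWN_VALUES.get((predicate, focus))
    if known is None:
        return "don't know"
    if abs(candidate - known) <= tolerance * known:
        return "sane"
    return "insane"


print(sanity_check("population", "Maryland", 50_000))
print(sanity_check("population", "Maryland", 5_100_000))
print(sanity_check("weight", "grey wolf", 300))
```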
Cyc Sanity Checking Example: Cyc Sanity Checking Example Trec11 Q: “What is the population of Maryland?”
Without sanity checking
PIQUANT’s top answer: “50,000”
Justification: “Maryland’s population is 50,000 and growing rapidly.”
Passage discusses an exotic species “nutria”, not humans
With sanity checking
Cyc knows the population of Maryland is 5,296,486
It rejects the top “insane” answers
PIQUANT’s new top answer: “5.1 million” with very high confidence
Question Answering Tutorial Part III: Question Answering Tutorial Part III
Part III – Issues, Advanced Topics: Part III – Issues, Advanced Topics Evaluation
No Answer
Question Difficulty
Future of QA/Hot topics
Dimensions of QA
Relationship questions
Decomposition / Recursive QA
Constraint-based QA
Cross-Language QA
Evaluation: Evaluation Relatively straightforward for “factoid” questions.
TREC-8 (1999) & TREC-9 (2000)
50-byte and 250-byte tasks
Systems returned top 5 answers
Mean Reciprocal Rank
1 point if top answer is correct, else
0.5 point if second answer is correct, else …
0.2 point if fifth answer is correct, else 0
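The scoring rule above amounts to the following sketch. The function names and the toy answer lists are illustrative, and correctness is reduced to membership in a set of accepted strings.

```python
def reciprocal_rank(ranked_answers, correct_answers, max_rank=5):
    """1/rank of the first correct answer among the top max_rank, else 0."""
    for rank, answer in enumerate(ranked_answers[:max_rank], start=1):
        if answer in correct_answers:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (ranked answers, set of correct answers), one per question."""
    return sum(reciprocal_rank(a, gold) for a, gold in runs) / len(runs)


runs = [
    (["a", "b"], {"a"}),        # correct at rank 1
    (["x", "y", "z"], {"y"}),   # correct at rank 2
    (["p"], {"q"}),             # no correct answer
]
print(mean_reciprocal_rank(runs))
```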
Evaluation: Evaluation For each question, a set of “correct” answers
“Correctness” testing is easy to automate with pattern files, but patterns are subjective
Patterns don’t/can’t test for justification
Evaluation: Evaluation TREC-10 (2001)
Dropped 250-byte task
Introduced NIL (No Answer) questions
TREC-11 (2002)
Instead of top 5 answers, systems returned top 1
Answer must be “exact”
Definition questions (“What/who is X?”) dropped
Results returned sorted in order of system’s confidence
Scored by Confidence Weighted Score (= Average Precision)
TREC-12 (2003)
Definition questions re-introduced, but answers assumed to be a collection of “nuggets”
List questions introduced, answers must be exact
Definition and List questions evaluated by F-measure biased to favour recall
Factoid questions evaluated by fraction correct
Confidence-Weighted Score (Average Precision): Confidence-Weighted Score (Average Precision) = average of N different precision measures
Score1 participates in every term
Score2 participates in all but first,
…
ScoreN participates in just last term
Much more weight given to early terms in sum
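The formula on this slide did not survive extraction. As a sketch, assuming the usual TREC-11 definition (for each rank i, the precision over the first i answers, averaged over all N ranks), the score can be computed like this:

```python
def confidence_weighted_score(judgements):
    """judgements: booleans, one per question, sorted by decreasing confidence.

    Returns (1/N) * sum over i of (number correct in first i) / i.
    """
    n = len(judgements)
    correct_so_far = 0
    total = 0.0
    for i, is_correct in enumerate(judgements, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / i
    return total / n


print(confidence_weighted_score([True, False]))
print(confidence_weighted_score([False, True]))
```

The same single correct answer is worth more when it is ranked first, which is the sense in which early terms carry more weight.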
Contribution by Rank Position: Contribution by Rank Position For N questions, let $c_k$ be the contribution of a correct answer in position $k$:
$$c_k = c_{k+1} + \frac{1}{kN}, \qquad c_{N+1} = 0$$
(Plot: N = 500)
Average Precision: Average Precision N =500
Evaluation Issues: Evaluation Issues What is really meant by “exact answer”?
What if there is a mistake in question?
Suppose question is “Who said X?”, where X is a famous saying with a mistake in it.
Maybe the answer is NIL
What granularity is required?
“Where is Chicago?”
“What is acetaminophen?”
Difficult to answer without model of user.
Questions with No Answer: Questions with No Answer Subtle difference between:
This question has no answer (within the available resources),
This question has no answer (at all), and
I don’t know the answer
TREC-QA tests #1 (“NIL questions”), but systems typically answer as if #3
Strategies used:
When allowed top 5 answers (with confidences)
Always put NIL in position X (X in {2,3,4,5})
If some criterion succeeds, put NIL in position X (X in {1,2,3,4,5})
Determine some threshold T, and insert NIL at corresponding position in confidence ranking (1-5, or not)
When single answer
Determine some threshold T, and insert NIL if answer confidence < T
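The two threshold strategies can be sketched as below. The function names are illustrative, and giving the inserted NIL the threshold itself as its confidence is an assumption.

```python
NIL = "NIL"


def answer_or_nil(answer, confidence, threshold):
    """Single-answer case: return NIL when confidence falls below T."""
    return NIL if confidence < threshold else answer


def insert_nil(ranked, threshold, max_answers=5):
    """Top-5 case: ranked is a list of (answer, confidence), best first.

    NIL is inserted where the threshold falls in the ranking; if all
    max_answers answers score at least T, NIL drops off the end.
    """
    out = []
    inserted = False
    for answer, conf in ranked:
        if not inserted and conf < threshold:
            out.append((NIL, threshold))
            inserted = True
        out.append((answer, conf))
    if not inserted:
        out.append((NIL, threshold))
    return out[:max_answers]


print(insert_nil([("a", 0.9), ("b", 0.4), ("c", 0.2)], threshold=0.5))
```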
NIL and CWS: NIL and CWS When Confidence-Weighted Score is used, what should the NIL strategy be?
If an answer has low confidence and is replaced by NIL, then what is its new confidence?
Study strategy used by IBM in TREC11 (Chu-Carroll et al. 2003)
No-Answer Confidence-Based Calculation : No-Answer Confidence-Based Calculation
Use TREC10 Data to determine strategy and thresholds
Observe that lowest-confidence questions are more often No-Answer than correct
Examine TREC10 distribution to determine cut-off threshold.
Convert all questions below this to NIL.
Improves average confidence of block.
Move converted block to rank with same average precision.
Confidences based on:
Grammatical Relationships
Semantic Relationships
Redundancy
TREC10 Distribution: TREC10 Distribution
Key: x = Correct, . = Incorrect, - = NIL
Columns after each row: NIL, CORRECT, OUT OF
xxxxxxxxxxxxxx.xx.xxxxxxxxxxxxxx.x..xx.xx 0 35 41
xxxxxx-x.-x.xxxxxxxx..x-xxxxxxxxxx.xxxxx.x.xxx.-xx 4 38 50
xx.....x.-xx.....xx....x.xx.x..xxx.xx...xx.x..xx.x 1 22 50
.-...x.xx-..x..x.xx....xx.x...xx.....x..xxx....xx. 2 18 50
........x....x..xxxx...x...xx....xxxxx--......xxx. 2 17 50
..x.xxx...-x-...xx.....x...xx--.xx-....xx..x..x... 5 16 50
..x.x.-......x....x.x-.x.xx...-x-x-x-...-..x-x.x.x 8 15 50
x..-x.....x.x.....-..........-...-..x.-....-..x... 6 6 50
.x--......xx....-.-..x.-....-.-..x...........--... 9 5 50
-.-.-..--...-x.xx....-.-x......-.....-..-...-.x.-. 13 5 50
TREC10 Distribution: TREC10 Distribution Changing all answers in the block (the last two rows, i.e. the 100 lowest-confidence answers) to NIL gains 22-10 = 12 correct. Note confidence of leading element of the block = C.
After the change, the last two rows become (the NIL column now reads "all"):
..xx............x.x....x....x.x..............xx... all 9 50
x.x.x..xx...x........x.x.......x.....x..x...x...x. all 13 50
Calculate precision of block: P = 22/100.
Calculate point with same local precision P. Note confidence K.
NIL Placement in TREC11 Answers: NIL Placement in TREC11 Answers Sorted by confidence, but correctness unknown
Find point with confidence C. (Block is of size 147)
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
NIL Placement in TREC11 Answers: NIL Placement in TREC11 Answers Sorted by confidence, but correctness unknown
Find point with confidence C. (Block is of size 147)
Make all answers in block NIL, and add K-C to each confidence.
Find point with confidence K. Insert block at this point.
Subtract C from all confidences to the right.
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
??????????????????????????????????????????????????
???????????????-----------------------------------
--------------------------------------------------
--------------------------------------------------
------------??????????????????????????????????????
??????????????????????????????????????????????????
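One reading of the placement procedure on this slide is sketched below. The order of the steps is reconstructed from the slide text, so treat it as an interpretation: answers with confidence at or below C become NIL and gain K-C, answers between C and K lose C, and the list is re-sorted so the NIL block lands at the point with confidence K. The tuple layout and the name `relocate_nil_block` are assumptions.

```python
NIL = "NIL"


def relocate_nil_block(answers, c, k):
    """answers: list of (question_id, answer, confidence), with c < k.

    Returns the list re-sorted by adjusted confidence, best first.
    """
    adjusted = []
    for qid, answer, conf in answers:
        if conf <= c:
            # Low-confidence block: convert to NIL and lift by K - C.
            adjusted.append((qid, NIL, conf + (k - c)))
        elif conf < k:
            # Answers that now sit to the right of the block: shift down by C.
            adjusted.append((qid, answer, conf - c))
        else:
            adjusted.append((qid, answer, conf))
    return sorted(adjusted, key=lambda item: item[2], reverse=True)


answers = [(1, "a", 0.9), (2, "b", 0.6), (3, "c", 0.5), (4, "d", 0.2), (5, "e", 0.1)]
for qid, answer, conf in relocate_nil_block(answers, c=0.2, k=0.7):
    print(qid, answer, round(conf, 2))
```

In the toy data, questions 4 and 5 form the block and move up to second and third place as NIL answers.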
NIL Placement in TREC11 Answers - Impact: NIL Placement in TREC11 Answers - Impact
29 out of 46 NIL answers located: recall of 0.63
9 previously-correct answers lost
Total of 20 correct questions gained … 20/500 = 4%
Minimal (< 0.5%) improvement in final AP score
Question Complexity: Question Complexity “Simple” questions are not a solved problem:
Complex questions can be decomposed into simpler components.
If simpler questions cannot be handled successfully, there’s no hope for more complex ones.
Areas not explored (intentionally) by TREC to date:
spelling errors
grammatical errors
syntactic precision e.g. significance of articles
“not”, “only”, “just” …
Question Complexity: Question Complexity When was Queen Victoria born?
… King George III’s only granddaughter to survive infancy was born in 1819 …
… Victoria was the only daughter of Edward, Duke of Kent …
… George III’s fourth son Edward became Duke of Kent …
Should the Fed raise interest rates?
All of the current leading economic indicators point in the direction of the Federal Reserve Bank raising interest rates at next week’s meeting. Alan Greenspan, Fed chairman.
What is the meaning of life?
42. (The Hitchhiker’s Guide to the Galaxy)
Question Complexity: Question Complexity Not a function of question alone, but rather the pair {question, corpus}
In general, it is a function of the question and the resources to answer it, which include text corpora, databases, knowledge bases, ontologies and processing modules
Complexity ≡ Impedance Match
Future of QA: Future of QA By fixing resources, can make factoid QA more difficult by intentionally exploiting requirements for advanced NLP and/or reasoning
Questions that require more than one resource / document for an answer
E.g. What is the relationship between A and B?
Question decomposition
Cross-language QA
How to advance the field
Dimensions of QA: Dimensions of QA “Answer Topology”
Characteristics of correct answer set
Language
Vocabulary & Syntax
Question as a problem
Enumeration, arithmetic, inference
User Model
Who’s asking the question
Opinions, hypotheses, predictions, beliefs
Answer Set Topology: Answer Set Topology No Answer, one, many
When are two different answers the same –
Natural variation
Size of an elephant
Estimation
Populations
Variation over time
Populations, Prime Ministers
Choose correct presentation format
Lists, charts, graphs, dialogues
Language: Language The biggest current roadblock to Question Answering is arguably Natural Language:
Anaphora
Definite Noun Phrases
Synonyms
Subsumption
Metonyms
Paraphrases
Negation & other such qualification
Nonce words
Idioms
Figures of speech
Poetic & other stylistic variations
…
Negation (1): Negation (1) Q: Who invented the electric guitar?
A: While Mr. Fender did not invent the electric guitar, he did revolutionize and perfect it.
Note: Not all instances of “not” will invalidate a passage.
Questions as Word Problems: Questions as Word Problems What is the largest city in England?
Text Match
Find text that says “London is the largest city in England” (or paraphrase).
“Superlative” Search
Find a table of English cities and their populations, and sort.
Find a list of the 10 largest cities in the world, and see which are in England.
Uses logic: if L > all objects in set R then L > all objects in any subset E ⊂ R.
Find the population of as many individual English cities as possible, and choose the largest.
Heuristics
London is the capital of England. (Not guaranteed to imply it is the largest city, but quite likely.)
Complex Inference
E.g. “Birmingham is England’s second-largest city”; “Paris is larger than Birmingham”; “London is larger than Paris”; “London is in England”.
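The complex-inference example can be worked through mechanically. The sketch below is one way to do it, with hypothetical function names: it closes the "larger than" facts under transitivity, then looks for the single city in the region that is larger than the known runner-up.

```python
def transitive_closure(pairs):
    """Close a set of (bigger, smaller) pairs under transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def largest_city(larger_than, located_in, second_largest, region):
    """Return the one city in region known to exceed its second-largest city."""
    closure = transitive_closure(larger_than)
    runner_up = second_largest[region]
    candidates = {a for a, b in closure
                  if b == runner_up and located_in.get(a) == region}
    # Exactly one city can be larger than the second-largest one.
    return candidates.pop() if len(candidates) == 1 else None


larger_than = {("Paris", "Birmingham"), ("London", "Paris")}
located_in = {"London": "England"}
second_largest = {"England": "Birmingham"}
print(largest_city(larger_than, located_in, second_largest, "England"))
```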
Negation (2): Negation (2) Name a US state where cars are manufactured.
versus
Name a US state where cars are not manufactured.
Certain kinds of negative events or instances are rarely asserted explicitly in text, but must be deduced by other means.
Other Adverbial Modifiers (Only, Just etc.): Other Adverbial Modifiers (Only, Just etc.) Name an astronaut who nearly made it to the moon.
To answer such questions satisfactorily, we need to know the different ways in which events can fail to happen. In this case there are several.
Need for User Model: Need for User Model Where is Chicago?
What is meant?
The city: what granularity is required?
The rock group
The play/movie
The sports team (which one?)
What is mold?
Can hardly choose the right answer without knowing who is asking the question, and why.
Not all “What is” Questions are definitional: Not all “What is” Questions are definitional From a Web log:
Subclass or instance
What is a powerful adhesive?
Distinction from co-members of class
What is a star fruit?
Value or more common synonym
What is a nanometer?
What is rubella?
Subclass/instance with property
What is a yellow spotted lizard?
Ambiguous: definition or instance
What is an antacid?
Attention to Details: Attention to Details Tenses
Who is the Prime Minister of Japan?
Number
What are the largest snakes in the world?
Articles
What is mold?
Where is the Taj Mahal?
Opinions, Hypotheses, Predictions and Beliefs: Opinions, Hypotheses, Predictions and Beliefs What does X think about Y?
Will X happen?
‘ “X will happen”, says Dr. A’
‘Prof. B believes that X will happen.’
‘X will happen’ (asserted by article writer)
e.g. Is global warming real?
What is appropriate for QA?: What is appropriate for QA? How much emphasis should be placed on:
Retrieval
Built-in knowledge
Computation
Estimation
Inference
Sample questions
What is one plus one?
How many $2 pencils can I buy for $10?
How many genders are there?
How many legs does a person have?
How many books are there in a local library?
What was the dilemma facing Hamlet?
Relationship Questions: Relationship Questions An exercise in the ARDA AQUAINT program.
“What has been the relationship between Osama bin Laden and Sudan?”
“What does Soviet Cosmonaut Valentina Tereshkova (Vladinrouna) and U.S. Astronaut Sally Ride have in common?”
“What is the connection between actor and comedian Chris Rock and former Washington, D.C. mayor Marion Barry?”
Two approaches (Cycorp and IBM)
Cycorp Approach: Cycorp Approach Single Strategy
Use original question terms as IR query
Break top retrieved documents into sentences
Generate Bayesian network with words as nodes from Sentence x Word matrix
Select ancestor terms to augment query
E.g. “What is the connection between actor and comedian Chris Rock and former Washington, D.C. mayor Marion Barry?”
Augmentation terms = {drug, arrested}
Iterate, but with a new network that has sentences as nodes
Output sentences that are neighbours of augmented query
IBM Approach: IBM Approach Multi-part Strategy, including:
Extending pattern-based agent
“What is the relationship between X and Y?” -> locate syntactic contexts with X and Y:
conjunction
subject-verb-object
objects of prepositions.
New profile-based agent
Local Context Analysis on documents containing either X or Y
Form vector of terms, normalize, intersect, sort
“What do Valentina Tereshkova and Sally Ride have in common?” ->
Space
First
Woman
Collins (the first woman to ever fly the space shuttle)
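The "form vector of terms, normalize, intersect, sort" step of the profile-based agent might look like the sketch below. It uses plain term counts as a stand-in and is not an implementation of Local Context Analysis; the toy documents, the stopword list and the product-of-weights score are all assumptions.

```python
from collections import Counter
import math


def profile(docs):
    """Unit-length term-weight vector for a list of documents."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}


def common_terms(docs_x, docs_y, stopwords=frozenset()):
    """Terms shared by both profiles, sorted by the product of their weights."""
    px, py = profile(docs_x), profile(docs_y)
    shared = {w: px[w] * py[w] for w in px.keys() & py.keys()
              if w not in stopwords}
    return sorted(shared, key=lambda w: (-shared[w], w))


# Made-up snippets standing in for documents retrieved for X and Y.
docs_x = ["first woman in space", "soviet cosmonaut"]
docs_y = ["first american woman in space", "astronaut"]
print(common_terms(docs_x, docs_y, stopwords={"in"}))
```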
Decomposition/Recursive QA: Decomposition/Recursive QA “Who/What is X” questions require a profile of the subject – QA-by-Dossier
Can generate auxiliary questions based on type of question focus.
When/where was X born?
When/where/how did X die?
What occupation did X have?
Can generate follow-up questions based on earlier answers
What did X win?
What did X write?
What did X discover?
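Question generation of this kind can be driven by simple templates, as in the sketch below. The question wordings come from the slide; the type label "PERSON" and the idea of keying follow-up questions on the occupation returned by an earlier answer are illustrative assumptions.

```python
# Auxiliary questions keyed by the type of the question focus.
AUXILIARY_TEMPLATES = {
    "PERSON": [
        "When was {x} born?",
        "Where was {x} born?",
        "When did {x} die?",
        "What occupation did {x} have?",
    ],
}
# Follow-up questions keyed by an earlier answer (here: the occupation).
# The occupation keys are hypothetical.
FOLLOW_UP_TEMPLATES = {
    "athlete": ["What did {x} win?"],
    "writer": ["What did {x} write?"],
    "scientist": ["What did {x} discover?"],
}


def auxiliary_questions(focus, focus_type):
    return [t.format(x=focus) for t in AUXILIARY_TEMPLATES.get(focus_type, [])]


def follow_up_questions(focus, occupation):
    return [t.format(x=focus) for t in FOLLOW_UP_TEMPLATES.get(occupation, [])]


print(auxiliary_questions("Marie Curie", "PERSON"))
print(follow_up_questions("Marie Curie", "scientist"))
```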
Constraint-based QA: Constraint-based QA QA-by-Dossier-with-Constraints
Variation of QA-by-Dossier
Ask auxiliary questions that constrain the answer to the original question.
Prager et al. (submitted)
When did Leonardo paint the Mona Lisa?: When did Leonardo paint the Mona Lisa?
Constraints: Capitalize on existence of natural relationships between events/situations that can be used as constraints
E.g. A person’s achievements occurred during his/her lifetime.
Develop constraints for a person and an achievement event:
date(died) = date(born) + 10
date(event) <= date(died)
For each constraint variable, ask Auxiliary Question to generate set of candidate answers, e.g.
When was Leonardo born?
When did Leonardo die?
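A small sketch of how such constraints can re-rank candidates follows. It encodes only "achievements occurred during the lifetime" plus an assumed maximum lifespan of 100 years; the candidate confidences are invented, and multiplying confidences is one simple combination rule, not necessarily the one used in the system described.

```python
from itertools import product


def satisfies(born, died, event, max_lifespan=100):
    """Event falls within the lifetime; max_lifespan is an assumed bound."""
    return born <= event <= died and died - born <= max_lifespan


def best_dossier(born_cands, died_cands, event_cands):
    """Each argument is a list of (year, confidence) candidate answers.

    Returns the (born, died, event) triple that satisfies the constraints
    and has the highest product of confidences, or None.
    """
    best, best_score = None, 0.0
    for (b, cb), (d, cd), (e, ce) in product(born_cands, died_cands, event_cands):
        if satisfies(b, d, e):
            score = cb * cd * ce
            if score > best_score:
                best, best_score = (b, d, e), score
    return best


# Invented confidences; 1911 plays the role of a high-scoring wrong candidate.
born = [(1452, 0.7), (1519, 0.2)]
died = [(1519, 0.6), (1452, 0.3)]
painted = [(1911, 0.5), (1503, 0.4)]
print(best_dossier(born, died, painted))
```

Although 1911 is the top-scoring candidate for the painting date on its own, it cannot be reconciled with any birth and death candidates, so the consistent triple wins.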
Auxiliary Questions: Auxiliary Questions When was Leonardo born? When did Leonardo die?
Dossier-with-Constraints Process: Dossier-with-Constraints Process (Diagram: Original Question; Auxiliary Questions; Constraints; Constraint Satisfaction + Confidence Combination)
Cross-Language QA: Cross-Language QA Probably the easiest approach is to translate the question into the language of the collection, and perform monolingual QA
All considerations that apply to CL-IR apply to CL-QA, and then some:
Named Entity Recognition
Parsers
Ontologies
…
Cross-Language QA: Cross-Language QA Jung and Lee, 2002.
User Query -> NLP -> SQL -> Relational Database
Morphological Analysis and Linguistic Resources are language dependent. Generate Lexico-Semantic patterns
Cross-Language QA: Cross-Language QA TREC CLIR for several years
CLEF (Cross-Language Evaluation Forum) http://clef.iei.pi.cnr.it:2002/
CLIR activities for several years
CL-QA in 2003 http://clef-qa.itc.it/
References: References Abney, S., Collins, M. and Singhal, A. “Answer Extraction”. In Proceedings ANLP 2000.
E. Brill, J. Lin, M. Banko, S. Dumais and A. Ng, “Data-Intensive Question Answering”, in Proceedings of the 10th Text Retrieval Conference (TREC-2001), NIST, Gaithersburg, MD, 2002.
D. Bikel, R. Schwartz, R. Weischedel, "An Algorithm that Learns What's in a Name," Machine Learning, 1999.
Byrd, R. and Ravin, Y. “Identifying and Extracting Relations in Text.” In Proceedings of NLDB 99, Klagenfurt, Austria, 1999.
Jennifer Chu-Carroll, John Prager, Christopher Welty, Krzysztof Czuba and David Ferrucci. "A Multi-Strategy and Multi-Source Approach to Question Answering", Proceedings of TREC2002, Gaithersburg, MD, 2003.
Clarke, C.L.A., Cormack, G.V., Kisman, D.I.E. and Lynam, T.R. “Question answering by passage selection (Multitext experiments for TREC-9)” in Proceedings of the 9th Text Retrieval Conference, pp. 673-683, NIST, Gaithersburg, MD, 2001.
Sanda Harabagiu, Dan Moldovan, Marius Pasca, Rada Mihalcea, Mihai Surdeanu, Razvan Bunescu, Roxana Girju, Vasile Rus and Paul Morarescu, FALCON: Boosting Knowledge for Answer Engines, in Proceedings of the 9th Text Retrieval Conference, pp. 479-488, NIST, Gaithersburg MD, 2001.
Sanda Harabagiu, Dan Moldovan, Marius Pasca, Rada Mihalcea, Mihai Surdeanu, Razvan Bunescu, Roxana Girju, Vasile Rus and Paul Morarescu, The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering, in Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL-2001), July 2001, Toulouse France, pages 274-281.
Gary G. Hendrix, Earl D. Sacerdoti, Daniel Sagalowicz, Jonathan Slocum: Developing a Natural Language Interface to Complex Data. VLDB 1977: 292
Hovy, E., Gerber, L., Hermjakob, U., Junk, M., and Lin, C-Y. “Question answering in Webclopedia” in Proceedings of the 9th Text Retrieval Conference, pp. 655-664, NIST, Gaithersburg, MD, 2001.
Ulf Hermjakob, Abdessamad Echihabi and Daniel Marcu, Natural Language Based Reformulation Resource and Web Exploitation for Question Answering Proceedings of TREC2002, Gaithersburg MD, 2003.
Hanmin Jung, Gary Geunbae Lee, Multilingual Question Answering with High Portability on Relational Databases Workshop on Multilingual Summarization and Question Answering, COLING 2002
Boris Katz. “Annotating the World Wide Web using natural language”. Proceedings RIAO 1997.
Kupiec, J. “Murax: A robust linguistic approach for question answering using an on-line encyclopedia”. Proceedings 16th SIGIR, Pittsburgh, PA 1993.
Lenat, D. B. 1995. "Cyc: A Large-Scale Investment in Knowledge Infrastructure." Communications of the ACM 38, no. 11.
Mihalcea, R. and Moldovan, D. “A Method for Word Sense Disambiguation of Unrestricted Text”. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL-99), pp. 152-158, College Park, MD, 1999.
Miller, G. “WordNet: A Lexical Database for English”, Communications of the ACM 38(11) pp. 39-41, 1995.
Dan I. Moldovan and Vasile Rus, “Logic Form Transformation of WordNet and its Applicability to Question Answering”, Proceedings of the ACL 2001 Conference, July 2001, Toulouse, France.
Marius Pasca and Sanda Harabagiu, High Performance Question/Answering, in Proceedings of the 24th Annual International ACL SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2001), September 2001, New Orleans LA, pages 366-374.
John M. Prager, Jennifer Chu-Carroll and Krzysztof Czuba, "A Multi-Strategy, Multi-Question Approach to Question Answering" submitted for publication.
Prager, J.M., Chu-Carroll, J., Brown, E.W. and Czuba, K. “Question Answering by Predictive Annotation”, in “Advances in Open-Domain Question-Answering”, Strzalkowski, T. and Harabagiu, S. Eds., Kluwer Academic Publishers, to appear 2003?.
Prager, J.M., Radev, D.R. and Czuba, K. “Answering What-Is Questions by Virtual Annotation”. Proceedings of Human Language Technologies Conference, San Diego CA, March 2001.
Prager, J.M., Brown, E.W., Coden, A. and Radev, D.R. “Question-Answering by Predictive Annotation”. Proceedings of SIGIR 2000, pp. 184-191, Athens, Greece.
Radev, D.R., Qi, H., Zheng, Z., Blair-Goldensohn, S., Zhang, Z., Fan, W. & Prager, J.M. “Mining the Web for Answers to Natural Language Questions”, Proceedings of CIKM, Atlanta GA., 2001.
Radev, D.R., Prager, J.M. and Samn, V. "Ranking Suspected Answers to Natural Language Questions using Predictive Annotation”. Proceedings of ANLP 2000, pp. 150-157, Seattle, WA.
Deepak Ravichandran and Eduard Hovy, “Learning Surface Text Patterns for a Question Answering System”. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 41-47.
Rosch, E. et al. “Basic Objects in Natural Categories”, Cognitive Psychology 8, pp. 382-439, 1976.
Soubbotin, M. “Patterns of Potential Answer Expressions as Clues to the Right Answers” in Proceedings of the 10th Text Retrieval Conference, pp. 293-302, NIST, Gaithersburg, MD, 2002.
Soubbotin, M. and Soubbotin, S. “Use of Patterns for Detection of Answer Strings: A Systematic Approach” in Proceedings of the 11th Text Retrieval Conference, pp. 325-331, NIST, Gaithersburg, MD, 2003.
Ellen M. Voorhees and Dawn Tice. 2000. Building a question answering test collection. In 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 200-207, Athens, August.
N. Wacholder, Y. Ravin and M. Choi. “Disambiguation of Proper Names in Text”, Proceedings of ANLP’97. Washington, DC, April 1997.
Warren, David H.D., & Fernando C.N. Pereira (1982) "An efficient easily adaptable system for interpreting natural language queries," Computational Linguistics, 8:3-4, 110-122.
Terry Winograd. 1972. Procedures as a representation for data in a computer program for understanding natural language. Cognitive Psychology, 3(1).