The Effect of Pseudo Relevance Feedback on MT-Based CLIR
Yan Qu, Alla N. Eilerman
Hongming Jin, David A. Evans
CLARITECH Corporation
Outline: Our approach to Cross-Language Information Retrieval (CLIR)
Objectives of this work
Review of previous work with Pseudo Relevance Feedback (PRF)
System diagram
Data for experiments
Error analysis of MT-based query translation
The effect of PRF on French monolingual retrieval
The effect of PRF on English-to-French cross-language retrieval
Summary and conclusions
Our Approach to CLIR: Used MT-based query translation to bridge the language gap
Adapted pseudo relevance feedback to CLIR
pre-translation query expansion
post-translation query expansion
combined (pre- and post-translation) query expansion
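The three expansion strategies differ only in where feedback is applied relative to translation. A minimal sketch, assuming hypothetical `translate` and `expand` callables that stand in for the MT system and for PRF over the source-language and target-language corpora (the function names and the toy data are illustrative, not the system's actual interfaces):

```python
from typing import Callable, List

Query = List[str]
Translate = Callable[[Query], Query]  # source-language terms -> target-language terms
Expand = Callable[[Query], Query]     # PRF expansion against one corpus


def simple_mt(query: Query, translate: Translate) -> Query:
    """Baseline: translate the query and retrieve with it as-is."""
    return translate(query)


def pre_translation(query: Query, expand_source: Expand, translate: Translate) -> Query:
    """Expand in the source language first, then translate the expanded query."""
    return translate(expand_source(query))


def post_translation(query: Query, translate: Translate, expand_target: Expand) -> Query:
    """Translate first, then expand against the target-language corpus."""
    return expand_target(translate(query))


def combined(query: Query, expand_source: Expand, translate: Translate,
             expand_target: Expand) -> Query:
    """Expand before translation and again after it."""
    return expand_target(translate(expand_source(query)))


# Toy stand-ins (hypothetical): a two-entry lexicon and fixed expansion terms.
TOY_LEXICON = {"death penalty": "la pénalité de la mort", "execution": "exécution"}


def toy_translate(query: Query) -> Query:
    return [TOY_LEXICON.get(term, term) for term in query]


def toy_expand_en(query: Query) -> Query:
    return query + ["execution"]


def toy_expand_fr(query: Query) -> Query:
    return query + ["peine capitale"]
```

With these stand-ins, `combined(["death penalty"], toy_expand_en, toy_translate, toy_expand_fr)` carries the mistranslated key term but also gains one term from each feedback pass.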
Objectives: Identify factors that affect the quality of MT-based query translation
Evaluate the effectiveness of using pseudo relevance feedback for improving CLIR performance
Identify contexts for selecting these feedback methods
Relevance Feedback in Monolingual Retrieval: Relevance feedback
(Salton & Buckley, 1990; Evans et al., 1999)
Pseudo relevance feedback (PRF) (Evans & Lefferts, 1994; Milic-Frayling et al., 1998)
Both have been demonstrated to be effective in improving retrieval performance
Pseudo Relevance Feedback in CLIR: Pseudo relevance feedback in CLIR using bilingual corpus
(Carbonell et al., 1997)
Pseudo relevance feedback in CLIR using bilingual dictionaries
(Hull & Grefenstette, 1996; Ballesteros & Croft, 1998)
Pseudo relevance feedback in CLIR using machine translation
(Qu et al., 2000)
Pseudo Relevance Feedback in CLIR
CLIR with Simple MT-based Query Translation: Queries in SL
CLIR with Query Expansion Before MT: Queries in SL
CLIR with Query Expansion After MT: Queries in SL
Process Summary
Processes: English NLP to process English corpus and queries
French NLP to process French corpus and queries
SYSTRAN client-server based translation software for English-to-French query translation
Automatic processing of English source queries
Rocchio formula for term selection in pseudo relevance feedback
French target corpus and English reference corpus are indexed using simplex NPs and all attested subterms
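As a rough illustration of Rocchio-style term selection in pseudo relevance feedback, the sketch below treats the top-ranked documents as relevant, averages their length-normalized term frequencies into a centroid, and adds the highest-weighted new terms to the query. The weights `alpha` and `beta`, the `num_terms` cutoff, the plain term-frequency weighting, and the omission of a negative-feedback term are all assumptions; the slides do not give the parameters actually used.

```python
from collections import Counter
from typing import Dict, List


def rocchio_expand(query: Dict[str, float],
                   feedback_docs: List[List[str]],
                   alpha: float = 1.0,
                   beta: float = 0.75,
                   num_terms: int = 10) -> Dict[str, float]:
    """Return a query vector expanded with terms from pseudo-relevant documents.

    query: term -> weight for the original query.
    feedback_docs: tokenized top-ranked documents, assumed relevant.
    """
    if not feedback_docs:
        return dict(query)

    # Centroid of the feedback documents (length-normalized term frequency).
    centroid: Counter = Counter()
    for doc in feedback_docs:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            centroid[term] += count / len(doc) / len(feedback_docs)

    # Candidate expansion terms: highest centroid weight, not already in the query.
    candidates = sorted((t for t in centroid if t not in query),
                        key=lambda t: (-centroid[t], t))[:num_terms]

    # Rocchio update with positive feedback only: alpha * q + beta * centroid.
    expanded = {t: alpha * w + beta * centroid.get(t, 0.0) for t, w in query.items()}
    for term in candidates:
        expanded[term] = beta * centroid[term]
    return expanded
```

Keeping `num_terms` small limits the extraneous terms that feedback can introduce, at the cost of possibly missing useful ones.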
CLARIT English NLP: Used for processing the English corpus and the English queries
Consists of a parser and morphological analyzer
Uses an English lexicon and grammar to identify linguistic structures in texts
Supplemented by a “stop word” list to filter out substantive words that are extraneous to the topics (e.g., document, relevant)
French Text Processing (Pseudo-NLP Approach): Goal: to obtain mostly correct phrase segmentation
Manually constructed resources
lexicon of closed-class categories with 1081 entries
“stop word” lexicon including 525 words and their inflected forms that are extraneous to the topics (e.g., document, pertinent)
grammar based on the CLARIT English grammar and adapted to accommodate French categories
no French morphological normalization
English-to-French Translation: SYSTRAN Enterprise translation software
Translation direction: English to French
Client-server architecture
Translation is a black box to our system
No special or additional resources were used to supplement the translation process
Data Sources for Experiments: TREC-6 CLIR track data collections provided by NIST
(Voorhees & Harman, 1998)
250 MB collection of French SDA news (1988-1990) from the Swiss News Agency: 141,656 documents
750 MB collection of English AP news (1988-1990) from the Associated Press: 242,918 documents
Topics for Experiments: TREC-6 CLIR track topics provided by NIST (Voorhees & Harman, 1998)
22 English topics for the English-to-French cross-language runs
22 French topics for the French monolingual runs
Equivalent across languages
Prepared by humans
Composed of the title, description, and the narrative fields
Sample English Topic
Number: CL1
Waldheim Affair
Description:
Reasons for controversy surrounding Waldheim's World War II actions.
Narrative:
Revelations about Austrian President Kurt Waldheim’s participation in Nazi crimes during World War II are argued on both sides. Relevant documents are those that express doubts about the truth of these revelations. Documents that just discuss the affair are not relevant.
Ideal French Topics
Number: CL1
Affaire Waldheim
Description:
Raisons de la controverse à l'égard des agissements de Waldheim pendant la deuxième guerre mondiale.
Narrative:
Les révélations sur la participation du président autrichien Kurt Waldheim aux crimes nazis pendant la deuxième guerre mondiale font l'objet de controverses. Les documents pertinents font état de doutes sur la culpabilité de Waldheim. Les articles qui ne font que mentionner l'affaire ne sont pas valables.
CLARIT Queries: Composed of the title, description, and the narrative fields
Processed automatically into query vectors
Sample English Query Vector: waldheim affair
waldheim world war ii
nazi crime
austrian president kurt waldheim
austrian president
controversy surround
president kurt waldheim
kurt waldheim
waldheim
kurt
revelation
austrian
participation
surround
truth
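The vector above contains a full noun phrase (austrian president kurt waldheim) alongside shorter terms drawn from it. A rough sketch of how such subterms could be enumerated, assuming that subterms are contiguous word sequences and that "attested" means present in a given vocabulary; both are assumptions, since the slides do not describe the actual CLARIT decomposition, and `attested_subterms` is a hypothetical helper:

```python
from typing import List, Set


def attested_subterms(phrase: str, attested: Set[str]) -> List[str]:
    """List contiguous sub-sequences of a phrase that occur in `attested`.

    Longer subterms are returned first; the full phrase itself is excluded.
    """
    words = phrase.split()
    found: List[str] = []
    for length in range(len(words) - 1, 0, -1):
        for start in range(len(words) - length + 1):
            candidate = " ".join(words[start:start + length])
            if candidate in attested and candidate not in found:
                found.append(candidate)
    return found
```

Given a vocabulary holding the shorter terms listed in the sample vector, calling this on "austrian president kurt waldheim" yields terms such as "president kurt waldheim", "austrian president", and "kurt waldheim", while skipping unattested spans like "president kurt".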
Sample French Query Vector: crimes nazis
affaire waldheim
président autrichien kurt waldheim
président autrichien
controverses
agissements
kurt waldheim
culpabilité
waldheim
deuxième guerre mondiale
deuxième guerre
doutes
révélations
nazis
Topic and Query Statistics
Evaluation: Relevance judgements on the French SDA news, prepared by NIST judges (TREC-6)
Evaluation measures:
eleven-point average precision (N=1000 documents)
precision at low recall levels (10, 20, and 100 documents)
recall
exact precision
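For readers unfamiliar with these measures, the sketch below computes precision at a document cutoff, recall, and eleven-point interpolated average precision for a single topic. It follows the standard textbook definitions and is not the official evaluation software; the function names and the default `cutoff` of 1000 retrieved documents (mirroring N=1000 above) are assumptions.

```python
from typing import List, Set, Tuple


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall(ranked: List[str], relevant: Set[str], cutoff: int = 1000) -> float:
    """Fraction of all relevant documents found within the cutoff."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:cutoff] if doc in relevant) / len(relevant)


def eleven_point_average_precision(ranked: List[str], relevant: Set[str],
                                   cutoff: int = 1000) -> float:
    """Mean of interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return 0.0
    # (recall, precision) observed at each relevant document in the ranking.
    points: List[Tuple[float, float]] = []
    hits = 0
    for rank, doc in enumerate(ranked[:cutoff], start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= this level.
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11
```

Per-topic values would then be averaged over the 22 topics to give a run-level score.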
English-to-French Retrieval vs. French Monolingual Retrieval (without PRF)
Types of Translation Errors: E1: missing translation of an English term
E2: unnecessary translation of a borrowed English term
E3: wrong sense disambiguation
E4: wrong sense disambiguation caused by removed capitalization
E5: word-by-word translation of a multiword (idiomatic) term
E6: wrong phrase construction
E7: broken phrase
Error Type 1: Missing Translation: English: agencies’
Ideal French translation: (des) agences
MT output: (d’)agencies
Error Type 2: Unnecessary Translation: English: fast food
Ideal French translation: fast food
MT output:
aliments de préparation rapide
(food of fast preparation)
Error Type 3: Wrong Sense Disambiguation: English: logging
Ideal French translation:
déforestation (deforestation)
MT output: notation (notation)
Error Type 4: Wrong Disambiguation Caused by Removed Capitalization: English: aids (AIDS)
Ideal French translation:
sida (SIDA “AIDS”)
MT output: aides (assistants)
Error Type 5: Word-by-Word Translation of a Multiword Idiomatic Term: English: death penalty
Ideal French translation:
la peine de mort
MT output: la pénalité de la mort
Error Type 6: Wrong Phrase Construction
English:
austrian president kurt waldheim’s participation
Ideal French translation:
la participation du président autrichien kurt waldheim
MT output:
la participation autrichienne de waldheim de kurt de président
Error Type 7: Broken Phrase: English: sex education
Ideal French translation:
éducation sexuelle
MT output: éducation de sexe
Error Distributions: [Figure: bar chart of frequency (axis 0 to 25) by error type, E1 to E7]
The Effect of PRF on French Monolingual Retrieval
The Effect of PRF on English-to-French Retrieval
English-to-French Retrieval vs. French Monolingual Retrieval (with PRF)
Cross-Language Retrieval vs. Monolingual Retrieval
Cross-Language Retrieval vs. Monolingual Retrieval
Performance of Different PRF Methods
Topic 1009 “Effects of Logging”: Key concept lost due to wrong sense disambiguation (E3 error): logging (felling trees) → notation (notation)
Pre-translation feedback
neutralized the effect of the translation error by bringing in useful thesaurus terms (tropical forest, tree, earth, sea, ocean, land, atmosphere, carbon dioxide, ozone depletion, greenhouse effect, global warming, destruction, pollution, damage, environment, environmentalist, conference, organization, world, nation, country).
Result: 688% increase in average precision
Post-translation feedback
returned some useful terms
introduced noise caused by the wrong translation of logging
Result: 29% increase in average precision
Combined feedback
created a strong base query prior to translation
further improved it with appropriate terms after translation
avoided too much noise
Result: 621% increase in average precision
Topic 1015 “Death Penalty”: Key concept lost due to word-by-word translation (E5 error):
death penalty → la pénalité de la mort (instead of la peine de mort)
Pre-translation feedback
neutralized the effect of the translation error by bringing in useful thesaurus terms (crime, murder, murderer, law, justice, legislature, legislation, supreme court, prosecutor, prison, execution, etc.), most of which were translated correctly.
Result: 1200% increase in average precision
Post-translation feedback
didn’t introduce any terms specifically related to the topic of death penalty, because the key term was missing
Result: 79% decrease in average precision
Combined feedback
created a strong base query prior to translation
further improved the query by bringing more relevant terms after translation (condamnation, condamné, exécution, exécuté, peine capitale, chaise électrique, etc.)
Result: 2007% increase in average precision
Topic 1010 “Solar Powered Cars”: One key term is translated incorrectly:
solar powered cars → automobiles/voitures actionnées solaires
Other key terms are translated correctly:
solar automobiles → automobiles solaires
alternative energy sources → des sources énergétiques alternatives
fossil fuels → combustibles fossiles
Pre-translation and combined feedback created additional sources of errors and noise by introducing many extraneous terms related to automobile air pollution
Result: 33-48% decrease in average precision
Post-translation feedback contained fewer translation errors and less noise due to sufficient context
Result: 39% increase in average precision
Topic 1016 “Tuberculosis”: Key term is translated correctly: tuberculosis → tuberculose
Translation errors affected some important terms:
aids (AIDS) → aides (assistants)
third-world (countries) → le troisième-monde (the third world)
Pre-translation and combined feedback created additional sources of errors and noise by introducing
ambiguous thesaurus terms (cases, tests), which were mistranslated (caisse instead of cas, essai instead of test)
acronyms (AIDS, CDC, HIV), either mistranslated or not translated
Result: 29-30% decrease in average precision
Post-translation feedback compensated for translation errors by bringing
correct terms (SIDA “AIDS”, tiers monde “third world”)
additional useful terms (bacille, tuberculeux, virus, infectées, maladie, risque, santé, problème, etc.)
Result: 32% increase in average precision
Performance of Different PRF Methods
The Effect of PRF Methods: Pre-translation and combined PRF
can neutralize the effect of wrong sense disambiguation and literal translation of idiomatic phrases
may create noise by introducing additional ambiguous or extraneous terms
often return English proper names and acronyms that may be translated incorrectly due to removed capitalization
The Effect of PRF Methods: Post-translation PRF
is effective when there is sufficient context in the translated query, even if some terms are translated incorrectly
often restores multiword terms that were broken down during query translation
finds additional useful multiword terms
may fail when important key terms are translated incorrectly and there is insufficient context
Decision Tree for Selecting PRF Methods
Summary: Adopted pseudo relevance feedback for query expansion in CLIR with MT-based query translation
Conducted analysis of translation errors
Evaluated empirically the effect of three feedback methods on retrieval performance
Examined contexts where different feedback methods are effective
Conclusions: Wrong sense disambiguation and inappropriate translation of multi-word terms are the most frequent translation errors when using MT.
All feedback methods demonstrated significant performance improvement in CLIR compared with not using feedback.
The use of PRF in general helps to reduce the negative effect of translation errors.
Post-translation feedback generally outperforms pre-translation and combined feedback.
The effectiveness of different feedback methods depends on the types of translation errors and the relative importance of the terms affected by these errors.
Future Work: Investigate the effect of query length
Investigate the effect of context
Develop measures to evaluate the original query quality
Develop measures to evaluate the translated query quality
Investigate the empirical conditions for selecting different feedback methods
The End