Qu RIAO 2000 00 04 12
Added: February 20, 2008. License: All Rights Reserved.

Presentation Transcript

The Effect of Pseudo Relevance Feedback on MT-Based CLIR
Yan Qu, Alla N. Eilerman, Hongming Jin, David A. Evans
CLARITECH Corporation

Outline
- Our approach to Cross-Language Information Retrieval (CLIR)
- Objectives of this work
- Review of previous work with Pseudo Relevance Feedback (PRF)
- System diagram
- Data for experiments
- Error analysis of MT-based query translation
- The effect of PRF on French monolingual retrieval
- The effect of PRF on English-to-French cross-language retrieval
- Summary and conclusions

Our Approach to CLIR
- Used MT-based query translation to bridge the language gap
- Adapted pseudo relevance feedback to CLIR:
  - pre-translation query expansion
  - post-translation query expansion
  - combined (pre- and post-translation) query expansion

Objectives
- Identify factors that affect the quality of MT-based query translation
- Evaluate the effectiveness of pseudo relevance feedback for improving CLIR performance
- Identify contexts for selecting among these feedback methods

Relevance Feedback in Monolingual Retrieval
- Relevance feedback (Salton & Buckley, 1990; Evans et al., 1999)
- Pseudo relevance feedback (PRF) (Evans & Lefferts, 1994;
  Milic-Frayling et al., 1998)
- Both have been shown to be effective in improving retrieval performance

Pseudo Relevance Feedback in CLIR
- Pseudo relevance feedback in CLIR using a bilingual corpus (Carbonell et al., 1997)
- Pseudo relevance feedback in CLIR using bilingual dictionaries (Hull & Grefenstette, 1996; Ballesteros & Croft, 1998)
- Pseudo relevance feedback in CLIR using machine translation (Qu et al., 2000)

Pseudo Relevance Feedback in CLIR

CLIR with Simple MT-based Query Translation
[Diagram: Queries in SL]

CLIR with Query Expansion Before MT
[Diagram: Queries in SL]

CLIR with Query Expansion After MT
[Diagram: Queries in SL]

Process Summary

Processes
- English NLP to process the English corpus and queries
- French NLP to process the French corpus and queries
- SYSTRAN client-server translation software for English-to-French query translation
- Automatic processing of English source queries
- Rocchio formula for term selection in pseudo relevance feedback
- French target corpus and English reference corpus are indexed using simplex NPs and all attested subterms

CLARIT English NLP
- Used for processing the English corpus and the English queries
- Consists of a parser and a morphological analyzer
- Uses an English lexicon and grammar to identify linguistic structures in texts
- Supplemented by a “stop word” list to filter out substantive words that are extraneous to the topics (e.g., document, relevant)

French Text Processing (Pseudo-NLP Approach)
- Goal: to obtain mostly correct phrase segmentation
- Manually constructed resources:
  - lexicon of closed-class categories with 1081 entries
  - “stop word” lexicon including 525 words and their inflected forms that are extraneous to the topics (e.g., document, pertinent)
  - grammar based on the CLARIT
    English grammar and adapted to accommodate French categories
- No French morphological normalization

English-to-French Translation
- SYSTRAN Enterprise translation software
- Translation direction: English to French
- Client-server architecture
- Translation is a black box to our system
- No special or additional resources were used to supplement the translation process

Data Sources for Experiments
- TREC-6 CLIR track data collections provided by NIST (Voorhees & Harman, 1998)
- 250 MB collection of French SDA news (1988-1990) from the Swiss News Agency: 141,656 documents
- 750 MB collection of English AP news (1988-1990) from the Associated Press: 242,918 documents

Topics for Experiments
- TREC-6 CLIR track topics provided by NIST (Voorhees & Harman, 1998)
- 22 English topics for the English-to-French cross-language runs
- 22 French topics for the French monolingual runs
- Equivalent across languages
- Prepared by humans
- Composed of the title, description, and narrative fields

Sample English Topic
<num> Number: CL1
<E-title> Waldheim Affair
<E-desc> Description: Reasons for controversy surrounding Waldheim's World War II actions.
<E-narr> Narrative: Revelations about Austrian President Kurt Waldheim’s participation in Nazi crimes during World War II are argued on both sides. Relevant documents are those that express doubts about the truth of these revelations. Documents that just discuss the affair are not relevant.

Ideal French Topics
<num> Number: CL1
<F-title> Affaire Waldheim
<F-desc> Description: Raisons de la controverse à l'égard des agissements de Waldheim pendant la deuxième guerre mondiale.
<F-narr> Narrative: Les révélations sur la participation du président autrichien Kurt Waldheim aux crimes nazis pendant la deuxième guerre mondiale font l'objet de controverses. Les documents pertinents font état de doutes sur la culpabilité de Waldheim.
Les articles qui ne font que mentionner l'affaire ne sont pas valables.

CLARIT Queries
- Composed of the title, description, and narrative fields
- Processed automatically into query vectors

Sample English Query Vector
<cf="1" tf="1">waldheim affair</>
<cf="1" tf="1">waldheim world war ii</>
<cf="1" tf="1">nazi crime</>
<cf="1" tf="1">austrian president kurt waldheim</>
<cf="1" tf="1">austrian president</>
<cf="1" tf="1">controversy surround</>
<cf="1" tf="1">president kurt waldheim</>
<cf="1" tf="1">kurt waldheim</>
<cf="1" tf="3">waldheim</>
<cf="1" tf="1">kurt</>
<cf="1" tf="2">revelation</>
<cf="1" tf="1">austrian</>
<cf="1" tf="1">participation</>
<cf="1" tf="1">surround</>
<cf="1" tf="1">truth</>

Sample French Query Vector
<cf="1" tf="1">crimes nazis</>
<cf="1" tf="1">affaire waldheim</>
<cf="1" tf="1">président autrichien kurt waldheim</>
<cf="1" tf="1">président autrichien</>
<cf="1" tf="1">controverses</>
<cf="1" tf="1">agissements</>
<cf="1" tf="1">kurt waldheim</>
<cf="1" tf="1">culpabilité</>
<cf="1" tf="4">waldheim</>
<cf="1" tf="2">deuxième guerre mondiale</>
<cf="1" tf="2">deuxième guerre</>
<cf="1" tf="1">doutes</>
<cf="1" tf="1">révélations</>
<cf="1" tf="1">nazis</>

Topic and Query Statistics

Evaluation
- Relevance judgements on the French SDA news, prepared by NIST judges (TREC-6)
- Evaluation measures:
  - eleven-point average precision (N=1000 documents)
  - precision at low recall levels (10, 20, and 100 documents)
  - recall
  - exact precision

English-to-French Retrieval vs.
French Monolingual Retrieval (without PRF)

Types of Translation Errors
- E1: missing translation of an English term
- E2: unnecessary translation of a borrowed English term
- E3: wrong sense disambiguation
- E4: wrong sense disambiguation caused by removed capitalization
- E5: word-by-word translation of a multiword (idiomatic) term
- E6: wrong phrase construction
- E7: broken phrase

Error Type 1: Missing Translation
- English: agencies’
- Ideal French translation: (des) agences
- MT output: (d’)agencies

Error Type 2: Unnecessary Translation
- English: fast food
- Ideal French translation: fast food
- MT output: aliments de préparation rapide (food of fast preparation)

Error Type 3: Wrong Sense Disambiguation
- English: logging
- Ideal French translation: déforestation (deforestation)
- MT output: notation (notation)

Error Type 4: Wrong Disambiguation Caused by Removed Capitalization
- English: aids (AIDS)
- Ideal French translation: sida (SIDA “AIDS”)
- MT output: aides (assistants)

Error Type 5: Word-by-Word Translation of a Multiword Idiomatic Term
- English: death penalty
- Ideal French translation: la peine de mort
- MT output: la pénalité de la mort

Error Type 6: Wrong Phrase Construction
- English: austrian president kurt waldheim’s participation
- Ideal French translation: la participation du président autrichien kurt waldheim
- MT output: la participation autrichienne de waldheim de kurt de président

Error Type 7: Broken Phrase
- English: sex education
- Ideal French translation: éducation sexuelle
- MT output: éducation de sexe

Error Distributions
[Bar chart: Frequency by Error Type, E1 to E7]

The Effect of PRF on
French Monolingual Retrieval

The Effect of PRF on English-to-French Retrieval

English-to-French Retrieval vs. French Monolingual Retrieval (with PRF)

Cross-Language Retrieval vs. Monolingual Retrieval

Cross-Language Retrieval vs. Monolingual Retrieval

Performance of Different PRF Methods

Topic 1009 “Effects of Logging”
- Key concept lost due to wrong sense disambiguation (E3 error): logging (felling trees) translated as notation (notation)
- Pre-translation feedback:
  - neutralized the effect of the translation error by bringing in useful thesaurus terms (tropical forest, tree, earth, sea, ocean, land, atmosphere, carbon dioxide, ozone depletion, greenhouse effect, global warming, destruction, pollution, damage, environment, environmentalist, conference, organization, world, nation, country)
  - Result: 688% increase in average precision
- Post-translation feedback:
  - returned some useful terms
  - introduced noise caused by the wrong translation of logging
  - Result: 29% increase in average precision
- Combined feedback:
  - created a strong base query prior to translation
  - further improved it with appropriate terms after translation
  - avoided too much noise
  - Result: 621% increase in average precision

Topic 1015 “Death Penalty”
- Key concept lost due to word-by-word translation (E5 error): death penalty translated as la pénalité de la mort (instead of la peine de mort)
- Pre-translation feedback:
  - neutralized the effect of the translation error by bringing in useful thesaurus terms (crime, murder, murderer, law, justice, legislature, legislation, supreme court, prosecutor, prison, execution, etc.), most of which were translated correctly
  - Result: 1200% increase in average precision
- Post-translation feedback:
  - did not introduce any terms specifically related to the topic of death penalty, because the key term was missing
  - Result: 79% decrease in average precision
- Combined feedback:
  - created a strong base query prior to translation
  - further improved the query by bringing in more relevant terms after translation (condamnation, condamné, exécution, exécuté, peine capitale, chaise électrique, etc.)
  - Result: 2007% increase in average precision

Topic 1010 “Solar Powered Cars”
- One key term is translated incorrectly: solar powered cars translated as automobiles/voitures actionnées solaires
- Other key terms are translated correctly:
  - solar automobiles translated as automobiles solaires
  - alternative energy sources translated as des sources énergétiques alternatives
  - fossil fuels translated as combustibles fossiles
- Pre-translation and combined feedback:
  - created additional sources of errors and noise by introducing many extraneous terms related to automobile air pollution
  - Result: 33-48% decrease in average precision
- Post-translation feedback:
  - contained fewer translation errors and less noise due to sufficient context
  - Result: 39% increase in average precision

Topic 1016 “Tuberculosis”
- Key term is translated correctly: tuberculosis translated as tuberculose
- Translation errors affected some important terms:
  - aids (AIDS) translated as aides (assistants)
  - third-world (countries) translated as le troisième-monde (the third world)
- Pre-translation and combined feedback created additional sources of errors and noise by introducing:
  - ambiguous thesaurus terms (cases, tests), which were mistranslated (caisse instead of cas, essai instead of test)
  - acronyms (AIDS, CDC, HIV), either mistranslated or not translated
  - Result: 29-30% decrease in average precision
- Post-translation feedback compensated for translation errors by bringing in:
  - correct terms (SIDA “AIDS”, tiers monde “third world”)
  - additional useful terms (bacille, tuberculeux, virus, infectées, maladie, risque, santé, problème,
    etc.)
  - Result: 32% increase in average precision

Performance of Different PRF Methods

The Effect of PRF Methods
- Pre-translation and combined PRF:
  - can neutralize the effect of wrong sense disambiguation and of literal translation of idiomatic phrases
  - may create noise by introducing additional ambiguous or extraneous terms
  - often return English proper names and acronyms that may be translated incorrectly due to removed capitalization

The Effect of PRF Methods
- Post-translation PRF:
  - is effective when there is sufficient context in the translated query, even if some terms are translated incorrectly
  - often restores multiword terms that were broken down during query translation
  - finds additional useful multiword terms
  - may fail when important key terms are translated incorrectly and there is insufficient context

Decision Tree for Selecting PRF Methods

Summary
- Adopted pseudo relevance feedback for query expansion in CLIR with MT-based query translation
- Conducted an analysis of translation errors
- Empirically evaluated the effect of three feedback methods on retrieval performance
- Examined contexts where different feedback methods are effective

Conclusions
- Wrong sense disambiguation and inappropriate translation of multi-word terms are the most frequent translation errors when using MT.
- All feedback methods demonstrated significant performance improvement in CLIR compared with not using feedback.
- The use of PRF in general helps to reduce the negative effect of translation errors.
- Post-translation feedback generally outperforms pre-translation and combined feedback.
- The effectiveness of different feedback methods depends on the types of translation errors and the relative importance of the terms affected by these errors.
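
The Our Approach to CLIR slide and the three system diagrams place pseudo relevance feedback before the MT step, after it, or on both sides of it. The following minimal Python sketch shows only that data flow. `expand_src` (PRF against the English reference corpus), `expand_tgt` (PRF against the French target corpus), and `translate` (standing in for SYSTRAN) are hypothetical callables, not interfaces from the paper, and the toy dictionary and expansion terms are invented for illustration.

```python
from typing import Callable, List

Query = List[str]
Step = Callable[[Query], Query]


def pre_translation(q: Query, expand_src: Step, translate: Step) -> Query:
    """Expand the English query with PRF, then translate the expanded query."""
    return translate(expand_src(q))


def post_translation(q: Query, translate: Step, expand_tgt: Step) -> Query:
    """Translate the English query, then expand it with PRF in French."""
    return expand_tgt(translate(q))


def combined(q: Query, expand_src: Step, translate: Step, expand_tgt: Step) -> Query:
    """Expand before translation and expand again after translation."""
    return expand_tgt(translate(expand_src(q)))


# Hypothetical toy components, for illustration only.
TOY_DICT = {"solar": "solaires", "cars": "voitures", "energy": "énergie"}


def toy_translate(q: Query) -> Query:
    # Word-by-word lookup; unknown words pass through untranslated.
    return [TOY_DICT.get(t, t) for t in q]


def toy_expand_en(q: Query) -> Query:
    # Pretend PRF on the English reference corpus added one term.
    return q + ["energy"]


def toy_expand_fr(q: Query) -> Query:
    # Pretend PRF on the French target corpus added one term.
    return q + ["automobiles"]


if __name__ == "__main__":
    q = ["solar", "cars"]
    print(pre_translation(q, toy_expand_en, toy_translate))
    print(post_translation(q, toy_translate, toy_expand_fr))
    print(combined(q, toy_expand_en, toy_translate, toy_expand_fr))
```

Passing each stage as a function keeps translation a black box, in line with the English-to-French Translation slide; swapping in a real retrieval-backed expander would not change the composition.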
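
The Processes slide says both corpora are indexed using simplex NPs and all attested subterms, and the sample English query vector reflects this: austrian president kurt waldheim appears alongside austrian president, president kurt waldheim, kurt waldheim, and single words. The sketch below enumerates the contiguous subterms of a simplex NP and keeps the full NP plus those found in an `attested` set. The slides do not say how CLARIT decides that a subterm is attested, so `attested` is a hypothetical input here, chosen to mirror the sample vector.

```python
from typing import List, Set


def contiguous_subterms(np_words: List[str]) -> List[str]:
    """All contiguous word sequences of a simplex NP, longest first."""
    n = len(np_words)
    out: List[str] = []
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            out.append(" ".join(np_words[start:start + length]))
    return out


def index_terms(np_words: List[str], attested: Set[str]) -> List[str]:
    """Keep the full NP plus any subterm present in the (hypothetical) attested set."""
    full = " ".join(np_words)
    return [t for t in contiguous_subterms(np_words) if t == full or t in attested]


if __name__ == "__main__":
    np_words = ["austrian", "president", "kurt", "waldheim"]
    attested = {"austrian president", "president kurt waldheim",
                "kurt waldheim", "waldheim", "kurt", "austrian"}
    print(index_terms(np_words, attested))
```

With this `attested` set, subterms such as president kurt and the single word president are dropped, which matches their absence from the sample English vector.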
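
The Processes slide names the Rocchio formula for term selection in pseudo relevance feedback but gives no parameters. The sketch below uses the common positive-only form of Rocchio, q' = alpha * q + beta * (centroid of the top-ranked documents), treating the top documents as relevant and keeping only the highest-weighted new terms. The weights `alpha` and `beta`, the `num_new_terms` cutoff, and the name `rocchio_expand` are hypothetical choices, not CLARIT's settings. Because terms are plain strings, the same function could serve pre-translation (English) or post-translation (French) expansion.

```python
from collections import defaultdict
from typing import Dict, List


def rocchio_expand(
    query: Dict[str, float],
    feedback_docs: List[Dict[str, float]],
    alpha: float = 1.0,      # hypothetical weight on the original query
    beta: float = 0.75,      # hypothetical weight on the feedback centroid
    num_new_terms: int = 5,  # hypothetical cap on added terms
) -> Dict[str, float]:
    """Positive-only Rocchio expansion over pseudo-relevant documents."""
    if not feedback_docs:
        return dict(query)

    # Centroid of the pseudo-relevant document vectors.
    centroid: Dict[str, float] = defaultdict(float)
    for doc in feedback_docs:
        for term, weight in doc.items():
            centroid[term] += weight / len(feedback_docs)

    # q' = alpha * q + beta * centroid (no non-relevant component).
    expanded = {t: alpha * w for t, w in query.items()}
    for term, weight in centroid.items():
        expanded[term] = expanded.get(term, 0.0) + beta * weight

    # Keep every original term plus the top-scoring new terms
    # (ties broken alphabetically for determinism).
    new_terms = sorted(
        (t for t in expanded if t not in query),
        key=lambda t: (-expanded[t], t),
    )[:num_new_terms]
    return {t: expanded[t] for t in list(query) + new_terms}


if __name__ == "__main__":
    query = {"logging": 1.0, "effect": 1.0}
    docs = [
        {"logging": 2.0, "tree": 3.0, "forest": 1.0},
        {"tree": 1.0, "deforestation": 2.0},
    ]
    print(rocchio_expand(query, docs, num_new_terms=2))
```

The document vectors here are toy numbers; in the study, term weights would come from the CLARIT index, and the number of feedback documents and added terms would be tuning choices the slides do not report.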
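
The Sample English and French Query Vector slides use an SGML-like entry format, `<cf="…" tf="…">term</>`. The slides do not define `cf`; `tf` appears to be the term's frequency in the topic text (waldheim occurs three times in the English topic and has tf="3"). The following minimal parser, a sketch that assumes exactly this attribute order and straight double quotes, turns such a vector into (term, cf, tf) triples.

```python
import re
from typing import List, Tuple

# One entry, e.g. <cf="1" tf="3">waldheim</>
ENTRY = re.compile(r'<cf="(\d+)"\s+tf="(\d+)">(.*?)</>')


def parse_query_vector(text: str) -> List[Tuple[str, int, int]]:
    """Return (term, cf, tf) triples in the order they appear."""
    return [(term, int(cf), int(tf)) for cf, tf, term in ENTRY.findall(text)]


if __name__ == "__main__":
    sample = (
        '<cf="1" tf="1">waldheim affair</> '
        '<cf="1" tf="3">waldheim</> '
        '<cf="1" tf="2">deuxième guerre mondiale</>'
    )
    for term, cf, tf in parse_query_vector(sample):
        print(f"{term!r}: cf={cf}, tf={tf}")
```

The non-greedy `(.*?)` stops at the first `</>`, so several entries on one line, as in the flattened transcript, are still split correctly.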
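
The Evaluation slide lists eleven-point average precision over the top 1000 documents and precision at 10, 20, and 100 documents, and the topic slides report results as percentage changes in average precision. The sketch below implements the usual TREC-style definitions: interpolated precision at recall 0.0, 0.1, ..., 1.0 (the maximum precision at any rank whose recall is at least that level), averaged over the eleven levels. It is an independent illustration, not the evaluation tool behind the reported numbers; the function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def precision_at(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def eleven_point_avg_precision(
    ranking: Sequence[str], relevant: Set[str], cutoff: int = 1000
) -> float:
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return 0.0
    # (recall, precision) measured at each rank holding a relevant document.
    points: List[Tuple[float, float]] = []
    hits = 0
    for rank, doc in enumerate(ranking[:cutoff], start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision; 0 if this recall level is never reached.
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11


def percent_change(baseline: float, new: float) -> float:
    """Relative change in percent (baseline must be non-zero)."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3"}
    print(precision_at(ranking, relevant, 2))
    print(eleven_point_avg_precision(ranking, relevant))
```

Read this way, the 688% increase reported for Topic 1009 with pre-translation feedback means the expanded query's average precision is 7.88 times the baseline: `percent_change(b, 7.88 * b)` is 688 up to floating-point rounding.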
Future Work
- Investigate the effect of query length
- Investigate the effect of context
- Develop measures to evaluate the original query quality
- Develop measures to evaluate the translated query quality
- Investigate the empirical conditions for selecting different feedback methods

The End