Edinburgh IGK 06 (presentation transcript; added January 11, 2008)

Hybrid Data-Driven Models of Machine Translation
Andy Way (& Declan Groves)
National Centre for Language Technology, School of Computing, Dublin City University, Dublin 9, Ireland
away@computing.dcu.ie

Outline: Motivations; Example-Based Machine Translation; Marker-Based EBMT; Statistical Machine Translation; Experiments (language pairs & corpora used; EBMT and PBSMT baseline systems; hybrid system experiments; making use of merged data sets); ‘Phrases’, ‘Chunks’ and Training-Test Corpora; Conclusions; Future Work.

Motivations: Most MT research carried out today is corpus-based, i.e. Example-Based Machine Translation (EBMT) or Statistical Machine Translation (SMT). Yet there is a lack of comparative research, owing to the relative unavailability of EBMT systems, the lack of participation of EBMT researchers in competitive evaluations, and the dominance of the SMT approach.

Example-Based Machine Translation: As with SMT, EBMT makes use of information extracted from sententially-aligned bilingual corpora.
In general, SMT uses only parameters and throws away the data, whereas EBMT makes use of linguistic units directly. During translation, the source side of the bitext is searched for close matches, source-target subsentential links are determined, and the relevant target fragments are retrieved and recombined to derive the final translation.

EBMT: An Example: EBMT assumes an aligned bilingual corpus of examples against which input text is matched. The best match is found using a similarity metric based on word co-occurrence, POS, generalized templates and bilingual dictionaries (exact and fuzzy matching). Useful fragments are then identified, e.g.:
on Monday : lundi
John went to : Jean est allé à
the baker’s : la boulangerie
How the fragments are recombined depends on the nature of the examples used.

Marker-Based EBMT at DCU:
Gaijin: [Veale & Way], RANLP ’97; [Gough et al.], AMTA ’02
wEBMT: [Way & Gough], Comp. Linguistics ’03; [Gough & Way], EAMT ’04; [Way & Gough], TMI ’04; [Gough], PhD Thesis ’05; [Way & Gough], Natural Language Engineering ’05; [Way & Gough], Machine Translation ’05; [Groves & Way], ACL w/shop on Data-Driven MT ’05; [Groves & Way], Machine Translation & EAMT ’06
MaTrEx: [Armstrong et al.], TC-STAR OpenLab ’06; [Stroppa et al.], NIST MT-Eval ’06, AMTA ’06, IWSLT-06

System Development (three slides; no text in the transcript).

Marker-Based EBMT: “The Marker Hypothesis states that all natural languages have a closed set of specific words or morphemes which appear in a limited set of grammatical contexts and which signal that context.” [Green, 1979] This is a universal psycholinguistic constraint: languages are marked for syntactic structure at surface level by a closed set of lexemes or morphemes. Example:
The Dearborn Mich., energy company stopped paying a dividend in the third quarter of 1984 because of troubles at its Midland nuclear plant.
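To make the hypothesis concrete, here is a toy sketch that tags the marker words in the example sentence and opens a new chunk at each one. It is only an illustration and rests on several assumptions. The marker list is a tiny hypothetical subset, not the CELEX-derived set the system uses. Punctuation is simply dropped. A marker opens a new chunk only once the current chunk holds at least one non-marker word, which mirrors the constraint stated later in the talk for chunking.

```python
import re
from collections import Counter

# Hypothetical, tiny marker lexicon for illustration only; the real system
# uses a much larger closed-class list (English markers extracted from CELEX).
MARKERS = {
    "the": "DET", "a": "DET", "an": "DET",
    "its": "POSS", "his": "POSS", "their": "POSS",
    "in": "PREP", "of": "PREP", "at": "PREP", "on": "PREP", "to": "PREP",
}

SENTENCE = ("The Dearborn Mich., energy company stopped paying a dividend in the "
            "third quarter of 1984 because of troubles at its Midland nuclear plant.")


def tag_markers(sentence):
    """Tokenize on letters/digits (dropping punctuation) and pair each token
    with its marker category, or None for non-marker (content) words."""
    tokens = re.findall(r"[A-Za-z0-9']+", sentence)
    return [(MARKERS.get(tok.lower()), tok) for tok in tokens]


def segment(tagged):
    """Start a new chunk at each marker word, but only once the current
    chunk already contains at least one non-marker word."""
    chunks, current, has_content = [], [], False
    for tag, word in tagged:
        if tag is not None and has_content:
            chunks.append(current)
            current, has_content = [], False
        current.append(word)
        if tag is None:
            has_content = True
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    tagged = tag_markers(SENTENCE)
    print(Counter(tag for tag, _ in tagged if tag))  # marker categories found
    for chunk in segment(tagged):
        print(" ".join(chunk))
```

With this marker list the sentence yields 3 determiners, 1 possessive pronoun and 4 prepositions. It splits into six chunks, among them "in the third quarter" and "at its Midland nuclear plant". A preposition directly followed by a determiner stays in one chunk because of the non-marker-word constraint.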
In this sentence, three NPs start with determiners and one with a possessive pronoun, so a nominal element will appear soon to the right; the sets of determiners and possessive pronouns are small and finite. Likewise, there are four prepositional phrases with prepositional heads, so an NP object will appear soon to the right; the set of prepositions is small and finite.

Marker-Based EBMT: Chunking: A set of closed-class marker words is used to segment aligned source and target sentences during a pre-processing stage. <PUNC> is now used as an end-of-chunk marker. English marker words are extracted from CELEX.

Marker-Based EBMT: Chunking (2): This enables the use of basic syntactic markup for the extraction of translation resources. Source-target sentence pairs are tagged with marker categories in a pre-processing stage:
EN: <PRON> you click apply <PREP> to view <DET> the effect <PREP> of <DET> the selection
FR: <PRON> vous cliquez <PRON> sur appliquer <PREP> pour visualiser <DET> l’ effet <PREP> de <DET> la sélection
Aligned source-target chunks are created by segmenting the sentences based on these marker tags, along with cognate and word co-occurrence information:
<PRON> you click apply : <PRON> vous cliquez sur appliquer
<PREP> to view : <PREP> pour visualiser
<DET> the effect : <DET> l’effet
<PREP> of the selection : <PREP> de la sélection
Chunks must contain at least one non-marker word, which ensures that every chunk carries useful contextual information.

Marker-Based EBMT: Lexicon & Template Extraction: Chunks containing only one non-marker word in both source and target languages can then be used to extract a word-level lexicon:
<PREP> to : <PREP> pour
<LEX> view : <LEX> visualiser
<LEX> effect : <LEX> effet
<DET> the : <DET> l
<PREP> of : <PREP> de
In a final pre-processing stage, we produce a set of generalized marker templates by replacing marker words with their tags:
<PRON> click apply : <PRON> cliquez sur appliquer
<PREP> view : <PREP> visualiser
<DET> effect : <DET> effet
<PREP> the selection : <PREP> la sélection
Any marker word pair can now be inserted at the appropriate tag location. These more general examples add flexibility to the matching process and improve coverage (and quality).

Marker-Based EBMT: During translation, resources are searched from maximal context (specific source-target sentence pairs) to minimal context (word-for-word translation).
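The search from maximal to minimal context, and the recombination in source order that follows it, can be illustrated with a toy greedy longest-match translator. This is a hypothetical sketch, not the DCU decoder, and it simplifies in several ways. The three lookup tables are hand-built from the EN-FR example on the chunking slides. Candidates carry no weights. Generalized templates are left out. Only a single output is produced rather than an n-best list.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical example base, hand-built from the EN-FR example on the slides.
# Tiers are listed from maximal to minimal context.
SENTENCES: Dict[str, str] = {
    "you click apply to view the effect of the selection":
        "vous cliquez sur appliquer pour visualiser l'effet de la sélection",
}
CHUNKS: Dict[str, str] = {
    "you click apply": "vous cliquez sur appliquer",
    "to view": "pour visualiser",
    "the effect": "l'effet",
    "of the selection": "de la sélection",
}
LEXICON: Dict[str, str] = {
    "to": "pour", "view": "visualiser", "effect": "effet", "the": "l'", "of": "de",
}
TIERS = [("sentence", SENTENCES), ("chunk", CHUNKS), ("word", LEXICON)]

Segment = Tuple[str, str, str]  # (tier, source span, target fragment)


def lookup(span: str) -> Optional[Tuple[str, str]]:
    """Return (tier name, translation) from the most specific tier holding span."""
    for name, table in TIERS:
        if span in table:
            return name, table[span]
    return None


def translate(source: str) -> List[Segment]:
    """Greedy left-to-right longest match: at each position try the longest
    remaining span first, so longer sentence/chunk examples are preferred over
    word-for-word lexicon entries; unmatched words are passed through."""
    words = source.split()
    segments: List[Segment] = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):
            span = " ".join(words[i:j])
            hit = lookup(span)
            if hit is not None:
                segments.append((hit[0], span, hit[1]))
                i = j
                break
        else:  # no span starting at i matched any tier
            segments.append(("unknown", words[i], words[i]))
            i += 1
    return segments


def render(segments: List[Segment]) -> str:
    """Recombine target fragments in source-sentence order."""
    return " ".join(target for _, _, target in segments)


if __name__ == "__main__":
    print(render(translate("to view the effect of the selection")))
    print(translate("of the effect"))
```

For "to view the effect of the selection", three chunk matches recombine to "pour visualiser l'effet de la sélection". For "of the effect" no chunk covers "of", so it falls back to the lexicon ("de"), while "the effect" is still translated as a chunk. Greedy longest match is just one simple recombination strategy, chosen here for brevity.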
Retrieved example translation candidates are recombined, along with their weights, based on source sentence order, and the system outputs an n-best list of translations.

Phrase-Based SMT: Phrase-based SMT uses phrase translations in the translation model (TM), along with word correspondences, to improve translation output. This gives better modelling of syntax and of local word reordering. Phrase extraction heuristics based on word alignments have been shown to be better than more syntactically motivated approaches [Koehn et al., 2003]. Word alignment is performed in both source-target and target-source directions. The intersection of the unidirectional alignments is taken and extended iteratively into the union by adding adjacent alignments within the alignment space [Och & Ney 2003, Koehn et al., 2003]. All possible phrases corresponding to these alignments are then extracted from the sentence pairs, and phrase probabilities can be calculated from relative frequencies.

Experiments

EBMT vs. WB-SMT: [Way & Gough, 05] (cf. talk here in May 05): on the 203K-sentence Sun TM (4.8M words) and a 4K-sentence test set (average sentence length 13.1 words EN, 15.2 words FR), EBMT outperformed vanilla word-based SMT (GIZA++, CMU-Cambridge statistical toolkit, ISI ReWrite Decoder) in both directions. Best BLEU scores: EN→FR .453 EBMT vs. .338 WB-SMT; FR→EN .461 EBMT vs. .446 WB-SMT.

EBMT & PB-SMT (on Sun TM): English-French: The phrase-based system using GIZA++ data outperforms the same system seeded with EBMT data on all metrics bar Precision (0.6598 vs. 0.6661). The Marker-Based EBMT system beats both phrase-based SMT systems, particularly for BLEU (0.4409 vs. 0.3758) and Recall (0.6877 vs. 0.5759).

EBMT & PB-SMT (on Sun TM): French-English: Scores for all systems are better for FR→EN than for EN→FR. Again, the phrase-based system using GIZA++ data outperforms the same system seeded with EBMT data. As for EN→FR, the Marker-Based EBMT system significantly outperforms both phrase-based SMT systems for FR→EN.

Towards Hybridity: We decided to merge data sources, combining parts of the EBMT sub-sentential alignments with parts of the data induced using GIZA++. We performed a number of experiments. The first used EBMT phrases + GIZA++ words (SEMI-HYBRID), to investigate whether EBMT phrases are of higher quality than GIZA++ phrases; here EBMT phrases are used instead of SMT n-grams. The second used all data (HYBRID), i.e. GIZA++ words & phrases + EBMT words & phrases. Here the EBMT phrases should add extra probability to the ‘more useful’ SMT phrases, i.e. the probabilities of the phrases in the intersection of the two sets are boosted. [Figure: overlapping sets of EBMT phrases and GIZA++ phrases]

Merging Data Sources: EN→FR Results: Using EBMT phrases + GIZA++ words improves significantly on using EBMT data alone. Merging all the EBMT and GIZA++ data improves on all metrics, most significantly for BLEU score (0.4259 vs. 0.3643 SEMI-HYBRID). The EBMT system still wins out for BLEU score, Recall and WER.

Merging Data Sources: FR→EN Results: Using EBMT phrases + GIZA++ words shows improvements over the PBSMT system seeded with EBMT data, but improves only on the BLEU score of the GIZA++-seeded system (0.4888 vs. 0.4198). However, merging all data improves on both PBSMT systems on all metrics. The EBMT system beats the hybrid system only on Recall and WER.

Results: Discussion:
PBSMT: The best PBSMT BLEU scores (with GIZA++ data only) are 0.375 (E-F) and 0.420 (F-E). Seeding PBSMT with EBMT data gets good BLEU scores, 0.364 (E-F) and 0.395 (F-E); note the difference in data size (1.73M vs. 403K). PBSMT loses out to the EBMT system.
Semi-Hybrid System: Seeding Pharaoh with SMT words and EBMT phrases improves over the baseline GIZA++-seeded system, while the data size diminishes considerably (430K vs. 1.73M). Results are worse than for the EBMT system.
Fully-Hybrid System: Results are better than for the ‘semi-hybrid’ system: E-F 0.426 (semi-hybrid 0.396), F-E 0.489 (semi-hybrid 0.427). The data size increases to 2.04M phrase table entries. For F-E, the hybrid system beats EBMT on BLEU (0.4888 vs. 0.4611) and Precision (0.6927 vs. 0.6782); EBMT is ahead for Recall and WER.

EBMT & PB-SMT (on Europarl): [Groves & Way, 06a/b] added SMT chunks to the EBMT system, yielding a hybrid ‘statistical EBMT’ system. New domain: Europarl (FR-EN, 322K sentences) [Koehn, 05]. Training data were extracted from the designated training sets, filtering on sentence length and relative sentence length (a ratio of 1.5 was used), which allowed us to extract high-quality training sets. For testing, 5000 sentences were randomly extracted from the Europarl common test set. Average sentence lengths are 20.5 words (French) and 19.0 words (English).

EBMT vs. PBSMT: We compared the performance of our Marker-Based EBMT system against that of a PB-SMT system built using the Pharaoh phrase-based decoder [Koehn, 04], the SRI LM toolkit [Stolcke, 02] and the refined alignment strategy [Och & Ney, 03]. Both systems were trained on incremental data sets and tested on the 5000-sentence test set, to measure the effect of increasing training data on translation quality. Translation was performed for French-English in both directions. Translation quality was evaluated automatically using BLEU [Papineni et al., 02], Precision & Recall (GTM toolkit [Turian et al., 03]) and word error rate (WER).

EBMT vs. PBSMT: French-English (training sets of 78K, 156K and 322K sentences): Doubling the amount of data improves performance across the board for both EBMT and PBSMT. The PBSMT system clearly outperforms the EBMT system, achieving on average a BLEU score 0.07 higher. PBSMT also achieves a significantly lower WER (e.g. 68.55 vs. 82.43 for the 322K data set). Increasing the amount of training data yields a 3-5% relative BLEU increase for PBSMT and a 6.2% to 10.3% relative BLEU improvement for EBMT.

EBMT vs. PBSMT: English-French (training sets of 78K, 156K and 322K sentences): PBSMT continues to outperform the EBMT system by some distance, e.g. 0.1933 vs. 0.1488 BLEU and 0.518 vs. 0.4578 Recall for the 322K data set. The difference between the systems is somewhat smaller for EN→FR than for FR→EN. EBMT system performance is much more consistent across both directions. The PBSMT system scores 0.02 BLEU lower (10% relative) for EN→FR than for FR→EN, because French-English is ‘easier’: there are fewer agreement errors and fewer problems with boundary friction, e.g. le → the (FR→EN), but the → le, la, les, l’ (EN→FR). The EBMT system, by contrast, scores higher in BLEU for EN→FR than for FR→EN; cf. [Callison-Burch et al., 06] on using BLEU to evaluate non-n-gram-based systems.

Hybrid System Experiments: We decided to merge elements of the EBMT marker-based alignments with PBSMT phrases and words induced via GIZA++, and built a number of hybrid systems:
LEX-EBMT: the EBMT lexicon is replaced with higher-quality PBSMT word alignments, to lower WER.
H-EBMT vs. H-PBSMT: PBSMT words and phrases are merged with the EBMT data (words and phrases), and the resulting data are passed to the baseline EBMT and baseline PBSMT systems.
H-EBMT-LM: the output of the H-EBMT systems is reranked using the PBSMT system’s equivalent language model.

Hybrid Experiments: French-English: Use of the improved lexicon (LEX-EBMT) leads to only slight improvements (an average relative increase of 2.9% BLEU). Adding the hybrid data improves on the baselines for both EBMT (H-EBMT) and PBSMT (H-PBSMT). Trained on 78K and 156K sentences, the H-PBSMT system achieves a higher BLEU score than the PBSMT system trained on twice as much data. Adding the language model to the H-EBMT system helps guide word order after lexical selection, and thus improves results further.

Hybrid Experiments: English-French: We see similar results for EN→FR as for FR→EN. The more SMT-like the EBMT system becomes, the more its BLEU scores fall in line with the other metrics, i.e. higher for FR→EN than for EN→FR. Using the hybrid data set, we get a 15% average relative increase in BLEU score for the EBMT system, and 6.2% for the H-PBSMT system over its baseline. The H-PBSMT system performs almost as well as the baseline system trained on over 4 times the amount of data.

SMT ‘phrases’ vs. EBMT ‘chunks’: Many more SMT phrases are derived than EBMT chunks, but this is not reflected in the scores. Doubling the amount of data doubles the number of sub-sentential alignments for both systems, which indicates the heterogeneous nature of the Europarl corpus. On the 322K training set, 93.0% of SMT chunks are found only once and 99.4% occur fewer than 10 times. For EBMT chunks, 96.6% are found only once and 99.8% occur fewer than 10 times. Of the top 10 most frequent chunks in the SMT-only set, 7 are made up solely of marker words:
du : of the
de la : of the
union européenne : union
états membres : member states
de l : of the
dans le : in the
n est : is
parlement européen : parliament
que nous : that we
que la : that the

Translation Examples:
PBSMT: we have all accepted the lesson of the food crisis the 1990s
H-PBSMT: we have all accepted the lesson of the food crisis in the 1990s
REF: we have all learned our lesson from the food crisis of the 90s

PBSMT: indeed if the second-pillar example were less frequent there would be fewer poor
H-PBSMT: indeed if pensions for example were less frequent there would be fewer poor
REF: if indeed for example pensions were less inadequate there would be fewer poor people

PBSMT: in this regard the port controls there should be making the regulations still more stringent
H-PBSMT: when it comes to port controls we must make the regulations still more stringent
REF: it is important to tighten up regulations regarding the control of harbours and ports even further

PBSMT: it also requires that we continue to discussed the entry into force of fiscal harmonization
H-PBSMT: we also need to continue to ask ourselves questions about the implementation of fiscal harmonization
REF: we also still need to continue to question the implementing of fiscal harmonisation

Remarks: [Groves & Way, 05] showed that an EBMT system outperforms a PBSMT system when trained on the Sun Microsystems data set. This time around, the baseline PBSMT system achieves higher quality than all variants of the EBMT system, reflecting heterogeneous Europarl vs. homogeneous Sun data. Chunk coverage is lower on the Europarl data set: 6% of translations were produced using chunks alone on Sun, vs. 1% on Europarl. The EBMT system also considered 13 words on average for direct translation (vs. 7 for the Sun data). Significant improvements are seen when using a higher-quality lexicon, and further improvements when the LM is introduced. The H-PBSMT system is able to outperform the baseline PBSMT system. There are further gains to be made from hybrid corpus-based approaches, and the overlap between chunks extracted via EBMT and SMT methods is small.

Hybrid ‘Example-Based SMT’: The MaTrEx system (slide content not transcribed).

Hybrid Example-Based SMT: [Armstrong et al., 06], OpenLab MT-EVAL (March 06): adding EBMT chunks to a ‘vanilla Pharaoh’ PB-SMT system adds about 4 BLEU points for ES→EN. [Stroppa et al., 06]: adding EBMT chunks to a ‘vanilla Pharaoh’ PB-SMT system adds about 5 BLEU points for Basque→EN. The system also performed well in IWSLT-06.

‘Phrases’, ‘Chunks’ and Training-Test Corpora: SMT phrases are contiguous sequences of words (n-grams). Typically, EBMT performance is comparable with PB-SMT with fewer sub-sentential alignments. As EBMT chunks are different from SMT ‘phrases’, use them if available in your PB-SMT systems (cf. OpenLab ES→EN and AMTA Basque→EN results).
They provide longer sequences of context, leading to better translations, and they reinforce the probability of good but infrequent SMT ‘phrases’. Conversely, as SMT ‘phrases’ are different from EBMT chunks, use them if available in your EBMT systems. SMT ‘phrases’ are typically shorter than EBMT chunks, so they are more useful where training/test material is more heterogeneous. Where EBMT chunks are ‘too long’ to cover the input data, SMT n-grams can fill in before we need to resort to word-to-word (W2W) translation, which is always the last resort; cf. CMU findings in recent NIST MT-Eval …

‘Phrases’, ‘Chunks’ and Training-Test Corpora (2): It looks like EBMT is better on homogeneous training data (EBMT > PB-SMT on Sun TM, EN→FR; EBMT > PB-SMT on EF TM, Basque→EN), while SMT is better on (more) heterogeneous data (PB-SMT > EBMT on Europarl, EN→FR). Predictors of the usefulness of each approach given the text type: chunk coverage and the amount of W2W translation.

Conclusions: Combining SMT ‘phrases’ and EBMT chunks in a hybrid ‘statistical EBMT’ or ‘example-based SMT’ system will improve your system output. Blind adherence to one approach will guarantee that your performance is less than it could otherwise be. John Hutchins: “EBMT is Hybrid MT”. Joe Olive: “Need combination of ‘rules’ and statistics”.

Ongoing & Future Work: Automatic detection of marker words, since the most common SMT phrases consist mainly of marker words. We plan to increase levels of hybridity:
code a simple EBMT decoder, factoring in the Marker-Based recombination approach along with probabilities;
use exact sentence matching in PBSMT, as in EBMT;
integrate generalized templates into the PBSMT system (and reintegrate them into the EBMT system);
integrate marker tag information into SMT language and translation models;
build a hybrid EBMT-EBMT system (with CMU)?!
Open question: what is the contribution of EBMT chunks if an SMT system is allowed as much training data as it likes?

Thank you for your attention.
Edinburgh IGK 06 Savin Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 157 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: January 11, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Hybrid Data-Driven Models of Machine Translation: Hybrid Data-Driven Models of Machine Translation Andy Way (& Declan Groves) National Centre for Language Technology, School of Computing, Dublin City University, Dublin 9, Ireland away@computing.dcu.ieOutline: Outline Motivations Example-Based Machine Translation Marker-Based EBMT Statistical Machine Translation Experiments: Language Pairs & Corpora Used EBMT and PBSMT baseline systems Hybrid System Experiments Making use of merged data sets ‘Phrases’, ‘Chunks’ and Training-Test Corpora Conclusions Future WorkMotivations: Motivations Most MT research carried out today is corpus-based: Example-Based Machine Translation (EBMT) Statistical Machine Translation (SMT) Lack of comparative research: Relative unavailability of EBMT systems Lack of participation of EBMT researchers in competitive evaluations Dominance of the SMT approach Example-Based Machine Translation: Example-Based Machine Translation As with SMT, EBMT makes use of information extracted from sententially-aligned bilingual corpora. 
In general: SMT only uses parameters, throws away data EBMT makes use of linguistic units directly During Translation: Source side of bitext is searched for close matches Source-target subsentential links are determined Relevant target fragments retrieved and recombined to derive final translation.EBMT: An Example: EBMT: An Example Assumes an aligned bilingual corpus of examples against which input text is matched Best match is found using a similarity metric based on word co-occurrence, POS, generalized templates and bilingual dictionaries (exact and fuzzy matching)Slide6: EBMT: An Example Assumes an aligned bilingual corpus of examples against which input text is matched Best match is found using a similarity metric based on word co-occurrence, POS, generalized templates and bilingual dictionaries (exact and fuzzy matching)Slide7: EBMT: An Example Identify useful fragmentsSlide8: EBMT: An Example on Monday lundi John went to Jean est allé à the baker’s la boulangerie Identify useful fragments Recombination depends on nature of examples usedMarker-Based EBMT at DCU: Marker-Based EBMT at DCU Marker-Based EBMT at DCU Gaijin: [Veale & Way], RANLP ‘97 [Gough et al.], AMTA ‘02 wEBMT: [Way & Gough], Comp. 
Linguistics ‘03 [Gough & Way], EAMT ‘04 [Way & Gough], TMI ‘04 [Gough], PhD Thesis ‘05 [Way & Gough], Natural Language Engineering ‘05 [Way & Gough], Machine Translation ‘05 [Groves & Way], ACL w/shop on Data-Driven MT ‘05 [Groves & Way], Machine Translation & EAMT ‘06 MaTrEx: [Armstrong et al.], TC-STAR OpenLab ‘06 [Stroppa et al.], NIST MT-Eval ‘06, AMTA ’06, IWSLT-06System Development: System DevelopmentSystem Development: System DevelopmentSystem Development: System DevelopmentMarker-Based EBMT: Marker-Based EBMT “The Marker Hypothesis states that all natural languages have a closed set of specific words or morphemes which appear in a limited set of grammatical contexts and which signal that context.” [Green, 1979] Universal psycholinguistic constraint: languages are marked for syntactic structure at surface level by closed set of lexemes or morphemes The Dearborn Mich., energy company stopped paying a dividend in the third quarter of 1984 because of troubles at its Midland nuclear plant.Marker-Based EBMT: Marker-Based EBMT “The Marker Hypothesis states that all natural languages have a closed set of specific words or morphemes which appear in a limited set of grammatical contexts and which signal that context.” [Green, 1979] Universal psycholinguistic constraint: languages are marked for syntactic structure at surface level by closed set of lexemes or morphemes The Dearborn Mich., energy company stopped paying a dividend in the third quarter of 1984 because of troubles at its Midland nuclear plant. 
Three NPs start with determiners, one with a possessive pronoun Nominal element will appear soon to the right Sets of determiners and possessive pronouns small and finiteMarker-Based EBMT: Marker-Based EBMT “The Marker Hypothesis states that all natural languages have a closed set of specific words or morphemes which appear in a limited set of grammatical contexts and which signal that context.” [Green, 1979] Universal psycholinguistic constraint: languages are marked for syntactic structure at surface level by closed set of lexemes or morphemes The Dearborn Mich., energy company stopped paying a dividend in the third quarter of 1984 because of troubles at its Midland nuclear plant. Four prepositional phrases, with prepositional heads NP object will appear soon to the right Set of prepositions small and finiteMarker-Based EBMT: Chunking: Marker-Based EBMT: Chunking Use a set of closed-class marker words to segment aligned source and target sentences during a pre-processing stage <PUNC> now used as end-of-chunk marker English Marker words extracted from CELEXMarker-Based EBMT: Chunking (2): Marker-Based EBMT: Chunking (2) Enables the use of basic syntactic markup for extraction of translation resources Source-target sentence pairs are tagged with marker categories in pre-processing stage EN: <PRON> you click apply <PREP> to view <DET> the effect <PREP> of <DET> the selection FR: <PRON> vous cliquez <PRON> sur appliquer <PREP> pour visualiser <DET>l’ effet <PREP> de <DET> la sélection Aligned source-target chunks created by segmenting sentences based on these marker tags along with cognate and word co-occurrence information: <PRON> you click apply : <PRON> vous cliquez sur appliquer <PREP> to view : <PREP> pour visualiser <DET> the effect : <DET> l’effet <PREP> of the selection : <PREP> de la sélection Marker-Based EBMT: Chunking (2): Marker-Based EBMT: Chunking (2) Enables the use of basic syntactic markup for extraction of translation resources Source-target 
sentence pairs are tagged with marker categories in pre-processing stage EN: <PRON> you click apply <PREP> to view <DET> the effect <PREP> of <DET> the selection FR: <PRON> vous cliquez <PRON> sur appliquer <PREP> pour visualiser <DET>l’ effet <PREP> de <DET> la sélection Aligned source-target chunks created by segmenting sentences based on these marker tags along with cognate and word co-occurrence information: <PRON> you click apply : <PRON> vous cliquez sur appliquer <PREP> to view : <PREP> pour visualiser <DET> the effect : <DET> l’effet <PREP> of the selection : <PREP> de la sélection Chunks must contain at least one non-marker word—ensures chunks contain useful contextual informationMarker-Based EBMT: Lexicon & Template Extraction: Marker-Based EBMT: Lexicon & Template Extraction Chunks containing only one non-marker word in both source and target languages can then be used to extract a word-level lexicon: <PREP> to: <PREP> pour <LEX> view: <LEX> visualiser <LEX> effect: <LEX> effet <DET> the: <DET> l <PREP> of: <PREP> de In a final pre-processing stage, we produce a set of generalized marker templates by replacing marker words with their tags: <PRON> click apply : <PRON> cliquez sur appliquer <PREP> view : <PREP> visualiser <DET> effect : <DET> effet <PREP> the selection : <PREP> la sélection Any marker word pair can now be inserted at the appropriate tag location. More general examples add flexibility to the matching process and improve coverage (and quality) Marker-Based EBMT: Marker-Based EBMT During translation: Resources are searched from maximal (specific source-target sentence-pairs) to minimal context (word-for-word translation). 
Retrieved example translation candidates are recombined, along with their weights, based on source sentence order.
The system outputs an n-best list of translations.

Phrase-Based SMT
SMT systems now make use of phrase translations in the translation model (TM), along with word correspondences, to improve translation output:
  better modelling of syntax and local word reordering.
Phrase-extraction heuristics based on word alignments have been shown to be better than more syntactically motivated approaches [Koehn et al., 2003]:
  perform word alignment in both source-target and target-source directions;
  take the intersection of the unidirectional alignments;
  extend the intersection iteratively into the union by adding adjacent alignments within the alignment space [Och & Ney 2003, Koehn et al., 2003];
  extract all possible phrases from sentence pairs which correspond to these alignments.
Phrase probabilities can be calculated from relative frequencies.

Experiments

EBMT vs. WB-SMT
[Way & Gough, 05] (cf. talk here in May 05): on a 203K-sentence Sun TM (4.8M words) and a 4K-sentence test set (average sentence length 13.1 words EN, 15.2 words FR), EBMT > vanilla WB-SMT (GIZA++, CMU-Cambridge statistical toolkit, ISI ReWrite Decoder) for FR↔EN.
Best BLEU scores:
  EN→FR: 0.453 EBMT, 0.338 WB-SMT
  FR→EN: 0.461 EBMT, 0.446 WB-SMT

EBMT & PB-SMT (on Sun TM): English→French
The phrase-based system using GIZA++ data outperforms the same system seeded with EBMT data on all metrics bar Precision (0.6598 vs. 0.6661).
The Marker-Based EBMT system beats both phrase-based SMT systems, particularly for BLEU (0.4409 vs. 0.3758) and Recall (0.6877 vs. 0.5759).

EBMT & PB-SMT (on Sun TM): French→English
Scores for all systems are better for FR→EN than for EN→FR.
Again, the phrase-based system using GIZA++ data outperforms the same system seeded with EBMT data.
As for EN→FR, the Marker-Based EBMT system significantly outperforms both phrase-based SMT systems for FR→EN.

Towards Hybridity
Decided to merge data sources: combine parts of the EBMT sub-sentential alignments with parts of the data induced using GIZA++.
Performed a number of experiments using:
  EBMT phrases + GIZA++ words (SEMI-HYBRID): investigate whether the quality of EBMT phrases is better than that of GIZA++ phrases; EBMT phrases are used instead of SMT n-grams.
  All data (HYBRID): GIZA++ words & phrases + EBMT words & phrases; EBMT phrases should add extra probability to ‘more useful’ SMT phrases, i.e. the probabilities of the phrases in the intersection of the two sets are boosted.
[Figure: EBMT phrases and GIZA++ phrases]

Merging Data Sources: EN→FR Results
Using EBMT phrases + GIZA++ words improves significantly on using EBMT data alone.
Merging all the EBMT and GIZA++ data improves on all metrics, most significantly for BLEU score (0.4259 vs. 0.3643 SEMI-HYBRID).
The EBMT system still wins out for BLEU score, Recall and WER.

Merging Data Sources: FR→EN Results
Using EBMT phrases + GIZA++ words shows improvements on the PBSMT system seeded with EBMT data, but improves only on the GIZA++-seeded system’s BLEU score (0.4888 vs. 0.4198).
However, merging all data improves on both PBSMT systems on all metrics.
The EBMT system beats the hybrid system only on Recall and WER.

Results: Discussion
PBSMT:
  Best PBSMT BLEU scores (with GIZA++ data only): 0.375 (E-F), 0.420 (F-E).
  Seeding PBSMT with EBMT data gets good BLEU scores: 0.364 (E-F), 0.395 (F-E); note the difference in data size (1.73M vs. 403K).
  PBSMT loses out to the EBMT system.
Semi-Hybrid System:
  Seeding Pharaoh with SMT words and EBMT phrases improves over the baseline GIZA++-seeded system.
  Data size diminishes considerably (430K vs. 1.73M).
  Results are worse than for the EBMT system.
Fully-Hybrid System:
  Better results than for the ‘semi-hybrid’ system: E-F 0.426 (vs. 0.396), F-E 0.489 (vs. 0.427).
  Data size increases to 2.04M phrase-table entries.
  For F-E, the hybrid system beats EBMT on BLEU (0.4888 vs. 0.4611) and Precision (0.6927 vs. 0.6782); EBMT is ahead for Recall and WER.

EBMT & PB-SMT (on Europarl)
[Groves & Way, 06a/b]: added SMT chunks to the EBMT system, giving a hybrid ‘statistical EBMT’ system.
New domain: Europarl (FR-EN, 322K sentences) [Koehn, 05].
Extracted training data from the designated training sets, filtering on sentence length and relative sentence length (ratio of 1.5); this allowed us to extract high-quality training sets.
For testing, randomly extracted 5,000 sentences from the Europarl common test set. Average sentence lengths: 20.5 words (French), 19.0 words (English).

EBMT vs. PBSMT
Compared the performance of our Marker-Based EBMT system against that of a PB-SMT system built using:
  Pharaoh phrase-based decoder [Koehn, 04]
  SRI LM toolkit [Stolcke, 02]
  refined alignment strategy [Och & Ney, 03]
Trained on incremental data sets and tested on the 5,000-sentence test set, to measure the effect of increasing training data on translation quality.
Performed translation for FR↔EN.
Evaluated translation quality automatically using BLEU [Papineni et al., 02], Precision & Recall (GTM toolkit [Turian et al., 03]) and word error rate (WER).

EBMT vs. PBSMT: French→English
[Figure: results for the 78K, 156K and 322K training sets]
Doubling the amount of data improves performance across the board for both EBMT and PBSMT.
The PBSMT system clearly outperforms the EBMT system, achieving on average a BLEU score 0.07 higher.
PBSMT achieves a significantly lower WER (e.g. 68.55 vs. 82.43 for the 322K data set).
Increasing the amount of training data results in:
  a 3-5% relative increase in BLEU for PBSMT;
  a 6.2% to 10.3% relative increase in BLEU for EBMT.

EBMT vs. PBSMT: English→French
[Figure: results for the 78K, 156K and 322K training sets]
PBSMT continues to outperform the EBMT system by some distance, e.g. 0.1933 vs. 0.1488 BLEU and 0.518 vs. 0.4578 Recall for the 322K data set.
The difference between the systems is somewhat smaller for EN→FR than for FR→EN.
EBMT system performance is much more consistent across both directions.
The PBSMT system scores 2 BLEU points worse (10% relative) for EN→FR than for FR→EN.
French→English is ‘easier’: fewer agreement errors and fewer problems with boundary friction, e.g. le → the (FR→EN), but the → le, la, les, l’ (EN→FR).
EBMT scores higher for EN→FR than for FR→EN in terms of BLEU score; cf. [Callison-Burch et al., 06] on using BLEU to evaluate non-n-gram-based systems.

Hybrid System Experiments
Decided to merge elements of the EBMT marker-based alignments with PBSMT phrases and words induced via GIZA++.
A number of hybrid systems:
  LEX-EBMT: replaced the EBMT lexicon with higher-quality PBSMT word alignments, to lower WER.
  H-EBMT vs. H-PBSMT: merged PBSMT words and phrases with EBMT data (words and phrases) and passed the resulting data to the baseline EBMT and baseline PBSMT systems.
  H-EBMT-LM: reranked the output of the H-EBMT systems using the PBSMT system’s equivalent language model.

Hybrid Experiments: French→English
Use of the improved lexicon (LEX-EBMT) leads to only slight improvements (average relative increase of 2.9% BLEU).
Adding hybrid data improves on the baselines, for both EBMT (H-EBMT) and PBSMT (H-PBSMT).
Trained on 78K and 156K, the H-PBSMT system achieves higher BLEU scores than the PBSMT system trained on twice as much data.
Adding the language model to the H-EBMT system helps guide word order after lexical selection and thus improves results further.

Hybrid Experiments: English→French
We see similar results for EN→FR as for FR→EN.
The more SMT-like the EBMT system becomes, the more its BLEU scores fall in line with the other metrics, i.e. higher for FR→EN than for EN→FR.
Using the hybrid data set gives a 15% average relative increase in BLEU score for the EBMT system, and 6.2% for the H-PBSMT system over its baseline.
The H-PBSMT system performs almost as well as the baseline system trained on over 4 times the amount of data.

SMT ‘phrases’ vs. EBMT ‘chunks’
Many more SMT phrases are derived than EBMT chunks, but this is not reflected in the scores.
Doubling the amount of data doubles the number of sub-sentential alignments for both systems, which indicates the heterogeneous nature of the Europarl corpus.
Taking the 322K training set:
  93.0% of SMT chunks are found only once; 99.4% occur fewer than 10 times.
  96.6% of EBMT chunks are found only once; 99.8% occur fewer than 10 times.
Of the top 10 most frequent chunks in the SMT-only set, 7 are made up solely of marker words:
  du → of the
  de la → of the
  union européenne → union
  états membres → member states
  de l → of the
  dans le → in the
  n est → is
  parlement européen → parliament
  que nous → that we
  que la → that the

Translation Examples
PBSMT: we have all accepted the lesson of the food crisis the 1990s
H-PBSMT: we have all accepted the lesson of the food crisis in the 1990s
REF: we have all learned our lesson from the food crisis of the 90s

PBSMT: indeed if the second-pillar example were less frequent there would be fewer poor
H-PBSMT: indeed if pensions for example were less frequent there would be fewer poor
REF: if indeed for example pensions were less inadequate there would be fewer poor people

PBSMT: in this regard the port controls there should be making the regulations still more stringent
H-PBSMT: when it comes to port controls we must make the regulations still more stringent
REF: it is important to tighten up regulations regarding the control of harbours and ports even further

PBSMT: it also requires that we continue to discussed the entry into force of fiscal harmonization
H-PBSMT: we also need to continue to ask ourselves questions about the implementation of fiscal harmonization
REF: we also still need to continue to question the implementing of fiscal harmonisation

Remarks
[Groves & Way, 05] showed that an EBMT system outperforms a PBSMT system when trained on the Sun Microsystems data set.
This time around, the baseline PBSMT system achieves higher quality than all variants of the EBMT system:
  heterogeneous Europarl vs. homogeneous Sun data;
  chunk coverage is lower on the Europarl data set: 6% of translations produced using chunks alone (Sun) vs. 1% on Europarl;
  the EBMT system considered 13 words on average for direct translation (vs. 7 for Sun data).
Significant improvements seen when using a higher-quality lexicon.
Improvements also seen when the LM is introduced.
The H-PBSMT system is able to outperform the baseline PBSMT system.
Further gains are to be made from hybrid corpus-based approaches:
  there is only a small overlap between the chunks extracted via EBMT and SMT methods.

Hybrid ‘Example-Based SMT’: The MaTrEx system
[Figure: the MaTrEx system]

Hybrid Example-Based SMT
[Armstrong et al., 06], OpenLab MT-EVAL (March 06): adding EBMT chunks to a ‘vanilla Pharaoh’ PB-SMT system adds about 4 BLEU points for ES→EN.
[Stroppa et al., 06]: adding EBMT chunks to a ‘vanilla Pharaoh’ PB-SMT system adds about 5 BLEU points for Basque→EN.
Good performance in IWSLT-06.

‘Phrases’, ‘Chunks’ and Training-Test Corpora
SMT ‘phrases’ are contiguous sequences of words (n-grams).
Typically, EBMT achieves performance comparable with PB-SMT using fewer sub-sentential alignments.
As EBMT chunks are different from SMT ‘phrases’, use them if available in your PB-SMT systems (cf. OpenLab ES→EN and AMTA Basque→EN results).
They:
  provide longer sequences of context, leading to better translations;
  reinforce the probability of good but infrequent SMT ‘phrases’.
As SMT ‘phrases’ are different from EBMT chunks, use them if available in your EBMT systems.
SMT ‘phrases’ are typically shorter than EBMT chunks, so they are more useful where training/test material is more heterogeneous: where EBMT chunks are ‘too long’ to cover the input data, SMT n-grams can fill in before we need to resort to W2W translation (always the last resort). cf. CMU findings in recent NIST MT-Eval …

‘Phrases’, ‘Chunks’ and Training-Test Corpora
Looks like EBMT is better on homogeneous training data:
  EBMT > PB-SMT on Sun TM (EN-FR)
  EBMT > PB-SMT on EF TM (Basque-EN)
SMT is better on (more) heterogeneous data:
  PB-SMT > EBMT on Europarl (EN-FR)
Predictors of the usefulness of each approach, given the text type:
  chunk coverage;
  amount of W2W translation.

Conclusions
Combining SMT ‘phrases’ and EBMT chunks in a hybrid ‘statistical EBMT’ or ‘example-based SMT’ system will improve your system output.
Blind adherence to one approach will guarantee that your performance is less than it could otherwise be.
John Hutchins: “EBMT is Hybrid MT”
Joe Olive: “Need combination of ‘rules’ and statistics”

Ongoing & Future Work
Automatic detection of marker words (the most common SMT phrases consist mainly of marker words).
Plan to increase levels of hybridity:
  code a simple EBMT decoder, factoring in the Marker-Based recombination approach along with probabilities;
  use exact sentence matching in PBSMT, as in EBMT;
  integrate generalized templates into the PBSMT system (and reintegrate them into the EBMT system);
  integrate marker tag information into SMT language and translation models.
Hybrid EBMT-EBMT system (with CMU)?!
What’s the contribution of EBMT chunks if an SMT system is allowed as much training data as it likes?

Thank you for your attention.