MT

Machine Translation Lecture 20: 

Motivation, Difficulties and Core Techniques

Motivation: 

Huge commercial need for translation: a $2,000,000,000 industry, and 10% of the EU budget. Mostly technical documentation for products with an international market. Also EU documents! Potential savings (and potential quality improvements) from machine translation. MT is already deployed successfully: 80% of EU documents translated between Spanish and French are the result of machine translation.

Difficulties (words/syntax): 

One word in the source language may have many translations, e.g., “wear” → “haku”, “kiru”, “kaburu”.
A complex expression in the source language may map to a single word in the target language (“nach dem Krieg” → “postwar”).
Order and phrasing may differ: “I swam across the river” → “J’ai traversé le fleuve en nageant”.

Difficulties (pragmatics): 

Conventional structures for reports may be different in each language. Politeness conventions differ greatly.

Example: Google: 

“Que ce soit à l’Halloween ou dans ta vie de tous les jours, ils y a des règles à suivre lorsque tu dois traverser la rue.” → “That it is in Halloween or in your life of the every day, they has rules there to follow when you must cross the street.”

Traditional Approach: 

Do some level of analysis of the source, then use transfer rules to map to a representation of the target. Approaches vary depending on the depth of analysis.
[Diagram: Source Text rises through Source Syntax and Source Semantics to the Interlingua, then descends through Target Semantics and Target Syntax to Target Text. Direct Translation, Shallow Transfer and Deep Transfer cross from source to target at increasing depth.]

Transfer vs Interlingua: 

Transfer systems map words/syntax/semantics in one language to words/syntax/semantics in another. They may need many special-case rules to deal with language idiosyncrasies. Interlingua systems translate from a source language to a language-independent representation, and can then translate from that to a target language.

Translating between multiple languages: 

Transfer systems: Spanish analysis, German analysis, English analysis; Spanish-German transfer, Spanish-English transfer, etc.; Spanish generation, German generation, English generation.
Versus interlingua: Spanish analysis, German analysis, English analysis; interlingua; Spanish generation, German generation, English generation.
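The difference in scale can be made concrete with a small count. This is only a sketch: it assumes one analysis and one generation module per language, and (an assumption not stated on the slide) one transfer module per ordered language pair. The function names are made up for illustration.

```python
def transfer_components(n_languages: int) -> int:
    """Module count for a transfer architecture, assuming one transfer
    module per ordered (source, target) language pair."""
    analysis = n_languages
    generation = n_languages
    transfer = n_languages * (n_languages - 1)
    return analysis + generation + transfer


def interlingua_components(n_languages: int) -> int:
    """Module count for an interlingua architecture: analysis and
    generation only, with the shared representation in between."""
    analysis = n_languages
    generation = n_languages
    return analysis + generation


for n in (3, 10):
    print(n, transfer_components(n), interlingua_components(n))
```

Under these assumptions the transfer count grows quadratically with the number of languages, while the interlingua count grows linearly.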

Interlingua versus transfer: 

So: interlingua is simpler for multiple languages, but the required language-independent representation is complex. The target language has no influence on the analysis process, so some idiosyncratic translations can be harder to deal with.

Speech versus Text: 

Spoken language translation is particularly hard. The speech recognition process is error-prone. Intonation/prosody is often discarded in recognition, and is hard to add into synthesised speech in the other language.
Text translation is easier, but spelling errors etc. are still common.
In either case there may be many unknown words.

Machine Aided Translation: 

Translation can often be made easier by sharing the task between human and machine; this still saves on manual translation. A human (plus specialised pre-processing tools) may pre-edit source documents: substituting unknown words, identifying proper nouns, and indicating the sense/class of ambiguous words.

Machine Aided Translation: 

A human may also post-edit, correcting output from the MT system, e.g., “In this study it will be sought to answer..” → “This study will seek to answer..”
The MT system may interact with the user, allowing the user to select from or correct ambiguous or potentially incorrect translations.
Pre- and post-editing do not always require knowledge of both languages.

Machine Aided Translation: 

A more extreme example: translation memories. These store human translations of words and phrases and make them easily retrievable when the human translator meets the same material again. They can be made more “intelligent” with fuzzy matching and replacement. Popular with human translators.
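A translation memory with fuzzy matching might be sketched as follows. The stored segments, the `lookup` function and the 0.7 similarity cutoff are all hypothetical, and character-level similarity from Python's standard `difflib` module stands in for whatever matching a real tool uses.

```python
import difflib

# Hypothetical toy memory: English source segments and their stored
# human translations into French.
TRANSLATION_MEMORY = {
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
    "Close the cover.": "Fermez le couvercle.",
    "Replace the battery.": "Remplacez la pile.",
}


def lookup(segment, cutoff=0.7):
    """Return (matched source, stored translation, similarity) for the
    closest stored segment, or None if nothing reaches the cutoff."""
    matches = difflib.get_close_matches(
        segment, list(TRANSLATION_MEMORY), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    source = matches[0]
    similarity = difflib.SequenceMatcher(None, segment, source).ratio()
    return source, TRANSLATION_MEMORY[source], similarity


# A near-duplicate (missing full stop) still retrieves the stored translation.
print(lookup("Press the power button"))
```

The translator would then accept or adjust the retrieved translation; the similarity score indicates how much editing is likely to be needed.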

Modern MT Approaches: 

Rule-based: transfer rules map from source to target language representations.
Statistical: given alternative possible translations, find the most probable one in the target language (given a corpus in the target language). Use P(S|T) and P(T) for possible translations. Google is moving from rule-based to more statistical methods.
Example-based: extend the idea of translation memories. Reuse existing translation fragments.
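The role of P(S|T) and P(T) follows from Bayes' rule: for a fixed source sentence S, the translation T that maximises P(T|S) is the one that maximises P(S|T) × P(T), since P(S) is the same for every candidate. A minimal sketch with made-up numbers (the candidate sentences and all probabilities are invented for illustration, not taken from any real system):

```python
import math

# Hypothetical candidates for one source sentence S.
# "p_s_given_t" stands in for P(S|T), how well T explains the source;
# "p_t" stands in for P(T), how fluent T is in the target language.
candidates = {
    "there are rules to follow": {"p_s_given_t": 0.20, "p_t": 0.010},
    "they has rules there to follow": {"p_s_given_t": 0.30, "p_t": 0.0001},
    "rules exist": {"p_s_given_t": 0.02, "p_t": 0.020},
}


def score(entry):
    """log P(S|T) + log P(T); logs avoid underflow on long sentences."""
    return math.log(entry["p_s_given_t"]) + math.log(entry["p_t"])


best = max(candidates, key=lambda t: score(candidates[t]))
print(best)
```

In this toy example the literal candidate has the highest P(S|T), but its very low P(T) means the fluent candidate is chosen overall.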

MT Evaluation: 

Evaluating translation systems is hard. It is expensive to evaluate “by hand”, since this requires a human translator to make judgements. Results on sample documents may be compared to a fixed set of reference translations, but it is hard to compare translations automatically to assess closeness.
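One simple way to approximate closeness to a reference is to count overlapping word n-grams. The sketch below is an illustrative simplification of the idea behind n-gram overlap metrics, not a method named in the lecture; the function names are hypothetical, and a single reference with whitespace tokenisation is assumed.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference,
    counting each reference n-gram at most as often as it appears there."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


mt_output = "In this study it will be sought to answer"
reference = "This study will seek to answer"
print(clipped_precision(mt_output, reference, n=1))
print(clipped_precision(mt_output, reference, n=2))
```

A score like this rewards shared wording but not meaning, which is one reason automatic comparison remains hard: a perfectly good translation that is phrased differently from the reference scores poorly.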

Summary: 

Translating is hard: languages differ in more than just words. Main approaches: transfer or interlingua. Statistical and example-based techniques are also used. Fully automatic, high-quality translation is beyond the state of the art. Machine-aided translation is nevertheless very useful.