lrec06 assist pres
Added: March 07, 2008

Presentation Transcript

Using collocations from comparable corpora to find translation equivalents
Serge Sharoff, Bogdan Babych, Anthony Hartley
Centre for Translation Studies, University of Leeds
{s.sharoff,b.babych,a.hartley}@leeds.ac.uk

Outline
1. Introduction
   Problems in finding translation equivalents
   Corpora and tools used
2. Methodology for finding translation equivalents
   Basic steps for collocations
   Construction of the MWE database
   An extension for single words
3. Results
   Evaluation
   Legitimate translation variation
   Future work

What are translation equivalents?
Terminology: ignitron = Ignitron = игнитрон
General lexicon: исчерпывающий ответ = irrefragable answer
strong: 57 subentries in Oxford Russian Dictionary (ORD), but no strong feeling, field, opposition, sense, voice
Parallel corpora are not always available: strong voice: 16 in Europarl vs. 46 in the BNC
Comparable corpora for terminology: (Dagan, Church, 1997; Bennison et al., 2000), but not for words from the general lexicon
Comparable corpora for translators: absolutely vs. assolutamente, but not a procedure for finding equivalents

The problems we address
Hospital admission can prove a particularly daunting experience.
I did all the cleaning, cooking and kept his books in order, which was no mean feat.
The problem of finding a bridge between two comparable corpora
Main steps
1. Generalising source contexts in SL
2. Translating generalisations using bilingual MRDs and generalising them
3. Filtering suggestions down to what occurs in TL

Corpora and tools used
Databases of multiword expressions
IMS Corpus Workbench (Christ, Evert)
Distributional similarity classes (Rapp)
Oxford Russian Dictionary from OUP

Step 1: Generalising contexts
Distributional similarity list: Θ(s0) = s1, ..., sN
Simcluster S(s0) (words with intersecting similarity lists):
∀w: w ∈ S(s0) ⇔ w ∈ Θ(s0) & w ∈ ∪Θ(si)
strong ~ powerful, weak, strength, potent, heavy, good, overwhelming, intense, robust, tough, weaken, compelling, fierce
experience ~ knowledge, opportunity, life, encounter, skill, feeling, reality, sensation, dream, vision, learning, perception, learn

Step 2: Translating generalisations
Full translation class: TF = S(T(S(s0)))
Reduced translation class: TR = S(T(s0)) + T(S(s0))
опыт (experience; experiment) ~ ability, acquire, aptitude, capability, capacity, competence, courage, evidence, experience, experiment, expertise, feasibility, flair, hypothesis, ingenuity, intelligence, investigation, knowledge, laboratory, learning, method, opportunity, perception, qualification, rat, research, skill, stamina, statistical, strength, study, talent, technique, test, training, vision.

Step 3: Finding MWEs in TL
Cartesian product of translation classes produced for words in the query
Filtering them against MWEs really occurring in corpora
четкая программа (‘precise programme’) ~ clear idea (486), detailed plan (247), right idea (123), detailed proposal (112), detailed work (109), detailed research (108), clear policy (88), clear strategy (83), clear plan (70), right policy (64), right strategy (52)

Building the MWE database
permissive vs. prudent filtering (Manning, Schütze, 1999)
weapon~NN of~IN mass~JJ, Filter: ~IN ~JJ$

An extension for single words
1. Produce two sets of 5 best LL collocates for the immediate left and right contexts of the search expression
2. Produce TR classes for the search expression and its best collocates
3. Combine TR classes separately for the left and right context
4. Intersect the set of right collocates in the left class with the set of left collocates in the right class

Questionnaire

The scoring system
5 = The suggestion is an appropriate translation as it is.
4 = The suggestion can be used with some minor amendment (e.g. by turning a verb into a participle)
3 = The suggestion is useful as a hint for another, appropriate translation (e.g. suggestion elated cannot be used, but its close synonym exhilarated can)
2 = The suggestion is not useful, even though it is still in the same domain (e.g. fear is proposed for a problem referring to hatred)
1 = The suggestion is totally irrelevant

Equivalents for unseen cases
Patrick West recently claimed that Britain’s extravagant mourning for Princess Diana and Holly and Jessica was ‘recreational grief’. Maybe we also suffer from recreational fear.
спортивный интерес (lit. ‘sports interest’, leisure interest)
Some translators see more solutions in a context
Not a competition with dictionaries, but solutions for genuinely difficult cases

Future work
Disambiguation of simclasses: union ~ federation, strike, trade, worker, soviet, employer, organization, miner, communist, russia, republic, cosatu, confederation
ASSIST semantic classes (232 categories):
I1.1- = Money: lack (bankrupt, beggar, impoverished, unpaid)
A5.1- = Evaluation: bad (abject, abysmal, bastard, crap)
Finding clusters for language pairs
Methods from EBMT/SMT
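
Steps 1 to 3 of the methodology (simclusters, reduced translation classes, and the Cartesian product filtered against attested MWEs) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The similarity lists and the bilingual dictionary below are invented toy data, the function names are hypothetical, and reading "+" in the TR formula as set union is an assumption. Only the three MWE frequencies that end up matching, plus the unmatched "right idea", are taken from the четкая программа slide.

```python
from itertools import product

def simcluster(word, theta):
    """S(s0): members of Theta(s0) that also occur in the union of the
    similarity lists of the members of Theta(s0)."""
    neighbours = theta.get(word, [])
    pooled = set()
    for s in neighbours:
        pooled.update(theta.get(s, []))
    return {w for w in neighbours if w in pooled}

def simcluster_of_set(words, theta):
    """Apply S to every word of a set and pool the results."""
    out = set()
    for w in words:
        out |= simcluster(w, theta)
    return out

def translate(words, dictionary):
    """T: look every word up in a bilingual dictionary and pool the results."""
    out = set()
    for w in words:
        out |= set(dictionary.get(w, []))
    return out

def reduced_translation_class(s0, theta_sl, theta_tl, dictionary):
    """TR = S(T(s0)) + T(S(s0)); '+' is read as set union (assumption)."""
    direct = simcluster_of_set(translate({s0}, dictionary), theta_tl)
    via_sl = translate(simcluster(s0, theta_sl), dictionary)
    return direct | via_sl

def candidate_mwes(classes, mwe_freq):
    """Cartesian product of the classes, kept only if attested in the TL
    MWE database, sorted by descending frequency."""
    hits = []
    for combo in product(*classes):
        phrase = " ".join(combo)
        if phrase in mwe_freq:
            hits.append((phrase, mwe_freq[phrase]))
    return sorted(hits, key=lambda h: -h[1])

# Invented toy similarity lists and dictionary (not real corpus data).
THETA_RU = {
    "четкий": ["ясный", "точный"], "ясный": ["четкий", "точный"],
    "точный": ["четкий", "ясный"], "программа": ["план", "проект"],
    "план": ["программа", "проект"], "проект": ["план", "программа"],
}
THETA_EN = {
    "clear": ["precise", "detailed"], "precise": ["clear", "detailed", "exact"],
    "detailed": ["clear", "precise"], "exact": ["precise"],
    "programme": ["plan", "strategy"], "plan": ["programme", "strategy"],
    "strategy": ["plan", "programme"], "project": ["plan"],
}
RU_EN = {
    "четкий": ["clear", "precise"], "ясный": ["clear"],
    "точный": ["precise", "exact"], "программа": ["programme"],
    "план": ["plan"], "проект": ["project"],
}
# Frequencies copied from the slide; "right idea" is never generated here.
MWE_FREQ = {"detailed plan": 247, "right idea": 123,
            "clear strategy": 83, "clear plan": 70}

classes = [reduced_translation_class(w, THETA_RU, THETA_EN, RU_EN)
           for w in ("четкий", "программа")]
suggestions = candidate_mwes(classes, MWE_FREQ)
print(suggestions)
```

With this toy data the query yields detailed plan, clear strategy and clear plan, in that order. Applying the formula literally means the seed word itself is not guaranteed to be in its own class; whether the original system adds it back is not stated on the slides.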
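
The first step of the extension for single words asks for the 5 best LL collocates in the immediate left and right contexts. The sketch below assumes that "LL" refers to Dunning's log-likelihood ratio computed over a 2x2 contingency table of adjacent word pairs, which the slides do not spell out. The bigram counts are invented toy data and the function names are hypothetical.

```python
import math

def log_likelihood(o11, o12, o21, o22):
    """Log-likelihood ratio G2 = 2 * sum(O * ln(O / E)) for a 2x2 table
    of observed counts; cells with a zero count contribute nothing."""
    n = o11 + o12 + o21 + o22
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    total = 0.0
    for obs, row, col in ((o11, row1, col1), (o12, row1, col2),
                          (o21, row2, col1), (o22, row2, col2)):
        if obs > 0:
            expected = row * col / n
            total += obs * math.log(obs / expected)
    return 2.0 * total

def best_collocates(node, bigrams, side, k=5):
    """Rank the words immediately to the left (side='left') or right
    (side='right') of `node` by log-likelihood and return the top k.
    `bigrams` maps (left_word, right_word) to a count."""
    n = sum(bigrams.values())
    left_tot, right_tot = {}, {}
    for (left, right), count in bigrams.items():
        left_tot[left] = left_tot.get(left, 0) + count
        right_tot[right] = right_tot.get(right, 0) + count
    scored = []
    for (left, right), count in bigrams.items():
        if side == "left" and right == node:
            collocate = left
        elif side == "right" and left == node:
            collocate = right
        else:
            continue
        o11 = count                       # pair seen together
        o12 = left_tot[left] - count      # left word with another right word
        o21 = right_tot[right] - count    # right word with another left word
        o22 = n - o11 - o12 - o21         # neither word in its slot
        scored.append((collocate, log_likelihood(o11, o12, o21, o22)))
    scored.sort(key=lambda pair: -pair[1])
    return [word for word, _ in scored[:k]]

# Invented toy bigram counts (not taken from any corpus).
BIGRAMS = {
    ("daunting", "experience"): 8, ("new", "experience"): 3,
    ("the", "experience"): 5, ("the", "man"): 50, ("new", "car"): 20,
    ("daunting", "task"): 4, ("experience", "shows"): 6,
    ("experience", "of"): 9, ("man", "of"): 30,
}

left_best = best_collocates("experience", BIGRAMS, "left")
right_best = best_collocates("experience", BIGRAMS, "right")
print(left_best, right_best)
```

The score is two-sided: it is high both for pairs that co-occur more often than expected and for pairs that co-occur less often, so a practical setup might additionally require the observed count to exceed the expected one before accepting a collocate.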