Presentation Transcript

Constructing Bilingual Resources for Digital Libraries:
Rim, Hae-Chang
Korea University
2000.8.10

Contents:
Introduction
Bilingual resources
    bilingual dictionary
    bilingual corpus
    bilingual thesaurus
Our experience
    bilingual dictionary
    bilingual corpus
    bilingual thesaurus
Summary

Introduction:
What is the problem?
    The language barrier in multilingual digital libraries.
How to solve the problem?
    machine translation (MT)
    cross-language information retrieval (CLIR)
Why bilingual resources?
    MT and CLIR are based on bilingual resources.
What shall we do?
    Construct a Korean-English bilingual dictionary, a Korean-English bilingual corpus, and a Korean-English bilingual thesaurus.

Overview:
[Diagram: two digital libraries (DL) separated by a language barrier and connected through bilingual resources]

Bilingual Resources:
Bilingual dictionary
Bilingual corpus
Bilingual thesaurus

Bilingual Dictionary:
Definition
    A dictionary containing words and their translated words.
Application field
    CLIR [Oard 98], [Fujii et al. 99], [Myaeng et al. 99]
    MT
Utilization
    word "대기" -> bilingual dictionary ("대기1" - "atmosphere", "대기2" - "waiting") -> translated words "atmosphere", "waiting" -> CLIR, MT

Bilingual Corpus (1):
Definition
    comparable corpus: a collection of similar texts in different languages
    parallel corpus: a collection of texts which have been translated into one or more other languages. Ex) Canadian Hansard corpus
Application field
    CLIR [Yang et al. 98]
    MT: Example-Based Machine Translation [Brown 96], [Murata et al. 99], [Shirai et al. 97], [Turcato et al. 99]

Bilingual Corpus (2):
Utilization
    translated words: "대기" - "atmosphere", "waiting"; "오염" - "pollution"
    "대기 오염": "atmosphere pollution"? "waiting pollution"?
    bilingual corpus:
        "the sources of atmosphere pollution may have a global, regional and local character."
        "대기 오염의 원인은 전세계적, 국부적, 그리고 지역적인 특징을 가진다."
    translated phrase: "대기 오염" - "atmosphere pollution" -> CLIR, MT

Bilingual Thesaurus (1):
Definition
    A collection of words in two languages that are grouped together according to the connections between their meanings. Ex) EuroWordNet
Application field
    CLIR: concept-based CLIR [Gonzalo et al. 98], [Gilarranz et al. 97]

Bilingual Thesaurus (2):
Utilization
    [Diagram: bilingual thesaurus with the concept groups {region, part}, {atmosphere, 대기}, {air} and {inactivity}, {wait, waiting, 대기}, {pause}]
    word "대기" -> bilingual thesaurus -> word concepts "region", "inactivity" -> CLIR

Our Experience:
Bilingual dictionary
Bilingual corpus
Bilingual thesaurus

Bilingual Dictionary:
Korean-English bilingual dictionary
    size: 2 million entries
    application: person's name "링컨" -> bilingual biographical dictionary ("링컨" - "Lincoln") -> translated person's name "Lincoln" -> CLIR, MT

Bilingual Corpus:
Korean-English bilingual corpus
    parallel corpus containing 250,000 words
    based on CES (Corpus Encoding Standard)
Corpus construction tools
    corpus refining tools
    corpus annotating tools
    bilingual concordancer

Bilingual Thesaurus (1):
Goal
    Constructing a Korean-English bilingual thesaurus.
Approach
    Assigning Korean words to corresponding English words in WordNet.
    [Diagram: Korean word "대기" -> WordNet {air} -> Korean-English bilingual thesaurus]

Bilingual Thesaurus (2):
Current status of the task
    under construction

Summary:
Surmounting the language barrier using bilingual resources
Korean-English bilingual resources
    Korean-English bilingual dictionary
    Korean-English bilingual corpus
    Korean-English bilingual thesaurus
Our experience
    Korean-English bilingual dictionary
    Korean-English bilingual corpus
    Korean-English bilingual thesaurus

References:
[Oard 98] Douglas W. Oard, "A Comparative Study of Query and Document Translation for Cross-Language Information Retrieval", the Third Conference of the Association for Machine Translation in the Americas (AMTA), Philadelphia, PA, October 1998.
[Fujii et al. 99] Atsushi Fujii, Tetsuya Ishikawa, "Cross-Language Information Retrieval for Technical Documents", Proceedings of the Joint ACL SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 29-37, 1999.
[Myaeng et al. 99] Sung Hyon Myaeng and Myung-gil Jang, "Complementing Dictionary-Based Query Translations with Corpus Statistics for Cross-Language IR", Machine Translation Summit VII, 1999.
[Yang et al. 98] Yiming Yang, Jaime G. Carbonell, Ralf D. Brown, and Robert E. Frederking, "Translingual Information Retrieval: Learning from Bilingual Corpora", Artificial Intelligence (Special issue: Best of IJCAI-97), Vol. 103, pp. 323-345, 1998.
[Brown 96] Ralf D. Brown, "Example-Based Machine Translation in the Pangloss System", Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pp. 169-174, Copenhagen, Denmark, August 5-9, 1996.
[Murata et al. 99] M. Murata, Q. Ma, K. Uchimoto, H. Isahara, "An Example-Based Approach to Japanese-to-English Translation of Tense, Aspect, and Modality", TMI'99, Chester, UK, August 23, 1999.
[Shirai et al. 97] S. Shirai, F. Bond, and Y. Takahashi, "A Hybrid Rule and Example based Method for Machine Translation", Natural Language Processing Pacific Rim Symposium '97 (NLPRS-97), 1997.
[Turcato et al. 99] Davide Turcato, Paul McFetridge, Fred Popowich, Janine Toole, "A Unified Example-Based and Lexicalist Approach to Machine Translation", the 8th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-99).
[Gonzalo et al. 98] Julio Gonzalo, Felisa Verdejo, Carol Peters and Nicoletta Calzolari, "Applying EuroWordNet to Cross-Language Text Retrieval", Computers and the Humanities, Vol. 32, Nos. 2-3, pp. 73-89, 1998.
[Gilarranz et al. 97] Julio Gilarranz, Julio Gonzalo and Felisa Verdejo, "An Approach to Conceptual Text Retrieval Using the EuroWordNet Multilingual Semantic Database", AAAI 97.
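
Illustrative sketch (not from the slides): the "Bilingual Dictionary" and "Bilingual Corpus (2)" utilization slides describe looking up word-level candidates for "대기 오염" and then using a parallel corpus to choose "atmosphere pollution" over "waiting pollution". One simple way this could work is shown below. The data structures, the function names `candidate_translations` and `select_translation`, and the counting heuristic are all assumptions made for illustration; the presentation does not say how the selection is actually done.

```python
from itertools import product

# Toy bilingual dictionary holding only the entries shown on the slides.
DICTIONARY = {
    "대기": ["atmosphere", "waiting"],
    "오염": ["pollution"],
}

# Toy parallel corpus: (Korean sentence, English sentence) pairs.
# The single pair is the example sentence from the slides.
PARALLEL_CORPUS = [
    (
        "대기 오염의 원인은 전세계적, 국부적, 그리고 지역적인 특징을 가진다.",
        "the sources of atmosphere pollution may have a global, "
        "regional and local character.",
    ),
]


def candidate_translations(phrase):
    """Return every word-by-word translation of a space-separated phrase."""
    # Words missing from the dictionary are passed through unchanged.
    options = [DICTIONARY.get(word, [word]) for word in phrase.split()]
    return [" ".join(choice) for choice in product(*options)]


def select_translation(phrase):
    """Pick the candidate seen most often in the aligned English sentences.

    Assumed heuristic: count literal occurrences of each candidate in the
    English side of every pair whose Korean side contains the phrase.
    Returns None when no candidate is attested.
    """
    aligned_english = [en for ko, en in PARALLEL_CORPUS if phrase in ko]
    counts = {
        candidate: sum(en.count(candidate) for en in aligned_english)
        for candidate in candidate_translations(phrase)
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

With the toy data above, `candidate_translations("대기 오염")` yields both "atmosphere pollution" and "waiting pollution", and `select_translation("대기 오염")` keeps only the first because it is the one attested in the aligned English sentence. A real system would need morphological analysis and word alignment rather than plain substring matching.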