rim


Presentation Transcript

Constructing Bilingual Resources for Digital Libraries: 

Rim, Hae-Chang
Korea University
2000.8.10

Contents: 

Introduction
Bilingual resources: bilingual dictionary, bilingual corpus, bilingual thesaurus
Our experience: bilingual dictionary, bilingual corpus, bilingual thesaurus
Summary

Introduction: 

What is the problem? The language barrier in multilingual digital libraries.
How to solve the problem? Machine translation (MT) and cross-language information retrieval (CLIR).
Why bilingual resources? MT and CLIR are based on bilingual resources.
What shall we do? Construct a Korean-English bilingual dictionary, a Korean-English bilingual corpus, and a Korean-English bilingual thesaurus.

Overview: 

[Diagram: two digital libraries (DL) separated by a language barrier, with bilingual resources connecting them.]

Slide5: 

Bilingual Resources: bilingual dictionary, bilingual corpus, bilingual thesaurus

Bilingual Dictionary: 

Definition: a dictionary containing words and their translated words.
Application field: CLIR [Oard 98], [Fujii et al. 99], [Myaeng et al. 99]; MT.
Utilization: the word “대기” is looked up in the bilingual dictionary, which holds the entries “대기1” – “atmosphere” and “대기2” – “waiting”. The translated words “atmosphere” and “waiting” are then passed to CLIR or MT.
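The lookup step on this slide can be sketched in a few lines of Python. This is only an illustration, not the system described in the talk: the names `BILINGUAL_DICT`, `translate_word`, and `translate_query` are hypothetical, and the entry for “오염” is borrowed from the later corpus slide.

```python
# Toy bilingual dictionary (hypothetical data structure; entries taken from the slides).
BILINGUAL_DICT = {
    "대기": ["atmosphere", "waiting"],  # two senses of the same Korean word
    "오염": ["pollution"],
}


def translate_word(word):
    """Return every translation candidate for a Korean word (empty list if unknown)."""
    return list(BILINGUAL_DICT.get(word, []))


def translate_query(query):
    """Translate a whitespace-separated query word by word.

    Unknown words are passed through unchanged (an assumption of this sketch).
    """
    return [translate_word(w) or [w] for w in query.split()]


print(translate_word("대기"))
print(translate_query("대기 오염"))
```

A plain dictionary lookup returns every sense of “대기”, so the caller (CLIR or MT) still has to choose between “atmosphere” and “waiting”.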

Bilingual Corpus (1): 

Definition:
comparable corpus: a collection of similar texts in different languages.
parallel corpus: a collection of texts which have been translated into one or more other languages. Ex) Canadian Hansard corpus.
Application field:
CLIR [Yang et al. 98]
MT: Example-Based Machine Translation [Brown 96], [Murata et al. 99], [Shirai et al. 97], [Turcato et al. 99]

Bilingual Corpus (2): 

Utilization:
Translated words: “대기” - “atmosphere”, “waiting”; “오염” - “pollution”.
Word-by-word translation of “대기 오염” is ambiguous: “atmosphere pollution”? or “waiting pollution”?
The bilingual corpus contains the sentence pair:
“the sources of atmosphere pollution may have a global, regional and local character.”
“대기 오염의 원인은 전세계적, 국부적, 그리고 지역적인 특징을 가진다.”
Translated phrase: “대기 오염” - “atmosphere pollution”, which is passed to CLIR or MT.
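One way to picture how the parallel corpus resolves the ambiguity is the sketch below. It is a simplified illustration under stated assumptions, not the method used in the talk: candidate phrases are built from the dictionary entries and scored by plain substring matching against aligned sentence pairs. The names `PARALLEL_CORPUS`, `CANDIDATES`, and `translate_phrase` are hypothetical.

```python
from itertools import product

# Aligned (Korean, English) sentence pairs; this single pair is the slide's example.
PARALLEL_CORPUS = [
    (
        "대기 오염의 원인은 전세계적, 국부적, 그리고 지역적인 특징을 가진다.",
        "the sources of atmosphere pollution may have a global, regional and local character.",
    ),
]

# Word-level candidates, as a bilingual dictionary would return them.
CANDIDATES = {"대기": ["atmosphere", "waiting"], "오염": ["pollution"]}


def translate_phrase(korean_phrase):
    """Pick the candidate phrase seen most often in sentence pairs containing the source phrase.

    Returns None when no candidate is attested in the corpus.
    Assumes every word of the phrase is present in CANDIDATES.
    """
    options = [CANDIDATES[w] for w in korean_phrase.split()]
    counts = {}
    for combo in product(*options):
        english = " ".join(combo)
        counts[english] = sum(
            1
            for ko, en in PARALLEL_CORPUS
            if korean_phrase in ko and english in en.lower()
        )
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


print(translate_phrase("대기 오염"))
```

Substring matching is the crudest possible evidence; a real system would more likely rely on word alignment or co-occurrence statistics over a much larger corpus.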

Bilingual Thesaurus (1): 

Definition: a collection of words in two languages that are put into groups according to connections between their meanings. Ex) EuroWordNet.
Application field: CLIR, specifically concept-based CLIR [Gonzalo et al. 98], [Gilarranz et al. 97].

Bilingual Thesaurus (2): 

Utilization:
[Diagram: the bilingual thesaurus groups words into the sets {region, part}, {atmosphere, 대기}, {air} and {inactivity}, {wait, waiting, 대기}, {pause}.]
For CLIR, the word “대기” is mapped from word to concept: “region” and “inactivity”.
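The word-to-concept mapping can be sketched as follows. The hierarchy is an assumption made for illustration: the transcript lists the word sets but not the links between them, so the parent links below ({atmosphere, 대기} under {region, part}, {wait, waiting, 대기} under {inactivity}) are guesses chosen to reproduce the slide's result, and {air} and {pause} are left out because their position is unclear. `SYNSETS` and `concepts_for` are hypothetical names.

```python
# Toy thesaurus: each entry is a word set plus an assumed link to a broader concept.
SYNSETS = {
    "region": {"words": {"region", "part"}, "parent": None},
    "atmosphere": {"words": {"atmosphere", "대기"}, "parent": "region"},
    "inactivity": {"words": {"inactivity"}, "parent": None},
    "wait": {"words": {"wait", "waiting", "대기"}, "parent": "inactivity"},
}


def concepts_for(word):
    """Return the top-level concept of every word set that contains the word."""
    found = []
    for name, synset in SYNSETS.items():
        if word in synset["words"]:
            top = name
            # Climb the assumed parent links until the broadest concept is reached.
            while SYNSETS[top]["parent"] is not None:
                top = SYNSETS[top]["parent"]
            found.append(top)
    return found


print(concepts_for("대기"))
print(concepts_for("waiting"))
```

Because Korean and English words share the same word sets, a query term in either language resolves to the same language-independent concepts.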

Slide11: 

Our Experience: bilingual dictionary, bilingual corpus, bilingual thesaurus

Bilingual Dictionary: 

Korean-English bilingual dictionary
Size: 2 million entries.
Application: the person's name “링컨” is looked up in the bilingual biographical dictionary, which holds the entry “링컨” - “Lincoln”. The translated person's name “Lincoln” is then passed to CLIR or MT.

Bilingual Corpus: 

Korean-English bilingual corpus: a parallel corpus containing 250,000 words, based on CES (Corpus Encoding Standard).
Corpus construction tools: corpus refining tools, corpus annotating tools, bilingual concordancer.
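For readers unfamiliar with the term, a bilingual concordancer shows each occurrence of a keyword in context, alongside the aligned translation. The sketch below is a minimal illustration and not the tool built by the authors: it assumes the corpus is already available as aligned sentence pairs, and `concordance` is a hypothetical function.

```python
def concordance(corpus, keyword, width=20):
    """Return (left context, keyword, right context, aligned translation) for each hit.

    `corpus` is a list of (source sentence, target sentence) pairs.
    `width` is the number of characters of context kept on each side.
    """
    hits = []
    if not keyword:
        return hits  # an empty keyword would match everywhere
    for source, target in corpus:
        start = source.find(keyword)
        while start != -1:
            end = start + len(keyword)
            left = source[max(0, start - width):start]
            right = source[end:end + width]
            hits.append((left, keyword, right, target))
            start = source.find(keyword, end)
    return hits


corpus = [
    (
        "대기 오염의 원인은 전세계적, 국부적, 그리고 지역적인 특징을 가진다.",
        "the sources of atmosphere pollution may have a global, regional and local character.",
    ),
]
for left, key, right, target in concordance(corpus, "오염"):
    print(f"{left}[{key}]{right} | {target}")
```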

Bilingual Thesaurus (1): 

Goal: constructing a Korean-English bilingual thesaurus.
Approach: assigning Korean words to corresponding English words in WordNet.
[Diagram: the Korean word “대기” is assigned to the WordNet set {air}, giving the Korean-English bilingual thesaurus.]
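The slide does not say how Korean words are matched to WordNet entries. One plausible first step, shown here purely as an assumption, is to use a bilingual dictionary to propose candidate word sets and attach the Korean word to each. `WORDNET` is a toy stand-in with made-up identifiers, not the real WordNet database or its API, and `KO_EN`, `candidate_synsets`, and `build_thesaurus` are hypothetical names.

```python
# Toy stand-in for WordNet: hypothetical identifiers mapped to English word sets.
WORDNET = {
    "atmosphere.n": {"atmosphere", "air"},
    "wait.n": {"wait", "waiting"},
}

# Bilingual dictionary used to propose candidates (entries from the slides).
KO_EN = {"대기": ["atmosphere", "waiting"]}


def candidate_synsets(korean_word):
    """Return ids of word sets that contain any English translation of the word."""
    translations = set(KO_EN.get(korean_word, []))
    return sorted(sid for sid, lemmas in WORDNET.items() if lemmas & translations)


def build_thesaurus(korean_words):
    """Attach each Korean word to every candidate word set (no sense filtering)."""
    thesaurus = {sid: set(lemmas) for sid, lemmas in WORDNET.items()}
    for word in korean_words:
        for sid in candidate_synsets(word):
            thesaurus[sid].add(word)
    return thesaurus


print(candidate_synsets("대기"))
print(build_thesaurus(["대기"]))
```

This only generates candidates; deciding which word sets a Korean word really belongs to would need further checking, manual or automatic.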

Bilingual Thesaurus (2): 

Current status of the task: under construction.

Summary: 

Surmounting the language barrier using bilingual resources
Korean-English bilingual resources: Korean-English bilingual dictionary, Korean-English bilingual corpus, Korean-English bilingual thesaurus
Our experience: Korean-English bilingual dictionary, Korean-English bilingual corpus, Korean-English bilingual thesaurus

reference(1): 

[Oard 98] Douglas W. Oard, "A Comparative Study of Query and Document Translation for Cross-Language Information Retrieval", the Third Conference of the Association for Machine Translation in the Americas (AMTA), Philadelphia, PA, October 1998.

[Fujii et al. 99] Atsushi Fujii, Tetsuya Ishikawa, "Cross-Language Information Retrieval for Technical Documents", Proceedings of the Joint ACL SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 29-37, 1999.

[Myaeng et al. 99] Sung Hyon Myaeng and Myung-gil Jang, "Complementing Dictionary-Based Query Translations with Corpus Statistics for Cross-Language IR", Machine Translation Summit VII, 1999.

reference(2): 

[Yang et al. 98] Yiming Yang, Jaime G. Carbonell, Ralf D. Brown, and Robert E. Frederking, "Translingual Information Retrieval: Learning from Bilingual Corpora", Artificial Intelligence (Special issue: Best of IJCAI-97), Vol. 103, pp. 323-345, 1998.

[Brown 96] Ralf D. Brown, "Example-Based Machine Translation in the Pangloss System", Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pp. 169-174, Copenhagen, Denmark, August 5-9, 1996.

[Murata et al. 99] M. Murata, Q. Ma, K. Uchimoto, H. Isahara, "An Example-Based Approach to Japanese-to-English Translation of Tense, Aspect, and Modality", TMI'99, Chester, UK, August 23, 1999.

reference(3): 

[Shirai et al. 97] S. Shirai, F. Bond, and Y. Takahashi, "A Hybrid Rule and Example based Method for Machine Translation", Natural Language Processing Pacific Rim Symposium '97 (NLPRS-97), 1997.

[Turcato et al. 99] Davide Turcato, Paul McFetridge, Fred Popowich, Janine Toole, "A Unified Example-Based and Lexicalist Approach to Machine Translation", the 8th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-99).

[Gonzalo et al. 98] Julio Gonzalo, Felisa Verdejo, Carol Peters and Nicoletta Calzolari, "Applying EuroWordNet to Cross-Language Text Retrieval", Computers and the Humanities, Vol. 32, Nos. 2-3, pp. 73-89, 1998.

reference(4): 

[Gilarranz et al. 97] Julio Gilarranz, Julio Gonzalo and Felisa Verdejo, "An Approach to Conceptual Text Retrieval Using the EuroWordNet Multilingual Semantic Database", AAAI 97.