Multilingual Information Access for Digital Libraries
Carol Peters
ISTI-CNR, Pisa

MLIA: The Problem
Increasing pressure for access to information without language or cultural barriers means there is a strong demand to be able to:
- find information in foreign languages
- read and interpret that information
- merge it with information in other languages
Hence the need for Multilingual Information Access (MLIA).

Global Information Society
- the WWW as a platform for knowledge dissemination
- distance learning…
- digital libraries…
- information providers and seekers should have equal opportunities
- preservation of national languages

WWW and Internet
- The Internet is no longer monolingual, and non-English content is growing rapidly.
- The user profile has changed radically: from primarily academic use to widespread commercial, leisure, educational, entertainment and other uses.

78% of Internet Users Will Be Non-English Speaking by 2005
Source: Manning & Napier Information Services, 2000 (confidential, unpublished information)

[Chart: Evolution of the non-English-speaking population]

[Chart: Widely spoken languages]

What is MLIA?
MLIA-related research concerns the storage, access, retrieval and presentation of information in any of the world's languages.
Two main areas of interest:
- multiple-language access, browsing and display
- cross-language information discovery and retrieval

Multi-Language Access, Browsing, Display
The enabling technology:
- character encoding
- specific requirements of particular languages and scripts
- localization and presentation

Cross-Language Information Retrieval
Crossing the language barrier:
- querying a multilingual collection in one language to retrieve documents in many other languages
- filtering, selecting and ranking the retrieved documents
- presenting the retrieved information in an interpretable and exploitable form

MLIA and Digital Libraries
The neglected problem!
- It's hard!
- It's resource demanding!
- There are other issues to solve!
BUT everyone agrees: it's important!

MLIA is Resource Demanding
- Multilingual portals: how many languages? How many levels should be multilingual? How to handle updates?
- Monolingual search for multiple languages: encoding and representation issues; indexing issues (stop words, stemmers, morphological analysers, …)
- Cross-language search: translation resources (dictionaries, corpora, MT systems)
- Presentation of results in a form the user can exploit

Digital Library Projects in 5FP
14 projects contained collections in multiple languages:
- 4 had not considered any kind of multiple-language processing (all text = English)
- 10 provided monolingual retrieval functionality for all languages
- 1 had implemented cross-browsing of collections using a common metadata schema
- 6 had some kind of basic cross-language functionality:
  - 5 used a multilingual controlled vocabulary / thesaurus
  - 1 used bilingual dictionary search
  - 1 used pseudo-relevance feedback (in addition to a thesaurus)
  - 1 proposed using similarity search (in addition to a controlled vocabulary)

ETRDL
- multilingual interfaces (6 languages)
- choice of interface language
- selection of the document collection language
- multiple-language text processing

SCHOLNET
- ETRDL functionality plus cross-language search
- multilingual thesaurus
- mechanisms for thesaurus maintenance and update
- free-text search on abstracts via pseudo-relevance feedback

ECHO
- film archives in 4 languages
- cross-language search via a controlled vocabulary
- experimental corpus-based approach on speech recognition output

MUCHMORE
A project for CLIR in the medical domain

DELOS supports CLEF
The Cross-Language Evaluation Forum covers:
- mono-, bi- and multilingual textual document retrieval on news collections (Ad Hoc)
- mono- and cross-language information retrieval on structured scientific data (Domain-Specific)
- interactive cross-language retrieval (iCLEF)
- multiple-language question answering (QA@CLEF)
- cross-language retrieval in image collections (ImageCLEF)
- cross-language spoken document retrieval (CL-SR)
- multilingual retrieval of Web documents (WebCLEF)
- cross-language geographical retrieval (GeoCLEF)

The Challenge
- Bridge the gap between research and application
- Transfer research results to the real world
- Make existing resources and methodology generally available
- Raise awareness

What Should We Have Now?
- Multilingual portals
- Support for monolingual search in multiple languages: character encoding issues, stopword lists, stemmers, morphological analysers
- Support for simple cross-language search: multilingual metadata; an interlingua or pivot language; thesauri for domain-specific search

Existing DL Software Systems
- Some kind of multilingual support: DSpace, Greenstone, OpenDLib, NSDL
- Cross-language functionality: Cheshire

Cheshire Interface
[Screenshot: Cheshire interface]

Digital Library Projects in 6FP
- DILIGENT (DL infrastructure on the Grid)
- BRICKS (DLMS for cultural heritage, CH), with a translation manager:
  - dictionary-based translation
  - accepts any language (in theory)
  - query translated into the languages of the collections being searched
  - interactive stage
  - metadata search
  - results translated into the user's preferred language

The Future: A Targeted Multilingual/Multimedia Search Engine
Problem: The Web contains a wealth of fragmented CH information, but users are left to discover, interpret and aggregate it themselves.
Objectives:
- Provide targeted, enriched access to heterogeneous CH objects across all media types and language boundaries, supporting various user classes with aggregate views on complex task scenarios
- Assist CH institutions in raising visibility and disseminating content
Challenges:
- moving from document retrieval to complex-object retrieval
- CH concept and relation extraction
- integration and representation of related objects
- presentation of aggregate search results
- focused crawling to acquire CH-related information from heterogeneous multimedia (MM) resources

MultiMATCH
[Diagram: information about Van Gogh is gathered by crawling and acquisition from museum databases and Web resources (museums, libraries, archives, newspapers, news agencies, personal pages, blogs) and aggregated into views such as: his life (1853-1890), …; paintings; other expressionists; exhibitions (Milano, …); critical reviews]

In the Meantime
Recognise the problem, do as much as you can, keep it simple, and remember that MLIA = interoperability.
Standards:
- Unicode (http://www.unicode.org/)
- Multilingual Dublin Core (http://dublincore.org/groups/languages/)
- RDF Encoding of Multilingual Thesauri (http://www.w3.org/2001/sw/Europe/reports/thes/8.3)
- OWL, Web Ontology Language (http://www.w3.org/TR/2004/REC-owl-features-20040210/)
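Several of the systems described in these slides (SCHOLNET, ECHO and five of the 5FP projects) implement cross-language search through a multilingual thesaurus or controlled vocabulary, and the standards listed include Unicode and an RDF encoding for multilingual thesauri. The following is a minimal sketch of that idea, assuming a hypothetical in-memory thesaurus in which each concept has a language-independent ID that acts as the pivot between labels in different languages, plus Unicode NFC normalisation so that differently encoded forms of the same word still match. All names (`MultilingualThesaurus`, `cross_language_search`, concept `C001`) and the sample documents are invented for illustration. This is not the method of any project named in the slides, and it omits stemming, stopwords, multi-word labels and ranking.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """NFC-normalise and case-fold, so that precomposed and decomposed
    forms of the same character (e.g. 'ä' vs 'a' + U+0308) compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


def tokenize(text: str) -> set[str]:
    """Very naive tokeniser: Unicode word characters only, no stemming or stopwords."""
    return set(re.findall(r"\w+", normalize(text)))


class MultilingualThesaurus:
    """Hypothetical in-memory thesaurus: each concept has a language-independent
    ID that acts as the pivot between labels in different languages."""

    def __init__(self) -> None:
        # concept ID -> language code -> list of labels
        self.labels = defaultdict(lambda: defaultdict(list))
        # (language code, normalised label) -> set of concept IDs
        self.index = defaultdict(set)

    def add_label(self, concept: str, lang: str, label: str) -> None:
        self.labels[concept][lang].append(label)
        self.index[(lang, normalize(label))].add(concept)

    def translate(self, term: str, source_lang: str, target_lang: str) -> list[str]:
        """Map a term to concepts via its source-language label, then return
        the target-language labels of those concepts."""
        result: list[str] = []
        for concept in sorted(self.index.get((source_lang, normalize(term)), ())):
            result.extend(self.labels[concept].get(target_lang, []))
        return result


def cross_language_search(thesaurus, query_terms, query_lang, docs):
    """Return IDs of documents (doc_id, lang, text) that contain a query term
    or one of its thesaurus translations into the document's language."""
    hits = []
    for doc_id, lang, text in docs:
        tokens = tokenize(text)
        for term in query_terms:
            if lang == query_lang:
                candidates = [term]
            else:
                candidates = thesaurus.translate(term, query_lang, lang)
            if any(normalize(c) in tokens for c in candidates):
                hits.append(doc_id)
                break
    return hits


# Illustrative data only: one hypothetical concept with labels in four languages.
thesaurus = MultilingualThesaurus()
for lang, label in [("en", "painting"), ("it", "dipinto"), ("de", "Gemälde"), ("fr", "peinture")]:
    thesaurus.add_label("C001", lang, label)

docs = [
    ("d1", "it", "Un dipinto di Van Gogh"),
    ("d2", "de", "Ein Gema\u0308lde von Van Gogh"),  # decomposed umlaut
    ("d3", "fr", "Une lettre de Van Gogh"),
    ("d4", "en", "A painting by Van Gogh"),
]
print(cross_language_search(thesaurus, ["painting"], "en", docs))  # ['d1', 'd2', 'd4']
```

Matching through concept IDs rather than pairwise bilingual dictionaries means that adding a language only requires adding its labels to each concept, which is one reason a pivot or interlingua structure is attractive when many languages are involved.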