MultiMatchBerlinAmato
Added: January 08, 2009

Presentation Transcript

MULTIMATCH: Vertical Search Engine for European Cultural Heritage
  Giuseppe Amato
  ISTI – CNR
  giuseppe.amato@isti.cnr.it
  http://www.multimatch.org

MultiMatch: The Initial Idea
  Problem:
  - The Web contains a wealth of cultural heritage (CH) information, stored in web pages as well as in Digital Libraries managed by cultural institutions.
  - This information is dispersed and fragmented.
  - Users are left to discover, interpret, and aggregate it.

A user perspective
  - The information needed is likely to be available on the Internet, but to access it, language, media, and source boundaries need to be overcome.
  Key ideas underlying the MultiMatch vision:
  - Multiplicity
    - Queries and retrieved items in multiple languages
    - Queries and retrieved items in multiple media
    - Items selected from multiple sources
  - Aggregation
    - Presentation of aggregated results
    - Relationships between retrieved items
  - Personalization

What is supported by existing systems?
  - Google-like search
    - Pros
      - Large content base
      - High recall
    - Cons
      - Low precision
      - No multilingual support
      - Single media search
      - No relationship between items
  - Access to specialized CH sites
    - Very precise results, but limited to a single site
    - Results are presented in an attractive way, but cannot be adapted to specific user requests

MultiMatch Objectives
  - Develop a search engine that provides targeted, enriched access to heterogeneous CH objects
    - Across all media types and language boundaries
    - Gathering information from the web and from CH sites and digital libraries
    - Supporting various user classes and offering personalized access with aggregate views on complex task scenarios
  - Assist CH institutions to raise visibility and disseminate content

Slide 6: crawling, acquisition

What MM intends to achieve
  - MultiMatch is developing a Web search engine specialized in the cultural heritage domain.
  - Queries can be expressed in multiple languages
  - Multilingual and multimedia retrieval
  - Access to multiple sources of information
    - Web sites containing authoritative cultural heritage data (e.g. museums, cultural institutions, CH educational sites, tourist information portals)
    - IPR protected CH material, provided by content providers (e.g. Alinari, Sound and Vision, Biblioteca Virtual de Cervantes)
    - Other OAI compliant resources

Slide 8: Default search functionality

The Project
  - VI FP Project
  - Technology-enhanced Learning and Access to Cultural Heritage
  - Started: May 2006
  - First prototype: August 2007
  - Second prototype: July 2008
  - Evaluation and field trials: October 2008

MultiMatch Partners
  - Cultural Heritage
    - Alinari 24Ore (Alinari)
    - Netherlands Institute for Sound and Vision (Sound and Vision)
    - University of Alicante – Biblioteca Miguel de Cervantes (UA-BVMC)
  - Industry
    - OCLC PICA (FDI)
    - WIND Telecomunicazioni S.p.A. (WIND)
  - Academia
    - Istituto di Scienza e Tecnologie dell’Informazione (ISTI-CNR)
    - University of Sheffield (USFD)
    - Dublin City University (DCU)
    - University of Amsterdam (UvA)
    - University of Geneva (UniGE)
    - Universidad Nacional de Educación a Distancia (UNED)

Main Research Challenges
  - From document retrieval to complex object retrieval
  - Focused crawling for acquisition of CH-related information from heterogeneous MM resources
  - CH concept and relation extraction using information extraction and text mining techniques
  - Multimedia search and mixed media search
  - Multilingual management with support for query formulation, cross-language retrieval and summarization
  - Integration and representation of related objects
  - Presentation of aggregated search results
  - User support (e.g. search history, annotation facilities, personalised presentation of results, etc.)

Metadata modelling and interoperability
  - Definition of a MultiMatch conceptual reference framework suitable for Cultural Heritage
    - Metadata schema
    - Thesauri for artists' names and descriptions, geospatial information
  - Simple to use (interoperability)
  - Suitable for the application of automatic population techniques

Indexing and information extraction
  - Automatic extraction of indexing features for all media (text, speech, images, video) and crawled data
  - Text indexing for four languages (Italian, Spanish, Dutch, and English, plus others)
  - Automatic generation of inter-document links
  - Development of algorithms for classification and information extraction
    - Creators, type/genre, subject, place/time, art objects/works
  - Semantic enrichment of documents
  - Linking information sources
    - Thesauri from outside the system
    - Documents within the system

Multilingual support
  - Provide the system with monolingual and multilingual search functionalities (initially four languages, to be extended)
  - Provide effective translation strategies, e.g. multilingual dictionaries, machine translation, thesaurus term expansion
  - Multilingual query expansion
  - Dynamic summarization

Multimedia search
  - Similarity search based on visual features (low level and high level – faces, objects, etc.)
  - Efficient retrieval
  - Support of relevance feedback and interactive search
  - Combined text and visual search

User interaction and interface design
  - User-centred design process
  - Evaluate and refine the interface based on empirical evaluation and usability testing
  - Interface supporting multilingual and multimedia search
    - Default search on all types of content
    - Specialized search on metadata fields and on different media
  - Use of semantic structures for search and browse

Project achievements
  - First project phase completed
    - User requirements analysis
    - MultiMatch conceptual reference framework for Cultural Heritage
  - First prototype of the MultiMatch search engine
    - Baseline for experimentation, user feedback and second prototype development
    - Limited content
    - Based on manually associated semantic concepts
    - No relationship between retrieved objects

Research issues for the second project phase
  - Automatic document labeling, enrichment and linking, and automatic document classification
  - Combination of cross-media query results
  - Translation and summarization of results
  - Innovative user interface
    - Multimedia search and browse
    - Presentation of video material
    - Multilingual interaction
  - Optimized search and browse based on semantic structures

Conclusions
  Further evolutions of MultiMatch:
  - Use of MultiMatch technologies to build large-scale Digital Libraries and a large-scale search engine specialized for Cultural Heritage
  - Enlarge the content base through access to a complete set of CH sites and crawling of a significant part of the web
  - Increase the number of languages managed (possibly to cover all EU languages)
  - Invest in system efficiency, system scalability, and robustness

Conclusions
  - Significant investment in dissemination of project results, in order to attract cultural heritage institutions
  - Dissemination of MultiMatch technology into other related application fields
  - Further investment in the key research topics addressed in MultiMatch

Further information
  - MultiMatch Web site: http://www.multimatch.org

Slide 22: Author search: Classified Web page results

Relations between authors

Relations between Author and works of art

Author and related CH sites

Specialized image search

Multilingual Search (1)

Multilingual Search (2)

Multilingual Search (3)

Multilingual Search (4)
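
The "Multilingual support" slide names multilingual dictionaries and thesaurus term expansion among the translation strategies, without saying how they are combined. The Python sketch below shows one plausible way a dictionary-based query translation step with term expansion could work. It is an illustration only, not the MultiMatch implementation: the function names, the toy English-to-Italian dictionary, and the thesaurus entries are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy bilingual dictionary (English -> Italian); not MultiMatch data.
EN_IT: Dict[str, List[str]] = {
    "painting": ["dipinto", "pittura"],
    "portrait": ["ritratto"],
}

# Hypothetical target-language thesaurus used for term expansion.
IT_THESAURUS: Dict[str, List[str]] = {
    "dipinto": ["quadro"],
}


def translate_query(
    query: str,
    dictionary: Dict[str, List[str]],
    thesaurus: Dict[str, List[str]],
) -> List[List[str]]:
    """Return, for each source term, its target-language alternatives."""
    expanded: List[List[str]] = []
    for term in query.lower().split():
        # Terms missing from the dictionary (e.g. proper names) pass through unchanged.
        translations = dictionary.get(term, [term])
        alternatives: List[str] = []
        for translation in translations:
            # Add the translation itself, then its thesaurus expansions, without duplicates.
            for candidate in [translation] + thesaurus.get(translation, []):
                if candidate not in alternatives:
                    alternatives.append(candidate)
        expanded.append(alternatives)
    return expanded


def to_boolean_query(expanded: List[List[str]]) -> str:
    """Join alternatives with OR inside each term group and AND across groups."""
    return " AND ".join("(" + " OR ".join(alts) + ")" for alts in expanded)
```

With these toy tables, `to_boolean_query(translate_query("Caravaggio painting", EN_IT, IT_THESAURUS))` yields `(caravaggio) AND (dipinto OR quadro OR pittura)`. Keeping all dictionary alternatives as an OR group trades precision for recall; a real system would more likely weight or disambiguate the candidates.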