clef2007 ds overview

Presentation Transcript

Slide1: The Domain-Specific Track at CLEF 2007
  Vivien Petras, Stefan Baerisch & Max Stempfhuber
  GESIS Social Science Information Centre, Bonn, Germany
  Budapest, September 19, 2007

Slide2: Outline
  The Domain-Specific Task
  Collections & Controlled Vocabularies
  Topics
  Participants, Runs & Relevance Assessments
  Themes
  Summary & Outlook

Slide3: The Domain-Specific Task
  CLIR on structured scientific document collections:
    social science domain
    bibliographic metadata
    controlled vocabularies for subject description
  Leverage bibliographic metadata & controlled vocabularies for:
    search
    translation

Slide4: The Domain-Specific Task
  Tasks:
    Monolingual against German, English or Russian
    Bilingual against German, English or Russian
    Multilingual against combined collection

Slide5: Collections

Slide6: Controlled Vocabularies
  5 different subject-describing terminologies:
    Thesaurus for the Social Sciences (GIRT-DE, -EN)
    Thesaurus of Sociological Indexing Terms (CSA-SA)
    INION Thesaurus (ISISS)
    Social Sciences Classification (GIRT-DE, -EN)
    Sociological Abstracts Classification (CSA-SA)

Slide7: Controlled Vocabularies – Mapping Tools
  Translation:
    GIRT German – GIRT English
  Intellectual term mappings (cross-walks): equivalent terms in vocabularies
    GIRT German – CSA-SA English
    GIRT English – CSA-SA English
  Example:
    original-term: agricultural area
    mapped-term: Rural areas

Slide8: Topics
  25 topics in standard TREC format (title, desc, narr):
    15 volunteers (social scientists)
    2-5 suggestions from 28 subject specialties
    checked for:
      coverage in collections
      variance from previous years
    translated into English, Russian

Slide9: Participants
  5 groups

Slide10: Runs

Slide11: Relevance Assessments
  * In Russian collection: 3 topics without relevant documents
  All assessments done with Univ. of Padova's DIRECT System.

Slide12: Relevance Assessments – Best MAP

Slide13: Themes – Retrieval Models
  Lucene
  Language Modelling
  Logistic Regression
  Comparison: Vector Space, LM, Probabilistic - Okapi, DFR
  Data fusion
  Russian word-based vs. N-gram retrieval
  new light-weight stemmer

Slide14: Themes – Query Expansion
  Entry Vocabulary Modules
    query terms associated with thesaurus terms from documents
  Thesaurus Lookup
    combined thesaurus from all CVs
    GIRT Thesaurus Index
  Lexical Entailment
    find document terms in relation to query terms
  Blind Feedback

Slide15: Themes – Translation
  Lucene plug-in
  Babelfish, Google, PROMT, Reverso
  Bilingual thesaurus mapping
  Dictionary adaptation
    disambiguate term translation given language context of feedback documents
  Statistical machine translation
    MATRAX
  Commercial Software

Slide16: Summary & Outlook
  Extension of Russian materials
    Translation table DE-EN-RU for GIRT Thesaurus
    Translation table RU-EN for INION Thesaurus
    Mapping between GIRT – INION Thesaurus
  More tools for terminology mapping
    different relationships (0T, SYN, BT, NT, RT)
    GESIS-IZ project:
      > 40 mappings
      25 controlled vocabularies / 11 disciplines
      ~ 125,000 terms & phrases
      ~ 400,000 relations

Slide17:
  Domain-Specific Track: http://www.gesis.org/en/research/information_technology/clef_ds_2007.htm
  Vocabulary Mappings: http://www.gesis.org/en/research/information_technology/komohe.htm
  Email: vivien.petras@gesis.org
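
The cross-walk idea from Slide7 and the thesaurus lookup from Slide14 can be illustrated with a minimal sketch. It is not the GESIS tooling: the tuple layout, the vocabulary labels "GIRT-EN" and "CSA-SA" as dictionary keys, the function names, and the query term "migration" are all assumptions made for illustration. Only the pair "agricultural area" / "Rural areas" comes from the slides.

```python
from collections import defaultdict

# Hypothetical cross-walk entries:
# (source vocabulary, source term, target vocabulary, target term).
# Only this one pair appears in the slides; the layout is an assumption.
CROSSWALK = [
    ("GIRT-EN", "agricultural area", "CSA-SA", "Rural areas"),
]


def build_index(entries):
    """Index mappings by (source vocabulary, lower-cased source term)."""
    index = defaultdict(list)
    for src_vocab, src_term, tgt_vocab, tgt_term in entries:
        index[(src_vocab, src_term.lower())].append((tgt_vocab, tgt_term))
    return index


def map_terms(terms, src_vocab, tgt_vocab, index):
    """Replace each term by its mapped equivalents; keep unmapped terms as-is."""
    mapped = []
    for term in terms:
        candidates = index.get((src_vocab, term.lower()), [])
        hits = [t for vocab, t in candidates if vocab == tgt_vocab]
        mapped.extend(hits if hits else [term])
    return mapped


index = build_index(CROSSWALK)
# "migration" is a made-up query term with no mapping, so it passes through.
print(map_terms(["agricultural area", "migration"], "GIRT-EN", "CSA-SA", index))
# prints ['Rural areas', 'migration']
```

Keeping unmapped terms unchanged is one possible fallback; a retrieval system could just as well drop them or hand them to a separate translation step.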