clef2007 ds overview

Slide1: 

The Domain-Specific Track at CLEF 2007
Vivien Petras, Stefan Baerisch & Max Stempfhuber
GESIS Social Science Information Centre, Bonn, Germany
Budapest, September 19, 2007

Slide2: 

Outline
- The Domain-Specific Task
- Collections & Controlled Vocabularies
- Topics
- Participants, Runs & Relevance Assessments
- Themes
- Summary & Outlook

Slide3: 

The Domain-Specific Task
CLIR on structured scientific document collections:
- social science domain
- bibliographic metadata
- controlled vocabularies for subject description
Leverage bibliographic metadata & controlled vocabularies for:
- search
- translation

Slide4: 

The Domain-Specific Task
Tasks:
- Monolingual against German, English or Russian
- Bilingual against German, English or Russian
- Multilingual against combined collection

Slide5: 

Collections

Slide6: 

Controlled Vocabularies
5 different subject-describing terminologies:
- Thesaurus for the Social Sciences (GIRT-DE, -EN)
- Thesaurus of Sociological Indexing Terms (CSA-SA)
- INION Thesaurus (ISISS)
- Social Sciences Classification (GIRT-DE, -EN)
- Sociological Abstracts Classification (CSA-SA)

Slide7: 

Controlled Vocabularies – Mapping Tools
Translation: GIRT German ↔ GIRT English
Intellectual term mappings (cross-walks): equivalent terms in vocabularies
- GIRT German ↔ CSA-SA English
- GIRT English ↔ CSA-SA English
  original-term: agricultural area
  mapped-term: Rural areas
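A cross-walk of this kind can be treated as a lookup table from a term in one vocabulary to its equivalent in another. The following is a minimal sketch, not the track's actual tooling: the table layout, the vocabulary labels used as keys, and the function names are assumptions, and only the "agricultural area" entry is taken from the slide.

```python
# Hypothetical cross-walk table: (source vocabulary, term) -> (target vocabulary, term).
# Only the "agricultural area" entry comes from the slide; the layout is an assumption.
CROSSWALK = {
    ("GIRT-EN", "agricultural area"): ("CSA-SA", "Rural areas"),
}


def map_term(term, source="GIRT-EN", table=CROSSWALK):
    """Return the mapped term in the target vocabulary, or None if unmapped."""
    key = (source, term.strip().lower())
    hit = table.get(key)
    return hit[1] if hit else None


def map_query(terms, source="GIRT-EN", table=CROSSWALK):
    """Replace each mappable term; keep unmapped terms unchanged."""
    return [map_term(t, source, table) or t for t in terms]
```

Keeping unmapped terms unchanged means a query is never emptied when the cross-walk has no entry for one of its terms.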

Slide8: 

Topics
25 topics in standard TREC format (title, desc, narr):
- 15 volunteers (social scientists)
- 2-5 suggestions from 28 subject specialties
- checked for:
  - coverage in collections
  - variance from previous years
- translated into English, Russian
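As a rough illustration of how a topic with title, desc and narr fields might be read, here is a small sketch. The exact markup of the track's topic files is not shown on the slide, so the closed-tag layout, the `num` field, and the sample text below are all assumptions made for the example.

```python
import re

# Hypothetical topic text; the field names follow the slide (title, desc, narr),
# but the exact markup and the content are assumptions for illustration.
SAMPLE_TOPIC = """
<top>
<num> 000 </num>
<title> Example title </title>
<desc> Example one-sentence description of the information need. </desc>
<narr> Example narrative stating what counts as relevant. </narr>
</top>
"""

# Matches one tagged field and captures its name and body.
FIELD = re.compile(r"<(num|title|desc|narr)>(.*?)</\1>", re.DOTALL)


def parse_topic(text):
    """Extract the tagged fields of one topic into a dict with normalized whitespace."""
    return {tag: " ".join(body.split()) for tag, body in FIELD.findall(text)}
```

A system would typically build its query from `title` alone or from `title` plus `desc`, keeping `narr` for the assessors.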

Slide9: 

Participants 5 groups

Slide10: 

Runs

Slide11: 

Relevance Assessments
* In Russian collection: 3 topics without relevant documents
All assessments done with Univ. of Padova's DIRECT System.

Slide12: 

Relevance Assessments – Best MAP
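For reference, MAP (mean average precision) averages, over topics, the precision observed at the rank of each relevant document. The sketch below uses the standard definition; the function names and the choice to skip topics without relevant documents are assumptions of this example, not a description of the evaluation software used for the track.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant document ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """runs: topic -> ranked ids; qrels: topic -> relevant ids.

    Topics without any relevant document are skipped (an assumption of this sketch).
    """
    scores = [
        average_precision(runs.get(topic, []), rel)
        for topic, rel in qrels.items()
        if rel
    ]
    return sum(scores) / len(scores) if scores else 0.0
```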

Slide13: 

Themes – Retrieval models
- Lucene
- Language Modelling
- Logistic Regression
- Comparison: Vector Space, LM, Probabilistic - Okapi, DFR
- Data fusion
- Russian
  - word-based vs. N-gram retrieval
  - new light-weight stemmer
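The difference between word-based and N-gram indexing can be sketched as two tokenizers. This is only an illustration: the n-gram length of 4, the rule of keeping short words whole, and the function names are assumptions, not the settings any participating group used.

```python
def word_tokens(text):
    """Word-based indexing units: lowercased, whitespace-separated words."""
    return text.lower().split()


def char_ngrams(text, n=4):
    """Overlapping character n-grams within each word; short words are kept whole."""
    grams = []
    for word in word_tokens(text):
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# Two inflected forms of the same Russian noun never match as whole words,
# but they share most of their character 4-grams.
shared = set(char_ngrams("социология")) & set(char_ngrams("социологии"))
```

Because inflected forms overlap in most of their n-grams, n-gram indexing can stand in for stemming in a morphologically rich language, at the cost of a larger index.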

Slide14: 

Themes – Query Expansion
- Entry Vocabulary Modules
  - query terms associated with thesaurus terms from documents
- Thesaurus Lookup
  - combined thesaurus from all CVs
  - GIRT Thesaurus Index
- Lexical Entailment
  - find document terms in relation to query terms
- Blind Feedback
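Blind feedback, the last item, expands a query with terms drawn from the top-ranked documents of a first retrieval pass, without any relevance judgments. The sketch below is deliberately simplified: it selects expansion terms by raw frequency, whereas real systems usually weight candidate terms, and the function name, parameters, and sample documents are hypothetical.

```python
from collections import Counter


def blind_feedback(query_terms, ranked_docs, top_docs=2, new_terms=2):
    """Expand a query with the most frequent new terms of the top-ranked documents.

    Simplification: terms are chosen by raw frequency, not by a term-weighting scheme.
    """
    query = list(query_terms)
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(t for t in doc.lower().split() if t not in query)
    return query + [term for term, _ in counts.most_common(new_terms)]


# Hypothetical first-pass ranking for the query "unemployment".
docs = [
    "youth unemployment labour market labour",
    "labour market policy",
    "unrelated text",
]
expanded = blind_feedback(["unemployment"], docs)
```

Only the first `top_docs` documents are assumed relevant, so the third document contributes nothing to the expansion.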

Slide15: 

Themes – Translation
- Lucene plug-in
- Babelfish, Google, PROMT, Reverso
- Bilingual thesaurus mapping
- Dictionary adaptation
  - disambiguate term translation given language context of feedback documents
- Statistical machine translation
  - MATRAX
- Commercial Software

Slide16: 

Summary & Outlook
- Extension of Russian materials
  - Translation table DE-EN-RU for GIRT Thesaurus
  - Translation table RU-EN for INION Thesaurus
  - Mapping between GIRT – INION Thesaurus
- More tools for Terminology mapping
  - different relationships (0T, SYN, BT, NT, RT)
- GESIS-IZ project:
  - > 40 mappings
  - 25 controlled vocabularies / 11 disciplines
  - ~ 125,000 terms & phrases
  - ~ 400,000 relations

Slide17: 

Domain-Specific Track: http://www.gesis.org/en/research/information_technology/clef_ds_2007.htm
Vocabulary Mappings: http://www.gesis.org/en/research/information_technology/komohe.htm
Email: vivien.petras@gesis.org