Slides 0203 Montejo Keywords physics
Added: December 06, 2007. License: All Rights Reserved.

Presentation Transcript

Slide 1:
CERN, European Organization for Nuclear Research
Automatic Keyword Assignment for High Energy Physics Literature
Arturo Montejo Ráez
ETT/SI Data Handling Group, CERN, Geneva (Switzerland)
Joint Research Center, Ispra (Italy), 4 March 2002

Slide 2: What we are going to see today...
Automatic Keywording for HEP literature, Ispra (Italy), 4 March 2002
CERN, European Organization for Nuclear Research
Data Handling Group
Keyword assignment process
Why keywords?
How it is done for High Energy Physics papers
The HEPindexer project:
    Data
    Algorithm
    Experiments
    Results
Future work

Slide 3: Keyword assignment process
[Diagram: Authors, Indexer, Keyworded papers]

Slide 4: Keyword assignment process

Slide 5: Keyword assignment process
The document...
Full text paper
Stored in a database
Simplified representation needed

Slide 6: Keyword assignment process
The thesaurus...
Controlled vocabulary of concepts
Relationships between keywords
Categories and subcategories
Can be domain specific
Can be translated into multiple languages

Slide 7: Keyword assignment process
The thesaurus: a relational model for terms
cheese
    MT 6016 processed agricultural produce
    BT1 milk product
    NT1 blue-veined cheese
    NT1 cow's milk cheese
    NT1 fresh cheese
    NT1 goat's milk cheese
    NT1 hard cheese
    NT1 processed cheese
    NT1 semi-soft cheese
    NT1 sheep's milk cheese
    NT1 soft cheese
    RT cheese factory (6031)

Slide 8: Keyword assignment process
The thesaurus: a subject tree
04 POLITICS
    0406 political framework
    0411 political party
    0416 electoral procedure and voting
    0421 parliament
    0426 parliamentary proceedings
    0431 politics and public safety
    0436 executive power and public service
08 INTERNATIONAL RELATIONS
    0806 international affairs
    0811 cooperation policy
    0816 international balance
    0821 defence
10 EUROPEAN COMMUNITIES
    1006 Community institutions and European civil service
    1011 Community law
    1016 European construction
    1021 Community finance

Slide 9: Keyword assignment process
The indexer...
An expert in the domain of the documents
An expert in the use of the thesaurus
Heavy task
Not always the same proposal
Expensive!

Slide 10: Why keywords?
Allow documents to be indexed in a coherent way
Can be viewed as the "index" at the end of a book
Concepts that better represent the content
Human made (value added)
Meaningful
Can establish relations between documents
Multilingual

Slide 11: Why keywords?
Access to documents
But... we already have fulltext indexing!

Slide 12: Why keywords?
Classification:
    To store (libraries)
    To access (narrow searches)
[Diagram: Category 1, Category 2, Category 3]

Slide 13: Why keywords?
Crosslingual access
[Diagram: Navaja, Razor, Couteau, Lametta; "Razor?"]

Slide 14: Why keywords?
Multilingual comparison
[Diagram: Razor, Lametta, Murder, Frabbica]

Slide 15: Why keywords?
Advantages over fulltext searches:
    No ambiguity
    Better relevance and precision
More advanced tools for searching and classification are coming!

Slide 16: Why keywords?
The BIG problem...
EXPENSIVE

Slide 17: Why keywords?
The BIG problem?
EXPENSIVE?

Slide 18: Why keywords?
The BIG problem?
EXPENSIVE?

Slide 19: CERN
The world's largest particle physics centre
Explores what matter is made of, and what forces hold it together
Employs just under 3000 people
6500 scientists come for their research

Slide 20: How it is done for High Energy Physics papers
DESY: Deutsches Elektronen-Synchrotron (Hamburg, Germany)
DESY thesaurus
Group of indexers (students, experts...)
Only High Energy Physics related papers

Slide 21: How it is done for High Energy Physics papers
The DESY thesaurus
A
    *a4(2040) ('postulated particle, a4(2040)', was delta(2040))
    *a6(2450) ('postulated particle, a6(2450)', was delta(2450))
    *abelian
    *aberration
    absorption
    -absorptive model (model, absorption)
    accelerator
    . . .
B
    B
    B anti-B
    B+
    B+L number
    B*(5320) (excited B)
    -B** ('B*2...', similar for B/s, etc.)
    *B*2(5732) (postulated particle, B*2(5732))
    B-
    -B-factory (B, particle source)
    B-L number
    . . .

Slide 22: How it is done for High Energy Physics papers
The DESY thesaurus:
Few categories, rarely used
Only two types of keywords:
    main keywords (1191)
    secondary keywords (949)
No relationships between terms
Specific terminology

Slide 23: How it is done for High Energy Physics papers
The DESY thesaurus: specific terminology
Energy declarations: 1.5-2.7 GeV-cms
Resonances: Delta (1232)
Reaction equations: anti-p p ---> K0 K- pi+
Combinations: angular distribution, (photon), mass spectrum (pi+ pi- pi0)
Two-particle initial state: 'anti-p p', 'electron positron'

Slide 24: How it is done for High Energy Physics papers
The problem
More than 500 preprints per week!
[Diagram: Physicists, Indexer]

Slide 25: The HEPindexer project
The solution
[Diagram: Physicists, Indexer, Keyworded papers]

Slide 26: The HEPindexer project
Use of IR techniques
Objective evaluation
Real-time answer
Easily portable
Fully integrable into CDS
Possibility of growing
Fully automatic and assistant tool

Slide 27: The HEPindexer project
[Diagram: Keyworded papers (collection), Keyword, Term]

Slide 28: The HEPindexer project
[Diagram: Keyword, Term, DESY keywords, Documents]

Slide 29: The HEPindexer project
Data
3,661 documents
    2,441 training collection
    1,220 test collection
19,143 terms
1,191 main keywords

Slide 30: The HEPindexer project
Algorithm

Slide 31: The HEPindexer project
Algorithm
Preprocessing:
    Punctuation
    Lower case
    Remove stop words
    Stemming

Slide 32: The HEPindexer project
Algorithm
Weight term - document
Weight keyword - document
Weight keyword - term
Similarity keyword - document

Slide 33: The HEPindexer project
Experiments

Slide 34: The HEPindexer project
Experiments
A: keywords proposed by DESY
B: keywords proposed by HEPindexer
A ∩ B
Keywords in the training collection

Slide 35: The HEPindexer project
Results
52.7 % precision
58.5 % recall
Response in 2 seconds

Slide 36: The HEPindexer project
Results

Slide 37: The HEPindexer project
Results

Slide 38: The HEPindexer project
Software
C++ / STL
UNIX
Command line interface
Digilib: Web interface (PHP) http://cern.ch/digilib
Installation on the CERN Document Server http://cds.cern.ch

Slide 39: The HEPindexer project
Software

Slide 40: The HEPindexer project
Software

Slide 41: The HEPindexer project
Software

Slide 42: Future Work
Automatic proposal of secondary keywords
Improve the algorithm (lemmatizer, multiwords, segmentation...)
Use of references to link documents based on common concepts
Specific algorithms for handling of energies, particle decays, disintegrations, etc.
Agents
OAI
Apply Semantic Web approaches
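
The preprocessing steps listed on Slide 31 and the comparison of keyword sets A and B on Slides 34 and 35 could be sketched roughly as below. This is only an illustration in Python, not the HEPindexer code (which the slides describe as C++ / STL). The stop-word list, the suffix list, the naive stemmer, and the function names are all assumptions made for the example; the slides do not say which stop words or stemming algorithm were used, and they do not give the weighting formulas, so those are left out.

```python
import string

# Hypothetical stop-word list; the slides do not say which list was used.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Hypothetical suffixes for a deliberately naive stemmer. It stands in for a
# real stemming algorithm, which the slides do not name.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply the four listed steps: punctuation, lower case, stop words, stemming."""
    # 1. Remove punctuation characters.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # 2. Convert to lower case and split on whitespace.
    tokens = no_punct.lower().split()
    # 3. Drop stop words, then 4. stem what remains.
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


def precision_recall(desy_keywords, proposed_keywords):
    """Compare set A (DESY keywords) with set B (automatically proposed keywords).

    Precision is |A ∩ B| / |B| and recall is |A ∩ B| / |A|.
    """
    a, b = set(desy_keywords), set(proposed_keywords)
    common = a & b
    precision = len(common) / len(b) if b else 0.0
    recall = len(common) / len(a) if a else 0.0
    return precision, recall
```

For instance, preprocess("The decays of B mesons are measured.") returns ['decay', 'b', 'meson', 'measur'], and a paper whose three DESY keywords include two of four proposed keywords gets a precision of 0.5 and a recall of 2/3. Note that stripping punctuation this bluntly would also alter hyphenated terms such as 'anti-p p'.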