14 EVALITA NER intro Speranza


Presentation Transcript

Slide1: 

EVALITA 2007 The Named Entity Recognition Task Manuela Speranza, FBK-irst

Outline: 

Outline
Named Entity Recognition at EVALITA 2007
  Introduction to the task
  Participants
  Evaluation
    Dataset
    Metrics
  Results
    Ranking
    Discussion
  Conclusion
EVALITA 2007 Workshop, Rome, September 10, 2007

Introduction to the NER Task: 

Introduction to the NER Task
Task: Recognize Named Entities in Italian newspaper articles
Four types of Named Entities:
  Geo-Political Entities (GPE): e.g. Italy
  Location Entities (LOC): e.g. Tevere
  Organization Entities (ORG): e.g. FIAT
  Person Entities (PER): e.g. Napolitano
Based on the ACE Entity Recognition and Normalization Task
Adaptations from ACE:
  limit the task to the recognition of Named Entities
  adapt it to Italian
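
To make the four entity types concrete, here is a minimal Python sketch that encodes entity spans as per-token tags. It assumes an IOB2-style encoding (B- for the first token of an entity, I- for the following ones, O elsewhere), which the slides do not specify. The sentence is invented for illustration, and the helper name to_iob2 is hypothetical.

```python
from typing import List, Tuple

# The four entity types named on the slide.
ENTITY_TYPES = ("GPE", "LOC", "ORG", "PER")

# A span is (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]


def to_iob2(num_tokens: int, spans: List[Span]) -> List[str]:
    """Turn entity spans into one IOB2 tag per token (assumed encoding)."""
    tags = ["O"] * num_tokens
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):     # remaining tokens, if any
            tags[i] = f"I-{etype}"
    return tags


# Invented example sentence, not taken from I-CAB.
tokens = ["Giorgio", "Napolitano", "visita", "la", "FIAT", "vicino", "al", "Tevere"]
spans = [(0, 2, "PER"), (4, 5, "ORG"), (7, 8, "LOC")]
tags = to_iob2(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each token gets exactly one tag, so a multi-token name such as "Giorgio Napolitano" is marked as a single PER entity.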

Participants: 

Participants
In the NER Task we had six participants:
  FBK-irst, Trento (FBKirst_Zanoli_NER)
  LDC, University of Pennsylvania (LDC_Walker_NER)
  University of Alicante (UniAli_Kozareva_NER)
  University of Dortmund (UniDort_Jungermann_NER)
  University of Duisburg-Essen (UniDuE_Roessler_NER)
  Yahoo, Barcelona (Yahoo_Ciaramita_NER)
Only one participant was an Italian institution; two came from Spain, two from Germany, and one from the USA.

Evaluation Dataset: I-CAB (i): 

Evaluation Dataset: I-CAB (i)
525 news stories from the Italian local newspaper “L’Adige”
4 days:
  7-8 September 2004
  7-8 October 2004
5 categories:
  News Stories
  Cultural News
  Economic News
  Sports News
  Local News
Two sections:
  training (335 news stories)
  test (190 news stories)
Number of words = 182,500
Average number of words per file = 348

Evaluation Dataset: I-CAB (ii): 

Evaluation Dataset: I-CAB (ii)

Evaluation of Results: 

Evaluation of Results
Scorer: CoNLL Shared Task 2002
Metrics: Precision (Pr.), Recall (Re.), and F-Measure (FB1)
Official ranking is based on FB1
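
The three metrics can be illustrated with a small Python sketch. It is not the official CoNLL scorer: it assumes IOB2-tagged input and counts a predicted entity as correct only when both its boundaries and its type match the gold annotation. The function names and the toy tag sequences are hypothetical.

```python
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end exclusive, type)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a sequence of IOB2 tags."""
    entities: Set[Entity] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        # Open a new entity on B-, or on a stray I- with no open entity.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return entities


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F-measure with beta = 1 (FB1)."""
    gold_entities = extract_entities(gold)
    pred_entities = extract_entities(pred)
    correct = len(gold_entities & pred_entities)
    precision = correct / len(pred_entities) if pred_entities else 0.0
    recall = correct / len(gold_entities) if gold_entities else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: the system mislabels one GPE as LOC.
gold = ["B-PER", "I-PER", "O", "B-GPE", "O", "B-ORG"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG"]
pr, re, fb1 = precision_recall_f1(gold, pred)
print(f"Pr. = {pr:.3f}  Re. = {re:.3f}  FB1 = {fb1:.3f}")
```

In the toy data two of the three predicted entities match the gold ones exactly, so precision, recall, and FB1 all come out at 2/3. A wrong type on a correctly bounded entity counts as both a false positive and a false negative.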

Official Ranking: 

Official Ranking

Discussion: 

Discussion

Conclusions: 

Conclusions
Good interest from the community:
  14 initial registrations
  6 participants (though only one Italian institution)
  Relatively high abandonment rate (8 of 14, about 57%)
Good performance:
  best system at CoNLL: 88.8% for English, 72.4% for German
  best system at EVALITA: 82.1%

Slide15: 

Thanks to all who participated

References: 

References
ACE. http://www.nist.gov/speech/tests/ace/index.htm
CoNLL. http://www.cnts.ua.ac.be/conll2002/ner/
L’Adige. http://www.ladige.it/
Linguistic Data Consortium (LDC). Automatic Content Extraction English Annotation Guidelines for Entities, version 5.6.1, 2005.05.23. http://projects.ldc.upenn.edu/ace/docs/English-Entities-Guidelines_v5.6.1.pdf
Magnini, Cappelli, Pianta, Speranza, Bartalesi Lenzi, Sprugnoli, Romano, Girardi, Negri. Annotazione di contenuti concettuali in un corpus italiano: I-CAB. In Proceedings of SILFI 2006, X Congresso Internazionale della Società di Linguistica e Filologia Italiana, Firenze, 14-17 giugno 2006.
Magnini, Pianta, Speranza, Bartalesi Lenzi, Sprugnoli. Italian Content Annotation Bank (I-CAB): Named Entities. Technical report, ITC-irst, 2007. http://evalita.itc.it/tasks/I-CAB-Report-Named-Entities.pdf
ONTOTEXT. http://ontotext.itc.it/