nodalida99

Added: February 07, 2008
Presentation Transcript

Slide1 : Adapting an English Information Extraction System to Swedish
  Kristofer Franzén
  Information and Language Engineering Group
  Human Computer Interaction and Language Engineering Laboratory
  Swedish Institute of Computer Science
  franzen@sics.se


Overview :
  Information Extraction
  The Proteus system at NYU
  Changes made to the system
  Experiment results
  The SICS IE system


Information Extraction? : Capturing predefined events or relations in texts.
Example scenarios:
  Changes in corporate executive management personnel: capture all information about people changing jobs in senior positions at companies.
  Aircraft accidents: capture all information about flight, airline, accident location, aircraft model, flight origin, flight destination, accident casualties, etc.


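Template filling :
Source text (Swedish):
  Karo Bio. Per-Olof Mårtensson har åter utsetts till VD efter att sedan förra våren ha varit ordförande. Mårtensson efterträds på ordförandeposten av Bertil Hållsten, tidigare chef för SE-Bankens läkemedelsfonder.
English translation:
  Karo Bio. Per-Olof Mårtensson has again been appointed CEO (VD) after having served as chairman (ordförande) since last spring. Mårtensson is succeeded as chairman by Bertil Hållsten, formerly head (chef) of SE-Banken's pharmaceutical funds.
Filled templates:
  POSITION    VD
  COMPANY     Karo Bio
  IN-PERSON   Per-Olof Mårtensson

  POSITION    ordförande
  COMPANY     Karo Bio
  IN-PERSON   Bertil Hållsten
  OUT-PERSON  Per-Olof Mårtensson

  POSITION    chef
  COMPANY     SE-Bankens läkemedelsfonder
  OUT-PERSON  Bertil Hållsten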
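The three filled templates can also be written down as records. The following is a minimal sketch in Python, not code from the Proteus system: the class name `SuccessionEvent` and its field names are hypothetical, chosen only to mirror the slot names POSITION, COMPANY, IN-PERSON and OUT-PERSON on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SuccessionEvent:
    """One filled template. Hypothetical class; slots mirror the slide."""
    position: str
    company: str
    in_person: Optional[str] = None   # person taking up the position, if stated
    out_person: Optional[str] = None  # person leaving the position, if stated


# The three templates extracted from the Karo Bio text.
events: List[SuccessionEvent] = [
    SuccessionEvent("VD", "Karo Bio", in_person="Per-Olof Mårtensson"),
    SuccessionEvent("ordförande", "Karo Bio",
                    in_person="Bertil Hållsten",
                    out_person="Per-Olof Mårtensson"),
    SuccessionEvent("chef", "SE-Bankens läkemedelsfonder",
                    out_person="Bertil Hållsten"),
]


def people_leaving(evts: List[SuccessionEvent]) -> List[str]:
    """Collect every OUT-PERSON filler, in template order."""
    return [e.out_person for e in evts if e.out_person is not None]
```

Leaving a slot as `None` reflects that the text does not say who, for example, previously held the VD position; the sketch does not try to infer missing fillers.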


General system architecture : Local text analysis General system architecture Discourse analysis


Incremental pattern matching : …and lexical generalization
  Totte Boll, tidigare VD i Eckym Ropos Inc., har utsetts till …
  (English: "Totte Boll, formerly CEO of Eckym Ropos Inc., has been appointed …")
  [person]name , tidigare [position]np i [company]name , [utse]vg-pass till …
  [person]np , [position-in-company]np , [utse]vg-pass till …
  [person]np-entity , [appoint]vg-pass till …
which would match the beginning of the following event pattern:
  np-entity(person) vg(appoint, voice=pass) 'till' np(position) ('av' np(company))?


Incremental pattern matching : …and lexical generalization
  Totte Boll, tidigare VD i Eckym Ropos Inc., har utsetts till …
  [person]name , tidigare [position]np i [company]name , [utse]vg till …
      features: string="Totte Boll"; tense=perf voice=pass
  [person]name , [post-in-company]np , [utse]vg till …
      features: … tense=former … post="VD" org="Eckym Ropos Inc."
  [person]np [appoint]vg-pass till …
      features: outPos=[post-in-company] ... ...
which would match the beginning of the following event pattern:
  np(person) vg(appoint, voice=pass) 'till' np(position) ('av' np(company))?
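The cascade on these two slides can be imitated with a few rewrite rules over a sequence of tagged elements. This is only a toy sketch under stated assumptions, not the Proteus matcher: the function names `rewrite` and `run_stage`, the tuple encoding `(label, category)`, and the two rule stages are all invented here to reproduce the feature-free version of the example.

```python
from typing import List, Sequence, Tuple, Union

# An element is either a tagged constituent (label, category) or a literal token.
Element = Union[Tuple[str, str], str]
Rule = Tuple[List[Element], List[Element]]  # (left-hand side, right-hand side)


def rewrite(seq: Sequence[Element], lhs: List[Element],
            rhs: List[Element]) -> List[Element]:
    """Replace every occurrence of the non-empty sub-sequence lhs with rhs."""
    out: List[Element] = []
    i, n = 0, len(lhs)
    while i < len(seq):
        if list(seq[i:i + n]) == lhs:
            out.extend(rhs)
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out


def run_stage(seq: Sequence[Element], rules: List[Rule]) -> List[Element]:
    """Apply the rules of one stage in order; later stages see the result."""
    result = list(seq)
    for lhs, rhs in rules:
        result = rewrite(result, lhs, rhs)
    return result


# Output of name and verb-group recognition for the example sentence.
tagged: List[Element] = [
    ("person", "name"), ",", "tidigare", ("position", "np"), "i",
    ("company", "name"), ",", ("utse", "vg-pass"), "till",
]

# Stage 1 (hypothetical): build larger noun phrases.
stage1: List[Rule] = [
    ([("person", "name")], [("person", "np")]),
    (["tidigare", ("position", "np"), "i", ("company", "name")],
     [("position-in-company", "np")]),
]

# Stage 2 (hypothetical): fold the apposition into the person entity and
# generalize the Swedish verb "utse" to the concept "appoint".
stage2: List[Rule] = [
    ([("person", "np"), ",", ("position-in-company", "np"), ","],
     [("person", "np-entity"), ","]),
    ([("utse", "vg-pass")], [("appoint", "vg-pass")]),
]

after_stage1 = run_stage(tagged, stage1)
after_stage2 = run_stage(after_stage1, stage2)
```

Each stage only sees the output of the previous one, which is the point of the incremental approach: the final event pattern has to mention `appoint` and a person entity, not the individual Swedish words. The sketch drops the feature structures (string, tense, voice, outPos) shown on the second slide.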


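Syntactic generalization : Metarules transform patterns to capture all syntactic variations.
  Assam Pärks styrelse har utsett Totte Boll till ny styrelseordförande.
    (The board of Assam Pärks has appointed Totte Boll as new chairman of the board.)
  Totte Boll utses i morgon till ny styrelseordförande i Assam Pärks.
    (Totte Boll will be appointed new chairman of the board of Assam Pärks tomorrow.)
  Totte Boll, tidigare VD i Eckym Ropos Inc., har utsetts till styrelseordförande i Assam Pärks.
    (Totte Boll, formerly CEO of Eckym Ropos Inc., has been appointed chairman of the board of Assam Pärks.)
  Totte Boll, utsedd till ny styrelseordförande i Assam Pärks, är en glad lax.
    (Totte Boll, appointed new chairman of the board of Assam Pärks, is a cheerful fellow.)
  Totte Boll, som utsågs till styrelseordförande i Assam Pärks igår, har framtiden för sig.
    (Totte Boll, who was appointed chairman of the board of Assam Pärks yesterday, has the future ahead of him.)
  Assam Pärks nye styrelseordförande Totte Boll skyr inga medel.
    (Assam Pärks's new chairman of the board, Totte Boll, stops at nothing.)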
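One way to picture a metarule is as a function that takes a single clause description and emits one pattern per syntactic variant. The sketch below is an illustration only: the function `expand_clause_pattern`, the variant names, and the `voice=act` and `form=participle` notations are assumptions made here. Only the passive pattern follows the event-pattern notation shown on the pattern matching slides.

```python
from typing import Dict


def expand_clause_pattern(agent: str, verb: str, patient: str,
                          prep: str, goal: str) -> Dict[str, str]:
    """Return pattern strings for several variants of one clause.

    Hypothetical metarule set; the slides do not show the actual
    Proteus metarules or their output.
    """
    return {
        # "Assam Pärks styrelse har utsett Totte Boll till ..."
        "active": f"np({agent}) vg({verb}, voice=act) np({patient}) "
                  f"'{prep}' np({goal})",
        # "Totte Boll ... har utsetts till ..."
        "passive": f"np({patient}) vg({verb}, voice=pass) "
                   f"'{prep}' np({goal}) ('av' np({agent}))?",
        # "Totte Boll, som utsågs till ..."
        "relative": f"np({patient}) ',' 'som' vg({verb}, voice=pass) "
                    f"'{prep}' np({goal})",
        # "Totte Boll, utsedd till ..."
        "reduced_relative": f"np({patient}) ',' vg({verb}, form=participle) "
                            f"'{prep}' np({goal})",
    }


variants = expand_clause_pattern("company", "appoint", "person",
                                 "till", "position")
```

With such a scheme the scenario author would write the clause once and get the variants for free. The nominal variant in the last example sentence (the new chairman, Totte Boll) is not covered by this sketch.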


Changes made to the system :
  Lexical analysis
  Input format
  Rule predicates
  Domain and task independent patterns
  Knowledge bases
  Scenario specific patterns


Swedish management succession :
  Training corpus   34 news articles (51 events)   F-score 55
  Test corpus       50 news articles (87 events)   F-score 25
  MUC-6 systems                                    F-score 48-56
  Proteus today                                    F-score 65
F-score = (2 * precision * recall) / (precision + recall)
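The F-score formula on this slide is straightforward to compute. A minimal sketch follows; the function name `f_score` and the example precision and recall values are illustrative assumptions, since the slide reports only the resulting F-scores and not the underlying precision and recall.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: (2 * precision * recall) / (precision + recall).

    Returns 0.0 when both inputs are zero, to avoid dividing by zero.
    """
    if precision + recall == 0:
        return 0.0
    return (2 * precision * recall) / (precision + recall)


# Illustrative values only, on the same 0-100 scale as the slide.
example = f_score(60.0, 50.0)
```

Because the measure is a harmonic mean, it is pulled toward the lower of the two values, so a system cannot reach a high F-score by maximizing precision or recall alone.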


Possible reasons :
  Overtraining (overfitting)
  Mismatching interpretation of the template filling rules
  System design (scenario and linguistic specifics too integrated in the core system)


Conclusions :
  Linguistic differences were not a problem
  Pragmatic differences were not a problem
  A complex system is not easy to reconfigure
  Compiling a test corpus is difficult


SICS IE system :
Goal
  A domain-, language- and platform-independent, modular, open and free IE system. Within one year ;-)
What we have
  A general annotation-based (TIPSTER) infrastructure for a document processing system, and some low-level pattern bases.
Next to do
  A pattern matching language conforming to the Common Pattern Specification Language (CPSL) definition
  A set of scenario-independent pattern libraries
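An annotation-based infrastructure keeps the document text unchanged and stores analysis results as separate annotations that point into it by character offsets. The sketch below shows that general idea only. It is not the TIPSTER API or the SICS implementation: the class names `Annotation` and `Document`, the methods `annotate` and `text_of`, and the attribute `semantic` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    """A typed span over the document text, with free-form attributes."""
    kind: str
    start: int  # offset of the first character of the span
    end: int    # offset one past the last character of the span
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    """Raw text plus stand-off annotations; the text itself is never edited."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, kind: str, surface: str, **attributes: str) -> Annotation:
        """Annotate the first occurrence of surface (raises ValueError if absent)."""
        start = self.text.index(surface)
        ann = Annotation(kind, start, start + len(surface), dict(attributes))
        self.annotations.append(ann)
        return ann

    def text_of(self, ann: Annotation) -> str:
        """Return the stretch of text an annotation covers."""
        return self.text[ann.start:ann.end]


doc = Document("Totte Boll har utsetts till styrelseordförande i Assam Pärks.")
person = doc.annotate("name", "Totte Boll", semantic="person")
company = doc.annotate("name", "Assam Pärks", semantic="company")
```

Since every module reads and writes annotations rather than rewriting the text, components such as a name recognizer or a pattern matcher can be swapped independently, which fits the modularity goal stated on the slide.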