Ceausu

Towards a Text Mining Driven Approach for Terminology Construction

Valentina Ceausu, Sylvie Desprès
CRIP 5, René Descartes University

Overview: 

Why a terminology of road accidents?

The terminology is exploited by a case-based reasoning (CBR) system:
- Case base (collection of source cases): created from accident scenarios, i.e. natural-language descriptions of sets of similar accidents written by road-safety experts.
- New problem (target case): created from accident reports written by police officers.

Scope and available resources:

- Scope: to compare cases created from accident reports with cases created from accident scenarios.
- Problem: scenarios and reports are written by different communities (road-safety experts and police officers).
- Available resources: a meta-model to represent accidents and an ontology of road accidents (built with Protégé-2000).
- Solution: create a terminology of road accidents from a set of accident reports.

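Knowledge extraction: pattern recognition algorithm:

- Available corpus: 250 reports of accidents in and around Lille.
- Goal: to extract knowledge from natural-language corpora by recognizing lexical patterns.
- Pattern: an association of lexical types, either nominal (Noun, Preposition, Noun) or verbal (Verb, Preposition, Noun).
- Input: corpora annotated with TreeTagger and Cordial.
- Output: a large number of word regroupings, which calls for refining approaches.

Extract of an accident report:

Le cycle de marque GO SPORT conduit par M XXXXXXXXXXXXXXXXld d'Auteuil, vient du carrefour des Anciens Combattants et se dirige vers l'ave Robert Schuman. Au niveau du N° 31 du dit boulevard le cycle s'arrête sur le côté droit du côté des num XXXXXXXXXXXXXXXXXe long des véhicules en stationnement se préparant à traverser vers le num XXXXXXXXXXXXXXXXXcycle et sur le passage piétons. Lorsque le cycle commence sa manoeuvre la voiture de marque Volkswagen N° 381 LTL 75 conduite par Me XXXXXXXXXXXXXXXXcule, vient et se dirige dans le même sens de progression que le cycle, heurte de son avant la roue arrière du vélo. Suite au choc le cycliste est blessé légèrement. Transport à l'hôpital A.Paré à Boulogne par les sapeurs pompiers locaux. Non admis. Le changement de direction sans précaution de la part du cycliste et la non maîtrise de son véhicule de la part de l'automobiliste semblent être à l'origine de l'accident.

(English translation: The GO SPORT bicycle ridden by M XXXXXXXXXXXXXXXXld d'Auteuil is coming from the Anciens Combattants crossroads and heading towards avenue Robert Schuman. At No. 31 of the said boulevard, the bicycle stops on the right-hand side, on the side of the num XXXXXXXXXXXXXXXXXe along the parked vehicles, preparing to cross towards the num XXXXXXXXXXXXXXXXXcycle and on the pedestrian crossing. When the bicycle begins its manoeuvre, the Volkswagen car No. 381 LTL 75 driven by Me XXXXXXXXXXXXXXXXcule, travelling in the same direction as the bicycle, strikes the rear wheel of the bicycle with its front. As a result of the impact, the cyclist is slightly injured. Taken to the A. Paré hospital in Boulogne by the local fire brigade. Not admitted. The cyclist's change of direction without precaution and the motorist's failure to control the vehicle appear to have caused the accident.)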

Lexical patterns and corresponding regroupings:

- (Noun, Noun): accident, agent (accident, police officer)
- (Noun, Preposition, Noun): usager de route (road user); groupe de piéton (group of pedestrians)
- (Noun, Preposition, Adjective): trottoir de droite (right-side pavement)
- (Verb, Preposition, Noun): diriger vers place (direct to square)
- (Verb, Preposition, Adjective): virer à gauche (turn left); virer à droite (turn right)
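A minimal sketch of how the lexical patterns above could be matched over a POS-tagged sentence. It assumes each token is a (lemma, tag) pair with simplified tags (NOUN, PREP, ADJ, VERB, DET). The real TreeTagger and Cordial tagsets differ, and the names `PATTERNS` and `find_regroupings` are hypothetical.

```python
# Hypothetical sketch: recognize lexical patterns in a POS-tagged sentence.
# Simplified tags (NOUN, PREP, ADJ, VERB, DET) are an assumption; TreeTagger
# and Cordial use their own, richer tagsets.
PATTERNS = {
    ("NOUN", "NOUN"): "nominal",
    ("NOUN", "PREP", "NOUN"): "nominal",
    ("NOUN", "PREP", "ADJ"): "nominal",
    ("VERB", "PREP", "NOUN"): "verbal",
    ("VERB", "PREP", "ADJ"): "verbal",
}


def find_regroupings(tagged, patterns=PATTERNS):
    """Return (regrouping, kind) for every window of tokens whose tags match a pattern.

    tagged: list of (lemma, tag) pairs for one sentence.
    Windows are scanned by pattern length (shortest first), then left to right.
    """
    results = []
    for length in sorted({len(p) for p in patterns}):
        for start in range(len(tagged) - length + 1):
            window = tagged[start:start + length]
            tags = tuple(tag for _, tag in window)
            if tags in patterns:
                results.append((" ".join(lemma for lemma, _ in window), patterns[tags]))
    return results


# "le piéton vire à gauche sur le trottoir de droite", lemmatized and tagged
sentence = [
    ("le", "DET"), ("piéton", "NOUN"), ("virer", "VERB"), ("à", "PREP"),
    ("gauche", "ADJ"), ("sur", "PREP"), ("le", "DET"), ("trottoir", "NOUN"),
    ("de", "PREP"), ("droite", "ADJ"),
]
print(find_regroupings(sentence))
# [('virer à gauche', 'verbal'), ('trottoir de droite', 'nominal')]
```

Each pattern is a tuple of tags, so matching a window reduces to a dictionary lookup. New patterns can be added without touching the scanning loop.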

Apriori algorithm (1/3):

- Association rule extraction (Agrawal & Srikant, 1994), adapted to text mining by Maedche & Staab (2000).
- Basic association rule algorithm:
  - a set of transactions, each transaction being a set of words, e.g. {véhicule, conducteur} ({vehicle, driver});
  - an association X => Y, where X and Y are word regroupings, e.g. X = conducteur (driver), Y = de véhicule (of vehicle).
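In the text-mining adaptation, transactions are built from the corpus. The slides do not state the granularity. The sketch below assumes one transaction per sentence, containing the lemmas of its nouns. The helper `to_transactions` and the simplified tags are hypothetical.

```python
# Hypothetical sketch: turn POS-tagged sentences into Apriori transactions.
# Assumption: one transaction per sentence, made of the lemmas whose tag is kept.
def to_transactions(tagged_sentences, keep_tags=frozenset({"NOUN"})):
    """Return one frozenset of lemmas per sentence, skipping sentences with no kept lemma."""
    transactions = []
    for sentence in tagged_sentences:
        items = frozenset(lemma for lemma, tag in sentence if tag in keep_tags)
        if items:  # an empty transaction carries no co-occurrence information
            transactions.append(items)
    return transactions


sentences = [
    [("le", "DET"), ("conducteur", "NOUN"), ("perdre", "VERB"), ("le", "DET"),
     ("contrôle", "NOUN"), ("de", "PREP"), ("son", "DET"), ("véhicule", "NOUN")],
    [("le", "DET"), ("cycliste", "NOUN"), ("être", "VERB"), ("blesser", "VERB")],
    [("non", "ADV"), ("admettre", "VERB")],
]
print(to_transactions(sentences))
```

Using frozensets makes each transaction an unordered set of words, which is what support counting needs. Choosing reports instead of sentences as transactions would only change the outer loop.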

Apriori algorithm (2/3):

- Linguistic rule: word co-occurrences.
- Quality measures: support and confidence, compared against thresholds defined by the user; an expert intervenes to select the threshold values.
- An association whose support and confidence exceed the user-defined thresholds is kept as an association rule.
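The slides do not spell out the two measures. The standard definitions (Agrawal & Srikant) are recalled below for a set of transactions T and an association X => Y. The counts in the worked example are invented for illustration only and are not results from the 250-report corpus.

```latex
\sigma(Z) = \left|\{\, t \in T : Z \subseteq t \,\}\right|

\mathrm{support}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{|T|}
\qquad
\mathrm{confidence}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}

% Hypothetical counts: |T| = 1000, sigma({conducteur}) = 200,
% sigma({conducteur, vehicule}) = 150
\mathrm{support}(\text{conducteur} \Rightarrow \text{véhicule}) = \frac{150}{1000} = 0.15
\qquad
\mathrm{confidence}(\text{conducteur} \Rightarrow \text{véhicule}) = \frac{150}{200} = 0.75
```

Suppose the expert set the support threshold to 0.1 and the confidence threshold to 0.5. Then conducteur => véhicule would be kept as an association rule in this hypothetical case.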

Apriori algorithm (3/3):

Steps of the Apriori algorithm:
1. Generate the set of associations (according to the patterns).
2. For each association, determine its support and its confidence.
3. Output as association rules the associations whose support and confidence exceed the user-defined thresholds.

Apriori output: véhicule, automobile (vehicle, car); volant, véhicule (steering wheel, vehicle); conducteur, véhicule (driver, vehicle); conducteur, camion (driver, truck); conducteur, cyclomoteur (driver, moped).

Output interpretation:
- Terms of the field: trottoir de droite (right-side pavement).
- Relations: conducteur, véhicule (driver, vehicle).
- Types of relations:
  - IS-A: véhicule, automobile (vehicle, car);
  - PART-OF: volant, véhicule (steering wheel, vehicle);
  - functional: conducteur, propriétaire (driver, owner); conducteur, véhicule (driver, vehicle);
  - particular form: conducteur, camion (driver, truck).
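The three steps can be sketched compactly when restricted to rules between single terms (one term on each side of X => Y). The sketch applies the Apriori pruning idea: candidate pairs are built only from terms that are frequent on their own. Thresholds are treated as inclusive here. The transactions and threshold values are hypothetical, since the slides do not give the values chosen by the expert. The function name `mine_pair_rules` is also hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) for single-term rules x => y.

    Support and confidence are fractions; a rule is kept when both reach
    the user-defined thresholds.
    """
    n = len(transactions)
    # Level 1: count single terms and keep the frequent ones.
    item_counts = Counter(term for t in transactions for term in set(t))
    frequent = {term for term, c in item_counts.items() if c / n >= min_support}

    # Step 1 (Apriori pruning): candidate associations are pairs of frequent terms.
    pair_counts = Counter()
    for t in transactions:
        pair_counts.update(combinations(sorted(set(t) & frequent), 2))

    rules = []
    for (a, b), count in sorted(pair_counts.items()):
        # Step 2: support of the pair, then confidence in both directions.
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            # Step 3: output only associations that reach both thresholds.
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


transactions = [
    {"conducteur", "véhicule"},
    {"conducteur", "véhicule", "volant"},
    {"conducteur", "camion"},
    {"véhicule", "automobile"},
    {"conducteur", "véhicule"},
]
for rule in mine_pair_rules(transactions, min_support=0.3, min_confidence=0.5):
    print(rule)
# ('conducteur', 'véhicule', 0.6, 0.75)
# ('véhicule', 'conducteur', 0.6, 0.75)
```

Raising `min_confidence` to 0.8 in this toy example removes both rules. This illustrates how strongly the output depends on the expert's choice of thresholds.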

Refining the set of verbal syntagms (1/4):

- Verbal syntagms: instances of verbal patterns.
- Identification of verb classes: a verb class is the set of regroupings generated by the same verb.
- Two-term regroupings: {diriger vers (direct to), venir de (come from)}.
- Three-term regroupings: instances of "Verb, Preposition, (Argument)" patterns, which extend two-term regroupings, e.g. venir de gauche (come from left); diriger vers infrastructure (direct to infrastructure).
- Three-term regroupings are very numerous and have an extremely fine level of granularity.
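One possible representation of verb classes assumes that verbal regroupings arrive as tuples (verb, preposition) or (verb, preposition, argument). The helper `build_verb_classes` is hypothetical. It records each three-term regrouping together with the two-term regrouping it extends.

```python
from collections import defaultdict


def build_verb_classes(regroupings):
    """Group verbal regroupings by verb.

    regroupings: tuples (verb, prep) or (verb, prep, argument).
    Returns {verb: {"two_term": set of (verb, prep),
                    "three_term": set of (verb, prep, argument)}}.
    """
    classes = defaultdict(lambda: {"two_term": set(), "three_term": set()})
    for regrouping in regroupings:
        verb, prep = regrouping[0], regrouping[1]
        # Every three-term regrouping extends the two-term one (verb, prep).
        classes[verb]["two_term"].add((verb, prep))
        if len(regrouping) == 3:
            classes[verb]["three_term"].add(regrouping)
    return dict(classes)


classes = build_verb_classes([
    ("venir", "de"),
    ("venir", "de", "gauche"),
    ("venir", "par", "droite"),
    ("diriger", "vers", "infrastructure"),
])
print(sorted(classes["venir"]["two_term"]))
# [('venir', 'de'), ('venir', 'par')]
```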

Refining the set of verbal syntagms (2/4):

Using a domain model to refine the set of verbal syntagms:
- The extensions of three-term associations can be organized into homogeneous lists:
  - Direction (direction): droite (right), gauche (left), devant (in front of);
  - Lieu (place): usine (factory), parc (park), domicile (home);
  - Humain (human): enfant (child), piéton (pedestrian), personne (person).
- Each list is associated with a concept of the ontology of road accidents, previously created from expert knowledge.
- Lists are assigned to concepts manually.
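The homogeneous lists can be stored as a mapping from concept to terms and then inverted into a term-to-concept index for lookup. Only the three example lists from the slide are included. The helper name `build_index` is an assumption, as is the choice to reject a term assigned to two concepts (so that the lists stay homogeneous).

```python
# Example lists from the slide; real lists are built and assigned to
# ontology concepts manually.
CONCEPT_LISTS = {
    "direction": {"droite", "gauche", "devant"},
    "lieu": {"usine", "parc", "domicile"},
    "humain": {"enfant", "piéton", "personne"},
}


def build_index(concept_lists):
    """Invert {concept: terms} into {term: concept}, refusing ambiguous terms."""
    index = {}
    for concept, terms in concept_lists.items():
        for term in terms:
            if term in index and index[term] != concept:
                raise ValueError(f"{term!r} assigned to {index[term]!r} and {concept!r}")
            index[term] = concept
    return index


index = build_index(CONCEPT_LISTS)
print(index["piéton"])  # humain
print(index.get("12"))  # None: "12" belongs to no list
```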

Refining the set of verbal syntagms (3/4), venir (to come) class:

- Regroupings in the class: venir de hau bourdin (come from hau bourdin), venir de i (come from i), venir de abbaye (come from abbey), venir de résidence (come from residence), venir de rue (come from street), venir de gauche (come from left), venir par (come by), venir par droite (come by right), venir vers enfant (come towards child).
- After refinement (noise and individual instances are eliminated): venir de lieu (come from place), venir de infrastructure (come from infrastructure), venir de direction (come from direction), venir par direction (come by direction), venir vers humain (come towards human).

Refining the set of verbal syntagms (4/4):

- The number of three-term regroupings decreases, because many arguments are assigned to the same concept.
- Parasitic regroupings and noise are eliminated, because the created lists do not contain terms outside the field: in « diriger vers 12 » (direct to 12), "12" is not included in any list.
- Drawback: valuable regroupings are eliminated if the created lists are incomplete.
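The effect of the refinement on a set of three-term regroupings can be sketched as follows. Each argument is replaced by the concept of its list, and a regrouping whose argument belongs to no list is dropped. The index below is a hypothetical excerpt of the slide's lists. Because it lacks an infrastructure list, venir de rue is dropped too, which illustrates the drawback of incomplete lists. The function name `generalize` is hypothetical.

```python
# Hypothetical excerpt of the manually built term-to-concept index.
CONCEPT_INDEX = {
    "droite": "direction", "gauche": "direction", "devant": "direction",
    "usine": "lieu", "parc": "lieu", "domicile": "lieu",
    "enfant": "humain", "piéton": "humain", "personne": "humain",
}


def generalize(three_term, index=CONCEPT_INDEX):
    """Replace arguments by concepts; return (generalized set, dropped regroupings)."""
    kept, dropped = set(), []
    for verb, prep, argument in three_term:
        concept = index.get(argument)
        if concept is None:
            # Noise, or a valuable term missing from incomplete lists.
            dropped.append((verb, prep, argument))
        else:
            # Several arguments collapse onto one (verb, prep, concept) regrouping.
            kept.add((verb, prep, concept))
    return kept, dropped


regroupings = [
    ("diriger", "vers", "12"), ("venir", "de", "gauche"), ("venir", "de", "droite"),
    ("venir", "de", "usine"), ("venir", "de", "domicile"),
    ("venir", "vers", "enfant"), ("venir", "de", "rue"),
]
kept, dropped = generalize(regroupings)
print(len(regroupings), "->", len(kept))  # 7 -> 3
print(dropped)  # [('diriger', 'vers', '12'), ('venir', 'de', 'rue')]
```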

Text mining driven terminology construction: 

Linguistic analysis: integrating text mining results:

- Input of the linguistic analysis phase: Syntex and Cordial output.
- Goal of this phase: selection of domain terms and identification of lexical relations.
- Difficulties: manual processing is difficult for large corpora, and no information is available to guide the selection.
- Solution: integrate the Apriori results into the selection of terms and the identification of lexical relations.
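The slides do not state the exact criterion used to integrate Apriori results into the linguistic analysis. One plausible reading is sketched here under that assumption. It keeps only the Syntex/Cordial candidate terms that appear in at least one association rule, and it proposes the rules linking two kept terms as lexical relations. The helper `select_terms` and the sample data are hypothetical.

```python
def select_terms(candidate_terms, rules):
    """Hypothetical filter guided by association rules.

    candidate_terms: iterable of candidate terms (e.g. from Syntex/Cordial).
    rules: iterable of (x, y) pairs produced by association-rule mining.
    Returns (selected terms, rules whose two sides are both selected).
    """
    candidates = set(candidate_terms)
    # Terms mentioned on either side of at least one rule.
    supported = {term for rule in rules for term in rule}
    selected = candidates & supported
    relations = [(x, y) for x, y in rules if x in selected and y in selected]
    return selected, relations


candidates = ["conducteur", "véhicule", "volant", "hôpital"]
rules = [("conducteur", "véhicule"), ("volant", "véhicule"), ("conducteur", "camion")]
selected, relations = select_terms(candidates, rules)
print(relations)  # [('conducteur', 'véhicule'), ('volant', 'véhicule')]
```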

Linguistic analysis: 

Normalization phase: integrating text mining results:

- Input of the normalization phase: the previously selected terms and the lexical relations between them.
- Goal: definition of terminological concepts and modeling of semantic relations.
- Difficulty: no information is available for semantic relations.
- Solution: integrate the lexical relations, the previously identified verb classes, and the non-taxonomic relations provided by Apriori.
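Previously identified verb classes could feed semantic relation modeling in the following way. Each generalized regrouping (verb, preposition, concept) becomes a candidate relation named after the verb and preposition, whose range is the set of concepts it points to. The naming scheme and the helper `candidate_relations` are assumptions. The domain concept of each relation is not inferred here.

```python
from collections import defaultdict


def candidate_relations(generalized):
    """Map 'verb_prep' to the set of ontology concepts it links to."""
    relations = defaultdict(set)
    for verb, prep, concept in generalized:
        relations[f"{verb}_{prep}"].add(concept)
    return dict(relations)


generalized = [
    ("venir", "de", "lieu"), ("venir", "de", "direction"),
    ("venir", "par", "direction"), ("venir", "vers", "humain"),
    ("diriger", "vers", "lieu"),
]
relations = candidate_relations(generalized)
print(sorted(relations["venir_de"]))  # ['direction', 'lieu']
```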

Formalization phase: integrating text mining results: 

Conclusion:

- A semi-automatic approach to building a terminology, in which the construction process is supported by text mining results:
  - association rule results guide the selection of terms;
  - lexical patterns improve the work with the Linguae module;
  - non-taxonomic relations are identified.
- The results obtained are more general, for example:
  - Syntex output: SE DIRIGER vers la Commune de Wahagnies (direct to Wahagnies village);
  - text mining output: diriger vers lieu (direct to a place).
- Semantic relation modeling is guided by the verbs of the domain (Apriori output).

Future work:

- Tools for the pre-processing phase: definition and identification of syntactic patterns.
- New heuristics to generate associations.
- Other quality measures to rank the extracted rules.
- Towards an automatic approach to assigning lists of terms to ontology concepts.
- Towards identifying functional and structural properties.

Thank you

ceausu@math-info.univ-paris5.fr