Presentation Transcript (added: November 16, 2007)

Towards a Text Mining Driven Approach for Terminology Construction
Valentina Ceausu, Sylvie Desprès
CRIP 5, René Descartes University

Overview

Why a terminology of road accidents?
- Exploited by a case-based reasoning (CBR) system
- Case base (collection of source cases)
  - Created from accident scenarios
  - Accident scenarios: natural language descriptions of sets of similar accidents
  - Created by experts in road safety
- New problem (target case)
  - Created from accident reports
  - Accident reports are written by police officers

Scope and available resources
- Scope
  - To compare cases created from accident reports with cases created from accident scenarios
  - Problem: scenarios and reports are created by different communities
- Available resources
  - Meta-model to represent accidents
  - Ontology of road accidents (Protege 2000)
- To solve the problem
  - Create a terminology of road accidents from a set of accident reports

Knowledge extraction: pattern recognition algorithm
- Available corpus: 250 reports of accidents in and around Lille
- Goal: to extract knowledge from natural language corpora
- Recognition of lexical patterns
  - Pattern: an association of lexical types
    - Nominal (Noun, Preposition, Noun)
    - Verbal (Verb, Preposition, Noun)
  - Input: annotated corpora (TreeTagger, Cordial)
  - Output: a large number of word regroupings
- Refining approaches

Extract of an accident report
Le cycle de marque GO SPORT conduit par M XXXXXXXXXXXXXXXXld d'Auteuil, vient du carrefour des Anciens Combattants et se dirige vers l'ave Robert Schuman. Au niveau du N° 31 du dit boulevard le cycle s'arrête sur le côté droit du côté des num XXXXXXXXXXXXXXXXXe long des véhicules en stationnement se préparant à traverser vers le num XXXXXXXXXXXXXXXXXcycle et sur le passage piétons. Lorsque le cycle commence sa manoeuvre la voiture de marque Volkswagen N° 381 LTL 75 conduite par Me XXXXXXXXXXXXXXXXcule, vient et se dirige dans le même sens de progression que le cycle, heurte de son avant la roue arrière du vélo. Suite au choc le cycliste est blessé légèrement. Transport à l'hôpital A.Paré à Boulogne par les sapeurs pompiers locaux. Non admis. Le changement de direction sans précaution de la part du cycliste et la non maîtrise de son véhicule de la part de l'automobiliste semblent être à l'origine de l'accident.

English translation: The GO SPORT bicycle ridden by M XXXXXXXXXXXXXXXXld d'Auteuil comes from the Anciens Combattants junction and heads towards ave Robert Schuman. At N° 31 of the said boulevard, the bicycle stops on the right-hand side, on the side of the num XXXXXXXXXXXXXXXXXe along the parked vehicles, preparing to cross towards num XXXXXXXXXXXXXXXXXcycle and on the pedestrian crossing. As the bicycle begins its manoeuvre, the Volkswagen car N° 381 LTL 75 driven by Me XXXXXXXXXXXXXXXXcule, travelling in the same direction as the bicycle, strikes the rear wheel of the bicycle with its front. Following the impact, the cyclist is slightly injured. Transported to A.Paré hospital in Boulogne by the local fire brigade. Not admitted. The cyclist's careless change of direction and the motorist's failure to control their vehicle appear to be the cause of the accident.
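The recognition of lexical patterns described under "Knowledge extraction" can be sketched as a sliding window over part-of-speech tagged lemmas. This is only an illustration under stated assumptions: the coarse tag names (`NOUN`, `VERB`, `PREP`), the function `extract_regroupings`, and the hand-tagged, simplified sentence are hypothetical and do not reproduce the actual TreeTagger or Cordial output or the authors' implementation.

```python
from collections import Counter

# Hypothetical coarse tags; real TreeTagger or Cordial tag sets differ.
PATTERNS = [
    ("NOUN", "PREP", "NOUN"),  # nominal pattern from the slides
    ("VERB", "PREP", "NOUN"),  # verbal pattern from the slides
]


def extract_regroupings(tagged_sentence, patterns=PATTERNS):
    """Return the lemma tuples whose tag sequence matches one of the patterns."""
    found = []
    for pattern in patterns:
        n = len(pattern)
        # Slide a window of the pattern's length over the sentence.
        for i in range(len(tagged_sentence) - n + 1):
            window = tagged_sentence[i:i + n]
            if tuple(tag for _, tag in window) == pattern:
                found.append(tuple(lemma for lemma, _ in window))
    return found


# Hand-tagged, simplified sentence (determiners removed), for illustration only.
sentence = [
    ("cycle", "NOUN"), ("venir", "VERB"), ("de", "PREP"), ("carrefour", "NOUN"),
    ("et", "CONJ"), ("diriger", "VERB"), ("vers", "PREP"), ("avenue", "NOUN"),
]

counts = Counter(extract_regroupings(sentence))
for regrouping, count in counts.items():
    print(" ".join(regrouping), count)
```

Counting the matches over a whole corpus would give the raw set of word regroupings that the later refinement steps work on.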
Lexical patterns and corresponding regroupings
- Noun, Noun: accident, agent (accident, policeman)
- Noun, Preposition, Noun: usager de route (road user); groupe de piéton (group of pedestrians)
- Noun, Preposition, Adjective: trottoir de droite (right side pavement)
- Verb, Preposition, Noun: diriger vers place (direct to square)
- Verb, Preposition, Adjective: virer à gauche (turn left); virer à droite (turn right)

Apriori algorithm (1/3)
- Association rules extraction (Agrawal & Srikant, 1994)
- Adaptation to text mining: Maedche & Staab, 2000
- Basic association rules algorithm
  - Set of transactions
  - Set of words, e.g. {véhicule, conducteur} (vehicle, driver)
  - Association (X => Y), where X and Y are word regroupings
    - X = conducteur (driver)
    - Y = de véhicule (of vehicle)

Apriori algorithm (2/3)
- Linguistic rule: word co-occurrences
- Quality measures: support and confidence
  - Thresholds defined by the user
  - Intervention of an expert to select threshold values
- Support and confidence exceed the user-defined thresholds => association rule

Apriori algorithm (3/3)
- Steps of the Apriori algorithm:
  - Generate the association set (according to patterns)
  - For each association: determine support, determine confidence
  - Output the association rules that exceed the user-defined confidence and support thresholds
- Apriori output:
  - véhicule, automobile (vehicle, car)
  - volant, véhicule (steering wheel, vehicle)
  - conducteur, véhicule (driver, vehicle)
  - conducteur, camion (driver, truck)
  - conducteur, cyclomoteur (driver, moped)
- Output interpretation:
  - Terms of the field: trottoir de droite (right side pavement)
  - Relations: conducteur, véhicule (driver, vehicle)
  - Types of relations:
    - IS-A: véhicule, automobile (vehicle, car)
    - PART-OF: volant, véhicule (steering wheel, vehicle)
    - Functional: conducteur, propriétaire (driver, owner); conducteur, véhicule (driver, vehicle)
    - Particular form: conducteur, camion (driver, truck)
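The support and confidence filtering named in the Apriori slides can be illustrated with the standard definitions: support(X => Y) is the fraction of transactions containing both X and Y, and confidence(X => Y) is that count divided by the number of transactions containing X. The slides do not spell out these formulas, so they are assumed here. The sketch below is a simplified pairwise version, not the full level-wise Apriori candidate generation, and the function name `association_rules`, the toy transactions, and the threshold values are all invented for illustration.

```python
from itertools import permutations


def association_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) for rules x => y above both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        count_x = sum(1 for t in transactions if x in t)
        count_xy = sum(1 for t in transactions if x in t and y in t)
        support = count_xy / n           # share of transactions with x and y
        confidence = count_xy / count_x  # share of x-transactions that also hold y
        # "Exceed" is read as a strict inequality here (an assumption).
        if support > min_support and confidence > min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Toy transactions, invented for illustration (not taken from the corpus).
transactions = [
    {"conducteur", "véhicule"},
    {"conducteur", "véhicule", "volant"},
    {"conducteur", "camion"},
    {"piéton", "trottoir"},
]

rules = association_rules(transactions, min_support=0.4, min_confidence=0.6)
for x, y, support, confidence in rules:
    print(f"{x} => {y}: support={support:.2f}, confidence={confidence:.2f}")
```

With these toy values only the pair conducteur / véhicule survives, in both directions. Deciding what kind of relation a surviving pair expresses (IS-A, PART-OF, functional) remains a manual interpretation step, as in the slides.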
Refining the set of verbal syntagms (1/4)
- Verbal syntagms: instances of verbal patterns
- Verb class identification
  - Class of verbs: a set of regroupings generated by the same verb
- Two-term regroupings: {diriger vers (direct to), venir de (come from)}
- Three-term regroupings
  - Instances of "Verb, Preposition, (Argument)" patterns
  - Extensions of two-term regroupings: venir de gauche (come from left); diriger vers infrastructure (direct to infrastructure)
  - Large number of three-term regroupings
  - Extremely fine level of granularity

Refining the set of verbal syntagms (2/4)
- Using a domain model to refine the set of verbal syntagms
- Extensions of three-term associations can be organized into homogeneous lists
  - Direction (direction): droite (right), gauche (left), devant (in front of)
  - Lieu (place): usine (factory), parc (park), domicile (home)
  - Humain (human): enfant (child), piéton (pedestrian), personne (person)
- Associating each list with a concept of the ontology of road accidents
  - Ontology previously created from experts' knowledge
  - Manual intervention to assign lists to concepts

Refining the set of verbal syntagms (3/4): Venir (to come) class
- Extracted regroupings:
  - venir de hau bourdin (come from hau bourdin)
  - venir de i (come from i)
  - venir de abbaye (come from abbey)
  - venir de résidence (come from residence)
  - venir de rue (come from street)
  - venir de gauche (come from left)
  - venir par (come by)
  - venir par droite (come by right)
  - venir vers enfant (come to child)
- After noise and instances are eliminated:
  - venir de lieu (come from place)
  - venir de infrastructure (come from infrastructure)
  - venir de direction (come from direction)
  - venir par direction (come by direction)
  - venir vers humain (come towards human)

Refining the set of verbal syntagms (4/4)
- Decreasing the number of three-term regroupings
  - Many arguments are assigned to the same concept
- Eliminating parasitic regroupings and noise
  - Created lists will not contain terms outside the field
  - « diriger vers 12 (direct to 12) »: "12" will not be included in a list
- Drawback: valuable regroupings are eliminated if the created lists are incomplete

Text mining driven terminology construction

Linguistic analysis: integrating text mining results
- Input of the linguistic analysis phase: Syntex and Cordial output
- Goal of this phase: selection of domain terms and identification of lexical relations
- Difficulties of this phase:
  - Manual treatment is difficult for large corpora
  - No information available to guide the selection
- To solve the difficulties: integrate Apriori results
  - Selection of terms
  - Identification of lexical relations

Linguistic analysis

Normalization phase: integrating text mining results
- Input, taken from the linguistic analysis phase:
  - Previously selected terms
  - Lexical relations between terms
- Goal
  - Definition of terminological concepts
  - Semantic relations modeling
- Difficulties: no information for semantic relations
- To solve the difficulties:
  - Integrate lexical relations
  - Integrate previously identified verb classes
  - Integrate non-taxonomic relations provided by Apriori

Formalization phase: integrating text mining results

Conclusion
- Semi-automatic approach to build a terminology
- Construction process supported by text mining results
  - Association rule results guide the selection of terms
  - Lexical patterns improve work with the Linguae module
  - Identify non-taxonomic relations
- Results obtained are more general
  - Syntex output: SE DIRIGER vers la Commune de Wahagnies (direct to Wahagnies village)
  - Text mining output: diriger vers lieu (direct to a place)
- Semantic relation modeling, guided by:
  - Verbs of the domain
  - Apriori output

Future work
- Tools in the pre-treatment phase
  - Definition and identification of syntactic patterns
  - New heuristics to generate associations
- Using other quality measures to rank extracted rules
- Towards an automatic approach to assign lists of terms to ontology concepts
- Towards identifying functional and structural properties

Thank you
ceausu@math-info.univ-paris5.fr
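Returning to the refinement of verbal syntagms (slides 1/4 to 4/4): replacing the argument of each three-term regrouping by the concept whose list contains it can be sketched as a dictionary lookup. The three lists are the ones shown in the slides; the dictionary layout, the function name `generalize`, and the sample input are assumptions made for illustration, and the manual assignment of lists to ontology concepts is taken as already done.

```python
# Lists as given in the slides; the data layout is an assumption.
CONCEPT_LISTS = {
    "direction": {"droite", "gauche", "devant"},
    "lieu": {"usine", "parc", "domicile"},
    "humain": {"enfant", "piéton", "personne"},
}


def generalize(regroupings, concept_lists=CONCEPT_LISTS):
    """Replace each argument by its concept; drop regroupings with unknown arguments."""
    word_to_concept = {
        word: concept
        for concept, words in concept_lists.items()
        for word in words
    }
    generalized = set()
    for verb, prep, arg in regroupings:
        concept = word_to_concept.get(arg)
        if concept is not None:  # arguments outside every list are treated as noise
            generalized.add((verb, prep, concept))
    return generalized


three_term_regroupings = [
    ("venir", "de", "gauche"),
    ("venir", "par", "droite"),
    ("venir", "vers", "enfant"),
    ("venir", "de", "i"),        # noise: not in any list
    ("diriger", "vers", "12"),   # noise: not in any list
    ("venir", "de", "abbaye"),   # dropped only because the short "lieu" list omits it
]

result = generalize(three_term_regroupings)
for regrouping in sorted(result):
    print(" ".join(regrouping))
```

The last input line illustrates the drawback noted on slide 4/4: with an incomplete list, a valuable regrouping such as "venir de abbaye" is discarded together with the noise.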