meaning
Marianna
Added: January 09, 2008

Presentation Transcript

Slide1: MEANING Developing Multilingual Web-scale Language Technologies IST-2001-34460 http://www.lsi.upc.es/~nlp/meaning/meaning.html German Rigau i Claramunt

MEANING: Introduction:
From Financial Times
  US officials has expected Basra to fall early
  Music sales will fall by up to 15% this year
  No missiles have fallen and ...

MEANING: Introduction:
Sense 10
  fall -- (be captured; "The cities fell to the enemy")
  => yield -- (cease opposition; stop fighting)
Sense 2
  descend, fall, go down, come down -- (move downward but not necessarily all the way; "The temperature is going down"; "The barometer is falling"; "Real estate prices are coming down")
  => travel, go, move, locomote -- (change location; …)
Sense 1
  fall -- (descend in free fall under the influence of gravity; "The branch fell from the tree"; "The unfortunate hiker fell into a crevasse")
  => travel, go, move, locomote -- (change location; …)

MEANING: Introduction:
From NLP to NLU
Large-scale Semantic Processing dealing with concepts (senses) rather than words
Two complementary OPEN problems:
  Acquisition bottleneck
    Autonomous large-scale knowledge acquisition systems
  Ambiguity bottleneck
    Highly accurate WSD systems

MEANING: Introduction: Dealing with
the ACQ/WSD deadlock Dealing with knowledge acquisition Need of texts automatically sense tagged Current state-of-the-art 60%-70% accuracy! Dealing with concepts Need of knowledge not currently available: Subcategorization frequencies for predicates Selectional Preferences, etc. Dealing with multilingualism Need of compatibility across resourcesMEANING: Introduction: MEANING: Introduction Dealing with the ACQ/WSD deadlock Addressing Acquisition and WSD simultaneously three consecutive MEANING cycles Language is highly polysemous but also highly redundant Multilingualism maybe is part of the solution using EuroWordNet Reuse of incompatible large-scale resources Mapping technology to connect already available data Cross-checking capabilities to detect inconsistenciesMEANING: Architecture: MEANING: Architecture Multilingual Central Repository Italian EWN Basque EWN Spanish EWN English EWN Basque Web Corpus Italian Web Corpus English Web Corpus Catalan EWN Spanish Web Corpus Catalan Web Corpus ACQ ACQ ACQ ACQ UPLOAD UPLOAD UPLOAD UPLOAD PORT PORT PORT PORT WSD WSD WSD WSDMEANING: Overview: MEANING: Overview 3 years research project (2002-2005) 1.610 Million Euro Consortium TALP Research Center, UPC ITC-IRST IXA group, UPV/EHU University of Sussex Irion TechnologiesMEANING: Workplan: MEANING: Workplan MEANING: Workplan: MEANING: Workplan WP3 (Linguistic Processors) Three development cycles: WP5 (Acquisition): (ACQ0, ACQ1, ACQ2) Local acquisition of knowledge using specially designed tools and resources, corpus and wordnets WP4 (Integration): (PORT0, PORT1, PORT2) Uploading the acquired knowledge from each language into the Multilingual Central Repository and porting to the local wordnets WP6 (WSD): (WSD0, WSD1, WSD2) Word Sense Disambiguation using the local wordnets and the enriched knowledge ported from the MCR WP7 (evaluation and assessment) of the software tools and resources producedMEANING: Workplan: MEANING: Workplan WP0 Management WP9 Dissemination WP1 User 
Requirements WP3 Linguistic Processors WP5 ACQ WP6 WSD WP2 Design WP4 (Knowledge) Integration WP7 Evaluation & Assessment WP8 User Validation MEANING: WP3 Linguistic Processors & Infrastructure: MEANING: WP3 Linguistic Processors & Infrastructure ITC-IRST Basque, Catalan, English, Italian, Spanish Tokenization and sentence boundary detection Lemmatization Part of Speech tagging Noun-group chunking Robust-shallow parsing NERC Keyword, topic and terminology detection Text Classification (e.g. FINANCE, SPORT, etc.) Direct access to web Search Engines MEANING: Workplan: MEANING: Workplan WP0 Management WP9 Dissemination WP1 User Requirements WP3 Linguistic Processors WP5 ACQ WP6 WSD WP2 Design WP4 (Knowledge) Integration WP7 Evaluation & Assessment WP8 User Validation MEANING: WP4 (Knowledge) Integration: MEANING: WP4 (Knowledge) Integration TALP-UPC The Multilingual Central Repository acts as a multilingual interface for uploading, integrating and porting all the knowledge produced by MEANING Uploading the knowledge acquired from one language to the MCR Integrating and validating the knowledge uploaded Porting all the knowledge acquired to the local wordnets, balancing resources and technological advances across languages MEANING: MCR Software: MEANING: MCR Software Web Interface to the MCR Based on Web EuroWordNet Interface (WEI) APIs SOAP Perl, C++ Import/Export facilities XML Advanced Analysis Module Provides different views of the multilingual dataMEANING: MCR Content: MEANING: MCR Content ILI WordNet1.6 EuroWordNet Base Concepts EuroWordNet Top Ontology Multiwordnet Domains SUMO Local wordnets Wordnets of five Languages Basque, Catalan, English, Italian, Spanish Five WordNet versions (1.5, 1.6, 1.7, 1.7.1, 2.0) eXtended WordNet Large collections of Semantic Preferences Acquired from SemCor (179,942) Acquired from BNC (295,422) Instances Named InstancesMEANING: MCR: MEANING: MCRMEANING: Porting Process: MEANING: Porting Process Uploading process Checking errors 
and inconsistencies Coherent integration of every piece of information Dealing with several WordNet versions Integration process Consistency checking and direct inference Making explicit all knowledge contained into the MCR Realisation (top-down) Generalisation (bottom-up) Porting process Direct porting to local wordnets or New inference rules When detecting particular semantic patternsMEANING: MCR Content: MEANING: MCR Content ILI WordNet1.6 EuroWordNet Base Concepts => WN1.5 EuroWordNet Top Ontology => WN1.5 Multiwordnet Domains => WN1.6 SUMO => WN1.6 Local wordnets Wordnets of five European Languages Basque, Catalan, English, Italian, Spanish Five WordNet versions (1.5, 1.6, 1.7, 1.7.1, 2.0) eXtended WordNet => WN1.7 Large collections of Semantic Preferences Acquired from SemCor (179,942) => WN1.6 Acquired from BNC (295,422) => WN1.6 Instances Named Instances => WN1.6MEANING: Mapping technology: C1 C2 C3 C5 C6 C4 MEANING: Mapping technologyMEANING: Mapping technology: C1 C2 C3 C5 C6 C4 MEANING: Mapping technologyMEANING: Mapping Technology: MEANING: Mapping Technology Mapping technology for connecting already existing semantic networks (i.e. wordnets) Relaxation Labelling Algorithm (Daudé et al. 2003) Iterative algorithm for function optimisation based on local information Local constraints with global effects! Structural Constraints (hierarchical and non hierarchical) Non structural constraints (synonym words, gloss, etc.) 
Given a set of constraints, provides the best possible mapping!

MEANING: Mapping Technology:

MEANING: Porting Process:
               UPLOAD0    PORT0
Relations
  Spanish       53,272    =
  English       59,951    +4,246
  Italian       18,175    +763
  Catalan       53,272    =
  Basque        53,272    =
Role
  Spanish            0    +162,212
  English      390,109    =
  Italian            0    +103,002
  Catalan            0    +125,997
  Basque             0    +161,807

MEANING: Porting Process:
               UPLOAD0    PORT0
Instance
  Spanish            0    +1,599
  English            0    +2,128
  Italian         +791    =
  Catalan            0    +1,599
  Basque             0    +365
Domain
  Spanish            0    +48,053
  English       96,067    =
  Italian       30,607    =
  Catalan            0    +35,177
  Basque             0    +25,860

MEANING: Porting Process:
               UPLOAD0    PORT0
Top Ontology
  Spanish        1,290    =
  English            0    +1,554
  Italian            0    +946
  Catalan        1,180    =
  Basque         1,126    =

MEANING: MCR0:
vaso_1 02755829n 06-NOUN.ARTIFACT FACTOTUM
GLOSS: a glass container for holding liquids while drinking
TO: 1stOrderEntity-Form-Object
TO: 1stOrderEntity-Origin-Artifact
TO: 1stOrderEntity-Function-Container
TO: 1stOrderEntity-Function-Instrument
EN: drinking_glass glass
IT: bicchiere
BA: edontzi baso edalontzi
CA: got vas
DOBJ SemCor
  00849393v 0.0074 polish shine smooth ...
  00201878v 0.0013 beautify embellish prettify
  00826635v 0.0010 get_hold_of take
  00140937v 0.0001 ameliorate amend ...
  00083947v 0.0000 alter change

MEANING: MCR0:
vaso_2 04195626n 08-NOUN.BODY ANATOMY
GLOSS: a tube in which a body fluid circulates
TO: 1stOrderEntity-Form-Substance-Solid
TO: 1stOrderEntity-Origin-Natural-Living
TO: 1stOrderEntity-Composition-Part
TO: 1stOrderEntity-Function-Container
EN: vessel vas
IT: vaso canale
BA: hodi baso
CA: vas
DOBJ SemCor SUBJ SemCor 01781222v 0.0334 be occur 01831830v 0.0133 stop terminate 00058757v 0.0072 inject shoot 01357963v 0.0127 flow travel_along 01357963v 0.0068 flow travel_along 01830886v 0.0043 discontinue 00055849v 0.0045 administer dispense ...
01779664v 0.0008 cease end finish ...MEANING: MCR0: MEANING: MCR0 vaso_3 09914390n 23-NOUN.QUANTITY NUMBER GLOSS: the quantity a glass will hold TO: 1stOrderEntity-Composition-Part TO: 2ndOrderEntity-SituationType-Static TO: 2ndOrderEntity-SituationComponent-Quantity EN: glassful glass IT: bicchierata bicchiere BA: basokada CA: got vas DOBJ SemCor 00795711v 0.0026 drink imbibe 01530096v 0.0009 accept have take 00786286v 0.0009 consume have ingest take take_in 01513874v 0.0001 acquire getMEANING: MCR: MEANING: MCRMEANING: MCR: MEANING: MCRMEANING: MCR1: MEANING: MCR1 vaso_1 02755829n 06-NOUN.ARTIFACT FACTOTUM SUMO: &%Artifact+ LOGICAL FORMULA: glass:NN(x1) -> glass:NN(x1) container:NN(x2) for:IN(x1, e1) hold:VB(e1, x1, x3) liquid:NN(x3) while:IN(e0, e2) drink:VB(e2, x1) PARSING: (TOP (S (NP (NN glass) ) (VP (VBZ is) (NP (NP (DT a) (NN glass) (NN container) ) (PP (IN for) (S (VP (VBG holding) (PP (NP (NNS liquids) ) (IN while) ) (VBG drinking) ) ) ) ) ) (. .) ) ) WSD: <wf pos="DT" >a</wf> <wf pos="NN" lemma="glass" quality="silver" wnsn="2" >glass</wf> <wf pos="NN" lemma="container" quality="silver" wnsn="1" >container</wf> <wf pos="IN" >for</wf> <wf pos="VBG" lemma="hold" quality="normal" wnsn="8" >holding</wf> <wf pos="NNS" lemma="liquid" quality="normal" wnsn="1" >liquids</wf> <wf pos="IN" >while</wf> <wf pos="VBG" lemma="drink" quality="normal" wnsn="1" >drinking</wf> MEANING: MCR1: MEANING: MCR1 vaso_2 04195626n 08-NOUN.BODY ANATOMY SUMO: &%BodyVessel+ LOGICAL FORMULA: vessel:NN(x1) -> tube:NN(x1) in:IN(x2, x3) body_fluid:NN(x2) circulate:VB(e1, x2) PARSING: (TOP (S (NP (NN vessel) ) (VP (VBZ is) (NP (NP (DT a) (NN tube) ) (SBAR (WHPP (IN in) (WHNP (WDT which) ) ) (S (NP (DT a) (NN body) (NN fluid) ) (VP (VBZ circulates) ) ) ) ) ) (. .) 
) ) WSD: <wf pos="DT" >a</wf> <wf pos="NN" lemma="tube" quality="gold" wnsn="4" wnsn="4" >tube</wf> <wf pos="IN" >in</wf> <wf pos="WDT" >which</wf> <wf pos="DT" >a</wf> <wf pos="NN" lemma="body_fluid" quality="silver" wnsn="1" >body_fluid</wf> <wf pos="VBZ" lemma="circulate" quality="gold" wnsn="4" wnsn="4“ >circulates</wf> MEANING: MCR1: MEANING: MCR1 vaso_3 09914390n 23-NOUN.QUANTITY NUMBER SUMO: &%ConstantQuantity+ LOGICAL FORMULA: glass:NN(x1) -> quantity:NN(x1) glass:NN(x2) hold:VB(e1, x2) PARSING: (TOP (S (NP (NN glass) ) (VP (VP (VBZ is) (NP (DT the) (NN quantity) ) (NP (DT a) (NN glass) ) ) (VP (MD will) (VP (VB hold) ) ) ) (. .) ) ) WSD: <wf pos="DT" >the</wf> <wf pos="NN" lemma="quantity" quality="silver" wnsn="1" >quantity</wf> <wf pos="DT" >a</wf> <wf pos="NN" lemma="glass" quality="normal" wnsn="2" >glass</wf> <wf pos="MD" >will</wf> <wf pos="VB" lemma="hold" quality="normal" wnsn="1" >hold</wf> MEANING: MCR and consistency checking: MEANING: MCR and consistency checking 00536235n blow &%Breathing+ anatomy 00005052v blow &%Breathing+ medicine 00003430v exhale &%Breathing+ biology 00003142v exhale &%Breathing+ medicine 00899001a exhaled &%Breathing+ factotum 00263355a exhaling &%Breathing+ factotum 00536039n expiration &%Breathing+ anatomy 02849508a expiratory &%Breathing+ anatomy 00003142v expire &%Breathing+ medicine 02579534a inhalant &%Breathing+ anatomy 00536863n inhalation &%Breathing+ anatomy 00003763v inhale &%Breathing+ medicine 00898664a inhaled &%Breathing+ factotum 00263512a inhaling &%Breathing+ factotum 00537041n pant &%Breathing+ anatomy 00004002v pant &%Breathing+ medicine 00535106n panting &%Breathing+ anatomy 00264603a panting &%Breathing+ factotum 00411482r pantingly &%Breathing+ factotum ...MEANING: MCR and consistency checking: Does an orchard apple tree have leaves? Does an orchad apple tree have fruits? Does a cactus have leaves? 
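The Breathing+ listing above pairs each synset with a SUMO label and a domain label, and the same SUMO label turns up with several different domains. A minimal sketch of one way such a cross-check could be run is given below. It is not the MCR's actual consistency checker, whose rules the slides do not describe; the function names are hypothetical, and only a few rows from the slide are copied in as data.

```python
from collections import defaultdict

# (synset offset, lemma, SUMO label, domain) rows copied from the slide.
ROWS = [
    ("00536235n", "blow", "&%Breathing+", "anatomy"),
    ("00005052v", "blow", "&%Breathing+", "medicine"),
    ("00003430v", "exhale", "&%Breathing+", "biology"),
    ("00003142v", "exhale", "&%Breathing+", "medicine"),
    ("00899001a", "exhaled", "&%Breathing+", "factotum"),
    ("00536863n", "inhalation", "&%Breathing+", "anatomy"),
    ("00003763v", "inhale", "&%Breathing+", "medicine"),
]


def domains_by_sumo(rows):
    """Collect the set of domain labels seen for each SUMO label."""
    grouped = defaultdict(set)
    for _offset, _lemma, sumo, domain in rows:
        grouped[sumo].add(domain)
    return dict(grouped)


def mixed_labels(rows):
    """Return SUMO labels whose synsets carry more than one domain label.

    This is an assumed heuristic: a mixed set is only a candidate for
    manual review, not proof of an error.
    """
    return {
        sumo: sorted(domains)
        for sumo, domains in domains_by_sumo(rows).items()
        if len(domains) > 1
    }


if __name__ == "__main__":
    for sumo, domains in mixed_labels(ROWS).items():
        print(sumo, "->", ", ".join(domains))
```

Grouping by SUMO label keeps the check independent of part of speech, which matches the listing: nouns, verbs, adjectives and an adverb all share the one label.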
MEANING: MCR and consistency checkingMEANING: MCR and consistency checking: MEANING: MCR and consistency checkingMEANING: MCR and consistency checking: Example SUMO: Boiling (subclass Boiling StateChange) (documentation Boiling "The Class of Processes where an Object is heated and converted from a Liquid to a Gas.") (=> (instance ?BOIL Boiling) (exists (?HEAT) (and (instance ?HEAT Heating) (subProcess ?HEAT ?BOIL)))) "if instance BOIL Boiling, then there exists HEAT such that instance HEAT Heating and subProcess HEAT BOIL" MEANING: MCR and consistency checkingMEANING: MCR: MEANING: MCR MCR produced by Meaning is going to constitute the natural multilingual large-scale linguistic resource for a number of semantic processes that need large amounts of linguistic knowledge to be effective tools (e.g. Web ontologies). All wordnets gained some kind of new knowledge coming from other wordnets by means of the first porting process. The resulting MCR is one of the largest and richest multilingual lexical--knowledge ever built. 
http://nipadio.lsi.upc.es/cgi-bin/mcrWei/public/wei.consult.perl MEANING: Workplan: MEANING: Workplan WP0 Management WP9 Dissemination WP1 User Requirements WP3 Linguistic Processors WP5 ACQ WP6 WSD WP2 Design WP4 (Knowledge) Integration WP7 Evaluation & Assessment WP8 User Validation MEANING: WP5 Acquisition: MEANING: WP5 Acquisition University of Sussex ACQ0 Subcategorisation frequencies Topic signatures Domain Information for Named Entities Sense examples ACQ1 New senses Coarser-grained sense distinctions Selectional Preferences ACQ2 Specific lexico-semantic relations Thematic role assignments for nominalisations Diathesis alternationsMEANING: WP5 Acquisition: MEANING: WP5 Acquisition 11 ongoing experiments A Multilingual Acquisition for predicates B Collocations C Domain information for NEs D Topic signatures E Sense Examples F MRDs G Selectional Preferences H Coarse-grained senses I Multiword Acquisition J Enriching WordNet with collocations K New sensesMEANING: WP5 Acquisition E: Sense Examples: <evento social> <competición, concurso> <evento> <partido_1> <semifinal> <cuartos_de_final> <grupo_social> <organización> <agrupación grupo colectivo> <partido_2, partido_político> <partido_laborista> MEANING: WP5 Acquisition E: Sense ExamplesMEANING: WP5 Acquisition E: Sense Examples: partido 1 Pero España puso al partido intensidad, ritmo y coraje. El seleccionador cree que el partido de hoy contra Italia dará la medida de España El Racing no gana en su campo desde hace seis partidos. partido 2 Todos los partidos piden reformas legales para TV3. La derecha planea agruparse en un partido. El diputado reiteró que ni él ni UDC, “como partido”, han recibido dinero de Pellerols. MEANING: WP5 Acquisition E: Sense ExamplesMEANING: WP5 Acquisition E: Sense Examples: partido 1 Rivera pide el soporte de la afición para encarrilar las semifinales. Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado. 
El Racing ganó los cuartos de final en su campo.
partido 2
  No negociaremos nunca com un partido político que sea partidario de la independencia de Taiwan.
  Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político.
  Estas lleyes fueron votadas gracias a un consenso general de los partidos políticos.

MEANING: WP5 Acquisition E: Sense Examples:
Sense key         Senseval-2     BNC    Google
art%1:04:00::     61 (48+13)      26    37.400
art%1:06:00::     88 (70+18)     146    1.260.000
art%1:09:00::     37 (29+8)      368    542.000
art%1:10:00::     1 (1+0)        275    2.920.050
arts%1:09:00::    32 (25+7)      311    3.289.320

Word    BNC      Google
art     9.989    56.000.000

MEANING: WP5 Acquisition E: Sense Examples:
Goal of Experiment E: automatically produce training data for WSD systems, with size and coverage orders of magnitude larger than the currently available (manually produced) resources
  First release of ExRetriever (December 2003)
  Experiments (February 2004)
  Future work (February 2005 and beyond …)

MEANING: WP5 Acquisition E: Sense Examples:
First release of ExRetriever
  ExRetriever is able to use the MCR and different corpora (SemCor, BNC, Google) through a common API.
  ExRetriever has been equipped with a declarative language for query construction.
  A tool for performance evaluation and summarization (P/R/F-measures)

MEANING: WP5 Acquisition E: Sense Examples:
Experiments
  The experiment was devoted to testing the first prototype of ExRetriever.
  Direct evaluation of the accuracy and productivity of the different approaches for building queries has been performed for English on SemCor.
  Words from Senseval-2 (lexical sample)
  Different queries inspired by (Leacock et al. 98), (Mihalcea and Moldovan 99), etc.
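The P/R/F measures mentioned for the ExRetriever evaluation tool can be illustrated with the standard definitions of precision, recall and balanced F-measure. The slides do not say how ExRetriever counts correct examples, so the sketch below is an assumption-laden illustration: the function name and the three counts (correctly tagged examples retrieved, all examples retrieved, all gold examples of the sense in SemCor) are hypothetical.

```python
def precision_recall_f1(correct, retrieved, relevant):
    """Standard precision, recall and balanced F-measure from raw counts.

    correct   -- retrieved examples whose gold sense matches the query sense
    retrieved -- all examples returned by the query
    relevant  -- all gold examples of that sense in the evaluation corpus
    """
    precision = correct / retrieved if retrieved else 0.0
    recall = correct / relevant if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts, for illustration only (not results from the project).
    p, r, f = precision_recall_f1(correct=30, retrieved=40, relevant=60)
    print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Guarding the zero-denominator cases matters here because a query built from a sense with no synonyms or hyponyms may retrieve nothing at all.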
MEANING: WP5 Acquisition E: Sense Examples:
Query set using a declarative language
  Lea1Semcor query=or(nrel(1,syns)) or or(nrel(1,hypo)) or or(nrel(1,hype));
  Meaning1Semcor query=Glos(or,and,noempty) or or(nrel(1,syns)) or or(nrel(1,hypo));
  Meaning2Semcor query=Glos(or,and,noempty) or Glos(or,and,or,rel(hypo),noempty) or Glos(or,and,or,rel(syns),noempty);
  Moldo1Semcor query=or(nrel(1,syns));
  Moldo2Semcor query=or(rel(glos));
  Moldo3Semcor query=Glos(or,and,noempty);

MEANING: WP5 Acquisition E: Sense Examples:
Example
  Using LDB: WordNet
  Using Indexer: Swish
  Using Corpus: Semcor
  Base on which the query is made (lemma#POS): grip#n
  Query for sense (1): (clutches) or (embracing or "wrestling hold") or ("taking hold" or prehension)
  <Example Sentences="1" src="brownv/tagfiles/br-e03#1112" Chars="60" size_tagged_Semcor="399" Words="12">
    The pulsating vibration of energy
    <MEANING synsetPOS="n" baseSense="1" baseLema="grip" origPOS="n" rel="syns" synsetSense="1" synsetLema="clutches" basePOS="n"> clutches </MEANING>
    at the_pit of your stomach.
  </Example>

MEANING: WP5 Acquisition E: Sense Examples:
Future work (February 2004 and beyond …)
  Analysis of the results (which query is best in which conditions)
  Designing new queries using more knowledge (Domains, EWN Top Ontology, SUMO, new relations, ...)
  Latent Semantic Analysis and logic operations with vectors (Widdows et al. 2003)
  Indirect evaluation using BNC
  ...
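The grip#n example shows a sense query assembled as a disjunction of groups of related lemmas, with multiword lemmas quoted. The sketch below reproduces that string shape. It is not ExRetriever's declarative language or its implementation; the function names are hypothetical, and which WordNet relation each group comes from is an assumption, since the slide does not say.

```python
def quote_if_multiword(lemma):
    """Wrap multiword lemmas in double quotes so they match as phrases."""
    return f'"{lemma}"' if " " in lemma else lemma


def or_group(lemmas):
    """Join the lemmas of one related synset into a parenthesised OR group."""
    return "(" + " or ".join(quote_if_multiword(lemma) for lemma in lemmas) + ")"


def build_sense_query(groups):
    """Join non-empty lemma groups into a single disjunctive query string."""
    return " or ".join(or_group(group) for group in groups if group)


if __name__ == "__main__":
    # Lemma groups taken from the slide's query for grip#n, sense 1.
    # Assumed: each inner list is the lemma set of one related synset.
    grip_sense_1 = [
        ["clutches"],
        ["embracing", "wrestling hold"],
        ["taking hold", "prehension"],
    ]
    print(build_sense_query(grip_sense_1))
```

Skipping empty groups loosely mirrors the `noempty` flag that appears in the query definitions, although the exact semantics of that flag are not given in the slides.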
MEANING: WP5 Acquisition E: Sense ExamplesMEANING: Workplan: MEANING: Workplan WP0 Management WP9 Dissemination WP1 User Requirements WP3 Linguistic Processors WP5 ACQ WP6 WSD WP2 Design WP4 (Knowledge) Integration WP7 Evaluation & Assessment WP8 User Validation MEANING: WP6 WSD: MEANING: WP6 WSD IXA group, UPV/EHU Overall WP6 objective: high precision system for all open-class words for all languages Combining unsupervised knowledge-based systems with supervised Machine Learning algorithms Current state-of-the-art: 69% in Senseval-2 all-words for English Based on supervised ML on Semcor (500 Kw) as training data No baseline for other languages MEANING: WP6 WSD: Main problem: Need of dozens of manually tagged examples for each word sense (how many?) MEANING strategy: Automatically acquiring a huge number of examples per sense from the web (ACQ, MCR, bootstrapping, sense ranking, ...) Improve current supervised and unsupervised systems Using sophisticated linguistic information, such as, syntactic relations, semantic classes, selectional restrictions, subcategorisation information, domains, etc. 
Efficient margin-based Machine Learning algorithms Novel algorithms that combine tagged examples with huge amounts of untagged examples in order to increase the precision of the system MEANING: WP6 WSDMEANING: WP6 WSD: MEANING: WP6 WSD IXA group, UPV/EHU WSD0 State-of-the-art all words systems Explore improvements of current supervised systems WSD1 Improved all words systems using richer linguistic features (better Linguistic Processors, MCR0) WSD2 Improved all words systems using richer linguistic features (better Linguistic Processors, MCR1) examples automatically acquired from the webMEANING: WP6 WSD: MEANING: WP6 WSD 9 ongoing experiments A All-words for English B High precision WSD for Boostrapping => H C High quality sense examples => H D TSVM => H E All-words for non-English F More informed features G Unsupervised WSD H Boostrapping I Effect of sense clusters J Semantic class classifiers K Ranking senses automatically L Disambiguating WN glossesMEANING: WP6 WSD K: Ranking Senses Automatically: MEANING: WP6 WSD K: Ranking Senses Automatically The first sense heuristic (FSH) is a powerful one Usually, unsupervised WSD systems perform worse! Sense distributions change according to the type of text (Escudero et al. 2000, Martínez and Eneko 2000) Supervised systems only work if we do change the type of text!MEANING: WP6 WSD K: Ranking Senses Automatically: MEANING: WP6 WSD K: Ranking Senses Automatically Ranking Method Use nearest neighbours acquired from corpora using distributional similarity (e.g. Lin 1998) star: superstar 0.1666, player (0.157), teammate (0.121), actor (0.121) ... galaxy (0.078), sun (0.077), world (0.063), planet (0,061) ... 
The dominance of a given sense is related to the distributional similarity of their neighbours Disambiguate the neighbours using the WordNet Similarity package MEANING: WP6 WSD K: Ranking Senses Automatically: MEANING: WP6 WSD K: Ranking Senses Automatically Ranking Experiments Ranking from different corpora: pipe Semcor: tobacco pipe BNC: underground pipe Ranking from domain specific corpora: tie BNC: necktie Reuters Finance: affiliation Reuters sport: draw Senseval-2 all nouns task: 65% precission, 60% recallMEANING: WP6 WSD J: Semantic Class Classifiers : MEANING: WP6 WSD J: Semantic Class Classifiers From Financial Times US officials has expected Basra to fall early Music sales will fall by up to 15% this year No missiles have fallen and ... (21) v.motion Motion+ (3) v.possession UnilateralGetting+ (46) v.motion Decreasing+ MEANING: WP6 WSD L: Disambiguating WN glosses: MEANING: WP6 WSD L: Disambiguating WN glosses <play_7, play_on_1> perform music on (a musical instrument); “He plays the flute” “Can you play on this old recorder?” <pipe_3> play one a pipe <drum_2> play the drums <trumpet_2> play or blow the trumpetMEANING: WP6 WSD L: Disambiguating WN glosses: MEANING: WP6 WSD L: Disambiguating WN glosses <play_7, play_on_1> perform music on (a musical_instrument_1); “He plays the flute_3” “Can you play on this old_recorder_4?” <pipe_3> play one a pipe_4 <drum_2> play the drums_1 <trumpet_2> play or blow the trumpet_1MEANING: WP6 WSD L: Disambiguating WN glosses: MEANING: WP6 WSD L: Disambiguating WN glosses <play_7, play_on_1> perform music on (a musical_instrument_1); “He plays the flute_3” “Can you play on this old recorder_4?” <pipe_3> play one a pipe_4 <drum_2> play the drums_1 <trumpet_2> play or blow the trumpet_1 <instrument_1> ROLE INSTRUMENTMEANING: WP6 WSD L: Disambiguating WN glosses: MEANING: WP6 WSD L: Disambiguating WN glosses <tocar_13> <play_7, play_on_1> perform music on (a musical_instrument_1); “He plays the flute_3” “Can you play on this 
old recorder_4?” <pipe_3> play one a pipe_4 <drum_2> play the drums_1 <tambor_2> <trumpet_2> play or blow the trumpet_1 <instrument_1> ROLE INSTRUMENT <instrumento_musical_1>MEANING: Workplan: MEANING: Workplan WP0 Management WP9 Dissemination WP1 User Requirements WP3 Linguistic Processors WP5 ACQ WP6 WSD WP2 Design WP4 (Knowledge) Integration WP7 Evaluation & Assessment WP8 User Validation MEANING: WP8 User validation: MEANING: WP8 User validation Irion Technologies (University of Sussex) To provide the project with industrial feedback Demonstration of MEANING by integrating the results in existing web products of Irion TwentyOne: CLIR system Adjust: Cross-Lingual classification system Pidgin: Cross-Lingual Q/A dialogue system EFE: Spanish News Agency Huge multilingual database of picture captionsMEANING: WP8 User validation: MEANING: WP8 User validation Baselines of Irion applications Cross-lingual retrieval system: English, Dutch, German, French, Spanish and Italian Document classification system Resources SemNet WordNet & WordNet Domains Linking between SemNet and WordNet Test collection Reuters News Archive 1996-1997, English CLIR: 100 ambiguous queries extracted from NPs and translated Document classification: 125 categoriesMEANING: WP8 User validation: MEANING: WP8 User validation CLIR Expansion with wordnet is only useful for synonymous queries in a monolingual setting Expansion with wordnet is always useful in cross-lingual setting Synonym selection is slightly better than concept selection (WSD based on SemNet and WordNet domains) Best approach: combining synonym-selection with concept selection Base-line setting without MEANING results Classification Best results: using disambiguated classifiers and classifiers expanded with most frequent synonyms. Recall is up to 80% and precision is a bit lower than NO expansion. 
However, coverage is now 100%.MEANING: Workplan: MEANING: Workplan WP0 Management WP9 Dissemination WP1 User Requirements WP3 Linguistic Processors WP5 ACQ WP6 WSD WP2 Design WP4 (Knowledge) Integration WP7 Evaluation & Assessment WP8 User Validation MEANING: WP9 Exploitation and dissemination: MEANING: WP9 Exploitation and dissemination IXA, UPV/EHU Journals, conferences (First year: 41 published papers) Cooperation SWAP – EDAMOK ESPERONTO BALKANET SENSEVAL-3 Coordinating several tasks: Basque, Catalan, Italian, Spanish During spring 2004: First release of the MCR! MEANING user group! Two workshops First year: San Sebastián (Basque country) Third year: Trento (Italy)Slide73: Donostia / San Sebastian – April 10-12 2003 Proceedings on the Web 8 invited speakers to give feedback (4 euro, 4 american) Walter Daelemans (WSD, ML) Fernando Gomez (Acquisition, semantic interpretation) Julio Gonzalo (WSD, CLIR) Anna Korhonen (Acquisition) Dekang Lin (Acquisition) Alexande Maedche (Acquisition, Semantic WEB) Rada Mihalcea (WSD) David Yarowsky (WSD) MEANING: WP9 First workshopMEANING: Conclusions and Results: MEANING: Conclusions and Results The good news: MEANING works! A Tool Set that using the semantic knowledge of MCR will obtain automatically from the web large collections of examples for each particular word sense. A Tool Set for enriching the MCR using the knowledge acquired automatically from the Web. A Tool Set for selecting accurately the senses of the open-class words for the languages involved in the project. Multilingual Central Repository to maintain compatibility between wordnets of different languages and versions, past and new. The results of MEANING will be public and free.MEANING: Semantic Interpretation: MEANING: Semantic InterpretationMEANING as a framework: MEANING as a framework The bad news: MEANING will focus only on the most promising research lines MEANING has a large amount of work to do! MEANING has only one more cycle! 
MEANING can also be seen as a common framework to acquire and port knowledge (information/data?) across languages, resources and tools, useful for many large-scale Semantic Processing tasks
Your collaborations and contributions are welcome!

MEANING as a framework:
Don’t waste your effort! MEANING can recycle your resources!

Slide78: MEANING Developing Multilingual Web-scale Language Technologies IST-2001-34460 http://www.lsi.upc.es/~nlp/meaning/meaning.html German Rigau i Claramunt
meaning Marianna Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 479 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: January 09, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Slide1: MEANING Developing Multilingual Web-scale Language Technologies IST-2001-34460 http://www.lsi.upc.es/~nlp/meaning/meaning.html German Rigau i ClaramuntMEANING: Introduction: MEANING: Introduction From Financial Times US officials has expected Basra to fall early Music sales will fall by up to 15% this year No missiles have fallen and ...MEANING: Introduction: MEANING: Introduction Sense 10 fall -- (be captured; "The cities fell to the enemy") => yield -- (cease opposition; stop fighting) Sense 2 descend, fall, go down, come down -- (move downward but not necessarily all the way; "The temperature is going down"; "The barometer is falling"; "Real estate prices are coming down") => travel, go, move, locomote -- (change location; …) Sense 1 fall -- (descend in free fall under the influence of gravity; "The branch fell from the tree"; "The unfortunate hiker fell into a crevasse") => travel, go, move, locomote -- (change location; …)MEANING: Introduction: MEANING: Introduction From NLP to NLU Large-scale Semantic Processing dealing with concepts (senses) rather than words Two complementary OPEN problems: Acquisition bottleneck Autonomous large-scale knowledge acquisition systems Ambiguity bottleneck Highly accurate WSD systemsMEANING: Introduction: MEANING: Introduction Dealing with the ACQ/WSD deadlock 
Dealing with knowledge acquisition Need of texts automatically sense tagged Current state-of-the-art 60%-70% accuracy! Dealing with concepts Need of knowledge not currently available: Subcategorization frequencies for predicates Selectional Preferences, etc. Dealing with multilingualism Need of compatibility across resourcesMEANING: Introduction: MEANING: Introduction Dealing with the ACQ/WSD deadlock Addressing Acquisition and WSD simultaneously three consecutive MEANING cycles Language is highly polysemous but also highly redundant Multilingualism maybe is part of the solution using EuroWordNet Reuse of incompatible large-scale resources Mapping technology to connect already available data Cross-checking capabilities to detect inconsistenciesMEANING: Architecture: MEANING: Architecture Multilingual Central Repository Italian EWN Basque EWN Spanish EWN English EWN Basque Web Corpus Italian Web Corpus English Web Corpus Catalan EWN Spanish Web Corpus Catalan Web Corpus ACQ ACQ ACQ ACQ UPLOAD UPLOAD UPLOAD UPLOAD PORT PORT PORT PORT WSD WSD WSD WSDMEANING: Overview: MEANING: Overview 3 years research project (2002-2005) 1.610 Million Euro Consortium TALP Research Center, UPC ITC-IRST IXA group, UPV/EHU University of Sussex Irion TechnologiesMEANING: Workplan: MEANING: Workplan MEANING: Workplan: MEANING: Workplan WP3 (Linguistic Processors) Three development cycles: WP5 (Acquisition): (ACQ0, ACQ1, ACQ2) Local acquisition of knowledge using specially designed tools and resources, corpus and wordnets WP4 (Integration): (PORT0, PORT1, PORT2) Uploading the acquired knowledge from each language into the Multilingual Central Repository and porting to the local wordnets WP6 (WSD): (WSD0, WSD1, WSD2) Word Sense Disambiguation using the local wordnets and the enriched knowledge ported from the MCR WP7 (evaluation and assessment) of the software tools and resources producedMEANING: Workplan: MEANING: Workplan WP0 Management WP9 Dissemination WP1 User Requirements WP3 
Linguistic Processors, WP5 ACQ, WP6 WSD, WP2 Design, WP4 (Knowledge) Integration, WP7 Evaluation & Assessment, WP8 User Validation

MEANING: WP3 Linguistic Processors & Infrastructure:
ITC-IRST
Basque, Catalan, English, Italian, Spanish
  Tokenization and sentence boundary detection
  Lemmatization
  Part of Speech tagging
  Noun-group chunking
  Robust shallow parsing
  NERC
  Keyword, topic and terminology detection
  Text Classification (e.g. FINANCE, SPORT, etc.)
  Direct access to web Search Engines

MEANING: WP4 (Knowledge) Integration:
TALP-UPC
The Multilingual Central Repository acts as a multilingual interface for uploading, integrating and porting all the knowledge produced by MEANING:
  Uploading the knowledge acquired from one language to the MCR
  Integrating and validating the knowledge uploaded
  Porting all the knowledge acquired to the local wordnets, balancing resources and technological advances across languages

MEANING: MCR Software:
Web Interface to the MCR
  Based on the Web EuroWordNet Interface (WEI)
APIs
  SOAP
  Perl, C++
Import/Export facilities
  XML
Advanced Analysis Module
  Provides different views of the multilingual data

MEANING: MCR Content:
ILI
  WordNet 1.6
  EuroWordNet Base Concepts
  EuroWordNet Top Ontology
  MultiWordNet Domains
  SUMO
Local wordnets
  Wordnets of five languages: Basque, Catalan, English, Italian, Spanish
  Five WordNet versions (1.5, 1.6, 1.7, 1.7.1, 2.0)
  eXtended WordNet
Large collections of Semantic Preferences
  Acquired from SemCor (179,942)
  Acquired from BNC (295,422)
Instances
  Named Instances

MEANING: MCR:

MEANING: Porting Process:
Uploading process
  Checking errors and inconsistencies
  Coherent integration of every piece of information
  Dealing with several WordNet versions
Integration process
  Consistency checking and direct inference
  Making explicit all knowledge contained in the MCR
  Realisation (top-down)
  Generalisation (bottom-up)
Porting process
  Direct porting to local wordnets, or
  New inference rules when detecting particular semantic patterns

MEANING: MCR Content:
ILI
  WordNet 1.6
  EuroWordNet Base Concepts => WN1.5
  EuroWordNet Top Ontology => WN1.5
  MultiWordNet Domains => WN1.6
  SUMO => WN1.6
Local wordnets
  Wordnets of five European languages: Basque, Catalan, English, Italian, Spanish
  Five WordNet versions (1.5, 1.6, 1.7, 1.7.1, 2.0)
  eXtended WordNet => WN1.7
Large collections of Semantic Preferences
  Acquired from SemCor (179,942) => WN1.6
  Acquired from BNC (295,422) => WN1.6
Instances
  Named Instances => WN1.6

MEANING: Mapping technology:
[Diagram: concepts C1 to C6]

MEANING: Mapping Technology:
Mapping technology for connecting already existing semantic networks (i.e. wordnets)
Relaxation Labelling Algorithm (Daudé et al. 2003)
  Iterative algorithm for function optimisation based on local information
  Local constraints with global effects!
  Structural constraints (hierarchical and non-hierarchical)
  Non-structural constraints (synonym words, gloss, etc.)
  Given a set of constraints, provides the best possible mapping!

MEANING: Mapping Technology:

MEANING: Porting Process:
Relations   UPLOAD0    PORT0
Spanish     53,272     =
English     59,951     +4,246
Italian     18,175     +763
Catalan     53,272     =
Basque      53,272     =

Role        UPLOAD0    PORT0
Spanish     0          +162,212
English     390,109    =
Italian     0          +103,002
Catalan     0          +125,997
Basque      0          +161,807

MEANING: Porting Process:
Instance    UPLOAD0    PORT0
Spanish     0          +1,599
English     0          +2,128
Italian     +791       =
Catalan     0          +1,599
Basque      0          +365

Domain      UPLOAD0    PORT0
Spanish     0          +48,053
English     96,067     =
Italian     30,607     =
Catalan     0          +35,177
Basque      0          +25,860

MEANING: Porting Process:
Top Ontology   UPLOAD0   PORT0
Spanish        1,290     =
English        0         +1,554
Italian        0         +946
Catalan        1,180     =
Basque         1,126     =

MEANING: MCR0:
vaso_1 02755829n 06-NOUN.ARTIFACT FACTOTUM
GLOSS: a glass container for holding liquids while drinking
TO: 1stOrderEntity-Form-Object
TO: 1stOrderEntity-Origin-Artifact
TO: 1stOrderEntity-Function-Container
TO: 1stOrderEntity-Function-Instrument
EN: drinking_glass glass
IT: bicchiere
BA: edontzi baso edalontzi
CA: got vas
DOBJ SemCor
  00849393v 0.0074 polish shine smooth ...
  00201878v 0.0013 beautify embellish prettify
  00826635v 0.0010 get_hold_of take
  00140937v 0.0001 ameliorate amend ...
  00083947v 0.0000 alter change

MEANING: MCR0:
vaso_2 04195626n 08-NOUN.BODY ANATOMY
GLOSS: a tube in which a body fluid circulates
TO: 1stOrderEntity-Form-Substance-Solid
TO: 1stOrderEntity-Origin-Natural-Living
TO: 1stOrderEntity-Composition-Part
TO: 1stOrderEntity-Function-Container
EN: vessel vas
IT: vaso canale
BA: hodi baso
CA: vas
DOBJ SemCor, SUBJ SemCor
  01781222v 0.0334 be occur
  01831830v 0.0133 stop terminate
  00058757v 0.0072 inject shoot
  01357963v 0.0127 flow travel_along
  01357963v 0.0068 flow travel_along
  01830886v 0.0043 discontinue
  00055849v 0.0045 administer dispense ...
  01779664v 0.0008 cease end finish ...

MEANING: MCR0:
vaso_3 09914390n 23-NOUN.QUANTITY NUMBER
GLOSS: the quantity a glass will hold
TO: 1stOrderEntity-Composition-Part
TO: 2ndOrderEntity-SituationType-Static
TO: 2ndOrderEntity-SituationComponent-Quantity
EN: glassful glass
IT: bicchierata bicchiere
BA: basokada
CA: got vas
DOBJ SemCor
  00795711v 0.0026 drink imbibe
  01530096v 0.0009 accept have take
  00786286v 0.0009 consume have ingest take take_in
  01513874v 0.0001 acquire get

MEANING: MCR:

MEANING: MCR1:
vaso_1 02755829n 06-NOUN.ARTIFACT FACTOTUM
SUMO: &%Artifact+
LOGICAL FORMULA: glass:NN(x1) -> glass:NN(x1) container:NN(x2) for:IN(x1, e1) hold:VB(e1, x1, x3) liquid:NN(x3) while:IN(e0, e2) drink:VB(e2, x1)
PARSING: (TOP (S (NP (NN glass) ) (VP (VBZ is) (NP (NP (DT a) (NN glass) (NN container) ) (PP (IN for) (S (VP (VBG holding) (PP (NP (NNS liquids) ) (IN while) ) (VBG drinking) ) ) ) ) ) (. .) ) )
WSD:
<wf pos="DT" >a</wf>
<wf pos="NN" lemma="glass" quality="silver" wnsn="2" >glass</wf>
<wf pos="NN" lemma="container" quality="silver" wnsn="1" >container</wf>
<wf pos="IN" >for</wf>
<wf pos="VBG" lemma="hold" quality="normal" wnsn="8" >holding</wf>
<wf pos="NNS" lemma="liquid" quality="normal" wnsn="1" >liquids</wf>
<wf pos="IN" >while</wf>
<wf pos="VBG" lemma="drink" quality="normal" wnsn="1" >drinking</wf>

MEANING: MCR1:
vaso_2 04195626n 08-NOUN.BODY ANATOMY
SUMO: &%BodyVessel+
LOGICAL FORMULA: vessel:NN(x1) -> tube:NN(x1) in:IN(x2, x3) body_fluid:NN(x2) circulate:VB(e1, x2)
PARSING: (TOP (S (NP (NN vessel) ) (VP (VBZ is) (NP (NP (DT a) (NN tube) ) (SBAR (WHPP (IN in) (WHNP (WDT which) ) ) (S (NP (DT a) (NN body) (NN fluid) ) (VP (VBZ circulates) ) ) ) ) ) (. .)
) )
WSD:
<wf pos="DT" >a</wf>
<wf pos="NN" lemma="tube" quality="gold" wnsn="4" wnsn="4" >tube</wf>
<wf pos="IN" >in</wf>
<wf pos="WDT" >which</wf>
<wf pos="DT" >a</wf>
<wf pos="NN" lemma="body_fluid" quality="silver" wnsn="1" >body_fluid</wf>
<wf pos="VBZ" lemma="circulate" quality="gold" wnsn="4" wnsn="4" >circulates</wf>

MEANING: MCR1:
vaso_3 09914390n 23-NOUN.QUANTITY NUMBER
SUMO: &%ConstantQuantity+
LOGICAL FORMULA: glass:NN(x1) -> quantity:NN(x1) glass:NN(x2) hold:VB(e1, x2)
PARSING: (TOP (S (NP (NN glass) ) (VP (VP (VBZ is) (NP (DT the) (NN quantity) ) (NP (DT a) (NN glass) ) ) (VP (MD will) (VP (VB hold) ) ) ) (. .) ) )
WSD:
<wf pos="DT" >the</wf>
<wf pos="NN" lemma="quantity" quality="silver" wnsn="1" >quantity</wf>
<wf pos="DT" >a</wf>
<wf pos="NN" lemma="glass" quality="normal" wnsn="2" >glass</wf>
<wf pos="MD" >will</wf>
<wf pos="VB" lemma="hold" quality="normal" wnsn="1" >hold</wf>

MEANING: MCR and consistency checking:
Synset      Word        SUMO           Domain
00536235n   blow        &%Breathing+   anatomy
00005052v   blow        &%Breathing+   medicine
00003430v   exhale      &%Breathing+   biology
00003142v   exhale      &%Breathing+   medicine
00899001a   exhaled     &%Breathing+   factotum
00263355a   exhaling    &%Breathing+   factotum
00536039n   expiration  &%Breathing+   anatomy
02849508a   expiratory  &%Breathing+   anatomy
00003142v   expire      &%Breathing+   medicine
02579534a   inhalant    &%Breathing+   anatomy
00536863n   inhalation  &%Breathing+   anatomy
00003763v   inhale      &%Breathing+   medicine
00898664a   inhaled     &%Breathing+   factotum
00263512a   inhaling    &%Breathing+   factotum
00537041n   pant        &%Breathing+   anatomy
00004002v   pant        &%Breathing+   medicine
00535106n   panting     &%Breathing+   anatomy
00264603a   panting     &%Breathing+   factotum
00411482r   pantingly   &%Breathing+   factotum
...

MEANING: MCR and consistency checking:
Does an orchard apple tree have leaves?
Does an orchard apple tree have fruits?
Does a cactus have leaves?
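The Breathing+ listing above pairs a single SUMO label with several different domain labels. One simple check this suggests is to group entries by SUMO label and flag labels whose entries fall into more than one domain. The slides do not describe how the MCR consistency checking is actually implemented, so the following is only a minimal sketch; the function names and the tuple layout are hypothetical, and the rows are a small excerpt of the slide's listing.

```python
from collections import defaultdict

# Excerpt of the slide's listing: (synset, word, SUMO label, domain).
# The tuple layout is an assumption made for this sketch.
ROWS = [
    ("00536235n", "blow", "Breathing+", "anatomy"),
    ("00005052v", "blow", "Breathing+", "medicine"),
    ("00003430v", "exhale", "Breathing+", "biology"),
    ("00899001a", "exhaled", "Breathing+", "factotum"),
]


def domains_by_sumo(rows):
    """Collect the set of domain labels seen for each SUMO label."""
    grouped = defaultdict(set)
    for _synset, _word, sumo, domain in rows:
        grouped[sumo].add(domain)
    return dict(grouped)


def inconsistent_labels(rows):
    """Return the SUMO labels whose entries carry more than one domain."""
    return {
        sumo: sorted(domains)
        for sumo, domains in domains_by_sumo(rows).items()
        if len(domains) > 1
    }


if __name__ == "__main__":
    for sumo, domains in inconsistent_labels(ROWS).items():
        print(sumo, domains)
```

A flagged label is not necessarily an error; it only marks a group of entries that may deserve manual inspection.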
MEANING: MCR and consistency checking:
Example SUMO: Boiling
(subclass Boiling StateChange)
(documentation Boiling "The Class of Processes where an Object is heated and converted from a Liquid to a Gas.")
(=> (instance ?BOIL Boiling)
    (exists (?HEAT)
        (and (instance ?HEAT Heating)
             (subProcess ?HEAT ?BOIL))))
"if instance BOIL Boiling, then there exists HEAT such that instance HEAT Heating and subProcess HEAT BOIL"

MEANING: MCR:
The MCR produced by MEANING is going to constitute the natural multilingual large-scale linguistic resource for a number of semantic processes that need large amounts of linguistic knowledge to be effective tools (e.g. Web ontologies).
All wordnets gained some kind of new knowledge coming from other wordnets by means of the first porting process.
The resulting MCR is one of the largest and richest multilingual lexical-knowledge bases ever built.
http://nipadio.lsi.upc.es/cgi-bin/mcrWei/public/wei.consult.perl

MEANING: WP5 Acquisition:
University of Sussex
ACQ0
  Subcategorisation frequencies
  Topic signatures
  Domain information for Named Entities
  Sense examples
ACQ1
  New senses
  Coarser-grained sense distinctions
  Selectional Preferences
ACQ2
  Specific lexico-semantic relations
  Thematic role assignments for nominalisations
  Diathesis alternations

MEANING: WP5 Acquisition:
11 ongoing experiments
  A Multilingual Acquisition for predicates
  B Collocations
  C Domain information for NEs
  D Topic signatures
  E Sense Examples
  F MRDs
  G Selectional Preferences
  H Coarse-grained senses
  I Multiword Acquisition
  J Enriching WordNet with collocations
  K New senses

MEANING: WP5 Acquisition E: Sense Examples:
<evento social> <competición, concurso> <evento> <partido_1> <semifinal> <cuartos_de_final>
<grupo_social> <organización> <agrupación grupo colectivo> <partido_2, partido_político> <partido_laborista>

MEANING: WP5 Acquisition E: Sense Examples:
partido 1
  Pero España puso al partido intensidad, ritmo y coraje.
  El seleccionador cree que el partido de hoy contra Italia dará la medida de España
  El Racing no gana en su campo desde hace seis partidos.
partido 2
  Todos los partidos piden reformas legales para TV3.
  La derecha planea agruparse en un partido.
  El diputado reiteró que ni él ni UDC, “como partido”, han recibido dinero de Pellerols.

MEANING: WP5 Acquisition E: Sense Examples:
partido 1
  Rivera pide el soporte de la afición para encarrilar las semifinales.
  Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado.
  El Racing ganó los cuartos de final en su campo.
partido 2
  No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.
  Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político.
  Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.

MEANING: WP5 Acquisition E: Sense Examples:
Sense            Senseval-2    BNC    Google
art%1:04:00::    61 (48+13)    26     37.400
art%1:06:00::    88 (70+18)    146    1.260.000
art%1:09:00::    37 (29+8)     368    542.000
art%1:10:00::    1 (1+0)       275    2.920.050
arts%1:09:00::   32 (25+7)     311    3.289.320

Word   BNC     Google
art    9.989   56.000.000

MEANING: WP5 Acquisition E: Sense Examples:
Goal of Experiment E: automatically produce training data for WSD systems, with size and coverage orders of magnitude larger than currently available (manually produced) resources
  First release of ExRetriever (December 2003)
  Experiments (February 2004)
  Future work (February 2005 and beyond …)

MEANING: WP5 Acquisition E: Sense Examples:
First release of ExRetriever
  ExRetriever is able to use the MCR and different corpora (SemCor, BNC, Google) through a common API.
  ExRetriever has been equipped with a declarative language for query construction.
  A tool for performance evaluation and summarization (P/R/F measures)

MEANING: WP5 Acquisition E: Sense Examples:
Experiments
  The experiment was devoted to testing the first prototype of ExRetriever.
  Direct evaluation of the accuracy and productivity of the different approaches for building queries has been performed for English on SemCor.
  Words from Senseval-2 (lexical sample)
  Different queries inspired by (Leacock et al. 98), (Mihalcea and Moldovan 99), etc.
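The P/R/F measures mentioned for the ExRetriever evaluation tool can be illustrated with a small computation. This is a minimal sketch, assuming that the examples retrieved by a query and the examples tagged with the target sense in SemCor are both available as sets of sentence identifiers; the function name and the identifiers are hypothetical and are not taken from ExRetriever.

```python
def precision_recall_f(retrieved, gold):
    """Return (precision, recall, F1) of retrieved examples against a gold set."""
    retrieved, gold = set(retrieved), set(gold)
    correct = len(retrieved & gold)  # retrieved examples that carry the right sense
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentence identifiers, for illustration only.
    retrieved = {"s1", "s2", "s3", "s4"}
    gold = {"s1", "s2", "s5"}
    p, r, f = precision_recall_f(retrieved, gold)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Precision here reflects the accuracy of a query, while the number of retrieved examples reflects its productivity, the two aspects the experiments slide mentions.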
MEANING: WP5 Acquisition E: Sense Examples:
Query set using a declarative language
  Lea1Semcor query=or(nrel(1,syns)) or or(nrel(1,hypo)) or or(nrel(1,hype));
  Meaning1Semcor query=Glos(or,and,noempty) or or(nrel(1,syns)) or or(nrel(1,hypo));
  Meaning2Semcor query=Glos(or,and,noempty) or Glos(or,and,or,rel(hypo),noempty) or Glos(or,and,or,rel(syns),noempty);
  Moldo1Semcor query=or(nrel(1,syns));
  Moldo2Semcor query=or(rel(glos));
  Moldo3Semcor query=Glos(or,and,noempty);

MEANING: WP5 Acquisition E: Sense Examples:
Example
  Using LDB: WordNet
  Using Indexer: Swish
  Using Corpus: Semcor
  Base on which the query is made (lemma#POS): grip#n
  Query for sense (1): (clutches) or (embracing or "wrestling hold") or ("taking hold" or prehension)
  <Example Sentences="1" src="brownv/tagfiles/br-e03#1112" Chars="60" size_tagged_Semcor="399" Words="12"> The pulsating vibration of energy <MEANING synsetPOS="n" baseSense="1" baseLema="grip" origPOS="n" rel="syns" synsetSense="1" synsetLema="clutches" basePOS="n"> clutches </MEANING> at the_pit of your stomach. </Example>

MEANING: WP5 Acquisition E: Sense Examples:
Future work (February 2004 and beyond …)
  Analysis of the results (which query is best under which conditions)
  Designing new queries using more knowledge (Domains, EWN Top Ontology, SUMO, new relations, ...)
  Latent Semantic Analysis and logic operations with vectors (Widdows et al. 2003)
  Indirect evaluation using BNC ...
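The query shown for sense 1 of grip#n has a regular surface form: groups of related words are parenthesised, joined with "or", and multiword lemmas are quoted. The sketch below only reproduces that surface form from groups of lemmas. It is not ExRetriever's implementation, the function names are hypothetical, and the assumption that each group corresponds to one related synset is ours, since the slides do not say how the groups are derived.

```python
def quote(lemma):
    """Turn a WordNet-style lemma into a search term, quoting multiword lemmas."""
    text = lemma.replace("_", " ")
    return f'"{text}"' if " " in text else text


def build_query(groups):
    """Join groups of lemmas into a boolean 'or' query, one parenthesised group each."""
    parts = []
    for lemmas in groups:
        if lemmas:  # skip empty groups so the query has no empty parentheses
            parts.append("(" + " or ".join(quote(lemma) for lemma in lemmas) + ")")
    return " or ".join(parts)


if __name__ == "__main__":
    # Lemma groups copied from the slide's query for sense 1 of grip#n.
    groups = [
        ["clutches"],
        ["embracing", "wrestling_hold"],
        ["taking_hold", "prehension"],
    ]
    print(build_query(groups))
```

Skipping empty groups loosely mirrors the `noempty` option that appears in the declarative queries, although its exact semantics there is not documented in the slides.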
MEANING: WP6 WSD:
IXA group, UPV/EHU
Overall WP6 objective: a high-precision system for all open-class words for all languages
  Combining unsupervised knowledge-based systems with supervised Machine Learning algorithms
Current state of the art: 69% in Senseval-2 all-words for English
  Based on supervised ML on Semcor (500 Kw) as training data
  No baseline for other languages

MEANING: WP6 WSD:
Main problem:
  Need for dozens of manually tagged examples for each word sense (how many?)
MEANING strategy:
  Automatically acquiring a huge number of examples per sense from the web (ACQ, MCR, bootstrapping, sense ranking, ...)
  Improve current supervised and unsupervised systems
  Using sophisticated linguistic information, such as syntactic relations, semantic classes, selectional restrictions, subcategorisation information, domains, etc.
  Efficient margin-based Machine Learning algorithms
  Novel algorithms that combine tagged examples with huge amounts of untagged examples in order to increase the precision of the system

MEANING: WP6 WSD:
IXA group, UPV/EHU
WSD0
  State-of-the-art all-words systems
  Explore improvements of current supervised systems
WSD1
  Improved all-words systems using richer linguistic features (better Linguistic Processors, MCR0)
WSD2
  Improved all-words systems using richer linguistic features (better Linguistic Processors, MCR1)
  Examples automatically acquired from the web

MEANING: WP6 WSD:
9 ongoing experiments
  A All-words for English
  B High precision WSD for Bootstrapping => H
  C High quality sense examples => H
  D TSVM => H
  E All-words for non-English
  F More informed features
  G Unsupervised WSD
  H Bootstrapping
  I Effect of sense clusters
  J Semantic class classifiers
  K Ranking senses automatically
  L Disambiguating WN glosses

MEANING: WP6 WSD K: Ranking Senses Automatically:
The first sense heuristic (FSH) is a powerful one
  Usually, unsupervised WSD systems perform worse!
Sense distributions change according to the type of text (Escudero et al. 2000, Martínez and Agirre 2000)
  Supervised systems only work if we do not change the type of text!

MEANING: WP6 WSD K: Ranking Senses Automatically:
Ranking Method
Use nearest neighbours acquired from corpora using distributional similarity (e.g. Lin 1998)
  star: superstar (0.1666), player (0.157), teammate (0.121), actor (0.121) ... galaxy (0.078), sun (0.077), world (0.063), planet (0.061) ...
The dominance of a given sense is related to the distributional similarity of its neighbours
Disambiguate the neighbours using the WordNet Similarity package

MEANING: WP6 WSD K: Ranking Senses Automatically:
Ranking Experiments
Ranking from different corpora: pipe
  Semcor: tobacco pipe
  BNC: underground pipe
Ranking from domain-specific corpora: tie
  BNC: necktie
  Reuters Finance: affiliation
  Reuters Sport: draw
Senseval-2 all-nouns task: 65% precision, 60% recall

MEANING: WP6 WSD J: Semantic Class Classifiers:
From the Financial Times:
  US officials had expected Basra to fall early
  Music sales will fall by up to 15% this year
  No missiles have fallen and ...
(21) v.motion Motion+
(3) v.possession UnilateralGetting+
(46) v.motion Decreasing+

MEANING: WP6 WSD L: Disambiguating WN glosses:
<play_7, play_on_1> perform music on (a musical instrument); “He plays the flute” “Can you play on this old recorder?”
<pipe_3> play on a pipe
<drum_2> play the drums
<trumpet_2> play or blow the trumpet

MEANING: WP6 WSD L: Disambiguating WN glosses:
<play_7, play_on_1> perform music on (a musical_instrument_1); “He plays the flute_3” “Can you play on this old recorder_4?”
<pipe_3> play on a pipe_4
<drum_2> play the drums_1
<trumpet_2> play or blow the trumpet_1

MEANING: WP6 WSD L: Disambiguating WN glosses:
<play_7, play_on_1> perform music on (a musical_instrument_1); “He plays the flute_3” “Can you play on this old recorder_4?”
<pipe_3> play on a pipe_4
<drum_2> play the drums_1
<trumpet_2> play or blow the trumpet_1
<instrument_1> ROLE INSTRUMENT

MEANING: WP6 WSD L: Disambiguating WN glosses:
<tocar_13> <play_7, play_on_1> perform music on (a musical_instrument_1); “He plays the flute_3” “Can you play on this
old recorder_4?”
<pipe_3> play on a pipe_4
<drum_2> play the drums_1 <tambor_2>
<trumpet_2> play or blow the trumpet_1
<instrument_1> ROLE INSTRUMENT <instrumento_musical_1>

MEANING: WP8 User validation:
Irion Technologies (University of Sussex)
To provide the project with industrial feedback
Demonstration of MEANING by integrating the results in existing web products of Irion
  TwentyOne: CLIR system
  Adjust: Cross-Lingual classification system
  Pidgin: Cross-Lingual Q/A dialogue system
EFE: Spanish News Agency
  Huge multilingual database of picture captions

MEANING: WP8 User validation:
Baselines of Irion applications
  Cross-lingual retrieval system: English, Dutch, German, French, Spanish and Italian
  Document classification system
Resources
  SemNet
  WordNet & WordNet Domains
  Linking between SemNet and WordNet
Test collection
  Reuters News Archive 1996-1997, English
  CLIR: 100 ambiguous queries extracted from NPs and translated
  Document classification: 125 categories

MEANING: WP8 User validation:
CLIR
  Expansion with wordnet is only useful for synonymous queries in a monolingual setting
  Expansion with wordnet is always useful in a cross-lingual setting
  Synonym selection is slightly better than concept selection (WSD based on SemNet and WordNet Domains)
  Best approach: combining synonym selection with concept selection
  Baseline setting without MEANING results
Classification
  Best results: using disambiguated classifiers and classifiers expanded with most frequent synonyms.
  Recall is up to 80% and precision is a bit lower than with NO expansion.
  However, coverage is now 100%.

MEANING: WP9 Exploitation and dissemination:
IXA, UPV/EHU
Journals, conferences (first year: 41 published papers)
Cooperation
  SWAP – EDAMOK
  ESPERONTO
  BALKANET
SENSEVAL-3
  Coordinating several tasks: Basque, Catalan, Italian, Spanish
During spring 2004:
  First release of the MCR!
  MEANING user group!
Two workshops
  First year: San Sebastián (Basque Country)
  Third year: Trento (Italy)

Slide73: MEANING: WP9 First workshop
Donostia / San Sebastian, April 10-12, 2003
Proceedings on the Web
8 invited speakers to give feedback (4 European, 4 American)
  Walter Daelemans (WSD, ML)
  Fernando Gomez (Acquisition, semantic interpretation)
  Julio Gonzalo (WSD, CLIR)
  Anna Korhonen (Acquisition)
  Dekang Lin (Acquisition)
  Alexander Maedche (Acquisition, Semantic Web)
  Rada Mihalcea (WSD)
  David Yarowsky (WSD)

MEANING: Conclusions and Results:
The good news: MEANING works!
  A tool set that, using the semantic knowledge of the MCR, automatically obtains from the web large collections of examples for each particular word sense.
  A tool set for enriching the MCR using the knowledge acquired automatically from the Web.
  A tool set for accurately selecting the senses of the open-class words for the languages involved in the project.
  A Multilingual Central Repository to maintain compatibility between wordnets of different languages and versions, past and new.
The results of MEANING will be public and free.

MEANING: Semantic Interpretation:

MEANING as a framework:
The bad news:
  MEANING will focus only on the most promising research lines
  MEANING has a large amount of work to do!
  MEANING has only one more cycle!
MEANING can also be seen as a common framework to acquire and port knowledge (information/data?) across languages, resources and tools, useful for many large-scale Semantic Processing tasks
Your collaborations and contributions are welcome!

MEANING as a framework:
Don’t waste your effort!
MEANING can recycle your resources!

Slide78: MEANING: Developing Multilingual Web-scale Language Technologies
IST-2001-34460
http://www.lsi.upc.es/~nlp/meaning/meaning.html
German Rigau i Claramunt