Approximating Textual Entailment with LFG and FrameNet Frames
SALSA Workshop, Saarbrücken, June 27-28, 2006
Multilingual semantic annotation: theory and applications
Aljoscha Burchardt (Computational Linguistics Department, Saarland University, Saarbrücken) and Anette Frank (Language Technology Lab, DFKI GmbH, Saarbrücken)

Overview
The PASCAL Recognizing Textual Entailment task (RTE): What is it, and how to approach it?
The SALSA RTE System: A baseline system for approximating Textual Entailment
  Building on LFG-based syntactic analysis and frame semantics
  Computing structural and semantic overlap as an approximation of textual entailment in a learning architecture
  Open architecture for future extensions towards deeper modelling
Linguistic analysis: LFG and FrameNet frames
Approximating Textual Entailment
  Computing a match graph for structural and semantic overlap
  Feature extraction and machine learning
Results of this year’s RTE task
Discussion, error analysis and perspectives
Conclusion

The PASCAL RTE Task: What is it?
A recently established challenge for the NLP/AI community
Testing a system’s capacity to recognize “Textual Entailment”
“Realistic”, open-domain data set drawn from system outputs in NLP applications: IR, IE, QA, SUM
Controlled set-up: balanced training and test sets, 800/800 text-hypothesis pairs

Taking a look at the data
Fine-grained linguistic analysis
  T: Oscar-winning actor Nicolas Cage’s new son and Superman have something in common ...
  H: Nicolas Cage’s new son was awarded an Oscar. (No, IE)
Lexical semantics and paraphrases (nominalisation, synonymy)
  T: [o]n December 10th 1936 King Edward VIII gave up his right to the British throne.
  H: King Edward VIII abdicated on the 10th of December, 1936. (Yes, QA)
Inference and world knowledge
  T: Olson, 62, previously worked as a partner at Ernst & Young LLP, before joining the Fed board in 2001, to serve a term ending in 2010.
  H: Olson is a member of the Fed board. (Yes, IE)
Modality
  T: U.S. Secretary of State Condoleezza Rice said Thursday that North Korea should return to nuclear disarmament talks and ...
  H: North Korea says it will rejoin nuclear talks. (No, SUM)
Temporal and local restrictions (monotonicity)
  T: In most Pacific countries there are very few women in parliament.
  H: Women are poorly represented in parliament. (Yes (!), IR)

Textual Entailment
“We say that T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people. This somewhat informal definition is based on (and assumes) common human understanding of language as well as common background knowledge.”
“Cases in which inference is very probable (but not completely certain) are still judged True.”
(Dagan, Glickman, Magnini, RTE 2005 Workshop Proceedings)
“Circumscribing Textual Entailment”? See discussions in: Zaenen, Karttunen and Crouch (2005), Manning (2006), Crouch, Karttunen and Zaenen (2006).

A Challenge, ...
in fact
T: Hundreds of divers and treasure hunters, including the Duke of Argyll, have risked their lives in the dangerous waters of the Isle of Mull trying to discover the reputed 30,000,000 pounds in Gold carried by this vessel--the target of the most enduring treasure hunt in British history.
H: Shipwreck salvaging was attempted. (Yes, IR)
T: The 26-member International Energy Agency said, Friday, that member countries would release oil to help relieve the U.S. fuel crisis caused by Hurricane Katrina.
H: Responding to a plea from the International Energy Agency for member countries to release reserves, Canada is prepared to help. (No, SUM)

Approximating Textual Entailment
How to reconcile obvious complexity and required depth?
  Parsing complexity
  Semantic analysis
    Argument structure, anaphora, lexical meaning, semantic and discourse relations, presupposition, ...
  Inferences based on linguistic meaning and world knowledge
Statistical/ML approximation of Textual Entailment
  Based on state-of-the-art syntactic and shallow semantic analysis
  Measuring structural and semantic overlap
With possibilities for extensions towards deeper modelling
  Inference on partial structures (lexical entailment)
  Targeted modelling of specific aspects, e.g. modality contexts ...

A baseline system for approximating Textual Entailment
Fine-grained LFG-based syntactic analysis
  English LFG grammar (Riezler et al.
2002), broad-coverage with high-quality probabilistic disambiguation
Frame Semantics
  Coarse-grained lexical-semantic classification of predicates with role-based argument structure encoding
  Extended semantic representations: WordNet senses, SUMO concepts
Computing structural and semantic overlap
  Hypothesis: high/low ratio of H/T overlap => entailment: yes/no
  A learning problem: measures of overlap, weighted entailment decision

The SALSA RTE System
Linguistic analysis components and integration
  XLE parsing: LFG f-structure
  Fred/Detour + Rosy: frames & roles
  WordNet-based WSD: WordNet & SUMO
  f-structure w/ (extended) frame-semantic projection
  Using XLE term rewriting system (Crouch 2005)

Linguistic Components: LFG analysis combined with FrameNet frames
Deep syntactic LFG analysis
  Broad-coverage grammar with probabilistic disambiguation
  Fine-grained grammatical function analysis with integrated NER
  Performance on RTE-II development and test set:
    Coverage: 99% (86% full parses, 13% partial parses)
    On RTE H/T pairs: 76% fully analysed pairs; 2% single analysis only
Frame semantic analysis
  Focusing on lexical semantic classes and role-based argument structure
  Disregarding aspects of “deep” semantics: modality, quantification, ...
  Normalisation over syntactic and lexical alternations (diatheses, lexicalisation, PoS)

Linguistic Components: Frame and role assignment
Shalmaneser (Erk & Pado, 2006)
  Shallow semantic parser for FrameNet frame and role assignment
  Fred: statistical frame assignment
    WSD system for predicates, in terms of frames
  Rosy: semantic role assignment
    Argument recognition and argument labelling
    Using state-of-the-art features from robust syntactic parsing
Detour (to FrameNet via WordNet) (Burchardt et al., 2005)
  Aim: overcome lexical gaps in FrameNet
  A rule-based frame assignment system that takes a “detour to FrameNet via WordNet”
  Determine similarity of “unknown LUs” to existing frames (their LUs) based on WordNet-similarity measures
[Figures: frame and role assignment by Fred & Rosy, and by Fred, Detour & Rosy]
[Figure: Fred & Detour, different sense assignments (FN coverage)]

Linguistic Components: Integration and extended semantics projection
Porting frame and role assignments to LFG f-structure
  Defining a frame semantics projection using head lemmata as interface layer (accounts for parser discrepancies)
  Using XLE rewrite system (Crouch 2005)
  Head-indexed frame & role assignments
Rule-based extensions of LFG-frame structures
  Frames corresponding to LFG NE classes: locations, companies, dates, ...
  Extra-thematic roles, based on LFG adjunct classes, etc.: Time, Reason, Location, Concessive, ...
  +adjunct(Z,Y), ntype_sem(Y,time) ==> s::(Z,SemZ), s::(Y,SemY), time(SemZ,SemY).
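
The time-adjunct rule above can be read procedurally. The following Python sketch is only an illustration, not the XLE rewrite system itself: it assumes facts are plain tuples, that the `+` prefix means "match without consuming" while an unprefixed left-hand-side fact is consumed, and that `s::` introduces a semantic node for an f-structure node. The node ids `f1`, `f2` and the function name are hypothetical.

```python
def apply_time_adjunct_rule(facts):
    """One pass of the (assumed) time-adjunct rule over a set of fact tuples."""
    result = set(facts)
    for fact in facts:
        if fact[0] != "adjunct":
            continue
        _, head, adj = fact
        marker = ("ntype_sem", adj, "time")
        if marker in facts:
            # Assumption: no '+' prefix on ntype_sem, so the fact is consumed.
            result.discard(marker)
            # Assumption: s:: projects each f-structure node to a semantic node.
            sem_head, sem_adj = f"s::{head}", f"s::{adj}"
            result.add(("s", head, sem_head))
            result.add(("s", adj, sem_adj))
            # New extra-thematic Time role between the two semantic nodes.
            result.add(("time", sem_head, sem_adj))
    return result


# Hypothetical input: f1 is a predicate node with a temporal adjunct f2.
facts = {
    ("pred", "f1", "direct"),
    ("adjunct", "f1", "f2"),
    ("ntype_sem", "f2", "time"),
}
new_facts = apply_time_adjunct_rule(facts)
```

In this reading the adjunct edge survives the rewrite, and the semantic projection gains a `time` relation that later matching steps could compare between text and hypothesis.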
Extended semantics projection: WordNet and SUMO classes
  WSD: Banerjee & Pedersen, 2003
  WordNet to SUMO/MILO mapping: Niles and Pease (2001)

Linguistic Components: Integration and extended semantics projection
Normalisations of syntactic structure
  Passive: mapping SUBJ and OBJ to dsubj and dobj argument slots
  Coindexing relative pronouns and relativised head, appositives, etc.
  Heuristic rules collect antecedent candidate sets for pronominals
FEF: Frame-Exchange-Format
  (Partial) visualisation of extended syntactic-semantic graph structures in FEFViewer (Alexander Koller, Coli Saarbrücken)

A walk-through example from RTE 2006
Pair 716
  Text: In 1983, Aki Kaurismäki directed his first full-time feature.
  Hypothesis: Aki Kaurismäki directed a film.
[Figure: LFG f-structures in XLE graphical display]
[Figure: Automatic frame annotation for the text in SALTO Viewer (Collins parse)]
[Figure: Automatic frame annotation for the hypothesis, 716_h: Aki Kaurismäki directed a film.]
[Figure: LFG and frames for the hypothesis in FEFViewer: Aki Kaurismäki directed a film.]

The SALSA RTE System
Recognizing Textual Entailment: graph matching & statistical approximation
  f-structure w/ frames & concepts for text and for hypothesis (from the linguistic analysis components and integration)
  Text-hypothesis match graph
    Matching nodes and edges
    Different match types (similarity types)
    Extensions for deeper modelling (modality, lexical entailment)
  Feature extraction
  Model training & classification

Hypothesis-Text-Match Graphs: Computing structural and
semantic overlap
Computing a “match graph” from text and hypothesis graphs
  Matches are established by different aspects and degrees of “similarity”
Approximating textual entailment
  High/low overlap ratio of hypothesis and match graph => entailment: yes/no

Hypothesis-Text-Match Graphs: Different matching strategies
I. Match graph/Text overlap: ratio of matched material to all (matched and non-matched) material in Text
II. Match graph/Hypothesis overlap: ratio of matched material to all (matched and non-matched) material in Hypothesis
  T: Leo Fender invented the first electric guitar and the electric bass guitar.
  H: Leo Fender invented the first electric guitar.
  I: 7/12 = 58%; II: 7/7 = 100%

Hypothesis-Text-Match Graphs: Computing structural and semantic overlap
Graph matching using XLE rewrite system
  Defining different types of match conditions on t- and h-graph, triggering new nodes and edges in m-graph, with match-type info
Matching algorithm tied to rewrite-logic
  Locally defined matches (no graph traversal)
  Starting with (multiple) node matches
  Edge matches: restricted to connect matched nodes
Example (text-hypothesis ==> text-hypothesis-match):
  frame(h:x1,killing), frame(t:y1,killing) ==> frame(m:(z1,x1,y1),killing), match_type(m:(z1,x1,y1),killing,frame)
Rewrite rule:
  +frame(h:X1,Frame), +frame(t:Y1,Frame) ==> frame(m:(Z1,X1,Y1),Frame), match_type(m:(Z1,X1,Y1),Frame,frame).

Hypothesis-Text-Match Graphs: Computing structural and semantic overlap
Aspects of similarity
Syntax-based (i.e. lexical and structural) similarity
  Identical PREDs and attribute values trigger node matches
  Identical ATTRIBUTES (GF, morph.
features) trigger edge matches
Semantics-based similarity
  Identical FRAMES and CONCEPTS trigger node matches
  Identical ROLES trigger edge matches
Match graph consists of identical partial syntactic & semantic graphs
Degrees of similarity (strict vs. weak matching)
  Non-identical, but “structurally related” PREDs: coreferentially related (relative clauses, appositives, pronominals)
  Non-identical, but “semantically related” PREDs (WN-related, path<3)
  Non-identical, but “semantically related” FRAMES (FN-/Detour-related)
Match graph establishes overlapping partial graphs (marked by match types)
[Figure for t: In 1983, Aki Kaurismäki directed his first full-time feature.]

Approximating Textual Entailment: Extensions for deeper modelling: Modality
Detecting indicators of inconsistent modality types
  T: A pet must have rabies protection confirmed by a blood test.
  H: A case of rabies was confirmed.
Marking modal contexts in text and hypothesis
  5 modality types: conditional, future, diamond, box, negation
Handling inconsistent modality types in matching process
  Introducing negatively marked match nodes
  Blocking embedded structures for similarity-based matches
  Thus, reducing the size of the match graph

Approximating Textual Entailment: Extensions for deeper modelling: Lexical Entailments
Bridging partial non-matching text and hypothesis pairs
  T: Olson, 62, previously worked as a partner at Ernst & Young LLP, as a Minnesota bank president and as a congressional aide, before joining the Fed board in 2001, to serve a term ending in 2010.
  H: Olson is a member of the Fed board.
Lexically induced inferences, defined as rewrite rules on h/t/m graphs
Similar: non-lexical heuristic inferences
  Appositions: prime minister X => X is prime minister
  Possessive constructions: X’s Y => the Y of X
Example rule:
  t: (X1) joins X2
  h: (Y1) member-of Y2
  m:(Z2,Y2,X2) => match_type(heuristic_entailment_match).

Approximating Textual Entailment: Machine learning
Feature selection with WEKA classifiers
  Many learners select intuitively important features, but also “idiosyncratic” ones
Selected learners and models
  Model 1: Simple Conjunctive Rule classifier
    Generated a single rule
    Medium/high threshold on pred/frame matches as criterion for rejection
    High degree of frame similarity w/ medium predicate similarity models entailment
  Model 2: Meta-classifier LogitBoost (additive logistic regression)
    Features (1.-4.) used in iteration; final feature set: 1., 2., 4.

Results in RTE-II
SALSA RTE system results
  Both models score SUM > IR > QA > IE
  Refined model better on QA; simple model better on SUM
Overall RTE-II results
  Average accuracy: 60% (median: 59%)
  Shallow overlap measures vary considerably between data sets, whereas “deeper” approaches remain more stable
  Tendency towards deeper, knowledge-rich methods

Discussion of Results: True positives
High ratio of matching predicates, frames, and f-structure
Typical phenomena
  Non-identical predicates compensated by matching frames (626)
  Missing frame assignments compensated by WN relatedness: die / pass away (wn-related, 103)
  Active-passive diathesis resolved by f-structure normalisation (129)
Relative overlap measures also work for longer hypotheses

Discussion of Results: True negatives
Modal context marking seems to be effective
  27% of all true negatives involved modality mismatches, while only 11.9% of all sentences
involve marked modal contexts
Future plans
  Extend to lexically induced modality/facticity indicators
  Testing for non-monotonicity contexts

Error analysis: False positives
Typical cases
  Semantic dissimilarity: non-matching predicates within larger match graphs, which are in fact semantically dissimilar
  Structural distance: matching nodes within a match graph correspond to far distant nodes in the text graph, compared to neighbouring nodes in the match graph
Example
  T: Some 420 people have been hanged in Singapore since 1991, mostly for drug trafficking, an Amnesty International 2004 report said. That gives the country of 4.4 million people the highest execution rate in the world relative to population.
  H: 4.4 million people were executed in Singapore. (198, false positive)
  Unconnected nodes matched with distant nodes in text graph
Graph matching process
  Not a top-down process
  Starts by relating any nodes, and builds growing clusters by finding matching edges
  This allows criss-cross matching of nodes in the match graph
Introduce weighted edges that reflect the relative distance of pairs of match nodes in text and hypothesis (path distance)

Conclusions
A medium-depth approach: Approximating Textual Entailment
  Lexical and syntactic overlap, semantic similarity (WordNet)
  Frame semantics: lexical semantic classes & argument structure
Flexible graph matching method with extensions to deeper processing
  Modality contexts, lexical inferences
Perspectives for
future extensions
  Engineering and fine-tuning
    Combination with shallow (and deeper) methods in voting architecture
  Frame and role assignment
    Sense discrimination: outlier detection (Erk, 2006)
    Coverage: integration with other resources (VerbNet, NomBank)
  Modelling dissimilarity
    Semantic distance measures and distance-weighted graph edges
    Acquisition of lexical modality indicators and (lexical) entailment rules

References
RTE Proceedings
  RTE Challenge Homepage: http://www.pascal-network.org/Challenges/RTE2
  I. Dagan, O. Glickman, and B. Magnini (2005): “The PASCAL recognising textual entailment challenge.” In Proceedings of the RTE-1 Workshop, Southampton, UK.
  B. Magnini and I. Dagan, editors (2006): Proceedings of the Second PASCAL Recognising Textual Entailment Challenge, Venice, Italy.
  Electronic proceedings and slides: http://ir-srv.cs.biu.ac.il:64080/RTE2/proceedings/
Discussion about RTE Task
  Zaenen, Karttunen and Crouch (2005): “Local Textual Inference: can it be defined or circumscribed?” In ACL 2005 Workshop on Empirical Modelling of Semantic Equivalence and Entailment, Ann Arbor, Michigan.
  Manning (2006): “Local Textual Inference: It's hard to circumscribe, but you know it when you see it - and NLP needs it.” MS. Stanford University.
  Crouch, Karttunen and Zaenen (2006): “Circumscribing is not excluding: A reply to Manning.” MS. Palo Alto Research Center.
  All papers: http://www2.parc.com/istl/members/zaenen/
System and components
  A. Burchardt and A. Frank (2006): “Approximating Textual Entailment with LFG and FrameNet Frames.” In Proceedings of the Second Recognising Textual Entailment Workshop, Venice, Italy. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
  K. Erk and S. Pado (2006): “Shalmaneser - a flexible toolbox for semantic role assignment.” In Proceedings of LREC-06, Genoa. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
  A. Burchardt, K. Erk, and A.
Frank (2005): “A WordNet Detour to FrameNet.” In Proceedings of the GLDV 2005 Workshop GermaNet II, Bonn. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
  R. Crouch (2005): “Packed Rewriting for Mapping Semantics to KR.” In Proceedings of the Sixth International Workshop on Computational Semantics, Tilburg. http://www2.parc.com/istl/groups/nltt/papers/iwcs05_crouch.pdf

Approximating Textual Entailment: Similarity/Entailment measures and feature extraction
[Slide content not preserved in the transcript]

Error analysis: Sparse features
Feature set
  High-frequency features that measure similarity
  Few, and low-frequency features that model dissimilarity
Bias towards similarity
  29.5% false positives
  12.75% false negatives
Plans for further development
  Introducing distance measures (semantic and structural)
  Getting a grip on remaining differences, i.e. non-matched edges between matching clusters
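
The two overlap ratios behind the similarity features (match graph/Text and match graph/Hypothesis) can be illustrated with a small Python sketch. It is a simplification under stated assumptions: graph elements are reduced to word tokens, and the match graph is approximated by a multiset intersection, whereas the system described in the talk matches f-structure nodes and edges, frames and concepts via rewrite rules. The function name is hypothetical. On the Leo Fender pair from the matching-strategies slide, this token-level stand-in happens to reproduce the reported 7/12 and 7/7.

```python
from collections import Counter


def overlap_ratios(text_items, hypothesis_items):
    """Return (match/text ratio, match/hypothesis ratio) for two item lists."""
    text_bag = Counter(text_items)
    hyp_bag = Counter(hypothesis_items)
    # Assumption: a multiset intersection stands in for the match graph.
    matched = sum((text_bag & hyp_bag).values())
    text_total = sum(text_bag.values())
    hyp_total = sum(hyp_bag.values())
    text_ratio = matched / text_total if text_total else 0.0
    hyp_ratio = matched / hyp_total if hyp_total else 0.0
    return text_ratio, hyp_ratio


text = "Leo Fender invented the first electric guitar and the electric bass guitar".split()
hypothesis = "Leo Fender invented the first electric guitar".split()

text_ratio, hyp_ratio = overlap_ratios(text, hypothesis)
print(f"I: {text_ratio:.0%}  II: {hyp_ratio:.0%}")  # prints "I: 58%  II: 100%"
```

Ratios of this kind only measure similarity, which fits the bias noted above: a high hypothesis overlap says nothing about how far apart the matched elements sit in the text, which is what the planned distance measures are meant to capture.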