Introduction to Computational Linguistics
Eleni Miltsakaki
AUTH
Fall 2005-Lecture 9
What’s the plan for today?
Discourse models cont’d
DLTAG: Lexicalized Tree Adjoining Grammar for Discourse
A DLTAG-based system for parsing discourse
The Penn Discourse Treebank
http://www.cis.upenn.edu/~pdtb
Basic references
Anchoring a Lexicalized Tree-Adjoining Grammar for Discourse (1998)
B. Webber and A. Joshi
What are Little Texts Made of? A Structural and Presuppositional Account Using Lexicalized TAG
B. Webber, A. Joshi, A. Knott, M. Stone
DLTAG System: Discourse Parsing with a Lexicalized Tree-Adjoining Grammar (2001)
K. Forbes, E. Miltsakaki, R. Prasad, A. Sarkar, A. Joshi and B. Webber
The Penn Discourse Treebank (2004)
E. Miltsakaki, R. Prasad, A. Joshi and B. Webber
Motivation and basics of the DLTAG approach
Discourse meaning: more than its parts
Compositional vs non-compositional aspects of discourse meaning
These two aspects are often conflated in related work
Smooth transition from sentence level structure to discourse level structure
The DLTAG view of discourse connectives
Discourse connectives are treated as higher level predicates taking clausal arguments
Basic types of discourse connectives:
Structural
Subordinate conjunctions (when, although, because etc)
Coordinate conjunctions (and, but, or)
“Anaphoric”
Adverbials (however, therefore, as a result, etc)
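The classification above can be sketched as a small lookup table. This is only an illustration: the lexicon contains just the connectives named on the slide, and the names `CONNECTIVE_CLASSES` and `classify_connective` are hypothetical, not part of any DLTAG implementation.

```python
# Toy lexicon: only the connectives listed on the slide (an assumption,
# a real lexicon would be much larger).
CONNECTIVE_CLASSES = {
    "subordinate": {"when", "although", "because"},
    "coordinate": {"and", "but", "or"},
    "adverbial": {"however", "therefore", "as a result"},
}

# Subordinate and coordinate conjunctions are structural connectives;
# adverbials are the "anaphoric" ones.
STRUCTURAL_SUBCLASSES = {"subordinate", "coordinate"}


def classify_connective(connective):
    """Return (type, subclass), e.g. ("structural", "coordinate")."""
    key = connective.lower().strip()
    for subclass, members in CONNECTIVE_CLASSES.items():
        if key in members:
            kind = "structural" if subclass in STRUCTURAL_SUBCLASSES else "anaphoric"
            return kind, subclass
    raise KeyError(f"unknown connective: {connective!r}")


print(classify_connective("because"))   # ('structural', 'subordinate')
print(classify_connective("However"))   # ('anaphoric', 'adverbial')
```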
Elements of LTAG
Initial and auxiliary trees
Initial: Encode predicate-argument dependencies
Auxiliary: recursive, modify elementary trees
anchors of elementary trees are semantic predicates
substitution and adjunction
D-LTAG is similar
anchors of elementary trees are semantic features which can be lexicalized with discourse connectives
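To make the two operations concrete, here is a minimal sketch of substitution and adjunction on toy trees. It is not the XTAG or DLTAG implementation: the `Node` class, the use of the label "S" for every discourse unit, and the decision to keep whole clauses as single leaves are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # open substitution site
    foot: bool = False   # foot node of an auxiliary tree

    def leaves(self) -> List[str]:
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]


def substitute(tree: Node, initial: Node) -> bool:
    """Fill the leftmost open substitution site whose label matches the root of `initial`."""
    for i, child in enumerate(tree.children):
        if child.subst and child.label == initial.label:
            tree.children[i] = initial
            return True
        if substitute(child, initial):
            return True
    return False


def find_foot(node: Node) -> Optional[Node]:
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target: Node, auxiliary: Node) -> None:
    """Adjoin `auxiliary` at `target`: the material under `target` moves under the foot node."""
    foot = find_foot(auxiliary)
    assert foot is not None and foot.label == auxiliary.label == target.label
    foot.children, foot.foot = target.children, False
    target.children = auxiliary.children


# Initial tree anchored by "because", with two substitution sites.
because_tree = Node("S", [Node("S", subst=True), Node("because"), Node("S", subst=True)])
substitute(because_tree, Node("S", [Node("John failed his exam")]))
substitute(because_tree, Node("S", [Node("he was lazy")]))
print(because_tree.leaves())

# Auxiliary tree anchored by "but": foot node, anchor, one substitution site.
discourse = Node("S", [Node("Mary saw John")])
but_tree = Node("S", [Node("S", foot=True), Node("but"), Node("S", subst=True)])
adjoin(discourse, but_tree)
substitute(discourse, Node("S", [Node("she decided to ignore him")]))
print(discourse.leaves())
```

Substitution fills an argument slot of an initial tree, while adjunction extends an existing discourse unit by splicing an auxiliary tree in at a node with the same label, which is what makes auxiliary trees recursive.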
D-LTAG Structures and Semantics
Initial Trees
(a) John failed his exam because he was lazy
Auxiliary trees
(a) Mary saw John but she decided to ignore him.
(b) Mary saw John. She decided to ignore him.
1. On the one hand, John loves Barolo.
2. So he ordered three cases of the ‘97.
3. On the other hand, he had to cancel the order
4. because he then found that he was broke.
Phenomena that DLTAG captures
Arguments of a coherence relation can be stretched “long distance”
Multiple discourse connectives can appear in a single sentence or even a single clause
Coherence relations can vary in how and when they are realized lexically
Stretching arguments
On the one hand, John loves Barolo.
So he ordered three cases of the ’97.
On the other hand, he had to cancel the order
because he then found that he was broke.
Non-Compositional Semantics
Non-defeasible vs defeasible causal connection
(a) The City Council refused the women a permit because they feared violence.
(b) The City Council refused the women a permit. They feared violence.
Presuppositional semantics (Knott et al., 1996):
Defeasible rule: When people go to the zoo, they leave their work behind.
(c) John went to the zoo. However, he took his cell phone with him.
DLTAG system for parsing discourse
Theoretical framework: DLTAG
Main system components:
Sentence level parsing
Tree extractor
Tree mapper
Discourse input representation
Discourse level parsing
Parser (Sarkar, 2000)
XTAG grammar
One derivation per sentence
E.g. Mary was amazed
Tree extractor: identifying discourse units
(a) While she was eating lunch she saw a dog
Tree mapper
From sentence level structure to discourse structure
Discourse input representation
System Architecture
Example Discourse
(a) Mary was amazed.
(b) While she was eating lunch, she saw a dog.
(c) She’d seen a lot of dogs, but this one was amazing.
(d) The dog barked and Mary smiled.
(e) Then, she gave it a sandwich.
Derived and Derivation trees
Corpus example
The pilots could play hardball by noting they were crucial to any sale or restructuring because they can refuse to fly the airplanes. If they were to insist on a low bid of, say, $200 a share, the board mightn’t be able to obtain a higher offer from the bidders because banks might hesitate to finance a transaction the pilots oppose. Also, because UAL chairman Stephen Wolf and other UAL executives have joined the pilots’ bid, the board might be able to exclude him from its deliberations in order to be fair to other bidders.
(Wall Street Journal)
LEXTRACT (Xia et al., 2000)
Corpus: Derivation Tree
Corpus: Derived Tree
Summary points of the DLTAG system
Implementation of D-LTAG
use LTAG grammar to parse each clause
use the same LTAG-based parser both at the sentence level and discourse level
build the semantics compositionally from the sentence to the discourse level
factor away non-compositional semantic contributions
In the output representation
The semantics of the connectives form only part of the compositional derivation of discourse relations
Discourse connectives are NOT viewed as names of relations
The Penn Discourse Treebank
Annotation of discourse connectives and their arguments
Large scale: annotation of the entire Penn Treebank (1 million words)
Merits of the PDTB
Discourse relations are lexically grounded
Exposing a clearly defined level of discourse structure
Enabling annotations with high reliability
Building on existing syntactic and semantic layers of annotation (Treebank, PropBank)
Annotations independent of the DLTAG (or any other) framework
Project description
Annotation of connectives in the Penn Treebank
30K tokens of connectives
20K explicit conns + 10K implicit conns
Annotation of ARG1 and ARG2 of conns
Ex. Mary left early because she was sick.
ARG1: Mary left early
CONN: because
ARG2: she was sick
Four annotators at the beginning, then two
To come: Semantic role labels for ARG1 and ARG2
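The ARG1 / CONN / ARG2 example above can be written down as a small record type. This is a sketch for illustration only: PDTB annotations are produced by human annotators, and the class `ConnectiveAnnotation` and the helper `annotate_infix` are hypothetical names, not the PDTB file format or tooling. The helper only handles the simplest shape, "ARG1 connective ARG2".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectiveAnnotation:
    conn: str
    arg1: str
    arg2: str
    explicit: bool = True  # False for implicit connectives supplied by annotators


def annotate_infix(sentence: str, conn: str) -> ConnectiveAnnotation:
    """Split a sentence of the shape 'ARG1 conn ARG2.' at the first occurrence of conn."""
    left, sep, right = sentence.rstrip(".").partition(f" {conn} ")
    if not sep:
        raise ValueError(f"connective {conn!r} not found between two clauses")
    return ConnectiveAnnotation(conn=conn, arg1=left, arg2=right)


example = annotate_infix("Mary left early because she was sick.", "because")
print(example.arg1)  # Mary left early
print(example.conn)  # because
print(example.arg2)  # she was sick
```

A string split like this breaks down as soon as the connective is sentence-initial or an argument lies in a previous sentence, which is one reason the corpus relies on manual annotation.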
Connectives
Subordinate conjunctions
(when, because, although, etc.)
ARG1 – ARG2
(1) Because [the drought reduced U.S. stockpiles], [they have more than enough storage space for their new crop], and that permits them to wait for prices to rise.
Connectives
Coordinate conjunctions
(and, but, or, etc.)
ARG1 – ARG2
(2) [William Gates and Paul Allen in 1975 developed an early language-housekeeper system for PCs], and [Gates became an industry billionaire six years after IBM adapted one of these versions in 1981].
Connectives
Adverbials
(therefore, then, as a result, etc.)
ARG1 – ARG2
(3) For years, costume jewelry makers fought a losing battle. Jewelry displays in department stores were often cluttered and uninspired. And the merchandise was, well, fake. As a result, marketers of faux gems steadily lost space in department stores to more fashionable rivals -- cosmetics makers.
Connectives
Implicit
(annotators provide a named expression for the implicit connective)
ARG1 – ARG2
(4) …[The $6 billion that some 40 companies are looking to raise in the year ending March 31 compares with only $2.7 billion raised on the capital market in the previous fiscal year]. IMPLICIT-(In contrast) [In fiscal 1984 before Mr. Gandhi came to power, only $810 million was raised].
Annotation guidelines
http://www.cis.upenn.edu/~pdtb
What counts as a connective?
Including distinction between clausal adverbials and discourse adverbials
What counts as an argument?
Minimally a clause
How far does the argument extend?
Including distinction between arguments (ARG1 and ARG2) and supplements to arguments (SUP1 and SUP2 respectively)
Interesting comparison with PropBank annotations of verbs
WordFreak (T. Morton & J. Lacivita)
Comparison with the RST corpus
RST corpus:
Higher-level annotation
Abstract discourse relations
Doesn’t contain the basis of the relations
Low inter-annotator agreement
Small scale (385 WSJ files)
No explicit links to Treebank
PDTB:
Basic-level annotation
Connectives + arguments
Relations anchored to lexical items
High inter-annotator agreement
Large scale (Treebank: 2,500 WSJ files)
Links to Treebank and PropBank
Interesting to see how RST labels relate to semantic role assignment in PDTB
Preliminary experiments
10 explicit connectives (2717 tokens):
therefore, as a result, instead, otherwise, nevertheless, because, although, even though, when, so that
386 tokens of implicit connectives
2 annotators
Inter-annotator agreement (1)
Measure by token (ARG1+ARG2)
ARG1 and ARG2 counted together
Total number of connective ARG1/ARG2 tokens = 2717
Agreement = 82.8%
Subord. Conj. = 86%
Adverbials = 57%
Agreement per connective (1)
Inter-annotator agreement (2)
Measure by ARG (ARG1, ARG2)
Check agreement for ARG1 and ARG2
Total number of argument tokens = 5434 (2717 ARG1 + 2717 ARG2)
Agreement = 90.2%
ARG1 = 86.3%
ARG2 = 94.1%
Subord. Conj. = 92.4%
Adverbials = 71.8%
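The difference between the two measures can be sketched on toy data. The spans below are invented token offsets, not PDTB data, and the function names are hypothetical; the sketch assumes both measures require an exact span match. With equal numbers of ARG1 and ARG2 tokens, the by-argument score is the mean of the ARG1 and ARG2 scores, which matches the figures above: (86.3 + 94.1) / 2 = 90.2.

```python
from typing import List, Tuple

Span = Tuple[int, int]        # (start, end) token offsets, end exclusive
ArgPair = Tuple[Span, Span]   # (ARG1 span, ARG2 span) for one connective token


def agreement_by_token(a: List[ArgPair], b: List[ArgPair]) -> float:
    """A connective token counts as agreed only if ARG1 and ARG2 both match exactly."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def agreement_by_argument(a: List[ArgPair], b: List[ArgPair]) -> float:
    """ARG1 and ARG2 are scored separately, so the denominator doubles."""
    matches = sum((x[0] == y[0]) + (x[1] == y[1]) for x, y in zip(a, b))
    return matches / (2 * len(a))


# Invented annotations for four connective tokens (illustration only).
annotator_a = [((0, 3), (4, 7)), ((0, 5), (6, 9)), ((2, 4), (5, 8)), ((0, 2), (3, 6))]
annotator_b = [((0, 3), (4, 7)), ((0, 4), (6, 9)), ((2, 4), (5, 8)), ((1, 2), (3, 7))]

print(agreement_by_token(annotator_a, annotator_b))     # 0.5
print(agreement_by_argument(annotator_a, annotator_b))  # 0.625

# Consistency check on the reported totals.
assert 2 * 2717 == 5434
mean_of_args = (86.3 + 94.1) / 2
print(round(mean_of_args, 1))                           # 90.2
```

The by-token measure is the stricter one because a single mismatched argument discards the whole token, which is why it yields the lower figure.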
Agreement per connective (2)
Analysis of disagreement
Majority of disagreement due to ‘partial overlap’: 79%
(5) It was forced into liquidation before trial when investors yanked their funds after the government demanded a huge pre-trial asset forfeiture.
Reanalysis of agreement
Inter-annotator agreement when partial overlap counts as agreement: 94.5%
Dealing with extent of the argument
Revise guidelines
BUT: Some disagreement will persist
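A relaxed agreement measure of this kind can be sketched as follows. The slides do not define partial overlap formally, so the sketch assumes that two argument spans partially overlap when they share at least one token; the spans are invented token offsets and the function names are hypothetical.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def exact_match(x: Span, y: Span) -> bool:
    return x == y


def overlaps(x: Span, y: Span) -> bool:
    """True if the two spans share at least one token (assumed definition)."""
    return max(x[0], y[0]) < min(x[1], y[1])


def agreement(a: List[Span], b: List[Span], match: Callable[[Span, Span], bool]) -> float:
    return sum(match(x, y) for x, y in zip(a, b)) / len(a)


# Invented argument spans from two annotators (illustration only).
spans_a = [(0, 7), (8, 12), (0, 5), (6, 9)]
spans_b = [(0, 7), (8, 20), (0, 5), (10, 14)]

print(agreement(spans_a, spans_b, exact_match))  # 0.5
print(agreement(spans_a, spans_b, overlaps))     # 0.75
```

In the toy data the second pair differs only in how far the argument extends, so it counts under the relaxed measure, while the fourth pair selects disjoint text and remains a disagreement under both.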
Comparing predicates
PropBank – sentence level predicates (verbs)
Arity of arguments: Hard
Extent of the argument: Easy
Penn Discourse Treebank – discourse predicates
Arity of arguments: Easy
Extent of the argument: Hard
Summary points for PDTB
http://www.cis.upenn.edu/~pdtb
The Penn Discourse Treebank
Large scale discourse annotation
Basic level of annotation: connectives and their arguments
Links to Penn Treebank and Penn PropBank (rich substrate for extracting syntactic and semantic features)
Expected completion November 2005
Inter-annotator agreement
Most conservative: 82.8%
Relaxing exact match: 94.5%