Presentation Transcript
LIN6932 Topics in Computational Linguistics : LIN6932 Topics in Computational Linguistics Lecture 6: Grammar and Parsing (I)
February 15, 2007
Hana Filip
Outline for Grammar/Parsing : Outline for Grammar/Parsing Context-Free Grammars (CFG) and Constituency
Some common CFG phenomena for English
Sentence-level constructions
NP, PP, VP
Coordination
Subcategorization
Levels of complexity in grammars and automata:
The Chomsky hierarchy
Top-down and Bottom-up Parsing
Earley Parsing
Quick sketch of probabilistic parsing
Review : Review Parts of Speech
Basic syntactic/morphological categories that words belong to
Part of Speech tagging
Assigning parts of speech to all the words in a sentence
Syntax : Syntax Syntax: from Greek syntaxis, “setting out together, arrangement’
Refers to the way words are arranged together, and their relationships
Distinction:
Prescriptive grammar: how people ought to talk
Descriptive grammar: how they do talk
Goal of syntax is to model the knowledge that people unconsciously have about the grammar of their native language
Syntax : Syntax Why should you care?
Grammar checkers
Question answering
Information extraction
Machine translation
4 key ideas of syntax : 4 key ideas of syntax Constituency
Grammatical relations
Subcategorization
Lexical dependencies
Plus one part we won’t have time for:
Movement/long-distance dependency
Context-Free Grammars : Context-Free Grammars Capture constituency and ordering
Ordering:
What are the rules that govern the ordering of words and bigger units in the language?
Constituency:
How words group into units and how the various kinds of units behave
Constituency : Constituency Noun phrases (NPs)
Three parties from Brooklyn
A high-class spot such as Mindy’s
The Broadway coppers
They
Harry the Horse
The reason he comes into the Hot Box
How do we know these form a constituent?
They can all appear before a verb:
Three parties from Brooklyn arrive…
A high-class spot such as Mindy’s attracts…
The Broadway coppers love…
They sit…
Constituency (II) : Constituency (II) They can all appear before a verb:
Three parties from Brooklyn arrive…
A high-class spot such as Mindy’s attracts…
The Broadway coppers love…
They sit
But individual words can’t always appear before verbs:
*from arrive…
*as attracts…
*the is
*spot is…
Must be able to state generalizations like:
Noun phrases occur before verbs
Constituency (III) : Constituency (III) Preposing and postposing:
On September 17th, I’d like to fly from Atlanta to Denver
I’d like to fly on September 17th from Atlanta to Denver
I’d like to fly from Atlanta to Denver on September 17th.
But not:
*On September, I’d like to fly 17th from Atlanta to Denver
*On I’d like to fly September 17th from Atlanta to Denver
CFG Examples : CFG Examples S -> NP VP
NP -> Det NOMINAL
NOMINAL -> Noun
VP -> Verb
Det -> a
Noun -> flight
Verb -> left
CFGs : CFGs S -> NP VP
This says that there are units called S, NP, and VP in this language
That an S consists of an NP followed immediately by a VP
Doesn’t say that that’s the only kind of S
Nor does it say that this is the only place that NPs and VPs occur
Generativity : Generativity As with FSAs you can view these rules as either analysis or synthesis machines
Generate strings in the language
Reject strings not in the language
Impose structures (trees) on strings in the language
Derivations : Derivations A derivation is a sequence of rules applied to a string that accounts for that string
Covers all the elements in the string
Covers only the elements in the string
Derivations as Trees : Derivations as Trees
CFGs more formally : CFGs more formally A context-free grammar has 4 parameters (“is a 4-tuple”)
A set of non-terminal symbols (“variables”) N
A set of terminal symbols (disjoint from N)
A set of productions P, each of the form
A ->
Where A is a non-terminal and is a string of symbols from the infinite set of strings ( N)*
A designated start symbol S
Defining a CF language via derivation : Defining a CF language via derivation A string A derives a string B if
A can be rewritten as B via some series of rule applications
More formally:
If A -> is a production of P
and are any strings in the set ( N)*
Then we say that
A directly derives
Or A
Derivation is a generalization of direct derivation
Let 1, 2, … m be strings in ( N)*, m>= 1, s.t.
1 2, 2 3… m-1 m
We say that 1derives m or 1* m
We then formally define language LG generated by grammar G
As set of strings composed of terminal symbols derived from S
LG = {w | w is in * and S * w}
Parsing : Parsing Parsing is the process of taking a string and a grammar and assigning correct trees to input strings
Correct tree: a tree that covers all and only the elements of the input and has an S node at the top
Context? : Context? The notion of context in CFGs has nothing to do with the ordinary meaning of the word context in language.
All it means is that the non-terminal on the left-hand side of a rule is out there all by itself (free of context)
A -> B C
Means that I can rewrite an A as a B followed by a C regardless of the context in which A is found
Key Constituents (English) : Key Constituents (English) Sentences
Noun phrases
Verb phrases
Prepositional phrases
Sentence-Types : Sentence-Types Declaratives: A plane left
S -> NP VP
Imperatives: Leave!
S -> VP
Yes-No Questions: Did the plane leave?
S -> Aux NP VP
WH Questions: When did the plane leave?
S -> WH Aux NP VP
NPs : NPs NP -> Pronoun
I came, you saw it, they conquered
NP -> Proper-Noun
Los Angeles is west of Texas
NP -> Det Noun
The president
NP -> Nominal
Nominal -> Noun Noun
A morning flight to Denver
PPs : PPs PP -> Preposition NP
From LA
To Boston
On Tuesday
With lunch
Recursion : Recursion We’ll have to deal with rules such as the following where the non-terminal on the left also appears somewhere on the right (directly).
NP -> NP PP [[The flight] [to Boston]]
VP -> VP PP [[departed Miami] [at noon]]
Recursion : Recursion Of course, this is what makes syntax interesting
flights from Denver
Flights from Denver to Miami
Flights from Denver to Miami in February
Flights from Denver to Miami in February on a Friday
Flights from Denver to Miami in February on a Friday under $300
Flights from Denver to Miami in February on a Friday under $300 with lunch
Etc.
Recursion : Recursion Of course, this is what makes syntax interesting
[[flights] [from Denver]]
[[[Flights] [from Denver]] [to Miami]]
[[[[Flights] [from Denver]] [to Miami]] [in February]]
[[[[[Flights] [from Denver]] [to Miami]] [in February]] [on a Friday]]
Etc.
Implications of recursion and context-freeness : Implications of recursion and context-freeness If you have a rule like
VP -> V NP
It only cares that the thing after the verb is an NP. It doesn’t have to know about the internal affairs of that NP
The Point : The Point VP -> V NP
I hate
flights from Denver
Flights from Denver to Miami
Flights from Denver to Miami in February
Flights from Denver to Miami in February on a Friday
Flights from Denver to Miami in February on a Friday under $300
Flights from Denver to Miami in February on a Friday under $300 with lunch
Bracketed Notation : Bracketed Notation [S [NP [PRO I] [VP [V prefer [NP [NP [Det a] [Nom [N morning] [N flight]]]]
Coordination Constructions : Coordination Constructions S -> S and S
John went to NY and Mary followed him
NP -> NP and NP
VP -> VP and VP
…
In fact the right rule for English is
X -> X and X
Other Syntactic stuff : Other Syntactic stuff Grammatical Relations
Subject
I booked a flight to New York
The flight was booked by my agent.
Object
I booked a flight to New York
Complement
I said that I wanted to leave
Problems : Problems Agreement
Subcategorization
Movement (for want of a better term)
Agreement : Agreement This dog
Those dogs
This dog eats
Those dogs eat *This dogs
*Those dog
*This dog eat
*Those dogs eats
Possible CFG Solution : Possible CFG Solution S -> NP VP
NP -> Det Nominal
VP -> V NP
… SgS -> SgNP SgVP
PlS -> PlNp PlVP
SgNP -> SgDet SgNom
PlNP -> PlDet PlNom
PlVP -> PlV NP
SgVP ->SgV Np
…
CFG Solution for Agreement : CFG Solution for Agreement It works and stays within the power of CFGs
But its ugly (loss of generalization)
And it doesn’t scale all that well
Subcategorization : Subcategorization Verbs have preferences for the number and syntactic kinds of constituents they co-occur with
Sneeze: John sneezed intransitive verb
Find: Please find [a flight to NY]NP transitive verb
Give: Give [me]NP[a cheaper fare]NP ditransitive verb
Help: Can you help [me]NP[with a flight]PP DO IO/Oblique
Prefer: I prefer [to leave earlier]TO-VP VP complement
Said: You said [United has a flight]S sentential complement
…
Subcategorization : Subcategorization But not:
*John sneezed the book
*I prefer United has a flight
*Give with a flight
Subcategorization expresses the constraints that a lexical predicate places on the number and syntactic kinds of arguments it wants to take (occur with)
So far we have only considerate one type of lexical predicate: verb
Nouns, adjectives and prepositions also take arguments
So? : So? So the various CFG rules for VPs overgenerate.
They permit the presence of strings containing verbs and arguments that don’t go together
For example
VP -> V NP
predicts that
Sneezed the book is a VP since “sneeze” is a verb and
“the book” is a valid NP
Forward Pointer : Forward Pointer It turns out that verb subcategorization facts will provide a key element for semantic analysis (determining who did what to who in an event).
Possible CFG Solution : Possible CFG Solution VP -> V
VP -> V NP
VP -> V NP PP
… VP -> IntransV
VP -> TransV NP
VP -> TransVwPP NP PP
…
Movement : Movement Core example
My travel agent booked the flight
Movement : Movement Core example
[[My travel agent]NP [booked [the flight]NP]VP]S
I.e. “book” is a straightforward transitive verb. It expects a single NP arg as the subject and a single NP arg within the VP
Movement - Long Distance Dependencies : Movement - Long Distance Dependencies What about?
Which flight do you want me to have the travel agent book __?
The direct object argument to “book” isn’t appearing in the right place. It is a long way from where it originally appeared
And note that it is separated from its verb by 2 other verbs.
CFGs: a summary : CFGs: a summary CFGs appear to be just about what we need to account for a lot of basic syntactic structure in English
But there are problems
That can be dealt with adequately, although not elegantly, by staying within the CFG framework.
There are simpler, more elegant, solutions that take us out of the CFG framework (beyond its formal power)
Syntactic theories: HPSG, LFG, CCG, Minimalism, etc
Catch the
buzz on authorSTREAM
Copyright © 2002-2008 authorSTREAM. All rights reserved.