LSA2005 Lecture4

Category: Education

Presentation Description

No description available.


Presentation Transcript

Finite-State Methods in Natural Language Processing: 

Finite-State Methods in Natural Language Processing Lauri Karttunen LSA 2005 Summer Institute July 27, 2005

Course Outline: 

Course Outline July 18: Intro to computational morphology XFST Readings Lauri Karttunen, “Finite-State Constraints”, The Last Phonological Rule. J. Goldsmith (ed.), pages 173-194, University of Chicago Press, 1993. Karttunen and Beesley, “25 Years of Finite-State Morphology” Chapter 1: “Gentle Introduction” (B&K) July 20: Regular expressions More on XFST Readings Chapter 2: “Systematic Introduction” Chapter 3: “The XFST interface”


July 25 More on XFST: Date Parser Concatenative morphotactics: The LEXC language Readings Chapter 4. “The LEXC Language” July 27 Constraining non-local dependencies: Flag Diacritics Complex morphotactics and alternations: Finnish Numerals Readings Chapter 5. “Flag Diacritics” ”


August 1 Non-concatenative morphotactics Reduplication, interdigitation Realizational morphology Readings Chapter 8. “Non-Concatenative Morphotactics” Gregory T. Stump. Inflectional Morphology. A Theory of Paradigm Structure. Cambridge U. Press. 2001. (An excerpt) Lauri Karttunen, “Computing with Realizational Morphology”, Lecture Notes in Computer Science, Volume 2588, Alexander Gelbukh (ed.), 205-216, Springer Verlag. 2003. August 3 Optimality theory Readings Paul Kiparsky “Finnish Noun Inflection” Generative Approaches to Finnic and Saami Linguistics, Diane Nelson and Satu Manninen (eds.), pp.109-161, CSLI Publications, 2003. Nine Elenbaas and René Kager. "Ternary rhythm and the lapse constraint". Phonology 16. 273-329.

Syllabification revisited: 

Syllabification revisited define MarkNonDiphthongs [ [. .] -> "." || [HighV | MidV] _ LowV, # i.a, e.a LowV _ MidV, # a.e i _ [MidV - e], # i.o, i.ä u _ [MidV - o], # u.e y _ [MidV - ö], # y.e $V i _ e, # poiki.en V u _ o, # $V y _ ö, # $V [MidV | LowV] _ [u|y] C C|.#.]]; # define Syllabify [ C* V+ C* @-> ... "." || _ C V ]; regex FinnWords .o. MarkNonDiphthongs .o. Syllabify;


Constraints ge hund bon ne mal eg et in eg et o a ec j n MF%+ => _ ~$[%+Fem] %+Pl ; MF+ +Fem +Pl

Constraining by composition: 

Constraining by composition xfst[0]: read lexc < adj-noun-tags.lexc Root...2, Nouns...2, NounRoots...4, Nmf...5, .... Building lexicon...Minimizing...Done! 2.7 Kb. 45 states, 70 arcs, Circular. xfst[1]: up gehundino MF+hund+Noun+Fem+Sg xfst[1]: regex "MF+" => _ ~$["+Fem"] "+Pl" ; 1.2 Kb, 2 states, 7 arcs, Circular xfst[2]: compose 3.2 Kb, 61 states, 89 arcs, Circular xfst[1]: up gehundino xfst[1]: *** Not accepted *** Less words, bigger network.

Esperanto with Flags: 

Esperanto with Flags Multichar_Symbols +Noun +Adj +Nsuff +ASuff +Nize +Pl +Sg +Acc MF+ +Aug +Dim +Fem Op+ Neg+ @U.MF.Yes@ @U.MF.No@ LEXICON Root Nouns ; Adjectives ; LEXICON Nouns NounRoots ; @U.MF.Yes@ Ge ; LEXICON Ge MF+:ge NounRoots; LEXICON NounRoots bird Nmf ; hund Nmf ; kat Nmf ; LEXICON Nmf +Noun:0 AugDimFem ; LEXICON AugDimFem @U.MF.No@ Fem ; +Dim:et AugDimFem ; +Aug:eg AugDimFem ; Nend ; Adjend ; LEXICON Fem +Fem:in AugDimFem ;

Constraining by flags: 

Constraining by flags xfst[0]: read lexc < esperanto-flags.lexc xfst[1]: up gehundino xfst[1]: xfst[1]: down MF+hund+Noun+Fem+NSuff+Sg xfst[1]: xfst[1]: set obey-flags off variable obey-flags = off xfst[1]: up gehundino xfst[1]: MF+hund+Noun+Fem+NSuff+Sg xfst[1]: set show-flags on variable show-flags = on xfst[1]: down MF+hund+Noun+Fem+NSuff+Sg @U.MF.Yes@gehund@U.MF.No@ino@U.MF.No@

Flags in the sigma: 

Flags in the sigma xfst[1]: print sigma MF+ Neg+ Op+ a b c d e f g h i j k l m n o r t u v +ASuff +Acc +Adj +Aug +Dim +Fem +Nsuff +Nize +Noun +Pl +Sg @U.MF.No@ @U.MF.Yes@ Size: 35 @U.MF.Yes@: UNIFY feature 'MF' with value 'Yes' @U.MF.No@: UNIFY feature 'MF' with value 'No' 2 flag diacritics

Eliminating flags: 

Eliminating flags xfst[1]: eliminate flag MF 3.2 Kb. 61 states 89 arcs, Circular Size: 35 xfst[1]: print sigma MF+ Neg+ Op+ a b c d e f g h i j k l m n o r t u v +ASuff +Acc +Adj +Aug +Dim +Fem +NSuff +Nize +Noun +Pl +Sg Size: 33 The eliminate flag command composes the network with constraint networks that have the same effect as the flag diacritics that are removed.

Flag Diacritics: 

Flag Diacritics Special symbols for encoding features, that is, attribute-value pairs. Checked at runtime to avoid the cost of compiling them into the structure of the network If a check fails, the path is abandoned.

Attributes and Values: 

Attributes and Values Epsilon arcs with feature constraints. @U.Feature.Value@ @C.Feature@ Unify ‘Feature’ with ‘Value’ if possible. Set ‘Feature’ to the unspecified value.


Rules There can be any number of attributes. An attribute can have any number of values. If the value of an attribute is unspecified, it unifies successfully with any given value and is set to that value. If the value of an attribute is specified, it unifies only with the given value.

Actions: Unify, Positive Set: 

Actions: Unify, Positive Set @U.Feature.Value@ Unify Value with the current setting of Feature, if possible. Otherwise fail. @P.Feature.Value@ Set Feature to Value regardless of the current setting. Always succeeds.

More Actions: Negative Set, Clear: 

More Actions: Negative Set, Clear @N.Feature.Value@ Set Feature to the complement of Value regardless of the current setting. Always succeeds. @C.Feature@ Make Feature be unspecified. Always succeeds.

More Actions: Require: 

More Actions: Require @R.Feature.Value@ Succeed in Feature is set to Value. Otherwise fail. @R.Feature@ Succeed if Feature has been set to some value. Otherwise fail.

More Actions: Equality: 

More Actions: Equality @E.Feature1.Feature2@ Succeed if Feature1 has the same value as Feature2. Otherwise fail.

Eliminating flags: 

Eliminating flags The constraints on "@U.FEATURE.VALUE@" have the form ~[?* PROHIBIT_FLAGS ~$[ALLOW_FLAGS] SELF ?*] Constraint for eliminating @U.MF.No@: ~[?* ["@U.MF.Yes@"] # prohibit ~$["@P.MF.No@" | ”@C.MF@”] # allow "@U.MF.No@" ?*]

Finnish Numerals: 

Finnish Numerals

Numbers and Numerals: 

Numbers and Numerals The mapping from integers 0, 1, 2, 3 … to the corresponding numerals one, two, three… is a regular relation. Some languages have a very simple numeral system, some are more complicated: seventy-three, soixante-treize, drei-und-sibzig We can compile transducers that map between the numbers and the corresponding numerals.

Number-to-Numeral transducer: 

Number-to-Numeral transducer

The Goal Ahead: Finnish: 

The Goal Ahead: Finnish

Finnish Numerals: 

Finnish Numerals Compound numerals written as one word 2 • 1000 + 5 • 100 + 3 • 10 + 1 = 2531 kaksituhattaviisisataakolmekymmentäyksi Express ordinality, number, and case sata+Sg+Nom (100) sata+Ord+Sg+Nom (100th) sata sadas sata+Sg+Gen (100) sata+Ord+Sg+Gen (100th) sadan sadannen sata+Pl+Gen (100) sata+Ord+Pl+Gen (100th) satojen sadansien

Singular vs. Plural: 

Singular vs. Plural Numerals generally occur with singular nouns kaksi+Sg+Gen kenkä+Sg+Gen kahden kengän omistaja (owner of two shoes) Sets and public events may be in plural kaksi+Pl+Gen kenkä+Pl+Gen kaksien kenkien omistaja (owner of two pairs of shoes) kolme+Ord+Pl+Nom olympialainen+Pl+Nom kolmannet olympialaiset (third olympic games) yksi+Pl+Nom hää+Pl+Nom yhdet häät (one wedding)


Morphotactics All parts of compound numerals agree in all respects two thousand five hundred (2500) kaksi+Sg+Gen tuhat+Sg+Gen viisi+Sg+Gen sata+Sg+Gen kahden tuhannen viiden sadan two ten eighth (28th) kaksi+Ord+Pl+Gen kymmenen+Ord+Pl+Gen kahdeksan+Ord+Pl+Gen kahde ns i en kymmene ns i en kahdeksa ns i en

Singular nominative is exceptional: 

Singular nominative is exceptional Numeral with a noun kaksi+Gen kenkä+Gen kahden kengän (two shoes) kaksi+Nom kenkä+Part kaksi kenkää (two shoes) Compound numeral kaksi+Gen tuhat+Gen viisi+Gen sata+Gen kolme+Gen (2503) kahden tuhannen viiden sadan kolmen kaksi+Nom tuhat+Part viisi+Nom sata+Part kolme+Nom (2503) (kaksi • tuhatta) + (viisi • sataa) + kolme

Morphological Alternations: 

Morphological Alternations Semiregular stem alternations yksi+Sg+Nom : yksi (one) yksi+Sg+Ess : yhtenä yksi+Sg+Gen : yhden yksi+Sg+Part : yhtä yksi+Pl+Gen : yksien Irregular stem alternations yksi+Ord+Sg+Nom : ensimmäinen (first) Regular suffix alternations Vowel harmony kolme+Sg+Part : kolmea vs. neljä+Sg+Part : neljää Illative vowel kolme+Sg+Ill : kolmeen vs. neljä+Ill+Part : neljään Partitive t yksi+Sg+Part : yhtä vs. neljä+Sg+Part : neljää

Solution for Finnish: 

Solution for Finnish



Sublexicon for One: 

Sublexicon for One LEXICON Yksi YKSI+Sg:yksi Nom; # singular nominative YKSI+Sg:yhde WeakGrade; # weak stem (most cases) YKSI+Sg:yhte StrongGrade; # strong stem (essive, ill.) YKSI+Sg:yht Par; # partitive stem YKSI:yks PlStem1; # plural stem YKSI+Ord1+Sg:ensimmäinen Nom; # singular nominative YKSI+Ord1+Sg:ensimmäise AnyGrade; # weak/strong stem YKSI+Ord1+Sg:ensimmäis Par; # partitive stem YKSI+Ord+Sg:yhdes Nom; # singular nominative YKSI+Ord+Sg:yhdenne WeakGrade; # weak stem YKSI+Ord+Sg:yhdente StrongGrade; # strong stem YKSI+Ord+Sg:yhdet Par; # partitive stem YKSI+Ord:yhdens PlStem1; # plural stem

Some sublexicons: 

Some sublexicons LEXICON WeakGrade SgGen; ! Singular Genitive PlNom; ! Plural Nominative InvarWeak; ! Invariant (plural and singular) cases LEXICON InvarWeak +Tra:ksi Next; ! Translative “into” +Ine:ssA Next; ! Inessive “in” +Ela:ltA Next; ! Elative “from” (inside) +Ade:llA Next; ! Adessive “on” +Abl:ltA Next; ! Ablative “from” (outside) +All:lle Next; ! Allative “onto” +Abe:ttA Next; ! Abessive “without”

Sample paths for Two: 

Sample paths for Two kaksi+Sg+Nom kaksi+Sg+Gen kaksi+Sg+Ess kaksi kahde n kahte na kaksi+Sg+Par kaksi+Pl+Gen kaksi+Pl+Ill kah TA kaks i en kaks i Vn kaksi+Ord+Sg+Nom kaksi+Ord1+Sg+Nom kahde s toinen kaksi+Ord+Sg+Ill kaksi+Ord1+Sg+Ill kahde nte Vn toise Vn

Morphophonologial rules: 

Morphophonologial rules define BackV [a | o | u]; define FrontV [ä | ö | y]; define Vow [BackV | FrontV | i | e]; define VHarmony [A -> a || BackV ~$[FrontV] _ .o. A -> ä]; define IllativeV [V -> a || a (h) _ , V -> e || e (h) _ , … ] define PartitiveT [T -> 0 || \Vow Vow _ ];

Example again: 

Example again

Remaining problems: 

Remaining problems Special ordinals for yksi (one), kaksi (two) ensimmäinen (1st) vs. kahdeskymmenesyhdes (21st) Compose the lexicon with an appropriate filter to eliminate unwanted variants. No internal tags 2+Sg+Gen00+Sg+Gen Delete them: 0 <- Tag || _ $[\Tag Tag+] .#. ; Singular nominative as partitive in compounds %+Nom -> %+Par // %+Sg %+Nom ~$Tag %+Sg _ ; Ordinal/Plural/Case agreement Flag diacritics!

Flags for Finnish numerals: 

Flags for Finnish numerals @U.Type.Card@ @U.Type.Ord@ @U.Number.Sg@ @U.Number.Pl@ @U.Case.Nom@ @U.Case.Gen@ @U.Case.Par@ @U.Case.Tra@ @U.Case.Ess@ @U.Case.Abe@ @U.Case.Ine@ @U.Case.Ela@ @U.Case.Ill@ @U.Case.Ade@ @U.Case.Abl@ @U.Case.All@ @U.Case.Com@ @U.Case.Ins@ 300+Sg+Gen kolmensadan


Conclusion Mapping from numbers to numerals can be done in a simple and elegant way even for languages with complex morphology. Necessary for text to speech applications. Tervetuloa kahdensienkymmenensienkahdeksansien olympialaisten avajaisiin! Welcome to the opening ceremonies of the 28th Olympic Games!



authorStream Live Help