Presentation Transcript (added January 11, 2008)

Information Query Formulation in a Slavonic Language and its Automatic Processing
Experience from Polish and Czech in comparison to Western European Languages
Petr Strossa
University of Economics, Prague
Department of Information & Knowledge Engineering

General Issue
86 Question/Answer Types and the basic idea of their recognition in texts [D. Laurent et al., SYNAPSE, Toulouse]

Technology
Priberam's lexicon data structure
SintaGest software tool [Priberam Informática, Lisbon]

Question-Answer Pattern (Example)
Question(WEIGHT)
  : Root("jaký")? Dist(0,5) WeightNoun = 20  // Jaká je hmotnost Země? (What is the mass of the Earth?)
  : Wrd(jak) WeightAdj = 20  // Jak těžký může být slon? (How heavy can an elephant be?)
  : Wrd(kolik) WeightUnit = 20  // Kolik kg má dospělý kapr? (How many kg does an adult carp have?)
  : Wrd(kolik) Root("vážit") = 20  // Kolik váží kapr? (How much does a carp weigh?)
Answer
  : WeightNoun Definition With Pivot Dist(0,5) {Number6 WeightUnit} = 20  // Váha kapra může dosáhnout až 5 kg. (The weight of a carp can reach up to 5 kg.)
  : Pivot Dist(0, 5) Cat(V) Dist(0,5) {Number6 WeightUnit} = 20  // Roční kapr může dosáhnout 5 kg tělesné váhy. (A one-year-old carp can reach 5 kg of body weight.)
  ;
Answer(WEIGHT)
  : Number6 WeightUnit = 20
  ;

Definitions of Constants Used in the Previous Example
Const WeightNoun = AnyRoot(hmotnost, hmota, "tíha", "váha", "zatížení");
Const WeightAdj = AnyRoot("těžký", "lehký");
Const WeightUnit1 = AnyRoot(mikrogram, miligram, centigram, decigram, gram, dekagram, hektogram, kilogram, kilo, cent, megagram, miligram, tuna, "karát", pond, kilopond, megapond, libra);
Const WeightUnit2 = AnyWrd(mg, cg, dg, g, dag, deka, Dg, dkg, hg, kg, q, Mg, t, p, kp, Mp, lb, "lb.", lbs, "lbs.", cwt, "cwt.");
Const WeightUnit = AnyConst(WeightUnit1, WeightUnit2);

General Observation
The concepts and tools designed for processing Western European languages can be adapted to Slavonic languages such as Polish and Czech.
Some basic differences between the language families must be kept in mind during such an adaptation!

The Abundance of Morphology
Nouns: 4 (!) genders, 2 numbers, 7 cases
Adjectives, e.g. světlý (bright):
  3 degrees: světlý ↔ světlejší, nejsvětlejší
  4 genders: světlý ↔ světlá, světlé
  2 numbers: světlý ↔ světlí
  7 cases: světlý ↔ světlého, světlému, ...

The Abundance of Morphology (2)
Adjectives continued:
In theory, every adjective may have 3 * 4 * 2 * 7 = 168 forms altogether!
In practice, some of them are regularly (without exception) identical...
A general scheme for a morphology pattern description cannot work with fewer than 57 forms (= 3 degrees * 19 possibly differing gender/number/case endings).

The Abundance of Morphology (3): Illustration – the 19 Ending System

The Abundance of Morphology (4)
Adjectives continued:
In fact, not all adjectives have all the forms.
Some adjectives cannot undergo gradation for purely morphological reasons: domácí (home, home-made)
Other adjectives usually do not undergo gradation for semantic reasons: jednofázový (one-phase)

Morphological Pattern (Ex. 1)

Morphological Pattern (Ex. 2)

Morphology of Nouns: Some Statistics

Morphology of Nouns: Some Statistics (2)
We need about 300 noun patterns altogether.
We have about 90 noun patterns that describe the declension of at least 10 different nouns.
We have about 80 noun patterns that describe only 1 noun each.
About one half of the noun patterns describe the declension of 1–3 nouns each.

Inherent Homonymy of Forms
A typical situation for our type of morphology: světlé (bright)
  nominative/accusative/vocative singular neuter
  genitive/dative/locative singular feminine
  nominative/accusative/vocative plural feminine
  accusative plural masculine animate
  nominative/accusative/vocative plural masculine inanimate
i.e. 13 possible grammatical interpretations altogether!

Inherent Homonymy of Forms (2)
An only slightly less typical situation: Ženu holí stroj.
  I am setting a machine in motion with a stick. OR: I am setting a machine of sticks in motion. (*)
  The woman is shaved by a machine.
  Dress the woman with a stick. OR: Dress the woman of sticks. (*)

Inherent Homonymy of Forms (3)
All the previous once again, in a question:
Jaký je plat Petra Hanka?
What is the salary of XY?
  X ∈ {Petr, Peter, Petar}
  Y ∈ {Hank, Hanek, Hanke, Hanko}
The only thing we know for sure: X ≠ Petra (though such a name exists); Y ≠ Hanka (though such a name exists)!

Inherent Homonymy of Forms (4)
Jaký je plat Petra Hanka?
What is the salary of XY?
The only thing we know for sure: X ≠ Petra (though such a name exists); Y ≠ Hanka (though such a name exists)!
Jaký plat Hanka dává svým zaměstnancům?
What salary does Hanka give to her/his employees?

Inherent Homonymy of Forms (Conclusion)
Because of our free word order, disambiguation from a limited context is generally quite problematic.
A really safe disambiguation is possible only after a complete syntactic analysis of the sentence (which should keep all the possible meanings of all the words up to the end).
(But we do not perform complete syntactic analysis of sentences in M-CAST.)

Free Word Order Again
How far is it to Brno?
  Jak daleko je do Brna? (+++)
  Jak je daleko do Brna? (+++)
  Jak je do Brna daleko? (++)
  Do Brna je jak daleko? (++)
  Do Brna jak je daleko? (+)
  Do Brna je daleko jak? (+)
  Daleko je do Brna jak? (+)
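
The effect of free word order on a fixed-order pattern can be illustrated with a small Python sketch. This is not SintaGest code: match_in_order and match_any_order are hypothetical helpers that only imitate "one word followed by another within a Dist(0,5)-style gap", and matching is done on surface word forms with no morphological analysis.

```python
import re


def tokens(sentence):
    """Lowercased word tokens; in Python 3, \\w also matches Czech letters."""
    return re.findall(r"\w+", sentence.lower())


def match_in_order(toks, first, second, max_gap=5):
    """True if `first` is followed by `second` with 0..max_gap tokens in between."""
    for i, tok in enumerate(toks):
        if tok == first and second in toks[i + 1 : i + 2 + max_gap]:
            return True
    return False


def match_any_order(toks, first, second, max_gap=5):
    """Same test, but accepting the two words in either order (an assumption,
    not a feature attributed to the original tool)."""
    return (match_in_order(toks, first, second, max_gap)
            or match_in_order(toks, second, first, max_gap))


# The seven word-order variants of "How far is it to Brno?" from the slide.
VARIANTS = [
    "Jak daleko je do Brna?",
    "Jak je daleko do Brna?",
    "Jak je do Brna daleko?",
    "Do Brna je jak daleko?",
    "Do Brna jak je daleko?",
    "Do Brna je daleko jak?",
    "Daleko je do Brna jak?",
]

ordered = [v for v in VARIANTS if match_in_order(tokens(v), "jak", "daleko")]
either = [v for v in VARIANTS if match_any_order(tokens(v), "jak", "daleko")]

print(len(ordered), "of", len(VARIANTS), "variants match the fixed-order pattern")
print(len(either), "of", len(VARIANTS), "variants match the order-insensitive pattern")
```

Under these assumptions the fixed-order pattern accepts five of the seven variants, and the two it misses are those where jak follows daleko; the order-insensitive variant accepts all seven, at the cost of being less selective.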
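
The form counts quoted in the morphology and homonymy slides can be re-derived with a few lines of Python. The category labels are spelled out here for illustration only (the slides name the cases only in part; instrumental is the seventh Czech case), and the list of readings for světlé simply transcribes the homonymy slide.

```python
from itertools import product

DEGREES = ("positive", "comparative", "superlative")
GENDERS = ("masculine animate", "masculine inanimate", "feminine", "neuter")
NUMBERS = ("singular", "plural")
CASES = ("nominative", "genitive", "dative", "accusative",
         "vocative", "locative", "instrumental")

# Every theoretically possible degree/gender/number/case combination.
theoretical_forms = list(product(DEGREES, GENDERS, NUMBERS, CASES))

# The slides state that 19 gender/number/case endings may differ per degree.
DISTINCT_ENDINGS_PER_DEGREE = 19
minimal_scheme_size = len(DEGREES) * DISTINCT_ENDINGS_PER_DEGREE

# Readings of the single form "světlé", as listed on the homonymy slide.
SVETLE_READINGS = [
    (("nominative", "accusative", "vocative"), "singular", "neuter"),
    (("genitive", "dative", "locative"), "singular", "feminine"),
    (("nominative", "accusative", "vocative"), "plural", "feminine"),
    (("accusative",), "plural", "masculine animate"),
    (("nominative", "accusative", "vocative"), "plural", "masculine inanimate"),
]

# Flatten to one (case, number, gender) triple per grammatical interpretation.
interpretations = [
    (case, number, gender)
    for cases, number, gender in SVETLE_READINGS
    for case in cases
]

print(len(theoretical_forms), minimal_scheme_size, len(interpretations))
```

The three printed numbers correspond to the 168 theoretical forms, the 57-form minimal scheme, and the 13 interpretations of světlé mentioned in the slides.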