The Global Wordnet Grid: anchoring languages to universal meaning : The Global Wordnet Grid: anchoring languages to universal meaning
Piek Vossen
Irion Technologies/Free University of Amsterdam
Overview : Overview Wordnet, EuroWordNet background
Architecture of the Global Wordnet Grid
Mapping wordnets to the Grid
Advantages of shared knowledge structure
7th Framework project KYOTO
WordNet1.5 : WordNet1.5 Semantic network in which concepts are defined in terms of relations to other concepts.
Structure:
organized around the notion of synsets (sets of synonymous words)
basic semantic relations between these synsets
http://www.cogsci.princeton.edu/~wn/w3wn.html
Developed at Princeton by George Miller and his team as a model of the mental lexicon.
Relational model of meaning : Relational model of meaning [Figure: words defined through their relations to one another; nodes: man, woman, boy, girl; man, woman, boy, meisje; cat, kitten, dog, puppy, animal]
Structure of WordNet : Structure of WordNet
Wordnet Data Model : Wordnet Data Model [Figure: three linked layers: Vocabulary of a language, Concepts, Relations]
Vocabulary of a language: bank, fiddle, violin, violist, fiddler, string (sense numbers 1 and 2 link each word to its concepts)
Concepts:
rec: 12345 - financial institute
rec: 54321 - side of a river
rec: 9876 - small string instrument
rec: 65438 - musician playing violin
rec: 42654 - musician
rec: 25876 - string instrument
rec: 35576 - string of instrument
rec: 29551 - underwear
Relations: type-of, type-of, part-of
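The data model on this slide can be sketched as three small tables: words with numbered senses, concept records, and relations between records. The following Python fragment is only an illustration, not WordNet's actual storage format. The record numbers and glosses are the ones on the slide, but which word sense points at which record, and which records the type-of and part-of links connect, are assumptions made for the example.

```python
# Toy version of the slide's data model. Record numbers and glosses come from
# the slide; the word-to-record links and the relation endpoints are assumed.
concepts = {
    12345: "financial institute",
    54321: "side of a river",
    9876: "small string instrument",
    65438: "musician playing violin",
    42654: "musician",
    25876: "string instrument",
    35576: "string of instrument",
    29551: "underwear",
}

# Vocabulary: each word lists its senses in order; sense n is senses[n - 1].
vocabulary = {
    "bank": [12345, 54321],
    "fiddle": [9876],
    "violin": [9876],
    "violist": [65438],
    "fiddler": [65438],
    "string": [35576, 29551],
}

# Relations hold between concept records, not between words.
relations = [
    (9876, "type-of", 25876),
    (65438, "type-of", 42654),
    (35576, "part-of", 25876),
]


def synset(record):
    """Return the words that share a concept record (its synonym set)."""
    return sorted(w for w, senses in vocabulary.items() if record in senses)


def related(word, relation):
    """Glosses reachable from any sense of `word` via one `relation` link."""
    return [
        concepts[target]
        for source, rel, target in relations
        if rel == relation and source in vocabulary.get(word, [])
    ]


print(synset(9876))
print(related("fiddler", "type-of"))
```

Keeping relations on records rather than on words is what lets synonyms such as fiddle and violin inherit the same links.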
Usage of Wordnet : Usage of Wordnet Improve recall of text-based analysis:
Query -> Index
Synonyms: commence – begin
Hypernyms: taxi -> car
Hyponyms: car -> taxi
Meronyms: trunk -> elephant
Lexical entailments: gun -> shoot
Inferencing:
what things can burn?
Expression in language generation and translation:
alternative words and paraphrases
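As a rough illustration of how these relations raise recall, the sketch below expands each query word with its synonyms, hypernyms and hyponyms before matching it against a set of documents. The tiny dictionaries stand in for a real wordnet and contain only examples from this slide; the function names `expand` and `search` are hypothetical.

```python
# Toy relations taken from the slide's examples; a real system would read
# these from a wordnet database instead.
SYNONYMS = {"commence": {"begin"}, "begin": {"commence"}}
HYPERNYMS = {"taxi": {"car"}}   # more general term
HYPONYMS = {"car": {"taxi"}}    # more specific term


def expand(term, use_hypernyms=True, use_hyponyms=True):
    """Return the term plus its related words, for matching against an index."""
    expanded = {term} | SYNONYMS.get(term, set())
    if use_hypernyms:
        expanded |= HYPERNYMS.get(term, set())
    if use_hyponyms:
        expanded |= HYPONYMS.get(term, set())
    return expanded


def search(query, documents):
    """Return indexes of documents containing any expansion of any query word."""
    wanted = set()
    for word in query.lower().split():
        wanted |= expand(word)
    return [i for i, doc in enumerate(documents)
            if wanted & set(doc.lower().split())]


docs = ["the meeting will begin at noon", "a taxi waited outside"]
print(search("commence", docs))
print(search("car", docs))
```

Neither query word occurs literally in the documents; both are found only through the expansion step.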
Improve recall : Improve recall Information retrieval:
small databases without redundancy, e.g. image captions, video text
Text classification:
small training sets
Question & Answer systems
query analysis: who, whom, where, what, when
Improve recall : Improve recall Anaphora resolution:
The girl fell off the table. She....
The glass fell off the table. It...
Coreference resolution:
When he moved the furniture, the antique table got damaged.
Information extraction (unstructured text to structured databases):
generic forms or patterns "vehicle" - > text with specific cases "car"
Improve recall : Improve recall Summarizers:
Sentence selection based on word counts -> concept counts
Avoid repetition in summary -> language generation
Limited inferencing: detect locations, organisations, etc.
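The move from word counts to concept counts can be sketched as follows: words are first mapped to a concept identifier, and sentences are then scored by how frequent their concepts are in the whole text. This is a simplified illustration under stated assumptions; the word-to-concept table and the concept identifiers (`car.n`, `begin.v`) are invented, and a real summarizer would take them from wordnet synsets after word-sense disambiguation.

```python
from collections import Counter

# Hypothetical word-to-concept mapping: synonyms share one concept identifier.
WORD_TO_CONCEPT = {
    "car": "car.n", "automobile": "car.n",
    "begin": "begin.v", "commence": "begin.v", "start": "begin.v",
}


def concept_counts(sentence):
    """Count concepts rather than surface words; unmapped words are skipped."""
    return Counter(
        WORD_TO_CONCEPT[word]
        for word in sentence.lower().split()
        if word in WORD_TO_CONCEPT
    )


def best_sentence(sentences):
    """Pick the sentence whose concepts are most frequent in the whole text."""
    totals = Counter()
    for sentence in sentences:
        totals += concept_counts(sentence)
    return max(
        sentences,
        key=lambda s: sum(totals[concept] for concept in concept_counts(s)),
    )


text = [
    "the car stopped",
    "an automobile and a car will commence",
    "nothing here",
]
print(concept_counts(text[1]))
print(best_sentence(text))
```

With plain word counts, "car" and "automobile" would be tallied separately; here both add to the same concept.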
Many others : Many others Data sparseness for machine learning: hapaxes can be replaced by semantic classes
Use redundancy for more robustness: spelling correction and speech recognition can build semantic expectations using Wordnet and make better choices
Sentiment and opinion mining
Natural language learning
EuroWordNet : EuroWordNet The development of a multilingual database with wordnets for several European languages
Funded by the European Commission, DG XIII, Luxembourg as projects LE2-4003 and LE4-8328
March 1996 - September 1999
2.5 Million EURO.
http://www.hum.uva.nl/~ewn
http://www.illc.uva.nl/EuroWordNet/finalresults-ewn.html
EuroWordNet : EuroWordNet Languages covered:
EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian.
Size of vocabulary:
EuroWordNet-1: 30,000 concepts - 50,000 word meanings.
EuroWordNet-2: 15,000 concepts - 25,000 word meanings.
Type of vocabulary:
the most frequent words of the languages
all concepts needed to relate more specific concepts
Wordnet family : Wordnet family Princeton WordNet (Fellbaum 1998): 115,000 concepts
EuroWordNet (Vossen 1998): 8 languages
BalkaNet (Tufis 2004): 6 languages
Global Wordnet Association: all languages
EuroWordNet : EuroWordNet Wordnets are unique language-specific structures:
different lexicalizations
differences in synonymy and homonymy
different relations between synsets
same organizational principles: synset structure and same set of semantic relations.
Language-independent knowledge is assigned to the ILI and can thus be shared by all languages linked to the ILI: both an ontology and a domain hierarchy
Autonomous & Language-Specific : Autonomous & Language-Specific [Figure: hierarchy fragments compared for Wordnet1.5 and the Dutch Wordnet; Dutch nodes: voorwerp {object}, lepel {spoon}, werktuig {tool}, tas {bag}, bak {box}, blok {block}, lichaam {body}]
Linguistic versus Artificial Ontologies : Artificial ontology:
better control or performance, or a more compact and coherent structure.
introduce artificial levels for concepts which are not lexicalized in a language (e.g. instrumentality, hand tool),
neglect levels which are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise).
What properties can we infer for spoons?
spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking
Linguistic versus Artificial Ontologies : Linguistic ontology:
Exactly reflects the relations between all the lexicalized words and expressions in a language.
Captures valuable information about the lexical capacity of languages: what is the available fund of words and expressions in a language.
What words can be used to name spoons?
spoon -> object, tableware, silverware, merchandise, cutlery
Wordnets versus ontologies : Wordnets versus ontologies Wordnets:
autonomous language-specific lexicalization patterns in a relational network.
Usage: to predict substitution in text for information retrieval,
text generation, machine translation, word-sense-disambiguation.
Ontologies:
data structure with formally defined concepts.
Usage: making semantic inferences.
The Multilingual Design : Inter-Lingual-Index: unstructured fund of concepts to provide an efficient mapping across the languages;
Index-records are mainly based on WordNet synsets and consist of synonyms, glosses and source references;
Various types of complex equivalence relations are distinguished;
Equivalence relations from synsets to index records: not on a word-to-word basis;
Indirect matching of synsets linked to the same index items;
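Indirect matching through the index can be sketched like this: every wordnet links its own synsets to index records, and two synsets count as equivalent when they share a record. The words are taken from examples elsewhere in these slides, while the record identifiers (`ili:car` and so on), the table `ILI_LINKS` and the function `equivalents` are made up for illustration.

```python
# Each wordnet maps its own synsets to ILI records; synsets of different
# languages are never linked to each other directly. Identifiers are invented.
ILI_LINKS = {
    "nl": {
        ("voertuig",): {"ili:vehicle"},
        ("auto",): {"ili:car"},
        ("trein",): {"ili:train"},
        ("versiersel",): {"ili:decoration"},   # near synonyms sharing
        ("versiering",): {"ili:decoration"},   # one record (many:1)
    },
    "fr": {
        ("véhicule",): {"ili:vehicle"},
        ("voiture",): {"ili:car"},
        ("train",): {"ili:train"},
    },
    "en": {
        ("decoration",): {"ili:decoration"},
    },
}


def equivalents(synset, source, target):
    """Synsets in `target` that share at least one ILI record with `synset`."""
    records = ILI_LINKS[source].get(synset, set())
    return sorted(
        candidate
        for candidate, linked in ILI_LINKS[target].items()
        if linked & records
    )


print(equivalents(("auto",), "nl", "fr"))
print(equivalents(("decoration",), "en", "nl"))
```

Adding a new language only requires links from that language to the index; no pairwise mappings to the other wordnets are needed.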
Equivalent Near Synonym : Equivalent Near Synonym 1. Multiple Targets (1:many)
Dutch wordnet: schoonmaken (to clean) matches with 4 senses of clean in WordNet1.5:
make clean by removing dirt, filth, or unwanted substances from
remove unwanted substances from, such as feathers or pits, as of chickens or fruit
remove in making clean; "Clean the spots off the rug"
remove unwanted substances from - (as in chemistry)
2. Multiple Sources (many:1)
Dutch wordnet: versiersel near_synonym versiering ILI-Record: decoration.
3. Multiple Targets and Sources (many:many)
Dutch wordnet: toestel near_synonym apparaat ILI-records: machine; device; apparatus; tool
Equivalent Hyperonymy : Equivalent Hyperonymy Typically used for gaps in English WordNet:
genuine, cultural gaps for things not known in English culture:
Dutch: klunen, to walk on skates over land from one frozen body of water to the next
pragmatic, in the sense that the concept is known but is not expressed by a single lexicalized form in English:
Dutch: kunstproduct = artifact substance <=> artifact object
From EuroWordNet to Global WordNet : From EuroWordNet to Global WordNet Currently, wordnets exist for more than 40 languages, including:
Arabic, Bantu, Basque, Chinese, Bulgarian, Estonian, Hebrew, Icelandic, Japanese, Kannada, Korean, Latvian, Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, Zulu...
Many languages are genetically and typologically unrelated
http://www.globalwordnet.org
Some downsides : Some downsides Construction is not done uniformly
Coverage differs
Not all wordnets can communicate with one another
Proprietary rights restrict free access and usage
A lot of semantics is duplicated
Complex and obscure equivalence relations due to linguistic differences between English and other languages
Next step: Global WordNet Grid : Next step: Global WordNet Grid [Figure: wordnets of several languages linked through a shared Inter-Lingual Ontology]
Inter-Lingual Ontology nodes: Device, Object, TransportDevice
Czech Words: dopravní prostředník, auto, vlak
French Words: véhicule, voiture, train
Estonian Words: liiklusvahend, auto, killavoor
Dutch Words: voertuig, auto, trein
GWNG: Main Features : GWNG: Main Features Construct separate wordnets for each Grid language
Contributors from each language encode the same core set of concepts plus culture/language-specific ones
Synsets (concepts) can be mapped crosslinguistically via an ontology
No license constraints, freely available
The Ontology: Main Features : The Ontology: Main Features Formal, artificial ontology serves as universal index of concepts
List of concepts is not just based on the lexicon of a particular language (unlike in EuroWordNet) but uses ontological observations
Concepts are related in a type hierarchy
Concepts are defined with axioms
The Ontology: Main Features : The Ontology: Main Features
In addition to high-level (“primitive”) concepts, the ontology needs to express low-level concepts lexicalized in the Grid languages
Additional concepts can be defined with expressions in Knowledge Interchange Format (KIF), based on first-order predicate calculus and atomic elements
The Ontology: Main Features : The Ontology: Main Features Minimal set of concepts (Reductionist view):
to express equivalence across languages
to support inferencing
Ontology must be powerful enough to encode all concepts that are lexically expressed in any of the Grid languages
The Ontology: Main Features : The Ontology: Main Features Ontology need not and cannot provide a linguistic encoding for all concepts found in the Grid languages
Lexicalization in a language is not sufficient to warrant inclusion in the ontology
Lexicalization in all or many languages may be sufficient
Ontological observations will be used to define the concepts in the ontology
Ontological observations : Ontological observations Identity criteria as used in OntoClean (Guarino & Welty 2002):
rigidity: to what extent are properties true for entities in all worlds? You are always a human, but you can be a student for a short while.
essence: what properties are essential for an entity? Shape is essential for a statue but not for the clay it is made of.
unicity: what represents a whole and what entities are parts of these wholes? An ocean is a whole but the water it contains is not.
Type-role distinction : Type-role distinction Current WordNet treatment:
(1) a husky is a kind of dog (type)
(2) a husky is a kind of working dog (role)
What’s wrong?
(2) is defeasible, (1) is not:
*This husky is not a dog
This husky is not a working dog
Other roles: watchdog, sheepdog, herding dog, lapdog, etc….
Ontology and lexicon : Ontology and lexicon Hierarchy of disjunct types:
Canine PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky
Lexicon:
NAMES for TYPES:
{poodle}EN, {poedel}NL, {pudoru}JP
(instance x Poodle)
LABELS for ROLES:
{watchdog}EN, {waakhond}NL, {banken}JP
((instance x Canine) and (role x GuardingProcess))
Ontology and lexicon : Ontology and lexicon Hierarchy of disjunct types:
River; Clay; etc…
Lexicon:
NAMES for TYPES:
{river}EN, {rivier, stroom}NL
(instance x River)
LABELS for dependent concepts:
{rivierwater}NL (water from a river => water is not Unit)
((instance x Water) and (instance y River) and (portion x y))
{kleibrok}NL (irregularly shaped piece of clay => Non-essential)
((instance x Object) and (instance y Clay) and (portion x y) and (shape x Irregular))
Rigidity : Rigidity The “primitive” concepts represented in the ontology are rigid types
Entities with non-rigid properties will be represented with KIF statements
But: ontology may include some universal, core concepts referring to roles like father, mother
Properties of the Ontology : Properties of the Ontology Minimal: terms are distinguished by essential properties only
Comprehensive: includes all distinct concept types of all Grid languages
Allows definitions via KIF of all lexemes that express non-rigid, non-essential properties of types
Logically valid, allows inferencing
Mapping Grid Languages onto the Ontology : Mapping Grid Languages onto the Ontology Explicit and precise equivalence relations among synsets in different languages, which is somewhat easier because:
type hierarchy is minimal
subtle differences can be encoded in KIF expressions
Grid database contains wordnets with synsets that label
either “primitive” types in the hierarchies,
or words relating to these types in ways made explicit in KIF expressions
If two languages create the same KIF expression, this is a statement of equivalence!
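The idea that identical expressions state equivalence can be sketched by storing each synset's definition in an order-independent form and grouping synsets that share one. The fragment below is a simplification and not a KIF reasoner: it compares expressions literally, so it assumes all contributors use the same variable names and predicate spellings. The helper `definition`, the table `GRID` and the `habitat` example entry are illustrative.

```python
from collections import defaultdict


def definition(*triples):
    """A conjunction of (predicate, arg1, arg2) triples; order is irrelevant."""
    return frozenset(triples)


GRID = {
    ("watchdog", "EN"): definition(
        ("instance", "x", "Canine"), ("role", "x", "GuardingProcess")),
    ("waakhond", "NL"): definition(
        ("role", "x", "GuardingProcess"), ("instance", "x", "Canine")),
    ("banken", "JP"): definition(
        ("instance", "x", "Canine"), ("role", "x", "GuardingProcess")),
    ("poodle", "EN"): definition(("instance", "x", "Poodle")),
    ("poedel", "NL"): definition(("instance", "x", "Poodle")),
    ("straathond", "NL"): definition(
        ("instance", "x", "Canine"), ("habitat", "x", "Street")),
}


def equivalence_classes(grid):
    """Group synsets whose definitions are identical; singletons are dropped."""
    groups = defaultdict(list)
    for entry, expression in grid.items():
        groups[expression].append(entry)
    return sorted(
        sorted(entries) for entries in groups.values() if len(entries) > 1
    )


for group in equivalence_classes(GRID):
    print(group)
```

The Dutch entry for waakhond lists its conjuncts in a different order and still lands in the same class, while straathond has no counterpart and stays unmatched.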
How to construct the GWNG : How to construct the GWNG Take an existing ontology as starting point;
Use English WordNet to maximize the number of disjunct types in the ontology;
Link English WordNet synsets as names to the disjunct types;
Provide KIF expressions for all other English words and synsets
How to construct the GWNG : How to construct the GWNG Copy the relations from the English WordNet to the ontology to other languages, including KIF statements built for English
Revise KIF statements to make the mapping more precise
Map all words and synsets that are not and cannot be mapped to English WordNet to the ontology:
propose extensions to the type hierarchy
create KIF expressions for all non-rigid concepts
Initial Ontology: SUMO (Niles and Pease) : Initial Ontology: SUMO (Niles and Pease) SUMO = Suggested Upper Merged Ontology
--consistent with good ontological practice
--fully mapped to WordNet(s): 1000 equivalence mappings, the rest through subsumption
--freely and publicly available
--allows data interoperability
--allows NLP
--allows reasoning/inferencing
Mapping Grid languages onto the Ontology : Mapping Grid languages onto the Ontology Check existing SUMO mappings to Princeton WordNet -> extend the ontology with rigid types for specific concepts
Extend it to many other WordNet synsets
Observe OntoClean principles! (Synsets referring to non-rigid, non-essential, non-unicitous concepts must be expressed in KIF)
Lexicalizations not mapped to WordNet : Lexicalizations not mapped to WordNet Not added to the type hierarchy:
{straathond}NL (a dog that lives in the streets)
((instance x Canine) and (habitat x Street))
Added to the type hierarchy:
{klunen}NL (to walk on skates from one frozen body of water to the next over land)
KluunProcess => WalkProcess
Axioms:
(and (instance x Human) (instance y Walk) (instance z Skates) (wear x z) (instance s1 Skate) (instance s2 Skate) (before s1 y) (before y s2) etc…
National dishes, customs, games,....
Most mismatching concepts are not new types : Most mismatching concepts are not new types They refer to sets of types in specific circumstances or to concepts that depend on these types. Next to {rivierwater}NL there are many others:
{theewater}NL (water used for making tea)
{koffiewater}NL (water used for making coffee)
{bluswater}NL (water used for extinguishing fire)
Relate to linguistic phenomena:
gender, perspective, aspect, diminutives, politeness, pejoratives, part-of-speech constraints
KIF expression for gender marking : {teacher}EN
((instance x Human) and (agent x TeachingProcess))
{Lehrer}DE ((instance x Man) and (agent x TeachingProcess))
{Lehrerin}DE ((instance x Woman) and (agent x TeachingProcess))
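One consequence of these definitions can be sketched in code: because Man and Woman are subtypes of Human, every Lehrer or Lehrerin is also a teacher, but not the other way round. The check below is a minimal illustration. The subclass links, the simplified (predicate, type) encoding with the variable left out, and the function names are assumptions for the example.

```python
# Assumed fragment of the type hierarchy: child type -> parent type.
SUBCLASS_OF = {"Man": "Human", "Woman": "Human"}

# Definitions from the slide, with the variable x left implicit.
LEXICON = {
    "teacher-EN": [("instance", "Human"), ("agent", "TeachingProcess")],
    "Lehrer-DE": [("instance", "Man"), ("agent", "TeachingProcess")],
    "Lehrerin-DE": [("instance", "Woman"), ("agent", "TeachingProcess")],
}


def is_a(type_name, ancestor):
    """True if `type_name` equals `ancestor` or lies below it in the hierarchy."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUBCLASS_OF.get(type_name)
    return False


def entails(specific, general):
    """True if every condition of `general` is met by a condition of `specific`."""
    return all(
        any(pred == wanted_pred and is_a(typ, wanted_type)
            for pred, typ in LEXICON[specific])
        for wanted_pred, wanted_type in LEXICON[general]
    )


print(entails("Lehrer-DE", "teacher-EN"))
print(entails("teacher-EN", "Lehrer-DE"))
```

This is the kind of inference that lets a query for teacher retrieve text about a Lehrerin without a separate translation link.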
KIF expression for perspective : KIF expression for perspective sell: subj(x), direct obj(z),indirect obj(y)
versus
buy: subj(y), direct obj(z),indirect obj(x)
(and (instance x Human) (instance y Human) (instance z Entity) (instance e FinancialTransaction) (source x e) (destination y e) (patient z e))
The same process but a different perspective through subject and object realization: marry is expressed by two verbs in Russian, and apprendre in French can mean both teach and learn
Parallel Noun and Verb hierarchy : Parallel Noun and Verb hierarchy [Figure: two parallel hierarchies]
Nouns: event; act; deed; sale; promise; change; movement; change of location
Verbs: to happen; to act; to do; to sell; to promise; to change; to move; to move position
Encoded once as a Process in the ontology!
Part-of-speech mismatches : Part-of-speech mismatches {bankdrukken-V}NL vs.{bench press-N}EN
{gehuil-N}NL vs. {cry-V}EN
{afsluiting-N}NL vs. {close-V}EN
Process in the ontology is neutral with respect to POS!
Aspectual variants : Aspectual variants Slavic languages: two members of a verb pair for an ongoing event and a completed event.
English: can mark perfectivity with particles, as in the phrasal verbs eat up and read through.
Romance languages: mark aspect by verb conjugations on the same verb.
Dutch: verbs with marked aspect can be created by prefixing a verb with door: doorademen, dooreten, doorfietsen, doorlezen, doorpraten (continue to breathe/eat/bike/read/talk).
These verbs are restrictions on phases of the same process, which does NOT warrant the extension of the ontology with separate processes for each aspectual variant
Aspectual lexicalization : Aspectual lexicalization Regular compositional verb structures:
doorademen: (lit. through+breathe, continue to breathe)
doorbetalen: (lit. through+pay, continue to pay)
doorlopen: (lit. through+walk, continue to walk)
doorfietsen: (lit. through+cycle, continue to cycle)
doorrijden: (lit. through+drive, continue to drive)
(and (instance x BreathProcess) (instance y Time) (instance z Time) (end x z) (expected (end x y)) (after z y))
Lexicalization of Resultatives : Lexicalization of Resultatives MORE GENERAL VERBS:
openmaken: (lit. open+make, to cause to be open);
dichtmaken: (lit. closed+make, to cause to be closed);
MORE SPECIFIC VERBS:
openknijpen: (lit. open+squeeze, to open by squeezing)
has_hyperonym knijpen (to squeeze) & openmaken (to open)
opendraaien: (lit. open+turn, to open by turning)
has_hyperonym draaien (to turn) & openmaken (to open)
dichtknijpen: (lit. closed+squeeze, to close by squeezing)
has_hyperonym knijpen (to squeeze) & dichtmaken (to close)
dichtdraaien: (lit. closed+turn, to close by turning)
has_hyperonym draaien (to turn) & dichtmaken (to close)
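Because these compounds are regular, their two hyperonyms can in principle be derived from their parts. The sketch below shows one way to do that for the verbs on this slide. It is a string-based illustration only: the function name `resultative_hyperonyms` is hypothetical, and real Dutch compounds would need proper morphological analysis rather than prefix matching.

```python
# Result prefixes and the general causative verb each one corresponds to.
RESULT_VERBS = {"open": "openmaken", "dicht": "dichtmaken"}

# Manner verbs known to the lexicon (only those on the slide).
MANNER_VERBS = {"knijpen", "draaien"}


def resultative_hyperonyms(compound):
    """Return [manner verb, result verb] for a regular resultative compound.

    Returns an empty list when the word is not a known prefix followed by
    a known manner verb.
    """
    for prefix, result_verb in RESULT_VERBS.items():
        manner = compound[len(prefix):]
        if compound.startswith(prefix) and manner in MANNER_VERBS:
            return [manner, result_verb]
    return []


print(resultative_hyperonyms("openknijpen"))
print(resultative_hyperonyms("dichtdraaien"))
```

The general verbs openmaken and dichtmaken themselves return an empty list, since maken is not treated as a manner verb here.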
Kinship relations in Arabic : Kinship relations in Arabic عَم (Eam~) father's brother, paternal uncle.
خَال (xaAl) mother's brother, maternal uncle.
عَمَّة (Eam~ap) father's sister, paternal aunt.
خَالَة (xaAlap) mother's sister, maternal aunt
Kinship relations in Arabic : Kinship relations in Arabic
شَقِيقَة ($aqiyqap) full sister, sister on the paternal and maternal side (as distinct from أُخْت (>uxot): 'sister' which may refer to a 'sister' from paternal or maternal side, or both sides).
ثَكْلان (vakolAna) father bereaved of a child (as opposed to يَتِيم (yatiym) or يَتِيمَة (yatiymap) for feminine: 'orphan' a person whose father or mother died or both father and mother died).
ثَكْلَى (vakolaYa) mother bereaved of a child (as opposed to يَتِيم or يَتِيمَة for feminine: 'orphan' a person whose father or mother died or both father and mother died).
Complex Kinship concepts : father's brother, paternal uncle
WORDNET
paternal uncle => uncle
=> brother of ....????
ONTOLOGY
(=>
  (paternalUncle ?P ?UNC)
  (exists (?F)
    (and
      (father ?P ?F)
      (brother ?F ?UNC))))
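To show what the axiom buys over a bare hyponymy link, the sketch below evaluates its right-hand side over a few facts. Several things here are assumptions: the axiom is read as a definition (in both directions), `(father ?P ?F)` is read as "F is the father of P", `(brother ?F ?UNC)` as "UNC is a brother of F", and the names in the fact tables are invented.

```python
# Facts as pairs; all names are invented for the example.
FATHER = {("Layla", "Omar")}                          # Omar is Layla's father
BROTHER = {("Omar", "Khalid"), ("Amina", "Yusuf")}    # Khalid is Omar's brother


def paternal_uncles(person, father=FATHER, brother=BROTHER):
    """All UNC for which some F satisfies (father person F) and (brother F UNC)."""
    return sorted(
        uncle
        for child, dad in father
        if child == person
        for sibling_of, uncle in brother
        if sibling_of == dad
    )


print(paternal_uncles("Layla"))
```

Yusuf is not returned: he is a brother of Amina, who is not recorded as Layla's father, so the maternal side stays separate, as the Arabic lexicalizations require.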
Advantages of the Global Wordnet Grid : Advantages of the Global Wordnet Grid Shared and uniform world knowledge:
universal inferencing
uniform text analysis and interpretation
More compact and less redundant databases
Clearer notion of how languages map to the knowledge
better criteria for expressing knowledge
better criteria for understanding variation
Expansion with pure hyponymy relations : Expansion with pure hyponymy relations [Figure: expansion from a type to roles; nodes: dog, watchdog, poodle, street dog, dachshund, lapdog, short hair dachshund, long hair dachshund, hunting dog, puppy, bitch]
Expansion with pure hyponymy relations : Expansion with pure hyponymy relations [Figure: expansion from a role to types and other roles; nodes: dog, watchdog, poodle, street dog, dachshund, lapdog, short hair dachshund, long hair dachshund, hunting dog, puppy, bitch]
Automotive ontology: (http://www.ontoprise.de) : Automotive ontology: (http://www.ontoprise.de)
Who uses ontologies? : Who uses ontologies?
Human dialogues with Alice-bot : Human dialogues with Alice-bot
Full understanding is fundamentally impossible BUT? : Full understanding is fundamentally impossible BUT? How can people communicate?
How can people communicate with computers?
As long as language is effective:
meaning= to have the desired effect!
Link language to useful content!
Slide62 : [Figure: Ontology; Objects in reality; Knowledge & information]
Useful and effective behavior:
reason over knowledge
collect information and data
deliver services and be helpful
Concrete goals for GWG : Concrete goals for GWG Global Wordnet Association website:
http://www.globalwordnet.org/gwa/gwa_grid.htm
5000 Base Concepts or more:
English
Spanish
Catalan
Czech, Polish, Dutch, other wordnets
7th Framework project KYOTO
KYOTO Project : KYOTO Project 7th Framework project (under negotiation)
Knowledge Yielding Ontologies for Transition-based Organisations
Goal:
Global Wordnet Grid = ontology + wordnets
AutoCons = Automatic concept extractors
Kybots = Knowledge yielding robots
Wiki environment for encoding domain knowledge in expert groups
Index and retrieval software for deep semantic search
Languages: Dutch, English, Spanish, Basque, Italian, Chinese and Japanese
Domain of application: environmental organisations
Period: March/April 2008 - 2011
KYOTO Consortium : KYOTO Consortium Universities
Vrije Universiteit Amsterdam, Amsterdam, Netherlands
Consiglio Nazionale delle Ricerche, Pisa, Italy
Berlin-Brandenburg Academy of Sciences and Humanities, Berlin, Germany
Euskal Herriko Unibertsitatea, San Sebastian, Spain
Academia Sinica, Taipei, Taiwan
National Institute of Information and Communications Technology, Kyoto, Japan
Masaryk University, Brno, Czech Republic
Companies
Irion Technologies, Delft, Netherlands
Synthema, Pisa, Italy
Users
European Centre for Nature Conservation, Tilburg, Netherlands
World Wide Fund for Nature, Zeist, Netherlands