SemanTic Interoperability To access Cultural Heritage
Lourens van der Meij, Antoine Isaac, Marjolein van Gendt
OLP AIO Workshop, January 27th 2006

Outline
- Pilot Project introduction
  - Goals
  - Collection selection
- Mapping aspect of Pilot Project
  - Thesauri formalisation
  - Mapping tools
  - Output of mapping task
  - Lessons learned

Current Cultural Heritage (CH) Situation

Research and development in CH
- Portals for heterogeneous collections access
  - Different databases/vocabularies/metadata (MD) schemes
- Syntactic interoperability
  - Access can be granted
- Semantic interoperability
  - Links with original vocabularies/MD structures are lost

Current Development in CH

Pilot Project Goals
Show in a small use case, using:
- Two Cultural Heritage collections
- Two controlled vocabularies
- Existing mapping tools
- Existing Semantic Web (SW) techniques: SKOS, RDF, RDFS, Sesame (representation, reasoning, storage, mapping)
that semantic links between controlled vocabularies can result in integrated access to heterogeneous Cultural Heritage collections.

STITCH ultimate goal

Pilot Project Modules

Collection selection (1/3)
- Domain: Cultural Heritage
- Collections:
  - Medieval Illuminated Manuscripts from the KB
  - Masterpieces from the Rijksmuseum

Collection selection (2/3)
Controlled vocabularies: Iconclass
Example (Illuminated Manuscripts): Lupus (wolf), fol. 62r, column min. 50x60; Iconclass: 25F23(WOLF), 47I2133
- More than 24,000 concepts
- 10+ levels
- Keys, structural digits, cross-references, bracketed text, etc.

Collection selection (3/3)
Controlled vocabularies: ARIA
Example (Masterpieces):
  Title: The Artist Painting a Cow in a Meadow Landscape
  Year: 1850
  Artist: Hendrikus van de Sande Bakhuyzen
  Technique: Oil on panel
  Dimensions: 73.2 x 96.7 cm
  Object number: SK-A-4163
  Catalogue: Man, Self portraits, Cattle, Dutch landscapes, Fields and meadows
- Fewer than 500 terms, some of them redundant
- 2 levels
- Fuzzy multi-inheritance
- Top and Topia terms

Thesauri Formalisation
- ARIA
  - CHIP issued a SKOS version
  - Only Topia terms were used
- Iconclass
  - SKOS
  - Only the basic hierarchy was used
  - No keys/structural digits/keywords

Mapping tools
- S-Match (Trento)
  - Required input: TAB-indented trees
  - Tree-like structures mapper
  - http://dit.unitn.it/~accord/
- Falcon-AO (Nanjing)
  - Required input: standard RDFS Class/subClassOf
  - Subdivision of Iconclass
  - Standard OWL ontology mapper
  - http://xobjects.seu.edu.cn/project/falcon/falcon.htm
- Method
  - Lexical/element-level matching
  - Oracle (e.g. WordNet)
  - Structure matching

Output of mapping task
- Output format
  - S-Match: Less General, More General, Equivalence; Iconclass vs. ARIA only gives "IC Less General than ARIA" relations
  - Falcon-AO: Equivalence; confidence measure (always 1); the sequence of mappings might indicate usefulness
- Application-specific requirements
  - The UI needs precision
  - Annotators might need recall

Output of mapping task (S-Match) – nice results

Output of mapping task (S-Match) – awful results

Lessons learned
Annotation of results
- Lexical matching
  - Gloss vs. label
  - NLP
  - Inconvenient priorities are given to lexical elements: rdfs:label vs. rdf:about/ID
- Oracle-based matching
  - WordNet sense disambiguation
- Structure-based matching
  - Structure overvaluation (BT vs. NT vs. EQ)
  - Thesaurus simplicity makes it (almost?) useless: no attributes, fuzzy hierarchies
  - Differences in hierarchical structure levels
  - Complex structure-based algorithms are not always intuitive

Lessons learned
Annotation of results (cont'd)
- Output format
  - Wrong kind of relation (RT, siblings)
  - 1-1 mapping
- Precision
  - S-Match: 41% (subset of IC)
  - Falcon-AO: from 1 out of 1000 (subset of IC), to 5% if data tricked, to 52% if an artificial but realistic threshold is introduced
  - Manual cleaning needed for use in the UI
- Expert mapping
  - Size of vocabularies
  - Ambiguous, e.g.: is Nature/World as celestial body/Animals equal to or a subclass of Animals?
- To be continued

Lessons learned: Improvements
- Lexical matching
  - Introduce NLP
  - Let only complete concepts match
  - …
  - Further research (decipher black boxes)
- Oracle-based matching
  - Stricter WordNet interpretation
  - Include other oracles
- Structure-based matching
  - Create thesaurus-based structure mapping (RT, keywords, siblings)
  - …
  - Further research (decipher black boxes)

Lessons learned: Conclusion
- We have ontology mappers, not thesaurus mappers
  - Input: needs pre-processing from thesaurus data
  - Output: needs re-interpretation of mapping relations
- Mapping process
  - Uses resources that may be absent from thesauri, e.g. properties
  - Does not (properly) use all information found in thesauri, e.g. synonyms, RT, textual descriptions
  - Leads to ‘low-quality’ thesaurus mapping

Thanks! Any questions?
- User Interface
- Future work

Collections Access: Single View
- Facets based on 1 point of view and its associated concept scheme(s)
- Access to objects indexed against concepts from other schemes, if there is a mapping between their index and the concepts from the single view
- A single point of view on the integrated data set

Collections Access: Combined View
- Search based on 2 points of view
- One facet uses 1 vocabulary from 1 point of view
- Facets attached to the different points of view are presented
- Simultaneous access to different points of view of the same data

Collections Access: Merged View
- Facets using a merged concept scheme
- Mapping leads to hierarchical links between schemes
- Makes the links between vocabularies more visible during search
- A way to ‘enrich’ weakly structured vocabularies

Future work
A lot to do for the rest of STITCH!
- Method
  - Thinking about a roadmap for using ontology matching techniques for CH vocabularies
  - Taking into account MD schemes (structure)
  - Evaluation of mappings
- Use cases
  - KB
  - Other institutions and projects
- Practical
  - Scalability of tools
  - Deployment for SW data (distributed/centralized)
  - Implementation of thesaurus-specific (adaptations of) tools

Future work
Concerning the Pilot Project (PP):
- Mappings
  - Assessing criteria for proper application-specific evaluation
  - (Keep on) tuning tools to obtain better results for PP collections
- Interface
  - Dynamic view switching/facet activation
  - Better use of all kinds of exploitable relationships (RT-like)
  - Expert evaluation of the whole prototype
- Integrating other collections

What's a thesaurus (Wikipedia)
- A list of every important term (single-word or multi-word) in a given domain of knowledge; and
- A set of related terms for each term in the list.
Possible relations and additions:
- Scope Note
- Related Term (RT)
- Broader Term (BT)
- Narrower Term (NT); BT and NT are reciprocals
- Use (USE) = non-preferred term -> preferred term
- Used For (UF) = preferred term -> non-preferred term
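The "What's a thesaurus" slide says BT and NT are reciprocals and that USE and UF point in opposite directions between non-preferred and preferred terms. A minimal sketch of that bookkeeping follows; the class and method names (`Thesaurus`, `add_bt`, `add_use`, `preferred`) and the example terms are hypothetical, chosen only to illustrate the relations, and do not come from any of the tools in the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Toy thesaurus: preferred terms linked by BT/NT, plus USE/UF entry terms."""

    broader: dict[str, set[str]] = field(default_factory=dict)   # term -> its BTs
    narrower: dict[str, set[str]] = field(default_factory=dict)  # term -> its NTs
    use: dict[str, str] = field(default_factory=dict)            # non-preferred -> preferred
    used_for: dict[str, set[str]] = field(default_factory=dict)  # preferred -> non-preferred

    def add_bt(self, term: str, bt: str) -> None:
        # BT and NT are reciprocals, so both directions are recorded together.
        self.broader.setdefault(term, set()).add(bt)
        self.narrower.setdefault(bt, set()).add(term)

    def add_use(self, non_preferred: str, preferred: str) -> None:
        # USE (non-preferred -> preferred) and UF (preferred -> non-preferred)
        # are kept in sync the same way.
        self.use[non_preferred] = preferred
        self.used_for.setdefault(preferred, set()).add(non_preferred)

    def preferred(self, term: str) -> str:
        # Resolve an entry term to its preferred term; preferred terms map to themselves.
        return self.use.get(term, term)


# Hypothetical terms, for illustration only.
t = Thesaurus()
t.add_bt("Cattle", "Animals")
t.add_use("Cows", "Cattle")
print(t.narrower["Animals"], t.preferred("Cows"))  # {'Cattle'} Cattle
```

Storing both directions at insertion time means a lookup never has to invert the hierarchy, which is convenient when the same data later feeds both a browsing UI and a mapping tool.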
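The "Mapping tools" and "Lessons learned: Conclusion" slides note that S-Match needs TAB-indented trees and Falcon-AO needs standard RDFS Class/subClassOf input, so thesaurus data must be pre-processed. The sketch below shows one way such a conversion could look, assuming the hierarchy is available as a simple parent-to-children dictionary; the function names, the `http://example.org/voc/` base URI and the sample terms are hypothetical, and the exact input conventions each tool accepts should be checked against its own documentation.

```python
from __future__ import annotations

from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def to_tab_tree(narrower: dict[str, list[str]], roots: list[str]) -> str:
    """Render a BT/NT hierarchy as a TAB-indented tree, one term per line.

    A term with several broader terms is repeated under each of them,
    because a plain tree cannot express multi-inheritance.
    """
    lines: list[str] = []

    def walk(term: str, depth: int, path: tuple[str, ...]) -> None:
        lines.append("\t" * depth + term)
        for child in narrower.get(term, []):
            if child not in path:  # skip cycles that fuzzy hierarchies may contain
                walk(child, depth + 1, path + (child,))

    for root in roots:
        walk(root, 0, (root,))
    return "\n".join(lines)


def to_rdfs_ntriples(narrower: dict[str, list[str]], base: str) -> list[str]:
    """Emit each term as an rdfs:Class with an rdfs:label, plus rdfs:subClassOf links."""

    def uri(term: str) -> str:
        return f"<{base}{quote(term)}>"

    terms = set(narrower) | {c for children in narrower.values() for c in children}
    triples: list[str] = []
    for term in sorted(terms):
        literal = term.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f"{uri(term)} <{RDF_TYPE}> <{RDFS_CLASS}> .")
        triples.append(f'{uri(term)} <{RDFS_LABEL}> "{literal}" .')
    for parent in sorted(narrower):
        for child in sorted(narrower[parent]):
            triples.append(f"{uri(child)} <{RDFS_SUBCLASS_OF}> {uri(parent)} .")
    return triples


# Hypothetical mini-hierarchy; "Fields and meadows" has two broader terms.
narrower = {
    "Animals": ["Cattle", "Wolf"],
    "Landscapes": ["Dutch landscapes", "Fields and meadows"],
    "Dutch landscapes": ["Fields and meadows"],
}
print(to_tab_tree(narrower, ["Animals", "Landscapes"]))
print(len(to_rdfs_ntriples(narrower, "http://example.org/voc/")))
```

Both outputs keep the human-readable term as a label rather than relying only on the URI, which bears on the "rdfs:label vs. rdf:about/ID" lesson: a tool that matches on URIs alone sees percent-encoded strings such as `Dutch%20landscapes`.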
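The slides report S-Match output as Less General, More General and Equivalence, say the output "needs re-interpretation of mapping relations", and give precision figures obtained by annotating the results. As a sketch only, the relation names can be rewritten into the mapping properties of the later W3C SKOS Reference, and precision computed as the share of proposed mappings a human judged correct. The mapping table is an assumed reading (not S-Match's own output format), and the identifiers and judgements are hypothetical.

```python
from __future__ import annotations

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumed reading of the relation names as SKOS mapping properties:
# "A Less General than B" means B is broader than A, i.e. A skos:broadMatch B.
RELATION_TO_SKOS = {
    "Less General": SKOS + "broadMatch",
    "More General": SKOS + "narrowMatch",
    "Equivalence": SKOS + "exactMatch",
}


def to_skos(mappings: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Rewrite (source, relation, target) mappings with SKOS mapping properties."""
    return [(src, RELATION_TO_SKOS[rel], tgt) for src, rel, tgt in mappings]


def precision(proposed: list[tuple[str, str, str]],
              judged_correct: set[tuple[str, str, str]]) -> float:
    """Share of proposed mappings that a human annotator accepted."""
    if not proposed:
        return 0.0
    accepted = sum(1 for m in proposed if m in judged_correct)
    return accepted / len(proposed)


# Hypothetical identifiers and judgements, for illustration only.
proposed = [
    ("ic:wolf", "Less General", "aria:animals"),
    ("ic:wolf", "Less General", "aria:landscapes"),
]
judged_correct = {("ic:wolf", "Less General", "aria:animals")}
print(to_skos(proposed)[0])
print(precision(proposed, judged_correct))  # 0.5
```

Mapping "Equivalence" to `skos:exactMatch` is the most literal choice; given the ambiguity the slides point out (equal to, or a subclass of?), `skos:closeMatch` would be a more cautious alternative.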
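The "Collections Access: Single View" slide describes facets drawn from one concept scheme that still reach objects indexed against other schemes, provided a mapping links their index concepts to the single view. A minimal sketch of that lookup follows, assuming mappings are stored as (other-scheme concept, view-scheme concept) pairs read as "equivalent to or less general than", and that a facet also covers narrower concepts in its own scheme. The function names, the ARIA-style identifiers, the manuscript id `kb-ms-1` and the Iconclass-to-ARIA mapping are all hypothetical.

```python
from __future__ import annotations


def descendants(concept: str, narrower: dict[str, list[str]]) -> set[str]:
    """The concept itself plus every concept below it in its own scheme."""
    seen: set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(narrower.get(current, []))
    return seen


def objects_for_facet(concept: str,
                      narrower: dict[str, list[str]],
                      mappings: set[tuple[str, str]],
                      index: dict[str, set[str]]) -> set[str]:
    """Objects shown under a facet concept of the single-view scheme.

    mappings holds (other-scheme concept, view-scheme concept) pairs, read here
    as "equivalent to or less general than"; index maps object ids to the
    concepts they are indexed with, from any scheme.
    """
    in_view = descendants(concept, narrower)
    reachable = in_view | {other for other, target in mappings if target in in_view}
    return {obj for obj, concepts in index.items() if concepts & reachable}


# Hypothetical data: one Rijksmuseum object indexed with ARIA, one KB manuscript with Iconclass.
narrower = {"aria:animals": ["aria:cattle"]}
mappings = {("ic:25F23(WOLF)", "aria:animals")}
index = {"SK-A-4163": {"aria:cattle"}, "kb-ms-1": {"ic:25F23(WOLF)"}}
print(sorted(objects_for_facet("aria:animals", narrower, mappings, index)))  # ['SK-A-4163', 'kb-ms-1']
```

Only mappings whose target falls inside the selected facet are followed, so a poor mapping shows up directly as a wrong object in the UI, which is why the slides stress precision for interface use.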