marjolein

Presentation Transcript

SemanTic Interoperability To access Cultural Heritage: 

Lourens van der Meij, Antoine Isaac, Marjolein van Gendt
OLP AIO Workshop, January 27th 2006

Outline: 

- Pilot Project introduction
  - Goals
  - Collection selection
- Mapping aspect of Pilot Project
  - Thesauri formalisation
  - Mapping tools
  - Output of mapping task
  - Lessons learned

Current Cultural Heritage (CH) Situation: 

Research and development in CH: 

- Portals for access to heterogeneous collections
  - Different databases/vocabularies/MD (metadata) schemes
- Syntactic interoperability
  - Access can be granted
- Semantic interoperability
  - Links with the original vocabularies/MD structures are lost

Current Development in CH: 

Pilot Project Goals: 

Show, in a small use case using
- two Cultural Heritage collections,
- two controlled vocabularies,
- existing mapping tools, and
- existing SW (Semantic Web) techniques: SKOS, RDF, RDFS, Sesame (representation, reasoning, storage, mapping),
that semantic links between controlled vocabularies can result in integrated access to heterogeneous Cultural Heritage collections.

STITCH ultimate goal: 

Pilot Project Modules: 

Collection selection (1/3): 

Domain: Cultural Heritage
Collections:
- Medieval Illuminated Manuscripts from the KB
- Masterpieces from the Rijksmuseum

Collection selection (2/3): 

Controlled vocabulary for the Illuminated Manuscripts: Iconclass
Example: Lupus (wolf), fol. 62r: column min. 50x60; Iconclass: 25F23(WOLF), 47I2133
- More than 24,000 concepts
- 10+ levels
- Keys, structural digits, cross-references, bracketed text, etc.

Collection selection (3/3): 

Controlled vocabulary for the Masterpieces: ARIA
Example record:
  Title: The Artist Painting a Cow in a Meadow Landscape
  Year: 1850
  Artist: Hendrikus van de Sande Bakhuyzen
  Technique: Oil on panel
  Dimensions: 73.2 x 96.7 cm
  Object number: SK-A-4163
  Catalogue: Man, Self portraits, Cattle, Dutch landscapes, Fields and meadows
- Fewer than 500 terms, some of them redundant
- 2 levels
- Fuzzy multi-inheritance
- Top and Topia terms

Thesauri Formalisation: 

ARIA
- CHIP issued a SKOS version
- Only the Topia terms were used
Iconclass
- SKOS
- Only the basic hierarchy was used
- No keys/structural digits/keywords
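To illustrate what the SKOS formalisation amounts to, here is a minimal sketch that serialises a small two-level, ARIA-like hierarchy as SKOS statements in N-Triples syntax. The namespace `http://example.org/aria/`, the concept identifiers and the helper `to_skos_ntriples` are hypothetical; this is not the CHIP SKOS version of ARIA, and a real conversion would normally use an RDF library rather than string formatting.

```python
# Minimal sketch: serialise a hypothetical ARIA-like hierarchy as SKOS in N-Triples.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/aria/"  # hypothetical namespace, not the CHIP one

# Hypothetical concepts: id -> (English label, broader id or None for a top term)
HIERARCHY = {
    "animals": ("Animals", None),
    "cattle": ("Cattle", "animals"),
    "landscapes": ("Landscapes", None),
    "dutch_landscapes": ("Dutch landscapes", "landscapes"),
}


def to_skos_ntriples(hierarchy, base=BASE):
    """Return N-Triples lines: rdf:type, skos:prefLabel and skos:broader per concept.

    Labels are assumed to contain no quotes or backslashes (no escaping is done).
    """
    lines = []
    for cid, (label, broader) in sorted(hierarchy.items()):
        subject = f"<{base}{cid}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f'{subject} <{SKOS}prefLabel> "{label}"@en .')
        if broader is not None:
            lines.append(f"{subject} <{SKOS}broader> <{base}{broader}> .")
    return lines


print("\n".join(to_skos_ntriples(HIERARCHY)))
```

Only the basic hierarchy (skos:broader) and preferred labels are carried over, mirroring the simplification described on the slide.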

Mapping tools: 

S-Match (Trento)
- Required input: TAB-indented trees
- Mapper for tree-like structures
- http://dit.unitn.it/~accord/
Falcon-AO (Nanjing)
- Required input: standard RDFS (Class/subClassOf)
- Subdivision of Iconclass
- Standard OWL ontology mapper
- http://xobjects.seu.edu.cn/project/falcon/falcon.htm
Method
- Lexical/element-level matching
- Oracle-based matching (e.g. WordNet)
- Structure matching
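Neither input format is what a thesaurus provides directly. A minimal sketch of the two pre-processing steps, assuming the hierarchy is available as a map from each concept to a single broader concept: `to_tab_tree` writes a TAB-indented tree (the S-Match input above) and `to_rdfs_ntriples` emits `rdfs:Class`/`rdfs:subClassOf` triples (the Falcon-AO input above). The concepts, function names and the namespace `http://example.org/voc#` are hypothetical, and the exact file layouts the tools accept should be checked against their own documentation.

```python
# Hypothetical hierarchy: concept label -> broader concept label (None for a root).
BROADER = {
    "Animals": None,
    "Mammals": "Animals",
    "Wolf": "Mammals",
    "Cattle": "Mammals",
    "Birds": "Animals",
}


def children_of(broader):
    """Invert the broader relation into a parent -> sorted children map."""
    children = {}
    for concept, parent in broader.items():
        children.setdefault(parent, []).append(concept)
    for kids in children.values():
        kids.sort()
    return children


def to_tab_tree(broader):
    """Render the hierarchy as a TAB-indented tree, one concept per line."""
    children = children_of(broader)
    lines = []

    def walk(node, depth):
        lines.append("\t" * depth + node)
        for child in children.get(node, []):
            walk(child, depth + 1)

    for root in children.get(None, []):
        walk(root, 0)
    return "\n".join(lines)


def to_rdfs_ntriples(broader, base="http://example.org/voc#"):
    """Emit rdfs:Class and rdfs:subClassOf triples, using labels as local names."""
    rdf_type = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
    rdfs = "http://www.w3.org/2000/01/rdf-schema#"
    out = []
    for concept, parent in sorted(broader.items()):
        local = concept.replace(" ", "_")
        out.append(f"<{base}{local}> {rdf_type} <{rdfs}Class> .")
        if parent is not None:
            parent_local = parent.replace(" ", "_")
            out.append(f"<{base}{local}> <{rdfs}subClassOf> <{base}{parent_local}> .")
    return out


print(to_tab_tree(BROADER))
```

The single-parent assumption is a simplification: a concept with several broader terms, as with ARIA's fuzzy multi-inheritance, would have to be duplicated in the tree format.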

Output of mapping task: 

Output format
- S-Match: Less General (LG), More General (MG), Equivalence
  - Iconclass vs. ARIA only gives IC LG ARIA (an Iconclass concept less general than an ARIA concept)
- Falcon-AO: Equivalence
  - Confidence measure (always 1)
  - The sequence of mappings might indicate usefulness
Application-specific requirements
- The UI needs precision
- Annotators might need recall
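The precision/recall trade-off between UI and annotators can be made concrete by scoring a tool's output against a manually validated reference set. A minimal sketch; the mappings and the reference set below are invented for illustration and are not the pilot project's data.

```python
from typing import Set, Tuple

# A mapping is (source concept, relation, target concept); all values here are hypothetical.
Mapping = Tuple[str, str, str]


def precision_recall(found: Set[Mapping], reference: Set[Mapping]) -> Tuple[float, float]:
    """Precision = correct / found; recall = correct / reference (0.0 for an empty set)."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


found = {
    ("IC:wolf", "LG", "ARIA:animals"),
    ("IC:cow", "LG", "ARIA:cattle"),
    ("IC:meadow", "LG", "ARIA:animals"),
    ("IC:horse", "LG", "ARIA:cattle"),
}
reference = {
    ("IC:wolf", "LG", "ARIA:animals"),
    ("IC:cow", "LG", "ARIA:cattle"),
    ("IC:meadow", "LG", "ARIA:fields_and_meadows"),
}

p, r = precision_recall(found, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four proposed mappings are correct (precision 0.50) and two of the three reference mappings are found (recall 0.67).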

Output of mapping task (S-Match) – nice results: 

Output of mapping task (S-Match) – awful results: 

Lessons learned: 

Annotation of results
Lexical matching
- Gloss vs. label
- NLP
- Inconvenient priorities are given to lexical elements: rdfs:label vs. rdf:about/ID
Oracle-based matching
- WordNet
- Sense disambiguation
Structure-based matching
- Structure overvaluation (BT vs. NT vs. EQ)
- Thesaurus simplicity makes it (almost?) useless
  - No attributes, fuzzy hierarchies
  - Differences in hierarchical structure levels
- Complex structure-based algorithms are not always intuitive
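The rdfs:label vs. rdf:about/ID issue matters because an Iconclass identifier is a notation such as 25F23, which carries no lexical meaning. A minimal sketch of a lexical key that prefers the label and only falls back to the URI's local name; the URIs, labels and helper names are hypothetical, and the normalisation (lower-casing, punctuation removal, naive plural stripping) is a deliberately crude stand-in for the NLP mentioned above, not what S-Match or Falcon-AO do.

```python
import re
from typing import Optional


def local_name(uri: str) -> str:
    """Part of a URI after the last '#' or '/' (what identifier-based matching sees)."""
    return re.split(r"[#/]", uri)[-1]


def normalise(text: str) -> str:
    """Crude normalisation: lower-case, punctuation to spaces, naive plural stripping."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def lexical_key(uri: str, label: Optional[str]) -> str:
    """Prefer rdfs:label; fall back to the URI local name only when no label exists."""
    return normalise(label) if label else normalise(local_name(uri))


print(lexical_key("http://example.org/iconclass/25F23", None))
print(lexical_key("http://example.org/iconclass/25F23", "Beasts of prey"))
```

Without a label the key is just the notation, which cannot match any ARIA term; with the label, the key becomes comparable across vocabularies.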

Lessons learned: 

Annotation of results (cont'd)
Output format
- Wrong kind of relation (RT, siblings)
- 1-1 mapping
Precision
- S-Match: 41% (subset of IC)
- Falcon-AO: from 1 out of 1000 (subset of IC)
  - to 5% if the data are tricked
  - to 52% if an artificial but realistic threshold is introduced
- Manual cleaning is needed for use in the UI
Expert mapping
- Size of vocabularies
- Ambiguous, e.g.: is Nature/World as celestial body/Animals equal to, or a subclass of, Animals?
To be continued
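The slides do not say what the "artificial but realistic threshold" for Falcon-AO was. Because its confidence measure was always 1, one plausible reading, based on the earlier remark that the sequence of mappings might indicate usefulness, is a cutoff on each mapping's position in the output. The sketch below assumes exactly that: it keeps the first `k` mappings emitted and compares precision before and after. The data, `k` and the helper names are hypothetical.

```python
from typing import List, Set, Tuple

Mapping = Tuple[str, str]  # (source concept, target concept); equivalence assumed


def precision(found: List[Mapping], reference: Set[Mapping]) -> float:
    """Fraction of found mappings that appear in the reference set."""
    if not found:
        return 0.0
    return sum(m in reference for m in found) / len(found)


def keep_first(mappings: List[Mapping], k: int) -> List[Mapping]:
    """Hypothetical threshold: trust only the first k mappings in output order."""
    return mappings[:k]


# Tool output in the order it was emitted (invented example data).
output = [
    ("IC:animals", "ARIA:animals"),
    ("IC:cattle", "ARIA:cattle"),
    ("IC:self-portrait", "ARIA:self_portraits"),
    ("IC:sky", "ARIA:man"),
    ("IC:tree", "ARIA:cattle"),
    ("IC:river", "ARIA:animals"),
]
reference = set(output[:3])

print(precision(output, reference))                 # all mappings
print(precision(keep_first(output, 3), reference))  # with the cutoff
```

In this toy data the correct mappings happen to come first; whether that holds for real tool output is exactly what such a threshold would need to establish.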

Lessons learned: Improvements: 

Lexical matching
- Introduce NLP
- Let only complete concepts match
- … Further research (decipher black boxes)
Oracle-based matching
- Stricter WordNet interpretation
- Include other oracles
Structure-based matching
- Create thesaurus-based structure mapping (RT, keywords, siblings)
- … Further research (decipher black boxes)
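"Let only complete concepts match" can be read as requiring the whole normalised label to agree, rather than accepting any shared word. A minimal sketch contrasting the two under that reading; the labels and function names are hypothetical, and neither function reproduces the matching strategy of S-Match or Falcon-AO.

```python
def normalise(label: str) -> str:
    """Lower-case, treat commas as spaces, collapse whitespace."""
    return " ".join(label.lower().replace(",", " ").split())


def token_overlap_match(a: str, b: str) -> bool:
    """Loose matching: any shared word counts as a match."""
    return bool(set(normalise(a).split()) & set(normalise(b).split()))


def complete_match(a: str, b: str) -> bool:
    """Stricter matching: the whole normalised labels must be identical."""
    return normalise(a) == normalise(b)


for a, b in [("Fields and meadows", "Animals and plants"), ("Cattle", "  cattle ")]:
    print(a, "|", b, token_overlap_match(a, b), complete_match(a, b))
```

The first pair shows the noise of loose matching: the two labels "match" only because both contain the word "and".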

Lessons learned: Conclusion: 

- We have ontology mappers, not thesaurus mappers
  - Input: needs pre-processing of thesaurus data
  - Output: needs re-interpretation of mapping relations
- Mapping process
  - Uses resources that may be absent from thesauri, e.g. properties
  - Does not (properly) use all information found in thesauri, e.g. synonyms, RT, textual descriptions
- This leads to 'low-quality' thesaurus mapping
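Re-interpreting the tools' output as thesaurus links can be sketched as a small translation table. The sketch assumes a mapping is read as (source, relation, target), so that "less general" means the target is broader, and translates to the SKOS mapping properties `skos:exactMatch`, `skos:broadMatch` and `skos:narrowMatch` as standardised in the SKOS Reference (2009, after this talk). The relation codes `EQ`/`LG`/`MG` and all URIs are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumed reading: (source, relation, target); "LG" means source is less general than target.
RELATION_TO_SKOS = {
    "EQ": "exactMatch",   # equivalence
    "LG": "broadMatch",   # target is broader than source
    "MG": "narrowMatch",  # target is narrower than source
}


def to_skos_mapping(source: str, relation: str, target: str) -> str:
    """Translate one tool mapping into a SKOS mapping triple (N-Triples syntax)."""
    try:
        prop = RELATION_TO_SKOS[relation]
    except KeyError:
        raise ValueError(f"no SKOS reading for relation {relation!r}") from None
    return f"<{source}> <{SKOS}{prop}> <{target}> ."


print(to_skos_mapping("http://example.org/ic/25F23", "LG", "http://example.org/aria/animals"))
```

Relations without a clear thesaurus reading raise an error instead of being silently mapped, which keeps the re-interpretation step explicit.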

Thanks! Any questions?: 

Additional slides:
- User Interface
- Future work

Collections Access: Single View: 

- Facets based on one point of view and its associated concept scheme(s)
- Access to objects indexed against concepts from other schemes, if there is a mapping between their index concepts and the concepts of the single view
- A single point of view on the integrated data set
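The single view reaches objects indexed with another scheme only through mappings. A minimal sketch, assuming mappings are stored as Iconclass concept to the ARIA concepts it is equivalent to or less general than; the manuscript identifier `KB:ms-fol62r`, the index contents and the function `objects_for_facet` are hypothetical (SK-A-4163 and its terms come from the example record above).

```python
# Hypothetical index: object id -> concepts it is indexed with in its own scheme.
INDEX = {
    "SK-A-4163": {"ARIA:cattle", "ARIA:man"},
    "KB:ms-fol62r": {"IC:25F23(WOLF)"},
}

# Hypothetical mappings: Iconclass concept -> ARIA concepts it is equivalent to or less general than.
MAPPINGS = {"IC:25F23(WOLF)": {"ARIA:animals"}}


def objects_for_facet(view_concept: str) -> set:
    """Objects indexed with the view concept directly or through a mapped concept.

    The view's own hierarchy is ignored here, so 'ARIA:cattle' objects are not
    returned for 'ARIA:animals'.
    """
    hits = set()
    for obj, concepts in INDEX.items():
        if any(c == view_concept or view_concept in MAPPINGS.get(c, set()) for c in concepts):
            hits.add(obj)
    return hits


print(objects_for_facet("ARIA:animals"))
```

Selecting the ARIA facet "animals" returns the manuscript even though it was indexed only with Iconclass.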

Collections Access: Combined View: 

- Search based on two points of view
  - Each facet uses one vocabulary, from one point of view
  - Facets attached to the different points of view are presented
- Simultaneous access to different points of view on the same data

Collections Access: Merged View: 

- Facets using a merged concept scheme
  - Mapping leads to hierarchical links between schemes
- Makes the links between vocabularies more visible during search
- A way to 'enrich' weakly structured vocabularies
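In the merged view, mappings become hierarchical links between the schemes. A minimal sketch, assuming each "less general" mapping from an Iconclass concept to an ARIA concept is added as an extra broader link, so that browsing an ARIA facet also reveals Iconclass sub-concepts; all concept identifiers and function names are hypothetical.

```python
# Hypothetical broader links inside each scheme: concept -> set of broader concepts.
ARIA_BROADER = {"ARIA:cattle": {"ARIA:animals"}}
IC_BROADER = {"IC:25F23(WOLF)": {"IC:25F23"}}

# Hypothetical "less general" mappings: Iconclass concept -> ARIA concepts it falls under.
LG_MAPPINGS = {"IC:25F23": {"ARIA:animals"}}


def merge(*broader_maps):
    """Union several concept -> broader-set maps into one merged scheme (inputs untouched)."""
    merged = {}
    for m in broader_maps:
        for concept, parents in m.items():
            merged.setdefault(concept, set()).update(parents)
    return merged


def narrower(scheme, concept):
    """Direct narrower concepts of `concept` in a scheme, sorted."""
    return sorted(c for c, parents in scheme.items() if concept in parents)


MERGED = merge(ARIA_BROADER, IC_BROADER, LG_MAPPINGS)
print(narrower(MERGED, "ARIA:animals"))
```

This is also how a weakly structured vocabulary such as ARIA gets "enriched": its Animals facet gains the Iconclass subtree below the mapped concept.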

Future work: 

A lot to do for the rest of STITCH!
Method
- Thinking about a roadmap for using ontology matching techniques for CH vocabularies
- Taking into account MD schemes (structure)
- Evaluation of mappings
Use cases
- KB
- Other institutions and projects
Practical
- Scalability of tools
- Deployment for SW data (distributed/centralized)
- Implementation of thesaurus-specific tools (or adaptations of existing ones)

Future work: 

Concerning the PP (Pilot Project):
Mappings
- Assessing criteria for proper application-specific evaluation
- (Keep on) tuning tools to obtain better results for the PP collections
Interface
- Dynamic view switching/facet activation
- Better use of all kinds of exploitable relationships (RT-like)
- Expert evaluation of the whole prototype
- Integrating other collections

What’s a thesaurus: 

Definition (Wikipedia):
- A list of every important term (single-word or multi-word) in a given domain of knowledge; and
- A set of related terms for each term in the list.
Possible relations and additions:
- Scope Note
- Related Term (RT)
- Broader Term (BT)
- Narrower Term (NT)
  - BT and NT are reciprocals
- Use (USE): non-preferred term -> preferred term
- Used For (UF): preferred term -> non-preferred term
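Because BT/NT and USE/UF are reciprocal pairs, one direction of each can be derived from the other. A minimal sketch; the terms are hypothetical and the `Thesaurus` class is an illustration, not a standard API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Thesaurus:
    bt: Dict[str, Set[str]] = field(default_factory=dict)  # term -> broader terms (BT)
    use: Dict[str, str] = field(default_factory=dict)      # non-preferred -> preferred (USE)

    def nt(self, term: str) -> Set[str]:
        """Narrower terms (NT), derived as the reciprocal of BT."""
        return {t for t, broader in self.bt.items() if term in broader}

    def used_for(self, term: str) -> Set[str]:
        """UF: the non-preferred terms whose USE points at `term`."""
        return {np for np, pref in self.use.items() if pref == term}

    def preferred(self, term: str) -> str:
        """Resolve a non-preferred entry term to its preferred term."""
        return self.use.get(term, term)


th = Thesaurus(
    bt={"Cattle": {"Animals"}, "Wolves": {"Animals"}},
    use={"Cows": "Cattle", "Kine": "Cattle"},
)
print(th.nt("Animals"), th.used_for("Cattle"), th.preferred("Cows"))
```

Storing only one direction of each pair avoids the two directions drifting out of sync.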