XMDR Project Overview: 

Frank Olken & Kevin D. Keck
{olken,kdkeck}@lbl.gov
Lawrence Berkeley National Laboratory
Presentation to Open Metadata Forum
Kobe, Japan
March 21, 2006

XMDR means: 

Extended Metadata Registry

The Cast: 

- Bruce Bargmeyer (LBNL) = Principal Investigator
- Kevin Keck (LBNL) = architect & stds. (design)
- Frank Olken (LBNL) = content characterization & stds. (design)
- John McCarthy (LBNL) = prototype development (management)
- Karlo Berket (LBNL) = prototype development
- Harold Solbrig (Mayo) = content preprocessing via LexGrid, stds.
- Gayle Hodge (USGS) = content characterization, acquisition
- Denise Warzel (NCI) = content acquisition, standards, design
- Larry Fitzwater (EPA) = program mgt. (vision, direction)
- Nancy Lawler (DOD) = program mgt. (vision, direction)
- Sam Chance (DOD) = program mgt. (vision, direction)

Organizational Cast: 

- Lawrence Berkeley National Laboratory
- Environmental Protection Agency
- National Cancer Institute
- Mayo Clinic
- United States Geological Survey
- Department of Defense

Goals: 

- Assist revisions of ISO/IEC 11179 Metadata Registry Standard to encompass additional semantic descriptions and resources
  - Vocabularies, thesauri, etc.
  - Ontologies
  - Relationships
  - Semantic types
- Design and implement prototype Extended Metadata Registry
- Load metadata content into prototype
- Demonstrate prototype

Why Metadata Registries?: 

- Facilitate reuse/standardization/integration/exchange of data
- Design time:
  - Database / messaging / application / forms designers
  - Data warehouse design
- Run-time:
  - Query formulation / optimization
  - Federated data query optimization / processing
  - Extraction, Transformation, Load (ETL) of data warehouses
  - Semantic services, composition, workflows, ...
- Users
  - Finding, understanding data
  - Understanding data entry forms

Why Standards?: 

- Developing metamodel to serve as design for next generation metadata registries
- Evolve ISO/IEC 11179 Metadata Registry Standard
  - Edition 2 (current)
    - UML modeling, relational DB technology implementation
  - Edition 3 (new)
    - UML + OWL (Web Ontology Language) / MOF (Meta Object Facility) / CL (Common Logic) modeling
    - Add support for ontologies

More on Why MDR Standards?: 

MDR Standards
- Can improve metadata creation practice
- Can improve metadata and data reuse
- Facilitate MDR adoption by organizations
- Facilitate MDR interoperability
- Facilitate MDR software marketing
- Facilitate MDR procurement
- Facilitate alignment / mapping among metadata schemas, ...

Proposed Changes to ISO/IEC 11179: 

- Support for ontologies, etc.
- More formal modeling of relationships
- Semantic types (?)

Changes to ISO/IEC 11179 Std.: 

- Add support for ontologies, vocabularies
  - Add ontologies
  - Add predicates (logical formulae)
  - Add axioms (asserted to be true)
- Add support for modularization of ontologies
  - Add inclusion mechanisms for concept systems and ontologies
  - Assert axioms in context of containing ontology
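One way to picture the inclusion mechanism and "axioms asserted in the context of the containing ontology" is the minimal Python sketch below. It is not the ISO/IEC 11179 Edition 3 metamodel or the XMDR prototype; the `Ontology` class, the `effective_axioms` method, and the sample axioms are hypothetical names invented for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """A named module with its own axioms and a list of included ontologies (hypothetical model)."""
    name: str
    axioms: set[str] = field(default_factory=set)
    includes: list[Ontology] = field(default_factory=list)

    def effective_axioms(self) -> dict[str, str]:
        """Map every visible axiom to the name of the ontology that asserts it."""
        result: dict[str, str] = {}
        seen: set[str] = set()
        stack = [self]
        while stack:
            onto = stack.pop()
            if onto.name in seen:  # guard against circular inclusion
                continue
            seen.add(onto.name)
            for axiom in onto.axioms:
                # Keep the first asserting ontology found; the axiom stays tied to its context.
                result.setdefault(axiom, onto.name)
            stack.extend(onto.includes)
        return result


# Illustrative content only.
units = Ontology("units", {"metre is-a length-unit"})
geo = Ontology("geography", {"country part-of continent"}, includes=[units])
visible = geo.effective_axioms()
```

Here `visible` records that the `geography` module sees both axioms, while each axiom remains attributed to the module that asserted it.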

Why add support for ontologies?: 

- More precise specification of data semantics (than natural language definitions)
- Machine processing of semantic specifications of data
  - Classification, subsumption testing, alignment, spatial, temporal reasoning
- Reusable semantic specifications for subject domains
- Conceptual data models to facilitate data integration
- Encoding of much current work on data semantics and terminologies as ontologies
- Useful for machine learning

Issues in Including Ontologies in ISO/IEC 11179: 

- Lack of agreement on logical formalisms
  - FOL, description logic (which?), ...
- Hence, MDR std. must be agnostic among logic formalisms
- Poses difficulties for:
  - Standards specification
  - MDR implementation
  - MDR interoperability
- See work of OMG Ontology Definition Metamodel (ODM) standard

Changes to ISO/IEC 11179 Std.: 

- Formalize specification of semantic relationships
  - Refinement of Edition 2 Classification Schemes
  - Add relationships (types), roles, links (instances) among concepts
- Specify attributes of relationships
  - Reflexivity, irreflexivity, symmetry, anti-symmetry, transitivity
  - To support inference across semantic relationships
    - e.g., transitive closure over is-a, part-of, ...
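The relationship attributes listed above have standard set-theoretic definitions. The sketch below checks them in Python for a binary relation stored as a set of (source, target) links. The function names and the toy `is_a` and `similar` relations are assumptions made for illustration, not part of the standard or the prototype.

```python
def is_reflexive(domain, links):
    """Every concept in the domain is related to itself."""
    return all((x, x) in links for x in domain)


def is_irreflexive(domain, links):
    """No concept in the domain is related to itself."""
    return all((x, x) not in links for x in domain)


def is_symmetric(links):
    """Whenever a relates to b, b also relates to a."""
    return all((b, a) in links for (a, b) in links)


def is_antisymmetric(links):
    """Two distinct concepts are never related in both directions."""
    return all(a == b or (b, a) not in links for (a, b) in links)


def is_transitive(links):
    """Whenever a relates to b and b relates to d, a also relates to d."""
    return all((a, d) in links for (a, b) in links for (c, d) in links if b == c)


# Toy data, invented for illustration.
concepts = {"animal", "mammal", "dog"}
is_a = {("dog", "mammal"), ("mammal", "animal"), ("dog", "animal")}
similar = {("dog", "mammal"), ("mammal", "dog")}
```

On this data `is_a` comes out irreflexive, anti-symmetric, and transitive, while `similar` is symmetric but not transitive. A registry would more likely store these attributes as declared metadata on the relationship type than recompute them from the links.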

Relationship Modeling in ISO/IEC 11179 Edition 3: 

- Edition 2 has classification schemes and specialized relationships among various metamodel entities
- Proposed for Edition 3
  - Binary and N-ary semantic relationships among concepts (a.k.a. relations)
  - Treat data element concept, conceptual value domain, value meaning, etc. as subtypes of concept
  - More detailed characterization of relationships:
    - Roles / links
    - Reflexivity, symmetry, anti-symmetry, transitivity, ...

Why care about relationship characterization?: 

- Who cares about reflexivity, irreflexivity, symmetry, transitivity?
- Answer: need this information for inference on semantic relationships (usually binary)
- Example: Does it make sense to compute transitive closure?
  - Is-a: transitive
  - Part-of: sometimes transitive
  - Equals: transitive, symmetric
  - Similar: usually symmetric, typically not transitive
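To make the transitive-closure question concrete, here is a minimal Python sketch that derives implied links only when a relationship is declared transitive. The naive fixed-point loop, the `infer` helper, and the sample `is_a` links are assumptions for illustration. A production registry would more plausibly delegate this to an inference engine.

```python
def transitive_closure(links):
    """Return the smallest transitive relation containing `links` (naive fixed-point iteration)."""
    closure = set(links)
    while True:
        # Compose the relation with itself: a -> b and b -> d yield a -> d.
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def infer(links, declared_transitive):
    """Apply closure only when the relationship type is declared transitive."""
    return transitive_closure(links) if declared_transitive else set(links)


# Toy is-a hierarchy, invented for illustration.
is_a = {("beagle", "dog"), ("dog", "mammal"), ("mammal", "animal")}
inferred = infer(is_a, declared_transitive=True) - is_a
```

For the three asserted links, `inferred` holds the three implied ones: beagle is-a mammal, dog is-a animal, and beagle is-a animal. For a relationship such as "similar", passing `declared_transitive=False` leaves the asserted links untouched.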

Semantic Types for ISO/IEC 11179: 

- ISO/IEC 11179 Edition 2 has “datatypes”
  - Associated with “value domain”
  - i.e., datatypes are an aspect of representation, NOT semantics
- Semantic Types
  - Concern meaning rather than representation
  - Uses:
    - Constraints over relationship roles
    - Attribute of concepts, conceptual value domains, ...
  - Ubiquitous in ontologies, schemas, ...

Some Issues for Semantic Types: 

- Alternative approaches:
  - Build semantic types into 11179 metamodel
  - Reuse relationships for semantic type specifications
  - Treat semantic types as unary predicates in ontologies + axioms
- Should we have a standard set of semantic types (at least base types)?
  - Yes, for interoperability
  - No, for flexibility
- Collection types, type constructors?
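As a rough illustration of the third approach (semantic types as unary predicates) used as constraints over relationship roles, consider the Python sketch below. The type names, the `part_of_roles` declaration, and the `check_link` helper are all hypothetical. Nothing here is taken from the 11179 metamodel.

```python
# Semantic types as unary predicates: each type is the set of concepts it holds for.
# All names below are invented for illustration.
semantic_types = {
    "Organ": {"heart", "liver"},
    "Organism": {"mouse"},
}

# A relationship type declares the semantic type required of each role.
part_of_roles = {"part": "Organ", "whole": "Organism"}


def has_type(concept, type_name):
    """Evaluate the unary predicate `type_name` on `concept`."""
    return concept in semantic_types.get(type_name, set())


def check_link(roles, link):
    """Return the names of roles whose filler violates the declared semantic type."""
    return [role for role, type_name in roles.items()
            if not has_type(link[role], type_name)]


ok = check_link(part_of_roles, {"part": "heart", "whole": "mouse"})
bad = check_link(part_of_roles, {"part": "mouse", "whole": "heart"})
```

The first link satisfies both role constraints, so `ok` is empty; the second has its fillers swapped, so `bad` names both roles. The check depends only on meaning (what kind of thing fills the role), not on any datatype or representation.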

Why Construct A Prototype?: 

- To explore alternative revisions to ISO/IEC 11179
- To demonstrate that proposed revisions to ISO/IEC 11179 Metadata Registry Std. are:
  - Feasible
  - Useful
- To experiment with alternative architectures / technologies for constructing extended metadata registries
  - Text retrieval engines: Lucene
  - Inference engines: Jena, Kowari (?), ...
  - Service oriented architecture (SOA)
- To facilitate deployment of revised ISO/IEC Metadata Registries
  - Example implementation
  - Open Source Code!

Why Content?: 

- Content characterization assists in shaping revisions to ISO/IEC 11179
- Content characterization assists in selection of content to load
- Content ingestion, installation, querying provides a means to exercise the prototype
  - Testing
  - Demonstration
  - Performance evaluation
  - Utility evaluation

Metadata Content Activities: 

- Content Characterization
  - e.g., graph theoretic characterization
- Content Acquisition
- Content Preprocessing
  - Into standard formats for loading (H. Solbrig)
- Content Loading
- Content Querying

Desiderata for Content Selection: 

- Accessibility
  - Licensing, source cooperation, unclassified
  - Documentation, familiarity to XMDR collaborators
- Funder interest
- Diversity of metadata types, subject areas
- Diverse graph structures (of semantic relationships)
- OWL encodings available
- Moderate size
- Opportunities for mappings among metadata sets
- Multi-linguality

Content Characterization: 

- Provenance: Name, source, contact, ...
- Type of metadata: thesauri, ontology, ISO/IEC 11179 metadata registry, ...
- Graph Characterization
  - Tree, Faceted Classification, partial order (directed acyclic graph), cyclic graph, ...
- Size: # concepts, # links, # bytes
- Definitions?
- File Formats
- OWL encoding?
- Multilingual
- Availability / licensing issues
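A graph characterization of this kind can be sketched in a few lines of Python. The example below classifies a set of (child, parent) links as a tree, a directed acyclic graph, or a cyclic graph, using Kahn-style elimination of parentless nodes. The `characterize` function and the sample vocabularies are assumptions for illustration and do not reproduce the XMDR project's actual tooling; faceted classifications are not handled.

```python
from collections import defaultdict


def characterize(links):
    """Classify (child, parent) links as 'tree', 'dag', or 'cyclic'."""
    parents = defaultdict(set)
    children = defaultdict(set)
    nodes = set()
    for child, parent in links:
        parents[child].add(parent)
        children[parent].add(child)
        nodes.update((child, parent))

    # Kahn-style elimination: peel off nodes whose parents have all been removed.
    remaining = {n: len(parents[n]) for n in nodes}
    ready = [n for n in nodes if remaining[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if removed < len(nodes):
        return "cyclic"  # some nodes could never be peeled off

    roots = [n for n in nodes if not parents[n]]
    single_parent = all(len(parents[n]) <= 1 for n in nodes)
    # A forest with several roots is reported as 'dag' in this simplified scheme.
    return "tree" if single_parent and len(roots) == 1 else "dag"


# Toy vocabularies, invented for illustration.
taxonomy = {("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")}
polyhierarchy = taxonomy | {("dog", "pet"), ("pet", "animal")}
loop = {("a", "b"), ("b", "a")}
kinds = [characterize(g) for g in (taxonomy, polyhierarchy, loop)]
```

The size measures on the slide fall out of the same pass: the number of concepts is `len(nodes)` and the number of links is `len(links)`.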

Why Graph-theoretic Content Characterization?: 

- Important structural taxonomy
- Impacts:
  - Expressivity required of registry
  - Content representation, index structures
  - Search, matching algorithms
  - Computational complexity of search, matching, ...
  - Inference algorithms
  - Computational complexity of inference
  - Design / implementation / performance of metadata registries

Loaded content metadatasets: 

- National Cancer Institute Thesaurus (NCIT)
- Defense Technical Information Center (DTIC) Thesaurus
- General Multilingual Environmental Thesaurus (GEMET)
- Adult Mouse Anatomical Dictionary
- EPA Terms of the Environment
- ISO 3166 Country Codes
- ISO 4217 Currency Codes

Other Metadatasets of Interest: 

- NCI Cancer Data Standards Repository (caDSR)
- EPA Environmental Data Registry (EDR)
- NLM Unified Medical Language System (UMLS)
- USGS Geographic Names Information System (GNIS)
- Integrated Taxonomic Information System (ITIS)
- NBII Biocomplexity Thesaurus
- ISO 639 Language Identifiers
- Logical Observation Identifiers Names and Codes (LOINC)
- Getty Thesaurus of Geographic Names (TGN)
- NASA Semantic Web for Earth and Environmental Terminology (SWEET)
- Dublin Core Metadata (?)

Conclusions: 

XMDR Activities
- ISO/IEC 11179 Revisions
  - Support for ontologies, etc.
  - Relationships
  - Semantic types
- Prototype Development
- Content (characterization, loading, query)
- Prototype testing, performance evaluation, demos

Coming in Second Part of Talk (Kevin Keck): 

Detailed discussion of the architecture and technology of the prototype ...

Acknowledgements: 

- Financial support from U.S. Dept. of Defense, U.S. Environmental Protection Agency
- In-kind contributions from U.S. National Cancer Institute, Mayo Clinic, U.S. Geological Survey
- Support from program managers: Nancy Lawler (DOD) and Sam Chance (DOD)
- Comments on drafts of this talk by John L. McCarthy

Contact Information: 

Project: http://xmdr.org/
Frank Olken: Lawrence Berkeley National Laboratory
Email: olken@lbl.gov
Tel: 510-486-5891
URL: http://www.lbl.gov/~olken