Presentation Transcript
Fedora Formalizing Content Models :
Sandy Payette
Co-Director Fedora Project
Researcher, Cornell Information Science
What is a Content Model? :
Definitions:
A structural definition for a 'type' of object (e.g., article, book, image, learning object)
A set of constraints on a digital object
A pattern of datastreams (number and type)
A pattern of datastreams + disseminators
A 'subclass' of a DigitalObject
A set of rules for creating a digital object
What is a Content Model? :
As of Fedora 2.1, content models are:
Informally defined (best practice)
Identified via an object property ('cmodel')
Not validated by the repository
Not expressed via a standard formalism
How are Content Models Useful? :
Object Typing:
Group identity for different kinds of objects
Facilitates discovery via query/search
Object Validation
At ingest, check that object conforms to a model
At modification, make sure changes don’t break conformance to model
Object Creation
Templates for user interfaces enabling object creation
Drive workflows/creation of 'batches' of like objects
Hooks for policy enforcement
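The validation bullets above (check at ingest, re-check at modification) can be sketched in a few lines. This is only an illustration, not Fedora's API: the in-memory `repo` dictionary, the datastream labels, and the `LETTER_MODEL` count bounds are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical model: label -> (minimum count, maximum count or None for "many").
LETTER_MODEL = {"DC": (1, 1), "TEI": (1, 1), "PAGE-IMAGE": (1, None)}


def conformance_problems(labels, model):
    """Return a list of problems; an empty list means the object conforms."""
    counts = Counter(labels)
    problems = []
    for label, (lo, hi) in model.items():
        n = counts[label]
        if n < lo or (hi is not None and n > hi):
            problems.append(
                f"{label}: found {n}, expected {lo}..{hi if hi is not None else 'many'}"
            )
    for label in counts:
        if label not in model:
            problems.append(f"{label}: not allowed by the model")
    return problems


def ingest(repo, pid, labels, model):
    """At ingest: refuse objects that do not conform to the model."""
    problems = conformance_problems(labels, model)
    if problems:
        raise ValueError("; ".join(problems))
    repo[pid] = list(labels)


def remove_datastream(repo, pid, label, model):
    """At modification: apply the change only if the result still conforms."""
    remaining = list(repo[pid])
    remaining.remove(label)  # removes the first datastream with this label
    problems = conformance_problems(remaining, model)
    if problems:
        raise ValueError("; ".join(problems))
    repo[pid] = remaining
```

With this sketch, removing one of two page images from a letter succeeds, while removing its only TEI datastream raises `ValueError` and leaves the stored object untouched.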
Types of Content Models :
Single Object Content Model
Defines a pattern of datastreams (and disseminators) within a single digital object.
Multi-Object Content Model
Defines relationships among a set of digital objects that together make up a particular kind of entity
Each related digital object also has a single object content model defining a pattern of datastreams (and disseminators)
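A multi-object model, as described above, combines relationships between objects with a single-object model for each participant. The sketch below is a rough illustration under stated assumptions: the PIDs, the `isPartOf` predicate name, and the model names are hypothetical, and relationships are held as plain tuples rather than read from RELS-EXT.

```python
# Hypothetical objects: PID -> name of the single-object content model it follows.
CMODELS = {
    "demo:letter1": "letter",
    "demo:page1": "page-image",
    "demo:page2": "page-image",
}

# RELS-EXT-style statements as (subject PID, predicate, object PID).
# The predicate name is an assumption chosen for illustration.
RELATIONS = [
    ("demo:page1", "isPartOf", "demo:letter1"),
    ("demo:page2", "isPartOf", "demo:letter1"),
]


def parts_of(parent_pid, relations):
    """Return the PIDs that declare themselves part of parent_pid, sorted."""
    return sorted(s for s, p, o in relations if p == "isPartOf" and o == parent_pid)


def check_entity(parent_pid, relations, cmodels, parent_model, part_model):
    """Check the relationship pattern and each participant's own content model."""
    problems = []
    if cmodels.get(parent_pid) != parent_model:
        problems.append(f"{parent_pid}: expected model {parent_model}")
    parts = parts_of(parent_pid, relations)
    if not parts:
        problems.append(f"{parent_pid}: has no parts")
    for pid in parts:
        if cmodels.get(pid) != part_model:
            problems.append(f"{pid}: expected model {part_model}")
    return problems
```

Here `check_entity("demo:letter1", RELATIONS, CMODELS, "letter", "page-image")` returns an empty list, meaning the letter and its page objects fit the pattern.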
Slide6 : Fedora 2.1 - Digital Object Model, Container View
Persistent ID (PID): Digital object identifier
Reserved Datastreams: Special metadata known by the system
Disseminators: Pointers to service definitions to provide service-mediated views
Datastreams: Custom content or metadata items
[Diagram labels: Persistent ID (PID), Dublin Core, Policy, Datastream, Audit Trail, Relations, Disseminator, Datastream]
Single Object Content Model :
Letter – Single Object Approach
[Diagram labels: Persistent ID (PID), Dublin Core (DC), TEI TEXT for Letter, Audit Trail (AUDIT), Relations (RELS-EXT), Default Disseminator, Page Image, Page Image]
Slide8 : Single Object Content Model 'Representational View' Letter
Slide9 : Publication content model, with behaviors
[Diagram labels: Persistent ID (PID), DC, AUDIT, RELS-EXT, Document (text/xml), Dataset, Default Disseminator, View #1 Disseminator, View #2 Disseminator, Views of Document (ref to WSDL), Views of Related Stuff (ref to WSDL), XSLT Service, Contextualization Service]
Multi-Object Content Model :
Letter – Multi-Object ('Atomistic') Approach
[Diagram labels: Persistent ID (PID), Dublin Core (DC), TEI TEXT for Letter, Audit Trail (AUDIT), Relations (RELS-EXT), Letter Disseminator, Default Disseminator]
Slide11 : Multi-Object Content Model 'Representational View' Letter Image Image
Modeling a whole repository :
Some institutions have created an overall data model for the repository to show a macro view of multiple content models and how they relate to each other
Examples:
NSDL’s NDR data model
eSciDoc logical data model
Dimensions of Content Modeling :
Level 1: Data Type Definition
Level 2: Structure Definition
Level 3: Service Definition (Behavior)
Level 4: Logical (abstract semantics)
Requirement: Data Type Definition :
Definition of bytestream formats
MIME
Format Identifiers
PRONOM
Global Digital Format Registry (GDFR)
Important to enable validation…
JHOVE: provides functions to perform format-specific identification, validation, and characterization of digital objects.
DROID (Digital Record Object Identification): a software tool developed by The National Archives to perform automated batch identification of file formats.
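To illustrate why a data type definition enables validation, here is a minimal sketch of signature-based format identification: compare the leading bytes of a bytestream against known "magic numbers" and check the result against the declared MIME type. It is not how JHOVE or DROID are implemented and it covers only a handful of formats; the function names are made up for the example.

```python
# A few well-known leading byte signatures and the MIME type they indicate.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def identify(data: bytes):
    """Return the MIME type suggested by the leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def matches_declared(data: bytes, declared_mime: str) -> bool:
    """True when the identified format agrees with the datastream's declared MIME type."""
    return identify(data) == declared_mime
```

A datastream declared as `image/png` whose content begins with the JPEG signature would fail `matches_declared`, which is the kind of mismatch a repository could flag at ingest.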
Requirement: Structure Definition :
Definition of a compositional pattern of datastreams in a digital object, often expressed with some notion of semantics
Simple semantic typing (flat model)
Semantic typing, plus datastream relationships
Proposed 'RELS-INT'
Can also be a composition of digital objects
Network of objects
Semantic typing, plus relationships
RELS-EXT
Structure model should be able to support
AND (e.g., article AND dataset)
OR (bitonal OR color image)
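The AND/OR requirement above can be pictured as small composable rules over the semantic types present in an object. This is a sketch only; the type labels and the helper names `has`, `all_of`, and `any_of` are assumptions for illustration and do not come from any Fedora proposal.

```python
def has(label):
    """Rule: the object contains a component with this semantic type."""
    return lambda labels: label in labels


def all_of(*rules):
    """AND: every sub-rule must hold."""
    return lambda labels: all(rule(labels) for rule in rules)


def any_of(*rules):
    """OR: at least one sub-rule must hold."""
    return lambda labels: any(rule(labels) for rule in rules)


# The two examples from the slide, with hypothetical type labels.
publication = all_of(has("article"), has("dataset"))
page = any_of(has("bitonal-image"), has("color-image"))
```

Because each rule is just a function from a collection of labels to a boolean, AND and OR nest freely, for example `all_of(has("article"), any_of(has("bitonal-image"), has("color-image")))`.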
Optional: Service Definition (behaviors) :
Definition of service operations for an object
Abstract behavior
Concrete (binding information to actually run)
Stored as BDef/BMech objects
Requires assertion of BDef/BMech to compatible digital objects
Currently done via a disseminator
New proposal for looser binding (CMDA)
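The split between abstract behavior and concrete binding, and the need to assert them only against compatible objects, can be sketched as a compatibility check. The classes below are a simplified stand-in for BDef/BMech objects, not their real structure; the PIDs, operation names, and the `can_bind` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BehaviorDefinition:
    """Abstract behavior: only the names of the operations."""
    pid: str
    operations: frozenset


@dataclass
class BehaviorMechanism:
    """Concrete binding: which datastreams each operation needs in order to run."""
    pid: str
    implements: str  # PID of the behavior definition this mechanism fulfils
    requires: dict   # operation name -> set of required datastream labels


def can_bind(bdef, bmech, datastream_labels):
    """True if the mechanism implements the definition and the object has its inputs."""
    if bmech.implements != bdef.pid:
        return False
    if set(bmech.requires) != set(bdef.operations):
        return False
    needed = set().union(*bmech.requires.values())
    return needed <= set(datastream_labels)


bdef = BehaviorDefinition("demo:bdef-views", frozenset({"getThumbnail", "getFullSize"}))
bmech = BehaviorMechanism(
    "demo:bmech-views",
    "demo:bdef-views",
    {"getThumbnail": {"IMAGE"}, "getFullSize": {"IMAGE"}},
)
```

In this sketch an object with datastreams `DC` and `IMAGE` is compatible, while an object with only `DC` is not, since the mechanism would have nothing to bind to.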
Future: Logical Semantics :
Semantic view of a digital object
Formal Identifiers (URIs) for components
Promote cross-repository interoperability
Semantic Interoperability Experiments :
Asset Action Definition (DLF Aquifer)
'…precise and standardized information about how to access different views of a resource can facilitate reuse of more advanced tools and the manipulation in aggregated environments of widely dispersed content.'
Demo: http://rama.grainger.uiuc.edu/assetactions/index.asp
DLF Aquifer Asset Actions :
Semantic Interoperability Experiments :
Pathways Project (Cornell/LANL)
Repositories expose objects as Pathways Core (RDF/XML)
Includes semantic URIs for entities in the model
Dynamic service matching based on semantics
Demo: http://memex.cs.cornell.edu:8000/fedora/search
Pathways Semantic URIs :
info:pathways/semantic/abstract
info:pathways/semantic/article
info:pathways/semantic/article-fulltext
info:pathways/semantic/bibliographic-citation
info:pathways/semantic/bibliography
info:pathways/semantic/collection
info:pathways/semantic/data
info:pathways/semantic/data-collection
info:pathways/semantic/dataset
info:pathways/semantic/dataset-primary
info:pathways/semantic/dataset-revision
info:pathways/semantic/descriptive-metadata
info:pathways/semantic/figure
info:pathways/semantic/graph
info:pathways/semantic/image
info:pathways/semantic/journal
info:pathways/semantic/journal-article
info:pathways/semantic/journal-issue
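One way to picture service matching driven by semantic URIs like those listed above: each service declares which semantic URIs it accepts, and an entity is offered every service whose set overlaps its own. This is a guess at the general idea, not the Pathways implementation; the service names and the `SERVICES` registry are invented for the example, and only the URIs come from the list.

```python
# Hypothetical registry: service name -> semantic URIs the service can act on.
SERVICES = {
    "citation-formatter": {
        "info:pathways/semantic/bibliographic-citation",
        "info:pathways/semantic/bibliography",
    },
    "image-viewer": {
        "info:pathways/semantic/image",
        "info:pathways/semantic/figure",
    },
    "dataset-plotter": {
        "info:pathways/semantic/dataset",
    },
}


def matching_services(entity_semantics, services):
    """Return the names of services that accept at least one of the entity's URIs."""
    wanted = set(entity_semantics)
    return sorted(name for name, accepted in services.items() if accepted & wanted)
```

An entity tagged with the `figure` and `dataset` URIs would be matched to the hypothetical `dataset-plotter` and `image-viewer` services, with no knowledge of which repository holds it.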
Pathways Dissemination: Journal article with semantics :
Content Model Formalization: Possible Approaches :
XML-based
Express model constraints via a simple XML format
Fedora CMDA proposal
VTLS content model schema
Other?
Rule-based
Express model rules via Schematron (XPath)
Other?
Ontology-based
Express formal, logical models via OWL/Protégé
Good for reasoning, harder to use for direct validation
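As a rough picture of the XML-based approach, the sketch below loads model constraints from a small XML document into a Python dictionary that a validator could then consume. The element and attribute names are invented for illustration; this is neither the Fedora CMDA format nor the VTLS content model schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraint format, made up for this example.
MODEL_XML = """
<contentModel name="letter">
  <datastream label="DC" mime="text/xml" min="1" max="1"/>
  <datastream label="TEI" mime="text/xml" min="1" max="1"/>
  <datastream label="PAGE-IMAGE" mime="image/jpeg" min="1" max="unbounded"/>
</contentModel>
"""


def load_model(xml_text):
    """Parse the constraint document into (model name, {label: constraints})."""
    root = ET.fromstring(xml_text.strip())
    model = {}
    for ds in root.findall("datastream"):
        max_attr = ds.get("max")
        model[ds.get("label")] = {
            "mime": ds.get("mime"),
            "min": int(ds.get("min")),
            # None stands for "no upper bound".
            "max": None if max_attr == "unbounded" else int(max_attr),
        }
    return root.get("name"), model
```

A declarative file like this is easy to validate against directly, which is the trade-off the slide contrasts with ontology-based models: those are better for reasoning but harder to use for direct validation.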
CMDA Proposal: Content Model Objects :
Which content model style to use? :
Fedora does not prescribe a content model for objects
There appear to be two primary ways of thinking about content models
Multi-object models ('atomistic')
Single-object models ('compound')
Choice of content model is dependent on the structure of your content, your internal workflow, and your anticipated delivery and search methodology
Slide27 : Questions and Discussion