cmodel intro

Added: September 25, 2007

Fedora Formalizing Content Models :
Sandy Payette
Co-Director, Fedora Project
Researcher, Cornell Information Science


What is a Content Model? :
Definitions:
- A structural definition for a 'type' of object (e.g., article, book, image, learning object)
- A set of constraints on a digital object
- A pattern of datastreams (number and type)
- A pattern of datastreams + disseminators
- A 'subclass' of a DigitalObject
- A set of rules for creating a digital object


What is a Content Model? :
As of Fedora 2.1, content models are:
- Informally defined (best practice)
- Identified via an object property ('cmodel')
- Not validated by the repository
- Not expressed via a standard formalism


How are Content Models Useful? :
Object Typing:
- Group identity for different kinds of objects
- Facilitates discovery via query/search
Object Validation:
- At ingest, check that the object conforms to a model
- At modification, make sure changes don’t break conformance to the model
Object Creation:
- Templates for user interfaces enabling object creation
- Drive workflows/creation of 'batches' of like objects
Hooks for policy enforcement
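The "Object Validation" use case can be illustrated with a minimal sketch. It assumes a content model is nothing more than a mapping from required datastream IDs to expected MIME types. The class names, the `LETTER_MODEL` table, and `conformance_errors` are hypothetical and are not part of Fedora 2.1, which (as noted earlier) does not validate content models.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    ds_id: str      # e.g. "DC", "TEI"
    mime_type: str  # e.g. "text/xml"


@dataclass
class DigitalObject:
    pid: str
    cmodel: str  # the informal 'cmodel' object property
    datastreams: list = field(default_factory=list)


# Hypothetical model: required datastream ID -> expected MIME type.
LETTER_MODEL = {"DC": "text/xml", "TEI": "text/xml"}


def conformance_errors(obj, model):
    """Return a list of problems; an empty list means the object conforms."""
    errors = []
    by_id = {ds.ds_id: ds for ds in obj.datastreams}
    for ds_id, expected in model.items():
        if ds_id not in by_id:
            errors.append(f"missing datastream {ds_id}")
        elif by_id[ds_id].mime_type != expected:
            errors.append(
                f"datastream {ds_id} is {by_id[ds_id].mime_type}, expected {expected}"
            )
    return errors


# An object that claims to be a Letter but lacks its TEI datastream.
letter = DigitalObject("demo:1", "Letter", [Datastream("DC", "text/xml")])
print(conformance_errors(letter, LETTER_MODEL))
```

The same check could run at ingest and again after each modification, so that an edit which removes a required datastream is rejected.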


Types of Content Models :
Single Object Content Model
- Defines a pattern of datastreams (and disseminators) within a single digital object
Multi-Object Content Model
- Defines relationships among a set of digital objects that together make up a particular kind of entity
- Each related digital object also has a single object content model defining a pattern of datastreams (and disseminators)
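A multi-object model can be pictured as a set of relationship assertions between objects. The sketch below is illustrative only: the PIDs, the plain-string predicate `isPartOf`, and the helper `parts_of` are assumptions made for this example, not Fedora API calls.

```python
# (subject PID, predicate, object PID) triples, as relationship metadata
# on each object might assert them. All values here are made up.
TRIPLES = [
    ("demo:page1", "isPartOf", "demo:letter1"),
    ("demo:page2", "isPartOf", "demo:letter1"),
    ("demo:page9", "isPartOf", "demo:letter2"),
]


def parts_of(parent_pid, triples):
    """Collect the PIDs of all objects asserted to be part of parent_pid."""
    return [s for (s, p, o) in triples if p == "isPartOf" and o == parent_pid]


print(parts_of("demo:letter1", TRIPLES))
```

In the single-object style the page images would instead be datastreams inside the letter object, and no relationship lookup would be needed.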


Slide6 : Fedora 2.1 - Digital Object Model (Container View)
- Persistent ID (PID): digital object identifier
- Reserved Datastreams: special metadata known by the system
- Disseminators: pointers to service definitions to provide service-mediated views
- Datastreams: custom content or metadata items
Diagram labels (in order): Persistent ID (PID); Dublin Core; Policy Datastream; Audit Trail; Relations; Disseminator; Datastream


Single Object Content Model :
Letter – Single Object Approach
- Persistent ID (PID)
- Dublin Core (DC)
- TEI TEXT for Letter
- Audit Trail (AUDIT)
- Relations (RELS-EXT)
- Default Disseminator
- Page Image
- Page Image


Slide8 : Single Object Content Model, 'Representational View' (diagram label: Letter)


Slide9 : Publication content model, with behaviors
- Persistent ID (PID)
- DC
- AUDIT
- RELS-EXT
- View #1 Disseminator
- Default Disseminator
- View #2 Disseminator
- Document (text/xml)
- Dataset
Diagram annotations: Views of Related Stuff (ref to WSDL); Views of Document (ref to WSDL); XSLT Service; Contextualization Service


Multi-Object Content Model :
Letter – Multi-Object ('Atomistic') Approach
- Persistent ID (PID)
- Dublin Core (DC)
- TEI TEXT for Letter
- Audit Trail (AUDIT)
- Relations (RELS-EXT)
- Letter Disseminator
- Default Disseminator


Slide11 : Multi-Object Content Model, 'Representational View' (diagram labels: Letter; Image; Image)


Modeling a whole repository :
Some institutions have created an overall data model for the repository to show a macro view of multiple content models and how they relate to each other.
Examples:
- NSDL’s NDR data model
- eSciDoc logical data model


Slide13 :


Dimensions of Content Modeling :
- Level 1: Data Type Definition
- Level 2: Structure Definition
- Level 3: Service Definition (Behavior)
- Level 4: Logical (abstract semantics)


Requirement: Data Type Definition :
Definition of bytestream formats
- MIME
- Format Identifiers
  - PRONOM
  - Global Digital Format Registry (GDFR)
Important to enable validation…
- JHOVE: provides functions to perform format-specific identification, validation, and characterization of digital objects.
- DROID (Digital Record Object Identification): a software tool developed by The National Archives to perform automated batch identification of file formats.
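To make the idea of format identification concrete, here is a toy sketch that recognizes a few formats by their leading "magic" bytes and compares the result with a declared MIME type. It is far simpler than JHOVE or DROID and does not use their APIs; the `SIGNATURES` table and both function names are assumptions made for this example.

```python
# Well-known leading byte signatures for a handful of formats.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def identify(data: bytes):
    """Return a MIME type guessed from the first bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def declared_type_ok(data: bytes, declared_mime: str) -> bool:
    """True when the identified format agrees with the declared MIME type."""
    return identify(data) == declared_mime


print(identify(b"%PDF-1.4\n"))
print(declared_type_ok(b"\xff\xd8\xff\xe0", "image/png"))
```

Identification only says what a bytestream appears to be; validating that it is well-formed for that format is the additional step that tools such as JHOVE perform.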


Requirement: Structure Definition :
Definition of a compositional pattern of datastreams in a digital object, often expressed with some notion of semantics
- Simple semantic typing (flat model)
- Semantic typing, plus datastream relationships
  - Proposed 'RELS-INT'
Can also be a composition of digital objects
- Network of objects
- Semantic typing, plus relationships
  - RELS-EXT
Structure model should be able to support
- AND (e.g., article AND dataset)
- OR (e.g., bitonal OR color image)
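The AND/OR requirement can be sketched as a small rule tree evaluated against the semantic types present in an object. The tuple encoding, the type names, and the `satisfies` function are assumptions for illustration; the slides do not specify how a structure model would be encoded.

```python
def satisfies(rule, present):
    """Evaluate a rule against the set of semantic types present in an object.

    A rule is either a string (a required semantic type) or a tuple
    ("AND" | "OR", [child rules]).
    """
    if isinstance(rule, str):
        return rule in present
    op, children = rule
    results = [satisfies(child, present) for child in children]
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {op}")


# article AND dataset
PUBLICATION = ("AND", ["article", "dataset"])
# bitonal OR color image
PAGE_IMAGE = ("OR", ["bitonal-image", "color-image"])

print(satisfies(PUBLICATION, {"article", "dataset"}))
print(satisfies(PUBLICATION, {"article"}))
print(satisfies(PAGE_IMAGE, {"color-image"}))
```

Because rules nest, a model such as "article AND (bitonal OR color image)" needs no extra machinery: `("AND", ["article", PAGE_IMAGE])`.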


Optional: Service Definition (behaviors) :
Definition of service operations for an object
- Abstract behavior
- Concrete (binding information to actually run)
Stored as BDef/BMech objects
Requires assertion of BDef/BMech to compatible digital objects
- Currently done via a disseminator
- New proposal for looser binding (CMDA)


Future: Logical Semantics :
- Semantic view of a digital object
- Formal Identifiers (URIs) for components
- Promote cross-repository interoperability


Semantic Interoperability Experiments :
Asset Action Definition (DLF Aquifer)
'…precise and standardized information about how to access different views of a resource can facilitate reuse of more advanced tools and the manipulation in aggregated environments of widely dispersed content.'
Demo: http://rama.grainger.uiuc.edu/assetactions/index.asp


DLF Aquifer Asset Actions :


Semantic Interoperability Experiments :
Pathways Project (Cornell/LANL)
- Repositories expose objects as Pathways Core (RDF/XML)
- Includes semantic URIs for entities in the model
- Dynamic service matching based on semantics
Demo: http://memex.cs.cornell.edu:8000/fedora/search


Pathways Semantic URIs :
info:pathways/semantic/abstract
info:pathways/semantic/article
info:pathways/semantic/article-fulltext
info:pathways/semantic/bibliographic-citation
info:pathways/semantic/bibliography
info:pathways/semantic/collection
info:pathways/semantic/data
info:pathways/semantic/data-collection
info:pathways/semantic/dataset
info:pathways/semantic/dataset-primary
info:pathways/semantic/dataset-revision
info:pathways/semantic/descriptive-metadata
info:pathways/semantic/figure
info:pathways/semantic/graph
info:pathways/semantic/image
info:pathways/semantic/journal
info:pathways/semantic/journal-article
info:pathways/semantic/journal-issue
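One way to picture "dynamic service matching based on semantics" is a lookup from semantic URI to the services that can act on entities of that kind. In this sketch the URIs come from the list above, but the registry, the service names, and `match_services` are hypothetical and do not describe how the Pathways project implemented matching.

```python
# Hypothetical registry: semantic URI -> names of services that accept it.
SERVICES_BY_SEMANTIC = {
    "info:pathways/semantic/image": ["thumbnail", "zoom"],
    "info:pathways/semantic/article-fulltext": ["full-text-search"],
    "info:pathways/semantic/dataset": ["plot"],
}


def match_services(entity_semantics, registry=SERVICES_BY_SEMANTIC):
    """Return the sorted names of all services applicable to the given URIs."""
    matched = set()
    for uri in entity_semantics:
        matched.update(registry.get(uri, []))
    return sorted(matched)


# Semantic URIs attached to the entities of one journal-article object.
article_entities = [
    "info:pathways/semantic/journal-article",
    "info:pathways/semantic/article-fulltext",
    "info:pathways/semantic/image",
]
print(match_services(article_entities))
```

Because matching keys on shared URIs and not on repository-specific datastream names, the same registry could serve objects exposed by different repositories.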


Pathways Dissemination: Journal article with semantics :


Content Model Formalization (Possible Approaches) :
XML-based
- Express model constraints via a simple XML format
  - Fedora CMDA proposal
  - VTLS content model schema
  - Other?
Rule-based
- Express model rules via Schematron (XPath)
- Other?
Ontology-based
- Express formal, logical models via OWL/Protégé
- Good for reasoning, harder to use for direct validation
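As a rough illustration of the XML-based approach, the sketch below reads model constraints from a small XML document and reports which required datastreams an object lacks. The `contentModel` format is invented for this example; it is neither the Fedora CMDA format nor the VTLS schema, and the function names are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Invented constraint format: one element per datastream, min = minimum count.
MODEL_XML = """
<contentModel name="Letter">
  <datastream id="DC" mimeType="text/xml" min="1"/>
  <datastream id="TEI" mimeType="text/xml" min="1"/>
  <datastream id="PAGE" mimeType="image/jpeg" min="0"/>
</contentModel>
"""


def required_ids(model_xml):
    """Return the IDs of datastreams the model requires at least once."""
    root = ET.fromstring(model_xml.strip())
    return [
        ds.get("id")
        for ds in root.findall("datastream")
        if int(ds.get("min", "1")) >= 1
    ]


def missing_ids(model_xml, present_ids):
    """Return required datastream IDs absent from an object's datastream IDs."""
    return [ds_id for ds_id in required_ids(model_xml) if ds_id not in present_ids]


print(required_ids(MODEL_XML))
print(missing_ids(MODEL_XML, {"DC", "PAGE"}))
```

A rule-based approach would express the same constraint as an assertion over the object's XML serialization, and an ontology-based approach as class restrictions, which suits reasoning better than direct checks like this one.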


CMDA Proposal: Content Model Objects :


Which content model style to use? :
Fedora does not prescribe a content model for objects
There appear to be two primary ways of thinking about content models
- Multi-object models ('atomistic')
- Single-object models ('compound')
Choice of content model is dependent on the structure of your content, your internal workflow, and your anticipated delivery and search methodology


Slide27 : Questions and Discussion