AGU 2006 Woolf IN53C 02

‘Feature types’ as an integration bridge in the climate sciences: 

Andrew Woolf (1,*), Bryan Lawrence (2), Jeremy Tandy (3), Keiran Millard (4), Dominic Lowe (2), Sam Pepler (2)
(1) CCLRC e-Science Centre, (2) British Atmospheric Data Centre, (3) Met Office, (4) HR Wallingford
(*) Corresponding author email: A.Woolf@rl.ac.uk

Outline: 

Background
  ‘container’ vs ‘content’
  BADC
  feature types
The data management pipeline
  ingestion
  integration
  management
  use
Examples
  CSML
  Observations and Measurements

Background: container vs content: 

Storage-centred data management focuses on container, not content
  different stovepipes for different storage granularity
Impacts entire pipeline
  back-end exposed throughout
  integration difficult
  maintenance complexity

Background: e.g. BADC: 

British Atmospheric Data Centre, http://badc.nerc.ac.uk
UK NERC designated data centre
~60 TB, ~130 datasets
  NERC programmes, Met Office, ECMWF, NASA, ...
  ground-based observation networks, model output (NWP, climate), satellite data

Background: e.g. BADC: 

Nearly all the data at the BADC has geospatial information
But it is not represented in a standard way
Lots of types of geospatial and temporal things, with no clear categorisation

Background: e.g. BADC: 

The current way of doing things makes it hard to integrate data from other data repositories, or other datasets, or sometimes even data from within the same dataset!

Background: ‘feature types’: 

Emerging ISO standards
  TC211: around 40 standards for geographic information
  Cover activity spectrum: discovery → access → use
ISO 19101 Domain Reference Model

Background: ‘feature types’: 

[from ISO 19109 “Geographic information – Rules for Application Schema”]
Geographic ‘features’
  “abstraction of real world phenomena” [ISO 19101]
  Type or instance
  Encapsulate important semantics in universe of discourse
Application schema
  Defines semantic content and logical structure of datasets
  ISO standards provide toolkit:
    spatial/temporal referencing
    geometry (1-, 2-, 3-D)
    topology
    dictionaries (phenomena, units, etc.)
  GML: canonical encoding

Background: ‘feature types’: 

“lifetime of a technical implementation is shorter than the lifetime of the information it handles” (CEN/TR 15449)
Loosens coupling between storage artefacts and data management infrastructure:
  breaks the link between storage and discovery/access
  front-end can expose information rather than files
  entire infrastructure more independent of back-end

Data management pipeline: ingestion: 

“What’s a dataset?”
BADC currently: “A collection of files with a common theme and administration”
Alternative: “A collection of feature instances with a common theme and administration”
  better for integration
  more natural granularity for use
  independent of physical storage format
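The alternative definition of a dataset can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, the feature type names ("Profile", "Grid") and the storage references are hypothetical and are not taken from the BADC or from any standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureInstance:
    """One instance of a feature type (hypothetical minimal model)."""
    feature_type: str              # illustrative type name, e.g. "Profile"
    attributes: Dict[str, object]  # the semantic content of the feature
    storage_ref: str               # opaque pointer into the back-end store


@dataclass
class Dataset:
    """A collection of feature instances with a common theme and administration."""
    theme: str
    administrator: str
    features: List[FeatureInstance] = field(default_factory=list)

    def of_type(self, feature_type: str) -> List[FeatureInstance]:
        """Select features by type, regardless of where or how they are stored."""
        return [f for f in self.features if f.feature_type == feature_type]


# Made-up example data: two profiles held in different back-ends, plus one grid.
ds = Dataset(theme="upper-air soundings", administrator="example data centre")
ds.features.append(FeatureInstance("Profile", {"station": "A"}, "file:///data/a.nc"))
ds.features.append(FeatureInstance("Profile", {"station": "B"}, "db://obs/42"))
ds.features.append(FeatureInstance("Grid", {"model": "X"}, "file:///data/x.grib"))

profiles = ds.of_type("Profile")
```

The query selects by feature type and never inspects `storage_ref`, which is the sense in which the dataset is independent of physical storage format.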

Data management pipeline: integration: 

e.g. UK NERC DataGrid

Data management pipeline: integration: 

‘Feature types’ provide integration key
  common language across providers/users
  e.g. oceanographers / meteorologists share discussion about semantics of data despite format differences
Standard mechanism for ‘relating’ data
  ‘association’ is part of General Feature Model (rather than determined by file/directory structures)
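As a rough illustration of relating data through explicit associations rather than through file or directory layout, the sketch below models an association as a first-class object. It is a simplification and not the General Feature Model itself; the class names, the role name `validatedBy` and the feature identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    id: str
    feature_type: str  # illustrative type names only


@dataclass(frozen=True)
class Association:
    """An explicit, named relationship between two features."""
    role: str
    source: Feature
    target: Feature


def related(feature: Feature, associations: List[Association], role: str) -> List[Feature]:
    """Follow associations with the given role from a feature to its targets."""
    return [a.target for a in associations if a.source == feature and a.role == role]


# Made-up example: a model grid is related to the station series used to check it.
grid = Feature("model-run-1", "Grid")
series = Feature("station-7", "PointSeries")
other = Feature("station-8", "PointSeries")

links = [Association("validatedBy", grid, series)]
validators = related(grid, links, "validatedBy")
```

The two features could live in different files, directories or repositories; the relationship is carried by the association, not by where they are stored.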

Data management pipeline: management: 

How to manage preservation/curation of storage artefacts?
A ‘features view’ redirects the emphasis to preserving the feature rather than the file
  e.g. become less hung-up on GRIB→netCDF conversion
  object-with-attributes is the curation focus
  cf. OAIS (ISO 14721)

Data management pipeline: use: 

Currently, have to ‘back out’ information content; ‘features’ make this explicit
  enables standard patterns for ‘context’, e.g. OGC Observations and Measurements
‘Features’ are closer to applications
  can be leveraged for value-added services
  General Feature Model/UML ‘operations’
(Work needed on implementation!)

Data management pipeline: use: 

Visualisation
  generic visualisation capability fraught!
  feature types make this more explicit
Discovery
  ‘feature collections’ more natural granularity than file/directory collections or database tables

Data management pipeline: use: 

Mediator architecture
  n+m, not n*m!
‘Feature types’ view
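The n+m versus n*m point can be made concrete with a small sketch. Assume n storage formats and m services: wiring each service directly to each format needs n*m adapters, whereas routing everything through a common feature representation needs n readers plus m services. The readers below are stubs that do not parse real files, and every function name and the dictionary-based feature form are assumptions made for illustration.

```python
from typing import Callable, Dict

# Common intermediate form: a feature as a plain dictionary (an assumption).
Feature = Dict[str, str]


def read_netcdf(ref: str) -> Feature:
    # Stub reader: a real one would parse the file at `ref`.
    return {"feature_type": "Grid", "source": ref, "format": "netcdf"}


def read_grib(ref: str) -> Feature:
    # Stub reader: a real one would decode GRIB messages.
    return {"feature_type": "Grid", "source": ref, "format": "grib"}


def describe(feature: Feature) -> str:
    return f"{feature['feature_type']} from {feature['source']}"


def keyword(feature: Feature) -> str:
    return feature["feature_type"].lower()


# n readers and m services, each written once against the common feature form.
READERS: Dict[str, Callable[[str], Feature]] = {"netcdf": read_netcdf, "grib": read_grib}
SERVICES: Dict[str, Callable[[Feature], str]] = {"describe": describe, "keyword": keyword}


def serve(fmt: str, ref: str, service: str) -> str:
    """Mediate: storage format -> feature -> service."""
    return SERVICES[service](READERS[fmt](ref))


def adapter_counts(n_formats: int, m_services: int) -> Dict[str, int]:
    """Components needed with direct pairwise wiring versus a mediator."""
    return {"pairwise": n_formats * m_services, "mediated": n_formats + m_services}
```

With 5 formats and 4 services, for instance, `adapter_counts(5, 4)` gives 20 pairwise adapters against 9 mediated components, and adding a sixth format costs one new reader rather than four new adapters.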

Data management pipeline: use: 

Integrates climate science data within mainstream ‘spatial data infrastructure’
  e.g. EU INSPIRE Directive
  enhances cross-disciplinary use

Examples: 

Climate Science Modelling Language (CSML)

Examples: 

OGC ‘Observations and Measurements’
  An Observation is an Event whose result is an estimate of the value of some Property of the Feature-of-interest, obtained using a specified Procedure
CSML
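The definition of an Observation maps fairly directly onto a record with one field per capitalised term. The sketch below is a simplified reading of that sentence, not the OGC schema: the field names only loosely follow the Observations and Measurements vocabulary, and the values are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """An event whose result estimates a property of a feature of interest."""
    feature_of_interest: str  # what was observed (hypothetical identifier)
    observed_property: str    # which property of that feature
    procedure: str            # how the estimate was obtained
    time: datetime            # when the observation event took place
    result: float             # the estimated value
    unit: str                 # unit of measure for the result


def summarise(obs: Observation) -> str:
    """Render the observation as one line of readable context."""
    return (f"{obs.observed_property} of {obs.feature_of_interest} = "
            f"{obs.result} {obs.unit} (via {obs.procedure})")


# Made-up example values.
obs = Observation(
    feature_of_interest="station-7",
    observed_property="air_temperature",
    procedure="thermometer",
    time=datetime(2006, 12, 15, 12, 0, tzinfo=timezone.utc),
    result=281.5,
    unit="K",
)
```

Here `summarise(obs)` yields `air_temperature of station-7 = 281.5 K (via thermometer)`. Keeping the procedure and feature of interest alongside the result is what supplies the ‘context’ that otherwise has to be backed out of file contents.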

Summary: 

Data management problems arise from traditional ‘storage-oriented’ view
‘Feature types’ encapsulate information semantics
Provide integration key across granularity range
Potential benefits for entire data management pipeline
  ingestion → integration → management → use