Lawrence CAS2K3


The NERC DataGrid – Building Bridges for the Environmental Sciences: 

The NERC DataGrid – Building Bridges for the Environmental Sciences
Bryan Lawrence
Kerstin Kleese, Roy Lowry, Kevin O’Neill, Andrew Woolf & others
Head, NCAS/British Atmospheric Data Centre
Rutherford Appleton Laboratory, CCLRC

NDG Partners: 

NDG Partners
As funded, a partnership between:
- British Atmospheric Data Centre (BADC, PI: Bryan Lawrence)
- British Oceanographic Data Centre (BODC, Co-I: Roy Lowry)
- CLRC E-science Centre (Co-I: Kerstin Kleese)
- PCMDI at LLNL in the US (Dean Williams, Bob Drach, Mike Fiorino)
The project has caught the imagination; extra funding now supports:
- A number of groups at the NERC Centre for Ecology and Hydrology (CEH: Ecology DataGrid)
- NERC Earth Observation Data Centre & Plymouth Marine Lab Remote Sensing
Major collaborators that are not directly funded will include:
- ClimatePrediction.net, GODIVA (NERC e-science projects)
- NCAS/CGAM: The Centre for Global Atmospheric Modelling at the University of Reading (via Lois Steenman-Clark and Katherine Bouton)
Already required to provide technology to support the major UK project HIGEM (a collaboration between the Hadley Centre and the NERC academic community to develop the next generation of high-resolution GCMs based on HadGEM).

Outline: 

Outline
- Motivation: The BADC, BODC, and the Metadata Gateway
- The NDG Goal
- NDG Metadata Structures and Architecture
  - Metadata Model
  - Data Model
- ISO Context
- NDG Prototype Status
- Summary & Challenges

The British Oceanographic Data Centre: 

The British Oceanographic Data Centre (not for much longer, moving to a site on Liverpool University campus imminently)

BODC Mission Statement: 

BODC Mission Statement
To operate a world class data centre in support of UK marine science by:
- providing data management support for UK marine science projects
- maintaining and developing the UK’s national oceanographic database
- developing innovative marine data products and digital atlases
- collaborating, on behalf of the UK, in the international exchange and management of oceanographic data
- making high quality data readily available to UK research scientists in academia, government and industry

British Atmospheric Data Centre: 

British Atmospheric Data Centre
The Role: Key words: Curation and Facilitation!

BADC Users : 

BADC Users
- 3800 registered in March 2003
- ~300 individual users per month
[Chart: Users by Discipline, November 2002, 2150 users]

BADC Storage Capacity: 

BADC Storage Capacity
- Approx. 50 TB (November 2002)
- Projected to quadruple well within the next couple of years given existing commitments
- Planning exercise under way now
- Committed to keeping as much as possible on spinning disk
- Further backup and extra storage at national archival centre (ATLAS, PB soon)
2.5Gb

Huge variety of Data Sets: 

Huge variety of Data Sets

Querying datasets: 

Querying datasets
Complex metadata, held in an Ingres database; exported as DIF and via Z39.50

Different types of data returned: Wallingford: 

Different types of data returned: Wallingford
Supporting a very diverse user community: NetCDF is not enough …

NERC Metadata Gateway - SST: 

NERC Metadata Gateway - SST
No clean handover from discovery to browse and use!
- Geospatial coordinates forgotten.
- Time reference forgotten.
- Need to get entire field(s), and find correct time!
And if I want to compare data from different locations?
- multiple logins
- multiple formats
- discovery?

The NERC DataGrid: 

The NERC DataGrid

Metadata Origins: 

Metadata Origins Consider a hierarchy of data users beginning with an individual scientist, who may herself be part of a research group, itself part of a community sharing resources, lying in the wider internet … To be well integrated the metadata should have a role at each level! (The data portal client and server interface may be different at each level). At each level “extra” metadata will be required, probably produced by dedicated staff at the research group, or data centre.

A google for data; the metadata carrot!: 

A google for data; the metadata carrot!

NDG Metadata Taxonomy: 

NDG Metadata Taxonomy

Separate data (A) and metadata (B) models: 

Separate data (A) and metadata (B) models
- Clear separation of function
  - Difference between data use and discovery etc.
  - “Tuning” of metadata to include relevant detail
- Allows increased reuse of metadata model
  - Avoids tie-in to details of a particular field's data formats
  - Can plug in another data model
Diagram labels: Metadata Model, Data Model, Data granule ID, Data summary

(A) NDG Data Model: Overview: 

(A) NDG Data Model: Overview
- Dataset: named container for a number of variables
- Variable: physical parameters within the dataset; controlled vocabularies, e.g. BODC data dictionary, CF standard names
- Array: multidimensional container for other arrays or numeric data
- Coordinate: may be shared between multiple Arrays; ‘anonymous’ if not georeferenced; MappedCoordinate vs ProductCoordinate; with respect to a Coordinate Reference System (ref ISO 19111, ISO 19115)
- GranuleDescriptor: describes a data granule in terms of file storage; enables file aggregation; SQL/OGSA-DAI for RDBMS; physical or logical (e.g. SRB) files
“Profiles” of the model are defined for important data types
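
The containment relationships listed in the overview can be pictured with a minimal Python sketch. This is an illustration only, not the NDG schema: the class names follow the slide, but every field name (`crs`, `children`, `granule_id`, `files`, and so on) is a hypothetical choice made for this example, and the MappedCoordinate/ProductCoordinate distinction is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class Coordinate:
    """An axis that may be shared between several Arrays."""
    name: str
    values: Sequence[float]
    # Hypothetical field: identifier of a coordinate reference system.
    # None models the slide's 'anonymous' (non-georeferenced) coordinate.
    crs: Optional[str] = None

    @property
    def is_anonymous(self) -> bool:
        return self.crs is None


@dataclass
class Array:
    """Multidimensional container for numeric data or other Arrays."""
    shape: Tuple[int, ...]
    coordinates: List[Coordinate] = field(default_factory=list)
    children: List["Array"] = field(default_factory=list)


@dataclass
class Variable:
    """A physical parameter, named from a controlled vocabulary."""
    standard_name: str
    array: Array


@dataclass
class GranuleDescriptor:
    """Describes a granule in terms of storage; several files aggregate into one granule."""
    granule_id: str
    files: List[str]


@dataclass
class Dataset:
    """Named container for a number of Variables."""
    name: str
    variables: List[Variable] = field(default_factory=list)
    granules: List[GranuleDescriptor] = field(default_factory=list)


# Usage: two variables sharing one georeferenced coordinate.
lat = Coordinate("latitude", [-45.0, 0.0, 45.0], crs="EPSG:4326")
index = Coordinate("index", [0, 1, 2])  # no CRS, so 'anonymous'
sst = Variable("sea_surface_temperature", Array((3,), [lat]))
tas = Variable("air_temperature", Array((3,), [lat]))
ds = Dataset("example", [sst, tas], [GranuleDescriptor("g1", ["a.nc", "b.nc"])])
```

Keeping `Coordinate` as a separate object, rather than a property of each `Array`, is what lets two variables refer to the same axis.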

Array: 

NDG Data Model Array

(B) Metadata Model: 

(B) Metadata Model

(B) Metadata Model: an NDG Intermediate Schema, Conceptual Overview: 

(B) Metadata Model: an NDG Intermediate Schema, Conceptual Overview

ISO TC211: 

ISO TC211
- ISO 19101: Geographic information – Reference model
- ISO 19103: Geographic information – Conceptual schema language
- ISO 19107: Geographic information – Spatial schema
- ISO 19108: Geographic information – Temporal schema
- ISO 19109: Geographic information – Rules for application schema
- ISO 19111: Geographic information – Spatial referencing by coordinates
- ISO 19115: Geographic information – Metadata
- ISO 19118: Geographic information – Encoding
- ISO 19121: Geographic information – Imagery and gridded data

ISO19115: 

ISO19115

ISO: 

ISO
Metadata extensions and profiles
Direct relationship between ISO19115 and our (B) Intermediate schema.

ISO19101: 

ISO19101
Profiling of ISO 191xx
“The comprehensiveness and large number of options available in various base standards make it difficult to combine them for practical applications. … A profile integrates a set of base standards and/or modules (predefined subsets) of base standards to meet a specific implementation requirement.”
Registration of profiles
“A profile that is registered through an ISO registration procedure becomes an International Standardized Profile (ISP). National standards that are expressed as profiles of ISO base standards may be registered at a national level.”

Further Application in NERC DataGrid: 

Further Application in NERC DataGrid eg Data model “Coordinates”

The Data Use Chain: 

The Data Use Chain

Key Components – need APIs and standards: 

Key Components – need APIs and standards
Diagram labels: Globus, Harvest

NDG Discovery Service Element: 

NDG Discovery Service Element
Traditional and Grid Service (GT3) Interfaces

Starting with the LAS: 

Starting with the LAS
Deployment for UK users within a few weeks (constraint is primarily access control)

LAS – Simple Box fill Output: 

LAS – Simple Box fill Output
Work for us to do: labelling is inadequate as yet.

Cache management in LAS/CDAT: 

Cache management in LAS/CDAT
The cache also checks whether there is enough room, deletes the oldest files if necessary, and checks against the disk space limit.
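
The oldest-first behaviour described on this slide can be sketched as follows. This is a minimal illustration of the general idea, not the LAS/CDAT code: the function name `make_room`, its parameters, and the use of file modification time as "age" are all assumptions made for this example.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def make_room(cache_dir: Path, needed_bytes: int, limit_bytes: int) -> List[Path]:
    """Delete the oldest files until needed_bytes fits under limit_bytes.

    Returns the deleted paths, oldest first. Hypothetical helper, not LAS/CDAT API.
    """
    if needed_bytes > limit_bytes:
        raise ValueError("request is larger than the whole cache limit")
    # Oldest first, using modification time as the age of a cached file.
    files = sorted(
        (p for p in cache_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    used = sum(p.stat().st_size for p in files)
    deleted = []
    for p in files:
        if used + needed_bytes <= limit_bytes:
            break  # enough room now
        used -= p.stat().st_size
        p.unlink()
        deleted.append(p)
    return deleted


# Usage: three 100-byte files, a 300-byte limit, and a 150-byte request.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    for age, name in enumerate(["a.nc", "b.nc", "c.nc"], start=1):
        f = cache / name
        f.write_bytes(b"x" * 100)
        os.utime(f, (1_000_000 + age, 1_000_000 + age))  # a.nc is the oldest
    deleted_names = [p.name for p in make_room(cache, 150, 300)]
    remaining_names = sorted(p.name for p in cache.iterdir())
```

In this run the two oldest files are removed and only `c.nc` remains, since 100 + 150 bytes fits under the 300-byte limit.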

NERC DataGrid Prototype: 

NERC DataGrid Prototype (by hand)
- Ingestion of ACSOE data from BADC and BODC.
- NASA GCMD DIF based discovery
  - Exported from Intermediate Schema
  - Harvested by hand
- Working on hand-over mechanism to pass dataset info to DataModel based LAS service
  - Generate and populate LAS database in response
  - Use standard LAS delivery
Next Steps: GT3 based services, improve LAS, improve delivery, implement multiple datamodel profiles, implement multiple discovery services.
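
The "exported from Intermediate Schema" step can be pictured with a short sketch that turns one record into a DIF-style XML fragment. It is only an illustration under stated assumptions: the input dictionary and its keys are hypothetical stand-ins for an intermediate-schema record, the function `to_dif` is invented for this example, and only three DIF fields (`Entry_ID`, `Entry_Title`, `Summary`) are written, whereas a complete DIF record carries many more.

```python
import xml.etree.ElementTree as ET


def to_dif(record: dict) -> str:
    """Serialise a record as a minimal DIF-style XML string (illustrative only)."""
    dif = ET.Element("DIF")
    ET.SubElement(dif, "Entry_ID").text = record["id"]
    ET.SubElement(dif, "Entry_Title").text = record["title"]
    ET.SubElement(dif, "Summary").text = record["summary"]
    return ET.tostring(dif, encoding="unicode")


# Hypothetical record; the identifier and text are placeholders.
example_record = {
    "id": "example-dataset-001",
    "title": "Example dataset",
    "summary": "Placeholder summary for illustration.",
}
dif_xml = to_dif(example_record)
parsed = ET.fromstring(dif_xml)
```

Generating the discovery record from the intermediate record, rather than maintaining it separately, keeps the two from drifting apart when harvesting is later automated.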

Summary: 

Summary
- NDG project running for a year now, aiming to provide grid-enabled tools to support a diverse community with diverse datasets.
- NDG is part of the UK National E-science programme, and will leverage off other projects to implement grid solutions.
- Initial prototype is web-service based; GT3 prototype due early in the new year.
- Software development based on plagiarising the maximum amount from other groups, and a standards-based approach within the NDG.
- All code will be in the public domain.
- Major challenge will not be technical: policy, attitudes, legal issues.

You’ve gone TOO FAR!: 

You’ve gone TOO FAR!
