Gheller snap

Presentation Transcript

Slide1: 

The Simple Numerical Access Protocol (SNAP) for theoretical data
Claudio Gheller, CINECA (c.gheller@cineca.it)

Slide2: 

The Theoretical Virtual Observatory (TVO)
The main purpose of the TVO is to create a distributed database of simulated data, accessible from anywhere in an easy and transparent way, and to include services that allow the user to download data and extract information from them.
Data:
- Data growth is extremely fast (volumes approximately double each year)
- Data must be stored, organized and described
Applications:
- Demand for computational resources for data analysis (computers, software, services)
- Algorithmic intensity, nonlinearity, bandwidth limits
Data must be handled with NEW methods. Traditional techniques are no longer enough.
ITVO represents the Italian effort toward the theoretical archive.

Slide3: 

Data levels
Mimicking observational data, simulated data can be organized in 3 levels:
- Level 0: direct outcome of the simulation. Examples are the coordinates and velocities of particles in an N-Body simulation, the density field on the computational mesh of a jet simulation, etc.
- Level 1: data extracted or derived from the simulation results, having the same characteristics as the simulation results themselves. For example, the coordinates of the points that build up a galaxy cluster extracted from a cosmological simulation using a friends-of-friends algorithm.
- Level 2: results obtained from Level 0 and Level 1 data through an analysis process. Examples are projected maps, statistical functions, Virtual Telescope applications.

Slide4: 

Data schema (cosmology)

Slide5: 

Level 0 Data: simulation codes

Slide6: 

Level 0 Data: file formats
Data produced by simulation codes are stored in files with different and, usually, non-standard formats. This makes it difficult to handle and exchange data.
E.g. Gadget has its own file format (although it also supports HDF5). This format has no access library support, it is not extensible, data access is not efficient, and it is strictly linked to the application.
File formats should be:
- Standard
- Flexible
- Extensible
- Portable
- Fast
- Easily usable by applications
- SELF DESCRIPTIVE
Possible solutions:
- FITS
- HDF5
- VOTables

Slide7: 

File formats: HDF5
HDF5 (http://hdf.ncsa.uiuc.edu) represents a possible solution to deal with such data.
HDF5 is:
- Portable across most modern platforms
- High performance
- Well supported
- Well documented
- Rich in tools
HDF5 data files are:
- Platform independent (portable)
- Well organized
- Self defined
- Metadata enriched
- Efficiently accessible
HDF5 drawbacks:
- Requires some expertise and skill to be used
- Information is difficult to access
- Can be subject to major library changes (see HDF4 to HDF5)

Slide8: 

File formats: HDF5 self-consistency
Each file represents an output time. The structure is simple: all the data objects are at the root level:
    /BmMassDensity   Dataset {512, 512, 512}
    /BmTemperature   Dataset {512, 512, 512}
    /BmVelocity      Dataset {512, 512, 512, 3}
    /DmMassDensity   Dataset {512, 512, 512}
    /DmPosition      Dataset {134217728, 3}
    /DmVelocity      Dataset {134217728, 3}
HDF5 metadata make the file completely self-consistent:
- Structural metadata (strictly required by the library): rank, dimensionality
- Annotation metadata (required by our implementation): data object name, data object description, unit, formula
Data objects (at the moment) can be:
- Structured grid: rank 4 (scalars or vectors)
- Unstructured points: rank 2 (scalars or vectors)
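To give a feeling for the data volumes involved, the following sketch estimates the storage needed by each data object in the listing above. The shapes are taken from the slide; the 4-byte element size (single precision) and the helper names are assumptions made only for this illustration.

```python
from math import prod

# Dataset shapes as listed on the slide (one file = one output time).
SHAPES = {
    "BmMassDensity": (512, 512, 512),
    "BmTemperature": (512, 512, 512),
    "BmVelocity": (512, 512, 512, 3),
    "DmMassDensity": (512, 512, 512),
    "DmPosition": (134217728, 3),
    "DmVelocity": (134217728, 3),
}

BYTES_PER_VALUE = 4  # assumption: single-precision floats


def size_in_gib(shape, bytes_per_value=BYTES_PER_VALUE):
    """Return the storage needed for one dataset, in GiB."""
    return prod(shape) * bytes_per_value / 2**30


def total_size_in_gib(shapes=SHAPES):
    """Return the storage needed for all datasets of one output time."""
    return sum(size_in_gib(shape) for shape in shapes.values())


if __name__ == "__main__":
    for name, shape in SHAPES.items():
        print(f"/{name:14s} {size_in_gib(shape):5.2f} GiB")
    print(f"total: {total_size_in_gib():.2f} GiB")
```

Under that assumption a single output time holds about 6 GiB, which is why extracting a sub-volume is usually preferable to downloading the whole file. Note also that the number of dark matter particles, 134217728, equals 512^3.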

Slide9: 

Simple Numerical Access Protocol - SNAP
The SNAP specification defines a prototype standard for retrieving theoretical data from a variety of astronomical simulation repositories. Data can be the outcome of different kinds of numerical applications. However, SNAP is designed to address numerical simulation outputs organized as follows:
- The concept of time is supported.
- For each timestep, the information must be sampled in a generic 3D space. Positions in this volume are called x, y and z.
- The sampling can be regular (e.g. Cartesian mesh) or irregular (e.g. particle or adaptive mesh positions).
- Each mesh/particle position in the 3D space hosts the same physical quantities (e.g. mass, density, velocity, etc.) for each timestep.

Slide10: 

SNAP main stages
1. Search for available simulations and data. The query is on metadata. The result is an XML document (possibly a VOTable) with the metadata of the matching results.
2. Identification of the subset of interest. The user identifies and sets a subset of the full simulation data which is of interest. This subset is defined both in time and in space.
3. SNAP request. The selection parameters for the SNAP action are sent to the server.
4. Data staging and delivery. Metadata are immediately delivered to the client as a VOTable. Data are delivered (possibly after some time, needed for extraction) as binary files via HTTP or FTP.

Slide11: 

Simulation discovery
- The user submits a query, searching for simulations with specific characteristics (cosmological model, simulation code, authors…)
- The possible parameters for the query are defined according to the data model
- The data model implementation must be specialized to the specific application field. It is not feasible to create a single parameter “space”
- Results are delivered as VOTables specifying:
  - The parameters of the query
  - The number of hits
  - The main features of each hit
  - The physical location of the hits
  - A “thumbnail” of the result

Slide12: 

Selecting a region
Notice that regions are to be thought of as 3D selections in a generic phase space.
SNAP allows the user to select a rectangular or cubic region in two steps:
1. A representative subset of the data (the “thumbnail”) is downloaded and acts as a reference dataset to set the selection.
2. An appropriate application (web based, visualization tool…) is used to set the selected region. E.g. VisIVO (see M. Comparato's presentation).
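As an illustration of the second step, the sketch below turns a set of points picked on the thumbnail into a center and side lengths of the enclosing axis-aligned box. It assumes that coordinates are already expressed as fractions of the simulation box; the function names and the "&"-joined POS/SIZE string are hypothetical and only meant to show the idea.

```python
def region_from_selection(points):
    """Return (pos, size) of the axis-aligned box enclosing the points.

    `points` is a sequence of (x, y, z) tuples, assumed to be given
    as fractions of the simulation box.
    """
    if not points:
        raise ValueError("at least one point is needed to define a region")
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    pos = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    size = tuple(hi - lo for lo, hi in zip(lows, highs))
    return pos, size


def to_query(pos, size):
    """Format the region as a POS/SIZE parameter string (assumed layout)."""
    def fmt(values):
        return ",".join(f"{v:g}" for v in values)
    return f"POS={fmt(pos)}&SIZE={fmt(size)}"
```

A visualization tool could call `region_from_selection` on the thumbnail points inside the user's selection and pass the result to `to_query` to prepare the request.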

Slide13: 

The SNAP service
The main target of the SNAP service is access to the raw data from a simulation, selected by a general Simulation Query.
The SNAP service in general provides the following functionalities:
1. Extraction of a subset of data selected in a rectangular or spherical volume
2. Storage of the associated metadata in a VOTable
3. Delivery of the result to the user via HTTP, FTP, etc.
The extraction phase (1) allows the user to focus on regions of interest without having to download the whole dataset. Nevertheless, retrieving the complete dataset is still possible.

Slide14: 

The SNAP input parameters
An input Sub-Volume query consists of an x,y,z position in the box, plus the side lengths (or radius) of the rectangular (spherical) region surrounding this point. These quantities may be specified as fractions of the box, so specific units are not necessary. Conversion to internal units is performed on the server side.
The service MUST support the following three parameters:
- POS: the position of the center of the region of interest, expressed in proper units. Example: "POS=0.3,0.25,0.9". A NULL value represents the center of the whole box (e.g. 0.5,0.5,0.5).
- SIZE: the size of the sides (or the radius) of the region in proper units. The region may be specified using either one or three values. If only one value is given it applies to all coordinate axes (alternatively it is the radius of the sphere). The format of the SIZE parameter is the same as that for POS. Example: "SIZE=0.2,0.5,0.3". A special case is SIZE=NULL, which represents the whole box.
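A server-side reading of POS and SIZE could look like the following sketch. It assumes that the parameters arrive as an ordinary URL query string, that values are fractions of the box, and that a single SIZE value is the side of a cube (the sphere case is ignored here). The function names are hypothetical.

```python
from urllib.parse import parse_qs

BOX_CENTER = (0.5, 0.5, 0.5)  # NULL POS: center of the whole box
WHOLE_BOX = (1.0, 1.0, 1.0)   # NULL SIZE: the whole box


def parse_triplet(value, default):
    """Parse 'a,b,c' or 'a' into three floats; NULL or empty gives the default."""
    if value is None or value.strip().upper() in ("", "NULL"):
        return default
    parts = [float(p) for p in value.split(",")]
    if len(parts) == 1:
        parts = parts * 3  # one value applies to all coordinate axes
    if len(parts) != 3:
        raise ValueError(f"expected one or three values, got {len(parts)}")
    return tuple(parts)


def parse_subvolume(query):
    """Return (pos, size) from a query string such as 'POS=...&SIZE=...'."""
    raw = parse_qs(query, keep_blank_values=True)
    params = {key.upper(): values[0] for key, values in raw.items()}
    pos = parse_triplet(params.get("POS"), BOX_CENTER)
    size = parse_triplet(params.get("SIZE"), WHOLE_BOX)
    return pos, size
```

For example, `parse_subvolume("POS=0.3,0.25,0.9&SIZE=0.2")` yields the center `(0.3, 0.25, 0.9)` and a cube of side 0.2 on every axis.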

Slide15: 

The SNAP input parameters (cont'd)
The following parameters may be supported:
- SHAPE: BOX or SPHERE (but this could be redundant: 1 value = sphere, 3 values = cube)
- BOUNDARY: this parameter can also have one or three values, one for each coordinate direction. If only one value is given it applies to all coordinate axes. Possible values are:
  - TRUNC: if the region of interest exceeds the computational box, it is truncated at the box boundary
  - PERIODIC: if the region of interest exceeds the computational box, data are selected from the opposite side of the box
  The metadata of the service indicate whether PERIODIC is supported.
- FIELDS: the service MAY support an optional parameter with the name FIELDS, whose value is a comma-separated list of field names corresponding to the data elements the simulation can return. If the parameter is not provided, the default behavior is to return all fields.
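The difference between TRUNC and PERIODIC can be sketched along a single coordinate axis. The example assumes fractional coordinates, so the box spans [0, 1], and a center that lies inside the box; the function name is hypothetical.

```python
def axis_intervals(center, size, boundary="TRUNC"):
    """Return the interval(s) of [0, 1] selected along one axis.

    TRUNC clips the region at the box boundary; PERIODIC wraps the
    part that exceeds the box around to the opposite side.
    """
    lo, hi = center - size / 2, center + size / 2
    if boundary == "TRUNC":
        return [(max(lo, 0.0), min(hi, 1.0))]
    if boundary == "PERIODIC":
        if size >= 1.0:
            return [(0.0, 1.0)]  # the region covers the whole axis
        intervals = []
        if lo < 0.0:
            intervals.append((lo + 1.0, 1.0))  # wrapped to the far side
            lo = 0.0
        if hi > 1.0:
            intervals.append((0.0, hi - 1.0))  # wrapped to the near side
            hi = 1.0
        intervals.append((lo, hi))
        return sorted(intervals)
    raise ValueError(f"unknown BOUNDARY value: {boundary!r}")
```

With `center=0.875` and `size=0.5`, TRUNC selects only `(0.625, 1.0)`, while PERIODIC selects `(0.0, 0.125)` in addition to `(0.625, 1.0)`. A full 3D selection would apply the same rule to each axis, using the per-axis BOUNDARY value when three are given.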

Slide16: 

SNAP output
The result of the SNAP query consists of:
- A VOTable with the description of the result and of the data
- A binary file with the extracted data
Only the (small) VOTable is immediately returned to the user.
The description VOTable consists of the following elements:
- A RESOURCE element, identified with the attribute type="results", containing at least one TABLE element which contains the results of the query.
- The RESOURCE element SHOULD contain an INFO with name="QUERY_STATUS". Its value attribute should be set to "OK" if the query executed successfully.
- The TABLE in the output VOTable MUST contain FIELDs that refer to the variables stored in the external binary file. FIELDs can be organized either as a table or as complete sequences.
- Variables must be scalars, i.e. vectors (or more general multidimensional quantities) are not supported. This means that each FIELD represents a scalar value, e.g. the temperature at each point or the x coordinate of a particle.

Slide17: 

SNAP output (cont'd)
- Each FIELD must specify the datatype, the arraysize and the unit of the variable. Furthermore, name, ID and ucd have to be set. The UCDs for simulations are still in progress, so we do not go into further detail.
- The FIELDs must specify the geometry parameter, which at present can have the values “n-body”, “mesh” and “amr”.
- The binary data filename is specified in a DATA section, according to the rules defined in other IVOA specifications (e.g. the SIAP specification).
- Other parameters may be supported according to the services offered by the data provider.

Slide18: 

SNAP VOTables examples 1
VOTable for the velocity field of a fluid on a fixed 3D mesh:
<VOTABLE>
  <RESOURCE name="myVectorField" type="results">
    <DESCRIPTION>Velocity Field from N-Body run</DESCRIPTION>
    <INFO name="QUERY_STATUS" value="OK"/>
    <TABLE name="VelocityField" ID="Vel" order="sequential">
      <FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x" datatype="float" arraysize="41x41x41" unit="km/s" geometry="mesh"/>
      <FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y" datatype="float" arraysize="41x41x41" unit="km/s" geometry="mesh"/>
      <FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z" datatype="float" arraysize="41x41x41" unit="km/s" geometry="mesh"/>
      <DATA>
        <BINARY>
          <STREAM href="file:///scratch/myhome/test.bin"/>
        </BINARY>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
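A client could read such a description with a few lines of standard-library Python. The sketch below is not part of the SNAP specification: it assumes a document without XML namespaces and wrapped in a single VOTABLE root element, uses a shortened copy of the example above, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the mesh example (assumed root: VOTABLE).
EXAMPLE = """<VOTABLE>
  <RESOURCE name="myVectorField" type="results">
    <INFO name="QUERY_STATUS" value="OK"/>
    <TABLE name="VelocityField" ID="Vel" order="sequential">
      <FIELD name="vx" ID="vx1" datatype="float" arraysize="41x41x41" unit="km/s" geometry="mesh"/>
      <FIELD name="vy" ID="vy1" datatype="float" arraysize="41x41x41" unit="km/s" geometry="mesh"/>
      <FIELD name="vz" ID="vz1" datatype="float" arraysize="41x41x41" unit="km/s" geometry="mesh"/>
      <DATA><BINARY><STREAM href="file:///scratch/myhome/test.bin"/></BINARY></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_snap_description(xml_text):
    """Extract the field list and the binary file location from a SNAP VOTable."""
    root = ET.fromstring(xml_text)
    resource = root.find("RESOURCE[@type='results']")
    if resource is None:
        raise ValueError("no RESOURCE with type='results' found")
    # QUERY_STATUS is only a SHOULD, so a missing INFO is tolerated here.
    status = resource.find("INFO[@name='QUERY_STATUS']")
    if status is not None and status.get("value") != "OK":
        raise RuntimeError(f"query failed: {status.get('value')}")
    table = resource.find("TABLE")
    if table is None:
        raise ValueError("the results RESOURCE contains no TABLE")
    fields = [
        {
            "name": field.get("name"),
            "unit": field.get("unit"),
            "geometry": field.get("geometry"),
            # "41x41x41" -> (41, 41, 41); "100000" -> (100000,)
            "shape": tuple(int(n) for n in field.get("arraysize").split("x")),
        }
        for field in table.findall("FIELD")
    ]
    stream = table.find("DATA/BINARY/STREAM")
    href = stream.get("href") if stream is not None else None
    return {"order": table.get("order"), "fields": fields, "href": href}
```

Calling `read_snap_description(EXAMPLE)` returns the three velocity components with their shapes, the `order` attribute of the table and the location of the binary file.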

Slide19: 

SNAP VOTables examples 2
VOTable for the velocity and position fields of particles from an N-Body simulation:
<VOTABLE>
  <RESOURCE name="myParticles" type="results">
    <INFO name="QUERY_STATUS" value="OK"/>
    <TABLE name="Particles" ID="NBody" order="tabular">
      <FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x" datatype="float" arraysize="100000" unit="Mpc" geometry="particles"/>
      <FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y" datatype="float" arraysize="100000" unit="Mpc" geometry="particles"/>
      <FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z" datatype="float" arraysize="100000" unit="Mpc" geometry="particles"/>
      <FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x" datatype="float" arraysize="100000" unit="km/s" geometry="particles"/>
      <FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y" datatype="float" arraysize="100000" unit="km/s" geometry="particles"/>
      <FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z" datatype="float" arraysize="100000" unit="km/s" geometry="particles"/>
      <DATA>
        <BINARY>
          <STREAM href="file:///scratch/myhome/test.bin"/>
        </BINARY>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
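The order attribute tells the client how the FIELDs are laid out in the binary file. The sketch below shows one possible reading of the two values: "sequential" as all values of the first field followed by all values of the second, and "tabular" as one record per particle with the fields interleaved. This interpretation, the big-endian 4-byte floats, the absence of any file header and the function name are all assumptions of this example, and all fields are taken to have the same length.

```python
import struct


def unpack_fields(raw, names, count, order):
    """Split a raw byte buffer into one tuple of values per field.

    raw   -- bytes read from the binary file
    names -- FIELD names, in the order given in the VOTable
    count -- number of values per field (same for all fields here)
    order -- "sequential" or "tabular"
    """
    total = len(names) * count
    values = struct.unpack(f">{total}f", raw)  # assumed big-endian float32
    if order == "sequential":
        # x0 x1 x2 ... then vx0 vx1 vx2 ...
        return {name: values[i * count:(i + 1) * count]
                for i, name in enumerate(names)}
    if order == "tabular":
        # x0 vx0, x1 vx1, x2 vx2, ...
        return {name: values[i::len(names)]
                for i, name in enumerate(names)}
    raise ValueError(f"unknown order: {order!r}")
```

Both layouts carry the same information; the tabular form is convenient when particles are processed one at a time, the sequential form when a whole field is loaded at once or when fields have different lengths.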

Slide20: 

SNAP VOTables examples 3
VOTable for the temperature field of a mesh-based quantity and the positions of N-Body particles extracted from the same spatial region:
<VOTABLE>
  <RESOURCE name="myMixedData" type="results">
    <INFO name="QUERY_STATUS" value="OK"/>
    <TABLE name="ParticlesAndMesh" ID="NBody" order="sequential">
      <FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x" datatype="float" arraysize="100000" unit="Mpc" geometry="particles"/>
      <FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y" datatype="float" arraysize="100000" unit="Mpc" geometry="particles"/>
      <FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z" datatype="float" arraysize="100000" unit="Mpc" geometry="particles"/>
      <FIELD name="temperature" ID="temp" ucd="phys.temperature;pos.cartesian" datatype="float" arraysize="41x41x41" unit="K" geometry="mesh"/>
      <DATA>
        <BINARY>
          <STREAM href="file:///scratch/myhome/test.bin"/>
        </BINARY>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
Quantities with different geometries could also be specified in separate tables.

Slide21: 

Data Staging
- By Data Staging we refer to the processing the server performs to retrieve or generate the requested simulation volume or subvolume and cache it in online storage for retrieval by a client.
- Staging is necessary for large archives which must retrieve simulation data from hierarchical storage, or for services which dynamically extract subvolumes, where it may take a substantial time (e.g. minutes or hours) to retrieve the data in the relevant region of the simulation box.
- When staging is necessary, the data are staged on the server for later retrieval by the client; they are kept only for a limited period of time and are eventually deleted by the service.
- As soon as the staged data are available at the given URL, the user can start the download.
- The user can be informed of the availability of the data in two ways:
  - The client asks the service for information.
  - The service contacts the client and, if it is present, sends the information to it. [Requires authentication]
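The first approach, where the client asks the service, amounts to a polling loop. A minimal sketch follows; how availability is actually checked is left to a caller-supplied `is_ready` function (for instance an HTTP request on the staged URL), since the slides do not define it. The function name, the polling interval and the timeout are assumptions of this example.

```python
import time


def wait_until_staged(is_ready, interval=30.0, timeout=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `is_ready()` until it returns True or the timeout expires.

    Returns True if the data became available, False on timeout.
    `sleep` and `clock` can be replaced, e.g. for testing.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Since staged data are eventually deleted by the service, a client would normally start the download as soon as this function returns True.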

Slide22: 

Data Delivery
The snapshot retrieval request (getSnap web method) allows a client to retrieve a single raw simulation file given an access reference or "acref" as returned by a prior simulation query.
Data can be delivered with various protocols such as HTTP, FTP, GridFTP…

Slide23: 

SNAP registration
- The SNAP service MUST be registered by providing the information
- Registration allows clients to use a central registry service to locate compliant simulation access services and select an optimal subset of services to query, based on the characteristics and functions of each service and the simulation data collections it serves.
- This part of the work is at the beginning of its development.

Slide24: 

People around SNAP
- C. Gheller and I. Girotto (CINECA)
- U. Becciani, A. Costa, V. Costa (OACT)
- F. Pasian, R. Smareglia, V. Manna, L. Smareglia, P. Manzato, G. Taffoni (OATS)
- G. Lemson (MPI Munich)
- L. Shaw (Cambridge)
- H. Wozniak (Lyon)
- P. Teuben (Maryland)
More information: http://www.ivoa.net