http://egee.hu/grid05/index.php?m=3
Grid Data Management : Gabor Hermann, based on a lecture by Simone Campana (LCG Experiment Integration and Support, CERN IT)
EGEE is a project funded by the European Union under contract IST-2003-508833
www.eu-egee.org
Overview :
Introduction on Data Management (DM)
General Concepts
Some details on transport protocols
Data management operations
Files & replicas: Name Convention
File catalogs
Cataloging requirements and catalogs in egee/LCG
RLS file catalog
LCG file catalog
DM tools: overview
Data Management CLI
lcg_utils
Data Management API
lcg_utils
GFAL
Advanced concepts
Advanced utilities: CLI & APIs
OutputData JDL attribute
Conclusions
Data Management: general concepts :
What does 'Data Management' mean?
Users and applications produce and require data
Data may be stored in Grid files
Granularity is at the 'file' level (no data 'structures')
Users and applications need to handle files on the Grid
Files are stored in appropriate permanent resources called 'Storage Elements' (SE)
Present at almost every site, together with computing resources
We will treat a storage element as a 'black box' where we can store data
Appropriate data management utilities/services hide internal structure of SE
Appropriate data management utilities/services hide details on transfer protocols
Data Management: general concepts :
A Grid file is READ-ONLY (at least in egee/LCG)
It cannot be modified
It can be deleted (so it can be replaced)
Files are heterogeneous (ascii, binary …)
High level Data Management tools (lcg_utils, see later) hide
transport layer details (protocols …)
Storage location
To use lower level tools (edg-gridftp, see later) you need
some knowledge of the transport layer
some knowledge of Storage Element implementation
Some details on protocols :
Data channel protocol: mostly gridFTP (gsiftp)
secure and efficient data movement
extends the standard FTP protocol
Public-key-based Grid Security Infrastructure (GSI) support
Third-party control of data transfer
Parallel data transfer
Other protocols are available, especially for File I/O
rfio protocol:
for CASTOR SE (and classic SE)
Not yet GSI enabled
gsidcap protocol:
for secure access to dCache SE
file protocol:
for local file access
Other Control Channel Protocols (SRM, discussed in SE lecture … )
Data Management operations :
Upload a file to the Grid
[Figure: several Grid components (SEs and CEs)]
Users need to store data in an SE (from a UI)
Applications need to store data in an SE (from a CE)
Users need to store the application itself (to be retrieved and run on a CE)
For small files the InputSandbox can be used (see WMS lecture)
Data Management operations :
Download files from the Grid
[Figure: several Grid components (SEs and CEs)]
Users need to retrieve (onto the UI) data stored in an SE
For small files produced on the WN the OutputSandbox can be used (see WMS lecture)
Applications need to copy data locally (onto the CE) and use them
The application itself must be downloaded onto the CE and run
Data Management operations :
Replicate a file across different SEs
[Figure: several Grid components (SEs and CEs)]
Load sharing and balancing of computing resources
Often a job needs to run at a site where a copy of the input data is present
See InputData JDL attribute in WMS lecture
Performance improvement in data access
Several applications might need to access the same file concurrently
Important for redundancy of key files (backup)
Slide11 :
One of the basic ideas of LCG:
Bring the small programs close to the big files
Asymmetry in JDL:
In some situations it is the user's task to copy the Grid files listed in InputData to the CE
The JDL supports creating Grid files from local files via OutputData
Data management operations :
Data Management means movement and replication of files across/on Grid elements
Grid DM tools/applications/services can be used for all kinds of files
HOWEVER
Data Management focuses on 'large' files
'Large' means greater than ~20 MB
Typically on the order of a few hundred MB
Tools/applications/services are optimized to deal with large files
In many cases, small files can be efficiently treated using different procedures
Examples:
User can ship data to be used by the application on the WN (and possibly the application itself) using the InputSandbox (see WMS lecture)
User can retrieve (on the UI) data generated by a job (on the WN) using the OutputSandbox (see WMS lecture)
Files & replicas: Name Conventions :
Logical File Name (LFN)
An alias created by a user to refer to some item of data, e.g. 'lfn:cms/20030203/run2/track1'
Globally Unique Identifier (GUID)
A non-human-readable unique identifier for an item of data, e.g.
'guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6'
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
The location of an actual piece of data on a storage system, e.g.
'srm://pcrd24.cern.ch/flatfiles/cms/output10_1' (SRM)
'sfn://lxshare0209.cern.ch/data/alice/ntuples.dat' (Classic SE)
Transport URL (TURL)
Temporary locator of a replica + access protocol: understood by a SE, e.g.
'rfio://lxshare0209.cern.ch//data/alice/ntuples.dat'
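The four kinds of name can be told apart by their prefix. The following is a minimal sketch of that idea, not part of lcg_utils or GFAL: the `NameKind` enum, the `classify` function and the prefix table are hypothetical, and the table is only an assumption built from the examples and protocols mentioned in these slides.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the name kinds described on the slide.
enum class NameKind { LFN, GUID, SURL, TURL, Unknown };

struct Prefix { const char* text; NameKind kind; };

// Classify a Grid file name by its scheme prefix.
// Assumption: the prefixes below are taken from the examples in the slides.
NameKind classify(const std::string& name) {
    static const Prefix prefixes[] = {
        {"lfn:", NameKind::LFN},
        {"guid:", NameKind::GUID},
        {"srm://", NameKind::SURL},    // SRM-managed SE
        {"sfn://", NameKind::SURL},    // Classic SE
        {"rfio://", NameKind::TURL},
        {"gsiftp://", NameKind::TURL},
        {"gsidcap://", NameKind::TURL},
    };
    for (const Prefix& p : prefixes) {
        const std::string prefix(p.text);
        // compare() on a too-short name yields "not equal", so no bounds check is needed
        if (name.compare(0, prefix.size(), prefix) == 0) return p.kind;
    }
    return NameKind::Unknown;
}
```

For example, `classify("lfn:cms/20030203/run2/track1")` yields `NameKind::LFN`, while a bare local file name yields `NameKind::Unknown`.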
File Catalogs :
At this point you should ask:
How do I keep track of all my files on the Grid?
Even if I remember all the LFNs of my files, what about someone else's files?
Anyway, how does the Grid keep track of the LFN/GUID/SURL associations?
Well… we need a FILE CATALOGUE
Cataloging Requirements :
Need to keep track of the location of copies (replicas) of Grid files
Replicas might be described by attributes
Support for METADATA
Could be 'system' metadata or 'user' metadata
Potentially, millions of files need to be registered and located
Requirement for performance
Distributed architecture might be desirable
scalability
prevent single-point of failure
Site managers need to be able to change file locations autonomously
File Catalogs in egee/LCG :
Access to the file catalog
The DM tools and APIs and the WMS interact with the catalog
Hide catalogue implementation details
Lower level tools allow direct catalogue access
EDG’s Replica Location Service (RLS)
Catalogs in use in LCG-2
Replica Metadata Catalog (RMC) + Local Replica Catalog (LRC)
Some performance problems detected during LCG Data Challenges
New LCG File Catalog (LFC)
Already being certified; deployment in January 2005
Coexistence with RLS and migration tools provided
Better performance and scalability
Provides new features: security, hierarchical namespace, transactions...
Overview of File catalogues :
[Figure: overview of the file catalogues]
File Catalogs: The RLS :
RMC:
Stores LFN-GUID mappings
Accessible by edg-rmc CLI + API
LRC:
Stores GUID-SURL mappings
Accessible by edg-lrc CLI + API
[Figure: DM tools accessing the RMC and the LRC]
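Resolving an LFN to its replicas therefore takes two lookups: LFN to GUID in the RMC, then GUID to SURLs in the LRC. The sketch below illustrates this with two in-memory maps; `ToyRls` is a hypothetical stand-in and does not use the real edg-rmc or edg-lrc APIs, which talk to remote services.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of the two RLS catalogs (an assumption for illustration only).
struct ToyRls {
    std::map<std::string, std::string> rmc;                // LFN  -> GUID
    std::map<std::string, std::vector<std::string> > lrc;  // GUID -> SURLs

    // Resolve an LFN to its replicas: RMC lookup first, then LRC lookup.
    // Returns an empty list if either mapping is missing.
    std::vector<std::string> replicas(const std::string& lfn) const {
        std::map<std::string, std::string>::const_iterator g = rmc.find(lfn);
        if (g == rmc.end()) return std::vector<std::string>();
        std::map<std::string, std::vector<std::string> >::const_iterator r =
            lrc.find(g->second);
        if (r == lrc.end()) return std::vector<std::string>();
        return r->second;
    }
};
```

A file whose LFN is registered in the RMC but whose GUID has no entry in the LRC resolves to no replicas, which is one way the two catalogs can drift apart.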
File Catalogs: The LFC :
One single catalog
LFN acts as main key in the database. It has:
Symbolic links to it (additional LFNs)
Unique Identifier (GUID)
System metadata
Information on replicas
One field of user metadata
File Catalogs: The LFC (II) :
Fixes performance and scalability problems seen in EDG Catalogs
Cursors for large queries
Timeouts and retries from the client
Provides more features than the EDG Catalogs
User exposed transaction API (+ auto rollback on failure of mutating method call)
Hierarchical namespace and namespace operations (for LFNs): /grid/<VO>/…
Integrated GSI Authentication + Authorization
Access Control Lists (Unix Permissions and POSIX ACLs)
Checksums
Interaction with other components
Supports Oracle and MySQL database backends
Integration with GFAL and lcg_util APIs complete
New specific API provided
New features will be added (requests welcome!)
ROOT Integration in progress
POOL Integration will be provided soon
VOMS will be integrated
LFC commands :
Summary of the LFC Catalog commands
LFC C API :
Low level methods (many POSIX-like):
lfc_access
lfc_aborttrans
lfc_addreplica
lfc_apiinit
lfc_chclass
lfc_chdir
lfc_chmod
lfc_chown
lfc_closedir
lfc_creat
lfc_delcomment
lfc_delete
lfc_deleteclass
lfc_delreplica
lfc_endtrans
lfc_enterclass
lfc_errmsg
lfc_getacl
lfc_getcomment
lfc_getcwd
lfc_getpath
lfc_lchown
lfc_listclass
lfc_listlinks
lfc_listreplica
lfc_lstat
lfc_mkdir
lfc_modifyclass
lfc_opendir
lfc_queryclass
lfc_readdir
lfc_readlink
lfc_rename
lfc_rewind
lfc_rmdir
lfc_selectsrvr
lfc_setacl
lfc_setatime
lfc_setcomment
lfc_seterrbuf
lfc_setfsize
lfc_starttrans
lfc_stat
lfc_symlink
lfc_umask
lfc_undelete
lfc_unlink
lfc_utime
send2lfc
Slide24 :
Important environment variables:
export LCG_GFAL_INFOSYS=grid152.kfki.hu:2170 (must be set for each catalogue type)
export LCG_CATALOG_TYPE=lfc (must be set only for LFC)
export LFC_HOST=grid155.kfki.hu (must be set only for LFC)
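A program that relies on these variables can check them before calling any Data Management function. The sketch below is one possible check, not part of lcg_utils: the function name `missingCatalogEnv` is hypothetical, and it assumes that an unset `LCG_CATALOG_TYPE` means the EDG catalogs are in use, so `LFC_HOST` is only required when the type is `lfc`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Return the names of required variables that are missing from the environment.
// Assumption: LFC_HOST is only needed when LCG_CATALOG_TYPE is set to "lfc".
std::vector<std::string> missingCatalogEnv() {
    std::vector<std::string> missing;
    // The BDII endpoint is needed whatever the catalogue type
    if (std::getenv("LCG_GFAL_INFOSYS") == NULL)
        missing.push_back("LCG_GFAL_INFOSYS");
    const char* type = std::getenv("LCG_CATALOG_TYPE");
    if (type != NULL && std::string(type) == "lfc" && std::getenv("LFC_HOST") == NULL)
        missing.push_back("LFC_HOST");
    return missing;
}
```

Calling this at start-up and printing the returned names gives a clearer failure than an error from deep inside a catalogue call.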
DM CLIs & APIs overview :
[Figure: map of DM CLIs and APIs. Labels: User Tools; Data Management (Replication, Indexing, Querying); Cataloging (EDG, LFC); Storage (SRM, Classic SE); Data transfer (GridFTP, bbFTP); File I/O (RFIO, DCAP). Tools shown: lcg_utils (CLI + C API), edg-rm (CLI + API), GFAL C API, edg-rmc and edg-lrc (CLI + API), rfio API, dcap API, edg-gridftp, Globus API, bbFTP API, SRM API]
SRM Storage Management :
[Figure: SRM Storage Management]
Data management tools :
Replica manager: lcg-* commands + lcg_* API
Provide (all) the functionality needed by the egee/LCG user
Combine file transfer and cataloging as an atomic transaction
Ensure consistent operations on catalogues and storage systems
Offers high level layer over technology specific implementations
Based on the Grid File Access Library (GFAL) API
Discussed in SE section
edg-gridftp tools: CLI
Complement lcg_utils with GridFTP operations
Lower level layer w.r.t. Replica Manager
Only for gridFTP protocol
Functionality available in GFAL
May be implemented as lcg-* commands
DM CLIs & APIs: Old EDG tools :
Old versions of EDG CLIs and APIs still available
File & replica management
edg-rm
Implemented (mostly) in Java
Catalog interaction (only for EDG catalogs)
edg-lrc
edg-rmc
Java and C++ APIs
Use discouraged
Worse performance (slower)
New features added only to lcg_utils
Less general than GFAL and lcg_utils
Gathering information: lcg-infosites :
Not really a Data Management tool
Wrapper around Information System Client
Very useful to discover resources
Storage Elements
Catalog end points
(…)
Usage: lcg-infosites --vo voname option [--is BDII] [--help]
Possible options: se, ce, closeSE, lrc, rmc, all
--vo field is mandatory
--is : specifies the BDII to query
If the flag is not used, the BDII defined in the LCG_GFAL_INFOSYS environment variable is used
Try the --help flag for a list of possible options
lcg-utils commands :
Replica Management
File Catalog Interaction
Gathering information: lcg-infosites :
[scampana@grid019:~]$ lcg-infosites --vo gilda se
*************************************************************
These are the related data for gilda: (in terms of SE)
*************************************************************
Avail Space(Kb) Used Space(Kb) SEs
----------------------------------------------------------
1570665704 576686868 grid3.na.astro.it
225661244 1906716 grid009.ct.infn.it
523094840 457000 grid003.cecalc.ula.ve
1570665704 576686868 testbed005.cnaf.infn.it
15853516 1879992 gilda-se01.pd.infn.it
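Output like the listing above can be post-processed to choose a destination SE. The sketch below picks the SE with the most available space; it is only an illustration, the function name `largestSe` is hypothetical, and it assumes the three-column layout (available space, used space, host name) shown in this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Return the SE with the most available space, or "" if no data row is found.
// Assumption: data rows look like "<avail Kb> <used Kb> <SE host name>".
std::string largestSe(const std::string& infositesOutput) {
    std::istringstream lines(infositesOutput);
    std::string line, best;
    long long bestAvail = -1;
    while (std::getline(lines, line)) {
        std::istringstream row(line);
        long long avail = 0, used = 0;
        std::string se;
        // Banner, header and separator lines fail the numeric extraction and are skipped
        if (row >> avail >> used >> se && avail > bestAvail) {
            bestAvail = avail;
            best = se;
        }
    }
    return best;
}
```

When two SEs report the same available space, the one listed first is kept, because the comparison is strict.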
lcg_utils CLI : usage example :
We have a local file on our UI in Catania
[scampana@grid019:~]$ ls -l important-file.txt
-rw-r--r-- 1 scampana users 19 Oct 31 17:09 important-file.txt
Upload the file to Naples (Italy)
[scampana@grid019:~]$ lcg-cr --vo gilda -l lfn:simone-important \
-d grid3.na.astro.it file://`pwd`/important-file.txt
guid:08d02e56-bdf6-4833-a4da-e0247c188242
The file is actually there …
[scampana@grid019:~]$ lcg-lr --vo gilda lfn:simone-important
sfn://grid3.na.astro.it/flatfiles/SE00/gilda/generated/2004-10-31/ \
file4c7c2ad6-4d93-4cd2-be24-bf4239f58208
… Let's replicate it to Merida now …
[scampana@grid019:~]$ lcg-rep --vo gilda \
-d grid003.cecalc.ula.ve lfn:simone-important
[scampana@grid019:~]$ lcg-lr --vo gilda lfn:simone-important
sfn://grid003.cecalc.ula.ve/flatfiles/SE00/gilda/generated/2004-10-31/ \
file39568d15-e873-4f17-9371-b8862ae77c36
sfn://grid3.na.astro.it/flatfiles/SE00/gilda/generated/2004-10-31/ \
file4c7c2ad6-4d93-4cd2-be24-bf4239f58208
Delete all the replicas in the storage elements
[scampana@grid019:~]$ lcg-del --vo gilda -a lfn:simone-important
[scampana@grid019:~]$ lcg-lr --vo gilda lfn:simone-important
lcg_lr: No such file or directory
IMPORTANT
The lcg_utils (both CLI and API described later) need to access
the Information System (BDII).
The name of the BDII host used by lcg_utils is specified in the
environment variable LCG_GFAL_INFOSYS
REMEMBER THAT, ESPECIALLY WHEN PERFORMING
DATA MANAGEMENT OPERATIONS FROM THE WN
lcg_utils API :
High-level data management C API
Same functionality as lcg_util command line tools
Single shared library
liblcg_util.so
Single header file
lcg_util.h
(+ linking against libglobus_gass_copy_gcc32.so)
lcg_utils: Replica management :
int lcg_cp (char *src_file, char *dest_file, char *vo, int nbstreams, char *conf_file, int insecure, int verbose);
int lcg_cr (char *src_file, char *dest_file, char *guid, char *lfn, char *vo, char *relative_path, int nbstreams, char *conf_file, int insecure, int verbose, char *actual_guid);
int lcg_del (char *file, int aflag, char *se, char *vo, char *conf_file, int insecure, int verbose);
int lcg_rep (char *src_file, char *dest_file, char *vo, char *relative_path, int nbstreams, char *conf_file, int insecure, int verbose);
int lcg_sd (char *surl, int reqid, int fileid, char *token, int oflag);
lcg_utils: Catalog interaction :
int lcg_aa (char *lfn, char *guid, char *vo, char *insecure, int verbose);
int lcg_gt (char *surl, char *protocol, char **turl, int *reqid, int *fileid, char **token);
int lcg_la (char *file, char *vo, char *conf_file, int insecure, char ***lfns);
int lcg_lg (char *lfn_or_surl, char *vo, char *conf_file, int insecure, char *guid);
int lcg_lr (char *file, char *vo, char *conf_file, int insecure, char ***pfns);
int lcg_ra (char *lfn, char *guid, char *vo, char *conf_file, int insecure);
int lcg_rf (char *surl, char *guid, char *lfn, char *vo, char *conf_file, int insecure, int verbose, char *actual_guid);
int lcg_uf (char *surl, char *guid, char *vo, char *conf_file, int insecure);
Available APIs : C APIs
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <string>
#include <stdio.h>
#include <errno.h>
// lcg_util is a C library. Since we write C++ code here, we need to
// use extern "C"
//
extern "C"
{
#include <lcg_util.h>
}
using namespace std;
/***************************************************************************************/
/* The following example code shows you how you can use the lcg_util API for */
/* replica management. We expect that you modify parts of this code in */
/* order to make it work in your environment. This is particularly indicated */
/* by ACTION, i.e. your action is required. */
/***************************************************************************************/
int main ()
{
  cout << "Data Management API Example " << endl;
  char *vo = (char*) "cms"; // ACTION: fill in your correct VO here: gilda !
  cout << "---------------------------------------------------" << endl;
Available APIs : Copy and Register
  // Copy a local file to the Storage Element and register it in RLS
  //
  char *localFile = (char*) "file:/tmp/test-file"; // ACTION: create a testfile
  char *destSE = (char*) "lxb0707.cern.ch"; // ACTION: fill in a specific SE
  char *actualGuid = (char*) malloc(50);
  int verbose = 2; // we use verbosity level 2
  int nbstreams = 8; // we use 8 parallel streams to transfer a file
  // Check the return code rather than errno, which may hold a stale value
  if (lcg_cr(localFile, destSE, NULL,
             NULL, vo, NULL, nbstreams,
             NULL, 0, verbose, actualGuid) != 0)
  {
    perror("Error in copyAndRegister");
    return -1;
  } else {
    cout << "We registered the file with GUID: " << actualGuid << endl;
  }
  cout << "---------------------------------------------------" << endl;
Available APIs : List Replicas
  // Call the listReplicas (lcg_lr) method and print the returned URLs
  //
  // The actualGuid does not contain the prefix "guid:". We add it here and
  // then use the new guid as a parameter to list replicas
  //
  std::string guid = "guid:";
  guid.insert(5,actualGuid);
  char **pfns = NULL; // filled in by lcg_lr with the list of replica names
  if (lcg_lr((char*) guid.c_str(), vo, NULL, 0, &pfns) != 0)
  {
    perror("Error in listReplicas");
    return -1;
  } else {
    cout << "PFN = " << pfns[0] << endl;
  }
  free(pfns);
  cout << "---------------------------------------------------" << endl;
Available APIs : Delete Replica
  // Delete the replica again
  //
  int rc = lcg_del((char*) guid.c_str(), 1, destSE, vo, NULL, 0, verbose);
  if(rc != 0)
  {
    perror("Error in delete");
    return -1;
  } else {
    cout << "Delete OK" << endl;
  }
  return 0;
}
Available APIs : Makefile used
CC = g++
GLOBUS_FLAVOR = gcc32
all: data-management
data-management: data-management.o
	$(CC) -o data-management \
	-L${GLOBUS_LOCATION}/lib -lglobus_gass_copy_${GLOBUS_FLAVOR} \
	-L${LCG_LOCATION}/lib -llcg_util -lgfal \
	data-management.o
data-management.o: data-management.cpp
	$(CC) -I ${LCG_LOCATION}/include -c data-management.cpp
clean:
	rm -rf data-management data-management.o
Grid File Access Library :
GFAL is a library to provide access to Grid files
File I/O, Catalog Interaction, Storage Interaction
Abstraction from specific implementations
Transparent interaction with the information service, the file catalogs…
Single shared library in threaded and unthreaded versions
libgfal.so, libgfal_pthr.so
Single header file
gfal_api.h
GFAL: Catalog API :
int create_alias (const char *guid, const char *lfn, long long size)
int guid_exists (const char *guid)
char *guidforpfn (const char *surl)
char *guidfromlfn (const char *lfn)
char **lfnsforguid (const char *guid)
int register_alias (const char *guid, const char *lfn)
int register_pfn (const char *guid, const char *surl)
int setfilesize (const char *surl, long long size)
char *surlfromguid (const char *guid)
char **surlsfromguid (const char *guid)
int unregister_alias (const char *guid, const char *lfn)
int unregister_pfn (const char *guid, const char *surl)
GFAL: Storage API :
int deletesurl (const char *surl)
int getfilemd (const char *surl, struct stat64 *statbuf)
int set_xfer_done (const char *surl, int reqid, int fileid, char *token, int oflag)
int set_xfer_running (const char *surl, int reqid, int fileid, char *token)
char *turlfromsurl (const char *surl, char **protocols, int oflag, int *reqid, int *fileid, char **token)
int srm_get (int nbfiles, char **surls, int nbprotocols, char **protocols, int *reqid, char **token, struct srm_filestatus **filestatuses)
int srm_getstatus (int nbfiles, char **surls, int reqid, char *token, struct srm_filestatus **filestatuses)
GFAL: File I/O API (I) :
int gfal_access (const char *path, int amode);
int gfal_chmod (const char *path, mode_t mode);
int gfal_close (int fd);
int gfal_creat (const char *filename, mode_t mode);
off_t gfal_lseek (int fd, off_t offset, int whence);
int gfal_open (const char * filename, int flags, mode_t mode);
ssize_t gfal_read (int fd, void *buf, size_t size);
int gfal_rename (const char *old_name, const char *new_name);
ssize_t gfal_setfilchg (int, const void *, size_t);
int gfal_stat (const char *filename, struct stat *statbuf);
int gfal_unlink (const char *filename);
ssize_t gfal_write (int fd, const void *buf, size_t size);
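The signatures above mirror the POSIX file calls, so familiar I/O loops should carry over. The sketch below reads a whole file with plain POSIX `open`/`read`/`close` so that it runs without GFAL installed; the helper name `readAll` is hypothetical, and the comments mark where the corresponding `gfal_*` call would go, on the assumption that it behaves like its POSIX counterpart.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Read a whole file into 'out' using POSIX calls.
// Assumption: swapping in gfal_open/gfal_read/gfal_close (same argument
// lists as shown on the slide) gives the Grid equivalent of this loop.
bool readAll(const char* path, std::string& out) {
    int fd = open(path, O_RDONLY, 0);            // gfal_open(path, O_RDONLY, 0)
    if (fd < 0) return false;
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)  // gfal_read(fd, buf, sizeof buf)
        out.append(buf, static_cast<size_t>(n));
    close(fd);                                   // gfal_close(fd)
    return n == 0;  // 0 is end of file, a negative value is a read error
}
```

The loop keeps reading until `read` returns 0, because a single call may return fewer bytes than requested.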
GFAL protocol of File Open :
[Figure: GFAL protocol of File Open]
GFAL: File I/O API (II) :
int gfal_closedir (DIR *dirp);
int gfal_mkdir (const char *dirname, mode_t mode);
DIR *gfal_opendir (const char *dirname);
struct dirent *gfal_readdir (DIR *dirp);
int gfal_rmdir (const char *dirname);
Advanced utilities: edg-gridftp :
Used for low level management of files/directories in SEs
edg-gridftp-exists TURL              Checks if a file/dir exists on an SE
edg-gridftp-ls TURL                  Lists a directory on an SE
globus-url-copy srcTURL dstTURL      Copies files between SEs
edg-gridftp-mkdir TURL               Creates a directory on an SE
edg-gridftp-rename srcTURL dstTURL   Renames a file on an SE
edg-gridftp-rm TURL                  Removes a file from an SE
edg-gridftp-rmdir TURL               Removes a directory on an SE
edg-gridftp example :
Create and delete a directory in a GILDA Storage Element
Other Advanced CLI & API :
globus-url-copy srcTURL destTURL
low level file transfer
Interaction with RLS components
edg-lrc command (actions on LRC)
edg-rmc command (actions on RMC)
C++ and Java API for all catalog operations
http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-lrc-devguide.pdf
http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-rmc-devguide.pdf
Using low level CLI and API is STRONGLY discouraged
Risk: loss of consistency between SEs and catalogues
REMEMBER: a file is on the Grid only if it is BOTH:
stored in a Storage Element
registered in the file catalog
OutputData JDL attribute :
Same as lcg-cr command
OutputData JDL attribute specifies files to be copied and registered into the Grid
The filename (OutputData) is compulsory
If no LFN specified (LogicalFileName), none is set!
If no SE specified (StorageElement), the default SE is chosen ($VO_<VO>_DEFAULT_SE)
At the end of the job the files are moved from WN and registered
OutputData = {
  [ OutputFile = "toto.out"; StorageElement = "adc0021.cern.ch"; LogicalFileName = "lfn:theBestTotoEver"; ],
  [ OutputFile = "toto2.out"; StorageElement = "adc0021.cern.ch"; LogicalFileName = "lfn:theBestTotoEver2"; ]
};
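When many jobs are generated by a script, the OutputData attribute can be produced from a list of entries. The sketch below is a hypothetical helper, not part of any EGEE/LCG tool: `OutputEntry` and `renderOutputData` are illustrative names, and the only rules it encodes are the ones stated above (OutputFile is compulsory, StorageElement and LogicalFileName are optional).

```cpp
#include <cassert>
#include <string>
#include <vector>

// One entry of the OutputData attribute (hypothetical helper type).
struct OutputEntry {
    std::string outputFile;       // compulsory
    std::string storageElement;   // optional: empty means "use the default SE"
    std::string logicalFileName;  // optional: empty means "no LFN is set"
};

// Render the entries as a single-line OutputData JDL attribute.
std::string renderOutputData(const std::vector<OutputEntry>& entries) {
    std::string jdl = "OutputData = {";
    for (std::size_t i = 0; i < entries.size(); ++i) {
        const OutputEntry& e = entries[i];
        jdl += (i == 0 ? " [ " : ", [ ");
        jdl += "OutputFile = \"" + e.outputFile + "\";";
        // Optional attributes are left out entirely when empty
        if (!e.storageElement.empty())
            jdl += " StorageElement = \"" + e.storageElement + "\";";
        if (!e.logicalFileName.empty())
            jdl += " LogicalFileName = \"" + e.logicalFileName + "\";";
        jdl += " ]";
    }
    jdl += " };";
    return jdl;
}
```

Leaving `storageElement` empty relies on the default SE being chosen at the end of the job, as described above.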
Summary :
We described the egee/LCG Data Management middleware components and tools
We described how to use the available CLIs
Use-case scenarios of Data Movement on Grid
We presented the available APIs
We showed an example use of the lcg_util library
Bibliography :
General egee/LCG information
EGEE Homepage
http://public.eu-egee.org/
EGEE’s NA3: User Training and Induction
http://www.egee.nesc.ac.uk/
LCG Homepage
http://lcg.web.cern.ch/LCG/
LCG-2 User Guide
https://edms.cern.ch/file/454439//LCG-2-UserGuide.html
GILDA
http://gilda.ct.infn.it/
GENIUS (GILDA web portal)
http://grid-tutor.ct.infn.it/
Bibliography :
Information on Data Management middleware
LCG-2 User Guide (chapters 3 and 6)
https://edms.cern.ch/file/454439//LCG-2-UserGuide.html
Evolution of LCG-2 Data Management. J-P Baud, James Casey.
http://indico.cern.ch/contributionDisplay.py?contribId=278&sessionId=7&confId=0
Globus 2.4
http://www.globus.org/gt2.4/
GridFTP
http://www.globus.org/datagrid/gridftp.html
GFAL
http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html
Bibliography :
Information on egee/LCG tools and APIs
Manpages (in UI)
lcg_utils: lcg-* (commands), lcg_* (C functions)
Header files (in $LCG_LOCATION/include)
lcg_util.h
CVS development (sources for commands)
http://isscvs.cern.ch:8180/cgi-bin/cvsweb.cgi/?hidenonreadable=1&f=u&logsort=date&sortby=file&hideattic=1&cvsroot=lcgware&path=
Information on other tools and APIs
EDG CLIs and APIs
http://edg-wp2.web.cern.ch/edg-wp2/replication/documentation.html
Globus
http://www-unix.globus.org/api/c/ , ...globus_ftp_client/html , ...globus_ftp_control/html