Wyatt rc

Status of SAM, JIM, SAM-Grid and Future Plans: 

Definition of terms
Some brief requirements
Status and deployments of SAM
Status and deployments of JIM
Near term developments for SAM and JIM
SAM-Grid: The work plan for the next 2 years
Data handling in context

Definition of terms: 

Data handling system: the complex of software tools and servers which tracks locations of collider data & derived data sets, defines sets of these data files, delivers specific sets to applications, and tracks processing information to desired levels.
SAM: DØ's data handling system. Depends on/uses:
  Central ORACLE database at Fermilab (central metadata repository)
  ENSTORE mass storage system at Fermilab (central data repository)
  Access to various file transfer protocols: bbftp, GridFTP, rcp, dccp, ..
  Other mass storage systems: HPSS, TSM, …
Jobs can be submitted to SAM via:
  direct SAM command (sam submit)
  d0tools (most usually)
  MCRunjob (in development)
  JIM (in testing)
Dataset definition tools:
  Web-based GUI
  Command line interface
  Dataset definition wizard (in development)

Definition of terms cont’d: 

JIM (Job Information and Management): component of SAM-Grid which provides for remote job submission and monitoring.
Data Grid: collection of computing & storage resources among which data files can be transferred by a general software service. The computers which run SAM stations today can be considered a data grid. (aka distributed data processing)
Compute Grid: collection of computing resources among which applications can be distributed by a general software service. The computers which run SAM stations and have installed JIM execution sites can be considered a compute grid. (aka Grid, DataGrid)
Database Grid: collection of computing resources which can share metadata from distributed, possibly disparate databases. We don’t have a functioning example of a database grid.

Some brief requirements for remote computing: 

Minimum remote data handling requirements:
  1) Distribute data files from the primary mass store to all computing systems in the experiment which install the data handling system. Route data in a configurable way.
  2) Receive for storage in the primary mass store simulated or derived data sets from remote institutions which have installed the DH system. Accommodate secondary storage locations.
  3) Track processing apps at central and remote stations.
Next level requirement: Distribute applications from arbitrary submission sites with the DØ code distribution available to generally available computing resources.
And then: Provide tools for the automatic scheduling of jobs and for prioritization: co-locate data and processing.
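
The "co-locate data and processing" requirement can be illustrated with a minimal sketch, assuming the scheduler knows which files each site already has cached. The function name pick_site, the site names, and the file names are all hypothetical; this is not how SAM or JIM actually choose a site, only one simple way to express the idea.

```python
from typing import Dict, Set


def pick_site(dataset: Set[str], cached: Dict[str, Set[str]]) -> str:
    """Return the site whose cache already holds the most files of the dataset."""
    if not cached:
        raise ValueError("no candidate sites")
    # Iterate over sorted names so that ties go to the alphabetically first site.
    return max(sorted(cached), key=lambda site: len(dataset & cached[site]))


# Hypothetical file and site names, for illustration only.
dataset = {"file_a", "file_b", "file_c"}
cached = {
    "site_1": {"file_a"},
    "site_2": {"file_a", "file_b", "file_x"},
}
best = pick_site(dataset, cached)
```

Here site_2 is chosen because it already holds two of the three requested files, so fewer files would have to be transferred before the job can start.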

Some brief requirements for remote computing: 

Maintain high availability of services
Minimize operational load
Provide for error handling:
  Worker node failure
  File transfer failure
  User application crashes/hangs/is killed by user/goes berserk
Tune systems for most common use cases / expected bottlenecks / …
Provide for system monitoring & debugging
Provide for user monitoring & accounting
Mitigate security risks
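
As one illustration of handling a file transfer failure, the sketch below retries a transfer a bounded number of times with an increasing delay. It is a generic pattern, not SAM's actual mechanism; transfer_with_retry and flaky_transfer are hypothetical names, and the transfer itself is simulated rather than using a real protocol such as GridFTP.

```python
import time
from typing import Callable


def transfer_with_retry(transfer: Callable[[], None],
                        attempts: int = 3,
                        delay_s: float = 0.0) -> int:
    """Call transfer() until it succeeds; return the number of attempts used."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            transfer()
            return attempt
        except OSError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the failure
            # Double the wait after every failed attempt.
            time.sleep(delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Simulated transfer that fails twice and then succeeds (illustration only).
calls = {"n": 0}


def flaky_transfer() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("simulated transfer failure")


used = transfer_with_retry(flaky_transfer, attempts=5)
```

Bounding the number of attempts keeps a persistently failing transfer from adding to the operational load; the final failure is raised so that it can be logged and monitored.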

Status and deployments of SAM: 

Operational 24/7 on the Fermilab central systems: online, reconstruction farm, d0mino, cab, clued0
Operational at Monte Carlo production sites
Operational at remote analysis sites: ~20 active, ~40 deployed
Coming on line for remote reconstruction sites: U Mich, U Wisc, others later, including Lyon?
Statistics:
  ~40000 projects at FNAL, ~3500 projects remote (since 1/1/03)
  ~200M events of collider data, > 350 TB stored at FNAL

Status and deployments of SAM: 

Current versions:
  Software suite: V5.0. Includes sam_station v4_2_1_43 (sophisticated ‘compute element’: routing, different caching strategies), new user api, new batch adapter
  Database schema: V4.9
Expected releases this autumn:
  Database schema V5.1: major changes to data_files table (many-to-many files to runs connection; metadata specification according to file type)
  Software suite V5.x: dbserver rewrite to take advantage of new schema
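
The many-to-many files-to-runs connection can be pictured with a small sketch. Only the table name data_files comes from the slides; the runs and data_files_runs tables, all column names, and the sample rows are assumptions made for illustration, and an in-memory SQLite database stands in for the central ORACLE database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_files (
    file_id   INTEGER PRIMARY KEY,
    file_name TEXT UNIQUE,
    file_type TEXT
);
CREATE TABLE runs (run_number INTEGER PRIMARY KEY);
-- Link table (hypothetical): one row per (file, run) pair.
CREATE TABLE data_files_runs (
    file_id    INTEGER REFERENCES data_files(file_id),
    run_number INTEGER REFERENCES runs(run_number),
    PRIMARY KEY (file_id, run_number)
);
""")

# Made-up sample data: one merged file spans two runs, one run has two files.
conn.executemany("INSERT INTO data_files VALUES (?, ?, ?)",
                 [(1, "merged.raw", "collider"), (2, "single.raw", "collider")])
conn.executemany("INSERT INTO runs VALUES (?)", [(1001,), (1002,)])
conn.executemany("INSERT INTO data_files_runs VALUES (?, ?)",
                 [(1, 1001), (1, 1002), (2, 1001)])


def runs_for_file(file_name: str) -> list:
    """Return all run numbers linked to a file, in ascending order."""
    rows = conn.execute(
        "SELECT l.run_number FROM data_files_runs l "
        "JOIN data_files f ON f.file_id = l.file_id "
        "WHERE f.file_name = ? ORDER BY l.run_number", (file_name,))
    return [row[0] for row in rows]


def files_for_run(run_number: int) -> list:
    """Return all file names linked to a run, in alphabetical order."""
    rows = conn.execute(
        "SELECT f.file_name FROM data_files_runs l "
        "JOIN data_files f ON f.file_id = l.file_id "
        "WHERE l.run_number = ? ORDER BY f.file_name", (run_number,))
    return [row[0] for row in rows]
```

With a link table of this kind, a merged file can belong to several runs and a run can own several files, which a single run column on the file record cannot express.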

Status and deployments of JIM: 

Current JIM version V1.0: job broker, execution and submission site software, job monitor, client software
Current push to deploy JIM V1.0:
  Deployment of demo last November: proof of principle
  Deployment of submission site on clued0 with execution site on CAB: immediately pointed out lack of viable model for job transfer (either input or output). Working on that now. (Note we can already submit SAM jobs to CAB from clued0 using d0tools.) Request: can job broker pick between clued0 and CAB based on queue status?
  Deployment of submission and execution sites at Wuppertal, IC, RAL, Lancaster. In progress or envisioned at several other sites: U Wisc, U Mich, GridKA, NIKHEF, …
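
The request for a broker that picks between clued0 and CAB based on queue status could look roughly like the sketch below. The QueueStatus record, its fields, the ranking rule (most free slots first, then shortest queue), and the numbers are all assumptions for illustration; the slides do not say how the JIM job broker ranks sites.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueStatus:
    site: str          # execution site name
    queued_jobs: int   # jobs waiting in the batch queue
    free_slots: int    # idle job slots available right now


def pick_execution_site(statuses: List[QueueStatus]) -> str:
    """Prefer the site with the most free slots, then the shortest queue."""
    if not statuses:
        raise ValueError("no execution sites reported")
    # The site name is the last tie-breaker so the choice is deterministic.
    best = min(statuses, key=lambda s: (-s.free_slots, s.queued_jobs, s.site))
    return best.site


# Made-up queue snapshots for the two sites named in the slides.
busy = [QueueStatus("clued0", queued_jobs=12, free_slots=0),
        QueueStatus("CAB", queued_jobs=40, free_slots=5)]
full = [QueueStatus("clued0", queued_jobs=12, free_slots=0),
        QueueStatus("CAB", queued_jobs=40, free_slots=0)]

choice_when_cab_has_slots = pick_execution_site(busy)
choice_when_both_full = pick_execution_site(full)
```

In the first snapshot CAB wins because it has idle slots; in the second both sites are full, so the shorter queue on clued0 decides.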

Near term developments for SAM and JIM: 

JIM: Complete deployment of GridFTP as WAN transfer protocol
SAM: Deploy new schema and accompanying dbserver improvements
JIM: Devise and implement viable input/output sandbox model
[Both: Bring CDF stations to full operational status]
SAM: Migrate to new ORACLE database server machine and new version of ORACLE
JIM: Migrate to new version of Globus software
Integrate remote analysis stations into regular metric reporting; define offline shift responsibilities wrt remote stations

SAM-Grid: The work plan for the next 2 years: 

Implement Schema Update I. (file_type, runs changes)
Equip job broker to automate MC production; understand issues in automating job distribution for re-processing and analysis.
Revise caching strategies. (local vs fileserving; merging operations; connections w/ other layers)
Implement Schema Update II. (processing requirements for jobs, group info)
Equip optimizers and job brokers to deal w/ info in Schema Update II. Sort out parallelization issues.
Implement Virtual Organization tools.
Implement Monitoring and Information server on the SAM side.
Provide for distributed database: two parts, file location info and processing info; equip servers for more autonomous operation.

SAM-Grid: The work plan for the next 2 years: 

Evaluate technology changes/upgrades
  Improvements for installation/config management?
  Move to VDT suite (production version of Condor, Globus, etc.)
  Possible CORBA replacements: WebServices?
  XML-based logging: will this be the way to go?
  Which solution for distributed DB’s?
Plan for interoperability
  Merge SAM catalog w/ other replica schemas? Follow example of DØ/CDF merge?
  Interoperation with other replica catalogs?
  GLUE schema for resource description; job description language
  Sam_batch_adapter technology
  Working with SRM’s: await outcome of caching strategy discussions
  Interactions of tools w/ data handling system: cf. mc_runjob & d0tools w/ JIM and CAF (CDF)
  VO organization issues
  Security issues (VO, file transfer, job submission)

Data handling in context: 

We have the ability to distribute data to remote locations, allowing institutions to utilize their local resources for analysis (sam station)
We have a system for job processing and analysis bookkeeping which can be deployed remotely as well (sam station, project master, user api, …)
We have the ability to interface to local storage and batch systems (HPSS, TSM, … ; LSF, PBS, FBS, Condor, SGE, …)
We see distinct opportunities to add automated job scheduling and remote submission capabilities

Data handling in context: 

For DØ, maintaining stable operation and current functionality is a priority.
The operational load remains a concern: we need more shifters, and any development project that would impose a larger operational load calls for a cost/benefit calculation.