ATLAS ppdg short

GRID Tools In ATLAS Production: 

- Production features
- First steps
- Plans

ATLAS Data Challenge: 

- ATLAS DC1 phase 1 starts this May
  - 10^8 generator events (all produced at CERN)
  - 10^7 Geant3 detector response events (atlsim framework)
  - 10^7 reconstructed events (Athena framework)
- Data production is for physics purposes (Trigger TDR)
- 2/3 of the data are produced outside CERN
- Production on a global scale: Asia, Australia, Europe and North America (19 countries)

ATLAS jobs: 

Four distinct classes:
- Event generation: CPU 100 SI95, output 20 kB; no database needed
- Detector simulation: CPU 16,000 SI95, output 2 MB; geometry parameter database needed (MySQL)
- Reconstruction: CPU 2000 SI95, output 500 kB; geometry, alignment and calibration databases
- Analysis: CPU 100 SI95, output 50 kB (Ntuple); many different event components (HES…), DB
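Combining these per-event output sizes with the DC1 phase 1 event counts (10^8 generator events, 10^7 simulated and 10^7 reconstructed events) gives a rough idea of the storage involved. This is a back-of-envelope sketch, assuming decimal units and that every event at a stage is written at the listed size; CPU is left out because the slide does not say how the SI95 figures are normalized.

```python
# Rough DC1 phase 1 storage estimate from the per-event output sizes
# and event counts quoted in this presentation.
# Assumption: decimal units (1 kB = 10**3 B, 1 MB = 10**6 B, 1 TB = 10**12 B).

KB = 10**3
MB = 10**6
TB = 10**12

# (stage, number of events, output size per event in bytes)
STAGES = [
    ("generation", 10**8, 20 * KB),
    ("simulation", 10**7, 2 * MB),
    ("reconstruction", 10**7, 500 * KB),
]


def stage_volumes(stages):
    """Return {stage: total output in TB} for each production stage."""
    return {name: n_events * size / TB for name, n_events, size in stages}


if __name__ == "__main__":
    volumes = stage_volumes(STAGES)
    for name, tb in volumes.items():
        print(f"{name:15s} {tb:6.1f} TB")
    print(f"{'total':15s} {sum(volumes.values()):6.1f} TB")
```

Under these assumptions the estimate comes to about 2 TB of generator output, 20 TB of simulation output and 5 TB of reconstruction output, so detector simulation dominates the volume as well as the CPU.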

GRID Tool Evaluation: 

Gradual implementation plan:
- Use whatever is available now: Magda, Pacman, Condor-G
- Suppose the tools are there (imitate them when necessary)
- Collect as many components as possible and try to classify them by their functionality
- Start with event generation and detector simulation (Atlsim+Dice)

Atlsim-Dice Production Status: 

- Objectivity, ROOT and MySQL interfaces are implemented
- Multi-processor runs with common input
  - Typical input may contain many thousands of similar physics events
  - Atlsim jobs can maintain a local DB to process the common input coherently
- Optimal processing time per job is about 24 hours
- Typical output file size for 170-320 events (with hits and digits) is 200-300 MB
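The file-size and job-length figures translate into per-event numbers and a job count. A sketch, assuming one output file per 24-hour job and taking the 10^7 detector simulation events of DC1 phase 1 as the target; both assumptions are ours, not stated on the slide.

```python
import math

# Figures from the Atlsim-Dice status slide.
EVENTS_PER_FILE = (170, 320)
FILE_SIZE_MB = (200, 300)
JOB_HOURS = 24  # optimal job length (not used in the arithmetic below)


def mb_per_event_range(events=EVENTS_PER_FILE, size_mb=FILE_SIZE_MB):
    """Smallest and largest plausible output size per event, in MB."""
    low = size_mb[0] / events[1]   # small file holding many events
    high = size_mb[1] / events[0]  # large file holding few events
    return low, high


def jobs_needed(total_events, events_per_job):
    """Number of jobs needed, assuming one output file per job."""
    return math.ceil(total_events / events_per_job)


if __name__ == "__main__":
    low, high = mb_per_event_range()
    print(f"output per event: {low:.2f} - {high:.2f} MB")
    for per_job in EVENTS_PER_FILE:
        print(f"{per_job} events/job -> {jobs_needed(10**7, per_job)} jobs")
```

This gives roughly 0.6-1.8 MB per event and, at 170-320 events per job, between 31,250 and 58,824 day-long jobs, which is why automated job generation matters.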

Pre-VDC Experience: 

- Recipes for producing the data (jobOptions, kumacs) have to be fully tested
- Preparing production recipes takes time and effort; the recipes encapsulate considerable knowledge and infrastructure dependencies
- Once you have the recipes, data production is straightforward
- After the data have been produced, what should we do with the developed recipes?
- Data are primary, recipes are secondary

Virtual Data Perspective: 

- Recipes are as valuable as the data
- Production recipes are virtual data
- Recipes are primary, data are secondary: if you have the recipes, you can reproduce the data
- Do not throw away the recipes; save them (in VDC)
- Recipes should be encapsulated in VD objects
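To make "recipes are primary" concrete, here is a hypothetical virtual-data object: a recipe that carries everything needed to rebuild its output, plus a content key so identical recipes are recognized as the same. The field names, the SHA-256 key and the toy `materialize` function are illustrative assumptions, not the VDC schema; the toy transformation just draws seeded random numbers in place of a real generator run.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass, field


@dataclass
class Recipe:
    """Hypothetical VD object: everything needed to rebuild the data."""
    transformation: str                  # e.g. "evgen" (illustrative name)
    version: str                         # application software version
    parameters: dict = field(default_factory=dict)

    def key(self) -> str:
        """Stable ID: recipes with equal content get equal keys."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()


def materialize(recipe: Recipe) -> list:
    """Toy transformation: the output depends only on the recipe."""
    rng = random.Random(recipe.parameters["seed"])
    return [rng.random() for _ in range(recipe.parameters["n_events"])]


if __name__ == "__main__":
    r = Recipe("evgen", "3.0.1", {"seed": 12345, "n_events": 3})
    data = materialize(r)
    # Discard the data, keep the recipe, rebuild on demand:
    assert materialize(r) == data
    print(r.key()[:12], data)
```

The design point is that the key is computed from the recipe, not the output file, so a catalogue indexed by it can answer "has this been produced?" and "how do I produce it again?" with the same entry.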

VDC Status in DC1: 

- Approved as an R&D activity (in parallel with the production scripts that do not use VDC)
- The templated jobOptions approach was used for generator event production
- The USA site (BNL) will use VDC for the simulation production transformation
- Participants from Canada and the UK expressed interest in using VDC-based scripts
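The templated jobOptions approach amounts to writing one jobOptions file with placeholders and filling it once per partition. A minimal sketch using Python's `string.Template`; the option names, the seed scheme and the output file naming are hypothetical placeholders, not real Athena or generator properties.

```python
from string import Template

# Hypothetical jobOptions template: option names are placeholders only.
JOB_OPTIONS = Template("""\
# jobOptions for run $run, partition $partition
RUN_NUMBER = $run
FIRST_EVENT = $first_event
N_EVENTS = $n_events
RANDOM_SEED = $seed
OUTPUT_FILE = "$output"
""")


def make_job_options(run, n_partitions, events_per_partition, base_seed):
    """Fill the template once per partition; returns a list of texts."""
    texts = []
    for partition in range(n_partitions):
        texts.append(JOB_OPTIONS.substitute(
            run=run,
            partition=partition,
            first_event=partition * events_per_partition + 1,
            n_events=events_per_partition,
            seed=base_seed + partition,  # illustrative seed scheme only
            output=f"gen.{run}.{partition:04d}.root",
        ))
    return texts


if __name__ == "__main__":
    print(make_job_options(2001, 2, 1000, 100)[1])
```

`Template.substitute` raises `KeyError` if any placeholder is left unfilled, which catches incomplete recipes before a job is submitted.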

Production Policies in VDC: 

- Allocation of unique event IDs, implementing the event ID allocation policy
- Allocation of random number seeds, providing a unified seed allocation policy
- Support for automatic generation of jobs
- Unique partition numbering
- Encapsulation of environment variables
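One way to picture these policies is as a few pure functions of (dataset, partition), so that every job derived from the same partition gets the same event IDs, seed and environment. The sketch assumes fixed-size partitions and a per-dataset seed offset; the scheme and the environment variable names are our illustration, not the actual VDC rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    events_per_partition: int
    seed_base: int            # hypothetical per-dataset seed offset
    first_event_id: int = 1


def event_id_range(ds, partition):
    """Contiguous, non-overlapping block of event IDs for one partition."""
    start = ds.first_event_id + partition * ds.events_per_partition
    return range(start, start + ds.events_per_partition)


def random_seed(ds, partition):
    """One seed per partition, unique within the dataset by construction."""
    return ds.seed_base + partition


def job_environment(ds, partition):
    """Explicit job environment (variable names are illustrative)."""
    ids = event_id_range(ds, partition)
    return {
        "DATASET": ds.name,
        "PARTITION": f"{partition:05d}",
        "FIRST_EVENT": str(ids.start),
        "N_EVENTS": str(len(ids)),
        "RANDOM_SEED": str(random_seed(ds, partition)),
    }


def generate_jobs(ds, n_partitions):
    """Automatic job generation: one job description per partition."""
    return [
        {"partition": p, "seed": random_seed(ds, p), "env": job_environment(ds, p)}
        for p in range(n_partitions)
    ]


if __name__ == "__main__":
    for job in generate_jobs(Dataset("dc1.test", 500, 10000), 2):
        print(job)
```

Uniqueness here only holds within one dataset and one allocator; guaranteeing it across many sites running in parallel is what a shared database backend adds.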

VDC database backend: 

VDC guarantees the uniqueness of:
- event IDs
- output PFNs
- random number seeds
This was difficult with a non-VDC “perl script” approach in a massively parallel production environment.
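The advantage of a database backend is that uniqueness is enforced in one place rather than by each script. The sketch below uses Python's built-in `sqlite3` as a stand-in for the VDC database (an assumption made only to keep the example self-contained): `UNIQUE` and primary-key constraints reject duplicate event blocks, seeds and PFNs, and `BEGIN IMMEDIATE` takes the write lock before the next free values are read. Table and column names are hypothetical.

```python
import sqlite3

# sqlite3 stands in for the VDC database backend in this sketch.
SCHEMA = """
CREATE TABLE partitions (
    dataset     TEXT    NOT NULL,
    part_no     INTEGER NOT NULL,
    first_event INTEGER NOT NULL UNIQUE,
    last_event  INTEGER NOT NULL UNIQUE,
    seed        INTEGER NOT NULL UNIQUE,
    pfn         TEXT    NOT NULL UNIQUE,
    PRIMARY KEY (dataset, part_no)
)
"""


def open_db(path=":memory:"):
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def allocate(conn, dataset, part_no, n_events, pfn):
    """Reserve the next event-ID block and a fresh seed for one partition."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        (last,) = conn.execute(
            "SELECT COALESCE(MAX(last_event), 0) FROM partitions").fetchone()
        (seed,) = conn.execute(
            "SELECT COALESCE(MAX(seed), 0) + 1 FROM partitions").fetchone()
        conn.execute(
            "INSERT INTO partitions VALUES (?, ?, ?, ?, ?, ?)",
            (dataset, part_no, last + 1, last + n_events, seed, pfn))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # a rejected allocation leaves no trace
        raise
    return last + 1, last + n_events, seed


if __name__ == "__main__":
    db = open_db()
    print(allocate(db, "dc1.simul", 0, 500, "f0.root"))
    try:
        allocate(db, "dc1.simul", 1, 500, "f0.root")  # duplicate PFN
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

A second allocation that reuses a PFN or a (dataset, partition) pair fails with `sqlite3.IntegrityError` instead of silently producing a clash, which is the property that is hard to get from independent scripts.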

VDC Integration in Production: 

The production system is extended in DC1 with features provided by a few “orthogonal” VDC components:
- Data reproducibility: SIGNATURE (application software version)
- Grid dimension: LOCATION (site)
- Application complexity: application CONFIGURATION
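"Orthogonal" can be read as: each derivation is a point in a three-dimensional space, and changing one coordinate does not touch the others. A small sketch of that reading; the `reproduces` rule (same signature and configuration give the same data regardless of site) is our interpretation of the slide, and the example values are illustrative.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Derivation:
    signature: str      # application software version -> reproducibility
    location: str       # site -> Grid dimension
    configuration: str  # application configuration -> complexity


def reproduces(a: Derivation, b: Derivation) -> bool:
    """Assumption: same signature and configuration give the same data,
    wherever the job runs."""
    return a.signature == b.signature and a.configuration == b.configuration


def all_derivations(signatures, locations, configurations):
    """Independent axes: the derivation space is their Cartesian product."""
    return [Derivation(sig, loc, cfg)
            for sig, loc, cfg in product(signatures, locations, configurations)]


if __name__ == "__main__":
    a = Derivation("3.0.1", "CERN", "simul-A")
    print(reproduces(a, replace(a, location="BNL")))  # moving site only
```

Keeping the three axes separate means a new site can be added without touching software versions or configurations, and vice versa.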

VDC Integration: 

VDC-based automatic “garbage collection”:
- Agents (jobs) get the next derivation from VDC
- After the data have been materialized, agents register “success” in VDC
- If a previous invocation has not completed within the specified timeout period, it is invoked again
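The garbage-collection loop behaves like a work queue with leases: handing out a derivation starts a timer, a registered success closes it, and an expired timer makes the derivation available again. A single-process, in-memory sketch with the clock passed in explicitly; a real shared catalogue would need database locking around `next_derivation`, and the class and state names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PENDING, RUNNING, DONE = "pending", "running", "done"


@dataclass
class Entry:
    state: str = PENDING
    started: float = 0.0


class DerivationQueue:
    """In-memory stand-in for VDC bookkeeping of derivations."""

    def __init__(self, names, timeout):
        self.timeout = timeout
        self.entries = {name: Entry() for name in names}

    def next_derivation(self, now) -> Optional[str]:
        """Hand out a pending derivation, or one whose run timed out."""
        for name, e in self.entries.items():
            expired = e.state == RUNNING and now - e.started > self.timeout
            if e.state == PENDING or expired:
                e.state, e.started = RUNNING, now
                return name
        return None

    def register_success(self, name):
        """Idempotent, so a late report from a timed-out run is harmless."""
        self.entries[name].state = DONE

    def all_done(self):
        return all(e.state == DONE for e in self.entries.values())


if __name__ == "__main__":
    q = DerivationQueue(["p0", "p1"], timeout=10)
    print(q.next_derivation(now=0), q.next_derivation(now=1))
    q.register_success("p1")
    print(q.next_derivation(now=20))  # p0 timed out and is handed out again
```

Because success is recorded only after materialization, a job that dies silently simply lets its lease expire, and no separate cleanup pass is needed.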