Presentation Transcript
Enabling Grid Computer for HEP
James Cunha Werner (jamwer@hep.man.ac.uk)
Babar Team at University of Manchester
Resources: www.hep.man.ac.uk/u/jamwer
Human resource strategy
* Jobs with 5 events instead of millions.
Resources Strategy
Grid Test Bed
Slide 5
Slide 6
Software: 850 packages.
Tau datasets: range between 60 files (1GB) and 150 files (1GB)
Total 4,000 GB ~ 10,000 files
Analysis Submission to Grid
Single command: ./easygrid dataset_name
Performs handler management and submission.
Software based on a state machine:
Verify that skimData is available:
If not available, run BbkDatasetTCL to generate skimData. Each file will be a job.
Verify whether there are handlers pending:
If not, generate the script (gera.c) with edg-job-submit and ClassAds, then execute it. Nest for submission policy and optimisation.
If yes, verify job status. When all jobs have ended, recover the results into the user folder. (Prototype)
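The decision flow on this slide can be sketched as a small state machine. This is only an illustrative Python sketch, not the easygrid source: the names `Step` and `plan`, and the idea of passing job statuses in as a dictionary, are assumptions made for the example.

```python
from enum import Enum, auto


class Step(Enum):
    """Hypothetical action names for one ./easygrid invocation."""
    GENERATE_SKIMDATA = auto()   # run BbkDatasetTCL, one job per file
    SUBMIT_JOBS = auto()         # generate and execute the submission script
    WAIT = auto()                # some jobs are still pending
    RECOVER_RESULTS = auto()     # all jobs ended, fetch the output


def plan(skimdata_available, handlers, statuses):
    """Return the actions to take, following the order on the slide.

    handlers: job handles saved by a previous submission (may be empty)
    statuses: mapping handle -> status string such as "Scheduled" or "Done"
    """
    actions = []
    if not skimdata_available:
        actions.append(Step.GENERATE_SKIMDATA)
    if not handlers:
        # No handlers pending: this is a fresh submission.
        actions.append(Step.SUBMIT_JOBS)
        return actions
    pending = [h for h in handlers if statuses.get(h) != "Done"]
    actions.append(Step.WAIT if pending else Step.RECOVER_RESULTS)
    return actions


if __name__ == "__main__":
    print(plan(True, [], {}))
    print(plan(True, ["job-a"], {"job-a": "Scheduled"}))
    print(plan(True, ["job-a"], {"job-a": "Done"}))
```

Running the same command repeatedly then walks through submit, wait and recover, which matches the three terminal sessions on the slides that follow.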
Generation and submission
[jamwer@bfb babar]$ ./easygrid SP-1005-Tau11-R14
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy ......................................................... Done
Creating proxy .................................................... Done
Searching pre selected skimdata.
Searching previous handlers.
Handlers not found. Submiting to GRID . Wait end of process...
Job Status
[jamwer@bfb babar]$ ./easygrid SP-1005-Tau11-R14
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy ............................ Done
Creating proxy ............................... Done
Searching pre selected skimdata.
Searching previous handlers. Checking if jobs finished.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Scheduled
https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg still pendent.
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Scheduled
https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA still pendent.
4 jobs did not finished ! Try again later.
Job Status and recovery
[jamwer@bfb babar]$ ./easygrid SP-1005-Tau11-R14
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy .......................................... Done
Creating proxy ........................................................... Done
Searching pre selected skimdata. Searching previous handlers.
Checking if jobs finished.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Done
Exit code: 0
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Done
Exit code: 0
0 jobs did not finished ! Try again later.
All jobs done. Recovering results in your folder.
Results in the following folders:
/home/jamwer/grid_sub/babar/jamwer_foRHhWyeDBnbqA9JkDADLg
/home/jamwer/grid_sub/babar/jamwer_8DdK3xruxtevNpei3zZbaA
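The status listings in these sessions follow a regular pattern ("### Handle -> ..." followed by "Current Status: ..."), so the pending check can be sketched as a small parser. This is a hedged illustration in Python, not the tool's own code: the function names `parse_status` and `pending_handles` are hypothetical, and treating every status other than "Done" as pending is an assumption taken from the logs shown here.

```python
import re

# Patterns copied from the log format shown on the slides.
HANDLE_RE = re.compile(r"^### Handle -> (\S+)")
STATUS_RE = re.compile(r"^Current Status:\s*(\S+)")


def parse_status(text):
    """Map each job handle in a status listing to its reported status."""
    statuses = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = HANDLE_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = STATUS_RE.match(line)
        if match and current is not None:
            statuses[current] = match.group(1)
    return statuses


def pending_handles(statuses):
    """Handles whose status is anything other than "Done" (assumption)."""
    return [handle for handle, status in statuses.items() if status != "Done"]


SAMPLE = """\
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Scheduled
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Done
Exit code: 0
"""

if __name__ == "__main__":
    for handle in pending_handles(parse_status(SAMPLE)):
        print(handle, "still pending")
```

A production version would also need to distinguish aborted jobs from jobs that are merely waiting, which this sketch does not attempt.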
Monte Carlo Submission to Grid
Single command: ./mcgrid JobName num_copies
Performs handler management and submission.
Software based on a state machine:
Verify whether there are handlers pending:
If not, generate the script (geramc.c) with edg-job-submit and ClassAds for each copy, then execute it. Nest for submission policy and optimisation.
If yes, verify job status. When all jobs have ended, recover the results into the user folder. (Prototype)
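To make the "one ClassAd per copy" step concrete, here is a minimal Python sketch that writes one JDL description per Monte Carlo copy. What geramc.c actually emits is not shown in the slides, so everything here is an assumption for illustration: the function names, the wrapper script name `run_mc.sh`, the file naming scheme, and the choice of attributes (the attribute names themselves are the usual JDL ones).

```python
def make_jdl(job_name, copy_index, executable="run_mc.sh"):
    """Build the JDL text for one copy. `run_mc.sh` is a hypothetical wrapper."""
    out = f"{job_name}_{copy_index}.out"
    err = f"{job_name}_{copy_index}.err"
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{job_name} {copy_index}";',
        f'StdOutput = "{out}";',
        f'StdError = "{err}";',
        f'InputSandbox = {{"{executable}"}};',
        f'OutputSandbox = {{"{out}", "{err}"}};',
    ]
    return "\n".join(lines) + "\n"


def make_all(job_name, num_copies):
    """Return a mapping of JDL file name -> JDL text, one entry per copy."""
    return {
        f"{job_name}_{i}.jdl": make_jdl(job_name, i)
        for i in range(1, num_copies + 1)
    }


if __name__ == "__main__":
    # Mirrors the command line used on the slides: ./mcgrid MCteste 3
    for name, text in make_all("MCteste", 3).items():
        print(f"--- {name}")
        print(text)
```

Each generated file would then be handed to edg-job-submit by the submission script; that call is left out here because its exact invocation is not given in the source.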
MC Submission
[jamwer@bfb mcgrid1]$ ./mcgrid MCteste 3
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy ................................. Done
Creating proxy ....................................................... Done
Searching previous handlers. Handlers not found.
Submiting to GRID . Wait end of process...
Job Status
[jamwer@bfb mcgrid1]$ ./mcgrid MCteste 3
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy ........................................ Done
Creating proxy ....................................... Done
Searching previous handlers. Checking if jobs finished.
### Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
Current Status: Scheduled
https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw still pendent.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
Current Status: Ready
https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg still pendent.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA
Current Status: Ready
https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA still pendent.
3 jobs did not finished ! Try again later.
Job status and recovery
[jamwer@bfb mcgrid1]$ ./mcgrid MCteste 3
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy .................................................. Done
Creating proxy .................................................... Done
Searching previous handlers. Checking if jobs finished.
### Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
Current Status: Done
Exit code: 0
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
Current Status: Done
Exit code: 0
0 jobs did not finished ! Try again later.
All jobs done. Recovering results in your folder.
Results in the following folders:
/home/jamwer/grid_sub/mcgrid1/jamwer_9WzceoIMEQoTK24a-UvOmw
/home/jamwer/grid_sub/mcgrid1/jamwer_c4iCB8vioozaGteI9hybIg
/home/jamwer/grid_sub/mcgrid1/jamwer_L5BD1OE--eckTm5RXkp2nA
Testing Submission Script
Load range: worker load x number of files
16 x 60 files = 960 jobs pending
16 x 150 files = 2400 jobs pending
Test with submission script
* sslv3 alert handshake failure
** Please wait job enter the “Done” status. This never happens!
The Resource Broker is not reliable or robust. Sometimes it fails 3 days a week, or takes hours to submit/dispatch to a CE (even an empty one!).
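The load range figures on this slide are a simple product of worker load and dataset size. The short Python sketch below reproduces them; the function name `pending_jobs` is hypothetical, and reading "worker load" as the multiplier of 16 is taken directly from the slide.

```python
def pending_jobs(worker_load, num_files):
    """Jobs queued when each file is one job, repeated `worker_load` times."""
    return worker_load * num_files


if __name__ == "__main__":
    WORKER_LOAD = 16                 # value used on the slide
    for num_files in (60, 150):      # smallest and largest Tau dataset
        print(f"{WORKER_LOAD} x {num_files} files = "
              f"{pending_jobs(WORKER_LOAD, num_files)} jobs pending")
```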
Pending Infrastructure => Course of action
Babar software know-how is not available at Manchester => Web page & network skills.
Quality assurance => We are OK! from benchmark (E x P)
A real application to perform the complete cycle, acquire know-how, and serve as a grid proof-of-concept is missing => Partnership with physicists
CERN does NOT recognise the Babar community => Let's reduce their priority!
RB at Manchester => 60MB binaries and freedom over policies.
SE/RC at Manchester => freedom over policies and job submission.
Mass storage (10TB) for Babar purposes => CAP!
UI in the AFS => wide access to Manchester farms.
Apprenticeship at RAL and later at SLAC – production and experiment => improve where others fail
Configuration for optimal job performance/submission at Tier 2 (1 CE x 50 WN? Performance of dCache with Babar software? Why 10TB if Liverpool bought 80TB? Electricity bill?) => analyse procedures to improve QoS and obtain a better site configuration
Update (software and data) and operational policies => operational standards to achieve high QoS
Aimed Hardware Architecture
(Redundant RB with alternate access)
Aimed Software Architecture
Production Job Submission Package
Operational policies/integration with RB (application level).
Recovery of aborted status.
Resources optimisation.
Integration with RC (application level) for replicas policies development.
Interactive data visualisation (Useful?)
Integration with GridSite (Data visualisation, analysis, performance monitor, and submission)
Professional version.
Summary
Integrate LCG2 and Job Submission with Babar/CM2 at University of Manchester for Tau Physics modelling, analysis and MC generation.
We aim to be soon…
The largest site in the UK.
A leader in grid computing and HEP.
Conclusion
Babar CM2 is running at Manchester!
The LCG2 Grid is running with a real-world experiment!
The Babar submission prototype for the Grid is running!
LCG is not only LHC software! It is Babar’s too.
We are doing today what will take you years to achieve. Let's work together!