Presentation Transcript
History of the National INFN Pool : P. Mazzanti, F. Semeria
INFN – Bologna (Italy)
European Condor Week 2006
Milan, 29-Jun-2006
Our first experience (1997) : Monte Carlo event generation.
WA92 experiment at CERN: beauty search in a fixed-target experiment.
Working conditions: a dedicated farm of 3 Alpha VMS and 6 DecStation Ultrix.
Results: 22000 events/day (0 dead time).
Then Condor came... : Production Condor Pool:
23 DEC Alpha:
  18 Bologna
  2 Cnaf (Bologna)
  2 Turin
  1 Rome
4 HP
6 DecStation Ultrix
5 Pentium Linux
Then Condor came… (cont.) : The throughput of the 23 Alpha subset of the pool:
75000 to 100000 events/day plus 15000 events/day with the pool in Madison.
We got x5 the production at zero cost!
Give me a calculator… : At INFN: 1000 PCs used 8 hours/day by their owners (16 hours/day idle)
1000 * 16 = 16000 hours = 1.8 years
1.8 CPU-years wasted each day!
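The arithmetic on this slide can be checked with a short sketch. The function name `idle_cpu_years_per_day` and the 365-day year are assumptions made for illustration; the PC count and idle hours are the figures from the slide.

```python
# Idle CPU time estimate using the slide's figures: 1000 PCs, each idle 16 hours/day.
HOURS_PER_YEAR = 24 * 365  # assumption: a 365-day year


def idle_cpu_years_per_day(n_pcs: int, idle_hours_per_day: float) -> float:
    """Return the CPU-years of idle time accumulated by all PCs in a single day."""
    idle_hours = n_pcs * idle_hours_per_day  # 1000 * 16 = 16000 hours
    return idle_hours / HOURS_PER_YEAR


print(round(idle_cpu_years_per_day(1000, 16), 1))  # 1.8
```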
The ‘Condor on WAN’ INFN Project : Approved by the Computing Committee in February 1998.
Goal: install Condor on the INFN WAN and evaluate its effectiveness for the INFN computational needs.
30 people involved.
The Condor INFN Project (cont.) : The INFN Structure
27 sites
More than 10 experiments in nuclear and sub-nuclear physics.
Hundreds of researchers involved.
Distributed and heterogeneous resources.
(a good framework for a grid…)
The Condor INFN Project (cont.) : The first example in Europe of a national distributed computing environment
Collaboration : INFN and Computer Science Dept. of the University of Wisconsin, Madison
Coordinators for the project:
for Madison: Miron Livny
for INFN: Paolo Mazzanti.
General usage policy : Each group of people must be able to maintain full control over their own machines.
General usage policy (cont.) : A Condor job submitted from a group's machine must have the highest access priority on the machines of that same group.
Subpools : Rank expression: a resource owner can give priority to requests from selected groups:
GROUP_ID = "My_Group"
RANK = target.GROUP_ID == "My_Group"
From the group's point of view, its machines form a pool of their own: a subpool.
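The effect of such a rank expression can be illustrated with a toy model in Python. This is only a sketch of the idea, not Condor's matchmaking: the names `rank` and `pick_job` and the job dictionaries are hypothetical, and user priorities, requirements and preemption are ignored.

```python
# Toy model of a machine-side rank expression such as
#   RANK = target.GROUP_ID == "My_Group"
# All names here are hypothetical; real Condor matchmaking does much more.


def rank(machine_group: str, job_group: str) -> float:
    """Return 1.0 when the job comes from the machine's own group, else 0.0."""
    return 1.0 if job_group == machine_group else 0.0


def pick_job(machine_group: str, queued_jobs: list) -> dict:
    """Pick the highest-ranked queued job; ties keep the original queue order."""
    # max() returns the first maximal element, so earlier jobs win ties.
    return max(queued_jobs, key=lambda job: rank(machine_group, job["GROUP_ID"]))


jobs = [
    {"id": 1, "GROUP_ID": "Other_Group"},
    {"id": 2, "GROUP_ID": "My_Group"},
    {"id": 3, "GROUP_ID": "My_Group"},
]
print(pick_job("My_Group", jobs)["id"])     # 2: a job from the owner's group goes first
print(pick_job("Third_Group", jobs)["id"])  # 1: no group match, queue order decides
```

In this model, a machine serves its own group's jobs first and falls back to everyone else's otherwise, which is what makes each group's machines behave like a subpool.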
Checkpoint Server Domains : The network could be a concern with a computing environment distributed over a WAN.
Policy: a job should run in its own checkpoint (ckpt) domain if local resources are available.
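The policy can be sketched as a simple preference rule. The slides do not say how it was configured in Condor, so this is only a toy illustration; the function name `choose_machine`, the `(name, domain)` tuples and the site names in the example are hypothetical.

```python
# Toy illustration of the checkpoint-domain policy: prefer an idle machine in
# the job's own checkpoint domain, and go elsewhere only if none is available.
# All names are hypothetical; this is not how Condor itself is configured.


def choose_machine(job_domain: str, idle_machines: list):
    """idle_machines is a list of (machine_name, ckpt_domain) tuples."""
    for name, domain in idle_machines:
        if domain == job_domain:
            return name  # a local resource is available: stay in the domain
    # No idle machine in the job's domain: take the first idle one, if any.
    return idle_machines[0][0] if idle_machines else None


idle = [("m1", "torino"), ("m2", "bologna")]
print(choose_machine("bologna", idle))  # m2
print(choose_machine("pavia", idle))    # m1
```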
The INFN-WAN Pool (2000) :
The INFN-WAN Pool (2002) :
ALPHA/OSF1 107
INTEL/LINUX 122
SUN/SOLARIS 6
INTEL/WNT 1
Total 235
INFN Condor Pool Allocation Time (Hours) (1999) :
Applications : Simulation of the CMS detector.
MC event production for CMS.
Simulation of Cherenkov light in the atmosphere (CLUE).
MC integration in perturbative QCD.
Dynamic chaotic systems.
Extrasolar planet orbits.
Stochastic differential equations.
Maxwell equations.
Simulation of Cherenkov light in the atmosphere (CLUE) : Without Condor (1 Alpha): 20000 events/week.
With Condor: 350000 events in 2 weeks (gain: x9)
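The quoted gain follows from normalising both figures to events per week. A minimal sketch, where the function name `weekly_gain` is an assumption for illustration:

```python
# Throughput gain using the slide's figures: 350000 events in 2 weeks with
# Condor versus 20000 events/week on a single Alpha.


def weekly_gain(events_with_condor: float, weeks: float, baseline_per_week: float) -> float:
    """Return the ratio of weekly throughput with Condor to the baseline."""
    return (events_with_condor / weeks) / baseline_per_week


print(weekly_gain(350_000, 2, 20_000))  # 8.75, which the slide rounds to x9
```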
Dynamic chaotic systems : Computations based on complex matrices (multiplication, inversion, determinants, etc.).
Very CPU-bound with little output and no input.
Gains with Condor compared to the single Alpha otherwise used: x3.5 to x10.
MC integration in perturbative QCD : CPU-bound
No input, very small output
Gains with Condor: x10.
Maxwell Equations : 201 jobs, each with a different value of an input parameter.
Output: 401 numbers/job
Gains with Condor compared to the only Alpha available: x11
Slide22 :
People very very very happy!!
The Pool Today : 8 checkpoint servers: Bologna, Milano, Torino, Pavia, Trieste, Padova, LNGS, Napoli.
270 CPUs
45.5 CPU-years used from January to June 25th -> about 91 CPU-years/year
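The yearly figure amounts to doubling the usage of roughly half a year. The sketch below compares that with a day-exact extrapolation; the function name `annualized` and the choice of 2006 (the year of the talk) for the dates are assumptions.

```python
from datetime import date

# Annualising 45.5 CPU-years used between January 1st and June 25th.
# Assumption: the period is in 2006, the year the talk was given.


def annualized(cpu_years_used: float, start: date, end: date) -> float:
    """Extrapolate usage over [start, end] (inclusive) to a 365-day year."""
    days = (end - start).days + 1
    return cpu_years_used * 365 / days


print(45.5 * 2)  # 91.0, the half-year approximation used on the slide
exact = annualized(45.5, date(2006, 1, 1), date(2006, 6, 25))
print(round(exact, 1))  # slightly higher, since June 25th falls before mid-year
```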
Why does the pool not grow? : Why is Condor not installed on all PCs?
Is it difficult to install?
Is it difficult to use?
Is it difficult to maintain?
Do we prefer to buy new machines?
An automatic installation tool : Three types of installation:
Server: binaries and libraries only
Client: configuration files only
Full: client + server
RPM files are built
Web interface
http://www.bo.infn.it/calcolo/condor/infn-installation-tool-6.6.7.html
Server installation : Only binaries and libraries
Usually done on NFS or AFS servers, which export bin and lib to the clients
Client installation : Installs configuration files using data specified through the web interface
Creates startup and shutdown scripts for the Condor daemons
Adds the binaries path (from the ‘server’ installation) to the users' PATH
Full installation : Client + Server
The whole Condor distribution and the configuration files on the same machine
NFS and AFS are not required
Conclusion : The INFN Condor Pool has been the first ‘pre-grid’ wide area distributed computing system.
It is still used by people outside ‘big science’.
Conclusion (cont.) : BUT: why not Condor on each PC?
We did not find the answer in 10 years…