ALICE Physics Analysis Using the GRID: I. Belikov, for the ALICE Collaboration
ICHEP’06
July 26-August 2, 2006
Moscow, Russia
Slide2: level 0, 1 - special hardware
level 2 - embedded processors
level 3 (HLT) - PCs
Rates shown along the chain: 8 kHz (160 GB/sec), 200 Hz (4 GB/sec), 30 Hz (2.5 GB/sec), 30 Hz (1.25 GB/sec)
data recording & offline analysis
Total weight 10,000 t
Overall diameter 16.00 m
Overall length 25 m
Magnetic Field 0.5 T
ALICE Collaboration
~ 1/2 ATLAS, CMS, ~ 2x LHCb
~1000 people, 30 countries, ~ 80 Institutes
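The rate and bandwidth pairs quoted for the trigger chain on this slide imply an average data volume per event at each stage. The sketch below only does that division. It assumes each bandwidth belongs to the event rate printed next to it and that 1 GB = 1000 MB; the function name `mb_per_event` is illustrative and not part of any ALICE software.

```python
# Rate/bandwidth pairs exactly as printed on the slide.
# Assumptions: each bandwidth belongs to the event rate beside it,
# and decimal units are used (1 GB = 1000 MB).
STAGES = [
    (8000.0, 160.0),  # 8 kHz, 160 GB/sec
    (200.0, 4.0),     # 200 Hz, 4 GB/sec
    (30.0, 2.5),      # 30 Hz, 2.5 GB/sec
    (30.0, 1.25),     # 30 Hz, 1.25 GB/sec
]


def mb_per_event(rate_hz, bandwidth_gb_per_s):
    """Average data volume per event (MB) implied by a rate and a bandwidth."""
    return bandwidth_gb_per_s * 1000.0 / rate_hz


for rate_hz, bandwidth in STAGES:
    size = mb_per_event(rate_hz, bandwidth)
    print(f"{rate_hz:g} Hz at {bandwidth:g} GB/sec -> {size:.1f} MB/event")
```

Under these assumptions the first two stages carry the same volume per event, and the last two differ by a factor of two at the same event rate.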
CERN computing power: “High throughput” computing based on reliable commercial components
More than 1500 dual-CPU PCs
5000 in 2007
More than 3 PB of data on disks & tapes
> 15 PB in 2007
Far from being enough !
ALICE computing model: Three kinds of data analysis
Fast pilot analysis of the data “just collected” to tune the first reconstruction at CERN Analysis Facility (CAF)
Scheduled batch analysis using GRID (Event Summary Data and Analysis Object Data)
End-user interactive analysis using PROOF and GRID (AOD and ESD)
CERN
Does: first pass reconstruction
Stores: one copy of RAW, calibration data and first-pass ESD’s
T1
Does: reconstructions and scheduled batch analysis
Stores: second collective copy of RAW, one copy of all data to be kept, disk replicas of ESD’s and AOD’s
T2
Does: simulation and end-user interactive analysis
Stores: disk replicas of AOD’s and ESD’s
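The division of work between CERN, T1 and T2 listed above can be written down as a small lookup table. This is only an illustrative encoding of the slide, not an ALICE data structure: the names `TIERS`, `tiers_storing` and `tiers_doing` are hypothetical, and the T1 entry omits "one copy of all data to be kept" for brevity.

```python
# Tier roles as listed on the slide (simplified to short labels).
TIERS = {
    "CERN": {
        "does": {"first-pass reconstruction"},
        "stores": {"RAW", "calibration", "ESD"},
    },
    "T1": {
        "does": {"reconstruction", "scheduled batch analysis"},
        "stores": {"RAW", "ESD", "AOD"},
    },
    "T2": {
        "does": {"simulation", "end-user interactive analysis"},
        "stores": {"ESD", "AOD"},
    },
}


def tiers_storing(kind):
    """Return the tiers that hold the given kind of data."""
    return [name for name, role in TIERS.items() if kind in role["stores"]]


def tiers_doing(task):
    """Return the tiers responsible for the given task."""
    return [name for name, role in TIERS.items() if task in role["does"]]


print(tiers_storing("RAW"))
print(tiers_doing("simulation"))
```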
AliRoot framework: [Diagram] AliRoot is built on ROOT. Components shown:
STEER (with ESD, AliSimulation, AliReconstruction, AliAnalysis)
Virtual MC: G3, G4, FLUKA
EVGEN and generators: HIJING, MEVSIM, PYTHIA6, PDF, ISAJET, HBTP
Detector modules: ITS, TPC, TRD, TOF, PHOS, EMCAL, RICH, ZDC, PMD, FMD, MUON, CRT, START
Other modules: HBTAN, JETAN, RALICE, STRUCT
GRID interface: AliEn + LCG
Alice Environment (AliEn), since 2001: Based on Open Source components (95% imported code)
Offers ALICE users a single interface to the heterogeneous and fast-evolving GRID reality
More than 130 registered AliEn users
Whenever possible, uses common services
LCG/gLite CE + RB, gLite FTS for scheduled file transfers
ALICE takes an active part in the definition and testing of these components
The services provided by AliEn are:
ALICE job database and related distributed tools and services
ALICE file catalogue and related distributed tools and services
ALICE specific job reporting services
Their (high-level) functionality is ALICE-specific and not found elsewhere
Batch Analysis (1): The jobs are described by AliEn JDL files
Executable="startana"
Packages={"ROOT::5.11.02, …"}
Split="se"
InputFile={"LF:/alice/…/MyBatchAnalysis.C"}
InputData={"LF:/alice/…/AliESDs.root,nodownload"}
OutputFile={"esdAna.root@Alice::CERN::se01,noarchive"}
Submitted to the AliEn TQ from the AliEn command line
Submit .jdl
Scheduled, optimized, split (based on the InputData)
Can be monitored and re-prioritised
ps -trace
The results are registered in AliEn distributed file catalogue
The job runs on many machines in parallel, as close to the InputData as possible
Batch analysis (2): [Diagram] A user job (many events) obtains its data set (ESD’s, AOD’s) through a File Catalogue query
The Job Optimizer splits it into sub-jobs 1, 2, ... n, grouped by SE files location
The Job Broker submits each sub-job to the CE with the closest SE
CE and SE processing of each sub-job produces output files 1, 2, ... n
A file merging job combines them into the job output
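The flow on this slide (group the input files by storage element, process each group as a sub-job, merge the outputs) can be illustrated with a toy sketch. It is not AliEn code: the function names, the storage element labels `SE_A` and `SE_B`, and the file names are all hypothetical, and the "analysis" just counts files.

```python
from collections import defaultdict


def split_by_se(file_locations):
    """Group logical file names by the storage element that holds them."""
    groups = defaultdict(list)
    for lfn, se in file_locations.items():
        groups[se].append(lfn)
    return dict(groups)


def run_subjob(lfns):
    """Stand-in for the analysis of one sub-job: here it only counts files."""
    return {"files": len(lfns)}


def merge(outputs):
    """Stand-in for the file merging job: sums the per-sub-job counts."""
    return {"files": sum(out["files"] for out in outputs)}


# Hypothetical data set: logical file name -> storage element.
data_set = {
    "/hypothetical/run1/AliESDs.root": "SE_A",
    "/hypothetical/run2/AliESDs.root": "SE_B",
    "/hypothetical/run3/AliESDs.root": "SE_A",
}

sub_jobs = split_by_se(data_set)                        # one sub-job per SE
outputs = [run_subjob(lfns) for lfns in sub_jobs.values()]
job_output = merge(outputs)
print(sub_jobs)
print(job_output)
```

In the real system each sub-job would run on a computing element close to its storage element; here the sub-jobs simply run one after another in the same process.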
Interactive Analysis (1): A user starts a ROOT session on a laptop
The analysis macros are started from the ROOT command line
The data files on the GRID are accessed using ROOT (AliEn) UI (via xrootd)
The results are stored locally or can be registered on the GRID (AliEn file catalogue)
If the data files are stored on a cluster, the interactive analysis is done in parallel using PROOF
Interactive Analysis (2): File Access from ROOT: all files accessible via LFNs !
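The idea behind access via LFNs is that the user names a logical file and a catalogue maps it to a physical replica. The toy sketch below shows that lookup only. It is not the AliEn or ROOT API: `CATALOGUE`, `resolve`, the storage element labels and the URLs are all hypothetical.

```python
# Hypothetical catalogue: logical file name -> list of (storage element, physical URL).
CATALOGUE = {
    "/hypothetical/user/esdAna.root": [
        ("SE_A", "root://se-a.example//data/0001"),
        ("SE_B", "root://se-b.example//data/0042"),
    ],
}


def resolve(lfn, preferred_se=None):
    """Return a physical URL for an LFN, favouring the preferred storage element."""
    try:
        replicas = CATALOGUE[lfn]
    except KeyError:
        raise FileNotFoundError(f"no such LFN in catalogue: {lfn}") from None
    for se, pfn in replicas:
        if se == preferred_se:
            return pfn
    # No replica on the preferred SE: fall back to the first one listed.
    return replicas[0][1]


print(resolve("/hypothetical/user/esdAna.root", preferred_se="SE_B"))
```

The user code never mentions a physical location; only the catalogue entry changes if a replica moves.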
Slide11: Physics Data Challenge (PDC06): Running since 25 April 2006
29 sites participating (6 T1s, 23 T2s)
More than 100K jobs done, 500K p+p, 90K Pb+Pb events, 40 TB of data stored at CASTOR@CERN
Validation of the Computing Model in ALICE DC
Example of analysis: PDC06 is the last opportunity to exercise the simulation & reconstruction
And the analysis !
Conclusions: Parallelism provided by the GRID offers a new opportunity for the analysis of extremely large sets of data
ALICE accesses the GRID using its own environment, AliEn
AliEn, together with ROOT/PROOF, is a solid foundation on which to build the final system
We have been continuously testing our GRID infrastructure in ALICE Data Challenges
Wish us good luck !