Presentation Transcript

ALICE Computing Model
F. Carminati, BNL Seminar, March 21, 2005

Offline framework
- AliRoot in development since 1998, entirely based on ROOT
- Used since the detector TDRs for all ALICE studies
- Two packages to install (ROOT and AliRoot), plus the MCs
- Ported to the most common architectures: Linux IA32, IA64 and AMD, Mac OS X, Digital Tru64, SunOS...
- Distributed development: over 50 developers and a single CVS repository; 2/3 of the code developed outside CERN
- Tight integration with DAQ (data recorder) and HLT (same code base)
- Wide use of abstract interfaces for modularity
- "Restricted" subset of C++ used for maximum portability

AliRoot layout (schematic)
- Built on ROOT, with the STEER module providing the framework core together with AliSimulation, AliReconstruction, AliAnalysis and the ESD classes
- Virtual MC interface to the transport engines G3, G4 and FLUKA
- Event generators: HIJING, MEVSIM, PYTHIA6, PDF, EVGEN, HBTP, HBTAN, ISAJET
- Detector modules: EMCAL, ZDC, ITS, PHOS, TRD, TOF, RICH, PMD, CRT, FMD, MUON, TPC, START, STRUCT
- RALICE; Grid interface through AliEn/gLite
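The "Virtual MC" box in the layout above is the clearest example of the abstract interfaces mentioned earlier: detector and user code talk only to ROOT's TVirtualMC class, and the concrete transport engine (Geant3, Geant4 or FLUKA) is chosen at configuration time. The following is a minimal sketch of that pattern, not the actual AliRoot configuration code; the function name and the cut values are illustrative only, and the TGeant3 header comes from the separate geant3 VMC package.

```cpp
// Minimal sketch (not actual AliRoot code) of the Virtual MC idea: user code
// sees only the abstract TVirtualMC interface, while the concrete transport
// engine (Geant3, Geant4 or FLUKA) is selected at configuration time.
#include <cstring>
#include "TVirtualMC.h"   // ROOT's abstract MC interface (declares the global gMC)
#include "TGeant3.h"      // one concrete engine, from the geant3 VMC package

void ConfigTransport(const char* engine)
{
   if (std::strcmp(engine, "G3") == 0)
      new TGeant3("C++ Interface to Geant3");   // registers itself as gMC
   // ... analogous branches would instantiate TGeant4 or TFluka ...

   // From here on, only the abstract interface is used, whatever the engine:
   gMC->SetCut("CUTGAM", 1.e-3);   // example transport cut (value is illustrative)
   gMC->SetProcess("DCAY", 1);     // example physics-process switch
}
```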
Software management
- Regular release schedule: major release every six months, minor release (tag) every month
- Emphasis on delivering production code: corrections, protections, code cleaning, geometry
- Nightly-produced UML diagrams, code listings, coding-rule violations, builds and tests; a single repository with all the code
- No version management software (we have only two packages!)
- Advanced code tools under development (collaboration with IRST/Trento): smell detection (already under testing), aspect-oriented programming tools, automated genetic testing

ALICE Detector Construction Database (DCDB)
- Specifically designed to aid detector construction in a distributed environment: sub-detector groups around the world work independently, and all data are collected in a central repository, used to move components from one sub-detector group to another and during the integration and operation phase at CERN
- Multitude of user interfaces: web-based for humans; LabView and XML for laboratory equipment and other sources; ROOT for visualisation
- In production since 2002
- A very ambitious project with important spin-offs: Cable Database, Calibration Database

The Virtual MC
TGeo modeller

Results
(Geant3 and FLUKA results for the HMPID with 5 GeV pions)

ITS – SPD: cluster size (preliminary)

Reconstruction strategy
- Main challenge: reconstruction in the high-flux environment (occupancy in the TPC up to 40%) requires a new approach to tracking
- Basic principle: maximum information approach – use everything you can and you will get the best result
- Algorithms and data structures optimised for fast access and usage of all relevant information: localise the relevant information and keep it until it is needed

Tracking strategy – primary tracks
- Incremental process: forward propagation towards the vertex (TPC → ITS); back propagation (ITS → TPC → TRD → TOF); refit inward (TOF → TRD → TPC → ITS)
- Continuous seeding: track-segment finding in all detectors
- Combinatorial tracking in the ITS: weighted two-track χ2 calculated; effective probability of cluster sharing; probability for secondary particles not to cross a given layer

Tracking & PID
- On a 3 GHz Pentium IV, for dN/dy = 6000: TPC tracking ~40 s, TPC kink finder ~10 s, ITS tracking ~40 s, TRD tracking ~200 s
- (PID performance shown for the TPC alone and for ITS+TPC+TOF+TRD combined)

Condition and alignment
- Heterogeneous information sources are periodically polled and ROOT files with condition information are created
- These files are published on the Grid and distributed as needed by the Grid DMS
- Files contain validity information and are identified via DMS metadata
- No need for a distributed DBMS: reuse of the existing Grid services
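A rough sketch of the "plain ROOT files with validity information" idea follows. It is not the ALICE calibration code: the file-naming convention, object names and numbers are invented for the example, and only standard ROOT classes are used.

```cpp
// Sketch only: a condition/calibration payload written as ordinary ROOT objects,
// with its validity range stored in the file and (conceptually) also attached as
// Grid file-catalogue metadata, so no distributed DBMS is needed.
#include "TFile.h"
#include "TParameter.h"
#include "TString.h"

void WriteConditionFile(int firstRun, int lastRun)
{
   // Hypothetical naming convention encoding the validity range.
   TFile f(Form("TPC_Pedestals_Run%d_%d.root", firstRun, lastRun), "RECREATE");

   // The payload: any ROOT-streamable object; a single number stands in
   // for a full calibration container here.
   TParameter<double> pedestal("MeanPedestal", 50.3);   // illustrative value
   pedestal.Write();

   // Validity information kept with the payload as well, so the file is
   // self-describing even outside the catalogue.
   TParameter<int> runMin("FirstValidRun", firstRun);
   TParameter<int> runMax("LastValidRun", lastRun);
   runMin.Write();
   runMax.Write();

   f.Close();
   // The file would then be published on the Grid and registered in the DMS
   // with the same validity range as searchable metadata.
}
```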
External relations and DB connectivity
(Diagram: condition data flow from DAQ, Trigger, DCS, ECS, HLT, physics data and the DCDB, through APIs, into the AliEn/gLite metadata and file store, and on to the AliRoot calibration classes via calibration procedures and calibration files)
- From the User Requirements: source, volume, granularity, update frequency, access pattern, runtime environment and dependencies
- Relations between the databases are not final and not all are shown; a call for User Requirements has been sent to the subdetectors
- (API = Application Program Interface)

Metadata
- Metadata are essential for the selection of events
- We hope to be able to use the Grid file catalogue for one part of the metadata; during the Data Challenge we used the AliEn file catalogue for storing part of it
- However these are file-level metadata; we will also need event-level metadata
- This can simply be the TAG catalogue with externalisable references
- We are discussing this subject with STAR and will take a decision soon; we would prefer that the Grid scenario were clearer

ALICE CDCs

Use of HLT for monitoring in CDCs
(Diagram: AliRoot simulation produces digits and raw data, which flow through the LDCs to the GDC event builder; alimdc writes ROOT files to CASTOR and registers them in AliEn; HLT algorithms run on the data stream for monitoring, producing ESDs and histograms)

ALICE Physics Data Challenges

PDC04 schema
(Diagram: production of RAW at CERN and at Tier1/Tier2 centres, shipment of RAW to CERN, reconstruction of RAW in all T1s, and analysis, with AliEn providing job control and data transfer)

Phase 2 principle
- Mixed signal: simulated signal events are merged with underlying events

Simplified view of the ALICE Grid with AliEn
- ALICE VO central services: central task queue, job submission, file catalogue, configuration, accounting, user authentication, workload management, job monitoring, storage volume manager, data transfer
- AliEn site services: computing element, storage element, cluster monitor
- Existing site components: local scheduler, disk and MSS
- ALICE VO – site services integration
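At the heart of this architecture is the central task queue: jobs wait there and the site services request work that matches their local resources, rather than having work pushed to them. The sketch below illustrates that pull model in purely generic terms; it is not AliEn code, and every class, method and field name is invented for the illustration.

```cpp
// Conceptual sketch of a pull-model central task queue: waiting jobs are held
// centrally, and each site's computing-element service repeatedly asks for a
// job matching its available resources (here, just free disk space).
#include <deque>
#include <optional>
#include <string>

struct Job {
    std::string user;
    std::string jdl;             // job description
    long        requiredDiskMB;  // a stand-in for real resource requirements
};

class CentralTaskQueue {
    std::deque<Job> waiting_;
public:
    void submit(Job j) { waiting_.push_back(std::move(j)); }

    // Called by a site service with its locally available resources; the first
    // matching job is handed out (FIFO order with trivial matchmaking).
    std::optional<Job> pull(long freeDiskMB) {
        for (auto it = waiting_.begin(); it != waiting_.end(); ++it) {
            if (it->requiredDiskMB <= freeDiskMB) {
                Job j = *it;
                waiting_.erase(it);
                return j;
            }
        }
        return std::nullopt;   // nothing suitable; the site asks again later
    }
};
```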
Site services
- Unobtrusive – run entirely in user space: a single user account, with all authentication already assured by the central services
- Tuned to the existing site configuration – supports various schedulers and storage solutions
- Running on many Linux flavours and platforms (IA32, IA64, Opteron)
- Automatic software installation and updates (both service and application)
- Scalable and modular – different services can be run on different nodes (in front of or behind firewalls) to preserve site security and integrity: load-balanced file transfer nodes (on HTAR)
- CERN firewall solution for large-volume file transfers: only high ports (50K-55K) are opened through the firewall for parallel file transport by the AliEn data-transfer nodes, while the other AliEn services stay on the CERN intranet
- Log files and application software storage: 1 TB SATA disk server

Phase 2 job structure
- Task: simulate the event reconstruction and remote event storage; completed September 2004
- Central servers handle master job submission, the job optimizer (splitting into N sub-jobs), the RB, the file catalogue, process monitoring and control, the SEs, ...
- Sub-jobs are processed on AliEn CEs and, through the AliEn-LCG interface (RB), on LCG CEs
- Underlying-event input files are read from CERN CASTOR; output files are zipped into archives, stored with a primary copy on the local SEs and a backup copy at CERN CASTOR, and registered in the AliEn file catalogue (for an LCG SE the LCG LFN is used as the AliEn PFN, via edg/lcg copy-and-register)

Production history
- ALICE repository – history of the entire DC: ~1 000 monitored parameters (running and completed processes, job status and error conditions, network traffic, site status, central services monitoring, ...)
- 7 GB of data, 24 million records with 1-minute granularity – analysed to improve Grid performance
- Statistics: 400 000 jobs, 6 hours/job, 750 MSi2K hours
- 9M entries in the AliEn file catalogue; 4M physical files at 20 AliEn SEs in centres world-wide
- 30 TB stored at CERN CASTOR; 10 TB stored at remote AliEn SEs plus 10 TB backup at CERN; 200 TB network transfer CERN → remote computing centres
- AliEn observed efficiency >90%; LCG observed efficiency 60% (see GAG document)

Job repartition
- Jobs (AliEn/LCG): Phase 1 – 75/25%, Phase 2 – 89/11%
- More operational sites were added to the ALICE Grid as the PDC progressed
- 17 permanent sites (33 in total) under direct AliEn control, plus additional resources through Grid federation (LCG)

Summary of PDC'04
- Computing resources: it took some effort to "tune" the resources at the remote computing centres; the centres' response was very positive – more CPU and storage capacity was made available during the PDC
- Middleware: AliEn proved fully capable of executing high-complexity jobs and controlling large amounts of resources; the functionality for Phase 3 has been demonstrated, but cannot be used
- LCG MW proved adequate for Phase 1, but not for Phase 2 and in a competitive environment; it cannot provide the additional functionality needed for Phase 3
- ALICE computing model validation: AliRoot – all parts of the code successfully tested; computing-element configuration; the need for a high-functionality MSS was shown; the Phase 2 distributed data-storage schema proved robust and fast; data analysis could not be tested

Development of Analysis
- Analysis Object Data designed for efficiency: they contain only the data needed for a particular analysis
- Analysis à la PAW: ROOT plus, at most, a small library
- Work on the distributed infrastructure has been done by the ARDA project
- Batch analysis infrastructure: prototype published at the end of 2004 with AliEn
- Interactive analysis infrastructure: demonstration performed at the end of 2004 with AliEn/gLite
- The physics working groups are just starting now, so the timing is right to receive requirements and feedback

Grid-middleware independent PROOF setup
(Diagram: PROOF slave servers at sites A, B, ..., reached through optional site gateways and forward proxies, with rootd/proofd, Grid/ROOT authentication and a Grid access-control service; new elements are the Grid service interfaces (TGrid UI/queue), the Grid file/metadata catalogue, and the proofd startup with slave registration/booking DB; the client retrieves a list of logical files (LFN + MSN), sends a booking request with the logical file names and then runs a "standard" PROOF session; slave ports are mirrored on the master host, only outgoing connectivity is required)
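From the user's point of view, the interactive analysis sketched above boils down to opening a PROOF session and processing a chain of ESD trees with a selector; the Grid-integrated setup in the diagram adds the catalogue lookup and booking steps in front of this. The sketch below uses standard ROOT calls; the master host name, file URLs and selector name are placeholders, not part of the ALICE setup.

```cpp
// Minimal sketch of an interactive PROOF analysis session from the user's side.
// Host, file and selector names are hypothetical; the file list would normally
// come from the Grid file/metadata catalogue rather than being hard-coded.
#include "TChain.h"
#include "TProof.h"

void RunProofAnalysis()
{
   // Connect to a PROOF master (host name is an assumption for the example).
   TProof::Open("proofmaster.example.org");

   // Build a chain of ESD trees.
   TChain chain("esdTree");
   chain.Add("root://se.example.org//alice/sim/event1/AliESDs.root");
   chain.Add("root://se.example.org//alice/sim/event2/AliESDs.root");

   // Run a user selector on the PROOF cluster instead of locally.
   chain.SetProof();
   chain.Process("MyAnalysisSelector.C+");   // hypothetical selector macro
}
```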
Grid situation
- History:
  - Jan '04: the AliEn developers are hired by EGEE and start working on the new MW
  - May '04: a prototype derived from AliEn is offered to pilot users (ARDA, Biomed, ...) under the gLite name
  - Dec '04: the four experiments ask for this prototype to be deployed on the larger preproduction service and to become part of the EGEE release
  - Jan '05: this is vetoed at management level – AliEn will not be common software
- Current situation:
  - EGEE has vaguely promised to provide the same functionality as the AliEn-derived MW, but with a delay of at least 2-4 months on top of the one already accumulated
  - Even this will be just the beginning of the story: the different components will have to be field-tested in a real environment, which took four years for AliEn
  - All experiments have their own middleware; ours is not maintained because our developers have been hired by EGEE
  - EGEE has formally vetoed any further work on AliEn or AliEn-derived software
  - LCG has allowed some support for ALICE, but the situation is far from clear

ALICE computing model
- For pp, similar to the other experiments: quasi-online data distribution and first reconstruction at T0; further reconstruction passes at the T1s
- For AA, a different model: calibration, alignment and pilot reconstructions during data taking; data distribution and first reconstruction at T0 during the four months after the AA run (shutdown); second and third passes distributed at the T1s
- For safety, one copy of the RAW data at T0 and a second one distributed among all T1s
- T0: first-pass reconstruction; storage of one copy of RAW, calibration data and first-pass ESDs
- T1: subsequent reconstructions and scheduled analysis; storage of the second, collective copy of RAW and of one copy of all data to be safely kept (including simulation); disk replicas of ESDs and AODs
- T2: simulation and end-user analysis; disk replicas of ESDs and AODs
- Very difficult to estimate the network load

ALICE requirements on middleware
- One of the main uncertainties of the ALICE computing model comes from the Grid component
- ALICE developed its computing model assuming that MW with the quality and functionality that AliEn would have reached two years from now will be deployable on the LCG computing infrastructure
- If not, we will still analyse the data (!), but: less efficiency → more computers → more time and money; more people for production → more money
- To elaborate an alternative model we need to know the functionality of the MW developed by EGEE, the support we can count on from LCG, and our "political" margin of manoeuvre

Possible strategy
- If (a) the basic services from the LCG/EGEE MW can be trusted at some level and we can get some support to port the "higher-functionality" MW onto these services, we have a solution
- If (a) is not true, but we have support for deploying the ARDA-tested AliEn-derived gLite and there is no political "veto", we still have a solution
- Otherwise we are in trouble

ALICE Offline Timeline
Main parameters
Processing pattern

Conclusions
- ALICE has made a number of technical choices for the computing framework since 1998 that have been validated by experience
- The Offline development is on schedule, although contingency is scarce
- Collaboration between physicists and computer scientists is excellent
- Tight integration with ROOT allows a fast prototyping and development cycle
- AliEn goes a long way in providing a Grid solution adapted to HEP needs; however, its evolution into a common project has been "stopped" – this is probably the largest single risk factor for ALICE computing
- Some ALICE-developed solutions have a high potential to be adopted by other experiments and indeed are becoming "common solutions"