egee den haag nov04


Presentation Transcript

Slide1: 

LHC Computing Grid Project (LCG)
CERN, European Organisation for Nuclear Research, Geneva, Switzerland
Project Status
Les Robertson, LCG Project Leader (les.robertson@cern.ch)
EGEE Conference, Den Haag, 26 November 2004

Summary: 

- Key points about LCG, EGEE and other Grid Infrastructures
- Status & Concerns
- Planning for LHC Startup

LCG Project Activity Areas: 

- Applications: development environment and common libraries, frameworks and tools for the LHC experiments
- CERN Fabric: construction and operation of the central LHC computing facility at CERN
- Networking: planning the availability of the high bandwidth network services to interconnect the major computing centres used for LHC data analysis

Risks and Opportunities: 

- LCG & EGEE are combining resources to build an operation that is wider in scope and ambition than LCG would be able to tackle on its own.
- LCG has all of its middleware eggs in the EGEE basket.
- If we can use the real needs and real resources of the LHC experience to establish a general science grid infrastructure that is supported long term, we will all benefit. That is why we are in this project.
- EGEE stops in March 2006! LHC starts in 2007! This is an enormous risk for LCG.
- I am not sure that there are other applications that have shown this level of confidence in the EGEE project.
- I am sure that the LCG reviewers would not agree entirely with some of the views of the EGEE reviewers. The risk we are taking deserves a considerable priority from the EGEE project.

LCG Service Hierarchy: 

Tier-2: ~100 centres in ~40 countries
- Simulation
- End-user analysis: batch and interactive

Networking: 

- Latest estimates are that Tier-1s will need connectivity at ~10 Gbps, with ~70 Gbps at CERN.
- There is no real problem for the technology, as has been demonstrated by a succession of Land Speed Records.
- But LHC will be one of the few applications needing this level of performance as a service on a global scale.
- We have to ensure that there will be an effective international backbone that reaches through the national research networks to the Tier-1s.
- LCG has to be pro-active in working with service providers:
  - pressing our requirements and our timetable
  - exercising pilot services

LHC Computing Resources: 

- Most of the LHC resources around the world are organised as national and regional grid projects, integrated into the combined LCG-2/EGEE operation.
- There are separate infrastructures in the US (Grid3) and the Nordic countries (NorduGrid) that use different middleware.
- The LCG project has a dual role:
  - operating the LCG-2/EGEE grid, a joint LCG-EGEE activity
  - coordinating the wider set of resources available to LHC
- There is an active programme aimed at compatibility/inter-working of LCG-2/EGEE and Grid3, and on-going technical discussions with similar aims with NorduGrid.
- Lack of standards is a major headache for LHC experiments. In practice, the standard is most likely to be set by a "winning" middleware implementation.

Status & Concerns: 

Status & Concerns

Grid Deployment - going well: 

- The grid deployment process (LCG-2) is working well:
  - integration, certification, debugging
  - distribution, installation
- Rapid reaction to problems encountered during the LHC experiments' "data challenges" has produced incremental releases of LCG-2, with significant improvements in reliability, performance and scalability within the limits of the current architecture.
- Scalability is much better than scheduled, or expected a year ago: ~90 nodes, ~9,000 processors, close to the final scale of the LCG grid!
- Heavily used during the data challenges in 2004:
  - lots of real work done for real physicists; these are not tests or demos
  - many small sites have contributed to simulation runs
  - one experiment (LHCb) has run up to 3,500 concurrent jobs

Grid Deployment - concerns: 

- The basic issues of middleware reliability and scalability that we were struggling with a year ago have been overcome.
- BUT there are many issues of functionality, usability and performance to be resolved, and soon.
- Overall job success rate is 60-75%:
  - can be tolerated for "production" work, submitted by small teams with automatic job generation and bookkeeping systems
  - unacceptable for end-user data analysis
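A rough calculation suggests why a 60-75% success rate might be tolerable for automated production but not for end-user analysis. The sketch below is illustrative only: it assumes each submission succeeds independently with probability p and that failed jobs are simply resubmitted, neither of which is stated in the talk, and the function names are hypothetical.

```python
def expected_attempts(p: float) -> float:
    """Mean number of submissions per completed job (geometric distribution)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


def success_within(p: float, attempts: int) -> float:
    """Probability that a job completes within the given number of attempts."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # The job is still unfinished only if every attempt so far has failed.
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    # The two ends of the quoted 60-75% range.
    for p in (0.60, 0.75):
        print(f"p={p:.2f}: {expected_attempts(p):.2f} submissions per job on average, "
              f"{success_within(p, 3):.1%} complete within 3 attempts")
```

Under these assumptions, a bookkeeping system that resubmits automatically absorbs the failures at a cost of roughly 1.3 to 1.7 submissions per job, whereas an individual user sees 25-40% of first attempts fail.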

Slide11: 

- Urgent to improve operations coordination and management
- EGEE support resources now in place
- Core operations centres established: CLRC Oxford, IN2P3 Lyon, CNAF Bologna, ASCC Taipei, CERN
- Global Grid User Support centre: Forschungszentrum Karlsruhe
- Operations workshop at CERN, 2-4 November
- The new, improved middleware from EGEE is awaited with impatience

LCG-2 and Next Generation Middleware: 

LCG-2: focus on production, large-scale data handling
- The service for the 2004/5 data challenges
- Provides experience on operating and managing a global grid service; middleware neutral
- Continuing, modest development programme driven by data challenge experience
- Will be supported until gLite is able to replace it (functionality, scaling, reliability, performance)

gLite: focus on analysis
- LHC applications and users closely involved in prototyping & development (ARDA/NA4 project)
- Short development cycles
- Deployed along with LCG-2 (co-existence)
- Hope to be able to replace some LCG-2 components at an early stage with gLite components

[Timeline figure, 2004-2005: LCG-2 and gLite, from prototyping to product]

Middleware from EGEE: 

- We have a rapidly growing number of sites connecting to the LCG-2/EGEE grid, but there are major holes in the functionality, especially in data management, and concerns about workload management.
- The first gLite prototype was made available in a development environment in May (6 weeks after EGEE started!).
- Good experience with this leads to strong pressure for extended access: more users, more data.
- But there are difficulties in getting the product out:
  - the first pieces are only being delivered to the pre-production testbed this month
  - key components will only arrive next year
- Absolute priority must now be to get the basic gLite functionality out on the pre-production testbed, and to establish the process of short development cycles.
- The LHC experiments have a pressing time-line. I do not want them to be forced to employ alternative solutions.

Planning for LHC Startup: 

Planning for LHC Startup

Planning for LHC Startup: 

The agreements between the centres that will implement the LHC computing environment will be mapped out over the next 6-9 months:
- December 2004: experiment requirements and computing models published
- First quarter 2005: establish resource plans for Tier-0, Tier-1 and major Tier-2s; initial plan for Tier-0/1/2 networking
- April 2005: formal collaboration framework (memorandum of understanding)
- July 2005: Technical Design Report, the detailed plan for installation and commissioning of the LHC computing environment

To what extent will there be experience of the new middleware before these major decisions are made?

Service Challenge Programme to Ramp-up to LHC Startup: 

Dec04 - Service Challenge 1: basic high performance data transfer
- 2 weeks sustained
- CERN + 3 Tier-1s, 500 MB/sec between CERN and Tier-1s

Mar05 - Service Challenge 2: reliable file transfer service
- mass store (disk) to mass store (disk)
- CERN + ≥ 5 sites, 500 MB/sec between sites, 1 month sustained
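For a sense of scale, the sustained rates quoted for the first two challenges can be converted into link bandwidth and total data volume. This is a back-of-envelope sketch, assuming decimal units (1 MB = 10^6 bytes), a 30-day month and no protocol overhead; the helper names are illustrative and do not come from the talk.

```python
SECONDS_PER_DAY = 86_400


def mb_per_sec_to_gbps(rate_mb_per_sec: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return rate_mb_per_sec * 8 / 1000


def volume_tb(rate_mb_per_sec: float, days: float) -> float:
    """Total terabytes moved at a constant rate over the given number of days."""
    total_bytes = rate_mb_per_sec * 1e6 * days * SECONDS_PER_DAY
    return total_bytes / 1e12


if __name__ == "__main__":
    print(mb_per_sec_to_gbps(500))  # 500 MB/sec expressed in Gbps
    print(volume_tb(500, 14))       # Service Challenge 1: 2 weeks sustained
    print(volume_tb(500, 30))       # Service Challenge 2: 1 month (30 days assumed)
```

On these assumptions 500 MB/sec corresponds to 4 Gbps, two weeks at that rate moves about 605 TB, and a 30-day month moves about 1.3 PB. The slide does not say whether 500 MB/sec is an aggregate or a per-site figure, so the volumes apply to whichever link or set of links the rate describes.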

Service Challenge Programme to Ramp-up to LHC Startup: 

Jul05 - Service Challenge 3: Tier-0/Tier-1 base service
- CERN + ≥ 5 Tier-1s, 300 MB/sec, including mass store (disk+tape)
- sustained 1 month
- ~5 Tier-2 centres at lower bandwidth

Preparation for: Tier-0/1 model verification, two experiments concurrently at ~50% of nominal data rate

[Timeline figure labels: 2008; First beams; Full physics run]

Service Challenge Programme to Ramp-up to LHC Startup: 

Apr06 - Service Challenge 4: Tier-0, ALL Tier-1s, major Tier-2s operational at full target data rates (~1.2 GB/sec at Tier-0)

Preparation for: Tier-0/1/2 full model test
- all experiments
- 100% nominal data rate, with processing load scaled to 2006 CPUs
- sustained 1 month

Service Challenge Programme to Ramp-up to LHC Startup: 

Nov06 - Service Challenge 5: infrastructure ready at ALL Tier-1s, selected Tier-2s
- Tier-0/1/2 operation
- sustained 1 month
- twice target data rates (~2.5 GB/sec at Tier-0)

Preparation for: Feb07 - ATLAS + CMS + LHCb + ALICE (proton mode), Tier-0/1 100% full model test

Summary: 

Grid Operation
- Very good progress during the past year: large scale deployment, real work performed for experiments
- Much work to be done to improve job success rate: operations management, site discipline, middleware

Grid Middleware
- Some of the missing functionality can be provided through short term developments of LCG-2
- But we are looking to the EGEE/gLite work for middleware adapted to end user analysis
- Urgent to deliver the base set of gLite components

LCG needs a permanent, increasingly stable service for experiments to do physics, and in addition has a tight schedule of service and computing model readiness tests.