Caltech and CMS Grid Work Overview
Koen Holtman, Caltech/CMS
May 22, 2002

CMS distributed computing: 

CMS wanted to build a distributed computing system all along!
- CMS CTP (Dec 1996):
  - One integrated computing system with a single global view of the data
  - Used by the 1000s of CMS collaborators around the world
- We now call this the `CMS Data Grid System'

PPDG: Mission-Oriented Pragmatic Methodology: 

- End-to-end integration and deployment of experiment applications using existing and emerging Grid services
  - Deployment of Grid technologies and services in production (24x7) environments
    - With stressful performance needs
- Collaborative development of Grid middleware and extensions between application and middleware groups
  - Leading to pragmatic and acceptable-risk solutions
- HENP experiments extend their adoption of common infrastructures to higher layers of their data analysis and processing applications
- Much attention to integration, coordination, interoperability and interworking
  - With emphasis on incremental deployment of increasingly functional working systems

CMS Grid Requirements: 

- Major Grid requirements effort completed
  - Document writing by Caltech group
  - Catania CMS week Grid workshop (June 2001, about 12 hours over various sessions)
- CMS consensus on many strategic issues
  - Division of labor between Grid projects and CMS Computing group
    - Needed for planning, manpower estimates
  - Grid job execution model
  - Grid data model, replication model
  - Object handling and the Grid
- Main Grid Requirements Document: CMS Data Grid System Overview and Requirements. CMS Note 2001/037
  http://kholtman.home.cern.ch/kholtman/cmsreqs.pdf
- Additional documents on object views, hardware sizes, workload model, data model (K. Holtman) CMS Note 2001/047

Objects and Files in the Grid: 

- CMS computing is object-oriented and database-oriented
  - Fundamentally we have a persistent data model with 1 object = 1 piece of physics data (KB-MB size)
- Much of the thinking in the Grid projects and Grid community is file-oriented
  - `Computer center' view of large applications
    - Do not look inside application code
    - Think about application needs in terms of CPU batch queues, disk space for files, file staging and migration
- How to reconcile this?
  - CMS requirements 2001-2003:
    - Grid project components do not need to deal with objects directly
    - Specify file handling requirements in such a way that a CMS layer for object handling can be built on top
  - LCG Project (SC2, PEB) has started to develop new object handling layer
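
The split between file-oriented Grid components and an object-handling layer on top can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `ObjectLocation`, `ObjectIndex`, the object IDs and the file name are hypothetical and do not describe the actual CMS or LCG design.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectLocation:
    """Where one persistent object lives (hypothetical layout)."""
    logical_file: str  # the only thing the Grid layer needs to know about
    offset: int        # byte offset of the object inside that file
    length: int        # object size in bytes


class ObjectIndex:
    """Hypothetical experiment-side layer mapping object IDs to files."""

    def __init__(self):
        self._locations = {}

    def register(self, object_id, location):
        self._locations[object_id] = location

    def files_needed(self, object_ids):
        """Translate an object request into a file request for the Grid."""
        return {self._locations[oid].logical_file for oid in object_ids}

    def read(self, object_id, open_file):
        """Read one object from a file that the Grid has already staged."""
        loc = self._locations[object_id]
        with open_file(loc.logical_file) as handle:
            handle.seek(loc.offset)
            return handle.read(loc.length)


# Toy data: one "database file" holding two objects.
FILES = {"run1.db": b"hitsAAAAtracksBB"}
index = ObjectIndex()
index.register("evt1/hits", ObjectLocation("run1.db", 4, 4))
index.register("evt1/tracks", ObjectLocation("run1.db", 14, 2))


def open_staged(name):
    """Stand-in for opening a locally staged file."""
    return io.BytesIO(FILES[name])
```

In this sketch the Grid side only ever sees the result of `files_needed`, while `read` stays entirely inside the experiment's own layer.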

Grid Services for CMS: Division of Labor (CMS Week, June 2001): 


GriPhyN/PPDG Architecture: 


CMS Production: 


Data Produced in 2001: 



GDMP: 

- Tool to transfer and manage files in production
  - Easy to handle this manually with a few centers, impossible with lots of data at many centers
- GDMP is based around Globus middleware and a flexible architecture
  - Globus Replica Catalogue
- Provided an early model of collaboration between HEP and Grid middleware providers
- Successfully used to replicate > 1 TB of CMS data
- Now a PPDG/EU DataGrid joint project
- Reference: Grid Data Management Pilot (GDMP): A Tool for Wide Area Replication. Applied Informatics Conference (AI2001), Innsbruck, Austria, 2/2001.
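
To make the idea of catalogue-driven replication concrete, here is a toy Python sketch of a replica catalogue that records which sites hold which logical files and works out what still has to be copied to a new site. It is not the GDMP or Globus Replica Catalogue API; `ReplicaCatalog`, `replicate_all`, the site names and the file names are all hypothetical.

```python
class ReplicaCatalog:
    """Toy catalogue: logical file name -> set of sites holding a copy."""

    def __init__(self):
        self._replicas = {}

    def publish(self, logical_name, site):
        """Record that `site` holds a copy of `logical_name`."""
        self._replicas.setdefault(logical_name, set()).add(site)

    def sites_for(self, logical_name):
        return sorted(self._replicas.get(logical_name, ()))

    def missing_at(self, site):
        """Logical files known to the catalogue but absent at `site`."""
        return sorted(
            name for name, sites in self._replicas.items() if site not in sites
        )


def replicate_all(catalog, destination, transfer):
    """Copy every missing file to `destination` and update the catalogue.

    `transfer(name, source, destination)` is a caller-supplied function
    that stands in for the real wide-area file transfer.
    """
    copied = []
    for name in catalog.missing_at(destination):
        source = catalog.sites_for(name)[0]  # pick any existing replica
        transfer(name, source, destination)
        catalog.publish(name, destination)
        copied.append(name)
    return copied


# Example with a transfer function that only logs what it would do.
catalog = ReplicaCatalog()
catalog.publish("cmsim_run001.db", "cern")
catalog.publish("cmsim_run002.db", "fnal")
catalog.publish("cmsim_run002.db", "caltech")

transfer_log = []
copied = replicate_all(
    catalog, "caltech", lambda n, s, d: transfer_log.append((n, s, d))
)
```

With a handful of centers this bookkeeping is easy to do by hand; the sketch shows why it is worth automating once many files and many sites are involved.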

Early GriPhyN Challenge Problem: CMS Data Reconstruction (Caltech/Wisc/NCSA): 


PPDG MOP system: 

- PPDG developed the MOP system
- Allows submission of CMS production jobs from a central location, running them at remote locations and returning the results
- Relies on
  - GDMP for replication
  - Globus GRAM, Condor-G and local queuing systems for job scheduling
  - IMPALA for job specification
- Shown in SC2001 demo
- Now being deployed in USCMS testbed
- Proposed as basis for next CMS-wide production infrastructure

US CMS Prototypes and Test-beds: 

- All U.S. CMS S&C institutions are involved in DOE and NSF Grid projects
- Integrating Grid software into CMS systems
- Bringing CMS production onto the Grid
- Understanding the operational issues
- MOP used as first pilot application
- MOP system got official CMS production assignment of 200K CMSIM events
  - 50K have been produced and registered already

Installing middleware: 

Virtual Data Toolkit:
- Globus 2.0 beta
  - Essential Grid Tools
  - Essential Grid Services I & II
  - Grid API
- Condor-G 6.3.1
- Condor 6.3.1
- ClassAds 0.9
- GDMP 3.0 alpha 3

We found the VDT to be very easy to install, but a little bit more challenging to configure.


Prototype VDG System (production): 

[Figure: system diagram. Legend: no code; existing; implemented using MOP.]

Analysis part: 

- Physics data analysis will be done by 100s of users
- Caltech taking responsibility for developing the analysis part of the vertically integrated system
- Analysis part is connected to same catalogs
  - Maintain a global view of all data
  - Big analysis jobs can use production job handling mechanisms
- Analysis services based on tags

Optimization of “Tag” Databases: 

- Tags are small (~0.2 - 1 kbyte) summary objects for each event
  - Crucial for fast selection of interesting event subsets; this will be an intensive activity
- Past work concentrated in three main areas:
  - Integration of CERN’s “HepODBMS” generic Tag system with the CMS “COBRA[*]” framework
  - Investigations of Tag bitmap indexing to speed queries
  - Comparisons of OO and traditional databases (SQL Server, soon Oracle 9i) as efficient stores for Tags
- New work concentrates on tag-based analysis services
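
The general idea behind bitmap indexing of tags can be sketched in a few lines of Python. This illustrates the technique only, not the indexing scheme that was investigated for CMS; the `Tag` fields, the cut values and the sample events are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical per-event summary object."""
    event_id: int
    n_jets: int
    missing_et: float  # GeV


def build_bitmap(tags, predicate):
    """Return an int whose bit i is set when tags[i] satisfies predicate."""
    bitmap = 0
    for position, tag in enumerate(tags):
        if predicate(tag):
            bitmap |= 1 << position
    return bitmap


def select(tags, bitmap):
    """Return the event IDs whose bit is set in the bitmap."""
    return [
        tag.event_id
        for position, tag in enumerate(tags)
        if (bitmap >> position) & 1
    ]


TAGS = [Tag(1, 2, 15.0), Tag(2, 4, 60.0), Tag(3, 1, 80.0), Tag(4, 3, 55.0)]

# Bitmaps are built once per cut and can then be reused by many queries.
MULTI_JET = build_bitmap(TAGS, lambda t: t.n_jets >= 3)
HIGH_MET = build_bitmap(TAGS, lambda t: t.missing_et > 50.0)

# Combining two cuts is a single bitwise AND, with no rescan of the tags.
selected = select(TAGS, MULTI_JET & HIGH_MET)
```

The speed-up comes from evaluating each cut once and answering combined selections with cheap bitwise operations on the stored bitmaps.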

CLARENS: a Portal to the Grid: 

- Grid-enabling the working environment for non-specialist physicists' data analysis
- Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol. This ensures implementation independence.
- The server is implemented in C++ to give access to the CMS OO analysis toolkit.
- The server will provide a remote API to Grid tools:
  - Security services provided by the Grid (GSI)
  - The Virtual Data Toolkit: object collection access
  - Data movement between Tier centers using GSI-FTP
  - CMS analysis software (ORCA/COBRA)
- Current prototype is running on the Caltech proto-Tier2
- More information at http://clarens.sourceforge.net, along with a web-based demo
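
Because XML-RPC is language-neutral, the client/server pattern described above can be shown with Python's standard library even though the real Clarens server is written in C++. The sketch below is not the Clarens API: the method names `list_collections` and `get_event_ids` and the in-memory catalogue are assumptions made for the example, and no GSI security is involved.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for an object collection catalogue.
COLLECTIONS = {"jets_2002": [101, 102, 105], "muons_2002": [101, 230]}


def list_collections():
    """Remote method: names of the available collections."""
    return sorted(COLLECTIONS)


def get_event_ids(name):
    """Remote method: event IDs in one collection (empty if unknown)."""
    return COLLECTIONS.get(name, [])


def call_demo():
    """Start a local server, make two remote calls, then shut down."""
    # Port 0 asks the OS for any free port on the loopback interface.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(list_collections)
    server.register_function(get_event_ids)
    host, port = server.server_address
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        with ServerProxy(f"http://{host}:{port}") as client:
            # Each attribute access becomes an XML-RPC call over HTTP.
            return client.list_collections(), client.get_event_ids("jets_2002")
    finally:
        server.shutdown()
        server.server_close()
```

Any client with an XML-RPC library could issue the same two calls, which is the implementation independence the protocol choice is meant to provide.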

Globally Scalable Monitoring Service CMS (Caltech and Pakistan): 

[Figure: monitoring service diagram. Labels: Push & Pull; rsh & ssh; existing scripts; snmp.]

Current events: 

- GDMP and MOP just had very favorable internal reviews in PPDG
- Testbed: MOP deployment currently under way
  - Stresses the Grid middleware in new ways: new issues and bugs being discovered in Globus, Condor
  - Testbed MOP production request: 200K CMSIM events requested, now 50K (~10 GB) finished and validated
- New fully integrated system: first versions expected by summer
  - System will be the basis for demos at SC2002
- Upcoming: CMS workshop on Grid-based production (CERN)
- Upcoming: PPDG analysis workshop (Berkeley)
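
The production numbers quoted above imply a rough per-event size and a projected total for the full request. The short calculation below is only a back-of-the-envelope estimate: it assumes decimal units (1 GB = 10^6 KB), treats the "~10 GB" figure as exact, and assumes the remaining events have the same average size.

```python
requested_events = 200_000
finished_events = 50_000
finished_size_gb = 10.0  # approximate figure quoted on the slide

# Fraction of the production request completed so far.
fraction_done = finished_events / requested_events

# Average event size in KB, using 1 GB = 1e6 KB.
kb_per_event = finished_size_gb * 1e6 / finished_events

# Projected size of the full request, assuming a constant event size.
projected_total_gb = finished_size_gb / fraction_done
```

Under these assumptions the request is a quarter complete, events average about 200 KB each, and the full 200K events would occupy roughly 40 GB.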

2000 - 2001: 

Main `Grid task' activities in 2000 - 2001:
- Ramp-up of Grid projects, establish a new mode of working
- Grid project requirements documents, architecture
- GDMP
  - Started as griddified package for data transport in CMS production, is now a more generic project
  - Used widely in 2001 production
  - Also demo of mode of working
- MOP
  - Vertical integration of CMS production software, GDMP, Condor
- Both GDMP and MOP just had very successful internal reviews in PPDG


2002: 

Grid task main activities (in US) in 2002:
- Build USCMS test grid
  - Deploy Globus 2.0, EU DataGrid components
- Use MOP as a basis for developing a larger vertically integrated system with virtual data features
  - Central catalogs and a global view of data
  - Production facilities
    - Participate in real CMS production with non-trivial jobs
  - Analysis facilities
    - Caltech team's main role is towards analysis facilities

Summary: 2000 - 2002: 

Main `Grid task' activities in 2000 - 2001:
- Grid project requirements documents, architecture
- GDMP
- MOP

Main `Grid task' activities (in US) in 2002:
- Build USCMS test grid
  - Deploy Globus 2.0, EU DataGrid components
- Use MOP as a basis for developing a larger vertically integrated system with virtual data features
  - Central catalogs and a global view of data
  - Production facilities
    - Participate in real CMS production
  - Analysis facilities
