Grid Computing for HEP : Grid Computing for HEP L. E. Price
Argonne National Laboratory
HEP-CCC Meeting
CERN, November 12, 1999
The Challenge : The Challenge Providing rapid access to event samples and subsets from massive datastores, from 100s of Terabytes in 2000 to 100 Petabytes by 2010.
Transparent access to computing resources, throughout the U.S., and throughout the World
The extraction of small or subtle new physics signals from large and potentially overwhelming backgrounds
Enabling access to the data, and to the rest of the physics community, across an ensemble of networks of varying capability and reliability, using heterogeneous computing resources
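As a rough illustration of the growth implied by the datastore figures above, the sketch below computes the average yearly growth factor. It assumes that "100s of Terabytes" is taken at its lower bound of 100 TB and that 1 PB = 1000 TB; both are assumptions made only for this example.

```python
# Back-of-envelope growth estimate for the datastore sizes quoted on the slide.
# Assumption: "100s of Terabytes" is taken at its lower bound of 100 TB.
start_tb = 100.0              # assumed datastore size in 2000, in TB
end_tb = 100.0 * 1000.0       # 100 PB in 2010, expressed in TB (1 PB = 1000 TB)
years = 2010 - 2000

growth_factor = end_tb / start_tb                 # total growth over the decade
annual_growth = growth_factor ** (1.0 / years)    # average growth per year

print(f"total growth: {growth_factor:.0f}x, average per year: {annual_growth:.2f}x")
```

Under these assumptions the datastore roughly doubles every year; a larger starting size gives a somewhat lower yearly factor.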
Achieving a Balance : Achieving a Balance Proximity of the data to central computing and data handling resources
Proximity of frequently accessed data to the users, to be processed in desktops, local facilities, or regional centers
Making efficient use of limited network bandwidth, especially transoceanic
Making appropriate use of regional and local computing and data handling
Involving scientists and students in each world region in the physics analysis
Need for optimization : Need for optimization Meeting the demands of hundreds of users who need transparent access to local and remote data in disk caches and tape stores
Prioritizing hundreds to thousands of requests from the local and remote communities
Structuring and organizing the data; providing the tools for locating, moving, and scheduling data transport between tape and disk and across networks
Ensuring that the overall system is dimensioned correctly to meet the aggregate need
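The prioritization of requests from local and remote users mentioned above can be sketched with a simple priority queue. This is only an illustration: the class name, the priority scheme (lower number served first) and the request fields are hypothetical and do not come from any of the projects described in this talk.

```python
import heapq
import itertools


class RequestQueue:
    """Hypothetical queue of data requests, served by priority then arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: earlier requests first

    def submit(self, priority, user, dataset):
        # Lower priority number means more urgent (an assumption of this sketch).
        heapq.heappush(self._heap, (priority, next(self._arrival), user, dataset))

    def next_request(self):
        # Pop the most urgent request; ties are resolved by arrival order.
        _priority, _seq, user, dataset = heapq.heappop(self._heap)
        return user, dataset

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = RequestQueue()
    queue.submit(2, "remote_user", "run_b")
    queue.submit(1, "local_user", "run_a")
    while len(queue):
        print(queue.next_request())
```

A real system would also have to account for where the data currently resides (disk cache or tape) when ordering requests, which this sketch ignores.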
Science and Massive Datasets : Science and Massive Datasets Massive dataset generation the new norm in science
High Energy Physics
Nuclear Physics
LIGO
Automated astronomical scans (e.g., Sloan Digital Sky Survey)
The Earth Observing System (EOS)
The Earth System Grid
Geophysical data (e.g., seismic)
Satellite weather image analysis
The Human Brain Project (time series of 3-D images)
Protein Data Bank
The Human Genome Project
Molecular structure crystallography data
Proposed Solution : Proposed Solution A data analysis grid for High Energy Physics
Analogy to Computing Grid : Analogy to Computing Grid Because the resources needed to solve complex problems are rarely collocated
Topic of intensive CS research for a number of years already
Computing (or data) resources from a “plug on the wall”
Why a Hierarchical Data Grid? : Why a Hierarchical Data Grid? Physical
Appropriate resource use: data proximity to users & labs
Efficient network use: local > regional > national > oceanic
Scalable growth: avoid bottlenecks
Human
Central lab cannot manage / help / care about 1000s of users
Cleanly separates functionality of different resource types
University/regional computing complements national labs funding agencies
Easier to leverage resources, maintain control, assert priorities at regional/local level
Effective involvement of scientists and students independently of location
Logical Steps toward Data Grid : Logical Steps toward Data Grid [Timeline figure, 1995 to 2010. Stages shown: Basic Research; Testbeds; Design/Optimization; (Pre)Production]
U.S. Grid Technology Projects : U.S. Grid Technology Projects [Timeline figure, 1995 to 2010. Projects shown: PASS/Globus/HENP-GC/MONARC/GIOD/Nile; Clipper/NGI-PPDG; APOGEE; LHC, GriPhyN]
In Progress : In Progress Laboratory and experiment-specific development, deployment and operation (hardware and software);
Tool development in HENP, Computer Science, Industry;
The Particle Physics Data Grid: NGI-funded project aiming (initially) at jump-starting the exploitation of CS and HENP software components to make major improvements in data access.
Business as usual
Proposals being Developed : Proposals being Developed GriPhyN:
Grid Physics Networking
Targeted at NSF;
Focus on the long-term university-based grid infrastructure for major physics and astronomy experiments.
APOGEE: A Physics-Optimized Grid Environment for Experiments
Targeted at DoE HENP (and/or DoE SSI);
Focus on medium to long-term software needs for HENP distributed data management;
Initial focus on instrumentation, modeling and optimization.
PPDG, APOGEE and GriPhyN : PPDG, APOGEE and GriPhyN A coherent program of work;
Substantial common management proposed;
A focus for HENP collaboration with Computer Science and Industry;
PPDG/APOGEE will create “middleware” needed by data-intensive science including LHC. (Synergy but no overlap with CMS/ATLAS planning.)
Data Grid Projects in Context : Data Grid Projects in Context
Construction and Operation of HENP Data Management and Analysis Systems
Tiers 0/1 >> $20M/yr of existing funding at HENP labs.
e.g. SLAC FY1999
~$7M equipment for BaBar (of which < $2M physics CPU);
~$3M labor, M&S.
Data Grid Projects in Context : Data Grid Projects in Context
Construction and Operation of HENP Data Management and Data Analysis Systems at DoE Laboratories
Tiers 0/1
GriPhyN
HENP Data Management at Major University Centers
Tier 2
Draft proposal for NSF funding: $5-$16M/year
$16M = $8M hardware + $5M labor/R&D + $3M network
Data Grid Projects in Context : Data Grid Projects in Context
Widely Applicable Technology and Computer Science (not only from HENP; 100s of non-HEP FTEs):
OO Databases and Analysis Tools
Resource Management Tools
Metadata Catalogs
WAN Data Movers
Mass Storage Management Systems
Matchmaking
Data Grid Projects in Context : Data Grid Projects in Context
PPDG: Particle Physics Data Grid (NGI Project)
Large-scale tests/service focused on use of existing components
Data Grid Projects in Context : Data Grid Projects in Context
APOGEE: Unified Project Management; Optimization and Evaluation; Instrumentation; Modeling and Simulation
A new level of rigor as the foundation for future progress
Data Grid Projects in Context : Data Grid Projects in Context
R&D + Contacts with CS/Industry
Long-term Goals
Testbeds
Overall Program Goal : Overall Program Goal A Coordinated Approach to the Design and Optimization of a Data Analysis Grid for HENP Experiments
Particle Physics Data Grid : Particle Physics Data Grid Universities, DoE Accelerator Labs, DoE Computer Science
Funded by DoE-NGI at $1.2M for first year
PPDG Collaborators : PPDG Collaborators
Site | Particle Physics | Accelerator Laboratory | Computer Science
ANL | X | - | X
LBNL | X | - | X
BNL | X | X | x
Caltech | X | - | X
Fermilab | X | X | x
Jefferson Lab | X | X | x
SLAC | X | X | x
SDSC | - | - | X
Wisconsin | - | - | X
First Year PPDG Deliverables : First Year PPDG Deliverables Implement and run two services in support of the major physics experiments at BNL, FNAL, JLAB, SLAC:
“High-Speed Site-to-Site File Replication Service”; Data replication up to 100 Mbytes/s
“Multi-Site Cached File Access Service”: Based on deployment of file-cataloging, and transparent cache-management and data movement middleware
First Year: Optimized cached read access to files in the range of 1-10 Gbytes, from a total data set of order one Petabyte
Using middleware components already developed by the Proponents
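To put the replication target in perspective, the sketch below works out how long transfers would take at the stated peak of 100 Mbytes/s. It assumes decimal units (1 Mbyte = 10^6 bytes) and a sustained rate equal to the peak, which is optimistic; the function name is illustrative only.

```python
# Transfer-time arithmetic at the 100 Mbytes/s replication target.
# Assumptions: decimal units and a sustained rate equal to the stated peak.
MB, GB, PB = 10**6, 10**9, 10**15
RATE_BYTES_PER_S = 100 * MB


def transfer_seconds(size_bytes, rate_bytes_per_s=RATE_BYTES_PER_S):
    """Ideal transfer time, ignoring protocol overhead and contention."""
    return size_bytes / rate_bytes_per_s


if __name__ == "__main__":
    for label, size in [("1 Gbyte file", 1 * GB),
                        ("10 Gbyte file", 10 * GB),
                        ("1 Petabyte data set", 1 * PB)]:
        seconds = transfer_seconds(size)
        print(f"{label}: {seconds:,.0f} s ({seconds / 86400:.2f} days)")
```

Individual files in the 1-10 Gbyte range move in seconds to minutes at this rate, while replicating the full Petabyte would take months, which is one motivation for cached access to subsets rather than wholesale copying.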
PPDG Site-to-Site Replication Service : PPDG Site-to-Site Replication Service Network Protocols Tuned for High Throughput
Use of DiffServ for (1) Predictable high-priority delivery of high-bandwidth data streams (2) Reliable background transfers
Use of integrated instrumentation to detect/diagnose/correct problems in long-lived high speed transfers [NetLogger + DoE/NGI developments]
Coordinated reservation/allocation techniques for storage-to-storage performance
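The DiffServ usage above amounts to marking packets with different code points for the two traffic classes. A minimal sketch of how an application could request such marking on a TCP socket follows. The choice of code points is an assumption of this example (Expedited Forwarding, DSCP 46, for the high-priority stream, and class selector 1, DSCP 8, for background transfers); the slide does not say which values PPDG planned to use, and the network must be configured to honor them.

```python
import socket

# Illustrative code points; the actual PPDG choices are not given in the talk.
DSCP_HIGH_PRIORITY = 46   # Expedited Forwarding
DSCP_BACKGROUND = 8       # class selector 1, used here as a low-priority class


def dscp_to_tos(dscp):
    """The DSCP occupies the upper six bits of the IP TOS / traffic-class byte."""
    return dscp << 2


def make_marked_socket(dscp):
    """Create a TCP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    bulk = make_marked_socket(DSCP_BACKGROUND)
    print(hex(bulk.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))
    bulk.close()
```

Marking at the end host is only a request; routers along the path decide how each class is actually treated.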
PPDG Multi-site Cached File Access System : PPDG Multi-site Cached File Access System [Diagram: a PRIMARY SITE (Data Acquisition, Tape, CPU, Disk, Robot), three Satellite Sites (Tape, CPU, Disk, Robot each), and three Universities (CPU, Disk, Users each)]
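The site hierarchy on this slide (universities, satellite sites, primary site) suggests a read path in which a file is served from the nearest site holding a copy and cached closer to the user on the way back. The sketch below illustrates that idea with a small least-recently-used cache per site. All class and function names are hypothetical, and the eviction policy is an assumption; the talk does not describe the actual PPDG cache-management middleware at this level.

```python
from collections import OrderedDict


class SiteCache:
    """Fixed-capacity LRU cache of file names held on disk at one site."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self._files = OrderedDict()

    def has(self, filename):
        if filename in self._files:
            self._files.move_to_end(filename)   # mark as recently used
            return True
        return False

    def store(self, filename):
        self._files[filename] = True
        self._files.move_to_end(filename)
        while len(self._files) > self.capacity:
            self._files.popitem(last=False)     # evict least recently used


def read_file(filename, tiers, primary_catalog):
    """Return the name of the site that served the file, nearest tier first."""
    for i, tier in enumerate(tiers):
        if tier.has(filename):
            for closer in tiers[:i]:            # cache a copy nearer the user
                closer.store(filename)
            return tier.name
    if filename not in primary_catalog:
        raise FileNotFoundError(filename)
    for tier in tiers:                          # populate caches on the way back
        tier.store(filename)
    return "primary"


if __name__ == "__main__":
    university = SiteCache("university", capacity=2)
    satellite = SiteCache("satellite", capacity=4)
    catalog = {"events_a", "events_b", "events_c"}
    for name in ["events_a", "events_a", "events_b", "events_c", "events_a"]:
        print(name, "served by", read_file(name, [university, satellite], catalog))
```

The first read of a file reaches the primary site; repeated reads are served locally until the small university cache evicts the file, after which the satellite site answers.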
PPDG Middleware Components : PPDG Middleware Components
APOGEE Focus on Instrumentation and Modeling : APOGEE Focus on Instrumentation and Modeling Planned proposal to DoE
Originally targeted at SSI
Roughly the same collaborators as PPDG
Intended to be the next step after PPDG
Understanding Complex Systems (Writing into the BaBar Object Database at SLAC) : Understanding Complex Systems (Writing into the BaBar Object Database at SLAC) Aug. 1: ~4.7 Mbytes/s
Oct. 1: ~28 Mbytes/s
APOGEE Manpower Requirements (FTE) : APOGEE Manpower Requirements (FTE)
Task | FY00 | FY01 | FY02 | FY03 | FY04
Instrumentation
Low-level data capture | 0.5 | 1 | 0.75 | 0.75 | 0.75
Filtering and collecting agents | 0.5 | 1 | 1 | 1 | 1
Data analysis and presentation | 0.5 | 1 | 1 | 0.75 | 0.75
HENP workload profiling | 0.5 | 1 | 0.5 | 0.5 | 0.5
Simulation
Framework design and development | 1 | 2 | 1.5 | 1 | 0.5
User workload simulation | 0.5 | 1 | 0.75 | 0.75 | 0.5
Component simulations (network, mass-storage system, object DB etc.) | 1.25 | 2.5 | 2 | 1.5 | 1
Site simulation packages | - | - | 1 | 1 | 1
Instrumentation/Simulation Testbed
Instrumentation of existing experiment(s) (e.g. PPDG) | 0.5 | 1 | 1 | 1 | 1
Acquire and simulate performance measurements | 0.25 | 0.5 | 0.5 | 0.75 | 1
Acquire user workload profile | 0.25 | 0.5 | 0.5 | 0.25 | 0.25
Test prediction and optimization | - | - | 0.5 | 0.75 | 0.75
Evaluation and Optimization
Quantify evolving needs of physics (including site policies etc.) | 0.25 | 0.5 | 0.5 | 0.5 | 0.5
Develop metrics for usefulness of data management facilities | 0.5 | 1 | 1 | 1 | 1
Optimize model systems | - | - | 0.5 | 1 | 1.5
Long-Term Strategy (Towards "Virtual Data")
Tracking and testing HENP/CS/Industry developments | 1 | 2 | 1.5 | 1.5 | 1.5
Development projects in collaboration with HENP/CS/Industry | - | - | 0.5 | 1 | 1.5
Project Management (APOGEE and PPDG)
Project leader (physicist) | 0.5 | 1 | 1 | 1 | 1
Lead computer scientist | 0.5 | 1 | 1 | 1 | 1
TOTALS | 8.5 | 17 | 17 | 17 | 17
APOGEE Funding Needs : APOGEE Funding Needs ($k)
Item | FY00 | FY01 | FY02 | FY03 | FY04
Manpower
Instrumentation | 250 | 500 | 406 | 375 | 375
Simulation | 344 | 688 | 656 | 531 | 375
Instrumentation/Simulation Testbed | 125 | 250 | 313 | 344 | 375
Evaluation and Optimization | 94 | 188 | 250 | 313 | 375
Long-Term Strategy (Towards "Virtual Data") | 125 | 250 | 250 | 313 | 375
Project Management (APOGEE and PPDG) | 225 | 450 | 450 | 450 | 450
Commercial Software | 100 | 250 | 375 | 500 | 500
Testbed hardware (in addition to parasitic use of production systems) | 150 | 400 | 400 | 400 | 400
Workstations, M&S, Travel | 128 | 255 | 255 | 255 | 255
TOTALS | 1540 | 3230 | 3355 | 3480 | 3480
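The manpower lines in the funding table appear to follow from the FTE table at a flat rate per FTE. The sketch below checks this. The rates ($125k per FTE for technical effort and $225k per FTE for project management) are not stated on the slides; they are inferred here from the ratio of the two tables and should be read as an assumption. The FTE figures are the per-category sums of the manpower table.

```python
# Cross-check of the APOGEE funding totals against the FTE table.
# Assumption: per-FTE rates are inferred from the two tables, not stated in the talk.
YEARS = ["FY00", "FY01", "FY02", "FY03", "FY04"]

FTE_BY_CATEGORY = {
    "Instrumentation": [2.0, 4.0, 3.25, 3.0, 3.0],
    "Simulation": [2.75, 5.5, 5.25, 4.25, 3.0],
    "Instrumentation/Simulation Testbed": [1.0, 2.0, 2.5, 2.75, 3.0],
    "Evaluation and Optimization": [0.75, 1.5, 2.0, 2.5, 3.0],
    "Long-Term Strategy": [1.0, 2.0, 2.0, 2.5, 3.0],
    "Project Management": [1.0, 2.0, 2.0, 2.0, 2.0],
}

DEFAULT_RATE_K = 125.0                          # inferred $k per technical FTE
RATE_OVERRIDES_K = {"Project Management": 225.0}  # inferred $k per management FTE

# Non-manpower lines from the funding table, in $k:
# commercial software + testbed hardware + workstations, M&S, travel.
NON_MANPOWER_K = [100 + 150 + 128, 250 + 400 + 255, 375 + 400 + 255,
                  500 + 400 + 255, 500 + 400 + 255]

STATED_TOTALS_K = [1540, 3230, 3355, 3480, 3480]


def yearly_totals():
    """Total cost per fiscal year in $k: manpower at the inferred rates plus other lines."""
    totals = []
    for i in range(len(YEARS)):
        manpower = sum(fte[i] * RATE_OVERRIDES_K.get(category, DEFAULT_RATE_K)
                       for category, fte in FTE_BY_CATEGORY.items())
        totals.append(manpower + NON_MANPOWER_K[i])
    return totals


if __name__ == "__main__":
    for year, computed, stated in zip(YEARS, yearly_totals(), STATED_TOTALS_K):
        print(f"{year}: computed {computed:.1f} $k, stated {stated} $k")
```

Under these inferred rates the computed totals agree with the stated ones to within rounding of the individual table entries.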
GriPhyN Proposal : GriPhyN Proposal Addresses several massive dataset problems
ATLAS, CMS
LIGO
Sloan Digital Sky Survey (SDSS)
Tier 2 computing centers (university based)
Hardware: commodity CPU / disk / tape
System support
Networking
Transatlantic link to CERN: "high-speed"
Tier 2 backbone: multi-gigabit/sec
R&D
Leverage Tier 2 + existing resources into Grid
Computer Science partnership, software
GriPhyN Goals : GriPhyN Goals Build production grid
Exploit all computing resources most effectively
Enable US physicists to participate fully in LHC program (also LIGO, SDSS)
Eliminate disadvantage of not being at CERN
Early physics analysis at LHC startup
Maintain and extend US leadership
Build collaborative infrastructure for students & faculty
Training ground for next generation leaders
Tier 2 Regional Centers : Tier 2 Regional Centers Total number ≈20
ATLAS: 6
CMS: 6
LIGO: 5
SDSS: 2-3
Flexible architecture and mission complements national labs
Intermediate-level data handling
Makes possible regional collaborations
Well-suited to universities (training, mentoring and education)
Scale: Tier 2 = (university × laboratory)^(1/2)
1 scenario: Σ Tier 2 = Tier 1, Tier 2 ≈ 20% Tier 1
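The scale rule for a Tier 2 center reads as the geometric mean of a typical university facility and a national laboratory facility. The sketch below evaluates it for made-up capacities and also checks one reading of the scenario line, in which each Tier 2 center is about 20% of a Tier 1 center so that about five of them together match one Tier 1. The capacity numbers and units are hypothetical and chosen only to show the arithmetic.

```python
import math


def tier2_scale(university, laboratory):
    """Geometric mean of university-scale and laboratory-scale capacity."""
    return math.sqrt(university * laboratory)


if __name__ == "__main__":
    # Hypothetical capacities in arbitrary units (not from the talk).
    university_capacity = 1.0
    laboratory_capacity = 100.0
    print("Tier 2 scale:", tier2_scale(university_capacity, laboratory_capacity))

    # One reading of the scenario: five Tier 2 centers at 20% of Tier 1 each.
    centers_per_experiment = 5
    fraction_of_tier1 = 0.20
    print("Combined Tier 2 / Tier 1:", centers_per_experiment * fraction_of_tier1)
```

With a hundredfold gap between university and laboratory capacity, the geometric mean places a Tier 2 center a factor of ten above the former and a factor of ten below the latter.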
GriPhyN Funding (Very Rough) : GriPhyN Funding (Very Rough)
R&D Proposal $15M (Jan. 1999) : R&D Proposal $15M (Jan. 1999) R&D goals (complementary to APOGEE / PPDG)
Data, resource management over wide area
Fault-tolerant distributed computing over LAN
High-speed networks, as they relate to data management
Grid testbeds (with end-users)
Simulations crucial to success
MONARC group
With APOGEE / PPDG
Leverage resources available to us
Strong connections with Computer Science people
Existing R&D projects
Commercial connections
Grid Computing: Conclusions : Grid Computing: Conclusions HENP at the frontier of Information Technology
Collaboration with Computer Science;
Collaboration with industry;
Outreach to other sciences.
Visibility (and scrutiny) of HENP computing;
Enabling revolutionary advances in data analysis in the LHC era
Increasing the value of the vital investment in experiment-specific data-analysis software