Slide1 : Caltech HEP: Next Generation Networks, Grids and Collaborative Systems for Global VOs Harvey B. Newman
California Institute of Technology AOL Visit to Caltech September 23, 2003
Slide2 : First Beams: April 2007
Physics Runs: from Summer 2007 TOTEM pp, general purpose; HI LHCb: B-physics ALICE: HI pp √s = 14 TeV, L = 10³⁴ cm⁻² s⁻¹
27 km Tunnel in Switzerland & France CMS Design Reports: Computing Fall 2004; Physics Fall 2005 ATLAS
CMS: Higgs at LHC : Higgs to Two Photons; Higgs to Four Muons. General purpose pp detector; well-adapted to lower initial luminosity
Caltech Work on Crystal ECAL for precise e and γ measurements; Higgs Physics
Precise All-Silicon Tracker: 223 m². Excellent muon ID and precise momentum measurements (Tracker + Standalone Muon)
Caltech Work on Forward Muon Reco. & Trigger, XDAQ for Slice Tests FULL CMS SIMULATION
LHC: Higgs Decay into 4 muons (Tracker only); 1000X LEP Data Rate : 10⁹ events/sec, selectivity: 1 in 10¹³ (1 person in a thousand world populations)
The CMS Collaboration Is Progressing : 2000+ Physicists & Engineers
36 Countries
159 Institutions Slovak Republic CERN France Italy UK Switzerland USA Austria Finland Greece Hungary Belgium Poland Portugal Spain Pakistan Georgia Armenia Ukraine Uzbekistan Cyprus Croatia China Turkey Belarus Estonia India Germany Korea Russia Bulgaria China (Taiwan) USA NEW in US CMS FIU
YALE
South America: UERJ Brazil
LHC Data Grid Hierarchy: Developed at Caltech : Emerging Vision: A Richly Structured, Global Dynamic System
Next Generation Networks and Grids for HEP Experiments : Providing rapid access to event samples and analyzed physics results drawn from massive data stores
From Petabytes in 2003, ~100 Petabytes by 2007-8, to ~1 Exabyte by ~2013-5.
Providing analyzed results with rapid turnaround, by coordinating and managing large but LIMITED computing, data handling and NETWORK resources effectively
Enabling rapid access to the Data and the Collaboration
Across an ensemble of networks of varying capability
Advanced integrated applications, such as Data Grids, rely on seamless operation of our LANs and WANs
With reliable, monitored, quantifiable high performance Worldwide Analysis: Data explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams
2001 Transatlantic Net WG Bandwidth Requirements [*] : [*] See http://gate.hep.anl.gov/lprice/TAN. The 2001 LHC requirements outlook now looks Very Conservative in 2003
Production BW Growth of Int’l HENP Network Links (US-CERN Example) : Rate of Progress >> Moore’s Law.
9.6 kbps Analog (1985)
64-256 kbps Digital (1989 - 1994) [X 7 – 27]
1.5 Mbps Shared (1990-3; IBM) [X 160]
2 -4 Mbps (1996-1998) [X 200-400]
12-20 Mbps (1999-2000) [X 1.2k-2k]
155-310 Mbps (2001-2) [X 16k – 32k]
622 Mbps (2002-3) [X 65k]
2.5 Gbps (2003-4) [X 250k]
10 Gbps (2005) [X 1M]
A factor of ~1M over a period of 1985-2005 (a factor of ~5k during 1995-2005)
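The bracketed multipliers in the list above are each link speed divided by the 1985 baseline of 9.6 kbps. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of the original slides. The doubling-time helper converts the overall factor of ~1M in 20 years into a doubling time of about one year, which can be compared with the commonly quoted Moore's-law doubling time of 18 to 24 months (that figure is an assumption, not taken from the slides).

```python
import math

# 9.6 kbps analog link (1985), the baseline used for the bracketed factors.
BASELINE_BPS = 9.6e3


def growth_factor(bps: float, baseline_bps: float = BASELINE_BPS) -> float:
    """Return how many times faster a link is than the 1985 baseline."""
    return bps / baseline_bps


def doubling_time_years(factor: float, years: float) -> float:
    """Doubling time implied by a total growth factor over a span of years."""
    return years * math.log(2) / math.log(factor)


if __name__ == "__main__":
    for label, bps in [("64 kbps", 64e3), ("622 Mbps", 622e6), ("10 Gbps", 10e9)]:
        print(f"{label}: x{growth_factor(bps):,.0f}")
    # Factor of ~1M over 1985-2005.
    print(f"doubling time: {doubling_time_years(1e6, 20):.2f} years")
```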
HENP has become a leading applications driver, and also a co-developer of global networks
HEP is Learning How to Use Gbps Networks Fully: Factor of 25-100 Gain in Max. Sustained TCP Thruput in 15 Months, On Some US+TransAtlantic Routes : 9/01 105 Mbps 30 Streams: SLAC-IN2P3; 102 Mbps 1 Stream CIT-CERN
1/09/02 190 Mbps for One stream shared on Two 155 Mbps links
5/20/02 450-600 Mbps SLAC-Manchester on OC12 with ~100 Streams
6/1/02 290 Mbps Chicago-CERN One Stream on OC12 (mod. Kernel)
9/02 850, 1350, 1900 Mbps Chicago-CERN 1,2,3 GbE Streams, 2.5G Link
11/02 [LSR] 930 Mbps in 1 Stream California-CERN, and California-AMS FAST TCP 9.4 Gbps in 10 Flows California-Chicago
2/03 [LSR] 2.38 Gbps in 1 Stream California-Geneva (99% Link Utilization)
5/03 [LSR] 0.94 Gbps IPv6 in 1 Stream Chicago- Geneva
Fall 2003 Goal: 6-10 Gbps in 1 Stream over 7-10,000 km (10G Link); LSRs
FAST TCP: Baltimore/Sunnyvale : Measurements 11/03
Flows                 1     2     7     9     10
Average utilization   95%   92%   90%   90%   88%
Std Packet Size
Utilization averaged over > 1hr
4000 km Path
RTT estimation: fine-grain timer
Delay monitoring in equilibrium
Pacing: reducing burstiness
Fast convergence to equilibrium; Fair Sharing; Fast Recovery
8.6 Gbps; 21.6 TB in 6 Hours
10GigE Data Transfer: Internet2 LSR : On Feb. 27-28, a Terabyte of data was transferred in 3700 seconds by S. Ravot of Caltech between the Level3 PoP in Sunnyvale near SLAC and CERN, through the TeraGrid router at StarLight, from memory to memory, as a single TCP/IP stream at an average rate of 2.38 Gbps (using large windows and 9kB “Jumbo frames”). This beat the former record by a factor of ~2.5, and used the US-CERN link at 99% efficiency.
[Images: European Commission; 10GigE NIC]
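The average rate quoted for the record transfer can be checked from the size and duration. The sketch below is illustrative only; the function name is hypothetical. It suggests the quoted 2.38 Gbps matches a binary terabyte (2⁴⁰ bytes) better than a decimal one (10¹² bytes), which is an inference from the arithmetic rather than something the slide states.

```python
def average_rate_gbps(num_bytes: float, seconds: float) -> float:
    """Average throughput in gigabits per second (1 Gbps = 1e9 bit/s)."""
    return num_bytes * 8 / seconds / 1e9


BINARY_TERABYTE = 2**40    # bytes, assuming the slide's "Terabyte" is binary
DECIMAL_TERABYTE = 10**12  # bytes, the SI alternative

if __name__ == "__main__":
    print(f"binary TB:  {average_rate_gbps(BINARY_TERABYTE, 3700):.2f} Gbps")
    print(f"decimal TB: {average_rate_gbps(DECIMAL_TERABYTE, 3700):.2f} Gbps")
```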
Slide13 : “Private” Grids: Structured P2P Sub-Communities in Global HEP
HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps : Continuing the Trend: ~1000 Times Bandwidth Growth Per Decade; We are Rapidly Learning to Use Multi-Gbps Networks Dynamically
HENP Lambda Grids: Fibers for Physics : Problem: Extract “Small” Data Subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte Data Stores
Survivability of the HENP Global Grid System, with hundreds of such transactions per day (circa 2007) requires that each transaction be completed in a relatively short time.
Example: Take 800 secs to complete the transaction. Then
Transaction Size (TB)    Net Throughput (Gbps)
1                        10
10                       100
100                      1000 (Capacity of Fiber Today)
Summary: Providing Switching of 10 Gbps wavelengths within ~3-5 years; and Terabit Switching within 5-8 years would enable “Petascale Grids with Terabyte transactions”, to fully realize the discovery potential of major HENP programs, as well as other data-intensive fields.
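The throughput column in the 800-second example is the transaction size in bits divided by the completion time. A minimal sketch, assuming decimal terabytes (10¹² bytes); the function name is illustrative.

```python
def required_throughput_gbps(size_tb: float, seconds: float = 800.0) -> float:
    """Net throughput in Gbps needed to move size_tb terabytes in `seconds`."""
    bits = size_tb * 1e12 * 8  # decimal terabytes converted to bits
    return bits / seconds / 1e9


if __name__ == "__main__":
    for size_tb in (1, 10, 100):
        print(f"{size_tb:>4} TB -> {required_throughput_gbps(size_tb):,.0f} Gbps")
```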
The Move to OGSA and then Managed Integration Systems :
[Figure: increased functionality and standardization over time]
Custom solutions: App-specific Services
Globus Toolkit: De facto standards; GGF: GridFTP, GSI; X.509, LDAP, FTP, …
Open Grid Services Arch: GGF: OGSI, … (+ OASIS, W3C); Web services + …; Multiple implementations, including Globus Toolkit
~Integrated Systems: Stateful; Managed
Dynamic Distributed Services Architecture (DDSA) :
“Station Server” Services-engines at sites host “Dynamic Services”
Auto-discovering, Collaborative
Servers interconnect dynamically; form a robust fabric in which mobile agents travel, with a payload of (analysis) tasks
Service Agents: Goal-Oriented, Autonomous, Adaptive
Maintain State: Automatic “Event” notification
Adaptable to Web services: OGSA; many platforms & working environments (also mobile)
Caltech/UPB (Romania)/NUST (Pakistan) Collaboration
See http://monalisa.cacr.caltech.edu http://diamonds.cacr.caltech.edu
Slide18 : By I. Legrand (Caltech) et al.
Monitors Clusters, Networks
Agent-based Dynamic information / resource discovery mechanisms
Implemented in Java/Jini; SNMP; WSDL / SOAP with UDDI
Global System Optimizations
> 50 Sites and Growing
Being deployed in Abilene; through the Internet2 E2EPi
MonALISA: A Globally Scalable Grid Monitoring System
MonALISA (Java) 3D Interface
UltraLight Collaboration: http://ultralight.caltech.edu : First Integrated packet switched and circuit switched hybrid experimental research network; leveraging transoceanic R&D network partnerships
NLR Wave: 10 GbE (LAN-PHY) wave across the US; (G)MPLS managed
Optical paths transatlantic; extensions to Japan, Taiwan, Brazil
End-to-end monitoring; Realtime tracking and optimization; Dynamic bandwidth provisioning,
Agent-based services spanning all layers of the system, from the optical cross-connects to the applications.
Caltech, UF, FIU, UMich, SLAC, FNAL, MIT/Haystack, CERN, UERJ (Rio), NLR, CENIC, UCAID, Translight, UKLight, Netherlight, UvA, UCLondon, KEK, Taiwan
Cisco, Level(3)
Grid Analysis Environment: R&D Led by Caltech HEP : Building a GAE is the “Acid Test” for Grids; and is crucial for LHC experiments
Large, Diverse, Distributed Community of users
Support for hundreds to thousands of analysis tasks, shared among dozens of sites
Widely varying task requirements and priorities
Need for Priority Schemes, robust authentication and Security
Operation in a severely resource-limited and policy-constrained global system
Dominated by collaboration policy and strategy, for resource-usage and priorities
GAE is where the physics gets done
Where physicists learn to collaborate on analysis, across the country, and across world-regions
Grid Enabled Analysis: User View of a Collaborative Desktop : Physics analysis requires varying levels of interactivity, from “instantaneous response” to “background” to “batch mode”
Requires adapting the classical Grid “batch-oriented” view to a services-oriented view, with tasks monitored and tracked
Use Web Services, leveraging wide availability of commodity tools and protocols: adaptable to a variety of platforms
Implement the Clarens Web Services layer as mediator between authenticated clients and services as part of CAIGEE architecture
Clarens presents a consistent analysis environment to users, based on WSDL/SOAP or XML RPCs, with PKI-based authentication for Security
[Figure labels: PDA, ROOT, Clarens, External Services, MonaLisa, Browser, Iguana, VO Management, Authentication, Authorization, Logging, Key Escrow, File Access, Shell, Storage Resource Broker, CMS ORCA/COBRA, Cluster Schedulers, ATLAS DIAL, Griphyn VDT, MonaLisa Monitoring]
VRVS on Windows : VRVS (Version 3)
Meeting in 8 Time Zones
73 Reflectors Deployed Worldwide
Users in 83 Countries
Caltech HEP Group CONCLUSIONS : Caltech has been a leading inventor/developer of systems for Global VOs, spanning multiple technology generations
International Wide Area Networks Since 1982; Global role from 2000
Collaborative Systems (VRVS) Since 1994
Distributed Databases since 1996
The Data Grid Hierarchy and Dynamic Distributed Systems Since 1999
Work on Advanced Network Protocols from 2000
A Focus on the Grid-enabled Analysis Environment for Data Intensive Science Since 2001
Strong HEP/CACR/CS-EE Partnership [Bunn, Low]
Driven by the Search for New Physics at the TeV Energy Scale at the LHC
Unprecedented Challenges in Access, Processing, and Analysis of Petabyte to Exabyte Data; and Policy-Driven Global Resource Sharing
Broad Applicability Within and Beyond Science: Managed, Global Systems for Data Intensive and/or Realtime Applications
AOL Site Team: Many Apparent Synergies with Caltech Team: Areas of Interest, Technical Goals and Development Directions
Some Extra Slides Follow
U.S. CMS is Progressing: 400+ Members, 38 Institutions : New in 2002/3: FIU, Yale. Caltech has Led the US CMS Collaboration Board Since 1998; 3rd Term as Chair Through 2004+
Physics Potential of CMS: We Need to Be Ready on Day 1 : At L₀ = 2×10³³ cm⁻² s⁻¹
1 day ~ 60 pb⁻¹
1 month ~ 2 fb⁻¹
1 year ~ 20 fb⁻¹
[Figure labels: 3 months; 1 year; MH = 130 GeV]
LHCC: CMS detector is well optimized for LHC physics.
To fully exploit the physics potential of the LHC for discovery we will start with a “COMPLETE”* CMS detector.
In particular a complete ECAL from the beginning for the low mass H → γγ channel.
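The integrated luminosities quoted on this slide follow from multiplying the instantaneous luminosity by the effective running time. A minimal sketch, using the standard conversion 1 pb⁻¹ = 10³⁶ cm⁻². The effective running times it recovers (about 3×10⁴ s per day, 10⁶ s per month, 10⁷ s per year) are inferred from the slide's numbers, not stated on it, and the function names are illustrative.

```python
CM2_PER_INV_PB = 1e36  # 1 pb^-1 expressed in cm^-2 (1 pb = 1e-36 cm^2)


def integrated_luminosity_inv_pb(lumi_cm2_s: float, live_seconds: float) -> float:
    """Integrated luminosity in pb^-1 for a constant instantaneous luminosity."""
    return lumi_cm2_s * live_seconds / CM2_PER_INV_PB


def implied_live_seconds(inv_pb: float, lumi_cm2_s: float) -> float:
    """Effective running time implied by a quoted integrated luminosity."""
    return inv_pb * CM2_PER_INV_PB / lumi_cm2_s


if __name__ == "__main__":
    L0 = 2e33  # cm^-2 s^-1, from the slide
    print(f"1e7 s at L0: {integrated_luminosity_inv_pb(L0, 1e7) / 1000:.0f} fb^-1")
    print(f"60 pb^-1 per day implies {implied_live_seconds(60, L0):.0f} s of running")
```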
Caltech Role: Precision e/γ Physics With CMS : H⁰ → γγ In the CMS Precision ECAL
Crystal Quality in Mass Production
Precision Laser Monitoring
Study of Calibration Physics Channels
Inclusive J/ψ, Υ, W, Z
Realistic H⁰ → γγ Background Studies: 2.5 M Events
Signal/Bgd Optimization: γ/Jet Separation
Vertex Reconstruction with Associated Tracks
Photon Reconstruction: Pixels + ECAL + Tracker
Optimization of Tracker Layout
Higher Level Trig. On Isolated γ
ECAL Design: Crystal Sizes Cost-Optimized for γ/Jet Separation
CMS SUSY Reach :
The LHC could establish the existence of SUSY; study the masses and decays of SUSY particles
The cosmologically interesting region of the SUSY space could be covered in the first weeks of LHC running.
The 1.5 to 2 TeV mass range for squarks and gluinos could be covered within one year at low luminosity.
HCAL Barrels Done: Installing HCAL Endcap and Muon CSCs in SX5 : 36 Muon CSCs successfully installed on YE-2,3. Avg. rate 6/day (planned 4/day). Cabling+commissioning. HE-1 complete, HE+ will be mounted in Q4 2003
UltraLight: Proposed to the NSF/EIN Program : http://ultralight.caltech.edu First “Hybrid” packet-switched and circuit-switched optical network
Trans-US wavelength riding on NLR: LA-SNV-CHI-JAX
Leveraging advanced research & production networks
USLIC/DataTAG, SURFnet/NLlight, UKLight, Abilene, CA*net4
Dark fiber to CIT, SLAC, FNAL, UMich; Florida Light Rail
Intercont’l extensions: Rio de Janeiro, Tokyo, Taiwan
Three Flagship Applications
HENP: TByte to PByte “block” data transfers at 1-10+ Gbps
eVLBI: Real time data streams at 1 to several Gbps
Radiation Oncology: GByte image “bursts” delivered in ~1 second
A traffic mix presenting a variety of network challenges
UltraLight: An Ultra-scale Optical Network Laboratory for Next Generation Science : http://ultralight.caltech.edu Ultrascale protocols and MPLS: Classes of service used to share primary 10G efficiently
Scheduled or sudden “overflow” demands handled by provisioning additional wavelengths:
GE, N*GE, and eventually 10 GE
Use path diversity, e.g. across the Atlantic, Canada
Move to multiple 10G’s (leveraged) by 2005-6
Unique feature: agent-based, end-to-end monitored, dynamically provisioned mode of operation
Agent services span all layers of the system, communicating application characteristics and requirements to the protocol stacks, MPLS class provisioning and the optical cross-connects
Dynamic responses help manage traffic flow
History – One large Research Site : Current Traffic to ~400 Mbps; Projections: 0.5 to 24 Tbps by ~2012. Much of the Traffic: between SLAC and IN2P3/RAL/INFN; via ESnet+France; Abilene+CERN
VRVS Core Architecture : VRVS combined the best of all standards and products in one unique architecture
Multi-platform and multi-protocol architecture
MONARC/SONN: 3 Regional Centres Learning to Export Jobs (Day 9) : Building the LHC Computing Model: Focus on New Persistency Simulations for Strategy and System Services Development I. Legrand, F. van Lingen
[Figure, Day = 9: NUST 20 CPUs; CERN 30 CPUs; CALTECH 25 CPUs. Links: 1 MB/s, 150 ms RTT; 1.2 MB/s, 150 ms RTT; 0.8 MB/s, 200 ms RTT. Values shown: 0.73, 0.66, 0.83]
GAE Collaboration Desktop Example : Four-screen Analysis Desktop
4 Flat Panels: 5120 X 1024
Driven by a single server and single graphics card
Allows simultaneous work on:
Traditional analysis tools (e.g. ROOT)
Software development
Event displays (e.g. IGUANA)
MonALISA monitoring displays; Other “Grid Views”
Job-progress Views
Persistent collaboration (e.g. VRVS; shared windows)
Online event or detector monitoring
Web browsing, email
GAE Workshop: Components and Services; GAE Task Lifecycle : GAE Components & Services
VO authorization/management
Software Install/Config. Tools
Virtual Data System
Data Service Catalog (Metadata)
Replica Management Service
Data Mover/Delivery Service
[NEW]
Planners (Abstract; Concrete)
Job Execution Service
Data Collection Services – couples analysis selections/expressions to datasets/replicas
Estimators
Events; Strategic Error Handling; Adaptive Optimization
Grid-Based Analysis Task’s Life:
Authentication
DATA SELECTION
Query/Dataset Selection/??
Session Start
Establish Slave/server config.
Data Placement
Resource Broker for resource assignment
Or static configuration
Availability/Cost Estimation
Launch masters/slaves/Grid Execution services
ESTABLISH TASK – Initiate & Software Specification/Install
Execute (with dynamic Job Control)
Report Status (Logging/Metadata/partial results)
Task Completion (Cleanup, data merge/archive/catalog)
Task End
Task Save
LOOP to ESTABLISH TASK
LOOP to DATA SELECTION
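The task lifecycle listed above has two nested loops: an inner loop back to ESTABLISH TASK and an outer loop back to DATA SELECTION. The sketch below only illustrates that control flow; the step names are abbreviated from the slide, and the function and its input format are hypothetical, not part of any GAE software.

```python
from typing import Iterator, List

# Steps performed once per data selection (outer loop), abbreviated from the slide.
SESSION_STEPS = [
    "Data selection",
    "Session start",
    "Data placement",
    "Resource assignment",
    "Availability/cost estimation",
    "Launch execution services",
]

# Steps performed once per task (inner loop), abbreviated from the slide.
TASK_STEPS = [
    "Establish task",
    "Execute",
    "Report status",
    "Task completion",
    "Task end",
    "Task save",
]


def task_lifecycle(selections: List[List[str]]) -> Iterator[str]:
    """Yield lifecycle steps in order; each selection holds a list of task names."""
    yield "Authentication"
    for tasks in selections:          # LOOP to DATA SELECTION
        for step in SESSION_STEPS:
            yield step
        for task in tasks:            # LOOP to ESTABLISH TASK
            for step in TASK_STEPS:
                yield f"{task}: {step}"


if __name__ == "__main__":
    for line in task_lifecycle([["fit", "plot"], ["skim"]]):
        print(line)
```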
Grid Enabled Analysis Architecture : Michael Thomas July, 2003
HENP Networks and Grids; UltraLight : The network backbones and major links used by major HENP projects advanced rapidly in 2001-2
To the 2.5-10 G range in 15 months; much faster than Moore’s Law
Continuing a trend: a factor ~1000 improvement per decade
Network costs continue to fall rapidly
Transition to a community-owned and operated infrastructure for research and education is beginning (NLR, USAWaves)
HENP (Caltech/DataTAG/SLAC/LANL Team) is learning to use 1-10 Gbps networks effectively over long distances
Unique Fall Demos: to 10 Gbps flows over 10k km
A new HENP and DOE Roadmap: Gbps to Tbps links in ~10 Years
UltraLight: A hybrid packet-switched and circuit-switched network: ultrascale protocols, MPLS and dynamic provisioning
Sharing, augmenting NLR and internat’l optical infrastructures
May be a cost-effective model for future HENP, DOE networks