Slide1: Grid3 and Open Science Grid
Paul Avery, University of Florida, avery@phys.ufl.edu
Mardi Gras Conference, Louisiana State University, Baton Rouge, February 3, 2005

Data Grids & Collaborative Research:
- Scientific discovery increasingly dependent on collaboration
  - Computationally & data intensive analyses
  - Resources and collaborations distributed internationally
- Dominant factor: data growth (1 Petabyte = 1000 TB)
  - 2000: ~0.5 Petabyte
  - 2005: ~10 Petabytes
  - 2012: ~100 Petabytes
  - 2018: ~1000 Petabytes?
- Drives need for powerful linked resources: “Data Grids”
  - Computation: massive, distributed CPU
  - Data storage and access: distributed hi-speed disk and tape
  - Data movement: international optical networks
- Collaborative research and Data Grids
  - Data discovery, resource sharing, distributed analysis, etc.
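The data-growth figures above can be turned into implied annual growth rates. The sketch below is only an illustrative calculation. It assumes that the slide's approximate totals are exact and that growth compounds steadily between the quoted years. The dictionary `DATA_PB` and the helpers `annual_growth` and `doubling_time` are hypothetical names introduced here; they do not come from the talk.

```python
import math

# Approximate data totals (petabytes) as quoted on the slide.
# Assumption: treated as exact values; the 2018 entry is marked "?" on the slide.
DATA_PB = {2000: 0.5, 2005: 10, 2012: 100, 2018: 1000}


def annual_growth(year_a, year_b, volumes=DATA_PB):
    """Compound annual growth rate implied between two of the slide's data points."""
    factor = volumes[year_b] / volumes[year_a]
    return factor ** (1.0 / (year_b - year_a)) - 1.0


def doubling_time(rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    years = sorted(DATA_PB)
    # Growth rate and doubling time for each consecutive pair of years.
    for a, b in zip(years, years[1:]):
        r = annual_growth(a, b)
        print(f"{a}-{b}: {r:.0%} per year, doubling every {doubling_time(r):.1f} years")
    overall = annual_growth(years[0], years[-1])
    print(f"{years[0]}-{years[-1]}: {overall:.0%} per year overall")
```

Applied to the slide's numbers, the 2000-2018 projection works out to roughly 50% growth per year overall. That corresponds to a doubling time of well under two years. The 2005-2012 leg is slower, at about 40% per year.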
How to collect, manage, access and interpret this quantity of data?

Data Intensive Disciplines:
- High energy & nuclear physics
  - Belle, BaBar, Tevatron, RHIC, JLAB
  - Large Hadron Collider (LHC)
- Astronomy
  - Digital sky surveys, “Virtual” Observatories
  - VLBI arrays: multiple-Gbps data streams
- Gravity wave searches
  - LIGO, GEO, VIRGO, TAMA, ACIGA, …
- Earth and climate systems
  - Earth Observation, climate modeling, oceanography, …
- Biology, medicine, imaging
  - Genome databases
  - Proteomics (protein structure & interactions, drug delivery, …)
  - High-res brain scans (1-10 μm, time dependent)

Background: Data Grid Projects (U.S. projects):
- GriPhyN (NSF)
- iVDGL (NSF)
- Particle Physics Data Grid (DOE)
- Open Science Grid
- UltraLight
- TeraGrid (NSF)
- DOE Science Grid (DOE)
- NEESgrid (NSF)
- NSF Middleware Initiative (NSF)

Background: Data Grid Projects (EU, Asia projects):
- EGEE (EU)
- LCG (CERN)
- DataGrid
- EU national projects
- DataTAG (EU)
- CrossGrid (EU)
- GridLab (EU)
- Japanese, Korean projects
- Not exclusively HEP (but many driven/led by HEP)
- Many 10s x $M brought into the field
- Large impact on other sciences, education
(Slide legend: HEP led projects)

U.S. “Trillium” Grid Consortium:
- Trillium = PPDG + GriPhyN + iVDGL
  - Particle Physics Data Grid: $12M (DOE) (1999 – 2004+)
  - GriPhyN: $12M (NSF) (2000 – 2005)
  - iVDGL: $14M (NSF) (2001 – 2006)
- Basic composition (~150 people)
  - PPDG: 4 universities, 6 labs
  - GriPhyN: 12 universities, SDSC, 3 labs
  - iVDGL: 18 universities, SDSC, 4 labs, foreign partners
  - Expts: BaBar, D0, STAR, Jlab, CMS, ATLAS, LIGO, SDSS/NVO
- Complementarity of projects
  - GriPhyN: CS research, Virtual Data Toolkit (VDT) development
  - PPDG: “End to end” Grid services, monitoring, analysis
  - iVDGL: Grid laboratory deployment using VDT
  - Experiments provide frontier challenges
  - Unified entity when collaborating internationally

Goal: Peta-scale Data Grids for Global Science:
[Figure: architecture diagram. Users: Production Team, Single Researcher, Workgroups. Components: Interactive User Tools; Virtual Data Tools; Request Planning & Scheduling Tools; Request Execution & Management Tools; Resource Management Services; Security and Policy Services; Other Grid Services; Distributed resources (code, storage, CPUs, networks); Raw data source. Scale labels: PetaOps, Petabytes, Performance.]

Trillium Science Drivers:
- Experiments at Large Hadron Collider: 100s of Petabytes, 2007 - ?
- High Energy & Nuclear Physics expts: ~1 Petabyte (1000 TB), 1997 – present
- LIGO (gravity wave search): 100s of Terabytes, 2002 – present
- Sloan Digital Sky Survey: 10s of Terabytes, 2001 – present
- Future Grid resources
  - Massive CPU (PetaOps)
  - Large distributed datasets (>100PB)
  - Global communities (1000s)

Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN [Figure]

The LIGO Scientific Collaboration (LSC) and the LIGO Grid:
- LIGO Grid: 6 US sites
- iVDGL has enabled LSC to establish a persistent production grid

Large Hadron Collider (LHC) @ CERN:
- Search for Origin of Mass & Supersymmetry (2007 – ?)
[Figure: 27 km tunnel in Switzerland & France, with the ATLAS, CMS, ALICE, LHCb and TOTEM experiments]

LHC Data Rates: Detector to Storage:
- 40 MHz collision rate (~TBytes/sec), physics filtering
- Level 1 Trigger (special hardware): 75 kHz, 75 GB/sec
- Level 2 Trigger (commodity CPUs): 5 kHz, 5 GB/sec
- Level 3 Trigger (commodity CPUs): 100 Hz, 0.1 – 1.5 GB/sec
- Raw data to storage (+ simulated data)

Complexity: Higgs Decay into 4 muons:
- 10^9 collisions/sec, selectivity: 1 in 10^13

LHC: Petascale Global Science:
- Complexity: Millions of individual detector channels
- Scale: PetaOps (CPU), 100s of Petabytes (Data)
- Distribution: Global distribution of people & resources
  - CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries
  - BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries

LHC Global Collaborations (ATLAS, CMS):
- 1000 – 4000 per experiment
- USA is 20 – 25% of total

LHC Global Data Grid (2007+):
- CMS Experiment: 5000 physicists, 60 countries
- 10s of Petabytes/yr by 2008
- 1000 Petabytes in < 10 yrs?
[Figure: tiered data flow from the Online System (0.1 - 1.5 GB/s) to the CERN Computer Center (Tier 0), then through Tier 1, Tier 2, Tier 3 and Tier 4 (physics caches, PCs); inter-tier links labeled >10 Gb/s, 10-40 Gb/s and 2.5-10 Gb/s]

CMS: Grid Enabled Analysis (GAE) Architecture:
- Clients talk standard protocols to the “Grid Services Web Server”
- Simple Web service API allows simple or complex analysis clients
- Typical clients: ROOT, Web Browser, …
- Clarens portal hides complexity
- Key features: Global Scheduler, Catalogs, Monitoring, Grid-wide Execution service
[Figure: GAE architecture. Analysis Client; Grid Services Web Server; protocols: HTTP, SOAP, XML-RPC; Scheduler; Catalogs (Metadata, Virtual Data, Replica); Fully-Abstract, Partially-Abstract and Fully-Concrete Planners; Data Management; Monitoring; Execution Priority Manager; Grid Wide Execution Service; Applications. Tools shown: Clarens, Chimera, Sphinx, MonALISA, ROOT (analysis tool), Python, Cojac (detector viz)/IGUANA (CMS viz), MCRunjob, BOSS, RefDB, POOL, ORCA, FAMOS, VDT-Server, MOPDB. Also: Discovery, ACL management, certificate-based access.]

Collaborative Research by Globally Distributed Teams:
- Non-hierarchical: chaotic analyses + productions
- Superimpose significant random data flows

Trillium Grid Tools: Virtual Data Toolkit:
[Figure: VDT build process. Sources (CVS), patching, GPT src bundles; NMI Build & Test Condor pool (37 computers); build, test, package; contributors (VDS, etc.); outputs: Pacman cache, RPMs, binaries; note: “Use NMI processes later”]

VDT Growth Over 3 Years:
[Figure: VDT releases over time, annotated: VDT 1.0 (Globus 2.0b, Condor 6.3.1); VDT 1.1.3, 1.1.4 & 1.1.5 (pre-SC 2002); VDT 1.1.7 (switch to Globus 2.2); VDT 1.1.8 (first real use by LCG); VDT 1.1.11 (Grid3); VDT 1.1.14 (May 10)]

Packaging of Grid Software: Pacman:
- Language: define software environments
- Interpreter: create, install, configure, update, verify environments
- Version 3.0.2 released Jan. 2005
- Combine and manage software from arbitrary sources: LCG/Scram, ATLAS/CMT, CMS DPE/tar/make, LIGO/tar/make, OpenSource/tar/make, Globus/GPT, NPACI/TeraGrid/tar/make, D0/UPS-UPD, Commercial/tar/make
- “1 button install”:
    % pacman -get iVDGL:Grid3
  - Reduce burden on administrators
  - Remote experts define installation/config/updating for everyone at once

Collaborative Relationships: CS Perspective:
[Figure: Computer Science Research and the Virtual Data Toolkit linked to partner science, networking and outreach projects and the larger science community (Globus, Condor, NMI, iVDGL, PPDG; EU DataGrid, LHC Experiments, QuarkNet, CHEPREO, Dig. Divide). Flows: production deployment, tech transfer, techniques & software, requirements, prototyping & experiments. Other linkages: work force, CS researchers, industry, U.S. Grids, Int’l, Outreach.]

Slide22: Grid3: An Operational National Grid:
- 35 sites, 3500 CPUs: Universities + 4 national labs
- Part of LHC Grid
- Running since October 2003
- Applications in HEP, LIGO, SDSS, Genomics, CS
- http://www.ivdgl.org/grid3

Grid3 Applications:
- High energy physics
  - US-ATLAS analysis (DIAL)
  - US-ATLAS GEANT3 simulation (GCE)
  - US-CMS GEANT4 simulation (MOP)
  - BTeV simulation
- Gravity waves
  - LIGO: blind search for continuous sources
- Digital astronomy
  - SDSS: cluster finding (maxBcg)
- Bioinformatics
  - Bio-molecular analysis (SnB)
  - Genome analysis (GADU/Gnare)
- CS Demonstrators
  - Job Exerciser, GridFTP, NetLogger-grid2003

Grid3 Shared Use Over 6 months:
[Figure: Grid3 CPU usage over six months (axis: CPUs); marker “Sep 10”]

Grid3 → Open Science Grid:
- Iteratively build & extend Grid3 to national infrastructure
  - Shared resources, benefiting broad set of disciplines
  - Realization of the critical need for operations
  - More formal organization needed because of scale
- Build OSG from laboratories, universities, campus grids, etc.
  - Argonne, Fermilab, SLAC, Brookhaven, Berkeley Lab, Jeff. Lab
  - UW Madison, U Florida, Purdue, Chicago, Caltech, Harvard, etc.
- Further develop OSG
  - Partnerships and contributions from other sciences, universities
  - Incorporation of advanced networking
  - Focus on general services, operations, end-to-end performance

Slide26: http://www.opensciencegrid.org

Open Science Grid Basics:
- OSG infrastructure
  - A large CPU & storage Grid infrastructure supporting science
  - Grid middleware based on Virtual Data Toolkit (VDT)
  - Loosely coupled, consistent infrastructure: “Grid of Grids”
  - Emphasis on “end to end” services for applications
- OSG collaboration builds on Grid3
  - Computer and application scientists
  - Facility, technology and resource providers
  - Grid3 → OSG-0 → OSG-1 → OSG-2 → …
- Fundamental unit is the Virtual Organization (VO)
  - E.g., an experimental collaboration, a research group, a class
  - Simplifies organization and logistics
- Distributed ownership of resources
  - Local facility policies, priorities, and capabilities must be supported

OSG Integration: Applications, Infrastructure, Facilities [Figure]

OSG Organization:
[Figure: organization chart. Enterprise: Technical Groups, Research Grid Projects, VOs, Researchers, Sites, Service Providers, Universities, Labs. Advisory Committee. Core OSG Staff (few FTEs, manager). OSG Council (all members above a certain threshold, Chair, officers). Executive Board (8-15 representatives, Chair, officers).]

OSG Technical Groups and Activities:
- Technical Groups address and coordinate a technical area
  - Propose and carry out activities related to their given areas
  - Liaise & collaborate with other peer projects (U.S. & international)
  - Participate in relevant standards organizations
  - Chairs participate in Blueprint, Grid Integration and Deployment activities
- Activities are well-defined, scoped sets of tasks contributing to the OSG
  - Each Activity has deliverables and a plan, is self-organized and operated, and is overseen & sponsored by one or more Technical Groups

OSG Technical Groups (7 currently):
- Governance: Charter, organization, by-laws, agreements, formal processes
- Policy: VO & site policy, authorization, priorities, privilege & access rights
- Security: Common security principles, security infrastructure
- Monitoring and Information Services: Resource monitoring, information services, auditing, troubleshooting
- Storage: Storage services at remote sites, interfaces, interoperability
- Support Centers: Infrastructure and services for user support, helpdesk, trouble ticket
- Education and Outreach
- Networking?

OSG Activities (5 currently):
- Blueprint: Defining principles and best practices for OSG
- Deployment: Deployment of resources & services
- Incident Response: Plans and procedures for responding to security incidents
- Integration: Testing, validating & integrating new services and technologies
- Data Resource Management (DRM): Deployment of specific Storage Resource Management technology

OSG Short Term Plans:
- Maintain Grid3 operations
  - In parallel with extending Grid3 to OSG
- OSG technology advances for Spring 2005 deployment
  - Add full Storage Elements
  - Extend Authorization services
  - Extend Data Management services
  - Interface to sub-Grids
  - Extend monitoring, testing, accounting
  - Add new VOs + OSG-wide VO Services
  - Add Discovery Service
- Service challenges & collaboration with the LCG
- Make the switch to “Open Science Grid” in Spring 2005

Open Science Grid Meetings:
- Sep. 17, 2003 @ NSF: Strong interest of NSF education people
- Jan. 12, 2004 @ Fermilab: Initial stakeholders meeting, 1st discussion of governance
- May 20-21, 2004 @ Univ. of Chicago: Joint Trillium Steering meeting to define OSG program
- July 2004 @ Wisconsin: First attempt to define OSG Blueprint (document)
- Sep. 9-10, 2004 @ Harvard: Major OSG workshop: Technical, Governance, Sciences
- Dec. 15-17, 2004 @ UCSD: Major meeting for Technical Groups
- Feb. 15-17, 2005 @ U Chicago: Integration meeting

Slide35: Networks

Networks and Grids for Global Science:
- Network backbones and major links are advancing rapidly
  - To the 10G range in < 3 years; faster than Moore’s Law
  - New HENP and DOE Roadmaps: a factor ~1000 BW growth per decade
- We are learning to use long distance 10 Gbps networks effectively
  - 2004 developments: up to 7 - 7.5 Gbps flows with TCP over 16-25 kkm
- Transition to community-operated optical R&E networks
  - US, CA, NL, PL, CZ, SK, KR, JP …
  - Emergence of a new generation of “hybrid” optical networks
- We must work to close the digital divide
  - To allow scientists in all world regions to take part in discoveries
  - Regional, last mile, local bottlenecks and compromises in network quality are now on the critical path
- Important examples on the road to closing the digital divide
  - CLARA, CHEPREO, and the Brazil HEPGrid in Latin America
  - Optical networking in Central and Southeast Europe
  - APAN Links in the Asia Pacific: GLORIAD and TEIN
  - Leadership and Outreach: HEP Groups in Europe, US, Japan, & Korea

HEP Bandwidth Roadmap (Gb/s) [Table]

Evolving Science Requirements for Networks (DOE High Perf. Network Workshop):
- See http://www.doecollaboratory.org/meetings/hpnpw/

UltraLight: Advanced Networking in Applications:
- 10 Gb/s+ network
- Caltech, UF, FIU, UM, MIT, SLAC, FNAL
- Int’l partners
- Level(3), Cisco, NLR
- Funded by ITR2004

Slide40: Education and Outreach

Grids and the Digital Divide, Rio de Janeiro, Feb. 16-20, 2004:
[Screenshot: conference web site]
- Background
  - World Summit on Information Society
  - HEP Standing Committee on Inter-regional Connectivity (SCIC)
- Themes
  - Global collaborations, Grids and addressing the Digital Divide
- Next meeting: May 2005 (Korea)
- http://www.uerj.br/lishep2004

Second Digital Divide Grid Meeting:
- International Workshop on HEP Networking, Grids and Digital Divide Issues for Global e-Science
- May 23-27, 2005, Daegu, Korea
- Prof. Dongchul Son, Center for High Energy Physics, Kyungpook National University

iVDGL, GriPhyN Education / Outreach:
- Basics
  - $200K/yr
  - Led by UT Brownsville
  - Workshops, portals
  - Partnerships with CHEPREO, QuarkNet, …

June 21-25 Grid Summer School:
- First of its kind in the U.S. (South Padre Island, Texas)
  - 36 students, diverse origins and types (M, F, MSIs, etc.)
- Marks new direction for U.S. Grid efforts
  - First attempt to systematically train people in Grid technologies
  - First attempt to gather relevant materials in one place
  - Today: Students in CS and Physics
  - Later: Students, postdocs, junior & senior scientists
- Reaching a wider audience
  - Put lectures, exercises, video on the web
  - More tutorials, perhaps 2+/year
  - Dedicated resources for remote tutorials
  - Create “Grid book”, e.g. Georgia Tech
- New funding opportunities
  - NSF: new training & education programs

CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University:
- Physics Learning Center
- CMS Research
- iVDGL Grid Activities
- AMPATH network (S. America)
- Funded September 2003
  - $4M initially (3 years)
  - 4 NSF Directorates!

QuarkNet/GriPhyN e-Lab Project [Figure]

Chiron/QuarkNet Architecture [Figure]

Muon Lifetime Analysis Workflow [Figure]

QuarkNet Portal Architecture:
- Simpler interface for non-experts
- Builds on Chiron portal

Summary:
- Grids enable 21st century collaborative science
  - Linking research communities and resources for scientific discovery
  - Needed by LHC global collaborations pursuing “petascale” science
- Grid3 was an important first step in developing US Grids
  - Value of planning, coordination, testbeds, rapid feedback
  - Value of building & sustaining community relationships
  - Value of learning how to operate Grid as a facility
  - Value of delegation, services, documentation, packaging
- Grids drive need for advanced optical networks
- Grids impact education and outreach
  - Providing technologies & resources for training, education, outreach
  - Addressing the Digital Divide
- OSG: a scalable computing infrastructure for science?
  - Strategies needed to cope with increasingly large scale

Grid Project References:
- Grid3: www.ivdgl.org/grid3
- Open Science Grid: www.opensciencegrid.org
- GriPhyN: www.griphyn.org
- iVDGL: www.ivdgl.org
- PPDG: www.ppdg.net
- CHEPREO: www.chepreo.org
- UltraLight: ultralight.cacr.caltech.edu
- Globus: www.globus.org
- LCG: www.cern.ch/lcg
- EU DataGrid: www.eu-datagrid.org
- EGEE: www.eu-egee.org
America) Funded September 2003 $4M initially (3 years) 4 NSF Directorates!QuarkNet/GriPhyN e-Lab Project: QuarkNet/GriPhyN e-Lab ProjectChiron/QuarkNet Architecture: Chiron/QuarkNet ArchitectureMuon Lifetime Analysis Workflow: Muon Lifetime Analysis WorkflowQuarkNet Portal Architecture: QuarkNet Portal Architecture Simpler interface for non-experts Builds on Chiron portalSummary: Summary Grids enable 21st century collaborative science Linking research communities and resources for scientific discovery Needed by LHC global collaborations pursuing “petascale” science Grid3 was an important first step in developing US Grids Value of planning, coordination, testbeds, rapid feedback Value of building & sustaining community relationships Value of learning how to operate Grid as a facility Value of delegation, services, documentation, packaging Grids drive need for advanced optical networks Grids impact education and outreach Providing technologies & resources for training, education, outreach Addressing the Digital Divide OSG: a scalable computing infrastructure for science? Strategies needed to cope with increasingly large scaleGrid Project References: Grid Project References Grid3 www.ivdgl.org/grid3 Open Science Grid www.opensciencegrid.org GriPhyN www.griphyn.org iVDGL www.ivdgl.org PPDG www.ppdg.net CHEPREO www.chepreo.org UltraLight ultralight.cacr.caltech.edu Globus www.globus.org LCG www.cern.ch/lcg EU DataGrid www.eu-datagrid.org EGEE www.eu-egee.org
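The bandwidth figures on the “Networks and Grids for Global Science” slide can be sanity-checked with a little arithmetic. This is a rough sketch resting on assumptions that do not come from the slides: Moore’s Law is taken as one doubling every 18 months, light in optical fiber is taken to travel at about 2 × 10^8 m/s, and the 16,000-25,000 km figures are treated as one-way path lengths with propagation delay dominating the round-trip time.

```latex
\begin{align*}
% Roadmap growth of ~1000x per decade, expressed as a per-year factor
g_{\text{net}} &= 1000^{1/10} = 10^{0.3} \approx 2.0
  \quad \text{(roughly doubling every year)} \\
% Moore's Law over the same decade, assuming one doubling every 18 months
G_{\text{Moore}} &= 2^{120/18} \approx 2^{6.67} \approx 100 \ll 1000 \\
% Round-trip time for an assumed 25,000 km one-way fiber path
\text{RTT} &\approx \frac{2 \times 2.5 \times 10^{7}\,\text{m}}{2 \times 10^{8}\,\text{m/s}} = 0.25\,\text{s} \\
% Bandwidth-delay product for a single 7.5 Gbps TCP flow over that path
\text{BDP} &= 7.5 \times 10^{9}\,\text{b/s} \times 0.25\,\text{s}
  \approx 1.9 \times 10^{9}\,\text{bits} \approx 234\,\text{MB}
\end{align*}
```

Under the same assumptions the 16,000 km case gives RTT ≈ 0.16 s and a BDP of about 150 MB. A single TCP connection would therefore need a window on the order of 150-234 MB, far above the 64 KB maximum that TCP allows without the window-scale option, which is one reason sustained multi-Gbps flows over such distances counted as a notable 2004 development.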