Database Access Patterns on a Federation of World-Wide Computational Grids
First DIALOGUE Workshop: Applications-Driven Issues in Data Grids, August 1, 2005, Columbus, OH
Alexandre Vaniachine, David Malon (ANL), Pavel Nevski (BNL), Yulia Shapiro (CERN)

Outline: Applications domain: LHC and ATLAS; increasingly complex data models. ATLAS Data Challenges: general issues encountered: increased fluctuations in database server load; connection count limitations. Application-side solution: database client library; not domain specific.

Introduction: The Large Hadron Collider (LHC) at the CERN Laboratory in Switzerland will be the largest scientific instrument in the world, beginning operations in 2007. This facility will provide the opportunity for new discoveries in particle physics.

LHC/CERN: [Figure labels: Mont Blanc, 4810 m; Geneva; LHC/CERN.]

ATLAS Experiment: 2000 scientists, 150 institutes, 34 countries. [Figure labels: D = 25 m; L = 46 m; weight 7000 tons; figure for scale.]

Underground Cavern: Scientists working in a hard hat area. Tile Calorimeter undergoes testing.

First Underground Events: The ATLAS barrel Tile calorimeter has recorded its first events underground using a cosmic
ray trigger, as part of the detector commissioning program.

ATLAS Data Challenges: To address petabyte-scale data processing challenges, LHC experiments are deploying distributed data management solutions. Besides the file-based data, LHC applications require access to terabytes of data stored in relational databases. In preparation for data taking, the ATLAS experiment runs a series of large-scale computational exercises, the Data Challenges, validating the distributed data grid solutions under development. ATLAS Data Challenges run on a world-wide federation of computational grids, harnessing the power of more than twenty thousand processors.

ATLAS Federation of Grids: [Figure labels: 14 kCPU, 140 sites; 5.7 kCPU, 49 sites; 3.5 kCPU, 30 sites; July 20: 15 kCPU, 50 sites.]

Spread of Grid Production: Most recent production period: 84 sites in 22 countries. [Figure labels: U.S.; CERN.]

Databases and the Grids

Beyond the Grid Infrastructure: Experience from the deployment of grid technologies in the HEP production environment of the ATLAS Data Challenges demonstrated that a naïve view of the grid as a p2p system is inadequate. Operations show that a single database on top is not enough to run the production system effectively. A hyperinfrastructure of databases below raises the production system efficiency considerably. Thus a hyperinfrastructure of databases on the grid plays a dual role: a built-in part of the middleware (monitoring, catalogs, etc.)
and a distributed infrastructure of the production system, necessary to run the scatter-gather data processing applications on the grid.

Data Mining of Operations: Data mining of the collected operations data reveals a striking feature: a very high degree of correlation between failures. If a job submitted to some cluster failed, there is a high probability that the next job submitted to that cluster would fail too. If the submit host failed, all the jobs scattered over different clusters will fail too. Taking these correlations into account is not yet automated by the grid middleware. That is why production databases and grid monitoring data, which provide immediate feedback on Data Challenge operations to the production operators, are very important for efficient utilization of the Grid capacities.

Increased Fluctuations: Among the database performance issues encountered is the increase in fluctuations in database servers’ workloads due to the chaotic nature of grid computations. The observed fluctuations in database access patterns are of a general nature and must be addressed by grid middleware solutions.

Scalability Challenge: Database service capacities should be adequate for peak demand. The chaotic nature of Grid computing increases fluctuations in demand for database services.

Production Rate Growth: [Figure annotation: CERN Database Capacities Bottleneck.]

Jobs Load: No apparent correlations between jobs and database server load fluctuations. Job failures?

ATLAS Efficiency on LCG: Efficiency fluctuates because clusters are “large error amplifiers”?

ATLAS Efficiency on NorduGrid: Fluctuations are limited when efficiency is close to the 100% boundary.

ATLAS Efficiency on Grid3: ATLAS achieved its highest overall production efficiency on Grid3 (now OSG).

A Side Note on a New Computing Paradigm: In comparison to the old centralized computing, Grid efficiency may look small. Splitting a large computational task into smaller tasks (jobs) in the Grid computing paradigm is similar to splitting a large file into smaller TCP/IP packets during an FTP data transfer. Do you know how many TCP/IP packets were lost during the file transfer? Do you even care? But, of course, somebody does: watch for the emergence of a new profession in Grid computing, like the network engineer who cares about the lost TCP/IP packets.

Limiting Resource: In some cases the connection count happens to be the limiting resource. The default shell configuration limits the maximum number of open files per process, which limits the server connection count to 1,000 per server. Depending on the Linux kernel version, the limit can be reconfigured to 4,000 or 10,000.

Complex Data Access Patterns: [Diagram labels: Athena Applications; Online Applications; User Applications; Database Connections; Database Servers.]

Client Library: To improve the robustness of database access in a data grid environment, we propose an application-side solution: a software component abstracting the database and/or middleware connectivity concerns in a generalized database client library.

Server Indirection: One of the lessons learnt in the ATLAS Data Challenges is that the database server address should NOT be hardwired in data processing transformations. The logical-physical indirection for database servers is now introduced in ATLAS. It is similar to the logical-physical file indirection of the Replica Location Service in the Grid file catalogs. It is being adopted by the common LHC project.

Connection Service: DBConnectionSvc implements an application service for resolving the logical-physical mapping and for access to the connection management library. Logical-physical resolution can be done against a local file, updated from the Catalogue at a configurable time period, or directly against the
Catalogue residing on an Oracle DB.

Logical-Physical Mapping Status and Plans: The Logical-Physical Catalogue will be replicated, and the user can provide a list of possible Catalogues for a failover scenario. The Catalogue is a table with mappings of each logical DB name to its replicas, and of server names to their redirections. Work is at an early stage to provide the logical-physical mapping as a web service, based on load balancing and the client’s physical location, as a first prototype.

Connection Management: This module provides client-side connection management capabilities, including: connection pooling; logical-physical mapping for the database connection string; policies for retries and connection timeouts; failover to a replica.

Summary: The chaotic nature of Grid computing increases fluctuations in database services demand. In the high energy physics domain, applications-driven issues require changes in both: client technology; server technology (see the other talk: “High-performance Database Access Technologies for Computational Grids”).
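
The client-side capabilities listed under Connection Management (logical-physical mapping, retries, failover to a replica) can be illustrated with a minimal Python sketch. This is not the ATLAS library or DBConnectionSvc: the catalogue contents, the server names, and the functions `resolve` and `connect_with_failover` are all hypothetical, the catalogue is a plain in-memory dictionary rather than a file or Oracle table, and connection pooling is left out for brevity.

```python
import time
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")

# Hypothetical catalogue: logical database name -> ordered physical replicas.
CATALOGUE: Dict[str, List[str]] = {
    "conditions_db": ["primary.example.org:1521", "replica1.example.org:1521"],
}


def resolve(logical_name: str, catalogue: Dict[str, List[str]]) -> List[str]:
    """Return the physical servers registered for a logical database name."""
    try:
        return list(catalogue[logical_name])
    except KeyError:
        raise LookupError(f"no replicas registered for {logical_name!r}") from None


def connect_with_failover(
    logical_name: str,
    open_connection: Callable[[str], T],
    catalogue: Dict[str, List[str]] = CATALOGUE,
    retries_per_server: int = 2,
    retry_delay_s: float = 0.0,
) -> T:
    """Try each replica in order, retrying a few times before failing over."""
    last_error: Optional[Exception] = None
    for server in resolve(logical_name, catalogue):
        for _attempt in range(retries_per_server):
            try:
                return open_connection(server)
            except ConnectionError as exc:
                last_error = exc
                time.sleep(retry_delay_s)
    raise ConnectionError(f"all replicas of {logical_name!r} failed") from last_error


# Stand-in for a real database driver: the primary always refuses connections.
def fake_open(server: str) -> str:
    if server.startswith("primary"):
        raise ConnectionError("primary is overloaded")
    return f"connected to {server}"


print(connect_with_failover("conditions_db", fake_open))
# prints: connected to replica1.example.org:1521
```

Because the application only names the logical database, the server address is never hardwired in the job itself, which is the point of the Server Indirection slide; a real client would pass its database driver's connect call in place of `fake_open`.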
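
The open-files limit mentioned under Limiting Resource can be inspected from inside a process. The following Python sketch uses the standard `resource` module (Unix only) to read the limit and estimate how many connections fit under it. The helper names and the number of descriptors reserved for other uses are illustrative assumptions, not figures from the talk.

```python
import resource
from typing import Tuple


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def connection_budget(soft_limit: int, reserved_fds: int = 24) -> int:
    """Estimate how many connections fit under a descriptor limit.

    Each open connection consumes one file descriptor; reserved_fds is an
    assumed allowance for log files, libraries and standard streams.
    """
    return max(soft_limit - reserved_fds, 0)


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")

# With a typical default soft limit of 1024 descriptors:
print(connection_budget(1024))  # prints: 1000
```

A server started from a shell with such a default limit runs out of descriptors near the 1,000-connection mark described in the talk, well before CPU or memory becomes the constraint.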