Database Access Patterns

Database Access Patterns on a Federation of World-Wide Computational Grids: 

First DIALOGUE Workshop: Applications-Driven Issues in Data Grids
August 1, 2005, Columbus, OH
Alexandre Vaniachine, David Malon (ANL), Pavel Nevski (BNL), Yulia Shapiro (CERN)

Outline: 

- Applications domain: LHC and ATLAS
  - increasingly complex data models
- ATLAS Data Challenges
- General issues encountered:
  - increased fluctuations in database server load
  - connections count limitations
- Application-side solution: database client library
  - not domain specific

Introduction: 

The Large Hadron Collider (LHC) at the CERN Laboratory in Switzerland, beginning operations in 2007, will be the largest scientific instrument in the world.
This facility will provide the opportunity for new discoveries in particle physics.

LHC/CERN: 

Map labels: Mont Blanc (4810 m), Geneva, LHC/CERN

ATLAS Experiment: 

2000 scientists, 150 institutes, 34 countries
D = 25 m, L = 46 m, weight 7000 tons
Figure for scale

Underground Cavern: 

Scientists working in a hard hat area
Tile Calorimeter undergoes testing

First Underground Events: 

The ATLAS barrel Tile Calorimeter has recorded its first events underground using a cosmic ray trigger, as part of the detector commissioning program.

ATLAS Data Challenges: 

To address petabyte-scale data processing challenges, LHC experiments are deploying distributed data management solutions.
Besides the file-based data, LHC applications require access to terabytes of data stored in relational databases.
In preparation for data taking, the ATLAS experiment runs a series of large-scale computational exercises, the Data Challenges, validating the distributed data grid solutions under development.
ATLAS Data Challenges run on a world-wide federation of computational grids, harnessing the power of more than twenty thousand processors.

ATLAS Federation of Grids: 

14 kCPU, 140 sites
5.7 kCPU, 49 sites
3.5 kCPU, 30 sites
July 20: 15 kCPU, 50 sites

Spread of Grid Production: 

Most recent production period: 84 sites in 22 countries
Map labels: U.S., CERN

Databases and the Grids: 

Databases and the Grids

Beyond the Grid Infrastructure: 

Experience from the deployment of grid technologies in the HEP production environment of the ATLAS Data Challenges demonstrated that a naïve view of the grid as a p2p system is inadequate.
Operations show that just a single database on top is not enough to run the production system effectively.
A hyperinfrastructure of databases below raises the production system efficiency considerably.
Thus a hyperinfrastructure of databases on the grid plays a dual role:
- a built-in part of the middleware (monitoring, catalogs, etc.)
- a distributed infrastructure of the production system, necessary to run the scatter-gather data processing applications on the grid

Data Mining of Operations: 

Data mining of the collected operations data reveals a striking feature: a very high degree of correlation between failures:
- if a job submitted to some cluster failed, there is a high probability that the next job submitted to that cluster will fail too
- if the submit host failed, all the jobs scattered over different clusters will fail too
Taking these correlations into account is not yet automated by the grid middleware.
That is why the production databases and grid monitoring data, which provide immediate feedback on the Data Challenge operations to the production operators, are very important for efficient utilization of the Grid capacities.
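
The first correlation above can be quantified from a job log by comparing the overall failure rate with the failure rate of jobs that directly follow a failure on the same cluster. The following is a minimal sketch only: the record layout (a time-ordered list of cluster/outcome pairs) and the sample data are assumptions made for illustration, not the ATLAS production database schema.

```python
def failure_correlation(jobs):
    """Return (P(fail), P(fail | previous job on the same cluster failed)).

    `jobs` is a time-ordered list of (cluster, succeeded) pairs; this
    layout is an assumption of the sketch.
    """
    last_ok = {}                       # cluster -> outcome of its previous job
    total = failed = 0
    after_fail = failed_after_fail = 0
    for cluster, ok in jobs:
        total += 1
        failed += not ok
        # Only count jobs whose predecessor on this cluster failed.
        if cluster in last_ok and not last_ok[cluster]:
            after_fail += 1
            failed_after_fail += not ok
        last_ok[cluster] = ok
    p_fail = failed / total if total else 0.0
    p_cond = failed_after_fail / after_fail if after_fail else 0.0
    return p_fail, p_cond


# Made-up records, for illustration only.
sample = [("A", True), ("B", False), ("A", True), ("B", False),
          ("B", False), ("A", True), ("B", True), ("A", False)]
p_fail, p_cond = failure_correlation(sample)
print(f"P(fail) = {p_fail:.2f}, P(fail | previous failed) = {p_cond:.2f}")
```

A conditional rate well above the overall rate is the kind of signal an operator (or, eventually, the middleware) could use to stop sending jobs to a failing cluster.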

Increased Fluctuations: 

Among the database performance issues encountered is the increase in fluctuations in database servers’ workloads due to the chaotic nature of grid computations.
The observed fluctuations in database access patterns are of a general nature and must be addressed by grid middleware solutions.

Scalability Challenge: 

Database service capacities should be adequate for peak demand.
The chaotic nature of Grid computing increases fluctuations in demand for database services.

Production Rate Growth: 

CERN database capacities bottleneck

Jobs Load: 

No apparent correlations between jobs and database server load fluctuations
Job failures?

ATLAS Efficiency on LCG: 

Efficiency fluctuates because clusters are “large error amplifiers”?

ATLAS Efficiency on NorduGrid: 

Fluctuations are limited when efficiency is close to the 100% boundary

ATLAS Efficiency on Grid3: 

ATLAS achieved its highest overall production efficiency on Grid3 (now OSG)

A Side Note on a New Computing Paradigm: 

In comparison to the old centralized computing, Grid efficiency may look small.
Splitting a large computational task into smaller tasks (jobs) in the Grid computing paradigm is similar to splitting a large file into smaller TCP/IP packets during an FTP data transfer.
Do you know how many TCP/IP packets were lost during the file transfer? Do you even care?
But, of course, somebody does: watch for the emergence of a new profession in Grid computing, like the network engineer who cares about the lost TCP/IP packets.

Limiting Resource: 

In some cases the connection count happens to be the limiting resource.
The default shell configuration limits the maximum number of open files per process, which limits the connection count to 1,000 per server.
Depending on the Linux kernel version, the limit can be reconfigured to 4,000 or 10,000.
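
The per-process open-file limit can be inspected from Python's standard `resource` module (Unix only). The sketch below reads the limit and derives a rough connection ceiling from it; the `reserved` allowance of 64 descriptors is an assumed figure for illustration, not a measured one.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def max_client_connections(soft_limit, reserved=64):
    """Rough ceiling on the sockets one server process can hold open.

    `reserved` is an assumed allowance for log files, data files and
    other descriptors the server needs for itself.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return None                    # no per-process cap is configured
    return max(soft_limit - reserved, 0)


soft, hard = open_file_limits()
print("soft limit:", soft, "hard limit:", hard)
print("approximate connection ceiling:", max_client_connections(soft))
```

Each client connection costs the server at least one descriptor, which is why a soft limit near 1,000 translates into a connection count near 1,000.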

Complex Data Access Patterns: 

Diagram: Athena applications, online applications, and user applications reach the database servers through database connections.

Client Library: 

To improve the robustness of database access in a data grid environment, we propose an application-side solution: a software component that abstracts the database and/or middleware connectivity concerns in a generalized database client library.

Server Indirection: 

One of the lessons learnt in the ATLAS Data Challenges is that the database server address should NOT be hardwired in data processing transformations.
The logical-physical indirection for database servers is now introduced in ATLAS.
It is similar to the logical-physical file indirection of the Replica Location Service in the Grid file catalogs.
It is being adopted by a common LHC project.
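
The idea of logical-physical indirection can be sketched as a lookup from a logical database name to its physical replicas, so that a job never names a server directly. This is an illustration only: the catalogue contents, host names, and the `resolve` function are invented for the sketch and are not the ATLAS implementation.

```python
# Hypothetical catalogue: logical database name -> physical replicas.
# All names and hosts below are invented for illustration.
CATALOGUE = {
    "geometry_db": ["db1.example.org:3306", "db2.example.org:3306"],
    "conditions_db": ["db3.example.org:3306"],
}


def resolve(logical_name, catalogue=CATALOGUE):
    """Return the list of physical servers registered for a logical name."""
    try:
        return list(catalogue[logical_name])
    except KeyError:
        raise LookupError(f"no replica registered for {logical_name!r}") from None


# The job refers to the logical name only; moving the database to another
# host then means editing the catalogue, not the job.
servers = resolve("geometry_db")
print(servers)
```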

Connection Service: 

DBConnectionSvc implements an application service for resolving the logical-physical mapping and for access to the connection management library.
Logical-physical resolution can be done against a local file, updated from the Catalogue at a configurable time period, or directly against the Catalogue residing on an Oracle DB.
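
The local-file mode of resolution can be sketched as a cache that is refreshed from the catalogue once it is older than a configurable period. This is not the DBConnectionSvc code: the class name `CachedResolver`, the JSON file format, and the `fetch_catalogue` callable (standing in for a query against the central Catalogue) are all assumptions of the sketch.

```python
import json
import os
import tempfile
import time


class CachedResolver:
    """Resolve logical names from a local file refreshed periodically."""

    def __init__(self, cache_path, fetch_catalogue, max_age_s=3600):
        self.cache_path = cache_path
        self.fetch_catalogue = fetch_catalogue   # hypothetical catalogue query
        self.max_age_s = max_age_s               # configurable refresh period

    def _load(self):
        path = self.cache_path
        fresh = (os.path.exists(path)
                 and time.time() - os.path.getmtime(path) < self.max_age_s)
        if not fresh:
            mapping = self.fetch_catalogue()
            tmp = path + ".tmp"
            with open(tmp, "w") as handle:
                json.dump(mapping, handle)
            os.replace(tmp, path)                # swap in the new file atomically
        with open(path) as handle:
            return json.load(handle)

    def resolve(self, logical_name):
        return self._load()[logical_name]


calls = []


def fake_catalogue():
    """Stand-in for the central catalogue; counts how often it is queried."""
    calls.append(1)
    return {"geometry_db": ["db1.example.org:3306"]}


with tempfile.TemporaryDirectory() as tmp_dir:
    resolver = CachedResolver(os.path.join(tmp_dir, "dbmap.json"), fake_catalogue)
    first = resolver.resolve("geometry_db")
    second = resolver.resolve("geometry_db")     # served from the local file
print(first, "catalogue queries:", len(calls))
```

Serving most lookups from the local file keeps the catalogue itself from becoming one more server that every job must connect to.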

Logical-Physical Mapping Status and Plans: 

The Logical-Physical Catalogue will be replicated, and the user can provide a list of possible Catalogues for the failover scenario.
The Catalogue represents a table with:
- mappings of a logical DB name to its replicas
- server names and their redirection
Work is at an early stage to provide the logical-physical mapping as a web service; a first prototype is based on load balancing and the client’s physical location.

Connection Management: 

This module provides client-side connection management capabilities, including:
- connection pooling
- logical-physical mapping for the database connection string
- policies of retries and connection timeout
- failover to a replica
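
Two of the capabilities listed above, retries and failover to a replica, can be sketched in a few lines. This is a hedged illustration, not the ATLAS library: `connect_with_failover` and the `connect` callable are hypothetical, the callable is assumed to enforce its own connection timeout and to raise `ConnectionError` on failure, and pooling is left out.

```python
import time


def connect_with_failover(replicas, connect, retries=2, delay_s=0.0):
    """Try each replica in order, retrying before failing over to the next.

    `connect` is any callable taking a server address and returning a
    connection object, or raising ConnectionError (an assumption here).
    """
    errors = []
    for server in replicas:
        for attempt in range(1 + retries):
            try:
                return server, connect(server)
            except ConnectionError as exc:
                errors.append((server, attempt, exc))
                if delay_s:
                    time.sleep(delay_s)          # back off before the next try
    raise ConnectionError(f"all replicas failed after {len(errors)} attempts")


attempts = []


def fake_connect(server):
    """Stand-in driver call: the first replica is down, the second answers."""
    attempts.append(server)
    if server.startswith("db1"):
        raise ConnectionError("connection refused")
    return f"connection to {server}"


server, conn = connect_with_failover(
    ["db1.example.org:3306", "db2.example.org:3306"], fake_connect)
print(server, "after", len(attempts), "attempts")
```

Keeping this policy in the client library means every application gets the same retry and failover behaviour without reimplementing it.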

Summary: 

The chaotic nature of Grid computing increases fluctuations in demand for database services.
In the high energy physics domain, applications-driven issues require changes in both:
- client technology
- server technology (see the other talk: “High-performance Database Access Technologies for Computational Grids”)