Developing a Data Management System for the ATLAS Experiment
September 20, 2005
Miguel Branco, miguel.branco@cern.ch

Outline:
- ‘Data Challenges 2’ and ‘Rome Production’
- Lessons Learned
- DQ2 Design
- Implementation
- Data model
- Services
- Conclusion

DC2 and Rome Production:
- Production started Spring 2004 and finished recently
- ProdSys:
  - Data Management (DQ): high-level service that interacted with all ATLAS Grid catalogs and storages
    - File-based: relied on backend RLS (Globus RLS, EDG RLS)
    - Also implemented a simple reliable file transfer (FIFO queue)
  - Supervisors: collect jobs from the production database and dispatch them to executors
  - Executors (per ‘Grid’): translate a physics definition into a Grid job and launch it
- DQ: all components interacted with data management

Lessons learned:
- Catalogs were provided by Grid providers and used “as-is”
- Granularity: file-level.
- No datasets, no “file collections”
- No scoping of queries (difficult to find data, slow)
- No bulk operations
- No managed and transparent data access; unreliable GridFTP
- SRM also unreliable; problems with mass storage
- Difficult to handle different mass storage stagers from Grid
- Metadata support not usable; too slow
  - Logical Collection Name as metadata string field: /datafiles/rome/…
- Catalogs not always geographically distributed
  - Single point of failure (middleware, people/timezones)
- No “ATLAS resources information system” (with known/negotiated QoS)
  - … and unreliable information systems from Grid providers
- Operational problems
  - Timezones, lack of people, experience, communication

DQ2 Design rationale:
- Evolve from past experience
- Scalability: Administrative, Geographical, Load
- Interoperability
  - Grid m/w components: Replica Catalog, Storage Management, Reliable File Transfer
- Global != Site != Local != Clients
- Production and User Analysis
- Security
- Datasets, not files…
- Bulk
- Datasets and Datablocks (an immutable collection of files)

DQ2:
- Moves from a file-based system to one based on datasets
  - Hides file-level granularity from users
  - A hierarchical structure makes cataloging more manageable
  - However, file-level access is still possible
- Scalable global data discovery and access via a catalog hierarchy
- No global physical file replica catalog, but a global dataset replica catalog and a global dataset location catalog
[Diagram labels: Datasets, Sites, Files]

Catalog architecture and interactions:

‘Global’ catalogs:
- Dataset Repository: holds all dataset names and unique IDs (+ system metadata)
- Dataset Content Catalog: maps each dataset to its constituent files. This one holds info on every logical file, so it must be highly scalable; however, it can be highly partitioned using metadata etc.
- Dataset Hierarchy: maintains versioning information and information on ‘container datasets’ (datasets consisting of other datasets)
- Dataset Location Catalog: stores locations of each dataset
- All are logically global but may be distributed physically

‘Local’ Catalogs:
- Local Replica Catalog: per grid/site/tier, providing logical-to-physical file name mapping. Implementations of this catalog are Grid-specific but must use a standard interface.
- Claims Catalog: per site storage, keeping user claims on datasets. Claims are used to manage stage lifetime and resources, and to provide accounting.
- Currently all ‘Local’ catalogs are deployed per ATLAS site

Implementation:
- Architectural style
  - REST-style (not entirely RESTful)
  - Communication: intend to migrate non-performance-critical payload (monitoring, real-time status reporting) to XML soon; vocabularies will emerge from experience of running the system
- Development
  - First usable prototype deployed 47 days after project started
- Technology choices
  - Python; servers hosted on Apache (mod_python, mod_gridsite); clients using PyCurl
  - POOL File Catalog interface gives us a choice of back-end for catalogs
  - File movement: SRM, GridFTP, gLite FTS, HTTP, dccp, cp
- Security
  - Use HTTPS (with Globus proxy certs) for POST/PUT/DELETE and HTTP for GETs, i.e. world-readable data, best performance (can be made secure to ATLAS VO if required)

Datablocks:
- Datablocks are defined as immutable and unbreakable collections of files
  - They are a special case of datasets
  - A site cannot hold partial datablocks
  - There are no versions for datablocks
- Used to aggregate files for convenient distribution
  - Files grouped together by physics properties, run number, etc.
  - Much more scalable than file-level distribution
  - Useful for provenance: immutable sets of data
- The principal means of data distribution and data discovery
  - Immutability avoids consistency problems when distributing data
  - Moving data in blocks improves data distribution (bulk SRM requests)

Subscriptions:
- A site can subscribe to data
  - When a new version is available, the latest version of the dataset is automatically made available by site-local services that carry out the required replication (automated movement)
- Subscriptions can be made to datasets (for file distribution) or container datasets (for datablock distribution)
- Use cases:
  - Automatic distribution of datasets holding a variable collection of datablocks (container datasets)
  - Automatic replication of files by subscribing to a mutable dataset (e.g. file-based calibration data distribution)
[Diagram labels: Site ‘X’, Site ‘Y’, Dataset ‘A’ (Container), Dataset ‘B’, File1, File2, Data block1, Data block2. Subscriptions: Dataset ‘A’ | Site ‘X’; Dataset ‘B’ | Site ‘Y’]

Subscriptions:
- Various data movement use cases…
- Datasets: latest version of a dataset (triggers automatic updates whenever a new version appears)
- Container datasets, which in turn contain datablocks or datasets: supports subscriptions to the latest version of a container dataset (automatically triggers updates whenever e.g. the set of datablocks making up the container dataset changes)
- Datablocks (single copy of an immutable set of files)
- Databuckets (diagram next slide): replication of a set of files using a notification model (whenever new content appears in the databucket, the replication is triggered)
[Diagram labels: Subscribes to DS1; Dataset Location Catalog updated]

Data buckets:
- Data must be replicated (quickly), triggered not by the appearance of a new version but by new content
  - The alternative would be constantly defining new versions of datasets!
- Will use a notification model: whenever new content appears in a data bucket, sites subscribing to it are notified and data is moved accordingly
- Data buckets can contain files
- Data buckets can contain datablocks

Summary of Services:
- Global services
  - Dataset catalogs
  - Requirements: grid environment, database, Apache services
- Site services
  - Subscriptions, Databuckets, Claims and minimal information system (monitoring, real-time reporting)
  - Requirements: grid environment, database, Apache services, DQ2 agents for moving data, grid-specific data movement clients, Python, PyCURL, grid certificate
- Local worker node client
  - Contact local LRC, get and put data to local Storage
  - Requirements: grid environment
- Clients
  - Define datasets and datablocks, subscribe them to sites
  - Associate files with new dataset versions
  - Query dataset definition, contents, location …
  - Requirements: Python, PyCURL, grid certificate for writing

Detail on Subscriptions:
State machine: unknownSURL, knownSURL, assigned, toValidate, validated, done
Agents and their functions:
  Fetcher            Finds incomplete datasets
  ReplicaResolver    Finds remote SURL
  MoverPartitioner   Assigns Mover agents
  Mover              Moves file
  ReplicaVerifier    Verifies local replica
  BlockVerifier      Verifies whole dataset complete
List of software required to handle subscriptions.
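The state list and agent list on the "Detail on Subscriptions" slide suggest a per-file pipeline. The following is a minimal Python sketch of that idea, not the DQ2 code. It assumes, which the slide does not state, that each agent after the Fetcher advances a file by exactly one state in the listed order. The `TRANSITIONS` table and the `advance` and `run_pipeline` helpers are hypothetical names introduced for illustration.

```python
# Hypothetical sketch: the slide lists states and agents, but not which agent
# performs which transition. The pairing below is an assumption.
STATES = ["unknownSURL", "knownSURL", "assigned", "toValidate", "validated", "done"]

# Assumed mapping: agent -> (state it consumes, state it produces).
# The Fetcher is left out: it finds incomplete datasets, which we assume
# is what places files in the initial unknownSURL state.
TRANSITIONS = {
    "ReplicaResolver": ("unknownSURL", "knownSURL"),   # finds remote SURL
    "MoverPartitioner": ("knownSURL", "assigned"),     # assigns Mover agents
    "Mover": ("assigned", "toValidate"),               # moves file
    "ReplicaVerifier": ("toValidate", "validated"),    # verifies local replica
    "BlockVerifier": ("validated", "done"),            # verifies whole dataset complete
}


def advance(state, agent):
    """Return the next state if `agent` may act on `state`, else raise ValueError."""
    source, target = TRANSITIONS[agent]
    if state != source:
        raise ValueError(f"{agent} cannot act on a file in state {state!r}")
    return target


def run_pipeline(state="unknownSURL"):
    """Drive one file through every agent in order and return the visited states."""
    visited = [state]
    for agent in TRANSITIONS:  # dicts preserve insertion order (Python 3.7+)
        state = advance(state, agent)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(" -> ".join(run_pipeline()))
```

Keeping the transitions in a table rather than in each agent makes it easy to reject an agent acting on a file in the wrong state, which is the property a state machine is meant to give.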
Requires minimal deployment effort (laptop support!)

Claims:
- Claims catalog manages the usage of datasets
- User requests have a lifetime
  - Claim is assigned
  - User may add claims on existing datasets
  - Claim owner may (should) release claim when done
  - Claim owner may extend lifetime of claim
  - Automatically handled by user client tools
- Behavior
  - Each claim has an expiration time (now plus lifetime)
  - Claim is active until released or expired
  - Datasets may have multiple active claims for different users
  - Cache turnover relies on expired claims
- Claims provide a mechanism for accounting, policy enforcement and dealing with Mass Storage (a claim triggers an SRM stage request)

Conclusion:
- Evolve the model based on past experience
  - Based on proven technologies
- Appears to scale so far
  - Load, geographic and (very important) administrative scalability
- It is running now across some US ATLAS and LCG sites
- Ramping up (starting now!) to the full set of LCG and US ATLAS resources.
http://uimon.cern.ch/twiki/bin/view/Atlas/DDM
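The claim behaviour described on the Claims slide (expiration at now plus lifetime, active until released or expired, several active claims per dataset, cache turnover once claims lapse) can be sketched in a few lines of Python. This is an illustrative in-memory model under stated assumptions, not the DQ2 Claims Catalog: the `Claim` and `ClaimsCatalog` classes and all of their method and field names are hypothetical, and the SRM stage request that a claim triggers is not modelled.

```python
import time
from dataclasses import dataclass


@dataclass
class Claim:
    """One user's claim on a dataset (field names are illustrative)."""
    owner: str
    dataset: str
    expires_at: float
    released: bool = False

    def is_active(self, now):
        # A claim is active until it is released or it expires.
        return not self.released and now < self.expires_at


class ClaimsCatalog:
    """In-memory stand-in for a per-site claims catalog (hypothetical API)."""

    def __init__(self):
        self._claims = []

    def add_claim(self, owner, dataset, lifetime, now=None):
        # Expiration time is "now plus lifetime".
        now = time.time() if now is None else now
        claim = Claim(owner, dataset, expires_at=now + lifetime)
        self._claims.append(claim)
        return claim

    def extend(self, claim, extra):
        # The claim owner may extend the lifetime of a claim.
        claim.expires_at += extra

    def release(self, claim):
        # The claim owner should release the claim when done.
        claim.released = True

    def active_claims(self, dataset, now=None):
        now = time.time() if now is None else now
        return [c for c in self._claims if c.dataset == dataset and c.is_active(now)]

    def evictable(self, dataset, now=None):
        """True when no active claim remains, so cache turnover may reclaim it."""
        return not self.active_claims(dataset, now)
```

Passing `now` explicitly keeps the sketch deterministic in tests; a real service would presumably rely on the server clock.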