Wyatt
Added: August 30, 2007. License: All Rights Reserved.

Presentation Transcript

Status of SAM, JIM, SAM-Grid and Future Plans:
- Definition of terms
- Some brief requirements
- Status and deployments of SAM
- Status and deployments of JIM
- Near term developments for SAM and JIM
- SAM-Grid: The work plan for the next 2 years
- Data handling in context

Definition of terms:
- Data handling system – the complex of software tools and servers which tracks locations of collider data & derived data sets, defines sets of these data files, delivers specific sets to applications, tracks processing information to desired levels
- SAM – DØ's data handling system. Depends on/uses:
  - Central ORACLE database at Fermilab (central metadata repository)
  - ENSTORE mass storage system at Fermilab (central data repository)
  - Access to various file transfer protocols: bbftp, GridFTP, rcp, dccp, …
  - Other mass storage systems: HPSS, TSM, …
- Jobs can be submitted to SAM via:
  - direct SAM command (sam submit)
  - d0tools (most usually)
  - MCRunjob (in development)
  - JIM (in testing)
- Dataset definition tools:
  - Web-based GUI
  - Command line interface
  - Dataset definition wizard (in development)

Definition of terms cont’d:
- JIM (Job Information and Management) – component of SAM-Grid which provides for remote job submission and monitoring
- Data Grid – collection of computing & storage resources among which data files can be transferred by a general software service. The computers which run SAM stations today can be considered a data grid. (aka distributed data processing)
- Compute Grid – collection of computing resources among which applications can be distributed by a general software service. The computers which run SAM stations and have installed JIM execution sites can be considered a compute grid. (aka Grid, DataGrid)
- Database Grid – collection of computing resources which can share metadata from distributed, possibly disparate databases. We don’t have a functioning example of a database grid.

Some brief requirements for remote computing:
- Minimum remote data handling requirements:
  1) Distribute data files from the primary mass store to all computing systems in the experiment which install the data handling system. Route data in a configurable way.
  2) Receive for storage in the primary mass store simulated or derived data sets from remote institutions which have installed the DH system. Accommodate secondary storage locations.
  3) Track processing apps at central and remote stations.
- Next level requirement: Distribute applications from arbitrary submission sites with the DØ code distribution available to generally available computing resources.
- And then: Provide tools for the automatic scheduling of jobs and for prioritization: co-locate data and processing

Some brief requirements for remote computing:
- Maintain high availability of services
- Minimize operational load
- Provide for error handling:
  - Worker node failure
  - File transfer failure
  - User application crashes/hangs/is killed by user/goes berserk
- Tune systems for most common use cases / expected bottlenecks / …
- Provide for system monitoring & debugging
- Provide for user monitoring & accounting
- Mitigate security risks

Status and deployments of SAM:
- Operational 24/7 on the Fermilab central systems: online, reconstruction farm, d0mino, cab, clued0
- Operational at Monte Carlo production sites
- Operational at remote analysis sites: ~20 active, ~40 deployed
- Coming on line for remote reconstruction sites: U Mich, U Wisc, others later, including Lyon?
- Statistics:
  - ~40000 projects at FNAL, ~3500 projects remote (since 1/1/03)
  - ~200M events of collider data, > 350 TB stored at FNAL

Status and deployments of SAM:
- Current versions
  - Software suite: V5.0. Includes sam_station v4_2_1_43 (sophisticated ‘compute element’: routing, different caching strategies), new user api, new batch adapter
  - Database schema: V4.9
- Expected releases this autumn
  - Database schema V5.1 – major changes to data_files table (many-to-many files to runs connection; metadata specification according to file type)
  - Software suite: V5.x – dbserver rewrite to take advantage of new schema

Status and deployments of JIM:
- Current JIM version V1.0 – Job broker, execution and submission site software, job monitor, client software
- Current push to deploy JIM V1.0
  - Deployment of demo last November – proof of principle
  - Deployment of submission site on clued0 with execution site on CAB – immediately pointed out lack of viable model for job transfer (either input or output).
    Working on that now. (Note we can already submit SAM jobs to CAB from clued0 using d0tools.) Request: can job broker pick between clued0 and CAB based on queue status?
  - Deployment of submission and execution sites at Wuppertal, IC, RAL, Lancaster.
  - In progress or envisioned at several other sites: U Wisc, U Mich, GridKA, NIKHEF, …

Near term developments for SAM and JIM:
- JIM: Complete deployment of GridFTP as WAN transfer protocol
- SAM: Deploy new schema and accompanying dbserver improvements
- JIM: Devise and implement viable input/output sandbox model
- [Both: Bring CDF stations to full operational status]
- SAM: Migrate to new ORACLE database server machine and new version of ORACLE
- JIM: Migrate to new version of Globus software
- Integrate remote analysis stations into regular metric reporting; define offline shift responsibilities wrt remote stations

SAM-Grid: The work plan for the next 2 years:
- Implement Schema Update I. (file_type, runs changes)
- Equip job broker to automate MC production; understand issues in automating job distribution for re-processing and analysis.
- Revise caching strategies. (local vs fileserving; merging operations; connections w/ other layers)
- Implement Schema Update II. (processing requirements for jobs, group info)
- Equip optimizers and job brokers to deal w/ info in Schema Update II.
- Sort out parallelization issues.
- Implement Virtual Organization tools.
- Implement Monitoring and Information server on the SAM side
- Provide for distributed database: two parts, file location info and processing info; equip servers for more autonomous operation

SAM-Grid: The work plan for the next 2 years:
- Evaluate technology changes/upgrades
  - Improvements for installation/config management?
  - Move to VDT suite (production version of Condor, Globus, etc.)
  - Possible CORBA replacements – WebServices?
  - XML-based logging – will this be the way to go?
  - Which solution for distributed DB’s?
- Plan for interoperability
  - Merge SAM catalog w/ other replica schemas? Follow example of DØ/CDF merge?
  - Interoperation with other replica catalogs?
  - GLUE schema for resource description; job description language
  - Sam_batch_adapter technology
  - Working with SRM’s – await outcome of caching strategy discussions
- Interactions of tools w/ data handling system: cf. mc_runjob & d0tools w/ JIM and CAF (CDF)
- VO organization issues
- Security issues (VO, file transfer, job submission)

Data handling in context:
- We have the ability to distribute data to remote locations, allowing institutions to utilize their local resources for analysis (sam station)
- We have a system for job processing and analysis bookkeeping which can be deployed remotely as well (sam station, project master, user api, …)
- We have the ability to interface to local storage and batch systems (HPSS, TSM, … ; LSF, PBS, FBS, Condor, SGE, …)
- We see distinct opportunities to add the ability for automated job scheduling and remote submission capabilities

Data handling in context:
- For DØ, maintaining stable operation and current functionality is a priority
- The operational load remains a concern: we need more shifters, and adding development projects that impose a larger operational load would be a time for cost/benefit calculations
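
Illustrative sketch (not part of the original slides): the scheduling requirement to co-locate data and processing, together with the request that the job broker pick between clued0 and CAB based on queue status, could look roughly like the Python below. This is not the JIM broker. The `Site` record, its fields, the file names, and the ranking rule (cached fraction first, then free slots, then queue length) are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical snapshot of one execution site as a broker might see it."""
    name: str
    queued_jobs: int         # jobs waiting in the local batch queue
    free_slots: int          # idle batch slots
    cached_files: frozenset  # dataset files already on local disk


def score(site, dataset):
    """Rank key: more of the dataset cached, then more free slots, then shorter queue."""
    cached_fraction = len(dataset & site.cached_files) / len(dataset) if dataset else 0.0
    return (cached_fraction, site.free_slots, -site.queued_jobs)


def pick_site(sites, dataset):
    """Return the best-scoring site, or None when no site is offered."""
    return max(sites, key=lambda s: score(s, dataset), default=None)


# Usage with made-up numbers and file names (assumptions, not measurements).
dataset = frozenset({"f1", "f2", "f3"})
sites = [
    Site("clued0", queued_jobs=12, free_slots=0, cached_files=frozenset({"f1"})),
    Site("CAB", queued_jobs=3, free_slots=8, cached_files=frozenset({"f1", "f2"})),
]
print(pick_site(sites, dataset).name)  # prints CAB
```

Ranking on data locality before queue status reflects the "co-locate data and processing" wording; a real broker would likely also weigh transfer cost, priorities, and site policy.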