Gold Accounting Manager: Scott Jackson
Scalable Systems Software Center
SC2002
20 NOV 2002
Introduction: In a nutshell, Gold is:
A resource bank (allocation management system)
Tracks and manages resource usage. Much like a bank, it associates a cost with computing resources and allows resource credits to be allocated to users and projects. As jobs complete or resources are used, projects are charged dynamically and the resource usage is recorded.
An accounting system
Can be dynamically customized to record any type of accounting data – pacct, sar, node availability, etc.
An information service
Also functions as a generalized information service useful in a variety of ways, such as providing meta-scheduling mappings of machines to resources, applications, accounts, users, etc.
But First, A Little Background…: Scalable Systems Software Center
Research, develop and support an integrated suite of systems software and tools for the effective management and utilization of the highest scale computational resources.
SciDAC
Scientific Discovery through Advanced Computing – A DOE initiative to improve the impact of scientific computing
QBank
A dynamic allocation management system developed at PNNL and in use at about a dozen sites.
Motivation: Show return on investment
Funding sources have invested heavily in a supercomputer and require a means to show that it is being utilized efficiently.
Fairness
Management needs a means to fairly distribute the underlying computing resources (processors, memory, disk) to the various users and projects.
Capacity Planning
Accurate allocation and usage information is needed to effectively make decisions on resource commitments and new procurements.
Centralized Access Control
Many organizations want centralized control over which users and projects have access to what machines and for how long.
Motivation (from a meta-scheduling slant): Meta-computing
Trust issues
Guarantees
Distributed Accounting
Local Control
Equitable trade agreement
Security
Different userids, accounts, execution environments on different resources
Nonfunctional Requirements: Scalable
Targeting systems with tens of thousands of processors and thousands of simultaneous jobs.
Secure
Will use strong authentication (no clear-text passwords) to prevent unauthorized access, and data encryption to protect sensitive information from interception (XML-DSIG and XML-ENC).
Fault Tolerant
Database performs automatic rollbacks on failed transactions. A distributed design that includes data replication will be researched.
Nonfunctional Requirements (cont.): Open Source
Allows free distribution, lets sites make local modifications and derived works, and promotes the sharing of patches, ports and enhancements from the user community.
Portable
Written in Java; initially tested on a reference Linux platform, then expanded to include the architectures used at the largest DOE computing facilities.
Easy to Use
A web-accessible GUI (based on PHP and JavaScript) will give managers, users and administrators the access they need from their own PCs.
Operational Characteristics: Supports familiar bank operations
Deposits, withdrawals, transfers, refunds, balance checks and bank statements
Reservations
Before a job runs, a reservation (or hold) is placed on the account based on the wallclock limit. This prevents overdrafts.
Quotations
In a meta-scheduling environment it is useful to know how much a job is going to “cost” so that you can decide where best to run it.
Hierarchical Accounts
Projects can be nested (trickle down deposits, trickle up withdrawals)
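The nested-project behavior can be sketched in Python. This is a minimal illustration under stated assumptions, not Gold's implementation: the `Project` class and its methods are hypothetical, and the trickle semantics (a subproject can spend credits deposited in its ancestors, spends its own credits first, and draws any remainder up the hierarchy) are one plausible reading of the slide. Nesting "Weather Modeling" under "Grand Challenge" is also invented for the example.

```python
class Project:
    """Hypothetical hierarchical project account (illustrative sketch only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.balance = 0.0

    def deposit(self, amount):
        self.balance += amount

    def available(self):
        # Trickle-down (assumed semantics): credits deposited in an ancestor
        # are also spendable by its subprojects.
        upstream = self.parent.available() if self.parent else 0.0
        return self.balance + upstream

    def withdraw(self, amount):
        if amount > self.available():
            raise ValueError(f"insufficient credits for {self.name}")
        # Spend local credits first; any remainder trickles up to the parent.
        local = min(self.balance, amount)
        self.balance -= local
        if amount > local:
            self.parent.withdraw(amount - local)


grand_challenge = Project("Grand Challenge")
weather = Project("Weather Modeling", parent=grand_challenge)
grand_challenge.deposit(1000)
weather.deposit(100)
weather.withdraw(250)  # 100 from Weather Modeling, 150 trickles up
print(weather.balance, grand_challenge.balance)  # 0.0 850.0
```

Checking availability before touching any balance means a failed withdrawal leaves every level of the hierarchy unchanged.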
Dynamic Resource Management Interaction: Make Deposits, etc.
Submit Job
Balance Check
Make Reservation
Start Job
Job Completes
Remove Reservation & Make Withdrawal
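The balance check, reservation and final withdrawal steps can be sketched as a small Python model. Everything here is hypothetical (`Bank`, `start_job`, the per-processor-hour `rate`); it only illustrates why a hold sized from the wallclock limit prevents overdrafts when several jobs draw on the same account at once.

```python
class Bank:
    """Minimal, hypothetical allocation bank with reservations (holds)."""

    def __init__(self):
        self.balances = {}  # account -> credits
        self.holds = {}     # job_id -> (account, reserved credits)

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0.0) + amount

    def available(self, account):
        # Credits not already promised to jobs that are still running.
        held = sum(amt for acct, amt in self.holds.values() if acct == account)
        return self.balances.get(account, 0.0) - held

    def reserve(self, job_id, account, amount):
        if amount > self.available(account):
            raise ValueError("reservation would overdraw the account")
        self.holds[job_id] = (account, amount)

    def finish(self, job_id, actual_cost):
        # Remove the reservation and withdraw what the job actually used.
        account, _ = self.holds.pop(job_id)
        self.balances[account] -= actual_cost


def start_job(bank, job_id, account, procs, wallclock_limit, rate):
    """Balance check plus a reservation sized from the wallclock limit."""
    max_cost = procs * wallclock_limit * rate
    if bank.available(account) < max_cost:
        return False  # the job is not started
    bank.reserve(job_id, account, max_cost)
    return True


bank = Bank()
bank.deposit("Chem101", 1000)                           # Make Deposits
print(start_job(bank, "job1", "Chem101", 16, 40, 1.0))  # True: 640 held
print(start_job(bank, "job2", "Chem101", 16, 40, 1.0))  # False: only 360 free
bank.finish("job1", 16 * 25 * 1.0)                      # ran 25 of 40 hours
print(bank.balances["Chem101"])                         # 600.0
```

Without the hold on job1, job2 would have passed its balance check and the account could have been overdrawn once both jobs finished.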
Dynamic Resource Management Interaction (with meta-scheduling): Resource Manager (PBS, LL)
Allocation Manager (Gold)
Meta-Scheduler (Silver)
Scheduler (Maui)
Make Deposits, etc.
Submit Job
Locate Feasible Systems & Obtain Quote
Stage Job
Balance Check
Make Reservation
Start Job
Job Completes
Remove Reservation & Make Withdrawal
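The "Locate Feasible Systems & Obtain Quote" step can be sketched in Python as follows. The machine names come from this deck, but their processor counts and rates are invented, and picking the cheapest quote is only one possible meta-scheduling policy; `System`, `quote` and `choose_system` are hypothetical names, not Silver's or Gold's interfaces.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    procs: int    # processors available on the system (hypothetical)
    rate: float   # credits per processor-hour (hypothetical)

    def quote(self, procs, hours):
        # A quotation prices the job without reserving or charging anything.
        return procs * hours * self.rate


def choose_system(systems, procs, hours):
    """Locate feasible systems, obtain quotes, and pick the cheapest one."""
    feasible = [s for s in systems if s.procs >= procs]
    if not feasible:
        return None
    return min(feasible, key=lambda s: s.quote(procs, hours))


systems = [System("MPP1", 512, 1.5), System("Colony", 128, 1.0),
           System("Jupiter", 1024, 1.25)]
best = choose_system(systems, procs=256, hours=10)
print(best.name, best.quote(256, 10))  # Jupiter 3200.0
```

Colony is skipped because it is too small for a 256-processor job, and Jupiter's quote (3200) beats MPP1's (3840).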
Allocations: An allocation is a collection of resource credits valid toward an arbitrary group of users, machines and projects and a timeframe for expenditure.
Commonly associated with a single project (account) and a set of users and machines.
Fine-grained control of who can use how much within a project can be achieved by multiple allocations in the same project.
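An allocation as described here can be modeled as a record that is valid only for a given project, an optional set of users and machines, and a date range. The sketch below is hypothetical (the `Allocation` fields and `usable_credits` are illustrative, and treating the end date as exclusive is an assumption); it shows how two allocations in the same project give one user more credits on one machine than everyone else gets.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Allocation:
    project: str
    amount: float
    start: date                                # first day credits are valid
    end: date                                  # assumed exclusive expiry date
    users: Optional[FrozenSet[str]] = None     # None means any project member
    machines: Optional[FrozenSet[str]] = None  # None means any machine

    def applies(self, project, user, machine, when):
        return (self.project == project
                and self.start <= when < self.end
                and (self.users is None or user in self.users)
                and (self.machines is None or machine in self.machines))


def usable_credits(allocations, project, user, machine, when):
    """Total credits a user may spend in a project on a machine at a date."""
    return sum(a.amount for a in allocations
               if a.applies(project, user, machine, when))


allocs = [
    Allocation("Chem101", 1000, date(2003, 1, 1), date(2004, 1, 1)),
    Allocation("Chem101", 200, date(2003, 1, 1), date(2004, 1, 1),
               users=frozenset({"Tom"}), machines=frozenset({"Colony"})),
]
day = date(2003, 6, 1)
print(usable_credits(allocs, "Chem101", "Tom", "Colony", day))    # 1200
print(usable_credits(allocs, "Chem101", "Sheri", "Colony", day))  # 1000
```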
The dimensions of Allocation Management: Projects
Grand Challenge
Development
Weather Modeling
Chem101
Navy
SETI
Viz
…
Users
Tom
Sheri
Scientist
Developer
Admin
Workshop
Manager
…
Resources
DOE
PNNL
LLNL
SDSC
ANL
MPP1
Colony
Jupiter
Time
Allocation-User Distribution Possibilities
Allocation-Machine Distribution Possibilities
Allocation Timeframes
Journaling: State Preservation
Preserves indefinite historical state of all objects and records
Bank Statements
Journaling allows bank statements to show balances for any arbitrary time in the past
Undo/Redo
With powerful querying/updating capabilities comes the potential for rampant administrative mistakes; undo/redo allows them to be reversed
Time Travel
You can run any command as if it were an arbitrary date in the past
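Because the journal keeps every change rather than overwriting state, balances for any past instant can be reconstructed by replaying entries up to that time. The Python sketch below illustrates the idea with a hypothetical `JournaledAccount` of timestamped credit deltas; it is not Gold's journaling schema.

```python
import bisect
from datetime import datetime


class JournaledAccount:
    """Hypothetical account whose history is an append-only journal."""

    def __init__(self):
        self._times = []   # transaction timestamps, kept sorted
        self._deltas = []  # signed credit amounts, parallel to _times

    def record(self, when, delta):
        # Entries are never overwritten, so every past state stays recoverable.
        i = bisect.bisect_right(self._times, when)
        self._times.insert(i, when)
        self._deltas.insert(i, delta)

    def balance_as_of(self, when):
        # "Time travel": replay only the entries made at or before `when`.
        i = bisect.bisect_right(self._times, when)
        return sum(self._deltas[:i])

    def statement(self, start, end):
        """Opening balance, activity in (start, end], and closing balance."""
        lo = bisect.bisect_right(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        activity = list(zip(self._times[lo:hi], self._deltas[lo:hi]))
        return self.balance_as_of(start), activity, self.balance_as_of(end)


acct = JournaledAccount()
acct.record(datetime(2002, 10, 1), 1000)   # deposit
acct.record(datetime(2002, 10, 15), -250)  # job charge
acct.record(datetime(2002, 11, 1), -100)   # job charge
print(acct.balance_as_of(datetime(2002, 10, 20)))  # 750
opening, activity, closing = acct.statement(datetime(2002, 10, 10),
                                            datetime(2002, 11, 30))
print(opening, len(activity), closing)  # 1000 2 650
```

In a design like this, an undo would naturally be a new compensating entry rather than a deletion, so the mistake itself also remains in the history.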
Flexible Charging Mechanism: Besides CPU, a resource supplier can charge based on the amount of memory, disk or any other consumable resource used, as well as on quality of service, primetime, node type, class, etc.
An external pricing engine interface will allow any sort of charging algorithm to be used, such as dynamic price adjustment according to load or queue backlog, a query to an external information service, or a cached second-price auction result.
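A pluggable pricing engine can be modeled as a function from resource usage and job attributes to a charge, with other engines wrapping it. In this Python sketch the rates, multipliers, resource names and the load-based adjustment are all hypothetical values chosen for illustration; they are not Gold's defaults or its engine interface.

```python
from typing import Callable, Dict

# Hypothetical site-configured rates (credits per unit of each resource).
RATES = {"cpu_hours": 1.0, "memory_gb_hours": 0.25, "disk_gb": 0.5}
QOS_MULTIPLIER = {"standard": 1.0, "high": 2.0}
PRIMETIME_MULTIPLIER = 1.5

# A pricing engine maps (resource usage, job attributes) to a charge.
PricingEngine = Callable[[Dict[str, float], Dict[str, object]], float]


def base_charge(usage, job):
    cost = sum(RATES[resource] * amount for resource, amount in usage.items())
    cost *= QOS_MULTIPLIER[job.get("qos", "standard")]
    if job.get("primetime", False):
        cost *= PRIMETIME_MULTIPLIER
    return cost


def load_adjusted(engine: PricingEngine, load: float) -> PricingEngine:
    """Wrap an engine so prices rise with current machine load (0.0 to 1.0)."""
    def price(usage, job):
        return engine(usage, job) * (1.0 + load)
    return price


usage = {"cpu_hours": 64, "memory_gb_hours": 128, "disk_gb": 10}
print(base_charge(usage, {"qos": "high"}))                      # 202.0
print(load_adjusted(base_charge, 0.5)(usage, {"qos": "high"}))  # 303.0
```

Because every engine shares the same call signature, a site could swap in a different algorithm without changing the code that requests a charge.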
Traceback Mechanism: Allows all parties of a transaction (resource requestor and provider) to have a record of the resource utilization and to have a say as to whether or not the job should be permitted to run, based on their independent policies and priorities. A job runs only if all parties agree that the target resources can be used in the manner and amount requested.
[Diagram: Meta, PNNL and LBNL accounts; an LBNL run; traceback debit]
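The "every party must agree" rule can be sketched in Python as an approve-then-debit step across independent parties. The `Party` class, the site balances and the user policies are hypothetical; the sketch assumes every party records the same cost and omits the locking or two-phase commit a real distributed system would need.

```python
class Party:
    """One participant in a transaction (requestor or provider), hypothetical."""

    def __init__(self, name, balance, allowed_users):
        self.name = name
        self.balance = balance
        self.allowed_users = set(allowed_users)
        self.ledger = []  # this party's own record of resource usage

    def approves(self, user, cost):
        # Each party applies its own, independent policy.
        return user in self.allowed_users and self.balance >= cost

    def debit(self, job_id, cost):
        self.balance -= cost
        self.ledger.append((job_id, cost))


def run_with_traceback(parties, job_id, user, cost):
    """Run only if every party agrees; then every party records the debit."""
    if not all(p.approves(user, cost) for p in parties):
        return False
    for p in parties:
        p.debit(job_id, cost)
    return True


pnnl = Party("PNNL", 500, ["Tom"])
lbnl = Party("LBNL", 300, ["Tom", "Sheri"])
print(run_with_traceback([pnnl, lbnl], "job42", "Tom", 100))   # True
print(run_with_traceback([pnnl, lbnl], "job43", "Sheri", 50))  # False: PNNL refuses
print(pnnl.ledger, lbnl.ledger)  # [('job42', 100)] [('job42', 100)]
```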
Flexible and Extensible: Powerful Querying/Updating Capabilities
Create, query, modify, delete, undelete
Support for operators (equals, less than, not equal, matching, etc.)
Conjunctive expression combinations (and, or)
Object joined queries
Dynamically Extensible
New object/record types and their fields can be created and modified dynamically through the regular query language (command line or GUI). This turns the system into a generalized information service that can be used for meta-scheduling resource mapping, as a persistence interface for other components, and for all varieties of accounting!
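The query features listed above (comparison operators, pattern matching, and/or combinations, run-time record types) can be illustrated with a small in-memory Python sketch. The tuple condition syntax, the `~` match operator, the `insert`/`query` helpers and the processor counts are all hypothetical; this is not Gold's query language.

```python
import operator
import re

# Comparison operators a condition may use; "~" means regular-expression match.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "~": lambda value, pattern: re.fullmatch(pattern, str(value)) is not None,
}

# Record types and fields are plain dicts, so new ones can be added at run time.
tables = {}


def insert(record_type, **fields):
    tables.setdefault(record_type, []).append(fields)


def matches(record, cond):
    """cond is ("and"|"or", sub1, sub2, ...) or (field, op, value)."""
    if cond[0] in ("and", "or"):
        results = (matches(record, sub) for sub in cond[1:])
        return all(results) if cond[0] == "and" else any(results)
    field, op, value = cond
    return field in record and OPS[op](record[field], value)


def query(record_type, cond):
    return [r for r in tables.get(record_type, []) if matches(r, cond)]


insert("Machine", Name="MPP1", Procs=512)
insert("Machine", Name="Colony", Procs=128)
insert("Machine", Name="Jupiter", Procs=1024)
big_or_c = query("Machine", ("or", ("Procs", ">", 600), ("Name", "~", "C.*")))
print([r["Name"] for r in big_or_c])  # ['Colony', 'Jupiter']
```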
Schedule: 4Q02 Requirements gathering completed and initial Resource Management Interface Specs released
2Q03 QBank bundled with SSS initial release (possible alpha-testing on Gold)
2Q04 Beta release of Gold
4Q05 Production release of Gold (includes support)
Contact Information: Scott Jackson
Pacific Northwest National Laboratory
Scott.Jackson@pnl.gov
(509) 376-2205
Scalable Systems Software Center
http://www.scidac.org/ScalableSystems
QBank documentation and download
http://www.emsl.pnl.gov:2080/docs/mscf/qbank