pdsffarm

Presentation Transcript

PDSF Site Report: 

Iwona Sakrejda, PDSF/NERSC/LBNL
HEPIX-HEPNT, Jefferson Lab, Newport News, VA
October 30 - November 3, 2000

Who are we: 

- Lawrence Berkeley National Laboratory
- National Energy Research Scientific Computing Center
- Energy Sciences Network
- Parallel Distributed Systems Facility: 2 box cars full of hardware from SSC, only the name is left.

Slide3: 

PDSF CLUSTER (labels recovered from the cluster diagram)
- Interactive Nodes (8): 2 CPU/node, 333 MHz/PII, 6 GB Scratch/node
- Batch Nodes (58): LSF 3.2, 86 CPUs total, 400MHz, 450MHz, 266MHz
- 14 Data Vaults (4.5TB)
- Sun E450
- Sun Ultra 60 (AFS)
- 1.2TB, Veritas Vol Mgr
- CISCO 5500, 100Mb network
- HPSS

Linux nodes……………....: 

Node           Linux   Task         MHz      Ram/MB  Swap/GB  CPUs  Local Scr/GB
pdsflx01 - 06  RH 6.1  interactive  333/II   512     2        2     6.1
pdsflx07 - 08  RH 5.1  interactive  333/II   512     2        2     6.1
pdsflx09 - 14  RH 6.1  batch        400/II   256     2        2     13.3
pdsflx15 - 20  RH 6.1  batch        450/III  256     2        2     8.4
pdsflx21 - 38  RH 6.1  batch        450/III  256     2        2     10.2
pdsflx39 - 54  RH 6.1  batch        400/II   256     2        1     8.4
pdsflx55 - 67  RH 6.1  batch        266/II   64      2        1     3.7

Data Vaults: 

Raidzone Disk Vaults
- pdsfdv01 - pdsfdv07: RedHat 5.2, 1 CPU/node, 256MB RAM, 62.8 GB/node
- pdsfdv08 - pdsfdv10: RedHat 6.1, 2 CPU/node, 265MB RAM, 475.8 GB/node
- pdsfdv11 - pdsfdv12: RedHat 6.1, 1 CPU/node, 265MB RAM, 120.8 GB/node
- pdsfdv14 - pdsfdv15: RedHat 6.1, 2 CPU/node, 265MB RAM, 998 GB/node
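The cluster diagram quotes 14 data vaults, and the per-node figures on this slide can be summed as a cross-check. The following is a minimal sketch, not part of the original talk; the names `VAULT_GROUPS`, `node_count`, and `summarize` are made up for illustration, and the host ranges and GB/node values are copied from the slide as-is.

```python
# Host ranges and GB/node copied from the slide; everything else is illustrative.
VAULT_GROUPS = [
    ("pdsfdv01", "pdsfdv07", 62.8),
    ("pdsfdv08", "pdsfdv10", 475.8),
    ("pdsfdv11", "pdsfdv12", 120.8),
    ("pdsfdv14", "pdsfdv15", 998.0),
]


def node_count(first: str, last: str) -> int:
    """Number of hosts in an inclusive range such as pdsfdv01..pdsfdv07."""
    return int(last[-2:]) - int(first[-2:]) + 1


def summarize(groups):
    """Return (number of vaults, total capacity in GB) over all groups."""
    count = 0
    total_gb = 0.0
    for first, last, gb_per_node in groups:
        n = node_count(first, last)
        count += n
        total_gb += n * gb_per_node
    return count, total_gb


if __name__ == "__main__":
    count, total_gb = summarize(VAULT_GROUPS)
    print(f"{count} vaults, {total_gb:.1f} GB in total")
```

Summing this way counts the raw per-node numbers only; how the slide's own totals were rounded or measured is not stated in the talk.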

Suns in the Cluster: 

- pdsfsu00: Quad 250MHz Sun E450 running Solaris 2.6
  - Alternative platform for software development
  - Reliable file server
- pdsfsu05: Ultra 60 running Solaris 2.7
  - AFS client, access to 96 cells (/afs/cern.ch, /afs/rhic)
  - knfs server providing AFS for all the linux nodes and pdsfsu00

Mass Storage: 

- PDSF has access to HPSS (High Performance Storage System), maintained and supported by NERSC (there is a separate allocation procedure)
  - Optimized for large files and fast transfers
  - Built on multiple disk farms and tape libraries
- Multiple user interface utilities are available (to all users that belong to a project with an active allocation):
  - HSI: utility built at SDSC (San Diego) and supported by a NERSC group
  - PFTP: parallel ftp
  - FTP
- Automatic authentication under HSI and PFTP

Our Batch System: 

- Load Sharing Facility (3.2) batch system from Platform Computing
- The following queues are available:
  - long (default queue): for all linux jobs
  - medium: for jobs <= 24 hr long on a linux machine
  - short: for jobs <= 1 hr long on a linux machine
  - normal_su: for all sun jobs
  - short_su: for jobs <= 1 hr long on the sun
- “Fair Share” allocation based on group’s financial contribution
- “Round Robin” for users with jobs of the same priority
- Home-grown accounting and reporting (MySQL and CGI scripts)
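The queue limits above amount to a small decision rule. As a rough sketch only (the function `pick_queue` and its arguments are hypothetical and not part of the PDSF setup; only the queue names and time limits come from the slide), a job's expected run time and platform could be mapped to a queue like this:

```python
def pick_queue(hours: float, platform: str = "linux") -> str:
    """Map an estimated run time (hours) and platform to a queue name.

    Queue names and limits are taken from the slide; the helper itself
    is an illustrative assumption, not a PDSF or LSF tool.
    """
    if hours <= 0:
        raise ValueError("estimated run time must be positive")
    if platform == "sun":
        # short_su: jobs <= 1 hr on the sun; normal_su: all sun jobs
        return "short_su" if hours <= 1 else "normal_su"
    if platform == "linux":
        if hours <= 1:
            return "short"   # jobs <= 1 hr
        if hours <= 24:
            return "medium"  # jobs <= 24 hr
        return "long"        # default queue, all linux jobs
    raise ValueError(f"unknown platform: {platform!r}")


if __name__ == "__main__":
    for hours, platform in [(0.5, "linux"), (12, "linux"), (48, "linux"), (3, "sun")]:
        print(hours, platform, "->", pick_queue(hours, platform))
```

The returned name would then be given to LSF at submission time, for example through the `-q` option of `bsub`.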

Our owners…………………...: 

- PDSF is run as a cooperative venture
- Users provide resources ($$$$) for major purchases of hardware, software (compilers, batch system) and support.
- NERSC provides:
  - purchasing expertise and support staff to run the cluster (2.5 FTE)
  - network support
  - mass storage
  - housing for the computers
  - heat, electricity
  - administrative support

Our Owners…………...: 

- Experiments’ contributions made at different times are brought to a common denominator by depreciation:
  - 3 years flat and then 25% depreciation/year (hardware life-time)
  - Moore’s law (50% over 18 months)
- On an emergency basis an experiment can purchase disk and CPU from the NERSC share.
- Price is based on recent purchases with some overhead (for software and support).
- Money is used to build up the cluster.
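The slide does not spell out the depreciation formulas, so the following is only one possible reading, sketched under loud assumptions: "3 year flat and then 25%/year" is taken as full value for three years followed by a linear loss of 25% of the original value per year, and "50% over 18 months" is taken as a value that halves every 18 months. The function names are hypothetical, and how (or whether) the two rules are combined is not stated in the talk.

```python
def hardware_lifetime_factor(age_years: float,
                             flat_years: float = 3.0,
                             rate_per_year: float = 0.25) -> float:
    """Fraction of the original value left under the 'hardware life-time' rule.

    Assumption: full value during the flat period, then a linear drop of
    rate_per_year of the original value per year, never below zero.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= flat_years:
        return 1.0
    return max(0.0, 1.0 - rate_per_year * (age_years - flat_years))


def moores_law_factor(age_months: float, halving_months: float = 18.0) -> float:
    """Fraction of the original value left if value halves every 18 months.

    Assumption: '50% over 18 months' means exponential halving.
    """
    if age_months < 0:
        raise ValueError("age must be non-negative")
    return 0.5 ** (age_months / halving_months)


if __name__ == "__main__":
    for years in (1, 3, 4, 5, 7):
        print(years, "years:",
              hardware_lifetime_factor(years),
              round(moores_law_factor(12 * years), 3))
```

Under these assumptions a contribution made four years ago would count at 75% of its purchase price by the life-time rule, which illustrates how purchases from different years could be put on a common scale.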

Our Users - HENP: 

- 358 users (10/24/2000)
- Experiments:
  - STAR (RHIC, BNL)
  - ATLAS (LHC, CERN)
  - SNO (neutrino, mine)
  - AMANDA (neutrinos, South Pole)
  - E871 (Fermilab)
  - E895 (AGS, BNL)
- PDSF is a secondary computing facility for these experiments
- Importance of AFS/network
- Experiments get assistance in moving their software off AFS
- Focus on MC simulations and post-DST analysis

Our new stuff……..: 

- Oakland facility
- New hardware
- New network...

Our Cool New Toys: 


Move to Oakland: 

- New data vaults (2TB) provided 4 weeks in advance and users encouraged to store data (no HPSS for 1 week)
- New nodes: 87 650MHz/PIII, installed at the new location in advance
- pdsflx00 and pdsfsu00 (home directories, software) backed up and moved (10/26-10/29)
- 10/30: system back on the network
- 10/31: system back in service
  - old CPUs added to the system (140 batch nodes, 250 CPUs)
  - data vaults rebuilt to 10TB total
  - LSF 4.0 installed with licenses for all the nodes
  - fast network (Gb)

Our Future in Numbers: 

- 25% disk hardware replacement/year starting FY 2001

Complete facility, end of FY2002 (STAR point of view, D. Olson): 

Our Future (chart labels recovered from the slide): Data vaults (disk)

Our Staff: 

- Craig Tull: Cluster project leader (now at CERN)
- Thomas Davis: Lead system manager
- Carry Whitney: system manager
- Shane Canon: system manager
- Iwona Sakrejda: user support

Have to start training them young….