slab june2002

CASPUR Storage Lab: 

Andrei Maslennikov
CASPUR Consortium
June 2002

Contents: 

- Reasons to build a lab
- Agenda
- Components
- Testing plans
- Current results - Protocols
- First results - SCSI over IP
- First results - GFS
- First impressions - GPFS
- Final remarks

Why should we build a lab?: 

- Objective: inventory and comparative studies for both current and new storage solutions.
- General issues to look at:
  - True data sharing across architectures
  - Best performance, scalability
  - Efficient remote data access
  - Performance and reliability, but possibly with cheaper components
- Questions that we have to answer in the short run:
  - Can we replace our NetApp F760 (NFS) and Sun R220 servers (AFS) with Linux servers, at no loss of performance?
  - Can we use cheaper disk systems, or do we still have to buy the high-end models?

Agenda: 

- Large file serving across architectures
- File serving off Linux-based servers: performance, limitations, hardware issues
- File serving for Linux clients: new solutions
- Data access over WAN
- New disk and tape media

Components: 

- High-end base Linux unit for both servers and clients:
  - 6x SuperMicro Superserver 6041G, each with:
    - 2 x Pentium III 1000 MHz
    - 2 GB of RAM, dual-channel 160 MB/sec SCSI on board
    - SysKonnect 9843 Gigabit Ethernet NIC
    - Qlogic QLA2200 Fibre Channel HBA
    - System disk: 15000 RPM (Seagate)
  - NEW: 2x 6012P machines (1 GB RAM, 2 x P4 2 GHz, 400 MHz bus)
- Network:
  - NPI Keystone 12-port switch (throughput 12 Gbit/sec)
  - Myricom Myrinet 8-port switch, 4 nodes attached
- Wide Area Lab, in collaboration with CNAF (INFN):
  - 2 identical Dell Poweredge 1550 servers, each equipped with SysKonnect GE and Qlogic FC cards
  - High-speed line Rome-Bologna organized by CNAF/GARR (~400 km, dedicated link at 2.5 Gbit/sec)

Components -2: 

Disks:
- scsi: several 15K RPM local units
- scsi-fc: 7.2K and 10K RAID systems (DotHill 54xx and 71xx)
- fc-fc: 10K RAID, 256 MB cache (71xx, on loan from DotHill)
- fc-fc: 15K RAID, 1 GB cache (FASTt700, on loan from IBM)
- ide-fc: 7.2K RAID, 256 MB cache (Infortrend IFT-6300)

Tapes:
- 4 LTO fc Ultrium drives via SAN
- 2 AIT-3 fc drives via SAN (on loan from ENEA)

SCSI / IP appliances:
- CISCO SN5420 appliance (Fibre Channel / iSCSI): initially on loan from CISCO, now purchased
- DotHill Axis appliance (Fibre Channel / Ipstor): on loan from DotHill
- CISCO 6500 crate (arrived) + beta FCIP unit (coming): on loan from CISCO
- Alacritech Gigabit NIC with TCP-offload capability

CASPUR / CNAF Storage Lab: 

[Diagram: lab layout. Labels: 5420; AXIS; Dell 1550 (Rome); SM 6041G (x2); SM 6012P (x2); Myrinet; FCIP; 2.5 Gbit WAN, 400 km.]

Testing Plans: 

Series 1. Comparison of the file transfer methods for large files
- Setup: One server with a local disk, several clients on the network.
- Goals:
  - Benchmark several of the most commonly used file transfer methods: NFS, AFS, AFS-cacheless (Atrans), RFIO, ROOT, GridFTP, both on LAN and over WAN.
  - Use large files (>1 GB).
  - Study the case of multiple clients accessing the same server.
  - Study other Linux file systems for large file hosting on the server (64 bit, volume manager etc.).

Series 2. Study of SCSI-over-IP solutions
- Setup: Fibre channel devices (tapes, disks), FC / IP appliances, TCP offload-capable NICs, clients on LAN and WAN.
- Goals:
  - Provide client access over IP for native fibre channel devices, in a variety of ways (Ipstor, iSCSI, and others).
  - Study SAN interconnection on the WAN (FCIP, iFCP, SoIP etc.).
  - Benchmark the performance and compare it with the numbers obtained on the native fibre channel connection.

Testing Plans, 2: 

Series 3. Study of serverless disk sharing
- Setup: Fibre channel disk devices accessible from several clients on the LAN.
- Goals:
  - Configure and study: Sistina Global File System, IBM Sanergy.
  - For DMEP-capable devices, try hardware locking (with GFS).
  - See if GFS may be used for HA configurations (mail, web, dns etc.).

Series 4. Scalable NFS server based on IBM GPFS
- Setup: Several server nodes with local disk, interconnected with a fast, low-latency network; several client nodes.
- Goals:
  - Configure IBM GPFS, benchmark peak performance on the clients.
  - Benchmark also the aggregate performance of the multinode server complex.
  - Calculate the costs.

Testing Plans, 3: 

Series 5. Study of the new media (AIT-3, ide-fc)
- Setup: New media (AIT-3 tapes, ide-fc RAID systems), test machines.
- Goals: Configure systems and run a series of stress tests. Benchmark the performance.

Current results – Series 1 (Protocols): 

Participants:
- CASPUR: A.Maslennikov, G.Palumbo.
- CERN: F.Collin, J-D.Durand, G.Lee, F.Rademakers, R.Többicke.

Hardware configuration:
[Diagram. Labels, in the order they appear: Server; Client; DotHill RAID; FC; 95 MB/sec memory-memory (ttcp); 15000 RPM disk; R: 60 MB/sec, W: 53 MB/sec; IBM FASTt700; Infortrend IFT6300; R: 82 MB/sec, W: 69 MB/sec; R: 82 MB/sec, W: 79 MB/sec; R: 71 MB/sec, W: 49 MB/sec.]

Series 1 - details: 

Some settings:
- Kernel: 2.4.18-4 (RedHat 7.3)
- AFS: cache was set up on ramdisk (400 MB), chunksize=256 KB
- NFS: version=3, rsize=wsize=65535
- ext2 filesystem was used on the servers (ext3 and reiserfs make things slower)

Problems encountered (we are still working on these issues):
- Two high-performance cards on the same PCI bridge visibly interfere with each other. We have just received the new 6012P machines with a faster bus, and are starting to try them as servers.
- Caching effects on both the client and the server side are quite pronounced. Even though we believe that we have accounted for all of them, more checks will be done.
- The ratio File-Size / Available-RAM-On-Server is a factor.

NB: What we report here are only our current numbers, not final ones. We will remeasure shortly. In particular, we see that in our current setup the RFIO and ROOT write speeds are too low, and our first priority is to investigate this behaviour.

Series 1 - more detail: 

Write tests:
- Measured the average time needed to transfer 1 GB from memory on the client to the disk of the file server, including the time needed to run the “sync” command on both the client and the server at the end of the operation:

    dd if=/dev/zero of=<filename on server> bs=1000k count=1000

    T = T_dd + max(T_sync_client, T_sync_server)

  - For RFIO, this was done via a named pipe.
  - For ROOT, the 1 GB file on the client was first put in memory with the “cat” command.
- Raw disk write speed tests were done with all the memory on the server locked. We used large files in these tests (1.5 GB).
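The timing rule above can be turned into a transfer rate with a few lines of Python. This is only a sketch of the arithmetic: the function name and the sample timings are hypothetical, and treating the nominal 1 GB as 1000 MB is an assumption about the units used on the slides.

```python
def write_rate_mb_per_sec(size_mb, t_dd, t_sync_client, t_sync_server):
    """Effective write rate for one transfer.

    Total time follows the rule T = T_dd + max(T_sync_client, T_sync_server):
    the two sync commands are taken to overlap, so only the slower one counts.
    """
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    total = t_dd + max(t_sync_client, t_sync_server)
    if total <= 0:
        raise ValueError("total time must be positive")
    return size_mb / total


# Hypothetical timings in seconds, for illustration only (not measured values).
rate = write_rate_mb_per_sec(size_mb=1000, t_dd=12.0,
                             t_sync_client=0.5, t_sync_server=3.0)
print(f"{rate:.1f} MB/sec")
```

Averaging over several runs, as the slide describes, would mean calling this once per run and taking the mean of the results.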

Series 1 - more detail: 

Read tests:
- Measured the average time needed to transfer a 1 GB file from a disk on the server to the memory on the client (output directly to /dev/null).
- Reading was done in a loop over groups of 10 different files of 1 GB each, so it was guaranteed that neither the client nor the server had any part of the file in memory at the moment when the file was read.
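A read loop of this kind could look roughly like the sketch below. It is not the script used in the tests: the function names are hypothetical, 1 MB is taken as 10**6 bytes by assumption, and the code does nothing by itself to defeat caching. As in the slide, that relies on cycling through a set of files larger than the RAM of both machines.

```python
import sys
import time


def read_and_discard(paths, chunk_size=1024 * 1024):
    """Read every file in `paths` once and throw the data away.

    Returns (total_bytes_read, elapsed_seconds).
    """
    total_bytes = 0
    start = time.perf_counter()
    for path in paths:
        # buffering=0 avoids an extra Python-level buffer; the OS page cache
        # is NOT bypassed, so the file set must exceed the available RAM.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total_bytes += len(chunk)
    return total_bytes, time.perf_counter() - start


def rate_mb_per_sec(total_bytes, elapsed_seconds):
    """Average rate in MB/sec, taking 1 MB = 10**6 bytes (an assumption)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return total_bytes / 1e6 / elapsed_seconds


if __name__ == "__main__":
    files = sys.argv[1:]
    if not files:
        print("usage: python read_test.py FILE [FILE ...]")
    else:
        nbytes, seconds = read_and_discard(files)
        print(f"{rate_mb_per_sec(nbytes, seconds):.1f} MB/sec")
```

Passing the ten 1 GB files on the command line, with the files living on the mounted server file system, would give one average figure for the whole group.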

Series 1- current results (MB/sec) [SM 6041 – 2GB RAM on server and client]: 

Next steps:
- Clarify the caching issues, investigate RFIO and ROOT (with their authors)
- NFS special tuning (with L.Genoni and I.Lisi)
- Real cycle benchmarks (staging + processing)
- WAN testing
- Aggregate max speeds (multiple clients)
- GridFTP / bbftp benchmarks

Series 2 (SCSI over IP): 

Participants:
- CASPUR: M.Goretti, A.Maslennikov, G.Palumbo.
- CNAF: PP.Ricci, F.Ruggieri, S.Zani.

Hardware configuration:
[Diagram. Labels: 5420 or AXIS; Dell 1550 (Bologna); Dell 1550 (Rome); Disk; Tapes; Gigabit IP (Rome); 2.5 Gbit WAN, 400 km.]

Series 2 - details: 

TCP settings for WAN:
- With default TCP settings, we obtained these speeds on the WAN link between the two Dell 1550 servers:
  - 11 MB/sec (TCP, ttcp test)
  - ~100 MB/sec (UDP, netperf test)
- We then followed B.Tierney’s cookbook and got advice from L.Pomelli (CISCO) and S.Ravot (CERN). In the end, the TCP window size was set to 256 Kbytes (we tried different values), and our best results were:
  - 65 MB/sec on kernel 2.4.16 (TCP, ttcp test)
  - 15 MB/sec on kernel 2.2.16 (TCP, ttcp test)
- AXIS performance on the WAN was therefore expected to be poor, because this box uses kernel 2.2.16. We were also obliged to use the same kernel on the client, because the ipstor modules required it.
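Why the window size matters on a long link can be illustrated with the usual window/RTT bound: a single TCP stream cannot move more than one window of data per round trip. The sketch below is only a back-of-the-envelope calculation. The round-trip time is not given in the slides; the 4 ms used here is an assumption based on roughly 2 ms of propagation delay each way over ~400 km of fibre. Reading "256 Kbytes" as 256 * 1024 bytes is also an assumption.

```python
def window_limited_rate_mb_per_sec(window_bytes, rtt_seconds):
    """Upper bound on single-stream TCP throughput: one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes / rtt_seconds / 1e6


def bandwidth_delay_product_bytes(link_bits_per_sec, rtt_seconds):
    """Window needed to keep a link of the given speed full."""
    return link_bits_per_sec / 8 * rtt_seconds


ASSUMED_RTT_S = 0.004      # assumption: not measured or stated in the slides
WINDOW_BYTES = 256 * 1024  # "256 Kbytes", read as KiB (assumption)

bound = window_limited_rate_mb_per_sec(WINDOW_BYTES, ASSUMED_RTT_S)
bdp = bandwidth_delay_product_bytes(2.5e9, ASSUMED_RTT_S)
print(f"window-limited bound: {bound:.1f} MB/sec")
print(f"window needed to fill 2.5 Gbit/sec: {bdp / 1e6:.2f} MB")
```

Under these assumptions the bound comes out near 65 MB/sec, which is in the same range as the best ttcp figure reported above; a different real RTT would shift it proportionally.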

Series 2 - more detail: 

What was measured:
- Write tests: average time needed to transfer 1 GB from memory on the client to the iSCSI or ipstor disk or tape, including the time needed to run the “sync” command on the client at the end of the operation.
- Read tests: average time needed to transfer a 1 GB file from the iSCSI or ipstor disk or tape to the memory on the client. As in the Series 1 tests, reading was done in a loop over several different files of 1 GB each.

Series 2- current results (MB/sec): 

Notes:
- New CISCO firmware may further improve the aggregate speed on the 5420
- Waiting for the AXIS software upgrade to repeat the WAN tests with kernel 2.4.x

R/W speed on native Fibre Channel HBA: this disk: 56/36, this tape: 15/15.

Series 3 (Sistina Global File System): 

Participants:
- CASPUR: A.Maslennikov, G.Palumbo.

Hardware configuration (2 variants):
[Diagram. Labels: 5420; Disk; FC SAN; SM 6041G (x4); variants (1) and (2); Gigabit IP.]

Series 3 - details: 

GFS installation:
- Requires kernel 2.4.16 (may be downloaded from Sistina together with the trial distribution). On fibre channel, everything works out of the box.
- The CISCO driver required recompilation. It compiled smoothly but would not work with the Sistina kernel (we used the right 2.4.16 source tree, complete with their patches).
- Workaround: we rebuilt the kernel from the 2.4.16 source plus the Sistina patches. The CISCO driver then compiled and loaded smoothly, but the Sistina modules would not load. We modified them with “objcopy”, after which everything worked.

What was measured:
- Read and write transfer rates (memory <-> GFS file system) for large files, for both configurations.

Series 3 – GFS current results (MB/sec): 

Next steps:
- Repeat the benchmarks with the disk from the Series 1 test, and compare them with those for the other methods
- Explore the hardware DMEP function of the DotHill disk system

R/W speed on native Fibre Channel HBA: this disk: 56/36

NB: Of the 4 nodes, 1 was running the lock server process and 3 were doing only I/O.

Series 4 (NFS server based on IBM GPFS): 

Participants:
- CASPUR: A.Maslennikov, G.Palumbo
- Pisa University: M.Davini

Hardware configuration:
[Diagram. Labels: SM 6041G (x4); Local disks; NFS.]

Series 4 – details and first results: 

Installation:
- Smooth and easy. Any kernel will do (all kernel-space code is GPL); we used 2.4.9-31 (RedHat 7.2)
- Documentation: sufficient
- 2 local 15K-rpm disks per node were used

Myrinet:
- Measured 150 MB/sec (memory-memory) with ttcp. Works out of the box.

First results (work is still in progress):
- 93 MB/sec write speed on one of the server nodes (4 active nodes with 2 disks/node)
- NFS access from a single external client: poor, < 20 MB/sec
- Two writers on two server nodes drop the aggregate speed significantly

Final remarks: 

We will continue with the tests, and any comments are very welcome. We are open to any collaboration. Vendors look favourably on these activities, so new hardware may be arriving for tests, at no charge.