
Linux Servers with JASMine

K. Edwards, A. Kowalski, S. Philpott. HEPiX, May 21, 2003.


  1. Linux Servers with JASMine. K. Edwards, A. Kowalski, S. Philpott. HEPiX, May 21, 2003

  2. JASMine
     • JASMine
       • JLab’s Mass Storage System
       • Comparable to CASTOR, Enstore, …
     • Distributed Servers
       • Data Movers (tape and disk)
         • Two tape drives per Data Mover
         • 600+GB of staging disk space (3 9840B tapes)
         • Need fast access to/from disk to keep up with the 9940B tape drives and gigabit ethernet
       • Cache Servers (disk)
         • 1-2TB file servers
         • JASMine manages the files
         • Copies from Data Movers via JASMine’s jcp protocol
         • User access via NFS (read-only)

  3. Latest Data Mover
     • Operating System
       • RedHat 7.3, kernel 2.4.20-xfs
       • XFS File System
     • Hardware
       • Dual 2.2GHz Xeon CPUs
       • SuperMicro P4DPE Motherboard
       • 2 GBytes RAM
       • 2 LSI Logic MegaRAID 320-2 RAID controllers
       • 14 Seagate 73GB disk drives (hot swap)
       • Qlogic 2342 dual port fiber card ($$)
       • 2 9940B tape drives
       • Intel PRO/1000XT Server Ethernet Card
       • 3U Chassis with N+1 power supplies
       • $14,200.00 US (without the 2 9940B tape drives)

  4. Disk Performance Tests
     • Used Standard Tests (Disktest, Bonnie++, IOZone)
       • 4GB file size used
       • Wanted to try the Fermi test (lack of time)
     • Parameters tested
       • Write-through vs Write-back cache policy
       • Optimum disk read/write block sizes
       • RAID-5 vs. RAID-50 performance
         • RAID 5 array done in hardware (1 RAID card)
         • RAID 50
           • 2 RAID-5 arrays done in hardware (1 per RAID card)
           • RAID-0 array done in software
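The capacity side of the RAID-5 vs. RAID-50 comparison can be illustrated with a little arithmetic. This is only a sketch: it assumes the Data Mover's 14 73GB drives are split evenly, 7 per RAID card (the slides do not state the split), and that each RAID-5 array gives up one disk's worth of space to parity. The function names are made up for this example.

```python
def raid5_usable_gb(disks: int, disk_gb: int) -> int:
    """Usable capacity of one RAID-5 array: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (disks - 1) * disk_gb


def raid50_usable_gb(arrays: int, disks_per_array: int, disk_gb: int) -> int:
    """RAID-50 stripes (RAID-0) across several RAID-5 arrays, so capacities add."""
    return arrays * raid5_usable_gb(disks_per_array, disk_gb)


DISK_GB = 73          # from the hardware list
DISKS_PER_CARD = 7    # assumption: 14 drives split evenly over 2 cards

single_card = raid5_usable_gb(DISKS_PER_CARD, DISK_GB)
both_cards = raid50_usable_gb(2, DISKS_PER_CARD, DISK_GB)

print(f"RAID-5 on one card:    {single_card} GB usable")
print(f"RAID-50 on both cards: {both_cards} GB usable")
```

Under these assumptions the RAID-50 layout uses both controllers and all 14 drives, which is consistent with it being the faster configuration in the tests.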

  5. Issues/Problems Discovered
     • LSI Logic MegaRAID 320-2 RAID controllers
       • Vendor support only if you use standard RedHat kernels
         • These do not have XFS support
       • RAID monitor software from LSI Logic
         • Causes SCSI bus resets
         • Occurs every 20 seconds (not changeable)
         • Throughput drops to 4-5MB/sec during each reset, as it resets the bus and flushes the cache
     • Workaround
       • Turn off RAID monitoring
       • Without monitoring, there is no real way to check the status of the disks and RAID hardware
       • Disk failures go unnoticed
     • Looking into Adaptec 2200S RAID cards

  6. Disk Test Results
     • Disk Results
       • Use write-back cache on RAID card
       • 32K block sizes are optimum
       • RAID 50 was fastest (no real surprise)
     • Idle system (1 reader or 1 writer)
       • 210MB/sec disk read throughput
       • 140MB/sec write throughput
     • Busy system (8 readers and 8 writers)
       • 40MB/sec aggregate read throughput
       • 110MB/sec aggregate write throughput
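To put these numbers next to the stated goal of keeping up with the tape drives, the sketch below subtracts the rate needed to keep two 9940B drives streaming from each measured disk rate. The tape rates (30MB/sec for incompressible data, up to 45MB/sec for compressible data) are the ones quoted on the tape results slide. This is arithmetic on the quoted figures only, not a model of how the servers behave in production, and the `headroom` helper is a name invented for this example.

```python
# Measured disk throughput in MB/sec, from the slide above.
DISK_MB_S = {
    "idle read": 210,
    "idle write": 140,
    "busy aggregate read": 40,
    "busy aggregate write": 110,
}

TAPE_DRIVES = 2  # two tape drives per Data Mover
TAPE_MB_S = {"incompressible": 30, "compressible": 45}  # per 9940B drive


def headroom(disk_mb_s: float, tape_mb_s: float, drives: int = TAPE_DRIVES) -> float:
    """Disk rate minus the rate needed to keep all tape drives streaming."""
    return disk_mb_s - drives * tape_mb_s


for disk_case, disk_rate in DISK_MB_S.items():
    for data_kind, tape_rate in TAPE_MB_S.items():
        margin = headroom(disk_rate, tape_rate)
        print(f"{disk_case:22s} vs 2 drives, {data_kind:14s}: {margin:+.0f} MB/sec")
```

A negative margin marks a case where, on these figures alone, the disks could not feed or drain both drives at full speed.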

  7. Tape Performance Testing
     • Used JASMine test program (Java)
       • Double-buffered
         • Threads simultaneously reading and writing from/to the buffer
       • Calculates/Verifies file checksum
       • Moves file between disk and tape
     • Used real raw data from the experiments
       • 2GB files
       • HallA and HallC data in CODA format
         • Does not compress
       • CLAS data in BOS format
         • Does compress
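The double-buffered scheme described above can be sketched roughly as follows. This is not the JASMine test program, which is written in Java and is not shown in the slides; it is a minimal Python illustration in which a reader thread fills two buffers while the main thread checksums and writes them. The function name `copy_with_checksum`, the 1 MiB buffer size, and the choice of CRC-32 are all assumptions, since the slides do not say which checksum JASMine uses.

```python
import io
import queue
import threading
import zlib

BUF_SIZE = 1 << 20  # assumed buffer size (1 MiB); not taken from the slides


def copy_with_checksum(src, dst, buf_size=BUF_SIZE):
    """Copy src to dst with two rotating buffers; return (bytes_copied, crc32)."""
    free = queue.Queue()   # buffers ready to be filled by the reader
    full = queue.Queue()   # (buffer, length) pairs ready to be written
    for _ in range(2):     # exactly two buffers: one fills while the other drains
        free.put(bytearray(buf_size))
    errors = []

    def reader():
        try:
            while True:
                buf = free.get()
                n = src.readinto(buf)
                if not n:          # 0 (or None) means end of input
                    break
                full.put((buf, n))
        except Exception as exc:   # hand read errors to the writer side
            errors.append(exc)
        finally:
            full.put(None)         # sentinel: no more data

    # Daemon thread so a failure on the write side cannot hang interpreter exit.
    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    total, crc = 0, 0
    while True:
        item = full.get()
        if item is None:
            break
        buf, n = item
        view = memoryview(buf)[:n]
        crc = zlib.crc32(view, crc)  # running checksum over the data written
        dst.write(view)
        total += n
        free.put(buf)                # return the buffer to the reader
    thread.join()
    if errors:
        raise errors[0]
    return total, crc


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096        # stand-in for a raw data file
    out = io.BytesIO()
    copied, checksum = copy_with_checksum(io.BytesIO(payload), out, buf_size=64 * 1024)
    print(copied, checksum == zlib.crc32(payload), out.getvalue() == payload)
```

The same function works with real files opened in binary mode; verifying a copy then amounts to comparing the returned checksum with one computed when the file was first stored.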

  8. Tape Test Results
     • No Issues or Problems
       • Qlogic 2342 dual port fiber card works well with Linux
     • Some Extra CPU required for checksums
       • Hyper-Threading really helps the performance here
     • 9940B Results as Expected
       • Direction does not matter (read/write)
       • 30MB/sec if file is not compressible
       • Up to 45MB/sec if file is compressible
         • Depends on the compressibility of the file
     • Two simultaneous copies
       • 30MB/sec each if file is not compressible (no change)
       • Expected 37.5MB/sec each for compressible file read from tape - Observed 30MB/sec each
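As a rough illustration of what these rates mean for the 2GB test files, the sketch below converts them into transfer times and computes how far the observed two-copy rate fell short of the expected one. It assumes 2GB means 2000 MB and ignores mount, seek, and rewind time, none of which the slides quantify. The helper names are invented for this example.

```python
FILE_MB = 2000  # assumption: "2GB" taken as 2000 MB


def transfer_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Streaming time only; ignores tape mount, positioning, and rewind."""
    return size_mb / rate_mb_s


def shortfall_percent(expected: float, observed: float) -> float:
    """How far the observed rate falls below the expected rate, in percent."""
    return 100.0 * (expected - observed) / expected


for label, rate in [("not compressible", 30.0), ("compressible, best case", 45.0)]:
    print(f"{label:24s}: {transfer_seconds(FILE_MB, rate):5.1f} s per 2GB file")

gap = shortfall_percent(expected=37.5, observed=30.0)
print(f"Two simultaneous compressible reads: {gap:.0f}% below the expected rate")
```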

  9. Latest Cache Server
     • Operating System
       • RedHat 7.3, kernel 2.4.18-xfs
       • XFS File System
     • Hardware
       • Dual 2.0GHz Xeon CPUs
       • SuperMicro P4DPE Motherboard
       • 2 GBytes RAM
       • 2 3ware 7850 IDE/ATA RAID controllers (RAID-5)
       • 16 Hot Swap Disk Drives
         • Maxtor 160GB ATA133
         • Western Digital 180GB ATA100
       • Intel PRO/1000XT Server Ethernet Card
       • 4U Chassis with N+1 power supplies
       • $9,000.00 US

  10. Issues/Problems Discovered
      • Western Digital 180GB/200GB ATA100 Drives
        • Drives go offline/idle (WD feature)
        • 3ware card thinks the drive died
      • Solution
        • Get Disk Firmware Version 63.13F70 from Western Digital
        • Use Maxtor 160GB ATA133 drives

  11. Experience with IDE/ATA Drives in General
      • High failure rates during the first two months of use
        • 1-3 per week
        • Need a longer burn-in period
      • Failure rates decrease after two months of use
        • 1 every 6-8 weeks
        • Marginal drives gone?
      • They still fail more often than SCSI disks
        • Then again, we lost 2 SCSI disks today
      • Number of disks by type used in servers
        • 191 SCSI
        • 320 ATA
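The failure figures above can be put on a common scale with a short calculation. The comparison of early and later rates uses only the numbers on the slide. The annualized percentage additionally assumes that the later rate applies to all 320 ATA drives, which the slide does not state, so treat it as a rough indication only. The names used here are illustrative.

```python
WEEKS_PER_YEAR = 52
ATA_DRIVES = 320  # from the slide

# Failures per week as (low, high) ranges taken from the slide.
EARLY_PER_WEEK = (1.0, 3.0)           # first two months: 1-3 per week
STEADY_PER_WEEK = (1.0 / 8, 1.0 / 6)  # afterwards: 1 every 6-8 weeks


def ratio_range(early, steady):
    """Smallest and largest factor by which the early rate exceeds the later one."""
    return early[0] / steady[1], early[1] / steady[0]


def annualized_percent(failures_per_week: float, drives: int) -> float:
    """Failures per year as a percentage of the drive population."""
    return 100.0 * failures_per_week * WEEKS_PER_YEAR / drives


low, high = ratio_range(EARLY_PER_WEEK, STEADY_PER_WEEK)
print(f"Early failure rate is roughly {low:.0f}x to {high:.0f}x the later rate")

# Assumption: the later rate is spread over the whole ATA population.
for rate in STEADY_PER_WEEK:
    pct = annualized_percent(rate, ATA_DRIVES)
    print(f"{rate:.3f} failures/week -> about {pct:.1f}% of ATA drives per year")
```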
