
Hotfoot HPC Cluster



  1. Hotfoot HPC Cluster March 31, 2011

  2. Topics • Overview • Execute Nodes • Manager/Submit Nodes • NFS Server • Storage • Networking • Performance

  3. Overview - Hotfoot Pilot • Launched May 2009 • Original Partnership • Astronomy • Statistics • CUIT • Office of the Executive Vice President for Research

  4. Overview - Hotfoot Expansion • Expanded March 2011 • More Nodes • More Storage • Changed Scheduler • New Participant • Social Science Computing Committee (SSCC)

  5. Overview – Cluster Components • 52 Execute Nodes • 520 Total Cores • 2 Manager Nodes • 1 NFS Server (1 Cold Spare) • 52 TB Storage (72 TB Raw)

  6. Overview

  7. Overview - Architecture

  8. Execute Nodes

  9. Manager/Submit Nodes • HP DL360 G5, 4 GB RAM • Torque Resource Manager (OpenPBS descendant) • Maui Cluster Scheduler • User Access via virtual interface (vif) • Failover via Torque High Availability (HA)
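
Since slide 9 names Torque and Maui, a brief sketch of how a user job is typically submitted on a Torque-managed cluster may help. The directives below are standard Torque/PBS options, but the job name, resource requests, and executable are illustrative assumptions, not Hotfoot-specific settings.

      #!/bin/sh
      # Illustrative Torque/PBS job script (resource requests and program are assumed).
      #PBS -N sample_job               # job name
      #PBS -l nodes=1:ppn=1            # one core on one node
      #PBS -l walltime=01:00:00        # one-hour wall-clock limit
      #PBS -o sample_job.out           # stdout file
      #PBS -e sample_job.err           # stderr file

      cd $PBS_O_WORKDIR                # run from the directory where qsub was invoked
      ./my_program                     # hypothetical executable

The script would be handed to Torque with "qsub sample_job.sh"; Maui then decides when and on which execute node the job runs.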

  10. NFS Servers • Primary • HP DL360 G7 • 2 x 4 cores • 16 GB RAM • Backup • HP DL360 G5 • 1 x 2 cores • 8 GB RAM
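
For context on what the NFS server provides to the execute nodes, the lines below sketch a typical server-side export and a matching client mount. The paths, server name, subnet, and mount options are illustrative assumptions rather than Hotfoot's actual configuration.

      # /etc/exports on the NFS server (path and subnet are assumed)
      /export/hpc    10.0.0.0/24(rw,async,no_root_squash)

      # /etc/fstab entry on an execute node (server name and options are assumed)
      nfs-server:/export/hpc  /hpc  nfs  rw,hard,intr,rsize=32768,wsize=32768  0 0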

  11. Storage • HP P2000 Storage Array • 32 x 2 TB Drives • RAID 5 • ~52 TB Usable

  12. Networking • Execute Nodes • Channel-bonding mode 2 (load-balancing and fault tolerance) • 1 Gb connection to chassis switches • Usage records suggested this was sufficient
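
Channel-bonding mode 2 is the Linux bonding driver's balance-xor policy, which matches the load-balancing and fault-tolerance behavior described above. On a Red Hat-style system of that era it would usually be configured with ifcfg files roughly like the sketch below; device names, the monitoring interval, and addressing are assumptions.

      # /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative)
      DEVICE=bond0
      BOOTPROTO=none
      ONBOOT=yes
      BONDING_OPTS="mode=2 miimon=100"   # balance-xor with MII link monitoring
      # IPADDR/NETMASK omitted here

      # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for the second slave, eth1)
      DEVICE=eth0
      BOOTPROTO=none
      ONBOOT=yes
      MASTER=bond0
      SLAVE=yes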

  13. Networking Sample Traffic for an Execute Node

  14. Networking • Chassis • Each chassis has four Cisco 3020 switches • 1 Gb connection to Edge switches • Usage records suggested this was sufficient

  15. Networking Sample Traffic for a Chassis Switch

  16. Networking Original Chassis, Showing Network Connections for Two Servers

  17. Performance • Concern about the ability of NFS to handle I/O demands. • Reviewed performance of the pilot system. • Ran tests on the expanded system.

  18. Performance Memory Usage on Old NFS Server

  19. Performance Load Average on Old NFS Server

  20. Performance Test Program

      /* Generates load by printing 100 million integers to stdout. */
      #include <stdio.h>

      #define MILLION 1000000

      int main(int argc, char *argv[])
      {
          int max, i;

          max = 100 * MILLION;
          for (i = 0; i < max; ++i) {
              printf("%d\n", i);
          }
          return 0;
      }
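
For context, a run of the test program above would presumably be compiled and pointed at NFS-backed storage so that its output exercises the NFS server; the file names and target path below are hypothetical.

      gcc -O2 -o iotest iotest.c             # build the test program (source file name assumed)
      ./iotest > /nfs/scratch/iotest.out     # hypothetical NFS-mounted path; writes 100 million lines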

  21. Performance

  22. Questions? • Questions? • Comments? • Contact: roblane@columbia.edu
