
High Performance Linux Clusters


Presentation Transcript


  1. High Performance Linux Clusters Guru Session, Usenix, Boston June 30, 2004 Greg Bruno, SDSC

  2. Overview of San Diego Supercomputer Center • Founded in 1985 • Non-military access to supercomputers • Over 400 employees • Mission: Innovate, develop, and deploy technology to advance science • Recognized as an international leader in: • Grid and Cluster Computing • Data Management • High Performance Computing • Networking • Visualization • Primarily funded by NSF

  3. My Background • 1984 - 1998: NCR - Helped to build the world’s largest database computers • Saw the transition from proprietary parallel systems to clusters • 1999 - 2000: HPVM - Helped build Windows clusters • 2000 - Now: Rocks - Helping to build Linux-based clusters

  4. Why Clusters?

  5. Moore’s Law

  6. Cluster Pioneers • In the mid-1990s, the Network of Workstations project (UC Berkeley) and the Beowulf Project (NASA) asked the question: Can You Build a High Performance Machine From Commodity Components?

  7. The Answer is: Yes Source: Dave Pierce, SIO

  8. The Answer is: Yes

  9. Types of Clusters • High Availability • Generally small (less than 8 nodes) • Visualization • High Performance • Computational tools for scientific computing • Large database machines

  10. High Availability Cluster • Composed of redundant components and multiple communication paths

  11. Visualization Cluster • Each node in the cluster drives a display

  12. High Performance Cluster • Constructed with many compute nodes and often a high-performance interconnect

  13. Cluster Hardware Components

  14. Cluster Processors • Pentium/Athlon • Opteron • Itanium

  15. Processors: x86 • Most prevalent processor used in commodity clustering • Fastest integer processor on the planet: • 3.4 GHz Pentium 4, SPEC2000int: 1705

  16. Processors: x86 • Capable floating point performance • #5 machine on Top500 list built with Pentium 4 processors

  17. Processors: Opteron • Newest 64-bit processor • Excellent integer performance • SPEC2000int: 1655 • Good floating point performance • SPEC2000fp: 1691 • #10 machine on Top500

  18. Processors: Itanium • First systems released June 2001 • Decent integer performance • SPEC2000int: 1404 • Fastest floating-point performance on the planet • SPEC2000fp: 2161 • Impressive Linpack efficiency: 86%

  19. Processors Summary

  20. But What Do You Really Build? • Itanium: Dell PowerEdge 3250 • Two 1.4 GHz CPUs (1.5 MB cache) • 11.2 Gflops peak • 2 GB memory • 36 GB disk • $7,700 • Two 1.5 GHz CPUs (6 MB cache) bring the system cost to ~$17,700 • 1.4 GHz vs. 1.5 GHz • ~7% slower • The 1.5 GHz system costs ~130% more

  21. Opteron • IBM eServer 325 • Two 2.0 GHz Opteron 246 • 8 Gflops peak • 2 GB memory • 36 GB disk • $4,539 • Two 2.4 GHz CPUs: $5,691 • 2.0 GHz vs. 2.4 GHz • ~17% slower • The 2.4 GHz system costs ~25% more

  22. Pentium 4 Xeon • HP DL140 • Two 3.06 GHz CPUs • 12 Gflops peak • 2 GB memory • 80 GB disk • $2,815 • Two 3.2 GHz CPUs: $3,368 • 3.06 GHz vs. 3.2 GHz • ~4% slower • The 3.2 GHz system costs ~20% more
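
The "Gflops peak" figures on the three node slides above are just CPU count times clock rate times peak floating-point operations per cycle. A minimal sketch in C, assuming 4 flops/cycle for Itanium 2 and 2 flops/cycle for Opteron and Pentium 4 Xeon (the flops/cycle values are my assumptions, not stated in the talk):

    /* Peak Gflops = number of CPUs x clock (GHz) x flops per cycle.
     * Assumed flops per cycle: Itanium 2 = 4, Opteron = 2, Xeon = 2. */
    #include <stdio.h>

    int main(void)
    {
        struct { const char *name; double ghz; int flops_per_cycle; } node[] = {
            { "Itanium (2 x 1.40 GHz)", 1.40, 4 },
            { "Opteron (2 x 2.00 GHz)", 2.00, 2 },
            { "Xeon    (2 x 3.06 GHz)", 3.06, 2 },
        };

        for (int i = 0; i < 3; i++)
            printf("%s  %.1f Gflops peak\n",
                   node[i].name, 2 * node[i].ghz * node[i].flops_per_cycle);
        return 0;
    }

Under those assumptions the program prints 11.2, 8.0, and 12.2 Gflops, matching the 11.2, 8, and 12 Gflops peak figures on the slides (the last within rounding).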

  23. If You Had $100,000 To Spend On A Compute Farm
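
The chart for this slide is not in the transcript, so here is a rough sketch using only the list prices and peak figures from the three preceding slides (it ignores the network, racks, and any vendor discounts):

    /* Rough compute-farm sizing: nodes and aggregate peak Gflops per $100,000,
     * using the dual-CPU node prices quoted on the preceding slides. */
    #include <stdio.h>

    int main(void)
    {
        double budget = 100000.0;
        struct { const char *name; double price, gflops; } node[] = {
            { "Itanium (Dell PowerEdge 3250)", 7700.0, 11.2 },
            { "Opteron (IBM eServer 325)    ", 4539.0,  8.0 },
            { "Xeon    (HP DL140)           ", 2815.0, 12.0 },
        };

        for (int i = 0; i < 3; i++) {
            int count = (int)(budget / node[i].price);
            printf("%s  %3d nodes, ~%4.0f Gflops aggregate peak\n",
                   node[i].name, count, count * node[i].gflops);
        }
        return 0;
    }

This prints roughly 12 Itanium nodes (~134 Gflops), 22 Opteron nodes (~176 Gflops), and 35 Xeon nodes (~420 Gflops), so by this crude measure the Xeon farm offers about three times the aggregate peak of the Itanium farm for the same money.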

  24. What People Are Buying • Gartner study • Servers shipped in 1Q04 • Itanium: 6,281 • Opteron: 31,184 • Opteron shipped 5x more servers than Itanium

  25. What Are People Buying • Gartner study • Servers shipped in 1Q04 • Itanium: 6,281 • Opteron: 31,184 • Pentium: 1,000,000 • Pentium shipped 30x more than Opteron

  26. Interconnects

  27. Interconnects • Ethernet • Most prevalent on clusters • Low-latency interconnects • Myrinet • Infiniband • Quadrics • Ammasso

  28. Why Low-Latency Interconnects? • Performance • Lower latency • Higher bandwidth • Accomplished through OS-bypass

  29. How Low Latency Interconnects Work • Decrease latency for a packet by reducing the number of memory copies per packet
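
The MPI latency and bandwidth figures quoted on the interconnect slides that follow are typically measured with a two-node ping-pong microbenchmark: latency is approximated by half the round-trip time of an empty message, bandwidth by timing a large one. A minimal sketch in C with MPI (my own illustration, not code from the talk):

    /* Minimal MPI ping-pong sketch; run on two nodes, e.g. "mpirun -np 2 ./pingpong". */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define REPS   1000
    #define BIGMSG (1 << 20)            /* 1 MB message for the bandwidth test */

    /* Returns the one-way time per message of the given size, in seconds. */
    static double pingpong(char *buf, int bytes, int rank)
    {
        MPI_Status st;
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        return (MPI_Wtime() - t0) / (2.0 * REPS);
    }

    int main(int argc, char **argv)
    {
        int rank;
        char *buf = malloc(BIGMSG);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank < 2) {                 /* only the first two ranks participate */
            double lat = pingpong(buf, 0, rank);
            double bw  = BIGMSG / pingpong(buf, BIGMSG, rank);
            if (rank == 0)
                printf("latency: %.1f us, bandwidth: %.1f MB/s\n",
                       lat * 1e6, bw / 1e6);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }

On an OS-bypass interconnect a benchmark like this lands near the vendor figures quoted below; over plain TCP/IP Ethernet the kernel's extra per-packet copies and interrupts push the latency toward the 80 us quoted later.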

  30. Bisection Bandwidth • Definition: If you split the system in half, what is the maximum amount of data that can pass between the two halves? • Assuming 1 Gb/s links: • Bisection bandwidth = 1 Gb/s

  31. Bisection Bandwidth • Assuming 1 Gb/s links: • Bisection bandwidth = 2 Gb/s

  32. Bisection Bandwidth • Definition: Full bisection bandwidth is a network topology that can support N/2 simultaneous communication streams. • That is, the nodes on one half of the network can communicate with the nodes on the other half at full speed.

  33. Large Networks • When you run out of ports on a single switch, you must add another network stage • In the example above: Assuming 1 Gb/s links, uplinks from stage 1 switches to stage 2 switches must carry at least 6 Gb/s

  34. Large Networks • With low-port-count switches, you need many switches on large systems in order to maintain full bisection bandwidth • A 128-node system with 32-port switches requires 12 switches and 256 total cables
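
A back-of-the-envelope check of that 128-node figure, assuming a two-stage network in which each 32-port leaf switch uses half its ports for nodes and half for uplinks to the second stage (my assumption of the layout; the slide's diagram is not in the transcript):

    /* Two-stage full-bisection network sizing sketch.
     * Assumption: each leaf switch splits its ports evenly between
     * node links and uplinks to the spine (stage 2) switches. */
    #include <stdio.h>

    int main(void)
    {
        int nodes = 128, ports = 32;

        int nodes_per_leaf = ports / 2;          /* 16 nodes per leaf switch */
        int leaves  = nodes / nodes_per_leaf;    /* 8 leaf switches          */
        int uplinks = leaves * (ports / 2);      /* 128 uplink cables        */
        int spines  = uplinks / ports;           /* 4 spine switches         */

        printf("switches: %d (leaf %d + spine %d)\n",
               leaves + spines, leaves, spines);
        printf("cables:   %d (node %d + uplink %d)\n",
               nodes + uplinks, nodes, uplinks);
        return 0;
    }

This reproduces the slide's 12 switches and 256 cables; the single 128-port Myrinet and Quadrics switches on the next slides sidestep the problem by putting the entire cluster behind one full-bisection switch.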

  35. Myrinet • Long-time interconnect vendor • Delivering products since 1995 • Deliver single 128-port full bisection bandwidth switch • MPI Performance: • Latency: 6.7 us • Bandwidth: 245 MB/s • Cost/port (based on 64-port configuration): $1000 • Switch + NIC + cable • http://www.myri.com/myrinet/product_list.html

  36. Myrinet • Recently announced 256-port switch • Available August 2004

  37. Myrinet • #5 System on Top500 list • System sustains 64% of peak performance • But smaller Myrinet-connected systems hit 70-75% of peak

  38. Quadrics • QsNetII E-series • Released at the end of May 2004 • Deliver 128-port standalone switches • MPI Performance: • Latency: 3 us • Bandwidth: 900 MB/s • Cost/port (based on 64-port configuration): $1800 • Switch + NIC + cable • http://doc.quadrics.com/Quadrics/QuadricsHome.nsf/DisplayPages/A3EE4AED738B6E2480256DD30057B227

  39. Quadrics • #2 on Top500 list • Sustains 86% of peak • Other Quadrics-connected systems on Top500 list sustain 70-75% of peak

  40. Infiniband • Newest cluster interconnect • Currently shipping 32-port switches and 192-port switches • MPI Performance: • Latency: 6.8 us • Bandwidth: 840 MB/s • Estimated cost/port (based on 64-port configuration): $1700 - 3000 • Switch + NIC + cable • http://www.techonline.com/community/related_content/24364

  41. Ethernet • Latency: 80 us • Bandwidth: 100 MB/s • Top500 list has Ethernet-based systems sustaining between 35% and 59% of peak

  42. Ethernet • What we did with 128 nodes and a $13,000 ethernet network • $101 / port • $28/port with our latest Gigabit Ethernet switch • Sustained 48% of peak • With Myrinet, would have sustained ~1 Tflop • At a cost of ~$130,000 • Roughly 1/3 the cost of the system
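
A rough check of the numbers on this slide, assuming the 128 nodes are dual 3.06 GHz Xeons like the HP DL140 configuration quoted earlier (~12 Gflops peak each; the node type is my assumption) and using Myrinet's ~$1,000/port and ~70% efficiency figures from the earlier slides:

    /* Ethernet vs. Myrinet trade-off sketch for a 128-node cluster.
     * Node type and per-node peak are assumed from the earlier DL140 slide. */
    #include <stdio.h>

    int main(void)
    {
        int    nodes = 128;
        double peak  = nodes * 12.0;             /* ~1,536 Gflops aggregate peak */

        double eth_cost       = 13000.0;         /* ~$101/port Ethernet network  */
        double eth_sustained  = 0.48 * peak;     /* 48% of peak                  */

        double myri_cost      = nodes * 1000.0;  /* ~$1,000/port                 */
        double myri_sustained = 0.70 * peak;     /* ~70% of peak                 */

        printf("Ethernet: $%6.0f, ~%4.0f Gflops sustained\n", eth_cost, eth_sustained);
        printf("Myrinet:  $%6.0f, ~%4.0f Gflops sustained\n", myri_cost, myri_sustained);
        return 0;
    }

Those rough numbers match the slide: the Myrinet network would cost about ten times as much as the Ethernet network and buy roughly 1.5x the sustained performance (~1.1 Tflops vs. ~0.74 Tflops).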

  43. Rockstar Topology • 24-port switches • Not a symmetric network • Best case - 4:1 bisection bandwidth • Worst case - 8:1 • Average - 5.3:1

  44. Low-Latency Ethernet • Brings OS-bypass to Ethernet • Projected performance: • Latency: less than 20 us • Bandwidth: 100 MB/s • Potentially could merge the management and high-performance networks • Vendor: Ammasso

  45. Application Benefits

  46. Storage

  47. Local Storage • Exported to compute nodes via NFS

  48. Network Attached Storage • A NAS box is an embedded NFS appliance

  49. Storage Area Network • Provides a disk block interface over a network (Fibre Channel or Ethernet) • Moves the shared disks out of the servers and onto the network • Still requires a central service to coordinate file system operations

  50. Parallel Virtual File System • PVFS version 1 has no fault tolerance • PVFS version 2 (in beta) has fault tolerance mechanisms
