
Interconnects for more than MPI




Presentation Transcript


  1. Interconnects for more than MPI David Greenberg (special thanks to Duncan Roweth of Quadrics and Bill Carlson of CCS)

  2. Requirements
  • Last five years:
    • Low latency - aim for 10 µs
    • High bandwidth - aim for 100 MB/s
    • Support MPI - ad hoc (e.g. collective support)
  • Need to start aiming for:
    • Support for low-overhead protocols (e.g. UPC)
    • Low latency + overhead - aim for 1 µs + 100 ns
    • Scalable bandwidth - match the memory bus on the node and full bisection bandwidth
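The jump from the old targets to the new ones is easiest to see with a first-order message-cost model, time = latency + size/bandwidth. The linear model and the 1 KB example are my illustration, not from the slides; only the latency and bandwidth targets come from the list above.

```python
# First-order cost model for moving one message (illustrative sketch).
def transfer_time_us(size_bytes, latency_us, bandwidth_mb_s):
    """Time to move size_bytes one way, in microseconds (1 MB/s = 1 byte/us)."""
    return latency_us + size_bytes / bandwidth_mb_s

# Old targets: 10 us latency, 100 MB/s bandwidth.
old = transfer_time_us(1024, 10.0, 100.0)
# New targets: ~1 us latency + 100 ns overhead; 1000 MB/s is a hypothetical
# "scalable bandwidth" figure, not a number from the slides.
new = transfer_time_us(1024, 1.1, 1000.0)

print(round(old, 2), round(new, 2))  # 20.24 2.12
```

For a 1 KB message the old targets spend half the time in latency; hitting the new targets cuts the total by nearly 10x, which is why latency and overhead are called out separately from bandwidth.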

  3. Design suggestions
  • NIC does VM translation (possibly aided by a connection table)
  • Load-store model:
    • extend the write queue
    • expand TLB span
    • support split-transaction loads
  • Error-check the network, retry once, signal the app
  • Source routing
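"Split-transaction loads" are the key software-visible item: the load is issued, the processor keeps working, and the reply is consumed only when needed. A loose analogy in Python (purely illustrative; real split transactions are a hardware mechanism, and `fetch_remote` is a made-up stand-in for a remote read):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_remote(addr):
    """Hypothetical stand-in for a remote read with noticeable latency."""
    time.sleep(0.01)   # pretend network/DMA round trip
    return addr * 2    # pretend remote value

with ThreadPoolExecutor(max_workers=1) as pool:
    # "Issue" the load: the request goes out, we get a handle back.
    handle = pool.submit(fetch_remote, 21)
    # ...overlap useful work here instead of stalling on the reply...
    local_work = sum(range(100))
    # "Complete" the load: block only when the value is actually needed.
    value = handle.result()

print(value, local_work)  # 42 4950
```

The design point is that the issue and the completion are decoupled, so one outstanding remote load no longer serializes the pipeline.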

  4. Available today
  • T3E: meets most needs; vendor-specific, discontinued, some scaling issues
  • ASCI Red: good for MPI, scales well; no RDMA, vendor-specific, discontinued
  • Myrinet: good bandwidth, vendor-neutral, potential for custom mods; some scaling issues, overhead and latency so-so with GM
  • VIA: not ready for HPC
  • Quadrics: see next slide

  5. Quadrics, in my humble opinion
  • Design is on the right track
  • Bandwidth fine
  • Latency limited by the PCI bus
  • Overhead very good
  • Scalability currently suspect, but fixable
  • Currently tied to Sun and Compaq, but a Linux port is coming soon
  • Reliability, availability, serviceability good
  • Price remains to be seen

  6. Quadrics: Performance Overview
  • Line rate: 100 Mbytes/s
  • Peak data rate (adapter memory): 340 Mbytes/s
  • Compaq DS20, 33 MHz/64-bit PCI: 200 Mbytes/s
  • Sun E450, 66 MHz/64-bit PCI: 165 Mbytes/s
  • DMA write (≤ 32 bytes): 2.5 µs
  • MPI send: 5 µs
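The host numbers line up with what the PCI bus can theoretically carry (peak = clock × bus width). The arithmetic below is a quick check against the measured figures on this slide; the "efficiency" framing is mine.

```python
# Theoretical PCI peak bandwidth in MB/s: clock (MHz) * width (bytes).
def pci_peak_mb_s(clock_mhz, width_bits):
    return clock_mhz * width_bits / 8

ds20_peak = pci_peak_mb_s(33, 64)   # Compaq DS20: 264 MB/s theoretical
e450_peak = pci_peak_mb_s(66, 64)   # Sun E450: 528 MB/s theoretical

# Measured figures from the slide: 200 MB/s (DS20), 165 MB/s (E450).
print(ds20_peak, round(200 / ds20_peak, 2))  # 264.0 0.76
print(e450_peak, round(165 / e450_peak, 2))  # 528.0 0.31
```

The DS20 reaches about three quarters of its bus's theoretical peak, while the faster E450 bus delivers a much smaller fraction, which supports the earlier point that the PCI bridge, not the link, is the limiter.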

  7. UPC on Quadrics: early numbers
  • One-way: 0 → 1, 2 → 3
  • Two-way: 0 → 1 → 2 → 3 → 0
  • Saturation bandwidth = 169 MB/s one-way, 68 MB/s two-way
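The slide's own numbers show the cost of driving the adapter in both directions at once. Simple arithmetic on the quoted figures, assuming the 68 MB/s two-way number is per direction (my reading; the slide does not say):

```python
one_way = 169.0          # MB/s, saturation, one-way pattern (from the slide)
two_way_per_dir = 68.0   # MB/s per direction, two-way pattern (from the slide)

aggregate_two_way = 2 * two_way_per_dir
# Driving both directions at once yields less total bandwidth than a single
# direction alone -- consistent with the shared PCI bridge being the bottleneck.
print(aggregate_two_way, aggregate_two_way < one_way)  # 136.0 True
```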

  8. UPC on the T3E as comparison
  Production speeds on a large (>100-node) machine. Times in ns per word transferred.

  9. Quadrics: Latency and bandwidth

  10. Breaking down the latency
  • Reducing latency:
    • architectural improvements
    • main clock speed
    • adapter clock speed
    • PCI performance - don't expect much
    • direct connect
    • network performance

  11. Bi-sectional bandwidth
  • Full fat tree: order(N) bi-sectional bandwidth
  • 340 Mbytes/s per link in each direction
  • PCI bridge critical - your mileage may vary
  • Headroom in the network reduces contention
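A full fat tree keeps order(N) bisection bandwidth because any cut splitting the machine in half crosses N/2 links. With the quoted 340 MB/s per link per direction, that scales as follows (sketch; N = 64 is a hypothetical machine size, not a configuration from the slides):

```python
def bisection_bw_mb_s(n_nodes, link_mb_s=340):
    """Full fat tree: a worst-case bisection cut crosses n_nodes/2 links."""
    return (n_nodes // 2) * link_mb_s

print(bisection_bw_mb_s(64))  # 10880 MB/s across the bisection, per direction
```

Doubling the node count doubles the links crossing the cut, so per-node bisection bandwidth stays constant - the property that "scalable bandwidth" in the requirements slide asks for.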

  12. Quadrics: Software Overview
  • Operating systems: Solaris 2.6, Digital Unix 5.0
  • Applications development: standard compilers; MPI, Cray SHMEM; TotalView, Vampir

  13. Quadrics: Linux Status & Plans
  • Under development with a key customer
  • Elan driver and libraries running; IP running; RMS port in progress
  • GPL the base Elan driver and kernel changes
  • URL of the Linux driver coming soon
  • Quadrics' list of necessary Linux modifications:
    • Elan-specific: VM hooks to map the Elan command port; callbacks notifying Elan of PTE load/unload
    • Generic: fork/exec/exit callbacks, system calls in loadable modules
