ISHPC International Symposium on High-Performance Computing 26 May 1999

Presentation Transcript


  1. ISHPC International Symposium on High-Performance Computing, 26 May 1999. Gordon Bell http://www.research.microsoft.com/users/gbell Microsoft

  2. What a difference spending >10X/system & 25 years makes! 40 Tflops ESRDC c2002 (artist’s view) vs. 150 Mflops CDC 7600 + Cray 1, LLNL c1978

  3. Supercomputers(t)
     Time   $M     Structure              Example
     1950   1      mainframes             many...
     1960   3      instruction //sm       IBM / CDC mainframe SMP
     1970   10     pipelining             7600 / Cray 1
     1980   30     vectors; SCI           “Crays”
     1990   250    MIMDs: mC, SMP, DSM    “Crays”/MPP
     2000   1,000  ASCI, COTS MPP         Grid, Legion

  4. Supercomputing: speed at any price, using parallelism
     • Intra-processor: memory overlap & instruction lookahead; functional parallelism (2-4); pipelining (10)
     • SIMD a la ILLIAC: 2D array of 64 PEs vs. vectors
     • Wide instruction word (2-4)
     • MTA (10-20) with parallelization of a stream
     • MIMD multiprocessors… parallelization allows programs to stay with ONE stream: SMP (4-64); Distributed Shared Memory SMPs (100)
     • MIMD multicomputers force multi-streams: multicomputers aka MPP aka clusters (10K); Grid: 100K
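To make the "one stream" distinction concrete, here is a minimal, hypothetical C sketch (not from the talk) of the shared-memory style: the program remains a single sequential stream and a directive asks an OpenMP-capable compiler to spread one loop over an SMP's processors. The multi-stream, message-passing style is sketched after slide 37.

    /* Hypothetical sketch of the "one stream" shared-memory style on an SMP/DSM.
       The program is a single sequential stream; the pragma asks an OpenMP-capable
       C compiler (e.g., cc -fopenmp) to divide the loop among processors. */
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];   /* static: keep large arrays off the stack */
        double sum = 0.0;
        int i;

        for (i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

        /* One logical program; iterations are split across the SMP's processors
           and the partial sums are combined by the reduction clause. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < N; i++)
            sum += a[i] * b[i];

        printf("dot product = %f\n", sum);   /* expect 2000000.0 */
        return 0;
    }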

  5. Growth in Computational Resources Used for UK Weather Forecasting: 10^10 over 50 years = 1.58^50. [Chart: performance from ~10 flops to ~10 Tflops, 1950-2000, through the Leo, Mercury, KDF9, 195, 205, and YMP machines.]
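The 1.58 figure on the slide follows directly from the stated 10^10 growth over 50 years; a short derivation:

    \[
    \left(10^{10}\right)^{1/50} = 10^{0.2} \approx 1.585,
    \qquad
    1.585^{50} \approx 10^{10},
    \]

i.e., roughly a 58% increase in computational resources per year, sustained for five decades.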

  6. Talk plan
     • The very beginning: “build it yourself”
     • Supercomputing with one computer… the Cray era, 1960-1995
     • Supercomputing with many computers… parallel computing, 1987-
     • SCI: what was learned?
     • Why I gave up to shared memory…
     • From the humble beginnings
     • Petaflops: when, … how, how much
     • New ideas: NOW, Legion, Grid, Globus …
     • Beowulf: “build it yourself”

  7. Supercomputer: old definition(s)
     • In the beginning, everyone built their own computer
     • Largest computer of the day
     • Scientific and engineering apps
     • Large government defense, weather, aero laboratories and centers are the first buyers
     • Price is no object: $3M … 30M, 50M, 150 … 250M
     • Worldwide market: 3-5, xx, or xxx?

  8. Supercomputing: new definition
     • Was a single, sequential program
     • Has become a single, large-scale job/program composed of many programs running in parallel
     • Distributed within a room
     • Evolving to be distributed across a region and the globe
     • Cost, effort, and time are extraordinary
     • Back to the future: build your own super with shrink-wrap software!

  9. Manchester: the first computer. Baby, Mark I, and Atlas

  10. von Neumann computers: the Rand Johniac. When laboratories built their own computers

  11. Seymour Cray, 1925-1996 (see the gbell home page)

  12. CDC 1604 & 6600

  13. CDC 7600: pipelining

  14. CDC STAR… ETA10: scalar matters

  15. Cray 1 #6 from LLNL. Located at The Computer Museum History Center, Moffett Field

  16. Cray 1: 150 kW MG set & heat exchanger

  17. Cray XMP/4 Proc. c1984

  18. A look at the beginning of the new beginning

  19. SCI (Strategic Computing Initiative), funded by DARPA and aimed at a teraflops! The era of State computers and many efforts to build high-speed computers… led to HPCC, Thinking Machines, Intel supers, and the Cray T3 series

  20. Minisupercomputers: a market whose time never came. Alliant, Convex, Ardent + Stellar = Stardent = 0.

  21. Cydrome and Multiflow: prelude to wide-word parallelism in Merced
      • Minisupers with VLIW attack the market
      • Like the minisupers, they are repelled
      • It’s software, software, and software
      • Was it a basically good idea that will now work as Merced?

  22. KSR 1: first commercial DSM NUMA (non-uniform memory access) aka COMA (cache-only memory architecture)

  23. Intel’s iPSC 1 & Touchstone Delta

  24. “In Dec. 1995 computers with 1,000 processors will do most of the scientific processing.” Danny Hillis, 1990 (1 paper or 1 company)

  25. The Bell-Hillis Bet: Massive Parallelism in 1995. [Table: the bet compares TMC vs. world-wide supers on Applications, Petaflops/mo., and Revenue]

  26. Thinking Machines: CM1 & CM5 c1983-1993

  27. Bell-Hillis Bet: wasn’t paid off!
      • My goal was not necessarily to just win the bet!
      • Hennessy and Patterson were to evaluate what was really happening…
      • Wanted to understand the degree of MPP progress and programmability

  28. SCI (c1980s): Strategic Computing Initiative funded AT&T/Columbia (Non-Von), BBN Labs, Bell Labs/Columbia (DADO), CMU Warp (GE & Honeywell), CMU (Production Systems), Encore, ESL, GE (like Connection Machine), Georgia Tech, Hughes (dataflow), IBM (RP3), MIT/Harris, MIT/Motorola (Dataflow), MIT Lincoln Labs, Princeton (MMMP), Schlumberger (FAIM-1), SDC/Burroughs, SRI (Eazyflow), University of Texas, Thinking Machines (Connection Machine), …

  29. Those who gave their lives in the search for parallelism: Alliant, American Supercomputer, Ametek, AMT, Astronautics, BBN Supercomputer, Biin, CDC, Chen Systems, CHOPP, Cogent, Convex (now HP), Culler, Cray Computers, Cydrome, Dennelcor, Elexsi, ETA, E & S Supercomputers, Flexible, Floating Point Systems, Gould/SEL, IPM, Key, KSR, MasPar, Multiflow, Myrias, Ncube, Pixar, Prisma, SAXPY, SCS, SDSA, Supertek (now Cray), Suprenum, Stardent (Ardent+Stellar), Supercomputer Systems Inc., Synapse, Thinking Machines, Vitec, Vitesse, Wavetracer.

  30. What can we learn from this?
      • The classic flow, university research to product development, worked
      • SCI: ARPA-funded product development failed. No successes. Intel prospered.
      • ASCI: DOE-funded product purchases create competition
      • First efforts in startups… all failed:
        - Too much competition (with each other)
        - Too little time to establish themselves
        - Too little market. No apps to support them
        - Too little cash
      • Supercomputing is for the large & rich
      • … or is it? Beowulf, shrink-wrap clusters

  31. Humble beginning: in 1981… would you have predicted this would be the basis of supers?

  32. The Virtuous Economic Cycle that drives the PC industry. [Cycle diagram: Competition, Volume, Standards, Utility/value, Innovation]

  33. Platform Economics
      • Traditional computers: custom or semi-custom, high-tech and high-touch
      • New computers: high-tech and no-touch
      [Chart: price (K$), volume (K), and application price by computer type (mainframe, WS, browser), with prices spanning roughly 0.01 to 100,000 K$]

  34. Computer ops/sec x word length / $

  35. Intel’s iPSC 1 & Touchstone Delta

  36. GB with NT, Compaq, HP cluster

  37. The Alliance LES NT Supercluster
      “Supercomputer performance at mail-order prices” -- Jim Gray, Microsoft
      • Andrew Chien, CS UIUC --> UCSD
      • Rob Pennington, NCSA
      • Myrinet network, HPVM, Fast Msgs
      • Microsoft NT OS, MPI API
      • 192 HP 300 MHz, 64 Compaq 333 MHz
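Slide 37 names the MPI API on the NT Supercluster. The following is a minimal, hypothetical C sketch (not the Alliance's code) of the multi-stream, message-passing style: every rank runs its own copy of the program, and MPI_Reduce combines the per-rank results.

    /* Hypothetical MPI sketch of the multi-stream, message-passing style used
       on clusters such as the NT Supercluster. Every rank runs its own copy of
       this program; MPI_Reduce sums the per-rank results onto rank 0.
       Typically built with mpicc and launched with mpirun -np <ranks>. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double partial, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process computes its own share of a toy workload. */
        partial = (double)(rank + 1);

        /* Combine the partial results on rank 0. */
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %.0f\n", size, total);

        MPI_Finalize();
        return 0;
    }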

  38. Are we at a new beginning?
      “Now, this is not the end. It is not even the beginning of the end, but it is, perhaps, the end of the beginning.” 1999 Salishan HPC Conference, from W. Churchill, 11/10/1942
      “You should not focus NSF CS research on parallelism. I can barely write a correct sequential program.” Don Knuth, 1987 (to GBell)
      “I’ll give a $100 to anyone who can run a program on more than 100 processors.” Alan Karp (198x?)
      “I’ll give a $2,500 prize for parallelism every year.” Gordon Bell (1987)

  39. Bell Prize and Future Peak Gflops(t). [Chart: Bell Prize performance over time against peak Gflops, with the Petaflops study target; machines marked include XMP, CM2, and NCube]

  40. 1989 Predictions vs. 1999 Observations
      • Predicted 1 Tflops PAP by 1995. Actual: 1996. Very impressive progress! (RAP < 1 TF)
      • More diversity => less software progress!
      • Predicted: SIMD, mC (incl. W/S), scalable SMP, DSM, and supers would continue as significant
      • Got: SIMD disappeared; 2 mC; 1-2 SMP/DSM; 4 supers; 2 mC with one address space; 1 SMP became larger and clusters; MTA; workstation clusters; GRID
      • $3B (unprofitable?) industry; 10+ platforms
      • PCs and workstations diverted users
      • MPP apps market DID/could NOT materialize

  41. U.S. tax dollars at work. How many processors does your center have?
      • Intel/Sandia: 9,000 Pentium Pro
      • LLNL/IBM: 488 x 8 x 3 PowerPC (SP2)
      • LANL/Cray: 6,144 P in DSM clusters
      • Maui Supercomputer Center: 512 x 1 SP2

  42. ASCI Blue Mountain: 3.1 Tflops SGI Origin 2000
      • 12,000 sq. ft. of floor space
      • 1.6 MWatts of power
      • 530 tons of cooling
      • 384 cabinets to house 6,144 CPUs with 1,536 GB (32 GB / 128 CPUs)
      • 48 cabinets for metarouters
      • 96 cabinets for 76 TB of RAID disks
      • 36 x HIPPI-800 switch cluster interconnect
      • 9 cabinets for 36 HIPPI switches
      • about 348 miles of fiber cable
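The slide's figures are mutually consistent; as a quick check of the arithmetic:

    \[
    \frac{3.1\ \text{Tflops}}{6144\ \text{CPUs}} \approx 0.5\ \text{Gflops per CPU},
    \qquad
    \frac{6144\ \text{CPUs}}{128\ \text{CPUs/group}} \times 32\ \text{GB} = 1536\ \text{GB},
    \]

i.e., about 500 Mflops of peak per processor and 48 memory groups of 32 GB each.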

  43. Half of LASL

  44. Comments from the LLNL program manager: lessons learned with “Full-System Mode”
      • It is harder than you think
      • It takes longer than you think
      • It requires more people than you can believe
      • Just as in the very beginning of computing, leading-edge users are building their own computers.

  45. NEC Supers

  46. 40 Tflops Earth Simulator R&D Center c2002

  47. Fujitsu VPP5000 multicomputer (not available in the U.S.)
      • Computing node speed: 9.6 Gflops vector, 1.2 Gflops scalar
      • Primary memory: 4-16 GB
      • Memory bandwidth: 76 GB/s (9.6 x 64 Gb/s)
      • Inter-processor comm: 1.6 GB/s, non-blocking, with global addressing among all nodes
      • I/O: 3 GB/s to SCSI, HIPPI, Gigabit Ethernet, etc.
      • 1-128 computers deliver 1.22 Tflops
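The aggregate peak and the bandwidth figure on the slide follow from the per-node numbers:

    \[
    128 \times 9.6\ \text{Gflops} = 1228.8\ \text{Gflops} \approx 1.22\ \text{Tflops},
    \qquad
    9.6 \times 64\ \text{Gb/s} = 614.4\ \text{Gb/s} = 76.8\ \text{GB/s} \approx 76\ \text{GB/s}.
    \]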

  48. C1999 clusters of computers. It’s MPP when processors/cluster > 1000.
      Who             ΣP.pap   ΣP.    P.pap   ΣP.pap/C   Σp/C   ΣMp/C   ΣM.s
                      T.fps    #.K    G.fps   G.fps      #      GB      TB
      LLNL (IBM)      3.9      5.9    .66     5.3        8      2.5     62
      LANL (SGI)      3.1      6.1    .5      64         128    32      76
      Sandia (Intel)  2.7      9.1    .3      .6         2      -
      Beowulf         0.5      2.0    4
      Fujitsu         1.2      .13    9.6     9.6        1      4-16
      NEC             4.0      .5     8       128        16     128
      ESRDC           40       5.12   8       64         8      16
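Reading the table under the column meanings implied by the rows (total peak = processor count x per-processor peak; per-cluster peak = processors per cluster x per-processor peak):

    \[
    \Sigma P.\text{pap} = \Sigma P \times P.\text{pap},
    \qquad
    \Sigma P.\text{pap}/C = (\Sigma p/C) \times P.\text{pap};
    \]

e.g., for the LANL row, 6.1K x 0.5 Gflops ≈ 3.1 Tflops and 128 x 0.5 Gflops = 64 Gflops per cluster.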

  49. High-performance architecture/program timeline, 1950-2000
      [Timeline: hardware eras by decade: vacuum tubes, transistors, MSI (minis), micros, RISC, new micros. Programming eras: sequential programming throughout; SIMD and vector with parallelization; parallel programming on multicomputers in the MPP era; ultracomputers: 10X in size & price, 10x MPP “in situ” resources, 100x in //sm; NOW, VLSC, Grid.]

  50. Yes… we are at a new beginning! Single jobs, composed of 1000s of quasi-independent programs running in parallel on 1000s of processors (or computers). Processors (or computers) of all types are distributed (i.e., connected) in every fashion, from a collection using a single shared memory to globally dispersed computers.
