Benchmark software for HPC systems
Kei Hiraki, The University of Tokyo
My Position
• Working in "Computer Architecture"
• For me, Benchmark means SPEC CPU
• But the SPEC CPUint scores of most supercomputers are small, except for Intel x86 systems
• HPC is a different world when it comes to measuring benchmarks
Diversity of Supercomputers
• 1 MFLOPS: 1964, CDC 6600 (ILP / pipelining, out-of-order execution, scoreboarding)
• 1 GFLOPS: 1984, Cray X-MP/4 (vector architecture, SMP)
• 1 TFLOPS: 1997, ASCI Red (cluster of many MPUs)
• 1 PFLOPS: 2008, IBM Roadrunner (Cell-based; GPGPU, on-chip multi-CPU, huge parallelism)
Development of Supercomputers (1964-2018)
[Timeline chart: the fastest systems of each era arranged by architecture class (vector, SIMD, cluster, distributed memory, shared memory, GPU), from the CDC 6600 (1964) to Post-K (2020); vendors include CDC, IBM, TI, Burroughs, Cray, Fujitsu, Hitachi, NEC, Intel, Thinking Machines, NVIDIA, PEZY, and Sunway.]
History of Fastest Supercomputers (1)
Name / Year of first operation / Linpack performance / (Peak performance)
• UNIVAC LARC  1960  (0.16 MFLOPS)
• IBM STRETCH  1961  (0.3 MFLOPS)
• CDC 6600  1964  0.5 MFLOPS (3 MFLOPS)  *N=100 Linpack
• CDC 7600  1969  3.3 MFLOPS (10 MFLOPS)  *N=100 Linpack
• TI ASC  1972  ~30 MFLOPS (64 MFLOPS)
• ILLIAC IV  1975  ~40 MFLOPS (150 MFLOPS)
• Cray-1  1976  110 MFLOPS (160 MFLOPS)  *N=1000 Linpack
• Cray X-MP/4  1982  714 MFLOPS (800 MFLOPS)
• SX-2  1985  885 MFLOPS (1.3 GFLOPS)
• Cray-2  1985  1.4 GFLOPS (1.9 GFLOPS)
• CM-2  1987  2.4 GFLOPS (5 GFLOPS)
• (ETA-10  1988  496 MFLOPS (9.1 GFLOPS single / 4.6 GFLOPS double, 8 proc.); parallel operation never worked  *N=1000 Linpack, 1 proc., 7 ns)
• Every fastest supercomputer has its own interesting drama.
• Behind the fastest supercomputers lie numerous machines that failed to become the world's fastest.
History of Fastest Supercomputers (2)
Name / Year of first operation / Linpack performance / (Peak performance)
• SX-3/44R  1990  23.2 GFLOPS (25.6 GFLOPS)
• CM-5  1993  60 GFLOPS (131 GFLOPS)
• Fujitsu NWT  1993  124 GFLOPS (236 GFLOPS)
• Intel Paragon XP  1994  143 GFLOPS (184 GFLOPS)
• Fujitsu NWT  1994  170 GFLOPS (236 GFLOPS)
• Hitachi SR-2201  1996  220 GFLOPS (307 GFLOPS)
• Hitachi CP-PACS  1996  368 GFLOPS (614 GFLOPS)
• Intel ASCI Red  1997  1.1 TFLOPS (1.5 TFLOPS)
• IBM ASCI White  2000  4.9 TFLOPS (12.4 TFLOPS)
• NEC ES  2002  35 TFLOPS (40.1 TFLOPS)
• IBM BlueGene/L  2004  71 TFLOPS (92 TFLOPS)
• IBM Roadrunner  2008  1.0 PFLOPS (1.4 PFLOPS)
• Cray XT-5  2009  1.8 PFLOPS (2.3 PFLOPS)
• Tianhe-1A  2010  2.5 PFLOPS (4.7 PFLOPS)
• K computer  2011  10.5 PFLOPS (11 PFLOPS)
• BlueGene/Q  2012  16 PFLOPS (20 PFLOPS)
• Cray XK7  2012  17.6 PFLOPS (27 PFLOPS)
• Tianhe-2  2013  33.9 PFLOPS (55 PFLOPS)
• Sunway  2016  93.9 PFLOPS (125 PFLOPS)
• IBM AC922 + NVIDIA V100  2018  122.3 PFLOPS (188 PFLOPS)
Various Benchmarks
• SPEC CPUint
  • We cannot submit papers without SPEC CPUint
  • Even Dhrystone is useful
• Linpack
  • HPC Linpack is not a bad benchmark
  • Good for time-line comparison
• HPCC
  • Too many result figures
  • Redundant
  • DGEMM, FFT, and STREAM are useful
• HPCG
  • Today's topic
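The slide singles out DGEMM, FFT, and STREAM as the useful kernels in HPCC. A minimal sketch of the STREAM idea, the "copy" kernel reported as bytes moved per second, is below. This is a pure-Python illustration of the measurement, not the real STREAM benchmark (which is written in C and times scale/add/triad kernels as well); the array size `n` is arbitrary.

```python
import time

# STREAM-style "copy" bandwidth probe (illustrative only; real STREAM is C).
n = 10_000_000
src = bytearray(n)

t0 = time.perf_counter()
dst = bytes(src)              # one pass: read n bytes, write n bytes
elapsed = time.perf_counter() - t0

gbps = 2 * n / elapsed / 1e9  # count both read and write traffic
print(f"copy: {gbps:.2f} GB/s")
```

As with real STREAM, the reported number characterizes the memory system rather than the FPUs, which is exactly why it complements DGEMM in a benchmark suite.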
HPC Benchmarks
• How can I compare apples and oranges?
  Distributed memory / shared memory / vectors / GPGPUs / SIMD
Simplest history of supercomputers
• 1 MFLOPS: 1964, CDC 6600 (ILP / pipelining, out-of-order execution, scoreboarding)
• 1 GFLOPS: 1984, Cray X-MP/4 (vector architecture, SMP) [20 years]
• 1 TFLOPS: 1997, ASCI Red (cluster of many MPUs) [13 years]
• 1 PFLOPS: 2008, IBM Roadrunner (Cell-based; GPGPU, on-chip multi-CPU, huge parallelism) [11 years]
• 1 EFLOPS: 2022? (special-purpose accelerators? 3D semiconductors?) [14 years]
• 1 ZFLOPS: 2038?? (billion cores? more specialized accelerators?) [16 years?]
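The 1000x milestones above imply a sustained exponential growth rate. A quick sketch of the years per 1000x step and the implied annual growth, using only the dates given on this slide (the 2022 and 2038 entries are projections, so they are left out):

```python
# Milestones from the slide: (year, peak FLOPS).
milestones = [(1964, 1e6), (1984, 1e9), (1997, 1e12), (2008, 1e15)]

for (y0, f0), (y1, f1) in zip(milestones, milestones[1:]):
    years = y1 - y0
    # Compound annual growth rate over the interval.
    cagr = (f1 / f0) ** (1 / years) - 1
    print(f"{y0}->{y1}: {years} years per 1000x, ~{cagr:.0%}/year")
```

The steps shorten from 20 to 13 to 11 years, i.e. the implied annual growth rate rose from roughly 40% to roughly 90% per year over the period covered.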
Today's Topics
• What is the best benchmark for
  • Exaflops development
  • Zettaflops development
  • Comparison to quantum computers
Why is XXX/Rpeak important?
• The ratio improves when the CPU has fewer FPUs
• Today, FPU area is no longer a major fraction of the CPU die
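Ratios against Rpeak are how such comparisons are usually made in practice; for example, Linpack efficiency (Rmax/Rpeak) varies widely across recent machines. A sketch computing it from the figures already listed on the "History of Fastest Supercomputers (2)" slide:

```python
# (Rmax, Rpeak) in PFLOPS, taken from the earlier slide.
systems = {
    "K computer (2011)": (10.5, 11.0),
    "Tianhe-2 (2013)": (33.9, 55.0),
    "Sunway (2016)": (93.9, 125.0),
    "IBM AC922 + NVIDIA V100 (2018)": (122.3, 188.0),
}

for name, (rmax, rpeak) in systems.items():
    print(f"{name}: {rmax / rpeak:.0%} of peak")
```

The spread, from about 95% on the K computer down to about 62% on Tianhe-2, shows why a single Rpeak-normalized number says something real about system balance rather than just FPU count.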
Return to simplicity
• Weighted mean of
  • DGEMM
  • STREAM
  • FFT
• Selection of the weights is the problem
OR
Return to simplicity
• Weighted mean of
  • Linpack
  • HPCG
  • FFT
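The weighted-mean proposal on these two slides can be sketched as follows. A weighted geometric mean is one reasonable choice (an arithmetic mean lets one kernel dominate); the benchmark scores, reference normalization, and equal weights here are purely illustrative, since the slide itself notes that choosing the weights is the open problem.

```python
import math

def weighted_score(scores, weights):
    """Weighted geometric mean of normalized benchmark scores."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return math.exp(sum(weights[k] * math.log(scores[k]) for k in scores))

# Hypothetical scores, each normalized to a reference machine (= 1.0).
scores = {"Linpack": 2.0, "HPCG": 0.5, "FFT": 1.0}
weights = {"Linpack": 1 / 3, "HPCG": 1 / 3, "FFT": 1 / 3}

print(weighted_score(scores, weights))  # ~1.0: the 2x gain and 2x loss cancel
```

With equal weights, doubling Linpack while halving HPCG leaves the score unchanged, which is the kind of balance argument a composite metric is meant to capture.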
Purpose of Benchmark Software
• Characterization of the system
  • Balance of system components
  • Proof of improvements
• Evidence for purchase decisions
  • Performance / cost
  • Performance / power
• Time-line comparison
• Single number vs. multiple numbers