Accelerators Ran Ginosar Avinoam Kolodny Yuval Cassuto Koby Crammer Shmuel Wimer Dani Lichinski
(research)(motivation) questions • We love accelerators, but… • What accelerators? • What workload? What “killer applications”? • Why study / develop them? • Who needs them? • What architecture(s)? • What goals are we seeking to fulfill? • In addition to winning ICRI-CI research grants
Why accelerators? • Semiconductor industry sells $300B/year (10% INTC) • 1M high profit chips/day • $100/chip, $100M/day. Mostly CPU. • 10% of revenues. 100-1000% gross profit • 90M low cost chips/day • $10/chip, $900M/day. 50% gross profit • Growth < 10% • In the year 2023? • Need to expand into another rich industry • Store-and-compute accelerators will be the driver
Which industry is • Rich • Much richer than semiconductors • Under-utilized • Begs for progress (and can pay for it) • Critical, will not disappear • Video? Entertainment? Communication?
Health Care • $2.5 Trillion in US alone • Already 10x the entire global semiconductor industry • $4.5T by 2020 • Global is probably 3X, $15T by 2020 • Key challenge: • Today: imprecise, statistics-based diagnosis and treatment • Develop into more efficient, more successful discipline by combining science & computing
Future health care is computerized (store and compute) • Medical/health data about 10B people • Genomics, proteomics (5 GB/person) • Health & medical record (1 GB/person) • Continuously accumulating sensor readings (4 GB/person) • Medical, environmental, food & drugs • Monitor and process all individuals • Machine learning • Predict and alert on medical conditions • Individualize drugs, diets, treatments
Storage required • 10 GB/person • 10B people • 10^20 Bytes (100 ExaBytes, 100 Mega-TeraBytes) • 100 million of today’s 1 TByte disks. 100+ data centers • 500 MegaWatts to store, read and write • $350 Million / year
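The storage estimate above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original slides: the variable names are illustrative, decimal units (1 GB = 10^9 bytes, 1 TByte = 10^12 bytes) are assumed, and the per-disk power is only a value implied by the slide's 500 MegaWatt total, not a figure the slide states.

```python
# Back-of-the-envelope check of the storage slide (decimal units assumed).
BYTES_PER_PERSON = 10 * 10**9        # 10 GB/person
PEOPLE = 10 * 10**9                  # 10B people
DISK_BYTES = 10**12                  # one 1 TByte disk
STORAGE_POWER_WATTS = 500 * 10**6    # 500 MegaWatts, from the slide

total_bytes = BYTES_PER_PERSON * PEOPLE          # total data to store
total_exabytes = total_bytes // 10**18           # 1 ExaByte = 10^18 bytes
disks_needed = total_bytes // DISK_BYTES         # number of 1 TByte disks

# Implied (not stated on the slide): average power budget per disk.
watts_per_disk = STORAGE_POWER_WATTS / disks_needed

print(f"total data: {total_bytes:.0e} bytes ({total_exabytes} ExaBytes)")
print(f"1 TByte disks needed: {disks_needed:,}")
print(f"implied power per disk: {watts_per_disk} W")
```

The implied budget of a few watts per disk suggests the 500 MegaWatt figure covers the drives themselves plus little else, which is one way to read the slide's emphasis on storage power.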
Computing required • Run through 50% of data each day • Perform 10 op / byte • 10^21 OP/day ≈ 10^16 OP/sec • Only 10M cores of 1 GOPS each • 100 data centers • Power: only 10 MegaWatt • 2% of storage power
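The compute estimate can be reproduced the same way. The sketch below is illustrative only: the names are hypothetical, and it takes the 10^20-byte data set from the storage slide as its input. It shows that the slide's figures are order-of-magnitude roundings, since 50% of the data at 10 op/byte gives 5 x 10^20 operations per day, which the slide rounds up to 10^21.

```python
# Back-of-the-envelope check of the computing slide.
SECONDS_PER_DAY = 24 * 60 * 60
TOTAL_BYTES = 10**20                 # from the storage estimate
OPS_PER_BYTE = 10
CORE_OPS_PER_SEC = 10**9             # 1 GOPS per core
CORES = 10 * 10**6                   # 10M cores, from the slide
COMPUTE_POWER_MW = 10                # from the slide
STORAGE_POWER_MW = 500               # from the storage slide

ops_per_day = TOTAL_BYTES // 2 * OPS_PER_BYTE    # 50% of data each day
ops_per_sec = ops_per_day / SECONDS_PER_DAY      # unrounded rate
cores_needed = ops_per_sec / CORE_OPS_PER_SEC    # unrounded core count

slide_capacity = CORES * CORE_OPS_PER_SEC        # what 10M cores deliver
power_ratio = COMPUTE_POWER_MW / STORAGE_POWER_MW
watts_per_core = COMPUTE_POWER_MW * 10**6 / CORES  # implied, not stated

print(f"ops/day: {ops_per_day:.1e}, ops/sec: {ops_per_sec:.1e}")
print(f"cores needed (unrounded): {cores_needed:.1e}")
print(f"compute power / storage power: {power_ratio:.0%}")
print(f"implied power per core: {watts_per_core} W")
```

With the unrounded numbers, the 10M cores on the slide leave somewhat less than a factor of two of headroom over the required rate.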
Solution: move computing closer to data • The HMC (Hybrid Memory Cube) industry has already taken the first step • 100,000 TSV vertical interconnects
Not yet there • Wish to get closer: stack memory on top of the CPU? • NO. Too hot • CPU operates above 100°C • DRAM is useless above 85°C • Solution • Dispose of the CPU • Create 3D low-power (low temperature), uniform-power-density, high-performance store & compute machine
Store & Compute • 1 Tbyte / chip in 2020 • Combined DRAM + NVM • Accelerators • 1000 cores “many-core” • MIMD • Associative Processors • SIMD • Internal + external networks
[Figure labels: NVM, DRAM+SRAM, 3D Accelerator, p-m NOC, 2D Accelerator NOC]
Challenges • Need 100M chips • Max 0.1 W / chip • Total 10 MWatt • 100-1000 data centers
[Figure labels: 5 mm, 20 mm, 20 mm; NVM, p-m NOC, 2D Accelerator NOC; 500 chips, 50 Watt]
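The chip-count and power targets on this slide are consistent with the earlier storage estimate, as the short check below illustrates. It is a sketch under stated assumptions: the 1 Tbyte per chip capacity is taken from the Store & Compute slide, the term "group" for the 500-chip unit in the figure is a hypothetical label, and power is held in integer milliwatts only to keep the arithmetic exact.

```python
# Consistency check of the chip-count and power targets.
CHIPS = 100 * 10**6                  # 100M chips
MILLIWATTS_PER_CHIP = 100            # max 0.1 W / chip
BYTES_PER_CHIP = 10**12              # 1 Tbyte / chip (Store & Compute slide)
CHIPS_PER_GROUP = 500                # "500 chips" in the figure

total_watts = CHIPS * MILLIWATTS_PER_CHIP // 1000            # whole system
group_watts = CHIPS_PER_GROUP * MILLIWATTS_PER_CHIP // 1000  # one 500-chip group
total_bytes = CHIPS * BYTES_PER_CHIP                         # total capacity
groups_needed = CHIPS // CHIPS_PER_GROUP                     # implied, not stated

print(f"total power: {total_watts / 10**6} MWatt")
print(f"power per 500-chip group: {group_watts} Watt")
print(f"total capacity: {total_bytes:.0e} bytes")
print(f"500-chip groups needed: {groups_needed:,}")
```

The total capacity matches the 10^20 bytes required for storage, and the total power matches the 10 MWatt target, so the 0.1 W per chip limit is what ties the two estimates together.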
More challenges • Understand workload • Understand algorithms • Architect the store & compute accelerators • Low, low, low power • High (data-intensive) performance
Approaches • Associative processors • Classic store & compute • Uniform power distribution • Massive parallelism • Very low power • Orthogonal access SIMD processors • Sequential and parallel access • Mitigate data-movement bottleneck
Approaches • Average case computing • ALU that runs faster than worst case • And dissipates less power than worst case • Enables low power just-in-time architecture • Personalized vision/graphics for personal mobile devices • Inspires workload understanding • Memristive processors and resistive memories • Presented by Yuval Cassuto