
Power-Aware Computing: Then and Now Margaret Martonosi Princeton University



  1. Power-Aware Computing: Then and Now. Margaret Martonosi, Princeton University. I happily acknowledge the contributions of my grad students, co-authors, and funding agencies to much of the work I’ll discuss today.

  2. In one slide: Why care about power? • Battery Life: I want my cellphone battery to last longer! • Environment: Taken in aggregate, computing power usage has significant environmental impact. • Performance: Excessive power dissipation leads to thermal overload, and limits maximum chip or data center performance.

  3. Smartphone Power/Energy • Battery Life & Performance • CPU performance within phone is limited by power dissipation • Limit battery usage • Limit overheating • Aggregate Impact: • Apple iPhone 5: ~20,000 joules of energy in the battery. • 1 Smartphone: 1-5 kilowatt-hours of energy usage per year. • 3000 smartphones ≈ typical electric consumption of a US household. • 2016: 1 Billion smartphones on earth ≈ Electricity consumption of medium-sized city like Austin TX http://blog.opower.com/2012/09/
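The aggregate figures above can be sanity-checked with simple arithmetic. A rough sketch, assuming one full charge per day; charger losses are ignored, and the household figure is a round number, not a measurement:

```python
# Sanity check of the smartphone energy figures quoted on the slide.

battery_joules = 20_000                  # iPhone 5 battery, ~20 kJ (slide figure)
battery_wh = battery_joules / 3600       # 1 Wh = 3600 J
print(f"Battery capacity: {battery_wh:.1f} Wh")              # ~5.6 Wh

# One full charge per day for a year (assumption):
annual_kwh = battery_wh * 365 / 1000
print(f"Annual usage at one charge/day: {annual_kwh:.1f} kWh")  # ~2.0 kWh, inside the 1-5 kWh range

# 3000 smartphones, compared against a typical US household:
print(f"3000 phones: {3000 * annual_kwh:.0f} kWh/year")
# ~6000 kWh/year: the same order as a typical US household,
# before charger losses (not modeled here) are added in.
```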

  4. Next consider Data Centers

  5. Environmental impact of data centers: If they were a country… [chart: carbon emissions of worldwide data centers (MMT/year) would place them roughly 34th-35th among countries] [Mankoff ’08]

  6. Computing Energy Consumption • Information and Communications Technology (ICT) is responsible for 2.5% of the TOTAL US carbon footprint, similar to the transportation manufacturing sector. • Worldwide data centers: 30 billion watts, equivalent to the output of 30+ large nuclear power plants. • Current trends: IT energy usage continues to grow. Sources: GeSI and the Climate Group, “Global Smart 2020 report”, 2008; EIA Annual Energy Outlook 2008; NYTimes, Sept. 2012.

  7. Within one computer…

  8. Components of Server Power [chart: power in watts by component, based on a 2007 dual-core 2U server] Source: Ars Technica, 2007

  9. Server Power Breakdown • Storage important, but CPUs more important • Idle power surprisingly large • Poor “Energy-Proportionality” • Want wattage to go to 0 when usage goes to 0.
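The “energy-proportionality” goal above can be captured in one line: a perfectly proportional server draws no power at zero load. A minimal sketch; the 150 W idle / 250 W peak figures are illustrative, not taken from the slide’s chart:

```python
def energy_proportionality(idle_watts, peak_watts):
    """Simple proportionality score: 1.0 means power scales to zero at idle,
    0.0 means the server draws full peak power even when doing nothing."""
    return 1.0 - idle_watts / peak_watts

# Hypothetical 2007-era server: 150 W idle, 250 W peak.
print(energy_proportionality(150, 250))   # 0.4 -> idle power is 60% of peak
print(energy_proportionality(0, 250))     # 1.0 -> the ideal the slide asks for
```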

  10. Energy Tradeoffs of Computation vs. Communication (Calculations vs. Data Movement…) Within one CPU? Distributed Systems?

  11. Within one CPU : The energy of calculation (adds, multiplies) is dwarfed by the energy of data motion Source: Steve Keckler keynote, MICRO 2011

  12. Smartphone Example: Computation vs. Communication • LTE smartphone with high-end CPUs: • Samsung Galaxy S4, iPhone 5, etc. • LTE Comm Energy per byte ≈ 1000X Computation energy per instruction Takeaway message: Computing locally on the phone can be huge power win vs. offloading computation to the cloud.
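The offload decision implied by the ~1000X ratio can be sketched with back-of-envelope numbers. The absolute per-instruction and per-byte energies below are assumptions chosen only to preserve that ratio:

```python
E_INSN = 1e-9          # ~1 nJ per instruction on a high-end mobile CPU (assumption)
E_LTE_BYTE = 1e-6      # ~1000x more energy per byte sent over LTE (slide ratio)

def cheaper_locally(local_instructions, bytes_to_offload):
    """True if executing on the phone costs less energy than shipping the data out."""
    return local_instructions * E_INSN < bytes_to_offload * E_LTE_BYTE

# Processing a 1 MB photo with 100M instructions vs. uploading it raw:
print(cheaper_locally(100_000_000, 1_000_000))  # True: 0.1 J compute vs 1.0 J radio
```

Only when the computation is enormous relative to the data (billions of instructions per kilobyte) does offloading win, which is the slide’s takeaway.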

  13. In short, power matters a lot! Across all domains: Embedded->mobile->server->data center Across all system design levels: Devices->circuits->architecture->software

  14. History and Trends: Looking back, Looking forward

  15. Power is not a new problem… The ENIAC consumes 150 kilowatts… The power consumption may be broken up as follows; 80 kilowatts for heating the tubes 45 kilowatts for generating d.c. voltages, 20 kilowatts for driving the ventilator blower and 5 kilowatts for the auxiliary card machines. Source: Original ENIAC press release. Feb, 1946

  16. Power is not a new problem… • Computers built from different building blocks over the decades: • Vacuum tube… relays… Bipolar transistors… • Older technologies all reached points where their power was excessive. • Previous response: Find a new technology and switch to it. • But now, we don’t have something new to switch to! Chu et al. SEMITHERM 1999

  17. 1999: Moore’s Law and Power dissipation (Graph courtesy Fred Pollack, Intel) 1999 MICRO Keynote talk by Fred Pollack, Intel The Good News: 2X Transistor counts every 18 months

  18. 1999: Moore’s Law and Power dissipation (Graph courtesy Fred Pollack, Intel) The Bad News: To get uniprocessor performance improvements from these, CPU Power density increased as well… What to do?

  19. Responding to the Power Challenge: The Processor-level View

  20. Architecture’s Response to the CPU Power Challenge • 1990’s: The Wakeup Call • Previously, power issues dealt with at lower device and circuit levels. • Response: Clock Gating, Power Gating, Bitwidth Optimizations, Instruction Speculation Control. Example: If multiplying two “narrow” operands then disable upper parts of 64-bit multiplier to save power. [Brooks, Martonosi. HPCA 1999]
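The narrow-operand example can be sketched as a simple decision rule. This is a behavioral model of the gating decision, not the circuit, and the 16-bit “narrow” threshold is one illustrative choice:

```python
# Sketch of the narrow-operand idea from Brooks & Martonosi (HPCA 1999):
# if both operands fit in the low bits, the upper portion of the 64-bit
# multiplier array can be disabled (clock-gated) to save dynamic power.

NARROW_BITS = 16

def is_narrow(x):
    """Operand fits in the low NARROW_BITS bits (unsigned view)."""
    return 0 <= x < (1 << NARROW_BITS)

def multiply_with_gating(a, b):
    """Returns (product, upper_array_gated_off)."""
    gated = is_narrow(a) and is_narrow(b)
    return a * b, gated

print(multiply_with_gating(1234, 5678))   # (7006652, True)  -> upper array off, power saved
print(multiply_with_gating(1 << 40, 3))   # gated flag is False -> full array active
```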


  22. Architecture’s Response to the CPU Power Challenge • 1990’s: The Wakeup Call • Previously, power issues dealt with at lower device and circuit levels. • Response: Clock Gating, Power Gating, Bitwidth Optimizations, Instruction Speculation Control. for (i=0; i<N; i++) { … } Easy-to-predict branching => Speculatively execute past branch Example: Don’t speculate past hard-to-predict branches. Automatically assess “predictor confidence”. if (random() < threshold) { … } Hard-to-predict branch => Wait for branch resolution to save power [Manne, Klauser, Grunwald, ISCA 1998]
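The predictor-confidence idea can be sketched with a saturating counter. The counter width and threshold below are illustrative, not the exact configuration from Manne et al.:

```python
# Confidence-gated speculation sketch (after Manne, Klauser, Grunwald, ISCA 1998):
# a small saturating counter per branch tracks recent prediction accuracy;
# speculation is allowed only when confidence is high.

class ConfidenceEstimator:
    def __init__(self, max_count=15, threshold=12):
        self.count = 0
        self.max_count = max_count
        self.threshold = threshold

    def update(self, prediction_was_correct):
        if prediction_was_correct:
            self.count = min(self.count + 1, self.max_count)
        else:
            self.count = 0           # a mispredict resets confidence

    def should_speculate(self):
        return self.count >= self.threshold

est = ConfidenceEstimator()
for _ in range(12):                  # easy-to-predict loop branch: 12 straight hits
    est.update(True)
print(est.should_speculate())        # True  -> speculate past the branch
est.update(False)                    # hard-to-predict branch mispredicts
print(est.should_speculate())        # False -> wait for resolution, save wasted work
```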

  23. Architecture’s Response to the CPU Power Challenge • 2000-2005: Model, Measure, Mitigate… • Examples: Wattch, SimplePower, PowerTimer • Goal: Early-stage power assessments to guide early architectural choices. Example: Parameterized Module-level Power Estimates in Wattch. [Brooks, Tiwari, Martonosi. ISCA 2000]
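At its core, a Wattch-style module-level estimate sums per-module peak power weighted by activity. The module names and wattages below are hypothetical; the real tool derives per-access energies from parameterized circuit models:

```python
# Minimal Wattch-style sketch: per-cycle dynamic power is the sum, over
# modules, of (peak module power) x (activity factor this cycle).

MODULE_MAX_WATTS = {          # hypothetical per-module peak power
    "icache": 3.0,
    "alu": 5.0,
    "regfile": 2.0,
    "clock_tree": 8.0,
}

def cycle_power(activity):
    """activity: module -> fraction of the cycle the module is active (0..1).
    With ideal clock gating, an idle module burns no dynamic power."""
    return sum(MODULE_MAX_WATTS[m] * activity.get(m, 0.0)
               for m in MODULE_MAX_WATTS)

print(cycle_power({"icache": 1.0, "alu": 0.5, "clock_tree": 1.0}))  # 13.5 W
```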

  24. Architecture’s Response to the CPU Power Challenge • 2000-2005: Model, Measure, Mitigate… • More Examples: Cache Decay and other unit-by-unit power optimizations Cache Decay: Predict cache data that is likely to never be used again, and “turn it off”. If prediction is correct, save static leakage energy. If incorrect, fetch from next memory hierarchy level. Use competitive algorithm techniques to guide prediction choices. [ISCA 2001]
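Cache decay can be sketched as a per-line idle counter. The interval count here is illustrative; the paper explores adaptive choices guided by competitive-algorithm techniques:

```python
# Cache decay sketch (after Kaxiras, Hu, Martonosi, ISCA 2001): each line
# counts idle intervals; past a threshold the line is switched off to stop
# leakage. A wrong prediction costs a refetch from the next memory level.

DECAY_INTERVALS = 4   # idle intervals before a line is turned off (assumption)

class DecayLine:
    def __init__(self):
        self.idle = 0
        self.powered = True

    def access(self):
        """Returns True if the line had decayed and must be refetched."""
        refetch_needed = not self.powered   # incorrect prediction: extra miss
        self.idle = 0
        self.powered = True
        return refetch_needed

    def tick(self):
        """Called once per decay interval."""
        if self.powered:
            self.idle += 1
            if self.idle >= DECAY_INTERVALS:
                self.powered = False        # correct prediction: leakage saved

line = DecayLine()
for _ in range(DECAY_INTERVALS):
    line.tick()
print(line.powered)       # False: line decayed, leakage energy saved
print(line.access())      # True: this late access pays a next-level fetch
```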

  25. Architecture’s Response to the CPU Power Challenge • Mid- to Late 2000’s: Industry transition to multi-core • Only way to maintain Moore’s law performance scaling without unmanageable heat/power • Intel’s “Right Hand Turn” [Danowitz, et al. April, 2012. ACM Queue. ]

  26. Architecture’s Response to the CPU Power Challenge • 2005-now: The Rise of Heterogeneous and Distributed Parallelism • Different CPUs on same chip: e.g., CPU+GPU • Specialized Accelerators: e.g., video

  27. Today, for many design targets:Parallelism = Power-Efficiency • Balances high performance and power-efficiency… • Mitigates design and management complexity But, not so simple… • How to program? • How to ensure performance portability across technology generations? • How to map applications across distributed collections of these?

  28. Responding to the Power Challenge: The System-level View

  29. Going Forward: A Systems-Level View 1) Think beyond the box *Systems* not just processors: multi-core, multi-chip, multi-location! 2) Managing Communication Energy matters 3) Specialization and Heterogeneity Solutions require better unification and tailoring from applications down to devices! Concepts apply both to chip, node or server design, as well as to broader distributed systems design.

  30. Tailoring Applications -> Devices for Power Efficiency Mobile Examples & Deployment Experiences

  31. The ZebraNet Project • Biology Goals: Fine-grained animal location tracking in rural areas with no cell coverage. • System Goals: Abide by a stringent energy budget while gathering GPS data as detailed as possible. • Key idea: Collaboration between tracking collars reduces energy use 10-100X and improves data gathering. [diagram: tracking nodes with CPU, FLASH, radio, and GPS relay data via store-and-forward communication to a base station (car or plane)] Zhang et al., SenSys 2004

  32. ZebraNet: Energy Issues [diagram: collar node with CPUs, RAM, flash memory, special function units, radios, sensors, and actuators] • Node Design: • Architected for low power. • Heterogeneous and parallel. • Aggressive duty-cycling of GPS and radio. • System Design: • Parallel and distributed. • Protocol Design: • Collar-to-collar transfers at 10-100X energy savings over collar-to-base. • Data Compression: • Aggressive on-collar data compression uses CPU energy (cheap) to save radio energy (expensive!)
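The collar-to-collar vs. collar-to-base tradeoff can be sketched numerically. The absolute per-byte energies are assumptions; only the ~100X ratio comes from the slides:

```python
E_PEER_BYTE = 1e-6      # short-range collar-to-collar radio, per byte (assumption)
E_BASE_BYTE = 1e-4      # long-range collar-to-base radio: ~100x more (slide ratio)

def delivery_energy(payload_bytes, peer_hops):
    """Energy to move a payload through `peer_hops` cheap collar-to-collar
    hops, ending with one expensive collar-to-base upload."""
    return payload_bytes * (peer_hops * E_PEER_BYTE + E_BASE_BYTE)

one_hop = delivery_energy(10_000, 0)     # collar already near the base station
relayed = delivery_energy(10_000, 5)     # distant collar relays through 5 peers
print(relayed / one_hop)                 # ~1.05: five extra hops add only ~5% energy
```

The point of the sketch: store-and-forward lets collars far from the sporadically visiting base station deliver data at nearly the cost of a single upload, instead of each needing its own high-power long-range link.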

  33. ZebraNet: Summary • Project fundamentally-driven by energy constraints • Finest-grained zebra tracking ever. First night-time data. • 2 real-zebra deployments in Kenya + graduated great students, tech transfer to industry, etc.

  34. More recently: SignalGuru • Goal: Guide drivers regarding good speed and routes in order to maintain traffic flow. • No special lights. Infrastructureless… • Implemented: • In-vehicle cameras sense the traffic signal • Track stoplight red-green timings • Advise drivers on optimal speeds to avoid stop-go • Car-to-car info sharing • Tested in US and Singapore Koukoumidis, Peh, Martonosi. MobiSys 2011

  35. SignalGuru Design Options [pipeline: Take Photo → Image Processing → Signal Detection → Transition Filtering → Merge info across cars → Estimate Optimal Speed → Share Estimate Across Cars] • Compute: Which parts on phone vs. in cloud? • Communicate: With other nodes? With aggregator in the cloud? • Answers to these influence performance, accuracy, and energy!

  36. SignalGuru: Fully Local Option [pipeline: all stages from Take Photo through Estimate Optimal Speed run on the in-car phone] All compute on in-car phone. No communication, but low accuracy.

  37. SignalGuru: Mostly Local Option [pipeline: each car runs all stages locally, then merges info across cars and shares estimates] Light communication to share info. Better accuracy.

  38. SignalGuru: (Almost) All in the Cloud [pipeline: cars only Take Photo; all later stages run in the cloud] Communication = high-res photo every 1-2 s. Almost all compute in cloud. Heavy communication!

  39. SignalGuru: Partial Cloud Option [pipeline: Take Photo and Image Processing on the phone; later stages in the cloud] In-car image analysis => 10X drop in communication. Big energy and bandwidth savings!

  40. SignalGuru Design Questions [pipeline: Take Photo → Image Processing → Signal Detection → Transition Filtering → Merge info across cars → Estimate Optimal Speed → Share Estimate Across Cars] • Compute: Which parts on phone vs. in cloud? • Communicate: With other nodes? With aggregator in the cloud? • Answers to these influence performance, accuracy, and energy! • 10X improvement in radio energy if initial image filtering performed on phone. • Factor of N bandwidth savings if collaboration facilitated through aggregator node. • Singapore Deployment: • 5X improvement in accuracy if info collaboratively shared between nodes. • 20% vehicle fuel savings by abiding by SignalGuru advisories.
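The compute-placement tradeoff across these options can be sketched as a toy energy model. The per-stage compute cost, per-byte radio cost, and data sizes are assumptions chosen only to reflect the 10X communication drop noted above:

```python
E_RADIO_BYTE = 1e-6        # Joules per byte over the cellular radio (assumption)
E_COMPUTE_STAGE = 0.05     # Joules per pipeline stage run on the phone (assumption)

PHOTO_BYTES = 500_000      # raw photo uploaded in the all-cloud option (assumption)
RESULT_BYTES = 50_000      # data after on-phone detection: 10x fewer bytes (slide ratio)

def option_energy(local_stages, bytes_sent):
    """Phone-side energy: on-phone compute plus radio traffic."""
    return local_stages * E_COMPUTE_STAGE + bytes_sent * E_RADIO_BYTE

all_cloud = option_energy(1, PHOTO_BYTES)    # take photo locally, ship it raw
partial = option_energy(3, RESULT_BYTES)     # detect on phone, share results
print(f"all-cloud: {all_cloud:.2f} J, partial-cloud: {partial:.2f} J")
# Radio energy drops 10x (0.50 J -> 0.05 J); even after paying for two extra
# local stages, the partial-cloud option wins, matching the slide's 10X claim.
```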

  41. Remaining research questions? • For many current applications, how can collaboration between mobile devices lead to better results? • How to dynamically split compute and communication between mobile device and cloud to optimize: latency, accuracy, energy? • How to build software frameworks to support high-performance, energy-efficient operation across distributed and highly-heterogeneous nodes?

  42. Where do we stand? • Power matters across the design spectrum: • Embedded->mobile->server->data center • Devices->circuits->architecture->software • We know some basic strategies: • Hardware: Voltage scaling, clock gating, parallelism, specialization • Software: Aggressive optimization and tailoring. • System: Compute locally to avoid radio power where possible • But the problem is still worsening

  43. Going forward: More work is needed! • Must view Power and Energy as first-class design constraints, along with performance • Need new programming, architecture and system concepts to fuse heterogeneous and geographically distributed compute elements [diagram: Performance and Power as constraints spanning the full stack (Applications → OS & Compilers → Architecture → Circuits → Devices), replicated across distributed systems]

  44. Acknowledgments • Current & Former Students: • David Brooks, Gilberto Contreras, Canturk Isci, Sibren Isaacman, Wenhao Jia, Philo Juang, Ting Liu, Dan Lustig, Chris Sadler, Logan Stafman, Yong Wang, Ozlem Bilgir Yetim, Yavuz Yetim, Pei Zhang… • Other Collaborators: • Ramon Caceres, Steve Lyon, Dan Rubenstein, Vivek Tiwari… • Funding: • NSF, DARPA, GSRC, C-FAR, Intel, AMD, Google…

  45. Thank you!

  46. Power Breakdown Source: Ars Technica, 2007
