PRACTICAL DYNAMIC THERMAL MANAGEMENT ON INTEL DESKTOP COMPUTER
Guanglei Liu
Department of Electrical and Computer Engineering, Florida International University
July 12, 2012
Major Professor: Dr. Gang Quan
Thermal Design Challenges
Number of transistors keeps increasing
• Nearly 40 billion transistors are integrated into a single die [Mizunuma, 2009 ICCAD]
More complicated architectures are built
• An 80-core single-chip processor has been demonstrated by Intel [Vangal, 2007 ISSCC]
[Figure from Intel Microprocessor Technology Lab, 2011]
High transistor density increases power density
Environmental concerns and the electric bill
• U.S. data centers: 120 billion kilowatt-hours in 2012
• 9 billion dollars, 15% of all energy in the U.S.
• In the U.S., 46% of electricity is generated from fossil fuels
Source: Environmental Protection Agency (EPA) Report
High power density raises on-chip temperatures and causes thermal issues
Thermal Issues
Affect reliability
• As much as a 50% reduction in a device's life span for every 10°C increase [Yeo, DAC 2008]
Increase package/cooling costs
• 1-3 dollars per watt [Skadron, ICSA 2003]
• In a data center, each watt spent on computing needs ½-1 watt for cooling [Brill, 2007]
Degrade performance
• 10-15% more circuit delay for each 15°C increase [Santarini, EDN 2005]
Increase leakage power consumption
• Raising the temperature from 65°C to 110°C can increase leakage power by 38% for IC circuits [Santarini, EDN 2005]
Crash the computing system
• The processor's self-protection mechanism automatically shuts down the processor to avoid physical damage [Rohou, WFDO 1999]
Computing system cooling solutions
Mechanical cooling solution: high cooling cost
• Air cooling (e.g. fan + heat sink)
• Liquid cooling: high-density liquid absorbs 3,500 times more heat than air [Chu, DMR 2004]
• Cooling takes 51% of the overall server power budget [Lefurgy, COM 2003]
• Noise level increases by 10 dB as fan speed increases by 50% [Lyon, STMMS 2004]
Dynamic Thermal Management (DTM): sacrifices system performance
• Dynamic voltage and frequency scaling (DVFS) technique [Kim, HPCA 2008]
• Task migration [Lim, QED 2002]
• Clock gating [Gunther, ITJ 2001]
• Fetch toggling [Brooks, HPCA 2001]
Related Theoretical Work
Thermal-aware throughput maximization; peak temperature minimization
• [Chantem et al., ISLPED 2009] [Zhang et al., ICCAD 2007] [Chatha et al., DAC 2010]
• [Chaturvedi et al., ASPDAC 2011] [Liu et al., RTAS 2010] [Qiu et al., ICESS 2010]
Overall energy reduction under peak temperature constraints; real-time guarantee under peak temperature constraint
• [Bao et al., DATE 2010] [Andrei et al., DAC 2009] [Huang et al., DATE 2011]
• [Chaturvedi et al., CIT 2010] [Wang et al., RTS 2006] [Huang et al., RTSS 2009]
These theoretical works are derived from simplified mathematical thermal models and idealized assumptions
Our research goal: to develop a practical hardware platform that enables us to investigate the limitations of the existing theoretical work, and to develop practical and effective DTM techniques that accommodate those limitations
Major contributions
Practical hardware platform
• Intel i5 quad core
• Linux operating system
• [SouthEast 2011]
Proactive DTM algorithm (multi-core)
Thermal management validation
Reactive DTM (single-core)
• Limitations of theoretical works
• Non-constant sampling period
• Thermal profiling analysis
• DTM techniques vs. air cooling
• DTM vs. DPM algorithm
• Fundamental DTM principles validation
• Neighbor-aware temperature prediction
• Algorithm for multicore with task migration
• [SUSCOM 2012]
• [DATE 2012]
• [ASP2012]
• [GreenCom 2012]
Practical Hardware Platform
Dell Precision T1500 workstation
• Intel i5 quad core
• Linux kernel version 2.6.23
SPEC CPU2000 benchmark
• Integer and floating-point operations
DVFS technique
• cpufreq module
• 12 different speed levels
Task migration
• CPU affinity module
• Migrate processes between cores
Temperature capturing
• coretemp driver
• Read the on-chip thermal sensor
Power measurement
• Fluke current clamp, multimeter
• Cooling/CPU power consumption
Fan speed control
• fancontrol shell script
• Manually adjust fan speed
Computing system hardware monitoring tool
• lm-sensors tool
• Monitor system information: voltage value, temperature value, fan speed
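The software knobs listed on this slide can be exercised from user space on Linux. The sketch below is a minimal illustration, not the scripts used in this work. It assumes the standard cpufreq sysfs files (`scaling_available_frequencies`, `scaling_governor`, `scaling_setspeed`), which not every cpufreq driver exposes, and it assumes the caller supplies the path of a hwmon temperature file, since coretemp sensor paths differ between machines. The function names are hypothetical.

```python
import os
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")  # standard Linux sysfs location


def available_speeds_khz(core, root=CPU_ROOT):
    """Return the speed levels (kHz) that the cpufreq driver lists for a core."""
    path = root / f"cpu{core}" / "cpufreq" / "scaling_available_frequencies"
    return sorted(int(tok) for tok in path.read_text().split())


def set_speed_khz(core, khz, root=CPU_ROOT):
    """Request a fixed speed (DVFS). Needs root and the 'userspace' governor."""
    cpufreq = root / f"cpu{core}" / "cpufreq"
    (cpufreq / "scaling_governor").write_text("userspace")
    (cpufreq / "scaling_setspeed").write_text(str(khz))


def migrate(pid, core):
    """Pin a process to a single core through its CPU affinity mask (Linux only)."""
    os.sched_setaffinity(pid, {core})


def read_temp_c(sensor_file):
    """Read a hwmon temperature file, which reports millidegrees Celsius."""
    return int(Path(sensor_file).read_text().strip()) / 1000.0
```

Passing `root` as a parameter keeps the helpers testable against a fake directory tree; on real hardware the defaults apply and the writes require root privileges.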
Our Approach
Enhanced reactive DTM (ERDTM)
Buffer zone and safe region
• The maximum possible temperature increment is 4°C
Offline thermal profiling analysis
• Build a temperature vs. speed lookup table
• Run benchmarks at different speed levels
• Collect the corresponding peak temperatures
[Figure: temperature vs. time, marking the threshold temperature, the buffer zone, T_safe, and the safe region]
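One way to read the lookup table and the two regions together is sketched below. The slide does not spell out the control rule, so everything here is an assumption made for illustration: the lookup-table values are invented, T_safe is taken to be the threshold minus the 4°C increment, and the policy (full speed in the safe region, the fastest profiled-safe speed in the buffer zone, lowest speed above the threshold) is a plausible guess rather than the ERDTM algorithm itself.

```python
# Hypothetical profiling results: speed level (GHz) -> peak temperature (deg C).
LOOKUP = {1.2: 44.0, 1.6: 47.5, 2.0: 51.0, 2.4: 54.0, 2.67: 58.5}

T_THRESHOLD = 55.0              # peak temperature constraint (deg C)
BUFFER = 4.0                    # maximum possible temperature increment (deg C)
T_SAFE = T_THRESHOLD - BUFFER   # assumed boundary between safe region and buffer zone


def safe_speed(lookup, threshold):
    """Highest profiled speed whose peak temperature stays within the threshold."""
    ok = [speed for speed, peak in lookup.items() if peak <= threshold]
    return max(ok) if ok else min(lookup)


def next_speed(temp, lookup=LOOKUP, threshold=T_THRESHOLD, t_safe=T_SAFE):
    """Pick the speed for the next interval from the current temperature."""
    if temp < t_safe:            # safe region: one step cannot cross the threshold
        return max(lookup)
    if temp < threshold:         # buffer zone: fall back to the profiled-safe speed
        return safe_speed(lookup, threshold)
    return min(lookup)           # at or above the threshold: cool down


if __name__ == "__main__":
    for reading in (45.0, 52.0, 56.0):
        print(reading, "->", next_speed(reading), "GHz")
```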
Experimental results
Experiment setup
• Four identical tasks are assigned to the four cores to simulate a single-core environment
• The temperature threshold is 55°C
• The lookup table is constructed offline
[Figure: frequency lookup table]
DTM algorithm: number of violations
• FSDTM algorithm: 87
• VS-DTM algorithm: 12
• ERDTM algorithm: 0
Performance evaluation
• ERDTM average throughput improvement is 8.1%
Neighbor-aware temperature prediction
Individual increment factor
• Processor temperature increment
• Obtained offline
Neighbor increment factor
• Heat transfer from the neighbor processor
Training process
• Run the tasks and record temperature information
• Apply least-squares estimation
Our neighbor-aware prediction
• The two weights in the prediction are obtained by collecting training data
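The training step can be illustrated with a small least-squares fit. The prediction formula itself did not survive on this slide, so the sketch assumes a simple linear form, observed increment ≈ a × (individual increment factor) + b × (neighbor increment factor), and solves the 2×2 normal equations for the weights a and b. The model form, the function names, and the sample data are all assumptions for illustration.

```python
def fit_weights(samples):
    """Least-squares fit of (a, b) in  dT ~ a*x + b*y.

    samples: iterable of (x, y, dT) where x is the individual increment factor,
    y is the neighbor increment factor, and dT is the recorded temperature rise.
    """
    sxx = syy = sxy = sxd = syd = 0.0
    for x, y, d in samples:
        sxx += x * x
        syy += y * y
        sxy += x * y
        sxd += x * d
        syd += y * d
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("training data do not determine both weights")
    a = (sxd * syy - syd * sxy) / det
    b = (syd * sxx - sxd * sxy) / det
    return a, b


def predict(temp_now, x, y, a, b):
    """Predicted next temperature under the assumed linear model."""
    return temp_now + a * x + b * y


if __name__ == "__main__":
    # Hypothetical training records: (individual factor, neighbor factor, observed rise)
    data = [(1.0, 0.0, 0.7), (0.0, 1.0, 0.2), (2.0, 1.0, 1.6), (1.0, 3.0, 1.3)]
    a, b = fit_weights(data)
    print(round(a, 3), round(b, 3))
    print(round(predict(50.0, 2.0, 1.0, a, b), 2))
```

With more than two training records the fit averages out measurement noise, which is the usual reason for preferring least squares over solving from two samples directly.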
Neighbor-aware Task Migration
Conventional approach
• Always migrate the task from the hottest core to the coolest core
NADTM algorithm
• Predict thermal emergency
• Migrate task
• DVFS technique
Our migration strategy
• Increasing factor: evaluates the temperature increment
• Heat factor: evaluates the processor hotness
• Choose the migration candidate with the minimum value
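The difference between the two target-selection rules can be shown with a toy example. The quantity that NADTM minimizes is not legible on this slide, so the sketch assumes the score is the sum of the heat factor and the increasing factor; that score, the `Core` type, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    temperature: float        # current sensor reading (deg C)
    heat_factor: float        # how hot the core already is
    increasing_factor: float  # predicted temperature increment if the task lands here


def conventional_target(cores, source_id):
    """Conventional rule: move the task to the coolest other core."""
    others = [c for c in cores if c.core_id != source_id]
    return min(others, key=lambda c: c.temperature)


def neighbor_aware_target(cores, source_id):
    """Assumed NADTM-style rule: minimize heat factor plus increasing factor."""
    others = [c for c in cores if c.core_id != source_id]
    return min(others, key=lambda c: c.heat_factor + c.increasing_factor)


if __name__ == "__main__":
    cores = [
        Core(0, 54.0, 54.0, 0.0),  # hot source core
        Core(1, 46.0, 46.0, 5.0),  # coolest, but adjacent to the hot core
        Core(2, 47.0, 47.0, 1.5),  # slightly warmer, far from the hot core
    ]
    print(conventional_target(cores, 0).core_id)    # coolest-core choice
    print(neighbor_aware_target(cores, 0).core_id)  # neighbor-aware choice
```

In this example the coolest core sits next to the hot one and is expected to heat up quickly, so the neighbor-aware rule prefers the slightly warmer core that is farther away.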
Performance analysis
• The NADTM algorithm effectively keeps the temperature under the threshold
• It has a small temperature oscillation of 1°C
Multiple tasks
• An average of 5.8% overall throughput improvement
Single task
• An average of 3.6% overall throughput improvement
Journals
• Guanglei Liu, M. Fan, G. Quan, M. Qiu, "On-Line Predictive Thermal Management under Peak Temperature Constraints for Practical Multi-core Platforms", Journal of Low Power Electronics (ASP) (under review), 2012.
• Guanglei Liu, G. Quan, M. Qiu, "Practical Dynamic Thermal Management on An Intel Desktop Computer", Embedded Software Design, Journal of Sustainable Computing (SUSCOM) (under review), 2012.
• H. Huang, V. Chaturvedi, Guanglei Liu, G. Quan, "Leakage Aware Scheduling On Maximum Temperature Minimization For Periodic Hard Real-Time Systems", Journal of Low Power Electronics (ASP), 2012.
Peer Reviewed Conferences
• Guanglei Liu, M. Fan, G. Quan, "Neighbor-Aware Dynamic Thermal Management for Multi-core Platform", The 15th Design, Automation, and Test in Europe (DATE 2012), Dresden, Germany, March 12-16, 2012.
• Guanglei Liu, G. Quan, M. Qiu, "The Practical On-line Scheduling for Throughput Maximization on Intel Desktop Platform under the Maximum Temperature Constraint", The 2011 IEEE/ACM Green Computing and Communications (GreenCom 2011), Sichuan, China, August 4-5, 2011.
• Guanglei Liu, G. Quan, "Thermal Aware Scheduling on an Intel Desktop Computer", IEEE SouthEast Conference (SouthEast 2011), Nashville, Tennessee, March 17-20, 2011.
• Guanglei Liu, J. Fan, "Framework for Statistical Analysis of Homogeneous Multi-core Power Grid Networks", IEEE 8th International Conference on ASIC (ASICON 2009), Changsha, China, October 20-23, 2009.
• C. Liu, J. Tan, R. Chen, Guanglei Liu, J. Fan, "Thermal Aware Clocktree Optimization in Nanometer VLSI Systems Considering Temperature Variations", IEEE 40th Southeastern Symposium on System Theory (SSST 2008), New Orleans, LA, March 17-18, 2008.
Thank You for Your Attention!