
Energy Efficient For Cloud Based GPU Using DVFS With Snoopy Protocol

GPU design trends show that register file size will continue to increase to enable even more thread-level parallelism. As a result, the register file consumes a large fraction of total GPU chip power. This work explores register file data compression for GPUs to improve power efficiency. Compression reduces the width of register file read and write operations, which in turn reduces dynamic power. The work is motivated by the observation that the register values of threads within the same warp are similar: the arithmetic differences between two successive threads' registers are small. Compression exploits this value similarity by removing data redundancy among register values. Some instructions can be processed inside the register file without decompressing their operand values, which saves further energy by minimizing data movement and processing in the power-hungry main execution unit. Evaluation results show that the proposed techniques save 25% of total register file energy consumption and 21% of total execution unit energy consumption with negligible performance impact. Performance and energy efficiency are major concerns in cloud computing data centers; more often than not, they carry conflicting requirements, making optimization a challenge. Further complications arise when heterogeneous hardware and data center management technologies are combined. For example, heterogeneous hardware such as General Purpose Graphics Processing Units (GPGPUs) improves performance at the cost of greater power consumption, while virtualization technologies improve resource management and utilization at the cost of degraded performance.

Prof. Avinash Sharma | Anchal Pathak, "Energy Efficient For Cloud Based GPU Using DVFS With Snoopy Protocol", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-2, Issue-6, October 2018. URL: https://www.ijtsrd.com/papers/ijtsrd18412.pdf Paper URL: http://www.ijtsrd.com/computer-science/computer-network/18412/energy-efficient-for-cloud-based-gpu-using-dvfs-with-snoopy-protocol/prof-avinash-sharma



International Journal of Trend in Scientific Research and Development (IJTSRD)
International Open Access Journal | www.ijtsrd.com | ISSN No: 2456-6470 | Volume-2 | Issue-6 | Sep-Oct 2018

Energy Efficient For Cloud Based GPU Using DVFS With Snoopy Protocol

Prof. Avinash Sharma (Associate Professor), Anchal Pathak (PG Scholar)
Department of CSE, Millennium Institute of Technology and Science, Bhopal, Madhya Pradesh, India

ABSTRACT
GPU design trends show that register file size will continue to increase to enable even more thread-level parallelism. As a result, the register file consumes a large fraction of total GPU chip power. This work explores register file data compression for GPUs to improve power efficiency. Compression reduces the width of register file read and write operations, which in turn reduces dynamic power. The work is motivated by the observation that the register values of threads within the same warp are similar: the arithmetic differences between two successive threads' registers are small. Compression exploits this value similarity by removing data redundancy among register values. Some instructions can be processed inside the register file without decompressing their operand values, which saves further energy by minimizing data movement and processing in the power-hungry main execution unit. Evaluation results show that the proposed techniques save 25% of total register file energy consumption and 21% of total execution unit energy consumption with negligible performance impact. Performance and energy efficiency are major concerns in cloud computing data centers; more often than not, they carry conflicting requirements, making optimization a challenge. Further complications arise when heterogeneous hardware and data center management technologies are combined. For example, heterogeneous hardware such as General Purpose Graphics Processing Units (GPGPUs) improves performance at the cost of greater power consumption, while virtualization technologies improve resource management and utilization at the cost of degraded performance.

Keywords: Redundancy, General Purpose Graphics Processing Units, GPU, Energy Consumption, Compression, Cloud Computing, Data Center

1. INTRODUCTION
Cloud computing is gaining popularity because of economies of scale and high scalability. However, frequent hardware refreshes and replacements make the underlying infrastructure heterogeneous. Heterogeneity results from different processor architectures, memory, storage, power management, and platform designs. More recently, high-throughput architectures such as GPUs have become a staple of cloud computing facilities, adding another level of heterogeneity to the data center. Significant scheduling inefficiencies can result if heterogeneity is not taken into account, since workloads run more efficiently on hardware with matching characteristics. Heterogeneity, however, is not the only concern with new architectures in cloud data centers; for instance, GPU power consumption is greater than that of multi-core architectures. GPUs consume a substantial amount of power in return for performance, raising concern about power and energy consumption in data centers. It has been estimated that data centers in the US consumed 100 billion kWh in 2011 at a cost of $7.4 billion per year, and a recent report showed that worldwide consumption for 2010 exceeded 250 billion kWh, almost 1.5% of the world's total electricity consumption [16]. To grow services and support emerging applications, large efforts are being applied to move traditional private High Performance Computing (HPC) to the cloud. Emerging applications such as remote workstations and cloud gaming are examples of promising markets.

@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 6 | Sep-Oct 2018 | Page: 133
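The value-similarity observation from the abstract (neighboring threads in a warp hold numerically close register values) is the basis of register file compression. A minimal base-delta encoding sketch follows, assuming 32-bit registers and a 32-thread warp; the function names and the 8-bit delta width are illustrative choices, not taken from the paper:

```python
def compress_warp_registers(values, delta_bits=8):
    """Base-delta encode one warp's copies of a register.

    Stores the first thread's value as a full-width base and each
    remaining value as a narrow signed delta from its predecessor.
    Returns None when any delta does not fit, i.e. the warp is
    incompressible and must be stored uncompressed.
    """
    lo, hi = -(1 << (delta_bits - 1)), (1 << (delta_bits - 1)) - 1
    base, deltas = values[0], []
    for prev, cur in zip(values, values[1:]):
        d = cur - prev
        if not lo <= d <= hi:
            return None          # fall back to uncompressed storage
        deltas.append(d)
    return base, deltas

def decompress_warp_registers(base, deltas):
    """Invert the encoding with a running prefix sum over the deltas."""
    out = [base]
    for d in deltas:
        out.append(out[-1] + d)
    return out

# Typical case: per-thread addresses strided by 4 bytes compress well.
regs = [0x1000 + 4 * tid for tid in range(32)]
packed = compress_warp_registers(regs)
assert packed is not None
assert decompress_warp_registers(*packed) == regs
```

Under these assumptions a compressible warp shrinks from 32 × 32 = 1024 bits to 32 + 31 × 8 = 280 bits, which is the narrower read/write width the abstract credits for the dynamic-power savings.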

The main hurdles facing these markets are performance and management issues. The performance concern is due to virtualization overhead. Virtualization is primarily used to improve utilization in data centers through consolidation, but it also has benefits in management, security, and live migration. Virtualization overhead is due to the hypervisor (also called a Virtual Machine Monitor, VMM), which acts as an interface between applications and the underlying hardware. Solutions to improve performance rely on bypassing the hypervisor to achieve near-native operation [7]. Bypassing the hypervisor provides VMs with direct access to I/O and hardware and improves performance considerably; this architecture allows a GPU to be attached, or passed through, to a VM with minimal performance penalty. Cloud service providers such as Amazon Elastic Compute Cloud (EC2), Microsoft Windows Azure and Google Compute Engine are offering cloud computing infrastructure at scales that have never before been possible. For instance, Amazon EC2 provides customers with tiers, called instances, tailored to the customers' requirements for virtual compute power, memory size, storage, network bandwidth, and cluster GPU instances. These resources are allocated to the customer transparently, dynamically and on demand. In such environments, however, significant inefficiencies can result if applications are not mapped to the best-fitting platform.

GPUs and virtualization technologies have been thoroughly studied and evaluated separately. We combine the two technologies to design and implement, on real hardware, a framework for optimizing virtual resource utilization and increasing the energy efficiency of servers. Since resources are allocated dynamically, two issues are of concern: the server power budget and the provisioning of resources. By profiling diverse cloud workloads for power consumption and performance, we propose techniques to mitigate the effects of using GPUs in data centers. We make the following contributions:
1. We show that over-provisioning GPU VMs with virtual CPUs (VCPUs) does not improve performance and wastes resources.
2. To improve the performance of multi-threaded applications, we propose redistributing VCPUs to VMs that can benefit from the additional resources.
3. To reduce the server power budget and obtain maximum performance, we propose reallocating the reclaimed capacity to GPU VMs.
4. By combining the two techniques, we show that the server power budget can be reduced while maintaining the performance of multi-threaded VMs.

2. BACKGROUND
Energy and performance of parallel systems are a growing concern for new large-scale supercomputers, and research has been produced in response to this challenge, aiming at more energy-efficient systems. In this context, one paper proposes optimization techniques to accelerate execution and increase the energy efficiency of stencil applications, exploiting computation and GPU memory characteristics. The optimizations applied to GPU stencil algorithms achieve a performance improvement of up to 44.65% compared with the read-only version. (Matheus S. Serpa, Emmanuell D. Carreño, Jairo Panetta, Jean-François, 2018)

Energy efficiency has become one of the top design criteria for modern computing systems. Dynamic Voltage and Frequency Scaling (DVFS) has been widely adopted by laptops and cell phones to conserve energy, while GPU DVFS is still at an early stage. This paper investigates the impact of GPU DVFS on application performance and power consumption and, furthermore, on energy conservation. (Xinxin Mei, Qiang Wang, Xiaowen Chu, 2017)

This paper describes a fault-tolerant framework for distributed stencil computation on cloud-based GPU clusters. It uses pipelining to overlap data movement with computation in the halo region, and additionally parallelizes data movement within the GPUs. Rather than running stencil codes on traditional clusters, computation is performed on the Amazon Web Services GPU cloud, using its spot instances to improve cost-efficiency. (Jun Zhou, Yan Zhang and Weng-Fai Wong, 2017)

GPU design trends show that register file size will keep increasing to enable even more thread-level parallelism, so the register file consumes a large portion of total GPU chip power. This paper explores register file data compression for GPUs to improve power efficiency: compression reduces the width of register file read and write operations, which in turn reduces dynamic power. (Sangpil Lee, Keunsoo Kim, Gunjae Koo, Hyeran Jeon, Murali Annavaram and Won Woo Ro, 2016)
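The DVFS studies surveyed above rest on the standard CMOS dynamic-power relation P ≈ C·V²·f: because supply voltage can scale down together with frequency, energy per unit of work falls superlinearly. A toy model of this, with illustrative constants that are not taken from any of the cited measurements:

```python
def dynamic_power(c_eff, volt, freq):
    """CMOS switching power: P = C_eff * V^2 * f (watts)."""
    return c_eff * volt ** 2 * freq

def energy_for_task(cycles, c_eff, volt, freq):
    """Energy = power * time, with time = cycles / f, so the f terms
    cancel and (in this dynamic-only model) energy scales with V^2."""
    return dynamic_power(c_eff, volt, freq) * (cycles / freq)

# Hypothetical operating points: (voltage V, frequency Hz).
HIGH = (1.1, 1.4e9)
LOW = (0.9, 1.0e9)

cycles = 2e9                      # fixed amount of work
e_high = energy_for_task(cycles, 1e-9, *HIGH)
e_low = energy_for_task(cycles, 1e-9, *LOW)
# The lower V/f point finishes later but spends less energy on the
# same work, which is the trade-off DVFS governors exploit.
assert e_low < e_high
```

The model deliberately omits static (leakage) power; with leakage included, running longer at a low frequency costs extra, which is why the surveyed measurements find that the best voltage/frequency setting differs between the Fermi and Maxwell platforms.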

3. PROBLEM IDENTIFICATION
The problems identified in existing work are as follows:
1. During read and write operations on the register file, execution time is high, and for this reason the GPU loses power efficiency.
2. Performance is degraded and execution time is high when the number of pipeline stages increases.
3. When pipeline stages increase, GPU speed may decrease, even though pipelining can improve task throughput.
4. Recovery overhead is high because of the expanded pipeline stages.

4. METHODOLOGY
The proposed method is Dynamic Voltage Frequency Scaling with Snoopy Protocol (DVFS-SP), described as follows.

[b, f] = DVFS-SP(T)  // b is the block of a task, f is the frequency level of a task, and T is the set of tasks
Input: task set T with transformed task periods
Output: speed levels assigned to the schedule

Step 1: Whenever a task completes early,
Step 2: compute the critical speed Sc.
Step 3: If U ≤ Sc, assign Sc to the whole schedule;
Step 4: else
Step 5: check the two states of the energy function in each cache:
Step 6: (1) Valid state: the block can be read and written; (2) Invalid state: the block has no data.
Step 7: A write invalidates all other cache copies; that is, each block can have multiple simultaneous readers, but a write invalidates the other copies.
Step 8: Events are issued from a processor or observed on the bus through the following signals: PRd = Processor Read, PWr = Processor Write, BusRd = Bus Read, BusWr = Bus Write.
Step 9: Assign speed Sc of each block (Valid state) to the interval [0, Pk − δk].
Step 10: Assign speed Sc of each block to the interval [Pk − δk, Pk].
Step 11: Update bi and fi of each task.

Figure 1: Flowchart of Dynamic Voltage Frequency Scaling with Snoopy Protocol (DVFS-SP)
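The method combines two mechanisms: a speed-assignment rule on early task completion (Steps 1–3) and write-invalidate cache bookkeeping (Steps 5–8). A hedged Python sketch follows; `assign_speed` is an assumed reading of the critical-speed rule, since the paper's closed-form expression for Sc is not reproduced here, and all class and function names are illustrative:

```python
from enum import Enum

class State(Enum):
    VALID = 1      # Step 6 (1): the block can be read and written
    INVALID = 2    # Step 6 (2): the block holds no data

class SnoopyCache:
    """Write-invalidate bookkeeping of Steps 5-8: many simultaneous
    readers are allowed, but a processor write (PWr) invalidates every
    other cached copy via a snooped bus write (BusWr)."""

    def __init__(self, n_caches):
        self.state = [{} for _ in range(n_caches)]   # per-cache block table

    def processor_read(self, cid, block):
        # PRd; a miss would trigger a BusRd to fetch the block
        # (bus data transfer itself is not modeled here).
        self.state[cid][block] = State.VALID

    def processor_write(self, cid, block):
        # PWr: broadcast BusWr so every other copy is invalidated (Step 7).
        for other, table in enumerate(self.state):
            if other != cid:
                table[block] = State.INVALID
        self.state[cid][block] = State.VALID

def assign_speed(speed_levels, utilization):
    """Assumed reading of Steps 1-3: on early completion, pick the
    critical speed Sc as the lowest available level that still covers
    the utilization, and apply it to the whole schedule."""
    for s in sorted(speed_levels):
        if s >= utilization:
            return s
    return max(speed_levels)

# Two caches read block 0, then cache 1 writes it: cache 0's copy dies.
bus = SnoopyCache(n_caches=2)
bus.processor_read(0, block=0)
bus.processor_read(1, block=0)
bus.processor_write(1, block=0)
assert bus.state[0][0] is State.INVALID
assert bus.state[1][0] is State.VALID
```

The valid/invalid split mirrors the two energy states of Step 5: only blocks held in the Valid state are assigned the critical speed over the intervals of Steps 9–10.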

5. RESULTS AND ANALYSIS
The analysis is performed on the basis of radius and chunk size. The execution times of the pipeline stages are proportional to chunk size and radius. We profiled the execution time of each stage with various chunk sizes and radii on 27 nodes arranged in a 3 x 3 x 3 grid, each having a problem size of 512x512x1024. We took the profile of the node at position (1, 1, 1) in the node grid, as it communicates with all six neighbors.

Table 1: Execution time (ms) of Copy In Data for FTCS [3] and DVFS-SP (proposed), where r = 1
Chunk Size   FTCS [3]   DVFS-SP (Proposed)
32           1117       1089
64           2450       2213
128          6225       6081

Figure 2: Analysis of execution time of Copy In Data between FTCS [3] and DVFS-SP (proposed), where radius = 1

The figure above shows the execution time of the Copy In Data stage with different chunk sizes c and a fixed radius of r = 1. From the result, we can see that execution time is proportional to chunk size c.

Table 2: Execution time (ms) of Computation for FTCS [3] and DVFS-SP (proposed), where r = 1
Chunk Size   FTCS [3]   DVFS-SP (Proposed)
32           656        407
64           1307       1182
128          2688       2396

Figure 3: Analysis of execution time of Computation between FTCS [3] and DVFS-SP (proposed), where radius = 1

The figure above shows the execution time of the Computation stage with different chunk sizes c and a fixed radius of r = 1. From the result, we can see that execution time is proportional to chunk size c.

Table 3: Execution time (ms) of Copy Out Data for FTCS [3] and DVFS-SP (proposed), where r = 1
Chunk Size   FTCS [3]   DVFS-SP (Proposed)
32           140        128
64           251        226
128          478        398

Figure 4: Analysis of execution time of Copy Out Data between FTCS [3] and DVFS-SP (proposed), where radius = 1

The figure above shows the execution time of the Copy Out Data stage with different chunk sizes c and a fixed radius of r = 1. From the result, we can see that execution time is proportional to chunk size c.

Table 4: Execution time (ms) of Copy In Data for FTCS [3] and DVFS-SP (proposed), where c = 128
Radius   FTCS [3]   DVFS-SP (Proposed)
1        6225       5719
2        7522       6837
3        8959       8009
4        11393      10038
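The relative savings in Tables 1–3 can be checked directly as improvement = (FTCS − DVFS-SP) / FTCS. A short script over the reported numbers:

```python
def improvement_pct(baseline_ms, proposed_ms):
    """Relative reduction in execution time, in percent."""
    return 100.0 * (baseline_ms - proposed_ms) / baseline_ms

# Reported execution times (ms), FTCS [3] vs DVFS-SP, radius r = 1.
copy_in = {32: (1117, 1089), 64: (2450, 2213), 128: (6225, 6081)}
compute = {32: (656, 407), 64: (1307, 1182), 128: (2688, 2396)}
copy_out = {32: (140, 128), 64: (251, 226), 128: (478, 398)}

for name, table in [("copy in", copy_in), ("computation", compute),
                    ("copy out", copy_out)]:
    for chunk, (ftcs, dvfs_sp) in table.items():
        pct = improvement_pct(ftcs, dvfs_sp)
        print(f"{name:12s} c={chunk:3d}: {pct:5.1f}%")
```

The Computation stage at c = 32 shows the largest gain, about 38%, while most other entries fall in the 2–17% range.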

  5. International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456 International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456 International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470 correspondence workload increments at various rates. The technique perform on NVIDIA based GPU. The power effectiveness is kept up to 256 pieces measure decrease excess checkpoint in enroll record. The execution of GPU is better for various pipelines stages like pieces size and sweep of group information record. The execution time for various pieces sizes and sweeps are essentially up to 31 % and tely. The execution time of GPU might be lessening fundamentally and GFLOPS is enhancing for information calculation in cloud. Subsequently, recuperation overhead might be diminished. correspondence workload increments at various rates. The technique perform on NVIDIA based GPU. The power effectiveness is kept up to 256 pieces measure and furthermore decrease excess checkpoint in enroll record. The execution of GPU is better for various pipelines stages like pieces size and sweep of group information record. The execution time for various pieces sizes and sweeps are essentially up to 31 % and 8% separately. The execution time of GPU might be lessening fundamentally and GFLOPS is enhancing for information calculation in cloud. Subsequently, recuperation overhead might be diminished. 1.GPU implementation might be enhanced through heaviness plan of put away 2.Computation process in cloud can be diminishing through MIMD processor setup in server farm. 3.GPU execution can be investigated if there should be an occurrence of dispersed framework. 7. REFERENCES 1.Matheus S. Serpa, Emmanuell D. Panetta, Jean-François, “Improving Performance and Energy Efficiency Applications on GPU”, IEEE Conference on Green Energy, June, 2018. 
2.Xinxin Meia, Qiang Wanga, Xiaowen Chua,b, “A survey and measurement study of GPU DVFS on energy conservation”, Elsivier Journal of Digital Communications and Networks, 2017. 3.Jun Zhou, Yan Zhang and Weng “Fault Tolerant Stencil Computation on Cloud based GPU Spot Instances”, IEEE Transactions on Cloud Computing, 2017. 4.Sangpil Lee, Keunsoo Kim, Gunjae Koo, Hyeran Jeon, Murali Annavaram and Won Woo Ro, “Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution”, Journal of IEEE Transactions on Computers, August 2016. 5.S. Huang, S. Xiao and W. Feng, “On the En Efficiency of Graphics Processing Units for Scientific Computing”, IEEE Conference on Cloud Computing, 2016. 6.Qing Jiao, Mian Lu, Huynh Phung Huynh, Tulika Mitra, “Improving GPGPU Energy through Concurrent Kernel Execution and DVFS”, IEEE/ACM International Symposium on Code Generation and Optimization, 2015. Generation and Optimization, 2015. Figure 5: Analysis of execution time of Copy Data in between FTCS [3] and DVFS (Proposed) where c = 128 (Proposed) where c = 128 Figure 5: Analysis of execution time of Copy In [3] and DVFS-SP Above figure shows the execution time of each stage in Copy In Data stage with different radius r and fixed chunk size of c = 128. From the result, we can see that the execution time is proportional to chunk size r. 6. CONCLUSIONS AND FUTURE WORK The GPU DVFS for vitality protection. We center around the most up and coming GPU DVFS innovations and their effect on the execution and power utilization. We outline the philosophy and the execution of existing GPU FTCS models. As a rule, the nonlinear demonstrating procedure, for example, the ANN and the changed SLR, has better estimation exactness. What's more, we direct certifiable DFS/DVFS estimation probes the NVIDIA Fermi and Maxwell GPUs. The trial result proposes that both the center and memory recurrence impact the vitality utilization altogether. 
Utilizing the most noteworthy memory recurrence would dependably save vitality for the Maxwell GPU, which isn't the situation on the Fermi stage. As per the Fermi DVFS tests, downsizing the center voltage is crucial to monitor vitality. Both the study and the estimations spotlight the test of building a precise DVFS execution show, and besides, applying proper voltage/recurrence settings to moderate vitality. We go away these for our outlook investigation. In addition, it is another essential heading to coordinate the GPU DVFS into the substantial scale bunch level power administration later on. It will intrigue investigate how to adequately join GPU DVFS with other vitality preservation systems, for example, errand planning, VM solidification, control execution mediating and runtime control observing. Bunch size and range is corresponding to each other the season of all the three phases (Copy Computation and Copy Out Data) are relative to the lump estimate (c) and sweep (r). At the point when the sweep is expanded, the calculation and the sweep is expanded, the calculation and Above figure shows the execution time of each stage stage with different radius r and fixed chunk size of c = 128. From the result, we can see that the execution time is proportional to chunk size r. GPU implementation might be enhanced through heaviness plan of put away in sequence in cloud. Computation process in cloud can be diminishing through MIMD processor setup in server farm. GPU execution can be investigated if there should be an occurrence of dispersed framework. CONCLUSIONS AND FUTURE WORK The GPU DVFS for vitality protection. We center around the most up and coming GPU DVFS innovations and their effect on the execution and power utilization. We outline the philosophy and the execution of existing GPU FTCS models. As a rule, monstrating procedure, for example, the ANN and the changed SLR, has better estimation exactness. 
REFERENCES
1. Matheus S. Serpa, Emmanuell D. Carreño, Jairo François, "Improving Performance and Energy Efficiency of Stencil Based Applications on GPU", IEEE Conference on Green Energy, June 2018.
2. Xinxin Mei, Qiang Wang, Xiaowen Chu, "A survey and measurement study of GPU DVFS on energy conservation", Elsevier Digital Communications and Networks, 2017.
3. Jun Zhou, Yan Zhang and Weng-Fai Wong, "Fault Tolerant Stencil Computation on Cloud-based GPU Spot Instances", IEEE Transactions on Cloud Computing, 2017.
4. Sangpil Lee, Keunsoo Kim, Gunjae Koo, Hyeran Jeon, Murali Annavaram and Won Woo Ro, "Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution", IEEE Transactions on Computers, August 2016.
5. S. Huang, S. Xiao and W. Feng, "On the Energy Efficiency of Graphics Processing Units for Scientific Computing", IEEE Conference on Cloud Computing, 2016.
6. Qing Jiao, Mian Lu, Huynh Phung Huynh, Tulika Mitra, "Improving GPGPU Energy-Efficiency through Concurrent Kernel Execution and DVFS", IEEE/ACM International Symposium on Code Generation and Optimization, 2015.
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 6 | Sep-Oct 2018 | Page: 137

7. Amer Qouneh, Ming Liu, Tao Li, "Optimization of Resource Allocation and Energy Efficiency in Heterogeneous Cloud Data Centers", 44th International Conference on Parallel Processing, 2015.
8. Mohammed Sourouri, Johannes Langguth, Filippo Spiga, Scott B. Baden and Xing Cai, "CPU+GPU Programming of Stencil Computations for Resource-Efficient Use of GPU Clusters", IEEE 18th International Conference on Computational Science and Engineering, 2015.
9. Ashish Mishra, Nilay Khare, "Analysis of DVFS Techniques for Improving the GPU Energy Efficiency", Open Journal of Energy Efficiency, 2015.
10. Sparsh Mittal, "A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems", International Journal of Computer Aided Engineering and Technology, 2014.
11. Juan M. Cebrian, Gines D. Guerrero and Jose M. Garcia, "Energy Efficiency Analysis of GPUs", IEEE Journal of Cloud Computing, 2014.
12. Sparsh Mittal, Jeffrey S. Vetter, "A Survey of Methods for Analyzing and Improving GPU Energy Efficiency", ACM Computing Surveys, 2014.
13. Kai Sun, Zhikui Chen, Jiankang Ren, Song Yang, Jing Li, "M2C: Energy Efficient Mobile Cloud System for Deep Learning", IEEE Conference on Cloud Computing, 2014.
14. Jishen Zhao, Guangyu Sun, Gabriel H. Loh, Yuan Xie, "Optimizing GPU Energy Efficiency with 3D Die-Stacking Graphics Memory and Reconfigurable Memory Interface", ACM Transactions on Architecture and Code Optimization, December 2013.
15. Xiaohan Ma, Marion Rincon, and Zhigang Deng, "Improving Energy Efficiency of GPU based General-Purpose Scientific Computing through Automated Selection of Near Optimal Configurations", IEEE Conference on Parallel Computing, 2011.
16. C.-M. Liu, T. Wong, E. Wu, R. Luo, S.-M. Yiu, Y. Li, B. Wang, C. Yu, X. Chu, K. Zhao, et al., "SOAP3: ultra-fast GPU-based parallel alignment tool for short reads", Bioinformatics 28 (6) (2012) 878–879.
17. K. Zhao, X. Chu, "G-BLASTN: accelerating nucleotide alignment by graphics processors", Bioinformatics 30 (10) (2014) 1384–1391.
18. X. Chu, C. Liu, K. Ouyang, L. S. Yung, H. Liu, Y.-W. Leung, "Perasure: a parallel cauchy reed-solomon coding library for GPUs", in: Proceedings of the IEEE International Conference on Communications, London, UK, 2015, pp. 436–441.
19. Q. Li, C. Zhong, K. Zhao, X. Mei, X. Chu, "Implementation and analysis of AES encryption on GPU", in: Proceedings of the Third IEEE International Workshop on Frontier of GPU Computing, Liverpool, UK, 2012.
20. X. Chu, K. Zhao, "Practical random linear network coding on GPUs", in: Proceedings of the 8th International Conferences on Networking, IFIP, Aachen, Germany, 2009.
21. Y. Li, K. Zhao, X. Chu, J. Liu, "Speeding up K-means algorithm by GPUs", in: Proceedings of the 10th IEEE International Conference on Computer and Information Technology, Bradford, UK, 2010, pp. 115–122.
22. R. Raina, A. Madhavan, A. Y. Ng, "Large-scale deep unsupervised learning using graphics processors", in: Proceedings of the 26th ACM Annual International Conference on Machine Learning, 2009, pp. 873–880.
23. A. Coates, B. Huval, T. Wang, D. Wu, B. Catanzaro, N. Andrew, "Deep learning with COTS HPC systems", in: Proceedings of the 30th International Conference on Machine Learning, 2013, pp. 1337–1345.
24. Q. V. Le, "Building high-level features using large scale unsupervised learning", in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8595–8598.
25. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., "Mastering the game of Go with deep neural networks and tree search", Nature 529 (7587) (2016) 484–489.
