GPU Architecture and Programming
GPU vs CPU • Video: https://www.youtube.com/watch?v=fKK933KK6Gg
GPU Architecture • GPUs (Graphics Processing Units) were originally designed as graphics accelerators, used for real-time graphics rendering. • Starting in the late 1990s, the hardware became increasingly programmable, culminating in 1999 with the GeForce 256, which NVIDIA marketed as the first GPU.
CPU + GPU is a powerful combination • CPUs consist of a few cores optimized for serial processing. • GPUs consist of thousands of smaller, more efficient cores designed for parallel performance. • Serial portions of the code run on the CPU, while parallel portions run on the GPU.
Architecture of GPU • Images copied from http://www.pgroup.com/lit/articles/insider/v2n1a5.htm and http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf
CUDA Programming • CUDA (Compute Unified Device Architecture) is a parallel programming platform created by NVIDIA for its GPUs. • By using CUDA, you can write programs that directly access the GPU. • The CUDA platform is accessible to programmers via CUDA libraries and extensions to programming languages such as C, C++, and Fortran. • C/C++ programmers use “CUDA C/C++”, compiled with the nvcc compiler. • Fortran programmers can use CUDA Fortran, compiled with the PGI CUDA Fortran compiler.
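For example, a CUDA C/C++ source file (here a hypothetical hello.cu, not a file named in the slides) is compiled from the command line with:

    nvcc hello.cu -o hello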
Terminology: • Host: The CPU and its memory (host memory) • Device: The GPU and its memory (device memory)
Programming Paradigm • Each parallel function of the application executes as a kernel. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
Programming Flow • Copy input data from CPU memory to GPU memory • Load the GPU program and execute it • Copy results from GPU memory back to CPU memory (see the sketch below)
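A minimal CUDA C sketch of this three-step flow (the kernel name mykernel and the buffer names are illustrative, not from the original slides):

    #include <cuda_runtime.h>

    // hypothetical kernel; its body is not the point of this sketch
    __global__ void mykernel(const float *in, float *out) { }

    int main(void) {
        const int N = 1024;
        size_t size = N * sizeof(float);
        float h_in[1024], h_out[1024];     // host (CPU) buffers; contents omitted
        float *d_in, *d_out;               // device (GPU) buffers

        cudaMalloc((void **)&d_in, size);
        cudaMalloc((void **)&d_out, size);

        // 1. Copy input data from CPU memory to GPU memory
        cudaMemcpy(d_in, h_in, size, cudaMemcpyHostToDevice);

        // 2. Load the GPU program and execute it
        mykernel<<<N / 256, 256>>>(d_in, d_out);

        // 3. Copy results from GPU memory to CPU memory
        cudaMemcpy(h_out, d_out, size, cudaMemcpyDeviceToHost);

        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }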
Each parallel function of the application is executed as a kernel • That means GPUs are programmed as a sequence of kernels; typically, each kernel completes execution before the next kernel begins. • Fermi has some support for executing multiple independent kernels simultaneously, but most kernels are large enough to fill the entire machine.
Image copied from http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf
Hello World! Example • __global__ is a CUDA C/C++ keyword meaning: • mykernel() will be executed on the device • mykernel() will be called from the host. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
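A sketch of the example, following the cited deck:

    #include <stdio.h>

    __global__ void mykernel(void) {
        // empty kernel body: runs on the device, does nothing in this minimal example
    }

    int main(void) {
        mykernel<<<1,1>>>();        // triple angle brackets mark a call from host code to device code
        printf("Hello World!\n");   // printed by the host
        return 0;
    }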
Addition Example • Since add() runs on the device, the pointers a, b, and c must point to device memory. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
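A sketch of the full example, following the cited deck (the d_ prefix marks device copies; the input values are illustrative):

    #include <stdio.h>

    __global__ void add(int *a, int *b, int *c) {
        *c = *a + *b;                      // runs on the device, so all three pointers must be device pointers
    }

    int main(void) {
        int a = 2, b = 7, c;               // host copies of the inputs and output
        int *d_a, *d_b, *d_c;              // device copies
        int size = sizeof(int);

        cudaMalloc((void **)&d_a, size);   // allocate device memory
        cudaMalloc((void **)&d_b, size);
        cudaMalloc((void **)&d_c, size);

        // copy inputs to the device
        cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

        add<<<1,1>>>(d_a, d_b, d_c);       // launch with 1 block of 1 thread

        // copy the result back to the host
        cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);
        printf("%d + %d = %d\n", a, b, c);

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        return 0;
    }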
Vector Addition Example • Kernel function: Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
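The code itself is missing from this transcript; in the cited deck, the kernel indexes the arrays by block, with each block computing one element:

    __global__ void add(int *a, int *b, int *c) {
        c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];   // blockIdx.x selects this block's element
    }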
main: Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
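A sketch of the corresponding main, following the cited deck; it launches N blocks of one thread each:

    #include <stdlib.h>

    #define N 512

    int main(void) {
        int *a, *b, *c;                    // host copies
        int *d_a, *d_b, *d_c;              // device copies
        int size = N * sizeof(int);

        cudaMalloc((void **)&d_a, size);
        cudaMalloc((void **)&d_b, size);
        cudaMalloc((void **)&d_c, size);

        a = (int *)malloc(size);
        b = (int *)malloc(size);
        c = (int *)malloc(size);
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }   // illustrative input values

        cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

        add<<<N,1>>>(d_a, d_b, d_c);       // N blocks, 1 thread per block

        cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

        free(a); free(b); free(c);
        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        return 0;
    }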
Alternative 1: Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
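The transcript again omits the code; in the cited deck, the first alternative indexes by thread rather than by block, launching a single block of N threads (an assumption based on that deck, not confirmed by this transcript):

    __global__ void add(int *a, int *b, int *c) {
        c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];   // threadIdx.x selects this thread's element
    }

    // launched from main as: 1 block, N threads
    add<<<1,N>>>(d_a, d_b, d_c);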
Alternative 2: use a combination of blocks and threads, and compute a unique global thread ID from the built-in index variables:

    int globalThreadId = threadIdx.x + blockIdx.x * M;   // M is the number of threads in a block

or, equivalently, using the built-in block size:

    int globalThreadId = threadIdx.x + blockIdx.x * blockDim.x;

Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
So the kernel becomes: Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
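A sketch of the resulting kernel, following the cited deck:

    __global__ void add(int *a, int *b, int *c) {
        int index = threadIdx.x + blockIdx.x * blockDim.x;   // unique global thread ID
        c[index] = a[index] + b[index];
    }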
The main becomes: Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
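Only the launch line changes; main now divides the N elements across blocks of THREADS_PER_BLOCK threads each (a sketch, following the cited deck; this version assumes N is a multiple of THREADS_PER_BLOCK):

    #define THREADS_PER_BLOCK 512

    add<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c);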
Handling Arbitrary Vector Sizes • Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
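When the vector length n is not a multiple of the block size, the deck's fix is to round the block count up and guard the array access inside the kernel:

    __global__ void add(int *a, int *b, int *c, int n) {
        int index = threadIdx.x + blockIdx.x * blockDim.x;
        if (index < n)                     // the last block may have threads past the end of the arrays
            c[index] = a[index] + b[index];
    }

    // launch enough blocks to cover n elements, rounding up (M = threads per block)
    add<<<(N + M - 1) / M, M>>>(d_a, d_b, d_c, N);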