
GPU Architecture and Programming


marnie




Presentation Transcript


  1. GPU Architecture and Programming

  2. GPU vs CPU https://www.youtube.com/watch?v=fKK933KK6Gg

  3. GPU Architecture • GPUs (Graphics Processing Units) were originally designed as graphics accelerators for real-time graphics rendering. • Starting in the late 1990s, the hardware became increasingly programmable, culminating in NVIDIA's first GPU in 1999.

  4. CPU + GPU is a powerful combination • CPUs consist of a few cores optimized for serial processing. • GPUs consist of thousands of smaller, more efficient cores designed for parallel performance. • Serial portions of the code run on the CPU, while parallel portions run on the GPU.
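A minimal sketch of that division of labor, assuming CUDA 6 or later so that unified memory (cudaMallocManaged) can stand in for the explicit copies introduced later. The names saxpy_kernel and saxpy_demo are hypothetical: the CPU does the serial setup and the final sum, and the GPU does the element-wise update, where every element is independent.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Parallel portion: each GPU thread updates one independent element.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: serial setup on the CPU, parallel update on the GPU,
// then a serial reduction on the CPU. Returns the sum of y.
float saxpy_demo(int n) {
    float *x, *y;
    // Unified memory (assumes CUDA 6+) is visible to both CPU and GPU.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));

    // Serial portion (CPU): initialize the inputs.
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Parallel portion (GPU): y = 3 * x + y for every element.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy_kernel<<<blocks, threads>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();  // the CPU must wait before reading y

    // Serial portion (CPU): reduce the result.
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += y[i];

    cudaFree(x);
    cudaFree(y);
    return sum;
}

int main(void) {
    printf("sum = %f\n", saxpy_demo(1024));
    return 0;
}
```

Each element ends up as 3 * 1 + 2 = 5, so for 1024 elements the CPU-side sum is 5120.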

  5. Architecture of GPU • Images copied from http://www.pgroup.com/lit/articles/insider/v2n1a5.htm and http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf
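The architecture figures are not reproduced in this transcript. As a hedged illustration, the CUDA runtime can report a few of the architectural parameters such figures usually show, for whatever GPU is installed. The helper name print_devices is hypothetical; cudaGetDeviceCount and cudaGetDeviceProperties are standard runtime calls. Cores per SM are not a field of cudaDeviceProp, so the sketch prints the SM count instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints key properties of every CUDA device and
// returns the number of devices found (-1 if the runtime reports an error).
int print_devices(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s\n", d, prop.name);
        printf("  compute capability   : %d.%d\n", prop.major, prop.minor);
        printf("  multiprocessors (SMs): %d\n", prop.multiProcessorCount);
        printf("  warp size            : %d\n", prop.warpSize);
        printf("  max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("  global memory (MiB)  : %zu\n", prop.totalGlobalMem / (1024 * 1024));
    }
    return count;
}

int main(void) {
    return print_devices() >= 0 ? 0 : 1;
}
```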

  6. CUDA Programming • CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA for its GPUs. • With CUDA, you can write programs that directly access the GPU. • The CUDA platform is accessible to programmers via CUDA libraries and extensions to programming languages such as C, C++, and Fortran. • C/C++ programmers use “CUDA C/C++”, compiled with the nvcc compiler. • Fortran programmers can use CUDA Fortran, compiled with the PGI CUDA Fortran compiler.

  7. Terminology: • Host: The CPU and its memory (host memory) • Device: The GPU and its memory (device memory)
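In CUDA C/C++ the host/device split shows up in function qualifiers and in which memory a pointer refers to. A minimal sketch, with hypothetical function names: __global__ marks a kernel (runs on the device, launched from the host), __device__ marks a device-only helper, and __host__ __device__ compiles a function for both sides. The host never dereferences the device pointer; it copies the value back instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __host__ __device__: compiled for both the host (CPU) and the device (GPU).
__host__ __device__ int square(int x) { return x * x; }

// __device__: runs on the device and may only be called from device code.
__device__ int plus_one(int x) { return x + 1; }

// __global__: a kernel; runs on the device and is launched from the host.
__global__ void compute(int *out) {
    *out = plus_one(square(6));  // 6 * 6 + 1 = 37
}

// Hypothetical helper: runs compute() on the device, returns the result to the host.
int run_on_device(void) {
    int result = 0;             // lives in host memory
    int *d_out;                 // will point to device memory
    cudaMalloc(&d_out, sizeof(int));
    compute<<<1, 1>>>(d_out);
    // d_out must not be dereferenced on the host; copy the value back instead.
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return result;
}

int main(void) {
    printf("host:   square(6) = %d\n", square(6));
    printf("device: plus_one(square(6)) = %d\n", run_on_device());
    return 0;
}
```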

  8. Programming Paradigm • Each parallel function of an application is executed as a kernel. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

  9. Programming Flow • Copy input data from CPU memory to GPU memory. • Load the GPU program and execute it. • Copy results from GPU memory back to CPU memory.
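The three steps above map onto three runtime operations. A minimal sketch, with a hypothetical kernel double_elements and helper flow_demo: cudaMemcpy host-to-device, a kernel launch, then cudaMemcpy device-to-host. On the default stream the final copy does not start until the kernel has finished.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every element in place.
__global__ void double_elements(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Hypothetical helper: returns 1 if every element came back doubled.
int flow_demo(int n) {
    size_t bytes = n * sizeof(int);
    int *h = (int *)malloc(bytes);          // host (CPU) memory
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d;                                 // device (GPU) memory
    cudaMalloc(&d, bytes);

    // Step 1: copy input data from CPU memory to GPU memory.
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // Step 2: load the GPU program (kernel) and execute it.
    double_elements<<<(n + 255) / 256, 256>>>(d, n);

    // Step 3: copy results from GPU memory back to CPU memory.
    // On the default stream this waits for the kernel to complete.
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

    int ok = 1;
    for (int i = 0; i < n; ++i) if (h[i] != 2 * i) ok = 0;
    cudaFree(d);
    free(h);
    return ok;
}

int main(void) {
    printf("flow correct: %d\n", flow_demo(1000));
    return 0;
}
```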

  10. Each parallel function of an application is executed as a kernel • That means GPUs are programmed as a sequence of kernels; typically, each kernel completes execution before the next kernel begins. • Fermi-architecture GPUs have some support for running multiple independent kernels simultaneously, but most kernels are large enough to fill the entire machine.
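A sketch of "a sequence of kernels", using two hypothetical kernels. Both are launched on the default stream, so the second starts only after the first has completed, which is what makes it safe for the second kernel to read what the first wrote.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel 1 (hypothetical): set every element to the same value.
__global__ void fill(int *a, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = value;
}

// Kernel 2 (hypothetical): reads what kernel 1 wrote and adds the index.
__global__ void add_index(int *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += i;
}

// Hypothetical helper: returns 1 if a[i] == 10 + i for every i.
int kernel_sequence_demo(int n) {
    size_t bytes = n * sizeof(int);
    int *h = (int *)malloc(bytes);
    int *d;
    cudaMalloc(&d, bytes);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fill<<<blocks, threads>>>(d, n, 10);
    // Same (default) stream: add_index begins only after fill has finished.
    add_index<<<blocks, threads>>>(d, n);

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);  // waits for add_index
    int ok = 1;
    for (int i = 0; i < n; ++i) if (h[i] != 10 + i) ok = 0;
    cudaFree(d);
    free(h);
    return ok;
}

int main(void) {
    printf("sequence correct: %d\n", kernel_sequence_demo(1000));
    return 0;
}
```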

  11. Image copied from http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf

  12. Hello World! Example • __global__ is a CUDA C/C++ keyword meaning: • mykernel() will be executed on the device • mykernel() will be called from the host. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
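The slide's code is not in the transcript. A minimal sketch of the usual pattern, with two assumptions beyond the slide: the kernel prints from the device (device-side printf needs compute capability 2.0 or later), and the launch is wrapped in a hypothetical helper launch_hello that reports errors.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __global__: mykernel() runs on the device and is called from the host.
__global__ void mykernel(void) {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

// Hypothetical helper: launches the kernel, returns 1 on success.
int launch_hello(void) {
    mykernel<<<1, 1>>>();                          // 1 block of 1 thread
    if (cudaGetLastError() != cudaSuccess) return 0;  // launch errors
    // Wait for the kernel so its printf output is flushed.
    return cudaDeviceSynchronize() == cudaSuccess;
}

int main(void) {
    if (!launch_hello()) return 1;
    printf("Hello World!\n");
    return 0;
}
```

With nvcc, a file such as hello.cu (an assumed name) builds with `nvcc hello.cu -o hello`.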

  13. Addition Example • Since add() runs on the device, pointers a, b, and c must point to device memory. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
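A sketch of the single-integer addition, assuming the common form of this example: the kernel add() dereferences three device pointers, and the host allocates device memory, copies the inputs in, launches one thread, and copies the sum out. The wrapper add_on_device is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Runs on the device, so a, b, and c must all point to device memory.
__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

// Hypothetical helper: adds two integers on the GPU.
int add_on_device(int a, int b) {
    int c = 0;
    int *d_a, *d_b, *d_c;               // device copies of a, b, c
    int size = sizeof(int);

    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);

    cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

    add<<<1, 1>>>(d_a, d_b, d_c);       // one block, one thread

    cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}

int main(void) {
    printf("2 + 7 = %d\n", add_on_device(2, 7));
    return 0;
}
```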

  14. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

  15. Vector Addition Example • Kernel function. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

  16. The main() function. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
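A sketch of vector addition with one block per element, assuming the kernel indexes by blockIdx.x and is launched as add<<<N, 1>>>. The inputs are filled deterministically rather than randomly so the result can be checked; the helper vector_add_blocks is hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 512

// One block per element: block b computes c[b].
__global__ void add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

// Hypothetical helper: returns 1 if c[i] == a[i] + b[i] for all i.
int vector_add_blocks(void) {
    size_t size = N * sizeof(int);
    int *a = (int *)malloc(size), *b = (int *)malloc(size), *c = (int *)malloc(size);
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    add<<<N, 1>>>(d_a, d_b, d_c);       // N blocks, 1 thread each

    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    int ok = 1;
    for (int i = 0; i < N; ++i) if (c[i] != a[i] + b[i]) ok = 0;

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return ok;
}

int main(void) {
    printf("vector add (blocks) correct: %d\n", vector_add_blocks());
    return 0;
}
```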

  17. Alternative 1. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
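One common alternative to one-block-per-element, and plausibly the one on this slide (an assumption, since the slide image is missing), uses a single block with one thread per element: the kernel indexes by threadIdx.x and is launched as add<<<1, N>>>. N must not exceed the device's maximum threads per block; 512 is within the limit on all CUDA GPUs. The helper vector_add_threads is hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 512   // must be <= maxThreadsPerBlock

// One thread per element inside a single block: thread t computes c[t].
__global__ void add(int *a, int *b, int *c) {
    c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];
}

// Hypothetical helper: returns 1 if c[i] == a[i] + b[i] for all i.
int vector_add_threads(void) {
    size_t size = N * sizeof(int);
    int *a = (int *)malloc(size), *b = (int *)malloc(size), *c = (int *)malloc(size);
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    add<<<1, N>>>(d_a, d_b, d_c);       // 1 block, N threads

    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    int ok = 1;
    for (int i = 0; i < N; ++i) if (c[i] != a[i] + b[i]) ok = 0;

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return ok;
}

int main(void) {
    printf("vector add (threads) correct: %d\n", vector_add_threads());
    return 0;
}
```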

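  18. Alternative 2: use both blocks and threads, giving each thread a unique global index:
      int globalThreadId = threadIdx.x + blockIdx.x * M;  // M is the number of threads in a block
    or, equivalently, using the built-in variable blockDim.x:
      int globalThreadId = threadIdx.x + blockIdx.x * blockDim.x;
    Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf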
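A worked check of the global-index formula. With M = 8 threads per block, thread 5 of block 2 gets 5 + 2 * 8 = 21. The sketch also launches a kernel in which every thread writes its own index, confirming that the indices cover 0 .. BLOCKS * M - 1 exactly once. The names global_thread_id, record_ids, ids_are_dense, and BLOCKS are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define M 8        // threads per block
#define BLOCKS 4   // number of blocks (assumed for the demo)

// The slide's formula, usable on both host and device.
__host__ __device__ int global_thread_id(int thread, int block, int threadsPerBlock) {
    return thread + block * threadsPerBlock;
}

// Each thread writes its own global index into the output array.
__global__ void record_ids(int *out) {
    int id = global_thread_id(threadIdx.x, blockIdx.x, blockDim.x);
    out[id] = id;
}

// Hypothetical helper: returns 1 if out[i] == i for every i.
int ids_are_dense(void) {
    int h[M * BLOCKS];
    int *d;
    cudaMalloc(&d, sizeof(h));
    cudaMemset(d, 0xFF, sizeof(h));     // every int starts as -1
    record_ids<<<BLOCKS, M>>>(d);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    for (int i = 0; i < M * BLOCKS; ++i) if (h[i] != i) return 0;
    return 1;
}

int main(void) {
    // Thread 5 of block 2 with 8 threads per block: 5 + 2 * 8 = 21.
    printf("global id = %d\n", global_thread_id(5, 2, M));
    printf("ids dense: %d\n", ids_are_dense());
    return 0;
}
```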

  19. So the kernel becomes: copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

  20. The main() function becomes: copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
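A sketch of the combined blocks-and-threads version, assuming N is an exact multiple of THREADS_PER_BLOCK (both values here are assumptions) so that N / THREADS_PER_BLOCK blocks cover every element. The helper vector_add_blocks_threads is hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N (2048 * 2048)
#define THREADS_PER_BLOCK 512   // N must be a multiple of this

// Each thread handles the element at its global index.
__global__ void add(int *a, int *b, int *c) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    c[index] = a[index] + b[index];
}

// Hypothetical helper: returns 1 if c[i] == a[i] + b[i] for all i.
int vector_add_blocks_threads(void) {
    size_t size = N * sizeof(int);
    int *a = (int *)malloc(size), *b = (int *)malloc(size), *c = (int *)malloc(size);
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // N / THREADS_PER_BLOCK blocks, THREADS_PER_BLOCK threads each.
    add<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c);

    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    int ok = 1;
    for (int i = 0; i < N; ++i) if (c[i] != a[i] + b[i]) ok = 0;

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return ok;
}

int main(void) {
    printf("vector add (blocks + threads) correct: %d\n", vector_add_blocks_threads());
    return 0;
}
```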

  21. Handling Arbitrary Vector Sizes. Copied from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
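When the vector length n is not a multiple of the block size, a common approach (assumed here, since the slide image is missing) is to round the block count up and have surplus threads skip work with a bounds check. With n = 1000 and M = 256, (1000 + 255) / 256 = 4 blocks launch 1024 threads, and the last 24 do nothing. The helper vector_add_any_size is hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define M 256   // threads per block (assumed value)

__global__ void add(int *a, int *b, int *c, int n) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n)                      // threads past the end do nothing
        c[index] = a[index] + b[index];
}

// Hypothetical helper: returns 1 if c[i] == a[i] + b[i] for all i < n.
int vector_add_any_size(int n) {
    size_t size = n * sizeof(int);
    int *a = (int *)malloc(size), *b = (int *)malloc(size), *c = (int *)malloc(size);
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    int blocks = (n + M - 1) / M;       // round up so every element gets a thread
    add<<<blocks, M>>>(d_a, d_b, d_c, n);

    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    int ok = 1;
    for (int i = 0; i < n; ++i) if (c[i] != a[i] + b[i]) ok = 0;

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return ok;
}

int main(void) {
    printf("n = 1000 correct: %d\n", vector_add_any_size(1000));
    return 0;
}
```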
