
  1. Team Members: Tyler Drake, Robert Wrisley, Kyle Von Koepping, Justin Walsh. Faculty Advisors: Computer Science – Prof. Sanjay Rajopadhye; Electrical & Computer Engineering – Prof. Olivera Notaros

  2. Project Goals: To develop parallel versions of applications that run on a graphics card and to measure their performance. • Started with a simple Matrix Multiply program. • We intend to develop at least one or two additional applications and to pursue an analysis of hardware optimizations. • Develop a process for tuning applications & hardware that other developers can reuse more easily.

  3. Tyler Drake – Computer Science major • Robert Wrisley – Computer Science/Computer Engineering dual major • Kyle Von Koepping – Electrical Engineering major • Justin Walsh – Computer Science/Computer Engineering dual major • Shared coding responsibilities • Enables comparison and greater understanding for all team members • Possibly divide responsibilities for the second half of the project

  4. Moore’s Law • Transistor density on processors has doubled approximately every 18 to 24 months. • This trend has held since Gordon Moore first observed it in 1965 and is expected to hold for several more years. • This natural trend became the standard performance roadmap for hardware companies.

  5. Limits of Moore’s Law • There is an ultimate limit to Moore’s Law. • Transistors will soon reach atomic scale. • Moore’s Law does not apply to Random Access Memory (RAM) speeds or hard-drive seek times (the so-called Memory Wall). • The redesign of processor architecture isn’t driven directly by Moore’s Law, but by the fact that these and other factors have not kept up with its growth rate.

  6. The Graphics Card • The CPU (or multiple CPUs) is not the only processor found in a personal computer. • The graphics card carries a graphics processing unit (GPU). • The GPU is specifically designed to render 3D models onto a 2D display. • It is built for floating-point computation with a highly parallel architecture.

  7. CUDA • Engineers have begun to exploit the highly parallel architecture of the GPU for general applications. • Graphics companies encourage general-purpose computing on the GPU (GPGPU). • Nvidia has developed CUDA (Compute Unified Device Architecture). • Because CUDA is based on the C language, programmers can easily shift to developing on the GPU.

  8. What We Have Done So Far

  9. What Have We Been Doing? • Learning about CUDA • NVIDIA CUDA guides • Lecture slides from University of Illinois, Urbana-Champaign • Papers from various academic groups • University of Illinois, Urbana-Champaign • Tokyo Institute of Technology • University of California at Berkeley • Learning to write parallel programs in CS475 using MPI & OpenMP • Writing simple programs using CUDA and observing performance • Matrix Multiply

  10. Results and Optimizations • Results • Achieved 131 GFLOPS on a GTX 280 with N = 1024; the GTX 280’s peak is 933 GFLOPS. • Optimizations • Tiling the result matrix into smaller sub-matrices, with each thread block computing one sub-matrix, reduces the amount of data each thread block must load from global memory. • This helps hide memory latency.
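The tiling scheme described on this slide can be illustrated with a sequential C sketch; the tile size and function name here are illustrative, not the team's actual code. In the CUDA version, each TILE × TILE sub-matrix of the result would be computed by one thread block, which stages the corresponding tiles of A and B in shared memory:

```c
#include <stddef.h>

#define TILE 16  /* illustrative tile width; a CUDA block might use 16x16 threads */

/* C = A * B for n x n row-major matrices, computed tile by tile.
   Processing one TILE x TILE sub-matrix of A and B at a time mirrors how a
   thread block would stage tiles in shared memory to cut global-memory loads. */
static void matmul_tiled(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = 0.0f;

    for (size_t ii = 0; ii < n; ii += TILE)          /* tile row of C */
        for (size_t jj = 0; jj < n; jj += TILE)      /* tile column of C */
            for (size_t kk = 0; kk < n; kk += TILE)  /* tiles along the dot product */
                for (size_t i = ii; i < ii + TILE && i < n; i++)
                    for (size_t j = jj; j < jj + TILE && j < n; j++) {
                        float sum = 0.0f;
                        for (size_t k = kk; k < kk + TILE && k < n; k++)
                            sum += A[i * n + k] * B[k * n + j];
                        C[i * n + j] += sum;
                    }
}
```

Because each loaded element of A and B is reused across a whole tile, the number of global-memory loads drops by roughly a factor of TILE compared with a naive kernel.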

  11. Significant Lessons Learned and Other Useful Notes • Memory • Must allocate memory on the graphics card from the main program being run on the CPU • Memory for the graphics card is explicitly managed by the programmer • An “extension” to C, not a separate language • Similar to MPI, OpenMP, etc.

  12. Where is our project headed? Increasing problem complexity • Some are no longer “Pleasantly Parallel” • Higher degree of kernel analysis • Moving to more dynamic programs

  13. Additional programs being written for the GPU include: • Scan: prefix-sum computation in which the ith element of the output is the sum of the first i input elements (an exclusive scan sums only the previous i−1). • Knapsack: profit maximization given a capacity and a list of items with their weights & profits • Matrix Multiply for still larger matrices • Triangular Matrix Multiplication
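The scan listed above can be stated as a short sequential C sketch (the function name is illustrative, not from the project's code). The GPU version would compute the same result in parallel, typically with a tree-based up-sweep/down-sweep over thread blocks:

```c
#include <stddef.h>

/* Inclusive scan (prefix sum): out[i] = in[0] + in[1] + ... + in[i].
   Sequential reference version; a work-efficient GPU scan produces the
   same output using parallel up-sweep and down-sweep passes. */
static void scan_inclusive(const int *in, int *out, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; i++) {
        running += in[i];      /* running total of the first i+1 elements */
        out[i] = running;
    }
}
```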

  14. Potential Applications Mandelbrot Set • Pleasantly parallel, familiar • Easily scalable
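The per-pixel computation behind a Mandelbrot render can be sketched in C for a single point (the function name and iteration cap are illustrative, not the team's code). On the GPU, one thread would run this for its own pixel, which is exactly the pleasantly parallel structure the slide notes:

```c
/* Escape-time iteration for one point c = cr + ci*i of the complex plane.
   Returns the iteration at which |z| exceeds 2 (point escapes), or
   max_iter if it never does (point is likely in the Mandelbrot set).
   Every pixel is independent, so each GPU thread can own one pixel. */
static int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    for (int it = 0; it < max_iter; it++) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)
            return it;              /* escaped: outside the set */
        double new_zr = zr2 - zi2 + cr;   /* z = z^2 + c */
        zi = 2.0 * zr * zi + ci;
        zr = new_zr;
    }
    return max_iter;                /* never escaped within the budget */
}
```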

  15. Potential Applications Ray Tracing • Very computationally intensive • Feasible for non-realtime computations • Very dynamic, due to recursion • High degree of realism

  16. Potential Applications Examples of images generated by Ray Tracing

  17. Potential Applications Hidden Markov Models • Clear parallelism • Wide range of applications
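The parallelism the slide mentions shows up clearly in one step of the HMM forward algorithm, sketched sequentially in C below; the names and the toy dimensions are illustrative, not from the project:

```c
#include <stddef.h>

/* One step of the HMM forward algorithm:
   alpha[j] = emit[j][obs] * sum_i alpha_prev[i] * trans[i][j].
   The updates for different states j are independent given alpha_prev,
   so on a GPU each thread (or block) can own one state j.
   n = number of hidden states, n_obs = number of observation symbols;
   matrices are row-major. */
static void forward_step(const double *alpha_prev, double *alpha,
                         const double *trans, const double *emit,
                         int obs, size_t n, size_t n_obs)
{
    for (size_t j = 0; j < n; j++) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += alpha_prev[i] * trans[i * n + j];
        alpha[j] = emit[j * n_obs + obs] * sum;
    }
}
```

Running this step once per observation yields the forward probabilities; the sequential dependence is only across time steps, not across states.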

  18. Potential Applications Uses of Hidden Markov Models

  19. To develop a more complex application for the GPU and optimize its performance • To analyze hardware optimizations and evaluate the performance gains • To develop a process for future programmers that yields the largest performance gains for the least development effort • Please Note: These goals are tentative and subject to change.

  20. Moore’s Law is now being applied to cores per processor instead of transistors per processor. • Multi-core machines offer the next generation of performance enhancements, and they are already here! • GPUs provide massively parallel architectures that programmers can exploit to see phenomenal performance gains.

  21. Learning to use the CUDA library and some of its nuances. We have achieved good performance with our Matrix-Multiply attempts and are completing CUDA versions of the Scan and Knapsack problems. Next we will move on to a more complex application, research hardware optimizations that can further enhance performance on GPUs, and develop a combined approach for future application programmers to follow.

  22. $50 spent on a CUDA-compatible graphics card. • We’d like to thank Prof. Dan Connors for the use of his machines with Nvidia GTX 280 graphics cards. • This gave us free access to a consistent build on which all of us could run our code and sample code. • We don’t project any major costs next semester, except perhaps some materials for our E-Days presentation.
