
Graph Analysis with High Performance Computing


Presentation Transcript


  1. Graph Analysis with High Performance Computing, by Bruce Hendrickson and Jonathan W. Berry, Sandia National Laboratories. Published in the March/April 2008 issue of Computing in Science and Engineering. Presented by Darlene Barker, 2/9/11.

  2. Overview • Explored the use of high-performance computing to analyze large, complex graphs • Presented the challenges of running graph algorithms with explicit message passing (MPI) on distributed-memory computers • Proposed a solution: developing graph algorithms on nontraditional, massively multithreaded supercomputers

  3. Distributed-memory computers • The most popular class of parallel machines, programmed with explicit message passing (MPI) • The user divides the data among processors and determines which processor performs which task; the processors exchange data via user-controlled messages (a minimal sketch follows)
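For concreteness, here is a minimal sketch of the explicit message-passing style the slide describes. It is a hypothetical two-rank exchange, not code from the paper:

```cpp
// Minimal MPI sketch: rank 0 owns a piece of data and must explicitly
// send it; rank 1 must explicitly receive it. Illustrative only.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int value = 0;
    if (size >= 2) {
        if (rank == 0) {
            value = 42;  // data owned by rank 0
            MPI_Send(&value, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, /*source=*/0, /*tag=*/0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            std::printf("rank 1 received %d\n", value);
        }
    }

    MPI_Finalize();
    return 0;
}
```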

  4. Alternatives to using explicit message passing (MPI) to program distributed-memory parallel computers:

  5. UPC • In a UPC (Unified Parallel C) program, the number of control threads is constant and generally equal to the number of processors or cores.

  6. Cache-coherent parallel computers • Global memory is universally accessible to every processor, which presents challenges such as latency: the coherence hardware speeds access to memory, but it adds overhead that degrades performance • Requires a protocol for thread synchronization and scheduling

  7. Massively multithreaded architecture • Examples: Cray MTA-2, XMT • Addresses the latency challenge by ensuring that the processor always has other work to do while waiting for a memory request to be satisfied • When a memory request is issued, the processor immediately switches its attention to another thread that's ready to execute (a rough software analogy follows)
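The MTA-2's zero-cost hardware thread switching has no portable software equivalent, so the following is only a loose analogy (all names illustrative): oversubscribing with many OS threads gestures at the idea that while one thread stalls on memory, another can run.

```cpp
// Loose analogy to latency hiding: far more threads than cores, so the
// scheduler can run another thread while one waits on memory.
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

std::atomic<long> sum{0};

void visit_range(const std::vector<int>& data, std::size_t lo, std::size_t hi) {
    long local = 0;
    for (std::size_t i = lo; i < hi; ++i)
        local += data[i];              // irregular graph loads would stall here
    sum += local;
}

int main() {
    std::vector<int> data(1 << 20, 1);
    const std::size_t kThreads = 64;   // deliberately more threads than cores
    const std::size_t chunk = data.size() / kThreads;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < kThreads; ++t)
        pool.emplace_back(visit_range, std::cref(data), t * chunk, (t + 1) * chunk);
    for (auto& th : pool) th.join();
    std::printf("sum = %ld\n", sum.load());
    return 0;
}
```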

  8. Drawbacks • Custom rather than commodity processors, which are expensive and have a much slower clock rate than mainstream processors • The MTA-2's programming model, although simple and elegant, is not portable to other architectures

  9. To fix the cross-architecture problem with the MTA-2 programming model: use generic programming libraries that hide machine specifics. Generic programming underlies, e.g.: • C++ Standard Template Library • Boost C++ Libraries • Boost Graph Library (BGL) The authors pair the massively multithreaded architecture with an extended subset of the Boost Graph Library.
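A small example of the generic-programming style the slide refers to, using the standard BGL (the graph here is illustrative, not the authors' extended multithreaded subset):

```cpp
// BGL breadth-first search on a 5-vertex path graph; the same generic
// algorithm works for any graph type that models BGL's concepts.
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/breadth_first_search.hpp>
#include <boost/graph/visitors.hpp>
#include <iostream>
#include <vector>

int main() {
    using Graph = boost::adjacency_list<boost::vecS, boost::vecS,
                                        boost::undirectedS>;
    Graph g(5);
    boost::add_edge(0, 1, g);
    boost::add_edge(1, 2, g);
    boost::add_edge(2, 3, g);
    boost::add_edge(3, 4, g);

    // Record BFS distances from vertex 0 into a plain vector.
    std::vector<int> dist(boost::num_vertices(g), 0);
    boost::breadth_first_search(
        g, boost::vertex(0, g),
        boost::visitor(boost::make_bfs_visitor(
            boost::record_distances(&dist[0], boost::on_tree_edge()))));

    std::cout << "distance to vertex 4: " << dist[4] << '\n';  // prints 4
    return 0;
}
```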

  10. Studied two fundamental graph algorithms on different platforms • s-t connectivity: find a path from vertex s to vertex t that traverses the fewest possible edges • Single-Source Shortest Paths (SSSP): find the shortest-length path from a specific vertex to all other vertices in the graph (a plain sketch of s-t connectivity follows)
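A minimal sketch of the s-t connectivity problem as a plain breadth-first search; the graph representation and names are illustrative, not the paper's implementation:

```cpp
// Returns the minimum number of edges on a path from s to t, or -1 if
// no path exists. adj[v] lists the neighbors of vertex v.
#include <queue>
#include <vector>

int st_distance(const std::vector<std::vector<int>>& adj, int s, int t) {
    std::vector<int> dist(adj.size(), -1);   // -1 marks "unvisited"
    std::queue<int> q;
    dist[s] = 0;
    q.push(s);
    while (!q.empty()) {
        int v = q.front(); q.pop();
        if (v == t) return dist[v];          // first time t is popped is optimal
        for (int w : adj[v]) {
            if (dist[w] == -1) {
                dist[w] = dist[v] + 1;
                q.push(w);
            }
        }
    }
    return -1;                               // t unreachable from s
}
```

Dropping the early return and letting the search run to completion turns the same routine into SSSP for unweighted graphs; weighted graphs would need Dijkstra's algorithm instead.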

  11. Focused on two different classes of graphs • Erdős–Rényi random graphs: constructed by assigning a uniform edge probability to each possible edge and then using a random number generator to determine which edges exist (see the sketch below) • Inverse power-law graphs (R-MAT): constructed by recursively adding adjacencies to a matrix in an intentionally uneven way
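A minimal sketch of the Erdős–Rényi G(n, p) construction just described; parameter names are illustrative:

```cpp
// Flip a biased coin for each of the n*(n-1)/2 possible undirected edges.
#include <random>
#include <utility>
#include <vector>

std::vector<std::pair<int, int>> erdos_renyi(int n, double p, unsigned seed) {
    std::mt19937 gen(seed);
    std::bernoulli_distribution edge_exists(p);  // uniform edge probability
    std::vector<std::pair<int, int>> edges;
    for (int u = 0; u < n; ++u)
        for (int v = u + 1; v < n; ++v)
            if (edge_exists(gen))
                edges.emplace_back(u, v);
    return edges;
}
```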

  12. Example of an Erdős–Rényi random graph (not from the paper): http://upload.wikimedia.org/wikipedia/commons/1/13/Erdos_generated_network-p0.01.jpg

  13. Results • Only the MTA-2 has a programming model and architecture sufficiently robust to easily test instances of inverse power law graphs with close to a billion edges.

  14. Challenges for distributed-memory machines • Handling high-degree vertices with the standard scientific-computing practice of storing ghost nodes • High-degree vertices require very large message buffers • Ghost nodes help runtime scalability but limit memory scalability (an illustrative sketch follows)
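A hypothetical illustration of the ghost-node idea on a single distributed-memory rank (the struct and field names are my own, not from the paper):

```cpp
// One rank's view of a partitioned graph: locally owned vertices plus
// read-only "ghost" copies of remote neighbors.
#include <unordered_map>
#include <vector>

struct PartitionedGraph {
    // Adjacency lists for the vertices this rank owns.
    std::vector<std::vector<int>> local_adj;
    // Ghosts: remote vertices adjacent to local ones, mapped to the rank
    // that owns them. A vertex of degree d can appear as a ghost on up to
    // d ranks, which is why high-degree vertices blow up memory use.
    std::unordered_map<int, int> ghost_owner;
};
```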

  15. s-t connectivity

  16. Conclusion - 1 • Unlike most scientific computing kernels, graph algorithms exhibit complex memory-access patterns and little actual processing • Performance is determined by the computer's ability to access memory, not by processor speed • The authors believe a broad trend exists in the scientific computing community towards increasingly complex and memory-limited simulations

  17. Conclusion - 2 • With current microprocessors having silicon area to spare, the authors believe that this space should be used to support massive multithreading, resulting in processors and parallel machines applicable to a broader range of problems than current offerings
