
Breakout Group: Debugging

David E. Skinner and Wolfgang E. Nagel. IESP Workshop 3, October, Tsukuba, Japan.


Presentation Transcript


  1. Breakout Group: Debugging
     David E. Skinner and Wolfgang E. Nagel
     IESP Workshop 3, October, Tsukuba, Japan

  2. Exascale Debugging
     • Debugging: finding problems in the execution of code. Identifying and dealing with sources of:
       • incorrectness (application and architecture)
       • application failure (deadlock, hang, segfault)
       • critical application bottlenecks (standstill, performance cliff)
     • Exascale issues
       • Concurrency expense of debugging
       • Scalability of debugger methodologies (data and interfaces)
       • Concurrency scaling of the frequency of errors/failures
       • Heterogeneity and lightweight OS

  3. Exascale Trends relevant to debugging
     To which broad exascale trends is debugging related?
     • Concurrency ✓
     • Reliability ✓
     • Power Costs
     • Heterogeneity in a node ✓
     • I/O and memory: ratios and breakthroughs

  4. What’s different about exascale debugging?
     • Assumption that many things may/will go wrong at the same time will require triage, filtering, and clustering of faults and problems
     • Focus on multi-level debugging, communicating details of faults between software layers
     • Synthesis of fault information into understanding in the context of application and architecture
     • Simulation of concurrency when possible
     • Excision of buggy code snippets to run at lower concurrencies
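
As a rough illustration of the "triage, filtering, and clustering" idea on this slide, the following minimal sketch groups processes whose call stacks are identical, so that a handful of equivalence classes can be inspected instead of every rank. The function name `cluster_by_stack`, the rank-to-frames dictionary, and the sample frame names are all hypothetical; the slides do not prescribe any particular clustering method.

```python
from collections import defaultdict


def cluster_by_stack(stacks):
    """Group ranks whose call stacks are identical.

    `stacks` maps a rank (int) to a sequence of frame names, outermost first.
    Returns a list of (frames, ranks) pairs, smallest groups first, because
    the outliers are often the ranks worth looking at first.
    """
    groups = defaultdict(list)
    for rank, frames in stacks.items():
        groups[tuple(frames)].append(rank)
    return sorted(
        ((frames, sorted(ranks)) for frames, ranks in groups.items()),
        key=lambda item: (len(item[1]), item[1]),
    )


# Hypothetical snapshot of four ranks (illustrative data, not from the slides).
stacks = {
    0: ("main", "solve", "MPI_Allreduce"),
    1: ("main", "solve", "MPI_Allreduce"),
    2: ("main", "solve", "exchange_halo", "MPI_Recv"),
    3: ("main", "solve", "MPI_Allreduce"),
}

clusters = cluster_by_stack(stacks)
for frames, ranks in clusters:
    print(f"{len(ranks)} rank(s) {ranks}: {' > '.join(frames)}")
```

Here rank 2 forms its own group, which is the kind of outlier a triage step would surface first. At real scale the grouping would have to be done hierarchically across the machine rather than in a single dictionary.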

  5. Debugging Priority Research Direction (use one slide for each)
     Key challenges / Summary of research direction
     • Basic challenge of concurrency (hard & $$)
     • Interoperability with compiler, library, runtime, OS and I/O
     • Debugging without stopping (resilient analysis of victim processes)
     • Vertical integration of debug and performance information across software layers
     • Layered contexts of debugging (just MPI, just I/O, or framework/application defined)
     • Scalable clustering of application process states and contexts. Filter/search within debugger
     • Automatically triggered debugging
     Potential impact on software component / Potential impact on usability, capability, and breadth of community
     • More eyes on debug information besides the person running the debugger
     • Multi-layered debug histories become available/useful to system-wide monitoring
     • Debugging meets performance analysis
     • Debugging informs system software
     • Lowering overhead and barriers to debugging at large scale
     • Debuggers begin to communicate user level metrics, debugging becomes more meaningful
     • Greater certainty in scientific validity of exascale’s computational results. Trust.
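
The "debugging without stopping" and "automatically triggered debugging" items can be sketched at toy scale with Python's standard `faulthandler` module, which writes the stack traces of all threads to a file while the process keeps running. This is only a single-process stand-in for what the slide envisions; the `check_progress` trigger and its step-counter arguments are hypothetical and not taken from the presentation.

```python
import faulthandler
import tempfile


def snapshot_stacks():
    """Return the stack traces of all threads as text; the process keeps running."""
    with tempfile.TemporaryFile(mode="w+") as f:
        # faulthandler writes directly to the file descriptor behind `f`.
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


def check_progress(last_step, current_step):
    """Hypothetical trigger: take a snapshot only if the step counter has stalled."""
    if current_step == last_step:
        return snapshot_stacks()
    return None


# A stalled step counter triggers a snapshot; an advancing one does not.
stalled_report = check_progress(10, 10)
healthy_report = check_progress(10, 11)
print(stalled_report)
```

In a real setting the trigger would more plausibly be a watchdog or a signal from the runtime, and the snapshot would be fed into an aggregation step rather than printed.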

  6. UR Graph: Roadmap for exascale debugging
     [Figure: timeline from 2010 to 2019 plotted against "Scale of debugging". Labels on the chart: Planning & Workshops; LWDB @ 1e5 cores; Breakthroughs needed for 1e6 core production debug; Simulation @ 1e6 cores; Near-production exascale.]

  7. 4.x. Debugging narrative
     • Technology drivers
     • Alternative R&D strategies
     • Recommended research agenda
     • Crosscutting considerations

  8. Roadmap sections on debugging tools
     • Technology drivers for Debugging
     • Alternative R&D strategies for Debugging
     • Recommended research agenda for Debugging
       + Identify cross-cutting considerations and connections (compilers, resiliency and performance)
       + Identify key regional interests, expertise, and resources

  9. State of the art
     • Debuggers scale to 10K procs
     • Vendors are developing solutions for new debugging contexts (memory, communication, etc.)
     • Some progress in clustering and data aggregation
