Technology Drivers

walt

Presentation Transcript


  1. Technology Drivers
     • Traditional HPC application drivers
     • OS noise, resource monitoring and management, memory footprint
     • Complexity of resources to be managed
     • New and evolving programming models
     • Shifting emphasis from managing cycles to managing data
     • Programming models require more access to resource management decisions
     • Hybrid/mixed programming models (composing applications)
     • Node and memory structures
     • On-node RAM, DRAM, Flash
     • Stacked memory (performance implications for different access patterns)
     • Explicit cache/hierarchy management
     • On-node interconnect
     • Heterogeneous cores
     • On-node power management
     • Global structures
     • Global address space
     • Integration of collectives, esp. synchronization
     • Resilience (soft errors and damaged cores)
     • HPC OS sustainability
     Increasing importance and complexity of resource management
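The "OS noise" driver on this slide can be made concrete with a small measurement. The following is a minimal sketch, not something taken from the slides: it times a fixed quantum of work many times and reports how far the samples spread above the fastest run. The function name `measure_noise` and its parameters are hypothetical, and in an interpreted language the spread also includes interpreter jitter, so the numbers are only indicative.

```python
import time


def measure_noise(iterations=200, work=20000):
    """Time a fixed quantum of work repeatedly and summarize the spread.

    Hypothetical helper for illustration: the fastest sample approximates
    the undisturbed cost, and anything above it is treated as interference
    from the OS (or, here, the interpreter).
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        acc = 0
        for i in range(work):  # the fixed work quantum
            acc += i
        samples.append(time.perf_counter_ns() - start)

    fastest = min(samples)
    slowest = max(samples)
    mean = sum(samples) / len(samples)
    return {
        "min_ns": fastest,
        "max_ns": slowest,
        "mean_ns": mean,
        # Average slowdown relative to the fastest run; the max(..., 1)
        # guards against a zero reading on a coarse timer.
        "mean_overhead": (mean - fastest) / max(fastest, 1),
    }


if __name__ == "__main__":
    print(measure_noise())
```

In a bulk-synchronous application every process waits for the slowest one at each synchronization point, so the gap between `max_ns` and `min_ns` matters more at scale than the mean does.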

  2. Alternate R&D Strategies
     • Evolve an existing OS
     • Linux, Plan 9, IBM CNK, Kitten
     • Start with an empty emacs buffer
     • Steal components from existing operating systems
     • Partitioning resources – independent management within a partition
     • Composability
     • Collective/Global OS
     • Global address space?
     It’s time to define the winner

  3. Research Agenda
     • HPC Community OS
     • Define basic structure
     • Individual groups work on components
     • Expose management of critical resources
     • Simulation to evaluate scalability of resource management strategies
     • Enable co-design of hardware to support resource management
     • Define and implement OS mechanisms that will enable global, autonomic runtime systems

  4. Priority Research Direction: Community OS Framework for HPC Systems
     Key challenges
     • HPC applications have unique resource management needs (e.g., memory layout)
     • Anticipated rapid evolution/revolution in architectures and programming models
     • Limited ability to innovate in existing commodity operating systems
     • Sustainability of HPC OS is difficult
     Summary of research direction
     • Develop an OS framework specific to the needs of HPC
     • Open system architecture that exposes the management of critical resources
     • Empower developers of libraries and runtime systems
     Potential impact on software component; potential impact on usability, capability, and breadth of community
     • Context for individual innovation and contribution
     • Common foundation for libraries and runtime environments
     • This will enable full access to hardware resources
     Timeframe: 2-3 years

  5. Priority Research Direction: Scalable System Simulation
     Key challenges
     • Inability to conduct “apples to apples” comparisons in scalable resource management
     • Evolution / revolution in new systems
     • Wide variety of existing simulators
     Summary of research direction
     • Develop a scalable, full system simulation capability
     • Address multi-scale challenges
     • Adapt techniques that have been used in other branches of computational science
     • Develop common interfaces between simulators
     Potential impact on software component; potential impact on usability, capability, and breadth of community
     • Ability to evaluate resource management mechanisms and policies at scale
     • Enable architecture/OS co-design
     • Critical for the OS research/development community
     • Important for runtime community
     Timeframe: 2-4 years

  6. Priority Research Direction: Open System APIs
     Key challenges
     • Communication management
     • Thread management
     • Memory management
     • Power management
     • Resilience (fault/failure isolation/management)
     Summary of research direction
     • Develop community-based APIs to expose critical resources
     • Develop prototype runtime environments for common programming models
     Potential impact on software component; potential impact on usability, capability, and breadth of community
     • Critical for supporting the development of new programming models
     • Critical for enabling the development of new architectures
     • Provides a fixed point for innovation in API implementation and innovation in the implementation of runtimes (hourglass principle)
     • Differentiation based on performance, not functionality
     Timeframe: 3 to 8 years
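The hourglass principle named on this slide can be illustrated with a small sketch. Everything below is hypothetical and invented for illustration (the `MemoryAPI` interface, both allocator classes, and `run_workload`); the slides do not define any concrete API. The point is only the shape: one narrow, fixed interface in the middle, several interchangeable implementations beneath it, and runtime code above it that depends on the interface alone.

```python
from abc import ABC, abstractmethod


class MemoryAPI(ABC):
    """Hypothetical narrow-waist API: the fixed point of the hourglass."""

    @abstractmethod
    def allocate(self, nbytes: int) -> int:
        """Reserve nbytes and return an opaque handle."""

    @abstractmethod
    def release(self, handle: int) -> None:
        """Give back a previously allocated handle."""

    @abstractmethod
    def in_use(self) -> int:
        """Return the total number of bytes currently allocated."""


class BumpAllocator(MemoryAPI):
    """One implementation below the waist: never reuses a handle."""

    def __init__(self):
        self._next = 0
        self._live = {}

    def allocate(self, nbytes):
        handle = self._next
        self._next += 1
        self._live[handle] = nbytes
        return handle

    def release(self, handle):
        del self._live[handle]

    def in_use(self):
        return sum(self._live.values())


class RecyclingAllocator(MemoryAPI):
    """Another implementation: hands freed handles out again."""

    def __init__(self):
        self._next = 0
        self._free = []
        self._live = {}

    def allocate(self, nbytes):
        if self._free:
            handle = self._free.pop()
        else:
            handle = self._next
            self._next += 1
        self._live[handle] = nbytes
        return handle

    def release(self, handle):
        del self._live[handle]
        self._free.append(handle)

    def in_use(self):
        return sum(self._live.values())


def run_workload(mem: MemoryAPI) -> int:
    """Runtime code above the waist: it sees only MemoryAPI."""
    a = mem.allocate(64)
    mem.allocate(128)
    mem.release(a)
    mem.allocate(32)
    return mem.in_use()


if __name__ == "__main__":
    for impl in (BumpAllocator(), RecyclingAllocator()):
        print(type(impl).__name__, run_workload(impl))
```

Both implementations return the same result for the same workload, so a runtime could switch between them freely; under this design they would compete on speed or footprint rather than on what they can do.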

  7. 4.1 Operating Systems
     Roadmap figure with a timeline axis from 2010 to 2019. Milestones shown:
     • A Community HPC OS
     • Autonomic runtime systems
     • Robust, Scalable System Simulation
     • APIs for energy management
     • API for node resilience
     • Runtime Environments enabled
     • Community OS Framework
     • Next Generation Interconnect API
     • Prototype implementation of OS Framework
