Performance-Driven Processor Allocation

Performance-Driven Processor Allocation Julita Corbalan, Xavier Martorell, Jesus Labarta {juli,xavim,jesus}@ac.upc.es DAC-UPC

Objective • Scheduling parallel applications in Shared Memory Multiprogrammed systems • Allocate processors to applications that “can take advantage of them” • Implemented in an SGI Origin2000 with 64 processors Performance-Driven Processor Allocation

Outline • Introduction & Related Work • NANOS Execution Environment • Performance-Driven Processor Allocation:PDPA • Evaluation • Conclusions & Future Work Performance-Driven Processor Allocation

Introduction • Scheduling problem: allocate processors to applications • Space-Sharing / Time-Sharing • Number of processes = Number of Processors • Process Control [Tucker89] • Space-sharing approaches: • P fixed at submission time • FCFS, SJF, SCDF [Majumdar88,...] • P defined at execution time (Adaptive / Dynamic) • Equal-allocation of the resources: Equipartition [McCan93] • Processor allocation proportional to the application performance Performance-Driven Processor Allocation

Introduction (2) • Processor allocation proportional to application performance • Drawback: Application performance is not known before its execution • Solution: Calculate it a priori • Executing several times with different P andinput data • Extrapolate the values based on a few samples • These approaches may not be valid: • Application performance depends on run-time parameters: Initial data placement, process migrations, distance between processors and memory, … • It can be impracticable: e.g. infinite input data sets Performance-Driven Processor Allocation

Related Work • Dynamic performance analysis • Self-Tuning [Nguyen96], efficiency calculated at run-time as a function of: idleness, system and communication overhead • Adaptive/Dynamic processor allocation policies • Equal_efficiency [Nguyen96], tries to achieve the same efficiency on all processors • Dynamic Allocation, based on the idleness [McCann93] • Allocates the knee of the efficiency/execution time curve [Eager89] Performance-Driven Processor Allocation

Our proposal • We propose: • Dynamic performance analysis • Real speedup • Calculated at run-time • Allocate processors to applications that “can take advantage of them” • Dynamic partitioning • Cost conscious re-allocations (memory locality) • Really efficient use of processors • Dynamic multiprogramming level • Coordination between the medium & long term schedulers Performance-Driven Processor Allocation

Outline • Introduction & Related work • NANOS Execution Environment • Performance-Driven Processor Allocation:PDPA • Evaluation • Conclusions & Future Work Performance-Driven Processor Allocation

NANOS Execution Environment -Controls the application arrival -Coordinated with the CPU Manager FCFS Queued applications OpenMP Parallel Applications (malleable) Start new application Queueing System -Implements the scheduling policy -Informs the applications about its decisions -Enforces the processor allocation New application? -Request processors -Informs about its performance Proc. request, speedup CPU Manager Proc. allocated Resume, bind, ... SelfAnalyzer Operating System Shared Memory Multiprocessor …. Performance-Driven Processor Allocation

Outline • Introduction & Related work • NANOS Execution Environment • Performance-Driven Processor Allocation: PDPA • Dynamic Performance Analysis: SelfAnalyzer • Performance-Driven Processor Allocation policy • Dynamic Multiprogramming Level • Evaluation • Conclusions & Future Work Performance-Driven Processor Allocation

Basedon iterative parallel applications Source code available SelfAnalyzer calls inserted by the user or the compiler Source code not available Dynamic Periodicity Detection SelfAnalyzer dynamically loaded • Do • !$OMP PARALLEL DO • do • enddo • !$OMP END DO • !$OMP PARALLEL DO • do • enddo • !$OMP END DO • end do Dynamic Performance Analysis: SelfAnalyzer • Tool to estimate the application speedup and execution time Performance-Driven Processor Allocation

1 Proc. P Proc. ... T(1) T(P) B Proc. P Proc. ... T(b) T(P) Dynamic Performance Analysis: SelfAnalyzer(2) • Speedup calculated as the relationship between T(1) and T(P) Serialization!! Performance-Driven Processor Allocation

Performance-Driven Processor Allocation • Space-Sharing • Allocation for acceptable efficiency (S(p)/p) • In the range [low_eff , high_eff] [50%-70%] • Run-To-Completion • Minimum allocation of one processor • Dynamic partitioning, re-allocations when: • Applications inform about their speedups • Application arrival/Application end • Remembers the application state • Allocation, performance Performance-Driven Processor Allocation

Eff(p)<low_eff P=P-step Eff(p)>high_eff P=P+min(free,step) System Changes System Changes Eff(p)>low_eff Eff(p)<high_eff OR Not proportional benefit Performance-Driven Processor Allocation(2) • Policy parameters: step, low_eff and high_eff NewAppl P=min(Free Proc., Proc. Requested) NO_REF • Eff(p)<high_eff • && • Eff(p)>low_eff DEC STABLE INC Performance-Driven Processor Allocation

Dynamic Multiprogramming Level • Multiprogramming level (ML) • Number of applications running concurrently • Static/Dynamic ML • Coordination between the medium & long term schedulers • If (new_appl_fits()?) start_new_appl() • new_appl_fits() defined by the scheduling policy • Free processors during several quanta • start_new_appl() implemented by the queuing system Performance-Driven Processor Allocation

Outline • Introduction & Related work • NANOS Execution Environment • Performance-Driven Processor Allocation:PDPA • Evaluation • Processor Allocation Policies • Applications & Workloads • Execution Time & Processor Allocation • Conclusions & Future Work Performance-Driven Processor Allocation

Processor Allocation Policies • Equip: equal CPUs to each running application • PDPA + DML : our proposal • Equal_eff: equal efficiency in all the processors • SGI-MP:nativeIRIX Scheduler • MP_BLOCKTIME=200000 • OMP_DYNAMIC=TRUE Performance-Driven Processor Allocation

Applications & Workloads • Architecture & System • SGI Origin2000 with 64 processors + IRIX 6.5.8 • Applications: Open MP • Swim(44.2), Bt(20.85), Hydro2d(6.3), apsi(1) • Workloads • Multiprogramming Level set to 4 • Request = 32 processors each application Performance-Driven Processor Allocation

ML=4 DML=5 Exec.Time & Proc. Allocation Limited processor allocation Total execution time reduced Appl. exc. time slightly increased Performance-Driven Processor Allocation

DML=10 Processors are efficiently used Performance affectedby the multiprogrammed execution Exec.Time & Proc. Allocation Total exec. Time improved Allocation proportional to the performance Performance-Driven Processor Allocation

SGI vs. PDPA 4476 vs. 4 processes migrations !!!! Processor Affinity+ Process Control Performance-Driven Processor Allocation

PDPA behavior (zoom) Tuning algorithm Performance-Driven Processor Allocation

Outline • Introduction & Related Work • NANOS Execution Environment • Performance-Driven Processor Allocation:PDPA • Evaluation • Conclusions & Future Work Performance-Driven Processor Allocation

Conclusions • It is important to provide an accurate performance information • SelfAnalyzer: dynamic, accurate, easy to use • PDPA allocates processors to applications that “can take advantage of them” • The Dynamic Multiprogramming Level improves the system performance • Coordinating the medium & long term schedulers Performance-Driven Processor Allocation

Future Work • Dynamic performance analysis • Non-iterative applications • PDPA • Space Sharing+Time Sharing • Evaluation in a open environment • Step, low_eff and high_eff need further research • Number of reallocations limited • Coordination medium & long term schedulers • New policies Performance-Driven Processor Allocation

More contact info... • http://www.ac.upc.es/NANOS • http://www.ac.upc.es/homes/juli • juli@ac.upc.es Performance-Driven Processor Allocation

Related Work • Dynamic performance analysis • Self-Tuning [Nguyen96], efficiency calculated at run-time as a function of: idleness, system and communication overhead • Dynamic processor allocation policies • Equal_efficiency [Nguyen96], tries to achieve the same efficiency on all processors • Dynamic Allocation, based on the idleness [McCann93] • Allocates the knee of the efficiency/execution time curve [Eager89] It does not calculate the real speedup It does not ensure an efficient use of processors Excessive number of reallocations Uses a priori information Performance-Driven Processor Allocation

Performance-Driven Processor Allocation(3) • Advantages • PDPA works with run-time information • Ensures that processors are always efficiently used • Drawbacks • The tuning algorithm can introduce overhead inside the application • Dynamic step • Some processors can remain unallocated • Dynamic Multiprogramming Level Performance-Driven Processor Allocation

Performance-Driven Processor Allocation

Performance-Driven Processor Allocation

Presentation Transcript

Performance Yield-Driven Task Allocation and Scheduling for MPSoCs under Process Variation

Data Driven Performance Management

Safety Driven Performance 2013

Optimization: Technical Performance Allocation

Performance Analysis of Processor

High Performance Processor Architecture

Communication-Aware Processor Allocation for Supercomputers

Optimization: Technical Performance Allocation

PERFORMANCE BASED ALLOCATION SYSTEM (PBAS)

Dynamic Processor Allocation for Adaptively Parallel Jobs

Topic 4 Processor Performance

Performance-based allocation*

Data-Driven Performance

Processor Co-Allocation in Multicluster Systems

Cost Allocation and Performance Measurement

“Iron Law” of Processor Performance

Performance Analysis of Processor

PROCESSOR ALLOCATION

Evolution of Processor Performance

Optimization: Technical Performance Allocation

Performance Allocation Compensation Investment | Fsicm.com