1 / 28

Performance-Driven Processor Allocation

Performance-Driven Processor Allocation. Julita Corbalan, Xavier Martorell, Jesus Labarta {juli,xavim,jesus}@ac.upc.es DAC-UPC. Objective. Scheduling parallel applications in Shared Memory Multiprogrammed systems

geoff
Download Presentation

Performance-Driven Processor Allocation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Performance-Driven Processor Allocation Julita Corbalan, Xavier Martorell, Jesus Labarta {juli,xavim,jesus}@ac.upc.es DAC-UPC

  2. Objective • Scheduling parallel applications in Shared Memory Multiprogrammed systems • Allocate processors to applications that “can take advantage of them” • Implemented in an SGI Origin2000 with 64 processors Performance-Driven Processor Allocation

  3. Outline • Introduction & Related Work • NANOS Execution Environment • Performance-Driven Processor Allocation:PDPA • Evaluation • Conclusions & Future Work Performance-Driven Processor Allocation

  4. Introduction • Scheduling problem: allocate processors to applications • Space-Sharing / Time-Sharing • Number of processes = Number of Processors • Process Control [Tucker89] • Space-sharing approaches: • P fixed at submission time • FCFS, SJF, SCDF [Majumdar88,...] • P defined at execution time (Adaptive / Dynamic) • Equal-allocation of the resources: Equipartition [McCan93] • Processor allocation proportional to the application performance Performance-Driven Processor Allocation

  5. Introduction (2) • Processor allocation proportional to application performance • Drawback: Application performance is not known before its execution • Solution: Calculate it a priori • Executing several times with different P andinput data • Extrapolate the values based on a few samples • These approaches may not be valid: • Application performance depends on run-time parameters: Initial data placement, process migrations, distance between processors and memory, … • It can be impracticable: e.g. infinite input data sets Performance-Driven Processor Allocation

  6. Related Work • Dynamic performance analysis • Self-Tuning [Nguyen96], efficiency calculated at run-time as a function of: idleness, system and communication overhead • Adaptive/Dynamic processor allocation policies • Equal_efficiency [Nguyen96], tries to achieve the same efficiency on all processors • Dynamic Allocation, based on the idleness [McCann93] • Allocates the knee of the efficiency/execution time curve [Eager89] Performance-Driven Processor Allocation

  7. Our proposal • We propose: • Dynamic performance analysis • Real speedup • Calculated at run-time • Allocate processors to applications that “can take advantage of them” • Dynamic partitioning • Cost conscious re-allocations (memory locality) • Really efficient use of processors • Dynamic multiprogramming level • Coordination between the medium & long term schedulers Performance-Driven Processor Allocation

  8. Outline • Introduction & Related work • NANOS Execution Environment • Performance-Driven Processor Allocation:PDPA • Evaluation • Conclusions & Future Work Performance-Driven Processor Allocation

  9. NANOS Execution Environment -Controls the application arrival -Coordinated with the CPU Manager FCFS Queued applications OpenMP Parallel Applications (malleable) Start new application Queueing System -Implements the scheduling policy -Informs the applications about its decisions -Enforces the processor allocation New application? -Request processors -Informs about its performance Proc. request, speedup CPU Manager Proc. allocated Resume, bind, ... SelfAnalyzer Operating System Shared Memory Multiprocessor …. Performance-Driven Processor Allocation

  10. Outline • Introduction & Related work • NANOS Execution Environment • Performance-Driven Processor Allocation: PDPA • Dynamic Performance Analysis: SelfAnalyzer • Performance-Driven Processor Allocation policy • Dynamic Multiprogramming Level • Evaluation • Conclusions & Future Work Performance-Driven Processor Allocation

  11. Basedon iterative parallel applications Source code available SelfAnalyzer calls inserted by the user or the compiler Source code not available Dynamic Periodicity Detection SelfAnalyzer dynamically loaded • Do • !$OMP PARALLEL DO • do • enddo • !$OMP END DO • !$OMP PARALLEL DO • do • enddo • !$OMP END DO • end do Dynamic Performance Analysis: SelfAnalyzer • Tool to estimate the application speedup and execution time Performance-Driven Processor Allocation

  12. 1 Proc. P Proc. ... T(1) T(P) B Proc. P Proc. ... T(b) T(P) Dynamic Performance Analysis: SelfAnalyzer(2) • Speedup calculated as the relationship between T(1) and T(P) Serialization!! Performance-Driven Processor Allocation

  13. Performance-Driven Processor Allocation • Space-Sharing • Allocation for acceptable efficiency (S(p)/p) • In the range [low_eff , high_eff] [50%-70%] • Run-To-Completion • Minimum allocation of one processor • Dynamic partitioning, re-allocations when: • Applications inform about their speedups • Application arrival/Application end • Remembers the application state • Allocation, performance Performance-Driven Processor Allocation

  14. Eff(p)<low_eff P=P-step Eff(p)>high_eff P=P+min(free,step) System Changes System Changes Eff(p)>low_eff Eff(p)<high_eff OR Not proportional benefit Performance-Driven Processor Allocation(2) • Policy parameters: step, low_eff and high_eff NewAppl P=min(Free Proc., Proc. Requested) NO_REF • Eff(p)<high_eff • && • Eff(p)>low_eff DEC STABLE INC Performance-Driven Processor Allocation

  15. Dynamic Multiprogramming Level • Multiprogramming level (ML) • Number of applications running concurrently • Static/Dynamic ML • Coordination between the medium & long term schedulers • If (new_appl_fits()?) start_new_appl() • new_appl_fits() defined by the scheduling policy • Free processors during several quanta • start_new_appl() implemented by the queuing system Performance-Driven Processor Allocation

  16. Outline • Introduction & Related work • NANOS Execution Environment • Performance-Driven Processor Allocation:PDPA • Evaluation • Processor Allocation Policies • Applications & Workloads • Execution Time & Processor Allocation • Conclusions & Future Work Performance-Driven Processor Allocation

  17. Processor Allocation Policies • Equip: equal CPUs to each running application • PDPA + DML : our proposal • Equal_eff: equal efficiency in all the processors • SGI-MP:nativeIRIX Scheduler • MP_BLOCKTIME=200000 • OMP_DYNAMIC=TRUE Performance-Driven Processor Allocation

  18. Applications & Workloads • Architecture & System • SGI Origin2000 with 64 processors + IRIX 6.5.8 • Applications: Open MP • Swim(44.2), Bt(20.85), Hydro2d(6.3), apsi(1) • Workloads • Multiprogramming Level set to 4 • Request = 32 processors each application Performance-Driven Processor Allocation

  19. ML=4 DML=5 Exec.Time & Proc. Allocation Limited processor allocation Total execution time reduced Appl. exc. time slightly increased Performance-Driven Processor Allocation

  20. DML=10 Processors are efficiently used Performance affectedby the multiprogrammed execution Exec.Time & Proc. Allocation Total exec. Time improved Allocation proportional to the performance Performance-Driven Processor Allocation

  21. SGI vs. PDPA 4476 vs. 4 processes migrations !!!! Processor Affinity+ Process Control Performance-Driven Processor Allocation

  22. PDPA behavior (zoom) Tuning algorithm Performance-Driven Processor Allocation

  23. Outline • Introduction & Related Work • NANOS Execution Environment • Performance-Driven Processor Allocation:PDPA • Evaluation • Conclusions & Future Work Performance-Driven Processor Allocation

  24. Conclusions • It is important to provide an accurate performance information • SelfAnalyzer: dynamic, accurate, easy to use • PDPA allocates processors to applications that “can take advantage of them” • The Dynamic Multiprogramming Level improves the system performance • Coordinating the medium & long term schedulers Performance-Driven Processor Allocation

  25. Future Work • Dynamic performance analysis • Non-iterative applications • PDPA • Space Sharing+Time Sharing • Evaluation in a open environment • Step, low_eff and high_eff need further research • Number of reallocations limited • Coordination medium & long term schedulers • New policies Performance-Driven Processor Allocation

  26. More contact info... • http://www.ac.upc.es/NANOS • http://www.ac.upc.es/homes/juli • juli@ac.upc.es Performance-Driven Processor Allocation

  27. Related Work • Dynamic performance analysis • Self-Tuning [Nguyen96], efficiency calculated at run-time as a function of: idleness, system and communication overhead • Dynamic processor allocation policies • Equal_efficiency [Nguyen96], tries to achieve the same efficiency on all processors • Dynamic Allocation, based on the idleness [McCann93] • Allocates the knee of the efficiency/execution time curve [Eager89] It does not calculate the real speedup It does not ensure an efficient use of processors Excessive number of reallocations Uses a priori information Performance-Driven Processor Allocation

  28. Performance-Driven Processor Allocation(3) • Advantages • PDPA works with run-time information • Ensures that processors are always efficiently used • Drawbacks • The tuning algorithm can introduce overhead inside the application • Dynamic step • Some processors can remain unallocated • Dynamic Multiprogramming Level Performance-Driven Processor Allocation

More Related