
Job Scheduling for the BlueGene/L System
Elie Krevat, Jose G. Castanos, Jose E. Moreira
Presented by Savitha Krishnamoorthy, CIS 888, The Ohio State University


Presentation Transcript


  1. Job Scheduling for the BlueGene/L System. Elie Krevat, Jose G. Castanos, Jose E. Moreira. Presented by Savitha Krishnamoorthy, CIS 888, The Ohio State University

  2. Motivation • Problems associated with toroidal interconnects: • Require rectangular, contiguous job partitions • Introduce fragmentation, which hurts utilization and wait time • Lead to job slowdown

  3. Toroidal Interconnects • “Endless” (wraparound) connections • Simple, modular, scalable • Examples: Cray T3D and T3E machines • Problems: • Nodes are not fully connected and not equidistant • The spatial location of nodes is critical when allocating jobs • Fragmentation due to rectangular, contiguous partitions

  4. A 2D Torus [Figure: eight nodes, Node 1 through Node 8, connected as a 2D torus]
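The “not equidistant” point from slide 3 can be made concrete with hop counts. A minimal sketch, assuming minimal routing in which each dimension is traversed the shorter way around its ring; the figure's exact layout is not recoverable, so the 2 × 4 arrangement of the eight nodes and the function name `torus_distance` are hypothetical.

```python
from itertools import product

def torus_distance(a, b, dims):
    """Minimal hop count between nodes a and b in a torus of shape dims.

    In each dimension a message can go either way around the ring, so the
    per-dimension distance is min(d, size - d); the hop count is their sum.
    """
    total = 0
    for x, y, size in zip(a, b, dims):
        d = abs(x - y) % size
        total += min(d, size - d)
    return total

if __name__ == "__main__":
    dims = (2, 4)  # hypothetical 2 x 4 layout of the 8 nodes in the figure
    origin = (0, 0)
    for node in product(range(dims[0]), range(dims[1])):
        print(node, torus_distance(origin, node, dims))
```

Even with wraparound, node (1, 2) is three hops from (0, 0) while (0, 3) is only one, which is why the placement of a job's nodes matters.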

  5. Schemes Analyzed • Space-sharing scheduling techniques • Backfilling • Moves a lower-priority job (in FCFS order) ahead of a blocked higher-priority job • Does not delay the higher-priority job • Migration • On-the-fly defragmentation

  6. FCFS Scheduler • Places jobs so as to maximize the largest free rectangular partition remaining in the torus • Invoked whenever a job arrives or terminates • A job requesting a prime number of nodes fits only a straight-line rectangle, so a suitable partition often can't be found • Simplest algorithm
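Why prime-sized requests are awkward: a rectangular partition of s nodes needs side lengths a × b × c = s that each fit inside the torus. The sketch below, using a hypothetical helper `rectangle_shapes`, enumerates those shapes; applied to BG/L's 4 × 4 × 8 supernode torus (slide 11), it finds no shape at all for 11 supernodes and only a 1 × 1 × 7 line for 7.

```python
def rectangle_shapes(size, dims):
    """All (a, b, c) with a * b * c == size that fit inside a torus of shape dims."""
    X, Y, Z = dims
    shapes = []
    for a in range(1, X + 1):
        if size % a:
            continue                      # a must divide the job size
        for b in range(1, Y + 1):
            if (size // a) % b:
                continue                  # b must divide what remains
            c = size // (a * b)
            if c <= Z:                    # the last side must also fit
                shapes.append((a, b, c))
    return shapes

if __name__ == "__main__":
    bgl = (4, 4, 8)
    for s in (7, 8, 11):
        print(s, rectangle_shapes(s, bgl))
```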

  7. FCFS with Backfilling • Increases system utilization • Requires an estimate of each job's execution time • Known result: overestimating execution times doesn't hurt backfilling • Invoked when the waiting queue is not empty and the FCFS scheduler has halted (the job at the head of the queue cannot be placed)
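A minimal sketch of the backfilling check, assuming EASY-style backfilling (a single reservation, for the job at the head of the queue) and a simplified model that counts nodes but ignores torus geometry, so it does not verify that a rectangular partition actually exists. `Job`, `backfill_candidates`, and their fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    size: int            # nodes requested
    est_runtime: float   # user-supplied execution-time estimate

def backfill_candidates(now, free, running, queue):
    """Waiting jobs that can start now without delaying the blocked head job.

    running: (estimated_finish_time, size) pairs for jobs on the machine.
    queue:   waiting jobs in FCFS order; queue[0] is the job FCFS could not place.
    """
    head = queue[0]
    # Shadow time: earliest estimated time at which the head job will fit.
    avail, shadow = free, now
    for finish, size in sorted(running):
        if avail >= head.size:
            break
        avail += size
        shadow = finish
    # Nodes still spare at the shadow time after the head job is placed.
    extra = avail - head.size
    chosen = []
    for job in queue[1:]:
        if job.size > free:
            continue
        ends_before_shadow = now + job.est_runtime <= shadow
        if ends_before_shadow or job.size <= extra:
            chosen.append(job)
            free -= job.size
            if not ends_before_shadow:
                extra -= job.size   # this job still holds nodes at shadow time
    return chosen
```

With overestimated runtimes the check is merely conservative: a backfilled job that actually finishes early cannot delay the head job.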

  8. FCFS - With and Without Backfilling

  9. FCFS + Migration • Rearrange running jobs to create a larger contiguous, rectangular free partition • Empty the torus, then reschedule the running jobs • Decision metrics: • FNtor = free nodes / torus size • FNmax = fraction of free nodes contained in the maximal free partition
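The defragmenting effect of migration can be illustrated in one dimension. A minimal sketch, assuming a 1D ring of nodes instead of the 3D torus and assuming running jobs are re-placed back to back in their current order; `largest_free_run` and `migrate` are hypothetical names. Emptying the ring and rescheduling merges scattered free nodes into one contiguous partition.

```python
def largest_free_run(ring):
    """Longest run of free slots (None) in a ring, counting wraparound."""
    if all(slot is None for slot in ring):
        return len(ring)
    best = run = 0
    for slot in ring + ring:          # scan twice so runs crossing the end are seen
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best

def migrate(ring):
    """Empty the ring, then reschedule running jobs back to back from slot 0."""
    sizes = {}
    for slot in ring:
        if slot is not None:
            sizes[slot] = sizes.get(slot, 0) + 1   # nodes held by each job
    packed = [job for job, size in sizes.items() for _ in range(size)]
    return packed + [None] * (len(ring) - len(packed))

if __name__ == "__main__":
    ring = ["A", "A", None, "B", None, None, "C", None]
    after = migrate(ring)
    print(largest_free_run(ring), "->", largest_free_run(after))
```

In the example, the largest free partition grows from 2 to 4 nodes even though the number of free nodes is unchanged.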

  10. Backfilling + Migration • Schedule via FCFS first • Rearrange the torus through migration to minimize fragmentation • Repeat FCFS • Finally, backfill

  11. The BG/L System • A 32 × 32 × 64 3D torus of cells (nodes) • Each cell has a processor, memory, and links to its 6 neighbors • The unit of job allocation is an 8 × 8 × 8 block of cells • Each such unit is a supernode • For scheduling, BG/L is a 4 × 4 × 8 torus of supernodes
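The slide's dimensions are mutually consistent; a quick check of the arithmetic (requires Python 3.8+ for `math.prod`):

```python
from math import prod

cells = (32, 32, 64)       # BG/L torus of cells (nodes)
supernode = (8, 8, 8)      # unit of job allocation
grid = tuple(c // s for c, s in zip(cells, supernode))

print(grid)                # (4, 4, 8) supernodes per dimension
print(prod(cells))         # 65536 cells in total
print(prod(supernode))     # 512 cells per supernode
print(prod(grid))          # 128 supernodes
```

The 128 supernodes match the 128-node machines whose job logs drive the experiments (slide 17).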

  12. The Simulation Environment • Simulator input: a job log (arrival time, execution time, and size of each job) and the type of scheduler (FCFS, B, M, or B+M) • 4 primary events: • Arrival: the job is first submitted and placed in the scheduler’s waiting queue • Schedule: the job is allocated onto the torus • Start: the job begins to run, one second after being scheduled (why 1 second?) • Finish: the job completes and is deallocated
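The four events can drive a small discrete-event loop. A minimal sketch, assuming a node-count model (torus shape ignored), plain FCFS with no backfilling or migration, and a scheduling attempt after every event; `simulate` and `START_DELAY` are hypothetical names, with the one-second schedule-to-start gap taken from the slide.

```python
import heapq
from collections import deque

START_DELAY = 1.0  # gap between the Schedule and Start events (from the slide)

def simulate(jobs, total_nodes):
    """FCFS discrete-event simulation in a node-count model (torus shape ignored).

    jobs: (name, arrival_time, execution_time, size) tuples.
    Returns {name: (start_time, finish_time)}.
    """
    events = []                      # heap of (time, seq, kind, name)
    seq = 0

    def push(time, kind, name):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, name))
        seq += 1                     # seq keeps ties in insertion order

    exe = {name: e for name, _, e, _ in jobs}
    size = {name: s for name, _, _, s in jobs}
    for name, arrival, _, _ in jobs:
        push(arrival, "arrival", name)

    waiting, free, result = deque(), total_nodes, {}
    while events:
        now, _, kind, name = heapq.heappop(events)
        if kind == "arrival":        # submitted, placed in the waiting queue
            waiting.append(name)
        elif kind == "start":        # begins to run
            result[name] = (now, now + exe[name])
            push(now + exe[name], "finish", name)
        elif kind == "finish":       # completes and is deallocated
            free += size[name]
        # Schedule: allocate from the head of the queue until a job does not fit.
        while waiting and size[waiting[0]] <= free:
            job = waiting.popleft()
            free -= size[job]
            push(now + START_DELAY, "start", job)
    return result
```

For example, with 4 nodes and jobs `("A", 0, 10, 3)`, `("B", 1, 5, 2)`, `("C", 2, 1, 1)`, job C could run at t = 2 on the single idle node but waits behind B until t = 12; that idle node is exactly what backfilling recovers.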

  13. Metrics • Torus size: $N$ • Arrival time of job $j$: $t^a_j$ • Execution time: $t^e_j$ • Size of job: $s_j$ • Start time: $t^s_j$ • Finish time: $t^f_j$

  14. Parameters • Wait time: $t^w_j = t^s_j - t^a_j$ • Response time: $t^r_j = t^f_j - t^a_j$ • Bounded slowdown • A bound is used because jobs with very short execution times would otherwise skew the slowdown
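The per-job metrics follow directly from the timestamps. The slide's bounded-slowdown formula is not in the transcript, so this minimal sketch assumes the definition common in the job-scheduling literature, $\max(t^r_j, \Gamma) / \max(t^e_j, \Gamma)$ with $\Gamma = 10$ seconds; `job_metrics` and `GAMMA` are hypothetical names.

```python
GAMMA = 10.0  # assumed bound in seconds (not given in the transcript)

def job_metrics(t_a, t_e, t_s, t_f, gamma=GAMMA):
    """Wait time, response time and bounded slowdown for one job."""
    wait = t_s - t_a                 # t^w_j = t^s_j - t^a_j
    response = t_f - t_a             # t^r_j = t^f_j - t^a_j
    bounded_slowdown = max(response, gamma) / max(t_e, gamma)
    return wait, response, bounded_slowdown
```

For a 2-second job that waits 8 seconds, `job_metrics(0, 2, 8, 10)` gives a bounded slowdown of 1.0, whereas the unbounded ratio 10 / 2 = 5 would let such short jobs dominate the mean.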

  15. Parameters contd… • System utilization, where $T$ is the makespan • Total unused capacity, where $f(t)$ = number of free nodes at time $t$ and $q(t)$ = total number of nodes requested by waiting jobs at time $t$ • A measure of capacity left unused for lack of jobs

  16. Parameters contd… • The product $T \cdot N$: the maximum work the system could perform over the makespan • The balance of the system capacity (neither utilized nor unused for lack of jobs) is considered lost
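The slide formulas did not survive extraction. A minimal sketch, assuming utilization $= \sum_j s_j t^e_j / (T N)$ and unused capacity $= \int_0^T \max(0, f(t) - q(t))\,dt / (T N)$, with lost capacity as the remainder; time is split into intervals during which $f(t)$ and $q(t)$ are constant. `capacity_breakdown` is a hypothetical name.

```python
def capacity_breakdown(jobs, intervals, N):
    """Fractions of the T*N capacity that were utilized, unused, and lost.

    jobs:      (size, execution_time) for every job in the run.
    intervals: (duration, free_nodes, requested_nodes) spans covering the makespan,
               with f(t) and q(t) constant inside each span.
    """
    T = sum(d for d, _, _ in intervals)               # makespan
    capacity = T * N
    utilized = sum(s * te for s, te in jobs) / capacity
    unused = sum(d * max(0, f - q) for d, f, q in intervals) / capacity
    lost = 1.0 - utilized - unused                    # free while jobs were waiting
    return utilized, unused, lost

if __name__ == "__main__":
    # Hypothetical 4-node example: a 2-node job runs 10 s, a 1-node job runs 4 s,
    # and from t = 4 a 3-node job waits but cannot fit in the 2 free nodes.
    print(capacity_breakdown([(2, 10), (1, 4)], [(4, 1, 0), (6, 2, 3)], 4))
```

In the example, 60% of capacity is utilized, 10% is unused because no job was waiting, and the remaining 30% is lost: two nodes sat free for 6 seconds while a job that needed three waited.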

  17. Workload Characteristics • Experiments performed on 10,000-job spans of 2 job logs • NASA Ames 128-node iPSC/860 • SDSC 128-node machines

  18. Workload Summary

  19. Size Vs Workload

  20. Wait time Vs Utilization

  21. Mean job slowdown Vs Utilization

  22. Comparing fully connected models

  23. Performing Migration • Recall: the parameters that determine whether to attempt a migration are FNtor and FNmax • FNtor = free nodes / torus size • FNmax = free nodes in the maximal free partition / total free nodes • Migration is attempted when FNtor ≥ 0.1 and FNmax ≤ 0.7
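The decision rule translates directly into code. A minimal sketch with the thresholds from the slide; `should_migrate` and its arguments are hypothetical names, and computing the maximal free partition itself is left to a routine such as POP (slide 30).

```python
FN_TOR_MIN = 0.1   # enough of the torus must be free...
FN_MAX_MAX = 0.7   # ...and scattered outside the maximal free partition

def should_migrate(free_nodes, torus_size, max_free_partition):
    """Attempt migration only when FNtor >= 0.1 and FNmax <= 0.7."""
    if free_nodes == 0:
        return False                       # nothing to defragment
    fn_tor = free_nodes / torus_size
    fn_max = max_free_partition / free_nodes
    return fn_tor >= FN_TOR_MIN and fn_max <= FN_MAX_MAX
```

The first condition skips migration when too few nodes are free to matter; the second skips it when the free nodes already form mostly one partition, so migration would gain little.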

  24. Migrations Vs Utilization

  25. Average Time Between Migrations Vs Utilization

  26. Comments (+, -) • Compared the schedulers when applied to fully connected topologies • Studied the effect of fragmentation on utilization, wait time, and slowdown • Showed how each scheduler affected utilization • Could have given average job wait time statistics for each scheduler • Fragmentation is an important distinction • Could have compared unused capacity, using a fully connected system as the ideal

  27. Advantage of the Parameters • Fewer migration attempts • Higher average benefit per successful migration • Comparison of job wait times between: • A scheduler that uses the parameters • A scheduler that always migrates

  28. Mean Job wait time Vs Utilization

  29. Capacity Statistics

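  30. POP Algorithm • Projection of Partitions • Solves the problem of finding the largest free rectangular partition • Exhaustive search is $O(M^9)$ for an $M \times M \times M$ torus ($M^3$ base locations $\times$ $M^3$ partition sizes $\times$ $M^3$ nodes to check) • POP is $O(M^5)$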

  31. Basic Algorithm • Given one of the $M^3$ base locations, first find the largest free partition in 1 dimension • Project along an adjacent dimension to find the largest partition in 2D • Project adjacent 2D planes to find the largest partition in 3D

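  32. The Algorithm • $\mathrm{FREEPART} = \{\langle B,S\rangle \mid B=(i,j,k),\ S=(a,b,c),\ \forall x,y,z:\ (i \le x < i+a) \wedge (j \le y < j+b) \wedge (k \le z < k+c) \Rightarrow \mathrm{Node}(x \bmod M,\, y \bmod M,\, z \bmod M) \text{ is free}\}$, where $B$ is the base location and $S$ the partition size • The largest 1D partitions, PFREEPART, are precomputed for all 3 dimensions and every possible base location in $O(M^4)$ time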
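The projection steps can be sketched as follows. This is a minimal reconstruction from the outline on slides 31 and 32, not the authors' code: it assumes a cubic $M \times M \times M$ torus given as nested boolean lists, precomputes the 1D free runs along x (the PFREEPART step) in $O(M^4)$, and then, for each base location, projects y-rows and z-planes incrementally so the whole search is $O(M^5)$. `largest_free_partition` is a hypothetical name.

```python
from itertools import product

def largest_free_partition(free):
    """Largest free rectangular partition in an M x M x M torus (POP-style sketch).

    free[x][y][z] is True when node (x, y, z) is free.
    Returns (volume, base, size), or (0, None, None) if no node is free.
    """
    M = len(free)
    # 1D step: run[x][y][z] = consecutive free nodes starting at (x, y, z)
    # along x, with wraparound, capped at M. O(M^4) overall.
    run = [[[0] * M for _ in range(M)] for _ in range(M)]
    for x, y, z in product(range(M), repeat=3):
        n = 0
        while n < M and free[(x + n) % M][y][z]:
            n += 1
        run[x][y][z] = n

    best = (0, None, None)
    for i, j, k in product(range(M), repeat=3):      # M^3 base locations
        # col[dy]: min x-extent of row j+dy over the z-planes projected so far.
        col = [M] * M
        for c in range(1, M + 1):                    # project adjacent z-planes (3D)
            z = (k + c - 1) % M
            for dy in range(M):
                col[dy] = min(col[dy], run[i][(j + dy) % M][z])
            a = M
            for b in range(1, M + 1):                # project adjacent y-rows (2D)
                a = min(a, col[b - 1])               # widest x-extent free in all rows
                if a == 0:
                    break
                if a * b * c > best[0]:
                    best = (a * b * c, (i, j, k), (a, b, c))
    return best
```

With a single busy node in a 4 × 4 × 4 torus the result is a 48-node partition (one side shortened to 3, using wraparound), which matches the bound that any box avoiding the busy node must have some side below 4.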

  33. The Algorithm contd…

  34. Future Work
