
A Dynamic Space Sharing Method For Resource Management Gabriel Mateescu

Gabriel Mateescu, Research Computing Support Group, National Research Council Canada (Gabriel.Mateescu@nrc.ca). HPCS 2001 presentation, Windsor, Ontario, June 19-20, 2001.


Presentation Transcript


  1. A Dynamic Space Sharing Method For Resource Management
     • Gabriel Mateescu, Research Computing Support Group, National Research Council Canada, Gabriel.Mateescu@nrc.ca
     • HPCS 2001 presentation, Windsor, Ontario, June 19-20, 2001

  2. Agenda
     • Motivation
     • Outline of the Approach
     • Job Taxonomy
     • Pseudocode
     • Evaluation

  3. Motivation
     • Continuously increasing demand for computation resources is met by clusters and by distributed or shared-memory supercomputers
     • Dual objective:
       • optimize resource utilization and achieve high throughput
       • quality of service to users: low turn-around time
     • Batch scheduling based on space sharing and static partitioning has limited scalability
     • Main contribution: provide a method for achieving both high resource utilization and low turn-around time

  4. The Problem
     • Parallel supercomputer/cluster shared among a number of divisions
     • Dual objective:
       • optimize resource utilization and achieve high throughput
       • quality of service to users: low turn-around time
     • Batch scheduling based on space sharing and static partitioning has limited scalability
     • Main contribution: provide a method for achieving both high resource utilization and low turn-around time

  5. Example
     [Figure labels: Parallel Computer; Biotech Department; Physics Department; Node (CPU + memory); Partition boundary; Job Requests]

  6. Outline of the Approach
     • Dynamic space sharing method for batch job scheduling
     • Partition the resources into a set of dedicated queues
     • Dedicated queues own resources
     • Free resources can be borrowed by pending jobs for which there are not enough per-queue resources
     • Borrowed resources are grouped in a shared queue
     • Borrowed resources can be reclaimed by the lending queue
     • Reclaiming is done by checkpointing the jobs that hold borrowed resources

  7. Outline
     • The sum of the resources assigned to jobs in a dedicated queue does not exceed the resource limits of the queue
     • The difference between the total amount of resources and the resources currently assigned to the dedicated queues represents an opportunity for scheduling jobs for which there are not enough per-queue resources
     • Each user belongs to a group, and each group is authorized to submit jobs to some dedicated queues as well as to the shared queue
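The per-queue limit and the system-wide slack described on this slide can be sketched in Python. This is a minimal illustration, not the author's implementation: the `DedicatedQueue` class, its field names, and the `system_free` helper are assumptions introduced here, and resources lent to the shared queue are deliberately left out of the model.

```python
from dataclasses import dataclass, field

# Hypothetical model: a resource vector maps a resource name to an amount.
Resources = dict[str, int]


@dataclass
class DedicatedQueue:
    name: str
    owned: Resources                                     # resource limits of the queue
    allocated: Resources = field(default_factory=dict)   # held by the queue's own jobs

    def free(self) -> Resources:
        """Owned resources not currently assigned to the queue's own jobs."""
        return {r: self.owned[r] - self.allocated.get(r, 0) for r in self.owned}

    def can_admit(self, request: Resources) -> bool:
        """True if admitting the request keeps allocated <= owned for every resource."""
        free = self.free()
        return all(free.get(r, 0) >= amount for r, amount in request.items())

    def admit(self, request: Resources) -> None:
        """Assign resources to a job, refusing anything that breaks the limit."""
        if not self.can_admit(request):
            raise ValueError(f"request exceeds the limits of queue {self.name}")
        for r, amount in request.items():
            self.allocated[r] = self.allocated.get(r, 0) + amount


def system_free(queues: list[DedicatedQueue]) -> Resources:
    """Total resources minus what is assigned to jobs in the dedicated queues."""
    total: Resources = {}
    for q in queues:
        for r, amount in q.free().items():
            total[r] = total.get(r, 0) + amount
    return total
```

A request that `can_admit` rejects for its own queue may still fit within `system_free`, which is the case the shared queue is meant to serve.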

  8. Dedicated Resources
     Job 1 in queue 1: 1 x resource 1 + 2 x resource 2
     Job 2 in queue 2: 2 x resource 1 + 1 x resource 2
     [Figure labels: Resource 1; Resource 2; Queue 1; Queue 2]

  9. Borrowed Resources
     Job 3 in queue 1: 1 x resource 1 + 2 x resource 2
     [Figure labels: Resource 1; Resource 2; Queue 1; Queue 2]

  10. Resource Reclaiming
     Job 4 in queue 2: 1 x resource 1 + 2 x resource 2
     [Figure labels: Resource 1; Resource 2; Queue 1; Queue 2]
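The three slides above can be traced step by step. The sketch below is a worked example under two loudly stated assumptions that are not on the slides: each queue owns 3 units of each resource, and a borrowing job takes all of its resources from a single lending queue. Only the job requests come from the slides.

```python
def fits(request, available):
    """True if every component of the request is covered by what is available."""
    return all(r <= a for r, a in zip(request, available))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


# ASSUMPTION (not on the slides): each queue owns 3 units of each resource.
# Vectors are (resource 1, resource 2).
free = {"queue1": (3, 3), "queue2": (3, 3)}

# Slide 8: jobs 1 and 2 run on resources owned by their own queues.
job1, job2 = (1, 2), (2, 1)
free["queue1"] = sub(free["queue1"], job1)   # (2, 1)
free["queue2"] = sub(free["queue2"], job2)   # (1, 2)

# Slide 9: job 3 does not fit in queue 1, so it borrows from queue 2.
job3 = (1, 2)
job3_fits_queue1 = fits(job3, free["queue1"])   # False: one unit of resource 2 left
job3_fits_queue2 = fits(job3, free["queue2"])   # True
lent_by_queue2 = job3
free["queue2"] = sub(free["queue2"], lent_by_queue2)   # (0, 0)

# Slide 10: job 4 arrives in queue 2, which reclaims what it lent to job 3.
job4 = (1, 2)
job4_fits_now = fits(job4, free["queue2"])                                  # False
job4_fits_after_reclaim = fits(job4, add(free["queue2"], lent_by_queue2))   # True
free["queue2"] = add(free["queue2"], lent_by_queue2)   # job 3 is checkpointed
lent_by_queue2 = (0, 0)
free["queue2"] = sub(free["queue2"], job4)             # (0, 0)
```

Under these assumptions job 3 ends the trace checkpointed and pending again, while queue 1 still has (2, 1) free.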

  11. Paths of a Job
     [Diagram labels: new job; Submit queue; Dedicated queue (three shown); Shared Queue; finished job]

  12. Job Taxonomy
     • master job: has resource requirements that can be satisfied from the free resources available to its queue
     • fittable job: has resource requirements that can be satisfied by reclaiming some of the resources borrowed by the shared queue
     • movable job: has resource requirements that exceed the amount of resources owned by the queue and not already allocated to jobs; however, the requirements of such a job may be satisfied from the system-wide free resources
     • blocked job: there are not enough resources, either owned by its queue or available in other queues, to satisfy the job's requirements, or the job is not checkpointable
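The taxonomy reads naturally as a cascade of checks, which the following sketch illustrates. The function name `classify`, its parameters, and the tuple representation of resource vectors are assumptions made for this example. The slide does not say exactly how checkpointability enters the decision, so the sketch assumes it is required only of jobs that must borrow.

```python
from enum import Enum


class JobType(Enum):
    MASTER = "master"
    FITTABLE = "fittable"
    MOVABLE = "movable"
    BLOCKED = "blocked"


def fits(request, available):
    """True if every component of the request is covered by what is available."""
    return all(r <= a for r, a in zip(request, available))


def classify(request, queue_free, queue_lent, system_free, checkpointable=True):
    """Classify a pending job (hypothetical helper, not the author's code).

    request      -- amount needed of each resource type
    queue_free   -- resources of the job's queue that are neither allocated nor lent
    queue_lent   -- resources of the job's queue currently borrowed via the shared queue
    system_free  -- free resources across the whole system
    """
    if fits(request, queue_free):
        return JobType.MASTER
    reclaimable = tuple(f + l for f, l in zip(queue_free, queue_lent))
    if fits(request, reclaimable):
        return JobType.FITTABLE
    # ASSUMPTION: a borrowing job must be checkpointable so the lender can reclaim.
    if checkpointable and fits(request, system_free):
        return JobType.MOVABLE
    return JobType.BLOCKED
```

For instance, `classify((1, 2), (2, 1), (0, 0), (3, 3))` falls through the first two checks and lands on the movable case, because the request fits system-wide but not within the queue's own free resources.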

  13. Job State Transition Diagram
     [State diagram; node and edge labels: new job; pending job; enough per-queue resources; master job; fittable job; movable job; preempt slave; slave job; start job; running master; running slave; finished job]

  14. Pseudocode

      scheduler() {
          queues = sort_dedicated_queues();
          while (scheduling_is_on) {
              /* route newly submitted jobs to their dedicated queues */
              new_jobs = get_jobs_in_submit_queue();
              dispatch_to_dedicated_queue(new_jobs);
              foreach queue in (queues) {
                  jobs = get_pending_jobs(queue);
                  order_jobs(jobs);
                  foreach job in (jobs) {
                      type = get_job_type(job);
                      resources = get_job_resources(job);
                      if (type == master || type == fittable) {
                          if (type == fittable) {
                              /* preempt slave jobs to take back lent resources */
                              victim_jobs = reclaim(resources);
                              re_queue(victim_jobs);
                          }
                          allocate_resources(resources);
                          start_job(job);
                      } else if (type == movable) {
                          /* are there enough free resources system-wide? */
                          ok = system_resources(resources);
                          if (ok) {
                              move_to_shared_queue(job);
                              start_job(job);
                          }
                      }
                  }
              }
          }
      }
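One iteration of the scheduler's outer loop can be made executable with a small in-memory model. What follows is a sketch under stated assumptions rather than the author's scheduler: the `Job` and `Queue` classes, the `schedule_pass` function, and the action names in the log are all hypothetical; queue sorting and job ordering are replaced by insertion order; and a movable job is assumed to borrow everything from a single lending queue.

```python
from dataclasses import dataclass, field


def fits(request, available):
    return all(r <= a for r, a in zip(request, available))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


@dataclass
class Job:
    name: str
    home: str                      # name of the job's dedicated queue
    request: tuple                 # units needed of each resource type
    checkpointable: bool = True


@dataclass
class Queue:
    name: str
    free: tuple                                   # owned, not allocated, not lent
    pending: list = field(default_factory=list)
    slaves: list = field(default_factory=list)    # jobs running on resources this queue lent

    def lent(self):
        total = tuple(0 for _ in self.free)
        for job in self.slaves:
            total = add(total, job.request)
        return total


def schedule_pass(queues):
    """Run one pass over a dict of queues; return a list of (action, job_name)."""
    log = []
    for queue in queues.values():
        for job in list(queue.pending):
            if not fits(job.request, queue.free):
                if fits(job.request, add(queue.free, queue.lent())):
                    # Fittable: preempt slave jobs until the request fits.
                    while not fits(job.request, queue.free):
                        victim = queue.slaves.pop()
                        queue.free = add(queue.free, victim.request)
                        queues[victim.home].pending.append(victim)
                        log.append(("preempt", victim.name))
                else:
                    # Movable: ASSUMPTION, borrow from one other queue if possible.
                    lender = next(
                        (q for q in queues.values()
                         if q is not queue and fits(job.request, q.free)),
                        None,
                    )
                    if lender is not None and job.checkpointable:
                        lender.free = sub(lender.free, job.request)
                        lender.slaves.append(job)
                        queue.pending.remove(job)
                        log.append(("start_slave", job.name))
                    continue  # either borrowed or blocked; stays pending if blocked
            # Master, or fittable after reclaiming: run on the queue's own resources.
            queue.free = sub(queue.free, job.request)
            queue.pending.remove(job)
            log.append(("start_master", job.name))
    return log
```

Calling `schedule_pass` repeatedly while appending jobs to `pending` plays the role of the `while (scheduling_is_on)` loop. A preempted job is put back in its home queue's pending list and is reconsidered on a later pass.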

  15. Job Statistics
     • SGI Origin 2000 with 108 CPUs and 48 GB of main memory
     • Resources are partitioned among six dedicated queues defined for six groups of users
     • Average system load, including short interactive jobs: ~94
     • Total jobs running: 33 (CPUs allocated: 103, memory: 39 GB)
     • Slave jobs running: 11 (CPUs allocated: 22, memory: 11 GB)
     • Jobs waiting: 3
     • Checkpoints/day per slave job: ~1.5
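The figures on this slide imply a few ratios that the slide does not state directly. The short calculation below derives them from the slide's numbers only; the variable names are chosen here for illustration.

```python
# Figures taken from the slide.
total_cpus, total_mem_gb = 108, 48
alloc_cpus, alloc_mem_gb = 103, 39
slave_cpus, slave_mem_gb = 22, 11
running_jobs, slave_jobs = 33, 11

# Share of the machine that is allocated to running jobs.
cpu_utilization = alloc_cpus / total_cpus
mem_utilization = alloc_mem_gb / total_mem_gb

# Share of the allocated resources held by slave jobs, i.e. borrowed resources.
slave_cpu_share = slave_cpus / alloc_cpus
slave_mem_share = slave_mem_gb / alloc_mem_gb
slave_job_share = slave_jobs / running_jobs

for label, value in [
    ("CPU utilization", cpu_utilization),
    ("memory utilization", mem_utilization),
    ("slave share of allocated CPUs", slave_cpu_share),
    ("slave share of allocated memory", slave_mem_share),
    ("slave share of running jobs", slave_job_share),
]:
    print(f"{label}: {value:.3f}")
```

Roughly 95% of the CPUs and 81% of the memory are allocated. One third of the running jobs are slave jobs, holding about a fifth of the allocated CPUs and over a quarter of the allocated memory.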

  16. Advantages
     • Combine the advantages of space sharing and time sharing scheduling
     • Space sharing gives resource allocation for the duration of the job and predictable execution time
     • Time sharing improves resource utilization
     • We combine space sharing with job preemption
     • Selection of which jobs are preempted is made in terms of the current usage of the resources, rather than based on a static job priority

  17. Evaluation
     • Complexity: O(J N R + J log J), where J = number of pending or slave jobs, N = number of supernodes, and R = number of types of resources
     • Reduce the waiting time of jobs by harnessing resources not used by the dedicated queues
     • Reduce job execution time by reserving resources for all but the slave jobs
     • No job fitting in a dedicated queue can be prevented from running by a slave job
