Process Scheduling. National Chung Cheng University, Department of Computer Science and Information Engineering. Prof. 羅習五.
Outline • OS schedulers • Unix scheduling • Linux scheduling • Linux 2.4 scheduler • Linux 2.6 scheduler • O(1) scheduler • CFS
Introduction: preemptive & cooperative multitasking • A multitasking operating system is one that can simultaneously interleave execution of more than one process. • Multitasking operating systems come in two flavors: cooperative multitasking and preemptive multitasking. • Linux provides preemptive multitasking. • Mac OS 9 and earlier are the most notable examples of cooperative multitasking.
UNIX Scheduling Policy • Scheduling policy determines what runs when • fast process response time (low latency) • maximal system utilization (high throughput) • Processes classification: • I/O-bound processes: spend much of their time submitting and waiting on I/O requests • Processor-bound processes: spend much of their time executing code • Unix variants tend to favor I/O-bound processes, thus providing good process response time
Linux scheduler – Process Priority • Linux’s priority-based scheduling • Rank processes based on their worth and need for processor time. • processes with a higher priority also receive a longer timeslice. • Both the user and the system may set a process's priority to influence the scheduling behavior of the system. • Dynamic priority-based scheduling • Begins with an initial base priority • Then enables the scheduler to increase or decrease the priority dynamically to fulfill scheduling objectives. • E.g., a process that is spending more time waiting on I/O will receive an elevated dynamic priority.
Linux scheduler – Priority Ranges • Two separate priority ranges. • nice value, from -20 to +19 with a default of 0. • Larger nice values correspond to a lower priority (you are being nice to the other processes on the system). • real-time priority, by default ranging from 0 to 99. • All real-time processes are at a higher priority than normal processes. • Linux implements real-time priorities in accordance with the POSIX standards on the matter.
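As a concrete user-space illustration of these two ranges, the sketch below lowers the calling process's nice value and then requests a real-time priority. The specific values (nice +10, real-time priority 50) are arbitrary examples, and sched_setscheduler() normally requires root or CAP_SYS_NICE.

    #include <sched.h>
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
            /* Be "nicer" to other processes: raise our nice value to +10. */
            if (setpriority(PRIO_PROCESS, 0, 10) == -1)
                    perror("setpriority");

            /* Ask for a real-time priority (1-99 for SCHED_FIFO). */
            struct sched_param sp = { .sched_priority = 50 };
            if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
                    perror("sched_setscheduler");

            return 0;
    }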
Timeslice • The timeslice is the numeric value that represents how long a task can run until it is preempted. • too short => large overhead of switching process • too long => poor interactive response • Linux's CFS scheduler does not directly assign timeslices to processes. • CFS assigns processes a proportion of the processor. • the amount of processor time that a process receives is a function of the load of the system
2.4 scheduler – SMP (figure: all CPUs share a single global run queue; an idle CPU searches the run queue and estimates the 'goodness' of each runnable task while the other CPUs stay busy)
2.4 scheduler • Non-preemptible kernel • Set p->need_resched if schedule() should be invoked at the 'next opportunity' (kernel => user mode). • Round-robin • task_struct->counter: number of clock ticks left to run in this scheduling slice, decremented by a timer.
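A minimal sketch of the per-tick accounting described above, close to what the 2.4 timer interrupt does for the running task (the wrapper function name here is illustrative, not a kernel symbol):

    /* Called once per clock tick for the currently running task. */
    static void tick_current_task(struct task_struct *p)
    {
            if (--p->counter <= 0) {
                    p->counter = 0;
                    p->need_resched = 1;  /* invoke schedule() at the next opportunity */
            }
    }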
2.4 scheduler • Check if schedule() was invoked from interrupt handler (due to a bug) and panic if so. • Use spin_lock_irq() to lock ‘runqueue_lock’ • Check if a task is ‘runnable’ • in TASK_RUNNING state • in TASK_INTERRUPTIBLE state and a signal is pending • Examine the ‘goodness’ of each process • Context switch
2.4 scheduler – ‘goodness’ • ‘goodness’: identifying the best candidate among all processes in the runqueue list. • ‘goodness’ = 0: the entity has exhausted its quantum. • 0 < ‘goodness’ < 1000: the entity is a conventional process/thread that has not exhausted its quantum; a higher value denotes a higher level of goodness.
2.4 scheduler – ‘goodness’ (to improve multithreading performance)

    if (p->mm == prev->mm)
            return p->counter + p->priority + 1;
    else
            return p->counter + p->priority;

• A small bonus (+1) is given to the task p if it shares its address space with the previously running task, because switching to it avoids reloading the page tables.
2.4 scheduler – SMP • Examine the processor field of each process and give a constant bonus (PROC_CHANGE_PENALTY, usually 15) to the process that last executed on the 'this_cpu' CPU.
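Putting the pieces from the last few slides together, here is a simplified sketch of how goodness() weighs a conventional task. It is not the exact 2.4 function (which also handles SCHED_YIELD and real-time policies, and writes the priority term as 20 - p->nice), but it follows the rules described above:

    static int goodness_sketch(struct task_struct *p, int this_cpu,
                               struct mm_struct *this_mm)
    {
            int weight = p->counter;               /* 0 => quantum exhausted */

            if (!weight)
                    return 0;
    #ifdef CONFIG_SMP
            if (p->processor == this_cpu)
                    weight += PROC_CHANGE_PENALTY; /* CPU-affinity bonus, usually 15 */
    #endif
            if (p->mm == this_mm)
                    weight += 1;                   /* same address space: cheap switch */
            return weight + p->priority;           /* higher value = better candidate */
    }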
Recalculating Timeslices (kernel 2.4) • Problems: • Recalculation can take a long time. Worse, it scales O(n) for n tasks on the system. • Recalculation must occur under some sort of lock protecting the task list and the individual process descriptors, which results in high lock contention. • The nondeterministic timing of the recalculation is a problem for deterministic real-time programs.
Processes classification • Definition: • I/O-bound processes: spend much of their time submitting and waiting on I/O requests • Processor-bound processes: spend much of their time executing code • Linux tends to favor I/O-bound processes, thus providing good process response time • How to classify processes?
(figure: tasks whose remaining time quantum has dropped to 0 are treated as CPU-bound; tasks that still have quantum left (tq ≠ 0) are treated as I/O-bound)
Scheduling policy • time_quantum_new = bonus_I/O + time_static • time_quantum_new = time_quantum_old / 2 + time_quantum_table[static_priority] • dynamic_priority ≈ time_quantum_new
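This "keep half of the unused quantum as an I/O bonus" rule is close to the recalculation loop in 2.4's schedule(), sketched below. The loop runs only when every runnable task has counter == 0, and it walks all tasks, which is where the O(n) cost comes from:

    struct task_struct *p;

    read_lock(&tasklist_lock);
    for_each_task(p)
            /* new quantum = half of the leftover quantum + the static share */
            p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
    read_unlock(&tasklist_lock);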
2.4 scheduler - performance • The algorithm does not scale well • It is inefficient to re-compute all dynamic priorities at once. • The predefined quantum is too large for high system loads (for example: a server) • I/O-bound process boosting strategy is not optimal • a good strategy to ensure a short response time for interactive programs, but… • some batch programs with almost no user interaction are I/O-bound.
2.6 scheduler (figure: one run queue per CPU; load balancing migrates tasks between run queues (put + pull))
2.6 scheduler – User Preemption • User preemption can occur • When returning to user-space from a system call • When returning to user-space from an interrupt handler
2.6 scheduler – Kernel Preemption • The Linux kernel is a fully preemptive kernel. • It is possible to preempt a task at any point, so long as the kernel is in a state in which it is safe to reschedule. • "safe to reschedule": the kernel does not hold a lock • The Linux design: • addition of a preemption counter, preempt_count, to each process's thread_info • This count increments once for each lock that is acquired and decrements once for each lock that is released • Kernel preemption can also occur explicitly, when a task in the kernel blocks or explicitly calls schedule(). • no additional logic is required to ensure that the kernel is in a state that is safe to preempt!
Kernel Preemption • Kernel preemption can occur • When an interrupt handler exits, before returning to kernel-space • When kernel code becomes preemptible again • If a task in the kernel explicitly calls schedule() • If a task in the kernel blocks (which results in a call to schedule())
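A minimal sketch of the check implied by the two slides above: preemption is allowed only when the task holds no locks (preempt_count == 0) and a reschedule has been requested. The function name is illustrative; in the real kernel this logic lives in the preempt/return paths rather than in a single helper.

    static void maybe_preempt(struct thread_info *ti)
    {
            /* preempt_count is bumped by every spin_lock() and dropped by
             * every spin_unlock(); zero means it is safe to reschedule. */
            if (ti->preempt_count == 0 && (ti->flags & _TIF_NEED_RESCHED))
                    schedule();
    }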
O(1) & CFS scheduler • 2.5 ~ 2.6.22: O(1) scheduler • Time complexity: O(1) • Using “run queue” (an active Q and an expired Q) to realize the ready queue • 2.6.23~present: Completely Fair Scheduler (CFS) • Time complexity: O(log n) • the ready queue is implemented as a red-black tree
O(1) scheduler • Implement fully O(1) scheduling. • Every algorithm in the new scheduler completes in constant time, regardless of the number of running processes (since the 2.5 kernel). • Implement perfect SMP scalability. • Each processor has its own locking and individual runqueue. • Implement improved SMP affinity. • Attempt to group tasks to a specific CPU and continue to run them there. • Only migrate tasks from one CPU to another to resolve imbalances in runqueue sizes. • Provide good interactive performance. • Even during considerable system load, the system should react and schedule interactive tasks immediately. • Provide fairness. • No process should find itself starved of timeslice for any reasonable amount of time. Likewise, no process should receive an unfairly high amount of timeslice. • Optimize for the common case of only one or two runnable processes, yet scale well to multiple processors, each with many processes.
The Priority Arrays • Each runqueue contains two priority arrays (defined in kernel/sched.c as struct prio_array) • Active array: all tasks with timeslice left. • Expired array: all tasks that have exhausted their timeslice. • Priority arrays provide O(1) scheduling. • Each priority array contains one queue of runnable processes per priority level. • The priority arrays also contain a priority bitmap used to efficiently discover the highest-priority runnable task in the system.
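A sketch of that per-runqueue structure, close to struct prio_array in the O(1) scheduler's kernel/sched.c (in the real code MAX_PRIO is 140 and BITMAP_SIZE is just enough longs to hold one bit per priority level):

    struct prio_array {
            int              nr_active;            /* number of runnable tasks in this array */
            unsigned long    bitmap[BITMAP_SIZE];  /* one bit per priority level             */
            struct list_head queue[MAX_PRIO];      /* one FIFO list per priority, 0-139      */
    };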
Each runqueue contains two priority arrays – active and expired. • Each of these priority arrays contains a list of tasks indexed according to priority. (figure: a runqueue with its active and expired arrays, each holding priority queues 0-139)
Linux assigns higher-priority tasks a longer timeslice. (figure: time quantum ≈ 1/priority)
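A heavily simplified sketch of that mapping from static priority to timeslice. The constant names (DEF_TIMESLICE, MIN_TIMESLICE, MAX_PRIO, MAX_USER_PRIO) come from the O(1) scheduler's sched.c, but the exact scaling changed across 2.6.x releases, so treat this only as an illustration of "higher priority => bigger slice":

    static unsigned int timeslice_sketch(int static_prio)
    {
            /* Linear map: a numerically lower static_prio (= higher priority)
             * yields a larger slice; clamp to a minimum value. */
            unsigned int slice = DEF_TIMESLICE * (MAX_PRIO - static_prio)
                                 / (MAX_USER_PRIO / 2);
            return slice < MIN_TIMESLICE ? MIN_TIMESLICE : slice;
    }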
Linux chooses the task with the highest priority from the active array for execution.
(figure: tasks queued at the same priority level in the active array are rotated round-robin)
Most tasks have dynamic priorities that are based on their "nice" value (static priority) plus or minus up to 5: dynPrio = staticPrio + bonus, with bonus in the range -5 ~ +5. • The bonus reflects how interactive a task is, which is estimated from its average sleep time: tasks that spend more time sleeping (I/O-bound tasks) get a larger priority boost, while CPU hogs are penalized. (figure: an I/O-bound task being promoted to a better priority queue in the active array)
When all tasks have exhausted their time slices, the two priority arrays are exchanged!
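That exchange is just a pointer swap, which is what keeps it O(1); a sketch close to the corresponding code in the O(1) scheduler's schedule():

    if (unlikely(!rq->active->nr_active)) {
            /* Active array is empty: swap it with the expired array. */
            struct prio_array *tmp = rq->active;

            rq->active  = rq->expired;
            rq->expired = tmp;
    }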
The O(1) scheduling algorithm • sched_find_first_bit() scans the priority bitmap to find the highest-priority non-empty queue. • Insert: O(1) • Remove: O(1) • Find first set bit: O(1)
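The pick-next step of the algorithm, sketched after the corresponding lines in the O(1) scheduler's schedule(): find the first set bit in the bitmap, index into the per-priority queues, and take the task at the head of that list.

    int idx;
    struct list_head *queue;
    struct task_struct *next;

    idx   = sched_find_first_bit(array->bitmap);   /* highest priority with runnable tasks */
    queue = array->queue + idx;                    /* its FIFO list                        */
    next  = list_entry(queue->next, struct task_struct, run_list);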
find first set bit O(1)

    /* __ffs(): return the index of the lowest set bit in 'word' by a
     * binary search over 32-, 16-, 8-, 4-, 2- and 1-bit chunks,
     * i.e. a constant number of steps regardless of the value. */
    static inline unsigned long __ffs(unsigned long word)
    {
            int num = 0;

    #if BITS_PER_LONG == 64
            if ((word & 0xffffffff) == 0) {
                    num += 32;
                    word >>= 32;
            }
    #endif
            if ((word & 0xffff) == 0) {
                    num += 16;
                    word >>= 16;
            }
            if ((word & 0xff) == 0) {
                    num += 8;
                    word >>= 8;
            }
            if ((word & 0xf) == 0) {
                    num += 4;
                    word >>= 4;
            }
            if ((word & 0x3) == 0) {
                    num += 2;
                    word >>= 2;
            }
            if ((word & 0x1) == 0)
                    num += 1;
            return num;
    }
2.6 scheduler – CFS • The inventor of CFS set himself the goal of devising a scheduler capable of fairly dividing the available CPU power among all tasks. • If one had an ideal multitasking computer capable of running N processes concurrently, every process would get exactly 1/N of the available CPU power.
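CFS approximates that ideal by tracking a per-task virtual runtime that advances more slowly for heavier (higher-priority) tasks, and by always running the task with the smallest vruntime, i.e., the leftmost node of the red-black tree mentioned earlier. The sketch below follows the field names of kernel/sched_fair.c (sched_entity, load.weight, NICE_0_LOAD) but is a simplification, not the actual kernel function:

    /* Charge 'delta_exec' nanoseconds of real CPU time to the entity,
     * scaled by its weight relative to a nice-0 task. */
    static void update_vruntime_sketch(struct sched_entity *se, u64 delta_exec)
    {
            se->vruntime += delta_exec * NICE_0_LOAD / se->load.weight;
    }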