Concurrent Programming

Concurrent Programming Introducing the principles of reentrancy, mutual exclusion and thread-synchronization

Advantages of multithreading • For multiprocessor systems (two or more CPUs), there are potential efficiencies in the parallel execution of separate threads (a computing job may be finished sooner) • For uniprocessor systems (just one CPU), there are likely software design benefits in dividing a complex job into simpler pieces (easier to debug and maintain -- or reuse)

Some Obstacles • Separate tasks need to coordinate actions, share data, and avoid competing for the same system resources • Management ‘overhead’ could seriously degrade the system’s overall efficiency • Examples: • Frequent task-switching is costly in CPU time • Busy-waiting is wasteful of system resources

Some ‘work-arounds’ • Instead of using ‘pipes’ for the exchange of data among separate processes, Linux lets ‘threads’ use the same address-space (reduces ‘overhead’ in context-switching) • Instead of requiring one thread to waste time busy-waiting while another finishes some particular action, Linux lets a thread voluntarily give up its control of the CPU

Additional pitfalls • Every thread needs some private memory that cannot be ‘trashed’ by another thread (for example, it needs a private stack for handling interrupts, passing arguments to functions, creating local variables, saving CPU register-values temporarily) • Each thread needs a way to prevent being interrupted in a ‘critical’ multi-stage action

Example of a ‘critical section’ • Recall disk-drive device-programming (status-register and control-register) • Algorithm: • (1) Loop, rereading the status-register until the drive is ‘ready’ • (2) Write the drive-command to the control-register • If an interrupt occurs between these steps, another thread can send its own command, so the drive may no longer be ‘ready’ when the first thread resumes and writes its command

‘mutual exclusion’ • To prevent one thread from ‘sabotaging’ the actions of another, some mechanism is needed that allows a thread to temporarily ‘block’ other threads from gaining control of the CPU -- until the first thread has completed its ‘critical’ action • Some ways to accomplish this: • Disable interrupts (stops CPU time-sharing) • Use a ‘mutex’ (a mutual exclusion variable) • Put other tasks to sleep (remove from run-queue)

What about ‘cli’? • Disabling interrupts will stop ‘time-sharing’ among tasks on a uniprocessor system • But it would be ‘unfair’ to allow this in a multi-user system (one task could monopolize the CPU) • So ‘cli’ is a privileged instruction: it cannot normally be executed by user-mode tasks • It won’t work on a multiprocessor system: ‘cli’ disables interrupts only on the CPU that executes it, so tasks on the other CPUs keep running

What about a ‘mutex’? • A shared global variable acts as a ‘lock’ • Initially it’s ‘unlocked’: e.g., int mutex = 1; • Before entering a ‘critical section’ of code, a task ‘locks’ the mutex: i.e., mutex = 0; • As soon as it leaves its ‘critical section’, it ‘unlocks’ the mutex: i.e., mutex = 1; • While the mutex is ‘locked’, no other task can enter the ‘critical section’ of code

Advantages and cautions • A mutex can be used in both uniprocessor and multiprocessor systems – provided it is possible for a CPU to ‘lock’ the mutex with a single ‘atomic’ instruction (requires special support by processors’ hardware) • Use of a mutex can introduce busy-waiting by tasks trying to enter the ‘critical section’ (thereby severely degrading performance)
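
The ‘atomic’ locking step described above can be sketched in portable C++, assuming std::atomic stands in for a hand-written machine instruction. The names lock_mutex(), unlock_mutex(), worker() and run_locked() are hypothetical, chosen only for this illustration. Note that the loop in lock_mutex() is exactly the busy-waiting the slide cautions about.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> mutex{1};   // 1 = unlocked, 0 = locked
int counter = 0;             // shared data protected by 'mutex'

// Hypothetical helper: spin until this thread is the one that locks the mutex.
void lock_mutex()
{
    // exchange() atomically stores 0 and returns the previous value;
    // only the thread that gets back 1 found the mutex unlocked.
    while (mutex.exchange(0, std::memory_order_acquire) != 1)
        ;   // busy-wait
}

// Hypothetical helper: unlock the mutex again.
void unlock_mutex()
{
    mutex.store(1, std::memory_order_release);
}

// Each thread increments the shared counter 'maximum' times.
void worker(int maximum)
{
    for (int i = 0; i < maximum; i++) {
        lock_mutex();
        counter += 1;        // the 'critical section'
        unlock_mutex();
    }
}

// Runs 'nthreads' workers and returns the final total.
int run_locked(int nthreads, int maximum)
{
    assert(nthreads > 0 && maximum > 0);
    counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; t++)
        threads.emplace_back(worker, maximum);
    for (auto &th : threads)
        th.join();
    return counter;
}
```

Calling run_locked(4, 50000) from main() should return 200000 on both uniprocessor and multiprocessor systems, because the test and the update of the mutex happen as one indivisible step.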

Software mechanism • The operating system can assist threads needing mutual exclusion, simply by not scheduling other threads that might want to enter the same ‘critical section’ of code • Linux accomplishes this by implementing ‘wait-queues’ for those threads that are all contending for access to the same system resource – including ‘critical sections’
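
A rough user-space analogue, assuming the standard std::mutex: on Linux it is typically built on a pthread mutex, which lets the kernel put a contending thread to sleep instead of letting it spin. This is only a sketch of the idea, not the kernel's own wait-queue code, and the names counter_lock, worker() and run_blocking() are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::mutex counter_lock;   // hypothetical name; guards 'counter'
int counter = 0;

// Each thread increments the shared counter 'maximum' times.
void worker(int maximum)
{
    for (int i = 0; i < maximum; i++) {
        // If another thread holds the lock, this thread may be blocked
        // (not scheduled) until the lock is released.
        std::lock_guard<std::mutex> guard(counter_lock);
        counter += 1;      // the 'critical section'
    }                      // lock released when 'guard' goes out of scope
}

// Runs 'nthreads' workers and returns the final total.
int run_blocking(int nthreads, int maximum)
{
    assert(nthreads > 0 && maximum > 0);
    counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; t++)
        threads.emplace_back(worker, maximum);
    for (auto &th : threads)
        th.join();
    return counter;
}
```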

Demo programs • To show why ‘synchronization’ is needed in multithreaded programs, we wrote the ‘concur1.cpp’ demo-program • Here several separate threads will all try to increment a shared ‘counter’ – but without any mechanism for doing synchronization • The result is unpredictable – the total can differ each time the program runs!
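
The ‘concur1.cpp’ source is not reproduced here; what follows is a minimal sketch of the same kind of experiment, with hypothetical names (run_unsynchronized() and its parameters). It assumes std::atomic for the counter so that each single load and store is well-defined in C++, while the read-add-write sequence as a whole remains unsynchronized.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared 'counter'.
// Individual loads and stores are atomic, but the three-step
// increment below is NOT, so updates can be lost.
std::atomic<int> counter{0};

void my_thread(int maximum)
{
    for (int i = 0; i < maximum; i++) {
        int temp = counter.load(std::memory_order_relaxed);
        temp += 1;   // another thread may change 'counter' right here
        counter.store(temp, std::memory_order_relaxed);
    }
}

// Runs 'nthreads' threads and returns the final total.
int run_unsynchronized(int nthreads, int maximum)
{
    assert(nthreads > 0 && maximum > 0);
    counter.store(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; t++)
        threads.emplace_back(my_thread, maximum);
    for (auto &th : threads)
        th.join();
    return counter.load();
}
```

With one thread the total always equals ‘maximum’. With several threads the total can never exceed nthreads * maximum, but it may fall short by a different amount on each run, since one thread's store can overwrite increments made by another.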

How to employ a ‘mutex’ • Declare a global variable: int mutex = 1; • Define a pair of shared subroutines • void enter_critical_section( void ); • void leave_critical_section( void ); • Insert calls to these subroutines before and after accessing the global ‘counter’

Special x86 instructions • We need to use x86 assembly-language (to implement ‘atomic’ mutex-operations) • Several instruction-choices are possible, but ‘btr’ and ‘bts’ are simplest to use: • ‘btr’ means ‘bit-test-and-reset’ • ‘bts’ means ‘bit-test-and-set’ • Syntax and semantics: • asm(" btr $0, mutex "); // acquire the mutex: CF = bit 0, then bit 0 is cleared • asm(" bts $0, mutex "); // release the mutex: CF = bit 0, then bit 0 is set

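The two mutex-functions

    void enter_critical_section( void )
    {
        // 'btr' copies bit 0 of mutex to CF and clears it; retry while CF was 0
        // (both instructions stay in one asm statement so nothing separates them)
        asm( "spin:  btr  $0, mutex  \n"
             "       jnc  spin       " );
    }

    void leave_critical_section( void )
    {
        // 'bts' sets bit 0 of mutex again, unlocking it
        asm( " bts  $0, mutex " );
    }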

Where to use the functions

    void my_thread( int * data )
    {
        int i, temp;

        for (i = 0; i < maximum; i++)
        {
            enter_critical_section();
            temp = counter;
            temp += 1;
            counter = temp;
            leave_critical_section();
        }
    }

‘reentrancy’ • As an aside, our ‘my_thread()’ function (on the previous slide) is an example of ‘reentrant’ code • More than one process (or processor) can be safely executing it concurrently • It needs to obey two cardinal rules: • It contains no ‘self-modifying’ instructions • Access to shared variables is ‘exclusive’

In-class exercise #1 • Rewrite the ‘concur1.cpp’ demo-program, as ‘concur2.cpp’, inserting these functions that will implement ‘mutual exclusion’ for our thread’s ‘critical section’ • Then try running your ‘concur2.cpp’ on a uniprocessor system (your workstation) • Also try running your ‘concur2.cpp’ on a multiprocessor system (e.g., dept server)

The x86 ‘lock’ prefix • In order for the ‘btr’ instruction to perform an ‘atomic’ update (when multiple CPUs are using the same bus to access memory simultaneously), it is necessary to insert an x86 ‘lock’ prefix, like this: asm(" spin: lock btr $0, mutex "); • This prefix ‘locks’ the shared system-bus while the instruction executes -- so another CPU cannot intervene
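
For readers who want to try the ‘lock’ prefix on a current compiler, here is a hedged sketch that assumes GCC or Clang on an x86 target. It uses the extended-asm form with a "+m" memory operand and a numeric local label instead of the named symbol and ‘spin’ label shown on the slides; that choice, and the names worker-count parameters and run_with_lock_prefix(), are ours for illustration.

```cpp
#include <cassert>
#include <thread>
#include <vector>

#if !defined(__i386__) && !defined(__x86_64__)
#error "this sketch uses x86 instructions"
#endif

int mutex = 1;      // bit 0 set = unlocked, bit 0 clear = locked
int counter = 0;    // shared data protected by 'mutex'

void enter_critical_section()
{
    asm volatile(
        "1: lock btrl $0, %0 \n\t"   // CF = old bit 0; bit 0 cleared atomically
        "   jnc  1b          \n\t"   // CF == 0: mutex was already locked, retry
        : "+m"(mutex)
        :
        : "cc", "memory");           // "memory" stops the compiler caching 'counter'
}

void leave_critical_section()
{
    // set bit 0 again to unlock the mutex
    asm volatile("lock btsl $0, %0" : "+m"(mutex) : : "cc", "memory");
}

void my_thread(int maximum)
{
    for (int i = 0; i < maximum; i++) {
        enter_critical_section();
        int temp = counter;
        temp += 1;
        counter = temp;
        leave_critical_section();
    }
}

// Hypothetical driver: runs 'nthreads' threads and returns the final total.
int run_with_lock_prefix(int nthreads, int maximum)
{
    assert(nthreads > 0 && maximum > 0);
    counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; t++)
        threads.emplace_back(my_thread, maximum);
    for (auto &th : threads)
        th.join();
    return counter;
}
```

With the prefix in place, run_with_lock_prefix(4, 50000) should return 200000 even when the threads run on separate CPUs.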

In-class exercise #2 • Add the ‘lock’ prefix to your ‘concur2.cpp’ demo, and then try executing it again on the multiprocessor system • Use the Linux ‘time’ command to measure how long it takes for your demo to finish • Observe the ‘degraded’ performance due to adding the ‘mutex’ functions – penalty for achieving a ‘correct’ parallel program

The ‘nanosleep()’ system-call • Your multithreaded demo-program shows poor performance because your threads are doing lots of ‘busy-waiting’ • When a thread can’t acquire the mutex, it should voluntarily give up control of the CPU (so another thread can do real work) • The Linux ‘nanosleep()’ system-call allows a thread to ‘yield’ its time-slice
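
The idea can be sketched as follows, assuming the POSIX declaration of nanosleep() in <time.h> and a std::atomic lock variable in place of the assembly-language version. The ‘yielding.cpp’ demo is not reproduced here, so the header choice, the 1-nanosecond request, and the name run_yielding() are assumptions made for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <time.h>    // nanosleep(), struct timespec (POSIX)
#include <vector>

std::atomic<int> mutex{1};   // 1 = unlocked, 0 = locked
int counter = 0;             // shared data protected by 'mutex'

void enter_critical_section()
{
    // Request the shortest possible sleep; the kernel will normally
    // run some other ready thread in the meantime.
    const struct timespec pause = {0, 1};   // 0 seconds, 1 nanosecond

    while (mutex.exchange(0, std::memory_order_acquire) != 1)
        nanosleep(&pause, nullptr);   // give up the CPU instead of spinning
}

void leave_critical_section()
{
    mutex.store(1, std::memory_order_release);
}

void my_thread(int maximum)
{
    for (int i = 0; i < maximum; i++) {
        enter_critical_section();
        counter += 1;
        leave_critical_section();
    }
}

// Hypothetical driver: runs 'nthreads' threads and returns the final total.
int run_yielding(int nthreads, int maximum)
{
    assert(nthreads > 0 && maximum > 0);
    counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; t++)
        threads.emplace_back(my_thread, maximum);
    for (auto &th : threads)
        th.join();
    return counter;
}
```

The total is the same as with the spinning version; what changes is how the waiting time is spent, which is what the ‘time’ command can reveal.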

In-class exercise #3 • Revise your ‘concur2.cpp’ program, as ‘concur3.cpp’, so that a thread will ‘yield’ if it cannot immediately acquire the mutex (see our ‘yielding.cpp’ demo for header-files and call-syntax) • Use the Linux ‘time’ command to compare the performance of ‘concur3’ and ‘concur2’ • On a uniprocessor platform • On a multiprocessor platform
