Multiprocessors, Threads and Microkernels Fred Kuhns
Motivation for Multiprocessors • Enhanced Performance - • Concurrent execution of tasks for increased throughput (between processes) • Exploit concurrency within tasks (parallelism within a process) • Fault Tolerance - • graceful degradation in the face of failures
Basic MP Architectures • Single Instruction Single Data (SISD) • conventional uniprocessor designs. • Single Instruction Multiple Data (SIMD) • Vector and Array Processors • Multiple Instruction Single Data (MISD) • Not Implemented. • Multiple Instruction Multiple Data (MIMD) • conventional MP designs
MIMD Classifications • Tightly Coupled System - all processors share the same global memory and have the same address space (typical SMP system). • Main memory is used for IPC and synchronization. • Loosely Coupled System - memory is partitioned and attached to each processor: hypercube, clusters (multi-computer). • Message passing is used for IPC and synchronization.
MP Block Diagram
[Figure: multiple CPUs, each with its own cache and MMU, connected through an interconnection network to multiple main memory (MM) modules]
Memory Access Schemes • Uniform Memory Access (UMA) • Centrally located • All processors are equidistant (access times) • Non-Uniform Memory Access (NUMA) • physically partitioned but accessible by all • processors have the same address space • NO Remote Memory Access (NORMA) • physically partitioned, not accessible by all • processors have their own address space
Other Details of MP • Interconnection technology • Bus • Cross-Bar switch • Multistage Interconnect Network • Caching - Cache Coherence Problem! • Write-update • Write-invalidate • bus snooping
MP OS Structure - 1 • Separate Supervisor - • all processors have their own copy of the kernel • some share data for interaction • dedicated I/O devices and file systems • good fault tolerance but bad for concurrency • Master/Slave Configuration • Master: monitors status and assigns work • Slaves: schedulable pool of resources • master can be a bottleneck • poor fault tolerance
MP OS Structure - 2 • Symmetric Configuration - Most Flexible. • all processors are autonomous and treated equally • one copy of the kernel executed concurrently across all processors • Synchronized access to shared data structures: • Lock entire OS - Floating Master • Mitigated by dividing the OS into segments that normally have little interaction • multithreaded kernel with controlled access to resources (a continuum)
MP Overview
[Figure: taxonomy - MultiProcessor divides into SIMD and MIMD; MIMD divides into Shared Memory (tightly coupled) and Distributed Memory (loosely coupled); Shared Memory systems are Symmetric (SMP) or Master/Slave; Distributed Memory systems are Clusters]
SMP OS Design Issues • Threads - effectiveness of parallelism depends on performance of primitives used to express and control concurrency. • Process Synchronization - disabling interrupts is not sufficient. • Process Scheduling - efficient, policy controlled, task scheduling. Issues: • Global versus Local (per CPU) • Task affinity for a particular CPU • resource accounting • inter-thread dependencies
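One of the scheduling issues above is task affinity for a particular CPU. As a hedged, Linux-specific illustration (not part of the original slides), sched_setaffinity() can pin the calling task to a single processor:

```c
/* Minimal sketch of per-CPU task affinity on Linux (illustrative only). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);                       /* allow CPU 0 only */

    /* pid 0 means "the calling task" */
    if (sched_setaffinity(0, sizeof(set), &set) == 0)
        printf("pinned to CPU 0\n");
    return 0;
}
```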
SMP OS design issues - cont. • Memory Management - complications of shared main memory. • cache coherence • memory access synchronization • balancing overhead with increased concurrency • Reliability and Fault Tolerance - degrade gracefully in the event of failures
Typical SMP System
[Figure: four 500MHz CPUs, each with its own cache and MMU, sharing a system/memory bus to main memory (~50ns access); a bridge connects the memory bus to the I/O subsystem with SCSI, Ethernet, video, interrupt logic, and system functions (timer, BIOS, reset)]
• Issues: memory contention, limited bus bandwidth, I/O contention, cache coherence
• Typical I/O bus: 33MHz/32bit (132MB/s), 66MHz/64bit (528MB/s)
Some Useful Definitions • Parallelism: degree to which a multiprocessor application achieves parallel execution • Concurrency: Maximum parallelism an application can achieve with unlimited processors • System Concurrency: kernel recognizes multiple threads of control in a program • User Concurrency: User space threads (coroutines) provide a natural programming model for concurrent applications.
Introduction to Threads
[Figure: single-threaded vs multithreaded process models - a single-threaded process has one process control block, one user address space, one user stack, and one kernel stack; a multithreaded process has one process control block and a shared user address space, with a thread control block, user stack, and kernel stack per thread]
Process Concept Embodies • Unit of Resource Ownership - process is allocated a virtual address space to hold the process image • Unit of Dispatching - process is an execution path through one or more programs • execution may be interleaved with other processes • These two characteristics are treated independently by the operating system
Threads • Effectiveness of parallel computing depends on the performance of the primitives used to express and control parallelism • Separate the notion of execution from the Process abstraction • Useful for expressing the intrinsic concurrency of a program regardless of resulting performance • We will discuss three examples of threading: • User threads, • Kernel threads and • Scheduler Activations
Threads cont. • Thread : Dynamic object representing an execution path and computational state. • One or more threads per process, each having: • Execution state (running, ready, etc.) • Saved thread context when not running • Execution stack • Per-thread static storage for local variables • Shared access to process resources • all threads of a process share a common address space.
Thread States • Primary states: • Running, Ready and Blocked. • Operations to change state: • Spawn: new thread provided register context and stack pointer. • Block: event wait, save user registers, PC and stack pointer • Unblock: moved to ready state • Finish: deallocate register context and stacks.
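The state model just listed can be made concrete with a toy sketch (not from the slides; the names tcb, dispatch, etc. are illustrative) showing a minimal thread control block and the transitions between Ready, Running, and Blocked:

```c
/* Toy illustration of the thread state model above. */
#include <stdio.h>

typedef enum { READY, RUNNING, BLOCKED, FINISHED } thread_state;

typedef struct {
    int          id;
    thread_state state;
} tcb;                                     /* minimal "thread control block" */

static void dispatch(tcb *t) { t->state = RUNNING;  }  /* ready -> running  */
static void block(tcb *t)    { t->state = BLOCKED;  }  /* event wait        */
static void unblock(tcb *t)  { t->state = READY;    }  /* event occurred    */
static void finish(tcb *t)   { t->state = FINISHED; }  /* deallocate state  */

int main(void)
{
    tcb t = { 1, READY };                  /* spawn: new thread starts ready */
    dispatch(&t); block(&t); unblock(&t); dispatch(&t); finish(&t);
    printf("thread %d final state %d\n", t.id, t.state);
    return 0;
}
```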
User Level Threads • User level threads - supported by user level threads libraries • Examples • POSIX Pthreads, Mach C-threads, Solaris threads • Benefits: • no modifications required to the kernel • flexible and low cost • Drawbacks: • cannot block without blocking the entire process • no parallelism (not recognized by the kernel)
Kernel Level Threads • Kernel level threads - directly supported by the kernel; the thread is the basic scheduling entity • Examples: • Windows 95/98/NT/2000, Solaris, Tru64 UNIX, BeOS, Linux • Benefits: • coordination between scheduling and synchronization • less overhead than a process • suitable for parallel applications • Drawbacks: • more expensive than user-level threads • generality leads to greater overhead
Scheduler Activations • Attempt to combine the benefits of both user and kernel threading support • a blocking system call should not block the whole process • the user space library should make scheduling decisions • efficiency by avoiding unnecessary user/kernel mode switches • Kernel assigns a set of virtual processors to each process. The user library then schedules threads on these virtual processors.
Scheduler Activations • An activation: • execution context for running a thread • space for the kernel to save the processor context of the current user thread when it is stopped by the kernel • Kernel passes a new activation to the library when an upcall is performed. • Library schedules user threads on activations. • An upcall is performed when one of the following occurs: • a user thread performs a blocking system call • a blocked thread belonging to the process unblocks; the library is notified, allowing it to either schedule a new thread or resume the preempted thread.
Pthreads • a POSIX standard (IEEE 1003.1c) API for thread creation and synchronization. • API specifies the behavior of the thread library; the implementation is left to the developers of the library. • Common in UNIX operating systems.
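A minimal sketch of the Pthreads API just described (the names worker and NWORKERS are illustrative, not part of the standard); it creates a few threads and joins them, and builds with `cc -pthread`:

```c
/* Create NWORKERS threads and wait for each to finish. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4

static void *worker(void *arg)
{
    long id = (long)arg;
    printf("worker %ld running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tid[NWORKERS];

    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);

    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);        /* wait for each worker to exit */

    return 0;
}
```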
UNIX Support for Threading • BSD: • process model only. 4.4 BSD enhancements. • Solaris • user threads, kernel threads, LWPs and, in 2.6, Scheduler Activations • Mach • kernel threads and tasks. Thread libraries provide the semantics of user threads, LWPs and kernel threads. • Digital UNIX - extends Mach to provide the usual UNIX semantics. • Pthreads library.
Solaris Threads • Supports: • user threads (uthreads) via libthread and libpthread • LWPs, an abstraction that acts as a virtual CPU for user threads. • An LWP is bound to a kthread. • kernel threads (kthreads): every LWP is associated with one kthread, but a kthread may not have an LWP • interrupts as threads
Solaris kthreads • Fundamental scheduling/dispatching object • all kthreads share the same virtual address space (the kernel's) - cheap context switch • System threads - examples: STREAMS, callout • kthread_t, /usr/include/sys/thread.h • scheduling info, pointers for scheduler or sleep queues, pointer to klwp_t and proc_t
Solaris LWP • Kernel-provided mechanism to allow both user and kernel thread implementations on one platform. • Bound to a kthread • LWP data (see /usr/include/sys/klwp.h) • user-level registers, system call params, resource usage, pointer to kthread_t and proc_t • All LWPs in a process share: • signal handlers • Each may have its own • signal mask • alternate stack for signal handling • No global name space for LWPs
Solaris User Threads • Implemented in user libraries • library provides synchronization and scheduling facilities • threads may be bound to LWPs • unbound threads compete for available LWPs • Manages thread-specific info • thread id, saved register state, user stack, signal mask, priority*, thread local storage • Solaris provides two libraries: libthread and libpthread. • Try man thread or man pthreads
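A sketch of the native libthread interface mentioned above, assuming the Solaris thr_create()/thr_join() API (it compiles only on Solaris with -lthread; the function name run is illustrative). THR_BOUND requests a thread permanently bound to its own LWP, while the default is an unbound thread multiplexed over the available LWPs:

```c
/* Create one unbound and one bound Solaris thread, then join both. */
#include <thread.h>
#include <stdio.h>

static void *run(void *arg)
{
    printf("thread %u running\n", (unsigned)thr_self());
    return NULL;
}

int main(void)
{
    thread_t unbound, bound;

    /* Unbound: the library schedules it onto any available LWP. */
    thr_create(NULL, 0, run, NULL, 0, &unbound);

    /* Bound: THR_BOUND gives this thread its own dedicated LWP. */
    thr_create(NULL, 0, run, NULL, THR_BOUND, &bound);

    thr_join(unbound, NULL, NULL);
    thr_join(bound, NULL, NULL);
    return 0;
}
```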
Solaris Thread Data Structures
[Figure: linkage between the structures - proc_t points to its kthread list via p_tlist; each kthread_t points back to the process via t_procp, to its LWP via t_lwp, and to the next kthread via t_forw; klwp_t points back to its kthread via lwp_thread and to the process via lwp_procp]
Solaris Threading Model (Combined)
[Figure: user threads in Process 1 and Process 2 are multiplexed onto LWPs (L); each LWP is backed by a kernel thread; kernel threads, including interrupt kthreads, are dispatched by the kernel onto the processors (P)]
Solaris User Level Threads
[Figure: user thread state diagram - states Runnable, Active, Sleeping, and Stopped; Dispatch moves Runnable to Active, Preempt moves Active back to Runnable, Sleep moves Active to Sleeping, Wakeup moves Sleeping to Runnable, Stop moves a thread to Stopped, Continue returns a Stopped thread to Runnable]
Solaris Lightweight Processes
[Figure: LWP state diagram - states Runnable, Running, Blocked, and Stopped; Dispatch moves Runnable to Running, Timeslice or Preempt moves Running back to Runnable, a Blocking System Call moves Running to Blocked, Wakeup moves Blocked to Runnable, Stop moves an LWP to Stopped, Continue returns a Stopped LWP to Runnable]
Solaris Interrupts • One system wide clock kthread • pool of 9 partially initialized kthreads per CPU for interrupts • interrupt thread can block • interrupted thread is pinned to the CPU
Solaris Signals and Fork • Signals are divided into traps (synchronous) and interrupts (asynchronous) • each thread has its own signal mask; the set of signal handlers is global • Each LWP can specify an alternate stack • fork replicates all LWPs • fork1 replicates only the invoking LWP/thread
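A hedged, Solaris-only sketch of the fork/fork1 distinction above (fork1() is declared in <unistd.h> on Solaris; this does not compile elsewhere):

```c
/* Contrast fork() and fork1() in a multithreaded Solaris process. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* fork(): with libthread semantics the child receives copies of
       all of the parent's LWPs/threads. */
    if (fork() == 0) {
        printf("child of fork(), pid %d\n", (int)getpid());
        _exit(0);
    }

    /* fork1(): the child receives only the LWP/thread that called it. */
    if (fork1() == 0) {
        printf("child of fork1(), pid %d\n", (int)getpid());
        _exit(0);
    }
    return 0;
}
```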
Mach • Two abstractions: • Task - static object: an address space and system resources called port rights. • Thread - fundamental execution unit; runs in the context of a task. • Zero or more threads per task, • kernel schedulable • kernel stack • computational state • Processor sets - available processors divided into non-intersecting sets. • permits dedicating processor sets to tasks
Mach C-thread Implementations • Coroutine-based - multiple user threads multiplexed onto a single-threaded task • Thread-based - one-to-one mapping from C-threads to Mach threads. Default. • Task-based - one Mach task per C-thread.
Digital UNIX • Based on the Mach 2.5 kernel • Provides the complete UNIX programmer's interface • 4.3BSD code and ULTRIX code ported to Mach • u-area replaced by utask and uthread • proc structure retained • Threads: • Signals divided into synchronous and asynchronous • global signal mask • each thread can define its own handlers for synchronous signals • global handlers for asynchronous signals
Windows 2000 Threads • Implements the one-to-one mapping. • Each thread contains: • a thread id • a register set • separate user and kernel stacks • a private data storage area
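A minimal sketch of that one-to-one model using the Win32 API (the name worker is illustrative); CreateThread produces a kernel-scheduled thread, and WaitForSingleObject plays the role of a join:

```c
/* Create one Win32 thread and wait for it to exit. */
#include <windows.h>
#include <stdio.h>

static DWORD WINAPI worker(LPVOID arg)
{
    printf("thread %lu running\n", GetCurrentThreadId());
    return 0;
}

int main(void)
{
    HANDLE h = CreateThread(NULL, 0, worker, NULL, 0, NULL);

    WaitForSingleObject(h, INFINITE);      /* "join": wait for thread exit */
    CloseHandle(h);
    return 0;
}
```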
Linux Threads • Linux refers to them as tasks rather than threads. • Thread creation is done through the clone() system call. • clone() allows a child task to share the address space of the parent task (process)
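A sketch of task creation with clone(); passing CLONE_VM makes the child share the parent's address space, which is what gives Linux tasks thread-like behavior (STACK_SIZE and child_fn are illustrative names, and the child stack is passed as its top address because stacks grow downward):

```c
/* Create a child task that shares the parent's address space. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

static int child_fn(void *arg)
{
    printf("child task running in the shared address space\n");
    return 0;
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);

    /* SIGCHLD is delivered on exit so waitpid() can reap the child. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_VM | SIGCHLD, NULL);

    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}
```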
4.4 BSD UNIX • Initial support for threads implemented but not enabled in distribution • Proc structure and u-area reorganized • All threads have a unique ID • How are the proc and u areas reorganized to support threads?
Microkernel • Transition to Microkernel discussion
Microkernel • Small operating system core • Contains only essential operating systems functions • Many services traditionally included in the operating system are now external subsystems • device drivers • file systems • virtual memory manager • windowing system and security services
Microkernel Benefits • Portability • isolate port specific code to microkernel • Reliability • modular design, small microkernel, simpler validation • Uniform interface • all services are provided by means of message passing • Extensibility • allows the addition of new services
Microkernel Benefits • Flexibility • existing features can be subtracted • Distributed system support • message are sent without knowing what the target machine is or where it is located • Object-oriented operating system • components are objects with clearly defined interfaces that can be interconnected to form software
Microkernel Design • Primitive memory management • mapping each virtual page to a physical page frame: grant, map and flush. • Inter-process communication • I/O and interrupt management