This chapter provides an in-depth exploration of memory hardware organization and various memory-management techniques, including paging and segmentation. It also discusses the Intel Pentium's support for pure segmentation and segmentation with paging. Key concepts covered are background on memory access, address binding methods, logical vs. physical address space, MMU operation, dynamic loading, dynamic linking, and swapping.
Chapter 8: Memory Management • Background • Swapping • Contiguous Memory Allocation • Paging • Structure of the Page Table • Segmentation • Example: The Intel Pentium
Objectives • To provide a detailed description of various ways of organizing memory hardware • To discuss various memory-management techniques, including paging and segmentation • To provide a detailed description of the Intel Pentium, which supports both pure segmentation and segmentation with paging
Background • Program must be brought from disk into main memory and placed within a process for it to be run • Main memory and registers are only storage CPU can access directly • Registers internal to CPU, accessed in one CPU cycle • Main memory accessed through memory bus, can take many CPU cycles • Causes memory stalls • Cache sits between main memory and CPU registers • Faster access, compensates for speed difference between CPU and memory • Accurate memory management and hardware-enforced memory protection are essential to safe multitasking
Background • Each process has its own memory space • Defined by base register and limit register • Hardware-enforced protection: CPU checks every memory access the process makes to ensure it's in the process's space • Generates interrupt on illegal memory access • Only kernel-level privilege can supersede this for: • Operations to load/change the base & limit register addresses • Accessing another process's memory space
Address Binding • Memory starts at 0x0, but program can start from any base address • Program code uses symbolic addresses (like function and variable names) • Compiler binds them to relocatable addresses (fixed addresses relative to the start of the program’s memory space) • Linker or loader binds relocatable addresses to fixed physical addresses in memory
Address Binding • Compile time • If memory location known a priori, absolute code can be generated • Must recompile code if starting location changes • Load time • Compiler generates relocatable code if memory location is not known at compile time • Can be loaded at any starting location without recompiling, but must be reloaded if starting location changes during execution • Execution time • Binding delayed until run time if the process can be moved during its execution from one memory segment to another • Approach used by most OSes
Logical vs. Physical Address Space • We need to differentiate two different kinds of memory addresses • A memory address in a program, as seen by the CPU, is a logical address, also called a virtual address • The corresponding address in physical memory is a physical address • Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme
Memory-Management Unit (MMU) • Run-time mapping from virtual to physical memory address done by special hardware: the MMU • Can see it as an interface hardware between CPU and memory • In reality, it’s part of the CPU (has been part of Intel CPU since 80286) • MMU adds the value in a relocation register (i.e. the base address) to every address generated by a user process at the time it is sent to memory • The user program only deals with logical addresses; it never sees the real physical addresses
Dynamic Loading • A routine is not loaded until it is called • Calling routine checks if called routine is in memory; loads it if not • Until then, routine kept on disk in relocatable code format • Better memory-space utilization; an unused routine is never loaded • Useful when large amounts of code are needed to handle infrequently occurring cases • Operating system provides a dynamic loading library • User program's responsibility to take advantage of dynamic loading
Dynamic Linking • Linking can be postponed until execution time • Dynamically linked libraries (*.DLL in Windows) • At compile time, a stub code is included in the binary instead of the routine; used to locate the appropriate library routine • At run time, the stub replaces itself with the address of the routine in the DLL, and executes the routine • Very advantageous • Only one copy of common routines • Saves disk/memory space, makes updates easier • Since a routine can be loaded in another process's protected memory space, kernel support is needed to check if the routine is in memory
Swapping • A process can be swapped temporarily out of memory to a backing store (typically, hard drive), and then brought back into memory for continued execution • Roll out, roll in: swapping variant used for priority-based scheduling algorithms; lower-priority process is swapped out so higher-priority process can be loaded and executed • Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped • System maintains a special ready queue of runnable processes which have memory images on disk
Memory Protection • Several user processes loaded into memory at once • Registers used to protect user processes from each other, and from changing operating-system size due to transient code • Limit register contains range of logical addresses; each logical address must be less than the limit register or interrupt is generated • Relocation register contains value of base address in memory • MMU maps logical address to physical address dynamically
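The limit check and relocation described above can be sketched in a few lines; the register values below are illustrative assumptions, not values from any real system:

```python
# Sketch of the MMU's base/limit protection on every memory access.
# LIMIT and RELOCATION are assumed example values.

LIMIT = 0x3000        # size of the process's logical address space
RELOCATION = 0x14000  # base physical address where the process is loaded

def translate(logical_addr):
    """Mimic the MMU: trap on an out-of-range address,
    otherwise relocate it by adding the base."""
    if logical_addr >= LIMIT:
        raise MemoryError("trap: addressing error (illegal access)")
    return RELOCATION + logical_addr

print(hex(translate(0x0042)))  # 0x14042
```

Any logical address at or beyond the limit raises the trap before it ever reaches physical memory.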
Contiguous Allocation • Multiple Partition Method • Memory divided into fixed-size partitions; each partition contains exactly one process • When a process terminates, its partition is freed and allocated to a new process • One of the earliest memory allocation techniques, used in IBM OS/360 (circa 1964) • Clearly limited • Limits multitasking to the number of partitions • Limits size of processes to size of partitions • (Figure: fixed partitions holding the OS and processes 5, 8, 9, and 2)
Contiguous Allocation • Variable Partition Scheme • OS keeps track of which parts of memory are occupied/free • Initially memory is empty; as processes are added and then removed, holes of available memory of various sizes form • When a process is added to memory, it is allocated memory from a hole large enough to accommodate it • If there isn't one, OS can wait (reduce multitasking) or allocate a smaller process (risk starvation) • This is dynamic allocation • (Figure: memory snapshots as processes 8, 9, and 10 come and go around processes 5 and 2, leaving holes)
Contiguous Allocation • Dynamic allocation problem: Given several possible holes of different sizes, how to pick the one to allocate process into? • First-fit: Allocate the first hole that is big enough • Fastest method • Best-fit: Allocate the smallest hole that is big enough • Must search entire list (unless ordered by size) • Produces the smallest leftover hole (least memory wasted) • Worst-fit: Allocate the largest hole • Must search entire list (unless ordered by size) • Produces the largest leftover hole (which can then be allocated to another process) • Good idea, but typically the worst of the three methods both in speed and memory use
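The three strategies can be sketched over a free-hole list; the `holes` list of `(start, size)` pairs below is an assumed example:

```python
# Hole-selection strategies for dynamic allocation.
# Each returns the index of the chosen hole, or None if no hole fits.

def first_fit(holes, request):
    for i, (start, size) in enumerate(holes):
        if size >= request:
            return i            # first hole big enough
    return None

def best_fit(holes, request):
    fits = [(size, i) for i, (start, size) in enumerate(holes) if size >= request]
    return min(fits)[1] if fits else None   # smallest sufficient hole

def worst_fit(holes, request):
    fits = [(size, i) for i, (start, size) in enumerate(holes) if size >= request]
    return max(fits)[1] if fits else None   # largest hole

holes = [(0, 100), (200, 500), (800, 300)]
print(first_fit(holes, 250))  # 1 -- first hole of size >= 250
print(best_fit(holes, 250))   # 2 -- 300 is the smallest sufficient hole
print(worst_fit(holes, 250))  # 1 -- 500 is the largest hole
```

Note that best-fit and worst-fit both scan the whole list, matching the search cost described above.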
Fragmentation • External Fragmentation • Total memory space exists to satisfy a request, but it is not contiguous • On average, a third of memory lost to fragmentation • Internal Fragmentation • Typically, memory is divided into blocks and allocated by block, rather than byte • Allocated blocks may be larger than requested memory (i.e. last block not entirely needed); the extra memory in the block is lost • Unused memory internal to a partition • Reduce external fragmentation by compaction • Shuffle memory contents to place all free memory together in one large hole • Done at execution time; only possible if dynamic relocation is allowed • Can be a very expensive operation • (Figure: memory layout before compaction)
Paging • Non-contiguous allocation eliminates the problems of external fragmentation and compaction; also makes swapping easier • Paging • Supported by hardware • Physical memory divided into fixed-size blocks called frames (size is a power of 2, between 512 bytes and 16 MB) • Logical memory divided into blocks of the same size called pages • OS keeps track of all free frames • To run a program of size n pages, need to find n free frames and load program • Page table used to translate logical to physical addresses • Internal fragmentation is still a problem though
Paging Address Translation • Address generated by CPU is divided into: • Page number (p) used as an index into the page table, which contains the base address of each frame (f) in physical memory • Page offset (d) combined with base address to define the physical memory address that is sent to the memory unit • For a logical address space of size 2^m and page size 2^n, a logical address is split into a page number p (the high m − n bits) and a page offset d (the low n bits)
Paging Address Translation • Example: 16 bytes of logical memory and 4-byte pages/frames, so m = 4 and n = 2 • Logical address 9 = 1001 in binary: page number 10 (2), offset 01 (1) • The page table maps page 2 to frame 1, which begins at 1 × 4 = 4 bytes into physical memory • Offset 01 means + 1 byte • Total: 1 × 4 + 1 = 5 bytes into physical memory (contains j)
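The worked example above can be checked in code. The page-2-to-frame-1 mapping comes from the example; the other page-table entries are illustrative assumptions:

```python
# Paging translation for the example: m = 4, n = 2 (4-byte pages).

N = 2
PAGE_SIZE = 1 << N                       # 4 bytes
page_table = {0: 5, 1: 6, 2: 1, 3: 2}   # page number -> frame number (assumed)

def translate(logical_addr):
    p = logical_addr >> N                # high m - n bits: page number
    d = logical_addr & (PAGE_SIZE - 1)   # low n bits: offset
    return page_table[p] * PAGE_SIZE + d

print(translate(0b1001))  # page 2, offset 1 -> frame 1 -> physical address 5
```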
Paging • No external fragmentation • Any free frame can be allocated • Internal fragmentation capped at almost 1 frame per process • Worst case: process needs n frames + 1 byte, meaning OS allocates n + 1 frames and wastes almost the entire last one • So we'd like frames as small as possible • Overhead for a page table for each process, which is in main memory • A page table entry is 4 bytes (32 bits) large • Size of page table proportional to the number of pages it contains • So we'd like pages as large as possible • Typically, pages 4KB in size • Can be larger – Solaris pages are 8KB to 4MB in size
Frames • Process sees its memory space as a contiguous set of pages • In reality, OS loads it page by page in whichever frames are available • OS needs to maintain a frame table • How many frames there are • Which frames are allocated • Which frames are free
Frames • (Figure: free-frame list before and after allocation)
Implementation of Page Table • Hardware implementation • Page table stored in CPU in dedicated registers • Fast address translation • Modified and updated only by the kernel’s dispatcher • Changing page table requires changing all registers • Number of registers limits number of page table entries; typically less than 256 • Not practical for modern computers, with millions of page table entries • Software implementation • Page table stored in main memory • CPU only stores the page table base register (PTBR) • Only one register to modify to change the page table • Every data/instruction access requires two memory accesses, one for the page table and one for the data/instruction • Slower translation
Translation Look-aside Buffer • The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called the translation look-aside buffer (TLB) • TLB entry: <page number, frame number> • Special hardware searches all keys simultaneously • Very fast • Very expensive, so limited to 64 – 1024 entries • If page is found (TLB hit), this takes less than 10% longer than direct memory access! • If not found (TLB miss) • Look for it in the page table • Add it to the TLB, clearing another entry if TLB is full • Some TLBs include an address space identifier (ASID) with each entry, identifying the process to which it belongs • TLB can safely store pages from multiple processes • Without this info, TLB needs to be flushed each time a new page table is loaded
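The hit/miss behaviour above can be sketched with a small cache in front of the page table. The capacity, the dummy page table, and the FIFO replacement policy are all assumptions for illustration:

```python
# Sketch of a TLB lookup with fall-back to the in-memory page table.
from collections import OrderedDict

TLB_CAPACITY = 4
tlb = OrderedDict()                         # page number -> frame number
page_table = {p: p + 10 for p in range(32)} # dummy in-memory page table

def lookup(page):
    if page in tlb:                   # TLB hit: no extra memory access
        return tlb[page], "hit"
    frame = page_table[page]          # TLB miss: walk the page table
    if len(tlb) >= TLB_CAPACITY:
        tlb.popitem(last=False)       # evict the oldest entry (FIFO)
    tlb[page] = frame
    return frame, "miss"

print(lookup(3))   # (13, 'miss') -- first reference loads the TLB
print(lookup(3))   # (13, 'hit')  -- second reference hits
```

A real TLB searches all entries in parallel in hardware; the dictionary here only models the hit/miss logic.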
Translation Look-aside Buffer • The Effective Access Time (EAT) is the time to access an address in physical memory • For example • It takes 100 ns to access data in physical memory once the address is known • It takes 20 ns to look up an address in the TLB • 90% TLB hit rate • In the 10% TLB misses, we then need to access the page table in memory (an extra 100 ns) • EAT for accessing memory directly (as in Real Mode) = 100 ns • EAT for paging = 0.9 * (20 + 100) + 0.1 * (20 + 100 + 100) = 130 ns
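The EAT computation from the example, using the timings given above:

```python
# Effective Access Time for paging with a TLB (values from the example).
MEM = 100   # ns per physical memory access
TLB = 20    # ns per TLB lookup
HIT = 0.9   # TLB hit ratio

# Hit: TLB lookup + one memory access.
# Miss: TLB lookup + page-table access + memory access.
eat = HIT * (TLB + MEM) + (1 - HIT) * (TLB + MEM + MEM)
print(eat)  # ≈ 130 ns
```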
Memory Protection • Memory protection implemented by associating protection bits with each frame • Example: read / write / execute bits • Illegal operations caught by CPU and cause interrupt • Valid-invalid bit attached to each entry in the page table • “valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page • “invalid” indicates that the page is not in the process’ logical address space
Valid or Invalid Bit • (Figure: page table with valid–invalid bits; entries beyond the process's space are marked invalid) • Even valid pages may be partly unused: internal fragmentation • Alternative: a page-table length register holds the size of the page table, so out-of-range page numbers trap immediately
Shared Pages • Pages give us a new option for shared memory: shared pages • If reentrant code (execute-only code that cannot be modified) is used by several processes, we can put this code in shared pages that are part of all processes • Code is not modifiable, so no race condition / critical section problem • Shared code must appear in the same location in the logical address space of all processes
Structure of the Page Table • Hierarchical Paging • Tables of page tables • Hashed Page Tables • Page table sorted by hash index • Inverted Page Tables • Table entries represent frames, and map frames to pages
Hierarchical Page Tables • Given a modern computer with a large (32-bit) logical address space and small (4KB) pages, each process may have a million pages • Each page is represented by a 4-byte entry in the page table, giving a 4MB page table per process • Would be a lot to allocate in a contiguous block in main memory • Break up the logical address space into multiple page tables • A simple technique is a two-level page table • Outer page table points to (inner) page tables • Page table points to page, as before
Two-Level Paging Translation • A logical address (on a 32-bit machine with 4K page size) is divided into: • a page number consisting of 20 bits • a page offset (d) consisting of 12 bits • Since the page table is paged, the page number is further divided into: • a 10-bit page number (p1) • a 10-bit page offset (p2) • Thus, a forward-mapped logical address translates as follows: p1 indexes the outer page table, p2 indexes the inner page table it selects, and d is the offset within the page
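The bit split described above can be sketched directly; the sample address is an arbitrary assumption:

```python
# Split a 32-bit logical address into (p1, p2, d):
# 10-bit outer index, 10-bit inner index, 12-bit page offset.

def split(addr):
    p1 = addr >> 22             # top 10 bits: outer page table index
    p2 = (addr >> 12) & 0x3FF   # middle 10 bits: inner page table index
    d = addr & 0xFFF            # low 12 bits: offset within the page
    return p1, p2, d

print(split(0x00403004))  # (1, 3, 4)
```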
Three-Level Paging Scheme • Given a 64-bit processor and logical address space, two-level paging may be insufficient • To keep the inner page table size constant, the outer page table grows considerably • Alternative is three-level paging • Second outer page table maps to outer page table, which maps to inner page table, which maps to memory • The second outer page table is still big; we could split it again • And again, and again… • But for each split, we need one more memory access to translate the address, which slows down the system
Hashed Page Tables • Hash function converts large & complex data into "unique" & uniformly-distributed integer values • Unique in theory, but in practice hash collisions occur where different data map to the same hash value • Hashed page table • Apply hash function to page number • Entry in hashed page table is a linked list of <page#, frame#> entries, to account for collisions • Find the entry that matches the page number, and retrieve the frame number
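A minimal sketch of this chained structure; the bucket count and the mappings are assumed example values:

```python
# Hashed page table: each bucket chains (page, frame) pairs
# so that colliding page numbers can coexist.

NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(page, frame):
    buckets[hash(page) % NUM_BUCKETS].append((page, frame))

def lookup(page):
    for p, f in buckets[hash(page) % NUM_BUCKETS]:  # walk the chain
        if p == page:
            return f
    return None   # page not mapped

insert(42, 7)
insert(50, 9)      # 42 and 50 collide when NUM_BUCKETS = 8
print(lookup(42))  # 7
print(lookup(50))  # 9
```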
Inverted Page Table • A page table has one entry for each page in logical memory, consisting of the address of the frame storing it • An inverted page table has one entry for each frame in physical memory • Each entry consists of the address of the page stored in that memory location • With information about the process that owns that page • No need for one page table per process – only one table for the system • Use hash function to improve search
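The one-entry-per-frame structure can be sketched as a list indexed by frame number; the table contents are assumed for illustration:

```python
# Inverted page table: entry i describes physical frame i,
# holding (pid, page) of the page stored there.

inverted = [
    (1, 0),   # frame 0 holds page 0 of process 1
    (2, 0),   # frame 1 holds page 0 of process 2
    (1, 1),   # frame 2 holds page 1 of process 1
]

def frame_of(pid, page):
    for frame, entry in enumerate(inverted):
        if entry == (pid, page):   # linear search; real systems hash instead
            return frame
    return None

print(frame_of(1, 1))  # 2
```

The linear scan shows why a plain inverted table is slow, and why the slide suggests a hash function to speed up the search.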
Segmentation • Segmentation is a memory-management scheme similar to a user view of memory • Memory is separated in segments such as "the code", "the data", "the subroutine library", "the stack" • No real order or relation between them • Memory access is done by specifying two pieces of information: segment number and offset
Segmentation • (Figure: segments 1–4 of the user view placed into physical memory, in the order 1, 4, 2, 3)
Segmentation Architecture • Logical address consists of <segment-number, offset> • Segment table maps it to a physical address • Segment base: starting physical address • Segment limit: length of the segment
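A sketch of the segment-table translation, with the limit check that hardware performs before relocation; the table contents are assumed example values:

```python
# Segment-table translation: each entry is (base, limit).
segment_table = [
    (1400, 1000),  # segment 0: base 1400, limit 1000
    (6300, 400),   # segment 1
    (4300, 400),   # segment 2
]

def translate(segment, offset):
    base, limit = segment_table[segment]
    if offset >= limit:
        raise MemoryError("trap: offset beyond segment limit")
    return base + offset

print(translate(2, 53))  # 4300 + 53 = 4353
```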
Example: The Intel Pentium • Part of the Intel IA-32 architecture • Pentium combines segmentation and paging • CPU generates logical address • Given to segmentation unit, which produces a linear address • Given to paging unit, which generates the physical address in main memory • Segmentation unit + paging unit = MMU
Pentium Segmentation • Logical address space of a process divided in two partitions • Segments private to the process; information stored in the local descriptor table (LDT) • Segments shared between all processes; information stored in the global descriptor table (GDT) • Logical address is <selector, offset> • Selector is <segment, global/local, privilege level> • Converted to linear address • CPU • Six segment registers (CS, DS, ES, SS, FS, GS) • Six 8-byte cache registers to hold the corresponding descriptors; no need to fetch them from memory each time
Pentium Paging • Two-level paging scheme • 32-bit linear address • Page directory (outer page table) entry indexed by bits 22–31, points to an inner page table • Inner page table entry indexed by bits 12–21, points to a 4KB page • Offset inside page given by bits 0–11 • Gives the physical address • CPU • Page directory of current process pointed to by the Page Directory Base Register, contained in the CR3 register
Example: Linux on Pentium • Only uses six segments • Kernel code; kernel data; user code; user data; task state segment (TSS); default LDT segment • All processes share user code/data segments, but have their own TSS and may create a custom LDT segment • Linux only recognises two privilege levels, kernel (00) and user (11)
Example: Linux on Pentium • Three-level paging scheme • Logical address is <global directory, middle directory, page table, offset> • But Pentium only has a two-level paging scheme! • On Pentium, Linux middle directory is 0 bits (i.e. non-existent)
Review • What is the difference between external and internal fragmentation? • What is the difference between logical and physical memory? • Compare the memory allocation schemes of contiguous variable-size memory allocation, contiguous segmentation, and paging, in terms of external fragmentation, internal fragmentation, and memory sharing.
Exercises • Read everything • If you have the "with Java" textbook, skip the Java sections and subtract 1 from the following section numbers • 8.1 • 8.2 • 8.3 • 8.5 • 8.6 • 8.9 • 8.11 • 8.13 • 8.14 • 8.15 • 8.17 • 8.23 • 8.25 • 8.26 • 8.27 • 8.28 • 8.29 • 8.32