Operating Systems: Paging/1

Paging

Segmentation helps deal with:
• modularity, sharability, protection, relocation
Paging can help with:
• insufficient memory, external fragmentation, efficiency of memory use
• Physical memory divided into fixed size page-frames, e.g. 4096 bytes
• Virtual pages mapped into page-frames:
[Figure: the CPU issues a (page, offset) address; the page table maps the page number to a frame number; (frame, offset) then addresses physical memory]
[Figure: a page table maps virtual pages 0, 1, 2 and 3 to page-frames 1, 4, 3 and 6 of a physical memory with frames 0-7]
• Virtual address: page number (p bits) followed by page offset (d bits)
  • total virtual address space of 2^(p+d) bytes: 2^p pages, each of 2^d bytes
  • e.g. p = 20, d = 12: 2^32 bytes of virtual space, i.e. 2^20 pages of 4096 bytes each
• virtual space usually very much larger than amount of physical memory
  • 4 GB physical memory soon possible and common?
• in the near future, 64-bit addressing to extend virtual address space:
  • Intel Itanium, AMD Clawhammer, DEC Alpha etc.
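The page number / offset split can be sketched in a few lines of Python. This is only an illustration: the names `split_virtual_address` and `translate` are hypothetical, the page table is a plain dictionary, and the mapping is the one shown in the figure (pages 0-3 in frames 1, 4, 3 and 6) with d = 12.

```python
PAGE_OFFSET_BITS = 12  # d = 12, i.e. 4096-byte pages (the slide's example)


def split_virtual_address(va, d=PAGE_OFFSET_BITS):
    """Return (page number, page offset) for a virtual address."""
    return va >> d, va & ((1 << d) - 1)


def translate(va, page_table, d=PAGE_OFFSET_BITS):
    """Replace the page number with its page-frame number; the offset is unchanged."""
    page, offset = split_virtual_address(va, d)
    frame = page_table[page]  # KeyError stands in for an unmapped page
    return (frame << d) | offset


# Mapping taken from the figure: page -> page-frame
example_page_table = {0: 1, 1: 4, 2: 3, 3: 6}
physical = translate(0x1ABC, example_page_table)  # page 1, offset 0xABC
```

Because the offset is copied through untouched, only the top p bits of the address ever go through the table.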
• Page table the same length as the number of virtual pages
  • can get extremely long
  • e.g. 2^20 pages, each page table entry typically 4 bytes = 4 MB
  • 2^52 pages in a 64-bit virtual memory = 2^54 bytes!
• bigger pages a possible solution?
  • e.g. 1 MB pages still give a page table length of 2^44 entries
• usually have a page table limit hardware register:
  • hardware checks that the page number is within the limit for each translation
  • limits the length of the page table
  • but reduces the size of virtual space - undesirable!
• Each process has its own virtual space
  • so each needs its own page table
  • may be hundreds of processes around even on a single user workstation
  • e.g. scheduling, device driver, comms system processes etc.
  • though many of these will be small and not need a large page table
• Need a better system - multi-level page tables described later
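The page table sizes quoted above follow from simple arithmetic, sketched here in Python. The function names are hypothetical, and the 4-byte entry size is the slide's "typical" figure rather than a property of any particular machine.

```python
def page_table_entries(address_bits, page_offset_bits):
    """Number of entries in a single-level page table: one per virtual page."""
    return 2 ** (address_bits - page_offset_bits)


def page_table_bytes(address_bits, page_offset_bits, entry_bytes=4):
    """Size in bytes of a single-level page table (entry size is an assumption)."""
    return page_table_entries(address_bits, page_offset_bits) * entry_bytes


size_32bit = page_table_bytes(32, 12)          # 4096-byte pages, 32-bit addresses
size_64bit = page_table_bytes(64, 12)          # 4096-byte pages, 64-bit addresses
entries_1mb_pages = page_table_entries(64, 20) # 1 MB pages, 64-bit addresses
```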
• Not all virtual pages can be mapped into physical memory at once
  • a Presence (or Valid) bit is part of each page table entry
    • P=1: page present in memory
    • P=0: page not present in memory - page-frame number invalid
      • page may be temporarily stored on hard disc
      • may not even exist - no problem with gaps in virtual space!
  • if the page table entry corresponding to a presented virtual address has P=0
    • a page-fault interrupt is generated by the address translation hardware
[Figure: page table with a P column: page 0 (P=1, frame 1), page 1 (P=0), page 2 (P=1, frame 3), page 3 (P=1, frame 6); physical memory frames 0-7 hold pages 0, 2 and 3]
• Demand Paging:
  • invoked when a page-fault interrupt occurs
  • a policy in which:
    • the operating system automatically retrieves the page from hard disc
    • allows the process to continue with just a short hiatus
    • transparent to the user
  • steps:
    • check that the missing page actually exists in virtual space (not in a gap)
    • find a spare page-frame in memory
      • may involve removing an existing page according to some page replacement policy
    • initiate transfer of missing page from hard disc into spare page-frame
    • wait until transfer complete
      • schedule another process in the meantime
    • update the page table entry with page-frame number
    • set P = 1 in the page table entry
    • put process back on the Run Queue
    • when process dispatched onto the CPU, instruction causing page-fault is retried
      • this time no page-fault interrupt will occur
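The demand paging steps can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not a real kernel path: the class `DemandPager` is hypothetical, FIFO is used as the replacement policy purely for brevity (the slide leaves the policy open), and disc transfers and rescheduling are reduced to a comment.

```python
from collections import deque


class DemandPager:
    def __init__(self, num_frames, valid_pages):
        self.free_frames = deque(range(num_frames))
        self.valid_pages = set(valid_pages)  # pages that exist in virtual space
        self.page_table = {}                 # page -> frame, only for P=1 pages
        self.resident = deque()              # FIFO order (assumed policy)
        self.faults = 0

    def access(self, page):
        if page in self.page_table:          # P = 1: translation succeeds
            return self.page_table[page]
        self.faults += 1                     # P = 0: page-fault interrupt
        if page not in self.valid_pages:     # address falls in a gap
            raise MemoryError(f"page {page} does not exist")
        if self.free_frames:                 # find a spare page-frame
            frame = self.free_frames.popleft()
        else:                                # or remove an existing page
            victim = self.resident.popleft()
            frame = self.page_table.pop(victim)
        # disc transfer happens here; another process would run meanwhile
        self.page_table[page] = frame        # update entry, set P = 1
        self.resident.append(page)
        return frame                         # faulting access is retried


pager = DemandPager(num_frames=2, valid_pages=range(4))
frames_used = [pager.access(p) for p in [0, 1, 0, 2, 0]]
```

With two frames, the third access hits and the last two each evict the oldest resident page.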
• Multiple processes in physical memory:
[Figure: page table A maps pages A0, A2 and A3 to frames 1, 3 and 6 (A1 not present); page table B maps pages B0, B1 and B3 to frames 8, 4 and 10 (B2 not present); physical memory has frames 0-11]
• Pages shared in memory
  • e.g. a text editor used by two processes - each with their own data
[Figure: page tables A and B both map editor0, editor1 and editor2 to frames 1, 6 and 10; data A maps to frame 3 and data B to frame 8; physical memory has frames 0-11]
• Benefits of Paging:
  • avoids external fragmentation - no unusable holes in memory
  • some internal fragmentation
    • the last part of the last page in a sequence may be wasted
    • more waste the larger the page size
  • small page size:
    • less internal fragmentation - more efficient memory utilisation
    • larger page tables and other kernel tables, e.g. free page list
    • higher kernel overheads - dealing with more individual pages
  • larger page size:
    • longer disc transfer times
    • hidden internal fragmentation - less of each page may actually be used
    • smaller Translation Lookaside Buffer needed
  • small pages often grouped together by OSes to make larger effective pages
    • saves some overheads
  • e.g. Sun SPARC: 4 KB pages; Intel x86: 4 KB pages (with 4 MB option)
• Benefits of Paging (continued):
  • gets around lack of physical memory
    • a large virtual address space mapped into whatever physical memory is available
    • up to the OS to achieve acceptable performance with demand paging
    • the more page-frames allocated to a process, the fewer page-faults and consequent demand page-in delays
    • page-frames must be shared equitably between the processes demanding memory space - memory management
• Page sharing:
  • usually more convenient to share at the module level
  • make the whole module sharable, rather than individual pages
• Protection:
  • better organised at module level
  • give the whole module the same protection rather than protecting individual pages
• Allocation of Page-frames:
  • a free list of unused page-frames
  • any page-frame is as good as any other when allocating
    • though may wish to avoid using a page-frame if there is a chance its contents may be needed again in the near future
    • recapture, using memory as a large cache of pages (Windows 2000)
  • page-frames put back on the free list as they are released from use
  • except perhaps:
    • pages known to be finished with go on the front of the free list
      • e.g. stack pages from a terminated process or thread
    • pages which might possibly be used again go on the end of the free list
      • maximises the chance of them still being un-reused when needed again
• Consequence of large page-tables:
  • need to be stored in main memory (far too long to be held in CPU registers)
  • every virtual address access requires two memory accesses?
    • one to access page table + one to access required physical memory location
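The front/end free list policy can be sketched with a double-ended queue. The class `FreeList` and its method names are hypothetical, and the sketch assumes allocation always takes from the front, so frames placed at the end survive longest and stand the best chance of being recaptured.

```python
from collections import deque


class FreeList:
    def __init__(self, frames):
        self.frames = deque(frames)

    def allocate(self):
        """Take the frame at the front of the list."""
        return self.frames.popleft()

    def release(self, frame, may_be_reused):
        """Finished-with frames go on the front, possibly-needed ones on the end."""
        if may_be_reused:
            self.frames.append(frame)
        else:
            self.frames.appendleft(frame)

    def recapture(self, frame):
        """Reclaim a released frame if it has not been handed out again."""
        try:
            self.frames.remove(frame)
            return True
        except ValueError:
            return False


free_list = FreeList([0, 1, 2])
free_list.release(7, may_be_reused=True)    # e.g. a page that may be touched again
free_list.release(9, may_be_reused=False)   # e.g. stack page of a dead thread
first_allocated = free_list.allocate()
recaptured = free_list.recapture(7)
```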
• Translation Lookaside Buffer (TLB)
  • a fast associative memory close to the CPU
  • stores a set of translations from virtual page number to page-frame number
  • each TLB entry compared concurrently
[Figure: the CPU issues (page, offset); on a TLB hit the TLB supplies the frame number directly, otherwise the page table in memory supplies it; (frame, offset) then addresses physical memory]
• TLB translations much faster than going via a page table in memory
  • aim is to achieve as high a hit ratio as possible
  • can get an effective access time using a weighted average:
    • for: m = main memory access time, t = TLB access time, h = hit ratio
    • effective access time = ( h*t + (1-h)*(t+m) ) + m
    • e.g. h = 0.95, m = 100ns, t = 10ns : e.a.t. = 115ns
    • e.g. h = 0.99, m = 25ns, t = 2ns : e.a.t. = 27.25ns
  • the larger the TLB, the higher the likely hit ratio
  • examples:
    • Motorola 68030 : 22 entries
    • Intel 486 : 32 entries (claimed 98% hit ratio)
    • Intel Pentium 4 : 128 entry instruction, 64 entry data (4-way associative)
    • PowerPC 601 : 256 entry (2-way associative)
    • AMD Athlon : 512 entry (2-level)
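The weighted average is easy to check numerically. The function name below is hypothetical; the formula and the two sets of figures are the ones on the slide.

```python
def effective_access_time(h, t, m):
    """Translation cost (TLB hit, or TLB miss plus one page table access)
    followed by the access to the required memory location itself."""
    translation = h * t + (1 - h) * (t + m)
    return translation + m


eat_slow = effective_access_time(h=0.95, t=10, m=100)  # times in ns
eat_fast = effective_access_time(h=0.99, t=2, m=25)
```

For comparison, with no TLB at all every access would cost 2*m, i.e. 200ns and 50ns for these two examples.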
• hit ratio optimised by keeping most recently used translations in the TLB
  • pages just accessed likely to be accessed again soon
  • each entry has a tag which is updated at every translation
  • e.g. Sun UltraSparc : 64 entries with 6 bit tag field
  • scheme: if entry with tag value n matched:
    • tag for this entry set to 0
    • entries with tag < n : incremented by 1
    • entries with tag > n : unaffected
  • lowest value tags are most recently used translations
  • highest tags are least recently used (LRU) and can be discarded first
[Figure: TLB with 64 entries (0-63), each holding virtual page, tag and page-frame fields]
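The tag update scheme can be sketched as follows. The function names are hypothetical, and the sketch assumes the tags start as a permutation of 0..N-1 (one distinct age per entry), which the update rule then preserves.

```python
def touch(tags, hit_index):
    """Apply the tag update rule after the entry at hit_index is matched."""
    n = tags[hit_index]
    for i, tag in enumerate(tags):
        if tag < n:
            tags[i] = tag + 1  # more recently used entries age by one
    tags[hit_index] = 0        # the matched entry becomes most recently used
    # entries with tag > n are left unchanged


def lru_index(tags):
    """Entry with the highest tag: least recently used, first to discard."""
    return tags.index(max(tags))


tags = [0, 1, 2, 3]   # a 4-entry TLB for illustration
touch(tags, 2)
after_first = list(tags)
touch(tags, 3)
after_second = list(tags)
victim = lru_index(tags)
```

With 64 entries the tags range over 0-63, which is why a 6 bit tag field is enough.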
• TLB entries must be invalidated when:
  • a page table entry is modified to change a virtual to physical mapping
  • the running process is changed
    • same virtual address in different processes means different physical address
  • privileged instructions usually provided to invalidate one entry or all entries
    • used by OS kernel dispatcher and memory manager
  • a process ID number could be prepended to the virtual page number to disambiguate the same virtual page numbers in different processes
    • e.g. Sun SPARC
    • effectively creates a larger virtual address space in which all processes live
• Changing the running process can cause significant performance loss
  • first translations to be made are not available in the TLB yet
  • need to go to page tables in memory
  • almost doubles the normal access time
  • kernel dispatchers must try to limit frequency of changing processes
• Caches
  • high speed memory closer to the processor than main memory
  • Level 1 cache
    • small, closest to processor, on same chip as CPU, highest access speed
  • Level 2 cache
    • larger, between level 1 cache and memory, now usually on the same chip as well
  • when data from a memory location is needed:
    • level 1 cache searched first for that location
    • if missing, level 2 cache searched; data transferred to level 1 cache if found
    • if missing from level 2 cache, data fetched from main memory into level 2 and level 1 caches
  • blocks of contiguous memory cached, e.g. 16 byte 'lines'
  • most recently used lines maintained in caches by hardware
    • some architectures allow software to preload a cache line
  • effective memory access time reduced
    • similar calculation as for TLB
• various strategies for writing data to memory
  • write-through - data written back to main memory immediately
    • useful for multi-processor systems
  • write-back - data only written back to main memory when block discarded from cache
    • potentially less memory writing
  • strategy can often be selected by software
• bus-snooping may be necessary for multi-processor systems:
  • each cache snoops on bus addresses from other CPUs and either:
    • invalidates its own copy of any lines being written to main memory
    • captures line (or data within a line) and updates its own cache
[Figure: three CPUs, each with its own cache, connected by a shared bus to main memory]
• N-way Set Associative Caching:
  • address: tag | index | within-line offset
  • N lines per index (each index selects one set of N lines)
  • tag compared associatively with all tags at that row index
  • e.g. Pentium 4 Level 1 data cache : 8 KB, 4-way associative, 64 byte lines
    • 32 rows of 4*64 bytes : 5 bits index, 6 bits within-line offset
  • Pentium 4 Level 2 unified cache : 256 KB, 8-way associative, 64 byte lines
    • 512 rows of 8*64 bytes
  • Ref. IA-32 Intel Architecture Software Developer's Manual, Vol 3
[Figure: the index selects one row of four (tag, data block) pairs; each data block is 64 bytes]
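The row counts and bit widths quoted for the Pentium 4 caches can be derived mechanically. The helper names below are hypothetical, and the sketch assumes power-of-two sizes so that the bit widths are exact logarithms.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (rows, index bits, within-line offset bits) for an N-way cache."""
    rows = size_bytes // (ways * line_bytes)
    index_bits = rows.bit_length() - 1         # log2(rows) for powers of two
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    return rows, index_bits, offset_bits


def split_address(addr, index_bits, offset_bits):
    """Split an address into (tag, index, within-line offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


l1_data = cache_geometry(8 * 1024, ways=4, line_bytes=64)
l2_unified = cache_geometry(256 * 1024, ways=8, line_bytes=64)
fields = split_address(0x12345, index_bits=5, offset_bits=6)
```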
• most often, physical addresses are used for cache line matching
  • i.e. virtual address translated to physical first
  • some overlapping of translation and cache matching possible:
    • virtual address offset can be used to start indexing into the level 1 cache whilst TLB translation is taking place
    • because the offset is not altered - just concatenated to the page-frame number
    • requires index + within-line bits to be no more than the page offset bits
    • OK for Pentium 4 level 1 data cache : 11 bits index + within-line < 12 bits page offset
    • tag matching can only take place after translation completed
[Figure: the CPU issues (page, offset); the TLB, backed by the page table, supplies the frame number; (frame, offset) is looked up in the Level 1 cache, then the Level 2 cache, then Main Memory, and the data value is returned]
• Caching using virtual addresses also possible, e.g. Sun SPARC
  • virtual addresses matched before translation
  • translation can be started concurrently with cache match
    • aborted if cache match found
  • need to have process ID bits prepended to the virtual address before match
    • otherwise only one process's data lines could be in the cache at once
    • and whole cache would need to be invalidated on every process change
    • not necessary with physical address caching
  • possible snag:
    • one process could, in theory, use two different virtual addresses for the same physical address
    • which line is updated on write?
• Very significant performance penalty when process on CPU changed
  • probably none of the data in the caches relates to the new process
  • much more significant than loss of TLB context
  • most of the high performance of recent processors comes from caching
  • kernel scheduler and dispatcher need to avoid changing process too often
• Flag Bits - in each page table entry
  • Presence bit : when set to 1, this page is present in memory
    • page-frame number is valid
    • page-fault interrupt caused when page accessed with this bit 0
    • sometimes set to 0 even though page is present in memory
      • acts as a guard page
      • causes an artificial page fault interrupt into the kernel when page accessed
  • Referenced bit : set whenever the page is accessed, read or write
    • cleared to 0 by paging manager at start of a process's residence in memory
    • and at successive time intervals thereafter
    • indicates which pages have been accessed during the previous interval
    • sometimes called strobing
    • used by the kernel's memory manager when deciding which pages can be removed from memory when space is needed for something else
    • useful to know what is currently being used
    • various page replacement schemes possible, e.g. LRU
  • Modified (or Dirty) bit : set whenever a page is written to
    • set value maintained throughout a memory residence
      • through successive time intervals
    • pages with this bit set must be saved somewhere, e.g. hard disc, to avoid information loss when the space they occupy is needed for something else
    • pages with this bit not set can just be discarded
      • assuming a copy of the original exists on hard disc somewhere
  • Cache Disable bit : do not cache information in this page
    • useful for memory-mapped I/O addresses
      • data needs to go straight to output device
    • needed for semaphores in multi-processor system
      • semaphore stored in main memory
      • must not be held up in a local cache
  • Cache Write-through : write data straight through cache to memory
    • instead of waiting until cache line re-used
    • depends on particular architecture - possible on Intel Pentium
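The flag bits can be pictured as a bit field packed alongside the page-frame number. The layout below is purely hypothetical (flags in the low 12 bits, frame number above them, bit positions chosen arbitrarily) and does not describe any real architecture; it only illustrates how a memory manager might test and clear the flags.

```python
from enum import IntFlag


class PteFlag(IntFlag):
    # Bit positions are illustrative assumptions, not a real PTE format
    PRESENT = 1 << 0
    REFERENCED = 1 << 1
    MODIFIED = 1 << 2
    CACHE_DISABLE = 1 << 3
    WRITE_THROUGH = 1 << 4


FLAG_BITS = 12  # assumed: flags share the low 12 bits of the entry


def make_pte(frame, flags):
    return (frame << FLAG_BITS) | int(flags)


def must_write_back(pte):
    """A resident page needs saving to disc before its frame is reused only if dirty."""
    wanted = PteFlag.PRESENT | PteFlag.MODIFIED
    return (pte & wanted) == wanted


def strobe(pte):
    """Clear the Referenced bit at the start of a new sampling interval."""
    return pte & ~int(PteFlag.REFERENCED)


pte = make_pte(6, PteFlag.PRESENT | PteFlag.REFERENCED | PteFlag.MODIFIED)
strobed = strobe(pte)
```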
• Demand Paging Flow:
[Figure: flowchart, reconstructed here as steps]
  • Hardware:
    • virtual address presented: Page in TLB?
      • yes: Generate Physical Address
      • no: Access Page Table, then Presence bit set?
        • yes: Update TLB, then Generate Physical Address
        • no: Initiate Page Fault Interrupt (handled by software)
    • Data in Cache?
      • yes: use data
      • no: Access Main Memory for data, Update Cache, use data
  • Software (page fault handling):
    • A page-frame free?
      • yes: Start Transfer of Page from Disc
      • no: Select Page for Replacement, then Dirty bit set?
        • yes: Start Transfer of Page to Disc; when Page Transfer Completed, Start Transfer of Page from Disc
        • no: Start Transfer of Page from Disc
    • Run other processes whilst waiting
    • Page Transfer Completed: Update Page Table and set Presence bit
• Multi-level Page Tables
  • very large virtual address spaces becoming common
    • e.g. 64-bit addressing on DEC Alpha, Intel Itanium, AMD Clawhammer
  • single-level page tables get excessively long
    • 4 KB pages = 2^52 entry page tables
    • 1 MB pages = 2^44 entry page tables
    • and a page table for every process!
  • whole page table cannot be held in physical memory at once
    • needs to be sub-divided and somehow paged in and out as required
• Two-level Paging
  • level 1 page table entries point to one of many level 2 page tables
  • level 2 page table entries contain page-frame numbers
  • Virtual Address: index into level 1 page table | index into level 2 page table | offset within page
[Figure: a level 1 page table whose entries point to level 2 page tables, whose entries in turn point to page-frames in main memory]
• level 2 page tables can be paged in and out
  • presence bit in level 1 page table entries indicates whether level 2 page table present or not
• virtual address partition usually organised to fit a single page table into a page
  • e.g. for a 32-bit machine, partitions of 10 bits, 10 bits and 12 bits:
    • index partition sizes of 10 bits with 4 bytes per entry give a page table size of 4 KB
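A two-level lookup with the 10/10/12 partition can be sketched as below. The names `PageFault` and `translate_two_level` are hypothetical, nested dictionaries stand in for the tables, and a missing key plays the role of a cleared presence bit at either level.

```python
L1_BITS, L2_BITS, OFFSET_BITS = 10, 10, 12  # the slide's 32-bit example


class PageFault(Exception):
    pass


def translate_two_level(va, level1):
    """Walk level 1 then level 2; level1 maps index -> {index -> frame}."""
    i1 = va >> (L2_BITS + OFFSET_BITS)
    i2 = (va >> OFFSET_BITS) & ((1 << L2_BITS) - 1)
    offset = va & ((1 << OFFSET_BITS) - 1)
    level2 = level1.get(i1)
    if level2 is None:          # level 2 page table itself not present
        raise PageFault("level 2 page table not present")
    frame = level2.get(i2)
    if frame is None:           # page not present
        raise PageFault("page not present")
    return (frame << OFFSET_BITS) | offset


level1_table = {1: {2: 7}}      # only one level 2 table resident
va = (1 << 22) | (2 << 12) | 0x034
physical = translate_two_level(va, level1_table)
table_bytes = (1 << L2_BITS) * 4  # 1024 entries of 4 bytes each
```

Each table has 2^10 entries of 4 bytes, so it fills exactly one 4096-byte page.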
• effective access time with two-level page tables = ( h*t + (1-h)*(t+2*m) ) + m
  • e.g. h = 0.95, m = 100ns, t = 10ns : e.a.t. = 120ns (up from 115ns)
  • e.g. h = 0.99, m = 25ns, t = 2ns : e.a.t. = 27.5ns (up from 27.25ns)
• Three and more level Page Tables
  • more partitions of the virtual address space: 1st index | 2nd index | 3rd index | 4th index | offset
  • likely to be necessary with 64-bit virtual addressing
  • page tables at any intermediate level can be paged in and out
    • using presence bits at each level
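The effect of extra levels on access time can be explored numerically. The sketch below assumes each page table level costs one more main memory access on a TLB miss, which matches the slide's two-level formula; applying it to three or four levels is an extrapolation, and the function name and `levels` parameter are hypothetical.

```python
def effective_access_time(h, t, m, levels=1):
    """TLB hit costs t; a miss costs t plus one memory access per table level;
    the access to the required location then costs a further m."""
    translation = h * t + (1 - h) * (t + levels * m)
    return translation + m


two_level_slow = effective_access_time(0.95, 10, 100, levels=2)
two_level_fast = effective_access_time(0.99, 2, 25, levels=2)
by_level = [effective_access_time(0.95, 10, 100, levels=n) for n in (1, 2, 3, 4)]
```

Under this assumption each extra level adds (1-h)*m, so a high hit ratio keeps deeper tables cheap.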
• Inverted Page Tables
  • a page table with one entry per page-frame of physical memory
    • each entry contains the virtual page number of the page in that frame
    • plus a process ID to disambiguate virtual page numbers
  • memory manager inserts data into table when allocating page-frames
  • table has to be searched for a PID/virtual page number combination
    • index of matching position is the page-frame number
    • if PID/virtual page number not found, page-fault interrupt triggered
  • memory manager may also keep traditional page tables for its own use
[Figure: inverted page table with columns PID and virtual page number; the row index is the page-frame number]
• linear table searching entry by entry will be slow and inefficient
• use Hash Table searching by hardware:
  • hashing clashes dealt with by chaining entries with same hash index together
  • hash index calculation, PID/page no. comparison and chaining done by hardware
• a TLB is still required
  • searched associatively before inverted page table consulted
[Figure: the virtual page number is hashed to an index into a hash table, which points into the inverted page table; entries with the same hash index are linked by a chain field; the position of the matching entry is the page-frame number, which is concatenated with the offset to form the physical address]
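A hashed inverted page table can be sketched in software, although on real machines the slide says the hashing and chaining are done by hardware. The class `InvertedPageTable`, its hash function and the choice of one hash bucket per frame are all illustrative assumptions.

```python
class InvertedPageTable:
    def __init__(self, num_frames):
        self.entries = [None] * num_frames     # frame -> (pid, page)
        self.chain = [None] * num_frames       # frame -> next frame in chain
        self.hash_table = [None] * num_frames  # hash index -> first frame

    def _hash(self, pid, page):
        # Toy hash function, chosen only so that clashes are easy to provoke
        return (pid * 31 + page) % len(self.hash_table)

    def insert(self, pid, page, frame):
        """Record that the frame now holds (pid, page); link at head of chain."""
        h = self._hash(pid, page)
        self.entries[frame] = (pid, page)
        self.chain[frame] = self.hash_table[h]
        self.hash_table[h] = frame

    def lookup(self, pid, page):
        """Follow the chain for this hash index; None stands for a page fault."""
        frame = self.hash_table[self._hash(pid, page)]
        while frame is not None:
            if self.entries[frame] == (pid, page):
                return frame   # index of the match is the page-frame number
            frame = self.chain[frame]
        return None


ipt = InvertedPageTable(num_frames=8)
ipt.insert(pid=1, page=0, frame=3)
ipt.insert(pid=1, page=8, frame=5)   # same hash index as (1, 0): chained
```

The table size depends only on the number of page-frames, not on the size of any virtual address space.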