Mysteries of Windows Memory Management Revealed

WCL406. Mysteries of Windows Memory Management Revealed. Mark Russinovich, Technical Fellow, Windows Azure (created jointly with Dave Solomon). Goals: deep dive on process virtual and physical memory usage and operating system virtual and physical memory usage.

Presentation Transcript


  1. WCL406 Mysteries of Windows Memory Management Revealed Mark Russinovich Technical Fellow Windows Azure (created jointly with Dave Solomon)

  2. Goals • Deep dive on: • Process virtual and physical memory usage • Operating system virtual and physical memory usage • Crisply define memory-related terminology • Highlight tools that reveal memory usage • Describe ‘dark spots’ in memory analysis counters and tools

  3. Agenda • Virtual Memory • Address Space Usage • Process Commit • System Commit • Physical Memory • Working Sets • Paging Lists • Hard to Track Memory

  4. Tools We’ll Use • Task Manager • Sysinternals Process Explorer • Sysinternals Vmmap • Process virtual and physical memory usage • Sysinternals Rammap • System physical memory usage • Sysinternals Testlimit • Test program to leak different kinds of memory Sysinternals tools are free at www.sysinternals.com

  5. Virtual Memory

  6. Memory Management Fundamentals • Windows has demand-paged memory management • Processes “demand” memory as needed • There is no swapping • A page is 4 KB (8 KB on Itanium) • Allocations must align on 64 KB boundaries • Large pages are available for improved TLB usage • x86: 4 MB • x64 and x86 PAE: 2 MB • Itanium: 16 MB • There is NO (well, almost no) connection between virtual memory and physical memory
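
To make the page-size and 64 KB alignment rules concrete, here is a minimal Python sketch. It is a toy model, not a call into VirtualAlloc; the helper names are hypothetical and the constants assume x86/x64 (4 KB pages, 64 KB allocation granularity):

```python
# Toy model of Windows address-space rounding (not a call into VirtualAlloc).
# Assumes the x86/x64 values from the slide: 4 KB pages, 64 KB allocation granularity.
PAGE_SIZE = 4 * 1024
ALLOCATION_GRANULARITY = 64 * 1024


def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def reservation_footprint(request_bytes: int) -> tuple[int, int]:
    """Return (bytes in whole pages, address space used up to the next 64 KB boundary)."""
    in_pages = round_up(request_bytes, PAGE_SIZE)
    footprint = round_up(in_pages, ALLOCATION_GRANULARITY)
    return in_pages, footprint


# A 5,000-byte request occupies two 4 KB pages, but because the next reservation
# must start on a 64 KB boundary, the remaining 56 KB of address space is unusable.
print(reservation_footprint(5000))  # (8192, 65536)
```

The wasted tail corresponds to what VMMap labels “Unusable”: the gap between an allocation and the next allocation boundary.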

  7. 32-bit x86 Address Space • 32-bits = 2^32 = 4 GB • /3GB and /USERVA can extend the per-process address space up to 3 GB • Process must be marked “large address space aware” to use memory above 2 GB • [Figure: default layout: 2 GB per-process space and 2 GB system space; with 3 GB user space: 3 GB per-process space and 1 GB system space]

  8. 64-bit Address Spaces • 64-bits = 2^64 = 17,179,869,184 GB • x64 today supports 48 bits virtual = 262,144 GB = 256 TB • IA-64 today supports 50 bits virtual = 1,048,576 GB = 1024 TB • 64-bit Windows supports 44 bits = 16,384 GB = 16 TB • [Figure: x64 process: 8 TB per-process space and 8 TB system space; 32-bit process on x64: 4 GB per-process space and 8 TB system space]
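
The sizes on the two address-space slides are plain powers of two. A short Python check of that arithmetic (the helper names are hypothetical; the 2/2 and 3/1 splits are the x86 layouts from slide 7):

```python
# Worked arithmetic for the address-space sizes on slides 7 and 8.
GB = 2**30
TB = 2**40


def addressable_bytes(bits: int) -> int:
    """Number of bytes reachable with `bits` bits of virtual address."""
    return 2**bits


def x86_split(user_gb: int) -> tuple[int, int]:
    """(per-process GB, system GB) for a 32-bit x86 layout with `user_gb` of user space."""
    total_gb = addressable_bytes(32) // GB  # 4 GB in total
    return user_gb, total_gb - user_gb


print(x86_split(2))                  # (2, 2): default layout
print(x86_split(3))                  # (3, 1): 3 GB user space
print(addressable_bytes(64) // GB)   # 17179869184
print(addressable_bytes(48) // TB)   # 256  (x64 hardware)
print(addressable_bytes(50) // TB)   # 1024 (IA-64 hardware)
print(addressable_bytes(44) // TB)   # 16   (64-bit Windows: 8 TB user + 8 TB system)
```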

  9. Virtual Address Space Components • Committed: in-use • Reserved: reserved for future use • Address space breakdown • Private (e.g. process heap) • Reserved or committed • Shareable (e.g. EXE, DLL, shared memory, other memory mapped files) • Reserved or committed • Free (not yet defined)

  10. Why Reserve Memory? • Reserved memory lets an application lazily commit contiguous memory • Used for stack and heap expansion • [Figure: a thread stack, which grows down, before and after expansion; each shows committed pages, a guard page, and reserved pages]
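
A toy Python model of how reserved stack memory gets committed lazily through a guard page. It is a simplification under stated assumptions (one committed page at the start, the guard page is the first uncommitted page below the committed region, and pages are touched in order); `ToyStack` is hypothetical and does not reproduce the memory manager's real bookkeeping:

```python
class ToyStack:
    """Toy model of a thread stack reserved up front and committed lazily.

    Page 0 is the top of the stack; higher indexes are further down, since the
    stack grows down. Pages [0, committed) are usable and page `committed`
    acts as the guard page.
    """

    def __init__(self, reserved_pages: int) -> None:
        self.reserved_pages = reserved_pages
        self.committed = 1  # one committed page; the next one is the guard page

    def touch(self, page: int) -> None:
        if page < self.committed:
            return  # already committed: a normal access
        if page == self.committed and page < self.reserved_pages:
            # Touching the guard page commits it and the guard moves one page down.
            self.committed += 1
            return
        raise MemoryError(f"page {page} is reserved but not committed, or past the reservation")


stack = ToyStack(reserved_pages=256)  # e.g. a 1 MB reservation of 4 KB pages
for depth in range(10):
    stack.touch(depth)
print(stack.committed)  # 10
```

Only the pages actually reached are committed; the rest of the 1 MB stays reserved address space with no commit charge.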

  11. Viewing Address Space Breakdown • Task Manager only lets you see private bytes • Before Vista: column called “VM Size” • Vista and later: column called “Commit Size” • Process Explorer shows both virtual size and private bytes • Add 2 columns to process list • Virtual Size • Private Bytes • Run Testlimit twice • Testlimit -r • Testlimit -m • Note: on 64-bit Windows, 32-bit Testlimit can grow to 4 GB

  12. Understanding Process Address Space Usage • Most virtual memory problems are due to a process leaking private committed memory • Heap, GC heap, language heaps (CRT) • Private Bytes only tells part of the story • Doesn’t account for shareable memory that’s not shared (e.g. DLLs loaded only by this process) • Fragmentation can be an issue • Address space can effectively be exhausted prematurely • Basic performance counters don’t provide enough information to troubleshoot • [Figure: fragmented address space]

  13. Viewing Processes with VMMap • VMMap shows detailed breakdown of process address space: • Private process memory • Copy-on-write • Private (VirtualAlloc) • Heap and GC Heap • Stack • Shareable process memory • Image - executables • Shareable – shareable memory • Mapped File – memory mapped files • Page table – page table pages • Unusable – gap between allocation and next allocation boundary • Note that “shareable” types can have private commitment • Read/write pages in shared memory • Copy-on-write pages

  14. Viewing Fragmentation • Fragmentation is visible by selecting Options->Show Free Regions, selecting the Free type, and sorting by size • The largest free block is the largest allocation possible • Clickable fragmentation map in View->Fragmentation View • Run testlimit -t on 64-bit Windows • Each thread needs a 256 KB 64-bit stack and a 1 MB 32-bit stack
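
What “largest free block” means can be sketched in Python: given the allocated regions of an address space, compute the free gaps and take the biggest one. The helper names are hypothetical and the region list is made up for illustration, in MB:

```python
def free_regions(allocations, start, end):
    """Return the free (base, size) gaps in [start, end) around the given allocations."""
    gaps = []
    cursor = start
    for base, size in sorted(allocations):
        if base > cursor:
            gaps.append((cursor, base - cursor))
        cursor = max(cursor, base + size)
    if end > cursor:
        gaps.append((cursor, end - cursor))
    return gaps


def largest_free_block(allocations, start, end):
    """Largest single allocation that could still succeed."""
    return max((size for _, size in free_regions(allocations, start, end)), default=0)


# Hypothetical 2 GB user address space (in MB) with a few small allocations scattered across it.
allocs = [(0, 100), (500, 10), (1000, 10), (1500, 10)]
total_free = sum(size for _, size in free_regions(allocs, 0, 2048))
print(total_free, largest_free_block(allocs, 0, 2048))  # 1918 538
```

Almost 1.9 GB is free here, yet no single allocation above 538 MB can succeed, which is the premature exhaustion slide 12 warns about.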

  15. File Mappings • File mapping enables an application to read and write file data through memory operations • File mappings are used for • Image (.EXE and .DLL) loading: “Image” in VMMap • Data file access (e.g. NLS files): “Mapped File” in VMMap • “Pagefile-backed” shared memory: “Shareable” in VMMap • Entire file doesn’t have to be mapped • Allows for “windows” into the file • [Figure: views of Database.db mapped into a process address space]
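
Python's standard `mmap` module wraps file mappings (CreateFileMapping and MapViewOfFile on Windows), so it can illustrate mapping a “window” into a file rather than the whole file. A sketch, with a hypothetical `read_window` helper and a throwaway `Database.db`:

```python
import mmap
import os
import tempfile


def read_window(path: str, offset: int, length: int) -> bytes:
    """Map only part of a file into memory and return `length` bytes at `offset`.

    A view's starting offset must be a multiple of mmap.ALLOCATIONGRANULARITY
    (64 KB on Windows), so map from the aligned boundary below `offset`.
    """
    aligned = offset - offset % mmap.ALLOCATIONGRANULARITY
    delta = offset - aligned
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as view:
            return view[delta:delta + length]


# Usage with a hypothetical 300,000-byte "Database.db" in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "Database.db")
    data = bytes(i % 251 for i in range(300_000))
    with open(db_path, "wb") as f:
        f.write(data)
    print(read_window(db_path, 200_000, 8) == data[200_000:200_008])  # True
```

On Windows the view offset follows the same 64 KB allocation granularity rule as other allocations.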

  16. Tracing File Mapping with Process Monitor • Procmon can trace image loader activity

  17. VMMap Differencing • Press F5 to refresh the view • VMMap keeps all snapshots • Use the timeline to select snapshots to compare

  18. Tracing with VMMap • You can launch a process with profiling • Detours tracks virtual memory and heap activity

  19. The System Commit Limit • System committed virtual memory must be backed by either physical memory or the paging file • The limit is the sum of (most of) physical memory and the current paging file sizes • Allocations charged against the system commit limit: • Process private bytes • Pagefile-backed shared memory • Copy-on-write pages • Read/write file pages • System paged and nonpaged code and data • When the limit is reached, virtual memory allocations fail • Processes may crash (or corrupt data)
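
The commit accounting on this slide can be modeled in a few lines. The sketch below is a toy (the class and parameter names are hypothetical): it treats the commit limit as usable RAM plus paging file and refuses any charge that would exceed it, the way a commit fails once the limit is reached:

```python
GB = 2**30


class CommitCharge:
    """Toy model of the system commit limit (not the real memory manager).

    limit = usable RAM + current paging file size. `usable_ram` stands in for
    "(most of) physical memory"; how much RAM is excluded is left to the caller.
    """

    def __init__(self, usable_ram: int, pagefile: int) -> None:
        self.limit = usable_ram + pagefile
        self.charged = 0
        self.peak = 0

    def commit(self, nbytes: int) -> bool:
        """Charge private bytes, pagefile-backed sections, etc.; fail at the limit."""
        if self.charged + nbytes > self.limit:
            return False  # analogous to a commit failing with ERROR_COMMITMENT_LIMIT
        self.charged += nbytes
        self.peak = max(self.peak, self.charged)
        return True

    def decommit(self, nbytes: int) -> None:
        self.charged -= nbytes


system = CommitCharge(usable_ram=4 * GB, pagefile=2 * GB)
print(system.commit(5 * GB))  # True
print(system.commit(2 * GB))  # False: 7 GB would exceed the 6 GB limit
system.decommit(3 * GB)
print(system.charged // GB, system.peak // GB)  # 2 5
```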

  20. Changing the System Commit Limit • You can increase the system commit limit by adding RAM or increasing the pagefile size • The system commit limit can grow if paging files are configured to expand • So the system commit limit might be the current limit, not the maximum • Default configuration (“System Managed”): • Minimum: 1.5x RAM if RAM < 1 GB; RAM otherwise • Maximum: 3x RAM or 4 GB, whichever is larger • Maximum system commit limit should be based on the system commit peak for an extreme workload
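
The “System Managed” defaults above, written out as a Python function (the rule exactly as stated on this slide for the Windows versions it covers; the function name is hypothetical):

```python
GB = 2**30
MB = 2**20


def system_managed_pagefile(ram: int) -> tuple[int, int]:
    """(minimum, maximum) paging file size per the slide's "System Managed" defaults."""
    minimum = ram * 3 // 2 if ram < 1 * GB else ram
    maximum = max(3 * ram, 4 * GB)
    return minimum, maximum


for ram in (512 * MB, 2 * GB, 8 * GB):
    lo, hi = system_managed_pagefile(ram)
    print(ram // MB, lo // MB, hi // MB)
# 512 768 4096
# 2048 2048 6144
# 8192 8192 24576
```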

  21. Viewing System Commit Usage • Performance Counters: • Committed Bytes • Commit Limit • Task Manager • XP: commit charge labeled “PF Usage” • Vista: commit charge labeled “Page File” • Win7: commit charge labeled “Commit” • Vista and Win7 show commit limit after slash

  22. Viewing the System Commit Limit • Process Explorer shows commit charge (with history), commit limit, and commit peak • No built-in tool shows peak any more

  23. Exhausting the System Commit Limit • On a 32-bit system, run “Testlimit –m” multiple times until the system commit limit is exhausted • On 64-bit Windows, “Testlimit64 –m” will exhaust the system commit limit before its address space

  24. Sizing the Paging File • If you have enough RAM to support your commit needs, why even have one? • System can page out unused, modified private pages rather than keeping them in RAM • More RAM available for useful stuff • Many recommendations use a formula based on RAM (1.5x, 2x, etc.) • Actually, the more RAM, the smaller the paging file needed • Should be based on workload usage of committed virtual memory • Look at commit peak after workload has run • Pre-Vista: Task Manager • Vista+: Process Explorer • Apply a formula to that to give a buffer (1.5x or 2x) • Make sure it’s big enough to hold a kernel crash dump
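
One way to turn this advice into numbers, as a Python sketch. It assumes the commit limit is roughly RAM plus paging file, that you have measured the commit peak after the workload ran, and that `crash_dump` is whatever dump size you need; the function name is hypothetical and the 1 GB below is a placeholder, not a real kernel dump size:

```python
GB = 2**30


def paging_file_size(commit_peak: int, ram: int, buffer: float = 1.5,
                     crash_dump: int = 0) -> int:
    """Suggested paging file size from an observed commit peak.

    Assumes commit limit ~= RAM + paging file (ignoring the part of RAM that
    doesn't count toward the limit).
    """
    needed = int(commit_peak * buffer) - ram
    return max(needed, crash_dump, 0)


print(paging_file_size(12 * GB, ram=8 * GB) // GB)                      # 10
print(paging_file_size(12 * GB, ram=32 * GB, crash_dump=1 * GB) // GB)  # 1
```

The same 12 GB peak needs a 10 GB paging file on an 8 GB machine but only the dump-sized minimum on a 32 GB machine: the more RAM, the smaller the paging file.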

  25. Physical Memory

  26. Working Set List • All the physical pages “owned” by a process • I.e., the pages the process can reference without incurring a page fault • A process always starts with an empty working set • It then incurs page faults when referencing a page that isn’t in its working set • Hard fault: resolved from file on disk (paging file, mapped file) • Soft fault: resolved from memory • [Figure: working set list ordered from newer pages to older pages]

  27. Working Set • Each process has a default working set minimum and maximum • Can change with SetProcessWorkingSetSize • Working set minimum controls maximum number of locked pages (VirtualLock) • Minimum is also reserved from RAM as a guarantee to the process • Working set maximum is ignored (unless a hard limit is set with SetProcessWorkingSetSizeEx) • If there’s ample memory, the process working set represents all the memory it has referenced (but not freed) • If memory is tight, working sets get trimmed

  28. Working Set Replacement • When the memory manager decides the process is large enough, it gives up pages to make room for new pages • Local page replacement policy • Means that a single process cannot take over all of physical memory unless other processes aren’t using it • Page replacement algorithm is least recently accessed (pages are aged when available memory is low) • [Figure: pages removed from the working set go to the standby or modified page list]
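
The replacement policy can be sketched as a toy Python working set. It evicts the least recently accessed page whenever the set exceeds a fixed limit (real Windows ages pages and trims only when memory is low) and routes trimmed pages by their dirty bit, as slide 36 describes. `ToyWorkingSet` is hypothetical:

```python
from collections import OrderedDict


class ToyWorkingSet:
    """Toy working set: local replacement of the least recently accessed page.

    Trimmed pages go to the modified list if dirty, otherwise to the standby list.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.pages = OrderedDict()  # page -> dirty bit, oldest access first
        self.standby = []
        self.modified = []

    def access(self, page: str, write: bool = False) -> None:
        dirty = self.pages.pop(page, False) or write
        self.pages[page] = dirty  # now the most recently accessed page
        while len(self.pages) > self.limit:
            victim, victim_dirty = self.pages.popitem(last=False)
            (self.modified if victim_dirty else self.standby).append(victim)


ws = ToyWorkingSet(limit=2)
ws.access("A", write=True)
ws.access("B")
ws.access("C")  # evicts A (dirty) -> modified list
ws.access("B")
ws.access("D")  # evicts C (clean) -> standby list
print(list(ws.pages), ws.modified, ws.standby)  # ['B', 'D'] ['A'] ['C']
```

Because each process trims only its own pages, one process's growth does not directly evict another's: that is the “local” part of the policy.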

  29. Working Set Breakdown • Consists of 2 types of pages: • Shareable (of which some may be shared) • Private • Four performance counters available: • Working Set Shareable • Working Set Shared (subset of shareable that are currently shared) • Working Set Private • Working Set Size (total of WS Shareable+Private) • Note: adding this up for each process overcounts shared pages • Caveats: • Working set does not include trimmed memory that is still cached • Shareable working set should be viewed as “private” if it’s not shared
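
A Python sketch of how the four counters relate, using made-up page ids and a hypothetical `ws_counters` helper. It also shows why summing Working Set Size across processes overcounts pages that are shared:

```python
def ws_counters(name, working_sets):
    """Working set counters for one process.

    working_sets maps process name -> (private_pages, shareable_pages), each a set
    of physical page ids. "Shared" = shareable pages also in another process's set.
    """
    private, shareable = working_sets[name]
    others = set()
    for other, (_, other_shareable) in working_sets.items():
        if other != name:
            others |= other_shareable
    return {
        "private": len(private),
        "shareable": len(shareable),
        "shared": len(shareable & others),
        "size": len(private) + len(shareable),
    }


# Hypothetical page ids: both processes have the same two ntdll.dll pages in their working sets.
sets = {
    "app1": ({"p1", "p2"}, {"ntdll:0", "ntdll:1", "app1.exe:0"}),
    "app2": ({"p3"}, {"ntdll:0", "ntdll:1"}),
}
summed = sum(ws_counters(n, sets)["size"] for n in sets)
distinct = len(set().union(*(priv | shr for priv, shr in sets.values())))
print(ws_counters("app1", sets))  # {'private': 2, 'shareable': 3, 'shared': 2, 'size': 5}
print(summed, distinct)           # 8 6
```

`app1.exe:0` is counted as shareable but is not shared, so in practice it behaves like private memory, per the second caveat.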

  30. Viewing Working Set with Task Manager • Displays private working set size • Calls it “Memory (Private Working Set)”

  31. Viewing Working Set with Process Explorer • Process Explorer shows all the performance counters • Virtual Bytes • Private Bytes • WS Shareable Bytes • WS Shared Bytes • WS Private Bytes • Run Testlimit three times: • Testlimit -r 1024 -c 1 • Testlimit -m 1024 -c 1 • Testlimit -d 1024 -c 1 • Note how the working set numbers don’t represent the process virtual memory usage at all

  32. Viewing the Working Set with VMMap • Vmmap shows working set size of each component of address space • Also shows locked pages • Copy-on-write pages will show up as Private WS in shareable regions

  33. How Copy-On-Write Works: Before • [Figure: two process address spaces map the same physical pages: Page 1 (original data), Page 2 (original data), and Page 3]

  34. How Copy-On-Write Works: After • [Figure: after one process writes to Page 2, it maps a private copy of page 2 holding the modified data; Page 1 (original data) and Page 3 remain shared]
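
The before/after pictures can be expressed as a toy Python model: two processes map the same physical frames, and the first write to a shared page gives the writer a private copy. The class names are hypothetical, and the model ignores page protections and commit charging:

```python
class PhysicalMemory:
    """Physical page frames with a count of how many processes map each one."""

    def __init__(self):
        self.frames = []   # frame number -> page contents
        self.mappers = []  # frame number -> number of processes mapping it

    def allocate(self, data):
        self.frames.append(data)
        self.mappers.append(0)
        return len(self.frames) - 1


class Process:
    def __init__(self, phys, mapping):
        self.phys = phys
        self.page_table = dict(mapping)  # virtual page -> frame number
        for frame in self.page_table.values():
            phys.mappers[frame] += 1

    def read(self, page):
        return self.phys.frames[self.page_table[page]]

    def write(self, page, data):
        frame = self.page_table[page]
        if self.phys.mappers[frame] > 1:
            # Copy-on-write: the writer gets a private copy; other mappers keep the original.
            self.phys.mappers[frame] -= 1
            frame = self.phys.allocate(self.phys.frames[frame])
            self.phys.mappers[frame] = 1
            self.page_table[page] = frame
        self.phys.frames[frame] = data


phys = PhysicalMemory()
shared = {1: phys.allocate(b"orig page 1"), 2: phys.allocate(b"orig page 2"), 3: phys.allocate(b"page 3")}
a, b = Process(phys, shared), Process(phys, shared)
a.write(2, b"modified page 2")
print(a.read(2), b.read(2), len(phys.frames))  # b'modified page 2' b'orig page 2' 4
```

Only the written page is duplicated; pages 1 and 3 stay shared, which is why copy-on-write pages show up as private working set inside shareable regions in VMMap.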

  35. Managing Physical Memory • System keeps unassigned physical pages on one of several lists • Free page list • Modified page list • Standby page lists (8 as of Vista & later) • Zero page list • ROM page list • Bad page list - pages that failed memory test at system startup • Lists are implemented by entries in the “PFN database” • Maintained as FIFO lists or queues

  36. Paging Dynamics • New pages are allocated to working sets from the top of the free or zero page list • Pages released from the working set due to working set replacement go to the bottom of: • The modified page list (if they were modified while in the working set) • The standby page list (if not modified) • Decision made based on “D” (dirty = modified) bit in page table entry • Association between the process and the physical page is still maintained while the page is on either of these lists

  37. Standby and Modified Page Lists • Modified pages go to modified (dirty) list • Avoids writing pages back to disk too soon • Unmodified pages go to standby (clean) lists • They form a system-wide cache of “pages likely to be needed again” • Pages can be faulted back into a process from the standby and modified page list • These are counted as page faults, but not page reads
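
A toy Python sketch of the soft/hard distinction: a fault on a page still on the standby or modified list is resolved from memory, and only the rest cause a page read. The function and list names are hypothetical:

```python
def resolve_fault(page, standby, modified, disk_reads):
    """Resolve a page fault as the slide describes (toy model).

    Pages still on the standby or modified list are soft faults: no disk I/O.
    Anything else is a hard fault that needs a page read.
    """
    for cached in (standby, modified):
        if page in cached:
            cached.remove(page)
            return "soft"
    disk_reads.append(page)
    return "hard"


standby, modified, reads = ["A"], ["B"], []
faults = [resolve_fault(p, standby, modified, reads) for p in ("A", "B", "C")]
print(faults, len(reads))  # ['soft', 'soft', 'hard'] 1
```

Three page faults but only one page read: this is how the two counters diverge.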

  38. Modified Page Writer • When the modified list reaches a certain size, the modified page writer system thread is awoken to write pages out • Also triggered when memory is overcommitted (too few free pages) • Does not flush the entire modified page list • Two system threads • One for mapped files, one for the paging file • Pages move from the modified list to the standby list • I.e., they can still be soft faulted into a working set

  39. Free and Zero Page Lists • Free Page List • Used for page reads • Private modified pages go here on process exit • Pages contain junk (i.e., they are not zeroed) • On most busy systems, this is empty • Zero Page List • Used to satisfy demand zero page faults • References to private pages that have not been created yet • When the free page list has 8 or more pages, the zero page thread (which runs at priority zero) is awoken to zero them • On most busy systems, this is empty too
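
The zero page thread's trigger can be sketched in Python. This is a toy model built on the slide's 8-page threshold; the real thread's scheduling and batching are not modeled, and the names are hypothetical:

```python
ZERO_THRESHOLD = 8  # from the slide: wake the zero page thread at 8+ free pages


def zero_page_thread(free_list, zero_list, contents):
    """Zero every page on the free list once it reaches ZERO_THRESHOLD pages.

    Returns the number of pages moved to the zero page list.
    """
    if len(free_list) < ZERO_THRESHOLD:
        return 0
    moved = 0
    while free_list:
        page = free_list.pop(0)                      # free list is FIFO
        contents[page] = bytes(len(contents[page]))  # overwrite junk with zeros
        zero_list.append(page)
        moved += 1
    return moved


free, zeroed = list(range(7)), []
contents = {n: b"junk" for n in range(8)}
print(zero_page_thread(free, zeroed, contents))  # 0: only 7 free pages
free.append(7)
print(zero_page_thread(free, zeroed, contents), contents[7])  # 8 b'\x00\x00\x00\x00'
```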

  40. Paging Dynamics • [Figure: page transitions among Working Sets, Standby Page Lists, Modified Page List, Free Page List, Zero Page List, and Bad Page List, with arrows labeled: demand zero page faults; page read from disk or kernel allocations (“hard” page faults); “soft” page faults; “global valid” faults; working set replacement; modified page writer; private pages at process exit; zero page thread]

  41. Viewing the Paging Lists with Task Manager • XP/2003: • Available = Standby + Zero + Free • System Cache = Standby + Modified + System Working Set • Vista/Server 2008: • Replaced Available with Free • Free + Zero list • System Cache relabeled Cached • Windows 7/Server 2008 R2 • Available put back

  42. Viewing the Paging Lists with Process Explorer • Process Explorer shows each paging list • Click View->System Information

  43. Total Process Private Memory Usage • Working Set size does not include: • Private memory on standby or modified lists • Page tables • Rammap shows this on Processes tab

  44. Viewing Memory Usage with Rammap • In addition to showing size of paging lists, shows usage breakdown: • Process private • Mapped file • Shared memory • Page tables • Paged pool • Nonpaged pool • System PTE • Session private • Metafile • AWE • Driver locked • Kernel stack

  45. Prioritized Standby Lists • In Vista & later, there are 8 prioritized standby lists • Pages are removed from the lowest priority list first • A low memory priority process will keep re-using low priority pages • Higher priority information remains cached • [Figure: pages added to and removed from the prioritized standby lists]
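
A Python sketch of prioritized repurposing: eight FIFO standby lists, with pages taken from the lowest-priority non-empty list first. The class and page names are hypothetical; priorities 5 (normal) and 1 (low) follow the memory priority values used in this deck:

```python
from collections import deque


class PrioritizedStandby:
    """Eight standby lists (priorities 0-7); repurposing takes from the lowest first."""

    def __init__(self, levels=8):
        self.lists = [deque() for _ in range(levels)]

    def add(self, page, priority):
        self.lists[priority].append(page)  # FIFO within one priority

    def repurpose(self):
        """Take a page for reuse from the lowest-priority non-empty list, or None."""
        for pages in self.lists:
            if pages:
                return pages.popleft()
        return None


standby = PrioritizedStandby()
standby.add("startmenu", 7)  # hypothetical page names
standby.add("editor", 5)     # normal memory priority
standby.add("indexer1", 1)   # low memory priority (background)
standby.add("indexer2", 1)
print([standby.repurpose() for _ in range(5)])
# ['indexer1', 'indexer2', 'editor', 'startmenu', None]
```

The low-priority pages are consumed first, so the normal- and high-priority cached pages survive longer.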

  46. SuperFetch™ • SuperFetch proactively repopulates RAM with the most useful data • Sets the priority of pages to an optimal value, based on the page history and other analysis that it performs • Takes into account frequency of page usage and usage of the page in the context of other pages in memory • Adapts to application launch patterns, tracked by time of day in 8-hour chunks and by weekend vs. weekday • Scenarios SuperFetch improves include • Resume from hibernate and suspend • Fast user switching • Performance after infrequent or low priority tasks execute • Application launch • Windows 7: Disabled if the OS is booted from an SSD

  47. Memory Priority • Each thread has its own memory priority • 5: normal • 1: low • This determines which standby list is used for the page (when/if it arrives on the standby list) • Thread memory priority is inherited from the process memory priority • Can be changed for the process or an individual thread • SetPriorityClass or SetThreadPriority “background mode”

  48. Standby List Population • Priority 7 pages come from a static set (pre-trained at Microsoft) • Pre-populated at each boot • Includes pages related to user input that requires fast responsiveness (right-click, desktop properties, control panel, start menu, etc.) • Priority 6 pages are those SuperFetch considers important or useful (they rarely get repurposed) • Priority 5 pages are standard user pages (memory priority 5) • Priority 1 pages are low priority user pages (memory priority 1) • Priority 0-4 pages may be SuperFetch-decayed pages, cache manager read-ahead, and page fault clustering

  49. How Much of the Standby List has Been Consumed? • RAMMap shows the amount of memory repurposed off each standby list since boot
