Limiting memory consumption


Presentation Transcript


  1. Limiting memory consumption

  2. The problem
     • Some sites would like to limit memory consumption so that jobs do not degrade the performance of WNs
     • What is really important is whether some processes require heavy swapping
     • RSS is not a good metric
       • For the same VMEM, RSS is larger on an empty WN!
       • Can we agree on that?
     • Is VMEM a good metric?
       • Isn’t it rather the swapping rate?
       • If pages are inactive, who cares?
     • Is it possible to limit swapping?
     • Is swapping the devil?
       • Some sites have no swap space…
       • Why was virtual memory invented?!
     • Incredible waste of memory as all wrappers are in RSS!
     PhC
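
The slide suggests that the swapping rate, rather than VMEM or RSS, is what actually hurts a WN. A minimal sketch of how that rate could be sampled on Linux follows, assuming the kernel exposes the cumulative `pswpin` and `pswpout` page counters in `/proc/vmstat`. The function names and the 5-second interval are illustrative choices, not something taken from the presentation.

```python
import time


def parse_vmstat(text):
    """Parse '/proc/vmstat'-style text ('name value' per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, interval_s):
    """Return (pages swapped in per second, pages swapped out per second)."""
    swapped_in = after["pswpin"] - before["pswpin"]
    swapped_out = after["pswpout"] - before["pswpout"]
    return swapped_in / interval_s, swapped_out / interval_s


def sample_swap_rate(interval_s=5.0, path="/proc/vmstat"):
    """Take two snapshots of the counters and compute the rate in between."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return swap_rate(before, after, interval_s)
```

Calling `sample_swap_rate()` on a worker node gives a node-wide figure; it does not say which process is responsible for the swapping.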

  3. What can be done?
     • What is the “offending” entity?
       • With one slot per core, it is a process, not a job!
     • Killing a job is just useless!
       • No information to the user, therefore a waste of resources
       • Confusion for the CE / WMS: the job has gone!
     • Which metric?
       • VMEM
         • Deterministic for the main process
       • RSS
         • Depends on the load of the machine. If used, the limit should anyway be set to max VMEM (in case the machine is empty)
       • PSS
         • Counts shared pages weighted by 1/(number of processes sharing them)
         • In principle the best estimate, but it also depends on what else is running
         • Limit should be max VMEM in case there is no sharing
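
To make the three metrics concrete, here is a small sketch that sums the `Size`, `Rss` and `Pss` fields of a Linux `/proc/<pid>/smaps` file, which roughly correspond to VMEM, RSS and PSS. For example, a 40 kB resident mapping shared by four processes contributes 40 kB to each process's RSS but only 10 kB to its PSS. The function names are illustrative, and the sketch assumes a kernel that reports `Pss` in smaps.

```python
def memory_totals_kb(smaps_text):
    """Sum the Size, Rss and Pss fields (in kB) over all mappings."""
    totals = {"Size": 0, "Rss": 0, "Pss": 0}
    for line in smaps_text.splitlines():
        key, sep, rest = line.partition(":")
        # Exact key match, so fields such as 'Pss_Anon' are not counted twice.
        if sep and key in totals:
            totals[key] += int(rest.split()[0])
    return totals


def read_process_memory(pid="self"):
    """Read the totals for a live process (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return memory_totals_kb(f.read())
```

With no sharing at all, `Pss` equals `Rss`, which is why the slide argues that any PSS limit still has to allow for the maximum VMEM.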

  4. What is done on LHCb Tier1s?
     • What is currently done, depending on the site
       • No limit at all (3 sites)
         • Works fine (never seen any related bad job performance)
       • Limit VMEM per process (ulimit) (2 sites, 3.8 GB)
         • Then kill the offending process (sending a signal)
         • Deterministic; at least the framework can catch the return code and establish a diagnosis
       • Limit total VMEM/RSS per process group (job) (1 site, 5 GB)
         • This is unpredictable by the user!
         • No control over how much VMEM is used by wrappers etc…
           • Up to 1.5 GB!!!
         • In addition, no swap space at that site!
       • Limit RSS per process (1 site, 4 GB)
         • Limit should be set to max VMEM anyway!
     • For 7 sites, the sampling is not too bad
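
The per-process VMEM limit can be sketched as follows, assuming Linux and Python's standard `resource` module: a wrapper starts the payload with `RLIMIT_AS` set (the programmatic equivalent of `ulimit -v`) and inspects the outcome. The function names and the diagnosis strings are hypothetical. In this sketch the over-limit allocation fails inside the child rather than the child being killed by a signal, so it illustrates only the "framework can catch the return code" part of the slide.

```python
import resource
import subprocess


def run_with_vmem_limit(cmd, limit_bytes):
    """Run cmd in a child process whose address space is capped at limit_bytes."""
    def set_limit():
        # Executed in the child just before exec; applies to that process only.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(cmd, preexec_fn=set_limit,
                          capture_output=True, text=True)


def diagnose(result):
    """Turn the child's exit status into a short, user-visible diagnosis."""
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        return f"killed by signal {-result.returncode}"
    if "MemoryError" in result.stderr:
        return "memory limit exceeded"
    return f"failed with return code {result.returncode}"
```

Because the limit applies to a single process, the outcome depends only on what that process allocates, not on wrappers or on other jobs sharing the node.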
