A previous post documented that a Linux server running a pre-2.6.24 kernel can fail to allocate large chunks of memory after its memory has been fragmented by a “thrashing” incident. In this post, I will point out some ways to prevent this problem.
Use a Newer Kernel
We have some servers running RHEL 5.9 with the kernel updated to 126.96.36.199. After a thrashing incident, these servers do not experience the same problem with allocating large blocks of memory. I think the fix is documented in the release notes for kernel 2.6.24. Section 2.4 talks about “anti-fragmentation patches” and includes a link to this article about Linux memory management, which links to this thorough documentation of the anti-fragmentation patches.(BTW, here is the full list of 2.6 kernel changelogs) My plan is to deploy RHEL 5.9 with the updated kernel to all the compute nodes in our cluster. However, this still doesn’t solve the problem of a user who requests some portion of the RAM on a node and then proceeds to consume more memory than requested. This is unfair to another user whose job is running on the same node.
Limit RAM Used By a Process
There are ways to prevent servers from thrashing in the first place. My discussion will be specific to HPC compute nodes, not the more general case of web servers, mail servers, etc. I’ll start by saying that ulimit is not the solution because, in part, its limits don’t propagate to child processes. Read this thorough discussion of the limitations of ulimit, and check out this script to limit the time and memory used by a Linux program. I haven’t evaluated that script yet, and its approach (polling run time and memory usage of the process and all of its children, grandchildren, etc.) seems a bit brute-force. I hope that process groups and Control Groups (cgroups) can be used instead. Also check out this Red Hat documentation on the memory subsystem in Linux.
It is conceivable that a compute node could have processes owned by multiple users.