What I know about process virtual size versus RSS on Linux

March 31, 2017

Up until very recently, I would have confidently told you that a Linux process's 'virtual size' was always at least as large as its resident set size. After all, how could it be otherwise? Your 'virtual size' was the total amount of mapped address space you had, the resident set size was how many pages you had in memory, and you could hardly have pages in memory without having them as part of your mapped address space. As Julia Evans has discovered, this is apparently not the case; in top terminology, it's possible to have processes with RES (ie RSS) and SHR values that are larger than VIRT. So here is what I know about this.

To start with, top extracts this information from /proc/PID/statm, and this information is the same as what you can find as VmSize and VmRSS in /proc/PID/status. Top doesn't manipulate or postprocess these numbers (apart from converting them all from pages to kB or other size units), so what you see it display is a faithful reproduction of what the kernel is actually reporting.
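To make this concrete, here is a minimal Python sketch (Python is my choice here, not anything top uses) that reads /proc/PID/statm, converts its page counts to kB the way top does, and puts them next to VmSize and VmRSS from /proc/PID/status. The function names are made up for illustration; the code assumes the usual statm field order of total size, resident, and shared. Because the two files are read at slightly different moments, small differences between them are possible on a busy process.

```python
import os


def statm_kib(pid="self"):
    """Return (size_kib, resident_kib, shared_kib) from /proc/PID/statm.

    statm reports counts in pages; the first three fields are total
    program size (top's VIRT), resident set size (RES), and resident
    shared file-backed and shmem pages (SHR).
    """
    page_kib = os.sysconf("SC_PAGE_SIZE") // 1024
    with open(f"/proc/{pid}/statm") as f:
        size, resident, shared = (int(x) for x in f.read().split()[:3])
    return size * page_kib, resident * page_kib, shared * page_kib


def status_kib(pid="self", fields=("VmSize", "VmRSS")):
    """Return the requested 'Name:   N kB' fields from /proc/PID/status."""
    out = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            name, _, value = line.partition(":")
            if name in fields:
                out[name] = int(value.split()[0])  # value looks like '12345 kB'
    return out


if __name__ == "__main__":
    virt, res, shr = statm_kib()
    print(f"statm:  VIRT={virt} kB  RES={res} kB  SHR={shr} kB")
    print("status:", status_kib())
```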

However, these two groups of numbers are maintained by different subsystems in the kernel's memory management system; there is nothing that directly ties them together or forces them to always be in sync. VmSize, VmPeak, VmData, and several other numbers come from per-mm_struct counters such as mm->total_vm; per Rick Branson these numbers are mostly maintained through vm_stat_account in mm/mmap.c. These numbers change when you make system calls like mmap() and mremap() (or when the kernel does similar things internally). Meanwhile, VmRSS, VmSwap, top's SHR, and RssAnon, RssFile, and RssShmem all come from page tracking, which mostly involves calling things like inc_mm_counter and add_mm_counter in places like mm/memory.c; these numbers change when pages are materialized and de-materialized in various ways.

(You can see where all of the memory stats in status come from in task_mem in fs/proc/task_mmu.c.)

I don't have anywhere near enough knowledge about the Linux kernel memory system to know if there's any way for a process to acquire a page through a path where it isn't accounted for in VmSize. One would think not, but clearly something funny is going on. On the other hand, this doesn't appear to be common. I wrote a simple brute-force checker script that compared every process's VmSize to its VmRSS, and I couldn't find any such odd process on any of our systems (a mixture of Ubuntu 12.04, 14.04, and 16.04, Fedora 25, and CentOS 6 and 7). It's quite possible that this requires a very unusual setup; Julia Evans' case is (or was) an active Chrome process, and Chrome is known to play all sorts of weird games with its collection of processes that very few other programs do.
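This isn't the checker script I actually used, but a brute-force check of the same kind might look like the following Python sketch (the function names are hypothetical). It skips kernel threads, which have no VmSize or VmRSS lines, and processes that exit while the scan is running.

```python
import os


def read_vm_fields(pid):
    """Return {'VmSize': kB, 'VmRSS': kB} for pid, or None if unavailable.

    Kernel threads and zombies have no VmSize/VmRSS lines, and processes
    can exit while we scan, so all of these cases are treated as 'skip'.
    """
    fields = {}
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                name, _, value = line.partition(":")
                if name in ("VmSize", "VmRSS"):
                    fields[name] = int(value.split()[0])
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        return None
    return fields if len(fields) == 2 else None


def find_rss_over_vsz():
    """Yield (pid, VmSize, VmRSS) for every process whose RSS exceeds its VmSize."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fields = read_vm_fields(entry)
        if fields and fields["VmRSS"] > fields["VmSize"]:
            yield int(entry), fields["VmSize"], fields["VmRSS"]


if __name__ == "__main__":
    for pid, vsz, rss in find_rss_over_vsz():
        print(f"pid {pid}: VmRSS {rss} kB > VmSize {vsz} kB")
```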

(If you find such a case it would be quite interesting to collect /proc/PID/smaps, which might show which specific mappings are doing this.)
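As a starting point for that, here is a hedged Python sketch that summarizes /proc/PID/smaps per mapping, giving each mapping's Size and Rss (both in kB) and listing the mappings with the largest Rss. It only looks at the Size: and Rss: lines and assumes the usual layout of one header line per mapping followed by 'Name: value kB' lines. Smaps Rss is computed by walking page tables rather than from the counters behind VmRSS, so the totals need not match status exactly; a noticeable disagreement would itself be a clue. Reading another user's smaps generally requires ptrace-level access to that process.

```python
def smaps_summary(pid="self"):
    """Return a list of (mapping_name, size_kb, rss_kb) from /proc/PID/smaps."""
    mappings = []
    current = None
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue
            if not tokens[0].endswith(":"):
                # Header line: "start-end perms offset dev inode [path]"
                name = " ".join(tokens[5:]) or f"[anon {tokens[0]}]"
                current = [name, 0, 0]
                mappings.append(current)
            elif current is not None and tokens[0] == "Size:":
                current[1] = int(tokens[1])
            elif current is not None and tokens[0] == "Rss:":
                current[2] = int(tokens[1])
    return [tuple(m) for m in mappings]


if __name__ == "__main__":
    rows = smaps_summary("self")  # substitute the PID of an odd process here
    total_size = sum(size for _, size, _ in rows)
    total_rss = sum(rss for _, _, rss in rows)
    print(f"total Size {total_size} kB, total Rss {total_rss} kB")
    # Show the ten mappings with the most resident memory.
    for name, size, rss in sorted(rows, key=lambda r: r[2], reverse=True)[:10]:
        print(f"{rss:>10} kB Rss {size:>10} kB Size  {name}")
```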

PS: The one area of this that makes me wonder is how RSS is tracked over fork(), because there seem to be at least some oddities there. Or perhaps the child does not get PTEs and thus RSS for the mappings it shares with the parent until it touches them in some way.
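One way to poke at the fork() question experimentally is a small Python sketch like the following. The parent materializes an anonymous mapping (64 MiB, an arbitrary size), forks, and the child reports its own VmRSS immediately and again after writing to every page. Running it with both a MAP_PRIVATE and a MAP_SHARED mapping lets you compare whether the child's RSS includes the inherited pages before it touches them. The helper names are hypothetical and I haven't checked what various kernel versions report here, so the numbers it prints are the experiment, not a claim about kernel behavior.

```python
import mmap
import os

PAGE = mmap.PAGESIZE
REGION = 64 * 1024 * 1024  # 64 MiB; arbitrary size chosen for illustration


def vm_rss_kib():
    """Return this process's VmRSS in kB from /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("no VmRSS line in /proc/self/status")


def touch(buf, value):
    """Write one byte per page so every page is faulted in by this process."""
    for off in range(0, len(buf), PAGE):
        buf[off] = value


def child_rss_after_fork(shared):
    """Return the child's (VmRSS right after fork, VmRSS after touching) in kB."""
    flags = mmap.MAP_SHARED if shared else mmap.MAP_PRIVATE
    buf = mmap.mmap(-1, REGION, flags=flags)  # fd -1 means an anonymous mapping
    touch(buf, 1)  # materialize the pages in the parent first
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report RSS before and after touching the inherited mapping.
        try:
            os.close(r)
            before = vm_rss_kib()
            touch(buf, 2)  # write faults (copy-on-write copies for MAP_PRIVATE)
            after = vm_rss_kib()
            os.write(w, f"{before} {after}".encode())
        finally:
            os._exit(0)
    # Parent: collect the child's report, then reap it.
    os.close(w)
    chunks = []
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    buf.close()
    before, after = (int(x) for x in b"".join(chunks).split())
    return before, after


if __name__ == "__main__":
    for shared in (False, True):
        before, after = child_rss_after_fork(shared)
        kind = "MAP_SHARED" if shared else "MAP_PRIVATE"
        print(f"{kind:12} child VmRSS: {before} kB after fork, {after} kB after touching")
```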

Written on 31 March 2017.
Last modified: Fri Mar 31 02:01:19 2017