Wandering Thoughts archives


I quite like the simplification of having OpenSSH canonicalize hostnames

Some time ago I wrote up some notes on OpenSSH's optional hostname canonicalization. At the time I had just cautiously switched over to having my OpenSSH setup on my workstation canonicalize my hostnames, and I half expected it to go wrong somehow. It's been just over a year since then, and not only has nothing blown up, I now actively prefer having OpenSSH canonicalize the hostnames that I use. Just today I switched my OpenSSH setup on our login servers over to do this and cleared out my ~/.ssh/known_hosts file to start over from scratch.

That latter bit is a big part of why I've come to like hostname canonicalization. We have a long history of using multiple forms of shortened hostnames for convenience, plus sometimes we wind up using a host's fully qualified name. When I didn't canonicalize names, my known_hosts file wound up increasingly cluttered with multiple entries for what was actually the same host, some of them with the IP address and some without. After canonicalization, all of this goes away; every host has one entry and that's it. Since we already maintain a system-wide set of SSH known hosts (partly for our custom NFS mount authentication system), my own known_hosts file now doesn't even accumulate very many entries.

(I should probably install our global SSH known hosts file even on my workstation, which is deliberately independent from our overall infrastructure; this would let me drastically reduce my known_hosts file there too.)

The other significant reason to like hostname canonicalization is the reason I mentioned in my original entry, which is that it allows me to use much simpler Host matching rules in my ~/.ssh/config while only offering my SSH keys to hosts that should actually accept them (instead of to everyone, which can have various consequences). This seems to have become especially relevant lately, as some of our recently deployed hosts seem to have reduced the number of authentication attempts they'll accept (and each keypair you offer counts as one attempt). And in general I just like having my SSH client configuration saying what I actually want, instead of having to flail around with 'Host *' matches and so on because there was no simple way to say 'all of our hosts'. With canonical hostnames, now there is.
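As a rough illustration of the sort of setup this makes possible, here is a minimal ~/.ssh/config sketch. The domain example.org and the key file ~/.ssh/id_example are made-up placeholders, not our real names, and a real configuration would likely need more than this. Once a short name such as 'apps0' has been canonicalized, ssh re-reads the configuration with the full name, so a single wildcard Host pattern covers every form of the hostname.

```
# Hypothetical names throughout: example.org and id_example are placeholders.
CanonicalizeHostname yes
CanonicalDomains example.org

# Matches after canonicalization, whether I typed 'apps0' or the full name.
Host *.example.org
    IdentityFile ~/.ssh/id_example
    IdentitiesOnly yes
```

Because the key is only named inside the Host block, it is only explicitly offered to hosts that match (assuming it is not also a default key file name).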

As far as DNS reliability for resolving CNAMEs goes, we haven't had any DNS problems in the past year (or if we have, I failed to notice them amidst greater problems). We might someday, but in general DNS issues are going to cause me problems no matter what, since my ssh has to look up at least IP addresses in DNS. If it happens I'll do something, but in the meantime I've stopped worrying about the possibility.

sysadmin/SSHCanonHostnamesWin written at 22:07:59

What I know about process virtual size versus RSS on Linux

Up until very recently, I would have confidently told you that a Linux process's 'virtual size' was always at least as large as its resident set size. After all, how could it be otherwise? Your 'virtual size' was the total amount of mapped address space you had, the resident set size was how many pages you had in memory, and you could hardly have pages in memory without having them as part of your mapped address space. As Julia Evans has discovered, this is apparently not the case; in top terminology, it's possible to have processes whose RES (ie RSS) and SHR are larger than VIRT. So here is what I know about this.

To start with, top extracts this information from /proc/PID/statm, and this information is the same as what you can find as VmSize and VmRSS in /proc/PID/status. Top doesn't manipulate or postprocess these numbers (apart from converting them all from pages to Kb or other size units), so what you see it display is a faithful reproduction of what the kernel is actually reporting.
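To make the relationship between the two files concrete, here is a small Python sketch that reads both and converts the statm page counts to kB. The function names are my own invention, and it assumes the first three statm fields are size, resident, and shared (in pages) and that status reports its Vm* and Rss* lines in kB. The two sources may not agree exactly at any given instant.

```python
import os

def parse_statm(text, page_size):
    """Return (size_kb, resident_kb, shared_kb) from /proc/PID/statm contents."""
    # The first three fields are size, resident, and shared, counted in pages.
    size, resident, shared = (int(f) for f in text.split()[:3])
    kb_per_page = page_size // 1024
    return size * kb_per_page, resident * kb_per_page, shared * kb_per_page

def parse_status(text):
    """Return a dict of the Vm* and Rss* fields (in kB) from /proc/PID/status."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith(("Vm", "Rss")):
            fields[key] = int(rest.split()[0])
    return fields

if __name__ == "__main__":
    # Only meaningful on Linux; look at our own process as a demonstration.
    if os.path.exists("/proc/self/statm"):
        with open("/proc/self/statm") as f:
            print(parse_statm(f.read(), os.sysconf("SC_PAGE_SIZE")))
        with open("/proc/self/status") as f:
            st = parse_status(f.read())
        print(st.get("VmSize"), st.get("VmRSS"))
```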

However, these two groups of numbers are maintained by different subsystems in the kernel's memory management system; there is nothing that directly ties them together or forces them to always be in sync. VmSize, VmPeak, VmData, and several other numbers come from per-mm_struct counters such as mm->total_vm; per Rick Branson these numbers are mostly maintained through vm_stat_account in mm/mmap.c. These numbers change when you make system calls like mmap() and mremap() (or when the kernel does similar things internally). Meanwhile, VmRSS, VmSwap, top's SHR, and RssAnon, RssFile, and RssShmem all come from page tracking, which mostly involves calling things like inc_mm_counter and add_mm_counter in places like mm/memory.c; these numbers change when pages are materialized and de-materialized in various ways.

(You can see where all of the memory stats in status come from in task_mem in fs/proc/task_mmu.c.)

I don't have anywhere near enough knowledge about the Linux kernel memory system to know if there's any way for a process to acquire a page through a path where it isn't accounted for in VmSize. One would think not, but clearly something funny is going on. On the other hand, this doesn't appear to be a common thing, because I wrote a simple brute-force checker script that compared every process's VmSize to its VmRSS, and I couldn't find any such odd process on any of our systems (a mixture of Ubuntu 12.04, 14.04, and 16.04, Fedora 25, and CentOS 6 and 7). It's quite possible that this requires a very unusual setup; Julia Evans' case is (or was) an active Chrome process and Chrome is known to play all sorts of weird games with its collection of processes that very few other programs do.
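For the curious, a brute-force check of this sort can be sketched in a few lines of Python. This is not the script I actually ran, just a minimal reconstruction of the idea with names I made up; it assumes VmSize and VmRSS appear in /proc/PID/status in kB and silently skips processes that have neither (such as kernel threads) or that exit mid-scan.

```python
import os

def vm_fields(status_text):
    """Pull VmSize and VmRSS (in kB) out of the text of /proc/PID/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(rest.split()[0])
    return fields

def rss_exceeds_size(status_text):
    """True if VmRSS > VmSize, False if not, None if either field is missing."""
    fields = vm_fields(status_text)
    if "VmSize" not in fields or "VmRSS" not in fields:
        return None
    return fields["VmRSS"] > fields["VmSize"]

def find_odd_processes(proc_root="/proc"):
    """Return the sorted PIDs whose reported RSS is larger than their virtual size."""
    odd = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "status"), errors="replace") as f:
                text = f.read()
        except OSError:
            # The process went away, or we are not allowed to look at it.
            continue
        if rss_exceeds_size(text):
            odd.append(int(name))
    return sorted(odd)

if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid in find_odd_processes():
            print(pid)
```

Running it as root avoids missing processes that an ordinary user cannot inspect.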

(If you find such a case it would be quite interesting to collect /proc/PID/smaps, which might show which specific mappings are doing this.)
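If you do capture a smaps file, one way to go looking through it is sketched below. I am assuming the usual smaps layout, where each mapping starts with an address-range header line and is followed by 'Key: N kB' lines including Size and Rss; the helper names are hypothetical, and whether any individual mapping would actually show Rss above Size in such a case is exactly the open question.

```python
def parse_smaps(text):
    """Split smaps text into one dict per mapping, with its kB-valued fields."""
    mappings = []
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0].endswith(":"):
            # A 'Key: N kB' line; ignore non-size lines such as VmFlags.
            if (current is not None and len(parts) >= 3
                    and parts[1].isdigit() and parts[2] == "kB"):
                current[parts[0][:-1]] = int(parts[1])
        else:
            # A header line: address range, perms, offset, dev, inode, [name].
            current = {"range": parts[0], "name": " ".join(parts[5:])}
            mappings.append(current)
    return mappings

def rss_over_size(mappings):
    """Return the mappings whose Rss is larger than their Size."""
    return [m for m in mappings if m.get("Rss", 0) > m.get("Size", 0)]

if __name__ == "__main__":
    import sys
    # Usage: pass a saved copy of /proc/PID/smaps as the only argument.
    if len(sys.argv) == 2:
        with open(sys.argv[1], errors="replace") as f:
            for m in rss_over_size(parse_smaps(f.read())):
                print(m["range"], m["name"], m["Size"], m["Rss"])
```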

PS: The one area of this that makes me wonder is how RSS is tracked over fork(), because there seem to be at least some oddities there. Or perhaps the child does not get PTEs and thus RSS for the mappings it shares with the parent until it touches them in some way.

linux/VirtualSizeVersusRSS written at 02:01:19
