What I think I want out of a hypothetical nfsiotop for Linux
I wish there was a version of Linux's nfsiostat that worked gracefully when you have several hundred NFS mounts across multiple NFS fileservers.
(I'm going to have to write one, aren't I.)
Linux exposes a very large array of per-filesystem NFS client
statistics in /proc/self/mountstats (see here)
and there are some programs that digest this data and report it,
such as nfsiostat(8). Nfsiostat
generally works decently to give you useful information, but it's
very much not designed for systems with, for example, over 250 NFS
mounts. Unfortunately that describes us, and we would rather like
to have a tool which tells us what the NFS filesystem hotspots are
on a given NFS client if and when it's clearly spending a lot of
time waiting for NFS IO.
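To give a feel for the raw data, here is a minimal sketch of pulling the per-mount numbers out of mountstats text. It is not nfsiostat's parser. The function name `parse_mountstats` and the shape of the returned dictionaries are my own inventions, and it assumes the layout I understand the file to have: a `device ... mounted on ... with fstype ...` line per mount, followed by indented `bytes:` and `per-op statistics` lines for NFS mounts.

```python
def parse_mountstats(text):
    """Return {mountpoint: {"device", "bytes", "ops"}} for NFS mounts only.

    Assumed layout (hypothetical parser, not nfsiostat's own code):
      device SERVER:/PATH mounted on /MNT with fstype nfs statvers=1.1
          bytes:  <counters...>
          per-op statistics
              READ: <counters...>
    """
    mounts = {}
    current = None      # the NFS mount we are collecting lines for, if any
    in_ops = False      # True once we have seen "per-op statistics"
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "device":
            # A new mount starts; only NFS mounts carry the extra statistics.
            current = None
            in_ops = False
            if len(words) >= 8 and words[7] in ("nfs", "nfs4"):
                current = {"device": words[1], "bytes": [], "ops": {}}
                mounts[words[4]] = current
        elif current is None:
            continue
        elif words[0] == "bytes:":
            current["bytes"] = [int(w) for w in words[1:]]
        elif words[:2] == ["per-op", "statistics"]:
            in_ops = True
        elif in_ops and words[0].endswith(":"):
            # Raw counters for one NFS operation; interpreting them
            # (counts, byte totals, cumulative times) is left to the caller.
            current["ops"][words[0][:-1]] = [int(w) for w in words[1:]]
    return mounts


# Made-up sample in the assumed layout, not real output from a machine.
SAMPLE = """\
device rootfs mounted on / with fstype rootfs
device fs1:/h/100 mounted on /h/100 with fstype nfs statvers=1.1
    opts: rw,vers=3
    age: 1000
    bytes: 10 20 0 0 30 40 1 2
    per-op statistics
        NULL: 0 0 0 0 0 0 0 0
        READ: 5 5 0 600 3000 1 40 45
"""

if __name__ == "__main__":
    for mountpoint, info in parse_mountstats(SAMPLE).items():
        print(mountpoint, info["device"], sorted(info["ops"]))
```

On a real client you would feed it the contents of /proc/self/mountstats instead of `SAMPLE`. The counters are cumulative since mount time, so a top-style tool would take two samples and work with the differences.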
As suggested by the name, a hypothetical
nfsiotop would have to
only report on the top N filesystems, which raises the question of
how you sort NFS filesystems here. Modern versions of nfsiostat
sort by operations per second, which is a start, but I think that one
should also be able to sort by total read and write volume and
probably also by write volume alone. Other likely interesting things
to sort on are the average response time and the current number of
operations outstanding. An ideal tool would also be able to aggregate
things into per fileserver statistics.
(All of this suggests that the real answer is that you should be able to sort on any field that the program can display, including some synthetic ones.)
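As a rough sketch of what "sort on any field, including synthetic ones" and per-fileserver aggregation could look like, here is some hypothetical Python. The field names (`ops_s`, `read_bytes_s`, `write_bytes_s`, `rw_bytes_s`) and all three functions are assumptions of mine for illustration; they are not anything nfsiostat provides. It only sums fields that are safe to add up; something like average response time would need weighting by operation count instead.

```python
# Fields that can simply be summed across mounts (assumed names).
ADDITIVE = ("ops_s", "read_bytes_s", "write_bytes_s")


def add_synthetic(stats):
    """Return a copy of stats with a combined read+write volume field."""
    return {
        name: dict(fields, rw_bytes_s=fields["read_bytes_s"] + fields["write_bytes_s"])
        for name, fields in stats.items()
    }


def top_n(stats, field, n=10):
    """Return the n (name, fields) pairs with the largest value of field."""
    ranked = sorted(stats.items(), key=lambda item: item[1][field], reverse=True)
    return ranked[:n]


def by_fileserver(stats):
    """Sum the additive fields per server, taken from 'server:/path'."""
    totals = {}
    for fields in stats.values():
        server = fields["device"].partition(":/")[0]
        agg = totals.setdefault(server, dict.fromkeys(ADDITIVE, 0.0))
        for name in ADDITIVE:
            agg[name] += fields[name]
    return totals


if __name__ == "__main__":
    # Made-up per-mount rates, as if computed from two mountstats samples.
    stats = {
        "/h/100": {"device": "fs1:/h/100", "ops_s": 50.0,
                   "read_bytes_s": 1000.0, "write_bytes_s": 200.0},
        "/h/101": {"device": "fs1:/h/101", "ops_s": 5.0,
                   "read_bytes_s": 10.0, "write_bytes_s": 9000.0},
        "/h/200": {"device": "fs2:/h/200", "ops_s": 20.0,
                   "read_bytes_s": 300.0, "write_bytes_s": 300.0},
    }
    for name, fields in top_n(add_synthetic(stats), "rw_bytes_s", n=2):
        print(name, fields["rw_bytes_s"])
    for server, fields in top_n(by_fileserver(stats), "ops_s"):
        print(server, fields["ops_s"])
```

Because the per-fileserver totals have the same shape as the per-mount entries, the same `top_n` works on both, which is most of what I'd want from a display layer.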
As my aside in the tweet suggests, I suspect that I'm going to have
to write this myself, and probably mostly from scratch. While
nfsiostat is written in Python and so is probably reasonably
straightforward for me to modify, I suspect that it has too many
things I'd want to change. I don't want little tweaks for things
like its output, I want wholesale restructuring. Hopefully I can
reuse its code to parse the
mountstats file, since that seems
reasonably tedious to write from scratch. On the other hand, the
current nfsiostat Python code seems amenable to a quick gut job to
prototype the output that I'd want.
(Mind you, prototypes tend to drift into use. But that's not necessarily a bad thing.)
PS: I've also run across kofemann/nfstop, which has some interesting features such as a per-UID breakdown, but it works by capturing NFS network traffic and that's not the kind of thing I want to have to use on a busy machine, especially at 10G.
PPS: I'd love to find out that a plausible nfsiotop already exists, but I haven't been able to turn one up in Internet searches so far.
Why Let's Encrypt's short certificate lifetimes are a great thing
I recently had a conversation on Twitter about what we care about in TLS certificate sources, and it got me to realize something. I've written before about how our attraction to Let's Encrypt has become all about the great automation, but what I hadn't really thought about back then was how important the short certificate lifetimes are. What got me really thinking about it was a hypothetical: suppose we could get completely automatically issued and renewed free certificates, but they had the typical one or more year lifetime of most TLS certificates to date. Would we be interested? I realized that we would not be, and that we would probably consider the long certificate lifetime to be a drawback, not a feature.
There is a general saying in modern programming to the effect that if you haven't tested it, it doesn't work. In system administration, we tend towards a modified version of that saying: if you haven't tested it recently, it doesn't work. Given our generally changing system environments, the "recently" is an important qualification; it's too easy for things to get broken by changes around them, so the longer it's been since you tried something, the less confidence you can have in it. The corollary for infrequent certificate renewal is obvious, because even in automated systems things can go wrong.
With Let's Encrypt, we don't just have automation; the short certificate lifetime ensures that we exercise it frequently. Our client of choice (acmetool) renews certificates when they're 30 days from expiring, so although the official Let's Encrypt lifetime is 90 days, we roll over certificates every sixty days. Having a rollover happen once every two months is great for building and maintaining our confidence in the automation, in a way that wouldn't happen if it was once every six months, once a year, or even less often. If it was that infrequent, we'd probably end up paying attention during certificate rollovers even if we let automation do all of the actual work. With the frequent rollover due to Let's Encrypt's short certificate lifetimes, rollovers have become things we trust enough to ignore.
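The arithmetic here is simple enough to write down. This is only an illustration of the numbers above; the function names are made up, and the one-year case is the hypothetical long-lived certificate, assuming it was also renewed 30 days before expiry.

```python
def rollover_interval_days(lifetime_days, renew_before_days):
    """Days between renewals if we renew renew_before_days before expiry."""
    return lifetime_days - renew_before_days


def rollovers_per_year(lifetime_days, renew_before_days):
    """Roughly how many times a year the renewal automation gets exercised."""
    return 365 / rollover_interval_days(lifetime_days, renew_before_days)


if __name__ == "__main__":
    # 90-day certificates renewed 30 days early: a rollover every 60 days.
    print(rollover_interval_days(90, 30), rollovers_per_year(90, 30))
    # Hypothetical one-year certificates with the same 30-day margin.
    print(rollover_interval_days(365, 30), rollovers_per_year(365, 30))
```

That works out to about six exercises of the automation a year with 90-day certificates, versus about one a year with one-year certificates.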
(Automatic certificate renewal for long-duration certificates is not completely impossible here, because the university central IT has already arranged for free certificates for the university. Right now they're managed through a website and our university-wide authentication system, but in theory there could be automation for at least renewals. Our one remaining non-Let's Encrypt certificate was issued through this service as a two-year certificate.)