2016-02-27
Sometimes brute force is the answer, Samba edition
Like many places, we have a Samba server so that users with various
sorts of laptop and desktop machines can get at their files. For
good reason the actual storage does not try to live
on the Samba server but instead lives on our NFS fileservers. For similarly good reasons, people
don't have separate Samba credentials; they use their regular Unix
login and password.
However, behind the scenes Samba has a separate login and password
system, so we are actually creating and maintaining two accounts
for people: a Unix one, used for most things, and a Samba one, used
for Samba. This means that when we create a Unix account, we must
also create a corresponding Samba account, which is done by using
'smbpasswd -a -n' (the password will be set later).
For a long time we've had an erratic problem with this, in that
occasionally the smbpasswd -a would fail. Not very often, but
often enough to be irritating (since fixing it took noticing and
then manual intervention). Our initial theory was that our
/etc/passwd propagation system was not
managing to update the Samba server's /etc/passwd with the new
login by the time we ran smbpasswd. To deal with this we wrote a
wrapper around smbpasswd that explicitly waited until the new
login was visible in /etc/passwd and dumped out copious information if something (still) went wrong. Surely
we had solved the problem, we figured.
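The waiting part of such a wrapper only needs to be a simple loop. As an illustrative sketch (not our actual script; the 30-second cap is a made-up number and the new login is assumed to be in $login):

waited=0
while ! grep -q "^${login}:" /etc/passwd; do
    waited=$((waited + 1))
    if [ "$waited" -gt 30 ]; then
        echo "$0: $login still not in /etc/passwd, giving up" 1>&2
        exit 1
    fi
    sleep 1
done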
You can guess what happened next: no, we hadn't. Oh, it was clear
that some of the problem was /etc/passwd propagation delays,
because every so often we could see the wrapper script report that
it had needed to wait. But sometimes smbpasswd still failed,
reporting simply:
Unable to modify TDB passwd: NT_STATUS_UNSUCCESSFUL! Failed to add entry for user <newuser>.
We could have spent a lot of time trying to figure out what was
going wrong in the depths of Samba and then how to avoid it, staring
at logs, perhaps looking at strace output, maybe reading source,
and so on and so forth. But we decided not to do that. Instead we
decided to take a much simpler approach. We'd already noticed that
every time this happened we could later run the 'smbpasswd -a -n
<newuser>' command by hand, so we just updated our wrapper script
so that if smbpasswd failed it would wait a second or two and try
again.
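In sketch form the retry looks something like this (again an illustration rather than our exact code; the two-second pause and the single retry are arbitrary choices):

if ! smbpasswd -a -n "$login"; then
    # It works on a second attempt, for reasons we have not
    # bothered to dig into.
    sleep 2
    if ! smbpasswd -a -n "$login"; then
        echo "$0: smbpasswd -a -n $login failed twice, giving up" 1>&2
        exit 1
    fi
fi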
This is a brute force solution, or really more like a brute force workaround. We haven't at all identified the cause or what we really need to do to fix it; we've simply identified a workaround that we can execute automatically without actually understanding the real problem. But it works (so far) and it did not involve us staring at Samba for a long time; instead we could immediately move on to productive work.
Sometimes brute force and pragmatics are the right answer, under the circumstances.
(It helps that account creation is a rare event for us.)
2016-02-20
My two usage cases for Let's Encrypt certificates
As I mentioned yesterday, we unfortunately can't use Let's Encrypt certificates in production here. That doesn't mean I have no use for LE certificates, though. Instead I have two different ones.
My first usage case for LE certificates is as the first stop for temporary certificates for test machines at work. I not infrequently need to set up test versions of TLS-based services for various reasons, including testing configuration changes, operating system upgrades, and even whether or not I can make some random idea actually work. All of these cases need real, valid certificates because an ever increasing amount of software refuses to deal with self-signed certificates (at least in any reasonable way). Since it's very unlikely that I'll run a test server for anywhere close to 90 days, various sorts of LE certificate renewal issues are of little or no importance.
LE's rate limits mean that I may not be able to get a certificate from them when I want one (or renew an existing one if I'm about to recycle one of my generic virtual machines to test something else), but this is more than made up for by the fact that I can try to get a LE certificate in minutes with absolutely no bureaucracy. If it works, great, I can go on with my real work; if not, either I put this particular project on the back burner for a few days or I get us to buy a commercial certificate and forget about the issue for a year.
(And when I can get a LE certificate for a general host name, I'm good for the next 90 days no matter what I'm doing with the host. Even though it's a little bit ugly, there's usually nothing I'm testing that requires a specific host name, or at least nothing that can't be fixed by hand editing a few configuration files for testing purposes.)
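To make 'minutes with absolutely no bureaucracy' concrete: on a test machine, getting a certificate can be roughly a one-liner. This is purely an illustration with a made-up host name, using certbot in standalone mode rather than whatever client I actually wind up preferring:

# run on the test machine itself; standalone mode wants port 80 or 443 free
certbot certonly --standalone -d testvm.example.org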
My second usage case is as the regular TLS certificates for my personal site, which is basically the canonical Let's Encrypt situation. Here I'm unlikely to run into rate limits and since I'm the only person getting certificates, I can coordinate with myself if it ever comes up. I do care about certificate renewal working smoothly, but on the other hand there are few enough certificates involved that if something doesn't work I can do things by hand and in an extreme case, even go back to my previous source for free TLS certificates. I'm also willing to run odd software in a custom configuration if it works for me, since I don't have to maintain things across a fleet of machines with co-workers; 'it works here for me' is good enough.
(And, while I care about my personal site, it is not 'production' in the way that work machines are. I can take risks with it that I wouldn't even dream of for work, or simply do things as experiments to see how they pan out. This is partly what Let's Encrypt is for me right now.)
These two usage cases wind up leaving me interested in different Let's Encrypt clients for each of them, but that's once again a subject for another entry.
2016-02-19
We can't use Let's Encrypt on our production systems right now
I really like Let's Encrypt, the new free and automated non-profit TLS Certificate Authority. Free is hard to beat, especially around here, and automatically issued certificates that don't require tedious interaction with websites are handy. And in general I love people who're striking a blow against the traditional CA racket. Unfortunately, despite all of that, there's basically no prospect of us using LE certificates in production around here.
The problem is not any of the traditional ones you might think of. Browsers trust the LE certificates, and that LE only does basic 'Domain Validation' (DV) certificates is not an issue because those are what we use anyways. And I have no qualms about using a free CA; CAs are in a commodity business and LE is easier to deal with than the alternatives due to their automation. It's not even the short 90-day duration on their certificates (although that's a potential issue).
The problem for us is that Let's Encrypt (currently) has relatively low rate limits, and especially it has a limit of five certificates per domain per week. Even if LE interpreted this very liberally (applying it to just our department's subdomain instead of the entire university), this is probably nowhere near enough for our usage. We have more than five different servers doing TLS ourselves, never mind all of the web servers run by research groups or even individual graduate students. This isn't just an issue of having to carefully schedule asking for certificates (and the resulting certificate renewals); it's also a massive coordination problem among all of the disparate people who could request certificates. As far as I can tell, using LE certificates in production here would mean giving a very large number of people the power to stop us from being able to renew (production) certificates. That's just not a risk we can take, especially since you have to renew LE certificates fairly often.
(Sure, we'd renew well ahead of time and if there were problems we could buy a commercial TLS certificate to replace the LE one. But if we're going to have problems very often we can save ourselves the heartburn and the fire drill by just buying commercial certificates in the first place. The university may not value staff time very highly in general but our time is still worth some actual money, and commercial certificates are cheap.)
I do feel sad about this, as I'd certainly like to be able to use LE certificates in production here (and I'd prefer to use them, especially with automatically handled renewal). But I suspect that a big university is always going to be a corner case that LE's rate limits simply won't deal with. If the university got seriously into 'TLS for all web sites', we're probably talking about at least thousands of separate servers.
(This doesn't mean that I have no use for LE certificates here. But that's another entry.)
Sidebar: my views on multiple names on the same certificate
TLS certificates can be issued with multiple names by using SANs, which means that you can theoretically cut down the number of distinct certificates you need by cramming a bunch of names on to one certificate. LE is especially generous with how many SANs you can attach to one certificate.
My personal dividing line is that I'm only willing to put multiple names into a TLS certificate when all of the names will be used on the same server. If I'm putting fifteen virtual host names into a certificate that will be used on a single web server, that's fine. If I'm jamming fifteen different web servers into one TLS certificate and so I'm going to have fifteen copies of it (and its key) on fifteen hosts, that's not fine. I should get separate certificates, so that the damage is more limited if one of those hosts gets compromised.
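To make the 'fine' case concrete, a single-server certificate covering several virtual hosts might be requested with something like this (a hypothetical certbot invocation with made-up names):

certbot certonly --webroot -w /var/www/html -d www.example.org -d wiki.example.org -d lists.example.org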
2016-02-11
My current views on using OpenSSH with CA-based host and user authentication
Recent versions of OpenSSH have support for doing host and user
authentication via a local CA. Instead
of directly listing trusted public keys, you configure a CA and
then trust anything signed by the CA. This is explained tersely
primarily in the ssh-keygen manpage and
at somewhat more length in articles like How to Harden SSH with
Identities and Certificates (via a comment by Patrick here). As
you might guess, I have some opinions on this.
I'm fine with using CA certs to authenticate hosts to users (especially
if OpenSSH still saves the host key to your known_hosts, which
I haven't tested), because the practical alternative is no initial
authentication of hosts at all. Almost no one verifies the SSH keys
of new hosts that they're connecting to, so signing host keys and
then trusting the CA gives you extra security even in the face of
the fundamental problem with the basic CA model.
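The mechanics of this are relatively simple. As an illustrative sketch (the names, paths, and key types here are made up, not a recommendation):

# On the CA machine: create a CA keypair, then sign a host's public key.
ssh-keygen -f host_ca
ssh-keygen -s host_ca -I somehost -h -n somehost.example.org /etc/ssh/ssh_host_ed25519_key.pub

# The host's sshd_config then offers the resulting certificate:
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

# And clients trust anything the CA signed for that domain via known_hosts:
@cert-authority *.example.org ssh-ed25519 AAAA...the-CA-public-key...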
I very much disagree with using CA certs to sign user keypairs and authenticate users system-wide because it has the weakness of the basic CA model, namely you lose the ability to know what you're trusting. What keys have access? Well, any signed by this CA cert with the right attributes. What are those? Well, you don't know for sure that you know all of them. This is completely different from explicit lists of keys, where you know exactly what you're trusting (although you may not know who has access to those keys).
Using CA certs to sign user keypairs is generally put forward as a
solution to the problem of distributing and updating explicit lists
of them. However this problem already has any number of solutions,
for example using sshd's AuthorizedKeysCommand to query an LDAP
directory (see eg this serverfault question).
If you're worried about the LDAP server going down, there are
workarounds for that. It's difficult for me to come up with an
environment where some solution like this isn't feasible, and such
solutions retain the advantage that you always have full control
over what identities are trusted and you can reliably audit this.
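For example, the sshd side of an AuthorizedKeysCommand setup is only a couple of lines of sshd_config. The helper script named here is hypothetical; it would do the actual LDAP query and print ordinary authorized_keys lines on standard output:

AuthorizedKeysCommand /usr/local/sbin/ldap-ssh-keys %u
AuthorizedKeysCommandUser nobody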
(I would not use CA-signed host keys as part of host-based
authentication with /etc/shosts.equiv. It suffers from exactly
the same problem as CA-signed user keys; you can never be completely
sure what you're trusting.)
Although it is not mentioned much or well documented, you can
apparently set up a personal CA for authentication via a cert-authority
line in your authorized_keys. I think that this is worse than
simply having normal keys listed, but it is at least less damaging
than doing it system-wide and you can make an argument that this
enables useful security things like frequent key rollover, limited-scope
keys, and safer use of keys on potentially exposed devices. If
you're doing these, maybe the security improvements are worth being
exposed to the CA key-issue risk.
(The idea is that you would keep your personal CA key more or less
offline; periodically you would sign a new moderate-duration encrypted
keypair and transport it to your online devices via eg a USB
memory stick. Restricted-scope keys would be done with special -n
arguments to ssh-keygen and then appropriate principals=
requirements in your authorized_keys on the restricted systems.
There are a bunch of tricks you could play here.)
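As a sketch of what this might look like (the principal name, validity period, and file names are all made up for illustration):

# Generate a new keypair (with a passphrase), then sign its public key
# so it is only valid for four weeks and only as the 'restricted' principal.
ssh-keygen -t ed25519 -f id_restricted
ssh-keygen -s personal_ca -I cks-restricted -n restricted -V +4w id_restricted.pub

# authorized_keys on the restricted systems then has a line like:
cert-authority,principals="restricted" ssh-ed25519 AAAA...the-CA-public-key...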
Sidebar: A CA restriction feature I wish OpenSSH had
It would make me happier with CA signing if you could set limits
on the duration of (signed) keys that you'd accept. As it stands
right now, it is only ssh-keygen with the CA key that enforces any
expiry on signed keys; if you can persuade the CA to sign with a
key-validity period of ten years, well, you've got a key that's
good for ten years unless it gets detected and revoked. It would
be better if the consumer of the signed key could say 'I will only
accept signatures with a maximum validity period of X weeks', 'I
will only accept signatures with a start time after Y', and so on.
All of these would act to somewhat limit the damage from a one-time
CA key issue, whether or not you detected it.
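In the absence of such an option, about the best a consumer can do is audit certificates by hand after the fact; ssh-keygen will at least show you the validity interval a particular certificate was actually signed with (the file name here is from the earlier illustration):

ssh-keygen -L -f id_restricted-cert.pub

The 'Valid:' line in its output is the signed validity period.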
2016-02-06
You can have many matching stanzas in your ssh_config
When I started writing my ssh_config, years and years ago, I
basically assumed that how you used it was that you had a 'Host
*' stanza that set defaults and then for each host you might have
a specific 'Host <somehost>' stanza (perhaps with some wildcards
to group several hosts together). This is the world that looks like:
Host *
  StrictHostKeyChecking no
  ForwardX11 no
  Compression yes

Host github.com
  IdentityFile /u/cks/.ssh/ids/github
And so on (maybe with a generic identity in the default stanza).
What I have only belatedly and slowly come to understand is
that stanzas in ssh_config do not have to be used in just
this limited way. Any number of stanzas can match and apply
settings, not just two of them, and you can exploit this to
do interesting things in your ssh_config, including making
up for a limitation in the pattern matching that Host supports.
As the ssh_config manpage says explicitly, the first version
of an option encountered is the one that's used. Right away this
means that you may want to have two 'Host *' stanzas, one at the
start to set options that you never, ever want overridden, and one
at the end with genuine defaults that other entries might want to
override. Of course you can have more 'Host *' stanzas than this;
for example, you could have a separate stanza for experimental
settings (partly to keep them clearly separate, and partly to make
them easy to disable by just changing the '*' to something that
won't match).
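In sketch form (with illustrative options rather than my actual configuration), that looks like:

# Things I never want overridden:
Host *
  HashKnownHosts no

[host specific stanzas go here]

# Genuine defaults that earlier stanzas can override:
Host *
  ForwardX11 no
  Compression yes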
Another use of multiple stanzas is to make up for an annoying
limitation of the ssh_config pattern matching. Here's where
I present the setup first and explain it later:
Host *.cs *.cs.toronto.edu
  [some options]

Host * !*.*
  [the same options]
Here what I really want is a single Host stanza that applies to
'a hostname with no dots or one in the following (sub)domains'.
Unfortunately the current pattern language has no way of expressing
this directly, so instead I've split it into two stanzas. I have
to repeat the options I'm setting, but this is tolerable if I care
enough.
(At this point one might suggest that CanonicalizeHostname could
be the solution instead. For reasons beyond the scope of this entry
I prefer for ssh to leave this to my system's resolver.)
There are undoubtedly other things one can do with multiple Host
entries (or multiple Match entries) once you free yourself from
the shackles of thinking of them only as default settings plus host
specific settings. I know I have to go through my .ssh/config
and the ssh_config manpage with an eye to what I can do here.