2016-04-30
I should keep and check notes even on my own little problems
I mentioned yesterday that I had a serious issue when I installed a VMWare Workstation update, going from 12.1.0 to 12.1.1. I wound up being very grumpy about it, it disrupted the rest of my work day, I filed a support request with VMWare, and so on, and eventually VMWare support came through with the cause and a workaround.
It turns out that I could have avoided all of that, because I ran into this same problem back when I upgraded to Fedora 23. At the time I did my Internet research, found the workaround, and applied it to my machine. This means that I could have proceeded straight to re-doing the workaround if I'd remembered this. Or, more likely, if I'd kept good notes on the problem then and remembered to read them this time.
We try to make and keep good notes for problems on our production systems, or even for issues we run into when testing things for our work environment; we have an entire system for it. But I generally don't bother doing the same sort of thing for my own office workstation; when I find and fix problems and issues I may take some notes, but they're generally sketchy, off the cuff, and not centrally organized. And partly because of this, I often don't think to check them; I think I just assume I'm going to remember things about my own workstation (clearly this is wrong).
So, stating the obvious: I would be better off if I kept organized notes about what I had to do to fix problems and get various things going on my workstation, and put the notes into one place in some format (perhaps a directory with text files). Then I could make it a habit to look there before I do some things, or at least when I run into a problem after I do something.
Also, when I make these notes I should make them detailed, including dates and versions of what they're about. It turns out that I actually had some very sketchy notes about this problem from when I upgraded to Fedora 23 (they were some URLs that turned out to be discussions about the issue), but they didn't have a date or say 'this applied when I upgraded to Fedora 23 with VMWare 12' or anything like that. So when I stumbled over the file and skimmed it, I didn't realize that the URLs were still relevant; I skipped that because I assumed that of course it had to be outdated.
(I'm sure that when I wrote the note file in the first place I assumed that I'd always remember the context. Ha ha, silly me, I really should know better by now. Especially since I've written more than one entry here about making just that assumption and being wrong about it.)
2016-04-20
How to get Unbound to selectively add or override DNS records
Suppose, not entirely hypothetically, that you're using Unbound and you have a situation where you want to shim some local information into the normal DNS data (either adding records that don't exist naturally or overriding some that do). You don't want to totally overwrite a zone, just add some things. The good news is that Unbound can actually do this, and in a relatively straightforward way (unlike, say, BIND, where if this is possible at all it's not obvious).
You basically have two options, depending on what you want to do with the names you're overriding. I'll illustrate both of these:
local-zone: example.org typetransparent
local-data: "server.example.org A 8.8.8.8"
Here we have added or overridden an A record for server.example.org.
Any other DNS records for server.example.org will be returned
as-is, such as MX records.
local-zone: example.com transparent
local-data: "server.example.com A 9.9.9.9"
We've supplied our own A record for server.example.com, but we've
also effectively deleted all other DNS records for it. If it has
an MX record or a TXT record or what have you, those records will
not be visible. For any name you give local-data for in a
transparent zone, you are in complete control of all the records
returned; either they're in your local-data stanzas, or they don't
exist.
Note that if you just give local-data for something without a
local-zone directive, Unbound silently makes it into such a
transparent local zone.
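For context, all of these directives live in the server: section of unbound.conf; as a minimal sketch (using the same made-up names and addresses as above), the first example would sit in your configuration like this:

server:
    # shim our own A record over the real DNS data, leaving
    # other record types for server.example.org alone
    local-zone: example.org typetransparent
    local-data: "server.example.org A 8.8.8.8"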
Transparent local zones have one gotcha, which I will now illustrate:
local-zone: example.net transparent
local-data: "example.net A 7.7.7.7"
Because this is a transparent zone and we haven't listed any NS
records for example.net as part of our local data, people will
not be able to look up any names inside the zone even though we
don't explicitly block or override them. Of course if we did list
some additional names inside example.net as local-data, people would
be able to look them up (and only them). This can be a bit puzzling
until you work out what's going on.
(Since transparent local zones are the default, note that this
happens if you leave out the local-zone or get the name wrong by
mistake or accident.)
As far as I know, there's no way to use a typetransparent zone but still delete certain record types for some names, which you'd want so you could do things like remove all MX entries for some host names. However, Unbound's idea of 'zones' doesn't have to map to actual DNS zones, so you can do this:
local-zone: example.org typetransparent
local-data: "server.example.org A 8.8.8.8"
# but:
local-zone: www.example.org transparent
local-data: "www.example.org A 8.8.8.8"
By claiming www.example.org as a separate transparent local zone,
we can delete all records for it except the A record that we
supply; this would remove, say, MX entries. Having just tried this
out, I'll note that a transparent local zone with no data naturally
doesn't blank out anything, so if you want to totally delete a
name's records you need to supply some dummy record (eg a TXT
record).
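A quick way to see the effect of all this is to ask your Unbound instance directly for various record types with dig; here I'm assuming it's listening on 127.0.0.1 and using the made-up names from above:

dig @127.0.0.1 www.example.org A       # our supplied 8.8.8.8 answer
dig @127.0.0.1 www.example.org MX      # nothing; blanked by the transparent zone
dig @127.0.0.1 server.example.org MX   # the real MX records, via typetransparent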
(It turns out that we don't need to do this right now, but since I worked out how to do it I want to write it down before I forget.)
2016-04-15
Unbound illustrates the Unix manpage mistake with its ratelimits documentation
Our departmental recursive nameservers are based on OpenBSD, which has recently switched from BIND to Unbound and NSD. As a result, we've been in the process of setting up a working Unbound configuration, and along the way we ran into an interesting issue.
A relatively current unbound.conf manpage has this to say about
(some) ratelimiting options (I'm excerpting here):
ratelimit: <number or 0>
    Enable ratelimiting of queries sent to the nameserver for performing recursion. If 0, the default, it is disabled. [...] For example, 1000 may be a suitable value to stop the server from being overloaded with random names, and keeps unbound from sending traffic to the nameservers for those zones.

ratelimit-for-domain: <domain> <number qps>
    Override the global ratelimit for an exact match domain name with the listed number. [...]
So you set up an Unbound configuration that contains the following:
# apparent good practice
ratelimit: 1000
# but let's exempt our own zones from it,
# just in case.
ratelimit-for-domain: utoronto.ca 0
Congratulations, on at least the OpenBSD version of Unbound you
have just blown your own foot off; you'll probably be unable to
resolve anything in utoronto.ca. If you watch the logs sufficiently
carefully, you can eventually spot a little mention that your query
for, say, the A record of www.utoronto.ca has been ratelimited.
(If you're writing a moderately complicated Unbound configuration for the first time, it may take you some time to reach this point instead of suspecting that you have screwed something up in other bits of the configuration.)
What has happened is that you have not read the manpage with the
necessary closeness for a true
Unix manpage. You see, the manpage does not come out and actually
say that ratelimit-for-domain treats a ratelimit of 0 as unlimited.
It just looks like it should, because ratelimit-for-domain is a
more specialized version of plain ratelimit so you'd certainly
assume that they treat their number argument in the same way. And
of course that would be the sensible thing to do so you can do just
what we're trying to do here.
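If what you actually want is to exempt your own zones, I believe the workaround is to give them a limit so high that it will never trigger, instead of 0. This is a sketch under the assumption that ratelimit-for-domain simply treats its number as the allowed queries per second:

ratelimit: 1000
# effectively unlimited for our own zone, instead of 'no queries at all'
ratelimit-for-domain: utoronto.ca 100000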
This may or may not be a bug, either in Unbound itself or
in the unbound.conf manpage. Unix's minimalistic, legalistic
'close reading' history of both reading and writing manpages makes
it impossible to tell, because this could be both intended and
properly documented.
(In my opinion it is not well documented if it is intended, but that is a different argument. Classical style Unix manpages take specification-level terseness far too far for my tastes, partly for historical reasons. However this is not a winning argument to make with someone who likes this extreme terseness and 'pay attention to every word, both present and absent' approach; they will just tell you to read more carefully.)
2016-04-12
There's a spectrum of 'pets versus cattle' in servers
One of the memes in modern operations is that of pets versus cattle. I've written about this before, but at the time I accepted the more or less binary pet versus cattle split that's usually put forward. I've now shifted to feeling that there is a spectrum along the line between pets and cattle, so today I'm going to write down four spots I see on that line.
Total pets (classical pets) are artisanal servers, each one created and maintained completely by hand. You're lucky if there's any real documentation on what a machine's setup is supposed to be; there probably isn't. Losing a server probably means restoring configuration files from backups in order to get it back into service. This is the traditional level that a small or disorganized place operates at (or at least is stereotyped to operate at).
At one step along the line you have a central, global store of all configuration information and build instructions; for instance, you have the master copy of all changed configuration files in that central place, and a rule that you always modify the master version and copy it to a server. However, you build and maintain machines almost entirely by hand (although following your build documents and so on). You can recreate servers easily but they are still maintained mostly by hand, you troubleshoot them instead of reinstalling, and users will definitely notice if one suddenly vanishes. Possibly they have local state that has to be backed up and restored.
(This is how we build and maintain machines.)
Moving one more step towards cattle is when you have fully automated configuration management and more or less fully automated builds, but you still care about specific servers. You need to keep server <X> up, diagnose it when it has problems, and so on; you cannot simply deal with problems by 'terminate it and spin up another', and people will definitely notice if a given server goes down. One sign of this is that your servers have names and people know them.
Total cattle is achieved when essentially all servers can be 'fixed' by simply terminating them and spinning up another copy made from scratch, and your users won't notice this. Terminate and restart is your default troubleshooting method and you may even make servers immutable once spun up (so maintaining a server is actually 'terminate this instance and spin up an updated instance'). Certainly maintenance is automated. You never touch individual servers except in truly exceptional situations.
(Total cattle is kind of an exaggeration. Even very cattle-ish places seem to accept that there are situations where you want to troubleshoot weird problems instead of trying to assume that 'terminate and restart' can be used to fix everything.)
2016-04-04
The three types of challenges that Let's Encrypt currently supports
I've recently been working to understand Let's Encrypt a bit better, and in particular to understand the different sorts of challenges (ie, ways of proving that you control a hostname) that they currently support.
(In general, Alex Peattie's A guide to creating a LetsEncrypt client from scratch has a great overview of the overall flow of the challenge process.)
Right now, there are three challenges: 'HTTP', 'DNS', and what is called 'TLS-SNI'.
- in the DNS challenge, you add a specific TXT record with specific
contents (specified by Let's Encrypt) to your DNS zone, proving
control over the DNS for the host that you want a certificate for.
One of the drawbacks of this challenge is that DNS information
doesn't necessarily update immediately, so it may take some time
before LE sees your new DNS TXT record and can issue your
certificate. (There's a sketch of what the record looks like
after this list.)
- in the 'TLS-SNI' challenge, you run a server on a specific port
  (currently only port 443, HTTPS) that has a self-signed certificate
  with a specific set of Server Name Indication (SNI) names; you can
  read the gory details in the ACME draft.
The drawback of TLS-SNI is that it is difficult to handle the challenge through an existing HTTPS server (although not impossible). You'd basically need to configure a new virtual host for the special self-signed certificate you need to use for the TLS-SNI challenge, and it has a special magic name.
As far as I know, the TLS-SNI challenge doesn't require you to be running a real web server on port 443. The challenge simply requires a TLS server that will return the special certificate.
- in the 'HTTP' challenge (the most well-known one), you put a magic
file in /.well-known/acme-challenge/ on your web server and LE
fetches it. At the moment the initial request from LE is always
made via HTTP, but LE will accept a redirection to the HTTPS
version of the URL so you can still do a universal HTTP to HTTPS
redirection (I don't know if it verifies the HTTPS certificate
in this case, but you should probably assume that it does).
The HTTP challenge is the easiest challenge to satisfy without disrupting existing services, since you just need to serve a file at a new URL. General purpose web servers can usually do this without any reconfiguration at all; you just need a couple of new directories in the filesystem.
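To make the DNS challenge from the start of this list concrete: what you publish is a TXT record at a '_acme-challenge' name under the host you're validating. The record value is a string that your ACME client computes from the challenge token; the one in this sketch is entirely made up:

_acme-challenge.www.example.com.  300  IN  TXT  "dGhpcy1pcy1hLW1hZGUtdXAtdmFsdWU"

The HTTP challenge is even simpler in filesystem terms. Assuming a conventional document root of /var/www/html (a made-up path here), all the web server side needs is:

mkdir -p /var/www/html/.well-known/acme-challenge
# your ACME client then drops the challenge token file in there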
Under normal circumstances, only the DNS challenge can easily be done from a different machine than the one you're getting the TLS certificate for; HTTP and TLS-SNI both want to talk to the actual machine itself. However, if you're willing to play evil firewall games with port redirections, it's probably possible to satisfy HTTP and TLS-SNI challenges from another machine. You simply redirect inbound port 80 and/or port 443 traffic for the normal machine off to your challenge-handling machine, and then make sure you're running a LE client that is willing to cooperate with this.
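As a minimal sketch of what I mean by firewall games, assuming a Linux-based firewall in front of the web server, a challenge-handling machine at 192.0.2.10, and the normal machine at 198.51.100.5 (both addresses made up), the core of it is a DNAT rule along these lines, with forwarding and return routing already taken care of:

# divert incoming HTTP for the normal machine to the machine
# that actually answers Let's Encrypt challenges
iptables -t nat -A PREROUTING -p tcp -d 198.51.100.5 --dport 80 -j DNAT --to-destination 192.0.2.10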
I suspect that client support for HTTP challenges is higher than support for TLS-SNI. Thus if you only want to allow one port through your firewall to your target machine, the easiest one is probably HTTP. Note that there is no need to run a general purpose HTTP server on your machine to handle LE challenges; there are any number of clients that just run their own HTTP server for the duration of the challenge, including the official LE client.
In theory the TLS-SNI challenge could be extended to work against other TLS ports, such as IMAPS, POP3S, or SMTPS; this might allow your existing IMAP server or MTA to handle a LE challenge without needing additional ports opened or additional services running (some of the time) on the machine. In practice I suspect that this is not a Let's Encrypt priority and is unlikely to happen any time soon.
(It would also require SNI support in your IMAP or SMTP server, and having them reconfigured more or less on the fly to serve the magic TLS-SNI certificate to people who ask for the right server name.)
2016-04-03
Let's Encrypt certificates can be used for more than HTTPS
The Let's Encrypt website basically only talks about using its certificates for (HTTPS) websites, and their FAQ is a little bit silent on this. So let me say it out loud:
Let's Encrypt certificates can be used for pretty much any TLS service, not just HTTPS websites.
In particular, you can absolutely use Let's Encrypt certificates for IMAP servers and MTAs (for SMTP). The LE documentation won't tell you how to set this up, the official client doesn't have any support for it as far as I know, and the LE 'prove that you control this host' challenge process doesn't have any provisions for doing it through IMAP or SMTP servers, but it can certainly be done. And if you already have a certificate issued to a host for HTTPS, you can also use that certificate for your IMAP server, your SMTP server, and so on.
Based on my brief experience, the thing that may give you the most annoyance is wrangling certificate chain issues. Web browsers are used to filling in the blanks on their own, and web servers are generally willing to accept just about any old set of certificates as your certificate chain. Other server software can be much pickier (such as insisting on only the necessary certificates, and in the correct order), and things like IMAP clients may be less willing to fetch intermediate certificates on their own. Complicating this is that LE has multiple certificate chains (or at least they used to; right now you may just use their X3 intermediate certificate).
(I didn't take notes the last time I had to do this, so I don't have any specific directions for things like Dovecot or Exim.)
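To give a sense of the general shape anyway (this is a sketch based on Dovecot's documented ssl_cert and ssl_key settings and the official client's default /etc/letsencrypt/live layout, not a configuration we actually run), pointing Dovecot at a LE certificate looks something like:

# in Dovecot's SSL configuration (eg conf.d/10-ssl.conf)
ssl = required
ssl_cert = </etc/letsencrypt/live/example.com/fullchain.pem
ssl_key = </etc/letsencrypt/live/example.com/privkey.pem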
Of course, just as with web servers you'll need to arrange to handle the relatively rapid LE certificate rollovers. Some servers are nice enough to automatically notice new certificates and just start using them; others will require restarting or signalling, which you'll need to connect up to whatever system you're using for this (I have my own opinions here). If you're counting on the official client's magical handling of this for some web servers, well, now you get to do some work.
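What this connects up to will vary, but as a sketch, the post-renewal step can be as simple as a script along these lines (the service names are assumptions, and this presumes systemd-managed daemons that re-read their certificates on reload):

#!/bin/sh
# run after a successful certificate renewal
systemctl reload dovecot
systemctl reload postfix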
(In time I'm sure that third party clients will start supporting various non-HTTPS servers, both generating the certificate setups they require and knowing how to restart them. I suppose the support may even appear in the official client.)