2013-04-29
My practical problem with preconfigured virtual machine templates
In comments on my entry on some VM mistakes I made, people suggested setting up template VM images that I would clone or copy to create live images. With that approach, every time I wanted a new VM I'd at least have a chance to think about its settings, and if cloning was easy enough I'd avoid the temptation to reuse an existing VM for some theoretically quick (and not very important) test.
As it happens I've sort of started to toy with this idea, but I think there's a practical roadblock in our environment: OS package updates. Most of the actual machines that I deal with (and thus most of the test images I build) are not frozen at a point in time but instead are continuously kept up to date with Ubuntu's package updates. If I have base starter images, I need to either keep them up to date or immediately apply all of the pending updates after I clone a base image to make a new working VM. Neither of those options seems entirely attractive, although I should probably give it a real try just to see.
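If I did give this a try with libvirt-managed VMs, the 'clone and then immediately update' option would only be a few commands; here is a rough sketch of it in script form. The template and clone names here are made up, and it assumes that the clone ends up reachable over ssh under its libvirt name, which may well not be true in your environment.

    #!/usr/bin/env python3
    # Rough sketch: clone a base template VM and immediately bring its
    # packages up to date.  Assumes a libvirt/KVM setup with virt-clone
    # and virsh installed, and a clone that boots up reachable over ssh
    # with key-based root access -- all assumptions, not a given.
    import subprocess

    BASE = "ubuntu-base-template"   # hypothetical template VM name
    NEW = "scratch-test-1"          # hypothetical name for the new working VM

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Copy the template's definition and disk image under the new name.
    run("virt-clone", "--original", BASE, "--name", NEW, "--auto-clone")

    # Boot the clone.
    run("virsh", "start", NEW)

    # Once it's up (waiting for ssh to come up is elided here), apply all
    # pending Ubuntu package updates so it matches a real, continuously
    # updated machine.
    run("ssh", "root@" + NEW, "apt-get update && apt-get -y dist-upgrade")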
(There is also the subtle issue that cloning a preconfigured base image and then updating the packages is not quite the same thing as a from-scratch rebuild. If I want to be absolutely sure that a machine can be rebuilt from scratch, I'm not sure I trust anything short of a from-scratch build. But I could probably save a bunch of time by doing the preliminary build testing with cloned images and only doing a from-scratch reinstall in the final test run.)
PS: every so often we make a meaningful change to the base install scripts and system; such changes would force me to rebuild all of the preconfigured images (well, strongly push me towards doing so). But I suppose those are relatively rare changes so I'm kind of making excuses to not try this.
(Which argues that I should try it, if only to understand what I really don't like about the idea instead of what I tell myself my concerns are. Yes, I'm sort of using my blog to talk out loud to myself.)
Sidebar: if you're working on VMs, give yourself more disk space
One of the smartest things I did recently to encourage this sort of playing around with VMs was to throw a third drive into my office workstation (to go along with the main mirrored system disks). Honestly, I should have done this ages ago; having to worry about how much disk space I had to spare for VMs is for the birds when 500 GB SATA drives are basically popcorn.
(The drive is unmirrored and un-backed-up, but if it dies I'll just lose expendable VM images and related bits and pieces. I keep the important VMs on my mirrored drives.)
2013-04-24
Two mistakes I made with VMs today
For reasons that kind of boil down to 'laziness', I only rarely delete or create VMs in my use of virtualization. Instead I mostly recycle or re-purpose already created VMs by reinstalling OSes on them, or sometimes not even reinstalling but just slapping some additional packages on to the existing VM image. When I'm in much doubt about the state of a VM or need it to be in a different state, I reinstall. Usually this works well, but today I discovered that it had left me with two accidents.
The smaller discovery was that both of my primary VMs (currently both being used for some testing) had lingering disk snapshots from months ago when each of them was being used for very different things. At a minimum this was taking up extra disk space. It may also have been slowing down disk IO due to copy-on-write issues, although after months of churn and OS reinstalls that may have wound up being a non-issue.
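For what it's worth, spotting this sort of lingering snapshot is easy to script. Here's a sketch that walks every defined VM with virsh and reports any snapshots it finds; it assumes a libvirt/KVM setup (adjust for whatever virtualization you actually use).

    #!/usr/bin/env python3
    # Sketch: report VMs that still have old disk snapshots hanging around.
    # Assumes libvirt/KVM with the 'virsh' command available.
    import subprocess

    def virsh_lines(*args):
        out = subprocess.run(("virsh",) + args, check=True,
                             capture_output=True, text=True).stdout
        return [line.strip() for line in out.splitlines() if line.strip()]

    for domain in virsh_lines("list", "--all", "--name"):
        snapshots = virsh_lines("snapshot-list", domain, "--name")
        if snapshots:
            print("%s: %d snapshot(s): %s"
                  % (domain, len(snapshots), ", ".join(snapshots)))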
The larger discovery was, well, let me put it this way: ZFS on Linux turns out not to be very happy when you try to run it on a VM with a 32-bit kernel and only 512 MB of RAM, especially if you're also using multipathed iSCSI. In retrospect I'm kind of impressed that the ZFS code didn't detonate on contact with that environment (although it did start producing kernel panics when I put it under enough load). Such is the drawback of repurposing existing VMs without paying much attention to their configuration.
(You might wonder how I could possibly get into that situation. The short answer is that it all started when I was doing some testing of low-memory web server setups and reused the same VM for a quick 'does it actually run' test of ZFS on Linux. Then later I came back to do more ZFS testing without actively noticing the VM's configuration or thinking about it.)
I don't have any clever ways of avoiding this in the future; it's just something that I'll have to keep an eye out for every so often, especially if I (temporarily) configure a VM into an unusual state (such as having low memory).
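The closest thing to a clever way is probably a periodic audit script that nags me about unusual VM configurations. As a sketch, something like this (again assuming libvirt and virsh, with an arbitrary memory threshold) would at least have flagged the 512 MB VM before I piled ZFS on top of it:

    #!/usr/bin/env python3
    # Sketch: nag about VMs configured with unusually little memory, so a
    # repurposed low-memory VM doesn't surprise me later.  Assumes libvirt
    # and 'virsh'; the threshold is an arbitrary illustrative number.
    import subprocess

    THRESHOLD_KIB = 1024 * 1024      # complain below 1 GB

    def virsh_output(*args):
        return subprocess.run(("virsh",) + args, check=True,
                              capture_output=True, text=True).stdout

    for domain in virsh_output("list", "--all", "--name").split():
        info = dict(line.split(":", 1)
                    for line in virsh_output("dominfo", domain).splitlines()
                    if ":" in line)
        # 'Max memory' is reported in KiB, e.g. 'Max memory:  524288 KiB'.
        kib = int(info["Max memory"].split()[0])
        if kib < THRESHOLD_KIB:
            print("%s: only %d MB of RAM configured" % (domain, kib // 1024))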
2013-04-23
Goodbye, djb dnscache
I've been using djb's dnscache for what is now a very long time; the dates on some old scripts suggest that I started using it on both my home and office machines no later than some time in the summer of 2004. At the time I switched to using it as my local recursive DNS server, it was for the same reason that I imagine any number of other people had: to put it simply, I was tired of BIND being a pig. Dnscache promised (and delivered) much lower and more efficient memory use, which very much mattered on the machines that I had in 2004.
This weekend, I turned dnscache off on my home machine (it's been off on my office machine for some time). There wasn't any particular immediate reason to do so: no specific thing I cared about that dnscache was failing at, no unpatched security hole (that I know of), nothing like that. My direct reason for making the switch was that I've been worried for some time about how dnscache was going to deal with the growing new worlds of IPv6 and DNSSEC, or more accurately I was pretty sure that it wasn't going to deal with them very well.
But the larger reason is that djb's software is effectively dead software, dnscache included. Perhaps there are some people hacking on it somewhere, but the canonical source (djb himself) has walked away from it. As I wrote about qmail, the reality is that software on the Internet rots if not actively maintained because the Internet itself keeps changing. It was clear to me that I could either wait quietly until dnscache blew up in some obvious way or I could change over to something else. The something else might not be as pure or as minimal as dnscache but it wouldn't be quietly rotting, and some of the minimal purity of dnscache no longer matters on today's machines.
On my office machine I made the switch in late 2010 (judging from the last timestamps on dnscache's query logs and, now that I look, this old entry). I dragged my feet on my home machine for various reasons, partly laziness, but finally decided that it was time this weekend. There's a part of me that regrets this because it likes the purity and minimalism of dnscache, but the greater part of me knows that this is the sensible course. Still, I'll miss dnscache a bit. And it certainly served faithfully for all of these years.
(For those that are curious, I switched to Unbound, as suggested in that old entry.)
PS: I'm still running djb's tinydns for some primary DNS serving, but I suppose I should look into a replacement. It's just that I've hated all of the primary DNS servers I've ever looked at even more than I hate the various recursive caching nameservers. And there are also the security issues. My recursive nameservers are not exposed to the Internet; my primary DNS servers necessarily are.
2013-04-21
RCS should not be your first choice for version control
A while back I wrote that sysadmins should version control everything and kind of advocated using RCS for it. Based on one link to that entry, I think I may have confused people about the merits of RCS, so let me clarify a bit.
RCS is a very old and rather primitive single-file version control system. Today it only has two real virtues: it works on single files and it's trivial to start using it on text files. It's not your best choice for a version control system in most circumstances, merely the easiest to start using so that you have some version control. And you want to have version control. If you're working on whole directories (and you should be if possible), any modern VCS is both better and more convenient than RCS. I'm partial to Mercurial for this but anything that you and your coworkers like will work fine. If you're already using git for other things and thus know it well, go with git.
(There are some situations where RCS may be a better choice. If you run into them you'll know it.)
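For comparison, the 'easiest first step beyond RCS' is not much work either. Here's a sketch of it in script form, with a made-up directory of configuration files and Mercurial as the VCS; substitute git (or whatever) if that's what you already know.

    #!/usr/bin/env python3
    # Sketch: put an existing directory of config files under Mercurial.
    # The directory and the committer name are placeholders.
    import subprocess

    CONFDIR = "/etc/myapp"          # hypothetical directory of config files

    def hg(*args):
        subprocess.run(("hg",) + args, cwd=CONFDIR, check=True)

    hg("init")                      # one-time setup
    hg("add")                       # track every file currently there
    # Mercurial wants a committer identity; configure ~/.hgrc or pass -u.
    hg("commit", "-u", "sysadmin", "-m", "initial import of existing configs")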
Similarly, if you have a choice between using RCS on your configuration files and setting up good automation using Chef, Puppet, CFEngine, or whatever, you should set up automation. Full-scale automation will get you much further than simply version controlling your configuration files in place on all of your different systems. Again, using RCS is the basic first step; you should start using it because you can do so right now and RCS is better than nothing. But once you've taken the first step you should keep moving towards an even better environment, not stop.
(I'm simplifying the issues surrounding full-scale automation a lot, but again this is basic advice. Most people are better off automating and if you're going to automate, most of the time you're better off using a standard framework for it.)
2013-04-05
Authoritative, non-recursive DNS servers now need ratelimiting
Pretty much all of the coverage about the recent DNS amplification DDoS attacks, including advice to sysadmins, has been about open recursive DNS servers and how they are bad. I followed the issue enough to check that none of our subnets appeared in the recently-available databases of open recursive DNS servers and otherwise ignored it.
It turns out that this is not good enough, because authoritative DNS servers can be used for DNS amplification DDoS attacks too. Attackers prefer open recursive DNS servers because it's easy to use them to create big replies, but many DNS zones have enough records for various things that an authoritative DNS server can be coaxed into giving relatively big replies (on the order of 500 bytes and up) to small query packets. This is not theoretical. My awakening about this came about because someone appears to have done a test run against our authoritative DNS server, achieving around a 20x amplification (partly through a very small query packet); the resulting temporary traffic spike was picked up by university-wide network monitoring and then reported to us.
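If you're curious about how much amplification your own authoritative server hands out, it's easy to measure by comparing the size of a small UDP query with the size of the answer. Here's a rough sketch using the third-party dnspython module; the server address and zone name are placeholders for your own.

    #!/usr/bin/env python3
    # Rough sketch: measure the amplification factor (response size over
    # query size) that a single UDP query gets from an authoritative
    # server.  Uses the third-party dnspython module; the server IP and
    # zone name below are placeholders.
    import dns.message
    import dns.query

    SERVER = "192.0.2.1"       # placeholder: your authoritative server
    QNAME = "example.com"      # placeholder: a zone it is authoritative for
    RDTYPE = "ANY"             # 'ANY' usually produces the largest answer

    query = dns.message.make_query(QNAME, RDTYPE)
    # Attackers advertise a large EDNS buffer so the reply isn't truncated.
    query.use_edns(0, payload=4096)

    response = dns.query.udp(query, SERVER, timeout=5)

    qlen = len(query.to_wire())
    rlen = len(response.to_wire())
    print("query %d bytes, response %d bytes, amplification %.1fx"
          % (qlen, rlen, rlen / qlen))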
Open recursive DNS servers are generally easy to fix; you close them, because very few of them are intended to be used by the world. Authoritative DNS servers can't be fixed this way because their entire purpose is answering queries from the world about your zones. Instead our only option is to implement some sort of query ratelimiting. Unfortunately this is likely to become essential (especially for people handling big, complex zones) because DDoSes seem unlikely to go away any time soon.
(While people are working on adding ratelimiting to various DNS servers, it isn't yet generally available or ready anywhere. That leaves you with some form of firewall-based rate limiting as the only option, if your firewall supports it. It's only important to ratelimit UDP DNS queries because those are the only ones useful for DNS amplification attacks.)
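As a concrete illustration of the firewall approach, on a Linux machine a per-source limit on incoming UDP DNS queries can be set up with iptables' hashlimit match. Here's a sketch that applies one such rule; the rate and burst numbers are placeholders that you'd replace with values based on your own measurements.

    #!/usr/bin/env python3
    # Sketch: rate limit incoming UDP DNS queries per source IP with
    # iptables' hashlimit match.  Assumes a Linux host with iptables and
    # the hashlimit module; the numbers are placeholders, not advice.
    import subprocess

    RATE = "20/second"     # placeholder per-source query rate
    BURST = "50"           # placeholder burst allowance

    rule = [
        "iptables", "-A", "INPUT",
        "-p", "udp", "--dport", "53",
        "-m", "hashlimit",
        "--hashlimit-above", RATE,
        "--hashlimit-burst", BURST,
        "--hashlimit-mode", "srcip",   # limit each source IP separately
        "--hashlimit-name", "dns-udp",
        "-j", "DROP",
    ]
    subprocess.run(rule, check=True)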
It's worth noting that one of the drawbacks of ratelimiting is that, well, you have to figure out what rate to limit things at (and also what time interval to do it over). There is no generic answer (disbelieve anyone who offers you one) so your only real choice is to either measure ahead of time or experiment to see what blocks trigger when.
(You can try bandwidth limits instead of or in addition to query limits. Again you'll want to measure actual normal and peak DNS bandwidth usage.)
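One low-tech way to get those measurements is to simply watch port 53 for a while and count queries per second. Here's a sketch of that; it assumes tcpdump is available and that 'eth0' is the right interface, neither of which may be true for you.

    #!/usr/bin/env python3
    # Sketch: count incoming UDP DNS queries per second from live tcpdump
    # output, to get a feel for normal and peak rates before picking a
    # rate limit.  The interface name is a placeholder.  Run it for a
    # while and interrupt it with Ctrl-C when you're satisfied.
    import collections
    import subprocess

    cmd = ["tcpdump", "-l", "-n", "-tt", "-i", "eth0", "udp dst port 53"]
    counts = collections.Counter()

    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        try:
            for line in proc.stdout:
                # With -tt the first field is an epoch timestamp such as
                # 1366500000.123456; bucket queries by whole second.
                try:
                    counts[int(float(line.split()[0]))] += 1
                except (IndexError, ValueError):
                    continue
        except KeyboardInterrupt:
            pass

    if counts:
        print("seconds observed:", len(counts))
        print("peak queries/second:", max(counts.values()))
        print("average queries/second: %.1f"
              % (sum(counts.values()) / len(counts)))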