2016-02-22
We've permanently disabled overlayfs on our servers
Oh look, yet another Linux kernel local exploit in the overlayfs module. Time to permanently blacklist it on all of our machines.
Today's bugs are CVE-2016-1576 and CVE-2016-1575. There have been others before, and probably more that my casual Internet searches aren't turning up.
Based on my experiences so far, the two most common ingredients in exploitable kernel security issues we've been seeing Ubuntu announcements for are overlayfs and user namespaces. As far as I know, we can't do anything to turn off user namespaces without rebuilding and maintaining our own kernel packages, but overlayfs is (just) a loadable kernel module. A kernel module that we don't use.
So now we have an /etc/modprobe.d/cslab-overlayfs.conf file on all of our servers that says:
    # Permanently stop overlayfs from being loaded
    # because it keeps having security issues and
    # we don't use it.
    blacklist overlayfs
    install overlayfs /bin/false
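(The blacklist line only stops the module from being autoloaded via its aliases; the install line goes further and makes modprobe run /bin/false instead of ever loading the module, so even explicit loads fail. A quick check on any machine:)

    # modprobe now runs /bin/false in place of loading overlayfs,
    # so this should fail with a non-zero exit status
    modprobe overlayfs; echo $?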
Pretty soon this will be in our install framework, which means that future machines will probably be like this for several Ubuntu LTS versions to come. I feel some vague regret, but not very much. I'm done putting up with the whole 'surely we'll get this right someday' approach to making these subsystems not create security issues.
By the way, I don't find issues in either subsystem to be particularly surprising given what they do. User namespaces especially are a recipe for trouble in practice, because they let you create environments that break long standing Unix security assumptions. Sure, they are supposed to only do this in a way that is still secure, but in practice, no, things keep slipping through the cracks. In a sane world it would be possible to disable user namespaces at runtime on distribution kernels. Sadly we're not in that world.
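(About the best you can do at runtime on a stock Ubuntu kernel, as far as I know, is confirm that user namespaces are compiled in at all, assuming the usual /boot/config-* file that Ubuntu ships:)

    # check whether the running kernel was built with user namespaces
    grep CONFIG_USER_NS /boot/config-$(uname -r)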
2016-02-21
Why the Ubuntu package update process makes me irritated
We have fifty or so Ubuntu machines in our fleet, which means that when Ubuntu comes out with package updates we have to update them all. And this is rather a problem. First, obviously, fifty machines is too many to do updates on by hand; we need to automate this. Unfortunately Ubuntu (or Debian) has made a series of decisions about how packages and package updates work that cause significant amounts of pain here.
(Although Debian and Ubuntu share the same package format, I don't have enough experience with Debian to speak for it here. I wouldn't be surprised if much of this applied there, but I don't know for sure.)
To start with, Ubuntu package updates are slow. Well, okay, everyone's package updates are slow, but when you have fifty-odd machines the slowness means that doing them one by one can take a significant time. Even a modest package update can take an hour or so, which means that you'd really like to do them in parallel in some way.
(This does not bite too hard for normal updates, because you can do those once a day in the morning while you're getting coffee, reading email, and so on, and thus sort of don't care if they take a while. This really bites for urgent security updates that you want to apply right when you read Ubuntu's announcement emails, especially if those emails show up just before you're about to leave for the day.)
Next, Ubuntu package updates via apt-get are very verbose, and worse, they're verbose in unpredictable ways, as packages feel free to natter about all sorts of things as they update. It may be nice to know that your package update is making progress, but without any regularity in the output it's extremely hard for a program to look through it and find problem indicators (and people have trouble too). Other systems are much better here, because their normal output is very regular and you can immediately pick out deviations from it (either in a program or by eye). If you're going to automate a package update process in any way, you'd really like to be able to spot and report problems.
(Also, if you're updating a bunch of machines, it's a great thing to be able to notice problems on one machine before you go on to update another machine and have the same problems.)
In addition, Ubuntu packages are allowed to ask questions during package updates, which ensures that some of them do (sometimes for trivia). A package update that pauses to ask you questions is not a package update that can be automated in a hands-off way. In fact it's even hard to automate in any way (at least well), because you need to build some way for a human to break into the process to answer those questions. Yes, in theory I think you can turn off these questions and force the package to take its default answer, but in practice I've felt for years that this is dangerous, because nothing forces the default to be the right choice, or even sensible.
(I also expect that the verbosity of package updates would make it hard to spot times where a package wanted to ask a question but you forced it to take its default. These are cases where you really want to spot and review the decision, in case the package screwed up.)
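For the record, I believe the standard incantation for forcing default answers is debconf's noninteractive frontend plus a couple of dpkg options for configuration files; a sketch:

    # DEBIAN_FRONTEND makes debconf take its default answers; the
    # dpkg options keep existing config files instead of prompting
    # (falling back to the package default where there's no old file)
    DEBIAN_FRONTEND=noninteractive apt-get -qy \
        -o Dpkg::Options::="--force-confdef" \
        -o Dpkg::Options::="--force-confold" \
        upgrade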
The ideal hands-off update process will never ask you questions and can be made to report only actual problems and what packages were updated. This makes it easy to do automatically and in parallel; you fire off a bunch of update processes on a bunch of machines, collect the output, and then can easily report both problems and actual packages updated. If you wanted to, you could use only limited parallelism, scan updates for problems, and freeze the whole process if any turn up. The only way that the Ubuntu package update process resembles this is that it's relatively easy to parse the output to determine what packages were updated.
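As a concrete sketch of what I mean (the hosts file is a hypothetical list of machines, and the final grep is exactly the unreliable output-scanning this entry is complaining about):

    # update every machine in parallel, one log file per host
    for host in $(cat hosts); do
        ssh root@"$host" \
            'DEBIAN_FRONTEND=noninteractive apt-get -qy update &&
             DEBIAN_FRONTEND=noninteractive apt-get -qy upgrade' \
            >"$host.log" 2>&1 &
    done
    wait
    # crude scan of the collected output for problem indicators
    grep -il -e error -e fail -- *.log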
2016-02-08
Clearing SMART disk complaints, with safety provided by ZFS
Recently, my office machine's smartd began complaining about problems on one of my drives (again):
Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Device: /dev/sdc [SAT], 5 Offline uncorrectable sectors
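(These map to SMART attributes 197 and 198; you can see the raw counters behind the complaints with smartctl, something like:)

    # show the drive's SMART attributes; attributes 197 and 198 are
    # the pending and offline-uncorrectable sector counts
    smartctl -A /dev/sdc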
As it happens, I was eventually able to make all of these complaints go away (I won't say I fixed the problem, because the disk is undoubtedly still slowly failing). This took a number of steps and some of them were significantly helped by ZFS on Linux.
(For background, this disk is one half of a mirrored pair. Most of it is in a ZFS pool; the rest is in various software RAID mirrors.)
My steps:
- Scrub my ZFS pool, in the hopes that this would make the problem go away like the first iteration of smartd complaints. Unfortunately I wasn't so lucky this time around, but the scrub did verify that all of my data was intact.

- Use dd to read all of the partitions of the disk (one after another) in order to try to find where the bad spots were. This wound up making four of the five problem sectors just quietly go away and did turn up a hard read error in one partition. Fortunately or unfortunately it was my ZFS partition. The resulting kernel complaints looked like:

    blk_update_request: I/O error, dev sdc, sector 1362171035
    Buffer I/O error on dev sdc, logical block 170271379, async page read

  The reason that a ZFS scrub did not turn up a problem was that ZFS scrubs only check allocated space. Presumably the read error is in unallocated space.

- Use the kernel error messages and carefully iterated experiments with dd's skip= argument to make sure I had the right block offset into /dev/sdc, ie the block offset that would make dd immediately read that sector.

- Then I tried to write zeroes over just that sector with 'dd if=/dev/zero of=/dev/sdc seek=... count=1'. Unfortunately this ran into a problem; for some reason the kernel felt that this was a 4k sector drive, or at least that it had to do 4k IO to /dev/sdc. This caused it to attempt to do a read-modify-write cycle, which immediately failed when it tried to read the 4k block that contained the bad sector.

  (The goal here was to force the disk to reallocate the bad sector into one of its spare sectors. If this reallocation failed, I'd have replaced the disk right away.)

- This meant that I needed to do 4K writes, not 512 byte writes, which meant that I needed the right offset for dd in 4K units. This was handily the 'logical block' from the kernel error message, which I verified by running:

    dd if=/dev/sdc of=/dev/null bs=4k skip=170271379 count=1

  This immediately errored out with a read error, which is what I expected.

- Now that I had the right 4K offset, I could write 4K of /dev/zero to the right spot. To really verify that I was doing (only) 4K of IO and to the right spot, I ran dd under strace:

    strace dd if=/dev/zero of=/dev/sdc bs=4k seek=170271379 count=1

- To verify that this dd had taken care of the problem, I redid the dd read. This time it succeeded.

- Finally, to verify that writing zeroes over a bit of one side of my ZFS pool had only gone to unallocated space and hadn't damaged anything, I re-scrubbed the ZFS pool. (The key commands are recapped below.)
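To put the key commands in one place, here's the whole sequence as a sketch (the offsets are from my disk, and 'tank' stands in for my pool's name, which I haven't given here):

    # confirm the bad 4K block; this read should fail
    dd if=/dev/sdc of=/dev/null bs=4k skip=170271379 count=1
    # write zeroes over it to force the drive to reallocate the sector
    strace dd if=/dev/zero of=/dev/sdc bs=4k seek=170271379 count=1
    # re-read it; a clean read means the reallocation worked
    dd if=/dev/sdc of=/dev/null bs=4k skip=170271379 count=1
    # re-scrub the pool to verify that nothing in use was damaged
    zpool scrub tank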
ZFS was important here because ZFS checksums meant that writing zeroes over bits of one pool disk was 'safe', unlike with software RAID: if I hit any in-use data, ZFS would know that the chunk of 0 bytes was incorrect and fix it up. With software RAID I guess I'd have had to carefully copy the data from the other side of the mirror instead of just using /dev/zero.
By the way, I don't necessarily recommend this long series of somewhat hackish steps. In an environment with plentiful spare drives, the right answer is probably 'replace the questionable disk entirely'. It happens that we don't have lots of spare drives at this moment, plus I don't have enough drive bays in my machine to make this at all convenient right now.
(Also, in theory I didn't need to clear the SMART warnings at all. In practice the Fedora 23 smartd whines incessantly about this to syslog at a very high priority, which causes one of my windows to get notifications every half hour or so, and I just couldn't stand it any more. It was either shut up smartd somehow or replace the disk. Believe it or not, all these steps seemed to be the easiest way to shut up smartd. It worked, too.)
2016-02-02
A justification for some odd Linux ARP behavior
Years ago I described an odd Linux behavior which attached the wrong source IP to ARP requests, and said that I had a justification for why this wasn't quite as crazy as it sounds. The setup is that we have a dual-homed machine on two networks; call them net-3 and net-5. If another machine on net-3 tries to talk to the dual-homed machine's net-5 IP address, the dual-homed machine will send out an ARP request on net-3 of the form:
Request who-has <net-3 client machine IP address> tell <net-5 IP address>
As I said at the time, this was a bit surprising, as normally you'd expect a machine to send ARP requests with the 'tell ...' IP address set to an IP address that is actually on the interface that the ARP request is sent out on.
What Linux appears to be doing instead is sending the ARP request with the IP address that will be the source IP of the eventual actual reply packet. Normally this will also be the source IP for the interface the ARP request is done on, but in this case we have asymmetric routing going on. The client machine is sending to the dual-homed server's net-5 IP address, but the dual-homed machine is going to just send its replies directly back out its net-3 interface. So the ARP request it makes is done on net-3 (to talk directly to the client) but is made with its net-5 IP address (the IP address that will be on the TCP packet or ICMP reply or whatever).
This makes sense from a certain perspective. The ARP request is caused by some IP packet to be sent, and at this point the IP packet presumably has a source IP attached to it. Rather than look up an additional IP address based on the interface the ARP is on, Linux just grabs that source IP and staples it on. The resulting MAC to source IP address association that many machines will pick up from the ARP request is even valid, in a sense (in that it works).
(Client Linux machines on net-3 do pick up an ARP table entry for the dual-homed machine's net-5 IP, but they continue to send packets to it through the net-3 to net-5 gateway router, not directly to the dual-homed machine.)
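(All of this is easy to watch from a client on net-3 with tcpdump, something like:)

    # watch ARP traffic on the client's net-3 interface
    # (eth0 is a stand-in for the real interface name)
    tcpdump -n -e -i eth0 arp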
There is probably a Linux networking sysctl that will turn this behavior off. Some preliminary investigation suggests that arp_announce is probably what we want, if we care enough to set any sysctl for this (per the documentation). We probably don't, since the current behavior doesn't seem to be causing problems.
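For reference, a sketch of what setting it would look like (the full semantics of the values are in the kernel's ip-sysctl documentation):

    # make ARP requests advertise an address appropriate to the
    # outgoing interface instead of the packet's source IP;
    # 1 avoids source addresses outside the target's subnet,
    # 2 always uses the best local address
    sysctl -w net.ipv4.conf.all.arp_announce=1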
(We also don't have very many dual-homed Linux hosts where this could come up.)