2016-03-26
The sensible update for my vintage 2011 home machine
As it happens, I think that there is a sensible, boring answer to what I should do about my current home computer, one that bypasses all of my concerns over a potential replacement machine. It's just an answer I've been somewhat reluctant to act on, partly because it's nowhere near as interesting as putting together a new machine and partly because it's going to require some awkward gyrations.
What I should really do is get a pair of 500 GB SSDs and use them to replace the current pair of system disks. A relatively small amount of space would go to an ext4 software RAID mirrored root filesystem (with all system stuff in it); the majority of it would go to a new ZFS pool where I would put my home directory and a few other filesystems where I want fast access (for example because I compile source code there). This would get me both ZFS and a speedup on a bunch of things that I do.
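As a very rough sketch of the shape of this (all the device names, partition numbers, and the pool and filesystem names here are made up for illustration):

    # a small mirrored ext4 root on one partition of each SSD
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkfs.ext4 /dev/md0
    # the bulk of both SSDs becomes a new mirrored ZFS pool
    zpool create -o ashift=12 ssdpool mirror /dev/sda3 /dev/sdb3
    zfs create ssdpool/homes
    zfs create ssdpool/src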
(At this point, some people are probably incredulous that I haven't already shifted my primary storage to SSDs.)
Following my experience with my work machine, I should probably also take a deep breath and switch my machine from 16 GB of RAM to 32 GB, because this will make ZFS happier. The RAM will not be reusable in a future machine, unlike the SSDs, and it's still surprisingly expensive, so it's tempting to skip this and see if I can make do (especially with ZFS on Linux ARC improvements).
While upgrading my entire machine is the attractive thing to do, at this point it seems very likely that the largest single practical speedup available to me is moving frequently used data from HDs to SSDs. And I don't need a whole new machine to do that, by any means. There are limitations to what this will do for me (it won't speed up processing photos, for example), but I think it's likely to make a difference.
(Having looked at the CPU compatibility list for my Asus P8P67 LE motherboard, I don't think there's any point in trying to upgrade the CPU. In fact it seems like the LGA 1155 CPUs that my motherboard uses are now so old that they're basically legacy hardware that you pay extra for. Even if my perception here is wrong, it doesn't look like I can get much of a CPU upgrade.)
PS: My plan to put the root filesystem on the SSDs comes partly from necessity (it's currently on the HDs that the SSDs would replace) and partly because that's what I wound up doing on my work machine after contemplating my dilemma about this.
Sidebar: the irrational reason I've been avoiding doing things like this
For no particularly good reason, I've always found it easier to talk myself into buying a whole new machine every so often instead of progressively upgrading a machine piece by piece. As best I can figure it out, spending more money on something that's progressively becoming more and more obsolete just feels like a waste.
(As you may guess, I'm not one of the people who enjoys progressively upgrading machines piece by piece. In fact I think the only hardware changes I've made in machines to date have been replacing dead HDs.)
2016-03-23
Wayland and graphics card uncertainty
Yesterday I wrote about several points of general technology churn that make me reluctant to think about a new PC. But as it happens there's an additional Linux specific issue that I worry about, and that's Wayland. More exactly, it's what Wayland is likely to require from graphics cards.
I use a very old fashioned X based environment, which means that all I need is old fashioned basic X. I think I run some things that like to see OpenGL, but probably not very many of them, and there's basic OpenGL support in even basic X these days. This has left me relatively indifferent to graphics cards and graphics card support levels; even what was a low end card at the time was (and is) good enough for my desktop.
I would like to keep on using my basic X environment for the lifetime of my next machine, but the push behind Wayland has enough momentum that I don't think I can assume that any more. People are really trying to ship Wayland based desktops within the next couple of years on major Linux distributions (in particular, on Fedora), and once that happens I suspect that my sort of X environment will only have a few more years of viable life before toolkits and major programs basically stop supporting it.
(Oh, they'll technically 'support' X. But probably no one will be actively maintaining the code and so steadily increasing numbers of pieces will break.)
At that point, switching to Wayland will be non-optional for me (even if it results in upending my environment and makes me unhappy). In turn that means I'll need a graphics system that can handle Wayland well. Wayland is a modern system, so as far as I know it really wants things like hardware composition, hardware alpha-blending, and so on. Using Wayland without OpenGL-level hardware acceleration may be possible, but it's not likely to be pleasant.
What sort of graphics card (or integrated graphics) does this call for? I have no real idea, and I'm not sure anyone knows yet what you'll want to have for a good Wayland experience. That uncertainty makes me want to postpone buying a graphics card until we know more, which will probably need some brave Linux distribution to enable Wayland by default so that a lot of people run it.
(Of course I may be overthinking this as part of mostly not wanting to replace my current machine.)
2016-03-14
How RPM handles configuration files, both technically and socially
In comments on my entry on Ubuntu packages playing 'hide the configuration file', James asked how RPM handles user editable configuration files (and then guaq gave a good answer). This is an important enough thing to talk about that I'm going to write an entry about it, because RPM's approach is both technical and social.
On the technical side, the first thing is that if the file hasn't changed between the old package and the new package update, RPM knows this and doesn't need to do anything; it doesn't touch the installed file and doesn't bother putting the stock version anywhere. If the configuration file has changed between the old and the new package and you've also edited the configuration file, most of the time the package will write out the new configuration file as <file>.rpmnew (for exact details of what happens when, see Jon Warbrick's writeup; while it's from some time ago, I don't believe RPM's behavior has changed since).
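As an illustration of the technical side, you can see which files RPM considers a package's configuration files and whether you've changed them with rpm's query and verify options (the package name here is just an example):

    rpm -qc openssh-server    # list the files the package marks as config files
    rpm -V openssh-server     # changed files; a 'c' before the name marks config files

An edited configuration file shows up in the rpm -V output with a '5' (its checksum no longer matches the packaged version), which is the same information RPM uses to decide whether it needs to do the .rpmnew dance at all.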
If this isn't enough, RPM packages can also do arbitrary things on package upgrades in their postinstall scripts. I believe that this has occasionally been used to do things like edit installed configuration files to modify or comment out configuration directives that have unavoidably changed or disappeared between software versions. However, this is where we shade into the social side of things.
Because RPM packages can't really try to do sophisticated things
with configuration files, there is strong social pressure for them
not to need to. As a packager, part of your job is to make sure that
the package's configuration files work right even within these
restrictions. Generally you must keep existing configuration files
working correctly even over package updates; resorting to pushing
installed configuration files aside as .rpmsave files is strongly
discouraged (as far as I know; it's certainly uncommon). This requires
more up front work and inevitably involves some tradeoffs (where people
with edited configuration files don't magically get the benefit of
new configuration options), but in my opinion it produces better
long term results for sysadmins.
(It also encourages schemes to split configuration files up in various ways, because then it's more likely that most of the split configuration files will be unedited. RPM based systems are not alone in this, of course, and sometimes they don't do it as well as other systems.)
As a result, while you can do crazy things in RPM postinstall scripts, it's effectively discouraged. I would be surprised if Fedora packaging review was very receptive to mangling configuration files in postinstall scripts, for example (although it's not covered explicitly in the Fedora packaging guidelines). And of course the technical restriction that you can't ask the user any questions in your postinstall script limits the amount of things it's sensible to even try to have an ultra-smart script do.
This also affects what sort of default configuration it's sensible to set up in your package's stock files. Generally it's going to be better to create something minimal but stable instead of a complicated setup with lots of options that has to be managed by hand. Sure, many sysadmins will have to change your minimal setup, but on the other hand your package itself hopefully won't have to change and revise the setup from version to version. This keeps all of the changes, customizations, and updates in the hands of sysadmins, instead of making some of them your responsibility.
RPM's technical approach could not work without the social approach (although of course its technical limitations combined with sane people create the social approach). As a result, any RPM based system is going to have a terrible time with a program that needs configuration file customization and also changes its configuration files around in incompatible ways (either in what's allowed in them or in how they're organized). Fortunately there are not many programs like this, because they're unpopular with sysadmins even without RPM packaging issues.
2016-03-11
Why it's irritating when Ubuntu packages don't include their configuration files
I'll start with my tweets:
This is my angry face when Ubuntu openssh-server generates a fixed /etc/ssh/sshd_config on the fly in postinst vs making it a config file.
Ubuntu dovecot also does the 'generate static configuration files at install time' thing and now I want to set it on fire.
(Twitter has become my venue of choice for short rants.)
Both of these packages have the same bad idea, although their
mechanisms differ somewhat. Specifically, the installed and running
software has some configuration files (here /etc/ssh/sshd_config
and /etc/dovecot/conf.d/*), but these files are not actually
packaged in the .debs; instead they are created at install time
when the package is installed from scratch.
(The behavior may be there in the Debian versions as well, but I don't run Debian so I can't easily check.)
Years ago I wrote about wishing to be able to retrieve the stock
version of some package file. At the time
I gave a recipe for doing this by hand on a Debian based system;
you apt-get download the .deb, use dpkg -x to unpack it in a
temporary directory, and fish the file out. Naturally this only
works if the .deb actually packages the file. If it does not,
you have two options. First, you can find, read, and probably
simulate by hand the process the .postinst script uses to create
the file (the postinst script itself is likely in /var/lib/dpkg/info).
Second, you can install an uncustomized version of the package
somehow, either on a test machine or by purging and then reinstalling
the package (which is likely to be very disruptive if you have any
important local customizations).
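To make the original recipe concrete, the fish-it-out-of-the-.deb version looks roughly like this (the package name is just an example, and as mentioned it only pays off if the .deb actually contains the file):

    cd $(mktemp -d)
    apt-get download openssh-server
    dpkg -x openssh-server_*.deb .
    ls etc/ssh/    # on Ubuntu, sshd_config is conspicuously not here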
This is what I call 'not helpful'. There are all sorts of reasons to want to retrieve the original version of a configuration file that you've altered (or merely may have altered). Even when this is possible, how to do it is package specific, which means that you need package specific knowledge or research. Ubuntu packaging should do better here.
(I don't expect things to ever change, though. I rather expect that this is inherited from Debian and that Debian has some intricate reason of its own for doing it this way (and for different packages doing it differently). Debian sometimes makes me angry with its apparent love of robot logic and technical intricacies, and its seeming inability to ever do something in a single unified way.)
2016-03-07
Apt-get and its irritating lack of easy selective upgrades
One of my many irritations with apt-get
is that it doesn't easily allow you to only apply some of the pending
updates. Sure, often you want to apply all of the updates (at
least all of the unheld updates), but there
are any number of cases where you want to be more selective. Sometimes
you are in a rush and you want to apply only a few very urgent
updates. Sometimes you want to apply updates in a specific order,
updating some packages before others. Sometimes you want to apply
most updates but temporarily exclude some that you consider low
priority or disruptive.
With a package manager like yum (or now dnf) you can easily do
all of this. If you just want to exclude some packages, you do
that with '--exclude'; if you only want to upgrade some packages,
you do that by supplying them as explicit arguments. And it's
harmless to be a bit broad in your explicit arguments, because
you're specifically only upgrading existing packages; you'll never
install new ones out of nowhere.
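For illustration, the yum versions of these look like the following; the package names are purely examples:

    yum upgrade --exclude='kernel*'    # upgrade everything except some packages
    yum upgrade openssl bind-utils     # upgrade only these, and only if installed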
apt-get does not support this usage as far as I can see. apt-get
upgrade takes no package list and has no way of excluding some
packages; it is an all or nothing operation, where the only way you
have of being selective is to hold packages in advance in order to
block their upgrades. In order to upgrade packages selectively, you
must turn to 'apt-get install', probably with '--only-upgrade'
so that you don't accidentally install new packages. And as far as
I can tell this has no equivalent of yum's --exclude, so there's
no easy way I can see of saying 'upgrade everything except the
following packages'.
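The closest apt-get equivalents I've found look like this (again with example package names); the exclusion case has to be faked with temporary holds:

    apt-get install --only-upgrade openssl openssh-server
    apt-mark hold libc6 && apt-get upgrade && apt-mark unhold libc6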
(apt-get install does at least support wildcards, or more exactly
POSIX regular expressions. I don't know why they decided to use
regular expressions instead of shell globbing, but it feels like a
very Debian decision, especially the detail that it defaults to
matching substrings.)
'apt-get install --only-upgrade PKG ...' solves about half of the
problem (although clumsily) so I'm less disgruntled than I was at
the start of writing this entry, but it's still not particularly
great.
2016-03-05
What happens when a modern Linux system boots without /bin/sh
It turns out that a whole lot of things explode when your system boots up with /bin/sh not working for some mysterious reason.
Here the mysterious reason was that there was an unresolved dynamic
library symbol, so any attempt to run /bin/sh or /bin/bash died
with an error message from the ELF interpreter.
The big surprise for me was just how far my systemd-based Fedora
23 machine managed to get despite this handicap. I certainly saw a
cascade of unit failures in the startup messages so I knew that
something bad had happened, but the first inkling I had of just how
bad it was came when I tried to log in as root on the (text)
console and the system just dumped me back at the login: prompt.
Most of the system services had managed to start because their
systemd .service files did not need /bin/sh to run; only a few
things (some of them surprising) had failed, although a dependency
chain for one of them wound up blocking the local resolving DNS
server from starting.
The unpleasant surprise was how much depends on /bin/sh and
/bin/bash working. I was able to log in as myself because I use
a different shell, but
obviously root was inaccessible, my own environment relies on a
certain number of shell scripts to be really functional, and a
surprising number of standard binaries are shell scripts these days
(/usr/bin/fgrep, for example). In the end I got somewhat lucky
in that my regular account had sudo access and sudo can be used
to run things directly, without needing /bin/sh or root's shell
to be functioning.
(I mostly wound up using this to run less to read logs and
eventually reboot. If I'd been thinking more carefully, I could
have used sudo to run an alternate shell as root, which would
have been almost as good as being able to log in directly.)
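As a sketch of the sort of thing that still works in this state (assuming that sudo itself is intact and that some shell other than bash, zsh here, is installed and undamaged):

    sudo /usr/bin/less /var/log/messages    # sudo execs binaries directly, no /bin/sh involved
    sudo /usr/bin/journalctl -b
    sudo /bin/zsh                           # a root shell that doesn't go through bash
    sudo /usr/sbin/reboot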
Another pretty useful thing here is how systemd captured a great deal of the error output from startup services and recorded it in the systemd journal. This gave me the exact error messages, for example, which is at least reassuring to have even if I don't understand what went wrong.
What I don't have here is an exciting story of how I revived a
system despite its /bin/sh being broken. In the end the problem
went away after I rebooted and then power cycled my workstation.
Based on the symptoms I suspect that a page in RAM got scrambled
somehow (which honestly is a bit unnerving).
As a side note, the most surprising thing that failed to start was
udev trying to run the install command for the sound card drivers
(specifically snd_pcm). I suspect that this is used to restore
the sound volume settings to whatever they were the last time the
system was shut down, but I don't know for sure because nothing
reported the exact command being executed.
(My system has a 90-alsa-restore.rules udev rules file that tries
to run alsactl. It's not clear to me if udev executes RUN+=
commands via system(), which would have hit the issue, or in
some more direct way. Maybe it depends on whether the RUN command
seems to have anything that needs interpretation by the shell. I'm
pretty certain that at least some udev RUN actions succeeded.)
Sidebar: What exactly was wrong
This was on my Fedora 23 office machine, where /bin/sh is bash, and
bash was failing to start with a message to the effect of:
symbol lookup error: /bin/bash: undefined symbol: rl_unix_line_disc<binary garbage>
Bash does not mention a symbol with that exact name, but it does
want to resolve and use rl_unix_line_discard. Interestingly,
this is an internal symbol (it's both used and defined in bash);
despite this, looking it up goes via the full dynamic linker symbol
resolution process (as determined with the help of LD_DEBUG).
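Roughly, the LD_DEBUG check looks something like this; on a working system you should see the symbol being looked up and then bound back to /bin/bash itself:

    LD_DEBUG=symbols,bindings /bin/bash -c 'exit 0' 2>&1 | grep rl_unix_line_discard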
My guess is that the end of the symbol name was overwritten in RAM
with some garbage and that this probably happened in the Linux
kernel page cache (since it kept reappearing with the same message,
it can't have been in a per-process page).
Assuming I'm reading things correctly, the bytes of garbage are (in hex):
ae 37 d8 5f bf 6b d1 45 3a c0 d9 93 1b 44 12 2d 68 74
(less displays this as '<AE>7<D8>_<BF>k<D1>E:<C0>ٓ^R-ht', which
doesn't fully capture it. I had to run a snippet of journalctl's
raw output through 'od -t c -t x1' to get the exact hex.)
2016-02-22
We've permanently disabled overlayfs on our servers
Oh look, yet another Linux kernel local exploit in the overlayfs module. Time to permanently blacklist it on all of our machines.
Today's bugs are CVE-2016-1576 and CVE-2016-1575 (via). There have been others before, and probably more that my casual Internet searches aren't turning up.
Based on my experiences so far, the two most common ingredients in exploitable kernel security issues we've been seeing Ubuntu announcements for are overlayfs and user namespaces. As far as I know, we can't do anything to turn off user namespaces without rebuilding and maintaining our own kernel packages, but overlayfs is (just) a loadable kernel module. A kernel module that we don't use.
So now we have an /etc/modprobe.d/cslab-overlayfs.conf file on
all of our servers that says:
# Permanently stop overlayfs from being loaded
# because it keeps having security issues and
# we don't use it.
blacklist overlayfs
install overlayfs /bin/false
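A quick way to make sure this sticks is to ask modprobe to load the module anyway and watch it fail:

    modprobe -v overlayfs    # now just runs /bin/false, so the module can't load
    lsmod | grep overlay     # and nothing should show up as loaded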
Pretty soon this will be in our install framework, which means that future machines will probably be like this for several Ubuntu LTS versions to come. I feel some vague regret, but not very much. I'm done putting up with the whole 'surely we'll get this right someday' approach to making these subsystems not create security issues.
By the way, I don't find issues in either subsystem to be particularly surprising given what they do. User namespaces especially are a recipe for trouble in practice, because they let you create environments that break long standing Unix security assumptions. Sure, they are supposed to only do this in a way that is still secure, but in practice, no, things keep slipping through the cracks. In a sane world it would be possible to disable user namespaces at runtime on distribution kernels. Sadly we're not in that world.
2016-02-21
Why the Ubuntu package update process makes me irritated
We have fifty or so Ubuntu machines in our fleet, which means that when Ubuntu comes out with package updates we have to update them all. And this is rather a problem. First, obviously, fifty machines is too many to do updates on by hand; we need to automate this. Unfortunately Ubuntu (or Debian) has made a series of decisions about how packages and package updates work that cause significant amounts of pain here.
(Although Debian and Ubuntu share the same package format, I don't have enough experience with Debian to speak for it here. I wouldn't be surprised if much of this applied there, but I don't know for sure.)
To start with, Ubuntu package updates are slow. Well, okay, everyone's package updates are slow, but when you have fifty-odd machines the slowness means that doing them one by one can take a significant time. Even a modest package update, done machine by machine, can take an hour or so, which means that you'd really like to do them in parallel in some way.
(This does not bite too hard for normal updates, because you can do those once a day in the morning while you're getting coffee, reading email, and so on, and thus sort of don't care if they take a while. This really bites for urgent security updates that you want to apply right when you read Ubuntu's announcement emails, especially if those emails show up just before you're about to leave for the day.)
Next, Ubuntu package updates via apt-get are very verbose, and
worse they're verbose in unpredictable ways, as packages feel free
to natter about all sorts of things as they update. It may be nice
to know that your package update is making progress, but without
any regularity in the output it's extremely hard for a program to
look through it and find problem indicators (and people scanning
by eye have the same problem). Other systems are much
better here, because their normal output is very regular and you
can immediately pick out deviations from that (either in a program
or by eye). If you're going to automate a package update process
in any way, you'd really like to be able to spot and report problems.
(Also, if you're updating a bunch of machines, it's a great thing to be able to notice problems on one machine before you go on to update another machine and have the same problems.)
In addition, Ubuntu packages are allowed to ask questions during package updates, which ensures that some of them do (sometimes for trivia). A package update that pauses to ask you questions is not a package update that can be automated in a hands-off way. In fact it's even hard to automate in any way (at least well), because you need to build some way for a human to break into the process to answer those questions. Yes, in theory I think you can turn off these questions and force the package to take its default answer, but in practice I've felt for years that this is dangerous because nothing forces the default to be the right choice, or even sensible.
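For completeness, the usual recipe for forcing this is something like the following, which is exactly the 'take the default and hope' approach that I'm wary of:

    DEBIAN_FRONTEND=noninteractive apt-get -y \
        -o Dpkg::Options::=--force-confdef \
        -o Dpkg::Options::=--force-confold upgrade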
(I also expect that the verbosity of package updates would make it hard to spot times where a package wanted to ask a question but you forced it to take its default. These are cases where you really want to spot and review the decision, in case the package screwed up.)
The ideal hands off update process will never ask you questions and can be made to report only actual problems and what packages were updated. This makes it easy to do automatically and in parallel; you fire off a bunch of update processes on a bunch of machines, collect the output, and then can easily report both problems and actual packages updated. If you wanted to, you could use only limited parallelism, scan updates for problems, and freeze the whole process if any turn up. The only way that the Ubuntu package update process resembles this is that it's relatively easy to parse the output to determine what packages were updated.
2016-02-08
Clearing SMART disk complaints, with safety provided by ZFS
Recently, my office machine's smartd began complaining about problems
on one of my drives (again):
Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Device: /dev/sdc [SAT], 5 Offline uncorrectable sectors
As it happens, I was eventually able to make all of these complaints go away (I won't say I fixed the problem, because the disk is undoubtedly still slowly failing). This took a number of steps and some of them were significantly helped by ZFS on Linux.
(For background, this disk is one half of a mirrored pair. Most of it is in a ZFS pool; the rest is in various software RAID mirrors.)
My steps:
- Scrub my ZFS pool, in the hopes that this would make the problem go
  away like the first iteration of smartd complaints. Unfortunately I
  wasn't so lucky this time around, but the scrub did verify that all
  of my data was intact.
- Use dd to read all of the partitions of the disk (one after another)
  in order to try to find where the bad spots were. This wound up
  making four of the five problem sectors just quietly go away and did
  turn up a hard read error in one partition. Fortunately or
  unfortunately it was my ZFS partition. The resulting kernel
  complaints looked like:

    blk_update_request: I/O error, dev sdc, sector 1362171035
    Buffer I/O error on dev sdc, logical block 170271379, async page read

  The reason that a ZFS scrub did not turn up a problem was that ZFS
  scrubs only check allocated space. Presumably the read error is in
  unallocated space.
- Use the kernel error messages and carefully iterated experiments with
  dd's skip= argument to make sure I had the right block offset into
  /dev/sdc, ie the block offset that would make dd immediately read
  that sector.
- Then I tried to write zeroes over just that sector with 'dd
  if=/dev/zero of=/dev/sdc seek=... count=1'. Unfortunately this ran
  into a problem; for some reason the kernel felt that this was a 4k
  sector drive, or at least that it had to do 4k IO to /dev/sdc. This
  caused it to attempt to do a read-modify-write cycle, which
  immediately failed when it tried to read the 4k block that contained
  the bad sector.

  (The goal here was to force the disk to reallocate the bad sector
  into one of its spare sectors. If this reallocation failed, I'd have
  replaced the disk right away.)
- This meant that I needed to do 4K writes, not 512 byte writes, which
  meant that I needed the right offset for dd in 4K units. This was
  handily the 'logical block' from the kernel error message, which I
  verified by running:

    dd if=/dev/sdc of=/dev/null bs=4k skip=170271379 count=1

  This immediately errored out with a read error, which is what I
  expected.
- Now that I had the right 4K offset, I could write 4K of /dev/zero to
  the right spot. To really verify that I was doing (only) 4K of IO and
  to the right spot, I ran dd under strace:

    strace dd if=/dev/zero of=/dev/sdc bs=4k seek=170271379 count=1

- To verify that this dd had taken care of the problem, I redid the dd
  read. This time it succeeded.
- Finally, to verify that writing zeroes over a bit of one side of my
  ZFS pool had only gone to unallocated space and hadn't damaged
  anything, I re-scrubbed the ZFS pool.
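To see whether smartd's complaints have actually gone away, you can also ask smartctl directly (this is the smartmontools attribute view):

    smartctl -A /dev/sdc | egrep -i 'pending|uncorrect'
    # both Current_Pending_Sector and Offline_Uncorrectable should now read 0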
ZFS was important here because ZFS checksums meant that writing
zeroes over bits of one pool disk was 'safe', unlike with software
RAID, because if I hit any in-use data ZFS would know that the chunk
of 0 bytes was incorrect and fix it up. With software RAID I guess
I'd have had to carefully copy the data from the other side of the
software RAID, instead of just using /dev/zero.
By the way, I don't necessarily recommend this long series of somewhat hackish steps. In an environment with plentiful spare drives, the right answer is probably 'replace the questionable disk entirely'. It happens that we don't have lots of spare drives at this moment, plus I don't have enough drive bays in my machine to make this at all convenient right now.
(Also, in theory I didn't need to clear the SMART warnings at all.
In practice the Fedora 23 smartd whines incessantly about this
to syslog at a very high priority, which causes one of my windows
to get notifications every half hour or so and I just couldn't stand
it any more. It was either shut up smartd somehow or replace the
disk. Believe it or not, all these steps seemed to be the easiest way
to shut up smartd. It worked, too.)
2016-02-02
A justification for some odd Linux ARP behavior
Years ago I described an odd Linux behavior which attached the wrong source IP to ARP replies and said that I had a justification for why this wasn't quite as crazy as it sounds. The setup is that we have a dual-homed machine on two networks, call them net-3 and net-5. If another machine on net-3 tries to talk to the dual-homed machine's net-5 IP address, the dual-homed machine would send out an ARP request on net-3 of the form:
Request who-has <net-3 client machine IP address> tell <net-5 IP address>
As I said at the time, this was a bit surprising as normally you'd expect a machine to send ARP requests with the 'tell ...' IP address set to an IP address that is actually on the interface that the ARP request is sent out on.
What Linux appears to be doing instead is sending the ARP request with the IP address that will be the source IP of the eventual actual reply packet. Normally this will also be the source IP for the interface the ARP request is done on, but in this case we have asymmetric routing going on. The client machine is sending to the dual homed server's net-5 IP address, but the dual homed machine is going to just send its replies directly back out its net-3 interface. So the ARP request it makes is done on net-3 (to talk directly to the client) but is made with its net-5 IP address (the IP address that will be on the TCP packet or ICMP reply or whatever).
This makes sense from a certain perspective. The ARP request is caused by some IP packet to be sent, and at this point the IP packet presumably has a source IP attached to it. Rather than look up an additional IP address based on the interface the ARP is on, Linux just grabs that source IP and staples it on. The resulting MAC to source IP address association that many machines will pick up from the ARP request is even valid, in a sense (in that it works).
(Client Linux machines on net-3 do pick up an ARP table entry for the dual homed machine's net-5 IP, but they continue to send packets to it through the net-3 to net-5 gateway router, not directly to the dual homed machine.)
There is probably a Linux networking sysctl that will turn this
behavior off. Some preliminary investigation suggests that
arp_announce is probably what we want, if we care enough to
set any sysctl for this (per the documentation).
We probably don't, since the current behavior doesn't seem to be
causing problems.
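For the record, the setting would be something like this; a value of 2 tells the kernel to always prefer the best local address for the target of the ARP request:

    sysctl -w net.ipv4.conf.all.arp_announce=2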
(We also don't have very many dual-homed Linux hosts where this could come up.)