Wandering Thoughts archives


Making my Yubikey work reasonably with my X screen locking

When I moved from unencrypted SSH keys to encrypted SSH keys held in a running ssh-agent process, I arranged things so that the keys would be removed when I locked my screen (which I do frequently) and then unlocked and added again when I unlocked my screen; I wrote this up as part of this entry. Soon after I started playing around with having SSH keys in my Yubikey, it became clear to me that I needed to do the same thing with the Yubikey's SSH keys. More specifically, I needed to automatically re-add the Yubikey's keys when I unlocked the screen, which means (automatically) providing the Yubikey's PIN code to ssh-add instead of being constantly prompted for it every time I unlocked my screen. Typing two passwords at screen unlock time is just a bit too irritating for me; inevitably it would discourage me from routinely using the Yubikey.

(Removing the Yubikey keys from ssh-agent happens automatically when I run 'ssh-add -D' as part of starting the screen locker, although I've also added an explicit removal of the PKCS#11 SSH agent stuff. You actually want to do this because otherwise the PKCS#11 SSH agent stuff gets into a weird state where it's loaded but non-functional, so you can't just do 'ssh-add -s' to get it going again.)
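
In script form, the lock-time side is roughly the following. This is a sketch of just the relevant bit, not my full locking setup (which also starts xlock itself):

ssh-add -D                                              # drop all the regular keys
ssh-add -e /usr/lib64/opensc-pkcs11.so >/dev/null 2>&1  # and explicitly drop the PKCS#11 stuff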

As I sort of mentioned in passing in my entry on how I set up SSH keys on my Yubikey, the Yubikey's PIN code allows more or less full alphanumerics, so in theory I could just make the PIN code the same as my regular SSH key password and then use the obvious extension of the Perl script from this entry to also feed it to ssh-add when I re-enable PKCS#11 stuff. However, after thinking about it I decided that I wasn't entirely comfortable with that; too many tools for dealing with the Yubikey are just a little bit too casual with the PIN code for me to make it something as powerful and dangerous as my regular password.

(For example, a number of them want the PIN provided in plain text on the command line. I'm not doing that with my regular password.)

This left me with the problem of going from my regular password to the Yubikey PIN. The obvious answer is to encrypt a file with the PIN in it with my regular password, then decrypt it on the fly in order to feed it to ssh-add. After some searching I settled on doing this with ccrypt, which is packaged for Fedora and which has an especially convenient mode where you can feed it the key as the first line of input, with the encrypted file following immediately afterwards.
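
(For completeness, creating the encrypted PIN file is a one-time operation that goes something like this. This is a sketch; the file name is made up and $CRYPTLOC is just whatever path the unlock script below expects:)

umask 077
echo 'the-yubikey-pin' >pinfile     # or write the file with your editor to keep the PIN out of shell history
ccrypt -e pinfile                   # prompts (twice) for the encryption key, ie my regular password
mv pinfile.cpt $CRYPTLOC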

So now I have a little script that takes my regular password on standard input (fed from the Perl script I run via xlock's -pipepassCmd argument) and uses it to decrypt the PIN file and feed it to ssh-add. It looks like this:

# drop PKCS#11 stuff; required to re-add it
ssh-add -e /usr/lib64/opensc-pkcs11.so >/dev/null 2>&1
# give ssh-add no way to ask us for the passphrase
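# $CRYPTLOC is the path of the ccrypt-encrypted PIN file (set elsewhere in the real script)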
(sed 1q; cat $CRYPTLOC) | ccat -k - | \
   notty ssh-add -s /usr/lib64/opensc-pkcs11.so

The one peculiar bit is notty, which is a little utility program to run another program without a controlling terminal. If you run ssh-add this way, it reads the PKCS#11 PIN from standard input, which is just what I want here. The reason I need notty at all is that the Perl script runs this script via (Perl) Expect, which means that it would otherwise be running with a pty.

(There are alternate ways to arrange things here, but right now I prefer this approach.)

(See my first Yubikey entry for a discussion of when you need to remove and re-add the PKCS#11 SSH agent stuff. The short version is any time that you remove and reinsert the Yubikey, drop SSH keys with 'ssh-add -D' (as we're doing during screen locking), or run various commands to poke at the Yubikey directly.)

PS: I've come around to doing 'ssh-add -e' before almost any attempt to do 'ssh-add -s'. It's a hack and in an ideal world it wouldn't be necessary, but there are just too many situations where ssh-agent can wind up with PKCS#11 stuff loaded but non-functional, and the best (and sometimes only) way to clean this up is to remove it and theoretically start from scratch again. Maybe someday all of this will be handled better. (Perhaps gpg-agent is better here.)

linux/YubikeyAndScreenLocking written at 23:10:52


Why we care about long uptimes

Here's a question: why should we care about long uptimes, especially if we have to get these long uptimes in somewhat artificial situations like not applying updates?

(I mean, sysadmins like boasting about long uptimes, but this is just boasting. And we shouldn't make long uptimes a fetish.)

One answer is certainly 'keeping your system up avoids disrupting users'. Of course there are many other ways to achieve this, such as redundancy and failure-resistant environments. The whole pets versus cattle movement is in part about making single machine uptime unimportant; you achieve your user visible uptime by a resilient environment that can deal with all sorts of failures, instead of heroic (and artificial) efforts to keep single machines from rebooting or single services from restarting.

(Note that not all environments can work this way, although ours may be an extreme case.)

My answer is that long uptimes demonstrate that our systems are fundamentally stable. If you can keep a system up and stable for a long time, you've shown that (in your usage) it doesn't have issues like memory leaks, fragmentation, lurking counter rollover problems, and so on. Even very small issues here can destabilize your system over a span of months or years, so a multi-year uptime is a fairly strong demonstration that you don't have these problems. And this matters because it means that any instability problems in the environment are introduced by us, and that means we can control them and schedule them and so on.

A system that lacks this stability is one where at a minimum you're forced to schedule regular service restarts (or system reboots) in order to avoid unplanned or unpleasant outages when the accumulated slow problems grow too big. At the worst, you have unplanned outages or service/system restarts when the system runs itself into the ground. You can certainly deal with this with things like auto-restarted programs and services, deadman timers to force automated reboots, and so on, but it's less than ideal. We'd like fundamentally stable systems because they provide a strong base to build on top of.

So when I say 'our iSCSI backends have been up for almost two years', what I'm really saying is 'we've clearly managed to build an extremely stable base for our fileserver environment'. And that's a good thing (and not always the case).

sysadmin/LongUptimesImportance written at 23:55:29

How I managed to shoot myself in the foot with my local DNS resolver

I have my home machine's Twitter client configured so that it opens links in my always-running Firefox, and in fact there's a whole complicated lashup of shell scripting surrounding this in an attempt to do the right thing with various sorts of links. For the past little while, clicking on some of those links has often (although not always) been very slow to take effect; I'd click a link and it'd be several seconds before I got my new browser window. In the beginning I wrote this off as just Twitter being slow (which it sometimes is) and didn't think too much about it. Today this got irritating enough that I decided to investigate a bit, so I ran Dave Cheney's httpstat against twitter.com, expecting to see that all the delay was in either connecting to Twitter or in getting content back.

(To be honest, I expected that this was something to do with IPv6, as has happened before. My home IPv6 routing periodically breaks or malfunctions even when my IPv4 routing is fine.)

To my surprise, httpstat reported that it'd spent just over 5000 milliseconds in DNS lookup. So much for blaming anyone else; DNS lookup delays are pretty much all my fault, since I run a local caching resolver. I promptly started looking at my configuration and soon found the problem, which comes in two parts.

First, I had (and have) my /etc/resolv.conf configured with a non-zero ndots setting and several search (sub)domains. This is for good historical reasons, since it lets me do things like 'ssh apps0.cs' instead of having to always specify the long fully qualified name. However, this means that every reasonably short website name, like twitter.com, was being checked to see if it was actually a university host like twitter.com.utoronto.ca. Of course it isn't, but that means that I was querying our DNS servers quite a lot, even for lookups that I conceptually thought of as having nothing to do with the university.
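
(Purely as an illustration of the shape of this, such a resolv.conf looks something like the following; the search domains and the ndots value here are stand-ins rather than my exact configuration:)

search cs.utoronto.ca utoronto.ca
options ndots:2
nameserver 127.0.0.1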

Second, my home Unbound setup is basically a copy of my work Unbound setup, and when I set it up (and copied it) I deliberately configured explicit Unbound stub zones for the university's top level domain that pointed to our nameservers. At work, the intent of this was to be able to resolve in-university hostnames even if our Internet link went down. At home, well, I was copying the work configuration because that was easy and what was the harm in short-cutting lookups this way?
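
(The sort of Unbound configuration I mean looks roughly like this; the IP address here is deliberately a placeholder from the documentation range, not a real university nameserver:)

stub-zone:
        name: "utoronto.ca"
        stub-addr: 192.0.2.10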

In case you are ever tempted to do this, the answer is that you have to be careful to keep your list of stub zone nameservers up to date, and of course I hadn't. As long as my configuration didn't break spectacularly I didn't give it any thought, and it turned out that one of the IP addresses I had listed as a stub-addr server doesn't respond to me at all any more (and some of the others may not have been entirely happy with me). If Unbound decided to send a query for twitter.com.utoronto.ca to that IP, well, it was going to be waiting for a timeout. No wonder I periodically saw odd delays like this (and stalls when I was trying to pull from or check github.com, and so on).

(Twitter makes this much more likely by having an extremely short TTL on their A records, so they fell out of Unbound's cache on a regular basis and had to be re-queried.)

I don't know if short-cut stub zones for the university's forward and reverse DNS are still a sensible configuration for my office workstation's Unbound, but they definitely aren't for home usage. If the university's Internet link is down, well, I'm outside it at home; I'm not reaching any internal servers for either DNS lookups or connections. So I've wound up taking them out of my home configuration and looking utoronto.ca names up just like any other domain.

(This elaborates on a Tweet of mine.)

Sidebar: The situation gets more mysterious

It's possible that this is actually a symptom of more than me just setting up a questionable caching DNS configuration and then failing to maintain and update it. In the process of writing this entry I decided to take another look at various university DNS data, and it turns out that the non-responding IP address I had in my Unbound configuration is listed as an official NS record for various university subdomains (including some that should be well maintained). So it's possible that something in the university's DNS infrastructure has fallen over or become incorrect without having been noticed.

(I wouldn't say that my Unbound DNS configuration was 'right', at least at home, but it does mean that my configuration might have kept working smoothly if not for this broader issue.)

sysadmin/LocalDNSConfigurationFumble written at 02:17:43


ZFS's 'panic on on-disk corruption' behavior is a serious flaw

Here's a Twitter conversation from today:

@aderixon: For a final encore at 4pm today, I used a corrupted zpool to kill an entire Solaris database cluster, node by node. #sysadmin

@thatcks: Putting the 'fail' in 'failover'?

@aderixon: Panic-as-a-service. Srsly, "zpool import" probably shouldn't do that.

@thatcks: Sadly, that's one of the unattractive sides of ZFS. 'Robust recovery from high-level on-disk metadata errors' is not a feature.

@aderixon: Just discovering this from bug reports. There will be pressure to go back to VXVM now. :-(

Let me say this really loudly:

Panicking the system is not an error-recovery strategy.

That ZFS is all too willing to resort to system panics instead of having real error handling or recovery for high level metadata corruption is a significant blemish. Here we see a case where this behavior has had a real impact on a real user, and may cause people to give up on ZFS entirely. They are not necessarily wrong to do so, either, because they've clearly hit a situation where ZFS can seriously damage their availability.

In my jaundiced sysadmin view, OS panics are for temporary situations where the entire system is sufficiently corrupt or unrecoverable that there is no way out. When ZFS panics on things that are recoverable with more work, it's simply being lazy and arrogant. When the issue is with a single pool, ZFS panicking converts a single-pool issue into an entire-server issue, and servers may have multiple pools and all sorts of activities.

Panicking due to on-disk corruption is even worse, as it converts lack of error recovery into permanent unavailability (often for the entire system). A temporary situation at least might clear itself when you panic the system and reboot, as you can hope that a corrupted in-memory data structure will be rebuilt in non-corrupted form when the system comes back up. But a persistent condition like on-disk corruption will never go away just because you reboot the server, so there is very little hope that ZFS's panic has worked around the problem. At the best, it's still lurking there like a landmine waiting to blow your system up later. At the worst, in single server situations you can easily get the system locked into a reboot loop, where it boots and starts an import and panics again. In clustering or failover environments, you can wind up taking the entire cluster down (along with all of its services) as the pool with corruption successively poisons every server that tries to recover it.

Unfortunately none of this is likely to change any time soon, at least in the open source version of ZFS. ZFS has been like this from the start and no one appears to care enough to fund the significant amount of work that would be necessary to fix its error handling.

(It's possible that Oracle will wind up caring enough about this to do the work in a future Solaris version, but I'm dubious even of that. And if they do, it's not like we can use it.)

(I had my own experience with this sort of thing years ago; see this 2008 entry. As far as I can tell, very little has changed in how ZFS reacts to such problems since then.)

solaris/ZFSPanicOnCorruptionFlaw written at 02:08:26


Watch out for web server configurations that 'cross over' between sites

We have a long-standing departmental web server that dates back to the days when it wasn't obvious that the web was going to be a big thing. Naturally, one of the things that it has is old-style user home pages, in the classical old Apache UserDir style using /~<user>/ URLs. Some of these are plain HTML pages in directories, some reverse proxy to user run web servers, and some have suexec CGIs. The same physical server and Apache install also hosts a number of other virtual hosts, some for users and some for us, such as our support site.

Recently we noticed a configuration problem: UserDirs were active on all of the sites hosted by Apache, not just our main site. Well, they were partially active. On all of the other virtual hosts, you only got the bare files for a /~<user>/ URL; CGIs didn't run (instead you got the contents of the CGI file itself) and no reverse proxies were in effect. We had what I'll call a 'crossover' configuration setting, where something that was supposed to apply only to a single virtual host had leaked over into others.

Such crossover configuration leaks are unfortunately not that hard to wind up with in Apache, and I think I've managed to do this in Lighttpd as well. Generally this happens when you set up some configuration item without being careful to explicitly scope it; that's what Ubuntu's /etc/apache2/mods-available/userdir.conf configuration file did (and does), as it has the following settings:

<IfModule mod_userdir.c>
  UserDir public_html
  UserDir disabled root
</IfModule>

(This is actually a tough problem unless configuration files are split apart quite finely. Presumably Ubuntu wants a2enmod userdir to automatically enable userdirs in at least your default site, not simply turn the module on and require you to add explicit UserDir settings to things.)
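
(One way to scope this explicitly is to put the UserDir settings inside the <VirtualHost> block of the one site that should have them, along the lines of this sketch; the site name here is made up:)

<VirtualHost *:80>
  ServerName www.example.org
  <IfModule mod_userdir.c>
    UserDir public_html
    UserDir disabled root
  </IfModule>
  # ... the rest of the site's configuration ...
</VirtualHost>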

In Lighttpd you can get this if you put any number of things outside a carefully set up host-specific stanza:

$HTTP["host"] == "..." {
   # everything had better be here
# oops:
alias.url += ( "/some" => "/fs/thing" )

And it's not like a web server necessarily wants to absolutely forbid global settings. For instance, in my Lighttpd setup I have the following general stanza:

# Everyone shares the same Acme/Let's Encrypt
# challenge area for convenience.
alias.url += ( "/.well-known/acme-challenge/" => "/var/run/acme/acme-challenge/" )

This is quite handy, because it means the tool I use needs no website-specific configuration; regardless of what website name it's trying to verify, it can just stick files in /var/run/acme/acme-challenge. And that makes it trivial to get a LE certificate for another name, which definitely encourages me to do so.

I do wish that web servers at least made it harder to do this sort of 'crossover' global setting by accident. Perhaps web servers should require you to explicitly label configuration settings with their scope, even if it's global. You might still do it, but at least it would be clearer that you're setting something that will affect all sites you serve.

(In the mean time, I guess I have another rainy day project. I have to admit that 'audit all global Apache configuration settings' is not too thrilling or compelling, so it may be quite some time before it gets done. If ever.)

web/ApacheSiteConfigurationCrossover written at 02:10:38


How I've set up SSH keys on my Yubikey 4 (so far)

There are a fair number of guides out on the Internet for how to set up a Yubikey that holds a SSH key for you, like this one. For me, the drawback of these is that they're all focused on doing this through GPG, a full set of PGP keys, and gpg-agent. I don't want any of that. I have no interest in PGP, I'd rather not switch away from ssh-agent to gpg-agent, and I definitely don't want to get a whole set of PGP keys that I have to manage and worry about. I just want the Yubikey to hold a SSH key or two.

Fortunately this turns out to be quite possible and not all that complicated. I wound up finding and using two main references for this, Wikimedia's Yubikey-SSH documentation and then Thomas Habets' Yubikey 4 for SSH with physical presence proof. Also useful is Yubico's official documentation on this. I'm doing things somewhat differently than all of these, and I'm going to go through why I'm making the choices I am.

(I've done all of this on Fedora 24 with a collection of packages installed. You need the Yubico tools and OpenSC; I believe both of those are widely available for at least various Linux flavours. FreeBSD appears to have the necessary Yubico PIV tool in their ports, presumably along with the software you need to talk to the actual Yubikey.)

The first step is to change the default Yubikey PIN, PUK, and management key. You won't be using the PUK and management key very much so you might as well randomly generate them, as Wikimedia advises, but you'll be using the PIN reasonably frequently so you should come up with an 8-character alphanumeric password that you can remember.

# In a shell, I did:
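# key: 24 random bytes as hex, the new management key; puk: 8 random decimal digits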
key=$(dd if=/dev/urandom bs=1 count=24 2>/dev/null | hexdump -v -e '/1 "%02X"')
puk=$(dd if=/dev/urandom bs=1 count=6 2>/dev/null | hexdump -v -e '/1 "%u"'|cut -c1-8)
pin=[come up with one]
# Record all three values somewhere

yubico-piv-tool -a set-mgm-key -n $key
yubico-piv-tool -a change-pin -P 123456 -N $pin
yubico-piv-tool -a change-puk -P 12345678 -N $puk

Changing the management key is probably not absolutely required, because I don't think an attacker can use knowledge of the management key to compromise things. Even increasing the retry counters requires more than just the management key. I may wind up resetting my Yubikey's management key back to the default value for convenience.

(We don't need to do anything with ykpersonalize, because current Yubikeys come from the factory with all their operating modes already turned on.)

Next we'll create two SSH keys, one ordinary one and one that requires you to touch the Yubikey 4's button to approve every use. Both will require an initial PIN entry in order to use them; you'll normally do this when you load them into your ssh-agent.

SSH keypair creation goes like this:

  • Tell the Yubikey to generate the special touch-always-required key.

    yubico-piv-tool -k $key -a generate --pin-policy=once --touch-policy=always -s 9a -o public.pem

    Note that the PIN and touch policies can only be set when the key is generated. If you get them wrong, you get to clear out the slot and start all over again. The default key type on the Yubikey 4 is 2048-bit RSA keys, and I decided that this is good enough for me for SSH purposes.

    A Yubikey 4 has four slots that we can use for SSH keys; these are the PIV standard slots 9a, 9c, 9d, and 9e. In theory the slots have special standardized meanings, but in practice we can mostly ignore that. I chose slot 9a here because that's what the Wikimedia example uses.

    (A Yubikey 4 also has a bunch of additional slots that we can set keys and certificates in, those being 82 through 95. However I've been completely unable to get the Fedora OpenSSH and PKCS#11 infrastructure to interact with them. It would be nice to be able to use these slots for SSH keys and leave the standard slots for their official purposes, but it's not possible right now.)

  • Use our new key to make a self-signed certificate. Because we told the Yubikey to require touch authentication when we use the key, you have to touch the Yubikey during the self-signing process to approve it.

    yubico-piv-tool -a verify-pin -P $pin -a selfsign-certificate -s 9a -S '/CN=touch SSH key/' --valid-days=1000 -i public.pem -o cert.pem

    I don't know if the Yubikey does anything special once the self-signed certificate expires, but I didn't feel like finding out any time soon. SSH keypair rollover is kind of a pain in the rear at the best of times.

    (We don't really need a self-signed certificate, since we only care about the keypair. But apparently making a certificate is required in order to make the public key fully usable for PKCS#11 and OpenSSH stuff.)

  • Load our now-generated self-signed certificate back into the Yubikey.

    yubico-piv-tool -k $key -a import-certificate -s 9a -i cert.pem

  • Finally, we need to get the SSH public key in its normal form and in the process verify that OpenSSH can talk to the Yubikey. The shared library path here is for 64-bit Fedora 24.

    ssh-keygen -D /usr/lib64/opensc-pkcs11.so -e

    This will spit out a 'ssh-rsa ...' line that is the public key in the usual format, suitable for adding to authorized_keys and so on.

    (Also, yes, configuring how to do PKCS#11 things by specifying a shared library is, well, it's something.)

The process for creating and setting up our more ordinary key is almost the same thing. We'll set a different touch policy and we'll extract the SSH public key from the public.pem file instead of using ssh-keygen -e, because ssh-keygen gives you no sign of which key is which once you have more than one key on the Yubikey. We'll use slot 9c for this second key. You could probably use any of the other three slots, but 9c is the slot I happened to have used to test all of this so I know it works.

yubico-piv-tool -k $key -a generate --pin-policy=once --touch-policy=never -s 9c -o public.pem
yubico-piv-tool -a verify-pin -P $pin -a selfsign-certificate -s 9c -S '/CN=basic SSH key/' --valid-days=1000 -i public.pem -o cert.pem
yubico-piv-tool -k $key -a import-certificate -s 9c -i cert.pem

ssh-keygen -i -m PKCS8 -f public.pem

Note that you absolutely do not want to omit the --pin-policy bit here. Otherwise you'll inherit the default PIN policy for this slot and things will wind up going terribly wrong when you try to use this key through ssh-agent.

The ssh-keygen invocation here came from this Stackexchange answer, which also has what you need to extract this information from a full certificate. This is a useful thing to know, because you can retrieve specific certificates from the Yubikey with eg 'yubico-piv-tool -a read-certificate -s SLOT', and you can see slot information with 'yubico-piv-tool -a status' (this includes the CN data we set up above, so it's useful to make it distinct).

With all of this set up, you can now add your Yubikey keys to ssh-agent with:

ssh-add -s /usr/lib64/opensc-pkcs11.so

You'll be prompted for your PIN. After it's accepted, you can use the basic Yubikey SSH key just as you would any other SSH key loaded into ssh-agent. The touch-required key is also used normally, except that you have to remember to touch the Yubikey while it's flashing to get your attention (fortunately the default timeout is rather long).

In an ideal world, everything would now be smooth sailing with ssh-agent. Unfortunately this is not an ideal world. The first problem is that you currently have to remove and re-add the PKCS#11 SSH agent stuff every time you remove and reinsert the Yubikey or purge your ssh-agent keys. More significantly, various other things can also break ssh-agent's connection to the Yubikey, forcing you to go through the same thing. One of these things is using yubico-piv-tool to do anything with the Yubikey, even getting its status. So if you do a SSH thing and it reports:

sign_and_send_pubkey: signing failed: agent refused operation

What this means is 'remove and re-add the PKCS#11 stuff again'. Some of the time, doing a SSH operation that requires your PIN such as:

ssh -I /usr/lib64/opensc-pkcs11.so <somewhere that needs it>

will reset things without the whole rigmarole.

You don't have to use the Yubikey keys through ssh-agent, of course; you can use them directly with either ssh -I /usr/lib64/opensc-pkcs11.so or by setting PKCS11Provider /usr/lib64/opensc-pkcs11.so in your .ssh/config (perhaps only for specific hosts or domains). However the drawback of this is that you'll be challenged for your Yubikey PIN every time you use a Yubikey-hosted SSH key (this happens regardless of what the setting of --pin-policy is). Using an agent means that you're only challenged once every so often. Of course in some circumstances, being challenged for your PIN on every use may be a feature.

(I have a theory about what's going on and going wrong in OpenSC, but it's for another entry. Ssh-agent has its own bugs here too, and it's possible that using gpg-agent instead would make things significantly nicer here. I have no personal experience with using gpg-agent as a SSH agent and not much interest in experimenting right now.)

While I haven't tested more than two SSH keys, I believe you could fill up all four slots with four different SSH keys just in case you wanted to segment things that much. Note that in general there's no simple way to tell which specific SSH key you're being requested to authorize; all keys share the same PIN and if you have more than one key set to require touch, you can't tell which key you're doing touch-to-approve for.

(Also, as far as I know the PKCS#11 stuff will make all keys available whenever it's used, including for ssh-agent. You can control which keys will be offered to what hosts by using IdentitiesOnly, but that's a limit imposed purely by the SSH client itself. If you absolutely want to strongly control use of certain keys while others are a bit more casual, you probably need multiple Yubikeys.)

Sidebar: working with PKCS#11 keys and IdentitiesOnly

The ssh_config manpage is very specific here: if you set IdentitiesOnly, keys from ssh-agent and even keys that come from an explicit PKCS11Provider directive will be ignored unless you have an IdentityFile directive for them. Which normally you can't have, because the Yubikey won't give you the private key. Fortunately there is a way around this; you can use IdentityFile with only the public key file. This is a rare case where doing this makes perfect sense and is the only way to get what you want if you want to combine Yubikey-hosted keys with selective identity offering.
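
(A minimal sketch of what this looks like in ~/.ssh/config; the host name and public key file name here are made up:)

Host important.example.org
    IdentitiesOnly yes
    PKCS11Provider /usr/lib64/opensc-pkcs11.so
    # the public key as saved from 'ssh-keygen -D ...' or 'ssh-keygen -i -m PKCS8 ...';
    # the private key itself never leaves the Yubikey
    IdentityFile ~/.ssh/yubikey-basic.pub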

sysadmin/Yubikey4ForSSHKeys written at 01:58:15


I have yet to start using any smartphone two-factor authentication

Now that I have a smartphone, in theory I could start using two-factor authentication to improve my security. In practice I have yet to set up my phone for this for anything (although I did download an app for it). There turn out to be several reasons for this.

First, the whole area is fairly confusing and partly populated by people that I don't really trust (hi, Google). Perhaps I am looking in the wrong places, but when I went looking at least the first time around there was a paucity of documentation on what is actually going on in the whole process, how it worked, what to expect, and so on. What I could find was mostly glossy copy and 'then some magic happens'. I'm a sysadmin; I don't like magic.

(The confusing clutter of apps didn't help things either, although I suspect that people who know what they're doing here have an easier time cutting through the marketing copy everyone has.)

Then, well, it's early days with my smartphone and I'm nervous about really committing to it for something as crucial as authentication. Pretty much everything I've read on 2FA contains scary warnings about what happens if your phone evaporates; at the least it's a big hassle. Switching on 2FA this early feels alarmingly like jumping into the deep end. Certainly it doesn't seem like something to do casually or simply as an experiment.

(Probably there's a good way to play around with 2FA to just try it out, but I have no idea what it would be. Scratch accounts on various services? Right now I'd have to commit to 2FA on something just to find out how the apps look and work. I suspect that other people have a background clutter of less important accounts that they can use to experiment with stuff like this.)

Finally is the big, blunt issue for me: I just don't have very many accounts out there (especially on websites) that I both feel strongly about and that I'm willing to make harder to use by adding 2FA authentication. Most of my accounts are casual things, even on big-ticket sites like Facebook, and on potentially somewhat more important sites like Github I'm not very enthused about throwing roadblocks in the way of, say, pushing commits up to my public repos.

(Part of this is that I'm usually not logged in to places. And obviously things would be quite different if I worked with any important Github repos.)

All of this feels vaguely embarrassing, since after all I'm supposed to care about security and I now have this marvelous possibility for completely free two-factor authentication, yet I'm not taking advantage of it. But I've already established that I have limits on how much I care about security.

tech/TwoFactorPhoneDisuse written at 02:25:04


How and why the new iptables -w option is such a terrible fumble

I wrote recently about the relatively new -w option for iptables and how it will make things blow up. Unfortunately for Linux sysadmins everywhere, exactly how the iptables people introduced this option is a case study in how not to make changes like this; it is essentially backwards from what you want to do. They could probably have made the situation worse than it is now, but it would take some ingenuity.

Perhaps it is not obvious why iptables -w is so terrible (I mean, clearly it wasn't obvious to the iptables developers). To start seeing where they went so wrong, let's ask a simple question: how do you write a script (or a program) that will run on both a system without this change and a system with it?

You can't just use -w on all your iptables commands, because the old version of iptables doesn't support the option; if you add it blindly, every command will fail. You can't not use -w on systems that support it, because omitting -w will make random iptables commands that you're running fail under some circumstances (as we've seen); in practice -w is a mandatory iptables option on systems that support it unless you have a relatively unusual system.

So the answer is 'you must probe for whether or not -w is supported on this version of iptables'. Which cuts to the root of the problem:

Introducing -w this way created a flag day for all uses of iptables.

Before the flag day, you could not use -w. After the flag day, you must use -w. Or at least, you must use -w if you want your iptables commands to be reliable all the time under all circumstances, including odd ones.
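
(In practice 'probing' means something like the following at the start of your scripts; this is a minimal sketch, and it assumes the script runs as root the way iptables-manipulating scripts do:)

# does this iptables support -w? probe once and remember the answer
if iptables -w -L -n >/dev/null 2>&1; then
    IPT="iptables -w"
else
    IPT="iptables"
fi
# then use $IPT instead of plain iptables everywhere, eg:
$IPT -A INPUT -s 192.0.2.1 -j DROP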

That's the next failing: the flag day introduction of -w created a situation where most or all uses of plain iptables on modern systems are subtly buggy and dangerous. They aren't obviously broken so that they fail all or most of the time; instead they now have a race condition. Race conditions are hard to run into (or find deliberately) and hard to diagnose, making them one of the most pernicious classes of bugs. We can see that this is the case because there are still buggy uses of iptables on Fedora.

The final failing is that the iptables developers made this use a single global lock. This maximizes the chance that iptables commands will collide with each other, even if they happen to be doing two completely unrelated things that would not interfere with each other in the least. Are you setting up IPv6 blocks in parallel with querying IPv4 ones? Tough luck, iptables will save you from yourself by making things fail.

All of this is a completely unforced set of errors on the part of the iptables developers. Faced with the underlying bug that two simultaneous iptables commands could interfere with each other in some situations, they could have solved the issue by serializing all iptables commands by default (ie, the equivalent of '-w'). This would have solved the problem without breaking all current uses of plain iptables. People who wanted their commands to fail instead of wait could have had a new 'fail immediately' option.

(I've written before about the related issue of how to deprecate things. Arguably this actually is the same issue, since in practice the iptables developers have deprecated use of iptables without -w.)

Sidebar: A bonus additional issue (fortunately rare)

If you happen to be running multiple iptables commands in parallel with -w and one stream of them is sufficiently unlucky that it waits for long enough, it will print to standard error a message like this:

Another app is currently holding the xtables lock; waiting for it to exit...

(The iptables developers have varied this message repeatedly as they've fiddled with various micro-issues around the implementation of locking, so different versions of different distributions will have somewhat different messages.)

This is not quite the total failure that printing new warning messages by default is, since you have to give a new command line option to produce this behavior. Still, it's not very helpful and of course it's not documented and it's generally hard to hit this, so you can easily write programs that don't expect this and will blow up in various ways if it ever happens.

linux/IptablesWOptionFumbles written at 23:03:07

The modern web is an unpredictable and strange place to develop for

Our local support site used to be not all that attractive and also not entirely well organized. Ultimately these descended from the same root cause; that iteration of the site started out life as a wiki (with a default wiki skin), then was converted to plain HTML via brute force when the wiki blew up in our faces. Recently we replaced it with a much nicer version that has a much more streamlined modern design.

As part of that more modern design, it has a menubar along the top with drop-down (sub-)menus for many of the primary options. These drop-downs are done via CSS, specifically with a :hover based style. When I tried the new site out on my desktop it all looked great, but as I was playing around with it a dim light went off in the back of my mind; I had a memory that hover events aren't supported on touch-based systems, for obvious reasons. So I went off to my work iPad Mini (then running iOS 9), fired up Safari, and lo and behold those nice drop-down menus were completely non-functional. You couldn't see them at all; if you touched the primary menu, Safari followed the link. We hacked around with a few options but decided to take the simple approach of ensuring that all of the sub-menu links were accessible off the target page of the primary menu.

So far this was exactly what we'd expected. Then one of my co-workers reported that this didn't happen on her iPhone, and it emerged that she used the iOS version of Chrome instead of Safari. I promptly installed that on my iPad Mini, and indeed I saw the same Chrome behavior she did; touching or tapping the primary menu didn't follow the link, it caused the dropdown to appear. Well, okay, that wasn't too strange and it sort of made sense that different browsers might do things slightly differently here (perhaps even deliberately).

(Note that this is slightly weird on iOS because on iOS all browsers use the same underlying engine. So Safari and Chrome are using the same core engine here but are making it behave somewhat differently. The Brave browser has Chrome's behavior.)

Now things get weird. I recently got a latest-generation iPhone; naturally I wound up browsing our (new) support site in it, on Safari, and I tapped one of those primary menus. Imagine my surprise when I got a drop-down submenu instead of having Safari follow the primary menu link. I went back to the iPad Mini, made sure it was updated to the same version of iOS, and tried again. And the behavior wasn't the same as on the iPhone. On the iPad Mini, touching the primary menu followed the link. On the iPhone, touching the primary menu dropped down the sub-menu.

(On the iPhone, I can double-tap the primary menu to follow the link.)

What I took away from this experience is that developing for the modern web is stranger and more unpredictable than I can imagine. I would have never guessed that two iOS devices, running the same iOS version and using the same system browser, would have two different behaviors.

(One implication is that testing things on an iPad Mini is not a complete standin for testing things on an iPhone and vice-versa. This is unfortunate; if nothing else, it makes testing more time-consuming.)

web/ModernWebIOSDifference written at 01:03:13


What I think I want out of CPU performance in a new PC

If I'm going to consider building a new home PC (as I sort of am), one of the big questions is what sort of CPU I should build it around. There are a dizzying array of tradeoffs here, and you have to make some of them since features like ECC are only available in some models. In other words, you need to decide what you care about. ECC? Many cores? Advanced virtualization support? Overclocking capabilities? Thermal profile? And so on.

After thinking about it for a while, I think that I have two priorities, and one of them has wound up somewhat surprising me. The unsurprising priority is thermal efficiency and in particular a limit on how much heat I'm willing to have my CPU generate. I want a reasonably quiet PC and that means not generating too much heat, so a midrange TDP is my target (or at least my maximum). Looking at Intel CPUs, I'm probably aiming for about a 65 watt TDP.

The surprising priority for me is single-threaded CPU performance. I've traditionally had the view that CPU performance is not a big issue, especially single-threaded performance, but I've come around to the idea that this is actually wrong. Yes, I do things like compile Firefox with multiple cores, and I do try to use multiprocessing in general, but when it comes right down to it and I'm drumming my fingers waiting for my machine to get around to it, a fairly large amount of what I care about is actually single-threaded and going to stay that way. Firefox or Chrome interpreting the Javascript for a single website? That's single-threaded. Editor recalculations? Generally single-threaded. Video decoding? Single-threaded. And even for my photo processor, I believe that single-threaded CPU performance is reasonably important to make it responsive during editing.

Or the short version: I want to live in a glorious multi-processing world, but I don't think that I do right now. And I don't really think that that's going to fundamentally change over the next, oh, five years. Parallelizing CPU-heavy things is hard and there is only so much parallelism that can be extracted from some tasks; for example, I suspect that JavaScript performance for a single site is probably always going to be heavily biased towards single core performance.

(I do still care about multiple cores and multicore performance, and maybe programs will prove that I'm being too pessimistic here.)

Unfortunately, a focus on single-core performance with a midrange TDP probably means that I don't get nice things like ECC, since my impression is that one must give up CPU performance to get ECC if you want to stay with that midrange TDP.

tech/PCCPUPerformanceWants written at 01:10:57
