Wandering Thoughts archives


A belated realization about 'TLS suicide' and user CGIs et al

As part of my general 'web infosec' reading habit, I recently wound up going through Scott Helme's Using security features to do bad things (via). This discusses a number of ways to use HSTS and HPKP for evil, both for sniffing out what sites people have visited and for damaging sites that you've compromised. It's neat work and I like keeping up on this sort of stuff in general, but initially I didn't think it had any particular relevance to us. Then a little light went on in my mind: user CGI scripts can add HTTP headers to their responses, as can the user-run web servers we use to solve the multiuser PHP problem.

We have innocently, ignorantly, and accidentally given everyone on our primary web server the ability to inflict a certain amount of what I'll call 'TLS suicide' on us. With no work at all they can use HSTS to force all future access to any part of our web server to be over TLS for some time (which isn't too big a problem, as we're not likely to drop TLS on the server). With work they can probably inflict some degree of HPKP suicide on us, although this mostly isn't something they could do by accident.

(There doesn't even have to be any malign intent, just people's ignorance or default software configurations. I can easily see someone simply following directions on how to increase the security of their site, directions that include 'add HSTS headers', and not realizing that this affects our entire site instead of just their URLs. HPKP would be harder to do this way, but it might be possible; I'm sure there are going to be canned directions on how to set up HPKP for a site that uses Let's Encrypt certificates.)
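
To make this concrete, canned 'secure your site' directions typically tell people to add something like the following to their .htaccess (the max-age value here is a common suggestion in such directions, not anything from our configuration):

Header set Strict-Transport-Security "max-age=31536000; includeSubDomains"

Because HSTS applies to the whole host regardless of the URL path it was served from, a user setting this on their own pages would pin every browser that saw it to HTTPS for our entire web server for a year.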

Fortunately I think the fix is simple; we just need to specifically configure Apache to strip out any (user-added) Strict-Transport-Security, Public-Key-Pins, or Public-Key-Pins-Report-Only headers in responses. The standard Apache mod_headers module will do this for you if configured appropriately, and I think the appropriate configuration is just:

Header unset Strict-Transport-Security
Header unset Public-Key-Pins
Header unset Public-Key-Pins-Report-Only

This prevents accidents, but the bad news is that people can add their own Header directives if you allow FileInfo overrides for .htaccess files. Unfortunately a ton of important Apache options are all under FileInfo; if you turn off allowing FileInfo in .htaccess, you disable things like Redirect and RewriteRule. Not to mention that there are entirely legitimate reasons to add headers in .htaccess files. Based on carefully reading the Apache documentation on configuration sections, I think we can do what we want here by putting these Header directives inside a <Location> directive, because that will be applied last. In order to be sure, I'm going to have to test this (carefully and probably on a test server, just in case).
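
Assuming I'm reading the merging rules correctly, the server-level configuration might look like the following. The 'always' variants are there as well because headers set with 'Header always set' live in a separate table that a plain 'Header unset' doesn't touch:

<Location "/">
    Header unset Strict-Transport-Security
    Header always unset Strict-Transport-Security
    Header unset Public-Key-Pins
    Header always unset Public-Key-Pins
    Header unset Public-Key-Pins-Report-Only
    Header always unset Public-Key-Pins-Report-Only
</Location>

Since <Location> sections are merged after .htaccess files, these unsets should win over any Header directives that people put in their .htaccess.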

(I suppose I can test the behavior here using a harmless X-something header instead of one of the dangerous TLS ones.)
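
Sketched out, such a test might look like this (the X- header name is made up, and I haven't actually verified this behavior yet):

# in a user's .htaccess:
Header set X-Suicide-Test "from-htaccess"

# in the main server configuration:
<Location "/">
    Header unset X-Suicide-Test
</Location>

If X-Suicide-Test no longer shows up in responses for the user's URLs, the same approach should strip the real TLS headers.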

web/TLSSuicideAndUserCGIs written at 22:37:42

An interesting case of NFS traffic (probably) holding a ZFS snapshot busy

We have a few filesystems on our fileservers that are considered sufficiently important that we take hourly snapshots during the working day. We use a simple naming and expiry scheme for these snapshots, where they're called <Day>-<Hour> (eg Tue-15) and the script simply deletes any old version before creating the new one. Both because it's the default and because it enables self-serve restores, we NFS-export the ZFS snapshots as well as the main filesystem. Recently that script threw up an error:

cannot destroy snapshot POOL/h/NNN@Mon-16: dataset is busy
cannot create snapshot 'POOL/h/NNN@Mon-16': dataset already exists
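
For context, the rotation scheme the script implements can be sketched roughly as follows; the dataset name is a placeholder and the real script differs in its details:

```shell
#!/bin/sh
# Sketch of the <Day>-<Hour> snapshot rotation described above
# (illustrative only; this is not our actual script).
zfs() { echo "would run: zfs $*"; }  # stub so this sketch is harmless to run

fs="POOL/h/NNN"            # placeholder dataset name
snap="$fs@$(date +%a-%H)"  # eg POOL/h/NNN@Tue-15

# Delete last week's snapshot with this name, then take a new one.
# The 'zfs destroy' step is what reported 'dataset is busy'.
zfs destroy "$snap"
zfs snapshot "$snap"
```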

We believe that this ultimately happened because an hour or two beforehand, a runaway IMAP process was traversing its way through that ZFS snapshot via the NFS export. The runaway IMAP process had been terminated well before this, but that might not have mattered; an NFS server doesn't know when an NFS client is done with the filehandles it has requested, so the server has to guess, and it may well guess conservatively (saying, for example, 'if I still have them in my server side cache, they're not old enough yet').

This was several weeks ago and the snapshot in question was quietly recycled a week later without any problems, so this did go away after a while. I can't even definitely say that past NFS activity in the snapshot was the problem; we haven't tried to reproduce it, and unfortunately as far as I know OmniOS lacks tools to give us visibility into this sort of thing (fuser reported nothing for the snapshot, for example, which is not surprising; there was no user-level activity on the fileserver that involved the snapshot).

This instance wasn't urgent and went away on its own. I'm not sure what we'd do if that weren't the case, because I don't know if there are any good ways of pushing the kernel to give up things like old(er) NFS filehandles. Shutting down NFS service or rebooting the fileserver would probably do it, but both are rather drastic steps.

(It may be possible to write some DTrace to give us more information about why a dataset is still busy. Or, since DTrace is not always the answer to everything, possibly mdb can give us results too.)

solaris/ZFSSnapshotsNFSBusyProblem written at 00:33:48
