Wandering Thoughts archives

2014-04-07

Giving in: pragmatic If-Modified-Since handling for Tiny Tiny RSS

I wrote yesterday about how Tiny Tiny RSS drastically mishandles generating If-Modified-Since headers for conditional GETs, but I didn't say anything about what my response to it is. DWiki insists on strict equality checking between If-Modified-Since and the Last-Modified timestamp (for good reasons), so Tiny Tiny RSS was basically doing unconditional GETs all the time.

I could have left the situation like that, and I actually considered it. Given the conditional GET irony I was never saving any CPU time on successful conditional GETs, only bandwidth, and I'm not particularly bandwidth constrained (either here or potentially elsewhere; 'small' bandwidth allocations on VPSes seem to be in the multiple TBs a month range by now). On the other hand, these requests were using up quite a lot of bandwidth because my feeds are big and Tiny Tiny RSS is quite popular, and that unnecessary bandwidth usage irritated me.

(Most of the bandwidth that Wandering Thoughts normally uses is in feed requests, eg today 87% of the bandwidth was for feeds.)

So I decided to give in and be pragmatic. Tiny Tiny RSS expects you to be doing timestamp comparisons for If-Modified-Since, so I added a very special hack that does just that if and only if the user agent claims to be some version of Tiny Tiny RSS (and various other conditions apply, such as no If-None-Match header being supplied). Looking at my logs this appears to have roughly halved the bandwidth usage for serving feeds, so I'm calling it worth it at least for now.
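To make the shape of this hack concrete, here is a minimal sketch of the decision in Python. It is not DWiki's actual code; the function name, the plain dict of request headers, and the "Tiny Tiny RSS" user agent substring are all assumptions made for illustration, and it assumes the ETag header in question is If-None-Match.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(headers, last_modified):
    """Decide whether to answer with 304 Not Modified.

    headers: dict of request headers (hypothetical shape).
    last_modified: the exact Last-Modified string we would send.
    """
    ims = headers.get("If-Modified-Since")
    if ims is None:
        return False
    # The normal rule: the client must echo our Last-Modified exactly.
    if ims == last_modified:
        return True
    # The special case: only for things claiming to be Tiny Tiny RSS
    # (assumed user agent substring), and only without an ETag check.
    if "Tiny Tiny RSS" not in headers.get("User-Agent", ""):
        return False
    if "If-None-Match" in headers:
        return False
    try:
        # Fall back to a timestamp comparison for this one client.
        return parsedate_to_datetime(ims) >= parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        # Unparseable or incomparable dates never get a 304.
        return False
```

Everything that is not Tiny Tiny RSS still gets strict equality checking; the timestamp comparison is deliberately the last resort.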

I don't like putting hacks like this into my code (and it doesn't fully solve Tiny Tiny RSS's problems with over-fetching feeds either), but I'm probably going to keep it. The modern web is a world full of pragmatic tradeoffs and is notably lacking in high-minded purity of implementation.

web/MyIfModifiedSinceHack written at 01:06:39

2014-04-06

How not to generate If-Modified-Since headers for conditional GETs

Recently I looked through my syndication feed stats (as I periodically do) and noticed that the Tiny Tiny RSS program was both responsible for quite a lot of feed fetching and also didn't seem to ever be successfully doing conditional GETs. Most things in this situation aren't even attempting conditional GETs, but investigation showed that Tiny Tiny RSS was consistently sending an If-Modified-Since header with times that were generally just a bit after the actual Last-Modified timestamp of the syndication feed. For good reasons I require strict equality of If-Modified-Since values, so this ensured that Tiny Tiny RSS never made a successful conditional GET.

Since I was curious, I got a copy of the current Tiny Tiny RSS code and dug into it to see where this weird If-Modified-Since value was coming from and if there was anything I could do about it. The answer was worse than I was expecting; it turns out that the I-M-S timestamp that Tiny Tiny RSS sends has absolutely nothing to do with the Last-Modified value that I sent it. Where it comes from is that whenever Tiny Tiny RSS adds a new entry from a feed to its database it records the (local) time at which it did this, then the most recent such entry timestamp becomes the If-Modified-Since value that Tiny Tiny RSS sends during feed requests.

(You can see this in update_rss_feed in include/rssfuncs.php in the TT RSS source. Technically the time recorded for new entries is when TT RSS started processing the updated feed, not the moment it added the database record for a new entry.)

This is an absolutely terrible scheme, almost as bad as simply generating random timestamps. There is a cascade of things that can go wrong with it:

  • It implicitly assumes that the clocks on the server and the client are in sync, since If-Modified-Since must be in the server's time yet the timestamp is generated from client time.

  • Tiny Tiny RSS loses if a feed publishes a new entry, TT RSS pulls the feed, and then the feed publishes a second entry before TT RSS finishes processing the first new entry. TT RSS's 'entry added' timestamp and thus the If-Modified-Since timestamp will be after the revised feed's date, so the server will 304 further requests. TT RSS will only pick up the second entry when a third entry is published or the feed is otherwise modified so that its Last-Modified date moves forward enough.

  • If the feed deletes or modifies an entry and properly updates its overall Last-Modified timestamp as a result of this, Tiny Tiny RSS will issue what are effectively unconditional GETs until the feed publishes a completely new entry (since the last time that TT RSS saw a new entry will be before the feed's new Last-Modified time).

There are probably other flaws that I'm not thinking of.

(I don't think it's a specification violation to send an If-Modified-Since header if you never got a Last-Modified header, but if it is that's another flaw in this scheme, since Tiny Tiny RSS will totally do that.)

This scheme's sole virtue is that on a server which uses timestamp comparisons for If-Modified-Since (instead of equality checks) it will sometimes succeed in getting 304 Not Modified responses. Some of these responses will even be correct and when they aren't really correct, it's not the server's fault.
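For contrast, here is a minimal sketch of the approach that avoids all of these problems: the client saves the server's Last-Modified value verbatim and sends exactly that string back as If-Modified-Since. This is not Tiny Tiny RSS code; the function names and the plain dict used as a cache are hypothetical.

```python
def conditional_headers(cache, url):
    """Build the extra request headers for a feed fetch.

    cache maps feed URL -> the Last-Modified string the server last sent.
    """
    saved = cache.get(url)
    if saved is None:
        # We never got a Last-Modified, so we send no If-Modified-Since.
        return {}
    return {"If-Modified-Since": saved}


def remember_response(cache, url, status, response_headers):
    """Update the cache after a fetch; a 304 leaves the saved value alone."""
    if status == 200:
        last_modified = response_headers.get("Last-Modified")
        if last_modified is None:
            cache.pop(url, None)
        else:
            # Store the server's string untouched; no local clock involved.
            cache[url] = last_modified
```

Because the value is an opaque string supplied by the server, this works with both equality checks and timestamp comparisons, and clock skew between client and server stops mattering.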

web/IfModifiedSinceHowNot written at 02:19:46

2014-04-05

An important additional step when shifting software RAID mirrors around

After going through all of the steps from yesterday's entry to move my mirrors from one disk to another, I inadvertently discovered a vital additional step you need to take here. The additional step is:

  • After you've taken the old disk out of the mirror and shrunk the mirror (steps 4 and 5), either destroy the old disk's RAID superblock or physically remove the disk from your system. I believe that RAID superblocks can be destroyed with the following (where /dev/sdb7 is the old disk):
    mdadm --zero-superblock /dev/sdb7

Failure to do this may cause your system to malfunction either subtly or spectacularly on boot (malfunctioning spectacularly is best because that ensures you notice it). The culprit here is how a modern Linux system assembles RAID arrays on boot. Put simply, there is nothing that forces all of your RAID arrays to be assembled using your current mirrors instead of the obsolete mirrors on your old disk. Instead it seems to come down to which device is processed first. If a partition on your old disk is processed first, it wins the race and becomes the sole member of the RAID array (which may then fail to activate because it doesn't have the full device set). If you're lucky your system now refuses to boot; if you're unlucky, your system boots but with obsolete and unmirrored filesystems and anything important written to them will cause you a great deal of heartburn as you try to sort out the resulting mess.

(Linux software RAID appears to be at least smart enough to know that your two current mirror devices and the old disk are not compatible and so doesn't glue them all together. I don't know what GRUB's software RAID code does here if your boot partition is on a software RAID mirror that has had this happen to it.)

This points out core architectural flaws in both the asynchronous assembly process and the approach of removing obsolete devices by failing them first. If mdadm had a 'remove active device' operation, it could at least somehow mark the removed device's superblock as 'do not use to auto-assemble array, this device has been explicitly removed'. If the assembly process was not asynchronous the way it is, it could see that some mirror devices were more recent than others and prefer them. But sadly, well, no.

(In theory a not yet activated software RAID array could be revised to kick out the out of date device and replace it with the newer device (although there are policy issues involved). This can't be done at all once the array has been activated, or rather while the array is active.)

linux/SoftwareRaidShiftingMirrorII written at 02:05:37

2014-04-03

Shifting a software RAID mirror from disk to disk in modern Linux

Suppose that you have a software RAID mirror and you want to migrate one side of the mirror from one disk to another to replace the old disk. The straightforward way is to remove the old disk, put in the new disk, and resync the mirror. However this leaves you without a mirror at all for the duration of the resync so if you can get all three disks online at once what you'd like to do is add the new disk as a third mirror and then remove the old disk later. Modern Linux makes this a little bit complicated.

The core complication is that your software RAID devices know how many active mirrors they are supposed to have. If you add a device beyond that, it becomes a hot spare instead of being an active mirror. To activate it as a mirror you must add it then grow the number of active devices in the mirror. Then to properly deactivate the old disk you need to do the reverse.

Here are the actual commands (for my future use if nothing else):

  1. Hot-add the new device:
    mdadm -a /dev/md17 /dev/sdd7

    If you look at /proc/mdstat afterwards you'll see it marked as a spare.

  2. 'Grow' the number of active devices in the mirror:
    mdadm -G -n 3 /dev/md17

  3. Wait for the mirror to resync. You may want to run the new disk in parallel with the old disk for a few days to make sure that all is well with it; this is fine. You may want to be wary about reboots during this time.

  4. Take the old disk out by first manually failing it and then actually removing it:
    mdadm --fail /dev/md17 /dev/sdb7
    mdadm -r /dev/md17 /dev/sdb7

  5. Finally, shrink the number of active devices in the mirror down to two again:
    mdadm -G -n 2 /dev/md17

You really do want to explicitly shrink the number of active devices in the mirror. A mismatch between the number of actual devices and the number of expected devices can have various undesirable consequences. If a significant amount of time passed between steps three and four, make sure that your mdadm.conf still has the correct number of devices configured in it for all of the arrays (ie, two).
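If you do this often enough to want it written down in executable form, the two halves of the procedure can be sketched as a small Python helper that only builds the command lines from the steps above and does not run them. The function names are hypothetical; the mdadm invocations are exactly the ones listed in the steps.

```python
def add_mirror_commands(array, new_dev, mirrors_after):
    """Steps 1 and 2: hot-add the new device, then grow the mirror count."""
    return [
        ["mdadm", "-a", array, new_dev],
        ["mdadm", "-G", "-n", str(mirrors_after), array],
    ]


def remove_mirror_commands(array, old_dev, mirrors_after):
    """Steps 4 and 5: fail and remove the old device, then shrink the count."""
    return [
        ["mdadm", "--fail", array, old_dev],
        ["mdadm", "-r", array, old_dev],
        ["mdadm", "-G", "-n", str(mirrors_after), array],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing anything.
    for cmd in add_mirror_commands("/dev/md17", "/dev/sdd7", 3):
        print(" ".join(cmd))
    print("# ... wait for the resync to finish (step 3) ...")
    for cmd in remove_mirror_commands("/dev/md17", "/dev/sdb7", 2):
        print(" ".join(cmd))
```

Keeping the two halves as separate functions reflects the fact that step 3 (waiting for the resync, possibly for days) sits between them and is not something to automate blindly.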

Unfortunately marking the old disk as failed will likely get you warning email from mdadm's status monitoring about a failed device. This is the drawback of mdadm not having a way to directly do 'remove an active device' as a single action. I can understand why mdadm doesn't have an operation for this, but it's still a bit annoying.

(Looking at this old entry makes it clear that I've run into the need to grow and shrink the number of active mirror devices before, but apparently I didn't consider it noteworthy at that point.)

linux/SoftwareRaidShiftingMirror written at 19:51:05

The scariness of uncertainty

One of the issues that I'm facing right now (and have been for a while) is that being uncertain can be a daunting thing. As sysadmins we deal with uncertainty all of the time, of course, and if we were paralyzed by it in general we'd never get anywhere. It's usually easy enough to overcome uncertainty and move forward in small situations or important situations (for various reasons). Where uncertainty can dig in is in dauntingly big and complex projects that are not essential. If you don't have to have whatever and building anything is clearly a lot of work for an uncertain reward, it's very easy to defer and defer action in favour of various stalling measures (or other work).

All of this sounds rather hand waving, so let me tell you about my project with gathering OS level performance statistics. Or rather my non-project.

If you look around, there are a lot of options for gathering, aggregating, and graphing OS performance stats (in tools, full systems, and ecologies of tools). Beyond a certain basic level it's unclear which ones of them are going to work best for us and which ones will be crawling failures, but at the same time it's also clear that any of them that look good are going to take a significant amount of work and time to set up and try out (and I'm going to have to try them in production).

As a result I have been circling around this project for literally years now. Every so often I poke and prod at the issue; I read more about some tool or another, I look at pretty pictures, I hear about something new, and so on and so forth. But I've never sat down to really do something. I've always found higher priority things to do or other excuses.

(Here in the academy this behavior in graduate students is well known and gets called 'thesis avoidance'.)

The scariness of uncertainty is not the only reason for this, of course, but it's a significant contributing factor. In a way it raises the stakes for making a choice.

(The uncertainty comes from two directions. One is simply trying to select which system to use; the other is whether or not the whole idea is going to be worthwhile. The latter is a bit stupid since we're probably not going to be left with a white elephant of a system that we ignore and then quietly abandon, but the possibility gnaws at me and feeds other uncertainties and doubts.)

I don't have any answers, but maybe writing this entry has made it more likely that I do something here. And maybe I should embrace the possibility of failure as a sign that I am finally taking enough risk.

(I feel divided about that idea but I need to think about it more and then write another entry on it.)

sysadmin/UncertaintyScariness written at 00:34:47

2014-04-02

I'm angry that ZFS still doesn't have an API

Yesterday I wrote a calm rational explanation for why I'm not building tools around 'zpool status' any more and said that it ended up being only half of the story. The other half is that I am genuinely angry that ZFS still does not have any semblance of an API, so angry that I've decided to stop cooperating with ZFS's non-API and make my own.

(It's not the hot anger of swearing, it's the slow anger of a blister that keeps reminding you about its existence with every step you take.)

For at least the past six years it has been blindingly obvious that ZFS should have an API so that people could build additional tools and solutions on top of it. For all that is sane, stock ZFS doesn't even have an alerting solution for pool problems. You can't miss that unless you're blind and say whatever you want about the ZFS developers, I'm sure that they're not blind. I am and have been completely agnostic about the exact format that this API could have taken, so long as it existed. Stable, documented, script-friendly output from ZFS tools? A documented C level library API? XML information dumps because everyone loves XML? A web API? Whatever. I could have worked with any of them.

Instead we got nothing. We got nothing when ZFS was with Sun and despite some vague signs of care we continue to get exactly nothing now that ZFS is effectively with Illumos (and I'm pretty sure that Oracle hasn't fixed the situation either). At this point it is clear that the ZFS developers have different priorities and in an objective sense do not care about this issue.

(Regardless of what you say, what you actually care about is shown by what you work on.)

This situation has thoroughly gotten under my skin now that moving to OmniOS is rubbing my nose in it again. So now I'm through with tacitly cooperating with it by trying to wrestle and wrangle the ZFS commands to do what I want. Instead I feel like giving 'zpool status' and its friends a great big middle finger and then throwing them down a well. The only thing I want to use them for now is as a relatively authoritative source of truth if I suspect that something is wrong with what my own tools are showing me.

(I call zpool status et al 'relatively authoritative' because it and other similar commands leave things out and otherwise mangle what you are seeing, sometimes in ways that cause real problems.)

I will skip theories about why the ZFS developers did not develop an API (either in Sun or later), partly because I am in a bad mood after writing this and so am inclined to be extremely cynical.

solaris/ZFSNoAPIAnger written at 00:12:03

2014-03-31

I'm done with building tools around 'zpool status' output

Back when our fileserver environment was young, I built a number of local tools and scripts that relied on 'zpool status' to get information about pools, pool states, and so on. The problem with using 'zpool status' is of course that it is not an API, it's something intended for presentation to users, and so as a result people feel free to change its output from time to time. At the time using zpool's output seemed like the best option despite this, or more exactly the best (or easiest) of a bad lot of options.

Well, I'm done with that.

We're in the process of migrating to OmniOS. As I've had to touch scripts and programs to update them for OmniOS's changes in the output of 'zpool status', I've instead been migrating them away from using zpool at all in favour of having them rely on a local ZFS status reporting tool. This migration isn't complete (some tools haven't needed changes yet and I'm letting them be), but it's already simplified my life in various ways.

One of those ways is that now we control the tools. We can guarantee stable output and we can make them output exactly what we want. We can even make them output the same thing on both our current Solaris machines and our new OmniOS machines so that higher level tooling is insulated from what OS version it's running on. This is very handy and not something that would be easy to do with 'zpool status'.

The other, more subtle way that this makes my life better is that I now have much more confidence that things are not going to subtly break on me. One problem with using zpool's output is that all sorts of things can change about it and things that use it may not notice, especially if the output starts omitting things to, for example, 'simplify' the default output. Since our tools are abusing private APIs they may well break (and may well break more than zpool's output), but when they break we can make sure that it's a loud break. The result is much more binary; if our tools work at all they're almost certainly accurate. A script's interpretation of zpool's output is not necessarily so.

(Omitting things by default is not theoretical. In between S10U8 and OmniOS, 'zfs list' went from including snapshots by default to excluding them by default. This broke some of our code that was parsing 'zfs list' output to identify snapshots, and in a subtle way; the code just thought there weren't any when there were. This is of course a completely fair change, since 'zfs list' is not an API and this probably makes things better for ordinary users.)
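As an illustration of what a 'loud break' can look like in practice, here is a minimal Python sketch of a snapshot-name parser that refuses to guess. It is not our actual tooling; the function name is hypothetical, and it assumes the input text came from explicitly asking for snapshots with something like 'zfs list -H -t snapshot -o name' rather than relying on the default output.

```python
def parse_snapshot_names(output):
    """Parse one-name-per-line output into a list of snapshot names.

    Raises ValueError on any line that does not look like
    'filesystem@snapshot', so format changes fail loudly instead of
    silently producing an empty or partial list.
    """
    names = []
    for lineno, line in enumerate(output.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        name = fields[0]
        if len(fields) != 1 or name.count("@") != 1 or " " in name:
            raise ValueError(
                "line %d does not look like a snapshot name: %r" % (lineno, line)
            )
        names.append(name)
    return names
```

An empty result is still possible when there genuinely are no snapshots, which is why asking for snapshots explicitly matters more than the validation does; the validation only catches the case where the output stops looking like what the code expects.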

I accept that rolling our own tools has some additional costs and some risks. But I'd rather own those costs and those risks explicitly than have similar ones arise implicitly because I'm relying on a necessarily imperfect understanding of zpool's output.

Actually, writing this entry has made me realize that it's only half of the story. The other half is going to take another entry.

solaris/ZFSNoMoreZpoolStatus written at 23:22:29

Why I sometimes reject patches for my own software

I recently read Drew Crawford's Conduct unbecoming of a hacker (via), which argues that you should basically always accept other people's patches for your software unless they are clearly broken. Lest we bristle at this, he gives the example of Firefox and illustrates how many patches it accepts. On the whole I sympathize with this view, and I've even had some pragmatic experience with it; a patch to mxiostat that I wasn't very enthusiastic about initially has actually become something I use routinely. But despite this there are certain sorts of patches I will reject basically out of hand. Put simply they're patches that I think will make the program worse for me, no matter how much they might help the author of the patch (or other people).

This is selfish behavior on my part, but so far all of my public software is things that I'm ultimately developing for myself first. It's nice if other people use my programs too but I don't expect any of them to get popular enough that other people's usage is going to be my major motivation for maintaining and developing them. So my priorities come first and the furthest I'm willing to go is that I'll accept patches that don't get in the way of my usage.

(Drew Crawford's article has sort of convinced me that I should be more liberal about accepting patches in general; he makes a convincing case for 'accept now, bikeshed later'. So far this is mostly a theoretical issue for my stuff.)

By the way, this would obviously be different if I was developing things with the explicit goal of having them used by other people. In that case I should (and hopefully would) suck it up and put the patch in unless I had strong indications that it would make the program worse for a bunch of people instead of just me. Maybe someday I'll write something like that, but so far it's not the case.

programming/WhyIRejectPatches written at 02:09:04

2014-03-30

One of my worries: our spam filtering in the future

I've mentioned in the past that we rely on a commercial anti-spam system for our spam filtering. What I haven't mentioned is that it isn't supported on and doesn't run on any version of Ubuntu after Ubuntu 10.04 LTS. 10.04 is now rather long in the tooth and with the impending release of Ubuntu 14.04 it will fall out of support in a bit over a year. This doesn't leave us completely up the creek, as the vendor supports Red Hat Enterprise 6, but it does raise a concern: is the vendor still actually interested in this product?

(It's not as if the vendor is deliberately ignoring Ubuntu; the most recent Linux distribution that the vendor supports was released in 2011 (and that's Debian 6).)

Since I do have this concern, every so often I get to worry about how we'd replace this commercial package (either because of the vendor effectively dropping it or because of licensing problems, which have been known to happen). Right now the commercial system has three great virtues: it works quite well, it doesn't require any administration, and it's basically a black box. I suppose the fact that it doesn't really cost us any money is a fourth virtue.

(The university has a site license, the costs for which are covered by the central mail system.)

There are probably other commercial options, but I don't know how much they'd cost or how well they work, and the thought of trying to evaluate the alternatives fills me with dread. I know that there are free alternatives (for both anti-spam and anti-virus stuff) but I suspect that they are not hands free and automatically maintained black boxes and I don't know how well they work. Evaluating the free options would be somewhat less of a hassle than evaluating commercial options (with free options there is no wrestling with vendors) but it wouldn't be a picnic either.

One part of me thinks that I should spend some time on keeping current with at least the free options for anti-spam filtering, just so I can be prepared if the worst happens. Another part of me thinks that that's a lot of work with no immediate payoff (in fact that doing the work now is probably a complete waste of time) and that I should defer it until we know we need a different anti-spam system, if ever.

I don't have any answers right now, just worries. So there you go.

spam/FutureSpamFilteringWorry written at 02:26:21

2014-03-28

Recovering from a drive failure on Fedora 20 with LVM on software RAID

My office workstation runs on two mirrored disks. For various reasons the mirroring is split; the root filesystem, swap, and /boot are directly on software RAID while things like my home directory filesystem are on LVM on top of software RAID. Today I had one of those two disks fail when I rebooted after applying a kernel upgrade; much to my surprise this caused the entire boot process to fail.

The direct cause of the boot failure was that none of the LVM-based filesystems could be mounted. At first I thought that this was just because LVM hadn't activated, so I tried things like pvscan; much to my surprise and alarm this reported that there were no physical volumes visible at all. Eventually I noticed that the software RAID array that LVM sits on top of was being reported as inactive instead of active and that I couldn't read from the /dev entry for it.

The direct fix was to run 'mdadm --run /dev/md17'. This activated the array (and then udev activated LVM and systemd noticed that devices were available for the missing filesystems and mounted them). This was only necessary once; after a reboot (with the failed disk still missing) the array came up fine. I was led to this by the description of --run in the mdadm manpage:

Attempt to start the array even if fewer drives were given than were present last time the array was active. Normally if not all the expected drives are found and --scan is not used, then the array will be assembled but not started. With --run an attempt will be made to start it anyway.

In theory this matched the situation; the last time the array was active it had two drives and now it only had one. The mystery here is that the exact same thing was true for the other mirrors (for /, swap, and /boot) and yet they were activated anyways despite the missing drive.
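Since the tell-tale sign here was an array reported as inactive, a quick check for that state can be sketched in Python. This is only a sketch under assumptions: the function name is hypothetical, and it assumes that array lines in /proc/mdstat start with the array name followed by ' : ' and the state word, which is how they looked to me but is not a documented stable format.

```python
import re

# Matches lines such as 'md17 : inactive ...' or 'md0 : active raid1 ...'
# (assumed /proc/mdstat layout; adjust if your kernel's output differs).
_ARRAY_LINE = re.compile(r"^(md\S*)\s*:\s*(\S+)")


def inactive_arrays(mdstat_text):
    """Return the names of arrays whose state word is 'inactive'."""
    names = []
    for line in mdstat_text.splitlines():
        m = _ARRAY_LINE.match(line)
        if m and m.group(2) == "inactive":
            names.append(m.group(1))
    return names


if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        for name in inactive_arrays(f.read()):
            # Only suggest the fix; running it is left to a human.
            print("inactive array, consider: mdadm --run /dev/%s" % name)
```

The script only prints a suggestion, because whether forcing a degraded array to start is the right thing to do depends on why it is inactive in the first place.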

My only theory for what happened is that something exists that forces activation of mirrors that are seen as necessary for filesystems but doesn't force activation of other mirrors. This something is clearly magical and hidden and of course not working properly. Perhaps this magic lives in mount (or the internal systemd equivalent); perhaps it lives in systemd itself. It's pretty much impossible for me to tell.

(Of course since I have no idea what component is responsible I have no particularly good way to report this bug to Fedora. What am I supposed to report it against?)

(I'm writing this down partly because this may sometime happen to my home system (since it has roughly the same configuration) and if I didn't document my fix and had to reinvent it I would be very angry at myself.)

linux/Fedora20LVMDriveRecovery written at 18:05:00

This is part of CSpace, and is written by ChrisSiebenmann.
Twitter: @thatcks
