Wandering Thoughts archives

2015-01-31

The problem with punishing people over policy violations

Back in my entry on why user-hostile policies are a bad thing I said that I believed threatening to punish people was generally not an effective thing from a business perspective. I owe my readers an explanation for that, because on the surface it seems like an attractive idea.

The problem with punishing people is that practically by definition a meaningful punishment must hurt, and generally it can't hurt retroactively. However, when you hurt people and especially when you hurt people's future with you (through bad performance reviews because of policy violations, docking their future pay, and so on), the people involved may decide to react to the hurt by just quitting and finding another job.

This means that any time you are contemplating punishing someone in a meaningful way, you must ask yourself whether whatever they did is bad enough to risk losing them over it (or bad enough that you should lose them over it). Sometimes the answer will be yes because it was really, really bad; sometimes the answer will be yes because they're easy to replace. But if it was not a really bad thing and if they would be disruptive to lose and a pain to replace, well, do you want to run that risk?

Obviously, the worse your punishment is, the higher the chance of this happening. In particular, if your punishment means that they'll wind up noticeably underpaid relative to their counterparts elsewhere (whether through denial of promotion, denial of performance raises, and so on), you'd better hope that they really love working for you.

(You can always hope that they'd have a hard time finding another job (or at least another job that's as attractive as yours even after you punish them), so that they don't really have any choice but to suck it up and take it. But for high-demand professionals this is probably not very likely. And even if it's the case now, you've armed a ticking time bomb; I suspect that you're going to lose them as soon as they can go.)

(This is separate from the additional problems of punishing people at universities, where I was more focused on removal of computer or network access than a larger view of punishments in general.)

PolicyPunishmentProblem written at 23:35:59

Upgrades and support periods

Suppose, hypothetically, that you are a vendor and you want to push people to upgrade more frequently. No problem, you say, you will just reduce the support period for your old releases. This is a magic trick that will surely cause everyone to upgrade at least as fast as you want them to, basically at a pace that you choose, right?

Well, no, obviously not. There are clearly at least two forces operating here. On the one hand you have people's terror of lack of support; this pushes them to upgrade. On the other hand, you have people's 'terror' of the work and risk involved in upgrades; this pushes them to not upgrade. Pushing on ever shortening support from the vendor side can only get you so far because the other force is pushing back against you, and after a certain point people simply don't move any more. Once you've hit that point you can reduce your support period all you want but it won't have any effect.

Generally I think there will be diminishing returns from shorter and shorter support periods as you push more and more people to their limit of terror and they say 'well, to hell with it then'. I also suspect that this is neither a linear decay nor a smooth one; there are probably inflection points where a whole lot of people will drop out at once.

Aggressively lowering your support periods will have one effect, though: it can persuade people to totally abandon your system and go find another one that isn't trying to drag them around through terror. This is a win only if you don't want users.

(By the way, the rapidly upgrading products in the real world that do this at scale don't do it by having short support periods.)

UpgradesAndSupport written at 01:31:54

2015-01-28

A thought about social obligations to report bugs

One of the things that people sometimes say is that you have a social obligation to report bugs when you find them. This seems most common in the case of open source software, although I've read about it for, eg, developers on closed source platforms. Let's set aside all of the possible objections to this for the moment, because I want to point out an important issue here that I feel doesn't get half as much attention as it should.

If users have a social obligation to report bugs, projects have a mirror social obligation to make reporting bugs a pleasant or at least not unpleasant experience.

Put flatly, this is only fair. If you are going to say that people need to go out of their way to do something for you (in the abstract and general sense), I very strongly reject the idea that you get to require them to go through unpleasant things or get abused in the process. If you try to require that, you are drastically enlarging the scope of the social obligation you are trying to drop on people, and this is inequitable. You're burdening people all out of proportion for what they are doing.

As a corollary to this, if you want to maintain that users of any particular project (especially your project) have a social obligation to report bugs to 'pay for' the software, you have the obligation of 'paying for' their bug reports by making that project's bug reporting a pleasant process. If you create or tolerate an unpleasant bug reporting process or environment while putting pressure on people to report bugs, you are what I can only describe as an asshole.

(You're also engaged in something that is both ineffective and alienating, but I'm not talking about practicalities here, I'm talking about what's just. If we're all in this together, being just is for everyone to try to make everyone else's life better. Projects make the life of users better by developing software, users make projects better by doing good bug reports, and projects make the life of users better by making bug reports as pleasant as possible.)

(This is probably one of the cases where either I've convinced you by the end of the thesis or you're never going to be convinced, but sometimes I write words anyways.)

BugReportExperienceObligation written at 01:58:20

2015-01-19

Why user-hostile policies are a bad thing and a mistake

One reasonable reaction to limited email retention policies being user-hostile is to say basically 'so what'. It's not really nice that policies make work for users, but sometimes that's just life; people will cope. I feel that this view is a mistake.

The problem with user-hostile policies is that users will circumvent them. Generously, let's assume that you enacted this policy to achieve some goal (not just to say that you have a policy and perhaps point to a technical implementation as proof of it). What you really want is not for the policy to be adhered to but to achieve your goal; the policy is just a tool in getting to the goal. If you enact a policy and then your users do things that defeat the goals of the policy, you have not actually achieved your overall goal. Instead you've made work, created resentment, and may have deluded yourself into thinking that your goal has actually been achieved because after all the policy has been applied.

(Clearly you won't have inconvenient old emails turn up because you're deleting all email after sixty days, right?)

In extreme cases, a user-hostile policy can actually move you further away from your goal. If your goal is 'minimal email retention', a policy that winds up causing users to automatically archive all emails locally because that's the most convenient way to handle things is actually moving you backwards. You were probably better off letting people keep as much email on the server as they wanted, because at least they were likely to delete some of it.

By the way, I happen to think that threatening punishment to people who take actions that go against the spirit or even the letter of your policy is generally not an effective thing from a business perspective in most environments, but that's another entry.

(As for policies for the sake of having policies, well, I would be really dubious of the idea that saying 'we have an email deletion policy so there's only X days of email on the mail server' will do you much good against either attackers or legal requests. To put it one way, do you think the police would accept that answer if they thought you had incriminating email and might have saved it somewhere?)

UserHostilePolicyWhyBad written at 00:22:52

2015-01-09

Why filesystems need to be where data is checksummed

Allegedly (and I say this because I have not looked for primary sources) some existing Linux filesystems are adding metadata checksums and then excusing their lack of data checksums by saying that if applications care about data integrity the application will do the checksumming itself. Having metadata checksums is better than having nothing and adding data checksums to existing filesystems is likely difficult, but this does not excuse their views about who should do what with checksums.

There are at least two reasons why filesystems should do data checksums. The first is that data checksums exist not merely to tell applications (and ultimately the user) when data becomes corrupt, but also to do extremely important things like telling which side of a RAID mirror is the correct side. Applications definitely do not have access to low-level details of things like RAID data, but the filesystem is at least in the right general area to be asking the RAID system 'do you happen to have any other copies of this logical block?' or the like.
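
As a concrete illustration, here is a minimal Python sketch of the mirror-repair idea. The interfaces and names are invented (this is not how any real filesystem is structured), but it shows why the layer that knows the expected checksum has to be the layer that can ask the RAID system for alternate copies.

    import hashlib

    def read_verified(expected_sum: bytes, mirror_copies: list[bytes]) -> bytes:
        """Return the first mirror copy of a logical block whose checksum matches.

        'mirror_copies' stands in for asking the RAID layer for each side of
        the mirror; an application above the filesystem never gets that choice.
        """
        for copy in mirror_copies:
            if hashlib.sha256(copy).digest() == expected_sum:
                return copy  # this side of the mirror is intact (repair is left out)
        # every copy failed the checksum, so all we can do is report corruption
        raise IOError("unrecoverable corruption: no mirror copy matches its checksum")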

The second reason is that a great many programs would never be rewritten to verify checksums. Not only would this require a massive amount of coding, it would require a central standard so that applications can interoperate in generating and checking these checksums, finding them, and so on and so forth. On Unix, for example, this would need support not just from applications like Firefox, OpenOffice, and Apache but also common programs like grep, awk, perl, and gcc. The net result would be that a great deal of file IO on Unix would not be protected by checksums.

(Let's skip lightly over any desire to verify that executables and shared libraries are intact before you start executing code from them, because you just can't do that without the kernel being very closely involved.)

When you are looking at a core service that should touch absolutely everything that does some common set of operations, the right place to put this service is in a central place so that it's implemented once and then used by everyone. The central place here is the kernel (where all IO passes through one spot), which in practice means in the filesystem.

(Perhaps this is already obvious to everyone; I'd certainly like to think that it is. But if there are filesystem developers out there who are seriously saying that data checksums are the job of applications instead of the filesystem, well, I don't know what to say. Note that I consider 'sorry, we can't feasibly add data checksums to our existing filesystem' to be a perfectly good reason for not doing so.)

FilesystemDataChecksumsWhy written at 03:55:52

2015-01-06

Choices filesystems make about checksums

If you are designing integrity checksums into a new filesystem or trying to add them to an existing one, there are some broad choices you have to make about them. These choices will determine both how easy it is to add checksums (especially to existing filesystems) and how much good your checksums do. Unfortunately these two things pull in opposite directions.

Two big choices are: do you have checksums for just filesystem metadata or both data and metadata, and are your checksums 'internal' (stored with the object that they are a checksum of) or 'external' (stored not with the object but with references to it). I suppose you can also do checksums of just data and not metadata, but I don't think anyone does that yet (partly because in most filesystems the metadata is data too, as it has things like names and access permissions that your raw bits make much less sense without).

The best option is to checksum everything and to use external checksums. The appeal of checksumming everything is hopefully obvious. The advantage of external checksums is that they tell you more than internal checksums do. Internal checksums cover 'this object has been corrupted after being written' while external checksums also cover 'this is the wrong object', ie they let you check and verify the structure of your filesystem. With internal checksums you know that you are looking at, say, an intact directory, but you don't know if it's actually the directory you think you're looking at.
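
To make the distinction concrete, here is a small hedged Python sketch; the object layout is entirely made up, but it shows what each kind of checksum can and can't tell you.

    import hashlib

    def checksum(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    # Internal: the object carries its own checksum, so you learn "these bytes
    # were not corrupted after being written"...
    def verify_internal(block_payload: bytes, stored_sum: bytes) -> bool:
        return checksum(block_payload) == stored_sum

    # External: the reference (say, a directory entry) carries the checksum of
    # the object it points to, so you also learn "and this really is the object
    # the reference meant", catching intact-but-wrong objects.
    def verify_external(reference_sum: bytes, pointed_to_object: bytes) -> bool:
        return checksum(pointed_to_object) == reference_sum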

On the other hand, the easiest option to add to an existing filesystem is internal checksums of metadata only. To do this, all you need is to either find or claim some unused space for a single checksum in existing metadata structures like directory disk blocks, or to add a checksum on the end of them as a new revision; you can sometimes arrange this so that almost no existing code cares and no existing on-disk data is invalidated. Doing only metadata is simpler because internal checksums present a problem for on-disk data, as there simply isn't any spare room in existing data blocks; they're all full of, well, user file data. In general, adding internal checksums to data blocks means that, say, 4K of user data may no longer fit in a single on-disk data block, which in practice will perturb a lot of assumptions made by user code.

(Almost all user code assumes that writing data in some power of two size is basically optimal and as a result does it all over the place. There are all sorts of bad things that happen if this is not the case.)
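
A back-of-the-envelope Python sketch of the space problem; the sizes are just examples, not any particular filesystem's layout:

    import struct
    import zlib

    BLOCK_SIZE = 4096
    CSUM_SIZE = 4          # say, a CRC32 tacked onto the end of the block

    # Metadata blocks usually have a header or some slack where a checksum can
    # hide, so they can stay the same size on disk.  A data block is already
    # all user data, so an in-line checksum shrinks the usable payload...
    USABLE_DATA = BLOCK_SIZE - CSUM_SIZE   # 4092 bytes, not the 4096 programs assume

    def pack_data_block(user_data: bytes) -> bytes:
        # ...and a full 4K application write no longer fits in a single block.
        assert len(user_data) <= USABLE_DATA
        payload = user_data.ljust(USABLE_DATA, b"\0")
        return payload + struct.pack("<I", zlib.crc32(payload))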

There are two problems with external checksums that give you big heartburn if you try to add them to existing filesystems. The first is that you have to store a lot more checksums. As an example, consider a disk block of directory entries, part of a whole directory. With internal checksums this disk block needs a single checksum for itself, while with external checksums it needs one checksum per directory entry it contains (to let you validate that the inode the directory entry is pointing to is the file you think it is).

(Another way to put this is that any time a chunk of metadata points to multiple sub-objects, external checksums require you to find room for one checksum per sub-object, while internal checksums just require you to find room for one, for the chunk of metadata itself. It's extremely common for a single chunk of metadata to point to multiple sub-objects because this is an efficient use of space; directory blocks contain multiple directory entries per block, files have indirect blocks that point to multiple data blocks, and so on.)
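
Some illustrative (made-up) numbers for a single 4K directory block show how fast this adds up:

    BLOCK_SIZE = 4096
    ENTRY_SIZE = 32        # a small fixed-size directory entry, for illustration
    CSUM_SIZE = 32         # eg SHA-256

    entries_per_block = BLOCK_SIZE // ENTRY_SIZE          # 128 entries

    internal_overhead = CSUM_SIZE                         # one checksum for the block
    external_overhead = entries_per_block * CSUM_SIZE     # one per entry: 4096 bytes

    # With these numbers, external checksums for one directory block need a
    # whole extra block's worth of space, versus 32 bytes for a single
    # internal checksum of the block itself.
    print(internal_overhead, external_overhead)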

The second is that you are going to have to update more checksums when things change. With external checksums, any time an object changes all references to it need to have their checksums updated to its new value, and then all references to the references probably need their checksums updated in turn, and so on until you get to the top of the tree. External checksums are a natural fit for copy on write filesystems (which are already changing all references up the tree) and probably a terrible fit for any filesystem that does in-place updates. And unfortunately (for checksums) most common filesystems today do in-place updates for various reasons.
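
Here is a hedged sketch of that ripple effect in Python; the node layout is invented, but the shape is essentially a Merkle tree, which is why external checksums fall out so naturally from copy-on-write designs.

    import hashlib
    from dataclasses import dataclass, field

    @dataclass(eq=False)
    class Node:
        data: bytes = b""
        children: list["Node"] = field(default_factory=list)
        child_sums: list[bytes] = field(default_factory=list)  # external checksums

    def checksum(node: Node) -> bytes:
        # a node's checksum covers its own data plus the checksums it stores
        # for its children, so any change below is visible above
        return hashlib.sha256(node.data + b"".join(node.child_sums)).digest()

    def update_leaf(path: list[Node], leaf_index: int, new_data: bytes) -> None:
        """path runs from the root down to the leaf's parent; every node on it
        stores a checksum of its child, so every one of them must be rewritten."""
        parent = path[-1]
        parent.children[leaf_index].data = new_data
        parent.child_sums[leaf_index] = checksum(parent.children[leaf_index])
        # walk back up toward the root, refreshing each stored child checksum
        for upper, lower in zip(reversed(path[:-1]), reversed(path[1:])):
            idx = upper.children.index(lower)
            upper.child_sums[idx] = checksum(lower)

An in-place-update filesystem has to go back and rewrite every one of those ancestors where it sits; a copy-on-write filesystem was already rewriting that whole path anyway, which is why the two fit together so well.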

PS: the upshot of this is that on the one hand I sympathize a fair bit with filesystems like ext4 and XFS that are apparently adding metadata checksums (that sound like they're internal ones) because they have a really hard job and it's better than nothing, but on the other hand I still want more.

FilesystemChecksumOptions written at 01:01:45

2015-01-04

What makes a 'next generation' or 'advanced' modern filesystem, for me

Filesystems have been evolving in fits and starts for roughly as long as there have been filesystems, and I doubt that is going to stop any time soon. These days there are a number of directions that filesystems seem to be moving in, but I've come around to the view that one of them is of particular importance and is the defining characteristic of what I wind up calling 'modern', 'advanced', or 'next generation' filesystems.

By now, current filesystems have mostly solved the twin problems of performance and resilience in the face of crashes (although performance may need some re-solving in the face of SSDs, which change various calculations). Future filesystems will likely make incremental improvements, but I can't currently imagine anything drastically different.

Instead, the next generation frontier is in resilience to disk problems and improved recovery from them. At the heart of this are two things. First, a steadily increasing awareness that when you write something to disk (either HD or SSD), you are not absolutely guaranteed to either get it back intact or get an error. Oh, the disk drive and everything involved will try hard, but there are a lot of things that can go wrong, especially over long amounts of time. Second, that the rate at which these problems happen has not really been going down over time. Instead it has actually been going up, because the most common models are based on a chance of error per so much data, and the amount of data we store and use has kept going up and up.

The pragmatic result is that an increasing number of people are starting to worry about quiet data loss, feel that the possibility of it goes up over time, and want to have some way to deal with it and fix things. It doesn't help that we're collectively storing more and more important things on disks (hopefully with backups, yes yes) instead of other media.

The dominant form that meeting this need is taking right now is checksums of everything on disk and filesystems that are aware of what's really happening in volume management. The former creates resilience (at least you can notice that something has gone wrong) and the latter aids recovery from it (since disk redundancy is one source of intact copies of the corrupted data, and a good idea anyways since whole disks can die).

(In this entry I'm talking only about local filesystems. There is a whole different evolutionary process going on in multi-node filesystems and multi-node object stores (which may or may not have a more or less POSIX filesystem layer on top of them). And I'm not even going to think about various sorts of distributed databases that hold increasingly large amounts of data for large operations.)

PS: Part of my bias here is that resilience is what I've come to personally care about. One reason for this is that other filesystem attributes are pragmatically good enough and not subject to obvious inefficiencies and marvelous improvements (except for performance through SSDs), and another reason is that storage is now big enough and cheap enough that it's perfectly reasonable to store extra data (sometimes a lot of extra data, eg disk mirrors) to help ensure that you can get your files back later.

NextGenerationFilesystem written at 02:35:15

2014-12-28

How I think DNSSec will have to be used in the real world

Back when I started reading about DNSSec, it seemed to be popular to assume that how DNSSec would work for clients is that if a particular DNS lookup failed DNSSec checks, the DNS resolver would tell you that the name couldn't be resolved. In light of my personal DNSSec experience and the general security false positives problem, I no longer accept this as a workable approach.

The reality of the world is that there are almost certainly going to be two common reasons for DNSSec failures, namely DNSSec screwups by the origin domain and mandatory interposed DNS resolvers that tinker with the answers people get back. Neither is an attack as such, at least not as far as users are concerned, and so real users will not find failing to return DNS results in either situation acceptable.

Instead, I think that DNSSec results will have to be used as a reputation signal; good DNSSec results are best, while bad DNSSec results are a bit dubious. Many and perhaps most applications will ignore these reputation signals and continue to accept even bad DNSSec results. Some applications will use the DNSSec trust signal as one of a number of heuristic inputs; a browser might shift to considering such resources as less trustworthy, for example. Only a few extremely high security applications will refuse to go on entirely if the DNSSec trust results come back bad (and browsers are not one of them as they will be usually configured).
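
As a purely hypothetical Python sketch (the statuses, weights, and thresholds here are invented, not taken from any standard or implementation), the 'reputation signal' idea might look something like this:

    from enum import Enum

    class DnssecStatus(Enum):
        SECURE = "secure"      # validated chain of trust
        INSECURE = "insecure"  # the zone simply isn't signed
        BOGUS = "bogus"        # signed, but validation failed

    def adjust_trust(base_score: float, status: DnssecStatus,
                     high_security: bool = False) -> float | None:
        """Fold the DNSSec result into a trust score; None means 'refuse entirely'."""
        if status is DnssecStatus.BOGUS and high_security:
            return None                        # only rare high-security apps hard-fail
        if status is DnssecStatus.SECURE:
            return min(1.0, base_score + 0.2)  # good DNSSec results are best
        if status is DnssecStatus.BOGUS:
            return max(0.0, base_score - 0.3)  # a bit dubious, but still usable
        return base_score                      # unsigned zones are today's status quo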

(Possibly this is already obvious to people who deal with DNSSec on a regular basis. On the other hand, it doesn't seem to be how Fedora 20's copy of Unbound comes configured out of the box.)

I'm sure that this sloppy approach will enrage a certain number of believers in secure DNS and DNSSec. They will no doubt tell us that the DNS lookup failures are a cost worth paying for secure DNS and that anyways, it's the fault of the people making configuration mistakes and so on, and then grind their teeth at my unwillingness to go along with this. I do not have a pleasant, soft way to put this, so I will just say it straight out: these people are wrong, as usual.

Sidebar: the case of intercepting DNS servers

At one level, a nosy ISP or wireless network that forces all of your DNS lookups to go through its DNS servers and then mangles the results is definitely an attacker. Preventing this sort of mass interception is likely one reason DNSSec exists, just like it's one reason HTTPS exists. However, from a pragmatic perspective it is not what most people will consider an attack; to put it bluntly, most people will want their DNS lookups to still work even in the face of their ISP or their wireless network messing with them a bit, because people would rather accept the messing than not have Internet access at all.

(If none of your DNS lookups work, you don't really have Internet access. Or at least most people don't.)

DNSSecRealWorldUsage written at 03:36:37

2014-12-25

DNSSec in the real world: my experience with DNSSec

In the abstract, I like the idea of secure DNS, because really, who wouldn't. I've read enough criticism of DNSSec the protocol to think that it's not great and maybe should be replaced by something less ungainly, and I've been convinced that it is not the right way to get TLS certificate information, but those are relatively moderate issues (from some jaundiced perspectives).

That's in theory. In practice, things are rather different. In practice the only thing DNSSec has ever done for me is prevent me from seeing websites, generally US government websites (I seem to especially trip over the NIH and NASA's APOD). My normal caching resolver is Unbound, which does some amount of DNSSec checking and enforcing. When I set it up initially, these checks kept preventing me from resolving IP addresses, so I kept turning them down and turning them down, to the point where I've now done my best to turn them off entirely. But apparently my best isn't good enough, and so periodically Unbound refuses to resolve an IP address, I kill it and start my old dnscache instance, and the IP address immediately resolves.

At one level this is not particularly surprising. DNSSec creates a whole new collection of ways to have DNS resolution screw up, either because people fumble their DNSSec implementation (even temporarily) or because your local resolver can't get a DNSSec reply that it requires to be secure. It's not surprising that this happens every so often.

At another level this is utterly fatal to DNSSec, because of the security false positives problem. For many people, actual DNS interception is vanishingly rare and they perform a very large number of DNS lookups. If the actual signal is very rare, even a very low noise rate will totally swamp it. In other words, every or almost every DNSSec failure people get will be a false positive, and people simply will not tolerate this. As has happened with me, DNSSec will become one of those things that you turn off because it's stopping you from doing things on the Internet that you want to do.
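
A rough base-rate calculation makes the problem concrete; every number here is invented purely for illustration:

    lookups_per_day = 10_000       # a fairly ordinary user or small site
    real_attack_rate = 1e-7        # actual DNS interception: vanishingly rare
    false_positive_rate = 1e-4     # screwups and meddling resolvers: much less rare

    false_alarms = lookups_per_day * false_positive_rate  # about one a day
    real_alarms = lookups_per_day * real_attack_rate      # about one every few years

    # With these (made-up) rates, well over 99% of the DNSSec failures a user
    # ever sees are false positives, and people learn to ignore or disable them.
    print(false_alarms / (false_alarms + real_alarms))    # roughly 0.999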

(And in turn this means that vendors cannot make DNSSec something that end users can't turn off. Forcing something that in practice screws your users is an extremely bad idea and it gets you extremely bad reactions.)

MyDNSSecExperience written at 02:02:54

2014-12-20

Unsurprisingly, laptops make bad to terrible desktops

In response to my entry on the security problem for public clients, Jeff Kaufman suggested laptops as an option on the grounds that they already integrate everything into one physical unit. Unfortunately, I don't think this is workable. The core problem is that laptops make terrible desktops, especially in a setting with relatively untrusted access to them. This shouldn't surprise anyone, since laptops aren't designed to be desktops.

A typical university desktop is cheap, has a relatively large screen (17" is the minimum entry point these days and it often goes larger), is turned on essentially all the time, and must be physically secured in place. It should not have any easily detached fiddly bits that can be removed and lost, because sooner or later they will be. Ideally it should be possible to set it up in a relatively ergonomic way. Some desktops need more than basic computing power; for example, the desktops of a computing lab are often reasonably capable (because the practical alternative is buying a few very capable servers and those are often really expensive). Partly because of this it's an advantage if things are at least somewhat modular.

None of these are attributes of laptops, especially in combination (for example, there are cheap laptops but they're cheap partly because they have really small screens). Your typical inexpensive laptop is relatively slow, has a small screen, has historically often not been designed to run anywhere near all the time, is entirely non-modular, has external fiddly bits like power adapters, is not really particularly ergonomic, and is often hard to secure to a table. None of this is surprising because this is all part of the laptop tradeoff; you're getting convenient, lightweight portability for periodic roaming use and giving up a bunch of other stuff for it. You can use a laptop as a desktop and many people do, but that doesn't make laptops ideal for the job.

(One sign that laptops are nowhere near ideal desktops is all of the aftermarket products designed to make them work better at that job, starting with laptop stands.)

If you don't care very much about offering a decently good environment, this is actually okay. A bunch of cheap laptops with cable locks in an attended environment suffice to let people do quick things like check their email on the fly, and they might be overall cheaper than custom sourced kiosk machines in enclosures with decent sized screens and so on. And this setup certainly encourages people not to linger. But if you want to offer an attractive environment I don't think that doing this with laptops is viable, especially if you have to worry about people walking off with them. At least for university provided public clients, I think it's desktops or bust.

(Whether or not universities should still try to offer client computing to people is another can of worms entirely that calls for another entry.)

(I'm not exposed enough to modern laptops to know how happy they are about being on and running for hours and hours on end. My relatively old laptop spins up its fans if left sitting powered on for very long and I'm not certain I'd want to let it sit that way for days or weeks on end; if nothing else that might burn out the fans in much shorter order. And they're not exactly really quiet.)

LaptopsBadDesktops written at 00:26:12

