2015-05-25
One of the problems with 'you should submit a patch'
Today I reported a relatively small issue in the development version of ZFS on Linux. In theory the way of open source development is that I should submit a patch with my problem report, since this is a small and easily fixed issue, and I suspect that a certain number of the usual suspects would say that I'm letting down my end of the open source social compact by not doing this (even though the ZoL developers did not ask me for this). Well, there's a problem with this cheerful view of how easy it is to make patches:
It's only easy to make half-assed partially tested patches. Making well-tested good ones is generally hard.
In theory this issue and its fix are really simple. In practice there are a bunch of things that I don't know for sure and that I should test. Here are two examples of things I should do for a 'good' patch submission:
- I should build the package from scratch and verify that it installs and works on a clean system. My own ZFS on Linux machine is not such a clean system so I'd need to spin up a test virtual machine.
- I should test that my understanding of what happens when a systemd.service ExecStartPre command fails is correct (see the sketch below). I think I've correctly understood the documentation, but 'I think' is not 'I know'; instead it's superstition.
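To illustrate what I mean, here's roughly the kind of thing involved (this is a hypothetical unit file, not the actual ZoL one). My reading of systemd.service(5) is that an ExecStartPre= command that exits with a non-zero status causes the whole unit to fail to start unless the command is prefixed with '-', and that reading is exactly what a good patch would verify by actual testing:

    [Unit]
    Description=Hypothetical ZFS helper service

    [Service]
    Type=oneshot
    # If this pre-check exits non-zero, my understanding is that systemd
    # marks the unit as failed and never runs ExecStart; prefixing the
    # command with '-' would make the failure be ignored instead.
    ExecStartPre=/usr/local/sbin/zfs-precheck
    ExecStart=/usr/local/sbin/zfs-do-thing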
Making a patch that should work, looks good, and maybe boots on my machine is about ten minutes of work (ignoring the need to reboot my machine). Making a good patch, one that is not potentially part of a lurching drunkard's walk in the vague direction of a solution, is a lot more work.
(This is not particularly surprising, because it's the same general kind of thing that it takes to go from a personal program to something that can pass for a product (in the Fred Brooks sense). The distance from 'works for me' to 'it should work for everyone and it's probably the right way to do it' is not insubstantial.)
Almost all of the time that people say 'you should submit a patch' they don't actually mean 'you should submit a starting point'. What they really want is 'you should submit a finished, good to go patch that we can confidently apply and then ship'. At one level this is perfectly natural; someone has to do this work and they'd rather you be that person than them (and some of the time you're in a theoretically better position to test the patch). At another level, well, it's not really welcoming to put it one way.
(It also risks misunderstandings, along the same lines as too detailed bug reports but less obviously. If I give you a 'works for me' patch but you think that it's a 'good to go' patch, ship it, and later discover that there are problems, well, I've just burned a bunch of goodwill with the project. It doesn't help that patch quality expectations are often not spelled out.)
There are open source projects that are genuinely not like this, where the call for patches really includes these 'works for me' starting points (often because the project leadership understands that every new contributor starts small and incomplete). But these projects are relatively rare and unfortunately the well is kind of poisoned here, so if your project is one of these you're going to have to work quite hard to persuade skittish people that you really mean 'we love even starting point patches'.
(Note that this is different from saying basically 'bug reports are only accepted when accompanied by patches'. Here I'm talking about a situation where it seems easy enough to make a patch as well as a bug report, but the devil is in the details.)
2015-05-11
The problem with proportional fonts for editing code and related things
One of the eternally attractive ideas for programmers, sysadmins, and other people who normally spend a lot of time working with monospaced fonts in editors, terminal emulators, and so on is the idea of switching to proportional fonts. I've certainly considered it myself (there are various editors and so on that will do this) but I've consistently rejected trying to make the switch.
The big problem is text alignment, specifically what I'll call 'interior' text alignment. Having things line up vertically is quite important for readability and I'm not willing to do without it. At one level there's no problem; automatically lining up leading whitespace is a solved issue, and other things you can align by hand. At another level there's a big problem, because I need to interact with an outside world that uses monospace fonts; the stuff I carefully line up in my proportional fonts editor needs to look okay for them and the stuff that they carefully line up in monospace fonts needs to look okay for me. And automatically detecting and aligning things based on implied columns is a hard problem.
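To show why, here's a naive sketch (my own, in Python) of the obvious heuristic of treating runs of two or more spaces as column separators; it already fails on text aligned with single spaces, on things lined up on '=' signs, and on blocks where only some lines are columnar:

    import re

    def split_columns(line):
        # Naive heuristic: runs of two or more spaces separate columns.
        return re.split(r' {2,}', line.rstrip())

    def looks_columnar(lines):
        # Guess that a block is columnar if every non-blank line splits
        # into the same number of pieces and there is more than one piece.
        counts = {len(split_columns(line)) for line in lines if line.strip()}
        return len(counts) == 1 and counts.pop() > 1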
(I used to use a manual page reader that used proportional fonts by default. It made some effort to align things but not enough, and on some manual pages the results came out really terrible. This experience has convinced me that proportional fonts with bad alignment are significantly worse than monospaced fonts.)
This is probably not an insoluble problem. But it means that simply writing an editor that uses proportional fonts is the easy part; even properly indenting leading whitespace is the easy part. In turn this means that you need a very smart editor to make using proportional fonts a really nice experience, especially if you routinely interact with code from outside your own sphere. Really smart editors are rare and relatively prickly and opinionated; if you don't like their interface and behavior, well, you're stuck. You're also stuck if you're strongly attached to editors that don't have this kind of smarts.
(The same logic holds for things like terminal programs but even more so. A really smart terminal program that used proportional fonts would have to detect column alignment in output basically on the fly and adjust things.)
So I like the idea of using proportional fonts for this sort of stuff in theory, but I'm pretty sure that in practice I'm never going to find an environment that fully supports it that works for me.
(For those people who wonder why you'd want to consider this idea at all: proportional fonts are usually more readable and nicer than monospaced fonts. This entry is basically all plain text, so you can actually look at the DWikiText monospaced source for it against the web browser version. At least for me, the web browser's proportional font version looks much better.)
2015-05-06
Why keeping output to 80 columns (or less) is still sensible
When I talked about how monitoring tools should report timestamps and other identifying information, I mentioned that I felt that keeping output to 80 columns or less was still a good idea even if it meant sometimes optionally omitting timestamps. So let's talk about that, since it's basically received wisdom these days that the 80 column limit is old fashioned, outdated, and unnecessary.
I think that there are still several reasons that short output is sensible, especially at 80 columns or less. First, 80 columns is still the default terminal window size in many environments; if you make a new one and do nothing special, 80 columns is what you get by default (often 80 by 24). This isn't just on Unix systems; I believe that eg Windows often defaults to this size for both SSH client windows and its own command line windows. This means that if your line spills over 80 columns, many people have to take an extra step to get readable results (by widening their default sized window) and they may mangle some existing output for the purposes of eg cut and paste (since many terminal windows still don't re-flow lines when the window widens or narrows).
Next, there's an increasingly popular class (or classes) of device with relatively constrained screen size, namely smartphones and small tablets. Even a large tablet might only be 80 columns wide in vertical orientation. Screen space is precious on those devices and there's often nothing the person using the device can really do to get any more of it. And yes, people are doing an increasing amount of work from such devices, especially in surprise situations where a tablet might be the best (or only) thing you have with you. Making command output useful in such situations is an increasingly good idea.
Finally, overall screen real estate can be a precious resource even on large-screen devices because you can have a lot of things competing for space. And there are still lots of situations where you don't necessarily need timestamps and they'll just add clutter to output that you're actively scanning. I won't pretend that my situation is an ordinary one; there are plenty of times where you're basically just glancing at the instantaneous figures every so often or looking at recent past or the like.
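As a small sketch of what I mean by optionally omitting timestamps (this is my own illustration, not how any particular tool behaves), a reporting program can simply check whether the timestamp still fits:

    import shutil
    import time

    def format_line(message):
        # Prepend a timestamp only if the result still fits in the
        # terminal; get_terminal_size() falls back to 80x24 if it
        # can't tell how wide we really are.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
        stamp = time.strftime('%H:%M:%S ')
        if len(stamp) + len(message) <= width:
            return stamp + message
        return message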
(As far as screen space goes, often my screen winds up completely covered in status monitoring windows when I'm troubleshooting something complicated. Partly this is because it's often not clear what statistic will be interesting so I want to watch them all. Of course what this really means is that we should finally build that OS level stats gathering system I keep writing about. Then we'd always be collecting everything and I wouldn't have to worry about maybe missing something interesting.)
2015-04-19
A potential path to IPv6 (again), but probably not a realistic one today
In practice, adding IPv6 to existing networks is a lot of work and is clearly going quite slowly in many places, or even not going at all. Given the economic incentives involved, this is no surprise; currently IPv6 primarily benefits people who are not on the Internet, not people who are. So what will drive adoption of IPv6, so that it becomes available in more areas? In particular, what would push us here at http://www.cs.toronto.edu/ towards adding IPv6 to our networks?
My current answer is that the only thing that would really make this important is a noticeable amount of IPv6 only websites and other Internet resources that people here wanted to reach. If this happens, and especially if it's increasing, it would create an actual win for our users from us deploying IPv6, instead of the current situation where it's just kind of nice. But where are these IPv6 only resources going to come from?
My best guess is that the most likely places for them to develop are areas with large IPv6 penetration today. If you're building a business that is primarily or entirely targeting an IPv6 enabled audience (if, for example, you're targeting mobile users in a geographic area where they all get IPv6), only going with IPv6 for your servers and so on may make your life simpler.
Unfortunately there are a lot of holes in this idea. Even if you're dealing with an area where IPv6 is better than IPv4, running a dual stack environment is probably easy enough that it's cheap insurance against needing to expand into an IPv4 audience (and it means that all sorts of IPv4 only people can at least check you out). Going dual stack does increase IPv6 usage on the whole, but it doesn't turn you into an engine driving IPv6 adoption elsewhere. Beyond that, the Wikipedia page on IPv6 deployment and APNIC's numbers suggest that I've significantly overestimated how many areas of the world are strongly IPv6 enabled at the moment. If there's no real pool of IPv6 users (especially in areas that are not already saturated with IPv4 address space), well, so much for that.
All of this does make me wonder if and when large hosting and datacenter providers will start effectively charging extra for IPv4 addresses (either explicitly or by just giving you a discount if you only want IPv6 ones). That would be both a driver and a marker of a shift to IPv6.
(I wrote about a potential path to IPv6 a while back. This is kind of a new version of that idea from a different perspective, although I had forgotten my old entry when I first had this idea.)
2015-04-18
A core problem of IPv6 adoption is the lack of user benefits
I've written before about some of the economic incentives involved with IPv6 adoption, focusing on who benefits from IPv6. Today I want to touch on this economic issue from another angle. Put simply, one of the big problems is this:
In many places, adding IPv6 to your network won't improve anything for your users.
Sure, from a geeky technical side it's nice to support IPv6 on your network and see ipv6.google.com and so on. Having your network and organization be IPv6 ready and enabled is clearly the right thing, a good thing for the future, and all that. But it's not essential. In fact it's usually not even beneficial, not even a little bit. If you add IPv6 to your network today, generally almost no one will notice anything different.
(Let's pretend that there are no bugs and systems that are unprepared to deal with IPv6 addresses and so on.)
At one level this is great; it's good that you can quietly drop in another network protocol and no one notices. At another level it's catastrophic for IPv6 adoption. IPv6 adoption is a lot of work in most networks; you've got a great deal to learn, a great deal to set up, a great deal to test, and so on. Unless you have a lot of free time, it's hard to justify spending a lot of effort on something that doesn't actually deliver real improvements to your users and is merely the right thing to do.
(People like working on right things, but they inevitably get a low priority and thus not very much time. They're sleepy Friday and slack day and 20% time projects, not prime time work.)
Purely from a speed of adoption perspective, it would be much better if adding IPv6 was less transparent because it suddenly let people do things that they couldn't do before. Then you'd have a much easier time of building a case for spending significant effort on it.
(In fact it's my impression that many of the IPv6 adoption stories I've heard about are exactly from situations where adopting IPv6 did deliver real, tangible benefits to the organization involved. See eg Facebook's slides about their internal IPv6 usage, where IPv6 helped them deal with real issues and made their lives better.)
2015-04-15
Illusory security is terrible and is worse than no security
One of the possible responses to my entry on how your entire download infrastructure should be using HTTPS is to say more or less 'well, at least the current insecure approach is trying, surely that's better than ignoring the whole issue'. My answer is simple: no, it's not. The current situation covered in my entry is actually worse than not having any PGP signatures (and perhaps SHA1 hashes) at all.
In general, illusory security is worse than no security because in practice, illusory security fools people and so lulls them into a false sense of security. I'm pretty sure that almost everyone who does anything at all is going to read the Joyent page, faithfully follow the directions, and conclude that they're secure. As we know, all of their checking actually means almost nothing. In fact I'm pretty sure that the Joyent people who set up that page felt that it created security.
What makes no security better than illusory security is that it's honest. If Joyent just said 'download this tarball from this HTTP URL', everyone would have the same effective security but anyone who was worried about it would know immediately that they have a problem. No one would be getting a false sense of security; instead they would have an honest sense of a lack of security.
It follows that if you're setting up security, it's very important to get it right. If you're not confident that you've got it right, the best thing you can do is shut up about it and not say anything. Do as much as you can to not lead people into a false sense of security, because almost all of them will follow you if you do.
(Of course this is easier said than done. Most people set out to create good security instead of illusory security, so there's a natural tendency to believe that you've succeeded.)
PS: Let me beat the really security-aware people to the punch by noting that an attacker can always insert false claims of security even if you leave them out yourself; since you don't have security, your lack of claims of it is delivered insecurely and so is subject to alteration. It's my view that such alterations are likely to be more dangerous for the attacker over the long term for various reasons. (If all they need is a short-term win, well, you're up the creek. Welcome to security, land of justified paranoia.)
2015-03-11
The irritation of being told 'everyone who cares uses ECC RAM'
One of the hazards of hanging around ZFS circles is hearing, every so often, that everyone who cares about their data uses ECC RAM and if you don't, you clearly don't care (and should take your problem reports and go away). With Rowhammer in the news, this attitude may get a boost from other sources as well. Like other 'if you really care you'll do X' views, this attitude makes me reflexively angry because, fundamentally, it pretends that the world is a simple single-dimensional place.
The reality is that in the current world, picking ECC RAM on anything except server machines is generally a tradeoff. For this we may primarily blame Intel, who have carefully ensured that only some of their CPUs and motherboard chipsets support ECC. Although the situation is complex, ever-changing, and hard to decode, it appears that you need either server Xeon CPUs or lower-end desktop CPUs; the current and past middle-of-the-road desktop CPU lines (i5 and i7) explicitly do not support ECC. Even with a CPU that supports ECC, you need a chipset and even a motherboard that does, and it's not clear to me what those are and how common they are.
(AMD gets its share of the blame, because it appears that not all AMD CPUs, chipsets, and motherboards support it either.)
Eliding a bunch of ranting, the upshot is that deciding you must have ECC is not trivial and will almost certainly force you to give up other valuable things in many cases. You'll probably sacrifice some combination of thermal efficiency, system performance, motherboard and system features, and sheer cost in order to get ECC, at least in the desktop space.
(These complications and tradeoffs are why my current desktop machines do not have ECC, although I would love to have it if I could. In fact I have a whole list of desired desktop motherboard features that are probably all more or less mutually exclusive, because desktop choices are suffering.)
For people to say that ECC should be your most important criterion anyway is, well, arrogance; it assumes that the world turns around the single axis of having (or not having) ECC and anything else is secondary. The real world is much more complex than that, especially given that not using ECC does not make your system aggressively dangerous in practice (even with lots of RAM). It follows that saying people who do not use ECC don't actually really care about their data is abrasively arrogant. It is the kind of remark that gets people to give you the middle finger.
It is a great way to make a lot of bug reports go away, though (and a certain number of people with them).
This applies to pretty much any specific technology, of course. ECC is just the current bugbear (or at least mine).
PS: the corollary to this is that system designs that are actively dangerous or useless without ECC RAM are not broadly useful designs, because plenty of machines do not and will not have ECC RAM any time soon. A 'must have ECC' design is in practice a server only design, and maybe not even then; I don't know if ECC RAM is now actually mandatory on much or all server hardware designs so that, eg, our low-end inexpensive Dell 1Us will all have it.
(I'd like it if they all did, but I don't think we even thought about it when selecting the machines. We did specifically make sure to get ECC RAM on our new OmniOS servers, in part because ZFS people keep banging this drum.)
2015-03-10
Why installing packages is almost always going to be slow (today)
In a comment on my entry on how package installs are what limits our machine install speed, Timmy suggested that there had to be a faster way to do package installs and updates. As it happens, I think our systems can't do much better here because of some fundamental limits in how we want package updates to behave, especially ones that are done live.
The basic problem on systems today is that we want package installs and updates to be as close to atomic transactions as possible. If you think about it, there are a lot of things that can go wrong during package install. For example, you can suddenly run out of disk space halfway through; you can have the system crash halfway through; you can be trying to start or run a program from a package that is part way through being installed or updated. We want as many of these to work as possible, and especially we want as few bad things as possible to happen to our systems if something goes wrong part way through a package update. At a minimum we want to be able to roll back a partially applied package install or update if the package system discovers that there's a problem.
(On some systems there's also the issue that you can't overwrite at least some files that are in use, such as executables that are running.)
This implies that we can't just delete all of the existing files for a package (if any), upend a tarball on the disk, and be done with it. Instead we need a much more complicated multi-step operation with writing things to disk, making sure they've been synced to disk, replacing old files with new ones as close to atomically as possible, and then updating the package management system's database. If you're updating multiple packages at once, you also get a tradeoff of how much you aggregate together. If you basically do each package separately you add more disk syncs and disk IO, but if you do all packages at once you may grow both the transient disk space required and the risks if something goes wrong in the middle.
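To make the careful dance concrete, here's a sketch (in Python, with made-up names) of what safely replacing just one file looks like; a package update does some version of this for every file it touches, and then still has to update the package database:

    import os

    def replace_file_carefully(path, data):
        # Write the new contents beside the target, force them to disk,
        # and only then atomically swap the new file into place.
        tmp = path + '.pkgnew'
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        try:
            view = memoryview(data)
            while view:
                view = view[os.write(fd, view):]
            os.fsync(fd)              # the new contents are really on disk
        finally:
            os.close(fd)
        os.rename(tmp, path)          # atomic on POSIX filesystems
        # Sync the directory as well so the rename itself survives a crash.
        dirfd = os.open(os.path.dirname(path) or '.', os.O_RDONLY)
        try:
            os.fsync(dirfd)
        finally:
            os.close(dirfd)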
(Existing package management systems tend to be cautious because people are more willing to excuse them being slow than blowing up their systems once in a while.)
To significantly accelerate this process, we need to do less IO and to wait for less IO. If we also want this process to not be drastically more risky, we have no real choice but to also make it much more transactional so that if there are problems at any point before the final (and single) commit point, we haven't done any damage. Unfortunately I don't think there's any way to do this within conventional systems today (and it's disruptive on even somewhat unconventional ones).
By the way, this is an advantage that installing a system from scratch has. Since there's nothing there to start with and the system is not running, you can do things the fast and sloppy way; if they blow up, the official remedy is 'reformat the filesystems and start from scratch again'. This makes package installation much more like unpacking a tarball than it normally is (and it may be little more than that once the dust settles).
(I'm ignoring package postinstall scripts here because in theory that's a tractable problem with some engineering work.)
2015-02-11
Good technical writing is not characterless and bland
Recently Evan Root left a comment on my entry on a bad Linux kernel message where he said:
I believe the reason why the Yama message is cryptic and 'intriguing' is because tedious committee sanitized messages such as "AppArmor: AppArmor initialized" are at odds with the core principal behind Ubuntu "Linux for human beings"
This is not an uncommon view in some quarters but as it happens I disagree with it. It's my view that there are two things wrong here.
The largest is that clear technical writing doesn't have to be characterless. Good technical writing is alive; it has personality and character. Bland dry technical writing, the kind of writing that has been scrubbed clean of all trace of character or voice by some anodyne committee, is not good writing. You can be informative without boring people to sleep, even in messages like this. In fact, if you look around it's plain that the best technical writing does very much have a voice and is actively talking to you in that voice.
(There is technical writing where you mostly have to scrub the voice out, like technical specifications, but this is because they are very formal and have to be absolutely clear and unambiguous.)
Such writing with personality is of course harder to create than bland dry writing, which is one reason people settle for unobjectionably bland writing. Pretty much anyone can turn that out on demand just by being as boring and literal as possible. But that is not what people should be producing; we should be producing writing that is clear, informative, and has a voice, even if it takes more effort. This is possible.
(This is the same broad school of writing that produces useless code comments that say nothing at great length.)
The smaller thing wrong is that the original message of 'Yama: becoming mindful' cannot be described as a message for human beings (not in the sense that the Ubuntu slogan means it, at least). That is because it is an in-joke and practically by definition in-jokes are not particularly aimed at outsiders. Here the audience for the in-joke is not even 'current Linux users', it is 'kernel developers and other experts'. A relative outsider can, with work and the appropriate general cultural background, decode the in-joke to guess what it means, but that doesn't make it any less of an in-joke.
(And if you do not know what 'Yama' is in the context of the kernel, you will probably be completely lost.)
An in-joke may have character and voice, but it neatly illustrates that merely having character and voice doesn't make writing (or messages) good. The first goal of good writing is to be clear and informative. Then you give it voice.
(This is of course not a new or novel thing in any way; lots of people have been saying this about technical writing for years. I just feel like adding another little brick to the pile.)
2015-02-06
A thought on containerization, isolation, and deployment
It started with a series of tweets by C J Silverio:
@ceejbot: maybe this is the cold medicine, but was thinking that containerization is a reaction to a failure of operating systems. Well, unix.
@ceejbot: Or maybe it's a success, because you can do that on top of Unix, but at a cost.
@ceejbot: A app or server or "thing you want to deploy & run" can't be isolated. Config & deps are splattered all over.
@ceejbot: The container solves the problem of "isolate the consequences of installing & running this thing", which is what an OS should do.
My immediate reaction was that one big reason no OS has taken on actually doing this job is that doing it formally requires a huge formalized API.
Think of all of the abstracted things you may need to deploy an application. There's file storage, finding auxiliary files, hooking into (multiple) language environments, process startup and supervision, periodic task execution, logging, network ports, possible creation of separate limited privilege contexts, temporary scratch space, possible permanent writeable space, and so on and so forth. Today all of these are done in ad-hoc ways; you put magic files into magic places in the filesystem and things happen. To make them systematic, to make them formally part of what the OS does, you need to have a formal interface to do all of these things.
(I'm generously assuming that you're basically installing a service, instead of a command or a library or the like.)
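To give a feel for the scale, here's a deliberately tiny and entirely hypothetical sketch of a few corners of such an API (nothing like this exists as far as I know); note how much of the list above it doesn't even begin to cover:

    from abc import ABC, abstractmethod

    class DeploymentAPI(ABC):
        # A hypothetical slice of an OS-level deployment interface.

        @abstractmethod
        def register_service(self, name, command, restart_policy): ...

        @abstractmethod
        def request_network_port(self, name, protocol, port=None): ...

        @abstractmethod
        def persistent_storage(self, name, size_hint=None): ...

        @abstractmethod
        def schedule_periodic(self, name, interval, command): ...

        @abstractmethod
        def logging_stream(self, name): ...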
There are a lot of things you can want to do for deployment on a general purpose OS. Really, a lot of things. An API that covered all of these things would thus be huge. Designing and building huge APIs is a big problem and they often don't work all that well.
(Regardless of what system they're in and what job they're trying to do. The stories of big APIs for anything are full of failures, perhaps because it's very hard for a coherent big API to emerge from existing practices.)
In this view, containerization is a move that's designed to reduce the size of the API by shifting complexity to inside it. Instead of being able to specify a complex, rich variety of things to do in deployment, you're restricted by the container API to a very small set of options. It's up to you to implement the remaining complexity (or at least its end result) inside the container, using whatever internal API for your own tooling or ad-hoc collection of hacks you like.
(Since currently most people base their containers on some existing Unix, they basically inherit much of the ad-hoc current ways.)
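As an illustration (a hypothetical Dockerfile, not anyone's real one), almost everything the container system itself understands is the handful of lines here; all the real complexity hides inside the ad-hoc shell commands that RUN executes and inside the image itself:

    FROM ubuntu:14.04
    # The container API only sees 'run these commands'; what they
    # actually do to set things up is entirely our own ad-hoc business.
    RUN apt-get update && apt-get install -y mything
    COPY mything.conf /etc/mything.conf
    EXPOSE 8080
    CMD ["/usr/sbin/mything"]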
PS: the current ad-hoc magic file locations and so on are of course an implicit API, but from this perspective the two important things are that this implicit API isn't standardized across systems and it's not stable. It may or may not be documented particularly well or completely on any particular system.