2013-01-27
User mode servers versus kernel mode servers
There at least used to be a broad belief that user mode servers for things like NFS, iSCSI, and so on were generally significantly worse than the same thing implemented in kernel mode. I've reflexively had this belief myself, although I no longer think it has much of a basis in fact. Today, sparked by a comment on this entry and my reply there, I feel like working through why (to the best of my ability).
As always, let's start with a fundamental question: what are the differences between a user mode server and a kernel mode version of the same thing? What extra work does a user mode server do that a kernel mode version avoids? My answer is that there are four possible extra costs imposed by user mode: extra context switches between user and kernel mode, the overhead added by making system calls, possibly extra memory copies between user and kernel space, and possible extra CPU time (and perhaps things like TLB misses) that running in user space requires. User space memory can also get swapped out (well, paged out), but I'm going to assume that your machine is set up so that this doesn't happen.
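As a concrete (and hedged) illustration of the second cost, here is a minimal C sketch of how you might get a rough feel for raw system call overhead on your machine: time a large number of cheap system calls and divide. The exact numbers vary a lot with the CPU, the kernel version, and things like speculative execution mitigations, so treat the result as a ballpark figure only.

    /* Rough per-syscall overhead: time many cheap syscalls and divide.
       getppid() is used because it is a genuine syscall that the C library
       doesn't cache. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        const long iterations = 1000000;
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iterations; i++)
            (void)getppid();
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9 +
                    (end.tv_nsec - start.tv_nsec);
        printf("~%.0f ns per syscall round trip\n", ns / iterations);
        return 0;
    }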
Let's assume that all of these add some overhead. The big question is when this added overhead matters and I think that the general answer has to be 'when a kernel mode server is already running on the edge of the hardware's performance limits'. If a kernel server has lots of extra room in terms of CPU performance, memory bandwidth, and response latency then adding extra overhead to all of them by moving to user space is not likely to make an observable performance difference to outside people. This is especially so if performance is dominated by outside factors such as the network speed, disk transfer speeds, and disk latencies (yes, I have iSCSI and NFS on my mind).
Or in short if you can easily saturate things and reach the performance limits imposed by the hardware, user versus kernel mode isn't going to make a difference. Only when a kernel mode version is having trouble hitting the underlying performance limits is the extra overhead of a user mode server likely to be noticeable.
An equally interesting question is why user mode servers used to be such a bad thing and what changed between then and now. As it happens, I have some views on this which I am going to boil down to point form answers:
- CPUs are much faster and generally lower-latency than in the old days, and especially they've gotten much faster compared to everything else (except possibly if you're using 10G Ethernet instead of 1G Ethernet).
- OSes have gotten better about system call speed, both at the context switch boundaries between user and kernel space and in the general system call code inside the kernel.
- OSes have gotten better system call APIs that require fewer system calls and have more efficient implementations.
- OSes have devoted some work to minimizing memory copies between kernel and user space for networking code, especially if you're willing to be OS-specific (see the sketch just after this list).
- we now understand much more about writing highly efficient user level code for network servers; this is part of what has driven support for better kernel APIs.
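To make the last two points a bit more concrete, here is a small, hedged C sketch of the sort of OS-specific API I mean. On Linux, sendfile() pushes file data to a connected socket entirely inside the kernel, replacing a read()/write() loop and its user-space buffer copies with a single system call per chunk. This is an illustration of the general idea, not anything taken from a specific NFS or iSCSI server.

    #include <sys/sendfile.h>   /* Linux-specific */
    #include <sys/types.h>
    #include <unistd.h>

    /* Send 'count' bytes of 'filefd', starting at offset 0, to the connected
       socket 'sockfd' without copying the data through user space. */
    ssize_t send_whole_file(int sockfd, int filefd, size_t count)
    {
        off_t offset = 0;
        size_t left = count;

        while (left > 0) {
            ssize_t sent = sendfile(sockfd, filefd, &offset, left);
            if (sent < 0)
                return -1;      /* the caller checks errno */
            if (sent == 0)
                break;          /* unexpected end of file */
            left -= (size_t)sent;
        }
        return (ssize_t)(count - left);
    }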
The short, general version of this is simply that it became much easier to hit the hardware's performance limits than it was in the past.
Some of the old difference may have been (and may still be) pragmatic, in that kernel developers generally have no choice but to write more careful and more efficient code than typical user level code. Partly this is because user level code can take various easy ways out that aren't available to kernel code; because it is a constrained environment with various restrictions, the kernel forces developers to consider important issues that user level code can just brush under the carpet.
2013-01-20
Disaster recovery for computers is a means, not an end to itself
When you draw up disaster recovery plans for your organization's computers, there is something very important to remember: the ultimate goal of a DR plan for computers is to help the organization to keep working in the face of a disaster. On the one hand, this sounds obvious. On the other hand, there is a huge difference between allowing the organization's computers to keep working after a disaster and allowing the organization to keep working after a disaster. The difference is that there are plenty of other things that your organization may (also) need in order to keep functioning.
(Of course there are organizations where computing is the most important thing about them and is basically the only thing that they need.)
How this matters is that in the broad view, there is no point in the organization's computers being back if the organization is not otherwise functioning. There is especially no point in spending money (or preallocating resources) to make computing survive when the organization doesn't. Doing so is the equivalent of planning to carefully construct and paint a single wall of a house all by itself, without the rest of the house. It's a very nice wall, very well constructed, you've thought of all of the contingencies in building it, but it has no point. All your planning effort is wasted effort.
(It's easy to overlook this if your job is to care very, very much about that one wall.)
Or in short, computing disaster recovery is just one component of overall disaster recovery. It is often not complete by itself.
One consequence of this is that if the organization doesn't or can't have a disaster recovery plan for the other things that it needs to function, a computing DR plan may be more or less pointless. Or at least you don't need a comprehensive DR plan; all you need is a DR plan that covers the contingencies where the only important thing that the organization has lost is the computers. In other words, there may well be some risks that are not worth mitigating in your computer DR plan, because the same disaster would also destroy other things that the organization needs in order to function and there are no plans for recovering them.
(Again, disaster preparation is different from disaster recovery plans. You can be prepared to (eventually) recover from a building going up in flames without having a specific plan for it.)
On the other hand there are some organizations where the only thing that the organization really needs to keep going is its computers and maybe some people to answer the email. In these organizations, computing DR is organizational DR and it may well make sense to pay a lot of attention to a lot of risks and to try to mitigate them. Understanding what sort of organization you're in and what the organization's crucial resources actually are is a big part of good, sensible DR planning.
(The corollary of this is that there are no one size fits all answers for what risks you should consider in computing DR planning.)
2013-01-12
Runtime loading of general code usually requires general dynamic linking
Let me set the stage. You have a system that supports dynamic
linking and dynamic code loading in general, with a low-level
interface for this; for example, modern Unixes have dlopen(),
dlsym(), and so on. You also have a language runtime that's entirely statically
linked, and you want to have a program in this language environment
dynamically load some of its code at runtime, for example to implement
some sort of plugin system. One hypothetical approach to doing this is
to directly use the low-level system interface for dynamic loading; you
build the plugin code into a single object file, use system tools to
transmogrify it into a dynamically loadable one, and then dlopen() it
in your program, fish entry points out with dlsym(), and call them.
(Depending on what sort of FFI your language has, you might need some sort of low level shim to sit between your plugin's code and your main program's code.)
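In C terms, a minimal sketch of that approach might look like the following (the plugin file name and its plugin_init() entry point are names I've made up for illustration; on Linux you would also link the program with -ldl on older toolchains):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* Load the (hypothetical) plugin shared object. */
        void *handle = dlopen("./plugin.so", RTLD_NOW | RTLD_LOCAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }

        /* Fish the entry point out by name and call it. */
        int (*plugin_init)(void);
        plugin_init = (int (*)(void))dlsym(handle, "plugin_init");
        if (plugin_init == NULL) {
            fprintf(stderr, "dlsym failed: %s\n", dlerror());
            dlclose(handle);
            return 1;
        }
        printf("plugin_init returned %d\n", plugin_init());

        dlclose(handle);
        return 0;
    }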
All of this sounds nice, but is it likely to actually work if you try to implement it? My answer is 'probably not'. The problem is pretty simple to state: how does the plugin code get access to the runtime environment and other outside code?
Unless the plugin code does pure computation (without even memory
allocation), it needs access to the runtime code that does things like
memory allocation, operating system calls, DNS lookups, general IO
routines, and so on. Plus, it may also want access to other higher level
library code or the like.
The plugin can't safely statically link all of this code into itself
unless all of the runtime environment is carefully designed so that you
can run two (or more) independent copies of it inside the same process,
one in your main program and one in the plugin (this is often a big
problem for things like memory allocators and garbage collection). Using
the runtime environment from your main program requires connecting the
plugin to it; you need to either somehow build the plugin so that it
knows all of the right addresses in your main program (which binds it to
a particular build of your main program) or do your own relocation at
dlopen() time. You also need to ensure that your main program actually
includes all of the code that plugins will need, whether or not that
code is used by your main program.
What makes runtime code loading work without hassle is general support
for dynamic linking. When the plugin code is dynamically linked, it's
already set up to do runtime lookups of outside addresses (and possibly
runtime relocation). When you dlopen() it in a dynamically linked
program the dynamic loader code only has to connect it to the existing
symbol tables in your program (plus any new shared libraries that it
requires). And of course a chunk of code that's built to be statically
linked doesn't have any of this infrastructure built into it since
such infrastructure isn't necessary and in fact is usually undesirable
(partly because of the added complexity).
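As a hedged illustration of the difference, here is what a dynamically linked C plugin might look like. The main_program_log() routine is a hypothetical function exported by the main program; because the plugin is built as a shared object, the reference to it is left unresolved here and is bound by the dynamic loader at dlopen() time (on Linux with gcc, the main program needs to be linked with -rdynamic for its symbols to be visible to plugins).

    /* plugin.c: build with something like 'cc -fPIC -shared -o plugin.so plugin.c' */

    /* Provided by the main program; resolved at load time, not at build time. */
    extern void main_program_log(const char *msg);

    int plugin_init(void)
    {
        main_program_log("plugin loaded");
        return 0;
    }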
2013-01-09
It turns out I'm biased towards kernel iSCSI target implementations
In the process of writing up why we wound up using Linux for our iSCSI targets I ended up looking at the state of iSCSI target implementations on FreeBSD. It looks like they have one, but it seems to be a purely user-mode implementation (as a daemon); when I realized that, I discovered that I have an odd visceral bias against user-mode iSCSI targets and towards in-kernel implementations.
I find this odd, partly because I don't know where this bias comes from. iSCSI is a sufficiently complex protocol to be the kind of thing I'd normally like to have out of the kernel (in code that's easier to manage and deal with), I don't have any bad experiences with a user-mode iSCSI target implementation (I don't think I've ever used one), and in fact when I looked into ATA-over-Ethernet (which you might expect me to have the same reaction to) I actually considered writing my own user-level implementation for production use.
I don't think there's any strong technical reason to (strongly) favour an in-kernel iSCSI target implementation. Depending on kernel and user level APIs such an implementation might have an easier time of doing efficient disk and network access, but I don't think it's likely to have a huge effect (in part because there's lots of spare CPU with current kernel implementations). In theory an in-kernel implementation might do less copying around of disk buffers. But given the oddities of the protocol you may well not be able to achieve this with any particular iSCSI initiator, or even in general, simply because the protocol (plus TCP) already requires you to slice and dice things around.
Despite all of this, a user-level iSCSI target implementation just makes me reflexively nervous. For no good reason I trust kernel ones more. Maybe I'm not trusting a user-level implementation to be able to manage enough concurrency to drive a big system at full capacity.
(This is one of those entries where I don't have any answers, just a realization that I want to note down. If I get the chance, I now want to explore the performance of a good user-level iSCSI target implementation.)
2013-01-07
Why bad support matters (war story included)
Once upon a time, we wanted to buy an 'appliance' style machine. We did our research, made a choice, and put our choice through a bunch of qualification and testing in what we figured would be our production configuration. At the last minute we decided to try our test unit out in an odd configuration, one that we weren't planning to run in production, just to see how the overall system would look in that setup.
It exploded, by which I mean 'the appliance locked up'. We had support for the test unit, so we called up the vendor. What followed was one of the worst support experiences I've ever had, with the vendor doing everything it could to not support us short of outright telling us to go away: interminable conversations with people who didn't seem to take notes, the usual belief that we were clueless, the 'that's not a supported configuration' excuse (multiple times), and so on. By the end of the experience it was clear both that the vendor wasn't going to fix this bug and that the vendor wasn't going to provide any support at all (beyond repairing hardware faults) regardless of what they said.
(The latter is not unusual in practice regardless of what people say in theory.)
This led to me having the experience of sitting around a table with a bunch of other people here, trying to decide if we were going to stay with the appliance. I want to note here that we really wanted to use this appliance. It was clear to us that it was our best, most affordable choice for an appliance solution to the problem we were trying to solve; all of the other similar appliances we'd looked at either cost too much or were clearly worse than our choice.
(We'd also invested months of work into the appliance-based solution at this point, which was a consideration.)
Our decision was no, we couldn't use the appliance. You might think that this would be a slam dunk, but that's not the case; after all, the bug we'd found wasn't in anything we were planning to do in production. Why should we let one ultimately unimportant (to us) bug and a bad support experience disqualify an otherwise solid unit?
We reached this decision by deciding that where there's one bug, there's quite possibly going to be more than one. We had no guarantee that the next bug to turn up wouldn't be in something that we were using in production, and if we hit an 'appliance locks up' bug in production and got the same level of non-support we would be totally dead in the water. We would have spent tens of thousands of dollars building an environment that didn't work and that failed explosively just when the entire department was depending on it, and we'd have done so knowing that this was a real possibility. We decided that we could not take that risk, even if throwing out the appliance-based solution could cost us a lot of time and a not-insignificant amount of money.
(In the end it merely cost us time.)
That is why bad support matters, sometimes a lot. Using something from a vendor requires trust; you need to trust that the equipment is going to work and you need to trust that the vendor will fix it when it doesn't. Getting bad support in a situation like this means that both elements of this trust have been destroyed; the equipment (sometimes) doesn't work and the vendor is not going to fix it. The only time you can safely continue to use the equipment is when you are absolutely sure that it fully works in your specific situation and thus that you won't need vendor support.
(See also CommercialSupportNote and DefiniteSupportResolution for more ranting about vendor support.)
Sidebar: the commercial cost of bad support
Bad support cost this vendor tens of thousands of dollars worth of direct business (in addition to all of the time their support people, management, and sales chain spent in arguing with us and being yelled at) and who knows how much follow on business from us and other people within the university. At a stroke they went from a centerpiece of what was going to be a fairly visible core system to someone we recommended against having anything to do with.
(We've not bought anything else from them and I doubt we ever will, even though this incident is now years in the past. They have a reputation now.)