2010-07-31
The other peculiar effects of grant funding at universities
A long time ago, I wrote about the power that grant funding gives people at universities. But there's a flip side to grant funding, and it is that people with grant funding often don't really have money as the business world thinks of it.
From the outside, it looks like people with grant funding are rolling in cash; they get hundreds of thousands of dollars, or even million-dollar grants. From the inside, though, that money is almost entirely tied down for very specific things. Professors do not get to go to grant agencies, tell them 'I would like to do this promising research; it will take about $200K', and walk away with $200K in their research account that they can spend on anything that's necessary to do the research. Instead, both the grant requests and the grant approvals allocate all of that money to quite specific things: so much for buying servers, so much for storage, so much for network switches, so much to pay two people for a year, and so on.
So far, this may sound just like the budgeting process for a department in a company. But here's the kicker for grant funding: you are legally required to only spend the money on what it was approved for. Does it turn out that two people for a year isn't what you actually needed, or that you need more servers and less storage than you thought? Do you have a sudden emergency need for money in some other area of the project? Tough. You're pretty much stuck. There is no spending the money on what you need now and justifying it later, or even going to your boss and saying that you'd like to shift the specific allocations around and here's why.
(Naturally there is an entire cottage industry of figuring out how to slide what you really need into the grant's funding categories in a way that will pass auditing, if you ever get audited. For example, just how much disk space does a server have to have before you can say with a straight face that you bought it for storage, not as a compute server?)
One thing that combines somewhat unhappily with this is that grant agencies generally have restrictions on what sort of things they will fund. There is of course an art to describing what you really need in a way that the grant agency will approve funding for and that you can spend the resulting money on with a straight face.
(Sometimes they also effectively have restrictions on who you can buy from, where in theory you can buy from any vendor that is willing to go to the effort but in practice only a few vendors are interested enough to brave the bureaucracy.)
There are sources of relatively unconstrained grant funding, but they are generally not very large when compared to the constrained sort. Generally all of the big-ticket grants that sound so impressive are going to come with lots of restrictions on what that money can actually be used for.
(Ie, it is not so much money as somewhat fuzzy things that haven't shown up on the loading dock yet.)
2010-07-25
iSCSI versus NFS
These days, an increasing number of storage appliances can do both NFS fileservice and iSCSI (generally using the same underlying pool of disk space), and as a result I'm seeing more and more people who are wondering which one they should use.
The summary of my answer is that iSCSI is a SAN technology and NFS is a fileservice technology. If you want to add storage to a single machine, iSCSI will work acceptably well; if you want to share files among a bunch of machines, you want NFS. If you just want a single machine to have access to a filesystem to store files, I still think that NFS is better.
(One wild card in this is your storage appliance's management features, like snapshots, quotas, and so on, which may well differ significantly between iSCSI and NFS.)
Like all SAN technologies, iSCSI itself won't let you share a disk between multiple client systems; if you need that, you'll need to layer some sort of cluster filesystem on top. At that point you're almost certainly better off just using NFS unless you have some compelling reason otherwise. Hence NFS is the right answer for sharing files between multiple client machines.
(I find it hard to believe that iSCSI from a storage appliance plus a cluster filesystem running on the clients will have any sort of performance or management advantage over NFS from the storage appliance, but I've been surprised before. If the storage appliance's NFS server is terrible but its iSCSI target is good, the simpler solution is to have a single client machine be an NFS server for the storage.)
If all you want is ordinary file service for a single machine, I think that NFS is a better answer because it is generally going to be simpler and more portable. With NFS you can expand to giving multiple machines access to the files (even read-only access) and any machine that can speak NFS can get at the files. With iSCSI, you are pretty much locked to a single machine at a time, and you need a machine that both talks iSCSI and understands the filesystem and disk partitioning being used on the iSCSI disk; in many cases this will restrict you to a single operating system.
(There are cases, such as virtualization hosts, where your client machines are going to be doing exclusive access and really want to be dealing with 'real' devices, and having them use NFS would just result in them faking it anyways by, eg, making a single big file on the NFS filesystem. In that sort of situation I think it makes sense to use iSCSI instead of NFS for the single machine access case.)
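To make the operational difference concrete, here is a rough sketch of what attaching each kind of storage looks like on a Linux client with open-iscsi; the server name, export path, target IQN, and device name are all invented placeholders, and the details will vary with your appliance:

```shell
# NFS: one mount and you're done; any NFS-capable client can do the same,
# possibly at the same time. (filer.example.com and /export/data are
# hypothetical names.)
mount -t nfs filer.example.com:/export/data /mnt/data

# iSCSI (open-iscsi): discover and log in to the target, then treat the
# LUN as a local disk -- you still have to make a filesystem on it and
# mount it, and only this one client can safely use that filesystem.
iscsiadm -m discovery -t sendtargets -p filer.example.com
iscsiadm -m node -T iqn.2010-07.com.example:data -p filer.example.com --login
mkfs.ext4 /dev/sdb          # the newly appeared LUN; device name varies
mount /dev/sdb /mnt/data
```

The extra steps on the iSCSI side are where the 'locked to one machine and one operating system' effect comes from: the filesystem on the LUN is the client's business, not the appliance's.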
It is tempting to say that iSCSI is better because it lets the client treat the storage like any other physical device, without having to worry about all of the networking issues that come up with NFS. This is a mistake; your iSCSI disks are running over a network and thus all of the networking issues are still there; they have just been swept under the rug and made more or less inaccessible. Pretending that they are not there does not make them go away, and in fact the history of networking protocols has shown over and over again that pretending the network isn't there doesn't work in the long run.
(Consider the history of RPC protocols that attempt to pretend that you're just making a local function call. Generally this doesn't go well, especially once latency and network glitches and so on come up. Things happening over a network have failure modes that rarely or never come up for purely local actions.)
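As a toy illustration of this point (mine, not from any particular RPC system): a 'remote call' has whole classes of failure that a local call simply cannot have, and transparency just means the caller discovers them the hard way. The hostname below uses the reserved `.invalid` TLD so the lookup is guaranteed to fail:

```python
import socket

def local_add(a, b):
    # A local call: the only failures are the function's own errors.
    return a + b

def remote_add(a, b, host="rpc.example.invalid", port=9999, timeout=0.5):
    # A 'transparent' remote version still has to cross the network, so DNS
    # failures, connection refusals, and timeouts are all possible outcomes
    # that local_add() can never produce.
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(f"{a} {b}\n".encode())
            return int(s.recv(64).decode())
    except OSError as e:
        # The caller cannot pretend the network isn't there.
        raise RuntimeError(f"remote call failed: {e}") from e

print(local_add(2, 3))        # always 5
try:
    remote_add(2, 3)
except RuntimeError as e:
    print("remote failure:", e)
```

The point is not the wire format (which is deliberately trivial here) but the shape of the error handling that the network forces on the caller.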
2010-07-14
The challenges of shared spares in RAID arrays
It's getting popular these days for RAID implementations to support what I've heard called 'shared spares': spare disks that are shared between multiple RAID arrays, so that they can be used by any array that happens to need them. This is an attractive idea because it gives you better protection against moderate problems than you could get with dedicated spares. (If you have large problems you run out of spares, of course.)
The problem with shared spares is that they are pretty much intrinsically hard to do well in the general case, once you get beyond simple configurations and start working at larger scales. I'll use our fileservers as an example.
Our fileservers have 'RAID arrays' (ZFS pools) of varying sizes that are made up of some number of mirror pairs from two different iSCSI backends per fileserver. Suppose a disk fails in some pool; clearly, if possible we want to replace that disk with another disk from the same iSCSI backend so that we maintain cross-backend redundancy.
Suppose that several disks fail at once, in a situation where we have too few suitable spares to restore all affected pools to full redundancy. In this situation we want as many pools as possible restored to full redundancy, as fast as possible; we'd rather have two smaller pools be fully redundant than one much larger pool be 2/3rds redundant (two out of three mirrors restored to full operation).
Large setups are like this: their disks don't have a flat topology, and they have policy issues surrounding what should be done in situations with limited resources or what should be prioritized first. I'm sure that you can support all of this in a general RAID shared spares system if you try hard enough, but you're going to have a very complex configuration system; it'll practically be a programming language.
(In theory issues of selecting the right spare disk just need a sufficiently smart general algorithm that knows enough or is told enough about the real disk topology. But policy issues of what gets priority can't be sorted out that way.)
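To make the policy concrete, here is a toy sketch of the selection rules described above (the data model and all names are invented for illustration): prefer a spare from the same backend as the failed disk to preserve cross-backend redundancy, and when spares are scarce, heal the pools that need the fewest replacements first so that as many pools as possible reach full redundancy:

```python
def assign_spares(failures, spares):
    """failures: {pool name: [backend of each failed disk]}
    spares: {backend name: count of free spare disks}
    Returns {pool: [backend chosen for each replacement]} covering only the
    pools we can fully heal, smallest repair jobs first."""
    plan = {}
    # Policy: heal the pools with the fewest failed disks first.
    for pool in sorted(failures, key=lambda p: len(failures[p])):
        choices = []
        for backend in failures[pool]:
            # Policy: prefer a spare on the failed disk's own backend so we
            # keep cross-backend redundancy; otherwise take any free spare.
            if spares.get(backend, 0) > 0:
                pick = backend
            else:
                pick = next((b for b, n in spares.items() if n > 0), None)
            if pick is None:
                break
            spares[pick] -= 1
            choices.append(pick)
        if len(choices) == len(failures[pool]):
            plan[pool] = choices
        else:
            # Can't fully heal this pool; give back its tentative spares.
            for b in choices:
                spares[b] += 1
    return plan
```

With one free spare on each of two backends, a one-disk failure in a small pool and a three-disk failure in a big pool, this heals the small pool completely rather than leaving both pools partially redundant. Note how much policy is baked into even this trivial version; a general-purpose RAID implementation would have to make all of it configurable.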
Sadly, large systems with lots of RAID arrays are also exactly the situation where you want shared spares. From this I conclude that your shared spares system should be modular, so that sites have a place to plug in different and more sophisticated methods of selecting what disk to use and what RAID arrays to heal first (or at all).
2010-07-10
People forget exceptions
One of the possible replies to the problem of Unix programs exiting non-zero for clever reasons is that all of these special cases and clever reasons are clearly documented in the manpages for the various commands, so all of these problems are not the fault of the authors of the commands.
This view is objectively wrong. The clear, well-established truth is that people do not remember weird little exceptions and funny little corner cases in what your program does unless they use your program all of the time (in fact, unless they frequently run into those exceptions). What your documentation says doesn't matter; people will forget your documentation too, and just assume that your program does whatever they think it should, with no special cases. When they don't entirely remember what your program does, they will skim your documentation to refresh their memory (or to find just what they need) and miss all of your little notes about things.
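The classic example (my illustration, not specific to the original discussion) is grep, whose manpage clearly documents that 'no lines matched' exits 1 even though nothing went wrong; scripts written with `set -e` or `&&` chains trip over this constantly because people forget it:

```shell
printf 'hello\n' > /tmp/grep-demo.txt
# Exit status 0: a line matched.
grep -q hello /tmp/grep-demo.txt; echo "match: $?"
# Exit status 1: no line matched -- documented, not an error, but still
# non-zero, which is exactly the kind of special case people forget.
grep -q goodbye /tmp/grep-demo.txt; echo "no match: $?"
```

This prints `match: 0` and then `no match: 1`.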
(And none of this should be surprising if you have much exposure to cognitive psychology and similar issues, and frankly everyone who builds computer systems should have that exposure; how humans actually behave is a big part of practical HCI, which system-builders hopefully care about for the obvious reasons.)
This especially applies to Unix tools, because there are just too many Unix tools for people to remember the special cases of any particular one. In fact, this generalizes: the less any particular thing is used, the less its particular quirks and oddities will be remembered by people.
2010-07-03
Returning to the era of 'duplicated' Ethernet addresses
Once upon a time back in the old days, Sun caused quite a stir by deciding that they would make the Ethernet addresses for their hardware be an attribute of the machine, not of the network interface. Or to translate, Sun machines used the same Ethernet address on all of their interfaces instead of doing what everyone else did, which was to have a different Ethernet address on each interface. This is spec-legal but caused various sorts of annoyances for system administrators and network designers; among the set of people who care about this stuff at all, there were many who felt that Sun had made the wrong decision and life was just simpler when multi-homed machines had different MAC addresses on different interfaces.
Over time this became less important as Sun's servers became less popular, and today this particular bit of trivia is probably long forgotten in most places. Even Sun's x86 servers used per-interface Ethernet addresses (and for all I know, Sun gave up on this in the end on their SPARC machines too).
Well, guess what. Those days are back, courtesy of VLAN-aware hosts. Such a machine has only a single physical network interface with a single Ethernet address, but that single interface is used by multiple networks through VLAN tagging. Outgoing Ethernet frames on each VLAN generally have no choice but to use the physical interface's Ethernet address, and so the host will be seen as having the same Ethernet address on different VLAN networks.
(There might be a use for this in detecting whether a host is using VLANs or physically separate network interfaces, assuming that you even care.)
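On a Linux host, you can see this effect directly with `ip` (the interface name and VLAN IDs here are invented, and this needs root and a real NIC to actually run):

```shell
# Create two VLAN sub-interfaces on one physical NIC.
ip link add link eth0 name eth0.10 type vlan id 10
ip link add link eth0 name eth0.20 type vlan id 20
# Both sub-interfaces report the same MAC address as eth0, so hosts and
# switches on VLAN 10 and VLAN 20 both see this machine with one and the
# same Ethernet address.
ip -o link show eth0.10 | grep -o 'link/ether [^ ]*'
ip -o link show eth0.20 | grep -o 'link/ether [^ ]*'
```

(You can set a different MAC on a VLAN sub-interface by hand, but essentially nobody does, so in practice the shared address is what your network sees.)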
In short: Ethernet addresses are now back to being only unique on their particular network, not globally unique across your entire set of networks. Hopefully no switches will explode.
(However, you're less likely to run into problems with this than people were in the Sun era. Back then, problems ensued because there were legitimate situations where people wanted to connect two network interfaces from the same machine onto the same network and tell them apart.)