2013-07-27
The easy path versus the virtuous path (in system setup)
For reasons beyond the scope of this entry I've decided that it's time for me to get my own server somewhere. At one level this is easy; any number of places will give you, say, Ubuntu 12.04 machines and I'd have no problem setting up the relatively minimal web server environment I want from there.
At another level, this is not the virtuous way to set up your server today (especially if it's a virtual server). There are several tools to automatically provision virtual servers through various places' infrastructure and then all sorts of tools to customize the configuration and set up a standardized, repeatable environment. Doing it right, even for a one-off personal server, would be to use these tools instead of just trying to note down everything I did by hand. This would give me repeatable builds for recreating the server in case of a disaster, testing or moving to another provider, and upgrading the operating system (which is sure to be needed someday).
The problem with taking the virtuous path right now is that it would take me a lot more work. I could set up a virtual server by hand tomorrow (and then recreate it from scratch just to make sure that my notes were correct). Simply picking a configuration automation system could take me days of research, never mind learning my chosen system well enough to set everything up and figure out how to embody my somewhat odd requirements in it.
(If I already had a chosen automation system it would be a lot easier, of course. Then it would just be a matter of writing the necessary recipe and that should be simple.)
If I was doing this as a learning exercise it would be one thing (then I might even try out several different systems). But I'm not. My goal is to get my server up and doing something useful without leaving me with a big annoying mess that will cause me pain in the future. And for that, the virtuous path is looking awfully thorny right now.
2013-07-24
Why vendor prices are important things to have
I have been looking into various bits of hardware lately and I've noticed an irritating trend: equipment vendors that don't list any sort of price on their website. To get pricing information you need to find their VARs or resellers and then find a reseller that lists those prices online. This is absurdly difficult and annoying.
(I award a special bonus prize to vendors whose VAR reseller page contains links to websites that aren't there. Yes, I've run into this.)
The reason I care about prices is simple; prices are part of how I know whether it's worth looking at a vendor's products. If your 24-port 10G Ethernet switch costs $20K, it may be an excellent switch with many fine benefits but we can't afford it. It's purely a waste of my time (and the vendor's time) for me to investigate it any further.
Hiding prices from me has two effects. The obvious one is that it makes me waste my time this way. The less obvious one is that it encourages me to guess about what the prices are so that I can avoid wasting my time. This guessing is mostly not to the benefit of vendors; if you don't list prices I'm much more likely to guess that your prices are too high than anything else.
(It doesn't help vendors if I guess low and keep looking into something that turns out to be too expensive. The odds that your attractive but expensive product will increase our available budget are approximately nil, so I can't buy it however neat it is.)
I understand that vendors don't want to turn their product pages into a shopping site. But in practice pretty much everyone actually does have an MSRP so you might as well give me at least an indication of what it is. Even if it's too high for us today, it will make me think better of you in the future (since you were straightforward about it).
By the way, I have a violent reaction to vendors who are withholding pricing in an attempt to force me to talk to salespeople (or at the very least to give them my email address). These vendors are consciously making my life more annoying and harder in order to make their own lives easier (to put it one way); you can imagine how that makes me feel about them. I am rather uninterested in dealing with people whose first move is to inconvenience me, especially when I suspect that their future moves will be to continue to inconvenience me in various ways (pestering phone calls, being sent vendor email blasts, etc etc).
2013-07-22
External disk enclosures versus disk servers
We're finally looking at renewing our fileserver infrastructure because the hardware is five years old and we're running a version of Solaris that is now laughable. As part of that we're revisiting decisions we made in the original version to see if we still like them. One of those decisions is the issue of how we attach a fair number of data disks to a server.
Generally you have two decent options. You can either get external disk enclosures and connect them to a generic server somehow (these days with your choice of eSATA or SAS) or you can get server cases with room for ever increasing amounts of disks (cabled up to the motherboard somehow). Our current iSCSI backends are built with external disk enclosures, but other groups here use disk servers and have plenty of good experience. I believe that disk servers are generally cheaper although I haven't looked at the numbers.
(The reasonably famous Backblaze Storage Pod design is an example of a disk server.)
Our current plan is to continue with some variety of external disk enclosures, for a few reasons:
- Flexibility in that external disk enclosures can be used with any
server you like (provided that the server can take an appropriate
interface card). If the server model you like becomes unavailable
you can switch to another and in an emergency you can connect an
enclosure to almost anything.
As part of this external disk enclosures may well have a longer lifetime. A disk server is strongly tied to its motherboard and motherboards become obsolete within a few years. An external disk enclosure can be used (and reused) with many generations of servers without problems.
- It's easier to fix many hardware failures. External disk enclosures
are anonymous components so if one fails for some reason you can
put your spare into place then swap the disks in and you're done.
Dealing with a failure in integrated hardware is more complicated
because there are non-anonymous parts of it (like Ethernet ports
with specific MAC addresses).
(This may be a bias due to our experience; we've mostly had disk enclosures fail, not servers. Maybe a disk server would be as reliable as our servers, not our disk enclosures.)
- Less has to change if you have some servers with a different number
of data disks. We have some servers with a few SSDs instead of a
number of HDs; the only real difference is what sort of external
disk shelf they use (and external disk shelves are anonymous).
Among other things, this makes the spares situation easier.
- What we want in total disk count doesn't seem to fit well in disk servers. We're looking at 12 to 16 data disks (ideally the latter) plus two system disks, and we want all of them to be hot-swappable. This doesn't seem to fit well with what I could see in disk servers; the next jump after 16 hot-swappable bays is 24, which is too many for us in a single server.
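To make the bay arithmetic concrete, here is a minimal sketch. It assumes the two system disks also need hot-swap bays (as we'd like) and that the only chassis sizes on offer are the 16-bay and 24-bay ones mentioned above; the function name and the list of sizes are made up for illustration.

```python
# Chassis sizes (hot-swap bay counts) taken from the text above; real
# product lines may offer other sizes, so treat this list as an assumption.
CHASSIS_BAYS = [16, 24]
SYSTEM_DISKS = 2


def smallest_chassis(data_disks, system_disks=SYSTEM_DISKS, chassis_bays=CHASSIS_BAYS):
    """Return (bays, unused_bays) for the smallest chassis that holds every
    disk in a hot-swap bay, or None if no listed chassis is big enough."""
    needed = data_disks + system_disks
    for bays in sorted(chassis_bays):
        if bays >= needed:
            return bays, bays - needed
    return None


if __name__ == "__main__":
    for data_disks in (12, 16):
        print(data_disks, "data disks ->", smallest_chassis(data_disks))
```

Under these assumptions 12 data disks fit a 16-bay chassis, but the 16 data disks we would ideally like push us into a 24-bay chassis with a number of bays left empty.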
Our overall feeling is that using external disk enclosures in our current environment has been a not insignificant win and that we value the flexibility and so on that they've given us.
(My impression is that disk servers use somewhat less physical space. This is currently not an issue for us.)
2013-07-07
Sometimes the right thing to do is nothing (at least right then)
As system administrators we tend to have a tropism towards heroism. If there is a problem and something that can be done, then by gum we feel that we should do it. No matter what the exact circumstances, sitting on our hands feels very wrong.
We just lost an entire iSCSI backend for one of our fileservers. This has happened before and it took only a brief amount of sysadmin work to deal with, but this time around three things are different. First, this happened at 11pm on a Sunday night. Second, there isn't anyone physically in the office. Third, there are some anomalies about the current state of our hot spare iSCSI backend.
(As a corollary to the second issue I don't know exactly what went wrong with the iSCSI backend, beyond all of its data disks disappearing from the system. There are any number of potential causes.)
I could heroically spring into action anyways; imbibe a bunch of caffeine, either work remotely or race in to the office, ignore the anomaly as unimportant or kludge things together. But as I started to think about this and plan what I'd need, a little voice in the back of my head piped up to ask: are you crazy?
Several rested, alert sysadmins are going to be in the office in approximately ten hours (possibly less). Thus, putting the hot spare backend into production right now will gain us at most a ten hour head start on resynchronizing several terabytes of disk space. This is not completely insignificant but it's also not particularly huge (I expect the resync to take days, even if we run it flat out and accept the impact on users). Set against that moderate potential gain are the large potential downsides if something goes wrong for any number of causes.
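As a rough illustration of how small the head start is, here is a sketch of the arithmetic. The resync durations are purely hypothetical stand-ins for 'days' (I don't have a real figure), and the function name is invented for this example.

```python
def relative_gain(head_start_hours, resync_hours):
    """Fraction by which acting now shortens the time until the resync is done.

    Waiting means finishing at head_start_hours + resync_hours from now;
    acting now means finishing at resync_hours from now.
    """
    return head_start_hours / (head_start_hours + resync_hours)


if __name__ == "__main__":
    HEAD_START_HOURS = 10  # from the text: at most a ten hour head start
    # Hypothetical resync durations; the real figure is only known to be 'days'.
    for resync_days in (2, 3, 4):
        gain = relative_gain(HEAD_START_HOURS, resync_days * 24)
        print(f"{resync_days}-day resync: finishes {gain:.0%} sooner if started now")
```

The longer the resync takes, the less those ten hours matter, and none of this accounts for the chance of making things worse.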
One of the rules of sysadmin crisis response should be do no harm, and one of our jobs is to evaluate our heroic impulses and urges against that standard. Sometimes the right answer is to do nothing because we cannot be confident enough that our actions are sure to improve the situation instead of making it worse.
Am I confident that I'm making the right decision here? No. Not at all. It's almost certain that I could put the hot spare backend into production without problems and then we'd have a ten hour head start. But that 'almost' stays my hand.
(Note that we don't have any requirement to provide crisis response outside of working hours. In many organizations the sysadmins are on the hook for out of hours responses and this would be considered a sufficiently important crisis to force people into action. I think that those organizations may be making a mistake for reasons connected to why me doing things could be a bad idea, but that's another entry.)
A mistake to avoid with summer interns
If you're part of a university and you have both some spare money and some work that you'd like to get done but don't have the time and energy for with your existing staff, one traditional solution is to hire a student or two for the summer. We've done this in the past and in retrospect we made a mistake or two in the process. Today I want to write about it, partly so that I can hopefully avoid mistakes in the future.
The big mistake to avoid is this: do not abandon your summer intern in a corner. Even if your interns are perfectly competent (which ours have been) and are working on completely self-contained projects, there are two things that will go wrong here.
The obvious problem is that what you will get at the end of the summer is a black box, because you won't have been involved in developing or doing whatever your intern worked on. Even if your intern has meticulously documented everything about it, you're going to have to read that documentation first, and the odds are very good that the documentation will turn out to be not good enough. This is especially likely if you don't read the documentation until the end of the summer, when the intern is leaving.
The less obvious problem is that there probably will be design issues with how the project is constructed and how it works. Your summer intern is likely quite competent, but they are still an inexperienced student, not an experienced sysadmin or programmer like you are (and in particular they're not familiar with your specific environment and so on).
In retrospect, none of this should be surprising to me. We really need to treat summer interns as (very) junior people that we actively supervise and work with, not as magic black boxes where we insert requests and get perfect results back out. The corollary, which I hope I remember in the future, is that it's a mistake to get summer interns if what we really want or need is a black box.
(Abandoning interns in a corner may sound crazy, but trust me, it came about in a very natural way. When you're already busy yourself it takes active work to carve out time to work with an intern, and because you're busy doing so feels like an imposition that's slowing you down. It's very tempting to think that you don't really need to or just to let it slip, although this is not a good thing.)
PS: there are other reasons not to do this that have to do with the intern's experience, but that's a topic for another entry.