Wandering Thoughts archives

2007-06-29

The stupidity of being nickel-and-dimed by vendors

We have some current rackmount servers from Sun and HP. They both come with convenient, built-in remote management systems that are basically equivalent from my perspective; each has remote power control, KVM over IP, and virtual media. There's only one important difference between the two: Sun gives me this for free as part of the base server, but HP wants a few hundred dollars for a license that enables the useful features (KVM over IP and virtual media).

I suspect you can guess whose servers are much, much more attractive to me.

I'm sure HP feels that this is a small expense when compared to the overall cost of their servers. They're wrong, because it doesn't work that way. Optional costs are subject to ruthless pressure, and unless you work in an environment with a lot of server turnover, it is hard to argue for spending a few hundred dollars extra per server merely to save a few visits to the machine room.

The whole thing feels especially annoying because it wouldn't cost HP any more to just give us the whole thing. In other words, HP is making my life more difficult merely to try to get some more money from us. Although it's nothing new, I still resent being nickel-and-dimed by vendors, and the only thing HP is really doing is shooting itself in the foot; I would now much rather buy Sun servers than the equivalent HP servers.

(The most common form of being nickel-and-dimed by server vendors is having to buy the smallest disks they sell, at marked-up prices, just to get the special drive sleds necessary to put real disks in their servers. We would be happy to buy just the drive sleds at fair prices; it would even be convenient to have spares, so we could have our cold spare drives pre-assembled and ready to go.)

NickledAndDimed written at 15:02:20

2007-06-15

Why I am in system administration instead of programming

One way to put it is that in system administration, an issue that takes all of your time and energy for a week is a big thing, and one that takes a month is generally a crisis. The same generally cannot be said of programming except at a very small scale; while you may clearly be going up steps, the top of the path is generally not in sight.

Or in other words, you accomplish visible things much faster in system administration, and this gives you real, tangible signs of progress and accomplishment; you can fix a problem and make a user happy in a day, and you often do. Programming, well, it moves more slowly, and thus your rewards come more slowly, sometimes very much so.

(Generally. Web programming can be an exception, and interestingly so can internal applications, because you can deploy new versions much faster. If you are working in an environment where you can actually implement a new feature in a few days, you can roll it out soon thereafter and have the good feeling of seeing it help users. I note that much sysadmin programming is for what I think of as small internal applications, and thus you keep getting your fast feedback.)

All of this ties into the idea that feedback makes us feel that we're getting things done. Because system administration problems are generally comparatively small, you get much faster feedback and rewards than in programming.

WhySystemAdministration written at 23:52:50

2007-06-07

Why I hate firewalls, especially stateful firewalls

I hate firewalls because every firewall between two machines trying to talk to each other is another place for things to go wrong, which means another place to check (somehow) when things do go wrong.

Stateless firewalls at least have the grace to have consistent and predictable behavior; if something is wrong, it is going to be wrong all the time. Stateful firewalls make your life exciting by varying their behavior based on an ever-changing flux of generally unpredictable and inaccessible information, so things can go wrong now and right ten minutes from now, or vice versa.
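To make the difference concrete, here is a toy sketch in Python. Nothing in it corresponds to any real firewall; `Packet`, `stateless_allows`, `StatefulFirewall`, and the ten minute timeout are all names and numbers I made up for illustration. The stateless check depends only on the packet and a fixed rule set, while the stateful one also depends on what the firewall happens to have seen recently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    inbound: bool  # True if the packet arrives from the outside


def stateless_allows(packet: Packet, allowed_outside: frozenset) -> bool:
    """The answer depends only on the packet and the fixed rules."""
    if not packet.inbound:
        return True
    return packet.src in allowed_outside


@dataclass
class StatefulFirewall:
    """Hypothetical model: inbound packets pass only while a matching
    outbound flow has been seen within the last `timeout` seconds."""
    timeout: float = 600.0
    # Maps (inside host, outside host) to the time of the last outbound packet.
    flows: dict = field(default_factory=dict)

    def allows(self, packet: Packet, now: float) -> bool:
        if not packet.inbound:
            self.flows[(packet.src, packet.dst)] = now
            return True
        last_seen = self.flows.get((packet.dst, packet.src))
        return last_seen is not None and now - last_seen <= self.timeout
```

With the stateless function, the same packet gets the same answer every time you ask. With the stateful model, the identical inbound packet is dropped before any outbound traffic, passed shortly afterwards, and dropped again once the flow entry ages out, which is exactly the sort of behavior that is hard to diagnose from the outside.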

As alluded to, figuring out which firewall ate your packets is not a trivial exercise. The downside of transparency is invisibility, and even with a stateless routing firewall the tools required to probe its behavior from the outside are quite technical and not necessarily complete. And that's the best case.

(Even on the inside, the tools are technical. You are doing well if your own firewalls tell you about the packets that they drop, reject, or modify.)

FirewallHate written at 23:39:33

2007-06-06

Why you want a filesystem consistency checker

Filesystem consistency checkers have historically had three overlapping purposes:

  1. to patch up the damage done when a machine was shut down part way through modifications to the filesystem.

  2. to find and fix up problems caused by corrupted disk blocks (whether they're caused by a dying disk, a controller error that scribbled random data on a track, or whatever).

  3. to check for and repair structural errors created by operating system bugs.

The first purpose, fixing inconsistent filesystems after an unclean shutdown, is mostly or entirely obsolete these days, because people have moved to journaled filesystems that never allow themselves to get into an inconsistent state to start with. The other two purposes remain valid, because systems are fallible at all levels.

In theory there is no need to have the filesystem consistency checker be a separate program, especially since the kernel filesystem code has to do some consistency checking itself. In practice system administrators find a standalone program to be more reassuring, partly because it gives them more control over what can be a nerve-wracking process (especially if you suspect that you have problems of the third sort).

It is worth noting explicitly that no amount of block checksumming can protect you against the third sort of problem. Checksums only tell you that the data that made it to disk is the data the operating system thought it was putting there; they can't tell you whether the data itself is completely correct, and so they can't protect against logic errors.
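Here is a small sketch of that point in Python. It is not how any real filesystem works; the JSON "inode", the `link_count` rule, and the function names are all invented for illustration, and CRC32 from the standard `zlib` module stands in for whatever block checksum a filesystem might use.

```python
import json
import zlib


def write_block(payload: dict) -> tuple[bytes, int]:
    """Serialize a block and checksum exactly what is being written."""
    data = json.dumps(payload, sort_keys=True).encode()
    return data, zlib.crc32(data)


def checksum_ok(data: bytes, checksum: int) -> bool:
    """Did the bytes on disk survive unchanged?"""
    return zlib.crc32(data) == checksum


def structure_ok(data: bytes) -> bool:
    """Toy fsck rule: the recorded link count must match the number of
    directory entries that refer to this inode."""
    inode = json.loads(data)
    return inode["link_count"] == len(inode["referenced_by"])


# A hypothetical operating system bug records the wrong link count.
# The checksum is computed over the already-wrong data, so it matches.
data, checksum = write_block(
    {"inode": 12, "link_count": 3, "referenced_by": ["/a", "/b"]}
)
print("checksum matches:", checksum_ok(data, checksum))
print("structure is consistent:", structure_ok(data))
```

The checksum faithfully certifies the bad data, because it was bad before it was ever checksummed. Only something that understands the structure, which is to say a consistency checker, can notice.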

WhyFsck written at 23:07:53

2007-06-04

Why we need our SAN RAID controllers to support logical drives

Not all SAN RAID controllers support the idea of logical drives, where you can make a bunch of disks into a storage pool and then carve the storage pool up into software-managed pieces; some prefer the simpler and more straightforward approach of just exporting the pool itself and staying out of the space management business.

Unfortunately, we need logical drives. The reason is that logical drives are the only sensible way to dynamically split the space on a RAID controller between multiple fileservers, because they are the only way to split out relatively small amounts of the controller's disk space with any efficiency and ease.

If you can only split whole disks out to fileservers and you want protection against single drive failures, the minimum allocation to a fileserver is two disks (RAID-1, a 50% overhead). If you want less wasted space, you get to make significantly larger allocations. In addition, adding or removing disks from a RAID set is an expensive operation that requires shuffling a lot of data around. When it can be done at all (which is not guaranteed, especially for shrinking a RAID set), it's not going to be fast.
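The arithmetic behind this is simple enough to sketch in Python. I am assuming equal-sized disks and a RAID set that gives up exactly one disk's worth of space to survive a single drive failure (a two-disk mirror, or a single-parity layout such as RAID-5, which the text above does not name); the function names are mine.

```python
import math
from fractions import Fraction


def redundancy_overhead(disks: int) -> Fraction:
    """Fraction of raw space lost to redundancy when `disks` equal-sized
    disks give up one disk's worth of space to survive one failure."""
    if disks < 2:
        raise ValueError("need at least two disks to survive a drive failure")
    return Fraction(1, disks)


def min_disks_for(max_overhead: Fraction) -> int:
    """Smallest whole-disk allocation whose overhead is at or below
    `max_overhead`."""
    if not 0 < max_overhead < 1:
        raise ValueError("max_overhead must be strictly between 0 and 1")
    return max(2, math.ceil(1 / max_overhead))


for target in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 10)):
    disks = min_disks_for(target)
    print(f"overhead <= {target}: at least {disks} disks per fileserver")
```

Under these assumptions the overhead is one over the number of disks, so every step down in wasted space means handing a fileserver a noticeably bigger chunk of whole disks at once.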

By contrast, logical drives can split off relatively small amounts of disk space, and they do it easily. Because they are carving free space out of a pool of it, no data has to be shuffled around; because they are using the pool, they are as efficient as the pool itself.

As a result, if you want to be able to shuffle unused disk space in your pool of space on the SAN between fileservers without having to plan it well in advance, you need the SAN RAID controllers to support logical drives. RAID controllers without logical drives are only really good for situations where you can statically allocate things at setup time, especially when you are not splitting up the space very much (otherwise the overhead eats you alive).
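For contrast with whole-disk allocation, here is a rough Python model of what carving logical drives out of a pool looks like from the space management side. `StoragePool` and its methods are hypothetical and do not match any particular controller's interface; the point is only that creating or deleting a logical drive is bookkeeping against free space, with no existing data being moved.

```python
class StoragePool:
    """Toy model of an already-redundant pool that hands out logical drives."""

    def __init__(self, capacity_gb: int) -> None:
        self.capacity_gb = capacity_gb
        # Maps logical drive name to its size in GB.
        self.logical_drives: dict[str, int] = {}

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.logical_drives.values())

    def create(self, name: str, size_gb: int) -> None:
        """Carve a logical drive out of free space; nothing is reshuffled."""
        if name in self.logical_drives:
            raise ValueError(f"logical drive {name!r} already exists")
        if size_gb <= 0 or size_gb > self.free_gb:
            raise ValueError(f"cannot allocate {size_gb} GB from the pool")
        self.logical_drives[name] = size_gb

    def delete(self, name: str) -> int:
        """Return a logical drive's space to the pool and report its size."""
        return self.logical_drives.pop(name)


pool = StoragePool(capacity_gb=1000)
pool.create("fileserver1-a", 50)   # a small allocation is fine
pool.create("fileserver2-a", 120)
pool.delete("fileserver1-a")       # the space can now go to any fileserver
```

The allocation unit here is whatever size you ask for, not a whole disk, and the redundancy overhead was paid once when the pool was built.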

LogicalDisksNeed written at 00:47:06