2010-07-25
Why sysadmins almost never replace distribution packages
I mentioned this in passing recently; today I feel like elaborating on why replacing a distribution package with your own locally built version of something is a big pain in the rear and thus why sysadmins almost never do it.
(I'm going to assume here that you're familiar with building from source in general.)
First, you have two options for how to do this; you can build from source and just do a 'make install', or you can actually (re)build a new package and install it. Doing a raw build from source is horrible for all of the reasons that we have modern package management, plus you're going behind the back of your package management system and this generally doesn't end well.
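As a small illustration of what 'going behind the back of your package management system' means, here is a minimal sketch (assuming a Debian-style system with dpkg; the list of files is purely hypothetical) of checking whether a 'make install' would overwrite files that some package already owns; 'rpm -qf' plays the same role on RPM-based systems:

    #!/usr/bin/python3
    # Sketch: ask dpkg whether the files a 'make install' would put in place
    # are already owned by a distribution package. Assumes a Debian-style
    # system; 'rpm -qf <path>' does the same job on RPM-based systems.
    import subprocess

    # Purely hypothetical list of files that our build would install.
    paths = ["/usr/bin/example-tool", "/usr/lib/libexample.so.1"]

    for path in paths:
        # 'dpkg -S' reports which package, if any, claims a given file.
        res = subprocess.run(["dpkg", "-S", path],
                             capture_output=True, text=True)
        if res.returncode == 0:
            print("WARNING: already owned by a package:", res.stdout.strip())
        else:
            print("not claimed by any package:", path)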
Rebuilding a package is superficially more attractive, but it causes a number of issues. First, you need to know how to build and rebuild packages for the particular OS that you need the new version of the program on, and to have a build environment suitable for doing this. But let's assume that you can do all of this because you've already invested the time to become a competent package builder for this distribution.
Once you have a package, what you're doing is adding unofficial local packages to the distribution. When you do this, making everything work nicely together becomes your responsibility, and when you override a distribution package instead of adding a new one you also get the headaches of dealing with the distribution's own updates to the package. The distribution may update their version of the package in ways that clash with your version or simply cause your version to be removed, and they may change how the package works in ways that require you to immediately update your own version in order to keep working with the rest of the system.
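One concrete way to keep the distribution's updates from silently replacing your rebuilt package on Debian-derived systems is to pin it; here is a minimal sketch of an apt preferences entry (in a file under /etc/apt/preferences.d/), with a hypothetical package name and version. RPM-based distributions have their own mechanisms for this, such as version lock plugins.

    Explanation: hypothetical locally rebuilt package; a priority above 1000
    Explanation: keeps apt on this version even if the distribution ships a
    Explanation: newer one, which makes tracking further updates your job.
    Package: example-daemon
    Pin: version 1.2.3-1local1
    Pin-Priority: 1001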
(In short, you're essentially maintaining a fork of their package and you get to do all of the tracking and updating that that implies.)
In either case you now have to keep track of the upstream version yourself, in order to pick up security issues and (important) bugfixes. If you do not want to lock yourself to using the latest and greatest version, this may include backporting those changes to the older version that you're using. You will probably also want to keep track of the changes that your distribution thinks are important enough to include in their packaged version of the program.
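To make the tracking concrete, here is a minimal sketch, with a hypothetical package name and a hypothetical upstream 'latest version' URL (real projects publish this in varying places and formats), of comparing what you have packaged against what upstream has released:

    #!/usr/bin/python3
    # Sketch: compare the version we have packaged and installed against the
    # latest upstream release. The package name and the upstream URL are
    # hypothetical; where a project publishes its latest version varies.
    import subprocess
    import urllib.request

    PKG = "example-daemon"
    UPSTREAM_URL = "https://www.example.org/releases/LATEST"

    installed = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Version}", PKG],
        capture_output=True, text=True, check=True).stdout.strip()

    with urllib.request.urlopen(UPSTREAM_URL) as resp:
        upstream = resp.read().decode().strip()

    if upstream not in installed:
        print("upstream is now at %s but we have %s packaged; time to look"
              " at the changes" % (upstream, installed))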
All of this requires more than just work and time; it requires attention (to upstream changes, to distribution changes, to security alerts, etc). Attention is an especially scarce resource for sysadmins, much scarcer than time.
(The one time it starts being worth doing this is when a distribution has hit end of life. In that case, there will be no distribution package changes and the distribution has stopped tracking the upstream for security updates and so on anyways, so either you worry about it or no one will.)
iSCSI versus NFS
These days, an increasing number of storage appliances can do both NFS fileservice and iSCSI (generally using the same underlying pool of disk space), which has resulted in me seeing an increasing number of people who are wondering which one they should use.
The summary of my answer is that iSCSI is a SAN technology and NFS is a fileservice technology. If you want to add storage to a single machine, iSCSI will work acceptably well; if you want to share files among a bunch of machines, you want NFS. If you just want a single machine to have access to a filesystem to store files, I still think that NFS is better.
(One wild card in this is your storage appliance's management features, like snapshots, quotas, and so on, which may well differ significantly between iSCSI and NFS.)
Like all SAN technologies, iSCSI itself won't let you share a disk between multiple client systems; if you need that, you'll need to layer some sort of cluster filesystem on top. At that point you're almost certainly better off just using NFS unless you have some compelling reason otherwise. Hence NFS is the right answer for sharing files between multiple client machines.
(I find it hard to believe that iSCSI from a storage appliance plus a cluster filesystem running on the clients will have any sort of performance or management advantage over NFS from the storage appliance, but I've been surprised before. If the storage appliance's NFS server is terrible but its iSCSI target is good, the simpler solution is to have a single client machine be an NFS server for the storage.)
If all you want is ordinary file service for a single machine, I think that NFS is a better answer because it is generally going to be simpler and more portable. With NFS you can expand to giving multiple machines access to the files (even read-only access) and any machine that can speak NFS can get at the files. With iSCSI, you are pretty much locked to a single machine at a time, and you need a machine that both talks iSCSI and understands the filesystem and disk partitioning being used on the iSCSI disk; in many cases this will restrict you to a single operating system.
(There are cases, such as virtualization hosts, where your client machines are going to be doing exclusive access and really want to be dealing with 'real' devices, and having them use NFS would just result in them faking it anyways by, eg, making a single big file on the NFS filesystem. In that sort of situation I think it makes sense to use iSCSI instead of NFS for the single machine access case.)
It is tempting to say that iSCSI is better because it lets the client treat the storage like any other physical device, without having to worry about all of the networking issues that come up with NFS. This is a mistake; your iSCSI disks are running over a network and thus all of the networking issues are still there, but they have been swept under the rug and made more or less inaccessible. Pretending that they are not there does not make them go away, and in fact the history of networking protocols has shown over and over again that pretending the network isn't there doesn't work in the long run.
(Consider the history of RPC protocols that attempt to pretend that you're just making a local function call. Generally this doesn't go well, especially once latency and network glitches and so on come up. Things happening over a network have failure modes that rarely or never come up for purely local actions.)
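As a concrete sketch of that last point, using Python's standard xmlrpc client with a hypothetical server URL and method name: the call below reads exactly like a local method call, but the errors it can raise are network failures that a purely local call would never see.

    #!/usr/bin/python3
    # Sketch: an RPC call that looks just like a local method call but can
    # fail in ways no local call ever does. The server URL and the method
    # name are hypothetical.
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://rpc.example.com:8080/")

    try:
        # Reads like an ordinary method call on a local object...
        status = proxy.get_status()
        print("remote said:", status)
    except xmlrpc.client.Fault as err:
        # ...the remote side ran our call but reported an error...
        print("remote fault:", err)
    except OSError as err:
        # ...or the network got in the way: connection refused, timeout,
        # unreachable host, and so on.
        print("network problem:", err)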