Wandering Thoughts: Recent Entries

2012-05-22

Our pragmatic experiences with (ZFS) disk errors in our infrastructure

I wrote before about when we replace disks based on errors (and then more on ZFS read errors). Today I want to talk about our pragmatic experiences in our fileserver infrastructure. The first and most important thing to understand about our experiences is that in our environment disk errors are indirect things. Because we are using iSCSI backends, ZFS does not have access to the actual SATA disk status; instead, all it gets is whatever the iSCSI backends report.

(I find it plausible and indeed likely that ZFS could behave somewhat differently if it was dealing directly with SATA disks and had better error reports available to it.)

On the backends themselves we see two levels of read errors, what I will call soft read errors and hard read errors. Both soft and hard read errors seem to generally result in SATA channel resets (which affect all disks on the channel); the difference between the two is that at the end of a soft error the read appears to succeed, while at the end of a hard error we see the Linux kernel log an actual read error (and then iSCSI relays the read error to Solaris and ZFS). On the backends, soft disk errors only report the ATA device name for the disk involved, which can make finding it a little bit interesting; hard read errors report the full name. Handling soft read errors can sometimes take long enough that Solaris sees an IO timeout and retries the IO (and logs a message about it), but usually the only sign on the fileservers themselves is slow IO.

(It's possible that some reads from soft errors are actually returning corrupted data and this is the cause of some of our checksum errors. However, I don't think we've seen a strong correlation between reported checksum errors in ZFS and soft read errors on the backends.)

Our experience is that SMART error reports (on the backends) are all but useless. We do not always see SMART errors for hard read errors (much less soft ones) and we see SMART errors reported on disks that have no observable problems. At this point SMART reports are mostly useful for catastrophic things like 'the disk disappeared'; however, we've seen spurious reports even for those (our current theory is that a smartd check at the wrong time during a SATA channel reset can fail to see the disk).

As far as we've been able to see, hard read errors do get reported to Solaris and ZFS and do result in ZFS read errors. However, I admit that we haven't generally done forward checks here (noticing hard read errors on the backends and then seeing that the Solaris fileservers reported hard read errors at the same time); instead, we have tended to work backwards from ZFS read errors on the fileservers to see that they are mostly hard read errors on the backends.
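A forward or backward check like this is basically a timestamp join between two event logs. Here is a minimal sketch of the idea in Python; the event tuples, the five minute window, and the disk names are all made up for illustration, and it assumes the iSCSI disk names on the fileserver have already been mapped to the backend's disk names.

```python
from datetime import datetime, timedelta

def match_backend_errors(zfs_errors, backend_errors, window=timedelta(minutes=5)):
    """For each ZFS read error, list the backend hard read errors logged
    for the same disk within `window` of it (before or after)."""
    matches = {}
    for when, disk in zfs_errors:
        matches[(when, disk)] = [
            b_when for b_when, b_disk in backend_errors
            if b_disk == disk and abs(b_when - when) <= window
        ]
    return matches

# Hypothetical events, only to show the shape of the data.
zfs_errors = [
    (datetime(2012, 5, 20, 3, 15), "diskA"),
    (datetime(2012, 5, 21, 14, 2), "diskB"),
]
backend_errors = [
    (datetime(2012, 5, 20, 3, 14), "diskA"),
]

for (when, disk), found in match_backend_errors(zfs_errors, backend_errors).items():
    status = "backend hard read error" if found else "no backend error found"
    print(when.isoformat(), disk, status)
```

Swapping the two arguments gives the forward check (backend hard read errors that never showed up as ZFS read errors).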

(Offhand, I'm not sure if we've seen ZFS read errors without hard read errors on the backends. It's a good question and we have some records, but I'm going to defer carefully checking them to a potential future entry.)

We haven't seen ZFS write errors unless the actual disks go away entirely (eg, if we pull a live disk ZFS will light up with write errors in short order). I don't think we've noticed any backend reports about write errors on running disks.

Our old version of Solaris is generally okay with both soft and hard read errors; soft errors sometimes cause IO timeouts and hard read errors wind up with actual ZFS-visible read errors (sometimes after timeouts), but that has mostly been it. The one exception is a single Solaris fileserver install that got itself into an odd state that we don't understand. Although it was theoretically identical to all of our other fileservers, this single fileserver had a very bad reaction to read errors at the ZFS level; after a while NFS became very slow or non-responsive and all ZFS operations would usually eventually start locking up entirely (even things like 'zpool status' for a pool not experiencing IO problems). Once we identified the cause of its lockups, we started aggressively replacing its backend disks the moment they reported hard read errors. This machine had other iSCSI anomalies (eg, it established iSCSI connections at boot very slowly) and we eventually replaced its Solaris install, which seems to have made the problem go away.

(Our troubleshooting was complicated by the fact that this is our only fileserver that uses 1.5 TB disks instead of 750 GB disks on the backends and almost all of our problem disks have been 1.5 TB disks. We weren't clear if it was just how ZFS reacted to this sort of slow hard read errors over iSCSI, something different about the disks, some hardware problem on the fileserver itself, something different about the iSCSI backends it used, and so on.)

ZFSDiskErrorsExperience written at 00:49:32

2012-04-28

ZFS and various sorts of read errors

After I wrote about our experience of transient checksum errors in ZFS here, a commentator wrote (quoting me):

Our experience so far is that checksum errors are always transient and don't reappear after scrubs, so for us they've been a sign of (presumed) software weirdness instead of slowly failing disk drives.

Or there was some bit rot and it was fixed by copying good data from another mirrored drive (or re-creating it via RAIDZ) and replacing the bad data. Isn't that the whole point of checksums and scrubs: go over all the bits to make sure things match?

My view is that the ice is dangerously thin here and that it's safer for us to assume that the checksum failures are not from disk bit rot.

As far as ZFS is concerned there are two sorts of read errors, hard read errors (where the underlying device or storage system reports an error and returns no data) and checksum errors (where the underlying storage claims to succeed but returns data that ZFS can see is incorrect). ZFS covers up both sorts of errors using whatever redundancy the pool (well, the vdev) has, but otherwise it treats them differently; it never attempts to repair read errors (although it's willing to try the read again later) while it immediately repairs bad checksums by rewriting the data in place.
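The difference in treatment can be shown with a toy model in Python. This is only a sketch of the behaviour described above, not how ZFS is actually implemented; it reads every side of a mirror (the way a scrub would), and the function and class names are invented for the example.

```python
import hashlib

class ReadError(Exception):
    """Every copy either failed to read or failed its checksum."""

def checksum(data):
    return hashlib.sha256(data).digest()

def mirror_read(copies, expected_sum):
    """Toy model of reading one block from all sides of a mirror.

    Each entry in `copies` is either bytes (what the device returned) or
    None (the device reported a hard read error and returned no data).
    Returns (data, events); bad-checksum copies are repaired in place.
    """
    events = []
    good = None
    bad_sum = []
    for i, data in enumerate(copies):
        if data is None:
            events.append((i, "read error"))      # noted, but never rewritten
        elif checksum(data) != expected_sum:
            events.append((i, "checksum error"))
            bad_sum.append(i)
        elif good is None:
            good = data
    if good is None:
        raise ReadError("no good copy left")
    for i in bad_sum:
        copies[i] = good                          # repaired by rewriting in place
        events.append((i, "repaired"))
    return good, events
```

With `copies = [corrupted_bytes, good_bytes]` the corrupted side ends up rewritten; with `copies = [None, good_bytes]` the read still succeeds from the good side, but the failed side stays exactly as it was.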

My understanding of modern disks is that on-disk bit rot rarely goes undetected, since the actual on-disk data is protected by pretty good ECC checks (although they're not as strong as ZFS's checksums). When a disk detects a failed ECC (and cannot repair the damage), it returns a hard read error for that sector. You can still have various forms of in-flight corruption (sometimes as the data is being written, which means that the on-disk data is bad but probably passes the drive's ECC); all of these (broadly construed) read errors will result in nominally successful reads but ZFS checksum errors, which ZFS will then fix.

So the important question is: how many of the checksum errors that one sees are actually real read errors that were not recognized as such, either on-disk bit rot that still passed the drive's ECC checks or in-flight corruption inside the drive, and how many of them are from something else?

I don't know the answer to this, which is why I think the ice is thin. Right now my default assumption is that most or all of the actual drive bit rot is being detected as hard read errors; I make this assumption partly because it's the safer one (since it means that we don't understand the causes of our checksum failures).

PS: ZFS's treatment of read errors means that in some ways you would be better off if you could tell your storage system to lie about them, so that instead of returning an actual error it would just log it and return random data. This would force a checksum error, causing ZFS to rewrite the data, which would force the sector to be rewritten and perhaps spared out.

(Yes, this is kind of a crazy idea.)

Sidebar: the purpose of scrubs

Scrubs do three things: they uncover hard read errors, they find and repair any checksum errors, and at a high level they verify that your data is actually redundant and tell you if it isn't. Because ZFS never rewrites hard read errors, scrubs do not necessarily restore full redundancy. But at least you know (via read errors that persist over repeated scrubs) that you have a potential problem that you need to do something about (ie you need to replace the disk with read errors).

(Because a ZFS scrub only reads live data, you know that any read error is in a spot that is actually being used for current data.)

Sidebar: the redundancy effects of read errors

If your vdevs are only single-redundant, a read error means that that particular piece of data is not redundant at all. If you have multi-way redundancy, eg from raidz2, and you have read errors on multiple disks, I don't know if there's any way to know how much redundancy any particular piece of data has left. Note that ZFS does not always write a piece of data to the same offset on all disks, although it usually does.

(If you have multi-way redundancy and read errors on only a single disk, all of your data is still redundant although some of it is more exposed than it used to be.)

ZFSReadErrorTypes written at 01:46:53

2012-04-26

When we replace disks in our ZFS fileserver environment

Recently, someone came here (well, here) as the result of a Google search for [zfs chksum non zero when to replace disk]. As it happens this is an issue that we've faced repeatedly so I can give you our answer. I don't claim that it's the right one but it's mostly worked for us.

First off, we have yet to replace a disk due to ZFS checksum errors. Our experience so far is that checksum errors are always transient and don't reappear after scrubs, so for us they've been a sign of (presumed) software weirdness instead of slowly failing disk drives. If we ever have a disk that repeatedly gets checksum errors we might consider it a sign of slow failure and preemptively replace the disk, but that hasn't happened so far.

The usual sign of a problematic disk here has been one or more persistent read errors. The cautious thing to do when this happens is to immediately replace the disk; for various reasons we don't usually do this if there are only a handful of read errors. Instead we mostly wait until one of three things: either there are more than a handful of read errors, the read error count is increasing, or it seems that handling the read errors is causing performance issues. For us, this balances the disruption of disk replacement (and the cost of disks) with the risk of serious data loss (and hasn't blown up in our faces yet).

(Because ZFS doesn't make any attempt to rewrite read errors (although I wish it would), they are basically permanent when they crop up. We do check reported read errors to see if the iSCSI backends are also reporting hard read errors, or if things look like transient problems.)

So that's my answer: don't replace on ZFS checksum errors unless there's something unusual or persistent about them and only replace on small numbers of read errors if you're cautious (and even then you should check to make sure that the actual disks are reporting persistent read errors). If we ever have hard write errors I expect that we'll replace the disk right away, but that hasn't happened yet.
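Written down as a decision function, the policy above looks roughly like this Python sketch. The function name, the three-way answer, and especially the value of `handful` are assumptions for the example; the entry never puts a number on 'a handful'.

```python
def should_replace(read_errors, prev_read_errors=None, write_errors=0,
                   persistent_checksum_errors=False, slow_io=False,
                   handful=5):
    """Return (action, reason), where action is 'replace', 'consider' or 'wait'.

    `prev_read_errors` is the count from the last time we looked; pass None
    the first time read errors show up. `handful` is an assumed threshold.
    """
    if write_errors > 0:
        return "replace", "hard write errors"
    if read_errors > handful:
        return "replace", "more than a handful of read errors"
    if prev_read_errors is not None and read_errors > prev_read_errors:
        return "replace", "read error count is increasing"
    if read_errors > 0 and slow_io:
        return "replace", "handling read errors is hurting performance"
    if persistent_checksum_errors:
        return "consider", "checksum errors keep coming back after scrubs"
    return "wait", "nothing unusual; keep watching"

print(should_replace(2))
print(should_replace(3, prev_read_errors=2))
```

A cautious site would simply lower `handful` to zero, which turns any persistent read error into a replacement.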

(Based on our lack of write errors, you can probably guess that we have yet to have a disk die completely on us.)

We never reuse disks that we've pulled and replaced, even if they only had a few read errors. They are always either returned under the warranty or discarded. Yes, in theory they might be fine once those few bad sectors were remapped by being rewritten, but in practice the risk is not worth it.

Sidebar: why disk replacement is disruptive for us

Replacing disks is disruptive both to the sysadmins and to some degree to our users. Partly this is because our pools resilver slowly and with visible IO impact (note that ZFS resilvering is effectively seek limited in many cases and affects the whole pool). In our environment, replacing a physical disk the fully safe way can require up to six resilvers; if we restrict ourselves to one resilver at a time to keep the IO load down, that by itself can easily take all day. Another part of this is because pulling and replacing a disk is a manual procedure that takes a bunch of care and attention; for instance you need to make absolutely sure that you have matched up the iSCSI disk name with the disk that is reporting real errors on the iSCSI backend (despite a confusing mess of Linux names for disks) and then correctly mapped it to a physical disk slot and disk. This is not work that can be delegated (or scripted), so one of the core sysadmins is going to wind up babysitting any disk replacement.

(I'm sure that more upscale environments can just tell the software to turn on the fault light on the right disk drive enclosure and then send a minion to do a swap.)

ZFSWhenReplaceDisks written at 01:28:25

2012-04-06

Why we haven't taken to DTrace

Recently I read Barriers to entry for DTrace adoption (via Twitter). As it happens I have an opinion on this, since we use Solaris and I have done a modest amount of things with DTrace. My belief is that DTrace has between two and three problems, depending on how you look at it.

(Part of our non-use of DTrace is that I once had a bad experience where starting to use DTrace on a production fileserver had immediate and significant bad effects. I've seen DTrace work okay since then but the uncertainty lingers, especially for writing my own DTrace scripts. But that's only a relatively modest part of it.)

First is that it's pretty hard to really use DTrace if you're not familiar with Solaris kernel internals. This issue takes some explanation (unless you've tried to use DTrace, in which case you're probably awfully familiar with it). What it boils down to is that there are really two DTraces, one for extracting subsystem information from the kernel and one for debugging the kernel, and the first one is incomplete.

In theory, DTrace lets you tap into all sorts of documented trace points that Solaris has put into the kernel, extracting a wide variety of interesting state from each of them (you can read the coverage of the various providers in the DTrace documentation). In practice, the Solaris kernel developers have never provided enough trace points with enough state information to be really useful by themselves. Instead they leave you to fall back on the 'kernel debugging' side of DTrace, where you can intercept and trace almost any function and extract random information from kernel memory provided that you know what you're looking for and what it means.

There are two problems with this (at least from my perspective). The first is that most of the really interesting uses of DTrace require using the kernel debugging DTrace and using the kernel debugging DTrace requires understanding the internals of the kernel. Ideally you need the code, which has always made things a little bit interesting (even before Solaris went closed source, OpenSolaris source did not exactly match Solaris (cf)). The second is that the DTrace documentation has never tried to address this split, instead throwing everything together in one big pile that (the last time I read it) was probably more oriented towards the person doing a deep dive into the kernel than a sysadmin trying to cleverly extract useful information from what trace points there are.

(One sign of the documentation quality is that there is a plethora of blog entries and web sites that try to explain clever DTrace tricks and how to use it to get interesting results. Personally I would like to see the documentation split into at least two parts, one for sysadmins and one for people debugging the kernel.)

Second (or third, depending on how you view the documentation problem) is that the DTrace scripting language has plenty of annoying awkwardness and pointless artificial limitations. These are situations where DTrace can do what you want but it forces you to jump through all sorts of hoops with no assistance; one example I've already mentioned is pulling information from user space. Many of these issues could be fixed with things like macros and other high level language features (or specific support for various higher level operations), but the DTrace authors seem to have deliberately chosen to keep much of the language at a low level. This is a virtue in a system language but DTrace isn't a system language, it's a way of specifying what information you want to extract from the system and when.

(One unkind way to put this is that the DTrace scripting language is mostly oriented around the needs of the people writing the kernel DTrace components instead of the people who are trying to use DTrace. It's easy to see how this happened but it doesn't make it right.)

These issues don't make DTrace impossible to use, and as a demonstration of that lots of people have written lots of very interesting and useful DTrace scripts. But they do significantly raise the barriers to entry for using DTrace; for most serious and interesting uses, you have to be prepared to learn kernel internals and slog through a certain amount of annoyance and make-work. It should not be any surprise that plenty of people haven't had problems that are sufficiently urgent and intractable to cause them to do this.

(It is not just that this stuff has to be learned. It's also that the learning simply takes time, probably significant time, and many people may not have that much time if they're dealing with a non-urgent problem.)

DTraceWhyNot written at 03:08:17

2012-04-02

The problem of ZFS pool and filesystem version numbers

ZFS pools and filesystems have version numbers for a straightforward reason: they let ZFS augment or (carefully) change the on-disk storage format to add new features. Old versions of ZFS will know that they shouldn't touch a pool with a new version because they don't understand all of its metadata; new versions of ZFS will know that some pools can't have new metadata written to them and so on. All of this is very conventional.

In light of my previous entry on the several OS options for getting ZFS, it's occurred to me that this nice scheme has a little problem. To put it simply: if you have a ZFS pool at version 55, is that the Solaris ZFS version 55, the Illumos version 55, or the FreeBSD version 55?

Right now it is always the Solaris version N, because both Illumos and FreeBSD stopped at the last OpenSolaris ZFS pool version. But this situation may not last forever; someday the Illumos people may well want to make a pool change that is not in Solaris, and they may also not want to reimplement some changes that created new Solaris pool version numbers. In fact the Illumos people may not be able to reimplement some Solaris changes; since Solaris is closed source they don't have source code, and Oracle may not release full documentation for the disk format and so on (or the changes may involve patented technology).

To make the problem worse, ZFS version numbers are a sequence where support for version N implies support for everything in version N-1, N-2, and so on. This means that even if Oracle was feeling friendly it can't just allocate a ZFS pool version for some Illumos change, because it would mean that when Oracle wanted to use version N+1 for its next change it would need to support the Illumos version N change.
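To make the failure concrete, here is a small Python sketch of what a linear version check looks like and why it breaks once two parties hand out the same number. Version 55 and the feature names are hypothetical, as they are in the question above.

```python
def can_import(pool_version, max_supported):
    """With one linear sequence, supporting version N implies supporting
    everything below it, so a single comparison is the entire check."""
    return pool_version <= max_supported

# Hypothetical fork: both sides assign version 55 to different changes.
solaris_features = {55: "some-solaris-change"}
illumos_features = {55: "some-illumos-change"}

def really_compatible(pool_version, pool_features, host_features):
    """What you would actually need to know after a fork: does the host
    implement the same change that each of the pool's version numbers
    stands for?"""
    return all(host_features.get(v) == name
               for v, name in pool_features.items()
               if v <= pool_version)

# The plain number says yes, but the pool's format is not what the host expects.
print(can_import(55, 55))
print(really_compatible(55, illumos_features, solaris_features))
```

The single integer has no room to say whose version 55 it is, which is the whole problem.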

The root cause of this issue is that when Sun designed ZFS version numbers, they intended there to be a single authority for them, ie Sun itself, and a single sequence of features. This single authority and sequence is viable only so long as there is only one version of ZFS, Sun's (now Oracle's). But once ZFS forks, which is what it effectively has done, there is no single authority any more and all of this explodes.

Sidebar: the problem for Illumos

In theory Illumos can half-solve this problem by defining a new ZFS property for the Illumos ZFS version; Illumos pools would then have a base ZFS version number of something or other (possibly set at the last official ZFS version that Illumos supports) plus their own Illumos version number. However, the problem with this is stopping Solaris systems from improperly importing Illumos ZFS pools, because after all Solaris doesn't know anything about the new Illumos version property.

I think that the only way out for Illumos is for them to create their own Illumos pool version property and then set the basic ZFS version to some implausibly high value, one that Solaris should never reach. Solaris systems will give the wrong error report, but there's only so much you can do.

(Illumos systems would always report the Illumos version number as the pool version number.)
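A rough Python sketch of how the two kinds of systems would then behave. Everything here is illustrative: the property name `illumos_version`, the placeholder value for 'implausibly high', and the version numbers passed in are assumptions, not anything Illumos or Solaris actually defines.

```python
IMPLAUSIBLY_HIGH = 1_000_000   # hypothetical placeholder, chosen only for the example

def solaris_can_import(base_version, solaris_max):
    # Solaris knows nothing about the extra property; it only compares the
    # base version number, so it refuses the pool as 'too new'.
    return base_version <= solaris_max

def illumos_can_import(pool, illumos_max, last_shared_version):
    # Pools carrying the Illumos property are judged by it alone; older
    # pools fall back to the ordinary linear check.
    if "illumos_version" in pool:
        return pool["illumos_version"] <= illumos_max
    return pool["version"] <= last_shared_version

illumos_pool = {"version": IMPLAUSIBLY_HIGH, "illumos_version": 3}
old_pool = {"version": 28}     # an assumed pre-fork pool version

print(solaris_can_import(illumos_pool["version"], solaris_max=40))
print(illumos_can_import(illumos_pool, illumos_max=5, last_shared_version=28))
```

The Solaris refusal is the 'wrong error report' case: the pool is rejected for being too new, when the real reason is that it belongs to a different lineage.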

ZFSPoolVersionProblem written at 00:34:41

2012-03-31

Why I no longer believe that you need Solaris if you want ZFS

Four years ago I wrote an entry on why you wanted to use Solaris if you were going to use ZFS. Recently I have been reconsidering this issue, and I no longer believe that you need to pick Solaris if you're going to use ZFS. What has happened is that ZFS and ZFS development has changed drastically.

Back in 2008 it was clear that there was only one ZFS. All of the real ZFS development was happening at Sun and was being done to Solaris; all other versions were copying this work with various delays. Today in 2012 there's effectively not one ZFS any more, but instead at least two and maybe three (or more): Illumos ZFS, Solaris ZFS, and perhaps FreeBSD ZFS. (I don't know how separate FreeBSD ZFS is from Illumos ZFS.)

Illumos ZFS has real developer firepower behind it (many of the original ZFS developers have left Sun, now Oracle, and moved to companies that contribute to Illumos), while at the same time Oracle has made changes that make Solaris 11 far less desirable (eg much higher costs and closed source). It also seems likely that neither version of ZFS will get really compelling changes (like the ability to remove vdevs from a pool). This makes the two versions of ZFS much more balanced and competitive, and the lack of major changes makes a (potentially) older ZFS like FreeBSD's not that unattractive.

(As for support and bug fixes, let's just say that I expect even less from Oracle than from Sun.)

Another, less complimentary way of putting it is that with ZFS today what you see now is pretty much what you're going to get in the future. Major changes might happen but they don't seem to be the way to bet. With ZFS basically frozen it's much easier to look at something like FreeBSD, evaluate its ZFS, and say 'this is good enough for us'; you're unlikely to be missing anything important in the future no matter what happens (or doesn't happen) with FreeBSD ZFS development.

To condense a potentially long discussion, all of this leaves me feeling that FreeBSD is now a generally viable mainline ZFS platform. It doesn't have the absolutely latest ZFS and bugfixes (whether you consider these to be the Illumos ones or the Solaris ones), but it has other advantages and its ZFS is likely to be good enough for most things.

(If you really need the features of Oracle Solaris's ZFS, even despite the uncertainties, well, you don't have a choice right now and maybe not ever. But I don't think many people are stuck like that, and I do mean 'stuck'.)

SolarisForZFSII written at 01:03:13

2012-03-12

Why ZFS log devices aren't likely to help us

Back in commentary on my entry on ZFS features that could entice us to upgrade Solaris versions I mentioned that we were in an unusual situation where ZFS log devices didn't seem likely to help us enough to be worth the various costs, but that explaining it properly would require an actual entry. Well, you can guess what this finally is.

The primary purpose of ZFS log devices (hereafter 'slogs') is to accelerate synchronous writes, such as the writes that need to be done when an application calls fsync() (or sync()) or an NFS client issues an NFS v3 COMMIT message (or, I suppose, when an NFS v2 client issues a WRITE, if you still have any NFS v2 clients around). Without an slog, the ZFS pool must make some synchronous writes to your actual pool disks; with an slog, it can make some synchronous writes to what one hopes are very much faster SSDs.
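For reference, this is what the application side of a synchronous write looks like, as a minimal Python sketch. The helper name and the temporary file are just for the example; `os.fsync()` is the standard library's wrapper around the fsync() call mentioned above.

```python
import os
import tempfile

def write_durably(path, data):
    """Write `data` to `path` and do not return until the OS reports that
    it has been committed to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move Python's buffered data into the kernel
        os.fsync(f.fileno())   # block until the kernel says it is on storage

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "example.dat")
    write_durably(target, b"must survive a crash\n")
    print(os.path.getsize(target))
```

It is the wait inside `os.fsync()` that an slog is meant to shorten; the `write()` before it normally returns as soon as the data is in memory.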

The first reason that we're not likely to see much of a win from slogs is that, well, um, er, it turns out that we're not actually doing synchronous writes. We're still writing to the actual disks, though, and under sufficient load those disks are not going to immediately tell us 'your write has been done'. Also, having slogs would allow us to switch to doing proper synchronous writes without (probably) losing too much performance.

Now we run into the other part of the problem. Every pool needs two slog devices (yes, we'd mirror them), and we have a fair number of pools. It's not feasible to give every pool two physical SSDs; this means some degree of sharing, which means some degree of shared points of failure (and shared IO choke points, since several pools will all be doing IO to the same physical SSDs). It's quite possible that we could wind up with all pools on a single fileserver depending on two physical SSDs for their slogs (in two different backends, of course).

(The third problem is that we would have to put the slog SSDs behind iSCSI. iSCSI itself adds some amount of latency, which creates a lower bound on how fast synchronous writes can go even with an infinitely fast disk system on the iSCSI target.)
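The lower bound is simple arithmetic: a single writer that waits for each synchronous write before issuing the next can never go faster than one write per round trip. A small Python sketch, with the latency figures being invented numbers rather than measurements from our environment:

```python
def max_serial_sync_writes_per_sec(round_trip_ms, device_ms=0.0):
    """Upper bound on back-to-back synchronous writes from one writer,
    given the iSCSI round trip time and the device's own write time."""
    return 1000.0 / (round_trip_ms + device_ms)

# Hypothetical numbers: a 0.5 ms iSCSI round trip in front of an
# infinitely fast device, then in front of one that needs 0.5 ms itself.
print(max_serial_sync_writes_per_sec(0.5))
print(max_serial_sync_writes_per_sec(0.5, device_ms=0.5))
```

Multiple writers and batching raise the aggregate rate, but each individual fsync() still pays at least the round trip.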

For all of this we would get accelerated synchronous writes. But there's another important question: how much synchronous write activity do we actually have? Our belief so far is that most pools are read-mostly with low amounts of writes (and probably bursty writes). When we've looked at disk performance issues, there has been no clear sign pointing to write issues. So all of this effort for slog devices would likely get us not very much actual performance increase in real life usage; in fact, many of our users might not notice.

My impression is that our situation is quite unusual. Most people have only a few big pools, hosted on local disks, and they can easily identify pools that have significant write activity (often from knowing things about the usage, eg 'this pool is used for databases'). In this situation it's much easier to add an slog or two and have it give you a clear benefit.

ZFSWhyNotSlogs written at 00:37:24

2012-02-26

What information I want out of ZFS tools and libraries

Back in comments on my observation that Solaris 11 is closed source, Joshua M. Clulow noted that the Illumos people are working on making a better (and presumably public) version of libzfs, the nominal interface for dealing with ZFS. Although I've moved slowly on this, I think it's time to write down my thoughts about what I want for dealing with ZFS.

First off, my needs are probably somewhat unusual. I don't actually want to do anything to ZFS through libzfs; I just want to extract information. I also mostly don't care if I get an actual C-level API or simply some tools that give me information; either is about as convenient to me, since I'm actually going to consume the information in a non-C environment (either shell scripts or Python, depending on just what we're doing).

What I do need is three things: a stable and documented interface, information in a form that I can easily parse and interpret reliably, and complete information (not just things that have been cooked into some user-friendly form that elides details). The output of the current zpool and zfs commands is none of these three; the exact output is neither stable nor documented, it's very hard to parse, and it's not complete. What we currently get through (ab)using Solaris's current libzfs is complete and easy to 'parse' (C structures are easy to deal with in one sense), but it's not stable or documented.

(I have a moderate bias towards a stable C API for libzfs because at this point I'd rather roll my own information extraction stuff than trust ZFS's own commands, and it's harder to cheat or omit things in a C API. And I don't have to worry that people will feel that, eg, XML is the perfect output format.)

Currently, we need two sorts of information; we need configuration information and pool state information. Configuration information covers things like what disks the pool uses and how it's organized, what filesystems there are, what snapshots there are, and so on. We use this both passively (we periodically record basic information about all pools for tracking purposes) and actively (knowing what disks are in use and how is a vital part of our spares system). Pool state information covers the health of disks in the pool and the state of things like resilvers and scrubs; we use this both for ongoing health monitoring and as part of our spares system.
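To give a feel for the shape of this, here is a Python sketch of the sort of structure the information could be loaded into and two of the questions asked of it. All of the class and field names are hypothetical; this is not libzfs's data model, just an illustration of the two categories (configuration and state).

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Disk:
    name: str
    state: str = "ONLINE"          # pool state information
    read_errors: int = 0
    write_errors: int = 0
    checksum_errors: int = 0

@dataclass
class Vdev:
    kind: str                      # configuration, eg "mirror"
    disks: List[Disk] = field(default_factory=list)

@dataclass
class Pool:
    name: str
    vdevs: List[Vdev] = field(default_factory=list)
    filesystems: List[str] = field(default_factory=list)
    snapshots: List[str] = field(default_factory=list)
    scrub_in_progress: bool = False
    resilver_in_progress: bool = False

def disks_in_use(pools):
    """Configuration question: which disks does any pool depend on?"""
    return {d.name for p in pools for v in p.vdevs for d in v.disks}

def unhealthy_disks(pools):
    """State question: which disks are not ONLINE or have error counts?"""
    return [(p.name, d.name)
            for p in pools for v in p.vdevs for d in v.disks
            if d.state != "ONLINE"
            or d.read_errors or d.write_errors or d.checksum_errors]
```

A spares system needs both answers: the first to know which disks are free, the second to know when to reach for one.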

(We don't currently need to extract performance data but we might at some point in the future.)

As for what specific pieces of configuration and state information we want, the likely answer is 'all of it'. If ZFS tracks it at all, I'm at least potentially interested in it.

Sidebar: how to test a proposed ZFS API

My rather obvious advice to anyone designing a public API for getting ZFS information is to test it by rewriting the information display portions of zpool and zfs using only the public API. If you can't do it at all, the API has obviously failed. However, if the API doesn't give you any extra information over what those two commands need today, it also fails, because neither command displays most of the available information about configuration and state.

Generally you should be able to use the API to write an absurdly more verbose version of zpool status, one that will deluge you in a pile of detailed information.

ZFSInformationDesire written at 22:15:29

2012-02-01

A ZFS pool scrub wish: suspending scrubs

Like sensible people, we scrub our pools periodically in order to turn up latent problems. Because pool scrubs have a visible impact on responsiveness (at least in the lightly patched Solaris 10 update 8 that we're running), we only run scrubs on weekends (and only scrub one pool per fileserver). However, we've recently started running into problems where pool scrubs slow the fileservers down enough that backups have started failing.

The obvious way around this is to switch things to only doing scrubs when backups aren't running. Except there's a problem: we run backups every day, they run for a fairly long time every day, and some of our pools take up to fifteen hours to scrub. If we only scrub when backups aren't running, there just isn't the fifteen hour gap that our biggest pools need.

(It's possible that they would scrub somewhat faster if they never overlapped with backups, but that's only a vague possibility. And as the pools get more data, they'll take longer and longer to scrub.)

Which brings me to my wish: I wish you could suspend ZFS pool scrubs. Not stop them and start them again from the start, but just put one to sleep by telling the pool to remember where the scrub was but do no further scrub IO for now, then later resume the scrub from where it left off. This would allow us to do even big scrubs around the backups, and in fact we could schedule scrubs much more liberally than we do right now. For example, we might have a couple of hours in a weekday early morning after backups have finished that we could use to get some scrubbing in.

(I'd be perfectly happy if this was only an in-memory pause, so that if you rebooted your system or exported the pool you lost it and had to start from scratch. As an in-memory pause it ought to be relatively simple to implement.)
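The in-memory version of this is not much more than a remembered cursor. A toy Python sketch of the idea, which has nothing to do with how ZFS actually walks a pool during a scrub; the class, the per-window budget, and the block counts are all invented for the example:

```python
class PausableScrub:
    """Toy model of a scrub whose position lives only in memory."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.position = 0      # lost on reboot or export, as suggested above
        self.checked = 0

    def verify(self, block):
        self.checked += 1      # stand-in for reading the block and checking it

    @property
    def done(self):
        return self.position >= len(self.blocks)

    def run(self, budget):
        """Scrub up to `budget` blocks, then stop and remember where we were."""
        end = min(self.position + budget, len(self.blocks))
        for i in range(self.position, end):
            self.verify(self.blocks[i])
        self.position = end
        return self.done

# Fifteen units of scrub work fitted into windows of four units each.
scrub = PausableScrub(list(range(15)))
windows = 0
while not scrub.done:
    scrub.run(budget=4)
    windows += 1
print(windows)
```

Each call to `run()` stands for one gap between backups; nothing is rechecked and nothing is skipped, which is the difference from stopping and restarting a scrub.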

PS: I checked and this doesn't seem to be in Illumos, at least based on the current Illumos zpool manpage.

ZFSScrubWish written at 11:38:13

2012-01-31

Where is Oracle going with Solaris?

(Disclaimer: rambling ahead.)

Once upon a time, back when Sun was still Sun, it was possible to kind of see what they thought the future market for Solaris was. Solaris wasn't Linux, but they could load it with attractive features (ZFS, DTrace, arguably Zones, etc) to make up for being not-Linux and then sell it for a relatively low price to hook the low end of the market. Arguably Sun skipped the bit where they upsold to more lucrative services later.

(In this view, the free Linux distributions serve as a valuable initial hook for higher end commercial Linuxes like Red Hat Enterprise. A small company is unlikely to buy RHEL right away; instead they can progressively move closer, first with Debian or Ubuntu, then with CentOS, and finally they start paying Red Hat when they get tired of the alternatives. Since very few people were going to jump from a Linux to Solaris, Solaris needed a similar entry-level hook.)

Then Oracle took over Solaris and now I don't understand how they see its future. The initial moves were straightforward: Oracle drastically raised prices and, in effect, sharply reduced hardware availability. Then of course they killed off other features that made Solaris attractive, like source availability. As far as I can see this took out the bottom end of the Solaris market entirely.

(It's hard to find current pricing for Solaris on non-Oracle hardware. The best I could find on Oracle's own website was $1k per core per year; it's not clear if you can get a better deal through either Dell or HP, which were at one point theoretically reselling Solaris on their own hardware. I couldn't configure a low-end 1U Dell server with Solaris, for what that's worth.)

One possible answer is that Oracle has no real plans for Solaris's future. In this view, they're treating it as a declining asset and milking it to get as much money as possible from those people who have to have Solaris. As the ranks of those people dwindle, Solaris itself will dwindle away with them. Eventually Oracle will politely sunset it and no one will really care. In this view, the relatively high prices for Solaris (and the outrageously high ones for non-Oracle hardware) are somewhat deliberately designed to discourage new customers; the last thing Oracle wants is for Solaris to actually get popular, because then Oracle would have to start spending real money on it.

Another possible answer is that Oracle thinks that Solaris has a viable future on big iron but not on low end hardware. I'm a professional skeptic about big iron in general, so I'm not well placed to evaluate how realistic this is. I think you can make a case that big iron customers are mostly insensitive to both the exact operating system (they care about the apps, which are often layered on top of a database to start with) and the licensing costs, but will value various (theoretical) Solaris virtues like resilience and inspectability with DTrace (especially if Oracle integrates DTrace support into their database products). On the other hand they do care about TCO (and there can be a lot of money involved in that TCO with big iron and Solaris licensing) and I'm not sure Oracle has a good sales pitch for Solaris against the relentless march of cheaper Linuxes.

(I'm not persuaded by the variant of this where Solaris is supposed to be the true home of Oracle's database software, because it requires customers to either like or be neutral to Solaris and its increased costs. If everyone wants to run Oracle on RHEL, it's hard to make Solaris Oracle's true home.)

All of this is mostly but not entirely academic to me, since it seems clear that we have too little money to interest Oracle. Still, I just can't stop wondering; there was a time when Solaris looked like it had a place in the general Unix future.

(You can argue that Solaris still does, in the form of Illumos and the distributions using it, especially as a whole lot of the Sun technical people have apparently left Oracle and settled at various other places that are working on Illumos. This makes Illumos the technical future of Solaris, and the technical future is the interesting one.)

PS: I would probably be better informed about the speculation on this if I actually followed Solaris news. I don't, because it seems very unlikely that any Solaris news is going to affect us; Oracle would have to perform one of the world's most spectacular sudden reverses in order to be relevant to us again.

OracleSolarisFuture written at 00:17:09

This is part of CSpace, and is written by ChrisSiebenmann.