Why I'm not looking for any alternatives to iSCSI for us
Broad scale distributed storage systems such as Ceph are an in thing these days (at least in some quarters). A while back a commentator on this entry suggested looking at them as an alternative to our use of iSCSI and I've been mulling over my reaction since then. Let me put it simply: my reaction is strongly negative. The short reason why is that I see no compelling benefits and all alternatives appear to involve more complexity and magic.
Let's assume that the setuid issue can be dealt with somehow (this is a basic prerequisite). First off, it's worth noting that ZFS plus iSCSI plus backends involves completely commodity hardware and (with Illumos) completely open source software; moving to something like Ceph gives no benefits there.
Our current ZFS plus iSCSI environment has simple components where we understand and can predict (at some level) basically everything that is going on. The distribution of data over physical backends and physical disks is not completely predictable (ZFS pools smear data across all of their components in somewhat unpredictable ways) but it is relatively so, as is the performance of the resulting bits. This is a feature for us. We very much do not want a big black box where magic happens and people's data is distributed over, well, something, somewhere.
I do not want to say that Ceph or other distributed storage systems are going to be black boxes, because I suspect that they aren't and I certainly don't have the experience to say one way or another. But what I can say is that I don't see any way in which they're going to be simpler than our current environment. No matter how you slice it, we need filesystems inside pools of storage (that are fixed size but expandable) where those storage pools are mapped to some mirrored disk space. ZFS pools on disks are about as direct an expression of this as you can get, and we know that this works and that we can manage it easily. I just don't see how a distributed storage system can do this even better, not without introducing magic that we don't want.
(Given the risks of switching from a known-to-work environment, it's not enough for a distributed storage system to be just as good as our current system. It must be better, and not just a little bit better; it should be substantially and visibly better.)
PS: I'm not saying that distributed storage systems have no use. I can certainly see situations where something like our ZFS plus iSCSI environment would become unmanageably complex and inflexible, for example. But we are not operating anywhere near that scale today or in the foreseeable future.
Sidebar: ease of use versus magic
It's possible to imagine a distributed storage system that makes our environment easier to manage at one level. You could have this cloud of storage, a storage pool management layer that ensured that everything in it was mirrored, and a set of storage pools or filesystem groups on top of this (with quota or other size limits). Storage would be automatically managed and migrated and all sorts of good things.
The problem is that this system is much more magical and less predictable than our current environment. For instance, we might generally have no idea which storage pools or filesystems are using any particular chunk of storage because the system handles storage distribution for us. We don't consider this a feature, partly because we definitely want the ability to engineer our system so that certain sources of IO load are fenced off from other sources.