The current weak areas of ZFS on Linux
I've been looking into ZFS on Linux for a while as a potential alternative to Illumos for our next generation of fileservers (FreeBSD is unfortunately disqualified). As part of that I have been working to understand ZoL's current weak areas so that I can better understand where it might cause us problems.
The following is the best current information I have; it comes from reading the ZoL mailing list (and at one point asking the ZoL mailing list this exact question).
The weak areas that I know about:
- Using ZFS as your root filesystem requires wrestling with GRUB, GRUB scripts, initramfs-building scripts, and support in installers (if you want to install the system as ZoL-root from the start). How well this works depends on your distribution: some have good support (eg Gentoo), others have third party repositories with prebuilt packages, and still others leave you on your own.
- There are periodic problem reports about getting ZFS filesystems reliably mounted on boot.
- In some environments ZoL can have problems reliably finding the disk devices for your pools on boot. This is especially likely if you use /dev/sd* device names, but it apparently sometimes happens to people who use more stable identifiers.
(Apparently part of the likely solution is to hook ZoL into udev so that as disks are discovered, ZoL checks whether a pool now has a full set of devices and can be brought up.)
- ZoL lacks a number of standard Linux filesystem features, including O_DIRECT, asynchronous IO, and POSIX ACLs. It also lacks support for issuing TRIM commands to drives (so far this is apparently only present in the FreeBSD version of ZFS).
- There is no 'event daemon' to handle events like disks going away. The most significant result of this is that ZFS pool spares do not get activated on disk failure (making them basically pointless).
- ZFS's use of kernel memory is not well integrated with the Linux kernel memory system, resulting in runaway memory usage in some situations. Apparently metadata-intensive workloads (such as rsync runs) are especially prone to this.
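To make the udev idea concrete, here is a minimal Python sketch of the bookkeeping it implies: each newly discovered disk is recorded, and a pool is reported as ready only once every device it expects has appeared. Everything in it (PendingPool, PoolAssembler, the device identifiers) is hypothetical. Real ZFS recognizes pool members from the labels written on each disk, and an actual implementation would run the pool import at the point where this sketch just records the pool's name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PendingPool:
    """A pool we expect to bring up once all of its member devices exist.

    Hypothetical model: devices are plain string identifiers here.
    """
    name: str
    expected: frozenset[str]
    seen: set[str] = field(default_factory=set)

    def is_complete(self) -> bool:
        # Ready only when every expected device has been seen.
        return self.expected <= self.seen


class PoolAssembler:
    """Called once per newly discovered disk, as a udev hook might be."""

    def __init__(self, pools: list[PendingPool]) -> None:
        self.pending = {p.name: p for p in pools}
        self.imported: list[str] = []

    def disk_added(self, device_id: str) -> list[str]:
        """Record a new disk; return the names of pools that just became complete."""
        ready = []
        # Iterate over a copy so completed pools can be removed safely.
        for name, pool in list(self.pending.items()):
            if device_id in pool.expected:
                pool.seen.add(device_id)
                if pool.is_complete():
                    # A real implementation would import the pool here.
                    del self.pending[name]
                    self.imported.append(name)
                    ready.append(name)
        return ready


assembler = PoolAssembler([PendingPool("tank", frozenset({"wwn-a", "wwn-b"}))])
print(assembler.disk_added("wwn-a"))  # []
print(assembler.disk_added("wwn-b"))  # ['tank']
```

The sketch deliberately waits for the full set of devices, as described above; a real system would probably also need a timeout so that a pool with a dead disk can still come up in a degraded state.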
The last issue, ZFS's memory usage, deserves more discussion. All of this is what I've gathered from the mailing list and from looking at the ZFS on Linux source code.
To start with, ZFS on Linux is not really ZFS ported to Linux; instead it's mostly the Illumos ZFS code dropped on top of a layer of code that translates and emulates the Solaris kernel APIs that ZFS needs (the SPL, short for 'Solaris Porting Layer'). This includes a great deal of kernel memory handling. The unfortunate result is a series of mismatches between what ZFS thinks is going on with kernel memory and what is actually going on, due to the translation and emulation that is required. Because of fragmentation that's invisible to ZFS, among other issues, ZFS can wind up using a lot more memory for things like the ARC than it is supposed to, because it thinks it's using a lot less memory than it actually is.
(I suspect that ZFS itself still has some degree of the ZFS-level fragmentation problems we've seen, but that's much less dangerous because it just leaves the ARC smaller than it should be. The ZoL problem is that the ARC and related things can eat all of your RAM and make your kernel explode.)
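The kind of accounting mismatch described above can be illustrated with a toy slab allocator. This is not how the SPL actually manages memory; ToySlabCache and the 64 KiB slab size are made up for illustration. The point is only that the cache's user counts the bytes of its live objects, while a slab can't be given back until every object in it has been freed, so scattered frees leave memory pinned that the user's accounting never sees.

```python
from __future__ import annotations

SLAB_SIZE = 64 * 1024  # hypothetical slab size in bytes


class ToySlabCache:
    """Toy fixed-size object cache carved out of fixed-size slabs.

    Illustrative only; not the SPL's real allocator.
    """

    def __init__(self, obj_size: int, slab_size: int = SLAB_SIZE) -> None:
        self.obj_size = obj_size
        self.slab_size = slab_size
        self.per_slab = slab_size // obj_size
        # Each slab is represented by the set of its occupied slot numbers.
        self.slabs: list[set[int]] = []

    def alloc(self) -> tuple[int, int]:
        """Put an object in the first slab with a free slot, else start a new slab."""
        for i, slab in enumerate(self.slabs):
            if len(slab) < self.per_slab:
                slot = next(s for s in range(self.per_slab) if s not in slab)
                slab.add(slot)
                return i, slot
        self.slabs.append({0})
        return len(self.slabs) - 1, 0

    def free(self, handle: tuple[int, int]) -> None:
        slab_index, slot = handle
        self.slabs[slab_index].discard(slot)

    def accounted_bytes(self) -> int:
        """What the cache's user believes it is using: live objects only."""
        return sum(len(slab) for slab in self.slabs) * self.obj_size

    def resident_bytes(self) -> int:
        """Memory actually pinned: any slab holding a live object.

        Completely empty slabs are treated as if returned to the system.
        """
        return sum(1 for slab in self.slabs if slab) * self.slab_size


cache = ToySlabCache(obj_size=4096)  # 16 objects per 64 KiB slab
handles = [cache.alloc() for _ in range(160)]  # fills exactly 10 slabs
for n, handle in enumerate(handles):
    if n % 16:  # free everything except the first object in each slab
        cache.free(handle)
print(cache.accounted_bytes(), cache.resident_bytes())  # 40960 655360
```

In this toy case the user's accounting says 40 KiB is in use while 640 KiB is actually held, a sixteenfold gap; a cache sized by the first number can overrun its real budget by a wide margin.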
Whether this happens to you (and how much it affects you) is unpredictable because it depends very much on the details of how your system uses memory. As mentioned, people seem to have problems with metadata-heavy workloads, but not everyone reporting problems on the ZoL mailing lists is in this situation.
PS: if you are coming here from Internet searches, please pay attention to the date of this entry. I certainly hope that all of these issues will get dealt with over time.