2019-08-30
ZFS is not a universal filesystem that is always good for all workloads
Every so often, people show up on various ZFS mailing lists with problems where ZFS is performing not just a bit worse than other filesystems or the raw disks, but a lot worse. Often although not always, these people are using raidz on hard disks and trying to do random IO, which doesn't work very well because of various ZFS decisions. When this happens, whatever their configuration and workload, the people who are trying out ZFS are surprised, and this surprise is reasonable. Most filesystems today are generally good and also generally have relatively flat performance characteristics, where you can't make them really bad unless you have very unusual and demanding workloads.
Unfortunately, ZFS is not like this today. For all that I like it a lot, I have to accept the reality that ZFS is not a universal filesystem that works fine in all reasonable configurations and under all reasonable workloads. ZFS usually works great for many real world workloads (ours included), but there are perfectly reasonable setups where it will fall down, especially if you're using hard drives instead of SSDs. Raidz is merely an unusually catastrophic case (and an unusually common one, partly because no one expects RAID-5/6 to have that kind of drawback).
(Many of the issues that cause ZFS problems are baked into its fundamental design, but as storage gets faster and faster their effects are likely to diminish a lot for most systems. There is a difference between 10,000 IOPS and 100,000 IOPS, but it may not matter as much as the difference between 100 IOPS and 1,000. And not all of the issues are about performance; for example, there's no great solution to shrinking a ZFS pool. In some environments that will matter a lot.)
People sometimes agonize about this and devote a lot of effort to pushing water uphill. It's a natural reaction, especially among fans of ZFS (which includes me), but I've come to think that it's better to quickly identify situations where ZFS is not a good fit and recommend that people move to another filesystem and storage system. Sometimes we can make ZFS fit better with some tuning, but I'm not convinced that even that is a good idea; tuning is often fragile, partly because it's often relatively specific to your current workload. Sometimes the advantages of ZFS are worth going through the hassle and risk of tuning things like ZFS's recordsize, but not always.
(Having to tune has all sorts of operational impacts, especially since some things can only be tuned on a per-filesystem or even per-pool basis.)
PS: The obvious question is what ZFS is and isn't good for, and that I don't have nice convenient answers for. I know some pain points, such as raidz on HDs with random IO and the lack of shrinking, and others you can spot by looking for 'you should tune ZFS if you're doing <X>' advice, but that's not a complete set. And of course some of the issues today are simply problems with current implementations and will get better over time. Anything involving memory usage is probably one of them, for obvious reasons.
How I'm dealing with my Python indentation problem in GNU Emacs
The current (cultural) standard for indentation in Python is four space indent levels and indenting only with spaces, never tabs; this is what GNU Emacs' python mode defaults to and what YAPF and other code formatters use. Our new and updated Python 3 code is written in this official standard, as is some relatively recent Python 2 code. However, I spent a very long time writing Python code using 8-space indent levels and tab-based indentation, which means that I have a great deal of existing Python code in this style, including almost all of our existing Python 2 code at work and all of DWiki. For various reasons I don't want to reformat or reindent all of this code, so I want to work on existing code in its current style, whatever that is. Because Python 3 doesn't like it when you mix spaces and tabs, this should include the use of tabs in indentation.
My existing .emacs settings for this across various different systems were basically an inconsistent mess. On my desktop I was reflexively clinging to my old indentation style with various python mode settings; on our Ubuntu login servers, I'd stopped overriding the python mode defaults due to shifting toward the standard style, but that left me with the tab problem. Today, as part of dealing with my .emacs in general, I decided that I wanted to have the same .emacs everywhere, and that drove me to actively work out a solution.
First, I realized that if I was willing to really commit to shifting my indentation style to the standard one, the only real problem I had was with tabs. GNU Emacs's python mode will automatically detect the current indentation level for existing Python code, and for new files I'll use 4-space indents with spaces no matter what style the other files in the project use. For tabs, I want to continue using tabs if and only if the file is already in my old 8-space tab-based indentation style, so the only problem is detecting this.
As far as I can tell there are no existing GNU Emacs features or functions to do this, so I wrote some ELisp to be run as a python-mode hook (which means it happens on a per-file basis). I won't claim it's very good ELisp, but here it is:
    (defun cks/leading-tabs-p ()
      "Detect if the current buffer has a line with leading tab(s)."
      (save-excursion
        (save-restriction
          (widen)
          (goto-char (point-min))
          (if (re-search-forward "^\t+" nil t) t nil))))

    (add-hook 'python-mode-hook
      (lambda ()
        (if (and (= python-indent-offset 8)
                 (cks/leading-tabs-p))
            (setq indent-tabs-mode t))))
The detection of tab-indented lines here is highly imperfect and can be fooled by all sorts of things, but for my purposes it's good enough; misfires are unlikely in practice. I'm not sure I even have any Python code that uses 8-space indentation but without tabs.
(The start of cks/leading-tabs-p is copied directly from the python-mode function that scans the buffer to determine the indentation level it currently uses. The function naming is superstition based on what I've seen around the Internet.)
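If the simple check ever did misfire in practice, one possible tightening is to compare how many lines start with a tab against how many start with spaces. This is only a sketch and not something the setup above needs; the function name cks/mostly-tab-indented-p is made up for illustration, and the "more tab lines than space lines" rule is an assumption rather than a tested heuristic.

```elisp
;; Hypothetical stricter variant of cks/leading-tabs-p (name is made up).
;; Assumption: a file is in the old style if tab-led lines outnumber
;; space-led ones; this is an illustrative rule, not a tested one.
(defun cks/mostly-tab-indented-p ()
  "Return non-nil if tab-indented lines outnumber space-indented lines."
  (save-excursion
    (save-restriction
      (widen)
      ;; `how-many' returns the number of regexp matches in the region.
      (let ((tab-lines (how-many "^\t" (point-min) (point-max)))
            ;; Lines that start with spaces and then have real content.
            (space-lines (how-many "^ +[^ \n]" (point-min) (point-max))))
        (> tab-lines space-lines)))))
```

It could be swapped in for cks/leading-tabs-p in the hook, at the cost of scanning the whole buffer twice.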
I also decided to write some ELisp functions to toggle back and forth between the modern style and my old style and to report the indentation state of a buffer:
    (defun cks/python-toggle ()
      "Toggle between old-style Python 2 and modern Python 3 settings."
      (interactive)
      (if (= python-indent-offset 8)
          (progn
            (setq indent-tabs-mode nil)
            (setq python-indent-offset 4)
            (message "Set to modern Python 3 (4-level spaces)"))
        (progn
          (setq indent-tabs-mode t)
          (setq python-indent-offset 8)
          (message "Set to ancient Python 2 with tabs"))))

    (defun cks/rep-python ()
      "Report the Python indentation status of the current buffer."
      (interactive)
      (message "Python indentation is %d-space indents with %s %s"
               python-indent-offset
               (if (eq indent-tabs-mode t) "tabs" "spaces only")
               (cond
                ((and (= python-indent-offset 4)
                      (eq indent-tabs-mode nil))
                 "(Python 3 standard)")
                ((and (= python-indent-offset 8)
                      (eq indent-tabs-mode t))
                 "(my Python 2 style)")
                (t "(something weird)"))))
It's deliberate that after cks/python-toggle, I'm in one or the other of my standard indentation styles, even if the buffer started out in some weird style.
PS: Both python-indent-offset and indent-tabs-mode are buffer-local variables by the time I get my hands on them, so I can just directly use setq and so on. There may be a better way to do this these days, but my ELisp knowledge is old and rusty.
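One candidate for a more current idiom is setq-local, which makes a variable buffer-local in the current buffer whether or not it already was, so the code does not have to rely on the variables being local beforehand. The sketch below restates the hook that way; the function name cks/python-maybe-use-tabs is made up for illustration, and it assumes cks/leading-tabs-p is defined as earlier in this entry.

```elisp
;; Sketch only: same logic as the hook earlier in this entry, but with
;; setq-local so the setting is guaranteed to stay in this buffer.
;; The name cks/python-maybe-use-tabs is hypothetical.
(defun cks/python-maybe-use-tabs ()
  "Use tabs for indentation if the buffer is in 8-space, tab-based style."
  (when (and (= python-indent-offset 8)
             (cks/leading-tabs-p))
    (setq-local indent-tabs-mode t)))

;; A named function (rather than a lambda) can be taken off the hook
;; again later with `remove-hook'.
(add-hook 'python-mode-hook #'cks/python-maybe-use-tabs)
```

The same substitution of setq-local for setq would apply inside cks/python-toggle.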