== Revisiting some bits of ZFS's ZIL with separate log devices

Back in [[this entry ZFSWritesAndZIL]] I described how the ZFS ZIL may or may not put large writes into the ZIL itself depending on various factors (how big they are, whether you're using a separate ZIL device, and so on). It turns out that I missed one potentially important factor, in fact one that affects more than large writes.

If you're using a separate log device, ZFS will normally put all write data into the ZIL (on the presumption that flushing data to the SLOG is faster than flushing it to the regular pool) and will then put the ZIL on your separate log device (unless you've turned this off with the _logbias_ property). However this only applies if the log is not 'too big'. What's 'too big'? That's the tunable ((zil_slog_limit)), expressed in bytes, but how it gets used is a little bit obscure.

First, let's backtrack to [[the overall ZIL structure ZFSTXGsAndZILs]]. Each on disk ZIL is made up from some number of ZIL commits; these commits clean out over time as transaction groups push things into stable storage on the pool. This gives us two sizes: the size of the current ZIL commit that's being prepared and the total size of the (active) on disk ZIL at the moment.

What ((zil_slog_limit)) does is ~~turn off use of the SLOG for large ZIL commits or large total ZIL log sizes~~. If the current ZIL commit is over ((zil_slog_limit)) or the current total ZIL log size is over twice ((zil_slog_limit)), the ZIL commit is not written to your SLOG device but instead is written into the main pool. The default value of this tunable appears to be only one megabyte, which really startles me.

But wait, things get more fun. In ZFSWritesAndZIL I described how large writes are put directly into the ZIL if you have a separate log device, on the presumption that your SLOG is much faster than your actual disks.
That decision is *independent* from the decision of whether your ZIL commit will be written to the SLOG or to your real disks (really, the code only checks 'does this have a SLOG?'). It appears to be quite possible to have a SLOG, have relatively large writes be put into a ZIL commit, and then have this ZIL commit written (relatively slowly) to your real disks instead of to your SLOG. You probably don't want this.

In a world where SLOG SSDs were tiny and precious, this may have made some sense. In a world where 60 GB SSDs are common as grass it's my opinion that this no longer really does in most environments. Most ZFS environments with SLOG SSDs will never come close to filling the SSD with active ZIL log entries because almost no one writes and _fsync()_s that much data that fast (you can and should measure this for yourself, of course, but this is the typical result). Raising ((zil_slog_limit)) substantially seems like a good idea to me (we'll probably tune it up to at least a gigabyte).

(See [[here https://github.com/zfsonlinux/zfs/issues/1012]] for a nice overview of what gets written where and when and also some discussions about what may be faster under various circumstances.)
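
To make the ((zil_slog_limit)) rule concrete, here is a minimal Python sketch of the SLOG-or-pool decision as this entry describes it. It is a toy model, not the ZFS code: the function name, its parameters and the "slog"/"pool" return values are all made up for illustration, and the real check in the ZFS source may combine the two size conditions differently than this reading does. Only ((zil_slog_limit)) and the _logbias_ property are real names.

```python
# Toy model of the decision described in this entry; NOT taken from the
# ZFS source. Everything except zil_slog_limit and logbias is hypothetical.

MB = 1024 * 1024
GB = 1024 * MB

# The default reported in this entry: one megabyte.
ZIL_SLOG_LIMIT_DEFAULT = 1 * MB


def zil_commit_target(commit_bytes, total_zil_bytes, has_slog=True,
                      logbias="latency",
                      zil_slog_limit=ZIL_SLOG_LIMIT_DEFAULT):
    """Return "slog" or "pool" for where a ZIL commit would be written."""
    # No separate log device, or SLOG use turned off through logbias.
    if not has_slog or logbias != "latency":
        return "pool"
    # The ZIL commit being prepared is over the limit.
    if commit_bytes > zil_slog_limit:
        return "pool"
    # The total active on-disk ZIL is over twice the limit.
    if total_zil_bytes > 2 * zil_slog_limit:
        return "pool"
    return "slog"


# A small commit with a small active ZIL.
print(zil_commit_target(64 * 1024, 512 * 1024))
# A 4 MB commit with the default one megabyte limit.
print(zil_commit_target(4 * MB, 4 * MB))
# The same commit with the limit raised to a gigabyte.
print(zil_commit_target(4 * MB, 4 * MB, zil_slog_limit=GB))
```

Under this model the second call is the unpleasant case from above: the pool has a SLOG, but the commit goes to the main pool anyway until the limit is raised.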