Wandering Thoughts archives

2018-12-24

Some notes on ZFS prefetch related stats

For reasons beyond the scope of this entry, I've recently been looking at ARC stats, in part using a handy analysis program by Richard Elling. This has gotten me looking again at ZFS (k)stats related to prefetching, which I touched on before in my entry on some basic ZFS ARC statistics and prefetching. So here are some notes on what I think these mean or might mean.

To be able to successfully prefetch at all, ZFS needs to recognize and predict your access pattern. The extent to which it can do this is visible in the ZFS DMU zfetchstats kstats: zfetchstats:hits is the number of reads that matched a prediction stream, while zfetchstats:misses is the number of reads that did not match one. If zfetchstats:hits is low relative to zfetchstats:misses, there are two possible reasons: you could have a mostly random IO pattern, or you could have too many different sequential streams reading from the same file(s) at once. In theory there is a kstat that counts 'you had too many streams for this file and I couldn't create a new one', zfetchstats:max_streams. In practice this seems to be useless for telling these cases apart, because as far as I can tell even random access to files creates ZFS prefetch streams.
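As a rough illustration of working with these counters, here is a minimal sketch that parses kstat text in the 'name type data' layout (the layout that ZFS on Linux uses for /proc/spl/kstat/zfs/zfetchstats) and computes the fraction of reads that matched a stream. The function names are hypothetical and the sample numbers are invented purely for illustration; on other platforms you would obtain the same counters through that platform's kstat tooling.

```python
def parse_kstat_text(text):
    """Parse 'name type data' kstat lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Counter lines have exactly three fields and a numeric value;
        # this skips header lines such as 'name type data'.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def zfetch_hit_ratio(stats):
    """Fraction of reads that matched an existing prefetch stream."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0


# Invented sample values, purely for illustration.
SAMPLE = """\
name                            type data
hits                            4    7500
misses                          4    2500
max_streams                     4    2400
"""

stats = parse_kstat_text(SAMPLE)
print(f"zfetch hit ratio: {zfetch_hit_ratio(stats):.2f}")
# Misses that were not also counted as hitting the per-file stream limit.
print("misses not counted in max_streams:",
      stats["misses"] - stats["max_streams"])
```

Because these kstats are cumulative since boot, in practice you would probably want to take two samples and work with the difference rather than the raw totals.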

Every file can have at most zfetch_max_streams streams (default 8), and even streams that have never matched any reads aren't removed for zfetch_min_sec_reap seconds (default 2). So when you start doing random reads to a new file, as far as I can tell your first 8 random reads will immediately create 8 DMU prefetch streams and then every read after that will still try to create a new one but fail because you've hit the maximum stream count for the file. Since the streams are maxed out, each new random read will increment both zfetchstats:misses (since it doesn't match any existing stream) and zfetchstats:max_streams (since the file has 8 streams). Every two seconds, your current streams expire and you get 8 new ones from the next 8 random reads you do.

(This theory matches the numbers I see when I produce a flood of random reads to a large file with ioping. Our ZFS fileservers do show a slowly growing difference between zfetchstats:misses and zfetchstats:max_streams.)
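The theory above can be turned into a toy model. The following sketch is not the actual ZFS code; it only simulates the behaviour as described here, under the assumptions that no random read ever matches an existing stream and that a stream becomes reusable once it is zfetch_min_sec_reap old. The function name and the millisecond time units are choices made for this illustration.

```python
def simulate_random_reads(read_times_ms, max_streams=8, min_reap_ms=2000):
    """Toy model of per-file prefetch streams under purely random reads.

    Returns (misses, max_streams_count), mimicking the zfetchstats
    counters of the same names.
    """
    streams = []            # creation time (ms) of each live stream
    misses = 0
    max_streams_count = 0
    for now in read_times_ms:
        # Assumption: a random read never matches an existing stream.
        misses += 1
        # Drop streams that have reached the minimum reap age.
        streams = [created for created in streams
                   if now - created < min_reap_ms]
        if len(streams) < max_streams:
            streams.append(now)       # a new stream is created
        else:
            max_streams_count += 1    # the file is at its stream limit
    return misses, max_streams_count


# One random read per millisecond for ten seconds.
misses, max_streams_count = simulate_random_reads(range(10_000))
print(misses, max_streams_count, misses - max_streams_count)
```

In this model the two counters track each other closely, and the gap between them grows by at most 8 every two seconds, which is consistent with a slowly growing difference between the two stats.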

As discussed in my previous entry, the ARC 'prefetch hits' statistics count only how many prefetch reads were found in the ARC instead of needing to be read from disk. A high prefetch ARC hit rate means that you're doing sequential reads of files that are already in the ARC and staying there (either because you've read them before or because you recently wrote them). A low prefetch ARC hit rate means that this isn't happening, but there are multiple reasons for this. Obviously, one cause is that your sequential re-reads are collectively too large for your ARC and so at least some of them are being evicted before you re-read them. Another cause is that you're mostly not re-reading things, at least not very soon; most of the time you read a file once and then move on.

If you know or believe that your workload should be in the ARC, a low ARC prefetch hit rate (or more exactly, a high ARC prefetch miss count) is a sign that something is off, since it means that your prefetch reads are not finding things in the ARC that you expect to be there. Otherwise, a low ARC prefetch hit rate is not necessarily a problem.

I believe that there are situations where you will naturally get a low ARC prefetch hit rate. For example, if you perform a full backup of a number of ZFS filesystems with tar, I would expect a lot of ARC prefetch misses, since it's unlikely that you can fit all the data from all of your filesystems into ARC. And this is in fact the pattern we see on our ZFS fileservers during our Amanda backups. On the other hand, you should see a lot of ARC demand data hits, since prefetching itself should be very successful (and this is also the pattern we see).
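To look at a specific period such as a backup run, it helps to work with the difference between two samples of the cumulative counters. The sketch below assumes the arcstats counters are named prefetch_data_hits, prefetch_data_misses, demand_data_hits and demand_data_misses; the helper name is hypothetical and the sample values are invented to resemble the backup pattern described above, not measured from a real system.

```python
def hit_rate(before, after, hits_key, misses_key):
    """Hit rate over the interval between two cumulative kstat samples.

    Returns None when there was no activity in the interval.
    """
    hits = after[hits_key] - before[hits_key]
    misses = after[misses_key] - before[misses_key]
    total = hits + misses
    return hits / total if total else None


# Invented samples taken before and after a hypothetical backup run.
before = {"prefetch_data_hits": 1000, "prefetch_data_misses": 4000,
          "demand_data_hits": 50000, "demand_data_misses": 500}
after = {"prefetch_data_hits": 1200, "prefetch_data_misses": 12000,
         "demand_data_hits": 140000, "demand_data_misses": 1500}

prefetch_rate = hit_rate(before, after,
                         "prefetch_data_hits", "prefetch_data_misses")
demand_rate = hit_rate(before, after,
                       "demand_data_hits", "demand_data_misses")
print(f"prefetch hit rate: {prefetch_rate:.1%}")
print(f"demand data hit rate: {demand_rate:.1%}")
```

With these made-up numbers the prefetch hit rate is very low while the demand data hit rate is very high, which is the combination you would expect when prefetching is working well on data that was not already in the ARC.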

solaris/ZFSPrefetchStatsNotes written at 23:56:26

Plaintext parts of email are fading away (in spam and non-spam)

One of the things that I've been noticing these days is how much plaintext parts of emails are fading away. I'm not talking here about HTML-only emails (which have been on the rise here for years); instead, this is about MIME multipart/alternative email which theoretically has both a plaintext and an HTML portion. For years I've had my mail system set to show me the plaintext version instead of the HTML version. For a long time that worked reasonably well, but increasingly it doesn't; when there is a plaintext version that isn't just 'get an HTML capable client', more and more often the plaintext version is incomplete or otherwise not really functional.

This happens in regular email and it also happens in spam email. For instance, my spamtraps recently captured some email where the plaintext portion started:

To view it online, please go here: %%webversion%%

That's the literal text, and it comes from a spam operation that's clearly organized and using dedicated software (and servers) for their spamming.
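For illustration, here is a small sketch of how one might spot this kind of unexpanded template marker in the plaintext part of a multipart/alternative message, using Python's standard email package. The `%%name%%` pattern is an assumption generalized from the single example above (other mailing software may use different markers), and the function name and test message are made up for this sketch.

```python
import re
from email.message import EmailMessage

# Assumed placeholder syntax, generalized from '%%webversion%%'.
PLACEHOLDER = re.compile(r"%%\w+%%")


def unexpanded_placeholders(msg):
    """Return template markers left unexpanded in the text/plain part."""
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        # No plaintext part at all, for example an HTML-only message.
        return []
    return PLACEHOLDER.findall(body.get_content())


# A made-up multipart/alternative message with a broken plaintext part.
msg = EmailMessage()
msg["Subject"] = "Example"
msg.set_content("To view it online, please go here: %%webversion%%\n")
msg.add_alternative(
    "<html><body><p>The real content lives here.</p></body></html>",
    subtype="html",
)

print(unexpanded_placeholders(msg))
```

A check like this only tells you that the plaintext part was never properly generated; it says nothing about whether the message is spam.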

Of course, plenty of spammers still use plaintext or functional multipart messages; it seems to be especially common with advance fee fraud spammers, who generally have plain text messages anyway and who may be using well implemented webmail software that does this right. But if spammers (and significant mailing list operations) cannot be bothered to even look at their plaintext versions and get them functional, I have to conclude that plaintext versions are becoming vestigial remnants in the modern email ecosystem.

This isn't surprising, really. If anything it's sort of surprising that it hasn't happened before now. Apparently inertia is a thing.

Unfortunately, since this is done by both spam software and legitimate senders, a significant mismatch between the plaintext version and the HTML version is probably not a useful sign of spam. Depending on your tastes and who you get email from, it may still be a useful sign of email you don't want to read.

spam/FadingPlaintextParts written at 02:40:17

