Wandering Thoughts archives


A lingering sign of old hopes for ZFS deduplication

Over on Twitter, I said:

It's funny-sad that ZFS dedup was considered such an important feature when it launched that 'zpool list' had a DEDUP field added, even for systems with no dedup ever enabled. Maybe someday zpool status will drop that field in the default output.

For people who have never seen it, here is 'zpool list' output on a current (development) version of OpenZFS on Linux:

; zpool list
ssddata  596G   272G  324G        -         -   40%  45%  1.00x  ONLINE  -

The DEDUP field is the ratio of space saved by deduplication, expressed as a multiplier (from the allocated space after deduplication to what it would be without deduplication). It's always present in default 'zpool list' output, and since almost all ZFS pools don't use deduplication, it's almost always 1.00x.

It seems very likely that Sun and the Solaris ZFS developers had great hope for ZFS deduplication when the feature was initially launched. Certainly the feature was very attention getting and superficially attractive; back a decade ago, people had heard of it and would recommend it casually, although actual Solaris developers were more nuanced. It seems very likely that the presence of a DEDUP field in the default 'zpool list' output is a product of an assumption that ZFS deduplication would be commonly used and so showing the field was both useful and important.

However, things did not turn out that way. ZFS deduplication is almost never used, because once people tried to use it for real they discovered that it was mostly toxic, primarily because of high memory requirements. Yet the DEDUP field lingers on in the default 'zpool list' output, and people like me can see it as a funny and sad reminder of the initial hopes for ZFS deduplication.

(OpenZFS could either remove it or, if possible, replace it with the overall compression ratio multiplier for the pool, since many pools these days turn on compression. You would still want to have DEDUP available as a field in some version of 'zpool list' output, since the information doesn't seem to be readily available anywhere else.)

PS: Since I looked it up, ZFS deduplication was introduced in Oracle Solaris 11 for people using Solaris, which came out in November of 2011. It was available earlier for people using OpenSolaris, Illumos, and derivatives. Wikipedia says that it was added to OpenSolaris toward the end of 2009 and first appeared in OpenSolaris build 128, released in early December of 2009.

solaris/ZFSDedupLingeringSign written at 23:01:10; Add Comment

Real email has MIME attachments that are HTML

One of the things that MIME parts in email have (or can have) is a content disposition, which theoretically tells your mail client whether the MIME part should be displayed as part of the message (a content disposition of inline) or it should be not displayed by the client and you'd be offered the option to save it, view it with something, and so on (a content disposition of attachment).

(HTTP reuses this idea in the Content-Disposition header, which tells the browser if it should try to display the response or jump straight to forcing you to download it or hand it to some external program.)

In most email, HTML MIME parts have an inline content disposition, because this is how the sender (or their mail software) arranges for them to be visible to the receiver. This is true both for a message that is HTML only or for a 'multipart/alternative' message with (theoretically) equivalent plain text and HTML versions.

For a long time, I've known that our commercial anti-spam filter was counting some varieties of phish spam as 'viruses'. When we first started logging MIME part type information, I discovered that a lot of these rejections for for HTML MIME parts that had an 'attachment' content disposition. This led me to assume that essentially all legitimate real mail with HTML MIME parts had them with an inline content disposition, and only suspicious and probably bad email had 'attachment' HTML MIME parts.

Recently I had reasons to specifically look at our MIME part type logs for email that we can be reasonably confident is good, and I got a surprise. We definitely see legitimate email with HTML MIME parts that have a content disposition of 'attachment'. Apparently this is even the standard and normal behavior of some email clients in some situations, especially when forwarding email.

Beyond the specific fixing of my ignorance and assumption here, in general this has been a useful reminder to me that I don't actually know as much about modern email as I usually think I do. Before I confidently assume something like 'HTML MIME parts that are attachments are suspicious', I should at least go check our logs to see what they say. After all, that's the largest reason we collect this information; we realized that we didn't actually know what sorts of MIME parts our users received and we should.

spam/HTMLAttachmentsAreReal written at 00:33:01; Add Comment

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.