Wandering Thoughts archives

2009-08-24

The problem with the CFQ IO scheduler and our iSCSI targets

Current versions of Linux have several different ways to schedule IO activity; one writeup of this is here. The developers of the iSCSI target software that we use have for some time recommended that people switch away from CFQ (the default one) to either the 'deadline' or the 'noop' scheduler to get better performance. Recently I got around to testing this to see if it made a difference in our environment, and it turns out that it does.

(Whether the difference is significant is an open question.)

The specifics of the difference are less interesting than why it happens. According to what I've gathered from the developers, this is what is going on:

CFQ is the 'completely fair queuing' scheduler. Its goal is to fairly schedule IO between multiple contending processes, so that each gets its fair share of disk bandwidth and one highly active process doesn't cause long delays for IO from other processes. (This problem has a long history in Unixes, especially once the unified buffer cache appeared.)

The problem between CFQ and our iSCSI target software is that the target driver uses multiple threads and randomly assigns incoming IO requests to them. Each thread is seen as a separate context by CFQ, and as part of being fair CFQ won't merge contiguous requests together if they're in different CFQ contexts. So the frontend splits up one big IO operation into multiple iSCSI requests (because of, for example, size limits on a single iSCSI request), these iSCSI requests are dispatched to different threads on the backend, and then they are scheduled and completed separately (and thus more slowly).

(The other case that I can imagine is several different streams of sequential IO requests, such as readaheads on multiple files at once. The target software may well split the IO requests for a given stream across threads, thereby killing any chance of merging them.)
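To make the effect concrete, here is a toy model of the merging behaviour described above. It is only a sketch and not how CFQ is actually implemented: the function names, the 64 KiB request size, and the eight-thread pool are all assumptions made up for illustration. It counts how many separate IOs reach the disk when contiguous requests can be merged across everything, versus only within the queue of the thread they happened to land on.

```python
import random
from collections import defaultdict

def merge_contiguous(requests):
    """Merge (start, length) requests that sit back-to-back on disk."""
    merged = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((start, length))
    return merged

def dispatch_count(requests, n_threads, per_context, seed=0):
    """Count the IOs left after merging.

    per_context=True is the toy stand-in for the CFQ behaviour described
    above: requests are randomly assigned to threads and only merged
    within one thread's queue. per_context=False merges across all
    requests, regardless of which thread submitted them.
    """
    if not per_context:
        return len(merge_contiguous(requests))
    rng = random.Random(seed)
    queues = defaultdict(list)
    for req in requests:
        queues[rng.randrange(n_threads)].append(req)
    return sum(len(merge_contiguous(q)) for q in queues.values())

# One 1 MiB IO split into 64 KiB iSCSI requests (illustrative sizes).
chunk = 64 * 1024
requests = [(i * chunk, chunk) for i in range(16)]

print("merged across all requests:", dispatch_count(requests, 8, per_context=False))
print("merged per thread context: ", dispatch_count(requests, 8, per_context=True))
```

In this model the sixteen contiguous requests collapse into a single IO when they can be merged across threads, while the per-thread case generally leaves several separate IOs behind, since neighbouring requests rarely end up in the same queue.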

Because the deadline scheduler doesn't attempt fair scheduling this way, it will merge such requests together, which is what we want. My understanding is that it will also try to make sure that no individual request waits too long, which has various potential benefits in our environment.
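For reference, the scheduler is chosen per block device through sysfs: reading /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets, and writing a name to that file switches to it. The following is a minimal sketch of doing this from Python. The helper names and the 'sdb' device are hypothetical, the write needs root, and the set of scheduler names depends on the kernel in use.

```python
from pathlib import Path

def parse_schedulers(text):
    """Parse sysfs scheduler text such as 'noop deadline [cfq]'.

    Returns (active, available); the active scheduler is the bracketed one.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available

def scheduler_path(device):
    """Path of the sysfs scheduler file for a block device like 'sdb'."""
    return Path("/sys/block") / device / "queue" / "scheduler"

def set_scheduler(device, name):
    """Switch a device's IO scheduler (requires root)."""
    path = scheduler_path(device)
    _, available = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    path.write_text(name)

# Parsing a sample of what the sysfs file can contain:
print(parse_schedulers("noop deadline [cfq]"))
# On a real machine, as root, with a hypothetical device name:
#   set_scheduler("sdb", "deadline")
```

A change made this way does not survive a reboot, so it has to be reapplied at boot time in some way.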

(We effectively have three or four different logical disks on each physical disk, so we don't want IO to one logical disk to starve the others. This implies that iSCSI cooperating with CFQ with one CFQ context per logical disk/LUN would probably be ideal, since that would fairly divide the disk bandwidth between the LUNs. The IET developers are talking about doing that at some point in the future.)

Having written all this (partly to get it straight in my head), I now wonder if this issue also affects other sorts of disk and fileservers that use threads. The big one would be NFS servers, since I believe that they have a similar thread pool setup.

(Unfortunately I have no Linux NFS servers to do experiments with.)

linux/CFQAndiSCSITargets written at 23:04:34

Anti-spam content scanning systems need to scan more

It's long since past the time when anti-spam content scanning systems should decode and scan all the encoded attachments of email messages, especially encoded plaintext ones. Most content scanning systems have always been willing to decode base-64 encoded inline text and HTML (it's sort of a basic requirement), but I don't think very many of them scan attachments. The predictable result is that spammers have caught on that putting their spam in a base-64 encoded attachment works, and it shouldn't.
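Decoding such attachments is not hard. As a minimal sketch using only Python's standard email package, the following walks every MIME part of a message and returns the decoded text of all text/* parts, whether they are inline or attachments. The function name and the sample message are made up for illustration; a real scanner would also have to cope with malformed messages, odd character sets, and much more.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def scannable_text(raw_bytes):
    """Return the decoded text of every text/* part, attachments included."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        try:
            # get_content() undoes base64 / quoted-printable and the charset.
            texts.append(part.get_content())
        except LookupError:
            # Unknown charset: fall back to a lossless byte-to-char mapping.
            texts.append(part.get_payload(decode=True).decode("latin-1"))
    return texts

# Build a sample message with the spam hidden in a base64 text attachment.
sample = EmailMessage()
sample["Subject"] = "Business proposal"
sample.set_content("Please see the attached file.")
sample.add_attachment(
    "URGENT: send your bank details to claim your funds.",
    filename="proposal.txt",
    cte="base64",
)
raw = sample.as_bytes()

for text in scannable_text(raw):
    print(repr(text))
```

The raw message never contains the spam text in readable form, which is exactly why a scanner that only looks at inline parts misses it.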

(And these are not sophisticated spams from sophisticated operations; this is advance fee fraud and the like. I've been receiving an increasing number of these of late, many of which have been getting through the commercial system that we use.)

The sophisticated version of this is to embed the spam in a Microsoft Word .doc file, so pretty soon content scanning systems are going to need to be able to extract text from those too. I'm sure that spammers will try to obfuscate the text, just like they try to obfuscate the text in HTML messages today, but such obfuscation makes a good signature all on its own.

(Yes, accepting random .doc attachments from strangers has its own risks, but in most environments it's probably not politically acceptable to just refuse all of them, however tempting it sometimes is.)

spam/ScanningMore written at 00:54:40


