Wandering Thoughts archives


One of the problems with 'you should submit a patch'

Today I reported a relatively small issue in the development version of ZFS on Linux. In theory the way of open source development is that I should submit a patch with my problem report, since this is a small and easily fixed issue, and I suspect that a certain number of the usual suspects would say that I'm letting down up my end of the open source social compact by not doing this (even though the ZoL developers did not ask me for this). Well, there's a problem with this cheerful view of how easy it is to make patches:

It's only easy to make half-assed partially tested patches. Making well-tested good ones is generally hard.

In theory this issue and the fix is really simple. In practice there are a bunch of things that I don't know for sure and that I should test. Here's two examples that I should do in a 'good' patch submission:

  • I should build the package from scratch and verify that it installs and works on a clean system. My own ZFS on Linux machine is not such a clean system so I'd need to spin up a test virtual machine.

  • I should test that my understanding of what happens when an systemd.service ExecStartPre command fails is correct. I think I've correctly understood the documentation, but 'I think' is not 'I know'; instead it's superstition.

Making a patch that should work and looks good and maybe boots on my machine is about ten minutes work (ignoring the need to reboot my machine). Making a good patch, one that is not potentially part of a lurching drunkard's walk in the vague direction of a solution, is a lot more work.

(This is not particularly surprising, because it's the same general kind of thing that it takes to go from a personal program to something that can pass for a product (in the Fred Brooks sense). The distance from 'works for me' to 'it should work for everyone and it's probably the right way to do it' is not insubstantial.)

Almost all of the time that people say 'you should submit a patch' they don't actually mean 'you should submit a starting point'. What they really want is 'you should submit a finished, good to go patch that we can confidently apply and then ship'. At one level this is perfectly natural; someone has to do this work and they'd rather you be that person than them (and some of the time you're in a theoretically better position to test the patch). At another level, well, it's not really welcoming to put it one way.

(It also risks misunderstandings, along the same lines as too detailed bug reports but less obviously. If I give you a 'works for me' patch but you think that it's a 'good to go' patch, ship it, and later discover that there are problems, well, I've just burned a bunch of goodwill with the project. It doesn't help that patch quality expectations are often not spelled out.)

There are open source projects that are genuinely not like this, where the call for patches really includes these 'works for me' starting points (often because the project leadership understands that every new contributor starts small and incomplete). But these projects are relatively rare and unfortunately the well is kind of poisoned here, so if your project is one of these you're going to have to work quite hard to persuade skittish people that you really mean 'we love even starting point patches'.

(Note that this is different from saying basically 'bug reports are only accepted when accompanied by patches'. Here I'm talking about a situation where it seems easy enough to make a patch as well as a bug report, but the devil is in the details.)

tech/PatchSubmissionProblem written at 23:07:58; Add Comment

Email providers cannot stop spam by scanning outgoing email

One of the things that Amazon SES advertises that it (usually) does is that it scans the outgoing email that people send through it to block spam. This sounds great and certainly should mean that Amazon SES emits very low levels of spam, right? Well, no, not so fast. Unfortunately, no outgoing mail scanning on a service like this can eliminate spam. All it can do is stop certain sorts of obvious spam. This is intrinsic in the definition of 'spam' and the limitations of what a mail sending system like Amazon SES does.

Essentially perfect content scanning can tell you two things: whether the email has markers of known types of spam, such as phish, advance fee fraud, malware distribution, and so on, and whether the email will be be scored as spam by however many spam scoring systems you can get your hands on the rules for. These are undeniably useful things to know (provided that you act on them), but messages that fail these tests are far from the only sorts of spam. In particular, basically all sorts of advertising and marketing emails cannot be blocked by such a system because what makes these messages spam is not their content, it's that they are unsolicited (cf, cf).

The only way to even theoretically tell whether a message is solicited or unsolicited is to control not just the sending of outgoing email but the process of choosing destination email addresses. If you only scan messages but don't control addresses, you have very little choice but to believe the sender when they tell you 'honest, all of these addresses want this email'. And then the marketing department of everyone and sundry descends on Amazon SES with their list of leads and prospects and people to notify about their very special whatever it is that of course everyone will be interested in, and then Amazon SES is sending spam.

(Or the marketing people buy 'qualified email addresses' from spam providers because why not, you could get lucky.)

There is absolutely nothing content filtering can do about this. Nothing. You could have a strong AI reading the messages and it wouldn't be able to stop all of the UBE.

(I wrote a version of this as a comment reply on my Amazon SES entry but I've decided it's an important enough point to state and elaborate in an entry.)

spam/OutgoingScanningLimitation written at 00:15:37; Add Comment

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.