Wandering Thoughts archives

2006-06-26

How not to report spam (part 1)

For my sins, I am on one of the aliases here that gets a certain number of reports of spamming theoretically committed by UofT IP addresses. (I am not one of the people who has to deal with them, fortunately; it is a thankless job.) This exposes me to a fair number of good examples of how not to report spam.

Today's example comes to us from an official government organization in a large South American country. All the information they gave us was:

  • the date (with the format spelled out: +1)
  • the time (with the time zone, as an offset from GMT: +1)
  • the sending IP address.
  • the 'SMTP ID', apparently something generated by their system.
  • the virus type it was identified as.
  • the Subject line of the mail.

Unfortunately, the IP address is the IP address of our main outgoing SMTP gateway. It sends a considerable amount of email, and little details like the MAIL FROM and the RCPT TO of the problematic message would have been useful.

(Disclaimer: despite my grumbles, Vernon Schryver's remarks about spam complaints definitely apply. Even people making imperfect spam reports are doing us a favour that they don't have to. It would just be faster to fix the issue if we got more information.)

spam/HowNotToReportSpamI written at 17:37:49

WSGI versus asynchronous servers

Asynchronous servers and frameworks are a popular way to create highly scalable systems. Although WSGI isn't explicitly designed to support them, putting a WSGI application in an asynchronous server isn't totally foolish: many WSGI applications won't be doing anything that can block.

(Technically disk IO can block, but Python on Unix doesn't have any way to do asynchronous disk IO without using threads.)

However, there is one serious fly in the ointment: the WSGI spec requires a synchronous interface for reading the HTTP request body. You get it from wsgi.input, which is specified to be a file-like object.
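To make the problem concrete, here is a minimal sketch of a WSGI application that reads its request body. The names `echo_length_app` and `call_app` are made up for illustration, and `call_app` is only a stand-in for a real server that feeds the application an in-memory body. The point is the `read()` call: it only returns once the bytes are there, so whatever thread called the application sits and waits.

```python
import io

def echo_length_app(environ, start_response):
    # CONTENT_LENGTH may be missing or empty; treat that as "no body".
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # The synchronous part: read() returns only when the data is
    # available, so the caller is stuck here until then.
    body = environ["wsgi.input"].read(length)
    payload = ("read %d bytes\n" % len(body)).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(payload)))])
    return [payload]

def call_app(app, body):
    """Hypothetical helper: invoke app with an in-memory request body."""
    environ = {"REQUEST_METHOD": "POST",
               "CONTENT_LENGTH": str(len(body)),
               "wsgi.input": io.BytesIO(body)}
    captured = {}
    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

With a real network socket behind `wsgi.input` instead of a `BytesIO`, that one `read()` is where an asynchronous server would stall.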

The spec suggests one way around this: the WSGI server can read the request body from the network (doing so asynchronously) and buffer it all up before invoking the WSGI application. I'm not very fond of this because it makes defending against certain sorts of denial of service attacks much more difficult, as the WSGI server has no idea what the size and time limits of the WSGI application are.

(For example, DWiki rejects all POSTs over 64K without even trying to read them.)
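As a rough sketch of what this kind of check can look like (this is not DWiki's actual code; the function name, the 413 status, and the response text are my assumptions for illustration), an application can look at `CONTENT_LENGTH` and refuse the request before it ever touches `wsgi.input`:

```python
MAX_POST_BYTES = 64 * 1024  # the 64K limit mentioned above

def limited_post_app(environ, start_response):
    body = b""
    if environ.get("REQUEST_METHOD") == "POST":
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = -1  # unparseable length: treat as unacceptable
        if length < 0 or length > MAX_POST_BYTES:
            # Reject based on the declared size alone; wsgi.input is
            # never read on this path.
            start_response("413 Request Entity Too Large",
                           [("Content-Type", "text/plain")])
            return [b"POST body too large\n"]
        body = environ["wsgi.input"].read(length)
    payload = ("accepted %d bytes\n" % len(body)).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [payload]
```

A server that buffers the whole body before calling the application has already paid the cost of reading an oversized POST by the time this check runs.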

This may seem nit-picky, but building resilient servers is already hard enough that I'm nervous about adding more obstacles.

This is one of those situations where continuations or coroutines would be pretty handy; the wsgi.input object could use one or the other to put the entire WSGI application to sleep until more network input showed up. (Python's yield-based coroutines aren't good enough because they only work with direct function calls; the wsgi.input.read() method can't use yield to pop all the way back to the WSGI server.)
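A small illustration of the yield limitation, using made-up names (`read_with_yield` stands in for a `read()` method written with yield, and `application` for an ordinary WSGI application function): a yield only suspends the function it appears in. The caller is not paused; it simply gets a generator object back and carries on.

```python
import types

def read_with_yield(size):
    # Hypothetical read(): tries to hand control back to the server
    # until 'size' bytes are available.
    data = yield ("need input", size)
    return data

def application():
    # An ordinary function, as a WSGI application normally is.
    body = read_with_yield(10)
    # We get here immediately; nothing was suspended, and 'body' is
    # a generator object rather than request data.
    return body

def demo():
    result = application()
    is_generator = isinstance(result, types.GeneratorType)
    # Only now, when someone explicitly drives the generator, does
    # the body of read_with_yield start running.
    first = next(result)
    return is_generator, first
```

For the yield to reach the server, every function between the server and the `read()` would itself have to be a generator that passes the request along, which is not how a plain WSGI application is written.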

(I don't fault WSGI for not working easily in asynchronous servers; it's hard to design general interfaces that do, and they're not very natural for synchronous servers. WSGI is sensibly designed for the relatively common case.)

python/AsynchronousWSGI written at 01:28:27



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.