Wandering Thoughts archives

2014-10-20

Some numbers on our inbound and outbound TLS usage in SMTP

As a result of POODLE, it's suddenly rather interesting to find out how much SSLv3 usage you're seeing. Fortunately for us, Exim directly logs the SSL/TLS protocol version in a relatively easy-to-search format; it's recorded as the 'X=...' parameter for both inbound and outbound email. So here are some statistics, first from our external MX gateway for inbound messages and then from our other servers for external deliveries.
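
If you want to do this sort of tally yourself, something like the following will do it. This is only a minimal sketch, not our actual reporting script: you pass it your own Exim main log path, and the exact form of the X= value (for example 'TLSv1:AES256-SHA:256') depends on which TLS library your Exim is built against.

    # Tally TLS protocol versions from an Exim main log by pulling
    # out the protocol portion of the X=... parameter.
    import re
    import sys
    from collections import Counter

    XPARAM = re.compile(r' X=([^: ]+)')

    def tally(logfile):
        counts = Counter()
        with open(logfile) as f:
            for line in f:
                m = XPARAM.search(line)
                if m:
                    counts[m.group(1)] += 1
        return counts

    if __name__ == '__main__':
        # e.g. /var/log/exim4/mainlog; pass your own log path
        for version, count in tally(sys.argv[1]).most_common():
            print("%s %d" % (version, count))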

Over the past 90 days, we've received roughly 1.17 million external email messages. 389,000 of them were received with some version of SSL/TLS. Unfortunately our external mail gateway currently only supports up to TLS 1.0, so the only split I can report is that only 130 of these messages were received using SSLv3 instead of TLS 1.0. 130 messages is low enough for me to examine the sources by hand; the only particularly interesting and eyebrow-raising ones were a couple of servers at a US university and a .nl ISP.

(I'm a little bit surprised that our Exim doesn't support higher TLS versions, to be honest. We're using Exim on Ubuntu 12.04, which I would have thought would support something more than just TLS 1.0.)

On our user mail submission machine, we've delivered to 167,000 remote addresses over the past 90 days. Almost all of them, 158,000, were done with SSL/TLS. Only three of them used SSLv3 and they were all to the same destination; everything else was TLS 1.0.

(It turns out that very few of our user-submitted messages were received with TLS, only 0.9%. This rather surprises me, but maybe many mail clients default to not using TLS even if the submission server offers it. All of this small number of submissions used TLS 1.0, as I'd hope.)

Given that our Exim version only supports TLS 1.0, these numbers are more boring than I was hoping they'd be when I started writing this entry. That's how it goes sometimes; the research process can be disappointing as well as educational.

(I did verify that our SMTP servers really only do support up to TLS 1.0 and it's not just that no one asked for a higher version than that.)

One set of numbers I'd like to get for our inbound email is how TLS usage correlates with spam scores. Unfortunately our inbound mail setup makes it basically impossible to tie the two pieces of information together, because spam scoring happens at a much later stage than the one where the TLS information is readily available.

Sidebar: these numbers don't quite mean what you might think

I've talked about inbound message deliveries and outbound destination addresses here because that's what Exim logs information about, but of course what is really encrypted is connections. One (encrypted) connection may deliver multiple inbound messages and certainly may be handed multiple RCPT TO addresses in the same conversation. I've also made no attempt to aggregate this by source or destination, so very popular sources or destinations (like, say, Gmail) will influence these numbers quite a lot.

All of this means that these numbers can't be taken as an indication of how many sources or destinations do TLS with us. All I can talk about is message flows.

(I can't even talk about how many outgoing messages are completely protected by TLS, because to do that I'd have to work out how many messages had no non-TLS deliveries. This is probably possible with Exim logs, but it's more work than I'm interested in doing right now. Clearly what I need is some sort of easy-to-use Exim log aggregator that will group all log messages for a given email message together and then let me do relatively sophisticated queries on the result.)
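
As a rough illustration of what I mean, a first cut of just the grouping part might look like this. It's a sketch that assumes the traditional Exim message ID format (three dash-separated base-62 fields) and leaves out all of the interesting query features:

    # Group Exim main log lines by their message ID so that all
    # receipt and delivery lines for one message sit together.
    import re
    import sys
    from collections import defaultdict

    MSGID = re.compile(r'\b[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}\b')

    def group_by_message(logfile):
        groups = defaultdict(list)
        with open(logfile) as f:
            for line in f:
                m = MSGID.search(line)
                if m:
                    groups[m.group(0)].append(line.rstrip())
        return groups

    if __name__ == '__main__':
        for msgid, lines in group_by_message(sys.argv[1]).items():
            print("== %s" % msgid)
            for line in lines:
                print(line)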

spam/CSLabTLSUsage2014-10 written at 23:27:51

Revisiting Python's string concatenation optimization

Back in Python 2.4, CPython introduced an optimization for string concatenation that was designed to reduce memory churn in this operation, and I got curious enough about it to examine it in some detail. Python 2.4 is a long time ago, so I was recently prompted to wonder what has changed since then, if anything, in both Python 2 and Python 3.

To quickly summarize my earlier entry: CPython only optimizes string concatenation by attempting to grow the left side in place instead of making a new string and copying everything. It can only do this if the left-side string has (or clearly will have) a reference count of one, because otherwise it would be breaking the promise that strings are immutable. Generally this requires code of the form 'avar = avar + ...' or 'avar += ...'.
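
On CPython you can watch the optimization happen (or not) by checking whether the string's id() stays the same across a += concatenation. This is a CPython-specific observation trick, not guaranteed behavior:

    # If the += is done by growing the left side in place, the string
    # object keeps the same id(); with a second reference alive, CPython
    # must preserve immutability and build a new string instead.
    s = "x" * 100
    before = id(s)
    s += "y"                  # the form the optimization looks for
    print(id(s) == before)    # often True when grown in place

    t = "x" * 100
    alias = t                 # reference count is now more than one
    before = id(t)
    t += "y"
    print(id(t) == before)    # False: a new string had to be made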

As of Python 2.7.8, things have changed only slightly. In particular, concatenation of Unicode strings is still not optimized; this remains a byte-string-only optimization. For byte strings there are two cases. Strings somewhat under 512 bytes (the object header counts against the 512-byte limit) can sometimes be grown in place by a few bytes, depending on their exact sizes; strings over that can only be grown if the system realloc() can find empty space after them.

(As a trivial case, CPython also optimizes concatenating an empty string to something by just returning the other string with its reference count increased.)

In Python 3, things are more complicated, but the good news is that this optimization does work on Unicode strings. Python 3.3+ has a complex implementation of (Unicode) strings, but it does attempt to do in-place resizing on them under appropriate circumstances. The first complication is that internally Python 3 has a hierarchy of Unicode string storage, and you can't do an in-place concatenation of a more complex sort of Unicode string into a less complex one. Once you have compatible strings in this sense, the relevant byte sizes are the same as for Python 2.7.8: Unicode string objects that are less than 512 bytes can sometimes be grown in place by a few bytes, while larger ones are at the mercy of the system realloc(). However, how many bytes a Unicode string takes up depends on what sort of string storage it is using, which I think mostly depends on how big your Unicode characters are (see this section of the Python 3.3 release notes and PEP 393 for the gory details).
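
You can see this storage hierarchy from the outside with sys.getsizeof(). This is just an illustration for Python 3.3+; the fixed per-object overheads vary by build, but the per-character cost shows through:

    import sys
    # PEP 393: per-character storage depends on the widest character
    # in the string. Growing from 10 to 20 characters exposes the
    # per-character cost while cancelling out the fixed overhead.
    for ch in ('a', '\u0101', '\U0001f600'):   # 1-, 2-, and 4-byte kinds
        s10, s20 = ch * 10, ch * 20
        print("%r uses %d byte(s) per character"
              % (ch, (sys.getsizeof(s20) - sys.getsizeof(s10)) // 10))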

So my overall conclusion remains as before: this optimization is chancy and should not be counted on. If you are doing repeated concatenation you're almost certainly better off using .join() on a list; if you think your situation is different, you should benchmark it.
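
The benchmarking is easy enough with timeit; a small sketch along these lines will give you a rough idea for your own string sizes and Python versions:

    # Compare repeated += against a single .join(); absolute numbers
    # vary a lot across platforms and CPython versions.
    import timeit

    def concat(n=1000):
        s = ""
        for _ in range(n):
            s += "abc"        # may or may not be grown in place
        return s

    def join(n=1000):
        return "".join("abc" for _ in range(n))

    print("+=  : %.3fs" % timeit.timeit(concat, number=1000))
    print("join: %.3fs" % timeit.timeit(join, number=1000))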

(In Python 3, the place to start is PyUnicode_Append() in Objects/unicodeobject.c. You'll probably also want to read Include/unicodeobject.h and PEP 393 to understand this, and then see Objects/obmalloc.c for the small object allocator.)

Sidebar: What the funny 512-byte breakpoint is about

Current versions of CPython 2 and 3 allocate 'small' objects using an internal allocator that I think is basically a slab allocator. This allocator is used for all objects that are 512 bytes or less in total size, and it rounds object sizes up to the next 8-byte boundary. This means that if you ask for, say, a 41-byte object, you actually get one that can hold up to 48 bytes and thus can be 'grown' in place up to that size.
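
The rounding itself is simple arithmetic; here is a toy version of the size-class calculation:

    # Round a request of n bytes up to the allocator's 8-byte size
    # classes (for objects of 512 bytes or less).
    def rounded_size(n):
        return ((n + 7) // 8) * 8

    print(rounded_size(41))   # 48, so a 41-byte object has 7 spare bytes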

python/ExaminingStringConcatOptII written at 00:37:10

