Some numbers on our inbound and outbound TLS usage in SMTP
As a result of POODLE, it's suddenly rather interesting to find out the volume of SSLv3 usage that you're seeing. Fortunately for us, Exim directly logs the SSL/TLS protocol version in a relatively easy to search for format; it's recorded as the 'X=...' parameter for both inbound and outbound email. So here are some statistics, first from our external MX gateway for inbound messages and then from our other servers for outgoing deliveries.
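Tallying up the X= parameter could be sketched roughly like this. The sample log lines and the exact shape of the X= values are invented for illustration (real values depend on the TLS library your Exim is built against), but the general approach of pulling the protocol name out of X= is what I mean by 'easy to search for':

```python
import re
from collections import Counter

# Tally TLS protocol versions from Exim main log lines by extracting
# the X=... parameter. These sample lines are made up for illustration;
# real X= values vary with the TLS library Exim is built against
# (e.g. 'TLS1.0:...' with GnuTLS, 'TLSv1:...' with OpenSSL).
sample_log = """\
2014-10-20 10:00:01 1XgAbC-0001xy-2Z <= alice@example.com H=mx.example.com [192.0.2.10] P=esmtps X=TLS1.0:RSA_AES_256_CBC_SHA1:32
2014-10-20 10:00:05 1XgAbD-0001xz-3A <= bob@example.org H=smtp.example.org [192.0.2.20] P=esmtps X=SSL3.0:RSA_ARCFOUR_SHA1:16
2014-10-20 10:00:09 1XgAbE-0001y0-4B <= carol@example.net H=mail.example.net [192.0.2.30] P=esmtp
"""

x_param = re.compile(r'\bX=([^:\s]+)')

def tally_tls_versions(lines):
    counts = Counter()
    for line in lines:
        m = x_param.search(line)
        # Lines with no X= parameter were plain, unencrypted SMTP.
        counts[m.group(1) if m else 'cleartext'] += 1
    return counts

counts = tally_tls_versions(sample_log.splitlines())
print(counts)
```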
Over the past 90 days, we've received roughly 1.17 million external email messages. 389,000 of them were received with some version of SSL/TLS. Unfortunately our external mail gateway currently only supports up to TLS 1.0, so the only split I can report is that only 130 of these messages were received using SSLv3 instead of TLS 1.0. 130 messages is low enough for me to examine the sources by hand; the only particularly interesting and eyebrow-raising ones were a couple of servers at a US university and a .nl ISP.
(I'm a little bit surprised that our Exim doesn't support higher TLS versions, to be honest. We're using Exim on Ubuntu 12.04, which I would have thought would support something more than just TLS 1.0.)
On our user mail submission machine, we've delivered to 167,000 remote addresses over the past 90 days. Almost all of them, 158,000, were done with SSL/TLS. Only three of them used SSLv3 and they were all to the same destination; everything else was TLS 1.0.
(It turns out that very few of our user submitted messages were received with TLS, only 0.9%. This rather surprises me, but maybe many IMAP programs default to not using TLS even if the submission server offers it. The small number of submissions that did use TLS all used TLS 1.0, as I'd hope.)
Given that our Exim version only supports TLS 1.0, these numbers are more boring than I was hoping they'd be when I started writing this entry. That's how it goes sometimes; the research process can be disappointing as well as educational.
(I did verify that our SMTP servers really only do support up to TLS 1.0 and it's not just that no one asked for a higher version than that.)
One set of numbers I'd like to get for our inbound email is how TLS usage correlates with spam score. Unfortunately our inbound mail setup makes it basically impossible to correlate the bits together, as spam scoring is done well after TLS information is readily available.
Sidebar: these numbers don't quite mean what you might think
I've talked about inbound message deliveries and outbound destination
addresses here because that's what Exim logs information about, but
of course what is really encrypted is connections. One (encrypted)
connection may deliver multiple inbound messages and certainly may
be handed multiple
RCPT TO addresses in the same conversation.
I've also made no attempt to aggregate this by source or destination,
so very popular sources or destinations (like, say, Gmail) will
influence these numbers quite a lot.
All of this means that these sorts of numbers can't be taken as an indication of how many sources or destinations do TLS with us. All I can talk about is message flows.
(I can't even talk about how many outgoing messages are completely protected by TLS, because to do that I'd have to work out how many messages had no non-TLS deliveries. This is probably possible with Exim logs, but it's more work than I'm interested in doing right now. Clearly what I need is some sort of easy to use Exim log aggregator that will group all log messages for a given email message together and then let me do relatively sophisticated queries on the result.)
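The 'group all log messages for a given message together' idea isn't conceptually hard, since Exim puts the message ID right after the timestamp on its main log lines; the fiddly part is all the queries you'd want on top of it. A minimal sketch of the grouping step, with invented log lines, might look like this:

```python
from collections import defaultdict

# A minimal sketch of grouping Exim main log lines by message ID
# (the third whitespace-separated field, e.g. 1XgAbC-0001xy-2Z) and
# then asking, per message, whether every '=>' delivery line carried
# an X= parameter (i.e. went over TLS). The log lines are invented
# for illustration.
sample_log = """\
2014-10-20 10:00:01 1XgAbC-0001xy-2Z <= alice@example.com P=esmtps X=TLS1.0:CIPHER:32
2014-10-20 10:00:02 1XgAbC-0001xy-2Z => bob@example.org T=remote_smtp H=mx.example.org [192.0.2.1] X=TLS1.0:CIPHER:32
2014-10-20 10:00:03 1XgAbC-0001xy-2Z => carol@example.net T=remote_smtp H=mx.example.net [192.0.2.2]
2014-10-20 10:00:04 1XgAbC-0001xy-2Z Completed
2014-10-20 10:00:05 1XgAbD-0001xz-3A <= dave@example.com P=esmtps X=TLS1.0:CIPHER:32
2014-10-20 10:00:06 1XgAbD-0001xz-3A => erin@example.org T=remote_smtp H=mx.example.org [192.0.2.1] X=TLS1.0:CIPHER:32
2014-10-20 10:00:07 1XgAbD-0001xz-3A Completed
"""

def group_by_msgid(lines):
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            groups[fields[2]].append(line)
    return groups

def fully_tls(groups):
    """Map message ID -> True if all its '=>' deliveries used TLS."""
    result = {}
    for msgid, lines in groups.items():
        deliveries = [l for l in lines if ' => ' in l]
        result[msgid] = bool(deliveries) and all(' X=' in l for l in deliveries)
    return result

status = fully_tls(group_by_msgid(sample_log.splitlines()))
print(status)
```

Here the first message had one TLS and one plaintext delivery, so it doesn't count as completely protected; the second does.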
Revisiting Python's string concatenation optimization
Back in Python 2.4, CPython introduced an optimization for string concatenation that was designed to reduce memory churn in this operation and I got curious enough about this to examine it in some detail. Python 2.4 is a long time ago and I recently was prompted to wonder what had changed since then, if anything, in both Python 2 and Python 3.
To quickly summarize my earlier entry, CPython only optimizes string concatenation by attempting to grow the left side in place instead of making a new string and copying everything. It can only do this if the left side string only has (or clearly will have) a reference count of one, because otherwise it would be breaking the promise that strings are immutable. Generally this requires code of the form 'avar = avar + ...' or 'avar += ...'.
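You can see the reference count requirement in action from pure Python. If anything else holds a reference to the left-hand string, CPython has no choice but to build a new string object, because the old one must stay untouched:

```python
# When another reference to the left-hand string exists, CPython cannot
# grow it in place: doing so would mutate a string that someone else
# can still see, breaking immutability.
s = "x" * 1000
alias = s                   # second reference; refcount is now > 1
old_id = id(s)
s = s + "y"                 # must create a brand new string object
assert id(s) != old_id      # the in-place optimization could not apply
assert alias == "x" * 1000  # the original string is untouched
```

(The `id(s) != old_id` check is reliable here precisely because `alias` keeps the old object alive, so the new string cannot reuse its identity.)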
As of Python 2.7.8, things have changed only slightly. In particular
concatenation of Unicode strings is still not optimized; this
remains a byte string only optimization. For byte strings there are two
cases. Strings under somewhat less than 512 bytes can sometimes be grown
in place by a few bytes, depending on their exact sizes. Strings over
that can be grown if the system realloc() can find empty space after the current allocation.
(As a trivial case, CPython also optimizes concatenating an empty string to something by just returning the other string with its reference count increased.)
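This trivial case is easy to observe, with the caveat that object identity here is a CPython implementation detail and not a language guarantee:

```python
# CPython-specific: concatenating an empty string hands back the other
# operand (with its reference count bumped) instead of copying it.
# This is an implementation detail, not a language guarantee.
s = "hello world"
t = s + ""
u = "" + s
assert t is s  # same object, not a copy (in CPython)
assert u is s
```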
In Python 3, things are more complicated but the good news is that
this optimization does work on Unicode strings. Python 3.3+ has a
complex implementation of (Unicode) strings, but it does attempt
to do in-place resizing on them under appropriate circumstances.
The first complication is that internally Python 3 has a hierarchy
of Unicode string storage and you can't do an in-place concatenation
of a more complex sort of Unicode string into a less complex one.
Once you have compatible strings in this sense, in terms of byte
sizes the relevant sizes are the same as for Python 2.7.8; Unicode
string objects that are less than 512 bytes can sometimes be grown
by a few bytes while ones larger than that are at the mercy of realloc(). However, how many bytes a Unicode string takes
up depends on what sort of string storage it is using, which I think
mostly depends on how big your Unicode characters are (see this
section of the Python 3.3 release notes and PEP 393 for the gory details).
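You can see the storage differences from Python with sys.getsizeof(); equal-length strings take different amounts of memory depending on their widest character, which in turn affects where they fall relative to the 512-byte breakpoint:

```python
import sys

# Under PEP 393 (Python 3.3+), a string's per-character storage width
# depends on its widest character: 1 byte for Latin-1-range text,
# 2 bytes once any character is above U+00FF, 4 bytes above U+FFFF.
ascii_s = "a" * 100
wide_s = "\u0100" * 100       # forces 2 bytes per character
wider_s = "\U0001F600" * 100  # forces 4 bytes per character

print(sys.getsizeof(ascii_s), sys.getsizeof(wide_s), sys.getsizeof(wider_s))
assert sys.getsizeof(ascii_s) < sys.getsizeof(wide_s) < sys.getsizeof(wider_s)
```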
So my overall conclusion remains as before; this optimization is
chancy and should not be counted on. If you are doing repeated
concatenation you're almost certainly better off using ''.join()
on a list; if you think you have a situation that's otherwise, you
should benchmark it.
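For contrast, the two approaches look like this; they produce the same result, but only the second one has predictable performance regardless of whether the in-place optimization fires:

```python
def concat_naive(n):
    # Repeated concatenation; may or may not be grown in place,
    # depending on CPython version, sizes, and realloc() luck.
    s = ""
    for i in range(n):
        s = s + str(i)
    return s

def concat_join(n):
    # Accumulate pieces and join once; always linear in total size.
    return "".join(str(i) for i in range(n))

assert concat_naive(100) == concat_join(100)
```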
(In Python 3, the place to start is
Objects/unicodeobject.c. You'll probably also want to read
Include/unicodeobject.h and PEP 393 to understand this, and
then see Objects/obmalloc.c for the small object allocator.)
Sidebar: What the funny 512 byte breakpoint is about
Current versions of CPython 2 and 3 allocate 'small' objects using an internal allocator that I think is basically a slab allocator. This allocator is used for all overall objects that are 512 bytes or less and it rounds object size up to the next 8-byte boundary. This means that if you ask for, say, a 41-byte object you actually get one that can hold up to 48 bytes and thus can be 'grown' in place up to this size.
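The rounding itself is simple arithmetic, which we can sketch (the 41-byte example from above comes out as 48):

```python
# pymalloc handles allocation requests of 512 bytes or less, rounding
# each request up to the next 8-byte boundary; larger requests fall
# through to the system allocator instead.
def pymalloc_rounded(size):
    if size > 512:
        return size           # handled by the system malloc/realloc
    return (size + 7) & ~7    # round up to a multiple of 8

assert pymalloc_rounded(41) == 48    # the example from the text
assert pymalloc_rounded(48) == 48    # already on a boundary
assert pymalloc_rounded(512) == 512
```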