I want opportunistic, identity-less encryption on the Internet
The tech news has recently been full of reports that some ISPs are modifying SMTP conversations passing through them in order to remove encryption, or more specifically to remove a note that the server is offering it (this goes by the name of 'STARTTLS'). Some people are up in arms over it; others are saying that this is no big deal because it's not as if TLS on SMTP conversations means much in the first place. This latter viewpoint may raise some eyebrows. The reason why TLS SMTP means less than you think is that essentially no clients require signed and verified TLS certificates from the SMTP server; instead they'll almost always accept any random TLS certificate. In one set of jargon, this is called 'opportunistic' encryption; you could also call it 'unauthenticated' encryption. Its drawback is that this lack of authentication of the server means that the server could be anyone, including an attacker playing man in the middle to read your 'TLS encrypted' email.
This general issue is a long-standing dispute in Internet cryptography. On one side are people who say that opportunistic encryption is better than nothing; on the other side are people who say that it's effectively nothing because of its vulnerability to MITM attacks (and that it has other bad side effects, like causing people to think that they're more secure than they are). Once upon a time I might have been reasonably sympathetic to the second view, but these days I have come completely around to the first view.
As far as the second view goes: yes, there are many places that are perfectly happy to perform MITM attacks on you at the drop of a hat. Most such places generally don't call them MITM attacks, of course; instead they're 'captive portals' and so on. Those ISPs that are stripping STARTTLS markers are effectively doing a MITM attack. But in practice, in the real world, this is not equivalent to having no encryption at all. The big difference between opportunistic encryption and no encryption is that opportunistic encryption completely defeats passive monitoring. And in the real world, passive monitoring is now pervasive.
That is why I want as much opportunistic encryption as possible; I want that pervasive passive monitoring to get as little as possible. Sure, in theory the opportunistic encryption could be MITMd. In practice no significant amount of it is likely to be, because the resources required to do so go up much faster than with passive monitoring (and in some situations so do the chances of getting noticed and caught). Opportunistic encryption is not theoretically clean but it is practically useful, and it is much, much easier to design in and set up than authenticated encryption is.
(And this is why ISPs stripping STARTTLS matters quite a bit.)
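To make the difference concrete, here is a minimal Python sketch of the two client behaviours. The function names are made up for illustration and this is not how any particular mail server is configured; it only shows that an 'opportunistic' client encrypts without checking who it is talking to, and that it quietly falls back to plaintext if the STARTTLS marker has been stripped.

```python
import smtplib
import ssl


def opportunistic_context() -> ssl.SSLContext:
    """Encrypt the session but accept any certificate the server presents."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # has to be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # no authentication of the server at all
    return ctx


def authenticated_context() -> ssl.SSLContext:
    """Encrypt and require a valid certificate matching the hostname."""
    return ssl.create_default_context()


def starttls_if_offered(host: str, ctx: ssl.SSLContext) -> bool:
    """Hypothetical helper: upgrade to TLS only if the server advertises it.

    If something in the middle strips the STARTTLS marker from the EHLO
    reply, this returns False and the conversation stays in plaintext.
    """
    with smtplib.SMTP(host, 25, timeout=30) as smtp:
        smtp.ehlo()
        if not smtp.has_extn("starttls"):
            return False
        smtp.starttls(context=ctx)
        smtp.ehlo()
        return True
```

With opportunistic_context() a passive listener sees only ciphertext, but an active attacker can still present any certificate it likes.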
A wish: setting Python 3 to do no implicit Unicode conversions
In light of the lurking Unicode conversion issues in my DWiki port to Python 3, one of the things I've realized I would like in Python 3 is some way to turn off all of the implicit conversions to and from Unicode that Python 3 currently does when it talks to the outside world.
The goal here is the obvious one: since any implicit conversion is a place where I need to consider how to handle errors, character encodings, and so on, making them either raise errors or produce bytestrings would allow me to find them all (and to force me to handle things explicitly). Right now many implicit conversions can sail quietly past because they're only having to deal with valid input or simple output, only to blow up in my face later.
(Yes, in a greenfield project you would be paying close attention to all places where you deal with the outside world. Except of course for the ones that you overlook because you don't think about them and they just work. DWiki is not in any way a greenfield project and in Python 2 it arrogantly doesn't use Unicode at all.)
It's possible that you can fake this by setting your (Unix) character encoding to either an existing encoding that is going to blow up on utf-8 input and output (including plain ASCII) or to a new Python encoding that always errors out. However this gets me down into the swamps of default Python encodings and how to change them, which I'm not sure I want to venture into. I'd like either an officially supported feature or an easy hack. I suspect that I'm dreaming on the former.
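As a rough sketch of the 'new Python encoding that always errors out' idea, the standard codecs module lets you register a codec whose encode and decode functions simply refuse to work. The codec name 'always_fail' is made up for this example, and the sketch deliberately does not show how to make it the default encoding, which is the swampy part.

```python
import codecs


def _refuse_encode(text, errors="strict"):
    # Any str -> bytes conversion routed through this codec fails loudly.
    raise UnicodeError("implicit str -> bytes conversion attempted")


def _refuse_decode(data, errors="strict"):
    # Any bytes -> str conversion routed through this codec fails loudly.
    raise UnicodeError("implicit bytes -> str conversion attempted")


def _search(name):
    # The codec registry passes in a normalized, lower-cased name.
    if name == "always_fail":
        return codecs.CodecInfo(
            encode=_refuse_encode,
            decode=_refuse_decode,
            name="always_fail",
        )
    return None


codecs.register(_search)
```

Anything that then tries to use this encoding, for example b'abc'.decode('always_fail'), raises UnicodeError instead of quietly succeeding.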
(I suspect that there are currently places in Python 3 that both always perform a conversion and don't provide an API to set the character encoding for the conversion. Such places are an obvious problem for an official 'conversion always produces errors' setting.)
Why I don't have a real profile picture anywhere
Recently I decided that I needed a non-default icon aka profile picture for my Twitter account. Although I have pictures of myself, I never considered using one; it's not something that I do. Mostly I don't set profile pictures on websites that ask for them and if I do, it's never actually a picture of me.
Part of this habit is certainly that I don't feel like giving nosy websites that much help (and they're almost all nosy). Sure, there are pictures of me out on the Internet and they can be found through search engines, but they don't actually come helpfully confirmed as me (and in fact one of the top results right now is someone else). Places like Facebook and Twitter and so on are already trying very hard to harvest my information and I don't feel like giving them any more than the very minimum. For a long time that was all that I needed and all of the reason that I had.
These days I have another reason for refusing to provide a real picture, one involving a more abstract principle than just a reflexive habit towards 'none of your business' privacy. Put simply, I don't put up a profile picture because I've become conscious that I could do so safely, without fear of consequences due to people becoming aware of what I look like. Seeing my picture will not make people who interact with me think any less of me and the views I express. It won't lead to dismissals or insults or even threats. It won't expose me to increased risks in real life because people will know what I look like if they want to find me.
All of this sounds very routine, but there are plenty of people on the Internet for whom this is at least not a sure thing (and thus something that they have to consider consciously every time they make this choice) or even very much not true. These people don't have my freedom to casually expose my face and my name if I feel like it, with no greater consideration than a casual dislike of giving out my information. They have much bigger, much more serious worries about the whole thing, worries that I have the privilege of not even thinking about almost all of the time.
By the way, I don't think I'm accomplishing anything in particular by not using a real picture of myself now that I'm conscious of this issue. It's just a privilege that I no longer feel like taking advantage of, for my own quixotic reasons.
(You might reasonably ask 'what about using your real name?'. The honest answer there is that I am terrible with names and that particular ship sailed a very long time ago, back in the days before people were wary about littering their name around every corner of the Internet.)
PS: One obvious catalyst for me becoming more aware of this issue was the Google+ 'real names' policy and the huge controversy over it, with plenty of people giving lots of excellent arguments about why people had excellent reasons not to give out their real names (see eg the Wikipedia entry if you haven't already heard plenty about this).
What it took to get DWiki running under Python 3
For quixotic reasons I recently decided to see how far I could get with porting DWiki (the code behind this blog) to Python 3 before I ran out of either patience or enthusiasm. I've gotten much further than I expected; at this point I'm far enough that it can handle this entire site when running under Python's builtin basic HTTP server, rendering the HTML exactly the same as the Python 2 version does.
Getting this far basically took three steps. The largest step was updating the code to modern Python 2, because Python 3 doesn't accept various bits of old syntax. After I'd done that, I ran 2to3 over the codebase to do a bunch of mechanical substitutions, mostly rewriting
All of this sounds great, but the reality is that DWiki is only limping along under Python 3 and this is exactly because of the Unicode issue. Closely related to this is that I have not revised my WSGI code for any changes in the Python 3 version of WSGI (I'm sure there must be some, just because of character encoding issues). Doing a real Python 3 port of DWiki would require dealing with this, which means going through everywhere that DWiki talks to the outside world (for file IO, for logging, and for reading and replying to HTTP requests), figuring out where the conversion boundary is between Unicode and bytestrings, what character encoding I need to use and how to recognize this, and finally what to do about encoding and decoding errors.
Complicating this is that some of these encoding boundaries are further upstream than you might think. Two closely related cases I've run into so far are that DWiki computes the ETag and Content-Length for the HTTP reply itself, and for obvious reasons both of these must be calculated against the encoded bytestring version of the content body instead of its original Unicode version. This happens relatively far inside my code, not right at the boundary between WSGI and me.
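To illustrate the ETag and Content-Length point, here is a minimal sketch (not DWiki's actual code; the function name and the use of SHA-1 for the ETag are assumptions made for the example). The body is encoded once, explicitly, and both headers are then derived from the resulting bytes.

```python
import hashlib


def body_headers(body_text: str, charset: str = "utf-8"):
    """Encode the body once and compute headers from the encoded bytes."""
    body = body_text.encode(charset)  # the explicit conversion boundary
    headers = [
        ("Content-Type", "text/html; charset=%s" % charset),
        # Length in bytes, which differs from len(body_text) for non-ASCII.
        ("Content-Length", str(len(body))),
        # Hypothetical ETag scheme: a hash of the encoded body.
        ("ETag", '"%s"' % hashlib.sha1(body).hexdigest()),
    ]
    return headers, body
```

For a body like 'café' the Unicode string has four characters but the UTF-8 encoding has five bytes, so computing the length from the string would produce a wrong Content-Length.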
(Another interesting case is encoding URLs that have non-ASCII characters in them, for example from a page with a name that has Unicode characters in it. Such URLs can get encoded both in HTML and in the headers of redirects, and need to be decoded at some point on the way in, where I probably need to %-decode to a bytestring and then decode that bytestring to a Unicode string.)
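The two-stage decoding described above can be sketched with the standard urllib.parse functions. The helper names here are invented for illustration, and assuming UTF-8 as the character encoding of URLs is my assumption rather than something the HTTP request tells you.

```python
from urllib.parse import quote, unquote_to_bytes


def decode_path(raw_path: str) -> str:
    """%-decode to a bytestring, then decode the bytes to Unicode."""
    raw = unquote_to_bytes(raw_path)
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError, which the
    # caller has to turn into some sort of error response.
    return raw.decode("utf-8", errors="strict")


def encode_path(path: str) -> str:
    """Encode a Unicode page path for use in HTML or a redirect header."""
    return quote(path.encode("utf-8"), safe="/")
```

The strict decode is where a request with garbage in its URL will blow up, so it is also the place to decide what the error handling should be.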
Handling encoding and decoding errors is a real concern of mine for a production quality version of DWiki in Python 3. The problem is that most input these days is well behaved, so you can go quite a while before someone sends you illegal UTF-8 in headers, URLs, or POST bodies (or for that matter sends you something in another character set). This handily disguises failures to handle encoding and decoding problems, since things work almost all the time. And Python 3 has a lot of places with implicit conversions.
That these Unicode issues exist doesn't surprise me. Rather the reverse; dealing with Unicode has always been the thing that I thought would be hardest about any DWiki port to Python 3. I am pleasantly surprised by how few code changes were required to get to this point, as I was expecting many more code changes (and for them to be much more difficult to make, I think because at some point I'd got the impression that 2to3 wasn't very well regarded).
Given the depths of the Unicode swamps here, I'm not sure that I'll go much further with a Python 3 version of DWiki than I already have. But, as mentioned, it is both nice and surprising to me that I could get this far with this little effort. The basics of porting to Python 3 are clearly a lot less work than I was afraid of.
NFS hard mounts versus soft mounts
On most Unix systems NFS mounts come in your choice of two flavours, hard or soft. The Linux nfs manpage actually has a very good description of the difference; the short summary is that a hard NFS mount will keep trying NFS operations endlessly until the server responds while a soft NFS mount will give up and return errors after a while.
You can find people with very divergent opinions about which is better (cf, 2). My opinion is fairly strongly negative about soft mounts. The problem is that it is routine for a loaded NFS server to not respond to client requests within the client timeout interval because the timeout is not for the NFS server to receive the request, it's for the server to fully process it. As you might imagine, a server under heavy IO and network load may not be able to finish your disk IO for some time, especially if it's write IO. This makes NFS timeouts that would trigger soft NFS mount errors a relatively routine event in many real world environments.
(On Linux, any time a client reports 'nfs: server X not responding, still trying' that would be an IO error on a soft NFS mount. In our fileserver environment, some of these happen nearly every day.)
Many Unix programs do not really expect their IO to fail. Even programs that do notice IO errors often don't and can't do anything more than print an error message and perhaps abort. This is not a helpful response to transient errors, but then Unix programs are generally not really designed for a world with routine transient IO errors. Even when programs report the situation, users may not notice or may not be prepared to do very much except, perhaps, retry the operation.
(Write errors are especially dangerous because they can easily cause you to permanently lose data, but even read errors will cause you plenty of heartburn.)
Soft NFS mounts primarily make sense when you have some system that absolutely must remain responsive and cannot delay for too long for any reason. In this case a random but potentially very long kernel imposed delay is a really bad thing and you'd rather have the operation error out entirely so that your user level code can take action and at least respond in some way. Some NFS clients (or just specific NFS mounts) are only used in this way, for a custom system, and are not exposed to general use and general users.
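As a sketch of what 'take action at user level' might look like, here is a small retry wrapper around a file read. Everything here is hypothetical: the function name, the retry policy, and the assumption that a soft mount timeout shows up as EIO or ETIMEDOUT. The opener parameter only exists so that the logic can be exercised without a real NFS mount.

```python
import errno
import time

# Assumption: errors that a soft NFS mount timeout is likely to produce.
TRANSIENT_ERRNOS = {errno.EIO, errno.ETIMEDOUT}


def read_with_retry(path, attempts=3, delay=1.0, opener=open):
    """Read a whole file, retrying a few times on transient IO errors."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(path, "rb") as f:
                return f.read()
        except OSError as e:
            # Give up on non-transient errors and on the final attempt.
            if e.errno not in TRANSIENT_ERRNOS or attempt == attempts:
                raise
            time.sleep(delay)
```

Very few ordinary Unix programs are written this way, which is exactly the problem with exposing general users to soft mounts.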
(IO to NFS hard mounts can still be interrupted if you've sensibly mounted them with the intr option. It just requires an explicit decision at user level that the operation should be aborted, instead of the kernel deciding that all operations that have taken 'too long' should be aborted.)
PS: My bias here is that I've always been involved in running general use NFS clients, ones where random people will be using the NFS mounts for random and varied things with random and varied programs of very varied quality. This is basically a worst case for NFS soft mounts.
What you're saying when you tell people to send in patches
Excerpted from a comment on my entry about my problem with reporting CentOS bugs:
You could not be more wrong about your assumptions:
- Bugs.centos.org is community help .. meaning that our users and volunteer QA team answer questions there. Not only should you report bugs there .. you should also find and fix, the report the fix there. That is how open source works. You should fix the problem, report the fix to bugs.centos.org and bugzilla.redhat.com if you can .. I mean, you are getting the software for free, right?
When I read things like this, where people tell other people 'it's open source, you should be finding and submitting patches', this is how I expect those other people are reading it. Whether you realize it or not and whether you intend it or not, telling people 'submit a patch' is actually telling people 'go away, you annoying thing'.
(It's also sending the message that your project is only really for developers, who are the people who can actually come up with those patches. Mere mortals need not apply. For that matter, developers who have other things to work on need not apply either.)
Sometimes, occasionally, this is an appropriate thing to say. If someone is showing up in your bug report system to demand that you fix a bug (or is otherwise complaining loudly that an open source community is failing to do so), sure, go ahead and tell them to go away and shut up. Mind you, telling them a softball version of 'patch or GTFO' is kind of passive-aggressive; perhaps you could be more straightforward and just say 'we're not going to fix this, if you need a fix badly enough you'll need to do it yourself'.
But as a general reply? No. As a general reply or a general suggestion it's both rude and a bridge burner. It also hangs a kind of smug open source arrogance out for bystanders to see and take note of.
Oh, as a side note, saying this sort of thing is also kind of insulting in that it suggests (by implication) that someone with the capacity to create a fix has not realized that gosh, they could do it. Of course, there are people who don't think that they're good enough to create fixes or lack the confidence to do so and who need encouragement, but such gentle encouragement is not delivered in anything even vaguely close to this 'send patches' manner (any more than encouragement to submit bug reports is, cf the comments on this entry).
PS: this is one of the rare entries when I will say the following directly and explicitly: if you're thinking of being deliberately rude in a comment, either on this entry or elsewhere, go away. Enabling and hosting rudeness is not anywhere near why I have comments here and rude comments may be subject to summary removal if I am angry enough.
Sidebar: the toxic nature of 'that is how open source works'
I want to point the following out explicitly. If, to quote the comment, 'not only should you report bugs there .. you should also find and fix, the report the fix there. That is how open source works' is actually how open source works then the direct corollary is that open source is only really for developers, who are the only people who can actually find the source of bugs and fix them. Everyone else is a hanger-on and camp follower.
To put it mildly, I think that a lot of people in the open source world would strongly disagree with the view that their open source work is only or primarily for developers.
(I can't call this view a toxic one, although I want to. It's certainly toxic for wide use of open source, but if your view is 'open source is for developers' then you've already decided that you don't care about a wide use of open source.)
Porting to Python 3 by updating to modern Python 2
For quixotic reasons I decided to take a shot at porting DWiki to Python 3 just to see how difficult and annoying it would be and how far I could get. One of the surprising things about the process has been that a great deal of porting to Python 3 has been less about porting the code and more about modernizing it to current Python 2 standards.
DWiki is what is now a pretty old codebase (as you might guess) and even when it was new it wasn't written with the latest Python idioms for various reasons, including that I started with Python back in the Python 1.5 era. As a result it contained a number of long obsolete idioms that are very much not supported in Python 3 and had to be changed. Once the dust settled it turned out that modernizing these idioms was most (although not all) of what was needed to make DWiki at least start up under Python 3.
At this point you might be wondering just what ancient idioms I was still using. I'm glad you asked. DWiki was doing all of these:
- 'raise EXCEPTION, STR' instead of 'raise E(STR)'. I have no real excuse here; I'm sure this was considered obsolete even when I started writing DWiki.
- 'except CLS, VAR:' instead of 'except CLS as VAR:', which I think is at least less ancient than my 'raise' usage.
- using comparison functions in sorting instead of key functions and 'reverse=True'. Switching made things clearer.
- dividing two integers with '/' and expecting the result to be an integer. In Python 3 this is an exact float instead, which caused an interesting bug when I used the result as an (integer) counter. Using '//' explicitly is better and is needed in Python 3.
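For illustration, here is roughly what the modern forms of these idioms look like. This is a made-up example rather than actual DWiki code (the class, functions, and data are all invented), but each piece works the same way in current Python 2 and in Python 3.

```python
class WikiError(Exception):
    """Hypothetical application exception used for the example."""


def load(name):
    if not name:
        # Old form: raise WikiError, "empty page name"
        raise WikiError("empty page name")
    return name


def try_load(name):
    try:
        return load(name)
    except WikiError as e:  # old form: except WikiError, e:
        return "error: %s" % e


# Old form: pages.sort(lambda a, b: cmp(b[1], a[1]))
pages = [("a", 3), ("b", 10), ("c", 1)]
by_size = sorted(pages, key=lambda p: p[1], reverse=True)


def page_count(nitems, per_page):
    # '//' always gives an integer result; '/' is a float in Python 3.
    return (nitems + per_page - 1) // per_page
```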
I consider this modernization of the Python 2 codebase to be a good thing. Even if I never do anything with a Python 3 version of DWiki, updating to the current best practice idioms is an improvement of the code (especially since it's public and I'd like it to not be too embarrassing). I'm glad that trying out a Python 3 port has pushed me into doing this; it really has been overdue.
(Another gotcha that Python 3 exposed is that in at least one place I was assuming that 'None > 0' was a valid comparison to make and was False. This works in Python 2 but it's not exactly a good idea and fixing the code to explicitly check for None is a good cleanup. Since this sort of stuff can only really be checked dynamically there may be other spots that do this.)
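A tiny sketch of the explicit check (the function name is invented for the example): in Python 3 the bare comparison raises TypeError, so the None case has to be handled before comparing.

```python
def has_entries(count):
    """True only for a real, positive count; None means 'no count yet'."""
    # Python 2 quietly evaluated 'None > 0' as False; Python 3 raises
    # TypeError, so check for None explicitly first.
    return count is not None and count > 0
```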
(Probably) Why Bash imports functions from the environment
In the wake of the Shellshock issues, a lot of people started asking why Bash even had a feature to import functions from the environment. The obvious answer is to allow subshell Bashes to inherit functions from parent shells. Now, you can come up with some clever uses for this feature (eg to pass very complex options down from parents to children), but as it happens I have my own views about why this feature probably originally came to exist.
Let us rewind to a time very long ago, like 1989 (when this feature was introduced). In 1989, Unix computers were slow. Very slow. They were slow to read files, especially if you might be reading your files over the network from a congested NFS server, and they were slow to parse and process files once they were loaded. This was the era in which shells were importing more and more commands as builtins, because not having to load and execute programs for things like test could significantly speed up your life. A similar logic drove the use of shell functions instead of shell scripts; shell functions were already resident and didn't require the overhead of starting a new shell and so on and so forth.
So there you are, with your environment all set up in Bash and you want to start an interactive subshell (from inside your editor, as a screen window, starting a new xterm, or any number of other ways). Bash supports a per-shell startup file in .bashrc, so you could define all your shell functions in it and be done. But if you did this, your new subshell would have to open and read and parse and process your .bashrc. Slowly. In fact every new subshell would have to do this and on a slow system the idea of cutting out almost all of this overhead is very attractive (doing so really will make your new subshell start faster).
Bash already exports and imports plain environment variables, but those aren't all you might define in your .bashrc; you might also define shell functions. If a subshell could be passed shell functions from the environment, you could bypass that expensive read of .bashrc by pre-setting the entire environment in your initial shell and then just having subshells inherit it all. On small, congested 1989 era hardware (and even for years afterwards) you could get a nice speed boost here.
(This speed boost was especially important because Bash was already a fairly big and thus slow shell by 1989 standards.)
By the way, importing shell functions from the environment on startup is such a good idea that it was implemented at least twice; once in Bash and once in Tom Duff's rc shell for Plan 9.
(I don't know for sure which one was first but I suspect it was
The weakness of doing authentication over a side channel
Yesterday I mentioned our method of authenticating NFS client hosts; fundamentally it operates by every so often verifying that the client host knows a secret. Suppose that we had a slightly improved version of this, where the NFS fileserver holds an authenticated TCP connection open with the client and periodically exchanges authenticated and encrypted packets with it; the simple version of this would just be a SSH connection with SSH level keepalives. Is this a reasonably secure system or is it attackable?
(A system without a continuous authentication connection is trivially attackable; get the real client to authenticate once, force it off the network, and replace it with your imposter client.)
Unfortunately it is attackable. Take your attack host with two network interfaces and insert it as a bridge between a valid client and the client's normal network. Now set your host to pass SSH traffic through the bridge to the valid client (and back out) but to intercept and generate its own NFS traffic. We will now authenticate the valid client but take NFS requests from your imposter client, and the authentication channel will stay perfectly live while you're doing this.
As far as I can tell this is a fundamental weakness of doing authentication over a side channel instead of the channel that your main communication is flowing over. If the authentication is not strongly connected with the actual real conversation, an attacker can peel the two apart and pass the authentication to a valid client while handling the real conversation itself. For full security, the authentication should be an intrinsic and inseparable part of the main communication.
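One way to picture authentication that is 'intrinsic' to the main communication is to have every request carry its own proof that the sender knows the secret. The following is only an illustrative sketch; it is not how NFS or our system works, and the message layout, function names, and use of HMAC-SHA256 with a sequence number are all assumptions made for the example.

```python
import hashlib
import hmac
import struct

TAG_LEN = hashlib.sha256().digest_size


def seal(key: bytes, seq: int, request: bytes) -> bytes:
    """Prefix a request with an HMAC over its sequence number and body."""
    msg = struct.pack("!Q", seq) + request
    return hmac.new(key, msg, hashlib.sha256).digest() + msg


def unseal(key: bytes, expected_seq: int, packet: bytes) -> bytes:
    """Verify the tag and sequence number, then return the request body."""
    tag, msg = packet[:TAG_LEN], packet[TAG_LEN:]
    good = hmac.new(key, msg, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, good):
        raise ValueError("bad authentication tag")
    (seq,) = struct.unpack("!Q", msg[:8])
    if seq != expected_seq:
        raise ValueError("unexpected sequence number (possible replay)")
    return msg[8:]
```

With something like this, a bridging attacker that passes the authentication conversation through to the real client still can't generate acceptable requests of its own, because it doesn't have the key that each request must be sealed with.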
(There are hacks you can try if you're stuck with separate channels, like having the client observe signs of the main protocol in action and report them back over the authentication channel. If this doesn't match the server's view of the client's activity, something's up.)
PS: I'm sure this is well known in the security and protocol design community. I'm writing it down for myself, because I want to remember the logic of this after I worked it out in my head.
Hassles with getting our NFS mount authentication working on Linux
Our existing Solaris NFS fileservers have a custom NFS mount authentication method to do relatively strong authentication of a machine's identity before we allow it to establish a NFS mount from us. For various reasons we've started looking at doing NFS service from Linux machines and so we need to implement some version of our NFS mount authentication for them (ideally one that looks exactly the same from the client side).
Our existing Solaris mechanism uses a NSS netgroup module that does the authentication as part of checking netgroup membership. Given that Linux has NSS modules and /etc/nsswitch.conf and so on, I figured that this would be the easiest and most natural way to implement our system on Linux. Unfortunately this ran into the roadblock that you can't write a NSS module that implements netgroups without using internal glibc headers that define an internal glibc structure that your functions must manipulate. In effect you can't write NSS netgroup modules at all.
(Since this is an internal header glibc is free to change the structure at any time and thereby make your existing compiled module silently binary incompatible.)
This leaves me trying to figure out the best alternate approach. Right now I can think of three:
- Modify mountd itself, since we have the source code. The two drawbacks to this are that we have to figure out where (and how) to modify the mountd source and then we have to keep our hack up to date as new versions of mountd are released. This implies we'd be replacing a standard Ubuntu or CentOS package with a local version, which doesn't make me really happy.
- Switch to IPSec for clients that need NFS mount authentication. The obvious drawbacks are that this is likely to be (much) slower and we have to get IPSec working, probably in its more complex form
- Use a firewall based mechanism to block access to the server's NFS and mountd daemons until a client has been authenticated (this might be a good use for ipsets). The drawback of this approach is that it requires the most custom software on the server and probably the client. I'd like some way to intercept mount requests, trigger our actions, and then let the mount request go on, but I don't know if that's even possible (or at least possible without deep netfilter hacks).
I plan to ask the linux-nfs mailing list if they have any clever ideas that I'm missing or any opinions on the best approach to do this, although I suspect that they'll tell me to use IPSec (it's the obvious solution apart from the performance hit).
The whole situation with glibc mis-managing NSS netgroup support irritates me. NSS is theoretically supposed to enable people to add their own modules but glibc has crippled this part of it for no good reason and made our life significantly more painful in the process. I'd report this as a bug to the glibc people except that the idea of reporting any 'features missing or misimplemented' issue to glibc is utterly laughable; not only would my problems not get solved any time soon even in the best case, but historically it has been an excellent way to get abused.
(While there are NSS modules outside of glibc I don't believe that any of them support netgroups, just things like users and groups and passwords. Not that glibc even really documents how to write those sorts of NSS modules either.)