2008-07-31
SSL's identity problem
One of the many problems of SSL, especially on the web, is that it gets its idea of identity wrong (in practice, in actual implementations). SSL's version of identity is all tied to abstruse X.509 things, all CN and O and OU and so on, but on the real Internet, users don't think of identity that way; they think of identity as websites, or more exactly of some personal label that they have for the entity that stands behind a website.
This matters because a great deal of SSL's claimed security is based on users understanding the identity of what they're talking to. If they don't, SSL cannot possibly create trust or protect against even moderately clever impersonation attacks, and indeed that is what we see all the time.
(For example, consider the various IDNA homograph attacks. If users actually used the SSL/X.509 idea of identity they would not be fooled in the least, but they don't; they look at the domain name, and the IDNA homograph attacks created lookalike domain names through Unicode tricks.)
Fortunately this is mostly a matter of how applications present SSL identity information, so it can be mostly solved in applications. Roughly speaking, we want to be able to tell users 'this website is the same people as www.microsoft.com' in some useful way. Given IDNA homograph attacks we can't tell the user literally this, but we can get around that by letting the user create tags for organizations and showing those tags as the 'this is part of' identity. (This is more useful for the user anyways.)
(Mechanically we can use the SSL certificate identity information to compare a new certificate's X.509 organizational information to a known certificate for the organization captured when the user made the tag.)
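As a rough illustration of that comparison, here is a minimal Python sketch. It assumes certificates arrive in the dictionary form returned by the standard library's `ssl.SSLSocket.getpeercert()`, where `subject` is a tuple of RDNs. The `TagStore` class, the `identity_of` helper, and the choice of which subject fields count as 'the organization' are all hypothetical, made up for this example. A real browser would also have to decide which issuers it trusts to vouch for those fields.

```python
# Hypothetical choice of subject attributes that make up an organization's identity.
IDENTITY_FIELDS = ("organizationName", "localityName",
                   "stateOrProvinceName", "countryName")


def identity_of(cert):
    """Return the organizational identity of a getpeercert()-style dict.

    Returns None when the certificate carries no organizationName at all
    (as with domain-validated certificates), since there is nothing to match on.
    """
    found = {}
    for rdn in cert.get("subject", ()):
        for name, value in rdn:
            if name in IDENTITY_FIELDS:
                found.setdefault(name, []).append(value)
    if "organizationName" not in found:
        return None
    return tuple((name, tuple(found.get(name, ()))) for name in IDENTITY_FIELDS)


class TagStore:
    """Maps the user's own labels to identities captured when they made the tag."""

    def __init__(self):
        self._tags = {}

    def tag(self, label, cert):
        identity = identity_of(cert)
        if identity is None:
            raise ValueError("certificate has no organization to tag")
        self._tags[label] = identity

    def label_for(self, cert):
        """Return the user's label for this certificate's organization, or None."""
        identity = identity_of(cert)
        if identity is None:
            return None
        for label, known in self._tags.items():
            if known == identity:
                return label
        return None


def make_cert(org, common_name):
    """Build a toy certificate dict for the demonstration (not a real certificate)."""
    subject = [(("countryName", "US"),)]
    if org is not None:
        subject.append((("organizationName", org),))
    subject.append((("commonName", common_name),))
    return {"subject": tuple(subject)}


store = TagStore()
store.tag("Microsoft", make_cert("Microsoft Corporation", "www.microsoft.com"))

same_org = store.label_for(make_cert("Microsoft Corporation", "update.microsoft.com"))
lookalike = store.label_for(make_cert("Micr0soft Corporation", "www.microsoft.com"))
no_org = store.label_for(make_cert(None, "www.example.com"))
```

The point of the design is that the label comes only from the user's own store. A lookalike certificate can spell its organization however it likes and still never picks up the user's tag.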
Note that you cannot let organizations assert tags for themselves in certificates, however convenient it would be, because then you are back to both IDNA homograph attacks and the 'we issued a certificate to Microsoft, but it wasn't the right Microsoft' problem.
2008-07-18
The advantage of blog comments
The advantage of having blog comments is that they are the easiest way for people to, well, make a comment on an entry. The main alternative is email, but blog comments are significantly easier than email in this modern age of spam and other problems. Not only is it difficult to find the blog author's email address (because if it were easy to find, the spammers would find it too), but it's also difficult to know if you can trust the author with your own email address.
(While writing an entry of your own can be easy, my feeling is that it's not an effective way of commenting because there's no good automatic way of bringing it to the attention of the original author. In theory trackback would solve this problem, but in practice it has drowned in spam.)
When you make things more difficult, fewer people care to go through the effort, especially first-time commenters, and as a result you'll get fewer comments. Mind you, sometimes this is a desirable state of affairs; there are drawbacks to comments, especially a lot of comments, and that's ignoring the entire issue of comment spam.
(Note that someone who 'comments' regularly has fewer problems; they will know your email address or some other way of getting your attention (or not care), they will have already decided to trust you with their email address, and so on.)
It is my guess that you will not necessarily get a better class of comments by making commenting harder; you may even get a worse one overall. The problem is that you're not selecting for people who have something good to say, you're selecting for people who care enough, including people who have a pet cause that they will only be too happy to tell you about. (On the other hand, it is generally easier to filter such people out of email and other ways of attracting your attention than it is to filter out their blog comments, and this effect may not kick in until you are fairly well known.)
(Obligatory attribution darnit: this entry was inspired by this, which started me thinking about the general issue.)
2008-07-02
Why reverse proxies are good for big web applications
One of the things that you can do to make a complex web application perform better is to put a reverse proxy in front of it. On the surface this seems counterintuitive, since you're adding an extra layer of software that does nothing but pass the traffic back and forth.
What's going on is simple: buffering. Namely, it's better to buffer replies in the reverse proxy than to 'buffer' them in your application itself.
Your web application uses a certain amount of memory in the process of handling a request and sending out the reply. Most of the time this is more memory than the size of the generated page, and most of it stays tied down until your application has sent the last of the reply and can tear down all the data structures associated with the request. Thus you are much better off handing the reply page to the reverse proxy, where it will use only a tiny bit more memory than the size of the page itself, and having your application immediately tear down and throw away those resources.
This is compounded by the effects of talking to slow clients, which hold their connections open for a relatively long time as they very slowly accept data from you, and thus tie down their per-connection resources on your end for just as long. It's much better for you if those resources are a few small data structures in a reverse proxy rather than the full memory that your application required to answer their request.
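To put rough numbers on this, here is a back-of-the-envelope Python sketch that measures the cost of a request in 'byte-seconds', the memory held multiplied by how long it is held. Every figure in it is a made-up assumption for illustration (a 50 MB per-request footprint, a 100 KB page, a client reading at 10 KB/s, a fast local link to the proxy, 16 KB of proxy bookkeeping), not a measurement of any real application or proxy.

```python
def direct_cost(app_mem, page_size, client_rate):
    """Byte-seconds held when the application feeds the slow client itself.

    The application's full per-request memory stays tied down for as long
    as the client takes to accept the whole page.
    """
    return app_mem * (page_size / client_rate)


def proxied_cost(app_mem, page_size, client_rate, proxy_rate, proxy_overhead):
    """Byte-seconds held when a reverse proxy buffers the reply.

    The application only holds its memory while handing the page to the
    proxy; the proxy then holds the page plus a little bookkeeping for as
    long as the client takes to drain it.
    """
    app_part = app_mem * (page_size / proxy_rate)
    proxy_part = (page_size + proxy_overhead) * (page_size / client_rate)
    return app_part + proxy_part


# All numbers below are hypothetical.
APP_MEM = 50_000_000        # bytes the application ties down per request
PAGE_SIZE = 100_000         # bytes in the generated page
CLIENT_RATE = 10_000        # bytes/second accepted by a slow client
PROXY_RATE = 100_000_000    # bytes/second from the application to the proxy
PROXY_OVERHEAD = 16_000     # bytes of per-connection state in the proxy

direct = direct_cost(APP_MEM, PAGE_SIZE, CLIENT_RATE)
proxied = proxied_cost(APP_MEM, PAGE_SIZE, CLIENT_RATE, PROXY_RATE, PROXY_OVERHEAD)

print(f"direct:  {direct:,.0f} byte-seconds")
print(f"proxied: {proxied:,.0f} byte-seconds")
print(f"ratio:   {direct / proxied:,.0f}x")
```

The exact ratio depends entirely on the assumed numbers, but the shape of the result does not: the slower the client and the fatter the application, the more the proxy saves.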
You can to some extent work around this by making sure that your web application uses as little memory as possible, but you're unlikely to get as lean as a good reverse proxy; modern web environments just have too much intrinsic overhead. (I'm not talking about frameworks as such; consider the base memory usage of another thread or process in your favorite programming language. I suspect that Java comes off best here. The base language overhead also limits how much you can gain from imitating the reverse proxy approach in your application by tearing down as much as possible once you're ready to start sending out your reply.)
(This entry was inspired by reading this, which is on a related issue.)