2016-02-18
Two models of dealing with cookies in Firefox with addons
Recently on Twitter, Dan McDonald was looking for a Firefox cookie handling addon. I had some opinions on this but Twitter being Twitter and me being me I wasn't entirely articulate about them at the time. So here is my attempt to do it better.
There are at least two fundamental models for dealing with cookies (in Firefox and probably elsewhere). The first model is to not allow cookies into your browser session at all by default. You default-deny all cookies (even first party ones) and whitelist only selected sites; from some of them you may accept permanent cookies, while for others you may force nominally permanent cookies to become session cookies or short-duration ones instead. This is the model I use in my set of extensions, enforced partly via a filtering proxy and partly via a succession of extensions over the years (first CookieSafe, currently CS Lite Mod, and apparently I'm going to need to switch to CookieShield at some point). I believe this is the dominant model of other addons, such as Cookie Monster.
The problem with this strict and narrow approach to cookie management is that there are a steadily increasing number of websites that absolutely require you to accept their cookies in order to use them (one prominent offender is Google's Blogspot). I personally don't mind this for various and sundry reasons, but I suspect that a lot of people do eventually get tired of living this way. So the second model of dealing with cookies is to accept most of them into your browser in a casual and relaxed way, but then delete them again the moment that you don't need them. All of those places that demand the ability to place cookies into your browser before they'll let you see anything get pacified (and you get to see their content), but you get rid of those extremely temporary cookies the moment you're done looking at the site. This is the model of Self-Destructing Cookies.
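To make the two models a bit more concrete, here is a toy sketch of them as decision functions (entirely my own illustration in Python; real addons obviously live inside the browser, and the site names are made up):

    # Model 1: default-deny with a whitelist of sites and per-site handling.
    WHITELIST = {"example.org": "permanent", "example.net": "session-only"}

    def strict_policy(cookie_domain):
        return WHITELIST.get(cookie_domain, "reject")

    # Model 2: accept cookies freely, then destroy them once you leave the site.
    def self_destruct_policy(cookie_domain, site_still_open):
        return "keep for now" if site_still_open else "delete"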
So the first thing you need to do when looking at Firefox cookie handling and cookie addons is to decide which model sounds more attractive to you. Do you want the low friction model where sites get to temporarily drop cookies while you look at their content, but then those cookies get unceremoniously ejected? Or do you want the strict model where you don't take the risk of any 'bad' cookies managing to get into your browser? To be clear about this, I don't think there's any general right answer; people will have different preferences and tolerances.
(And there are likely other models and intermediate steps between these two. These are just the ones I have direct exposure to.)
(Although I'm currently in the 'strict' camp and have been for a long time, I may someday switch, perhaps just to see what it's like. I admit that the idea of having my browser accept all those cookies makes me nervous, even if they're theoretically only very temporary. But that's an irrational feeling.)
2016-02-03
You aren't entitled to good errors from someone else's web app
This particular small rant starts with some tweets:
@liamosaur: Developers who respond to bad URLs with 302 redirects to a 200 page with error info instead of a proper 404 page should be shot into the sun
@_wirepair: as someone who does research for web app scanners, a million times this.
@thatcks: It sounds like web apps are exercising good security against your scanners & denying them information.
If you are scanning someone else's web application, you have absolutely no grounds to complain when it does things that you don't like. Sure, it would be convenient for you if the web app gave you all the clear, semantically transparent HTTP errors you could wish for that make your life easy, but whatever error messages it emits are almost by definition not for you. The developers of those web apps owe you exactly nothing; if anything, they owe you less than nothing. You get whatever answers they feel like giving you, because you are not their audience. If they go so far as to give you deliberately misleading and malicious HTTP replies, well, that's what you get for poking where you weren't invited.
(Google and Bing and so on may or may not be part of their audience, and if so they may give Google good errors but not you. Or they may confine their good errors to the URLs that Google is supposed to crawl.)
Good HTTP error responses (at least to the level of 404s instead of 302s to 200 pages) may serve the goals of the web app developers and their audience. Or they may not. For a user-facing web app that is not intended to be crawled by automation, 302s to selected 200 pages may be more user friendly (or simply easier) than straight up 404s. As a distant outside observer, you don't know and you have no grounds for claiming otherwise.
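To make the distinction concrete, here is a minimal sketch of the two behaviors as Python WSGI handlers (entirely my own illustration; the handler names and URLs are made up, and only the error path is shown):

    def strict_notfound(environ, start_response):
        # A bad URL gets an actual 404, which automation can interpret directly.
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'No such page.\n']

    def friendly_notfound(environ, start_response):
        # A bad URL gets bounced to a human-oriented error page; that page will
        # answer with '200 OK', so a scanner never sees an error status at all.
        start_response('302 Found', [('Location', '/oops')])
        return [b'']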
(There are all sorts of pragmatic and entirely rational reasons that developers might do things that you disagree with.)
It's probably the case that web app developers are better served over the long term by doing relatively proper HTTP error handling, with real 404s and so on (although I might not worry too much about the exact error codes). However this is merely a default recommendation that's intended to make the life of developers easier. It is not any sort of requirement and developers who deviate from it are not necessarily doing it wrong. They may well be making the correct decision for their environment (including ones to deliberately make your life harder).
(See also Who or what your website is for and more on HTTP errors, which comes at the general issue from another angle.)
PS: If you are scanning your own organization's web apps, with authorization, it may be worth a conversation with the developers about making the life of security people a little easier. But that's a different issue entirely; then 'our security people' are within the scope of who the web app is for.
2016-01-24
Hostile HTTPS interception on the modern web is now increasingly costly and risky
One of the things that HTTPS is vulnerable to is a state level actor that has enough money and is willing (and able) to compromise a CA and get certificates issued for sites it wants to run a MITM attack against. This is nothing new; it is the core security problem of SSL on the web, namely that any of the hundreds of CAs that are trusted by your browser can generate a certificate for anyone.
Technically, that is still the case. All of the trusted CAs in the
world can still issue certificates for, say, gmail.com, and
quite a lot of browsers will trust those certificates. But not
Chrome. If you are an attacker and you try this against a Chrome
user, Chrome will try hard to scream bloody murder about this back
to Google (and refuse to go ahead). Then pretty soon Google's
security people will get to write another blog post and your nice
compromised CA will be lost to you (one way or another).
Chrome has been doing this for a while (this is part of how Google has gotten to write a number of blog posts about this sort of thing), but it is not alone. On the modern web, there are a steadily increasing number of things that are more or less automatically looking for and reporting bogus certificates and an increasing number of ways to block many of them from being useful to attack your site. On the one hand, many of the better things are not included in web browsers by default; on the other hand, many of the people that a state level actor is likely to be most interested in MITM'ing are exactly the sort of people who may install things like HTTPS Everywhere and enable its reporting features.
Based on what I've read from the security circles that I follow, the net effect of all of these changes is that mounting anything but an extremely carefully targeted MITM attack is almost certain to cost you the compromised CA you were able to exploit. Each compromised CA you have is good for exactly one attack, if that.
(See for example the Twitter conversation linked to here.)
This doesn't make HTTPS interception impossible, of course. CAs can still be compromised. But it means that no one is going to do this for anything except very high priority needs, which in practice makes us all safer by reducing how often it happens.
An important contributing factor to the increased chanciness of HTTPS interception is that browsers are increasingly willing to say no. There was a time when you could MITM a significant number of people with a plain old bogus certificate (no CA compromise required, just generate it on the fly in your MITM box). Those days are mostly over, especially for some popular sites, and increasingly even a real certificate from a compromised CA may not work due to things like HPKP.
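As an illustration of the general pinning idea (this is my own sketch, not how HPKP itself works; HPKP pins a hash of the public key and is enforced by the browser), a client that remembers what certificate a site presented before can notice when it suddenly changes:

    import hashlib
    import socket
    import ssl

    # Fingerprint recorded out of band on a connection you trusted (placeholder).
    EXPECTED_SHA256 = "..."

    def cert_fingerprint(host, port=443):
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port)) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                der_cert = tls.getpeercert(binary_form=True)
        return hashlib.sha256(der_cert).hexdigest()

    def looks_intercepted(host):
        return cert_fingerprint(host) != EXPECTED_SHA256

A browser with something like this built in, plus a way to report mismatches, is what turns a quiet interception into a public CA compromise.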
2016-01-22
Browsers are increasingly willing to say no to users over HTTPS issues
One of the quiet sea changes that underpins a significant increase in the security of the modern HTTPS web is that browsers are increasingly willing to say no to users. I happen to think that this is a big change, but it's one that didn't really strike me until recently.
There was a time when the fundamental imperative of browsers was that if the user insisted enough, they could go ahead with operations that the browser was pretty sure were a bad idea; attempts to change this back in the day were met by strong pushback. The inevitable result of those decisions was that attackers who wanted to MITM people's HTTPS connections to places like Facebook could often just present a self-signed certificate generated by their MITM interceptor system and have most people accept it. When attackers couldn't do that, they could often force downgrades to unencrypted HTTP (or just stop upgrades from an initial HTTP connection to an HTTPS one); again, these mostly got accepted. People wrote impassioned security advice that boiled down to 'please don't do that' and tweaked and overhauled security warning UIs, but all of it was ultimately somewhat futile because most users just didn't care. They wanted their Facebook, thanks, and they didn't really care (or even read) beyond that.
(There are any number of rational reasons for this, including the often very high rate of false positives in security alerts.)
Over the past few years that has changed. Yes, most of the changes are opt-in on the part of websites, using things like HSTS and HPKP, but the really big sea change is that browsers mostly do not let users override the website settings. Instead, browsers are now willing to hard-fail connections because of HSTS or HPKP settings even if this angers users because they can't get to Facebook or wherever. Yes, browsers have a defense in that the site told them to do this, but in the past I'm not sure this would have cut enough ice to be accepted by browser developers.
(In the process browsers are now willing to let sites commit HSTS or HPKP suicide, with very little chance to recover from eg key loss or inability to offer HTTPS for a while for some reason.)
Obviously related to this is the increasing willingness of browsers to refuse SSL ciphers and so on that are now considered too weak, again pretty much without user overrides. Given that browsers used to accept pretty much any old SSL crap in the name of backwards compatibility, this is itself a welcome change.
(Despite my past views, I think that browsers are making the right overall choice here even if it's probably going to cause me heartburn sooner or later. I previously threw other people under the HTTPS bus in the name of the greater good, so it's only fair that I get thrown under it too sooner or later, and it behooves me to take it with good grace.)
2015-12-21
The Apache mod_qos module worked for us
We run a shared web server where our users can run CGIs and so on in response to incoming HTTP requests. This presents several obvious potential problems, and recently we ran into one of them. A user had a quite slow CGI (although not a CPU-consuming one) and, as sometimes happens when your users are computer scientists sharing their hot research results, it got linked to from a popular place and as a result the requests for it just poured in. In not very much time at all, the slow CGIs were using up all of Apache's request slots and nothing else could get a request in edgewise.
On the one hand, I didn't want to just turn off the user's CGI entirely. It's actively great that lots of people are interested in people's research results and we'd be serving our users very badly if we shut that down whenever it happened. On the other hand, it's a shared web server with other important things hosted on it, so I needed to keep the web server functioning in general. What I needed was something to limit the number of concurrent requests for this particular CGI. Fortunately there is a (third-party) Apache module that can do this, mod_qos.
Mod_qos has a whole lot of configuration settings, most of which I didn't try to play with. What I used (and what worked) is the simple QS_LocRequestLimitMatch directive:
QS_LocRequestLimitMatch "^/(~|%7E|%7e)USER/.*$" NNN
Rereading the documentation now suggests that I could have used the simpler QS_LocRequestLimit directive, but at the time it wasn't clear to me if this was the right choice. I used the three-part match for '~' because the mod_qos documentation specifically says that the directive applies to the unparsed URL and I wasn't sure quite how unparsed it meant.
(At the time I was in a mood to be basically sure with one change, because it was happening on a Saturday.)
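For the record, my guess (untested) is that the simpler version would have looked something like this, with the same placeholder user and limit as above:

    QS_LocRequestLimit /~USER/ NNN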
Given that this situation may come up in the future, it would be sort of nice if we could set up generic per-URL resource limits or something. The module has the QS_LocRequestLimitDefault directive, but I don't know if it sets a global limit or a per-URL one. I'd have to experiment with this.
(In general, mod_qos seems like the kind of thing I should experiment with. It's potentially useful but fairly complex, and the documentation is clearly written for people who are already somewhat familiar with various terms of art and so on.)
2015-12-07
What I like web templating languages for
In reaction to my entry on the problems with creating new (web) template languages, some of the people on Hacker News felt that the time of web templating languages was over due to various improvements in full programming languages. As it happens, I disagree. While there are situations where I would rather create HTML in code instead of through a template system, there are also plenty of situations where I would much rather use a template system.
In my opinion, templates work well in two interrelated situations. The first is when there is (much) more HTML than there is templating syntax to generate your dynamic content. One case where this happens is when you're wrapping a form or some displayed (variable) information in a bunch of explanatory text. If you did this in the application's code, what you'd wind up with is mostly some large strings with periodic interruptions for the actual code; in my opinion, this is neither attractive nor very easy to follow (especially if the actual text of the page is broken up into multiple sections).
(I'd expect this to be even worse in a language that forced you to assemble the HTML through structured elements; you'd have a lot of code that existed just to make a whole stream of elements with the static content and string them together, interspersed with a few bits that generated the dynamic stuff.)
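As a concrete sketch of the 'mostly HTML, a little dynamic content' case, here's what it might look like with Python's standard library string.Template standing in for a real templating language (the page text and field names are made up, and note that string.Template does no HTML escaping):

    from string import Template

    # Mostly static explanatory HTML with two small dynamic holes in it.
    page = Template("""\
    <h1>Request an account</h1>
    <p>Fill in this form and we'll get back to you. Accounts are normally
    only available to people affiliated with the department.</p>
    <form action="$action" method="post">
      <p>Login you'd like: <input name="login" value="$login"></p>
      <input type="submit" value="Request account">
    </form>
    """)

    html_page = page.substitute(action="/cgi-bin/newacct", login="")

Writing the same page as code that assembles HTML element by element would bury those couple of sentences of explanatory text in a pile of string concatenation or element construction calls.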
The second situation is where presenting the whole text and code of a page in one place makes it clearer what is going on and how the page is formed. Here, a single template is serving as the equivalent of a single giant function, except of course you probably wouldn't write a single giant function. Good template languages help with this by creating compact but clear ways of describing their dynamic portions; generally these are much smaller than you could write in actual code without a lot of complex helper functions. While single templates can get tangled and complicated, this is in a sense a good thing because it's an honest expression of how complicated you've made generating the page. Just as big, tangled functions are a code smell that suggests something needs to be refactored, big tangled template pages are probably a suggestion that they should be split into several template variants or otherwise restructured to be clearer.
Ultimately, templating languages versus programming languages are another incarnation of the gulf between shells and scripting languages. Programming languages are optimized for writing code that happens to have some embedded text, while templating languages are optimized for writing text that happens to have some embedded code. The more you have of one relative to the other, the more you will generally tilt one way.
(It's possible to make a templating system without embedded code, where your templates simply define places that code will later manipulate to add your custom content. I used to quite like this idea in the abstract, but I've come around to feeling that it's not what I want in practice.)
2015-11-26
Some notes on Apache's suexec
We've recently been wrestling with suexec in an attempt to get it to do something that it seemed suexec would do. As a result of that learning experience, I feel like writing down some things about suexec. You may also wish to read the official Apache documentation on suexec, but note that you may have to pay close attention to some of the things that it says (and a few things appear to be outright wrong).
Suexec has two modes:
- Running /~<user>/... CGIs as the particular user involved. This needs no special extra configuration for suexec and simply just happens. Per-user CGIs must be located under a specific subdirectory in the user's Unix home directory, by default public_html; suexec documentation calls this subdirectory name the userdir.
- Running CGIs for a virtual host as a particular user and group. This must be configured with the SuexecUserGroup directive. All virtual host CGIs must be located under a specific top level directory, by default often /var/www; suexec documentation calls this directory the docroot.
(Suexec also does various ownership and permissions checks on the CGIs and the directory they are directly in. Those are beyond the scope of these notes.)
The first important thing here is that the suexec docroot and
userdir are not taken from the Apache DocumentRoot and UserDir
settings; instead, they're hard coded into suexec itself. Any time
that suexec logs errors like 'command not in docroot', the docroot
it means is not the Apache DocumentRoot you've configured. It
pretty much follows that if your Apache settings do not match the
hardcoded suexec settings, suexec will thumb its nose at you.
(Also, the only form of UserDir directive that will ever work
with suexec is 'UserDir somename'. You cannot use either 'UserDir
/some/dir' or 'UserDir /some/*/subdir' with suexec. The suexec
documentation notes this.)
The second important thing is that Apache and suexec explicitly
distinguish between the two modes based on the incoming request
itself, not the final paths involved, and these two modes are
exclusive. If you make a request for a CGI via a /~user/... URL,
the only thing that matters is if the eventual path is under the
user's home directory plus the suexec userdir. If you make a
request to a virtual host with a SuexecUserGroup directive, the
only thing that matters is if the eventual path is under the suexec
docroot. In particular, you cannot configure a virtual host for
a user, point its DocumentRoot to that user's userdir, and have
suexec run CGIs. This path would be perfectly acceptable if the
CGIs were invoked via /~user/... URLs, but when invoked for a plain
virtual host, suexec will reject these requests because the paths
aren't under its docroot.
(Mechanically, Apache prefixes the user name it passes to the suexec
binary with a ~ if it is a UserDir request. This is undocumented
behavior reverse engineered from the code, so you shouldn't count
on it.)
The third important thing is that suexec ignores symlinks in all
of this checking; it uses only the 'real' physical paths, after
symlinks have been traversed. As a result you cannot fool suexec
by, for example, putting symlinks to elsewhere under what it considers
its docroot. However it is fine for user /etc/passwd entries
to include symlinks (as we do); suexec will not
be upset by that.
Normally the suexec docroot and userdir are set when suexec
is compiled and are fixed afterwards, which obviously creates some
problems if you need something different. Debian and Ubuntu provide
a second version of suexec that can look these up at runtime from
a configuration file (this is the apache2-suexec-custom package).
Failing this, well, you'll be arranging (somehow) for all of your
virtual hosts to appear under /var/www (or at least all of the
ones that need CGIs).
(You can determine the userdir and docroot settings for your
suexec with 'suexec -V' as root. You want AP_DOC_ROOT and
AP_USERDIR_SUFFIX.)
Sidebar: what 'command not in docroot' really means
The suexec error 'command not in docroot' is actually generic and is used for both modes of requests. So what suexec means by 'docroot' here is either the actual docroot, for a virtual host request, or the user's home directory plus the userdir subdirectory, for a /~user/... request. Unfortunately you cannot tell from suexec's log messages whether it was invoked for what it thought was a user home directory request or for a virtual host request; that has to be obtained from the Apache logs.
The check is done by a simple brute force method: first, chdir()
to the CGI's directory and do a getcwd(). Then chdir() to either
the docroot or the user's home directory plus the userdir and
do another getcwd(). Compare the two directory paths and fail if
the first is not underneath the second. Because it uses getcwd(),
all symlinks involved in either path will wind up getting fully
expanded.
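Expressed as a rough Python sketch (my rendering of the method described above, not suexec's actual C code), the check looks something like this:

    import os

    def cgi_dir_is_under(cgi_dir, root_dir):
        # chdir() plus getcwd() fully resolves symlinks in both paths, which
        # is why symlinks can't be used to sneak a CGI out from under the
        # docroot (or the user's home directory plus the userdir).
        os.chdir(cgi_dir)
        real_cgi_dir = os.getcwd()
        os.chdir(root_dir)
        real_root = os.getcwd()
        # fail unless the CGI's real directory is the root itself or below it
        return (real_cgi_dir == real_root
                or real_cgi_dir.startswith(real_root + os.sep))

For a /~user/... request, root_dir would be the user's home directory plus the userdir; for a virtual host request it would be the compiled-in docroot.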
2015-11-16
The problems with creating a new template language
One reaction to my entry saying you shouldn't create new templating languages is to ask why this is so. My original entry was written from the perspective of someone who's actually done this so I just assumed that all of the problems with creating your own were obvious, but this is not necessarily the case. So let's run down the problems here.
When you create a new web templating system, you face a number of problems:
- You need to design the templating language. Language design is hard and my own experience strongly suggests that you don't want a (too) minimal design. Many of the design decisions you make here will constrain your further steps and what can be done with the templating system, often in ways that are not necessarily obvious until later.
(A too complicated templating language has its own drawbacks as well, but there are tradeoffs and decisions that depend on the environment that the templating system will be used in. For example, are the people writing the templates also going to be coding the web app, so that you can move complexity from the templating language to the app itself, or are different groups doing each side of things?)
- You need to design the API for using the templating system; how you specify and load templates, how you expand them, how you provide data used during expansion, and so on. As part of this you will face issues of what happens in various sorts of errors. If your template language has loops or other potentially unbound constructs, you're going to need to decide how you limit them (or if you just let template coding errors cause template expansion to run forever).
One issue that you will want to consider is how expansion strings will or won't be automatically escaped by the templating system (there's a small illustration of this after the list). Not doing escaping at all has proven to be extremely dangerous, but at the same time inserting unescaped text is sometimes necessary. HTML has several different contexts that need different sorts of things escaped, then there's URLs, and you may also want your templating system to be useful for more than HTML.
- You need to actually write all of the code. I hope that by now you see that this is much more sophisticated than just printf expansion of strings; we're talking about a full scale parser and interpreter of your language, which probably has conditions, loops, and so on. In the process of this (if you haven't already done so earlier), you're going to wind up dealing with character set conversion issues.
- Once you've written the basic template handler there are a bunch of efficiency issues that come into the picture. A good template system does not reparse everything from scratch on every request, which means both pre-parsing things and figuring out how to detect when you need to reload changed template files off disk. Then there's various sorts of caching for template expansion, and perhaps you want some way to generate ETag and Last-Modified information without running a full template expansion (and then often throwing away the result). Can you write out partial template results in chunks to hold down memory requirements, or do you have to fully generate even huge pages in memory before you can start output?
(And there's the efficiency issue of simply making the code run fast. Profiling and performance tuning code takes work all by itself.)
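As a small illustration of the escaping issue mentioned above (my example, using only Python standard library helpers): the same user-supplied value needs different treatment depending on where in the page it winds up, and an auto-escaping template system has to know or be told which context it's in.

    import html
    import urllib.parse

    value = 'a "quoted" <b>value</b> & more'

    as_html_text = html.escape(value)              # HTML element content / attributes
    as_url_query = urllib.parse.quote_plus(value)  # a URL query string parameter
    # ...and JavaScript embedded in the page needs yet another scheme entirely.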
You can write simpler templating systems that skip some or many of these considerations. Some of them are relatively unimportant at small scale (DWiki gets by without any sort of template preparsing) but others may cause you serious security problems if you neglect them. On top of that, there are any number of issues that have proven to be inobvious to people who are writing their first templating system. No matter what scale of templating system you're writing, you can expect to run into problems that you don't even initially recognize or realize that you have.
(I'm not even convinced I know how to design and write a good templating system, and I have the advantage of having done it once already.)
Using an existing templating system instead of writing your own has the great advantage that other people have already worried about and faced all of these issues. If you pick a good templating system, other people should have already invested all of the time and work to come up with good solutions (and they will continue to put effort into things like bug fixes and performance improvements). In fact they may well have solved problems you don't yet realize even exist.
All of that work saving is nice. But there's a deeper reason not to roll your own here:
You are probably not going to do as good a job as existing template systems do.
Writing a good templating system is hard work that takes a lot of specialized knowledge and skill. Unless you put a quite large amount of time into it, your new templating system is very likely not to be as nice as existing templating systems. It will be incomplete and inefficient and limited and possibly buggy and problematic. This should not be surprising, since major templating systems have had a great deal of work put into them by a bunch of smart people. It would be really amazing if you could duplicate all that on your own in a relatively small amount of time.
(Of course you may have the advantage of writing a more focused and narrower templating system than those major templating systems, which tend to be quite general. My personal opinion is that you're probably not going to be making one that's narrow enough to make up all that ground.)
2015-11-11
No new web templating languages; use an existing one
Suppose, hypothetically, that you are creating a web application. Let's even suppose that it's a very small and simple one, almost an embarrassingly small one. As part of this app, you need a very little bit of something like a templating system. Not much, just a bit more than printing formatted strings. Clearly you have such a trivial situation that you can just bang together a tiny and simple mini-templating language, right?
Let me save you some time and effort: no. Don't do it. The reality is that we've reached a point in time where writing your own (web) templating language or system is basically guaranteed to be a mistake. I know, you have a trivial application and you don't want to take an external dependency, you hardly need anything, all of the existing templating systems are wrong or too heavyweight, there's a whole list of excuses. Don't accept them. Suck it up, take an external dependency, and use an existing templating system even if it's vast overkill for your problem. Your future self will thank you in a few years.
(I could almost go further than this and maybe I should, but that's another entry.)
All of this especially applies if you have an application that needs more than a trivial templating system; I picked an extreme case because it's where the temptation can be strongest. Writing your own non-tiny templating system today is an especially masochistic exercise because even a basic one is a bunch of work and raises a moderate ton of questions that you're ill-equipped to answer (or even recognize) unless this is not your first templating system.
In hindsight, writing my own 'simple' templating system was one of the mistakes I made when I wrote DWiki (the code that powers Wandering Thoughts). It's been a very educational mistake, but unless you really want to do things the hard way for the experience I can't recommend it.
(Note that rolling your own is not a great learning experience unless you live with the result for a number of years, so that you have plenty of time to run into the lurking problems. Almost anything can look good if you write it, use it briefly, and then abandon it. Many of my painful lessons took years to smack me in the face.)
PS: This assumes that you aren't working in a new language where no one's written a decent templating system. If you are, I think that you should at least steal one of the battle-tested designs from good templating systems in other languages.
(Also, yes, a very few people have very special needs and have to write their own systems. They know who they are.)
2015-10-08
Why you (probably) want to have blog categories (and topics and more)
It started with an Eevee tweet:
like does anyone actually care about blog categories, considering you can just skim the list of titles and get a pretty good idea
It turns out that I have opinions on this, perhaps unsurprisingly. My answer is 'yes', or at least potentially yes. In fact I think there are several different levels of things that you want to have in a fully featured blog system.
The pragmatic purpose of categories is to allow people to easily follow only a subset of your blog, both through syndication feeds and perhaps other mechanisms. The more disparate the subjects you cover in your blog, the more likely people are to want a way to follow a subset of it; conversely, if you already have a narrow focus there may be no point in trying to subdivide it. Here on Wandering Thoughts I definitely have feed readers fetching category feeds for several of my categories. Of course you can choose not to provide this sort of thing in the interest of broadening the minds of your readers, but this may well cost you readers.
(Some form of restricted scope feeds are also handy if you think you may someday want to feed selected entries to, say, a specific planet-style aggregator. The importance of this depends on how many of your entries would be irrelevant to a particular aggregator; for example, I'd definitely want a category for Go if I ever wanted to be part of a hypothetical 'Planet Go'.)
Relatively broad topics are there so that readers can easily find more of your writing on specific areas of interest to them. This is the domain of 'that entry on ZFS was useful, has she written more about it?', where a reader is expanding outwards from some initial entry they've landed on. My view is that topics are less predictable and more retrospective than categories; they sort of emerge organically from your writing as you find you've written a bunch about a specific area.
(You might as well provide syndication feeds for topics too if you can support it easily, but I think it's less important than with categories. Among other things, your writing on any particular topic may be extremely sporadic or even stop.)
Finally, I definitely feel that real blog usability means that you want some mechanism that will let you create a collection of direct and visible links for things like '(strongly) related entries', 'next/previous entry in this series of entries', and the like. I don't think tags (as conventionally implemented) are the right answer for this because they're not clear enough (I discussed this in my thoughts on tags for blog entries). These sort of strong relationships between entries are your best hooks for getting new visitors to read more, so you want them and their relevance to be clearly visible in ways that you wouldn't do for categories and topics.
(For instance, for 'next/previous in series' I think you should not just mention that here's where a reader finds more but also show the entry titles.)
I've wound up feeling that to the extent that you have tags as such, they're an implementation detail that may be used to create any or all of the above. This means that your tags may be namespaced into some structured namespaces, and tags in these namespaces may be presented separately from each other (and perhaps in different ways). The sorts of namespaces you want for tags may vary between different blogs, depending on what the blog is used to write about.
(For instance, consider a blog that's used to write about TV series. You're going to want a way to clearly tie all entries that cover a specific series together and you probably want it to be clear to readers and distinct from how they find all your writing about, say, one director or writer. So you probably want not just a big generic 'Tags: ...' thing that smashes all of these separate namespaces together but instead specific 'Series: ...' and so on, and then maybe a 'Tags: ...' as well.)