2015-04-08
Your entire download infrastructure needs to use HTTPS
Let's start with something that I tweeted:
Today's security sadface: joyent's Illumos pkgsrc download page is not available over https, so all those checksums/etc could be MITMd.
Perhaps it is not obvious what's wrong here. Well, let's work backwards.
The Joyent pkgsrc bootstrap tar archive is served over plain HTTP, so a
man in the middle attacker can serve us a compromised tarball when we
use curl to fetch it. That part is obvious, and to guard against it the page gives us a SHA1
checksum and a PGP key to verify the tarball. But the page itself is
served over plain HTTP, so the man in the middle attacker could
alter it too so it has the SHA1 checksum of their compromised tarball.
So surely the PGP verification will save us? No, once again we are
undone by HTTP; both the PGP key ID and the detached PGP ASCII signature
are served over HTTP, so our MITM attacker can alter the page to have a
different PGP key ID and then serve us a detached PGP ASCII signature
made with it for their compromised tarball.
(Even if retrieving the PGP key itself from the keyserver is secure, the attacker can easily insert their own key with a sufficiently good looking email address and so on. Or maybe even a fully duplicated email address and other details.)
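To make the verification step concrete, here's a minimal sketch in Python (the checksum value and filename are made up for illustration). The important thing to notice is that this code can only ever be as trustworthy as EXPECTED_SHA1, and if that value was copied from a page served over plain HTTP, a MITM attacker may have already rewritten it to match their compromised tarball:

    # A minimal sketch of checksum verification; the checksum and
    # filename here are made up. If EXPECTED_SHA1 was copied from a
    # page served over plain HTTP, an attacker may have rewritten it
    # to match their compromised tarball and this check will pass.
    import hashlib

    EXPECTED_SHA1 = "0123456789abcdef0123456789abcdef01234567"

    def sha1_of_file(path):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    if sha1_of_file("bootstrap.tar.gz") != EXPECTED_SHA1:
        raise SystemExit("checksum mismatch: do not use this tarball")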
There's a very simple rule that everyone should follow here: every step of a download process needs to be served over HTTPS. For instance, even without PGP keys et al in the picture it isn't sufficient to serve just the tarball over HTTPS, because a MITM attacker can rewrite the plaintext 'download things here' page to tell you to download the package over HTTP and then they have you. The entire chain needs to be secure (and forced that way) and from as far upstream in the process as you can manage (eg from the introductory pkgsrc landing page on down, because otherwise the attacker changes the landing page to point to a HTTP download page that they supply and so on).
Of course, having some HTTPS is better than none; it at least makes attackers work harder if they have to not just give you a different tarball than you asked for but also alter a web page in flight (but don't fool yourself that this is much more work, not with modern tools). And it's good to not rely purely on HTTPS by itself; SHA1 checksums and PGP signatures are at least cross-verification and can detect certain sorts of problems.
By the way, in case you think that this is purely theoretical, see the case of some Tor exit nodes silently patching nasty stuff into binaries fetched through them with HTTP. And I believe that there are freely available tools that will do on the fly alterations to web pages they detect you fetching over insecure wireless networks.
(I don't feel I'm unfairly picking on Joyent here because clearly they care not just about the integrity of the tarball but also its security, since they give not just a SHA1 (which might just be for an integrity check) but also a PGP key ID and a signature checking procedure.)
2015-04-06
What adblockers block
The thing about adblockers is that they don't really block ads; determining what is and isn't an ad is an AI problem and we're nowhere near solving those. So what adblockers really block is signs and patterns that designate or suggest ads. The primary patterns that adblockers can use are URLs of resources being requested (such as images and other additional content) and the surrounding HTML context of these requests (including things like CSS tags).
(Many adblockers will allow you to inspect the patterns that they use in their preferences or advanced configuration system.)
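To illustrate the URL side of this (a toy sketch, not any real adblocker's rule engine; real blockers use much richer rule syntaxes such as EasyList's, plus element-hiding rules keyed on that HTML/CSS context):

    # A toy sketch of URL-pattern blocking, not any real adblocker's
    # engine. The patterns are hypothetical but in the spirit of real
    # blocklists: match well-known ad hostnames and path conventions.
    import re

    BLOCK_PATTERNS = [
        re.compile(r"://ads?\."),            # eg ads.example.com
        re.compile(r"/adserver/"),           # ad-serving path conventions
        re.compile(r"\.doubleclick\.net/"),  # a well-known ad network
    ]

    def should_block(url):
        return any(p.search(url) for p in BLOCK_PATTERNS)

    print(should_block("http://ads.example.com/banner.gif"))  # True
    print(should_block("http://fishing.example/lures.jpg"))   # False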
No set of heuristics and patterns can possibly be complete. So in practice adblockers only block major sources of ads, because these are the sources of ads where the work of writing rules really pays off in a reduction of ads. In other words, adblockers mostly block decent-sized ad networks and pervasive places with ads like Facebook, Google, and YouTube.
Unless someone really goes out of their way to write rules, adblockers
do not and will never block hand-crafted ads on small sites; there
are no non-AI heuristics that can reliably figure out which bits
of Jane's Fishing Information are ads and which bits aren't. Similarly,
adblockers mostly don't block the various small scale ad networks
that are active in niche areas like online webcomics, ultimately
because they haven't annoyed anyone enough to write (and update)
the blocking rules necessary.
The direct corollary of this is that even pervasive use of adblockers cannot kill advertising on the web. The only thing they can kill is mindless, computer-targeted advertising at mass scale. Such large scale advertising is attractive to a lot of people for a lot of reasons, but it is not the only advertising model for sustaining modest websites through ads.
(Adblockers can kill advertising on large sites because even if the large sites do entirely custom advertising systems, they are large enough that people will find it worthwhile to write the rules necessary.)
PS: There's an entirely separate discussion about whether adblockers can work in the long term if advertising people get determined enough. Ultimately the system of HTML, CSS, and JavaScript that displays ads on web pages is Turing-complete and so can be obfuscated in a nearly endless number of ways if website developers want to put up with the resulting complexity.
PPS: As it happens, large scale advertising networks are already often not an attractive model for modest websites with dedicated audiences because of various fundamental drawbacks in the model (like lack of control over what ads show on your website).
2015-04-05
A note on the argument about the 'morality' of adblockers
While adblockers make some people quite happy, there are others who consider them immoral; see for example this tweet. Let's set aside the security issues and other counter-arguments to note something important: much as in another case, it's extremely disingenuous to discuss morality here without mentioning the blatant amorality of advertising on the web itself. To put it simply, the ad industry and its supporters are coming to the table with extremely unclean hands.
By and large, the story of web advertising and ad companies and networks is a story of organizations aggressively and unapologetically tracking and intruding on people for years. At every turn web advertisers have done their best to obtain more information on more people, to mine it for as much creepy insight as they can, to make as much money from it as possible, and to never ask people for permission or even inform them. At every turn, the ad industry's view has been that if they could get away with something it was all good, especially if it was legal. Morality has never entered the picture.
The ad industry has spent years cultivating a 'fuck you' attitude where they would do everything that was within their technical capabilities to spy on people and shovel ads on top of them. To now suddenly be concerned about the 'morality' of what other people do is the height of hypocrisy. The ad industry has lived by the sword of 'technical capabilities are all that matters' (to the detriment of basically everyone else on the Internet), so it's only fair that they may now die on that sword, like it or not. Adblockers are possible, so by the ad industry's own conduct they're allowed.
(Since the ad industry has no morality it of course doesn't care about its own hypocrisy here; it will bleat whatever bleatings stand some chance of keeping its exploitative business model from collapsing. But bystanders should be listening to these bleatings with a full understanding.)
2015-03-28
All browsers need a (good) way to flush memorized HTTP redirects
As far as I know, basically all browsers cache HTTP redirects by default, especially permanent ones. If you send your redirects without cache-control headers (and why would you bother setting any on a 'permanent' redirect?), browsers may well cache them for a very long time. In at least Firefox, these memorized redirects are extremely persistent and seem basically impossible to get rid of in any easy way (having Firefox clear your local (disk) cache certainly doesn't do it).
This is a bad mistake. The theory is that a permanent redirect is, well, permanent. The reality is that websites periodically send permanent redirects that are not in fact permanent (cf) and they disappear after a while. Except, of course, that when your browser more or less permanently memorizes this temporary permanent redirect, it's nigh-permanent for you. So browsers should have a way to flush such a HTTP redirect just as they have shift-reload to force a real cache-bypassing page refresh.
(A normal shift-reload won't do it because HTTP redirections are attached to following links, not to pages (okay, technically they're attached to looking up URLs). You could make it so that a shift-reload on a page flushes any memorized HTTP redirections for any link on the page, but that would be both kind of weird and not sufficient in a world where JavaScript can materialize its own links as it feels like.)
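On the server side, the partial defense is to give your 'permanent' redirects an explicit lifetime so browsers eventually re-check them. A minimal sketch, using Python's standard http.server purely for illustration (the target URL is made up):

    # Send a 301 with a bounded lifetime; the Cache-Control header
    # tells browsers to re-check the redirect after a day instead of
    # memorizing it more or less forever. The target URL is made up.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(301)
            self.send_header("Location",
                             "https://www.example.org" + self.path)
            self.send_header("Cache-Control", "max-age=86400")  # one day
            self.end_headers()

    HTTPServer(("", 8080), RedirectHandler).serve_forever()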
Sidebar: Some notes about this and Firefox
I've read some suggestions that Firefox will do this if you either tell Firefox to remove the site's browsing history entirely or delete your entire cache directory by hand. Neither is what I consider an adequate solution; one has drastic side effects and the other requires quite obscure by-hand action. I want something within the browser that is no more effort and impact than 'Preferences / Advanced / Network / Clear cached web content'. It actually irritates me that telling Firefox to clear cached content does not also discard memorized HTTP redirections, but it clearly doesn't.
If you have some degree of control over the target website, you can force Firefox to drop the memorized HTTP redirection by redirecting back to the right version. This is generally only going to be useful in some situations, eg if you have the same site available under multiple names.
2015-03-18
The real speed advantage static rendering has over dynamic rendering
This entry made the Hacker News front page for much of Sunday, spending a bunch of time at #1 (clearly Sundays are slow news days) and giving me another opportunity to beat the drum about my views that dynamic sites don't need to be slow. So today, I'll take the other side and point out the real speed advantage that static rendering has over dynamic rendering.
Put simply, the advantage is that all static rendering is fast while only carefully tuned dynamic rendering is fast. Sure, you can make dynamic rendering go fast on ordinary hardware, and this blog is an existence proof; even with a crazy lashup it runs pretty fast. But you have to work to get fast dynamic rendering, and there are a huge number of ways to get slow dynamic rendering and plenty of software that behaves this way. Slow dynamic rendering is kind of the default state with many setups, software frameworks, system designs, and so on.
By contrast, you have to work really hard to make static rendering go slowly; speed and low impact are its natural state. Pretty much any current webserver with a vaguely rational configuration is going to be really fast at serving static files (especially relatively small ones). I'm not actually sure how you'd even make a modern web server serve static files slowly.
So the default for static rendering is fast and the practical default for dynamic rendering is slow. And this is the real speed advantage; if you pick a random static rendering system to use, you're basically guaranteed to go fast. If you pick a random dynamic rendering system, you're probably going to be slow by default and perhaps fast with some amount of careful investigation and tuning.
(This is also true if you write your own system. A static system is again intrinsically fast on the web server, while you can look forward to paying attention to your dynamic system to make sure you haven't accidentally made it slow.)
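To make that 'work' concrete: much of the tuning in practice boils down to caching rendered output so the expensive generation step runs rarely. A minimal sketch (render_page here is a made-up stand-in for your actual template rendering and database queries):

    # A sketch of the most common dynamic-rendering optimization:
    # cache rendered pages so full generation only happens on a cache
    # miss or after the cached copy goes stale.
    import time

    CACHE = {}       # url -> (rendered_html, timestamp)
    CACHE_TTL = 60   # seconds of acceptable staleness

    def render_page(url):
        # Made-up stand-in for templates, database queries, etc.
        return "<html>...expensively generated for %s...</html>" % url

    def serve(url):
        cached = CACHE.get(url)
        if cached and time.time() - cached[1] < CACHE_TTL:
            return cached[0]             # fast path: no real work
        html = render_page(url)          # slow path: full generation
        CACHE[url] = (html, time.time())
        return html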
Although I continue to feel that dynamic sites have significant advantages (including a more liberal URL design) and are not particularly hard to make fast enough for most people, I have to admit that this is a not insignificant advantage to static sites in practice. Like everyone else, I've heard any number of people be unhappy about how their off the shelf dynamic site is not doing too well under load; say what you like about static site generators, but these people wouldn't be having that problem if they'd used one instead.
PS: Based on a very brief log analysis, the volume from this Hacker News appearance was about the same as the first time around so I don't intend to do a writeup on it. I also suspect that my link experiences are not necessarily a good guide to the amount of traffic that HN directs to its (really) popular links. On the other hand, maybe not.
2015-02-16
Web ads considered as a security exposure
One of the things that reading Twitter has exposed me to is a number of people who deploy browser adblockers as part of their security precautions. This isn't because they're the kind of people who are strongly opposed to ads, and it's not even because they don't want their users (and themselves) to be tagged and tracked around the web (although that is a potential concern in places). It's because they see web ads themselves as a security risk, or more specifically as a point of infection.
The problem with web ads is web ad networks. It's a fact that every so often web ad networks have been compromised by attackers and used to serve up 'ads' that are actually exploits. This doesn't just affect secondary or sketchy websites; major mainstream websites use ad networks, which means that visiting sites normally considered quite trustworthy and secure (like major media organizations) can expose you to this.
(As an extra risk, almost all ad networks use HTTP instead of HTTPS so you're vulnerable to man in the middle attacks on exposed networks like your usual random coffee shop wifi.)
Based on my understanding of modern sophisticated ad networks and the process of targeting ads, they also offer great opportunities for highly targeted attacks. At least some networks offer realtime bidding on individual ad impressions and as part of this they pass significant amounts of information about the person behind the request to the bidders. Want to target your malware against people in a narrow geographical area with certain demographics? You can do that, either by winning bids or by hijacking the same information processes from within a compromised ad network. You might even be able to do very specific 'watering hole' style attacks against people who operate from a restricted IP address range, such as a company's outgoing firewall.
(The great thing about winning bids is that you may not even be playing with your own money. After all, it's probably not too difficult to compromise one of the companies that's bidding to put its ads in front of people.)
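To illustrate the targeting information involved (with made-up field names; real-time bidding protocols such as OpenRTB carry this general sort of data, but this is not any actual network's API):

    # A hypothetical illustration of per-impression metadata handed to
    # bidders; the field names are made up. An attacker who can win
    # bids, or who has compromised the exchange, can match on exactly
    # these attributes to aim their malware narrowly.
    bid_request = {
        "ip": "192.0.2.7",              # often identifies an organization
        "geo": {"country": "CA", "city": "Toronto"},
        "device": {"os": "Windows", "browser": "IE 9"},
        "site": {"domain": "major-news-site.example"},
    }

    def want_to_attack(req):
        # eg a watering hole attack on one company's outgoing IP range
        return req["ip"].startswith("192.0.2.")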
If you're thinking about the risks here, web ad blockers make a lot of sense. They don't even have to be deeply comprehensive; just blocking the big popular web ad networks that are used by major sites probably takes out a lot of the exposure for most people.
I don't think about ad blockers this way myself, partly because I already consider myself low risk (I'm a Linux user with JavaScript and Flash blocked by default), but this is certainly something I'm going to think about for people at work. Maybe we should join the places that make this a standard recommendation or configuration.
2015-02-15
My current views on Firefox adblocker addons
I normally do my web browsing through a filtering proxy that strips out many ads and other bad stuff, and on top of that I use NoScript so basically all JavaScript based things drop out. However, this proxy only handles HTTP, so I've known for a while that as the web moved more and more to HTTPS my current anti-ad solution would become less and less effective. This led me to play around with various options in my testing browser but never pushed me to put anything in my main browser. What pushed me over the edge relatively recently was reaching my tolerance limit for YouTube ads and discovering that AdBlock Plus would reliably block them. Adding ABP made YouTube a drastically nicer experience for me; I consider its additional ad-blocking features to basically be a nice side effect.
(The popup ads are only slightly irritating, but then YT started feeding me more and more long, unskippable ads. At that point it was either stop watching YT videos or do something about it.)
What makes a bunch of people twitchy about AdBlock Plus is that it's run by a company plus their business model of allowing some ads through. Although ABP is open source, this means that its development is subject to changes in business model and we've seen that cause problems before. Eventually various things made me uncomfortable and unhappy enough to switch to AdBlock Edge (also), which is a fork of ABP with a bunch of things removed. In my 'basically use the defaults' setup, AdBlock Edge works the same as AdBlock Plus. It certainly removes the YouTube ads, which is what I really care about right now.
(My honest opinion is that AdBlock Plus is probably not going to go bad, partly because a fair number of people are paying attention to it since it's a quite popular Firefox extension. Still, I feel a bit better with AdBlock Edge, perhaps because I've been burned by changing extension business models before.)
Neither AdBlock Plus nor AdBlock Edge appears to have made my Firefox particularly slow or particularly memory-hungry. It's possible that I simply haven't noticed the impact because it's mild enough not to be visible for me, especially given my already filtered non-JavaScript browser environment. People certainly do report that these extensions cause them problems.
Recently µBlock has been making the rounds in the information sources that I follow, so I gave it a try. Sadly, the results for me weren't positive: µBlock did nothing to stop YouTube ads. Since blocking those is the most important thing for me, I'm willing to forgive ABP and ABE a certain amount of resource consumption in order to get it. I do like the general µBlock pitch of being leaner and more efficient, so someday I hope it picks up this ability.
(As far as I know there's nothing else that blocks YouTube ads. I'd obviously be happy with a standalone extension for this plus µBlock for general blocking, but as far as I know no such thing exists.)
PS: I use other technology to block the scourge of YouTube autoplay. It's possible that this pile of hacks is interacting badly with µBlock.
2015-01-24
Web applications and generating alerts due to HTTP requests
One of the consequences and corollaries of never trusting anything you get from the network is that you should think long and hard before you make your web application generate alerts based on anything in incoming HTTP requests. Because outside people can put nearly anything into HTTP requests and because the Internet is very big, it's very likely that sooner or later some joker will submit really crazy HTTP requests with all sorts of bad or malicious content. If you're alerting on this, well, you can wind up with a large pile of alerts (or with an annoying trickle of alerts that numbs you to them and to potential problems).
Since the Internet is very big and much of it doesn't give much of a damn about your complaints, 'alerts' about bad traffic from random bits of the Internet are unlikely to be actionable. You can't get the traffic stopped at its source (although you can waste a lot of time trying), and if your web application is competently coded it shouldn't be vulnerable to these malicious requests anyway. So all such an alert reports is that someone rattled the doorknobs (or tried to kick the door in); well, that happens all the time (ask any sysadmin with an exposed SSH port). It's still potentially useful to feed this information to a trend monitoring system, but 'HTTP request contains bad stuff' should not be an actual alert that goes to humans.
(However, if your web application is only exposed inside what is supposed to be a secured and controlled environment, bad traffic may well be an alert-worthy thing because it's something that's really never supposed to happen.)
A corollary to this is that web frameworks should not default to treating 'HTTP request contains bad stuff' as any sort of serious error that generates an alert. Serious errors are things like 'cannot connect to database' or 'I crashed'; 'HTTP request contains bad stuff' is merely a piece of information. Sadly there are frameworks that get this wrong. And yes, under normal circumstances a framework's defaults should be set for operation on the Internet, not in a captive internal network, because this is the safest and most conservative assumption (for a definition of 'safest' that is 'does not deluge people with pointless alerts').
(This implies that web frameworks should have a notion of different types or priorities of 'errors' and should differentiate what sort of things get what priorities. They should also document this stuff.)
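As a sketch of what that separation can look like, here's Python's standard logging module set up so that bad-request noise goes to a file for trend analysis while only genuinely serious errors reach humans (the file path and mail addresses are hypothetical):

    # Separate 'HTTP request contains bad stuff' from real errors:
    # everything goes to a file you can trend-analyze later, but only
    # ERROR and above gets mailed to humans. Paths and addresses are
    # hypothetical.
    import logging
    import logging.handlers

    log = logging.getLogger("webapp")
    log.setLevel(logging.INFO)
    log.addHandler(logging.FileHandler("/var/log/webapp/events.log"))

    mailer = logging.handlers.SMTPHandler(
        "localhost", "webapp@example.org", ["oncall@example.org"],
        "webapp serious error")
    mailer.setLevel(logging.ERROR)
    log.addHandler(mailer)

    log.info("bad stuff in HTTP request: %r", "<script>...")  # no alert
    log.error("cannot connect to database")                   # alerts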
2015-01-10
Autoplaying anything is a terrible decision, doubly so for video
Youtube's autoplay behavior makes me so angry. No no no augh wrong. What a way to rudely demand attention.
The single thing I hate the most about Youtube is that its videos start playing the moment you open one. On the one hand, I can kind of see why Youtube does this; I'm sure they have plenty of user experience studies that tell them that without autoplay, people dislike having to do an extra step to get what they came to Youtube for, and that a certain number of people don't realize what they need to click to get things working and wind up giving up. On the other hand, it is a terrible decision in many situations and they should have a preference for it (using a long term cookie).
There are two problems with autoplay, especially of videos. The first problem is the general problem that autoplay assumes that browsing to a page means that you want to deal with the page right now. In a world where some number of people make heavy use of browser tabs and Youtube videos are often extremely non-urgent things, this is wrong; it's not unusual to open something in a tab and then ignore it for some time until you get around to it. Autoplay puts a stop to that by giving you no choice; you have to deal with the page right away, if only to shut it up.
(And of course autoplay stomps all over anything else you may be playing at the time you open the new tab in, theoretically, the background.)
The second problem is that autoplay of videos makes going to a Youtube page (or any such page) into what is essentially a globally synchronous operation for you. Since the video will start playing the moment it loads, you'll miss the start of the video if you're not paying attention to the page at the time. Want or need to look away briefly to something else while the page loads over a potentially congested link? Better be prepared to switch your attention back on a moment's notice or you'll be restarting the video so you can see the beginning. This is okay if all you have is a full-screen browser window that's going to a Youtube page (unless you look away from the computer entirely), but there are plenty of people in the world who are still using their multi-tasking computers to multi-task.
(Given Youtube ads, the real effect may be that you miss the start of, and perhaps all of, an inserted ad. This is actually probably worse from Youtube's perspective, as eventually it will cost them revenue.)
PS: from my grumpiness about this, you might correctly conclude that the pile of hacks I use to stop autoplay has stopped working recently. In this case Youtube appears to have done something that broke Flashblock. It's a reported issue but who knows when this will be fixed, given prior issues with getting Flashblock updated, although there turns out to be a workaround.
2014-12-31
I love Apache (well, like it at least)
There is somewhat of a meme around the sysadmin and web world that Apache is a bad web server that no one should use, or at least not use for very long. And if you're using Apache or thinking about configuring something with Apache, maybe you should think about how to migrate away or pick a setup that will make that easy. I once sort of felt this way myself and experimented with (and still use) alternatives like lighttpd, but I no longer feel that way; these days I think that Apache is your best default choice for a web server.
I will put it like this: choosing Apache is a lot like choosing to write code with a high-level 'scripting' language like Python or Ruby instead of C++, Java, or Go. The result is not always as fast as it could be but often it will be pretty good (sometimes excellent), much of what most people are doing doesn't need the flat out speed, and the whole thing is easy to set up, quite convenient, and does a lot for you, including things you may not initially realize that you want or need.
Apache is not likely to be the highest performing or least resource using web server you can configure (although you might be surprised). However in real life most of the web sites most people set up do not need to support massive loads (and when they do you may find that the bottlenecks are not in the web server). And what Apache gives you is almost unmatched support for easily set up and deployed web app systems. CGI scripts are very simple; PHP files can simply be dropped into place; Python applications need only mod_wsgi; even Ruby on Rails apparently has an Apache module. If you want to use FastCGI and app daemons and other higher-capacity things, well, Apache supports that too. Not infrequently it will give your simple applications practical performance boosts, for example by compressing their HTTP responses when possible.
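To give a sense of how little glue is involved, the entirety of a Python application that mod_wsgi can serve is a standard WSGI callable (per PEP 3333); Apache handles all of the HTTP work around it:

    # A complete, minimal WSGI application per PEP 3333. mod_wsgi (or
    # any WSGI server) calls this once per request; Apache does the
    # rest of the HTTP handling around it.
    def application(environ, start_response):
        body = b"Hello from behind Apache\n"
        start_response("200 OK",
                       [("Content-Type", "text/plain"),
                        ("Content-Length", str(len(body)))])
        return [body]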
(Through my personal interest in TLS security, I've also wound up feeling that Apache is in practice at the forefront of letting you set up good TLS configurations. Other web servers can have spotty support for various things, but Apache is almost always there. These days I recommend that people who care about TLS look very carefully at anything other than Apache.)
Apache can have what I've called the Apache configuration tax, where for simple setups (like static file service) other web servers can have simpler setups with less configuration work to do. On the other hand I've found that sensible Apache configurations create very simple Apache setups. And any complex server design is probably going to be complex to express in a configuration, regardless of what web server you're using.
All of this leaves me feeling three things about Apache. First, Apache is a perfectly sensible choice as your web server, one that shouldn't need defending any more than a choice of a more sexy and talked about web server like nginx (or lighttpd, which no longer seems to be the latest hotness). Second, Apache probably should be your default choice of web server. Unless you already have special expertise and tooling for other web servers or unless you can clearly see why Apache is a bad fit for your target environment, just going with Apache will probably make your life easier and work just as well. And finally, that Apache deserves more love in general. It's really time for Apache to (re)take its place as a perfectly respectable web server, not the embarrassing relative we don't talk about.
(I'll admit that this is kind of a rant because I've decided that I don't want to feel defensive any time I talk about using Apache. But my views have really shifted on this over time, as I used to share the common attitude that Apache was outdated or at least a catastrophic performer (not really, actually, to my surprise in that situation). My personal site still runs lighttpd, but I'm not convinced that that's the right decision; it persists partly out of inertia and partly because Fedora's Apache configuration setup is terrible.)