Wandering Thoughts archives

2014-12-15

How a Firefox update just damaged practical security

Recently, Mozilla pushed out Firefox 34 as one of their regular periodic Firefox updates. Unfortunately this shipped with a known-incompatible change that broke several extensions, including the popular Flashblock extension. Mozilla had known about this problem for months before the release; in fact the bug report was filed essentially immediately after the change in question landed in the tree, and the breakage was known when the change was proposed. Mozilla people didn't care enough to do anything in particular about this beyond (I think) blacklisting the extension as non-functional in Firefox 34.

I'm sure that this made sense internally in Mozilla and was justified at the time. But in practice this was a terrible decision, one that's undoubtedly damaged pragmatic Firefox security for some time to come. Given that addons create a new browser, the practical effect of this decision is that Firefox's automatic update to Firefox 34 broke people's browsers. When your automatic update breaks people's browsers, congratulations, you have just trained them to turn your updates off. And turning automatic updates off has very serious security impacts.

The real world effect of Mozilla's decision is that Mozilla has now trained some number of users that if they let Mozilla update Firefox, things break. Since users hate having things break, they're going to stop allowing those updates to happen, which will leave them exposed to real Firefox security vulnerabilities that future updates would fix (and we can be confident that there will be such updates). Mozilla did this damage not for a security critical change but for a long term cleanup that they decided was nice to have.

(Note that Mozilla could have taken a number of methods to fix the popular extensions that were known to be broken by this change, since the actual change required to extensions is extremely minimal.)

I don't blame Mozilla for making the initial change; trying to make this change was sensible. I do blame Mozilla's release process for allowing this release to happen knowing that it broke popular extensions and doing nothing significant about it, because Mozilla's release process certainly should care about the security impact of Mozilla's decisions.

FirefoxUpdateSecurityFail written at 22:14:18; Add Comment

2014-12-06

Browser addons can effectively create a new browser

In many ways, a browser is defined by its user interface. These days all browsers display web pages and run JavaScript; what really differentiates them is the experience of using them. The logical consequence of this is that there are any number of browser addons that change either the user interface itself or just the experience of using the browser to such a degree that you can wind up with what might as well be a different browser. Your browser is still based on Chrome or Firefox, but it is not Chrome or Firefox as such.

(Certainly this is the case for me with my extensions. Adding gestures to Firefox significantly changes the UI I experience, while NoScript and other screening tools give me a much different and more pleasant view of the web.)

The direct consequence of this is that in many cases, people's core addons are not optional. If your addons stop working, what you wind up with is effectively a different browser; its UI is different, its behavior is different. This means that from a user's perspective, breaking addons can be the same as breaking the browser. Regardless of the technical details of what happened, you wind up with a browser that doesn't work right, one that no longer behaves the way it used to.

(A corollary is that once your browser is broken, you may well have no particular reason to stay with the underlying base it was built on. Humans being humans, you are probably much more likely to feel angry that your browser has been broken and switch to a browser from some other people.)

This is of course not the case for all addons or all people. Some addons have too small an effect, and not everyone will do much with addons that can have major effects on the UI or the browsing experience. But even small addons may have big effects for some people; if you use an addon that tweaks a site that you use all the time in a way that strongly affects your use of the site, you've kind of got a different browser. Certainly losing the addon would significantly change your experience even though that one site is only a small part of the web. I'm sure there are people using extensions related to the big time-consuming social websites who fall into this category.

(If you install a gestures extension but rarely use gestures, or install NoScript but whitelist almost everything you run into, you're not really changing your browsing experience much from the baseline browser.)

AddonsCreateNewBrowser written at 00:03:34; Add Comment

2014-12-02

The unreasonable effectiveness of web crawlers

I have a few test copies of Wandering Thoughts and all of CSpace sitting around here and there; I use them for things like trying out new CSS and other layout stuff, playing with code changes, testing the full weight of CSpace in different web environments, and so on. As it happens, one of those copies sometimes exists on my personal domain. I don't link to these copies from anywhere, of course, as they're test things and I access them from direct URLs. So you can imagine my surprise when one day I discovered that Googlebot and several other crawlers were rummaging through that copy on my personal domain. Of course I became very curious about how they could possibly have found it.

The answer turned out to be that lurking in the DWiki install on my personal domain was a single stray file copied over from CSpace that had a link to '/~cks/'. This link had been there for years, but it had equally led to nothing for all of those years, until I brought up the test install and left it there. Crawlers had been trying the link all that time and getting 404s on it, but within a few days of the link starting to work, Googlebot tried the URL again, found a page there, and started crawling through the links it found (another crawler also showed up). And it was crawling quite enthusiastically at that, not going all that slowly.

(Fortunately I noticed almost immediately and turned the whole thing off again. This was mostly luck; I was watching the logs because I'd been doing some experimentation, so I actually noticed the explosion in traffic volume. Normally I don't look at the logs there for long periods of time.)

What this has shown me rather vividly is that web crawlers are unreasonably effective. If there's a link to something lurking somewhere, no matter how obscure, they're likely to find it, follow it, and crawl everything behind it. Of course I already knew this in theory, since there have been all sorts of stories over the years of search engines indexing things that no one expected them to stumble over (or to stumble over that fast), but it's one thing to read the stories and another thing to have it happen to you.

(The next time around I'll try to remember to put access restrictions up for whatever I'm testing. And to do it before I bring up the test setup.)

CrawlerFindingPower written at 00:18:18; Add Comment

2014-11-22

The effects of a moderate Hacker News link to here

A few days ago my entry on Intel screwing up their DC S3500 SSDs was posted to Hacker News here and rose moderately high in the rankings, although I don't think it made the front page (I saw it on the second page at one point). Fulfilling an old promise, here's a report of what the resulting traffic volume looked like.

First, some crude numbers from this Monday onwards for HTTP requests for Wandering Thoughts, excluding Atom feed requests. As a simple measurement of how many new people visited, I've counted unique IPs fetching my CSS file. So the numbers:

Day            That entry   Other pages   CSS fetches
November 17             0          5041           453
November 18         18255          6178         13585
November 19         17112         10141         11940
November 20           908          6341           876
November 21           228          4811           530

(Some amount of my regular traffic is robots and some of it is from regular visitors who already have my CSS file cached and don't re-fetch it.)
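
As a side note, the 'CSS fetches' column is nothing more than a count of distinct client IPs that requested my CSS file. Here's a rough sketch of that counting in Python; the log format handling is simplistic and the CSS path below is invented purely for illustration:

    #!/usr/bin/env python
    # Count unique client IPs that fetched the stylesheet from an
    # Apache common/combined format access log. The CSS path here is
    # purely illustrative; substitute your own.
    import sys

    def unique_css_fetchers(logfile, css_path="/~cks/dwiki/dwiki.css"):
        ips = set()
        with open(logfile) as f:
            for line in f:
                fields = line.split()
                # In common/combined log format the client IP is field 0
                # and the requested URL is field 6 (inside the "GET ..." part).
                if len(fields) > 6 and fields[6] == css_path:
                    ips.add(fields[0])
        return len(ips)

    if __name__ == "__main__":
        print(unique_css_fetchers(sys.argv[1]))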

Right away I can say that it looks like people spilled over from the directly linked entry to other parts of Wandering Thoughts. The logs suggest that this mostly went to the blog's main page and my entry on our OmniOS fileservers, which was linked to in the entry (much less traffic went to my entry explaining why 4K disks can be a problem for ZFS). Traffic for the immediately preceding and following entries also went up, pretty much as I'd expect, but this is nowhere near all of the extra traffic so people clearly did explore around Wandering Thoughts to some extent.

Per-day request breakdowns are less interesting for load than per minute or even per second breakdowns. At peak activity, I was seeing six to nine requests for the entry per second and I hit 150 requests for it a minute (for only one minute). The activity peak came very shortly after I started getting any significant volume of hits; things start heating up around 18:24 on the 18th, go over 100 views a minute at 18:40, peak at 19:03, and then by 20:00 or so I'm back down to 50 a minute. Unfortunately I don't have latency figures for DWiki so I don't know for sure how well it responded while under this load.

(Total page views on the blog go higher than this but track the same activity curve. CSpace as a whole was over 100 requests a minute by 18:39 and peaked at 167 requests at 19:05.)

The most surprising thing to me is the amount of extra traffic to things other than that entry on the 19th. Before this happened I would have predicted (and did predict) a much more concentrated load profile, with almost all of the traffic going to the directly linked entry. This was certainly the initial pattern on the 18th, but then something clearly changed.

(I was surprised by the total amount of traffic and by how many people seem to have visited, but that's on a purely personal level; it's surprising to me that so many people are interested in looking at something I've written.)

This set of stats may well still leave people with questions. If so, let me know and I'll see if I can answer them. Right now I've stared at enough Apache logs for one day and I've run out of things to say, so I'm stopping this entry here.

Sidebar: HTTP Referers

HTTP Referers for that entry over the 18th to the 20th are kind of interesting. There were 17,508 requests with an empty Referer, 13,908 from the HTTPS Hacker News front page, 592 from a google.co.uk redirector of some sort, 314 from the t.co link in this HN repeater tweet, and then we're down to a longer tail (including reddit's /r/sysadmin, where it was also posted). The Referers feature a bunch of alternate interfaces and apps for Hacker News and so on (pipes.yahoo.com was surprisingly popular). Note that there were basically no Referers from any Hacker News page except the front page, even though as far as I know the story never actually made it to the front page. I don't have an explanation for this.

HackernewsEffectSize written at 00:46:51; Add Comment

2014-11-17

Why I need a browser that's willing to accept bad TLS certificates

One of my peculiarities is that I absolutely need a browser that's willing to accept 'bad' TLS certificates, probably for all species of bad that you can imagine: mismatched host names, expired certificates, self-signed or signed by an unknown certificate authority, or some combination of these. There are not so much two reasons for this as two levels of the explanation.

The direct reason is easy to state: lights out management processors. Any decent one supports HTTPS (and you really want to use it), but we absolutely cannot give them real TLS certificates because they all live on internal domain names and we're not going to change that. Even if we could get proper TLS certificates for them somehow, the cost is prohibitive since we have a fair number of LOMs.

(Our ability to get free certificates has gone away for complicated reasons.)

But in theory there's a workaround for that. We could create our own certificate authority, add it as a trust root, and then issue our own properly signed LOM certificates (all our LOMs accept us giving them new certificates). This would reduce the problem to doing an initial certificate load in some hacked-up environment that accepted the LOMs' out-of-the-box bad certificates (or using another interface for it, if and where one exists).

The problem with this is that as far as I know, certificate authorities are too powerful. Our new LOM certificate authority should only be trusted for hosts in a very specific internal domain, but I don't believe there's any way to tell browsers to actually enforce that and refuse to accept TLS certificates it signs for any other domain. That makes it a loaded gun that we would have to guard exceedingly carefully, since it could be used to MITM any of our browsers for any or almost any HTTPS site we visit, even ones that have nothing to do with our LOMs. And I'm not willing to take that sort of a risk or try to run an internal CA that securely (partly because it would be a huge pain in practice).

So that's the indirect reason: certificate authorities are too powerful, so powerful that we can't safely use one for a limited purpose in a browser.

(I admit that we might not go to the bother of making our own CA and certificates even if we could, but at least it would be a realistic possibility and people could frown at us for not doing so.)

AcceptBadCertNeed written at 23:11:39; Add Comment

2014-11-10

Why I don't have a real profile picture anywhere

Recently I decided that I needed a non-default icon aka profile picture for my Twitter account. Although I have pictures of myself, I never considered using one; it's not something that I do. Mostly I don't set profile pictures on websites that ask for them and if I do, it's never actually a picture of me.

Part of this habit is certainly that I don't feel like giving nosy websites that much help (and they're almost all nosy). Sure, there are pictures of me out on the Internet and they can be found through search engines, but they don't actually come helpfully confirmed as me (and in fact one of the top results right now is someone else). Places like Facebook and Twitter and so on are already trying very hard to harvest my information and I don't feel like giving them any more than the very minimum. For a long time that was all that I needed and all of the reason that I had.

These days I have another reason for refusing to provide a real picture, one involving a more abstract principle than just a reflexive habit towards 'none of your business' privacy. Put simply, I don't put up a profile picture because I've become conscious that I could do so safely, without fear of consequences due to people becoming aware of what I look like. Seeing my picture will not make people who interact with me think any less of me and the views I express. It won't lead to dismissals or insults or even threats. It won't expose me to increased risks in real life because people will know what I look like if they want to find me.

All of this sounds very routine, but there are plenty of people on the Internet for whom this is at least not a sure thing (and thus something that they have to consider consciously every time they make this choice) or even very much not true. These people don't have my freedom to casually expose my face and my name if I feel like it, with no greater consideration than a casual dislike of giving out my information. They have much bigger, much more serious worries about the whole thing, worries that I have the privilege of not even thinking about almost all of the time.

By the way, I don't think I'm accomplishing anything in particular by not using a real picture of myself now that I'm conscious of this issue. It's just a privilege that I no longer feel like taking advantage of, for my own quixotic reasons.

(You might reasonably ask 'what about using your real name?'. The honest answer there is that I am terrible with names and that particular ship sailed a very long time ago, back in the days before people were wary about littering their name around every corner of the Internet.)

PS: One obvious catalyst for me becoming more aware of this issue was the Google+ 'real names' policy and the huge controversy over it, with plenty of people giving lots of excellent arguments about why people had excellent reasons not to give out their real names (see eg the Wikipedia entry if you haven't already heard plenty about this).

PPS: Yes, I have plenty of odd habits related to intrusive websites.

WhyNotProfilePictures written at 23:59:07; Add Comment

2014-10-19

Vegeta, a tool for web server stress testing

Standard stress testing tools like siege (or the venerable ab, which you shouldn't use) are all systems that do N concurrent requests at once and see how your website stands up to this. This model is a fine one for putting a consistent load on your website for a stress test, but it's not actually representative of how the real world acts. In the real world you generally don't have, say, 50 clients all trying to repeatedly make and re-make one request to you as fast as they can; instead you'll have 50 new clients (and requests) show up every second.

(I wrote about this difference at length back in this old entry.)

Vegeta is an HTTP load and stress testing tool that I stumbled over at some point. What really attracted my attention is that it uses an 'N requests a second' model instead of the concurrent request model. As a bonus it reports not just average performance but also the outliers, in the form of 90th and 99th percentile figures. It's written in Go, which some of my readers may find annoying but which I rather like.

I gave it a try recently and, well, it works. It does what it says it does, which means that it's now become my default load and stress testing tool; 'N new requests a second' is a more realistic and thus interesting test than 'N concurrent requests' for my software (especially here, for obvious reasons).

(I may still do N concurrent requests tests as well, but it'll probably mostly be to see if there are issues that come up under some degree of consistent load and if I have any obvious concurrency race problems.)

Note that as with any HTTP stress tester, testing with high load levels may require a fast system (or systems) with plenty of CPUs, memory, and good networking if applicable. And as always you should validate that vegeta is actually delivering the degree of load that it should be, although this is actually reasonably easy to verify for a 'N new request per second' tester.

(Barring errors, N new requests a second over an M second test run should result in N*M requests made and thus appearing in your server logs. I suppose the next time I run a test with vegeta I should verify this myself in my test environment. In my usage so far I just took it on trust that vegeta was working right, which in light of my ab experience may be a little bit optimistic.)
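
If I do get around to verifying it, the check doesn't need to be anything fancy. A minimal sketch in Python, assuming an Apache-style access log (common or combined format) that covers just the test run:

    #!/usr/bin/env python
    # Bucket an access log by second and report the total and the busiest
    # seconds, to compare against the request rate you asked the load
    # tester for. The timestamp parsing assumes Apache-style logs.
    import sys
    from collections import Counter

    def requests_per_second(logfile):
        counts = Counter()
        with open(logfile) as f:
            for line in f:
                # Timestamps look like [18/Nov/2014:19:03:27 -0500]; keep
                # the part before the timezone as the per-second bucket key.
                try:
                    stamp = line.split("[", 1)[1].split()[0]
                except IndexError:
                    continue
                counts[stamp] += 1
        return counts

    if __name__ == "__main__":
        counts = requests_per_second(sys.argv[1])
        print("total requests: %d" % sum(counts.values()))
        for stamp, n in counts.most_common(5):
            print("%s  %d requests" % (stamp, n))

The total should come out to roughly N*M, and no individual second should be wildly off from N.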

VegetaLoadTesting written at 02:03:33; Add Comment

2014-10-08

Simple web application environments and per-request state

One of the big divides in web programming environments (which are somewhat broader than web frameworks) is between environments that only really have per-request state, where every new request starts over with a blank slate, and environments with state that persists from request to request. CGI is the archetype of per-request state, but PHP is also famous for it. Many more advanced web environments have potential or actual shared state; sometimes this is an explicit feature they offer over simpler environments.

(One example of a persistent state environment is Node and I'd expect the JVM to generally be another one.)

I have nothing in particular against environments with persistent state and sometimes they're clearly needed (or at least very useful) for doing powerful web applications. But I think it's clear that web environments without it are simpler to program and thus are easier to write simple web things in.

Put simply, in an environment with non-persistent state you can be sloppy. You can change things. You can leave things sitting around the global environment. You can be casual about cleaning up bits and pieces. And you know that anything you do will be wiped away at the end of the request and the next one will start from scratch. An environment with persistent state allows you to do some powerful things but you have to be more careful. It's very easy to 'leak' things into the persistent environment and to modify things in a way that unexpectedly changes later requests, and it can also be easy to literally leak memory or other resources that would have been automatically cleaned up in a per-request environment.

(At this point the pure functional programmers are smugly mentioning the evils of mutable state.)
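
To make the sort of sloppiness I mean concrete, here's a deliberately careless sketch in Python (all of the names are invented). Under CGI this is harmless, because the whole module is thrown away at the end of each request; in a long-running process, the module-level cache grows forever and the mutated shared default leaks one request's settings into later ones:

    # A request handler that is sloppy in ways a per-request environment
    # forgives and a persistent environment does not.
    _seen = {}                    # module-level, lives as long as the process

    DEFAULTS = {"style": "plain"}

    def get_defaults():
        return DEFAULTS           # returns the shared dict, not a copy

    def handle_request(user, params):
        _seen[user] = params      # stash per-request data somewhere "convenient"
        settings = get_defaults()
        settings.update(params)   # quietly mutates DEFAULTS for everyone
        return "user %s sees settings %r" % (user, settings)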

Speaking from personal experience, keeping track of the state you're changing is hard, and it's easy to change something without realizing it. DWiki started out running in a purely non-persistent environment; when I also started running it in a semi-persistent one, I found any number of little surprises and things I was doing to myself. I suspect I'd find more if I ran it for a long time in a fully persistent environment.

As a side note, there are some relatively obvious overall advantages to building a web application that doesn't require persistent state even if the underlying web environment you're running in supports it. This may make it useful to at least test your application in an environment that explicitly lacks it, just to make sure that everything still works right.

NonpersistentStateSimple written at 00:41:09; Add Comment

2014-09-26

The practical problems with simple web apps that work as HTTP servers

These days there are a number of languages and environments with relatively simple to learn frameworks for doing web activity (I am deliberately phrasing that broadly). Both node and Go have such things, for example, and often make a big deal of it.

(I know that 'let's do some web stuff' is a popular Go tutorial topic to show how easy it is.)

All of this makes it sound like these should be good answers to the CGI problem (especially with their collections of modules and packages and so on). Unfortunately this is not the case in default usage, and one significant part of why not is exactly that these systems are pretty much set up to be direct HTTP servers out of the box.

Being a direct HTTP server is a marvelously simple approach for a web app if and only if you're the only thing running on the web server. If you have a single-purpose web server that exists merely for your one web application, it's great that you can expose the web app directly (and in simple internal setups you don't particularly need the services of Apache or nginx or lighttpd or the like). But this single-purpose web server is very rarely the case for simple CGI-level things. Far more common is a setup where you have a whole collection of both static pages and various simple web applications aggregated together under one web server.

(In general I feel that anything big enough for its own server is too big to be sensible as a 'simple CGI'. Good simple CGI problems are small almost by definition.)

If you try hard you can still make this a single server in Go or node, but you're going to wind up with kind of a mess where you have several different projects glued together, all sharing the same environment with each other (and then there are the static files to serve). If the projects are spread across several people, things get even more fun. Everything in one bucket is simply not a good engineering answer here. So you need to separate things out, and every way of doing that creates more work.

If you separate things out as separate web servers, you need multiple IPs (even if they all live on the same host) and multiple names to go with them, and those names are going to be visible to your users. If you separate things out with a frontend web server and reverse proxying, all of your simple web apps have to be written to deal with this (and with the various issues involved in using HTTP as a transport). Both approaches complicate your life, eroding some of the theoretical simplicity you're supposed to get.

(However Go does have a FastCGI package (as well as a CGI package, but then you're back to CGI), apparently with an API that's a drop-in replacement for the native Go HTTP server. It appears that node has at least a FastCGI module that's said to be a relatively drop-in replacement for its http module. FastCGI does leave you with the general problems of needing daemons, though.)
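
The same pattern exists in the Python world I live in: a WSGI application doesn't care whether it's serving HTTP directly or sitting behind a frontend speaking FastCGI. A minimal sketch, assuming the third-party flup package for the FastCGI side:

    # One WSGI application, served two different ways.
    def application(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello from a simple web app\n"]

    if __name__ == "__main__":
        import sys
        if "--fastcgi" in sys.argv:
            # Run behind a frontend web server that speaks FastCGI to us.
            from flup.server.fcgi import WSGIServer   # third-party package
            WSGIServer(application).run()
        else:
            # Be our own HTTP server; fine when we own the whole thing.
            from wsgiref.simple_server import make_server
            make_server("", 8000, application).serve_forever()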

PS: I'm handwaving the potentially significant difference in programming models between CGI's 'no persistent state between requests' and the shared context web app model of 'all the persistent state you want (or forget to scrub)'. I will note that the former is much simpler and more forgiving than the latter, even in garbage collected environments such as Go and Node.

Sidebar: the general issues with daemons

Although it is not specific to systems that want to be direct HTTP servers, the other problem with any sort of separate process model for simple web apps is exactly that it involves separate processes for each app. Separate processes mean that you've added more daemons to be configured, started, monitored and eventually restarted. Also, those daemons will be sitting there consuming resources on your host even if their app is completely unused at the moment.

You can make this easy if you try hard. But today it involves crafting a significant amount of automation because pretty much no out of the box Unix system is designed for this sort of operation. Building this automation is a not insignificant setup cost for your 'simple' web apps (well, for your first few).

(If you try really hard and have the right programming model you can get apps to be started on demand and stopped when the demand goes away, but this actively requires extra work and complexity in your systems and so on.)

HTTPAppProblem written at 03:09:43; Add Comment

2014-09-25

Why CGI-BIN scripts are an attractive thing for many people

The recent Bash vulnerability has people suddenly talking about CGI-BIN scripts, among other things, and so the following Twitter exchange took place:

@dreid: Don't use CGI scripts. For reals.

@thatcks: My lightweight and simple deployment options are CGI scripts or PHP code. I'll take CGI scripts as the lesser evil.

@eevee: i am pretty interested in solving this problem. what are your constraints

This really deserves more of a reply than I could give on Twitter, so here's my attempt at explaining the enduring attraction of CGI scripts.

In a nutshell, the practical attraction of CGI is that it starts really simple and then you can make things more elaborate if you need it. Once the web server supports it in general, the minimal CGI script deployment is a single executable file written in the language of your choice. For GET based CGI scripts, this program runs in a quite simple environment (call it an API if you want) for both looking at the incoming request and dumping out its reply (life is slightly more difficult if you're dealing with POST requests). Updating your CGI script is as simple as editing it or copying in a new version and your update is guaranteed to take effect immediately; deactivating your script is equally simple. If you're using at least Apache you can easily give your CGI script a simple authentication system (with HTTP Basic authentication).
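
To make 'a single executable file' concrete, here's roughly the smallest useful GET-handling CGI script I can sketch, in the Python of that era (the script name and form field are invented; drop it into a cgi-bin directory, make it executable, and you're done):

    #!/usr/bin/env python
    # hello.cgi: a minimal GET-handling CGI script (Python 2 era style;
    # modern Python 3 would use html.escape instead of cgi.escape).
    import cgi

    form = cgi.FieldStorage()              # parses QUERY_STRING for GET requests
    who = form.getfirst("name", "world")   # hypothetical form field

    # A CGI reply is just headers, a blank line, and then the body on stdout.
    print("Content-Type: text/html")
    print("")
    print("<html><body><p>Hello, %s.</p></body></html>" % cgi.escape(who))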

In the category of easy deployment, Apache often allows you to exercise a lot of control over this process without needing to involve the web server administrator to change the global web server configuration. Given .htaccess control you can do things like add your own basic authentication, add access control, and do some URL rewriting. This is part of how CGI scripts allow you to make things more elaborate if you need to. In particular, if your 'CGI script' grows big enough you don't have to stick with a single file; depending on your language there are all sorts of options to expand into auxiliary files and quite complicated systems (my Rube Goldberg lashup is an extreme case).

Out of all of the commonly available web application systems (at least on Unix), the only one that has a similar feature list and a similar ease of going from small to large is PHP. Just like CGI scripts you can start with a single PHP file that you drop somewhere and then can grow in various ways, and PHP has a simple CGI-like API (maybe even simpler, since you can conveniently intermix PHP and HTML). Everything else has a more complex deployment process (especially if you're starting from scratch) and often a more complex management process.

CGI scripts are not ideal for large applications, to say the least. But they are great for small, quick things and they have an appealingly simple deployment story for even moderate jobs like a DHCP registration system.

By the way, this is part of the reason that people write CGI scripts in (Bourne) shell. Bourne shell is itself a very concise language for relatively simple things, and if all you are doing is something relatively simple, well, there you go. A Bourne shell script is likely to be shorter and faster to write than almost anything else.

(Expert Perl programmers can probably dash off Perl scripts that are about as compact as that, but I think there are more relatively competent Bourne shell scripters among sysadmins than there are relatively expert Perl programmers.)

CGIAttractions written at 02:04:20; Add Comment

