Wandering Thoughts archives

2009-02-26

What I learned from Google Mail's recent outage

Suppose you have a system with nodes and work items. You have to assign work items to nodes somehow; one way to do it is to randomly distribute work items around the nodes, but another is to assign them based on some sort of fixed-outcome affinity function (like 'is topologically nearest'). Now consider what happens when a node overloads or fails (or is just taken out of service) and its work has to be reassigned to new nodes.

In a random-assignment system, the failed node's work is smeared broadly over all of the remaining nodes; each node only has to absorb a little bit of extra work. But in a fixed-affinity system, you are going to assign all of the work from the failed node to only a few nodes, the nodes that are 'closest' to the failed node. This will add significant load to them and may push one of those nodes into an overload failure; if this happens it adds yet more load to the remaining nearby nodes, and suddenly you have a cascade failure marching through your system.

(The more neighbors each node has the better, here, and conversely the fewer it has the more likely an overload is to happen.)
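The difference between the two assignment strategies can be seen in a toy simulation; a minimal sketch, with entirely made-up numbers (node count, capacities, and a 'nearest two neighbors' affinity function invented for illustration):

```python
# Toy model: 10 nodes, each carrying a load of 60 against a capacity of 100.
# When one node fails, compare spreading its load over all survivors
# (random assignment) against dumping it on its two nearest neighbors
# (fixed affinity). All numbers here are invented for illustration.

def random_reassign(loads, failed):
    """Smear the failed node's load evenly over every surviving node."""
    survivors = [i for i in range(len(loads)) if i != failed]
    share = loads[failed] / len(survivors)
    return {i: loads[i] + share for i in survivors}

def affinity_reassign(loads, failed, neighbors=2):
    """Dump the failed node's load onto only its nearest neighbors."""
    survivors = [i for i in range(len(loads)) if i != failed]
    nearest = sorted(survivors, key=lambda i: abs(i - failed))[:neighbors]
    share = loads[failed] / neighbors
    return {i: loads[i] + (share if i in nearest else 0) for i in survivors}

loads = [60] * 10
print(max(random_reassign(loads, 0).values()))    # about 66.7: comfortably under capacity
print(max(affinity_reassign(loads, 0).values()))  # 90: one more failure from a cascade
```

With random assignment no survivor's load moves much; with affinity assignment the two neighbors jump to 90% of capacity, so losing either of them would push 45 units onto the next neighbors, which is the cascade in miniature.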

This possibility is probably not something that would have occurred to me if I had not read Google Mail's description of their recent outage (although I'm sure it's well known to experienced people in this field). Thus, the title of this entry is not sarcastic; Google's willingness to describe the outage led to me learning something potentially quite useful (or perhaps to becoming conscious of it, which may be a better description).

(Hence my quite generic description of the problem, since I think it can happen in any system with these characteristics. Distributed systems without fast work reassignment and some sort of load cutoff may be especially at risk, but I suspect that this also comes up in situations like scheduling processes and allocating memory on NUMA machines where CPU modules can be unplugged.)

AffinityCascadeProblem written at 01:14:18

2009-02-23

A problem with microtransactions

One corollary of Internet scale security is a lurking issue with microtransactions (one of the perennial Internet enthusiasms in some quarters). The problem is how you handle authorizing microtransactions.

If you prompt the user every time they spend a cent, I think it's very likely that people will rapidly find this far too annoying and stop using microtransactions at all. If you do not require the user to authorize transactions, you open yourself (and the user) up to attacks where the user's browser or other agent is either subverted or fooled into authorizing undesired things.

Normally something that nets only, say, a cent per transaction would have too small a payoff to be worth attacking. However, Internet scale means that if it can be done without user authorization, it can be automated and done en masse, either against individual users or across a lot of users. And that can rapidly turn into enough real money to be worth it, especially if running the attack has very low overhead for the attacker (much like mass ssh scanning).

I suspect that the fix for this will have to be client-side, where whatever program is authorizing these transactions has the sort of anti-fraud precautions that credit card processors do; volume limits, things that look for odd patterns, and so on. (Requiring people to authorize the first few microtransactions with a given vendor is sensible but, I think, doesn't entirely help in the long run.)
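As a sketch of what such a client-side guard might look like: the class, the interface, and every limit below are invented for illustration (amounts are in cents to keep the arithmetic exact), but they capture the credit-card-style checks of per-transaction caps, volume limits, and flagging unfamiliar vendors.

```python
import time
from collections import deque

# Hypothetical client-side microtransaction guard: silently approves small
# charges, but enforces a per-transaction cap and an hourly volume limit,
# and refuses (i.e. punts to the user) on a vendor it has never seen.
# All names and limits here are invented; amounts are integer cents.

class MicroGuard:
    def __init__(self, per_txn_limit=5, hourly_limit=100):
        self.per_txn_limit = per_txn_limit    # cents per transaction
        self.hourly_limit = hourly_limit      # cents per rolling hour
        self.history = deque()                # (timestamp, amount) pairs
        self.known_vendors = set()            # vendors the user has approved

    def authorize(self, vendor, amount, now=None):
        now = time.time() if now is None else now
        # Forget spending that is more than an hour old.
        while self.history and now - self.history[0][0] > 3600:
            self.history.popleft()
        spent = sum(amt for _, amt in self.history)
        if amount > self.per_txn_limit:
            return False                      # too big to auto-approve
        if spent + amount > self.hourly_limit:
            return False                      # hourly volume limit reached
        if vendor not in self.known_vendors:
            return False                      # new vendor: ask the user instead
        self.history.append((now, amount))
        return True
```

In use, the first transaction with a vendor would be authorized interactively (adding it to known_vendors), and afterwards everything under the limits goes through silently; an attacker who subverts the browser still can't drain more than the hourly limit.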

(Of course, microtransactions have a lot of other problems, including that people much prefer predictable flat rate expenses over unpredictable variable ones.)

MicrotransactionsProblem written at 01:38:43

2009-02-22

Internet scale security: the impact of cheapness

Ten years ago or so, mass login and password guessing attacks were essentially a non-issue; old news, an attack whose time had passed not only because everyone knew how to counter them but because they had such a low payoff that no one bothered doing the tedious work (and if someone did, you pitied them).

Then the Internet happened and everything changed. Suddenly mass password guessing attacks were not a theoretical issue; instead, they were cluttering up your logs every day. This happened not because mass password guessing had gotten any more effective and successful, but because the Internet had made it dirt cheap. If you were a cracker with a bunch of compromised machines that you weren't doing anything particularly important with, starting up a brute force ssh scanner cost you essentially nothing and might get you a nice payoff.

Or in short: cheapness makes low-probability and even low-payoff mass attacks worthwhile, and the Internet has delivered cheap computing to attackers. These days, lots of cheap computing.
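The economics can be made concrete with a back-of-the-envelope calculation; every number below is invented for illustration, and the point is only that zero marginal cost makes even terrible success rates profitable.

```python
# Invented numbers for a mass ssh password-guessing campaign: what matters
# is the shape of the arithmetic, not the specific figures.

attempts_per_day = 1_000_000          # a modest botnet scanning continuously
guesses_per_compromise = 100_000      # one weak password per 100,000 tries
value_per_compromise = 10             # dollars, e.g. resale as a spam relay

compromises_per_day = attempts_per_day // guesses_per_compromise
daily_payoff = compromises_per_day * value_per_compromise
print(daily_payoff)                   # 100 dollars a day at ~zero marginal cost
```

At a 0.001% success rate this would be a laughable attack to mount by hand, but with someone else's compromised machines doing the work, any positive payoff beats the (essentially zero) cost.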

(In the larger scale of things, this is nothing new; probably everyone has heard the stories of bank frauds that involved taking the fractional cents on various interest payments, known as salami slicing.)

This means that you need to design security differently when you are designing a system on the Internet. At Internet scale, computing is cheap and readily available, and attacks are almost certainly cheap to mount; either they can be automated or they can be contracted out to low-wage places, making all sorts of things feasible that would normally be too much effort if done by hand by the primary attacker. And as we've seen with spam, if you can be exploited, sooner or later you will be; obscurity is not a defense if you have something that attackers want.

(One corollary is that you need to worry about even low-payoff attacks if they can be mounted against you en masse, and they probably can be. See, for example, this.)

InternetScaleSecurity written at 00:59:34

2009-02-18

My theory on why people wind up using common passwords

When we talk to people about passwords, especially about website passwords, generally what we teach them is some variant of 'do not write down your passwords and always use different ones on each service'. Time after time, what happens is that people follow half of this; they have only one common password, but they don't write it down.

It has recently occurred to me that there is a sensible explanation for this result (beyond the obvious one that it is the more convenient option if you are only going to follow one half of the teaching): the risks of common passwords are less intuitively obvious than the risks of writing passwords down. People can easily imagine the problems with writing things down, because it involves simple physical risks like someone stealing your piece of paper, but the problems of common passwords are more abstract and distant; attackers stealing password databases and the like are not so familiar for most people, and probably not as real as a result.

Thus, people are more or less faced with choosing between mitigating a risk that they easily understand and mitigating one that they don't really grasp; it's not surprising that most of them mitigate the risk they can easily imagine, even if that risk is in practice the smaller one.

(We already know that people routinely mis-estimate risks in various ways, so this should not be very shocking.)

Applications to other security risks and scenarios are left as an exercise to the reader, but I am already peering at all sorts of things through this new prism of moderate insight.

WhyCommonPasswords written at 01:45:42

