Wandering Thoughts archives

2008-05-29

Why web spiders should not crawl syndication feeds

On the surface, crawling syndication feeds looks like an attractive idea for web spider operators (although I am not convinced that the metadata they get is on the whole any better than the metadata on web pages). But as things are today, it is a terrible idea and is highly likely to provoke bad reactions if attempted.

The big problem is that right now, if you turned a web spider loose on syndication feeds it would pull far too many of them. This is because people (and websites) have lots of feeds that are either empty or contain overlapping content, but there's no way for a web spider to tell this beforehand. Pulling anyway is bad, because web spiders pulling those feeds puts a pointless burden on web sites (the spider gets nothing new out of it, but the web site is forced to generate and send the data). And this is not just a theoretical issue; feed 'over-pulling' has affected actual people with real websites.

Or in short, there currently is no good way to do automated discovery of syndication feeds that people actually want spiders to pull. Since there are clearly lots of feeds that are pointless to pull, and since the current default is not to pull feeds, changing the default to 'we pull feeds unless you tell us not to' is going to get bad reactions.

(I suspect the reaction of most people would be 'we refuse to mark up our websites so that you'll stop abusing us', followed by strategic additions to robots.txt.)
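
As a concrete illustration, such strategic robots.txt additions might look something like this (the spider name and feed paths are invented for the example, not taken from any real crawler or site):

    # Block a hypothetical feed-crawling spider entirely.
    User-agent: ExampleFeedBot
    Disallow: /

    # And keep all other spiders out of the feed URLs.
    User-agent: *
    Disallow: /atom/
    Disallow: /rss/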

An associated issue is that repeatedly pulling syndication feeds has additional requirements if you want to avoid being antisocial, and web spiders have traditionally not done very well at following those requirements. Widespread repeated crawling of syndication feeds would make this even more irritating and painful for web site operators than it already is (especially since getting well-behaved web spiders is hard enough as it is).
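
To give a sense of the minimum involved, here is a sketch (in Python with the requests library; the URL and cached header values are made up for the example) of the conditional GET that a polite feed fetcher is expected to make on every poll, so that an unchanged feed costs the web site almost nothing:

    import requests

    # Values remembered from the previous successful fetch (made up here).
    feed_url = "https://example.org/blog/atom"
    cached_etag = '"abc123"'
    cached_last_modified = "Thu, 29 May 2008 00:10:48 GMT"

    headers = {
        "User-Agent": "example-feed-fetcher/0.1",
        # Conditional GET: let the server answer '304 Not Modified' cheaply
        # instead of regenerating and resending the whole feed.
        "If-None-Match": cached_etag,
        "If-Modified-Since": cached_last_modified,
    }

    resp = requests.get(feed_url, headers=headers, timeout=30)
    if resp.status_code == 304:
        pass        # nothing new; wait a decent interval before polling again
    elif resp.ok:
        # New content: remember the new validators for the next poll.
        cached_etag = resp.headers.get("ETag", cached_etag)
        cached_last_modified = resp.headers.get("Last-Modified", cached_last_modified)
        # ... parse and process the feed here ...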

WhyNoFeedCrawling written at 00:10:48

2008-05-18

The threat model for website logins

One of the things that security people always say is that the first step in doing a decent security analysis is to figure out your threat model. So, what is the threat model for website logins; in other words, what sort of attacks are you likely to face and need to defend against?

My belief is that there are two or maybe three significant threats these days:

  • phishing, for which the best defense is getting your users out of the habit of entering their passwords at all; either have them logged on all the time or have their browser memorize their password or both. That way actually being prompted for a password has a much better chance of raising alarm bells in the user's mind (and they might have forgotten their password, so digging it out will give them even more time to realize that something is wrong).

  • compromised machines. There's no defense against these, although using one-time passwords can help mitigate the damage. But unless you're actually handling the user's money, you're unlikely to persuade users to put up with the annoyance of any one-time password scheme.

  • maybe cross-site request forgery, which you can defend against in part by getting your users to log out regularly (something that works best if logging in again is easy); a rough sketch of the usual token-based server-side defense follows this list.
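
As an illustration of what the mechanical, server-side part of a CSRF defense usually looks like (this sketch is mine, not something from the entry, and the function names are invented), sites typically embed a random per-session token in their own forms and reject requests that do not echo it back:

    import hmac
    import os

    def new_csrf_token():
        # Generate a random token once per session and store it in the session.
        return os.urandom(16).hex()

    def request_is_legitimate(session_token, submitted_token):
        # The site's own forms carry the token in a hidden field; a request
        # forged from another site has no way of knowing it.
        return hmac.compare_digest(session_token, submitted_token)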

To bang on yesterday's issue again, you aren't protecting against any of these when you block browsers from memorizing password information for your site. The only one that comes close is compromised machines, but with them it doesn't matter whether or not the browser has the password stored; you've lost either way. At best you've forced the malicious payload to do more work, but keyloggers are not exactly difficult to find these days.

(My personal feeling is that the average website is much more at risk from phishing than from compromised machines, because phishing attacks are easier to put together and yield far more immediate and targeted results.)

WebloginsThreatModel written at 23:41:12

Counterproductive password security

Certain websites and I have a difference of opinion. To wit, they feel that my account is vitally important, so important that they must save me from myself by refusing to let my browser remember the password for me. I disagree with them, because ultimately they are just another website. Sure, it would be annoying if an attacker deleted my account or the like, but in the global scale of things it is not that big a deal.

(I will excuse people being paranoid if they are holding my money; then there is something at stake beyond my activities on their websites.)

In the local scale of things, they win; I have yet to figure out how to override whatever they've told my browser. In the global scale of things they lose, because I now have my login information written down in a plain text file so that I can find it again when I need it. Since I cut and paste it into their login form, I may even someday have a paste accident (no matter how much I try to avoid those). The net result of the website's security paranoia is that my account is now less secure.

(In theory I am sure that the website wants me to memorize my password. In practice, see that bit about the disagreement; the whole situation is not important enough for me to spend that effort. Besides, I picked a completely random password when I set up the account, since I was counting on the browser to remember it for me and a completely random password maximizes security against various guessing attacks.)
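
For illustration, generating such a completely random password takes only a few lines of Python (the length and character set here are arbitrary choices for the example):

    import random
    import string

    # Use the operating system's randomness source instead of the default PRNG.
    rng = random.SystemRandom()
    alphabet = string.ascii_letters + string.digits
    password = "".join(rng.choice(alphabet) for _ in range(16))
    print(password)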

PasswordOversecurity written at 01:01:06
