Wandering Thoughts archives

2012-10-12

SSL CAs have an impossible job (if you want them to be thorough)

In extremely idealized theory, the job of an SSL CA is to verify the identity of the entities that they issue certificates to. People are forever clamouring for real SSL CAs to live up to this idealized image; they want SSL certificates to 'mean something' instead of being given out to anyone who can scare up a domain name and a credit card that will pass basic billing verification. I've recently been struck by the somewhat depressing realization that this is an impossible job.

By this I do not mean that it would cost too much to implement real identity checking, or that it would be subject to all sorts of undetectable fraud and confusion, although both are true. What I mean is that even if you assume money is not an issue, along with other idealized conditions, no system that involves human attention, checking, and judgement can possibly work at anything approaching the scale we need SSL CAs to work at.

The problem is that people habituate to things very fast. The human beings in SSL CA verification are essentially serving as gatekeeping sentries; their job is to dutifully inspect everything going past them just on the off chance that there is something wrong with it. Almost all of the time there isn't. It is human nature to habituate to this and from then on see what you expect to see (and not see what you don't expect to), almost regardless of what's really there.

What this means is that people almost literally cannot provide the verification you want unless they do it in such low volume that they can avoid habituating to good certificate requests. If people process certificate requests in volume, they are not really verifying them to the degree you want; over time, attackers will be able to slip any number of sufficiently small things past them. Or to put it another way, you've turned people into bad robots and as such they will robotically approve quite a lot of things.

(I'm not too broken up about this, since I don't think the model works in practice anyways for all sorts of reasons.)

SSLCAsImpossibleJob written at 03:41:55

2012-10-09

The negative results problem with search engines

Here is another problem that I see for new search engines, ones that people are unfamiliar with (to go with yesterday's other set of them).

Not everything is on the web, or at least not everything is findable in search engines with a sensible amount of effort. This means that when you get what is essentially a negative result in a search engine, you're confronted with a question. Did you phrase your query badly (for this search engine), or is there really no sensibly findable result for what you're looking for?

The more familiar you are with a search engine and how to write queries for it, the more you can feel confident about which option is the correct answer. The less familiar you are with a given search engine, the more uncertainty you have. If you care about the results, this uncertainty means that with an unfamiliar search engine you're going to wind up spending more time convincing yourself that no, really, there really are no results for this (as you try different searches, different phrasing, and so on).

(This is the related issue with negative results that I was referring to in yesterday's entry.)

Sidebar: what a negative result is

The obvious negative result is 'no pages that match your search', but this rarely happens for most non-artificial searches these days. Instead, a negative result can be summarized as 'no useful links appear before your eyes glaze over trying to find them'. In practice, a useful link on page ten of your search results might as well not exist; you're extremely unlikely to see it.

SearchNegativeResults written at 00:01:19

2012-10-08

Acclimatization makes competition in web search engines hard(er)

A while back I gave DuckDuckGo a try (for various reasons that I'm not going to try to summarize here). One of the things the experience brought home to me is how hard it is for anyone to compete with Google in search, at least general Internet search. The crawling is hard enough on its own, not just because of the scope but also because you have to tacitly persuade a critical mass of webmasters to not block your robot, either out of hand or when it misbehaves (and apparently everyone's web spider misbehaves sooner or later, even Google's). But my DuckDuckGo experience has led me to conclude that the real hard bit is getting your query results to be good.

Part of this is simple quality-of-implementation issues, which should not be surprising. For all that people harsh on Google results and Google's recent moves to personalize them more and more, Google has spent engineer-decades of effort on improving search results and tuning them. It would be a little bit surprising if that work was easily duplicated or improved on.

But I think that this is also partly because long-term Google users have quietly learned how to write search queries that Google likes. And in fact 'learned' is not quite the right word for the process, because I don't think it's been a conscious one. Instead we've just acclimatized ourselves to Google much as it has to us, absorbing (and creating) an idiosyncratic collection of tricks and tools for getting the results we want. It's highly unlikely that these reflexive tricks will work the same or be as effective on any other search engine. The result is that until people unlearn those reflexes and acclimatize to the new search engine, no search engine is going to work as well as the old, familiar Google.

(I'm assuming roughly equivalent basic results. If a new search engine gives sufficiently better results than Google's, it can overcome this effect. How much better depends on how big the effect is, which varies from person to person.)

This was pretty much the feeling that I had when I used DuckDuckGo. I was reasonably confident that DDG knew the information that I wanted; I just didn't know the tricks to make it appear and to bubble up to the top of search results. As a result I went back to Google; whether or not the basic quality of search results is a bit better on one or the other, and regardless of other factors, in practice it was easier for me to get the results I wanted from Google.

(There's also a related issue with negative results, but that's another entry.)

HardSearchCompetition written at 00:35:34

