Wandering Thoughts archives

2006-03-06

How not to set up your mail server (part 1)

From our SMTP server logs:

remote from [64.151.68.164]
HELO ADV-RH9-NOCP.YOUR-DOMAIN-HERE.com
554 Unresolvable HELO name: ADV-RH9-NOCP.YOUR-DOMAIN-HERE.com

There is such a thing as taking the instructions too literally. (Or perhaps the error here is in not following instructions; who knows.)
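To make the rejection above concrete, here is a minimal sketch of the kind of check that could produce that 554. It is an illustration under assumptions, not the actual code of our mail server: the function names, the injectable `lookup` parameter, and the "250 OK" success reply are all hypothetical. The only real API used is `socket.getaddrinfo`.

```python
import socket

def helo_resolves(name, lookup=socket.getaddrinfo):
    """Return True if the HELO name resolves to at least one address.

    `lookup` is a hypothetical hook so the check can be exercised
    without touching real DNS; it defaults to socket.getaddrinfo.
    """
    try:
        return bool(lookup(name, None))
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve.
        # UnicodeError: the name could not even be encoded as a hostname.
        return False

def smtp_reply_for_helo(name, lookup=socket.getaddrinfo):
    """Build the SMTP reply line for a HELO (assumed reply texts)."""
    if helo_resolves(name, lookup):
        return "250 OK"
    return "554 Unresolvable HELO name: " + name
```

A placeholder name like the one in the log fails this sort of check for the simple reason that nobody ever replaced the placeholder with a real domain.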

This appears to be a PLESK-based setup (judging from the web page on that IP address), which does not make me any better inclined towards these people. Sometimes I think it has become too easy to set up machines on the Internet.

For bonus points, the IP address claims to have the name 'customer-reverse-entry.64.151.68.164'.

sysadmin/HowNotToDoMailI written at 16:29:19

A thought about Technorati

Technorati famously has some problems indexing blogs. I believe that a lot of these issues may come down to something simple: Technorati's real problem is that it predates the syndication feed revolution.

Post-syndication-revolution blog search engines (like Google Blogsearch, Feedster, IceRocket, and Bloglines) are actually syndication feed search engines. They operate by finding feeds and mining them for the entries (or at least the URLs of the entries, if your feed only has partial text).
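To give a feel for how little work mining a feed takes, here is a rough sketch using Python's standard `xml.etree.ElementTree`. It assumes an Atom 1.0 feed (an Atom 0.3 feed uses a different namespace), and both the function name and the sample feed are made up for illustration. I have no idea what any real search engine's code looks like.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace; this is an assumption about the feed format.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

def entries_from_atom(feed_xml):
    """Return a list of {title, url, text} dicts, one per feed entry."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        url = link.get("href") if link is not None else None
        # text is None when the feed carries no content for the entry,
        # in which case an indexer would have to fetch the URL instead.
        text = entry.findtext("atom:content", default=None, namespaces=ATOM_NS)
        results.append({"title": title, "url": url, "text": text})
    return results

# Hypothetical sample feed: one entry with content, one without.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>First post</title>
    <link href="http://blog.example.org/first"/>
    <content type="text">Full text here.</content>
  </entry>
  <entry>
    <title>Second post</title>
    <link href="http://blog.example.org/second"/>
  </entry>
</feed>"""

for item in entries_from_atom(SAMPLE):
    print(item["title"], item["url"], item["text"] is not None)
```

The feed format tells you where each entry starts and ends, what its title is, and what its permalink is; there is nothing to guess.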

But before the syndication revolution there were no widespread syndication feeds to mine. Instead, you had to spider the blog web pages themselves and then reverse engineer the HTML to try to extract the blog entries.
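By way of contrast, here is a sketch of what guessing entries from raw HTML might look like, using the standard `html.parser` module. The heuristic here (treat a link as an entry permalink if it has `rel="bookmark"` or a `permalink` class) is purely my own illustrative assumption, as are the class and function names; it is not a description of what Technorati or anyone else actually does.

```python
from html.parser import HTMLParser

class PermalinkGuesser(HTMLParser):
    """Collect hrefs of <a> tags that look like entry permalinks."""

    def __init__(self):
        super().__init__()
        self.permalinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # Attribute values can be None (valueless attributes), hence 'or ""'.
        rel = (attrs.get("rel") or "").split()
        classes = (attrs.get("class") or "").split()
        # Assumed heuristic: only these two hints mark a permalink.
        if "bookmark" in rel or "permalink" in classes:
            href = attrs.get("href")
            if href:
                self.permalinks.append(href)

def guess_permalinks(page_html):
    """Return the guessed permalink URLs in document order."""
    parser = PermalinkGuesser()
    parser.feed(page_html)
    parser.close()
    return parser.permalinks
```

This only works on pages that happen to use one of the hints it knows about, and it still says nothing about where an entry's text starts and ends. Every blog package with different markup needs another rule.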

This seems to be how Technorati operates; we can see a hint of this in their publishers help page, where they ask people to add special markup that will give their parser more clues. And as Chris Linfoot has noticed, they don't seem to pull feeds very much.

Technorati can hard-code handling for common blogging sites and blog packages, but in general this sort of heuristic reverse engineering is a hard task that needs continued tweaking. It's not surprising that it's prone to problems; perhaps it's more surprising that it works as well as it does without more help from bloggers.

Fundamentally I think that being a pre-syndication-era blog search engine is a significant handicap for Technorati. Syndication feeds are simply at least an order of magnitude easier to parse and work with than raw HTML pages; until Technorati does as much as possible with syndication feeds, and only falls back to parsing raw pages as a last resort, it's going to be working harder than competitors like IceRocket for fewer results and more problems.

Honesty compels me to admit that there's a certain amount of sour grapes in this, because Technorati is barely indexing WanderingThoughts at all. (Yes, I ping them like clockwork when I post. Yes, my HTML and my feed validate (or at least the feed validated back when the feed validator would still validate Atom 0.3 feeds).)

web/TechnoratiProblem written at 02:52:15


