Wandering Thoughts archives

2013-12-29

Broad thoughts on tags for blog entries

Yes, I know, tags are on my mind lately. In particular I've been thinking about what I want to do with them. Ultimately what it comes down to is supporting real blog usability, specifically both encouraging and rewarding blog visitors for exploring outwards from whatever entry they initially landed on. When I wrote about real blog usability I said that the most powerful way to do that was probably some sort of 'related entries' feature; tags are an obvious way of providing that.

There is an important corollary: for this to work, the tags must not merely lead to related entries; the entries must be related in a way that your visitors are interested in. Some tags will be too general to be useful (these are really broad categories) while others will be too uninteresting or obscure. This means that creating useful tags requires thinking about the relationships that visitors will want to explore; in other words, working out what it is about any particular entry that people will want to read more of.

(This is one reason that I think tags will be somewhat retrospective; you won't necessarily realize those interesting relationships until you have another entry to relate the first entry to.)

Also, tags aren't enough by themselves because they are too unspecific. There are at least three sorts of more specific relationships that I think will get lost in a general tag cloud and should be handled differently, at least by the blog's UI: 'related entries', the more specific form of related entries that is 'entries in a series', and 'this entry is updated by ...'. 'Related entries' is more specific than merely sharing tags; one way to look at it is that related entries share a topic even if they aren't specifically a series. Entries that are part of a series should have strong support in the UI for real blog usability, because these are the entries that a visitor is most likely to want to read more of if they liked their initial entry.

(So in UI priority it should be 'this entry is updated by ...', 'entries in a series', 'related entries', and then general tags, based on what I expect visitors to be most interested in and what's most important.)

In thinking about this I've wound up with the feeling that tags are going to work quite well for certain sorts of entry-to-entry relationships but not necessarily very well for others. I probably won't fully understand this until (and unless) I implement some sort of tags and other relationships in DWiki and start using them.

(As a result, any scheme I set up in DWiki should be flexible about what sorts of relationships it can associate with an entry or a bunch of entries. I will probably want to use it for more than tags.)
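As a concrete illustration of what that flexibility might look like, here is a minimal Python sketch; it is hypothetical (none of these names come from DWiki) and simply keeps both directions of every relationship so either side can be queried cheaply:

    from collections import defaultdict

    class Relationships:
        # Maps (relationship type, entry) -> related targets, in both
        # directions, so 'what relates to X?' is as cheap as
        # 'what does X relate to?'.
        def __init__(self):
            self.forward = defaultdict(lambda: defaultdict(set))
            self.backward = defaultdict(lambda: defaultdict(set))

        def add(self, rtype, entry, target):
            self.forward[rtype][entry].add(target)
            self.backward[rtype][target].add(entry)

    rels = Relationships()
    rels.add('tag', 'blog/SomeEntry', 'tags')
    rels.add('series', 'blog/PartOne', 'blog/PartTwo')
    rels.add('updated-by', 'blog/OldEntry', 'blog/NewerEntry')

Because the relationship type is just a string, tags, series membership, and 'updated by' links can all live in the same scheme.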

TagsBroadThoughts written at 02:42:28

2013-12-27

A reason to keep tags external in 'entry as file' blog engines

In EntryAsFileTagProblem I ran over the problem 'entry as file' blog engines have with tags (because they need efficient two-way queries on the mappings between tags and entries) and suggested that one solution was a completely external mapping file (or files) of tag information. I've since realized that there is an additional reason to like this approach.

Put simply, having tag/entry mappings in an external file allows you to change the tags associated with an entry without editing the actual entry's file; in particular, you can retrospectively add tags to old entries. This is based on my feelings about two issues (feelings that other people may not share).

First off, I think that a decent amount of tagging is probably going to be done after the initial publication. Tagging is taxonomy, and sometimes it's only going to be obvious when you write the second (or third, or whatever) entry that touches on a particular thing. In addition I'm biased against single-entry tags (they're not merely pointless but distracting), so I'm not even likely to put in obvious tags unless I'm relatively confident that I'll write at least a second entry with the same tag.

(Fundamentally, tagging as exposed in a blog is about luring people into reading additional entries by giving them a way to follow their interests. If you're interested in a particular tag, you can find and read other entries that have that tag. If there are no other entries when you click through the tag's link, I've wasted your time. I can use single-entry tags internally for tracking or taxonomy purposes, but I shouldn't expose them to visitors.)

Second, I'm strongly biased against modifying entry files after their initial publication; I would like to do it as little as possible. If the master source of tag information is in the file and it's common to modify tags after publication, well, I'm going to have to edit entry files much more than I'd like. Putting the same information into a separate set of files is less problematic this way.

(One issue with editing entry files is that it opens you up to making larger edits than you intend, because the tag metadata is mingled with other metadata and the actual entry text. No matter what you do to a tag metadata file, it only affects tag metadata.)
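To make this concrete, such a separate tag metadata file might look something like the following; the format is invented purely for illustration (DWiki doesn't actually define one):

    # entry path : space-separated tags
    blog/SomeOldEntry : tags blog-usability
    blog/AnotherEntry : tags blog-engines metadata

Retagging an old entry is then a one-line change in this file, with the entry file itself left untouched.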

EntryAsFileExternalTagWin written at 02:13:19

2013-12-24

The 'entry as file' blog engine problem with tags

I've recently started feeling a desire for DWiki to support some form of tags for entries (for reasons involving my personal site). This is a problem, because DWiki is very strongly a file-based blog engine, one where every entry is a file in a directory hierarchy and everything you see is just a wrapper around that. In fact this is yet another example of the metadata problem that file-based engines have in general.

In the abstract the problem with tags is that you want two-way querying on them: you want to know what tags an entry has when displaying the entry, and you want to know what entries have a tag when displaying the tag. This is a terrible fit for a pure file-based engine, because in such an engine the entry's file is the only source of information about it. There's no problem embedding tag metadata in entries (you can invent a number of schemes for it), but this only lets you handle one of the two queries efficiently. To find all entries that have a tag you have to both find and read all entries; this does not scale, to put it one way.
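To make the scaling problem concrete, here is a minimal sketch of the tag-to-entries query when tags live only in entry files. This is hypothetical code, not DWiki's, and it assumes a 'Tags:' header line in each entry, which is just one invented scheme:

    import os

    def entries_with_tag(root, tag):
        # We must open and read every single entry just to answer one
        # tag query; the cost is O(total entries) per query, every time.
        matches = []
        for dirpath, _dirs, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                with open(path) as fp:
                    for line in fp:
                        if line.startswith('Tags:'):
                            if tag in line.split()[1:]:
                                matches.append(path)
                            break
        return matches

The entry-to-tags direction is cheap (you read one file); it's this direction that grinds.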

(In the old days you could sort of shrug and live with a very slow tag to entry generation process on the grounds that very few people were probably ever going to explore through your tags. There are several things wrong with this, but the killer is that on the modern web, everything gets visited. Web spiders grinding your blog into the ground is no fun.)

If you try hard you can use the filesystem as a database to solve this problem; for example, you might make a directory for each tag and then symlink (or note) each entry into its appropriate tag directories. This gives you quick lookup of the tag to entry query (you list the directory) but it requires a manual step that can get out of sync with the actual entry. This is the point where doom starts to descend and you invent schemes like the blog engine automatically maintaining these tag directories as it reads entries.
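Here is a sketch of that symlink scheme; the code is hypothetical and the layout is invented:

    import os

    TAGROOT = 'tagdirs'

    def tag_entry(entry_path, tag):
        # One directory per tag, with each tagged entry symlinked in.
        # Nothing keeps this in sync if the entry's tags later change;
        # that's the manual step that gets out of date.
        d = os.path.join(TAGROOT, tag)
        os.makedirs(d, exist_ok=True)
        link = os.path.join(d, os.path.basename(entry_path))
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(entry_path), link)

    def entries_for_tag(tag):
        # The tag-to-entries query is now just a directory listing.
        d = os.path.join(TAGROOT, tag)
        return sorted(os.listdir(d)) if os.path.isdir(d) else []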

You can abandon the idea of the entry as the canonical source of all information and maintain tag mapping information in a separate file that the engine reads and turns into the obvious set of internal maps. This has the advantage of being easy and probably efficient (it depends on how big the tag file gets) and if you want you can automatically generate the file from metadata you add to the entries themselves. But it gets away from the purity of the whole entry as file concept; now there's a separate pile of information alongside the entry file.
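A sketch of this approach, with the tag file parsed into the obvious pair of maps (hypothetical code, and the 'entry : tags' file format is invented for illustration):

    from collections import defaultdict

    def load_tagfile(path):
        # Lines look like 'entry-path : tag tag ...' (invented format).
        entry_tags = {}
        tag_entries = defaultdict(set)
        with open(path) as fp:
            for line in fp:
                line = line.strip()
                if not line or line.startswith('#'):
                    continue
                entry, _sep, tags = line.partition(':')
                entry = entry.strip()
                taglist = tags.split()
                entry_tags[entry] = taglist
                for t in taglist:
                    tag_entries[t].add(entry)
        return entry_tags, tag_entries

Both queries are then a single dictionary lookup, at the cost of one pass over the tag file at startup (or whenever it changes).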

I don't have any answers here and I've probably missed some options. I'm just thinking out loud so far.

(Right now I'm most inclined towards tag mapping information in a separate file rather than trying to abuse the filesystem as a database. It answers both queries at once and it's less of an ugly hack.)

EntryAsFileTagProblem written at 02:08:59

2013-12-20

A realization: on the modern web, everything gets visited

Once upon a time, a long time ago, you could have public web apps that exposed a few quite slow, heavyweight operations and expect to get away with it, because users would only use those operations very occasionally. These might be things like specialized syndication feeds or looking up all resources with a particular label (tag, category, etc). You wouldn't want to be serving those URLs very often, but once in a while was okay, and it wasn't worth the complexity of making even the stuff in the corner go fast.

Then the web spiders arrived. These days I automatically assume that any visible, linked-to URL will get found and crawled by spiders. It doesn't matter if I mark every link to it nofollow and annotate it with a content-type that should be a red flag of 'hands off, nothing interesting to you here'; at least some spiders will show up anyways. The result of this is that even things in the corner need to be fast, because while humans may not use them very often, the spiders will. And there are a lot of spiders and a lot of spider traffic these days (I remember seeing a recent estimate that over half of web traffic was from spiders).

(Spiders probably won't visit your really slow corners any more than the rest of your site. But unlike humans they won't necessarily visit them any less. URLs are URLs. And if your slow corners are useful indexes to your content, spiders may actually visit them more. I certainly wouldn't be surprised to find out that modern web crawlers keep track of what pages provide the highest amount of new links or links to changed content on an ongoing basis.)

One more or less corollary of this is that you (or at least I) probably want to plan for new URLs (ie, new features) to be efficient from the start. In the old days you had some degree of ramp up time, where you could deploy an initial slow version, see it get used a bit, tweak it, and so on; these days, well, the spiders are going to be arriving pretty soon.

(I have very direct experience that it doesn't matter how obscure or limited your links are; if links exist in public pages, spiders will find them and begin crawling through them. And one single link to an island of content is enough to start an avalanche of crawling.)

PS: all of this only applies to public web apps and URLs, and so far only to GET URLs that are exposed through links in HTML or other content. Major spiders do not yet stuff random things into GET-based forms and submit them to see what happens.

EverythingGetsVisited written at 01:54:49

2013-12-19

Your (HTML) template language should have conditionals

You could call this a war story, or just learning from my painful experience.

Back when I was first writing DWiki and didn't have enough experience to know better, I decided that it should have a quite simple template language, because it would be easy to implement and it would avoid the temptation to do lots of things in templates. As part of this simplicity I decided that there would be no 'if' conditional operator.

(There is a slightly subtle reason for avoiding this. If you have no 'if' and no looping, you can do template expansion one operator at a time with text substitution because you need no lookahead; you just find the next operator, expand it on the spot, and insert the expansion. If you add 'if' or loops, you now have a pattern of '<if ...> text <else> other text <endif>' and you need some parsing.)
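Here is a minimal sketch of how simple substitution-only expansion can be; this is hypothetical Python, not DWiki's actual expander, and it assumes operators are written as '@name@':

    import re

    OP_RE = re.compile(r'@([a-z]+)@')

    def expand(template, ops):
        # With no 'if' and no loops there is no nesting and no
        # lookahead: find each operator, expand it on the spot.
        return OP_RE.sub(lambda m: ops[m.group(1)](), template)

    print(expand('Written by @author@ on @date@.',
                 {'author': lambda: 'cks',
                  'date': lambda: '2013-12-19'}))

The moment you add '<if ...> ... <else> ... <endif>', expansion has to find the matching '<else>' and '<endif>' before it can emit anything, which means real parsing.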

Of course I wound up needing conditional stuff anyways, so I invented conditional template inclusion and some other tricks. This preserved the simplicity of my template expansion code but had another drawback: instead of templates with conditional logic, I wound up with an explosion of templates, because every separate 'if' clause basically had to become a separate (conditionally included) template.
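A sketch of conditional template inclusion (hypothetical again; DWiki's real conditions are more involved, as the aside below admits):

    import os

    def include_if(context, cond, tmplname, tmpldir='templates'):
        # The condition doesn't live in the template; it decides
        # whether a whole separate template file is used at all.
        if not context.get(cond):
            return ''
        with open(os.path.join(tmpldir, tmplname)) as fp:
            return fp.read()

Every branch of what would have been a single in-template 'if' becomes its own (conditionally included) file on disk.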

(You don't want to know how the actual conditions are implemented. I swear, it made sense at the time.)

An explosion of templates is not a usability improvement over conditional logic in templates. Rather the contrary, because it's much harder to see and follow things across five or six separate files than it would be if everything was in one place. In the end almost any templating system is going to have some form of conditionality; the only question is where it emerges and how ugly the result is. I've come around to the view that having actual 'if' conditional operators is the least bad way even if they do complicate various things and risk your templates becoming increasingly hard to follow.

(And I know, real templating systems have dealt with all of these issues long ago and for any real project you should almost certainly not try to write your own templating system but adopt an existing one, which renders all of this moot. DWiki was a special case for all sorts of reasons.)

TemplatesNeedConditionals written at 02:46:14

2013-12-14

Why I'm not likely to use Chrome much in the future

I'll start with the story. For a while now, support for mouse gestures has been an important part of browsing for me. In Firefox I currently use FireGestures, while in Chrome I used to use something called Smooth Gestures. Smooth Gestures started out as the same sort of completely free extension as FireGestures. After a while it grew a couple of sorts of ads that defaulted to on (and got reset that way periodically) but could be turned off if you wanted (in a typical move, you got a vague guilt trip about turning them off). Recently the developer of Smooth Gestures used Chrome's mandatory auto-update mechanism for extensions to push out an update that quietly makes some of the ads mandatory unless you pay for the extension. I uninstalled Smooth Gestures as soon as I found this out, as have quite a lot of people, and then I went looking for a replacement. What I found (especially among high-ranked, apparently functional gesture extensions) is almost entirely things like Smooth Gestures or worse (one explicitly harvests theoretically anonymous information about your browsing, for example).

While the lack of gestures provides a good reason to not be fond of Chrome, that's not really why I now feel uninterested in it. The real problem is that this is a sign of a drastic difference in the culture of extensions between Chrome and Firefox, one that is very much not in Chrome's favour. To simplify things, Firefox started with a genuine FOSS extensions culture and as far as I can tell has mostly retained it. I don't know if you can charge for extensions now, but for a long time you couldn't, and a great many highly used and core extensions are fully free and open source. By contrast, Chrome sure seems to have what I will call an 'app store' culture of extensions. You get Chrome extensions through the 'Chrome Web Store', for example, and as we've seen many of them are no more free than the 'free' apps in the iOS and Android app stores. The behavior of the Smooth Gestures developers is perfectly in line with the norms of app store culture.

What this means to me is that I can't trust Chrome extensions any more. Even if I carefully inspect the documentation for an extension and it isn't lying about being harmless today, the mandatory silent auto-updates of extensions can change that tomorrow if the developer wants, and the Chrome extension culture clearly condones and possibly even approves of this. I rather expect that to a lot of Chrome extension developers I am a resource to be monetized, not someone to be respected. Given Google's general behavior I can't count on them policing this even if I thought they were interested in doing so, and I don't think they are (to put it one way, people who run a 'Web Store' do not make money from refusing to list things in their store).

I won't pretend that Firefox is immune from an individual extension trying to sneak something evil in (although I believe Mozilla has some sort of auditing procedure). However I fundamentally trust the Mozilla people to be on my side in a way that I don't for Chrome. I'm pretty sure that if any equally popular Firefox extension tried to pull what Smooth Gestures did there would have been a huge uproar and they might well have been thrown off addons.mozilla.org for misleading and sleazy behavior, either immediately or in response to community pressure. And pragmatically I think the culture of Firefox extension developers is such that people who would do this sort of stuff don't develop extensions for Firefox in the first place.

I still have a couple of Chrome extensions left and I'll probably keep them around. But my use of Chrome is now going to pretty much be only for my Chrome Incognito hack.

ChromeWhyNot written at 02:11:09

