Wandering Thoughts archives

2006-05-27

Today's dilemma: wiki page or blog entry?

As time goes by, there's an increasing amount of stuff I want to write down. One of the things that stalls me out on this (apart from the sheer effort of writing) is the question of where and how I should put it; in particular, should I make a wiki page or write a blog entry?

In theory this shouldn't come up; WanderingThoughts is part of a hybrid wiki-blog, which I've extolled the virtues of before. In practice this is an illusion; the real difference between wikis and blogs isn't in the technology, it's in expectations and how pages are connected to each other. The two worlds cannot be so neatly joined, because they work in ways that clash.

Blog pages are only weakly connected to each other (sometimes they're only linked by time). This makes them easy to create but hard to navigate through to find anything as they grow. Categories, tags, and 'related entries' setups are all attempts to layer on something like the strong web of connections wikis have between their pages.

(Partly this happens because blog entries don't have to be even conceptually connected to each other, whereas wiki pages usually exist precisely because of their connections to other wiki pages.)

This is where the expectations come in. Blog entries are points in time (partly because the primary presentation is by time), which makes revising one significantly over time feel more than somewhat odd. The way you revise a blog entry to account for a better understanding of the subject is to write a new one (and maybe add an 'Updated: see ...' to the old one).

The much lower friction for blogging creates a constant temptation for me to just write stuff down as blog entries instead of putting up a wiki page and hooking it into the overall structure of CSpace. The result is that some entries that have long-term interest are probably being lost in the darkness, because there is no really good navigation to them.

(Yeah, yeah, 'set up better navigation'. There are always more things to write, which is part of the problem; it's easy to walk away from a blog entry once you've written it. Possibly writing this entry will help prod me onto the thorny path of virtue, too.)

WikiPageVsBlogEntry written at 03:31:16

2006-05-26

The problem with treating RAID arrays as single disks

A lot of hardware RAID systems (whether controller based or SANs) like to present a multi-disk RAID array to the host operating system as a single device. While attractive, this can lead to hard-to-diagnose performance issues under load (well, under a random IO load).

The basic problem is that you get the best performance by keeping all of your disks busy. But when you aggregate multiple disks into what the operating system sees as one disk, the operating system is going to schedule IO for one disk; the result is not necessarily even loading on all of the actual disks in the array.

The usual approach is to use SCSI TCQ or the equivalent to push as many outstanding IO requests into the array as possible, and let the array schedule them internally. The problem is that this is counting on statistics to keep things balanced, because the array can't selectively accept TCQ requests. If a disk backlogs, more and more IOs for it will pile up in the array's pool, potentially choking out other drives.

It may not take much of a hiccup, either, because arrays often have surprisingly low limits for how many outstanding commands you can push to them (partly because the controller itself has to store all the outstanding commands). SANs may suffer the most from this, since they often have to split the controller's pool over multiple hosts and multiple arrays.

(In general you may have remarkably low per-disk numbers; a 16-drive array set for 64 outstanding commands is only averaging 4 outstanding commands per drive, for example.)
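As a rough illustration of both the arithmetic and the backlog effect, here is a toy model in Python. Everything in it is an assumption made for the sketch rather than a description of any real array: the host refills every free slot in the shared pool with perfectly even round-robin targeting, healthy drives finish one command per tick, and one hypothetical slow drive finishes one command only every tenth tick.

```python
def per_drive_average(pool_size, drives):
    """Average outstanding commands per drive if the pool were spread evenly."""
    return pool_size / drives


def simulate_backlog(pool_size=64, drives=16, slow_drive=0,
                     slow_every=10, ticks=200):
    """Return the per-drive queued command counts after `ticks` steps.

    Toy model only: all of the parameters are illustrative assumptions.
    """
    queued = [0] * drives
    next_target = 0  # the host spreads its requests evenly, round-robin
    for tick in range(ticks):
        # The host refills every free slot in the shared pool. The array
        # cannot refuse a request just because it is aimed at a drive
        # that is already backlogged.
        for _ in range(pool_size - sum(queued)):
            queued[next_target] += 1
            next_target = (next_target + 1) % drives
        # Healthy drives finish one command per tick; the slow drive
        # finishes one only every `slow_every` ticks.
        for d in range(drives):
            if queued[d] == 0:
                continue
            if d != slow_drive or tick % slow_every == slow_every - 1:
                queued[d] -= 1
    return queued


print(per_drive_average(64, 16))
print(simulate_backlog())
```

In this model the slow drive ends up holding most of the 64 slots even though the host never sends it more than its fair share of requests, which leaves the fifteen healthy drives with very little to do.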

Even when controllers are capable of lots of outstanding commands, the operating system sees the array as a single disk, and many operating systems default to relatively low per-disk TCQ limits (because these are what make sense for real, physical disks). In fact a lot of OS level queues are often sized to be sensible for physical disks, and so may need expansion when your 'disk' is actually a big array.

Also, even if you successfully push lots of commands into the array, you've moved IO scheduling from the operating system to the array, which means that any smart IO scheduling the OS is trying to do is ineffective. (In extreme situations, ordering guarantees may require the OS to stall write IO to the array.)

While one might say that this is no different from how modern disks hide their internal geometry from the outside world, it's not; however complex they are internally, modern disks don't have multiple (fully) independent mechanisms. By contrast, disk arrays fundamentally need to be driven in parallel to deliver their performance.

I wish I had some nice pat conclusion for all this, but I don't. Having OSes see RAID arrays as single disks isn't going to go away any time soon, so all I can suggest is keeping your eyes open about the resulting issues.

RaidArraysAsDisksProblem written at 01:30:22

2006-05-14

Absolute versus relative URLs in syndication feeds

I just changed DWiki to generate absolute URLs for the '(N comments)' links at the bottom of entries in my Atom syndication feeds, instead of the absolute path URLs it used to generate (URLs without the http://host/ portion). This scrubs out the last non-absolute URLs in my syndication feed; URLs in the text of entries in the feed have always been absolute, because I'm cautious and cynical.

In theory absolute URLs are unnecessary in Atom entries, because Atom has rules for how to handle relative URLs. And if you believe that all feed readers properly implement those rules, I have a pony for you. In theory the programmers of the bad feed reader are bozos, because the Atom spec is clear; in practice, even with Atom, people creating syndication feeds have a choice between purity and having their feeds widely read. Using only absolute links is one of the aspects of that choice.

(The difference between Atom and RSS here is that with Atom, it's very clear who the bozo is.)

Back when I added syndication feeds to DWiki I made a choice to be pessimistic about feed readers getting relative URLs right all the time, and modified the DWikiText to HTML converter to generate absolute links for syndication feeds. The '(N comments)' link is generated separately, so I missed it; a problem report today validated my cynicism and pushed me to make this change.

(Depressingly, I believe the feed reader that had a problem was NetNewsWire 2.1; I had expected a bit better of it since it's well regarded.)
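A minimal sketch of the pessimistic approach, assuming the feed generator knows its own base URL: rewrite every href and src in the generated HTML into an absolute URL before it goes into the feed. This is not DWiki's actual code; the function name, the regular expression, and the example.org URLs are all made up for illustration, and a real converter would more likely produce absolute links while rendering than patch them up afterwards.

```python
import re
from urllib.parse import urljoin

# Matches double-quoted href/src attributes only; a simplifying assumption.
LINK_ATTR_RE = re.compile(r'(href|src)="([^"]*)"')


def absolutize_links(html, base_url):
    """Return html with every href/src value resolved against base_url."""
    def fix(match):
        attr, url = match.group(1), match.group(2)
        # urljoin leaves already-absolute URLs untouched.
        return '%s="%s"' % (attr, urljoin(base_url, url))
    return LINK_ATTR_RE.sub(fix, html)


fragment = '<a href="/blog/SomeEntry?showcomments">(2 comments)</a>'
print(absolutize_links(fragment, "http://example.org/blog/"))
# prints: <a href="http://example.org/blog/SomeEntry?showcomments">(2 comments)</a>
```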

Other people feel differently, and deliberately stick to their guns in order to push the technology forward and so on. For example, Tim Bray uses fully relative URLs for the images in his feeds combined with XHTML and xml:base declarations (themselves relative to his feed's URL); the result is a nice test of proper XHTML and XML handling in feed readers. (Some fail, Liferea included, but this encourages people to get them fixed.)
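To see why this is a demanding test, here is a sketch of what a feed reader has to do to resolve such an image URL: start from the feed's own URL, apply each enclosing xml:base in turn (each one may itself be relative), and only then resolve the reference. The URLs are invented for the example and are not taken from any real feed.

```python
from urllib.parse import urljoin


def resolve_reference(feed_url, xml_bases, reference):
    """Resolve reference the way nested xml:base declarations require.

    xml_bases lists the xml:base values on the enclosing elements,
    outermost first; each is resolved against the base in effect so far.
    """
    base = feed_url
    for xml_base in xml_bases:
        base = urljoin(base, xml_base)
    return urljoin(base, reference)


feed = "http://example.org/feeds/atom.xml"

# A reader that honours xml:base:
print(resolve_reference(feed, ["../blog/2006/"], "pics/photo.png"))
# prints: http://example.org/blog/2006/pics/photo.png

# A reader that ignores xml:base and uses only the feed URL:
print(resolve_reference(feed, [], "pics/photo.png"))
# prints: http://example.org/feeds/pics/photo.png
```

A reader that skips the xml:base step still produces a plausible-looking URL, just the wrong one, which is why the failure shows up as broken images rather than as an error.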

AbsoluteUrlsInFeeds written at 21:15:58

2006-05-04

A subtle advantage of simple wikis

One of the interesting things about programming DWiki is how it's wound up making me notice and appreciate the subtle cleverness of Ward Cunningham's original simple wiki design, in particular the choice of a flat page namespace. Many of the advantages are relatively technical but there are more abstract ones, one of which I stumbled into today:

You do not have to taxonomize information before you put it into a flat wiki; you just have to give it a reasonably good name.

When you have a page hierarchy of some sort, you have to worry about just where something new goes. And because cool URLs are permanent, you really want to think about it before you create the page, so you can put it in the right URL. In the extreme case you may have to create an entire hierarchy before you make one new page.

A flat page namespace doesn't have this issue; you just make up a decent name and go. You can layer taxonomies on top of it, but they're necessarily more fluid; more tags than hierarchies. And that means they can be added afterward, so you don't have to worry about them up front. Less worrying and planning, more actual writing.
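The idea can be sketched as a small data structure. This is a hypothetical illustration, not how any particular wiki stores its pages: pages live in one flat name-to-text mapping, and tags are a separate index that can be added or changed later without touching a page's name (and so without touching its URL).

```python
class FlatWiki:
    """Toy flat-namespace wiki with tags layered on top."""

    def __init__(self):
        self.pages = {}  # page name -> page text
        self.tags = {}   # tag -> set of page names

    def add_page(self, name, text):
        # Creating a page needs nothing but a reasonably good name.
        if name in self.pages:
            raise ValueError("page already exists: " + name)
        self.pages[name] = text

    def tag(self, name, *labels):
        if name not in self.pages:
            raise KeyError(name)
        for label in labels:
            self.tags.setdefault(label, set()).add(name)

    def untag(self, name, label):
        self.tags.get(label, set()).discard(name)

    def tagged(self, label):
        return sorted(self.tags.get(label, ()))


wiki = FlatWiki()
wiki.add_page("RaidArraysAsDisksProblem", "text of the page")
# The taxonomy comes afterward, and can be revised freely.
wiki.tag("RaidArraysAsDisksProblem", "sysadmin", "storage")
wiki.untag("RaidArraysAsDisksProblem", "sysadmin")
print(wiki.tagged("storage"))
```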

(I still don't regret using a directory hierarchy in DWiki; it makes sense for what I want. But I can definitely appreciate the shade of the grass on the other side of the fence.)

SimpleWikiAdvantage written at 02:25:22



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.