2006-03-28
Using threading to implement a 'busy' cursor (a tale from long ago)
A very long time ago, I spent a year being an Amiga programmer as part of a small team writing an application. We wanted it to have a busy mouse cursor (the general hourglass 'this is going to take a while' cursor good apps put up during long operations), but what with one thing and another, we wound up with two problems:
- we weren't actually sure what user actions would take a long time.
- we got to 'implement a busy cursor' rather late in the whole process.
This left us in a kind of pickle; we couldn't just mark all of the slow operations, and there was no central place to flip to a busy cursor if things were taking too long. In fact, the application didn't even process events during long actions. But the Amiga was a novelty at the time: a personal computer with a preemptive multitasking environment. This led us to a semi-ugly hack: put the busy cursor handling in a second thread.
We added a little 'I am not busy' spot in memory that got flipped on every time we went through the main event loop. A second thread periodically checked the flag and flipped it off; if the flag was already off, it changed the cursor into the 'busy' cursor. Later, the main event loop flipped the cursor back to normal if necessary. The preemptive multitasking took care of everything else.
(The actual implementation was slightly more complicated than this.)
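To show the shape of the idea, here's a rough modern sketch in Python (our original was Amiga code, not Python, and get_next_event(), handle(), and set_cursor() here are just stand-ins for the real event loop and the Amiga's cursor calls):

    import threading
    import time

    CHECK_INTERVAL = 0.25      # how often the watcher looks; an arbitrary value

    not_busy = True            # the 'I am not busy' flag, set by the main loop
    cursor_is_busy = False

    def set_cursor(busy):
        # Stand-in for the real 'change the mouse cursor' call.
        print("cursor ->", "busy" if busy else "normal")

    def busy_watcher():
        global not_busy, cursor_is_busy
        while True:
            time.sleep(CHECK_INTERVAL)
            if not_busy:
                # The main loop ran recently; clear the flag and look again later.
                not_busy = False
            elif not cursor_is_busy:
                # The main loop hasn't run for a whole interval: go busy.
                cursor_is_busy = True
                set_cursor(True)

    def main_event_loop():
        global not_busy, cursor_is_busy
        while True:
            event = get_next_event()     # stand-in for the real event source
            not_busy = True
            if cursor_is_busy:
                cursor_is_busy = False
                set_cursor(False)
            handle(event)                # this is what may take a long time

    # Started once at program startup; the preemptive multitasking (here, a
    # thread) takes care of the rest.
    threading.Thread(target=busy_watcher, daemon=True).start()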
I call this 'semi-ugly' because of the time lag between a long operation starting and the cursor turning into the busy cursor. In the worst case your operation would finish just after the cursor had gone busy, giving the cursor a kind of time-lag stutter. However, the thread-based implementation was small and really simple, requiring basically no changes to the main application, which meant that we at least got some sort of busy cursor into the program.
I've always had a fondness for this hack; it was the first time I really used multitasking, and it could only have been done on the Amiga.
2006-03-23
Side effects are bad: a personal illustration
One of the things they tell you in programming school is 'side effects are bad; you should avoid them if possible'. Apparently every now and then I need a reminder of this, because side effects were at the bottom of my feed whoops from last night.
One of the most expensive bits of DWiki is converting pages from DWikiText into HTML. Originally the conversion was simple and straightforward: stick wikitext in, get HTML out. But as DWiki grew more features, the wikitext started carrying additional information: whether people could see it, whether people could comment on the page, and so on. These days converting a page to HTML generates all of:
- the HTML
- whether the page restricts access and/or allows comments
- various fuzzily defined features of the page
- the title of the page, if any
The conversion routine returned the HTML and recorded everything else by
manipulating a per-request context structure; in other words, through
side effects.
One of the problems with side effects is that you forget them, which is just what I did when I added caching of the conversion process. I remembered the other two side effects and cached them (partly because I'd been working in the area recently), but forgot about page titles. When the cached data was restored, page titles weren't set and things reverted to default titles in feed entries.
(One of the reasons I forgot about the page titles is the other problem with side effects, namely that they're usually scattered over the code instead of centralized in one spot. If the renderer had returned all of the results in one structure and broken them up later, I might not have wound up in the pickle I did.)
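To make the contrast concrete, here's a schematic Python sketch of the two styles; the names (render_page, RenderResult, _do_render, ctx) are invented for the illustration rather than being DWiki's actual code:

    def _do_render(wikitext):
        # Trivial stand-in for the real DWikiText-to-HTML conversion.
        return ("<p>%s</p>" % wikitext, None, set(), None)

    # Side-effect style: results are scattered into the request context,
    # and it's easy to forget one of them when you later add caching.
    def render_page_sideeffects(wikitext, ctx):
        html, title, features, restrictions = _do_render(wikitext)
        ctx['restrictions'] = restrictions   # side effect
        ctx['features'] = features           # side effect
        ctx['title'] = title                 # the side effect I forgot to cache
        return html

    # Explicit-result style: everything the renderer produced travels together,
    # so a cache naturally stores and restores all of it at once.
    class RenderResult(object):
        def __init__(self, html, title, features, restrictions):
            self.html = html
            self.title = title
            self.features = features
            self.restrictions = restrictions

    def render_page(wikitext):
        html, title, features, restrictions = _do_render(wikitext)
        return RenderResult(html, title, features, restrictions)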
Update, April 1st: I just discovered another side effect I had
forgotten; updating the timestamp for the request's Last-Modified:
header. Oh the irony; apparently I like playing April Fools' jokes
on myself.
2006-03-20
An optimization thought
Today's random optimization thought, brought to my mind by the process of trying to make DWiki run slightly faster:
If you can't make it faster, maybe you can call it less.
Caches are the degenerate brute force case of this, but program restructuring is better. Repeatedly calling the same core routine can be a sign that the upstream interfaces aren't really doing what higher levels want, so the higher levels have to do redundant work.
Besides, if you can figure out a way to call something less you don't have to worry about cache invalidation.
For example, DWiki's routine to get a page from the file store used to
first call store.exists() to see if the page existed, and then call
store.get() to retrieve a low-level file object. Since the file store
is deliberately paranoid for security reasons, this
resulted in redundant calls to the 'is this a valid filename?' routine
from the two file store routines. Some sort of cache would have been
the wrong answer; the right one was to fiddle the interface to .get()
so the higher level 'get a page' routine could stop calling .exists()
beforehand.
(The 'is this a valid path' routine is simple and small, but it got called enough to turn up in profiles. DWiki can do a lot of path lookups; with a couple of tweaks, a typical page dropped from 144 calls to 105.)
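In rough Python terms (the names and details here are illustrative, not DWiki's actual code), the interface change looked something like this:

    # Before: two store calls per page, and each one re-validates the path.
    def get_page_old(store, path):
        if not store.exists(path):     # 'is this a valid path?' check #1
            return None
        return store.get(path)         # 'is this a valid path?' check #2

    # After: .get() itself signals a missing (or invalid) page, so one call
    # and one path validation is enough.
    def get_page_new(store, path):
        return store.get(path)         # returns None if there's no such page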
2006-03-18
Some little things Firefox gets right
One of the marks of good software is that it understands how users think it works, and makes itself work that way. Two little bits of how Firefox works make a good illustration of this.
Start with Control-+, the keyboard shortcut for increasing the font size. You may have hit this all the time, but let me ask you: do you type it with Shift? The '+' on the top number row is a shifted key; without Shift, you have actually been typing Control-=. Firefox is specifically programmed to recognize Control-= in addition to a real Control-+ (as you might type using the numeric keypad).
(In fact the irony grows; Firefox on Unix does nothing in response to the nit-pickingly correct Shift-Control-= version. This shows how infrequently people actually type it 'right'.)
A bigger example is Control-W, the standard shortcut for 'close window'. When I first tried out tabs I opened up a number of them, switched to one, finished reading it, and automatically closed it with Control-W. It wasn't until somewhat later that I realized how nice it was that Firefox hadn't just closed the entire window, as it nominally had every right to do.
This shows that Firefox has the right conceptual model of Control-W's behavior; to people, it's not 'Close Window', it's 'get rid of this web page'. So when you're using tabs, the right thing to do is close the tab, not the entire window, and that's what Firefox does.
In both of these cases, Firefox is not sticking to a strict model of its behavior. Control-+ is not actually Control-+; Control-W is not always 'Close Window'. Instead it's going with how users think it works, and the result is much nicer.
(Yes, I know Firefox changes the 'Close' label in the File menu when you have tabs open. That's just another of those little things.)
2006-03-10
Why dynamic linking
Originally, shared libraries and dynamic linking were sold as the way to make (newly) bloated libraries tolerable: you'd only have one copy of the library on disk and in memory, no matter how many Motif applications you had running. (Motif in particular, and GUI libraries in general, were the original poster children for shared libraries.)
As R. Francis Smith notes in his comment on my previous entry, these reasons are not entirely relevant any more. Disk space is cheap, and (although I haven't measured) memory usage by libraries is not likely to be the big memory consumer for modern applications.
One big reason shared libraries and dynamic linking persist is simply that they're already there. Having gone through all the effort to implement them, no one is going to stop using them. (There's a general lesson here.)
But there are three big things that shared libraries are good for these days:
- they are the underpinnings of dlopen(), which has significant uses in its own right.
- one-step fixes for bugs and security problems in library routines.
- picking optimized CPU- and system-dependent routines at runtime.
The latter two are examples of runtime substitution, in these cases of
non-buggy functions and of optimized functions respectively. Runtime
substitution has also been used as a debugging aid, for example to
transparently add a tracking malloc() to programs that are suspected
of having memory leaks, and there are probably applications using it
to pull off other tricks.
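As a small aside, you don't need to write C to poke at the dlopen() side of this; Python's ctypes module uses dlopen() under the hood on Unix, so a quick sketch of loading a shared library at runtime and calling one of its functions looks like this (the library name libm.so.6 is the usual one on Linux; other systems differ):

    import ctypes

    # ctypes.CDLL() is a thin wrapper over dlopen() on Unix systems.
    libm = ctypes.CDLL("libm.so.6")

    # Tell ctypes about the C function's signature before calling it.
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double

    print(libm.cos(0.0))    # prints 1.0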
The dynamic linking tax on fork()
Most people will tell you that dynamic linking is an unalloyed good, or at least that any effects on performance are small (for simple programs that don't cascade to a huge set of shared libraries). This isn't necessarily so.
A long time ago, back when dynamic linking was new and people were
suspicious of it, I conducted some timings of how dynamic linking
affected the speed of fork(). To my surprise, the impact was
significant, and on the hardware of the day it was actually worth
static-linking my shell.
Today, I dug up my test program from 1991 (which just repeatedly fork()s, has the child immediately exit(), and the parent wait()s for the child), and measured how much slower the dynamically linked version ran on various systems that I have convenient access to. The results are:
- Solaris 9 on an Ultra 10: 2.82 times worse.
- FreeBSD 4.10 on an SMP Pentium II: 2.12 times worse.
- FreeBSD 3.4 on an SMP Pentium III: 2.48 times worse.
Linux needs a table:
| Kernel | CPU | How much worse |
| 2.6.14.4 | Athlon | 3 times |
| 2.6.14.4 | SMP Pentium III | 2.45 times |
| RHEL 4 2.6.9-derived | 64-bit SMP Opteron | 2.72 times |
| FC4 2.6.13-derived | Pentium 4 | 1.70 times |
| 2.4.32-rc1 | Pentium III | 1.84 times |
As an experiment, I statically linked the program with dietlibc as well as glibc on the two 2.6.14.4 machines. On the Athlon the dietlibc version was 7.7% faster; on the SMP P3 it was 10% faster. (The ratios in the table are against the static glibc version.)
I'm surprised that the SMP machines didn't pay a worse penalty than uniprocessor machines. It's annoying that Linux still has about the same penalty as Solaris, but on the other hand Linux forks pretty fast to start with; it's in the hundred microsecond range even for dynamically linked code.
One anomaly is that odd things seem to start happening on the FreeBSD 4.10 machine as the number of forks gets higher and higher; the execution times don't scale the way they should. (It's possible that some sort of PID or resource wraparound issue is responsible.)
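For what it's worth, the test program's shape is simple enough to sketch. Here's a rough Python equivalent; the 1991 original was a small C program, and a Python version mostly measures Python's own fork overhead, so this shows the structure rather than producing comparable numbers:

    import os
    import sys
    import time

    count = int(sys.argv[1]) if len(sys.argv) > 1 else 10000

    start = time.time()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, doing as little work as possible.
            os._exit(0)
        # Parent: wait for the child before forking the next one.
        os.waitpid(pid, 0)
    elapsed = time.time() - start

    print("%d forks in %.3f seconds (%.1f microseconds per fork)"
          % (count, elapsed, elapsed * 1e6 / count))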