Wandering Thoughts archives

2011-04-16

Cache validators versus cache invalidation

There are two general ways of handling out of date cache entries: cache invalidation, where you explicitly mark them invalid or remove them, and having validators, so that you check to see whether an entry is valid as you retrieve it from the cache.

Cache invalidation has a number of advantages. First, it's clearly superior to validators if you're running into the size limits of your cache, because it immediately frees up now-dead entries instead of leaving them around to steal space from the live entries that are actually useful. It's also probably more resource efficient, since it handles invalidation once (on write) instead of checking every time on reads. In most situations, cache entries are read more than they're written, ideally a lot more times; otherwise you're probably wasting cache space.

Validators have two potentially significant advantages, however. First, a validator only requires you to track what components got used when you build a page (or a fragment, or so on). This is a far easier problem than knowing everything that depends on a particular component (so that the component can invalidate the cached versions of everything that uses it when it changes), partly because you naturally discover this information in the process of building a page or fragment.

The second is kind of an anti-requirement; with cache invalidation you have to be able to take immediate positive action on changes. If changes can happen behind your back, outside of your system, this doesn't work too well. Validators cope with this situation.

(To borrow from something I read via Hacker News recently, cache invalidation is edge triggered while validators are level triggered. Edge triggered systems require you to catch every edge transition and bad things happen if you can't.)
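As a rough illustration of the validator approach, here is a minimal sketch in Python. It assumes that every component is a file and that a file's modification time and size are a good enough validator; the class and method names are made up for this example and are not taken from DWiki.

```python
import os


class ValidatedCache:
    """A cache whose entries carry a validator that is checked on every read."""

    def __init__(self):
        self._entries = {}  # key -> (validator, value)

    @staticmethod
    def _validator(paths):
        # Assumed validator: (path, mtime, size) for every file that was used.
        result = []
        for path in sorted(paths):
            st = os.stat(path)
            result.append((path, st.st_mtime_ns, st.st_size))
        return tuple(result)

    def store(self, key, paths, value):
        # 'paths' is whatever we discovered we used while building 'value'.
        self._entries[key] = (self._validator(paths), value)

    def fetch(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        validator, value = entry
        paths = [path for path, _mtime, _size in validator]
        try:
            current = self._validator(paths)
        except OSError:
            current = None  # a component disappeared behind our back
        if current != validator:
            # Stale entry: drop it now that we have noticed.
            del self._entries[key]
            return None
        return value
```

Note that nothing here has to know who depends on a given file, and a stale entry stays in the cache until someone next asks for it. That is the space cost described above.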

PS: since DWiki is file-based, almost everything happens behind its back and so it has to use validators for its cache.

(This is a secondary reason why requiring an explicit publication step is on my list of things I would do differently in a new file based blog.)

CacheValditorsVsInvalidation written at 01:22:18

2011-04-08

An example of modifying a Firefox extension

I've written vaguely a while back about modifying Firefox extensions. I just needed to do this again, so this time I am going to write down a much less vague and more detailed discussion of what I did.

The basics are that I modified the current version of All-in-One Gestures to add new 'View page in no style' and 'View page in basic style' actions that could be associated with a gesture (these are normally accessible through View → Page Style). I wanted this because I use 'view page in no style' fairly frequently to deal with websites with bad formatting, bad colour choices, and so on.

The procedure I followed goes like this:

  • find the directory for this extension, following the outline in the previous quick intro.
  • find the actual .jar file, which in this case was chrome/allinonegest.jar.
  • copy the .jar to /tmp and unpack it in a subdirectory, in this case /tmp/b.

  • look through the resulting collection of files for likely places to find the code for actions. I settled on content/allinonegest/gestimp.js, which turned out to be what I wanted; it had a big array of the available actions with the code that they ran. However, it did not have the actual names of the various actions, just labels for them like g.pasteAndGo.

    Having found the right place, I added the basic skeleton of an implementation in the form of two table entries to call aioNoStyle() and aioBasicStyle() functions, and stub implementations of those two functions.

    (The 'aio...' names are the style of function naming that was already in use in gestimp.js. When I'm hacking stuff into foreign code like this I tend to follow its style whenever possible.)

  • grep through all of the extension's files to find out where the names of the actions were defined, using one of the existing action labels. This turned up a bunch of locale files for various languages, of which I only cared about locale/en-US/allinonegest/allinonegest.properties.

    Having found the right place, I added appropriate text names for my new actions.

    (If I was doing this for general use I would need to figure out what to do in other locales. Since the only locale I use myself is en-US, I don't care and I can take a shortcut.)

  • it turned out that I wasn't quite done yet, because this didn't make my new additions appear in All-in-One Gestures' preferences system. Searching through the extension's files once again for an existing action label turned up content/allinonegest/pref/customize.js, which had a big array (again) of all of the available actions. So I added my new actions to this array.

  • now I needed an actual implementation of turning off and on the page styles. The easiest way to do this was to steal it from the existing Firefox code; in order to find that code, I worked backwards from the menu name using my copy of the Firefox source.

    • search through the Firefox source for where the text 'No Style' occurs. This pointed to browser/locales/en-US/chrome/browser/browser.dtd, and inspection of that showed that the label for this text was pageStyleNoStyle.
    • search through the Firefox code again for where pageStyleNoStyle is used, which turned up browser/base/content/browser-menubar.inc. Inspection of this showed that this menu item had JavaScript defined that simply called setStyleDisabled(true);.

    (As a cautious check I then found setStyleDisabled()'s implementation and verified that it looked usable and, in this case, simple.)

  • having found what I needed to do, I modified gestimp.js to change my stub implementations of aioNoStyle() and aioBasicStyle() to just call setStyleDisabled() with appropriate arguments. (I correctly guessed that setStyleDisabled(false); would do the 'view page in basic style' action.)

  • repack the jar by running zip -f ../allinonegest.jar in /tmp/b.
  • install the new .jar by quitting Firefox, making a backup copy of the original .jar, copying the new .jar into the All-in-One Gestures extension directory, and restarting Firefox.
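The unpack and search steps above can also be scripted. This is a minimal sketch in Python, relying only on the fact that a .jar is an ordinary zip archive; the function names are invented for this example.

```python
import os
import zipfile


def unpack_jar(jar_path, dest_dir):
    """Unpack a .jar (an ordinary zip archive) into dest_dir."""
    with zipfile.ZipFile(jar_path) as jar:
        jar.extractall(dest_dir)


def files_mentioning(root, label):
    """Return the relative paths of files under root that contain label."""
    needle = label.encode("utf-8")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Read as bytes so that odd encodings in locale files don't matter.
            with open(full, "rb") as f:
                if needle in f.read():
                    hits.append(os.path.relpath(full, root))
    return sorted(hits)
```

With the paths from this entry, unpack_jar("/tmp/allinonegest.jar", "/tmp/b") followed by files_mentioning("/tmp/b", "g.pasteAndGo") would play the role of the grep steps.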

With all of this done I could call up the addon preferences for All-in-One Gestures and add actual gestures for my new actions.

(As you might guess, this actually took two iterations; on the first iteration I didn't know that I needed to modify a third file to make my new actions appear in the AiO preferences.)

After I did all of this I unpacked a pristine copy of the extension in /tmp/a and made myself a diff of the two trees for future use, for example in case All-in-One Gestures ever has another official release and I need to re-apply my modifications.
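A diff of two unpacked trees is normally a job for the standard diff command, but for completeness here is a rough Python equivalent built on difflib. The function names are invented for this sketch, and it assumes the files are text that ends with a newline.

```python
import difflib
import os


def _relative_files(root):
    """Collect the paths of all files under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def _read_lines(path):
    # A missing file is treated as empty, so added or removed files show up.
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.readlines()


def tree_diff(old_root, new_root):
    """Return a unified diff, as one string, of the files that differ."""
    out = []
    for rel in sorted(_relative_files(old_root) | _relative_files(new_root)):
        old_lines = _read_lines(os.path.join(old_root, rel))
        new_lines = _read_lines(os.path.join(new_root, rel))
        if old_lines != new_lines:
            out.extend(difflib.unified_diff(
                old_lines, new_lines,
                fromfile=os.path.join("a", rel),
                tofile=os.path.join("b", rel)))
    return "".join(out)
```

Here tree_diff("/tmp/a", "/tmp/b") would produce the record of the modifications.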

(Sadly, AiO appears to be more or less abandoned by its original author. This is a real pity and is one of the reasons that I don't feel all that enthused about Firefox 4.)

AllInOneCustomization written at 23:59:02

2011-04-04

Why logging to syslog is different than logging to standard error

In the previous entry I wrote that logger is not a good solution for converting a program that logs to standard error into a system that logs to syslog. In fact, the issue is more general than just logger; logging to standard error is sufficiently different from logging to syslog that it is hard to turn the former into the latter.

The issue is that syslog handles a bunch of important things for you that you have to handle yourself in logging to standard error. Syslog automatically attaches a timestamp, the program name, and the process ID (if you ask it to). Since all of these are important pieces of information, when you log to standard error you should be supplying them yourself in order to provide complete logging. If something then simply echoes your full log lines to syslog, you get at least duplicate information; the logs will have timestamps from both syslog and your own program.

(As far as I know there is no way to tell syslog to use your timestamps instead of its own.)

There are a number of solutions, none of them very good:

  • you can have a logger equivalent that knows about the format of your messages and so strips off the redundant information before feeding it to syslog. It's clearly specific to your program.

  • your program can skip putting timestamps (and perhaps other information) on log messages, leaving that to another program to do if necessary. You now intrinsically require a second program in order to get decent logs, even if you are only logging to a file. Better supply at least a pointer to such a program in your documentation, and expect sysadmins to be unhappy about your system no longer being a self-contained thing.

  • you can just have your program stutter, putting redundant information into its syslog messages. This makes syslog a second class citizen in your logging world.
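To make the first option concrete, here is a minimal sketch of a logger equivalent in Python. The standard error line format it strips ("2011-04-04 17:00:10 myprog[1234]: message") is purely an assumption for this example, which is the point: the pattern has to be rewritten for every program.

```python
import re

# Assumed stderr line format: "2011-04-04 17:00:10 myprog[1234]: message".
_PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "  # timestamp
    r"[\w.-]+(?:\[\d+\])?: "                    # program name, optional [pid]
)


def strip_redundant_prefix(line):
    """Drop the timestamp and program tag that syslog will add again."""
    return _PREFIX.sub("", line.rstrip("\n"), count=1)


def forward_to_syslog(stream, ident):
    """Read log lines from stream and hand the bare messages to syslog."""
    import syslog  # standard library, Unix only

    syslog.openlog(ident, syslog.LOG_PID, syslog.LOG_DAEMON)
    for line in stream:
        syslog.syslog(syslog.LOG_INFO, strip_redundant_prefix(line))
```

Even this has a wart: with LOG_PID, syslog records the process ID of the forwarder, not of the program that actually produced the message.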

The net result of all of this is that it is much easier to start with a program designed to log to syslog and extend it to log to standard error than to take a program designed to log to standard error and bolt on an extension to divert the straight log messages to syslog.

(If you can change the program itself, you can do better. But the kind of people who have their program log to standard error and nothing else generally aren't interested in taking patches to add syslog logging. If you're lucky they might take patches to add an option to turn off timestamps, although now the main program configuration becomes dependent on how you're doing logging.)

Sidebar: the problem abstracted

What this boils down to is that log lines are not simply plain text. Instead they are data (the log message) plus some metadata (the time, the program name, the process ID, etc). Many variant logging systems will want to do custom things with various pieces of the metadata; for example, if you're pushing logs into a SQL database you probably want to put the bits of metadata into separate database fields.

The response to this view is that there is more useful metadata than this in typical log messages, so much metadata (sometimes) that there is no sensible way for the programs generating messages to split them up for you. Since you need some custom parsing no matter what, you might as well bite the bullet and do custom parsing of plain text to extract all of the metadata, even generic metadata like the timestamp.

I don't really like this response, but I think it's probably true in practice; I suspect that everyone who is routing log messages from standard programs into custom monitoring solutions has probably written their own parsing and analysis systems that are specific to the exact programs they're dealing with.

(Well, specific overall. I'm sure people write generic components such as a generic log message parsing program that is driven by pattern matching, but it uses program specific patterns that you write.)
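A minimal sketch of such a pattern-driven parser in Python follows. It assumes lines in the traditional syslog file layout ("Apr  4 17:00:10 host prog[pid]: message"); the program names and their message patterns are hypothetical and stand in for whatever you would write for your own programs.

```python
import re

# Generic metadata, assuming "Mmm dd hh:mm:ss host prog[pid]: message".
GENERIC = re.compile(
    r"^(?P<timestamp>\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) (?P<hostname>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$"
)

# Hypothetical program-specific patterns, keyed by program name.
PATTERNS = {
    "webapp": re.compile(r"^request (?P<request_id>\d+) took (?P<millis>\d+)ms$"),
    "mailer": re.compile(r"^delivered (?P<queue_id>\w+) to (?P<recipient>\S+)$"),
}


def parse_line(line, patterns=PATTERNS):
    """Split a log line into generic metadata plus program-specific fields."""
    generic = GENERIC.match(line.rstrip("\n"))
    if generic is None:
        return None
    record = generic.groupdict()
    specific = patterns.get(record["program"])
    if specific is not None:
        found = specific.match(record["message"])
        if found is not None:
            # Program-specific metadata goes in next to the generic fields.
            record.update(found.groupdict())
    return record
```

The generic part is reusable, but all of the interesting metadata comes from the per-program patterns, and those only get written by whoever cares about that program.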

SyslogVsStderr written at 17:00:10

Logs are not just streams

Via Hacker News, I recently read Logs Are Streams, Not Files. In it, Adam Wiggins suggests that because logs are just streams, your program should just write log messages to standard error as the most flexible, most stream-based way of operating on Unix.

I'm sorry, I'm a sysadmin. I just saw red.

Logs are not just streams. Logs are crucially different from streams in one respect; logs can and in fact must be segmented and sectioned periodically. In the sysadmin business, we call this 'log rolling' (or 'log rotation'), and we do it both because we don't have the disk space to keep all of your log messages forever and because it's much easier to work with relatively small chunks of logs that cover restricted time intervals than to have to work with everything from the start of time.

Writing log messages to standard error does not cope well with log rotation. To rotate a log, your program needs to stop writing to one file and start writing to another one, and standard error is simply a stream; even if we ask your program nicely, it cannot close standard error and then reopen it. This means that a program that logs to standard error cannot be deployed as a standalone entity; we cannot run your program as just 'program 2>/var/log/program', because that would give us no way to rotate /var/log/program. Instead, your program's output must be fed to another program that actually does the work of logging things in some manner.

This is sort of okay if you provide us with this extra program and the infrastructure to configure it and run it along with your main program. But if you simply hand us your main program and claim that it's ready to go, we see red because now we have to supply the program (and the infrastructure) to actually make your logging work.

(Perhaps Unix should have a standard program for this, but it doesn't. Note that logger is not a good option for this even if we want things to go to syslog.)
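To show what that extra program has to do, here is a minimal sketch in Python of a log writer that copes with rotation. It assumes rotation is done by renaming the log file out of the way, and it notices this by comparing device and inode numbers before each write; the class and function names are invented for this example.

```python
import os


class ReopeningLogWriter:
    """Append lines to a log file, reopening it after it is rotated away."""

    def __init__(self, path):
        self.path = path
        self._file = None
        self._open()

    def _open(self):
        self._file = open(self.path, "a", encoding="utf-8")

    def _rotated(self):
        # Rotated if the path is gone or now names a different file.
        try:
            on_disk = os.stat(self.path)
        except FileNotFoundError:
            return True
        ours = os.fstat(self._file.fileno())
        return (on_disk.st_dev, on_disk.st_ino) != (ours.st_dev, ours.st_ino)

    def write_line(self, line):
        if self._rotated():
            self._file.close()
            self._open()
        self._file.write(line.rstrip("\n") + "\n")
        self._file.flush()

    def close(self):
        self._file.close()


def copy_stream(stream, path):
    """Copy every line from stream (for example sys.stdin) into the log."""
    writer = ReopeningLogWriter(path)
    try:
        for line in stream:
            writer.write_line(line)
    finally:
        writer.close()
```

The main program's standard error would be piped into something that calls copy_stream(sys.stdin, "/var/log/program"). This is exactly the extra moving part that a program logging straight to a file, with its own reopen logic, would not need.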

LoggingAndStreams written at 00:24:56

