Wandering Thoughts archives

2009-01-31

When can you assume UTF-8 filenames?

Here is an interesting question: when is it safe to assume that all of the filenames on your Linux machine are encoded in UTF-8?

The simple answer: it's only safe to make this assumption when you (and your system) have lived entirely within a hermetically sealed UTF-8 only bubble, never coming into contact with filenames from outside that bubble. Now, this is a pretty big bubble and it is slowly expanding, given that basically all new Linux machines default to UTF-8, but it is still a bubble, and there are still lots of things outside it.

(Thus, if you are writing general software the actual answer is 'never'.)

Unfortunately this is an easy mistake to make. If you live within the bubble and are sufficiently far from its edges that they are out of sight, you can be ignorant of its existence (and many people probably are). And even if you aren't exactly ignorant of its existence, you can still be overly optimistic about the size of the bubble.

(It's also possible that I'm being overly pessimistic about the size of the bubble. But I don't think that UTF-8 only systems are anywhere near as universal as people would like them to be, and I do think that they are fragile; there are lots of ways for 'bad' filenames to seep into the bubble, including various programs that make no attempt to guess filename encodings and transcode filenames into valid UTF-8 when they unpack archives.)

Or in short: if everything you see is UTF-8, it is easy to assume that everything in general is UTF-8.
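A minimal sketch of the underlying issue in Python; the byte strings here are made-up filenames, not from any real system. The kernel treats filenames as opaque bytes, so something like os.listdir(b'.') can hand back names that simply do not decode as UTF-8:

```python
def is_utf8(name: bytes) -> bool:
    """Report whether a raw filename is valid UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 'café' encoded as UTF-8 versus as ISO-8859-1: the kernel happily stores
# both, but only the first survives a blanket 'filenames are UTF-8'
# assumption.
for name in (b"caf\xc3\xa9.txt", b"caf\xe9.txt"):
    print(name, is_utf8(name))
```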

(See also why the kernel shouldn't try to enforce UTF-8 only filenames.)

AssumingUTF8Filenames written at 00:37:10

2009-01-30

A surprising lack on Linux: browsers for camera RAW photos

Linux has a reasonably good set of open source programs for processing and editing photographs shot in the various camera RAW formats (ufraw is the one I have the most exposure to). But, to my surprise, it seems to be mostly lacking image viewers that I could use to browse through my pictures to pick out the ones that are actually worth processing. Or at least image viewers that meet my criteria, which include 'must not want to own the world'.

(This criterion disqualifies DigiKam and F-Spot, among others, although apparently at least F-Spot may have a mode that avoids that issue.)

Of the programs I've looked at so far:

  • ImageMagick can display RAWs, but only by running them through ufraw, which is both overkill and very slow.

    (It is overkill because basically all RAW formats already include a full-sized JPEG version of the picture, as well as a thumbnail. So a quick browser doesn't really need to understand and process all the actual RAW formats, it just needs to extract the JPEG version, for which there are well-developed libraries.)

  • ufraw has no browsing at all.
  • Raw Therapee and Rawstudio can both browse, but they're primarily processing and editing applications.

    (Also, Raw Therapee isn't open source, just free, and Rawstudio currently doesn't like my camera's RAWs.)

  • gthumb sort of mostly works but with various issues, as does geeqie (with more issues).
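The embedded-JPEG extraction mentioned above can be sketched as a naive scan for the JPEG start and end markers. Real tools (ufraw, dcraw, exiftool) follow the EXIF/TIFF offsets recorded in the RAW file rather than scanning, so treat this as an illustration of the idea, not production code:

```python
from typing import Optional

def extract_embedded_jpeg(data: bytes) -> Optional[bytes]:
    """Naively pull out the first JPEG (SOI..EOI marker span) in a blob.

    Real RAW files record the offset and length of their embedded JPEG
    in EXIF/TIFF structures; scanning for markers is just the simplest
    possible sketch of 'the JPEG is already in there'.
    """
    start = data.find(b"\xff\xd8\xff")  # JPEG SOI marker plus next marker byte
    if start == -1:
        return None
    end = data.find(b"\xff\xd9", start)  # JPEG EOI marker
    if end == -1:
        return None
    return data[start:end + 2]

# A fake 'RAW' blob with a fake JPEG embedded in the middle:
blob = b"RAW-HEADER" + b"\xff\xd8\xff\xe0JFIF-ish-payload\xff\xd9" + b"RAW-TRAILER"
print(extract_embedded_jpeg(blob))
```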

There are probably other RAW-capable image browsers and viewers that are packaged for Fedora, but this is where I make a grumpy observation that Fedora doesn't seem to have a web page that breaks down Fedora RPMs by category.

(Also, there's a mid-2008 discussion of this general subject here.)

RawBrowserLack written at 00:25:28

2009-01-29

An RPM packaging utter FAIL

Every so often, I run into examples of bad RPM packaging. Today's encounter takes the cake for utter packaging failure, though:

The RPM for VMWare Workstation 6.5.1 contains exactly one file, which the postinstall script runs and then removes. This one file is of course the regular self-extracting installer that VMWare gives you if you opt for the non-RPM version of VMWare Workstation, which means that it dumps all sorts of things over your filesystems, thereby completely defeating almost all of the purpose of installing an RPM in the first place.

(One of the things defeated is the ability to automate the installation of RPMs, because it quizzes you about agreeing to the license agreement. They went out of their way to enable this in the postinstall script.)

This is especially sad because previous versions of VMWare Workstation had reasonably well done RPM packages. I speculate wildly that VMWare felt that it was now too much work to package things up properly, but if that was so I wish they would just stop having an RPM package entirely; it would be more honest, and I'd know what I was getting into ahead of time.

(I disagree about the amount of work, as you might expect. If you are packaging something in any way, you ought to have a list of the files that get installed and where they get installed to, and that is all you need to have in order to generate a sensible RPM. Just run your installer and package up the extracted files using the list.)
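For illustration, a minimal sketch of such a spec file; every name and path here is hypothetical, and filelist.txt stands in for the manifest of installed files described above:

```spec
# Hypothetical minimal spec: all names are made up for illustration.
Name:           vmware-workstation-repack
Version:        6.5.1
Release:        1
Summary:        Vendor install tree, packaged as real files
License:        Proprietary

%description
The files the vendor installer would have scattered over the
filesystem, captured into a build root instead, so that RPM actually
owns and tracks them.

# filelist.txt is one installed path per line, generated by running the
# vendor installer into the build root and recording what it wrote.
%files -f filelist.txt
```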

VMWareRPMPackagingFail written at 17:52:55

2009-01-19

Using iptables to get around the policy based routing limitation

A while back I discovered a limitation in Linux's policy based routing, where you couldn't use the straightforward means to flexibly route outgoing traffic on a dual identity machine over different interfaces (for example, to force all SSH traffic to flow over one link, regardless of the destination). At the time I wrote:

To fix this situation up, you need to change the source IP address of the packets to fix them up. Unfortunately the only way I know of doing this is to use source-NAT on appropriate outgoing packets, [...]

You know what? Sometimes I'm too obscure for my own good. Since I tripped over this today, let me be explicit about what iptables rules I need to use, because this paragraph in my original entry led me to try to do this with SNAT alone, which doesn't work.

First, start with the basic dual identity policy based routing setup. Then I need:

# Mark the outgoing traffic to redirect (here, HTTP):
iptables -t mangle -A OUTPUT -p tcp --dport 80 -j MARK --set-mark 2048
# Route marked packets via the alternate interface's routing table (table 11):
ip rule add fwmark 2048 priority 4999 table 11

# Rewrite the source address of marked packets to O, the alternate
# interface's IP address:
iptables -t nat -A POSTROUTING -m mark --mark 2048 -j SNAT --to-source O

(Since this uses iptables marks to select what to act on, additional things to redirect can be set up with only an additional iptables rule to mark them. Also, this assumes that O is not the default route choice.)

We need all three pieces because SNAT alone won't (and can't) change the outgoing interface; the outgoing interface is set by the time the packet goes through the POSTROUTING chain (as the chain's name says), and SNAT can only be used there. Thus the first two lines will route the outgoing packets properly but with the wrong origin IP address, and then SNAT fixes the origin IP address up without altering the routing.

(If you just use SNAT, you get packets with the right origin address going out the wrong interface. If these packets still get to the destination, you might not notice this for a while.)
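As an aside, here is a sketch of how to check that the pieces are actually working (the destination address is just an example; all of this needs root):

```sh
# Is the fwmark rule installed ahead of the main table lookup?
ip rule show
# Which route (and thus which interface) would a marked packet take?
ip route get 192.0.2.1 mark 2048
# Are the mangle and nat rules matching packets? Watch the per-rule
# packet counters increment as traffic flows:
iptables -t mangle -L OUTPUT -v -n
iptables -t nat -L POSTROUTING -v -n
```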

ForcingOutgoingInterface written at 01:52:22

2009-01-17

Practical issues with getting ZFS on Linux

When discussions of ZFS on Linux (for real, as more than a user-level filesystem) come up, the usual issue that gets brought up is the licensing problem; Sun's CDDL is incompatible with the kernel's GPL requirement. But Sun could always change that if they wanted to, and I think there's another, more serious problem.

To put it simply, my impression is that the Linux kernel people are generally strongly opposed to what I could call 'code drops', where foreign code is parachuted into the Linux kernel. They want code in the Linux kernel to be real Linux kernel code, in other words to look like it was actually written for the Linux kernel, to be in the same style and use the same idioms as other Linux kernel code. They do not want compatibility layers, a completely different style than the rest of the kernel, and so on.

(The reasons for this are very sensible; 'foreign' code imposes a maintenance cost on everyone who has to deal with it, and if it is in the Linux kernel that potentially means every kernel developer.)

An approach where the hypothetical ZFS in Linux code was basically the Solaris ZFS code base with a compatibility layer to provide Solaris kernel APIs and suchlike on Linux would be unlikely to be accepted by the kernel developers; from their perspective, the long term costs imposed by such an approach aren't worth the gains. To get ZFS into Linux, it would almost certainly need to be significantly modified in order to fit into the rest of the Linux kernel code.

(This isn't just a matter of reformatting the code and calling different functions for things like memory allocation. How the Linux kernel likes to do things is almost certainly significantly different from how the Solaris kernel works, so the code would probably require significant structural modifications to work the Linux way.)

This has two problems. The lesser one is that it's a lot of work, much of it grindingly picky and uninteresting, that needs to be done by someone with enough Linux kernel experience to write code that fits nicely into the Linux kernel. The bigger one is that such a code divergence between 'Solaris ZFS' and 'Linux ZFS' would make it hard to keep the Linux code up to date (or to adopt fixes from Linux back in to the main code base), which implies a lot of work on an ongoing basis (and creates practical concerns for people thinking of using Linux ZFS).

(The one example of something similar to this being tried is SGI's work to get XFS into the kernel. In the end I believe that it took years of significant work on SGI's part, and that it did indeed require restructuring how the code worked. I don't know if SGI was able to maintain much commonality between the Irix XFS code and the Linux XFS code, or if they basically forked once and stayed diverged.)

ZFSOnLinux written at 03:16:12

2009-01-14

Documenting the kernel.sem sysctl

Programs on our web server machine recently started complaining about being unable to set up semaphores (well, once we figured out what the error message meant). This rapidly sent us on an expedition into the underdocumented mists of the kernel.sem sysctl, and so I'll write down what I've learned about what I think is going on.

In the grand style of System V IPC in general, what programs allocate is not semaphores, but semaphore arrays (officially called 'semaphore sets'). A semaphore array has one or more actual semaphores (pretty much always one these days) and a unique semaphore identifier. In ipcs -s output, these are nsems (how many semaphores are in the array) and semid respectively.

The kernel.sem sysctl lets you set three different limits related to this:

  • how many semaphore arrays can be allocated (SEMMNI, the fourth field).
  • how many semaphores can be allocated in total (SEMMNS, the second field).
  • how many semaphores can be in a single semaphore array (SEMMSL, the first field).

(The third field, SEMOPM, is the maximum number of semaphore operations that can be passed to a single semop(2) call.)

The limit you're most likely to run into is how many semaphore arrays can be allocated, which is a pretty low number (128, regardless of how much memory you have). I believe that bumping it up is pretty much completely harmless, especially as all the kernel uses this value for is to limit how many semaphore arrays can be in existence at once.
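Concretely, raising just that limit looks like this (the values shown are common defaults and purely illustrative, not a recommendation):

```sh
# The four fields, in order: SEMMSL SEMMNS SEMOPM SEMMNI
cat /proc/sys/kernel/sem
# Bump only the semaphore-array limit (SEMMNI, the fourth field),
# leaving the other three fields at their existing values:
sysctl -w kernel.sem="250 32000 32 1024"
```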

As a side note that's unlikely to ever be important, there's an undocumented hard limit of 32768 on SEMMNI (IPCMNI, in include/linux/ipc.h).

(For more details on all of this, see the proc(5) and semget(2) manpages. Unfortunately the proc manpage doesn't have a pointer to semget; if it did, things would be much less confusing.)

SemSysctlExplained written at 01:34:04

2009-01-07

Two suggestions for improving Fedora's PreUpgrade experience

Having just gone through a PreUpgrade experience (updating from Fedora 8 to Fedora 10, yes I am behind the times), I have two suggestions for how the whole experience could be made even nicer.

First, there should be an option to apply the current updates for the new distro version as part of the upgrade process, or at least to download them all in advance so that when I reboot and immediately do a yum update, I do not then have to sit around as my machine downloads another gigabyte. The nice implementation of this would merge the updates into the base install, but that might complicate Anaconda's and PreUpgrade's lives too much. The ideal implementation would be capable of getting updates from well known third-party repositories as well, so that you had exactly the RPMs that a post-reboot yum update would install.

(In fact, all installers should offer this option on Internet-connected machines. The very first thing I want newly installed machines to do is to get current on security patches, and doing it before they reboot and become live is better than doing it afterwards. I am willing to sit around in the installer waiting for the downloads, since otherwise I am just going to do the same waiting afterwards anyway.)

Second, make the upgrade environment a LiveCD environment, so that I can still do things while I am waiting for the upgrade to complete. LiveCDs are a great idea for all sorts of reasons but especially so during upgrades, which otherwise take your machine away from you when you had previously been productively using it. As it is, I can tell that I am going to be taking a long walk or two when I upgrade my home machine.

(I will also have to figure out how to avoid having the machine automatically reboot at the end of the upgrade process, since I am a bit nervous about upgraded machines coming up live without me around to watch them.)

Yes, I am aware that perhaps I should just get over my cautions and do a live yum upgrade. Maybe next time.

(The drawback of live yum upgrades is that it's much more complicated to download all of the necessary RPMs in advance, so that your machine does not spend a day and a half fetching things. I suppose that this is a good reason to figure out how to set up and use local yum repository mirrors.)
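For what it's worth, a rough sketch of such a local mirror with the standard tools (the repository id and paths are made up; reposync comes from the yum-utils package):

```sh
# Download every package in the 'updates' repository to a local tree:
reposync --repoid=updates --download_path=/srv/mirror
# Generate yum repository metadata for that tree:
createrepo /srv/mirror/updates
# Then point a .repo file at file:///srv/mirror/updates and yum can
# install from it without touching the network.
```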

PreUpgradeImprovements written at 00:38:27

