Wandering Thoughts archives

2006-01-23

Why case independent filenames are a bad idea

People keep pushing the idea of case independent filenames (and mocking systems without them), but the whole idea has three big problems: case conversion is a lot more complicated than people think, locking a character set encoding into your OS damages its ability to evolve, and case folding is language specific.

The handy example of the last is Turkish (and Azeri). In Turkish, the capitalization of 'i' is not 'I' but the dotted I (Unicode U+0130); the lowercase version of 'I' is not 'i' but U+0131 (LATIN SMALL LETTER DOTLESS I). If you don't ignore Turkish, your system has some very interesting decisions to make: what happens when a Turkish user creates files called 'ISTAN' and 'istan', and a German user tries to open 'Istan' and 'iSTAN'?

(Judging from the Unicode SpecialCasing file, Lithuanian may be another example.)
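To make the Turkish problem concrete, here is a minimal Python sketch. It assumes Python 3, whose built-in str.lower() and str.upper() use language-neutral Unicode mappings. The helpers turkish_upper and turkish_lower are hypothetical names invented for this illustration; they only handle the two special letter pairs and are not a full locale implementation.

```python
# The two Turkish-specific letters mentioned above.
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I

# Hypothetical helpers: remap the two special pairs first, then fall back
# to Python's default (language-neutral) case mapping for everything else.
_TR_UPPER = str.maketrans({"i": DOTTED_CAPITAL_I, DOTLESS_SMALL_I: "I"})
_TR_LOWER = str.maketrans({"I": DOTLESS_SMALL_I, DOTTED_CAPITAL_I: "i"})


def turkish_upper(name: str) -> str:
    return name.translate(_TR_UPPER).upper()


def turkish_lower(name: str) -> str:
    return name.translate(_TR_LOWER).lower()


# Under the default rules 'ISTAN' and 'istan' are the same name...
default_match = "ISTAN".lower() == "istan".lower()

# ...but under Turkish rules they are two different names.
turkish_match = turkish_lower("ISTAN") == turkish_lower("istan")

print(default_match, turkish_match)  # prints: True False
```

A filesystem has to pick one of these answers for everyone, which is exactly the decision the 'ISTAN' versus 'istan' question is about.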

English case folding is simple, but for other languages and character sets it gets tricky. Issues include:

  • single letters can be equivalent to multiple letters.
  • some case folding is context dependent.
  • in Unicode, straight case folding apparently may not preserve proper normalization.

(Since all of this can be beaten to death with enough code, it's only a pragmatic issue. For more fun, apparently Unicode is still revising the case folding rules for some characters.)

Because only characters have 'case', not bytes, the OS has to decide what character set encoding its filenames are in; combined with case independence, this means filenames can't reliably be arbitrary byte streams. If in the future people want to put filenames in your system that don't fit in your character set encoding, you're in trouble (a mistake that's been made as recently as the people who picked UCS-2). Unicode is generally held to be the last word on this particular issue, but that still leaves you needing an encoding; UTF-8 annoys the Asian languages, while UTF-32 annoys everyone evenly.
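A small Python sketch of why the encoding decision is unavoidable: the same filename bytes case-convert to different bytes depending on which encoding you assume, and some byte strings are not valid in a given encoding at all. The helper names lower_bytes and is_valid_utf8 and the sample bytes are hypothetical, picked only for this illustration.

```python
def lower_bytes(raw: bytes, encoding: str) -> bytes:
    """Lowercase a filename stored as bytes, assuming the given encoding."""
    return raw.decode(encoding).lower().encode(encoding)


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether the bytes can be interpreted as UTF-8 at all."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# These bytes spell an accented three-letter name in UTF-8, but they are
# also a perfectly legal (if odd) five-character Latin-1 string.
raw_name = b"\xc3\x89T\xc3\x89"

as_utf8 = lower_bytes(raw_name, "utf-8")
as_latin1 = lower_bytes(raw_name, "latin-1")
print(as_utf8 == as_latin1)  # the two interpretations disagree

# A byte-stream filesystem would accept this name; a UTF-8 one cannot
# even ask what its lowercase form is.
print(is_valid_utf8(b"\xff\xfe"))
```

A system with case-sensitive byte-string filenames never has to run either function; a case-independent one has to commit to an answer for every name it stores.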

(While Unicode has interesting traps for the unwary, things like normalization forms are mostly irrelevant for the narrow issue of case independent filenames.)

Assuming that you brush the whole issue of Turkish under the carpet, supporting 'case independent' filenames still requires a great deal of code (and associated character data), involves a significant amount of fun at runtime with interesting pathological cases, and gives your operating system heartburn if it turns out that Unicode and your chosen Unicode encoding are not actually the last word in character sets.

(The best reference for this I've found is Globalization Gotchas, in the 'Text Transformations' section. A long discussion of Unicode case mappings is in one spot in Unicode standard annex #21, or rolled into the core standard as described here.)

(PS: writing this entry has vividly shown me that I can't spell 'independent' without prodding from spell.)

CaseIndependentFilenames written at 02:42:03

2006-01-17

The economics of CPU performance

Recently, Intel and AMD have been telling everyone who would listen that single CPU performance is more or less all tapped out, and that future improvements would come from various multi-* developments; multi-core, multi-processor, and so on. Ominous pronouncements have emerged about how programmers need to bite the bullet and move to (highly) concurrent programming if they want their gravy train to continue.

When reading this sort of news coverage, it's worthwhile to remember what sells CPUs and software. Namely: CPUs sell on better performance, but software sells on better features.

If Intel and AMD are unable to deliver better performance than current systems, their gravy train derails in a big way. But flat CPU performance still leaves programmers with years of features that they can add and sell. (Some new features need better performance to be feasible, but there are lots that don't.)

I think Intel and AMD talk like this partly because they would love to persuade programmers that they have no choice but to spend a lot of money to help Intel and AMD sell CPUs. This strikes me as a bad deal for the programmers, though.

(It also reminds me of Intel's story with the Itanium.)

EconomicsOfCPUPerformance written at 02:00:22

2006-01-10

The peculiar effects of grant funding at universities

One of the things that makes universities such peculiar computing environments is grant funding.

The straightforward effect is to inject uncertainty into the year to year budgeting process, since few grants have a sure renewal. Even when the total amount of grant funding stays more or less the same, who gets it (and thus what projects are funded) can vary a lot, with the attendant localized lurches. (The uncertainty of grant money may be one reason that people are often far more willing to pay for equipment and consulting than to hire staff.)

For all the agonizing that grant uncertainties create, this is actually the small effect. The big effect of grant funding is what it does to power balances, because most grant funding goes to people, not to the university's general budget.

People in companies have budgets. Grant funded people at universities have cold hard cash, and how they spend it is up to them. The immediate casualty is any plan to have a homogeneous environment by controlling purchasing, as grant funded people buy whatever equipment they like.

This also puts limits on irritating restrictions in the university infrastructure, because grant funded groups can just opt out. (This is how Unix spread at a lot of universities; early Unix ran on machines cheap enough that a research group could buy one out of grant funds. The subsequent withering of central mainframe computing is not a coincidence.)

Ironically, another effect is that grant funded people have a disproportionate amount of political power, because universities have evolved a lot of clever ways to extract money from grants (so many ways that most grants come with legal restrictions on what the money can be spent on). Often grant funded people club together to buy infrastructure, which can wind up effectively putting an entire department on their side.

UniversityMoney written at 00:23:05

2006-01-02

Universities are peculiar places

In particular, universities have a computing environment that is more peculiar than I think a lot of people realize (sometimes even people inside universities, because from some angles it is very easy to miss). I wrote about one example of this (and the practical effects it has) back in UniversityFirewalls.

One of the ways that they are peculiar, a way that I think is at the core of a lot of things, is this:

In most parts of most companies, communicating with the rest of the company is far more important than being on the Internet. (Hence, among other things, the enduring success of company firewalls.)

In a university, most groups are the other way around: if forced to choose they'd take being on the Internet, probably without having to think hard. A physics department would be annoyed if it couldn't talk with the rest of the university; it would be crippled if it couldn't talk with the rest of the Internet.

(Maybe it wouldn't be crippled in the short term. But in the long term you would find everyone working at home with GMail accounts and DSL lines, or something similar.)

UniversitiesArePeculiar written at 01:56:03



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.