Wandering Thoughts archives

2018-05-05

Modern Unix GUIs now need to talk to at least one C library

I've written before about whether the C runtime and library are a legitimate part of the Unix API. The question matters because some languages want to be as self contained as possible on Unix (Go is one example), and so they don't want to have to use anything written in C if at all possible. However, I recently realized that regardless of the answer to this question, it's essentially impossible to do a good quality, modern Unix GUI without using at least one C library.

This isn't because you need to use a toolkit like Gtk+ or QT, or that you need a C library in order to, for example, speak the X protocol (or Wayland's protocol). You can write a new toolkit if you need to and people have already reimplemented the X protocol in pure non-C languages. Instead the minimal problem is fonts.

Modern fonts are selected and especially rendered in your client, and they're all TrueType fonts. Doing a high quality job of rendering TrueType fonts is extremely complicated, which is why everyone uses the same library for this, namely FreeType. FreeType is written in C, so if you want to use it, you're going to be calling a C library (and it will call on some additional services from something like the C runtime, although apparently you can shim in your own versions of some parts of it).

(Selecting fonts is also a reasonably complicated job, especially if you want to have your fonts match with the rest of the system and be specified in the same way. That's another C library, fontconfig.)

There's no good way out from calling FreeType. Avoiding it requires either abandoning the good modern fonts that users want your UI to have, implementing your own TrueType renderer that works as well as FreeType (and updating it as FreeType improves), or translating FreeType's C code into your language (and then re-translating it every time a significant FreeType update comes out). The latter two are theoretically possible but not particularly practical; the first means that you don't really have a modern Unix GUI program.

(I don't know enough about Wayland to be sure, but it may make this situation worse by essentially requiring you to use Mesa in order to use OpenGL to get decent performance. With X, you can at least have the server do much of the drawing for you by sending X protocol operations; I believe that Wayland requires full client side rendering.)

The direct consequence of this is that there will never be a true pure Go GUI toolkit for Unix that you actually want to use. If the toolkit is one you want to use, it has to be calling FreeType somewhere and somehow; if it isn't calling FreeType, you don't want to use it.

(It's barely possible that the Rust people will be crazy enough to either write their own high-quality equivalent of FreeType or automatically translate its C code into Rust. I'm sure there are people who look at FreeType and want a version of it with guaranteed memory safety and parallel rendering and so on.)

UnixGUIsNeedC written at 00:13:25; Add Comment

2018-05-04

Why you can't put zero bytes in Unix command line arguments

One sensible reaction to all of the rigmarole with 'grep -P' I went through in yesterday's entry in order to search for a zero byte (a null byte) is to ask why I didn't just use a zero byte in the command line argument:

fgrep -e ^@ -l ...

(Using the usual notation for a zero byte.)

You can usually type a zero byte directly at the terminal, along with a number of other unusual control characters (see my writeup of this here), and failing that you could write a shell script in an editor and insert the null byte there. Ignoring character set encoding issues for the moment, this works for any other byte, but if you try it you'll discover that it doesn't work for the zero byte. If you're lucky, your shell will give you an error message about it; if you're not, various weird things will happen. This is because the zero byte can't ever be put into command line arguments in Unix.

Why is ultimately simple. This limitation exists because the Unix API is fundamentally a C API (whether or not the C library and runtime are part of the Unix API), and in C, strings are terminated by a zero byte. When Unix programs such as the shell pass command line arguments to the kernel as part of the exec*() family of system calls, they do so as an array of null-terminated C strings; if you try to put a null byte in there as data, it will just terminate that command line argument early (possibly reducing it to a zero-length argument, which is legal but unusual). When Unix programs start they receive their command line arguments as an array of C strings (in C, the argv argument to main()), and again a null byte passed in as data would be seen as terminating that argument early.

This is true whether or not your shell and the program you're trying to run are written in C. They can both be written in modern languages that are happy to have zero bytes in strings, but the command line arguments moving between them are being squeezed through an API that requires null-terminated strings. The only way around this would be a completely new set of APIs on both sides, and that's extremely unlikely at this point.

Because filenames are also passed to the kernel as C strings, they too can't contain zero bytes. Neither can environment variables, which are passed between programs (through the kernel) as another array of C strings.

As a corollary, certain character set encodings really don't work as locales on Unix because they run into this. Any character set encoding that can generate zero bytes as part of its characters is going to have serious problems with filenames and command line arguments; one obvious example of such a character set is UTF-16. I believe the usual way for Unixes to deal with a filesystem that's natively UCS-2 or UTF-16 is to encode and decode to UTF-8 somewhere in the kernel or the filesystem driver itself.

NoNullsInArguments written at 00:08:25; Add Comment

By day for May 2018: 4 5; before May; after May.

Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.