2012-05-24
Today's Mercurial command alias: a short form hg incoming
Like other modern VCSes, Mercurial
allows you to define command aliases. This is a feature that I don't
use as much as I could, but every so often I work out one that's really
handy. Today's alias is something I've called 'hg pending', a short-form
summary version of 'hg incoming'.
From my .hgrc:
[alias]
pending = incoming --template '* {desc|firstline}\n'
This shows something that looks like:
* exp/html: detect "integration points" in SVG and MathML content
* runtime: faster GC mark phase
* cmd/cc: fix uint right shift in constant evaluation
* cmd/6g: peephole fixes/additions
When I'm checking up on what's coming from an upstream repository that I'm tracking, this is basically exactly what I want. I don't care about who authored each change, its long description, or the rest of the full information; I just want a quick overview of what I'll get with a pull.
(This assumes that the Mercurial repositories you're tracking follow the sensible convention of starting each commit message with a one-line summary, since that first line is all the alias shows.)
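To make the template concrete: 'desc|firstline' keeps only the first line of each changeset's description, which is why the alias depends on one-line summaries. Here is a minimal Go sketch of the same transformation; the 'summarize' function and the sample descriptions are made up for illustration and have nothing to do with how Mercurial implements its template filters. It assumes Go 1.18 or later for strings.Cut.

```go
package main

import (
	"fmt"
	"strings"
)

// summarize imitates the effect of the template '* {desc|firstline}\n'
// on one changeset description: keep only the first line and prefix "* ".
// (Illustrative only; this is not Mercurial code.)
func summarize(desc string) string {
	first, _, _ := strings.Cut(desc, "\n")
	return "* " + strings.TrimRight(first, "\r")
}

func main() {
	// Hypothetical changeset descriptions: a summary line, then details.
	descs := []string{
		"runtime: faster GC mark phase\n\nA longer explanation of the change.",
		"cmd/6g: peephole fixes/additions",
	}
	for _, d := range descs {
		fmt.Println(summarize(d))
	}
}
```

A commit message that crams everything into one long first line still "works", but the summary stops being short.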
Possibly I should make a version of this for hg log too, but I
haven't felt the urge yet.
(I have no strong reason for calling it 'pending' instead of something else, and there is probably a better name for it; I am bad with names.)
Bonus: sdiff, a sysadmin-friendly diff alias
Another Mercurial 'alias' (sort of) that we've become very accustomed
to is something we call hg sdiff:
[extensions]
hgext.extdiff =
[extdiff]
cmd.sdiff = diff
opts.sdiff = -Nr
This gives us an old-style diff of changes. This is not something you would use as a programmer, but for sysadmins it turns out that the context from unified diffs is basically pointless for many changes to configuration files. In fact, zero-context diffs can be easier to read because there is less clutter obscuring the actual changes.
(The one drawback of this is that if only one file has changed,
Mercurial doesn't report which file it is. 'hg status' is your friend
here.)
This especially matters to us because we often record configuration
file changes (in some diff form) in worklog entries, which also helps other sysadmins
stay on top of the changes. When we
initially switched to Mercurial from RCS,
the extra verbosity of hg diff's default unified diff output
became vaguely annoying. Creating hg sdiff helped significantly.
(Having written this, I guess we could experiment with forcing zero lines of context for unified diffs, for example with 'hg diff -U 0'. However, we're well accustomed to old-style diff output as it is, so I suspect we wouldn't really find it much of an improvement.)
2012-05-23
Some notes on using XFT fonts in TK 8.5
Because Ubuntu 12.04 switched to TK 8.5 by default, I recently discovered that I had never migrated exmh to XFT fonts when I moved the rest of my environment to them (and sort of to UTF-8). Deciding to fix that led me into a number of adventures, partly because exmh itself generated XLFD font names in a number of important places. So here is what I have learned about XFT font support in TK 8.5.
To start with, TK 8.5 has the XFT font name problem: it accepts only a very limited subset of
normal XFT font naming, as seen in the font(n) manpage:
family [size [style ...]]
(You cannot supply styles without giving a size; this is not clearly stated in the manpage.)
One thing you cannot do with the styles is specify that you only
want a monospace or a proportional font. The best you can do is use
'font metrics ... -fixed' to see if you actually have a monospaced
or proportional font (see font(n) for the details). This implies
that unlike with XLFD font names, there is no way to ask for a generic
monospaced or proportional font; the closest you can come is to ask for one
of the generic font names (either the customary XFT generic names or TK
8.5's guaranteed-to-exist generic names).
Although it is not documented as such, if you are specifying this as
all one string (such as in a -font argument), the family must be a
single word without spaces. This is rather unfortunate, because in the
XFT world it is common for font families to have multi-word names (eg
'DejaVu Sans', 'Times New Roman', and so on). You may have to create
some single-word FontConfig aliases in order to use the fonts you want
in a TK program.
(This limitation goes away if you use the various font commands to set
up fonts; if you specify the family directly with a -family argument
it can be a multi-word family name. But that may require a significant
rewrite in a program that was written to use -font with font names
assembled on the fly, as is the case with exmh.)
By implication, TK 8.5 also doesn't support the 'family1,family2,...' XFT search notation for family names. If you want to search among multiple alternatives, you need to code it yourself. This is made more difficult by the next issue.
Setting an XFT font as something's font always succeeds (assuming that the font string is syntactically valid). Regardless of whether or not the font you asked for actually exists, using it as a font will never give errors. If your font does not actually exist, TK gives you, well, something. This is especially dangerous because the styles you can specify do not include monospace versus proportional; instead, I believe that you always get the default proportional font if what you asked for doesn't exist.
(This behavior is documented if you read font(n) carefully. I seem to
get the FontConfig default sans-serif font, ie what 'sans-serif' maps
to.)
So how do you check that a font actually exists? Well, there isn't a sure way that I know of; the best approach I have so far is:
- first, accept a list of standard font family names that we believe
should always exist. My current one is:
monospace serif sans-serif times courier helvetica
(matched case-insensitively); this covers the standard FontConfig font names and the standard TK font names.
- use 'font actual "...." -family' to get the actual family name for both the family we've been asked for (with the size and styles also specified) and a font family that definitely doesn't exist; the latter gives us the family name of the default font. If what we got isn't what we asked for and is the default font family, the font we asked for probably doesn't exist. (The corner case is where we asked for something that is an alias for the default font; in this case we should say that it exists. I don't know how to detect this, which is why it's important to accept the standard aliases immediately.)
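To make the heuristic above concrete, here is a minimal Go sketch of its logic, plus a hand-coded version of the 'family1,family2,...' search that TK 8.5 lacks. It is not TCL and not exmh code: 'actualFamilyFunc' is a hypothetical hook standing in for running TK's 'font actual SPEC -family', 'fakeActual' is a made-up font system that silently falls back to a default sans font (as TK does), and the sentinel family name is an assumption about a name that won't exist on your system.

```go
package main

import (
	"fmt"
	"strings"
)

// standardFamilies are accepted immediately (case-insensitively): FontConfig's
// generic names plus TK 8.5's guaranteed families. Any of them may map to the
// default font, which the probe below cannot tell apart from "missing".
var standardFamilies = []string{"monospace", "serif", "sans-serif", "times", "courier", "helvetica"}

// nonexistentFamily is a hypothetical name assumed not to exist on the system.
const nonexistentFamily = "NoSuchFontFamilyXyzzy"

// actualFamilyFunc is a hypothetical hook standing in for TK's
// 'font actual SPEC -family'; it is not a real Go API.
type actualFamilyFunc func(spec string) string

// probablyExists reports false only when asking for family gives us something
// other than family AND that something is the default font's family.
func probablyExists(family, sizeAndStyles string, actual actualFamilyFunc) bool {
	for _, s := range standardFamilies {
		if strings.EqualFold(family, s) {
			return true
		}
	}
	got := actual(strings.TrimSpace(family + " " + sizeAndStyles))
	def := actual(strings.TrimSpace(nonexistentFamily + " " + sizeAndStyles))
	return strings.EqualFold(got, family) || !strings.EqualFold(got, def)
}

// firstExisting hand-codes the 'family1,family2,...' search TK 8.5 lacks.
func firstExisting(families []string, sizeAndStyles string, actual actualFamilyFunc) (string, bool) {
	for _, f := range families {
		if probablyExists(f, sizeAndStyles, actual) {
			return f, true
		}
	}
	return "", false
}

// fakeActual simulates a font system where unknown families silently fall
// back to the default sans-serif font. "DejaVuSansMono" plays the role of a
// single-word FontConfig alias for a multi-word family.
func fakeActual(spec string) string {
	switch strings.ToLower(strings.Fields(spec)[0]) {
	case "inconsolata":
		return "Inconsolata"
	case "dejavusansmono":
		return "DejaVu Sans Mono"
	default:
		return "DejaVu Sans"
	}
}

func main() {
	fmt.Println(probablyExists("Inconsolata", "12", fakeActual))
	fmt.Println(probablyExists("Terminus", "12 bold", fakeActual))
	fmt.Println(firstExisting([]string{"Terminus", "Inconsolata", "monospace"}, "12", fakeActual))
}
```

The sketch inherits the corner case described above: a non-standard alias that resolves to the default font itself would be reported as missing.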
You will probably be unsurprised to learn that TK 8.5's guaranteed font families of 'times', 'helvetica', and 'courier' do not necessarily map to the default FontConfig fonts for 'serif', 'sans-serif', or 'monospace'. On Fedora 16 and Ubuntu, TK picks the Nimbus fonts for these, while the normal default fonts are DejaVu ones. As far as I know you cannot change what font families TK will use for its guaranteed font families.
(If any of my readers use exmh and are interested in XFT font support for it, I've posted my hack modifications to the exmh-workers mailing list (you can find it on gmane.org). Since I'm only vaguely a TCL/TK programmer, the code is far from ideal.)
Sidebar: What 'font create' fonts aren't
Because I had to confirm this myself just now, I'm going to write
it down: TK 8.5 allows you to create named fonts with 'font create
name ...'. Such fonts are fully specified fonts and cannot be
used as building blocks to derive other fonts; they do not create
aliases for other general font families or the like. In other words, you
cannot do:
font create example -family "Liberation Sans"
font actual "example 30 bold italic"
(Well, you can do this but it doesn't get you what you want.)
This is the core distinction between a font family and a realized TK
font. A realized TK font has a size, a set of styles, and so on, and it
has these whether or not you specified them (if you don't specify them
they take on default values). Here we created example as a realized
font and if you inspect its attributes you can see that TK filled in
default values for the size and so on. As a realized font it is not a
font family and cannot be used as one (well, TK doesn't even notice that
we're trying to use it that way; font names and family names could be
described as being in completely different namespaces).
It would be really convenient if font create could be used to create
and redefine font family aliases; then you could do things like redefine
what font family TK would use for 'times' (which would be handy if you
were dealing with a large body of existing code that used these font
family names) or create single-word aliases for font families with
spaces in their names. But it doesn't work that way at all.
2012-05-17
The Go language's problem on 32-bit machines
Recently (for my value of recently) there was something of a commotion, with people declaring that Go wasn't usable in production on 32-bit systems because its garbage collection was broken and it would eat all of your memory. Naturally I was interested in this and spent some time digging into the reports and trying to understand the situation. Today I'm going to try to write down as much as I know about what's going on, to get it straight in my head; this is going to involve a trip into the fun land of garbage collection.
To simplify a bit, the purpose of garbage collection is to automatically free memory that's no longer used. The GC technique everyone starts with is reference counting, but since it has various problems (including dealing with circular references), most people soon upgrade to more complex schemes based on inverting the problem: rather than noticing when something stops being used, the garbage collection system periodically finds all of the memory that's still actively used and then frees everything else. This is 'tracing garbage collection' (done by tracing garbage collectors), so called because the garbage collector 'traces' all live objects.
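A minimal sketch of the tracing idea, as a toy mark-and-sweep over Go structs (the 'object' type and the heap are made up for illustration; real collectors work on raw memory, not on a list of structs). It also shows why tracing handles the circular references that trip up reference counting: an unreachable cycle is simply never marked.

```go
package main

import "fmt"

// object is a toy heap object with outgoing references.
type object struct {
	name string
	refs []*object
}

// mark traces every object reachable from the roots (iterative depth-first).
func mark(roots []*object) map[*object]bool {
	live := make(map[*object]bool)
	stack := append([]*object(nil), roots...)
	for len(stack) > 0 {
		o := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if o == nil || live[o] {
			continue
		}
		live[o] = true
		stack = append(stack, o.refs...)
	}
	return live
}

// sweep splits the heap into surviving objects and objects to free.
func sweep(heap []*object, live map[*object]bool) (kept, freed []*object) {
	for _, o := range heap {
		if live[o] {
			kept = append(kept, o)
		} else {
			freed = append(freed, o)
		}
	}
	return kept, freed
}

func main() {
	a, b := &object{name: "a"}, &object{name: "b"}
	c, d := &object{name: "c"}, &object{name: "d"}
	a.refs = []*object{b}
	// c and d refer to each other but nothing reachable refers to them;
	// plain reference counting would never free this cycle.
	c.refs = []*object{d}
	d.refs = []*object{c}

	kept, freed := sweep([]*object{a, b, c, d}, mark([]*object{a}))
	for _, o := range kept {
		fmt.Println("kept", o.name)
	}
	for _, o := range freed {
		fmt.Println("freed", o.name)
	}
}
```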
One deep but unsexy problem in garbage collection is how your GC system knows which fields in your objects refer to other objects and which are just primitive types like numbers, memory buffers, strings, or the like, and how it does this efficiently. This can be a particular issue for a system language, where you probably want structures and objects that are as simple and dense as possible, with as little overhead from type annotations, inefficient 'boxed' representations, and so on as possible. One solution is to maintain a separate bitmap recording which words in an allocated memory area are actually pointers (the runtime can set this when an object is allocated, and the GC can then scan it efficiently). Another solution is what gets called 'conservative garbage collection'. The fundamental idea is that in conservative GC, we are willing to over-estimate references (and thus wind up not freeing some unused memory); rather than insisting on knowing about references, the GC system simply scans through allocated memory looking for anything that might be a pointer to an allocated object. If it finds one, it conservatively declares that the object is still alive and traces things from there.
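To contrast the two approaches, here is a toy Go model of scanning one memory region both ways: conservatively (every word that lands inside an allocated object counts) and precisely, using a per-word pointer bitmap. Everything here is an assumption for illustration: 32-bit addresses, made-up object locations, and a linear search where a real runtime would use much faster lookup structures. This is not how Go's runtime is written.

```go
package main

import "fmt"

// span is a toy allocated object occupying addresses [start, start+size).
type span struct {
	start, size uint32
}

// findObject returns the index of the object containing addr, or -1.
// Any address inside an object (not just its start) counts as a hit.
func findObject(heap []span, addr uint32) int {
	for i, s := range heap {
		if addr >= s.start && addr-s.start < s.size {
			return i
		}
	}
	return -1
}

// scanConservative treats every word in the region as a possible pointer.
func scanConservative(words []uint32, heap []span) map[int]bool {
	found := make(map[int]bool)
	for _, w := range words {
		if i := findObject(heap, w); i >= 0 {
			found[i] = true
		}
	}
	return found
}

// scanPrecise only follows words that the pointer bitmap marks as pointers.
func scanPrecise(words []uint32, isPtr []bool, heap []span) map[int]bool {
	found := make(map[int]bool)
	for j, w := range words {
		if !isPtr[j] {
			continue
		}
		if i := findObject(heap, w); i >= 0 {
			found[i] = true
		}
	}
	return found
}

func main() {
	heap := []span{{0x1000, 0x100}, {0x2000, 0x4000}}
	// Word 0 is a real pointer into object 0; word 1 is an integer that
	// merely happens to fall inside object 1.
	words := []uint32{0x1010, 0x2468}
	isPtr := []bool{true, false}
	fmt.Println("conservative keeps", len(scanConservative(words, heap)), "objects")
	fmt.Println("precise keeps", len(scanPrecise(words, isPtr, heap)), "objects")
}
```

The conservative scan keeps both objects alive because of the stray integer; the bitmap-guided scan keeps only the one that is really referenced.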
Go was initially designed as a system language, although it's no longer described as one. As such, one of the tradeoffs its designers made is that, as far as I understand it, Go more or less uses conservative garbage collection, at least for objects (or at least memory areas) that may contain pointers; some static data that's known to be pointer-free may be skipped by the conservative GC. Although there's said to be the start of a more efficient word-bitmap implementation for Go objects, it's not currently usable by the GC (and may not be fully live).
(As far as I can tell from commentary, Go's garbage collector only scans Go's own memory areas; it doesn't make any attempt to scan memory used by outside libraries or code to find references to Go objects. Runtime code that passes a pointer to a Go object to an outside function is apparently required to keep the object alive inside Go, for example by hooking it into a global variable.)
The problem with conservative GC is that it over-estimates the memory still in use because it finds false 'references': values that look like pointers to allocated objects but aren't actually pointers. There are a number of factors that make conservative GC worse:
- the more of your address space is in use for language objects, the more random values can look like references to them. If half of the address space is your objects, half of all properly aligned N-bit patterns look like pointers to your objects (where N is the size of a pointer).
- the smaller the address space is in general, the more of it you're
going to fill up with your objects for the same amount of memory use.
Two GB of objects is half of the 32-bit address space but a tiny
fraction of the 64-bit address space.
- the larger your individual objects are, the more memory a single false 'reference' pointing somewhere inside one will prevent from being freed.
- similarly, the more other objects a single object refers to, the more memory will be held down by a single spurious reference to the top object.
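The first two factors come down to simple arithmetic: if a word's value were uniformly random, the chance that it lands inside your heap is just heap size divided by address space size (alignment cancels out, since both the heap and the address space contain the same proportion of aligned slots). A small Go sketch of that calculation, with the loud caveat that real data is not uniformly random; small integers, for instance, are far more common than other values.

```go
package main

import (
	"fmt"
	"math"
)

// falsePointerOdds estimates the chance that a uniformly random, properly
// aligned pointer-sized word falls inside the heap:
// heap bytes / total address space bytes.
func falsePointerOdds(heapBytes float64, pointerBits int) float64 {
	return heapBytes / math.Exp2(float64(pointerBits))
}

func main() {
	const twoGB = 2 << 30 // 2^31 bytes
	fmt.Printf("32-bit: %.2f\n", falsePointerOdds(twoGB, 32))  // 0.50
	fmt.Printf("64-bit: %.3g\n", falsePointerOdds(twoGB, 64)) // 1.16e-10
}
```

The same two GB of objects that cover half of a 32-bit address space cover about one part in 8.6 billion of a 64-bit one.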
Many of these factors are apparently quite bad for 32-bit Go programs that use a significant amount of memory, especially with large objects and objects that the garbage collector treats conservatively. They are drastically reduced on 64-bit machines, where you would generally have to be unlucky for the conservative GC to accidentally keep a significant amount of memory busy. However, the problem can still happen with 64-bit Go; it's just less likely.
(The general reference for this is Go language issue 909.)
At this point I have no articulate personal reactions to all of this. As a pragmatic matter I'm not exactly writing Go programs right now for various reasons (although I keep vaguely wanting to because I like Go in the abstract), so if I'm being honest it's all kind of theoretical.
(My problem with Go in practice is partly that I have nothing to really use it on. I need to find a project that calls out for it instead of anything else.)
Sidebar: the 32-bit Windows issue
There's also an issue on 32-bit Windows machines due to address space fragmentation (via Hacker News). When it starts, the Go runtime tries to allocate a contiguous 512 Mbyte region of virtual address space. Sometimes on Windows, enough DLLs have been loaded at enough scattered addresses by this point that no contiguous chunk of that size is left, so the allocation fails and the Go runtime immediately exits with an error.
(In theory this sort of address space fragmentation could happen on any 32-bit OS, but apparently Windows is uniquely susceptible for various reasons.)
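A toy Go model of why this fails even when plenty of memory is free in total: what matters is the largest contiguous gap, not the total. The 2 GB user address space and the DLL locations below are assumptions picked for illustration, not measurements from any real Windows system.

```go
package main

import (
	"fmt"
	"sort"
)

// region is a used range of the address space: [start, end).
type region struct{ start, end uint64 }

// largestGap returns the size of the largest free contiguous range in
// [0, limit), given possibly unsorted or overlapping used regions.
func largestGap(used []region, limit uint64) uint64 {
	rs := append([]region(nil), used...)
	sort.Slice(rs, func(i, j int) bool { return rs[i].start < rs[j].start })
	var best, cursor uint64 // cursor = end of used space seen so far
	for _, r := range rs {
		if r.start > cursor && r.start-cursor > best {
			best = r.start - cursor
		}
		if r.end > cursor {
			cursor = r.end
		}
	}
	if limit > cursor && limit-cursor > best {
		best = limit - cursor
	}
	return best
}

func main() {
	const mb = 1 << 20
	const userSpace = 2048 * mb // assumption: 2 GB of user address space
	want := uint64(512 * mb)
	// Hypothetical DLLs scattered through the address space; together they
	// use only 27 MB, yet they break it into pieces smaller than 512 MB.
	dlls := []region{{400 * mb, 410 * mb}, {800 * mb, 805 * mb}, {1200 * mb, 1210 * mb}, {1600 * mb, 1602 * mb}}
	gap := largestGap(dlls, userSpace)
	fmt.Printf("largest free run: %d MB; 512 MB region fits: %v\n", gap/mb, gap >= want)
}
```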