Making modern FreeType-using versions of xterm display CJK characters

January 9, 2017

For a long time, my setup of xterm has not displayed Chinese and Japanese characters (or Korean ones, although I encounter those less often). Until recently it displayed the Unicode 'no such character' empty box in their place, which was okay and told me that there were problems, but after my upgrade to Fedora 25 it started showing spaces instead (or at least some form of whitespace). This is just enough extra irritation that I've been pushed into figuring out how to fix it.

I switched my xterm setup from old style bitmapped fonts to new style XFT/FreeType several years ago. It turns out that enabling CJK fonts in this environment is actually quite simple, as I found out from the Arch Linux wiki. All you need to do is to tell xterm what font to use for these characters with either the -fd command line argument or the faceNameDoublesize X resource (I recommend the latter, unless you already have a frontend script for xterm).
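As a minimal sketch of both approaches: the font names below ('DejaVu Sans Mono' and 'Noto Sans Mono CJK JP') are placeholders for whatever you have installed, and xterm_cjk_resources is a made-up helper name. The only parts that come from xterm itself are the faceName and faceNameDoublesize resources and the matching -fa and -fd options.

```shell
#!/bin/sh
# Emit X resource lines pairing a main xterm font with a double-width
# (CJK) font. Both font names passed in are placeholders, not
# recommendations.
xterm_cjk_resources() {
    main_font=$1
    cjk_font=$2
    printf 'XTerm*faceName: %s\n' "$main_font"
    printf 'XTerm*faceNameDoublesize: %s\n' "$cjk_font"
}

# Print the resources. Append the output to ~/.Xresources, or pipe it
# to 'xrdb -merge', to have new xterms pick it up.
xterm_cjk_resources 'DejaVu Sans Mono' 'Noto Sans Mono CJK JP'

# The equivalent one-off command line would be:
#   xterm -fa 'DejaVu Sans Mono' -fd 'Noto Sans Mono CJK JP'
```

Resources have the advantage that every xterm you start gets the CJK font without needing a wrapper script.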

Well, that elides a small but important detail, namely finding such a font. Modern fonts tend to have a lot more glyphs and language coverage than old fonts did, but common fonts like the monospaced font I'm using for xterm don't go quite as far as covering the CJK glyphs; instead this seems to be reserved for special fonts with extended ranges. Sophisticated systems like Gnome come magically set up to pick the right font(s) in gnome-terminal, but in xterm we're on our own to dig up a suitable font and I'm not quite sure what the right way to do that is.

As far as I know, fontconfig can be used to show us a list of fonts that claim to support a language, for example with 'fc-list :lang=zh-cn family'. The fontconfig documentation has a full list of the properties you can query for, and a more useful query may be 'fc-list :weight=medium:lang=zh-cn family', which restricts the list to medium-weight fonts and so drops the bold and other weight variants.

(It looks like you can find fonts that include a specific Unicode character by querying for 'charset=<hex codepoint>'.)
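The queries above can be wrapped up in a small script. This is only a sketch: lang_pattern and charset_pattern are hypothetical helper names, and U+4E2D (中) is just an arbitrary common character to test with. The fc-list calls are skipped if fontconfig's command line tools aren't installed.

```shell
#!/bin/sh
# Build a fontconfig pattern for a language tag, optionally restricted
# to a single weight (e.g. "medium") to hide bold and similar variants.
lang_pattern() {
    lang=$1
    weight=${2:-}
    if [ -n "$weight" ]; then
        printf ':weight=%s:lang=%s\n' "$weight" "$lang"
    else
        printf ':lang=%s\n' "$lang"
    fi
}

# Build a pattern matching fonts that contain one Unicode codepoint,
# given in hex without any "U+" prefix.
charset_pattern() {
    printf ':charset=%s\n' "$1"
}

# Only run the queries if fc-list is actually available.
if command -v fc-list >/dev/null 2>&1; then
    # Families that claim Simplified Chinese support, medium weight only.
    fc-list "$(lang_pattern zh-cn medium)" family | sort -u
    # Families that have a glyph for U+4E2D.
    fc-list "$(charset_pattern 4e2d)" family | sort -u
fi
```

Piping through 'sort -u' collapses the duplicate family names you otherwise get when one family is installed in several files.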

What I don't know is whether xterm requires its CJK font to be monospaced (I suspect it does if you want completely correct rendering) and if so, how you tell if any specific font is monospaced in its CJK glyphs. When I ask for 'fc-list :lang=zh-cn:spacing=mono', I get no fonts, although there are CJK fonts with 'Mono' in their names on my system and I'm using one of them in xterm without explosions so far. It may be that CJK fonts with 'Mono' in their name are monospaced in their CJK glyphs even if they are not monospaced in all glyphs. But then there is eevee's exploration into fontconfig, which suggests that 'monospace' in fontconfig is actually kind of arbitrary anyways.
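One way to poke at this is to print the spacing value that fontconfig has recorded for each font instead of filtering on it. The sketch below assumes fontconfig's documented numeric spacing constants (0 for proportional, 90 for dual, 100 for mono, 110 for charcell); spacing_name is a hypothetical helper for decoding them. A font that mixes single-width and double-width glyphs may be recorded as 'dual' rather than 'mono', which would be one possible explanation for the empty result, but that is only a guess.

```shell
#!/bin/sh
# Map fontconfig's numeric spacing values to their symbolic names.
spacing_name() {
    case $1 in
        0)   echo proportional ;;
        90)  echo dual ;;
        100) echo mono ;;
        110) echo charcell ;;
        *)   echo unknown ;;
    esac
}

# Only run the query if fc-list is actually available.
if command -v fc-list >/dev/null 2>&1; then
    # Print each matching family with its raw spacing value. Fonts
    # with no recorded spacing show only the family name.
    fc-list ':lang=zh-cn' family spacing | sort -u
fi
```

If the CJK fonts do turn out to report a spacing of 90, then 'fc-list :lang=zh-cn:spacing=dual family' might be a more useful filter than 'spacing=mono'.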

(The other thing I don't know how to do for xterm is set things up if you need multiple fonts in order to get full coverage of the CJK glyphs, possibly in a genuinely monospaced font. This is especially interesting because Google's Noto Sans fonts have a collection of 'Noto Sans Mono CJK <language>' fonts. There appears to be overlap between them, but it's not clear if you need to stitch up one (double-width) font for xterm out of them all or some subset.)

Comments on this page:

As a Chinese and Japanese speaker who follows your great blog, I'd like to comment on the 'Noto Sans Mono CJK <language>' fonts. They differ only in a small set of glyphs, because C/J/K users have different preferences for how certain glyphs should be written. The most obvious ones are 关 复 门. Simplified Chinese uses these Hanzi (Kanji) as simplified versions of 關/関 複/復 門, so they often appear in Simplified Chinese articles. The Japanese standard defined these glyphs for "Kanji radicals", so that you can write "関 is composed of 門 with 关". Unfortunately, the Unicode standard unified these two kinds of usage into a single codepoint. So in "Noto Sans Mono CJK JP", 关 is drawn narrower than other glyphs to reflect the fact that it is not a complete Kanji, while in "Noto Sans Mono CJK CN" 关 is drawn full-width.

So back to your blog, both "Noto Sans Mono CJK CN" and "Noto Sans Mono CJK JP" cover the full set of CJK glyphs. They just have different writing styles for a small set of glyphs. Personally I recommend using the CN variant because Simplified Chinese uses those characters far more often than Japanese does. But for some reason I don't know, Chrome (Skia) seems to prefer the JP variant when you don't specify one.

More details:

Written on 09 January 2017.

Last modified: Mon Jan 9 01:26:03 2017