One advantage of 'self-hosted' languages

December 5, 2016

One of the things that people like to do with languages (and language runtime environments) is to make them 'self-hosted'. A self-hosted language is one where the compiler (or interpreter) is almost entirely written in the language itself, instead of being written in another language such as C.

I don't know all of the reasons that people have for self-hosting languages, since I've never participated in language development. But from an outsider's perspective, I can think of one fairly obvious reason to want to self-host your language, which is that it probably increases the number of people who can work on your compiler by reducing what they need to know.

To work on a language (or its runtime), you generally need to know the language itself, and obviously you need to know the language that the compiler or interpreter or runtime is written in. When language X is written in language Y, this means that you need to know both X and Y. When language X is written in itself, you only need to know X. And if you're interested in working on something involving language X, you probably already know X.

(In theory you could imagine situations where people who know only language Y could improve the compiler for language X by working on internals with well-defined semantics, like symbol table handling or the like. In practice I think that the people who will be interested in doing such work in the first place are people who are interested in language X.)

Sidebar: The case of LLVM as the exception that proves the rule

LLVM is an increasingly popular compiler backend written in C++ that is used by a number of languages, for example Rust. Obviously this means that these languages aren't fully self-hosted and probably never will be, even when (as with Rust) the front end is written in the language itself; fully self-hosting would require them to duplicate on their own a significant amount of the time and effort that has been poured into LLVM.

That statement right there is, I think, a big reason why people use LLVM. When you use LLVM as your compiler backend, you get to tap into all of the work that other people have done on it (and will continue to do in the future). You get a free ride on a high-quality compiler backend, and for some projects this free ride is definitely worth some narrowing of the pool of contributors to your language's front end.

What makes the LLVM situation work in your favour is that it's a shared backend and so attracts people to work on it who don't care about your language (and who probably don't know anything about it; they can work on the LLVM backend despite this because it has well-defined interfaces and APIs). An un-shared compiler or interpreter backend doesn't get this advantage.

Written on 05 December 2016.

Last modified: Mon Dec 5 01:16:03 2016