V8's neat encoding trick for type tracking

August 22, 2011

Courtesy of v8: a tale of two compilers I ran across a neat trick in how Google's V8 JavaScript compiler encodes types during runtime type tracking. Today I feel like writing it down to make sure that I really understand it.

One of the ways that a runtime compiler speeds up a dynamic language is by generating specialized code that operates only on specific types of objects using more and more native values and code; if your loop is actually only dealing with integers, the compiler wants to compile it down to integer math operations on native integers. To do this the compiler has to track what types of actual objects your code handles as it runs; this is generally called type feedback.

As part of doing this, V8 has a hierarchy of types that it tracks, or really a hierarchy of type distinctions (presumably based on specializations it can make in the generated code). The distinctions are: non-primitive types versus primitive types, then strings versus numbers, doubles versus integers, and finally full-sized integers or small integers. If your code always handles values that have the same type you'll stay at the appropriate spot in the hierarchy, but if the types of the values change V8 must combine your old type information with the new type to determine your new spot in the hierarchy.

To store this information compactly and do this combining operation very efficiently, V8 has come up with a neat encoding trick. Mechanically, V8 represents this type information as an 8-bit bitmap and it combines new type information into the bitmap by and'ing the new type into the current type information. Well, sort of. What it really is doing is using and to selectively remove bits from the type information as the types involved become less and less specific. Let me show you the actual encodings, written as bits (looking at these is how I came to understand what was going on and how it worked):

uninitialized 1111111
non-primitive 1000000
string 0110000
double 0011001
small integer 0010111
integer 0010011
number 0010001
primitive 0010000
unknown 0000000

(Taken from V8's type-info.h.)

We start out with all bits set in the uninitialized state. As we move further and further up the hierarchy and thus know less and less about the specific types involved, the encoding knocks more and more bits out; eventually, we arrive at an encoding with no bits set (when a particular spot of code deals with both primitive types and non-primitive types). Written this way you can see the cascade of removed bits as things get less and less specific (and how related types in the hierarchy share bits), and how the bit patterns are carefully chosen to make this work.

(There are a lot of specific variations possible on this encoding, obviously, since the important thing is how many common bits you have between each type, not exactly where in the bitmap they occur. I assume that Google chose this specific one because it was convenient in some way.)

We can work this in reverse to deduce how many bits a particular sort of type information has set based on how far down the hierarchy it is. Primitive and non-primitive are one step down, so they each have one bit set. Strings and numbers are two steps down, so they have two bits set (one bit from Primitive and one extra bit to tell them apart). And so on down until you get to the furthest step away, small integers, with four bits set. This doesn't necessarily help in understanding V8, but may help if you want to use the same trick in some other context and thus need to come up with your own encoding.

(I believe but have not proved that you need as many total bits as you have steps in your type hierarchy; here V8 has seven steps (not counting either 'uninitialized' or 'unknown') and uses seven bits.)


Written on 22 August 2011.
« The conflict between caching and tracking on the web
Reading the tea leaves about Oracle, Solaris, and universities »

These are my WanderingThoughts
(About the blog)

Full index of entries
Recent comments

This is part of CSpace, and is written by ChrisSiebenmann.
Twitter: @thatcks

* * *

Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web

This is a DWiki.
GettingAround
(Help)

Search:

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Aug 22 00:18:49 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.