V8's neat encoding trick for type tracking
August 22, 2011
One of the ways that a runtime compiler speeds up a dynamic language is by generating specialized code that operates only on specific types of objects using more and more native values and code; if your loop is actually only dealing with integers, the compiler wants to compile it down to integer math operations on native integers. To do this the compiler has to track what types of actual objects your code handles as it runs; this is generally called type feedback.
As part of doing this, V8 has a hierarchy of types that it tracks, or really a hierarchy of type distinctions (presumably based on specializations it can make in the generated code). The distinctions are: non-primitive types versus primitive types, then strings versus numbers, doubles versus integers, and finally full-sized integers or small integers. If your code always handles values that have the same type you'll stay at the appropriate spot in the hierarchy, but if the types of the values change V8 must combine your old type information with the new type to determine your new spot in the hierarchy.
To store this information compactly and do this combining operation very
efficiently, V8 has come up with a neat encoding trick. Mechanically,
V8 represents this type information as an 8-bit bitmap and it combines
new type information into the bitmap by and'ing the new type into the
current type information.
Well, sort of. What it really is doing is using
(Taken from V8's type-info.h.)
We start out with all bits set in the uninitialized state. As we move further and further up the hierarchy and thus know less and less about the specific types involved, the encoding knocks more and more bits out; eventually, we arrive at an encoding with no bits set (when a particular spot of code deals with both primitive types and non-primitive types). Written this way you can see the cascade of removed bits as things get less and less specific (and how related types in the hierarchy share bits), and how the bit patterns are carefully chosen to make this work.
(There are a lot of specific variations possible on this encoding, obviously, since the important thing is how many common bits you have between each type, not exactly where in the bitmap they occur. I assume that Google chose this specific one because it was convenient in some way.)
We can work this in reverse to deduce how many bits a particular sort of type information has set based on how far down the hierarchy it is. Primitive and non-primitive are one step down, so they each have one bit set. Strings and numbers are two steps down, so they have two bits set (one bit from Primitive and one extra bit to tell them apart). And so on down until you get to the furthest step away, small integers, with four bits set. This doesn't necessarily help in understanding V8, but may help if you want to use the same trick in some other context and thus need to come up with your own encoding.
(I believe but have not proved that you need as many total bits as you have steps in your type hierarchy; here V8 has seven steps (not counting either 'uninitialized' or 'unknown') and uses seven bits.)
Written on 22 August 2011.
* * *