Wandering Thoughts archives

2019-06-15

Some notes on Intel's CPUID and how to get it for your CPUs

In things like Intel's MDS security advisory, Intel likes to identify CPU families with what they call a 'CPUID', which is a hex number. For example, the CPUID of the Sandy Bridge Xeon E5 'Server Embedded' product family is listed by Intel as 206D7, the CPUID of the Westmere Xeon E7 family is 206F2, and the CPUID of the Ivy Bridge Xeon E7 v2 family is 306E7. Given that one of these families has a microcode update theoretically available, one of them is supposed to get it sometime, and one will not get a microcode update, it has become very useful to be able to find out the CPUID of your Intel processors (especially given Intel's confusing Xeon names).

On x86 CPUs, this information comes from the CPU via the CPUID instruction, which provides all sorts of information (including the brand name of the processor itself, which the processor directly provides in ASCII). Specifically, it is the 'processor version information' that you get from using CPUID to query the Processor Info and Feature Bits. Many things will tell you this information, for example Linux's /proc/cpuinfo and lscpu, but they decode what it represents to give you the CPU family, model, and stepping (using a somewhat complicated algorithm that is covered in that Wikipedia entry on CPUID). Intel's 'CPUID' is this raw value directly in hex, and I don't know if you can reliably reverse a given family/model/stepping triplet back into the definitive CPUID (I haven't tried to do it).
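
For illustration, here is a rough Go sketch of what that decoding amounts to. The decode function is my own made-up version, following the documented bit layout of CPUID leaf 1; the 0x906ea test value comes from my home machine, which shows up later:

package main

import "fmt"

// decode pulls the displayed family, model, and stepping out of the raw
// 'processor version information' in EAX from CPUID leaf 1.
func decode(eax uint32) (family, model, stepping uint32) {
        stepping = eax & 0xf
        baseModel := (eax >> 4) & 0xf
        baseFamily := (eax >> 8) & 0xf
        extModel := (eax >> 16) & 0xf
        extFamily := (eax >> 20) & 0xff

        family = baseFamily
        model = baseModel
        // The extended fields only count for certain base family values,
        // which is part of what makes this decoding non-obvious.
        if baseFamily == 0xf {
                family = baseFamily + extFamily
        }
        if baseFamily == 0x6 || baseFamily == 0xf {
                model = (extModel << 4) + baseModel
        }
        return family, model, stepping
}

func main() {
        f, m, s := decode(0x906ea)
        // Prints 'family 6 model 158 stepping 10', which is what
        // /proc/cpuinfo reports for an i7-8700K.
        fmt.Printf("family %d model %d stepping %d\n", f, m, s)
}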

(Intel's MDS PDF also lists a two-hex-digit 'Platform ID'. I don't know where this comes from or how you find out what yours is. I thought I found some hints, but they don't appear to give the right answer on my test machine.)

There are a variety of ways to get the Intel CPUID in raw hex. The most brute-force method, and perhaps the simplest, is to write a program that uses the CPUID instruction to get it directly. Keen people can use C with inline assembly, but I used Go with a third-party package that I found through the obvious godoc.org search:

package main

import (
  "fmt"

  "sigs.k8s.io/node-feature-discovery/pkg/cpuid"
)

func main() {
  // Ask for CPUID leaf 0x01; the raw 'processor version information'
  // comes back in EAX, which is the number Intel quotes in hex.
  r := cpuid.Cpuid(0x01, 0x00)
  fmt.Printf("cpuid: %x\n", r.EAX)
}

This has the great benefit of Go for busy sysadmins: it compiles to a static binary that will run on any machine regardless of what packages you have installed, and you can pretty much cross-compile it for other Unixes if you need to (at least for 64-bit x86 Unixes; people with 32-bit x86 Unixes are out of luck here without some code changes, but this package may help).
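
(For example, assuming you're in the directory with the program and have fetched the package, building a native binary and cross-compiling a FreeBSD one is just:

; go build
; GOOS=freebsd GOARCH=amd64 go build

As far as I can tell the cpuid package sticks to Go and Go assembly rather than cgo, which is what makes cross-compiling this painless.)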

(Intel also has a CPUID package for Go, but it wants to decode this information instead of just giving it to you literally so that you can print the hex number Intel uses in its documentation. I wish Intel's left hand would talk to its right hand here.)

On Linux machines, you may have the cpuid program available as a package, and I believe it's also in FreeBSD ports in the sysutils section (and FreeBSD has another 'cpuid' program that I know nothing about). Cpuid normally decodes this information, as everything does, but you can get it to dump the raw information and then read out the one field of one line you care about, which is the 'eax' field in the line that starts with '0x00000001':

; cpuid -1 -r
CPU:
   0x00000000 0x00: eax=0x00000016 ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x00000001 0x00: eax=0x000906ea ebx=0x04100800 ecx=0x7ffafbff edx=0xbfebfbff
[...]

(This is my home machine, and the eax of 0x000906ea matches the CPUID of 906EA that Intel's MDS PDF says that an i7-8700K should have.)

Perhaps you see why I think a Go program is simpler and easier.

IntelCPUIDNotes written at 23:27:09

2019-06-09

Go recognizes and specially compiles some but not all infinite loops

A while back, for my own reasons, I wrote an 'eatcpu' program to simply eat some number of CPUs' worth of CPU time (by default, all of the CPUs on the machine). I wrote it in Go, because I wanted a straightforward program and life is too short to deal with threading in C. The code that uses CPU is simply:

func spinner() {
        var i int
        for {
                i++
        }
}

At the time I described this as a simple integer CPU soaker, since the code endlessly increments an int. Recently, to make my life more convenient, I decided to put it up on Github, and as part of that I decided that I wanted to actually know what it was doing; specifically I wanted to know if it actually was all running in CPU registers or if Go was actually loading and storing from memory all of the time. I did this in the straightforward way of running 'go tool compile -S' (after some research) and then reading the assembly. It took me some time to understand what I was reading and believe in it, because here is the entire assembly that spinner() compiles down to:

0x0000 00000 (eatcpu.go:27)     JMP     0
0x0000 eb fe

(The second line is the actual bytes of object code.)

Go 1.12.5 had recognized that I had an infinite loop with no outside effects and had compiled it down to nothing more than that. Instead of endless integer addition, I had an endless JMP, which was probably using almost none of the CPU's circuitry (certainly it doesn't need to use the integer ALU).

The Go compiler is clever enough to recognize that a variation of this is still an infinite loop:

func spinner2() int {
        var i int
        for {
                i++
        }
        return i
}

This too compiles down to 'JMP 0', since it can never exit the for loop to return anything.

However, the Go compiler does not recognize impossible situations as being infinite loops. For example, we can write the following:

func spinner3() uint {
        var i uint
        for ; i >= 0 ; {
                i++
        }
        return i
}

Since i is an unsigned integer, the for condition is always true and the loop will never exit. However, Go 1.12.5 compiles it to actual arithmetic and looping code, instead of just a 'JMP 0'. The core of the assembly code is:

0x0000  XORL    AX, AX
0x0002  JMP     7
0x0004  INCQ    AX
0x0007  TESTQ   AX, AX
0x000a  JCC     4
0x000c  MOVQ    AX, "".~r0+8(SP)
0x0011  RET

(The odd structure is because of how plain for loops are compiled. The exit check is relocated to the bottom of the loop, and then on initial loop entry, at 0x0002, we skip over the loop body to start by evaluating the exit check.)

If I'm understanding the likely generated x86 assembly correctly, this will trivially never exit; TESTQ likely compiles to some version of TEST, which unconditionally clears CF (the carry flag), and JCC jumps if the carry flag is clear.

(The Go assembler's JCC is apparently x86's JAE, per here, and per this x86 JUMP quick reference, JAE jumps if CF is clear. Since I had to find all of that and follow things through, I'm writing it down.)

On the whole, I think both situations are reasonable. Compiling infinite for loops to straight JMPs is perfectly reasonable, since they do get used in real Go code, and so is eliminating operations that have no side effects; put them together and spinner() turns into 'JMP 0'. On the other hand, the unsigned int comparison in spinner3() should never happen in real, non-buggy code, so it's probably fine for the optimizer to not recognize that it's always true and thus that this creates an infinite loop with no outside effects.

(There is little point to spending effort on optimizing buggy code.)

PS: I don't know if there's already a Go code checker that looks for unsigned-related errors like the comparison in spinner3(), but if there isn't there is probably room for one.
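
To illustrate the sort of thing such a checker would be looking for, here is a made-up example of how this bug class can turn up in more realistic code, in the form of an unsigned loop index that is supposed to count down to zero:

func lastIndex(s []byte, b byte) uint {
        // 'i >= 0' is always true for a uint, so this loop can never
        // exit through its condition; it can only stop by wrapping
        // around past zero and then panicking on an out of range index.
        for i := uint(len(s)) - 1; i >= 0; i-- {
                if s[i] == b {
                        return i
                }
        }
        return 0
}

The author of a loop like this meant 'stop when i drops below zero', which an unsigned i can never do.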

GoInfiniteLoopOptimization written at 21:56:40

2019-06-05

Go channels work best for unidirectional communication, not things with replies

Once, several years ago, I wrote some Go code that needed to manipulate a shared data structure. At this time I had written and read less Go code than I have now, and so I started out by trying to use channels and goroutines for this. There would be one goroutine that directly manipulated the data structure; everyone else would ask it to do things over channels. Very rapidly this failed and I wound up using mutexes.

(The pattern I tried is what I have since seen called a monitor goroutine (via).)

Since then, I have come to feel that this is one regrettable weakness of Go channels. However nice, useful, and convenient they are for certain sorts of communication patterns, Go channels do not give you very good ways of implementing an 'RPC' communication pattern, where you make a request of another goroutine and expect to get an answer back, since there is no direct way to reply to a channel message. In order to be able to reply to the sender, your monitor goroutine must receive a unique reply channel as part of the incoming request, and then things can start getting much more complicated and tangled from there (with various interesting failure modes if anyone ever makes a programming mistake; for example, you really want to insist that all reply channels are buffered).
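
To make the bureaucracy concrete, here is a rough sketch of the shape the pattern takes; all of the names are made up for illustration and this is not code from the program I was writing:

// Every request has to carry its own reply channel.
type getRequest struct {
        key   string
        reply chan string // buffered, so the monitor can never block on it
}

// The monitor goroutine is the only thing that touches the data.
func monitor(requests <-chan getRequest, data map[string]string) {
        for req := range requests {
                req.reply <- data[req.key]
        }
}

// Every caller has to create a reply channel, send it over, and wait.
func get(requests chan<- getRequest, key string) string {
        reply := make(chan string, 1)
        requests <- getRequest{key: key, reply: reply}
        return <-reply
}

Even this minimal version needs a request type, a fresh reply channel per call, and a deliberate decision about buffering, and it only implements a single read-only operation on the data structure.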

My current view is that Go channels work best for unidirectional communication, where either you don't need an answer to the message you've sent or it doesn't matter which goroutine in particular receives and processes the 'reply' (really the next step), so you can use a single shared channel that everyone pulls messages from. Implementing some sort of bidirectional communication between specific goroutines with channels is generally going to be painful and require a bunch of bureaucracy that will complicate your code (unless all of the goroutines are long-lived and have communication patterns that can be set up once and then left alone). This makes the "monitor goroutine" pattern a bad idea simply for code clarity reasons, never mind anything else like performance or memory churn.
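
The good case looks much simpler. For example (again with made-up names), any number of worker goroutines can pull from one shared channel and push results onward without knowing or caring who is on the other end of either channel:

// process stands in for whatever the real per-item work is.
func process(w string) string { return w + " done" }

func worker(work <-chan string, results chan<- string) {
        for w := range work {
                results <- process(w)
        }
}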

(The monitor goroutine approach is especially painful if you have a bunch of different requests to send to the one goroutine, each of which can get a different reply, because then you need a bunch of different channel types unless you're going to smash everything together in various less and less type-safe ways. The more methods you would implement on your shared data structure, the more painful doing everything through a monitor goroutine will be.)

I'm not sure there's anything that Go could do to change this, and it's not clear to me that Go should. Go is generally fairly honest about the costs of operations, and using channels for synchronization is more expensive than a mutex and probably always will be. If you have a case where a mutex is good enough, and a shared data structure is a great case, you really should stick with simple and clearly correct code; that it performs well is a bonus. Channels aren't the answer to everything and shouldn't try to be.
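
For comparison, the mutex version of the shared map from before is short and pretty clearly correct (the names are again made up):

import "sync"

type store struct {
        mu   sync.Mutex
        data map[string]string
}

func (s *store) get(key string) string {
        s.mu.Lock()
        defer s.mu.Unlock()
        return s.data[key]
}

Adding more operations is just a matter of adding more methods that take the lock, with no new channels or request types.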

(Years ago I wrote Goroutines versus other concurrency handling options in Go about much the same issues, but my thinking about what goroutines were good and bad at was much less developed then.)

(This entry was sparked by reading Golang: Concurrency: Monitors and Mutexes, A (light) Survey, because it made me start thinking about why the "monitor goroutine" pattern is such an awkward one in Go.)

GoChannelsAndReplies written at 01:02:59
