Checking RAM DIMM information from inside Linux

December 21, 2017

Suppose, not entirely hypothetically, that you have some machines where you don't know exactly what their DIMMs are and how they're set up, and you'd like to. Obviously you can find out all of this information if you take the machine down, open it up, and inventory the DIMMs, but fortunately for your uptime you can extract a surprisingly large amount of information from within Linux without having to go that far.

If you have a NUMA machine, you can get a bunch of information about the NUMA memory hierarchy. This doesn't directly give you DIMM-level information, but may be necessary in order to figure out how your DIMMs are split up among sockets, NUMA zones, and so on.
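
As a rough illustration of the NUMA side, here is a minimal Python sketch that reads the per-node memory totals the kernel exposes under /sys/devices/system/node. The function names are my own invention, and it assumes the usual 'Node N MemTotal: ... kB' line format in each nodeN/meminfo file; treat it as a starting point rather than a finished tool.

```python
import glob
import re

def parse_node_meminfo(text):
    """Return MemTotal in kB from the contents of one nodeN/meminfo file.

    Returns None if no 'Node N MemTotal:' line is found.
    """
    m = re.search(r"^Node\s+\d+\s+MemTotal:\s+(\d+)\s+kB", text, re.MULTILINE)
    return int(m.group(1)) if m else None

def numa_memory_totals(base="/sys/devices/system/node"):
    """Map each NUMA node directory name (e.g. 'node0') to its MemTotal in kB."""
    totals = {}
    for path in sorted(glob.glob(base + "/node[0-9]*/meminfo")):
        node = path.split("/")[-2]
        with open(path) as f:
            totals[node] = parse_node_meminfo(f.read())
    return totals
```

Calling numa_memory_totals() on a NUMA machine should give you one entry per node; on a machine without that sysfs directory it simply returns an empty dict.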

The default first stop is often dmidecode, which interprets DMI/SMBIOS information that's set up by the BIOS before Linux is booted. The BIOS pulls this information from magical sources, but it's usually accurate. You get the DIMM information with 'dmidecode --type memory' (generally run as root), but which fields you get and what's in them can vary a lot from system to system. The actual DIMMs are the 'Memory Device' entries, and may come out like this:

Handle 0x001E, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x001D
        Error Information Handle: 0x002E
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM_A1
        Bank Locator: CPU1
        Type: DDR3
        Type Detail: Synchronous Registered (Buffered)
        Speed: 1600 MHz
        Manufacturer: CE00B304CE00
        Serial Number: 34BD54B9
        Asset Tag: 02411221
        Part Number: M393B1K70DH0-CK0 
        Rank: 2
        Configured Clock Speed: 1600 MHz
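
If you want to process this output across a bunch of machines, something like the following Python sketch may help. It splits 'dmidecode --type memory' output into one dict per 'Memory Device' entry, keyed by the field names as printed. The function names are hypothetical, and it assumes the layout shown above (a 'Handle' line, a section title, then 'Key: value' lines, with a blank line between sections); fields with empty values are skipped.

```python
import subprocess

def parse_memory_devices(text):
    """Return a list of dicts, one per 'Memory Device' section in the text."""
    devices = []
    current = None  # dict being filled, or None when outside a Memory Device
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Memory Device":
            current = {}
            devices.append(current)
        elif not stripped or stripped.startswith("Handle "):
            # A blank line or a new handle ends whatever section we were in.
            current = None
        elif current is not None and ": " in stripped:
            key, value = stripped.split(": ", 1)
            current[key] = value.strip()
    return devices

def read_dmidecode_memory():
    """Run dmidecode (normally needs root) and return its text output."""
    result = subprocess.run(
        ["dmidecode", "--type", "memory"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

With this, parse_memory_devices(read_dmidecode_memory()) gives you the DIMM slots as plain dicts, so you can pull out 'Locator', 'Size', 'Part Number' and so on without more text munging.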

Now, here is a tricky question: is this an ECC DIMM, and is it being used in an ECC capable system? Your inclination may be to say it clearly isn't, since the total width is only 64 bits, the same as the data width, which leaves no room for ECC bits. Unfortunately, if you search for the part number on the Internet, you'll discover that it's Samsung ECC DDR3 memory, and the server is a Dell C6220 blade that is definitely ECC capable. Perhaps something has gone wrong, but my default assumption is that there's ECC and the SMBIOS information simply isn't reporting it for some reason.

(dmidecode will report a section on 'Physical Memory Array' that includes an 'Error Correction Type', but it's apparently not clear if this represents the maximum capabilities or the current realities. PC vendors being PC vendors, it probably varies, especially on desktop systems.)

Having looked at a number of our servers, my conclusion is that if dmidecode reports a 'Total Width' larger than the 'Data Width' (typically 72 and 64), you can definitely conclude that you have ECC DIMMs. If it also reports that ECC is enabled in the 'Physical Memory Array' section, ECC is almost certainly on. Otherwise, who knows, short of the kernel actually complaining about ECC problems.
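
That rule of thumb is easy to write down in code. The sketch below is only my heuristic restated in Python, with made-up function names; it takes a dict of dmidecode 'Memory Device' fields (however you got them) and deliberately answers 'unknown' rather than 'no' when the widths are equal, because of DIMMs like the one above.

```python
def width_bits(value):
    """Turn a field such as '72 bits' into 72; return None for 'Unknown' etc."""
    parts = value.split()
    return int(parts[0]) if parts and parts[0].isdigit() else None

def ecc_guess(device):
    """Return 'ecc' if Total Width exceeds Data Width, otherwise 'unknown'.

    Equal widths do not prove the DIMM is non-ECC, since the SMBIOS data
    may simply fail to report the extra bits.
    """
    total = width_bits(device.get("Total Width", ""))
    data = width_bits(device.get("Data Width", ""))
    if total is not None and data is not None and total > data:
        return "ecc"
    return "unknown"
```

Fed the DIMM shown earlier (64 bits and 64 bits), this says 'unknown', which is about as honest an answer as the SMBIOS data allows.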

The same information can be obtained through 'lshw -C memory'. This is somewhat more compact and sometimes can decode more things. For example, for the same DIMM, it reports:

    *-bank:0
         description: DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
         product: M393B1K70DH0-CK0
         vendor: Samsung
         physical id: 0
         serial: 34BD54B9
         slot: DIMM_A1
         size: 8GiB
         width: 64 bits
         clock: 1600MHz (0.6ns)

Here the vendor is correctly reported as Samsung, but we've lost the 'bank locator' that in this case tells us which CPU socket the DIMM is attached to. lshw gets its DIMM information from the same DMI/SMBIOS data; it just prints it differently than dmidecode does.

On some servers, the IPMI system may provide some degree of access to DIMM information, generally under some sort of 'asset management' tag. It's possible that you can get at this information with things like ipmitool, but you may need to talk to the BMC in another way, for example through a web browser to the BMC's web interface (if it has one).

I wish I had better news to report, but as far as I know that's it for finding out DIMM information. You can at least get basic information, which is good enough to answer questions like 'are all the DIMM slots filled on this server' or 'where are all our 8 GB DIMMs', and I think things like the speed information and the part numbers are broadly trustworthy (the speed information probably somewhat more than the part numbers, because less probably breaks if the DIMMs report crazy part numbers to the BIOS and the usual aphorism applies).
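
For the 'are all the DIMM slots filled' sort of question, a small Python sketch along these lines may be enough. It assumes you already have the 'Memory Device' entries as dicts keyed by dmidecode's field names, and that empty slots report a non-numeric 'Size' (dmidecode normally prints 'No Module Installed' for them). The function names are hypothetical.

```python
from collections import Counter

def is_populated(device):
    """A slot counts as populated if its Size field starts with a digit."""
    return device.get("Size", "")[:1].isdigit()

def slot_summary(devices):
    """Summarize which slots are empty and how many DIMMs of each size exist."""
    populated = [d for d in devices if is_populated(d)]
    empty = [d.get("Locator", "?") for d in devices if not is_populated(d)]
    sizes = Counter(d["Size"] for d in populated)
    return {"empty_slots": empty, "sizes": dict(sizes)}
```

The 'sizes' count is keyed by the size string exactly as reported, so '8192 MB' and '8 GB' would be counted separately if different machines report sizes in different units.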
