What data about your NVMe drives Linux puts in sysfs

November 18, 2021

Linux has a habit of exposing information about various devices in sysfs (normally visible on /sys). NVMe drives are one such device, especially since they're also PCIe devices. Recently I found myself wondering what information is exposed here. The answer turns out to be less than I expected.

Information for any given NVMe drive is found in sysfs in /sys/class/nvme/nvmeN (as usual, this is a symlink to a subdirectory of the actual PCIe device in /sys/devices). The useful files here seem to be model (the 'model name' of the drive, whatever your drive vendor thought that should be), serial (its serial number), firmware_rev (its claimed firmware revision), and state, which will normally be "live" for an active drive. There's also a hwmonN subdirectory that contains drive temperature information, especially hwmonN/temp1_input, the drive's composite temperature. As seems to be the standard for hardware temperature sensors, the value is in degrees C multiplied by a thousand (so a reported value of '30850' means 30.85 C).

(There's also subsysnqn, which is the NVMe Qualified Name (NQN) of the drive's NVM subsystem; NQNs are the standard NVMe naming scheme for subsystems. You'll have to do your own Internet searches for the details.)
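To make this concrete, here is a minimal Python sketch that reads these files for every drive. It assumes the /sys/class/nvme/nvmeN layout described above and that the temperature file is hwmonN/temp1_input (standard hwmon naming for the first temperature channel). The function and constant names are made up for illustration. The sysfs root is a parameter, so the same code can be pointed at a copy of the tree.

```python
import glob
import os

# Assumption: the usual sysfs location; overridable for testing.
SYSFS_NVME = "/sys/class/nvme"

IDENTITY_FILES = ("model", "serial", "firmware_rev", "state", "subsysnqn")


def read_attr(path):
    """Return the stripped contents of a sysfs file, or None if it can't be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def nvme_drive_info(root=SYSFS_NVME):
    """Map each nvmeN name to its identity attributes and temperature in degrees C."""
    drives = {}
    for dev in sorted(glob.glob(os.path.join(root, "nvme*"))):
        info = {name: read_attr(os.path.join(dev, name)) for name in IDENTITY_FILES}
        # hwmon reports millidegrees C, so '30850' becomes 30.85.
        temp_files = glob.glob(os.path.join(dev, "hwmon*", "temp1_input"))
        raw = read_attr(temp_files[0]) if temp_files else None
        info["temp_c"] = int(raw) / 1000 if raw is not None else None
        drives[os.path.basename(dev)] = info
    return drives
```

Missing or unreadable files come back as None rather than raising, since not every drive or kernel version necessarily provides every attribute.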

Modern versions of smartctl can report on various NVMe device health metrics (the NVMe specification calls this the "SMART / Health Information" log page, so these are the NVMe version of SMART attributes); details on this are in the NVMe support smartmontools wiki page. You can also do this with the nvme-cli tools (for example, 'nvme smart-log'). As far as I can see, the only part of this information that shows up in sysfs is the temperature.

While this is a little disappointing, it's also pretty much how the kernel traditionally behaves for other devices. The kernel needs to read a certain amount of identifying information about the device when it first probes it, so it might as well put that in sysfs, but after that the kernel mostly avoids going out to the device for current information. Maybe it would be different if NVMe devices exposed this sort of information directly through PCIe, instead of requiring you to send the drive a special request (a Get Log Page admin command) to get it.

(It would be nice if all of this was in sysfs because then you could do health monitoring purely through sysfs, instead of having to periodically run smartctl or nvme and then post-process their output.)
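If you do go the smartctl route, the post-processing can at least work from structured output, since smartmontools 7.0 and later can emit JSON with -j. The following is a hedged Python sketch. The key names under nvme_smart_health_information_log are an assumption about what recent smartctl versions produce, so check them against your own output. needs_attention() is a hypothetical policy, not anything smartctl defines.

```python
import json
import subprocess

# Assumption: key names as emitted by recent smartctl -j for NVMe drives.
HEALTH_KEYS = ("critical_warning", "temperature", "available_spare",
               "percentage_used", "media_errors")


def smartctl_json(device):
    """Run 'smartctl -j -A DEVICE' and return the parsed JSON report."""
    # smartctl's exit status is a bitmask that can be nonzero even when it
    # printed a complete report, so don't treat nonzero as failure here.
    proc = subprocess.run(["smartctl", "-j", "-A", device],
                          capture_output=True, text=True)
    return json.loads(proc.stdout)


def nvme_health(report):
    """Pull a few NVMe health fields out of a parsed report (None if absent)."""
    log = report.get("nvme_smart_health_information_log", {})
    return {key: log.get(key) for key in HEALTH_KEYS}


def needs_attention(health, spare_floor=10):
    """Hypothetical policy: any critical warning, any media errors, or low spare."""
    spare = health["available_spare"]
    return (bool(health["critical_warning"])
            or bool(health["media_errors"])
            or (spare is not None and spare < spare_floor))
```

Something like needs_attention(nvme_health(smartctl_json("/dev/nvme0"))), run periodically, would reduce the report to a single yes or no that is easy to feed into existing monitoring.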


Comments on this page:

By Zev Weiss at 2021-11-21 18:38:58:

As seems to be the standard for hardware temperature sensors, the value is in degrees C multiplied by a thousand (so a reported value of '30850' means 30.85 C).

Note that using millidegrees C is a convention of the Linux kernel's hwmon subsystem; it is the responsibility of hwmon drivers to convert to that from whatever units/format the hardware they talk to provides (which vary wildly; see Figure 90 here for the format used in NVMe-MI for example).

By Zev Weiss at 2021-11-21 19:10:22:

(Though I should clarify that NVMe-MI is probably not the source of what you're seeing in sysfs, but more likely what a BMC would have access to over SMBus.)

Written on 18 November 2021.

Last modified: Thu Nov 18 23:40:05 2021