Linux disk names you can encounter in your Prometheus host metrics
We recently added our first server with NVMe drives to our fleet. Naturally we hooked it up to our Prometheus setup so it would show up on our Grafana dashboards, including the dashboards for disk IO. This turned up a little issue, which is that Linux names NVMe drives very differently than SATA (and SAS) drives. Our dashboards had previously been looking only at Prometheus disk metrics for devices with 'sd.*' names, so they needed some changes to add nvme.* names. This set me to wondering what sort of disk device names showed up in our Prometheus metrics across our fleet.
The Prometheus device names come from /proc/diskstats, with some devices excluded by the host agent (see the flag --collector.diskstats.ignored-devices). If I'm decoding it correctly, the host agent's 'ignore' regular expression for devices excludes partitions on common types of disks, and also all 'ram' and 'loop' devices. The current regular expression doesn't exclude partitions on software RAID devices (which are named 'mdNpN'). What is left is:
- real 'sd' disks (of whatever sort, since these days these may be SATA, SAS, USB, SCSI, or probably other things too), and NVMe drives under their namespaces. These disks are called 'sd<letter>' or 'nvme<number>n<number>'.
- software RAID 'md' devices, 'md<number>', and (until this is fixed) the rare partitioned software RAID device, 'md<number>p<number>'.
- device-mapper 'dm-<number>' software devices, which may come from any number of things including LVM and LUKS disk encryption.
- zram disks, 'zram<number>', which are starting to be used by Linux distributions for swap.
- 'xvd<letter>' Xen disks and 'vd<letter>' virtio disks.
- 'sr<number>' devices for CD and DVD drives connected by SATA, USB, or perhaps even SAS.
- more exotic things like floppy disks ('fd<number>') and probably IDE drives. There are also various sorts of network block devices, the most common of which may be DRBD, whose devices I believe are named 'drbd<number>'.
(The Prometheus host agent doesn't try to ignore partitions on the more exotic disk devices; its regular expression for them mostly covers the hd, sd, vd, and xvd prefixes.)
Looking in the kernel source suggests that there are quite a number of possible disk names if you go trawling in the depths of unusual architectures and drivers.
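As a rough illustration of the name forms listed above, here is a minimal Python sketch that sorts a device name into one of those categories. The patterns and the classify() helper are my own assumptions built from the naming described here (with 'nvme<number>n<number>' assumed for NVMe namespaces, as in 'nvme0n1'); they are not taken from the Prometheus host agent or from the kernel, and they make no attempt to cover the more exotic names.

```python
import re

# Illustrative patterns only, written from the device naming described
# in the text. They are NOT the host agent's own regular expression.
DEVICE_KINDS = [
    ("nvme namespace", re.compile(r"nvme\d+n\d+")),
    ("sd disk", re.compile(r"sd[a-z]+")),
    ("software RAID partition", re.compile(r"md\d+p\d+")),
    ("software RAID", re.compile(r"md\d+")),
    ("device-mapper", re.compile(r"dm-\d+")),
    ("zram", re.compile(r"zram\d+")),
    ("Xen disk", re.compile(r"xvd[a-z]+")),
    ("virtio disk", re.compile(r"vd[a-z]+")),
    ("CD/DVD", re.compile(r"sr\d+")),
    ("floppy", re.compile(r"fd\d+")),
    ("DRBD", re.compile(r"drbd\d+")),
]


def classify(name):
    """Return a rough category for a block device name, or 'unknown'."""
    for kind, pattern in DEVICE_KINDS:
        # fullmatch() means 'sda1' (a partition) does not count as an sd disk.
        if pattern.fullmatch(name):
            return kind
    return "unknown"
```

Anything that comes back as 'unknown' is either a partition or one of the unusual device types, which is a reasonable cue to go look at what the machine actually has.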
Actual devices without their partitions show up in /sys/block and you can read out their statistics from their <name>/stat sysfs file, but I expect the Prometheus host agent to keep reading /proc/diskstats because it gives you everything's statistics by opening and reading only one file, and it's a fixed file that will never disappear on you because (for example) someone hot-unplugged a drive.
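To make the 'one file' point concrete, here is a small Python sketch of pulling device names and a couple of counters out of /proc/diskstats. The function names are hypothetical and this is not how the host agent itself is written; it only relies on the documented layout where each line is the major number, the minor number, the device name, and then the statistics fields, starting with reads completed and with writes completed as the fifth statistic.

```python
def parse_diskstats(text):
    """Map device name -> (reads completed, writes completed)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # major, minor, name, plus at least 11 statistics fields.
        if len(fields) < 14:
            continue
        name = fields[2]
        reads_completed = int(fields[3])
        writes_completed = int(fields[7])
        stats[name] = (reads_completed, writes_completed)
    return stats


def read_diskstats(path="/proc/diskstats"):
    """Read and parse the whole file in one go (Linux only)."""
    with open(path) as f:
        return parse_diskstats(f.read())
```

On a Linux machine, sorted(read_diskstats()) gives you every device name the kernel reports, partitions included, which is the raw list that the host agent then filters.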
PS: If you're also monitoring OpenBSD machines, they have disks named 'sd<number>' and CD/DVD ROM devices named 'cd<number>'.