Systemd has a problem with SATA disks behind port multipliers

August 10, 2016

The servers for our disk-based backup system are all running Ubuntu 12.04, because that's what was current when we last had to touch them. Ubuntu 12.04 is on its way out, so we're starting to rebuild all of our 12.04 machines on 16.04, and today we did this to the first backup server. Fortunately it was the server for our long-term backups and thus not something we need right away, because 16.04 turns out to have a problem here.

All of our backup servers use external SATA disk enclosures and SATA port multipliers, partly because that's what we had available and partly because that was the inexpensive option at the time (and maybe still today). When we first booted the machine in 16.04, it looked like it was only detecting one of the disks on each port multiplier channel instead of all four. Further investigation showed that all the disks were being detected, but only one out of every four was showing up in /dev/disk/by-path, which we rely on to give us stable identifiers for each disk slot in the external enclosure. More than that, the path identifiers are different. On 12.04, we got /dev/disk/by-path identifiers like pci-0000:02:00.0-scsi-2:0:0:0 while on 16.04, they're like pci-0000:02:00.0-ata-1.
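One way to see which disks have lost out is to compare the kernel's list of disks against what the by-path symlinks actually point to. What follows is only a minimal Python sketch; the function name, the default directories as parameters, and the assumption that whole disks are named sdX are all illustrative choices, not anything from our real tooling.

```python
import os

def disks_missing_by_path(sys_block="/sys/block", by_path="/dev/disk/by-path"):
    """Return whole-disk names (e.g. 'sdb') that no by-path symlink resolves to.

    Assumption: whole disks show up in sys_block under names starting with 'sd'.
    """
    linked = set()
    if os.path.isdir(by_path):
        for name in os.listdir(by_path):
            # Resolve each symlink and keep only the device name it points at.
            target = os.path.realpath(os.path.join(by_path, name))
            linked.add(os.path.basename(target))
    disks = [d for d in os.listdir(sys_block) if d.startswith("sd")]
    return sorted(d for d in disks if d not in linked)
```

Calling disks_missing_by_path() with no arguments on an affected machine should list the disks that have no by-path entry of their own.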

One obvious difference is that 16.04 uses systemd, and systemd has swallowed udev and in the process likely made a number of changes to it (in the grand systemd tradition). Certainly some Internet searches found suggestive bits. Unfortunately this turns out to be somewhat of a red herring; the real cause is less active damage (by systemd and udev) and more non-benign neglect and ignorance that has been exposed by the kernel changing the underlying sysfs topology to be more honest.

In 12.04 (with what is now an ancient kernel), a typical port multiplier disk shows up in sysfs as (take a deep breath):

/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:00.0/host8/target8:0:0/8:0:0:0/block/sdk

This is essentially claiming to be a generic SCSI disk. You have to look hard at attributes in various spots in sysfs to find out the truth, and the 12.04 udev does not; it considers this disk to be a generic SCSI disk and handles its naming like a disk on any other SCSI controller (we can see that from the '-scsi-' bit that is in the by-path name).

In 16.04, this same disk slot is:

/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:00.0/ata1/host0/target0:0:0/0:0:0:0/block/sda
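The visible difference between the two sysfs paths is the extra ataN component. As a rough illustration, a few lines of Python can classify a path by whether that component is present. This is a deliberate simplification for showing the difference, not a description of how udev itself decides; the function name is made up.

```python
import re

def naming_style(sysfs_path):
    """Return 'ata' if the path has an ataN component, otherwise 'scsi'.

    Simplified model: only the path text is inspected, no sysfs attributes.
    """
    for part in sysfs_path.split("/"):
        if re.fullmatch(r"ata\d+", part):
            return "ata"
    return "scsi"

old_path = "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:00.0/host8/target8:0:0/8:0:0:0/block/sdk"
new_path = "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:00.0/ata1/host0/target0:0:0/0:0:0:0/block/sda"
print(naming_style(old_path), naming_style(new_path))
```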

(Yes, the sdX names keep changing. That's why we need /dev/disk/by-path.)

This addition of ata1 to the full path appears to trigger a different code path in systemd's version of udev, one that specifically deals with (S)ATA disks. This code path does not believe that there can be more than one disk per (S)ATA host node (the host0 here), and so it gives all four disks on this port the same ID_PATH value, one based purely on the ATA port number of the port they're all attached through; they are all pci-0000:02:00.0-ata-1. Naturally there can be only one /dev/disk/by-path/pci-0000:02:00.0-ata-1 directory entry, so three out of the four disks lose out.
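The collision is easy to model. The sketch below is not the systemd code; it is a toy Python version of a naming scheme that only looks at the PCI address and the ATA port number, which is the behavior described above. The device names sda through sdd for the four disks on one port are assumed for illustration.

```python
from collections import defaultdict

def naive_ata_id_path(pci_addr, ata_port):
    """Toy model of the naming: only the PCI address and ATA port are used."""
    return f"pci-{pci_addr}-ata-{ata_port}"

def by_path_groups(disks):
    """Group (devname, pci_addr, ata_port) tuples by the name they would get."""
    groups = defaultdict(list)
    for devname, pci_addr, ata_port in disks:
        groups[naive_ata_id_path(pci_addr, ata_port)].append(devname)
    return dict(groups)

# Four disks behind one port multiplier on ATA port 1 (device names assumed).
disks = [(dev, "0000:02:00.0", 1) for dev in ("sda", "sdb", "sdc", "sdd")]
groups = by_path_groups(disks)
print(groups)
```

All four disks land under the single name pci-0000:02:00.0-ata-1, and since only one symlink with that name can exist, only one of them ends up visible in /dev/disk/by-path.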

(You can see this code in src/udev/udev-builtin-path_id.c in handle_scsi_ata(); it's present in both Ubuntu 16.04's systemd-229 and the current git tip here. The corresponding code for the generic case of SCSI-like devices is much more complicated.)

This is what I could politely call an oversight on the part of the systemd/udev conglomerate. The code for giving ATA devices names has stayed unchanged since it was introduced in 2015 (where it replaced 2012 code that skipped them entirely), so it's always had this issue. Had the kernel not switched to honestly reporting these ports as ATA ports instead of generic SCSI host ports, we could have missed seeing this, as the naive ATA-handling code would never have been exercised. Now, though, we're left with the mess. I've filed Ubuntu bug 1611945, although I don't know if it'll do any good.

(Now that writing this entry has caused me to discover the exact problem, I'm going to be able to refine the Ubuntu bug report. Unfortunately I can't report a bug directly to the upstream systemd, although I'm convinced it's still there in their code, because Ubuntu 16.04 doesn't have a systemd version that's within their 'you can report' window.)

What I don't have any answers for is the best way to deal with this issue. We could try 14.04 (although it'd have similar problems if its kernel has the sysfs topology change), or perhaps we could write a bunch of additional udev rules to create our own hard-coded version of /dev/disk/by-path using PCI identifiers and so on. I admit that the idea of writing udev rules is somewhat scary, as the whole area has never struck me as either easy or well documented.

(Probably the udev rule approach is the best solution.)
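To make the udev rule idea slightly more concrete, here is a hedged Python sketch of a helper that derives a per-slot name from a sysfs device path. It rests on an assumption I have not verified here: that each disk behind the port multiplier gets a different channel number in its host:channel:target:lun component (so 0:0:0:0, 0:1:0:0 and so on). The '-pmp-' naming scheme and the function name are invented for this example.

```python
import re

PCI_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")
ATA_RE = re.compile(r"ata\d+")
HCTL_RE = re.compile(r"\d+:\d+:\d+:\d+")

def slot_name(devpath):
    """Build 'pci-<addr>-ata-<port>-pmp-<channel>' from a sysfs path.

    Returns None if the path lacks a PCI address, an ataN component,
    or a host:channel:target:lun component.
    """
    pci = port = channel = None
    for part in devpath.strip("/").split("/"):
        if PCI_RE.fullmatch(part):
            pci = part                    # the last PCI component wins
        elif ATA_RE.fullmatch(part):
            port = part[3:]               # 'ata1' -> '1'
        elif HCTL_RE.fullmatch(part):
            channel = part.split(":")[1]  # assumed to differ per multiplier slot
    if None in (pci, port, channel):
        return None
    return f"pci-{pci}-ata-{port}-pmp-{channel}"
```

A udev rule could in principle run a helper like this through PROGRAM and attach the result with SYMLINK+=, but that wiring is untested here and the rule itself would still need careful matching on block devices.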

Written on 10 August 2016.

Last modified: Wed Aug 10 23:15:33 2016