Linux servers can still wind up using SATA in legacy PATA mode
One of the things that the BIOS on this machine (and others [of our servers]) is apparently doing is setting the SATA ports to legacy IDE/ata_piix mode instead of AHCI mode. I wonder how many driver & hardware features we're missing because of that.
(The 'ata_piix' kernel module is the driver for legacy mode, while the 'ahci' module is the driver for AHCI SATA. If you see boot time messages from ata_piix, you should be at least nervous.)
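If you'd rather not dig through boot messages, a rough way to see which of the two drivers has actually claimed a controller is to look at the 'Kernel driver in use' lines that 'lspci -k' prints. This is only a sketch; the function name sata_drivers is made up for illustration, and it only looks for the two drivers discussed here (other vendors' legacy-mode drivers would be missed).

```shell
# Hypothetical helper: read 'lspci -k' output on stdin and print which of
# the two drivers discussed here (ahci or ata_piix) is bound to a device.
sata_drivers() {
    awk '/Kernel driver in use:/ && ($NF == "ahci" || $NF == "ata_piix") { print $NF }' |
        sort -u
}

# On a live system:
#   lspci -k | sata_drivers
```

Seeing 'ata_piix' here is the same warning sign as seeing it in the boot messages.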
Modern SATA host controllers have two different modes: AHCI, which supports all of the features of SATA, and legacy Parallel ATA emulation (aka IDE mode), where your SATA controller pretends to be an old IDE controller. In the way of modern hardware, how your host controller presents itself is chosen by the BIOS, not your operating system (or at least not Linux). Most modern BIOSes probably default to AHCI mode, which is what you want, but apparently some of our machines either default to legacy PATA or got set that way at some point.
The simplest way to see if you've wound up in this situation is to
use lsblk to see what it reports as the
'TRAN' field (the transport type); it will be 'sata' for drives
behind controllers in AHCI mode, and 'ata' for legacy PATA support.
On one affected machine, we see:
; lsblk -o NAME,HCTL,TRAN,MODEL --nodeps /dev/sd?
NAME HCTL    TRAN MODEL
sda  0:0:0:0 ata  WDC WD5000AAKX-0
Meanwhile, on a machine that's not affected by this, we see:
; lsblk -o NAME,HCTL,TRAN,MODEL --nodeps /dev/sd?
NAME HCTL    TRAN MODEL
sda  0:0:0:0 sata WDC WD5003ABYX-1
sdb  1:0:0:0 sata ST500NM0011
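If you have a bunch of machines to check, the TRAN test is easy to script. What follows is a minimal sketch, not something we necessarily run; the function name legacy_disks is invented here, and it assumes two-column 'NAME TRAN' input such as 'lsblk -dn -o NAME,TRAN' produces. Note that a genuine old PATA drive will also show up as 'ata', so a hit means 'go look', not 'definitely misconfigured'.

```shell
# Hypothetical helper: read "NAME TRAN" pairs on stdin and print the names
# of disks whose transport is plain 'ata' (legacy mode or real PATA).
legacy_disks() {
    awk '$2 == "ata" { print $1 }'
}

# On a live system (-d is --nodeps, -n suppresses the header line):
#   lsblk -dn -o NAME,TRAN | legacy_disks
```

No output means every disk that lsblk reported is using some other transport.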
Otherwise it's very easy to not notice that your system is running in PATA mode instead of AHCI (at least until you attempt to hot-swap a failed drive; only AHCI supports that). I'm not sure what features and performance you miss out on in legacy PATA mode, but one of them is apparently Native Command Queueing. I suspect that there are also differences in error recovery if a drive has bad sectors or other problems, at least if you have three or four drives so that the system has to present two drives as being on the same ATA channel.
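One way to get a hint about Native Command Queueing is the per-device queue depth that Linux exposes in sysfs. My understanding is that a queue_depth of 1 means commands are not being queued, while drives using NCQ typically show 31 or 32; treat that interpretation as an assumption, since a depth of 1 can also mean the drive itself lacks NCQ or that someone set it by hand. The function name ncq_report and its optional directory argument are made up for this sketch.

```shell
# Hypothetical helper: report the queue depth of each sd* disk.
# The optional argument overrides /sys/block, which makes it testable.
ncq_report() {
    root=${1:-/sys/block}
    for f in "$root"/sd*/device/queue_depth; do
        [ -r "$f" ] || continue      # glob matched nothing, or unreadable
        dev=${f#"$root"/}            # strip the leading directory
        dev=${dev%%/*}               # keep just 'sda', 'sdb', ...
        depth=$(cat "$f")
        if [ "$depth" -gt 1 ]; then
            echo "$dev queue_depth=$depth (NCQ possible)"
        else
            echo "$dev queue_depth=$depth (no NCQ)"
        fi
    done
}

# On a live system:
#   ncq_report
```

A machine where every disk reports a depth of 1 is worth a closer look at its BIOS settings.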
Based on our recent experience, my strong belief is now that your system BIOS is much more likely to play around with the order of hard drives if your SATA controller is in legacy mode. A SATA controller in AHCI mode is hopefully presenting an honest view of what drive is cabled to what port; as we've found out, this is not necessarily the case in legacy mode, perhaps because the BIOS always has to establish some sort of mapping between SATA ports and alleged IDE channels.
(SATA ports can be wired up oddly and not as you expect for all sorts of reasons, but at least physical wiring stays put and is thus consistent over time. BIOSes can change their minds if they feel like it.)