2024-07-10
Fedora 40 probably doesn't work with software RAID 0.90 format superblocks
On my home machine, I have an old pair of HDDs that have (had) four old software RAID mirrors. Because these were old arrays, they were set up with the old 0.90 superblock metadata format. For years the arrays worked fine, although I haven't actively used them since I moved my home machine to all solid state storage. However, when I upgraded from Fedora 39 to Fedora 40, things went wrong. When Fedora 40 booted, rather than finding four software RAID arrays on sdc1+sdd1, sdc2+sdd2, sdc3+sdd3, and sdc4+sdd4 respectively, Fedora 40 decided that the fourth RAID array was all I had, and that it was on sdc plus sdd (the entire disks). Since the fourth array had an LVM logical volume that I was still mounting filesystems from, things went downhill from there.
One of the observed symptoms during the issue was that my /dev had no entries for the sdc and sdd partitions, although the kernel messages said they had been recognized. This led me to stopping the 'md53' array and running 'partprobe' on both sdc and sdd, which triggered an automatic assembly of the four RAID arrays. Of course this wasn't a long term solution, since I'd have to redo it (probably by hand) every time I rebooted my home machine. In the end I wound up pulling the old HDDs entirely, something I probably should have done a while back.
(This is filed as Fedora bug 2280699.)
Many of the ingredients of this issue seem straightforward. The old 0.90 superblock format is at the end of the object it's in, so a whole-disk superblock is at the same place as a superblock in the last partition on the disk, if the partition goes all the way to the end. If the entire disk has been assembled into a RAID array, it's reasonable to not register 'partitions' on it, since those are probably actually partitions inside the RAID array. But this doesn't explain why the bug started happening in Fedora 40; something seems to have changed so that Fedora 40's boot process 'sees' a whole disk RAID array based on the 0.90 format superblock at the end, where Fedora 39 did not.
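The overlap between a whole-disk superblock and a last-partition superblock can be checked with a little arithmetic. The following is a minimal sketch, assuming the 0.90 placement rule used by the Linux md driver (round the device size down to a multiple of 64 KiB, then step back another 64 KiB). The disk size, partition start, and function names are hypothetical values picked for illustration, not taken from the disks described above.

```python
# Sketch: where a 0.90 superblock sits, in 512-byte sectors.
# Assumption: the md 0.90 rule is "round the size down to a 64 KiB
# multiple, then subtract 64 KiB". All sizes below are made up.
SECTOR_BYTES = 512
RESERVED_SECTORS = 64 * 1024 // SECTOR_BYTES  # 64 KiB = 128 sectors


def sb090_offset(size_sectors):
    """Superblock offset relative to the start of a device of this size."""
    return (size_sectors & ~(RESERVED_SECTORS - 1)) - RESERVED_SECTORS


def absolute_offset(start_sector, size_sectors):
    """Superblock offset relative to the start of the whole disk."""
    return start_sector + sb090_offset(size_sectors)


# Hypothetical disk, with a last partition that starts on a 1 MiB
# boundary and runs all the way to the end of the disk.
DISK_SECTORS = 2_048_000
LAST_PART_START = 819_200

whole_disk = absolute_offset(0, DISK_SECTORS)
last_partition = absolute_offset(LAST_PART_START, DISK_SECTORS - LAST_PART_START)

# The same partition, but ending 1 MiB (2048 sectors) short of the disk end.
shorter_partition = absolute_offset(
    LAST_PART_START, DISK_SECTORS - LAST_PART_START - 2048
)

print(whole_disk == last_partition)     # True: the two are indistinguishable
print(whole_disk == shorter_partition)  # False: the superblock moves
```

Under these assumptions, anything that looks for a 0.90 superblock on the whole disk finds the one that belongs to the last partition, which is consistent with the fourth array being the one that was picked up.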
I don't know if other Linux distributions have also picked up whatever change in whatever software is triggering this in Fedora 40, or if they will; it's possible that this is a Fedora specific issue. But the general moral I think people should take from this is that if you still have software RAID arrays using superblock format 0.90, you need a plan to change that. The Linux Raid Wiki has a somewhat dangerous looking in-place conversion process, but I wouldn't want to try that without backups. And if you have software RAID arrays that old, they probably contain old filesystems that you may want to recreate so they pick up new features (which isn't always possible with an in-place conversion).
Sidebar: how to tell what superblock format you have
The simple way is to look at /proc/mdstat. If the status line for a software RAID array mentions a superblock version, you have that version, for example:
md26 : active raid1 sda4[0] sdb4[1]
      94305280 blocks super 1.2 [2/2] [UU]
This is a superblock 1.2 RAID array.
If the status line doesn't mention a 'super' version, then you have an old 0.90 superblock. For example:
md53 : active raid1 sdd4[1] sdc4[0]
      2878268800 blocks [2/2] [UU]
      bitmap: 0/22 pages [0KB], 65536KB chunk
Unless you made your software RAID arrays a very long time ago and faithfully kept upgrading your system ever since, you probably don't have superblock 0.90 format arrays.
(Although you could have deliberately asked mdadm to make new arrays with 0.90 format superblocks.)
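If you have a lot of arrays to check, the /proc/mdstat rule described in this sidebar is easy to automate. The following is a rough sketch in Python, not a robust parser. It assumes the usual /proc/mdstat layout of an 'mdN : ...' header line followed by indented detail lines, and it treats an active array with no 'super' field as 0.90. The function name and the embedded sample text are made up for illustration.

```python
import re

# Header of an array entry, e.g. "md26 : active raid1 sda4[0] sdb4[1]".
HEADER = re.compile(r"^(md\S*) : (\S+)")
# The superblock field, when present, follows "blocks" in the status line.
SUPER = re.compile(r"\bblocks super (\S+)")


def superblock_versions(mdstat_text):
    """Map each array name to its superblock version string."""
    arrays = {}
    current = None
    for line in mdstat_text.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            arrays[current] = [line]
        elif current is not None and line.startswith(" "):
            arrays[current].append(line)  # indented continuation line
        else:
            current = None  # blank line or unrelated line ends the entry

    versions = {}
    for name, lines in arrays.items():
        text = " ".join(lines)
        state = HEADER.match(text).group(2)
        found = SUPER.search(text)
        if found:
            versions[name] = found.group(1)
        elif state == "active":
            versions[name] = "0.90"  # active array, no 'super' field
        else:
            versions[name] = "unknown"  # e.g. an inactive array
    return versions


# Made-up sample in /proc/mdstat style, based on the two examples above.
SAMPLE = """\
Personalities : [raid1]
md26 : active raid1 sda4[0] sdb4[1]
      94305280 blocks super 1.2 [2/2] [UU]

md53 : active raid1 sdd4[1] sdc4[0]
      2878268800 blocks [2/2] [UU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

unused devices: <none>
"""

if __name__ == "__main__":
    # On a real system, pass in open("/proc/mdstat").read() instead.
    for name, version in superblock_versions(SAMPLE).items():
        print(name, version)
```

Arrays that aren't active are reported as 'unknown', because this sketch only has the status line to go on.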