Wandering Thoughts archives


Some thoughts on Fedora moving to btrfs as the default desktop file system

The recent news for me is that there is a Fedora change proposal to make btrfs the default file system for Fedora desktop (see also the mailing list post). Given that in the past I've been a btrfs sceptic (eg, from 2015), long time readers might expect me to have some views here. However, this time around my views are cautiously optimistic for btrfs (and Fedora), although I will only be watching from a safe distance.

The first two things to note are that 2015 is a long time ago (in computer time) and I'm too out of touch with btrfs to have an informed opinion on its current state. I'm confident that people in Fedora wouldn't have proposed this change if there weren't good reasons to believe that btrfs is up to the task. The current btrfs status looks pretty good on a skim, although the section on device replacement makes me a little alarmed. The Fedora proposal also covers who else is using btrfs and has been for some time, and it's a solid list that suggests btrfs is not going to explode for Fedora users.

I'm a big proponent of modern filesystems with data and metadata checksums, so I like that aspect of btrfs. As far as performance goes, most people on desktops are unlikely to notice the difference, and as a long term user of ZFS on Linux I can testify to how nice it is to not have to preallocate space to specific filesystems (even if with LVM you can grow them later).

However, I do feel that this is Fedora being a bit adventurous. This is in line with Fedora's goals and general stance of being a relatively fearless leading edge distribution, but at the same time sometimes the leading edge is also the bleeding edge. I would not personally install a new Fedora machine with btrfs in the first few releases of Fedora that defaulted to it, because I expect that there will be teething problems. Some of these may be in btrfs, but others will be in system management programs and practices that don't cope with btrfs or conflict with it.

In the long run I think that this change to btrfs will be good for Fedora and for Linux as a whole. Ext4 is a perfectly decent filesystem (and software RAID works fine), but it's possible to do much better, as ZFS has demonstrated for a long time.

FedoraBtrfsDefaultView written at 23:47:04


NetworkManager and (not) dealing with conflicting network connections

I recently tweeted a wish for NetworkManager:

I wish there was some straightforward way to tell NetworkManager to not automatically connect to any wifi networks if my laptop has a wired network connection, while still auto-connecting if there is no wired Ethernet.

In NetworkManager you can set a priority for network connections, but as far as I can tell you can't tell it that two connections conflict with each other and should never be brought up at the same time. You can write scripts that run when connections change and that take down connections (on Twitter @zigford pointed me to a script for this here), but I don't consider this straightforward. So let me give you the story of how I have wound up wanting this.
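For a sense of what the script approach involves, here is a minimal sketch of a NetworkManager dispatcher script in Python (it is not the script mentioned above). It assumes the dispatcher's convention of running executables in /etc/NetworkManager/dispatcher.d/ with the interface name and the action ("up", "down", and so on) as the first two arguments, and it hypothetically treats interfaces whose names start with "eth" or "en" as wired. When a wired interface comes up it turns the wifi radio off with 'nmcli radio wifi off', and turns it back on when the wired interface goes down.

```python
#!/usr/bin/env python3
"""Hypothetical NetworkManager dispatcher script: keep wifi off while wired is up.

Assumes NetworkManager runs executables in /etc/NetworkManager/dispatcher.d/
with the interface name as argv[1] and the action ("up", "down", ...) as argv[2].
"""
import subprocess
import sys

# Assumption: wired interfaces have classic names ("eth0") or systemd
# predictable names ("enp3s0", or "enx..." for many USB Ethernet adapters).
WIRED_PREFIXES = ("eth", "en")


def wifi_radio_action(interface, action):
    """Return "off", "on", or None (leave wifi alone) for one dispatcher event."""
    if not interface.startswith(WIRED_PREFIXES):
        return None
    if action == "up":
        return "off"
    if action == "down":
        return "on"
    return None


def main(argv):
    if len(argv) < 3:
        return  # not called with the usual dispatcher arguments
    wanted = wifi_radio_action(argv[1], argv[2])
    if wanted is not None:
        # 'nmcli radio wifi off|on' switches the wifi radio as a whole.
        subprocess.run(["nmcli", "radio", "wifi", wanted], check=False)


if __name__ == "__main__":
    main(sys.argv)
```

The script has to be executable and owned by root for the dispatcher to run it. Switching off the whole radio is blunter than deactivating one particular wifi connection, and it is one more moving part to remember about.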

I have a work laptop, which I've brought home periodically for short periods of light use in the past (for example over our Christmas vacations). At that point I set it up for wireless networking and to automatically connect to my home wireless for convenience. Then, thanks to current world and local events, I took my work laptop home for extended use over a longer period, and soon discovered that my home wireless network has surprisingly high and variable latency. Fortunately I'd also taken the laptop's USB Ethernet adapter and I have a little home switch, so I could change over to having my laptop use a wired connection (although not without fun discoveries).

As it happens, I don't have two home subnets, one wired and one wireless; I have one that everything is on (and the AP just extends it out from wired to wireless). When I initially set up my laptop at home on the wireless, I gave it a fixed IP address and name so that I could easily SSH to it (and because I give everything here a fixed IP address). When I 'switched' my laptop to using a wired connection, I gave that wired Ethernet address the same fixed IP address, because I had no desire to have to SSH to 'laptop-wired' versus 'laptop-wifi' (I just want to SSH to 'laptop'). Unfortunately this means that if both the wired and the wireless connections are active at once, both get the same IP and then fun things happen. In particular, some of my traffic to my laptop goes over the wireless, with increased and variable latencies, which is what I set up a wired connection to avoid.

(I'm honestly surprised that my DHCP server didn't object to handing out the same IP at once to two different things, but then I did tell it that both the wired and the wireless Ethernet addresses could have the same IP. I'm also surprised at how long it took me to notice this; I only did because I was running 'ifconfig -a' for another reason and noticed that my wifi adapter had an IP assigned.)

My current solution is to tell NetworkManager to not automatically connect to my home wireless network. This is less convenient if I want to use my work laptop from somewhere else, but in practice I almost never do (my work laptop is mostly used for video conferencing, since it has a camera and a microphone; actual work happens from my home desktop).
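For reference, this change can also be made from the command line by setting the connection's connection.autoconnect property with nmcli. The small Python wrapper below only builds and runs that command. "Home Wifi" is a hypothetical placeholder for whatever name 'nmcli connection show' reports.

```python
import subprocess


def autoconnect_command(connection_name, enabled):
    """Build the nmcli command that sets a connection's connection.autoconnect."""
    return ["nmcli", "connection", "modify", connection_name,
            "connection.autoconnect", "yes" if enabled else "no"]


def set_autoconnect(connection_name, enabled):
    """Run the command; check=True raises CalledProcessError if nmcli fails."""
    subprocess.run(autoconnect_command(connection_name, enabled), check=True)


# Example (hypothetical connection name):
#   set_autoconnect("Home Wifi", False)
```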

NetworkManagerConnectionConflict written at 22:42:14


Removing unmaintained packages from your Fedora machine should require explicitly opting in

Ben Cotton recently wrote an entry on Fedora potentially removing unmaintained packages from your system under some circumstances, because there is a Fedora change proposing to remove 'retired' packages. The change proposal contains the following remarks:

Upgrade/compatibility impact

During an upgrade, all retired packages will be automatically removed.


How To Test

1. Upgrade to next version of Fedora. 2. Check all retired packages are removed.

In the ending of Ben Cotton's article, he says in passing "[B]ut we have to make sure we’re clearly communicating what is happening to the user" if package removal happens. I will go further than that.

Removing packages from your system on Fedora upgrades should require an explicit opt-in, and this opt-in should be able to show you the list of packages being removed.

Going beyond that, Fedora should never remove unmaintained packages from your system without this opt-in; for example, they should never push out an updated fedora-retired-packages RPM in Fedora updates.

Removing unmaintained packages from people's systems is removing functionality with no replacement or equivalent. This can break what people are doing with their Fedora machines, and doing so is both morally wrong and dangerous in practice. It doesn't take too many cases of Fedora upgrades or Fedora package updates breaking things without warning for people to stop doing either of them.

Because this requires explicit user opt-in and a UI and so on, and additional unmaintained packages should not be removed during the lifetime of a Fedora release, I think that removing retired packages during upgrades should live in the upgrader, not be implemented as an RPM package (or at least not as an RPM package that's installed by default). The upgrade system is the only place that is in a position to actively prompt the user in a meaningful way to obtain explicit, informed opt-in consent to this.

(The lightweight version of this would be to require people to opt in in advance by installing the special fedora-retired-packages RPM. People who know enough to manually select and install the package can be presumed to know what they're doing and be making an informed choice to accept whatever package retirements Fedora wants to push.)

PS: I was going to consider this different from the existing situation with fedora-obsolete-packages for various hand-waving reasons, but the more I look at what packages Fedora has removed through the fedora-obsolete-packages RPM, the more I think that the two should be mostly merged together and treated very similarly (ie, require explicit opt-in). The current fedora-obsolete-packages goes well beyond merely removing packages that cause upgrade problems (unless you take a rather expansive view of 'upgrade problems').

FedoraRemovingMustBeOptIn written at 22:44:24


How applications autostart on modern Linux desktops

A while back I mentioned that part of Microsoft Teams' misbehavior was autostarting when you logged in; recently, because I was testing some things on my laptop with alternate desktops, that behavior led to me uninstalling Teams. So all in all, it seemed like a good time to read up on how applications get automatically started when you log in (if they do) on modern Linux desktops like Gnome and Cinnamon.

(This is not normally an issue for me in my desktop environment, because everything in it is explicitly started by hand in a shell script.)

Unsurprisingly, there's a freedesktop.org standard on this, the Desktop Application Autostart Specification, which builds on the .desktop file specification. The simple version is that you set applications to autostart by installing an appropriate .desktop file for them into either /etc/xdg/autostart (for a system-wide autostart on login) or ~/.config/autostart (for an autostart for an individual user).
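The specification also says that when the same file name appears in more than one autostart directory, only the copy in the more important directory counts. This is what lets a file in ~/.config/autostart override a same-named one in /etc/xdg/autostart. Here is a minimal Python sketch of that lookup, assuming the XDG Base Directory defaults ($XDG_CONFIG_HOME falling back to ~/.config and $XDG_CONFIG_DIRS falling back to /etc/xdg). The function names are hypothetical.

```python
import os
from pathlib import Path


def autostart_dirs(env=None):
    """Autostart directories, most important first.

    $XDG_CONFIG_HOME (default ~/.config) beats every entry in the
    colon-separated $XDG_CONFIG_DIRS (default /etc/xdg).
    """
    env = os.environ if env is None else env
    home = env.get("XDG_CONFIG_HOME") or os.path.join(
        env.get("HOME") or str(Path.home()), ".config")
    system = env.get("XDG_CONFIG_DIRS") or "/etc/xdg"
    dirs = [home] + [d for d in system.split(":") if d]
    return [Path(d) / "autostart" for d in dirs]


def effective_entries(listing):
    """Pick the winning file for each .desktop name.

    listing is a sequence of (directory, filenames) pairs, most important
    directory first; a file shadows any same-named file in a later directory.
    """
    chosen = {}
    for directory, names in listing:
        for name in names:
            if name.endswith(".desktop"):
                chosen.setdefault(name, directory / name)
    return chosen


def scan(dirs):
    """Build the listing for effective_entries() from the real filesystem."""
    return [(d, sorted(p.name for p in d.glob("*.desktop")) if d.is_dir() else [])
            for d in dirs]
```

Typical use would be effective_entries(scan(autostart_dirs())).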

There are a number of special settings keys that can be in these .desktop files. First, you can have an OnlyShowIn or NotShowIn key that controls which desktops should autostart this or not autostart it (the specification misspells these keys in some mentions of them). Second, you can have a Hidden=true key, in which case the application should not autostart (in any desktop). For obvious reasons, the latter key is most useful in your personal autostart directory.

Some desktops have custom keys of their own, and even custom locations for additional .desktop files to autostart; for example KDE before KDE 5 apparently also used ~/.kde/Autostart (see here). An important and common property is X-GNOME-Autostart-enabled, which is (still) in wide use despite apparently being deprecated. In particular, Cinnamon appears to implement disabling of standard autostart things by copying their .desktop file to your ~/.config/autostart directory and adding a line to the end with 'X-GNOME-Autostart-enabled=false'.
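Putting the standard keys and this GNOME extension together, the decision a session has to make for each autostart file looks roughly like the following. This is a simplified model, not any desktop's actual code. It reads only the [Desktop Entry] group, lets the last duplicate key win (which is what would make an appended X-GNOME-Autostart-enabled=false line take effect), and ignores other keys such as TryExec. current_desktops stands in for the colon-separated names in $XDG_CURRENT_DESKTOP.

```python
def parse_desktop_entry(text):
    """Return the key/value pairs of the [Desktop Entry] group (last duplicate wins)."""
    entry, in_group = {}, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            in_group = line == "[Desktop Entry]"
        elif in_group and "=" in line:
            key, value = line.split("=", 1)
            entry[key.strip()] = value.strip()
    return entry


def _as_list(value):
    """Desktop-file lists are ';'-separated, usually with a trailing ';'."""
    return [item for item in value.split(";") if item]


def should_autostart(entry, current_desktops):
    """Decide whether an autostart entry runs under the given desktop names."""
    if entry.get("Hidden", "false").lower() == "true":
        return False
    # Not part of the freedesktop.org spec, but honoured by GNOME-derived desktops.
    if entry.get("X-GNOME-Autostart-enabled", "true").lower() == "false":
        return False
    only = _as_list(entry.get("OnlyShowIn", ""))
    if only and not any(d in only for d in current_desktops):
        return False
    if any(d in _as_list(entry.get("NotShowIn", "")) for d in current_desktops):
        return False
    return True
```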

(GNOME .desktop files can also have a phase of GNOME's startup that they happen in; see here and here. KDE .desktop files apparently have somewhat similar properties too.)

Some desktops have their own custom locations for various special things, or have had them in the past (LXDE, for example). However, desktops don't necessarily use custom locations and settings. I know that with Cinnamon, if you add a new thing to be done on startup, Cinnamon puts a new .desktop file in your ~/.config/autostart.

More minimal 'desktops' may or may not automatically support .desktop autostarts. However, according to the Arch wiki's page on XDG Autostart, there are standalone programs that will do this for you if you want them to. On my normal machines, my own window manager environment is so divergent that I don't think autostarting .desktop files is of any use to me, so I'm not planning to try any of them.

(My work laptop runs a more or less standard Cinnamon environment, which automatically handles autostarting things. I believe that Cinnamon considers itself a form of GNOME for the purposes of OnlyShowIn, NotShowIn, and similar .desktop keys. Cinnamon certainly disables autostarted things using a GNOME-specific key.)

Applications can arrange to autostart in at least two ways, the honest way and the sneaky way. The honest way is to put a copy of their .desktop file into /etc/xdg/autostart. The sneaky way is to wait until you run them once, then copy their .desktop file into your ~/.config/autostart directory (re-copying this file every time they're run is optional). Based on poking through the RPM package for Microsoft Teams (and also how they apparently have a preferences setting about this), Teams appears to do this the sneaky way.

DesktopAppAutostart written at 22:11:48

A scrolling puzzle involving GTK+, XInput, and alternate desktops (on Fedora)

Recently I discovered that cawbird wasn't recognizing certain 'scroll up' events in parts of its interface. This wasn't a new issue, although I originally thought it was; instead for years I'd been missing some of cawbird's functionality without noticing (and before it, corebird). I don't know exactly what the core problem is, but part of it appears to be some sort of interaction between desktop environments (or the lack of them) and the new approach to handling input devices using the X Input Extension.

The actual Cawbird issue is somewhat complicated and tangled, but fortunately it can be boiled down to a simpler situation through a test program that prints GTK mouse scroll event information (a copy of my version is here). For background, when vertical scrolling happens in GDK, you can see either or both of specific smooth scrolling events, with a Y delta of some value, and 'scroll up' and 'scroll down' events, which appear to always have a Y delta of 0.
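One consequence is that a program relying on smooth scroll events sees nothing from a wheel that produces only discrete 'scroll up' and 'scroll down' events. A more defensive approach is to accept both kinds. Here is a small model of that in plain Python. ScrollEvent is a hypothetical stand-in, not GDK's real event type. The model assumes that negative smooth Y deltas mean scrolling up, and its de-duplication rule assumes that a device's smooth event arrives no later than its discrete twin, which it does not check.

```python
from dataclasses import dataclass


@dataclass
class ScrollEvent:
    """Simplified stand-in for a GDK scroll event (not the real GTK type)."""
    kind: str               # "smooth", "up", or "down"
    delta_y: float = 0.0    # only meaningful for "smooth"; negative is up
    device: str = "Core Pointer"


def scroll_steps(events):
    """Turn a mix of smooth and discrete scroll events into whole steps.

    Returns -1 for each step up and +1 for each step down. Smooth deltas are
    accumulated per device until they add up to a whole step. Discrete events
    are ignored for any device that has already sent a smooth event, a
    hypothetical rule so that a wheel reporting both doesn't scroll twice.
    """
    steps = []
    pending = {}            # device -> accumulated, not yet used smooth delta
    smooth_devices = set()
    for ev in events:
        if ev.kind == "smooth":
            smooth_devices.add(ev.device)
            total = pending.get(ev.device, 0.0) + ev.delta_y
            while total <= -1.0:
                steps.append(-1)
                total += 1.0
            while total >= 1.0:
                steps.append(1)
                total -= 1.0
            pending[ev.device] = total
        elif ev.kind in ("up", "down") and ev.device not in smooth_devices:
            steps.append(-1 if ev.kind == "up" else 1)
    return steps
```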

On my desktop running fvwm outside of a desktop environment like Gnome, what I see from the test program when I use my mouse scroll wheel is just a stream of scroll up and scroll down events from a source device of 'Core Pointer'. On my work laptop running Cinnamon, scrolling on the touchpad generates smooth scrolling events with various Y deltas depending on how fast I'm moving my fingers, while using the scroll wheel on an external mouse generates both a smooth scrolling event (with a Y delta of -1 or +1 depending on the direction) and a 'scroll up' (or 'scroll down') event; these events have a source device of either the touchpad or the USB mouse, although xinput says that there is an overall 'Core Pointer' device.

As far as I can tell, xinput and the X servers are reporting that the mice involved are set up the same way; the physical mouse (and the touchpad) are extended input devices handled by XINPUT. But something about my fvwm environment or how I start X on my desktop is causing these GTK smooth scroll events to not be generated, to the confusion of at least some programs (and there will probably be more in the future). Unfortunately I have no ideas about what it might be or how to track it down.

(After some further testing, I can say that OpenBox on my laptop and Cinnamon inside a VMware virtual machine both cause GTK to generate smooth scroll events. The VMware virtual machine is using my desktop's mouse, but the xinput mouse configuration is different because of VMware stuff.)

XInputGtkScrollPuzzle written at 00:37:08


An interesting combination of flaws in some /etc/mailcap handling

Somewhat recently we ran into an interestingly tangled issue around /etc/mailcap and MIME handlers on our Ubuntu 18.04 user login machines, one of those situations where there seem to be multiple problems that when combined together lead to an undesirable result. What happened is that we installed the docx2txt Ubuntu package after it was requested by someone, but then found that this broke at least exmh's ability to display MS Office 'docx' file attachments. However, the interesting story is why.

As part of its package, docx2txt includes a /usr/lib/mime/packages file to describe what it can be used to display, which then causes update-mime to update the MIME handling information in /etc/mailcap. Because docx2txt prints what it converts to standard output, its mailcap entry has the 'copiousoutput' tag, and it also appears to set its priority to 2, which is relatively low (5 is the default). The first thing that goes wrong is that docx2txt has an uncaught typo in this; it actually sets 'prority' to 2, leaving it at the default priority of 5. Also installed on our Ubuntu machines is LibreOffice, and LibreOffice Writer also has a /usr/lib/mime/packages file. LibreOffice's entry for docx files has priority 3, theoretically higher than docx2txt's (and a standard test condition to say 'I need to be in a GUI session to work'), but docx2txt's typo means that docx2txt's mailcap entry should actually be preferred over LibreOffice's.

The second thing that happens, which is at least unclear, is that update-mime doesn't pass the priority field through to /etc/mailcap. I think update-mime orders the generated /etc/mailcap from highest priority to lowest, and assumes that programs that use mailcap will pick the first matching entry. If this is what you're supposed to do to handle priorities in mailcap entries, I couldn't find anything that explicitly said it. Since this ordering doesn't seem to be explicitly written up, it's at most folk knowledge and you have to hope that the mailcap parser used by any particular program follows this. Update-mime also doesn't reject docx2txt's partially malformed mailcap line; instead it accepts it as an entry with the default priority (and puts the 'prority' field in the generated /etc/mailcap, where it may mislead you if you're reading fast).
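The kind of check that would have caught the typo is easy to sketch. The Python below is not update-mime's code. It splits a mailcap-style line on unescaped ';' and flags any field name outside RFC 1524's list plus Debian's 'priority' (names starting with 'x-' are allowed as extensions). A missing priority is treated as the default of 5, and entries are ordered highest priority first. The command lines in the example are illustrative, not the real docx2txt or LibreOffice entries.

```python
import re

# RFC 1524 field names plus Debian's 'priority' (an assumption of this sketch).
KNOWN_FIELDS = {
    "compose", "composetyped", "print", "edit", "test", "needsterminal",
    "copiousoutput", "description", "textualnewlines", "x11-bitmap",
    "nametemplate", "priority",
}
DEFAULT_PRIORITY = 5


def parse_entry(line):
    """Split one entry on unescaped ';' into (type, command, fields)."""
    parts = [p.strip() for p in re.split(r"(?<!\\);", line.strip())]
    if len(parts) < 2:
        raise ValueError(f"not a mailcap entry: {line!r}")
    fields = {}
    for item in parts[2:]:
        if item:
            name, sep, value = item.partition("=")
            fields[name.strip().lower()] = value.strip() if sep else True
    return parts[0], parts[1], fields


def check_entry(line):
    """Return (effective priority, unknown field names) for one entry."""
    _, _, fields = parse_entry(line)
    unknown = [n for n in fields if n not in KNOWN_FIELDS and not n.startswith("x-")]
    raw = fields.get("priority")
    priority = int(raw) if isinstance(raw, str) and raw.isdigit() else DEFAULT_PRIORITY
    return priority, unknown


def order_entries(lines):
    """Highest priority first; ties keep their original order."""
    return sorted(lines, key=lambda line: -check_entry(line)[0])
```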

The third thing going wrong is that exmh turns out to have bad handling of mailcap entries that write their results to standard output so that the output can theoretically be displayed inline. What you would expect to happen is that exmh would run the handler (either automatically or on request) and then display the result inline. Instead, it has a little display for that attachment that looks like you can't do anything (normally it will say 'you can view this with ...', so you know the section can be handled), and if you actually ask exmh to run the mailcap handler to generate the output, it writes the generated output to its standard error (which almost certainly isn't connected to anything useful). Given that this is spectacularly useless, exmh clearly hasn't been used very much with mailcap entries that want to do this instead of running an external program that will display things on its own.

Exmh's bad handling of 'copiousoutput' mailcap entries wouldn't be an issue except for the mangled priority field of docx2txt; without that, exmh picks LibreOffice instead (which works fine). Docx2txt's bad 'prority' field wouldn't have persisted if update-mime (or some other tool) checked for and rejected improperly formed mailcap entries; instead update-mime covered up the problem and raised docx2txt's priority above what had been intended. It took a cascade of flaws to expose this issue.

(Our solution was to uninstall docx2txt again. It's not important enough to break exmh for people using it, and anything else that may also have problems with such an entry. Now that I understand the issue, I will see if I have enough energy to file a Debian bug report against docx2txt, which still has the bug in the current package source. Of course it will be years before any bug fix is available in Ubuntu.)

MailcapDocx2txtTangle written at 23:31:28


My mixed feelings about 'swap on zram' for Linux

Recently I read about how Fedora will be enabling 'swap on zram', including for upgraded machines, in a future version of Fedora. I suspect that a similar change may some day come to Ubuntu as well, because it's an attractive feature from some perspectives. My feelings are a bit more mixed.

Zram is a dynamically sized compressed block device in RAM (ie a compressed ramdisk); 'swap on zram' is using a zram device as a swap device (or as your sole swap device). This effectively turns inactive RAM pages into compressed RAM in an indirect way while pacifying the kernel's traditional desire to have some swap space. The pitch for swap on zram is very nicely summarized on the Fedora page as 'swap is useful, except when it's slow'. Being in RAM, swap on zram is very fast; it's the fastest swap device you can have, faster than SSD or even NVMe.

(This implies that how much of an advantage swap on zram is for your system depends partly on how fast your existing swap storage is. But RAM is still much faster than even NVMe.)

The drawback of swap on zram is that it is not really freeing up all of your memory to 'swap things out'; instead the estimate is that it will generally compress to about half the previous size. This drawback is the source of my mixed feelings about swap on zram for my Fedora desktops and our Ubuntu servers.
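To make the 'about half' estimate concrete: with 2:1 compression, pushing 4 GiB of idle pages into zram still leaves 2 GiB of compressed data in RAM, so only 2 GiB is actually freed, while conventional swap frees all 4 GiB. The compression ratio is a parameter in this small sketch because real ratios depend on the data, and zram's own overhead is ignored.

```python
def ram_freed_by_zram_swap(swapped_out_gib, compression_ratio=2.0):
    """RAM freed when pages go to zram: the compressed copy still lives in RAM."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return swapped_out_gib - swapped_out_gib / compression_ratio


def ram_freed_by_disk_swap(swapped_out_gib):
    """Swapping to an SSD, NVMe, or HDD swap device frees the whole amount."""
    return swapped_out_gib
```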

On my Fedora desktops, I generally barely use any swap space, which means that swap on zram would be harmless. If I do temporarily use a surge of swap space, being able to get the contents back fast is probably good; Linux has generally had an irritating tendency to swap out things I wanted, like bits of my window manager's processes. Both my home machine and my work machine have 32 GB of RAM, and peak swap usage over the past 120 days has been under a gigabyte, so I'm barely going to notice the memory effects. As a result I'm likely to leave swap on zram in its default enabled state when Fedora gives it to me.

Unfortunately this is not the case for our Ubuntu LTS servers. Those of our Ubuntu servers that use much swap at all tend to eventually end up with their swap space full or mostly full of completely idle data that just sits there. Keeping even a compressed version of this data in RAM is not what we want; we really want it to be swapped out of memory entirely. Swap on zram would be a loss of RAM for us on our Ubuntu servers. As a result, if and when Ubuntu enables this by default, I expect us to turn it off again.

One way to put this is that swap on zram is faster than conventional swap but not as useful and effective for clearing RAM. Which of these is more important is not absolute but depends on your situation. If you're actively swapping, then speed matters (fast swap lowers the chances of swapping yourself to death). If you're instead pushing out idle or outright dormant memory in order to make room for more productive uses of that RAM, then clearing RAM matters most.

SwapOnZramMixedFeelings written at 00:02:36


My various settings in X to get programs working on my HiDPI display

Back when I got my HiDPI display (a 27" Dell P2715Q), I wrote an entry about what the core practical problems with HiDPI seemed to be on Linux and talked in general terms about what HiDPI related settings were available but I never wrote about what specific things I was setting and where. Today I'm going to remedy this, partly for my own future use for the hopeful future day when I need to duplicate this at work. Since I'm doing this two years after the fact, there will be an exciting element of software archaeology involved, because now I have to find all of those settings from the clues I left behind in earlier entries.

As mentioned in my old entry, the Dell P2715Q is a 163 DPI display. To make the X server itself know the correct DPI, I run it with a '-dpi 163' command line argument. I don't use XDM or any other graphical login manager; I start the X server from a text console with a nest of shell scripts, so I can supply custom arguments this way. I don't do anything with xrandr, which came up with plausible reported screen dimensions of 597mm x 336mm and didn't appear to need any changes.
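As a cross-check, the 597mm x 336mm dimensions that xrandr reports do work out to about 163 DPI once converted to inches. This assumes the panel's native 3840x2160 resolution, which isn't stated here.

```python
MM_PER_INCH = 25.4


def dpi(pixels, millimetres):
    """Dots per inch along one axis of the screen."""
    return pixels / (millimetres / MM_PER_INCH)


# Assumption: the P2715Q's native resolution is 3840x2160.
horizontal = dpi(3840, 597)   # about 163.4
vertical = dpi(2160, 336)     # about 163.3
```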

I use xsettingsd as my XSettings daemon, and set two DPI related properties in .xsettingsd:

Gdk/UnscaledDPI 166912
Xft/DPI 166912

Both of these values are my 163 DPI multiplied by 1024. For Xft/DPI, this is documented in the Xsettings registry. I'm not sure if I found documentation for Gdk/UnscaledDPI or just assumed it would be in the same units as Xft/DPI.
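Generating these lines from the DPI makes the arithmetic explicit (163 * 1024 = 166912). This sketch follows the Xft/DPI convention of 1024ths of a dot per inch and, like the settings themselves, assumes that Gdk/UnscaledDPI uses the same units.

```python
def xsettings_dpi(dpi):
    """XSettings DPI values are expressed in 1/1024ths of a dot per inch."""
    return dpi * 1024


def xsettingsd_lines(dpi):
    """Produce the two .xsettingsd lines for a given display DPI."""
    value = xsettings_dpi(dpi)
    # Assumption: Gdk/UnscaledDPI uses the same units as Xft/DPI.
    return f"Gdk/UnscaledDPI {value}\nXft/DPI {value}\n"
```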

There is also an X resource setting:

Xft.dpi: 163

As we can see, this is just the DPI.

Then I set some environment variables, which (in 2018) came from Arch's HiDPI page, the Gnome wiki, and the GTK3+ X page. First there is a setting to tell Qt apps to honor the screen DPI:


Then there is a pair of GTK settings to force GTK+ applications to scale their UI elements up to HiDPI but not scale the text, as explained in more depth in my original entry:

export GDK_SCALE=2
export GDK_DPI_SCALE=0.5

These three environment variables are only necessary for Qt and GTK+ applications, not basic X applications. Basic X applications seem to work fine with some combination of the Xft.dpi X resource and the XSettings system.

If you're running remote X applications from your HiDPI X session, as I am these days, they will automatically see your Xft.dpi X resource and your XSettings settings. They won't normally see your (my) specially set environment variables. Fortunately I mostly run basic X applications that only seem to use X resources and perhaps XSettings, and so basically just work the same as your local versions.

(At least after you fix any problems you have with X cursors on the remote machines.)

At the moment I'm not sure if setting the environment variables for remote X programs (for instance by logging in with 'ssh -X', setting them by hand, and then running the relevant program) works just the same as setting them locally. Some testing suggests that it probably does; while I see some visual differences, this is probably partly just because I haven't adjusted the remote programs I'm testing with the way I have my regularly used local ones (after all, I normally use them on my regular-DPI displays at work, and hopefully some day I'll be doing that again).

The final setting I make is in Firefox. As mentioned in passing in this entry, I manually set the about:config setting layout.css.devPixelsPerPx to 1.7, which is down from what would be the default of '2' based on my overall settings. I found that if I left Firefox alone with these other settings, its font sizes looked too big to me. A devPixelsPerPx setting of 1.7 is about right for what the Arch Wiki Firefox Tweaks page suggests should be correct here, and it looks good to me which is what I care about most.
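The 1.7 figure is close to what you get by dividing the display DPI by 96, the DPI that CSS pixels are defined against (one CSS px is 1/96 of an inch). Taking that division as the rule of thumb (an assumption here, not a quote of Arch's wording), the calculation is:

```python
CSS_REFERENCE_DPI = 96  # one CSS px is defined as 1/96 of an inch


def dev_pixels_per_px(display_dpi, places=1):
    """Suggested layout.css.devPixelsPerPx for a display DPI, rounded."""
    return round(display_dpi / CSS_REFERENCE_DPI, places)
```

For 163 DPI this gives 1.7 (163 / 96 is about 1.698).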

Sidebar: X resources tweaks to specific applications

Xterm sizes the width of the scrollbar in pixels, which isn't ideal on a HiDPI display. It is normally 14 pixels, so I increased it to:

XTerm*VT100.scrollbar.width: 24

Urxvt needs the same tweak but it's called something different:

URxvt*thickness: 24
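Assuming the 14-pixel default was chosen with a conventional 96 DPI display in mind, scaling it by 163/96 lands almost exactly on 24 pixels (14 * 163 / 96 is about 23.8). This sketch uses that width for both resources, as the settings above do.

```python
def scale_pixels(pixels, display_dpi, design_dpi=96.0):
    """Scale a pixel size chosen for design_dpi to display_dpi, in whole pixels."""
    return round(pixels * display_dpi / design_dpi)


def scrollbar_resources(display_dpi, default_width=14):
    """Emit X resource lines for xterm and urxvt scrollbar widths."""
    width = scale_pixels(default_width, display_dpi)
    return f"XTerm*VT100.scrollbar.width: {width}\nURxvt*thickness: {width}\n"
```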

I think I also tried to scale up XTerm's menu fonts but I'm not sure it actually worked, and I seem to have the same X resource settings (with the same comments) in my work X resource file.

HiDPIMyXSettings written at 00:43:43


Switching to the new in-kernel WireGuard module was easy (on Fedora 31)

One of the quietly exciting bits of recent kernel news for me is that WireGuard is now built in to the Linux kernel from kernel 5.6 onward. I've been using a private WireGuard tunnel on my Fedora machines for several years now, but it's been through an additional COPR repository with an additional DKMS based kernel module package, wireguard-dkms. Among other things, this contributed to my multi-step process for updating Fedora kernels.

When I first updated to a Fedora 5.6 kernel, I wondered if I was going to have to manually use DKMS to remove the DKMS-installed WireGuard module in favour of the one from the kernel itself. As it turned out, I didn't have to do anything; current versions of the COPR wireguard-dkms package have a dkms.conf that tells DKMS not to build the module on 5.6+ kernels. Updating to a 5.6 kernel got me a warning from DKMS that the WireGuard DKMS module couldn't be built for this kernel, but that was actually good news. After a reboot, my WireGuard tunnel was back up just like normal. As far as I can tell there is no difference in operation between the DKMS WireGuard version and the now in-kernel version except that I have one fewer DKMS module to rebuild on kernel updates.
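The version test involved is simple enough to sketch. The real restriction lives in the package's dkms.conf (DKMS can limit builds to matching kernels, for instance with a BUILD_EXCLUSIVE_KERNEL regular expression), and the exact rule wireguard-dkms uses isn't reproduced here. This Python version just compares the major and minor numbers of a kernel release string against 5.6.

```python
import re


def kernel_version(release):
    """Extract (major, minor) from a release string like '5.6.8-200.fc31.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def needs_wireguard_dkms(release):
    """WireGuard is in mainline from 5.6 on, so only older kernels need DKMS."""
    return kernel_version(release) < (5, 6)
```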

(The one precaution I took with the COPR wireguard-dkms package was to not install any further updates to it once I'd updated to a 5.6 kernel, because that was the easiest way to keep a WireGuard module in my last 5.5 kernel in case I wanted to fall back.)

After I'd gone through enough 5.6.x Fedora kernel updates to be sure that I wasn't going back to a 5.5 kernel that would need a WireGuard DKMS, I removed the WireGuard DKMS package with 'dnf remove wireguard-dkms'. Then I let things sit until today, when I did two more cleanup steps; I disabled the WireGuard COPR repository and switched over to the official Fedora package for WireGuard tools with 'dnf distro-sync wireguard-tools'. Somewhat to my surprise, this actually installed an updated version (going from 1.0.20200102 to 1.0.20200319).

(I believe that dnf hadn't previously recognized this as an upgrade because of a difference in RPM epoch number between the two package sources. This may be deliberate so that COPR packages override regular Fedora packages at all times.)
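RPM compares the epoch before anything else, so a package with a higher epoch wins regardless of its version. The sketch below shows that ordering with a deliberately simplified segment comparison. It is not a full implementation of RPM's rpmvercmp, which also handles '~', '^', and other details. The epochs 1 and 0 and the release '1' in the example are hypothetical; the two version numbers come from the text above.

```python
import re


def _segments(s):
    """Split a version string into numeric and alphabetic runs."""
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|[A-Za-z]+", s)]


def _cmp_segments(a, b):
    for x, y in zip(a, b):
        if x == y:
            continue
        if isinstance(x, int) and isinstance(y, int):
            return -1 if x < y else 1
        if isinstance(x, int):   # numeric segments count as newer than alphabetic
            return 1
        if isinstance(y, int):
            return -1
        return -1 if x < y else 1
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_evr(a, b):
    """Compare (epoch, version, release) tuples; the epoch always dominates."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1
    return (_cmp_segments(_segments(a[1]), _segments(b[1]))
            or _cmp_segments(_segments(a[2]), _segments(b[2])))
```

With a hypothetical epoch of 1 on the COPR build, compare_evr((1, "1.0.20200102", "1"), (0, "1.0.20200319", "1")) is 1, so the older COPR version still counts as newer.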

PS: Now that WireGuard is an official part of the Fedora kernel, I feel that I should do something to set up a WireGuard VPN on my work laptop. Unfortunately this really needs a WireGuard VPN server (or touchdown point) of some sort at work. We don't currently have one and the state of the world makes it unlikely we'll deploy one in the near future, even for private sysadmin use.

WireGuardKernelEasySwitch written at 00:30:30


Linux software RAID resync speed limits are too low for SSDs

When you add or replace a disk in Linux's software RAID, it has to be resynchronized with the rest of the RAID array. As very briefly covered in the RAID wiki's page on resync, this resync process has speed limits that are controlled by the kernel sysctls dev.raid.speed_limit_min and dev.raid.speed_limit_max (in KBytes a second). As covered in md(4), if there's no other relevant IO activity, resync will run up to the maximum speed; if there is other relevant IO activity, the resync speed will throttle down to the minimum (which many people raise on the fly in order to make resyncs go faster).

(In current kernels, it appears that relevant IO activity is any IO activity to the underlying disks of the software RAID, whether or not it's through the array being resynced.)

If you look at your system, you will very likely see that the values for minimum and maximum speeds are 1,000 KB/sec and 200,000 KB/sec respectively; these have been the kernel defaults since at least 2.6.12-rc2 in 2005, when the Linux kernel git repository was started. These were fine defaults in 2005, in the era of hard drives that were relatively small and relatively slow; in particular, you were very unlikely to approach the maximum speed even on fast hard drives. Even fast hard drives generally only managed 160 Mbytes/sec of sustained write bandwidth, comfortably under the default and normal speed_limit_max.

This is no longer true in a world where SSDs are increasingly common (for example, all of our modern Linux servers with mirrored disks use SSDs). In theory SSDs can write at data rates well over 200 MBytes/sec; claimed data rates are typically around 500 Mbytes/sec for sustained writes. In this world, the default software RAID speed_limit_max value is less than half the speed that you might be able to get, and so you should strongly consider raising dev.raid.speed_limit_max if you have SSDs.

You should probably also raise speed_limit_min, whether or not you have SSDs, because the current minimum is effectively 'stop the resync when there's enough other IO activity' since modern disks are big enough that they will often take more than a week to resync at 1,000 KB/sec. You probably don't want to wait that long. If you have SSDs, you should probably raise it a lot, since SSDs don't really suffer from random IO slowing everything down the way HDs do.
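The arithmetic behind 'more than a week' is straightforward. Taking 1 KB as 1024 bytes, a week at 1,000 KB/sec covers only about 619 GB, so any larger disk takes longer than that, and a 4 TB disk takes about 45 days. The sketch assumes a constant resync speed, which real resyncs won't have.

```python
SECONDS_PER_DAY = 86400


def resync_days(disk_bytes, speed_kb_per_sec):
    """Days to resync a whole disk at a constant speed (1 KB = 1024 bytes)."""
    return disk_bytes / (speed_kb_per_sec * 1024) / SECONDS_PER_DAY


def bytes_in_days(speed_kb_per_sec, days):
    """How many bytes a resync at a constant speed covers in a number of days."""
    return speed_kb_per_sec * 1024 * days * SECONDS_PER_DAY
```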

(Raising both of these significantly will probably become part of our standard server install, now that this has occurred to me.)
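Raising the limits is just a matter of writing new values to the two sysctls, which appear as files under /proc/sys/dev/raid/. The Python below is a minimal sketch that needs root when pointed at the real directory. The directory is a parameter so that the sketch can be tried against a scratch directory, and the numbers in the test are illustrative, not a recommendation. For a persistent setting, the same keys can go in a file under /etc/sysctl.d/ (for example 'dev.raid.speed_limit_max = 1000000').

```python
from pathlib import Path

RAID_SYSCTL_DIR = Path("/proc/sys/dev/raid")


def set_raid_speed_limits(min_kb, max_kb, sysctl_dir=RAID_SYSCTL_DIR):
    """Write dev.raid.speed_limit_min and _max, in KB/sec."""
    if not 0 < min_kb <= max_kb:
        raise ValueError("need 0 < min_kb <= max_kb")
    (sysctl_dir / "speed_limit_max").write_text(f"{max_kb}\n")
    (sysctl_dir / "speed_limit_min").write_text(f"{min_kb}\n")


def read_raid_speed_limits(sysctl_dir=RAID_SYSCTL_DIR):
    """Return the current (min, max) resync speed limits in KB/sec."""
    return (int((sysctl_dir / "speed_limit_min").read_text()),
            int((sysctl_dir / "speed_limit_max").read_text()))
```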

Unfortunately, depending on what SSDs you use, this may not do you as much good as you would like, because it seems that some SSDs can have very unimpressive sustained write speeds in practice over a large resync. We have a bunch of basic SanDisk 64 GB SSDs (the 'SDSSDP06') that we use in servers, and we lost one recently and had to do a resync on that machine. Despite basically no other IO load at the time (and 100% utilization of the new disk), the eventual sustained write rate we got was decidedly unimpressive (after an initial amount of quite good performance). The replacement SSD had been used before, so perhaps the poor SSD was busy frantically erasing flash blocks and so on as we were trying to push data down its throat.

(Our metrics system makes for interesting viewing during the resync. It appears that we wrote about 43 GB of the almost 64 GB to the new SSD at probably the software RAID speed limit before write bandwidth fell off a cliff. It's just that the remaining portion of about 16 GB of writes took several times as long as the first portion.)

SoftwareRaidResyncOnSSDs written at 00:20:57


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.