2018-06-27
My Ryzen-based Linux office machine appears to finally be stable
Back in January, I switched over to my Ryzen-based office workstation and unfortunately hit problems more or less immediately. The most pernicious of them was an ongoing hang under some circumstances when the machine became idle, which turned out to be a known issue that a fair number of people were running into (Fedora, kernel.org, Ubuntu). From the bug reports about the issue, I was able to find some kernel parameters that stabilized my system, but I didn't consider this really satisfactory for various reasons.
For a long time these magic kernel command line parameters and similar tricks were the only workarounds available, at least to me. However, there had long been rumors of a magic AMD-provided firmware option that could work around the problem, generally exposed to you and me in a BIOS setting called 'Power Supply Idle Control', which you allegedly wanted to set to 'Typical current idle'. This apparently became available starting with AGESA 1.0.0.2a, which various motherboard vendors rolled into their overall BIOS at very different times. For bonus fun, apparently not all BIOS vendors even expose these AMD firmware settings, although enthusiast motherboards usually do.
(AMD may have released this as far back as last December, but on the ASUS Prime X370-PRO it appeared no earlier than BIOS 4008, from mid-April, and perhaps required the June 2nd BIOS 4011.)
I've been running my Fedora 27 Ryzen workstation with only this BIOS setting (ie, with no more special kernel command line parameters) since June 11th, using Asus's Prime X370-PRO BIOS version 4011. Although Fedora keeps coming out with kernel updates that get me to reboot the machine, it has been stable overnight and over weekends, which is something that it couldn't manage before on the rare occasions when I took out my kernel parameters workaround as an experiment. Over this time I've used both the Fedora 27 4.16.x kernel and just recently the 4.17.x kernel from the updates-testing repo; both have been stable and free of hangs (so far).
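If you want to confirm that a machine really is running without the old workarounds, one option is to look at the running kernel's command line. What follows is only a minimal sketch in Python, not anything from the bug reports: the function name is made up for illustration, and apart from rcu_nocbs the parameter names in the list are my assumptions about what people commonly tried, so adjust them to whatever you actually set.

```python
from pathlib import Path

# Kernel parameters treated as "idle hang workarounds". Only rcu_nocbs is
# named in this entry; the other two are assumed examples, edit to taste.
WORKAROUND_NAMES = ("rcu_nocbs", "processor.max_cstate", "idle")


def find_workarounds(cmdline, names=WORKAROUND_NAMES):
    """Return the command line tokens whose parameter name is in `names`.

    This splits on whitespace only, so a quoted value containing spaces
    would be split into several tokens; that is good enough for a check.
    """
    found = []
    for token in cmdline.split():
        name = token.split("=", 1)[0]
        if name in names:
            found.append(token)
    return found


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        leftovers = find_workarounds(proc_cmdline.read_text())
        print("leftover workarounds:", leftovers or "none")
```

An empty result only tells you that none of the listed parameters are on the command line of the currently booted kernel; it says nothing about what the BIOS setting is.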
Given my experience so far and that most people who've tried this BIOS option have also reported good results with it, I'm cautiously optimistic that my machine is now stable without needing kernel behavior changes. I haven't re-done my power measurements so I have no idea if the machine uses somewhat more power when deeply idle, and honestly I don't care.
This has been a long time coming, but at least it seems to finally be here.
(It's possible that I could have done this back in mid-April, but the timing was bad to try it out at the time for reasons beyond the scope of this entry. In general I haven't been feeling very enthusiastic about taking stability risks with this machine; once the kernel parameters seemed to work I was willing to let things sit for a while instead of rushing into more experimentation and possible failures and frustrations.)
As a side note, finding the option in your BIOS is generally a bit tricky because it's usually hiding inside an AMD-provided blob of settings. On the Prime X370-PRO (which I believe is typical), you have to go to the 'Advanced' menu of additional settings, then go down to the bottom to something called 'AMD CBS' or 'CBS', and expand it to actually see the setting. Unlike vendor-provided BIOS settings, there probably isn't any documentation.
(The stuff in the AMD CBS submenu is apparently something AMD supplies to vendors as basically a black box blob that they insert somewhere in their UEFI menus. What AMD includes in the settings varies from AGESA version to AGESA version and they're generally mostly undocumented.)
Sidebar: Why I switched from kernel parameters to the BIOS setting
The short version is that I considered the kernel parameters to be fragile magic, specifically the rcu_nocbs setting, since it pretty much had to be staving off the hangs only through some indirect and perhaps coincidental effect on the overall system's behavior. The problem with indirect, undesigned, and coincidental effects is that they can easily go away or change when people make changes.
The AMD BIOS setting is its own sort of magic, but at least it's direct magic and hopefully it's at less risk of being destabilized by kernel or system changes.
I think it's still reasonable to run personal servers on the Internet
In his comment on yesterday's entry, Pete Zaitcev showed me that I should clarify my opinion on running your own personal servers today on the Internet (to the extent that I have an opinion at all). To summarize the rest of this entry, I don't think there's any compelling reason why you shouldn't run a personal server if you want to and you more or less know what you're getting yourself into. At the same time, it's not trivial to do so; it's very much the DIY choice, with all that that implies.
First off, I definitely think that you should have a personal presence on the Internet that's not tied to your (current) employer; in other words, don't make my university sysadmin's email mistake. Having your own domain name is optional and will cost you some money and effort but it probably pays off in the long run, at least for websites (in today's email spam environment, changing email addresses every few years may actually be a feature). However, none of this requires you to have your own servers; plenty of places support you pointing some aspect of your domain at their infrastructure, at least for common things like websites, email, and DNS. Taking advantage of this (either for free or paying people) is definitely the easy way to go.
However, I think that it's still reasonable to have your own server or servers instead, especially now that you can get inexpensive virtual machines that you genuinely run yourself (your choices used to be 'shared hosting' or paying for actual physical hardware and rack space). Modern Unix server software is not full of holes and is generally relatively straightforward to administer, the Internet is not an intrinsically hostile place of DDoS and hate, and most people are still willing to talk to random machines for things like websites (your mileage may vary for things like sending email from your server to GMail). Generally if you put a modern Unix on the Internet for personal use and operate it with decent competence, you'll be okay at one level.
(My impression is that modern VPS providers have done a lot of work to make it very easy for you to bring up a new generic Ubuntu, CentOS, or whatever server that will come up in a sane and operable condition and probably automatically apply security updates and so on. I don't know what Amazon AWS is like, though.)
At another level, by running your own server you're making tradeoffs and accepting limitations. The broad downside is that you've chosen the DIY approach and DIY is always more work and requires more knowledge than getting someone else to do it for you. If you're already a sysadmin it can feel like a busman's holiday, and if you're not a sysadmin or an experienced Unix person you're going to have to turn yourself into one. One dangerous side of this is that it's easy to make mistakes through ignorance, for example not making sure you have some sort of backups. For a personal server, you don't necessarily need everything you want for running one in a company, but there are still a lot of things that may bite you some day. System administration is unfortunately a field so full of trivia that people keep having to rediscover pieces of it the hard way.
Another limitation is that, to put it one way, you're not going to get your own personal GMail, either in its interface or probably in its resilience against spam and other problems. The open source world has produced great marvels and there are things that can come close to some parts of the big company services, but on the whole the DIY approach is going to get you results that are objectively inferior in some ways. It's up to you to decide if you care for your usage; if you read all your email through an IMAP client, for example, the lack of a sophisticated GMail web interface is not an issue.
Judged purely by the end results, this can make running your own server a bad choice. You spend more time, have to learn more things and worry about more issues, and you get an inferior result. If you're going to run your own server anyway, you should have an answer to the question of why, or what you get out of it. One perfectly good answer is 'I want to play around with my own Unix server'; another is 'I don't like having so much of my Internet life at the mercy of big indifferent companies'.
Further, my current broad view is that you shouldn't run anything critical on a personal server unless you're extremely confident that you know what you're doing and that you have working backups (on another provider). Casually operated personal servers are best used for things that you can afford to be down for a few days while you patch things back together from an upgrade, a security problem, an accident, or your VPS provider screwing something up. If you need a highly resilient personal server environment, you're probably looking at a significant amount of work unless you're already an expert in the field and can put together a solid Puppet, Kubernetes, or AWS environment in your sleep.
On the flipside, this is caution speaking. Most of the time you're going to be fine, especially if you pay your VPS provider for some form of backups (and then keep your own offsite copies). Just make sure to apply security updates and as part of this, upgrade or build a new version of the VPS when your Unix or Linux distribution reaches its end of life.
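One small way to keep the end of life date from sneaking up on you is to have the server tell you how long it has left. The following is a rough Python sketch under some assumptions: it relies on the optional SUPPORT_END field in /etc/os-release, which many distributions don't set, so it also accepts a fallback date that you look up and supply yourself. The function names are invented for this example.

```python
from datetime import date
from pathlib import Path


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return fields


def days_until_eol(fields, today, fallback_eol=None):
    """Days from `today` until end of life, or None if no date is known.

    Uses SUPPORT_END (YYYY-MM-DD) when present, otherwise `fallback_eol`,
    which is a date you supply yourself. Negative means already past EOL.
    """
    raw = fields.get("SUPPORT_END")
    eol = date.fromisoformat(raw) if raw else fallback_eol
    if eol is None:
        return None
    return (eol - today).days


if __name__ == "__main__":
    os_release = Path("/etc/os-release")
    if os_release.exists():
        remaining = days_until_eol(parse_os_release(os_release.read_text()),
                                   date.today())
        print("days until end of life:", remaining)
```

Run from cron or a similar scheduler, something like this could mail you when the number drops below a threshold of your choosing. It's no substitute for actually applying security updates in the meantime.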
(My personal plan is to use at least two completely separate VPS providers, but that requires getting over my inertia and lack of desire to run my own infrastructure.)
By the way, all of this assumes that you aren't someone who is going to be actively and specifically targeted by attackers. If this is not true, you really need to know what you're doing as far as security goes and you're probably better off in the tender arms of GMail and so on. GMail has a very good security team with a lot of resources, far more than you or I do.