The work that's not being done from home is slowly accumulating for us
At the moment, the most recent things I've seen have talked about us not returning to the office before September, and then not all of us at one time. This gives me complicated feelings, including about what work we are doing. From the outside, our current work from home situation probably looks like everything is going pretty well. We've kept the computing lights on, things are being done, and so far all of the ordinary things that people ask of us get done as promptly as usual. A few hardware issues have come up and have been dealt with by people making brief trips into the office. So it all looks healthy; you might even wonder why we need offices.
When I look at the situation from inside, things are a bit different. We may be keeping the normal lights on, but at the same time there's a steadily growing amount of work that is not being done because of our working from home. The most obvious thing is that ordering new servers and other hardware has basically been shut down; not only are we not in the office to work on any hardware, it mostly can't even be delivered to the university right now.
The next obvious thing is the timing of any roll out of Ubuntu 20.04 on our machines. Under normal circumstances, we'd have all of the infrastructure for installing 20.04 machines ready and probably some test machines out there for people to poke at, and we'd be hoping to migrate a number of user-visible machines in August before the fall semester starts. That's looking unlikely, since at this point all we have is an ISO install image that's been tested only in temporary virtual machines. Since we haven't been in the office, we haven't set up any real servers running 20.04 on an ongoing basis. We're in basically the bad case situation I imagined back in early April.
(And of course many of the people we'd like to have poke at 20.04 test machines are busy with their own work from home problems, so even if we had test machines, they would probably get less testing than usual.)
Another sign is that many of our Ubuntu servers have been up without a reboot for what is now an awfully long time for us. Under normal circumstances we might have scheduled a kernel update and reboot by now, but under work from home conditions we only want to take the risk of doing kernel updates and rebooting important servers if there is something critical. If something goes wrong, it's not a walk down to the machine room, it's a trip into the office (and a rather longer downtime).
There's also a slowly accumulating amount of pending physical networking work, where we're asked to change what networks particular rooms or network ports are on because people are moving around. This work traditionally grows as the fall semester approaches and space starts getting sorted out for new graduate students and so on, although that could change drastically this year depending on the university's overall plans for what graduate students will do and where they will work.
(To put it one way, a great deal of graduate student space is not set up for appropriate physical distancing. Nor is a fair amount of other office and lab space.)
One level up from this is that there's a number of projects that need to use some physical servers. We have a bunch of OpenBSD machines on old OpenBSD versions that could do with updates (and refreshes on to new hardware), for example, but we need to build them out in test setups first. Another example is that we have plans to significantly change how we currently use SLURM, but that needs a few machines to set up a little new cluster (on our server network, as part of our full environment).
(A number of these projects need custom network connectivity, such as new test firewalls needing little test networks. Traditionally we build this in some of our 'lab space', with servers just sitting on a table wired together.)
Much of this is inherent in us having and using physical servers. Having physical servers in a machine room means receiving new hardware, racking it, cabling it up, and installing it, all of which we have to do in person (plus at least pulling the cables out of any old hardware that it's replacing). Some of it (such as our reluctance to reboot servers) is because we don't have full remote KVM over IP capabilities on our servers.
PS: We're also lucky that all of this didn't happen in a year when we'd planned to get and deploy a major set of hardware, such as the year when we got the hardware for our current generation of fileservers.
Link: Code Only Says What it Does
Marc Booker's Code Only Says What it Does (via) is about what code doesn't say and why all of those things matter. Because I want you to read the article, I'm going to quote all of the first paragraph:
Code says what it does. That's important for the computer, because code is the way that we ask the computer to do something. It's OK for humans, as long as we never have to modify or debug the code. As soon as we do, we have a problem. Fundamentally, debugging is an exercise in changing what a program does to match what it should do. It requires us to know what a program should do, which isn't captured in the code. Sometimes that's easy: What it does is crash, what it should do is not crash. Outside those trivial cases, discovering intent is harder.
This is not an issue that's exclusive to programming, as I've written about in Configuration management is not documentation, at least not of intentions (procedures and checklists and runbooks aren't documentation either). In computing we love to not write documentation, but not writing down our intentions in some form is just piling up future problems.