2023-05-17
(Graphical) Unix has always had desktop environments
One of the stories that you could tell about the X Window System and by extension graphical Unix is that first came (simple) window managers and xterm. Only later did Unix develop desktop environments like GNOME and KDE. This is certainly more or less the development that happened on open source PC Unixes and to a certain degree it's the experience many people had earlier on workstation Unix machines running X, but it's actually not historically accurate. In reality, Unix has had full-scale desktop environments of various degrees of complexity more or less from the beginning of serious graphical Unix.
(This origin story of desktop environments on Unix is sufficiently attractive that I was about to use it in another entry until I paused to think a bit more.)
In the beginning, there was no X. Workstation vendors had to create their own GUI environments, and so of course they didn't just create a terminal emulator and window management; instead, they tended to create a whole integrated suite of GUI applications and an environment to run them in. One early example of this is Sun's Suntools/SunView, but SGI had one too and I believe most other Unix workstation vendors did as well (plus I believe CMU's Andrew had its own desktop-like environment). When X started to win out, these Unix vendors didn't abandon their existing GUI desktops to fall back to a much less well developed window manager and terminals experience; instead they re-implemented versions of their old desktop environments on top of X (such as Sun's OpenWindows) or created new ones such as the Common Desktop Environment (CDE).
Open source PC Unixes didn't follow this pattern because in the 1990s, there were few or no open source desktop environments. Since X window managers, xterm, and various other graphical programs were free software, that's what people had available to build their environments from, and that's what people did for a while. In this, the open source PC Unixes were recapitulating an earlier history of people running X on Unix vendor desktop workstations before the vendor itself supported X on their hardware and had built a desktop for it (and X itself started out this way, as initially distributed out of MIT and Project Athena).
(Some people then continued to use a basic X environment because they liked their version of it better than the Unix workstation vendor's desktop. Sometimes such a basic X environment ran faster, too, because the vendor had written a bunch of bloatware.)
Having metrics for something attracts your attention to it
For reasons beyond the scope of this entry, we didn't collect any metrics from our Ubuntu 18.04 ZFS fileservers (trying to do so early on led to kernel panics). When we upgraded all of them to Ubuntu 22.04, we changed this, putting various host agents on them and collecting a horde of metrics that go into our Prometheus metrics system, some of which automatically appear on our dashboards. One of the results of this is that we've started noticing things about what's happening on our fileservers. For example, at various times, we've noticed significant NFS read volume, significant NFS RPC counts, visible load averages, and specific moments when the ZFS ARC has shrunk. Noticing these things has led us to investigate some of them and pushed me to put together tools to make this easier.
What we haven't seen is any indication that these things we're now noticing are causing issues on our NFS clients (ie, our normal Ubuntu servers), or that they're at all unusual. Right now, my best guess is that everything we're seeing now has been quietly going on for some time. Every so often for years, people have run jobs on our SLURM cluster that repeatedly read a lot of data over NFS, and other people have run things that scan directories a lot, and I know our ZFS ARC size has been bouncing around for a long time. What has changed is that we can now see all of this, and metrics attract attention, at least when they're new.
This isn't necessarily a bad thing, as long as we don't over-react. Before we had these metrics we probably had very little idea what was a normal operating state for our fileservers, so if we'd had to look at them during a problem we'd have had much less idea what was normal and what was exceptional. Now we're learning more, and in a while the various things these metrics are telling us probably won't be surprising news (and to a certain extent that's already happening).
This is in theory not a new idea for me, but it's one thing to know it intellectually and another thing to experience it as new metrics appear and I start digging into them and what they expose. It's at least been a while since I went through this experience, and this time around is a useful reminder.
(This is related to the idea that having metrics for something can be dangerous and also that dashboards can be overly attractive. Have I maybe spent a bit too much time fiddling with ZFS ARC metrics when our ARC sizes don't really matter because our ARC hit rates are high? Possibly.)
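As a rough sketch of what 'hit rate' means here: ARC hit and miss counts are cumulative counters, so the hit rate over some interval is the change in hits divided by the change in hits plus misses. The function name and the sample numbers below are made up for illustration; they aren't taken from our fileservers or from any particular metrics exporter.

```python
def arc_hit_rate(before, after):
    """Return the ARC hit rate over the interval between two counter samples.

    Each sample is a dict with cumulative 'hits' and 'misses' counts.
    Returns None if there were no ARC lookups in the interval.
    """
    hits = after["hits"] - before["hits"]
    misses = after["misses"] - before["misses"]
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# Hypothetical sample values, for illustration only.
earlier = {"hits": 1_000_000, "misses": 50_000}
later = {"hits": 1_900_000, "misses": 60_000}

rate = arc_hit_rate(earlier, later)
print(f"ARC hit rate over the interval: {rate:.1%}")
```

If a number like this stays high, the absolute ARC size is probably not the thing to worry about, however interesting its graph looks.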
PS: Technically what attracts attention is being able to readily see those metrics, not the metrics themselves. We collect huge piles of metrics that draw no attention at all because they go straight into the Prometheus database and never get visualized on any dashboards. But that's a detail, so let's pretend that we collect metrics because we're going to use them instead of because they're there by default.