Wandering Thoughts


My sysadmin's view of going from Centrex to VoIP (sort of)

I recently read j. b. crawford's Centrex, about the telephone service by that name that was offered to organizations in at least the US and Canada. It sparked not so much nostalgia as memories, because up until very recently the University of Toronto's downtown campus used Centrex, complete with what I now know (from the article) is a distinctive Centrex pattern of five-digit dialing for phone numbers inside the organization. In the University of Toronto's case, most phone numbers were 978-xxxx and so it was common for people and documentation to list numbers as, for example, '8-4942', which is what you actually dialed on a Centrex phone. If you were calling from a non-Centrex line, you were expected to remember the '97' in front.

(It's probably not hard to still find online web pages that talk about our numbers that way, never mind physical signs around the university. There are some remarkably old signs on things.)

I'm not sure what drove the university to move away from Centrex (if I was ever told I've forgotten now); it might have been the potential for cost savings, or it might have been Bell Canada suggesting that Centrex rates might or would be going up. The university's general replacement for our Centrex service was (and is) Voice over IP (VoIP). The actual rollout was rather slow, partly because it was just getting started in early 2020 and you can guess what happened next. You can still get hardwired phone lines if you want to, but they're now much more costly than they used to be (and the cost is paid by your department or group), so they tend to be reserved for situations where you really need them.

(We have one such line in our machine room for safety reasons.)

In general I'm sure Voice over IP is fine, but from a sysadmin's view it has a little problem, at least as implemented here at the university. That issue is that it runs over your regular network. There is no separate physical or even logical network for VoIP phones; instead you plunk them down on your department's network in some convenient place (physically and logically), and then they phone home to establish their VoIP circuits. For regular usage this is fine, but one of the times sysadmins may need to use their phones is exactly when the network is broken and they need to call someone about it. If your phone is VoIP and it runs over your broken network, you have a problem.

(The university's current VoIP setup also has some nice conveniences, like emailing you recordings of any voicemail messages people have left you, so you don't have to deal with the phone's voicemail interface and can just play them on your computer.)

The department could have dealt with this problem for us by leaving us on hardwired phone lines even after Centrex service stopped. However, it was cheaper to get us basic cellphones (with voice-only plans). At this point we've gone through three different models for reasons outside the scope of this entry; the current one and the previous one have been actual Android devices, which I haven't been too impressed with. But they work to make phone calls, and they do a few other things as well.

(One of the things we don't use these phones for is MFA authentication; instead we have much smaller and more convenient physical tokens. Nor do we normally carry them around, especially out of hours; in my group, everyone's smartphones spend all their time on their office desks. We view them almost entirely as a replacement for our old desk phones.)

CentrexToVoIPSysadminView written at 22:52:33


Why we scrape Prometheus Blackbox's metrics endpoint

The Prometheus Blackbox exporter is how you do many external checks on machines and services ('endpoints' in Blackbox's jargon), ranging from ping checks up through making HTTPS requests and checking the results. The Blackbox exporter has a somewhat confusing usage; unlike most exporters, you don't so much scrape it as scrape things through it, using probes against targets. As part of this, each combination of probe and target is a separate Prometheus scrape, each of which generates an 'up' metric for that particular scrape. Unlike regular Prometheus exporters, these per-scrape 'up' metrics aren't all that useful because all they tell you is that your Prometheus server could talk to that Blackbox exporter. Actual success or failure of your check is communicated through the 'probe_success' metric, which will be 0 if it failed for some reason.

The Blackbox exporter also has its own /metrics endpoint that gives you metrics for Blackbox itself, which are a combination of general Go and Prometheus exporter metrics with some Blackbox specific ones. One of the reasons to monitor this metrics endpoint is that it will tell you if Blackbox has been unable to successfully reload its configuration for a while, which is something that saved us with the main Prometheus daemon. However, another reason that we monitor the Blackbox metrics endpoint is that scraping Blackbox's own metrics gives us a simple check of whether or not it's up, with its own 'up' metric that's convenient to alert on.
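The two kinds of scrapes look quite different in a Prometheus configuration. Here is a sketch of both side by side; the 'bbhost:9115' address, the job names, and the 'http_2xx' module name are all assumptions for illustration, but the relabeling dance for probes is the standard one:

```yaml
scrape_configs:
  # Blackbox's own /metrics, with a plain 'up' metric to alert on.
  - job_name: 'blackbox'
    static_configs:
      - targets: ['bbhost:9115']    # assumed host:port

  # Checks made *through* Blackbox; each target is a separate scrape.
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]            # assumed module name
    static_configs:
      - targets: ['https://www.example.org/']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: bbhost:9115    # the Blackbox exporter itself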

Of course, you can use the 'up' metrics you get from scraping targets through Blackbox, but if you do you have some decisions to make. Do you pick a single probe and target combination that you expect to always be present in your configuration and alert if its 'up' is 0? Do you alert if a sufficient number or percentage of 'up' metrics for Blackbox probes go to zero? If you're using more than one Blackbox exporter for whatever reason, do you have labels set that will tell your alerting rule what Blackbox exporter was used for a particular scrape?

(It turns out that our Blackbox label rewriting doesn't pass through this information. It's not normally important, which is probably why the stock example doesn't preserve it, but it becomes potentially quite relevant if you're using the 'up' metrics from Blackbox checks as a health check on Blackbox itself.)

Adding a separate scrape of the Blackbox /metrics endpoint is the simple way out. It gives you a scrape that doesn't depend on what you're checking through Blackbox, the scrape will definitely have labels that tell you which Blackbox you're talking to, and the extra Blackbox health metrics are potentially useful.
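Alerting on that direct scrape is then a one-liner, and you can watch the exporter's configuration reload status at the same time. A sketch, where the 'blackbox' job name and the 'for' durations are assumptions:

```yaml
groups:
  - name: blackbox-health
    rules:
      - alert: BlackboxExporterDown
        expr: up{job="blackbox"} == 0
        for: 5m
      - alert: BlackboxReloadFailing
        expr: blackbox_exporter_config_last_reload_successful == 0
        for: 30m
```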

PrometheusWhyScrapeBlackbox written at 22:39:27


Amanda and deciding between server and client compression

We use Amanda for our backups, which delegates the actual creation of backup blobs (tar archives or what have you) and their restoration to other programs like tar (although it can be clever about dealing with them). One of the things that Amanda can do with these blobs is compress them. This compression can be done with different compressors, and it can be performed on either the Amanda backup server or on the client that is being backed up (provided that you have the necessary programs on each of them). We back up most of our filesystems uncompressed, but we have a single big filesystem that we compress; it's almost all text, so it compresses very well (probably especially the bits of email that are encoded in base64).

When we started compressing the backups of this filesystem, we did it on the Amanda server for an assortment of reasons (including that the filesystem then lived on one of our shared fileservers, which at the time we started this was actually one of our first-generation Solaris 10 fileservers). Recently we switched Amanda's compression to being done on the client instead, and doing so has subtly improved our backup system, due to some of the tradeoffs involved. Specifically, switching to client compression has improved how fast we can restore things from this backup, which is now limited basically by the speed of the HDDs we have our Amanda backups on.
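In Amanda this is a small dumptype change; compression moves between machines by changing one word. A sketch of roughly what such a switch looks like (the dumptype name and the choice of zstd are illustrative assumptions, and option spellings vary somewhat between Amanda versions, so check amanda.conf(5)):

```
define dumptype comp-client-tar {
    global
    program "GNUTAR"
    # was: compress server custom
    compress client custom
    client_custom_compress "/usr/bin/zstd"
}
```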

In isolation, the absolute speed of compressing or decompressing a single thing is limited by CPU performance, generally single-core CPU performance. During backups (and also during restores), you may not be operating in isolation; there are often other processes running, and you might even be compressing several different backup streams at once on either the server or the client. Our current Amanda backup servers have single Intel Pentium D1508s, which have a maximum turbo speed of 2.6 GHz and a base speed of 2.2 GHz. By contrast, our current /var/mail server has a Xeon E-2226G, with a single core turbo speed of 4.7 GHz and a base speed of 3.4 GHz. So one obvious consideration of whether to do compression on the server or the client is which one will be able to do it faster, given both the raw CPU speeds and how loaded the CPU may be at the time.

The CPUs on our backup servers were fast enough that the time it took to back up and compress this filesystem wasn't a problem. But that's because we have a lot of freedom with how long our backups take, as long as they're fast enough (they start in the late evening and just need to finish by morning; these days they only take a few hours in total).

However, things are different during restores, especially selective restores. In an Amanda restore of only a few things from a compressed backup, Amanda will spend most of its time reading through your compressed archive. The faster it can do this, the better, and you may well want restores to finish as fast as possible (we certainly do here). By moving decompression of the backups from the Amanda server (with a slow CPU) to the Amanda client (with a fast CPU), we changed the bottleneck from how fast the Amanda server could decompress things (which was not too fast) to how fast it could read data off the HDDs.

(As a side effect we reduced the amount of data flowing over the network during both restores and backups, since we're now sending the compressed backup back and forth instead of the uncompressed one. In some environments this might be important all on its own; in our environment, both systems have 10G-T and are not network limited for backups and restores.)

Beyond speeding up restores of filesystems with compressed backups, there are some other considerations for where you might want to do compression (mostly focused on backups). First, CPU performance is only an issue if compression is the limiting factor, i.e. you can both feed it data from the filesystem fast enough and write out its compressed output at full speed. If your bottleneck is elsewhere, even a slow CPU may be fast enough to keep up on backups. If you're compressing the backups of multiple filesystems, you probably care about how many cores (or CPUs) you have and where you have them. If you have fifty filesystems from fifty different backup clients to compress, you're probably going to want to do that on the clients, because you probably don't have that many cores on your backup server.

If you have network bandwidth limits, compressing (and decompressing) on the client reduces the amount of data transferred between it and the server. If the client CPU is slow, this will also naturally further throttle the bandwidth used (although it won't change the total amount of data transferred).

As far as I know, Amanda does all compression on the fly before anything is written to the Amanda holding disk or to 'tape', so if creating the backups, sending them over the network, and compressing them (not necessarily in that order) are all fast enough, where the compression is done doesn't reduce the bandwidth you want for your holding disk. Just as with network bandwidth, slow compression (on either the client or the server) may naturally reduce bandwidth demands on the holding disk.

AmandaServerVsClientCompression written at 23:04:37


Amanda has clever restores from tar archives (sometimes)

Yesterday I wrote an entry about how the Amanda backup system reads all the way through tar archives on restores. Except, it turns out, this is only partially true, because under some circumstances Amanda will thoroughly optimize restores from tar archives (and possibly other archive formats). When everything is lined up right, what you'll observe is that Amanda reads only the data being actually restored, however little or much it is, and as a result restores of small files out of large backups can be quite fast. The situation where we've seen this happen is uncompressed backups made using the amgtar backup application.

Under normal circumstances, Amanda tries to make an index of every backup that says what files and directories are in it. These indexes are used when you use amrecover to look around in a backed up filesystem and do a restore of a single file, for example. Under some circumstances, Amanda will build indexes of tar archives that list not just each name in the archive but also where it starts in the archive (and implicitly where it ends, based on the start of the next item). When you do a restore and the backup blob is on disk (and so can be seek'd around in), Amanda on your backup server will use this index to send just the pieces of the archive that are needed to the machine you're running amrecover on, where they get reconstructed into an apparent tar archive and then fed to tar to be extracted. Since tar is only being fed things that it should extract, it doesn't matter that tar itself wants to read all the way through the archive.
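The tar format's fixed 512-byte block structure is what makes such an index possible in the first place. As a sketch of the idea (not Amanda's actual index format), Python's standard tarfile module exposes the same per-member offsets an indexer could record, letting you slice one member's bytes straight out of the archive:

```python
import io
import tarfile

# Build a tiny two-member tar archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name, data in [("a.txt", b"hello"), ("b.txt", b"world!")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
raw = buf.getvalue()

with tarfile.open(fileobj=io.BytesIO(raw)) as tf:
    for member in tf:
        # member.offset is where the member's header block starts and
        # member.offset_data is where its contents start; with these
        # recorded in an index, a restore can seek straight to one
        # member instead of reading through the whole archive.
        chunk = raw[member.offset_data:member.offset_data + member.size]
        print(member.name, member.offset, member.offset_data, len(chunk))
```

Because each member's data is self-contained between those offsets, gluing a subset of members back together (plus tar's end-of-archive blocks) yields a perfectly valid smaller tar archive, which is essentially what Amanda sends to amrecover.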

Making this work relies on a lot of things, including that the format of tar archives makes it simple to cut them apart and glue them back together again. A more complex backup format would give Amanda much more heartburn (if, for example, it started with an index of the rest of the data). I also don't know if Amanda will do anything to accelerate restores if it's reading from tape (or a non-seekable source in general). In theory it could at least stop after it's read everything necessary, and it wouldn't have to ship everything to the client.

This doesn't work for compressed tar backups for two reasons. The obvious reason is that Amanda doesn't have any index for where files are in the compressed version of the tar archive, so it can't skip to them and clip them out. The broader reason is that most compression formats don't normally allow you to seek arbitrarily in them, because compression (and decompression) rely on context, which is built and maintained by reading through all of the file or at least large blocks of data.

(Some compressors do, though; in a comment on the first entry, vasi pointed to his pixz, which can create indexed compressed tar archives. I've also found t2sz, which is an indexed zstd compressor that understands tar archives. We use zstd for our compressed backups, but I don't know if it would be particularly easy to wire this up to Amanda.)

AmandaReadsTarRestoresToEndII written at 23:07:18


The Amanda backup system completely reads tar archives on restores

Amanda is the backup system that we use, and have used for years. Strictly speaking, you could say that Amanda is a backup scheduling and storage management system, in that the actual backups are made by existing tools such as tar, placing it on one side of the divide between backup systems about how much they know about what they're storing. In practice this usually doesn't matter; you do backups through Amanda and normally you do restores through Amanda as well, with Amanda automatically running the underlying tools for you with the right arguments.

(Although sometimes you wind up having to look at tar and the other underlying programs because of bugs.)

We have a few giant filesystems that we back up, such as our /var/mail filesystem. Backups of our /var/mail are no problem, especially since it's now on its own server, but recently we tried restoring an inbox from our Amanda backups for the first time in a while and found that it took much longer than we expected. Part of this was that even after the particular inbox we wanted had been extracted by amrecover, amrecover (and the backend Amanda daemons) kept running. What was going on is straightforward; Amanda reads all the way through your (tar) archive on any and all restores, even if you've already extracted what you're looking for. This means that the overall restore process doesn't end (and release the resources it's using) until the full archive is read, which may take a while if your archive is for a 1 TiByte filesystem.

Update: This only happens for some forms of Amanda tar-based backups, such as compressed tar dumps. With at least some uncompressed tar dumps (and storage media), Amanda will stop after it's restored your files and may be able to do this quite efficiently.

This isn't an Amanda bug. Instead it's more or less a limitation of Amanda's approach to backups, specifically of delegating the actual backup and restore process to other programs. Since Amanda has to delegate the restore process to tar, it has no idea of when tar is 'done'; all it can do is run tar until tar exits. GNU Tar itself has no feature for 'exit when you've restored everything listed on the command line as a selective restore', and it might sometimes be difficult for even tar to know when it was done, depending on what you ask for (and how the tar archive is structured). Plus, if what you want is located toward the end of the tar archive, Amanda and tar have no choice but to read all the way through the archive to it, because neither of them knows where anything is located in the archive.

(All of this is part of why some backup tools use their own custom backup formats. A backup system with a custom format can potentially jump to exactly the things you want to restore, pull them out, and know that it's done and can stop work now.)

One of the corollaries of this is that if you want fast recoveries with Amanda, or in general to speed up your recoveries, one of the things you need to look at is how fast you can read through your backup archives and how to speed that up. We've wound up doing some work on that recently (as a result of this slow recovery experience), but that's for another entry.

AmandaReadsTarRestoresToEnd written at 21:56:14


Brief early impressions of Emacs' evil Vim emulation

Emacs has a third party package called evil, which is "an extensible vi layer for emacs". Faced with such a pitch I couldn't resist trying it out just to see it do Vim tricks, and then I experimented to see if it would be useful in one narrow specific situation for me, with inconclusive but educational results. The short summary is that evil is an impressively comprehensive vim emulation (it passes my checks for noticeable vim features), but I apparently have deeply embedded Emacs reflexes that get in the way of using it.

Purely as a vim emulation, evil seems quite comprehensive in my limited testing, including features like reflowing paragraphs by piping them through !}fmt. It doesn't get quite everything, for example it doesn't have vim's arithmetic operations, but I'm relatively convinced I could edit in evil without really noticing that it wasn't vim. As someone who knows a bit about how Emacs works, this is somewhere between impressive and scary; there's a lot of hard work involved in making it work so well and for so much. In addition, some of the details are very nice; for example, under X evil will change how the cursor looks depending on the (vim) mode you're in.

I have a long history with Emacs and I'm quite comfortable in it, so I didn't have any interest in switching from normal Emacs bindings to emulated vim ones (and I would expect the combination to be kind of a mess for reasons beyond the scope of this entry). However, since more or less switching from exmh to MH-E in GNU Emacs for my GUI mail reading, I've wound up in a situation where I write most of my replies to email in GNU Emacs but compose most new messages using command line NMH and vim. This is a little bit of whiplash as I go back and forth, so I thought it might be nice to use evil to give the MH-E 'reply to mail' experience the same vim keybindings that I use when writing new email in vim.
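Wiring this up is short. Here's a hypothetical init fragment that turns evil on only in MH-E's draft buffers, leaving ordinary Emacs bindings everywhere else (this assumes the evil package is installed and that mh-letter-mode is the major mode your drafts use):

```elisp
(require 'evil)
;; Use vim bindings only when composing mail in MH-E.
(add-hook 'mh-letter-mode-hook #'turn-on-evil-mode)
```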

What trying this taught me is that my Emacs reflexes are sharp and deeply sunk into me. It didn't matter to my reflexes that I was writing NMH email and up until now I've spent a lot of time doing that in vim and none in Emacs (well, not for a very long time). My reflexes 'knew' that this was Emacs and thus (for example) that I typed ESC < to get to the top of the file, not 1G. If I thought about it I could write email messages somewhat reflexively using vim keybindings and vim reflexes for things like reflowing paragraphs, but every so often I would slam into a wall when my Emacs reflexes triggered.

(It probably didn't help that actually sending the email required dropping back to Emacs reflexes for keybindings like C-c C-c, and also that flyspell would periodically pop up to mark words with squiggles that are never there in my vim sessions.)

I'm keeping evil installed, partly because I admire its sheer achievement and partly because someday I may find a situation where the vim keybindings are clearly more efficient for what I'm doing. It doesn't really hurt to have options, even if I don't expect to use it very often.

PS: If you didn't know Emacs but wanted to use it for an application like Magit, I imagine that using evil could be quite appealing. You'd get all of Magit's power for things like selective commits but you'd get to write commit messages and so on using your familiar vim reflexes, with only a few odd keybindings to invoke Magit or commit your commit messages.

EmacsEvilBriefImpressions written at 23:29:46


What client host keys OpenSSH ssh uses for host based authentication

One of the authentication options of OpenSSH, if it's enabled on both the server and the client, is host based authentication using the client's SSH host keys. On the client, this is controlled by EnableSSHKeysign and HostbasedAuthentication; on the server, by HostbasedAuthentication and perhaps IgnoreRhosts. Suppose, not hypothetically, that you use this along with some personal SSH keys, and some day you try to connect to some new system and get rudely disconnected before you get prompted for a password. The direct answer to what's happening is that you've run into the server's limit on how many different authentication options it will let you try (this can also come up if you have a lot of personal keys), set by the server's MaxAuthTries. Further, when you run 'ssh -v' to see where all the identities are coming from, a number of them are client SSH host keys, including a type of host key that you don't even use (you've explicitly set HostKey in your sshd_config to not include them).

Since you're running into limits on the number of identities you can offer, and some of them are from the host keys, it would be nice to control what client host keys you offer to the server. Or at least to understand where this is set and why, for example, 'ssh -v' says that you're offering an 'ecdsa-sha2-nistp256' host key when you don't even use ECDSA. Unfortunately, today all of this is underdocumented and it's somewhat hard to control, although not impossible.

Based on reading the source code, the decisions about what host keys to try to sign and where to find them are split between ssh-keysign and ssh itself, although in a confusing way. Both ssh and ssh-keysign look for the default, traditional names of the host keys (such as /etc/ssh/ssh_host_rsa_key) and determine what host keys to offer based on a combination of what files are present and what host key algorithms ssh likes. Ssh-keysign is ultimately responsible for reading the keys, so what it looks for and finds limits what ssh can ask for.

(Both ssh and ssh-keysign hard code these key paths, although the ssh-keysign source code has a comment that it should use your sshd_config to determine the key paths.)

There are two ways to control what host keys are ultimately used, the brute force way and the nominally correct way. The brute force way is to entirely remove (or rename) the host key files in /etc/ssh for the host key types you don't use. For example, if you don't use ECDSA keys and set HostKey in your sshd_config to exclude them, remove /etc/ssh/ssh_host_ecdsa_key* (which was probably created automatically by, for example, the Ubuntu Linux installer).

The nominally correct way is to change what hostbased authentication algorithms ssh will try to use (well, ask ssh-keysign to use), which is controlled by the HostbasedAcceptedAlgorithms directive from ssh_config (in older OpenSSH versions, this is called HostbasedKeyTypes instead). You can set this to a restricted list, or you can use the OpenSSH '-' syntax to remove some algorithms, for example:

HostbasedAcceptedAlgorithms -ecdsa-sha2-nistp256

The host keys that your ssh will offer to the server for hostbased authentication are those keys that are both available under their standard names in /etc/ssh and allowed by HostbasedAcceptedAlgorithms. Note that some key types can be used by more than one algorithm; working out what algorithms you need to disallow is left as an exercise.
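For example, an RSA host key can be offered under several signature algorithm names, so making your ssh stop offering it entirely takes more than one entry. A sketch (the exact algorithm names and wildcard support depend on your OpenSSH version):

```
# remove every RSA-based algorithm from the default list
HostbasedAcceptedAlgorithms -ssh-rsa,rsa-sha2-*
```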

(As you'd expect, this doesn't affect what key algorithms can be used for personal keys. If you want to use personal ECDSA keypairs, you still can.)

Because ssh and ssh-keysign don't currently pay attention to the ssh server configuration, you can play some tricks here that are probably unwise by moving or renaming some or even all SSH host keys that the server uses, specifying their new locations with HostKey sshd_config directives. If you move the server's RSA key but not its ED25519 key, the server will identify itself with both but ssh on the server will only try to use the ED25519 key to authenticate to other servers. More extreme tricks are at least theoretically possible but I'm not going to describe them here.

OpenSSHWhatKeysForKeysigning written at 22:40:12


Exim's options for how to DKIM sign various email headers

Recently, I became aware that Exim has a relatively aggressive list of message headers to sign with DKIM, and as a result we somewhat reduced the list of headers to sign in our environment. As it happens, Exim has several options for how it signs headers, which are briefly covered in the documentation for the dkim_sign_headers setting in (DKIM) Signing outgoing messages. The effects of these options aren't really clear until we understand the various meanings of DKIM signing message headers.

When you list a header name in the dkim_sign_headers setting, you can list it just by name or with either a '+' or an '=' character in front of it. Listing a header with no prefix has the same meaning as listing that header in the 'h=' list of the DKIM-Signature header; each time you list it, it signs one instance of the header in the message. If the header isn't there (or you've listed it more times than there are copies in the message), you're oversigning, which prevents adding an extra copy. So if dkim_sign_headers includes 'from' once, you're signing the first instance of From: in the message.

A prefix character of '=' means to sign as many copies of the header as are present in the message. If you specify '=from' and the message has two From: headers, you're signing against both. I believe it also means that if a particular header isn't present, you won't include it at all in the DKIM-Signature headers, so that it can be added later by another party without breaking the DKIM signature. If you want to sign things like the List-* family or the Resent-* family of headers if they're present but allow them to be added later, '=' would be one way.

(Since dkim_sign_headers is subject to Exim string expansion, another way is to only conditionally include headers based on whether or not the message has them.)

Finally, a prefix character of '+' means to oversign the header; Exim will sign as many copies of the header as are present plus one extra, so that no new additional copies can be added later (without breaking the DKIM signature). In the common case this will generate two mentions of the header in DKIM-Signature:, because the header will only occur once in the message being signed. If you use '+' on headers that aren't present in the message, one mention of them will appear in DKIM-Signature, signing that they're not present.
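Putting the three forms together, here's a hypothetical dkim_sign_headers value (Exim separates the list items with ':', which is why the header names are spaced out this way):

```
# oversign From: and Subject:, sign all present List-Id: headers,
# and sign the first Date: (or its absence)
dkim_sign_headers = +from : +subject : =list-id : date
```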

Exim has no prefix character option to sign only the first instance of a header if it's present at all, although you can do that with conditional inclusion of a plain header name. In my view this wouldn't be a useful option; if multiple copies of a particular header are already present, you should either sign them all, which is covered by '=<header>', or reject the message as too suspicious (or at least not DKIM sign it, or perhaps have Exim remove the additional instances of the header).

Exim also has no prefix character option to oversign existing headers but not sign non-present headers at all (although you can do this with conditional inclusion of a '+' prefixed header name). This would let you seal List-* or Resent-* headers if they were present, but allow them to be added by someone else later if they weren't. I think this is a sensible thing to do with headers and DKIM signatures, but I could be wrong.

You can get the default value of dkim_sign_headers with 'exim4 -bP macro _DKIM_SIGN_HEADERS'; this (currently) uses the plain header name format and lists each header once, meaning that Exim by default signs the first instance of headers that are present and the absence of headers that aren't present. There's also a _DKIM_OVERSIGN_HEADERS variant that puts a '+' in front of every header name, which signs all present instances of each header name and makes it so that no new ones can be added.

As a general thing, if you're presented with a message to sign that has multiple From:, Subject:, or whatever headers, I don't know if it's better to sign all of the instances of these headers, which might look suspicious to receivers, or sign only the first instance and let receivers assume the extra instances were added later, outside of your control. Possibly the best answer is to have Exim remove the extra instances.

EximDKIMHeaderSignatureOptions written at 22:29:53


Our varying levels of what you could charitably call 'physical security'

As I mentioned way back when I discussed how rogue wireless access points are a bigger risk at universities, one of the unusual things about universities is that we usually don't have anywhere near as much physical security as, say, a typical company does. This is because in practice most university buildings are open to the public, where anyone can walk in the front door (or any of the generally many side doors) and wander through most or all of the halls. This is especially so for the University of Toronto's main campus, which is embedded in the middle of downtown Toronto with Toronto streets running right through it. This doesn't mean we have no physical access control at all; instead, in practice we have a sliding scale of physical security and thus how exposed our networks are.

General purpose hallways and corridors have functionally no access control. Any networks that are available there, either through wireless signals or through stray network jacks, are fully accessible to potential attackers and have to be assumed to be untrusted. You might think that no one would ever put network jacks out in a hallway, but these days a surprising number of things like display screens need a network connection, and often it's desirable to have them out where the public can see them.

Some spaces are behind doors but the doors are normally left open (or at least unlocked), generally with some administrative staff person there to notice and help people who walk in. Anyone with a reasonably good story could probably get some quiet access to network ports exposed in these areas, and of course you could get at localized wireless networks just by having a device in your back pocket as you innocently ask questions and then thank the nice staff person for their help. We also have various meeting rooms, break rooms, and lounges; when not in use these are sometimes closed and sometimes left open, depending on random factors.

Rooms used for graduate student desks are normally behind closed doors (assuming the doors haven't been left open), but some of them are large and house a varied population of graduate students. In practice you could probably walk in or talk your way in, although you'd run some risk of people eyeing you dubiously. Some groups have small areas with only a few people, where everyone definitely knows each other and new, strange people will probably get at least some questioning; an attacker would need at least social engineering, rather than merely walking in somehow. Some areas are hybrids; we have at least one where a door lets you into a corridor of mixed space, with a meeting room, a break room, and assorted small, generally open-door offices for graduate students from various groups, each office holding a number of people.

(So as an unsurprising broad generalization, the smaller the area involved and the fewer people who work in it, the more physically secure it will probably be in practice.)

Various of our networks run through all of these sorts of spaces, to greater or lesser degrees, and all of this affects what internal network authentication we need. If an internal network is only available in a small area that has good access control as a result of that, it can be relatively open; if we need to track down a responsible person for some device, it's probably not going to be hard. On the other hand, if an internal network is broadly available through all sorts of our space across multiple buildings, including in large and relatively uncontrolled rooms, then as a practical matter we'd better be able to track each device on it back to a person from our own data. Otherwise, at the very least we're doing a lot of hunting simply to find where the thing is.

(This comes up every so often when unfortunate network connection related mistakes are made.)

The ultimate version of 'broad and open access' is our wireless network, since it extends out into the hallways where anyone can be. You have to know the wireless password, but given that we have a large number of people using it, we assume that the password has leaked long ago and can be found if an attacker looks hard enough.

PS: Some of these physical security oddities in our environment are because different professors and groups have different opinions on how open or closed off they want to be. It's much easier for people outside your group to come by and interact with you if your group space has open doors than if they have to knock or wave to get someone's attention. Some groups have historically wanted a very 'open door' policy because they want to cross-connect, and there are professors who absolutely don't want to be stuck behind closed doors. This has unquestionably influenced the general layout of our space and things like how many general use corridors run through it.

OurVaryingPhysicalSecurityLevels written at 23:39:51


Thinking about the sensible limits of customization of things

Recently, for reasons beyond the scope of this entry, I've been mostly handling my email in GNU Emacs, with MH-E. GNU Emacs is famously flexible and customizable, mostly through the somewhat challenging method of 'merely' writing the relevant (Emacs) Lisp code to do what you want. I'm capable of writing Emacs Lisp, so armed with a hammer and using a new mail client, I have been finding plenty of things to use that hammer on (sometimes with hackery and Emacs crimes). At this point, I've accumulated something like 1500 lines worth of customization (although that includes a lot of comments), and I can think of more things that I might do.

(Some of this customization is because MH-E doesn't quite do what I want it to do, some is because I want a more convenient way to do what it already can do inconveniently, and some is because I'm trying to recreate features I'm used to in exmh.)
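To give a flavour of the kind of small convenience customization involved, here is a hypothetical sketch in Emacs Lisp: a command that refiles the current message straight into a fixed folder, skipping MH-E's usual folder prompt. The function name, the keybinding, and the '+archive' folder are all made up for illustration; this is not my actual configuration.

```elisp
;; Hypothetical convenience hack: refile the current message (or selected
;; range) directly into +archive instead of being prompted for a folder.
(defun my/mh-refile-to-archive ()
  "Refile the current message into +archive without prompting."
  (interactive)
  (mh-refile-msg (mh-interactive-range "Refile") "+archive"))

;; Bind it in MH-E's folder mode once that mode has been loaded.
(with-eval-after-load 'mh-folder
  (define-key mh-folder-mode-map (kbd "C-c C-a") #'my/mh-refile-to-archive))
```

Each such hack is only a dozen lines, but multiply by enough of them and you get the 1500-odd lines mentioned above, all of which have to be remembered and maintained.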

Several years ago I read Tobias Bernard's Doing Things That Scale. To summarize the article badly, Tobias Bernard talked about how they had moved away from customizing things (and then having to maintain the customizations) in favour of making generally useful changes to upstreams. This argument gave me tangled feelings that I've yet to sort out enough to write an entry here about, but part of it is certainly on point. Every customization I make to my Emacs and MH-E setup is effort to write, effort to remember, and effort to maintain. Do I actually need an ultra-hyper-customized MH-E environment? Am I going to even use all of these hacks I've done? Both clothes and mail clients can be comfortable and useful without being form-fitting.

Of course, this is a personal version of something that I run into all of the time professionally, as a system administrator. There are a lot of custom scripts, custom Prometheus metrics exporters, custom web things, custom mail system features, and so on that we could write and put into production, but all of them have both a cost and a benefit, and sometimes the costs are not worth the benefits. Just because we can do something doesn't mean that we should. At the same time, sometimes we should, even if the customization is large and thus the costs are significant.

(For example, we have a bespoke ZFS spares system, which was a chunk of work to write and tune, but on the other hand has been rock solid and lets us basically not worry about disk problems. And our very custom simple mailing list system is quite appreciated by our users, although the Exim configuration involved is surprisingly complex.)

I don't have any particular answers, either for my sysadmin work or for my GNU Emacs hacking. But I do want to think about the question at least a bit, even (especially) for my own GNU Emacs coding. Just because something is an interesting Emacs Lisp problem doesn't mean that I should solve it, especially if I'm currently immersed in solving Emacs Lisp problems. And just because I can tweak and customize something doesn't mean that I should.

(Solving Emacs Lisp problems is a little bit addictive for me, much like any programming. Sometimes I have to pull myself away from the urge to do a bit more and then a bit more and so on. Hopefully this urge will pass.)

CustomizationSensibleLimits written at 22:11:36
