Wandering Thoughts

2017-10-17

My current grumpy view on key generation for hardware crypto keys

I tweeted:

My lesson learned from the Infineon HSM issue is to never trust a HSM to generate keys, just to store them. Generate keys on a real machine.

In my usual manner, this is perhaps overstated for Twitter. So let's elaborate on it a bit, starting with the background.

When I first heard about the Infineon TPM key generation issue (see also the technical blog article), I wasn't very concerned, since we don't have sophisticated crypto smartcards or electronic ID cards or the like. Then I found out that some Yubikeys are affected and got grumpy. When I set up SSH keys on my Yubikey 4, I had the Yubikey itself generate the RSA key involved. After all, why not? That way the key was never exposed on my Linux machine, even if the practical risks were very low. Unfortunately, this Infineon issue now shows the problem with that approach.

In theory, a hardware key like the Yubikey is a highly secure physical object that just works. In practice, such keys are little chunks of inexpensive hardware that run some software, and there's nothing magical about that software; like all software, it's subject to bugs and oversights. This means that in practice, there is a tradeoff about where you generate your keys. If you generate them inside the HSM instead of on your machine, you don't have to worry about your machine being compromised or the quality of your software, but you do have to worry about the quality of the HSM's software (and related to that, the quality of the random numbers that the HSM can generate).

(Another way to put this is that a HSM is just a little computer that you can't get at, running its own collection of software on some hardware that's often pretty tiny and limited.)

As a practical matter, the software I'd use for key generation on my Linux machine is far more scrutinized (especially these days) and thus almost certainly much more trustworthy than the opaque proprietary software inside a HSM. The same is true for /dev/urandom on a physical Linux machine such as a desktop or a laptop. It's possible that a HSM could do a better job on both fronts, but it's extremely likely that my Linux machine is good enough on both. That leaves machine compromise, which is a very low probability issue for most people. And if you're worried about even that, there are mitigation strategies for the cautious, as sketched below: disconnect from the network, turn off swap, generate the keys into a tmpfs, and reboot the machine afterward.
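
For illustration, here's roughly what that cautious dance might look like on a Linux machine. This is only a sketch of the idea, not a tested recipe; the specific commands (nmcli, the tmpfs mount point, the key type) are examples to adapt to your own environment:

# cautious local key generation, sketched; adapt to your setup
nmcli networking off                   # or just unplug the cable
swapoff -a                             # keep key material out of swap
mkdir -p /tmp/keygen
mount -t tmpfs -o size=64m,mode=0700 tmpfs /tmp/keygen
cd /tmp/keygen
ssh-keygen -t rsa -b 4096 -f ./id_rsa  # or gpg --gen-key for OpenPGP keys
# ... load the new key into the Yubikey or HSM here ...
cd / && umount /tmp/keygen
reboot                                 # discard anything lingering in RAM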

Once upon a time (only a year ago), I thought that the balance of risks made it perfectly okay to generate RSA keys in the Yubikey HSM. It turns out that I was wrong in practice, and now I believe that I was wrong in general for me and most people. I now feel that the balance of risks strongly favours trusting the HSM as little as possible, which means only trusting it to hold keys securely and perhaps to limit their use to when the HSM is unlocked or the key usage is approved.

(This is actually giving past me too much credit. Past me didn't even think about the risk that the Yubikey software could have bugs; past me just assumed that of course it didn't and therefore was axiomatically better than generating keys on the local machine and moving them into the HSM. After all, who would sell a HSM that didn't have very carefully audited and checked software? I really should have known better, because the answer is 'nearly everyone'.)

PS: If you have a compliance mandate that keys can never be created on a general-purpose machine in any situation where they might make it to the outside world, you have two solutions (at least). One of them involves hope and then perhaps strong failure, as here with Infineon, and one of them involves a bunch of work, some persuasion, and perhaps physically destroying some hardware afterward if you're really cautious.

KeyGenerationAndHSMs written at 00:17:55

2017-10-14

A surprise about which of our machines has the highest disk write volume

Once upon a time, hard drives and SSDs just had time-based warranties. These days, many SSDs have warranties that work more like a car's: they're good for so much time or so many terabytes written, whichever comes first, and different SSD makers and models can have decidedly different maximum figures for this. So, as part of investigating what SSDs to get for future usage here, we've been looking into what sort of write volume we see on both our ZFS fileservers (well, on the iSCSI backends for them) and on the system SSDs of those of our Ubuntu servers that have them. The result was a bit surprising.

Before I started looking into this, I probably would have guessed that the highest write volume would be on the SSDs of the ZFS pool that holds our /var/mail filesystem. I might also have guessed that some of the oldest disks for ZFS pools on our most active fileserver would be pretty active. While both of these are up in the write volume rankings, neither has our highest write volume.

Our highest write volume turns out to happen on the system SSDs in our central mail machine; they see about 32 TB of writes a year, compared to about 23 TB of writes a year on the busiest iSCSI backend disks on our most active fileserver. The oldest and most active SSDs involved in the mail spool have seen only about 10 TB of writes a year, which is actually below many of our more active ZFS pool disks (on several fileservers). The central mail machine's IO activity is also heavily unbalanced in favour of writes; with some hand-waving about the numbers, the machine runs about 80% writes (by the amount of data involved) or more. The disks in the ZFS pools show much lower write to read ratios; an extreme case is the mail spool's disks, which see only 12% writes by IO volume.

My current theory is that this huge write volume is because Exim does a lot of small writes to things like message files and log files and then fsync()'s them out to disk all the time. Exim uses three files for each message and updates two of them frequently as message deliveries happen; updates almost certainly involve fsync(), and then on top of that the filesystem is busy making all the necessary file creations, renames, and deletions be durable. We're using ext4, but even there the journal has to be forced to disk at every step.
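
I haven't verified the theory this way, but strace can count the syncs if you want to check something like it. A sketch, assuming an Ubuntu-style 'exim4' daemon name (interrupt it after a while to get the summary; -f follows the delivery children, where much of the activity should be):

strace -c -f -e trace=fsync,fdatasync -p "$(pgrep -o -x exim4)"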

(This certainly seems to be something involving Exim, as our external mail gateway has the same highly unbalanced ratio of writes to reads. The gateway only does roughly 4 TB of writes a year, but that's still quite high for our Ubuntu system SSDs.)

PS: All of these figures for SSDs are before any internal write amplification that the SSD itself does. My understanding is that SSD warranty figures are quoted before write amplification, as the user-written write volume.
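
If you're wondering how to get this sort of figure for your own drives, one rough approach (not necessarily how we did it) is to read the drive's SMART attributes; many SATA SSDs expose a 'Total_LBAs_Written' counter, although the attribute's name and units vary by vendor. A sketch:

# rough sketch: turn Total_LBAs_Written into TB, assuming 512-byte units
# (check your vendor's documentation; some drives count in larger units)
smartctl -A /dev/sda | awk '/Total_LBAs_Written/ { printf "%.1f TB written\n", $NF * 512 / 1e12 }'
# 'iostat -dx 60' or /proc/diskstats will show the ongoing read/write balance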

MTAHighWriteVolume written at 03:17:09

2017-09-20

Wireless is now critical (network) infrastructure

When I moved over to here a decade or so ago, we (the department) had a wireless network that was more or less glued together out of spare parts. One reason for this, beyond simply money, is that wireless networking was seen as a nice extra for us to offer to our users and thus not something we could justify spending a lot on. If we had to prioritize (and we did), wired networking was much higher up the heap than wireless. Wired networking was essential; the wireless was merely nice to have and offer.

I knew that wireless usage had grown and grown since then, of course; anyone who vaguely pays attention knows that, and the campus wireless people periodically share eye-opening statistics on how many active devices there are. You see tablets and smartphones all over (and I even have one of my own these days, giving me a direct exposure), and people certainly like using their laptops with wifi (even in odd places, although our machine room no longer has wireless access). But I hadn't really thought about the modern state of wireless until I got a Dell XPS 13 laptop recently and then the campus wireless networking infrastructure had some issues.

You see, the Dell XPS 13 has no onboard Ethernet, and it's not at all alone in that; most modern ultrabooks don't, for example. Tablets are obviously Ethernet-free, and any number of people around here use one as a major part of their working environment. People are even actively working through their phones. If the wireless network stops working, all of these people are up a creek and their work grinds to a halt. All of this has quietly turned wireless networking into relatively critical infrastructure. Fortunately our departmental wireless network is in much better shape now than it used to be, partly because we outsourced almost all of it to the university IT people who run the campus wireless network.

(We got USB Ethernet dongles for our recent laptops, but we're sysadmins with unusual needs, including plugging into random networks in order to diagnose problems. Not everyone with a modern laptop is going to bother, and not everyone who gets a dongle is going to carry it around or remember where they put it and so on.)

This isn't a novel observation, but it's something that has snuck up on me and that before now was only a kind of intellectual awareness. It wasn't really visceral until I took the XPS 13 out of the box and got to see the absence of an Ethernet port in person.

(The USB Ethernet dongle works perfectly well but it doesn't feel the same, partly because it's not a permanently attached part of the machine that is always there, the way the onboard wifi is.)

WirelessCriticalInfrastructure written at 01:22:01

2017-09-15

Ignoring the domain when authenticating your Dovecot users

In this recent entry, I wrote about how some of our users periodically try to authenticate to our Dovecot IMAP server with the user name '<user>@<our domain>' instead of simply '<user>', and said that we were probably going to switch our Dovecot configuration to ignore the domain name. We've now done that, so here is an early experience report.

Dovecot is somewhat underdocumented, at least online and in manual pages. Your best source of information on what specific configuration settings do appears to be the various pieces of the example configuration in the source code, which have comments. Quite possibly your OS's packaging of Dovecot reuses these as the standard configuration files, so you can just read the documentation comments there. The comments in the authentication configuration file explain things this way:

# Username formatting before it's looked up from
# databases. You can use the standard variables here,
# eg. %Lu would lowercase the username, %n would drop away
# the domain if it was given, or "%n-AT-%d" would change
# the '@' into "-AT-". This translation is done after
# auth_username_translation changes.
#auth_username_format = %Lu

If you're willing to ignore all domains, so that '<user>@<random garble>' is treated as '<user>', then you can simply set this to:

auth_username_format = %Ln

This is what we did and it works. In current versions of Dovecot this changes Dovecot's view of the username for everything, not just for authentication. And by this I mean that we have some Dovecot settings that use '%u', and after this auth_username_format change they see the username as '<user>', not '<user>@<domain>'. It also changes what Dovecot shows as the username in log messages, stripping out any domain that was originally there. For our purposes this is what we want with only the minor downside of the log message change.

(For a concrete example, we have set mail_location to something that specifies '...:INDEX=/var/local/dovecot/%u' in order to keep Dovecot indexes on the IMAP server instead of on NFS. If you log in as '<user>@<domain>', your index files continue to use just '/var/local/dovecot/<user>'.)

Based on what I've read, the Dovecot people are aware of this but don't consider it a bug as such, although they've considered changing it someday. Personally I hope that they don't, or that if they do, they provide a username_format setting to do this global username change.

If you want to strip only a single domain but leave other domains untouched, so that '<user>@<your domain>' becomes '<user>' but '<user>@<random thing>' stays unchanged, I think that you can do it with a conditional variable expansion. The Dovecot documentation says that conditional expansions can be nested, so you could do this for multiple domains if you were sufficiently determined.
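
I haven't tested this, but going by the variables documentation it might look something like the following. It needs a Dovecot new enough to support '%{if;...}' expansions, and 'ourdomain.com' is just a stand-in:

# untested sketch: strip the domain only when it is exactly ours
auth_username_format = %{if;%d;eq;ourdomain.com;%Ln;%Lu}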

I can see points for either side of being selective here. On the one hand, being selective doesn't help your own users as far as I know, because I believe that regardless of whether they use the wrong domain or the wrong password (or the wrong login), the only error they'll ever get from Dovecot is 'authentication not accepted' (aka 'bad password'). On the other hand, not altering completely wrong domain names means that they will appear intact in Dovecot's logs, so that you can spot people who are trying to use them. If the use of certain domains is a sign of attackers, preserving them in the logs may be valuable.

(My experience from looking at our Dovecot logs was that attackers always tried to use the same domain name that our users did, which is not really surprising. Attackers tried them much more than users did, but that's not much help here.)

DovecotIgnoreDomainOnAuth written at 23:52:11

2017-09-11

Giving users what they want and expect, IMAP edition

Like many places, we have an IMAP server (currently we're using Dovecot). This IMAP server is part of our overall Unix authentication environment, so users log in to their IMAP mailboxes using their Unix login and password (well, their general login name and password, since those are also used for Samba access, authentication with several web servers, and so on). More specifically, users log in to our IMAP server using just their Unix login; they do not use '<user>@<our domain>', as is apparently common at many places.

Well, that's the theory. In practice we get a trickle of users who try to use '<user>@<our domain>' with our IMAP server and then plaintively report problems, because from Dovecot's perspective (or more exactly PAM's), there's no such login 'whoever@us'. We had another one recently, and when I saw the email my first reaction was to think that our support site needed to have a clear 'don't do it this way' note about '<user>@<our domain>', since it keeps coming up. Then I took a step back and reminded myself that users are right, and specifically that if users are doing something naturally and on their own, user education almost never works very well. If some number of users were going to keep putting in our domain on their IMAP login credentials, the best and most friendly thing to do would be to quietly accept it anyway, even if it's not technically correct.

(In a sense, refusing '<login>@<our domain>' is robot logic. We know (or can know) what was intended and it's completely unambiguous; it's just that our software doesn't recognize it.)

As far as I can see, Dovecot doesn't give you any straightforward way to strip a specific domain off the login name that users give you. However, it does appear to be able to strip all domain names off the login name, so you'll accept '<valid login>@<any random thing>', not just '<login>@<your domain>' (this is allegedly done by setting auth_username_format to, eg, %Ln). In our environment it's probably okay to be so broadly accepting, so I'm going to propose this to my co-workers and then perhaps we'll deal with the whole issue once and for all.

(We should still update our documentation to be explicit here, but once we accept everything, that documentation is just a backup. It's not crucial that people actually read it and notice that bit; as long as they're close, their IMAP access will work.)

IMAPAuthAcceptingDomain written at 01:42:50

2017-08-28

OpenSSH has an annoyingly misleading sshd error log message

Suppose that you are trying to get some form of SSH server authentication going that uses hostnames. This might be straightforward host-based authentication to enable root on a master machine to have access to subordinate machines, or it might be hostname-based Match statements to enable or disable some restrictions (as one other example where you can use hostnames). However, it's not working; when you try host-based authentication, for example, you get a log message on the target machine saying something like:

sshd[2540]: userauth_hostbased mismatch: client sends HOSTNAME, but we resolve 128.100.X.Y to 128.100.X.Y

This certainly looks like a straightforward problem where the target machine can't do reverse lookups to turn 128.100.X.Y into a hostname (or perhaps it can't do forward lookups to verify that hostname, although we might expect a slightly different error there). And in fact you can get this message logged when, for example, there is no DNS PTR record for 128.100.X.Y.

But there is another important case where you will get this message: it is also logged when OpenSSH sshd is not even trying to turn IPs into hostnames. To quote straight from sshd_config(5):

UseDNS
Specifies whether sshd(8) should look up the remote host name, and to check that the resolved host name for the remote IP address maps back to the very same IP address.

If this option is set to no (the default) then only addresses and not host names may be used in ~/.ssh/authorized_keys from and sshd_config Match Host directives.

The name UseDNS is somewhat misleading because, as the manual page correctly describes, having this off turns off all forms of looking up the remote host name, not just DNS. For example, if you have backup entries in /etc/hosts, they're not checked either.

There's a bit of history that may catch you out here, which is that the OpenSSH default for UseDNS changed in the relatively recent past. According to the OpenSSH release notes, UseDNS defaulted to yes before OpenSSH 6.8; 6.8 changed the default to no when it was released in early 2015. March 2015 might seem like a long time ago, but Ubuntu 14.04, CentOS 7 (and CentOS 6), and Debian Jessie (the current 'oldstable') are all old enough that they have a pre-6.8 version of OpenSSH. This means that if, say, you moved from Ubuntu 14.04 to Ubuntu 16.04 and you use the stock sshd_config on both systems, you've had UseDNS's value change on you such that your 16.04 systems won't accept hostname-based authentication when your 14.04 systems do.

(FreeBSD 9.3 also has a pre 6.8 OpenSSH, in case you're running a machine that old.)
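
So if you rely on hostname-based authentication or matching, you probably want to set UseDNS explicitly rather than depending on the default. A minimal sketch of the sort of configuration involved, with placeholder hostnames:

# /etc/ssh/sshd_config
UseDNS yes
# host-based authentication also needs shosts.equiv, known host keys, and
# client-side settings, none of which are shown here
HostbasedAuthentication yes

# hostname-based Match blocks only work when UseDNS is on
Match Host *.cs.example.edu
    PasswordAuthentication no

The same applies to 'from="master.cs.example.edu"' restrictions in ~/.ssh/authorized_keys; with UseDNS off, only address patterns will match there.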

I can guess how this error message came about; I suspect that when UseDNS is off, the code simply skips trying to resolve the IP and basically acts as if the IP to name resolution failed. However the net effect of not differentiating the two cases is that sshd emits a misleading error message that can lead you on a significant wild goose chase as you try to figure out why OpenSSH is failing to turn your IP addresses into names. Life would probably be a lot simpler if OpenSSH logged a separate message to the effect of 'ignoring client-sent hostname and using only IP address X because UseDNS is off'.

(There's some question about when sshd should log this message. The ideal case would be that it would get logged as an error if UseDNS was off and sshd detected that you were trying to use hostname-based authentication or matching operators. This is definitely a real error that's worth reporting, because sshd knows that you're trying to do something that can never succeed.)

OpenSSHUseDNSErrorAnnoyance written at 22:49:44

2017-08-08

We care more about long term security updates than full long term support

We like running so-called 'LTS' (Long Term Support) releases of any OS that we use, and more broadly of any software that we care about, because using LTS releases allows us to keep running the same version for a fairly long time. This is generally due to pragmatics on two levels. First, testing and preparing a significant OS upgrade simply takes time and there's only so much time available. Second, upgrades generally represent some amount of increased risk over our existing environment. If our existing environment is working, why would we want to mess with that?

(Note that our general environment is somewhat unusual. There are plenty of places where you simply can't stick with kernels and software that is more than a bit old, for various reasons.)

But the general idea of 'LTS' is a big tent and it can cover many things (as I sort of talked about in an entry on what supporting a production OS means to me). As I've wound up mentioning in passing recently (eg here), the thing that we care about most is security updates. Sure, we'd like to get our bugs fixed too, but we consider this less crucial for at least two reasons.

First and most importantly, we can reasonably hope to not hit any important bugs once we've tested an OS release (or at least had it in production for an initial period), so if things run okay now they'll keep running decently into the future even if we do nothing to them. This is very much not true of security problems, for obvious reasons; to put it one way, attackers hit your security exposures for you and there's not necessarily much you can do to stop them short of updating. Running an OS without current security updates is getting close to being actively dangerous; running without the possibility of bug fixes is usually merely inconvenient at most.

(There can be data loss bugs that will shift the calculations here, but we can hope that they're far less common than security issues.)

Second, I have to admit that we're making a virtue of more or less necessity, because we generally can't actually get general updates and bug fixes in the first place. For one big and quite relevant example, Ubuntu appears to fix only unusually egregious bugs in their LTS releases. If you're affected by mere ordinary bugs and issues, you're stuck. This is one of the tradeoffs you get to make with Ubuntu LTS releases; you trade off a large package set for effectively only getting security updates (and it has been this way for a long time). More broadly, no LTS vendor promises to fix every bug that every user finds, only the sufficiently severe and widely experienced ones. So just because we run into a bug doesn't mean that it's going to get fixed; it may well not be significant enough to be worth the engineering effort and risk of an update on the vendor's part.

(There is also the issue that if we hit a high-impact bug, we can't wait for a fix to be developed upstream and slowly pushed down to us. If we have systems falling over, we need to solve our problems now, in whatever way that takes. Sometimes LTS support can come through with a prompt fix, but more often than not you're going to be waiting too long.)

LongtermSecurityVersusSupport written at 01:27:28

2017-08-06

Our decision to restrict what we use for developing internal tools

A few years ago, we (my small sysadmin group) hit a trigger point where we realized that we were writing internal sysadmin tools, including web things, in a steadily increasing collection of programming languages, packages, and environments. This was fine individually but created a collective problem, because in theory we want everyone to be able to at least start to support and troubleshoot everything we have running. The more languages and environments we use across all of our tools, the harder this gets. As things escalated and got more exotic, my co-workers objected quite strongly and, well, they're right.

The result of this was that we decided to standardize on using only a few languages and environments for our internally developed tools, web things, and so on. Our specific choices are not necessarily the most ideal choices and they're certainly a product of our environment, both in what people already knew and what we already had things written in. For instance, given that I've written a lot of tools in Python, it would have been relatively odd to exclude it from our list.

Since the whole goal of this is to make sure that co-workers don't need to learn tons of things to work on your code, we're de facto limiting not just the basic choice of language but also what additional packages, libraries, and so on you use with it. If I load my Python code down with extensive use of additional modules, web frameworks, and so on, it's not enough for my co-workers to just know Python; I've also forced them to learn all those packages. Similar things hold true for any language, including (and especially) shell scripts. Of course sometimes you absolutely need additional packages (eg), but when we don't absolutely need them, our goal is to stick to doing things with only core stuff even if the result is a bit longer and more tedious.

(It doesn't really matter if these additional modules are locally developed or come from the outside world. If anything, outside modules are likely to be better documented and better supported than ones I write myself. Sadly this means that the Python module I put together myself to do simple HTML stuff is now off the table for future CGI programs.)

I don't regret our overall decision and I think it was the right choice. I had already been asking my co-workers if they were happy with me using various things, eg Go, and I think that the tradeoffs we're making here are both sensible and necessary. To the extent that I regret anything, I mildly regret that I've not yet been able to talk my co-workers into adding Go to the mix.

(Go has become sort of a running joke among us, and I recently got to cheerfully tell one of my co-workers that I had lured him into using and even praising my call program for some network bandwidth testing.)

Note that this is, as mentioned, just for my small group of sysadmins, what we call Core in our support model. The department as a whole has all sorts of software and tools in all sorts of languages and environments, and as far as I know there has been no department-wide effort to standardize on a subset there. My perception is that part of this is that the department as a whole does not have the cross-support issue we do in Core. Certainly we're not called on to support other people's applications; that's not part of our sysadmin environment.

Sidebar: What we've picked

We may have recorded our specific decision somewhere, but if so I can't find it right now. So off the top of my head, we picked more or less:

  • Shell scripts for command line tools, simple 'display some information' CGIs, and the like, provided that they are not too complex.
  • Python for command line tools.
  • Python with standard library modules for moderately complicated CGIs.
  • Python with Django for complicated web apps such as our account request system.

  • Writing something in C is okay for things that can't be in an interpreted language, for instance because they have to be setuid.

We aren't actively rewriting existing things that go outside this, for various reasons. Well, at least if they don't need any maintenance, which they mostly don't.

(We have a few PHP things that I don't think anyone is all that interested in rewriting in Python plus Django.)

LimitingToolDevChoices written at 02:03:10

2017-07-29

The differences between how SFTP and scp work in OpenSSH

Although I normally only use scp, I'm periodically reminded that OpenSSH actually supports two file transfer mechanisms, because there's also SFTP. If you are someone like me, you may eventually wind up wondering if these two ways of transferring files with (Open)SSH fundamentally work in the same way, or if there is some real difference between them.

I will skip to the end: sort of yes and sort of no. As usually configured, scp and SFTP wind up working in the same way on the server side but they get there through different fundamental mechanisms in the SSH protocol and thus they take somewhat different paths in OpenSSH. What happens when you use scp is simpler to explain, so let's start there.

How scp works is the same as how rsync does. When you do 'scp file apps0:/tmp/', scp uses ssh to connect to the remote host and run a copy of scp with a special undocumented flag that means 'the other end of this conversation is another scp, talk to it with your protocol'. You can see this in ps output on your machine, where it will look something like this:

/usr/bin/ssh -x -oForwardAgent=no -oPermitLocalCommand=no -oClearAllForwardings=yes -- apps0 scp -t /tmp/

(This is how the traditional BSD rcp command works under the hood, and the HISTORY section of the scp manpage says that scp was originally based on rcp.)

By contrast, SFTP is implemented as what is called a SSH subsystem, which is a specific part of the SSH connection protocol. More specifically it is the "sftp" subsystem, for which there is actually a draft protocol specification (with OpenSSH extensions). Since the client explicitly asks the server for SFTP (instead of just saying 'please run this random Unix command'), the server knows what is going on and can implement support for its end of the SFTP protocol in whatever way it wants to.

As it happens, the normal OpenSSH server configuration implements the "sftp" subsystem by running sftp-server (this is configured in /etc/ssh/sshd_config). For various reasons it does so via your login shell, so if you peek at your server's process list while you're running a sftp session, it will look like this:

 cks   25346 sshd: cks@notty
  |    25347 sh -c /usr/lib/openssh/sftp-server
   |   25348 /usr/lib/openssh/sftp-server

On your local machine, the OpenSSH sftp command doesn't bother to have its own implementation of the SSH protocol and so on; instead it runs ssh in a special mode to invoke a remote subsystem instead of a remote command:

/usr/bin/ssh -oForwardX11 no -oForwardAgent no -oPermitLocalCommand no -oClearAllForwardings yes -oProtocol 2 -s -- apps0 sftp

However, this is not a universal property of SFTP client programs. A SFTP client program may embed its own implementation of SSH, and this implementation may support different key exchange methods, ciphers, and authentication methods than the user's regular full SSH does.

(We ran into a case recently where a user had a SFTP client that only supported weak Diffie-Hellman key exchange methods that modern versions of OpenSSH sshd don't normally support. The user's regular SSH client worked fine.)

So in the end, scp and SFTP both wind up running magic server programs under shells on the SSH server, and they both run ssh on the client. They give slightly different arguments to ssh and obviously run different programs (with different arguments) on the server. However, SFTP makes it more straightforward for the server to implement things differently because the client explicitly asks 'please talk this documented protocol with me'; unlike with scp, it is the server that decides to implement the protocol by running 'sh -c sftp-server'. OpenSSH sshd has an option to implement SFTP internally, and you could easily write an alternate SSH daemon that handled SFTP in a different way.
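
For concreteness, the usual wiring looks something like this in /etc/ssh/sshd_config (the sftp-server path varies by OS; this is the Debian and Ubuntu one), and switching to the in-process implementation is a one-line change:

# the common external helper, run via the user's shell as shown above
Subsystem sftp /usr/lib/openssh/sftp-server
# or have sshd handle SFTP itself, with no shell or helper process involved
#Subsystem sftp internal-sftp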

It's theoretically possible to handle scp in a different way in your SSH server, but you would have to recognize scp by knowing that a request to run the command 'scp -t <something>' was special. This is not unheard of; git operates over SSH by running internal git commands on the remote end (cf), and so if you want to provide remote git repositories without exposing full Unix accounts you're going to have to interpret requests for those commands and do special things. Github does this along with other magic, especially since everyone uses the same remote SSH login (that being git@github.com).

SSHHowScpAndSFTPWork written at 01:06:13

2017-07-28

Our (Unix) staff groups problem

Last week, I tweeted:

The sysadmin's lament: I swear, I thought this was a small rock when I started turning it over.

As you might suspect, there is a story here.

Our central Unix systems have a lot of what I called system continuity; the lineage of some elements of what we have today traces back more than 25 years (much like our machine room). One such element is our logins and groups, because if you're going to run Unix for a long time with system continuity, those are a core part of it.

(Actual UIDs and GIDs can get renumbered, although it takes work, but people know their login names. Forcing changes there is a big continuity break, and it usually has no point anyway if you're going to keep on running Unix.)

Most people on our current Unix systems have a fairly straightforward Unix group setup for various reasons. The big exception is what can broadly be called 'system staff', where we have steadily accumulated more and more groups over time. Our extended system continuity means that we have lost track of the (theoretical) purpose of many groups, which have often wound up with odd group membership as well. Will something break if we add or remove some people from a group that looks almost right for what we need now? We don't know, so we make a new group; it's faster and simpler than trying to sort things out. Or in short, we've wound up with expensive group names.

This was the apparently small rock that I was turning over last week. The exact sequence is beyond the scope of this entry, but it started with 'what group should we put this person in now', escalated to 'this is a mess, let's reform things and drop some of these obsolete groups', and then I found myself writing long worklog messages about peculiar groups I'd found lurking in our Apache access permissions, groups that actually seemed to duplicate other groups we'd created later.

(Of course we use Unix groups in Apache access permissions. And in Samba shares. And in CUPS print permissions. And our password distribution system. And probably other places I haven't found yet.)
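
If you're in a similar spot, a rough first pass at seeing where a group is still referenced might look like this (the group name and the paths are just examples):

# where does the group still show up, and who is actually in it?
g=oldstaff
getent group "$g"
grep -rl "$g" /etc/apache2 /etc/samba /etc/cups 2>/dev/null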

OurStaffGroupsProblem written at 02:49:44
