Wandering Thoughts

2017-10-21

Multi-Unix environments are less and less common now

For a long time, the Unix environments that I existed in had a lot of diversity. There was a diversity of versions of Unix and with them a diversity of architectures (and sometimes a single vendor had multiple architectures). This was most pronounced in a number of places here that used NFS heavily, where your $HOME could be shared between several different Unixes and architectures, but even with an unshared $HOME I did things like try to keep common dotfiles. And that era left its mark on Unix itself, for example in what is now the more or less standard split between /usr/share and /usr/lib and friends. Distinguishing between 'shared between architectures' and 'specific to a single architecture' only makes sense when you might have more than one in the same large-scale environment, and this is what /usr/share is about.

As you may have noticed, such Unix environments are increasingly uncommon now, for a number of reasons. For a start, the number of interesting computer architectures for Unix has shrunk dramatically; almost no one cares about anything other than 64-bit x86 now (although ARM is still waiting in the wings). This spills through to Unix versions, since generally all 64-bit x86 hardware will run your choice of Unix. The days when you might have bought a fire-breathing MIPS SMP server for compute work and got SGI Irix with it are long over.

(Buying either the cheapest Unix servers or the fastest affordable ones was one of the ways that multiple Unixes tended to show up around here, at least, because which Unix vendor was on top in either category tended to keep changing over the years.)

With no hardware to force you to pick some specific Unix, there's a strong motivation to standardize on one Unix that runs on all of your general-usage hardware, whatever that is. Even if you have a NFS-mounted $HOME, this means you only deal with one set of personal binaries and so on in a homogeneous environment. Different versions of the same Unix count as a 'big difference' these days.

Beyond that, the fact is that Unixes are pretty similar from a user perspective these days. There once was a day when Unixes were very different, which meant that you might need to do a lot of work to deal with those differences. These days most Unixes feel more or less the same once you have your $PATH set up, partly because in many cases they're using the same shells and other programs (Bash, for example, as a user shell). The exceptions tend to make people grumpy and often cause heartburn (and people avoid heartburn). The result may technically be a multi-Unix environment, but it doesn't feel like it and you might not really notice it.

(With all of this said, I'm sure that there are still multi-Unix environments out there, and some of them are probably still big. There's also the somewhat tricky issue of people who work with Macs as their developer machines and deploy to non-MacOS Unix servers. My impression as a distant bystander is that MacOS takes a fair amount of work to get set up with a productive and modern set of Unix tools, and you have to resort to some third party setup to do it; the result is inevitably a different feel than you get on a non-MacOS server.)

unix/MultiUnixEnvNowUncommon written at 01:20:40; Add Comment

2017-10-20

Ext4, SSDs, and RAID stripe parameters

I was recently reading Testing disks: Lessons from our odyssey selecting replacement SSDs (via). In this article, the BBC technical people talk earnestly about carefully picking stride and stripe width size values for ext4 on their SSDs and point to this blog post on it. Me being me, I immediately wondered what effects these RAID-related settings actually had in ext4, so I headed off for the kernel source code to take a look. The short spoiler is 'not as much as you think'.

First, setting both the stride and the stripe width is redundant as far as the kernel's ext4 block allocation goes; the kernel code only uses one of the two, preferring the stripe width if possible (see ext4_get_stripe_size in fs/ext4/super.c). Setting the stride as well does have a small effect on the layout of an ext4 filesystem; it appears to cause some metadata structures to be pushed up to start on a stride boundary when mke2fs creates the filesystem.
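
For reference, these are set through the stride= and stripe_width= extended options of mke2fs and tune2fs, in units of filesystem blocks; a quick sketch with a made-up device and made-up values:

# set both at filesystem creation time
mke2fs -t ext4 -E stride=32,stripe_width=128 /dev/sdc1
# see what an existing filesystem has recorded
tune2fs -l /dev/sdc1 | grep -i -E 'stride|stripe'
# or change them later
tune2fs -E stride=32,stripe_width=128 /dev/sdc1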

(In the kernel, the stripe width and stride are ignored if they're larger than the number of blocks per block group. According to Ext4 Disk Layout and various other sources, there are normally 32,768 filesystem blocks per block group, for a block group size of 128 MBytes, so this probably won't be an issue for you.)

As far as I can tell from trying to understand mballoc.c, the stripe size only has a few effects on block allocation. First, if your write is for an exact multiple of the stripe size, ext4 will generally try to align it to a stripe boundary if possible (assuming there's sufficient unfragmented free space). This is especially likely if you write exactly one stripe's worth of data.

The second use is more complicated (and I may not understand it correctly). For small files, Ext4 allocates space out of 'locality groups', which are given preallocated space in bulk that they can then parcel out (among other things, this keeps small files together on disk). When you have a stripe size set, the size of each locality group's preallocated space is rounded up to a multiple of the stripe size and I believe it's aligned with stripe boundaries. Individual allocations within a locality group's preallocated space don't seem to be aligned to the stripe size if they're not multiples of it.

Comments in the source code suggest that the goal in both cases is to avoid fragmenting stripes and fragmenting things across stripes. However, it's not clear to me that most allocations particularly avoid doing either; certainly they don't explicitly look at the relevant C variable that holds the stripe size.

Having gone through reading the ext4 kernel code, my overall conclusion is that you should benchmark things before you assume that setting the RAID stripe width and stride is doing anything meaningful on ext4 on a SSD. Also, for maximum benefit it seems very likely that you want your applications to do their large writes in multiples of whatever stripe width you set. Of course, writing data out in erase-block sized chunks seems like a good idea in general; regardless of alignment issues, it probably gives the SSD firmware its best chance to avoid read-modify-write cycles.

(When you test this, you may want to use blktrace to make sure that ext4 is actually issuing large right-sized writes out to the SSD and isn't doing something problematic like slicing them up into smaller chunks. Some block IO tuning may turn out to be necessary.)
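
One rough way to do that check is with blktrace plus blkparse; a sketch, run as root while your test workload is going, assuming the SSD is /dev/sdb:

# print the size (in 512-byte sectors) of every write actually issued ('D') to
# the drive; the awk field positions assume blkparse's default output format
blktrace -d /dev/sdb -o - | blkparse -i - | awk '$6 == "D" && $7 ~ /W/ { print $10 }'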

linux/Ext4AndRAIDStripes written at 01:03:17; Add Comment

2017-10-18

Using Shellcheck is good for me

A few months ago I wrote an entry about my views on Shellcheck where I said that I found it too noisy to be interesting or useful to me. Well, you know what, I have to take that back. What happened is that as I've been writing various shell scripts since then, I've increasingly found myself reaching for Shellcheck as a quick syntax and code check that I could use without trying to run my script. Shellcheck is a great tool for this, and as a bonus it can suggest some simplifications and improvements.

(Perhaps there are other programs that can do the same sort of checking that shellcheck does, but if so I don't think I've run across them yet. The closest I know of is shfmt.)

Yes, Shellcheck is what you could call nitpicky (it's a linter, not just a code checker, so part of its job is making style judgments). But going along with it doesn't hurt (I've yet to find a situation where a warning was actively wrong) and it's easier to spot real problems if 'shellcheck <script>' is otherwise completely silent. I can live with the cost of sprinkling a bunch of quotes over the use of shell variables, and the result is more technically correct even if it's unlikely to ever make a practical difference.
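
As a small illustration of what this looks like in practice, here's a made-up fragment; the unquoted $dir is the sort of thing that draws Shellcheck's SC2086 'double quote to prevent globbing and word splitting' warning:

#!/bin/sh
# deliberately sloppy: 'shellcheck frag.sh' will object to the unquoted $dir
dir=$1
rm -rf $dir/tmp
# the quoted version it wants, which also behaves if $1 ever contains spaces:
# rm -rf "$dir/tmp"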

In other words, using Shellcheck is good for me and my shell scripts even if it can be a bit annoying. Technically more correct is still 'more correct', and Shellcheck is right about the things it complains about regardless of what I think about it.

(With that said, I probably wouldn't bother using Shellcheck and fixing its complaints about unquoted shell variable usage if that was all it did. The key to its success here is that it adds value over and above its nit-picking; that extra value pushes me to use it, and using it pushes me to do the right thing by fixing my variable quoting to be completely correct.)

programming/ShellcheckGoodForMe written at 23:55:25; Add Comment

I still like Python and often reach for it by default

Various local events recently made me think a bit about the future of Python at work. We're in a situation where a number of our existing tools will likely get drastically revised or entirely thrown away and replaced, and that raises local issues with Python 3 as well as questions of whether I should argue for changing our list of standard languages. I have some technical views on the answer, but thinking through this has made me realize something on a more personal level. Namely, I still like Python and it's my go-to default language for a number of things.

I'm probably always going to be a little bit grumpy about the whole transition toward Python 3, but that in no way erases the good parts of Python. Despite the baggage around it, Python 3 has its own good side and I remain reasonably enthused about it. Writing modest little programs in Python has never been a burden; the hard parts are never from Python, they're from figuring out things like data representation, and that's the same challenge in any language. In the meantime, Python's various good attributes make it pretty plastic and easily molded as I'm shaping and re-shaping my code as I figure out more of how I want to do things.

(In other words, experimenting with my code is generally reasonably easy. When I may completely change how I approach a problem between my first draft and my second attempt, this is quite handy.)

Also, Python makes it very easy to do string-bashing and to combine it with basic Unix things. This describes a lot of what I do, which means that Python is a low-overhead way of writing something that is much like a shell script but that's more structured, better organized, and expresses its logic more clearly and directly (because it's not caught up in the Turing tarpit of Bourne shell).

(This sort of 'better shell script' need comes up surprisingly often.)

My tentative conclusion about what this means for me is that I should embrace Python 3; specifically, I should embrace it for new work. Despite potential qualms for some things, new programs that I write should be in Python 3 unless there's a strong reason they can't be (such as having to run on a platform with an inadequate or missing Python 3). The nominal end of life for Python 2 is not all that far off, and if I'm continuing with Python in general (and I am), then I should be carrying around as little Python 2 code as possible.

python/IStillLikePython written at 02:58:38; Add Comment

2017-10-17

My current grumpy view on key generation for hardware crypto keys

I tweeted:

My lesson learned from the Infineon HSM issue is to never trust a HSM to generate keys, just to store them. Generate keys on a real machine.

In my usual manner, this is perhaps overstated for Twitter. So let's elaborate on it a bit, starting with the background.

When I first heard about the Infineon TPM key generation issue (see also the technical blog article), I wasn't very concerned, since we don't have sophisticated crypto smartcards or electronic ID cards or the like. Then I found out that some Yubikeys are affected and got grumpy. When I set up SSH keys on my Yubikey 4, I had the Yubikey itself generate the RSA key involved. After all, why not? That way the key was never exposed on my Linux machine, even if the practical risks were very low. Unfortunately, this Infineon issue now shows the problem in that approach.

In theory, a hardware key like the Yubikey is a highly secure physical object that just works. In practice they are little chunks of inexpensive hardware that run some software, and there's nothing magical about that software; like all software, it's subject to bugs and oversights. This means that in practice, there is a tradeoff about where you generate your keys. If you generate them inside the HSM instead of on your machine, you don't have to worry about your machine being compromised or the quality of your software, but you do have to worry about the quality of the HSM's software (and related to that, the quality of the random numbers that the HSM can generate).

(Another way to put this is that a HSM is just a little computer that you can't get at, running its own collection of software on some hardware that's often pretty tiny and limited.)

As a practical matter, the software I'd use for key generation on my Linux machine is far more scrutinized (especially these days) and thus almost certainly much more trustworthy than the opaque proprietary software inside a HSM. The same is true for /dev/urandom on a physical Linux machine such as a desktop or a laptop. It's possible that a HSM could do a better job on both fronts, but it's extremely likely that my Linux machine is good enough on both. That leaves machine compromise, which is a very low probability issue for most people. And if you're a bit worried, there are also mitigation strategies for the cautious, starting with disconnecting from the network, turning off swap, generating keys into a tmpfs, and then rebooting your machine afterward.
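
A sketch of that cautious approach (the commands, mount point, and key size here are illustrations, not a recipe, and the import step depends entirely on your particular HSM's tooling):

# as root, with the network unplugged
swapoff -a                               # nothing can get written out to swap
mkdir -p /mnt/keygen
mount -t tmpfs -o size=64m,mode=0700 tmpfs /mnt/keygen
ssh-keygen -t rsa -b 4096 -f /mnt/keygen/id_rsa
# ... load /mnt/keygen/id_rsa into the HSM with its vendor tool ...
umount /mnt/keygen
reboot                                   # the key material only ever lived in RAM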

Once upon a time (only a year ago), I thought that the balance of risks made it perfectly okay to generate RSA keys in the Yubikey HSM. It turns out that I was wrong in practice, and now I believe that I was wrong in general for me and most people. I now feel that the balance of risks strongly favours trusting the HSM as little as possible, which means only trusting it to hold keys securely and perhaps to limit their use to when the HSM is unlocked or the key usage is approved.

(This is actually giving past me too much credit. Past me didn't even think about the risk that the Yubikey software could have bugs; past me just assumed that of course it didn't and therefore was axiomatically better than generating keys on the local machine and moving them into the HSM. After all, who would sell a HSM that didn't have very carefully audited and checked software? I really should have known better, because the answer is 'nearly everyone'.)

PS: If you have a compliance mandate that keys can never be created on a general-purpose machine in any situation where they might make it to the outside world, you have two solutions (at least). One of them involves hope and then perhaps strong failure, as here with Infineon, and one of them involves a bunch of work, some persuasion, and perhaps physically destroying some hardware afterward if you're really cautious.

sysadmin/KeyGenerationAndHSMs written at 00:17:55; Add Comment

2017-10-16

Getting ssh-agent working in Fedora 26's Cinnamon desktop environment

I tweeted:

I have just been through an extensive yak-shaving exercise to use ssh-agent with Cinnamon and have it actually work reliably on Fedora 26.

The first question you might ask is why even use ssh-agent instead of the default of gnome-keyring-daemon. That's straightforward; gnome-keyring-daemon still doesn't support ed25519 keys, despite a very long standing open bug about it (and another bug for ECDSA keys). I'm also not sure if current versions support Yubikey-based SSH keys, which I also care about, and apparently there are other issues with it.

(One charming detail from the GNOME ed25519 bug is that apparently there is no maintainer for either gnome-keyring-daemon as a whole or perhaps just the SSH keys portions of it. This situation doesn't inspire any great fondness in me for gnome-keyring-daemon, to put it one way.)

In Fedora 26, I ran into two problems with my previously-working ssh-agent environment. The first problem is that gnome-terminal doesn't inherit the correct $SSH_AUTH_SOCK setting, even if it's set in the general environment and is seen by other programs in my Cinnamon environment. The core problem seems to be that these days, all your gnome-terminal windows are actually created by a single master process, and in Fedora 26 this is started through a separate systemd user .service. I don't know how that service is supposed to inherit environment variables, but it doesn't get the correct $SSH_AUTH_SOCK; instead it always winds up with /run/user/${UID}/keyring/ssh, which is the gnome-keyring-daemon setting. My solution to this is pretty brute force; I added a little stanza to my session setup script that symlinked this path to the real $SSH_AUTH_SOCK.

(This implies that other systemd user .service units also probably have the wrong $SSH_AUTH_SOCK value, but they're all 'fixed' by my hack.)
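
The stanza amounts to something like this minimal version (it assumes $SSH_AUTH_SOCK already points at the real agent's socket by the time the session setup script runs):

# make the socket path that systemd user services wind up with point at the real agent
if [ -n "$SSH_AUTH_SOCK" ]; then
    ln -sf "$SSH_AUTH_SOCK" "/run/user/$(id -u)/keyring/ssh"
fi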

The larger issue is that a ssh-agent process was only started the first time I logged in after system reboot. If I logged out and then logged back in again, my session had a $SSH_AUTH_SOCK value set but no ssh-agent process. In fact, it had the first session's $SSH_AUTH_SOCK value, which pointed to a socket that no longer existed because it had been cleaned up on session exit. I'm not sure what causes this, but I have noticed that there are a whole collection of systemd user .service units under user@${UID}.service that linger around even after I've logged out of my session. It certainly appears that while these exist, new Cinnamon sessions inherit the old session's $SSH_AUTH_SOCK value. This inheritance is a problem because of a snippet in /etc/X11/xinit/xinitrc-common:

if [ -z "$SSH_AGENT" ] && [ -z "$SSH_AUTH_SOCK" ] && [ -z "$SSH_AGENT_PID" ] && [ -x /usr/bin/ssh-agent ]; then
    if [ "x$TMPDIR" != "x" ]; then
        SSH_AGENT="/usr/bin/ssh-agent /bin/env TMPDIR=$TMPDIR"
    else
        SSH_AGENT="/usr/bin/ssh-agent"
    fi
fi

This starts ssh-agent only if $SSH_AUTH_SOCK is unset. If it's set to a bad value, no new ssh-agent is started and your entire session inherits the bad value and nothing works. My workaround was to change xinitrc-common to clear $SSH_AUTH_SOCK and all associated environment variables if it was set but pointed to something that didn't exist:

if [ -n "$SSH_AUTH_SOCK" ] && [ ! -S "$SSH_AUTH_SOCK" ]; then
   unset SSH_AGENT
   unset SSH_AUTH_SOCK
   unset SSH_AGENT_PID
fi

This appears to make everything work.

After I had worked all of this out and set it up, Jordan Sissel shared a much simpler workaround:

I used a oneliner that would kill gnome-keyring and replace it with ssh-agent on the same $SSH_AUTH_SOCK :\ Super annoying, though.

If I was doing this I wouldn't kill gnome-keyring-daemon entirely; I would just make my session startup script run a ssh-agent on /run/user/${UID}/keyring/ssh (using ssh-agent's -a command line argument).

(It's likely that gnome-keyring-daemon does other magic things that my Cinnamon session cares about. I'd rather not find out what other bits break if it's not running, or have it restart on me and perhaps take over the SSH agent socket again.)
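
If I did go that way, the startup snippet would be something like this (untested on my part, and it assumes nothing else gets upset when the keyring socket path is pulled out from under gnome-keyring-daemon):

# take over gnome-keyring's SSH socket path with a real ssh-agent
SOCK="/run/user/$(id -u)/keyring/ssh"
rm -f "$SOCK"
eval "$(/usr/bin/ssh-agent -a "$SOCK")"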

PS: I'd file bug reports with Fedora except that I suspect they'd consider this an unsupported environment, and my track record with Fedora bug reports is not great in general. And filing bug reports with Fedora against gnome-keyring-daemon is pointless; if it's not getting action upstream, there's not much Fedora can do about it.

linux/Fedora26CinnamonSSHAgent written at 00:04:20; Add Comment

2017-10-15

Unbalanced reads from SSDs in software RAID mirrors in Linux

When I was looking at the write volume figures for yesterday's entry, one additional thing that jumped out at me is that on our central mail server, reads were very unbalanced between its two system SSDs. This machine, like many of our important servers, has a pair of SSDs set up as mirrors with Linux software RAID. In theory I'd expect reads to be about evenly distributed across each side of the mirror; in practice, well:

242 Total_LBAs_Read [...]  16838224623
242 Total_LBAs_Read [...]  1698394290

That's almost a factor of ten difference. Over 90% of the reads have gone to the first SSD, and it's not an anomaly or a one-time thing; I could watch live IO rates and see that much of the time only the first disk experienced any read traffic.
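
Both the lifetime numbers and the live view are easy to get; a sketch, assuming the mirror members are sda and sdb:

# lifetime reads as reported by each drive (SMART attribute 242, if the drive has it)
smartctl -A /dev/sda | grep Total_LBAs_Read
smartctl -A /dev/sdb | grep Total_LBAs_Read
# live per-device IO rates, refreshed every five seconds
iostat -x sda sdb 5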

It turns out that this is more or less expected behavior in Linux software RAID, especially on SSDs, and has been for a while. It appears that the core change for this was made to the software RAID code in 2012, and then an important related change was made in late 2016 (and may not be in long-term distribution kernels). The current state of RAID1 read balancing is kind of complex, but the important thing here in all kernels since 2012 is that if you have SSDs and at least one disk is idle, the first idle disk will be chosen. In general the read balancing code will use the (first) disk with the least pending IO, so the case of idle disks is just the limit case.

(In kernels with the late 2016 change, this widens so that if at least one disk is idle, the first idle disk will be chosen even when all of the mirrors are HDs.)

SSDs are very fast in general and they have no seek delays for non-sequential IO. The result is that under casual read loads, most of the time both SSDs in a mirror are idle and so the RAID1 read balancing code will always choose to read from the first SSD. Reads spill over to the second SSD only if the first SSD is already handling a read at the time that an unrelated second read comes in. As we can see here, that doesn't happen all that frequently.

(Although our central mail server is an outlier as far as how unbalanced it is, other servers with mirrored SSDs also have unbalanced reads with the first disk in the mirror seeing far more than the second disk.)

linux/UnbalancedSSDMirrorReads written at 02:39:17; Add Comment

2017-10-14

A surprise about which of our machines has the highest disk write volume

Once upon a time, hard drives and SSDs just had time-based warranties. These days, many SSDs have warranties that are more like cars; they're good for so much time or so many terabytes written, whichever comes first, and different SSD makers and models can have decidedly different maximum figures for this. So, as part of investigating what SSDs to get for future usage here, we've been looking into what sort of write volume we see on both our ZFS fileservers (well, on the iSCSI backends for them) and on the system SSDs of those of our Ubuntu servers that have them. The result was a bit surprising.
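
One way to pull this sort of lifetime write figure off a SSD is SMART attribute 241, if the drive reports it. A sketch, assuming the drive counts in 512-byte LBAs (not every vendor does, so check your model's documentation):

# lifetime host writes in TB, before any write amplification inside the SSD
smartctl -A /dev/sda | awk '/Total_LBAs_Written/ { printf "%.1f TB\n", $NF * 512 / 1e12 }'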

Before I started looking into this, I probably would have guessed that the highest write volume would be for the SSDs of the ZFS pool that holds our /var/mail filesystem. I might have also guessed that perhaps some of the oldest disks for ZFS pools on our most active fileserver might be pretty active. While both of these are up in the write volume rankings, neither has our highest write volume.

Our highest write volume turns out to happen on the system SSDs in our central mail machine; they see about 32 TB of writes a year, compared to about 23 TB of writes a year on the busiest iSCSI backend disks on our most active fileserver. The oldest and most active SSDs involved in the mail spool have seen only about 10 TB of writes a year, which is actually below many of our more active ZFS pool disks (on several fileservers). The central mail machine's IO activity is also heavily unbalanced in favour of writes; with some hand-waving about the numbers, the machine runs about 80% writes (by the amount of data involved) or more. The disks in the ZFS pools show much lower write to read ratios; an extreme case is the mail spool's disks, which see only 12% writes by IO volume.

My current theory is that this huge write volume is because Exim does a lot of small writes to things like message files and log files and then fsync()'s them out to disk all the time. Exim uses three files for each message and updates two of them frequently as message deliveries happen; updates almost certainly involve fsync(), and then on top of that the filesystem is busy making all the necessary file creations, renames, and deletions durable. We're using ext4, but even there the journal has to be forced to disk at every step.

(This certainly seems to be something involving Exim, as our external mail gateway has the same highly unbalanced writes to reads ratio. The gateway is doing roughly 4 TB of writes a year, but that's still quite high for our Ubuntu system SSDs.)

PS: All of these figures for SSDs are before any internal write amplification that the SSD itself does. My understanding is that SSD warranty figures are quoted before write amplification, as the user-written write volume.

sysadmin/MTAHighWriteVolume written at 03:17:09; Add Comment

2017-10-13

Working to understand PCI Express and how it interacts with modern CPUs

PCI Express sort of crept up on me while I wasn't looking. One day everything was PCI and AGP, then there was some PCI-X in our servers, and then my then-new home machine had PCIe instead but I didn't really have anything to put in those slots so I didn't pay attention. With a new home machine in my future, I've been working to finally understand all of this better. Based on my current understanding, there are two sides here.

PCI Express itself is well described by the Wikipedia page. The core unit of PCIe connections is the lane, which carries one set of signals in either direction. Multiple lanes may be used together by the same device (or connection) in order to get more bandwidth, and these lane counts are written with an 'x' prefix, such as 'x4' or 'x16'. For a straightforward PCIe slot or card, the lane count describes both its size and how many PCIe lanes it uses (or wants to use); this is written as, for example 'PCIe x16'. It's also common to have a slot that's one physical size but provides fewer PCIe lanes; this is commonly written with two lane sizes, eg 'PCIe x16 @ x4' or 'PCIe x16 (x4 mode)'.

While a PCIe device may want a certain number of lanes, that doesn't mean it's going to get them. Lane counts are negotiated by both ends, which in practice means that the system can decide that a PCIe x16 graphics card in an x16 slot is actually only going to get 8 lanes (or less). I don't know if in theory all PCIe devices are supposed to work all the way down to one lane (x1), but if so I cynically suspect that in practice there are PCIe devices that can't or won't cope well if their lane count is significantly reduced.
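
On Linux you can see both what a device claims it supports and what it actually negotiated; for example, assuming your graphics card is at PCI address 01:00.0 (run this as root to see everything):

# LnkCap is the card's maximum speed and width, LnkSta is the link it actually got
lspci -s 01:00.0 -vv | grep -E 'LnkCap:|LnkSta:'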

(PCIe interconnection can involve quite sophisticated switches.)

All of this brings me around to how PCIe lanes connect to things. Once upon a time, the Northbridge chip was king and sat at the heart of your PC; it connected to the CPU, it connected to RAM, it connected to your AGP slot (or maybe a PCIe slot). Less important and lower bandwidth things were pushed off to the southbridge. These days, the CPU has dethroned the northbridge by basically swallowing it; a modern CPU directly connects to RAM, integrated graphics, and a limited number of PCIe lanes (and perhaps a few other high-importance things). Additional PCIe lanes, SATA ports, and everything else are connected to the motherboard chipset, which then connects back to the CPU through some interconnect. On modern Intel CPUs, this is Intel's DMI and is roughly equivalent to a four-lane PCIe link; on AMD's modern CPUs, this is apparently literally an x4 PCIe link.

Because you have to get to the CPU to talk to RAM, all PCIe devices that use non-CPU PCIe lanes are collectively choked down to the aggregate bandwidth of the chipset to CPU link for DMA transfers. Since SATA ports, USB ports, and so on are also generally connected to the chipset instead of the CPU, your PCIe devices are contending with them too. This is especially relevant with high-speed x4 PCIe devices such as M.2 NVMe SSDs, but I believe it comes up for 10G networking as well (especially if you have multiple 10G ports, where I think you need x4 PCIe 3.0 to saturate two 10G links).

(I don't know if you can usefully do PCIe transfers from one device to another device directly through the chipset, without touching the CPU and RAM and thus without having to go over the link between the chipset and the CPU.)

Typical Intel desktop CPUs have 16 onboard PCIe lanes, which are almost always connected to an x16 and an x16 @ x8 PCIe slot for your graphics cards. Current Intel motherboard chipsets such as the Z370 have what I've seen quoted as '20 to 24' additional PCIe lanes; these lanes must be used for M.2 NVMe drives, additional PCIe slots, and additional onboard chips that the motherboard vendor has decided to integrate (for example, to provide extra USB 3.1 gen 2 ports or extra SATA ports).

The situation with AMD Ryzen and its chipsets is more tangled and gets us into the difference between PCIe 2.0 and PCIe 3.0. Ryzen itself has 24 PCIe lanes to Intel's 16, but the Ryzen chipsets seem to have fewer additional PCIe lanes and many of them are slower PCIe 2.0 ones. The whole thing is confusing me, which makes it fortunate that I'm not planning to get a Ryzen-based system for various reasons, but for what it's worth I suspect that Ryzen's PCIe lane configuration is better for typical desktop users.

Unsurprisingly, server-focused CPUs and chipsets have more PCIe lanes and more lanes directly connected to the CPU or CPUs (for multi-socket configurations). Originally this was probably aimed at things like multiple 10G links and large amounts of high-speed disk IO. Today, with GPU computing becoming increasingly important, it's probably more and more being used to feed multiple x8 or x16 GPU card slots with high bandwidth.

tech/PCIeAndModernCPUs written at 02:10:23; Add Comment

2017-10-12

I'm looking forward to using systemd's new IP access control features

These days, my reaction to hearing about new systemd features is usually somewhere between indifference and irritation (I'm going to avoid giving examples, for various reasons). The new IP access lists feature is a rare exception; as a sysadmin, I'm actually reasonably enthused about it. What makes systemd's version of IP access restrictions special and interesting is that they can be imposed per service, not just globally (and socket units having different IP access restrictions than the service implementing them adds extra possibilities).

As a sysadmin, I not infrequently deal with services that either use random ports by default (such as many NFS related programs) or which have an irritating habit of opening up 'control' ports that provide extra access to themselves (looking at what processes are listening on what ports on a typical modern machine can be eye-opening and alarming, especially since many programs don't document their port usage). Dealing with this with general iptables rules is generally too much work to be worth it, even when things don't go wrong; you have to chase down programs, try to configure some of them to use specific ports, hope that the other ports you're blocking are fixed and aren't going to change, and so on.

Because systemd can do these IP access controls on a per service basis, it promises a way out from all of this hassle. With per-service IP access controls, I can easily configure my NFS services so that regardless of what ports they decide to wander off and use, they're only going to be accessible to our NFS clients (or servers, for client machines). Other services can be locked down so that even if they go wild and decide to open up random control ports, nothing is going to happen because no one can talk to them. And the ability to set separate IP access controls on .socket units and .service units opens up the possibility of doing something close to per-port access control for specific services. CUPS already uses socket activation on our Ubuntu 16.04 machines, so we could configure the IPP port to be generally accessible but then lock down the CUPS .service and daemon so we don't have to worry that someday it will sprout an accessible control port somewhere.
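
As a concrete illustration of the sort of thing I have in mind, here's a hypothetical drop-in for when the systemd involved is new enough; the unit name and the client network are made up:

# as root: whatever ports rpc-statd winds up using, only our NFS clients can reach them
mkdir -p /etc/systemd/system/rpc-statd.service.d
cat >/etc/systemd/system/rpc-statd.service.d/ipaccess.conf <<'EOF'
[Service]
IPAddressDeny=any
IPAddressAllow=localhost
IPAddressAllow=192.0.2.0/24
EOF
systemctl daemon-reload
systemctl restart rpc-statd.service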

(There are also uses for denying outbound traffic to some or many destinations but only for some services. This is much harder to do with iptables, and sometimes not possible at all.)

linux/SystemdComingIPAccessControl written at 01:15:16; Add Comment
