2019-12-18
Linux kernel Security Modules (LSMs) need their own errno value
Over on Twitter, I said something I've said before:
Once again, here I am hating how Linux introduced additional kernel security modules without also adding an errno for 'the loadable security module denied permissions'.
Lack of a LSM errno significantly complicates debugging problems, especially if you don't normally use LSMs.
Naturally there's a sysadmin story here, but let's start with the background (even if you probably know it).
SELinux and Ubuntu's AppArmor are examples of Linux Security Modules;
each of them adds additional permission checks that you must pass over
and above the normal Unix permissions. However, when they reject your
access, they don't actually tell you this specifically; instead you
get a generic Unix error such as EACCES, 'permission denied' (or
sometimes EPERM, 'operation not permitted'), which is normally what
you get if, say, the file is unreadable to your UID for some reason.
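To make the ambiguity concrete, here is a minimal Python sketch of the check I wish I'd thought of sooner. It tries to open a file and, if the failure is one of the generic permission errnos, also reports which LSMs the kernel says are active. The function names are made up for illustration, and it assumes securityfs is mounted so that /sys/kernel/security/lsm is readable; nothing here can tell you that the LSM was actually the cause, only that one is present to be suspected.

```python
import errno
import os

# At the system call level, an LSM denial looks the same as an ordinary
# Unix permissions failure: one of these two generic errnos.
PERMISSION_ERRNOS = {errno.EACCES, errno.EPERM}


def active_lsms(path="/sys/kernel/security/lsm"):
    """Return the kernel's list of active LSMs, or [] if it can't be read."""
    try:
        with open(path) as f:
            return [name for name in f.read().strip().split(",") if name]
    except OSError:
        return []


def explain_open_failure(filename, lsm_path="/sys/kernel/security/lsm"):
    """Try to open filename for reading.

    Returns None on success, otherwise a one-line description of the
    failure. For permission errors the description also lists the active
    LSMs as a reminder that they are a possible cause.
    """
    try:
        with open(filename, "rb"):
            return None
    except OSError as e:
        name = errno.errorcode.get(e.errno, str(e.errno))
        hint = f"{filename}: {name} ({os.strerror(e.errno)})"
        if e.errno in PERMISSION_ERRNOS:
            lsms = active_lsms(lsm_path)
            if lsms:
                hint += "; active LSMs: " + ", ".join(lsms)
        return hint
```

You have to run this as the same user and, for AppArmor, under the same profile as the failing program for the result to mean much, since the profile is attached to the program and not to the file.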
We have an internal primary master DNS server for our DNS zones (a
so-called 'stealth master'), which runs Ubuntu instead of OpenBSD
for various reasons. We have the winter holiday break coming up, and
since we've had problems with the server coming up cleanly in the
past, last week seemed like a good time to reboot it under controlled
circumstances to make sure that at least that worked. When I did
that, named (aka Bind) refused to start with a 'permission denied'
error (a generic Unix permissions errno such as EACCES) when it tried
to read its named.conf configuration file. For
reasons beyond the scope of this entry, this file lives on our
central administrative NFS filesystem, and when you throw NFS into
the picture various things can go wrong with access permissions.
So I spent some time looking at file and directory permissions, NFS
mount state, and so on, until I remembered something my co-worker
had mentioned in passing.
Ubuntu defaults to installing and using AppArmor, but we don't
like it and we turn it off almost everywhere (we can't avoid it for
MySQL, although we can make it harmless).
That morning we had applied the pending Ubuntu package updates,
as one does, and one of the packages that got updated had been the
AppArmor package. It turns out that in our environment, when an
AppArmor package update is applied, AppArmor gets re-enabled (but
I think not started immediately); when I rebooted our primary DNS
master, it now started AppArmor. AppArmor has a profile for Bind
that only allows for a configuration file in the standard place,
not where we put our completely different and customized one, and
so when Bind tried to read our named.conf, the AppArmor LSM said
'no'. But that 'no' was surfaced only as a generic Unix permission
error and so I went chasing down the rabbit hole of all of the normal
causes for permission errors.
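The place where AppArmor does say what it did is the kernel log (or the audit log, if auditd is running), where denials are recorded as lines containing apparmor="DENIED" along with the profile and the file name. As a rough sketch, here is a small Python parser for such lines. The function name is hypothetical, and the sample line in the usage note is an illustrative reconstruction of the general format, not a line from our server.

```python
import re

# key=value pairs, where the value is either "quoted" or a bare token.
_FIELD = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_apparmor_denial(line):
    """Return a dict of fields from an AppArmor denial log line.

    Returns None if the line is not an AppArmor denial. All values are
    returned as strings with any surrounding quotes removed.
    """
    if 'apparmor="DENIED"' not in line:
        return None
    fields = {}
    for key, raw, quoted in _FIELD.findall(line):
        fields[key] = quoted if raw.startswith('"') else raw
    return fields
```

Fed something like 'audit: type=1400 audit(0.0:1): apparmor="DENIED" operation="open" profile="/usr/sbin/named" name="/example/named.conf" pid=1234 comm="named" requested_mask="r" denied_mask="r"', this gives you the profile, the file name, and the denied access mask, which is exactly the information that the errno didn't carry.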
People who deal with LSMs all of the time will probably be familiar
with this issue and will immediately move to the theory that any
unfamiliar and mysterious permission denials are potentially the
LSM in action. But we don't use LSMs normally, so every time one
enables itself and gets in our way, we have to learn all about this
all over again. The process of troubleshooting would be much easier
if the LSM actually told us that it was doing things by having a
new errno value for 'LSM permission denied', because then we'd
know right away what was going on.
(If Linux kernel people are worried about some combination of security concerns and backward compatibility, I would be happy if they made this extra errno value an opt-in thing that you had to turn on with a sysctl. We would promptly enable it for all of our servers.)
PS: Even if we didn't have our named.conf on an NFS filesystem,
we probably wouldn't want to overwrite the standard version with
our own. It's usually cleaner to build your own completely separate
configuration file and configuration area, so that you don't have to
worry about package updates doing anything to your setup.
PCIe slot bandwidth can change dynamically (and very rapidly)
When I added some NVMe drives to my office machine and started looking into its PCIe setup, I discovered that its Radeon graphics card seemed to be operating at 2.5 GT/s (PCIe 1.0) instead of 8 GT/s (PCIe 3.0). The last time around, I thought I had fixed this just by poking into the BIOS, but in a comment, Alex suggested that this was actually a power-saving measure and not necessarily done by the BIOS. I'll quote the comment in full because it summarizes things better than I can:
Your GPU was probably running at lower speeds as a power-saving measure. Lanes consume power, and higher speeds consume more power. The GPU driver is generally responsible for telling the card what speed (and lane width) to run at, but whether that works (or works well) with the Linux drivers is another question.
It turns out that Alex is right, and what I saw after going through the BIOS didn't quite mean what I thought it did.
To start with the summary, the PCIe bandwidth being used by my
graphics card can vary very rapidly from 2.5 GT/s up to 8 GT/s and
then back down again based on whether or not the graphics driver
needs the card to do anything (or the aggregate Linux and X software
stack as a whole, since I don't know where these decisions are being
made). The most dramatic and interesting difference is between two
apparently very similar ways of seeing if the Radeon's bandwidth
is currently downgraded, either automatically scanning through
lspci's output with 'lspci -vv | fgrep downgrade' or manually looking
through it with 'lspci -vv | less'. When I used less, the Radeon
normally showed up downgraded to 2.5 GT/s. When I used fgrep,
other things before the Radeon showed up as downgraded but the
Radeon never did; it was always at 8 GT/s.
(Some of those other things have been downgraded to 'x0' lanes, which I suspect means that they've been disabled as unused.)
What I think is happening here is that when I pipe lspci to less,
lspci gets the Radeon's bandwidth before any output is written
to the screen (less reads it all in a big gulp and then displays
it), so at the time the graphics chain is inactive. When I use the
fgrep pipe, some output is written to the screen before lspci
gets to the Radeon and so the graphics chain lights up the Radeon's
bandwidth to display things. What this suggests is that the graphics
chain can and does vary the Radeon's PCIe bandwidth quite rapidly.
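One way to take the screen out of the picture is to capture all of lspci's output first and only then pick it apart. Here is a minimal Python sketch of the picking apart. It assumes the 'LnkSta:' line format that my version of lspci prints (the one with '(downgraded)' and '(ok)' annotations); older versions omit the annotations, which the pattern tolerates. The function name is my own invention.

```python
import re

# Matches e.g. "LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)".
# The parenthesized notes are optional because older lspci omits them.
_LNKSTA = re.compile(
    r"LnkSta:\s+Speed\s+([\d.]+)GT/s(?:\s+\((\w+)\))?,"
    r"\s+Width\s+x(\d+)(?:\s+\((\w+)\))?"
)


def parse_link_status(lspci_output):
    """Extract per-device PCIe link status from 'lspci -vv' output text.

    Returns a list of dicts with the device address, the current speed
    in GT/s, the current width in lanes, and whether lspci flagged the
    speed or the width as downgraded.
    """
    results = []
    device = None
    for line in lspci_output.splitlines():
        # Device header lines are not indented and start with the address.
        if line and not line[0].isspace():
            device = line.split(" ", 1)[0]
            continue
        m = _LNKSTA.search(line)
        if m and device is not None:
            speed, speed_note, width, width_note = m.groups()
            results.append({
                "device": device,
                "speed_gts": float(speed),
                "width": int(width),
                "speed_downgraded": speed_note == "downgraded",
                "width_downgraded": width_note == "downgraded",
            })
    return results
```

You would feed this the text from something like subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout. Since nothing is printed until lspci has finished, I would expect this to behave like the less case and not the fgrep case, although I haven't verified that.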
Another interesting case is that running the venerable glxgears
doesn't bring the PCIe bandwidth up from 2.5 GT/s, but running
GpuTest's 'fur' test does (it goes to 8 GT/s as you might expect).
(It turns out that nVidia's Linux drivers also do this.)
Of course all of this may make seeing whether you're getting full PCIe bandwidth a little bit interesting. It's clearly not enough to just look at your system, even when it's moderately active (I have several X programs that update once a second); you really need to put it under some approximation of full load and then check. So far I've only seen this happen with graphics cards, but who knows what's next (NVMe drives could be one candidate to drop their bandwidth to save power and thus reduce heat).
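Since a single look isn't enough, one approach is to sample the link speed repeatedly while the load is running. Reasonably recent kernels expose this per device in sysfs as current_link_speed and max_link_speed (for example under /sys/bus/pci/devices/0000:01:00.0/, where the address is only an example). The following Python is a minimal sketch on that assumption; the function names are mine, and it assumes the files start with a number, as in '8.0 GT/s PCIe' or '2.5 GT/s'.

```python
import os
import time


def _read_speed(path):
    """Read a sysfs link speed file and return the leading GT/s number."""
    with open(path) as f:
        return float(f.read().split()[0])


def link_speeds(device_dir):
    """Return (current, maximum) link speed in GT/s for one PCI device.

    device_dir is the device's sysfs directory, which is assumed to
    contain current_link_speed and max_link_speed.
    """
    cur = _read_speed(os.path.join(device_dir, "current_link_speed"))
    top = _read_speed(os.path.join(device_dir, "max_link_speed"))
    return cur, top


def sample_current_speed(device_dir, samples=10, interval=0.1):
    """Sample the current link speed several times.

    Returns the set of distinct speeds seen, so a link that moves
    between 2.5 and 8 GT/s shows up as {2.5, 8.0}.
    """
    seen = set()
    for _ in range(samples):
        seen.add(link_speeds(device_dir)[0])
        time.sleep(interval)
    return seen
```

If the set you get back under load includes the maximum speed, the link can reach full bandwidth when it needs to, whatever it happens to be doing when idle.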