Linux kernel Security Modules (LSMs) need their own errno value

December 18, 2019

Over on Twitter, I said something I've said before:

Once again, here I am hating how Linux introduced additional kernel security modules without also adding an errno for 'the loadable security module denied permissions'.

Lack of a LSM errno significantly complicates debugging problems, especially if you don't normally use LSMs.

Naturally there's a sysadmin story here, but let's start with the background (even if you probably know it).

SELinux and Ubuntu's AppArmor are examples of Linux Security Modules; each of them adds permission checks that you must pass over and above the normal Unix permissions. However, when they reject your access, they don't tell you so specifically; instead you get a generic Unix error, usually EACCES, 'permission denied' (or for some operations EPERM, 'operation not permitted'), which is exactly what you get if, say, the file is unreadable to your UID.

We have an internal primary master DNS server for our DNS zones (a so-called 'stealth master'), which runs Ubuntu instead of OpenBSD for various reasons. The winter holiday break is coming up, and since we've had problems with this server coming up cleanly in the past, last week seemed like a good time to reboot it under controlled circumstances to make sure that at least that worked. When I did, named (aka Bind) refused to start with a 'permission denied' error (aka EACCES) when it tried to read its named.conf configuration file. For reasons beyond the scope of this entry, this file lives on our central administrative NFS filesystem, and when you throw NFS into the picture, various things can go wrong with access permissions. So I spent some time looking at file and directory permissions, NFS mount state, and so on, until I remembered something my co-worker had mentioned in passing.

Ubuntu defaults to installing and using AppArmor, but we don't like it and we turn it off almost everywhere (we can't avoid it for MySQL, although we can make it harmless). That morning we had applied the pending Ubuntu package updates, as one does, and one of the updated packages was AppArmor. It turns out that in our environment, applying an AppArmor package update re-enables AppArmor (though I think it isn't started immediately), so when I rebooted our primary DNS master, AppArmor started too. AppArmor has a profile for Bind that only allows a configuration file in the standard place, not where we put our completely different, customized one, so when Bind tried to read our named.conf, the AppArmor LSM said 'no'. But that 'no' surfaced only as an EACCES error, and so I went chasing down the rabbit hole of all the normal causes of permission errors.
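When an LSM may have quietly turned itself back on, the first thing worth checking is which LSMs the running kernel actually has active. Here is a minimal sketch, assuming securityfs is mounted in its usual place and the kernel is recent enough to provide /sys/kernel/security/lsm (a single comma-separated list of active LSMs); the function name active_lsms is hypothetical.

```shell
# Hypothetical helper: print the active LSMs, one per line.
# Assumes securityfs is mounted at /sys/kernel/security and the kernel
# provides the 'lsm' file there; another path can be passed for testing.
active_lsms() {
	local lsmfile=${1:-/sys/kernel/security/lsm}
	if [ ! -r "$lsmfile" ]; then
		echo "cannot read $lsmfile" >&2
		return 1
	fi
	# The file holds one comma-separated line; split it on the commas.
	awk -F, '{ for (i = 1; i <= NF; i++) if ($i != "") print $i }' "$lsmfile"
}
```

Something like `active_lsms | grep -qx apparmor && echo "AppArmor is active"` could then go into a post-reboot or post-update check. On AppArmor systems, /sys/module/apparmor/parameters/enabled also reports Y or N.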

People who deal with LSMs all the time will probably be familiar with this issue and will immediately suspect the LSM whenever they see an unfamiliar, mysterious permission denial. But we don't normally use LSMs, so every time one enables itself and gets in our way, we have to relearn all of this. Troubleshooting would be much easier if LSM denials were reported with their own errno value, something like 'permission denied by security module', because then we'd know right away what was going on.
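Without such an errno, the practical workaround is to make 'check the kernel log for LSM denials' a reflex whenever a permission error makes no sense, since AppArmor and SELinux normally log what they deny even though the errno doesn't say so. A minimal sketch, assuming the usual audit message formats (apparmor="DENIED" for AppArmor, "avc: denied" for SELinux); the function names are hypothetical and other LSMs are not covered.

```shell
# Hypothetical helpers that read kernel log text on stdin.
# Only AppArmor and SELinux message formats are recognised.

# Print lines that record an AppArmor or SELinux denial.
lsm_denials() {
	grep -E 'apparmor="DENIED"|avc: +denied'
}

# Condense AppArmor denials to "profile: operation path".
apparmor_denial_summary() {
	grep 'apparmor="DENIED"' |
		sed -n 's/.*operation="\([^"]*\)".*profile="\([^"]*\)".*name="\([^"]*\)".*/\2: \1 \3/p'
}
```

Usage would be along the lines of `journalctl -k | lsm_denials` or `dmesg | apparmor_denial_summary`. On systems running auditd, these messages typically go to /var/log/audit/audit.log instead of the kernel log, so that is the file to feed in.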

(If Linux kernel people are worried about some combination of security concerns and backward compatibility, I would be happy if they made this extra errno value an opt-in thing that you had to turn on with a sysctl. We would promptly enable it for all of our servers.)

PS: Even if we didn't have our named.conf on an NFS filesystem, we probably wouldn't want to overwrite the standard version with our own. It's usually cleaner to build your own completely separate configuration file and configuration area, so that you don't have to worry about package updates doing anything to your setup.


Comments on this page:

By Perry Lorier at 2019-12-19 06:15:14:

I've run into very similar problems with EINVAL. It's nearly impossible to figure out what you've done wrong, and the kernel won't be helpful and tell you which part of what you're doing is invalid. So I've started using the shell script below, which runs a command under ftrace:

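$ cat ~/bin/ftrace
#!/bin/bash
# Run a command under the function_graph tracer and dump the trace.
# Must be run as root, with debugfs mounted.

# Find the debugfs mount point (usually /sys/kernel/debug).
DEBUGFS=$(awk '$3 == "debugfs" { print $2; exit }' /proc/mounts)
TRACING="$DEBUGFS/tracing"

(
	echo "Tracing $BASHPID"
	# Trace only this subshell; exec then turns it into the command.
	echo "$BASHPID" > "$TRACING/set_ftrace_pid"
	echo function_graph > "$TRACING/current_tracer"
	echo 1 > "$TRACING/options/func_stack_trace"
	exec "$@"
)
cat "$TRACING/trace"

# Put things back: stop tracing and clear the PID filter.
echo 0 > "$TRACING/options/func_stack_trace"
echo nop > "$TRACING/current_tracer"
echo > "$TRACING/set_ftrace_pid"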

Note that this will likely turn your machine to molasses temporarily while it traces everything, so be wary of using this on a production host (although if you're using it on a production host, the machine probably already isn't doing its day job, so...).

This will show you the kernel's function calls and stack traces, so you can see what was called just before the kernel decided to give up and return to userspace. Either that function has a suspicious name (e.g. aa_<something>, AppArmor's prefix), or when you look at its source, it's pretty easy to guess which check has gone wrong.
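The function_graph output for even a short command can run to many thousands of lines, so it helps to narrow it down to calls that look like LSM code. A minimal sketch of such a filter; the prefix list is an assumed heuristic rather than an exhaustive one (security_* for the generic LSM hook wrappers, aa_* and apparmor_* for AppArmor, selinux_* and avc_* for SELinux), and lsm_calls is a hypothetical name.

```shell
# Hypothetical filter: keep only trace lines whose function names look
# like LSM code. Reads files given as arguments, or stdin if none.
lsm_calls() {
	grep -E '\b(security|aa|apparmor|selinux|avc)_[A-Za-z0-9_]+\(' "$@"
}
```

With the script above installed as ~/bin/ftrace, something like `ftrace cat /etc/hostname | lsm_calls` (as root) would show only the LSM-related calls made while the command ran.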

Agreed; I once wanted some stats on how much traffic was flowing somewhere, so I tried to run tcpdump as a service, and somehow AppArmor applied a profile to it. I didn't know the profile existed, and it only allowed writes to *.pcap filenames in some specific directories, so of course the service didn't work. (And only the service; the shell could run the same command just fine.) It took a while to dawn on me.
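A profile you didn't know about is easier to spot if you list which AppArmor profiles are loaded in enforce mode. A minimal sketch, assuming AppArmor's securityfs listing at /sys/kernel/security/apparmor/profiles, which has one "name (mode)" entry per line and may need root to read; enforced_profiles is a hypothetical name.

```shell
# Hypothetical helper: print the names of AppArmor profiles in enforce mode.
# Defaults to AppArmor's securityfs listing; pass another file for testing.
enforced_profiles() {
	local f=${1:-/sys/kernel/security/apparmor/profiles}
	# Each line looks like "NAME (MODE)"; keep NAME when MODE is enforce.
	sed -n 's/^\(.*\) (enforce)$/\1/p' "$f"
}
```

For example, `enforced_profiles | grep tcpdump` would have shown the surprise profile. The aa-status tool reports the same information in a friendlier form.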

Written on 18 December 2019.
Last modified: Wed Dec 18 23:59:19 2019