Wandering Thoughts archives

2016-05-26

Why SELinux is inherently complex

The root of SELinux's problems is that SELinux is a complex security mechanism that is hard to get right. Unfortunately this complexity is not (just) an implementation artifact of the current SELinux code; instead, it's inherent in what SELinux is trying to do.

What SELinux is trying to do is understand 'valid' program behavior and confine programs to it at a fine-grained level in an environment where all of the following are true:

  • Programs are large, complex, and can legitimately do many things (this is especially so because we are really talking about entire assemblages of programs, not just single binaries). After all, SELinux is intended to secure things like web servers, database engines, and mailers, all of which have huge amounts of functionality.

  • Programs legitimately access things that are spread all over the system and intermingled tightly with things that they should not be able to touch. This requires fine-grained selectivity about what programs can and cannot access.

  • Programs use and rely on outside libraries that can have unpredictable, opaque, and undocumented internal behavior, including what resources those libraries access. Since we're trying to confine all of the program's observed behavior, this necessarily includes the behavior of the libraries that it uses.

All of this means that thoroughly understanding program behavior is very hard, yet such a thorough understanding is the core prerequisite for a SELinux policy that is both correct and secure. Even once you've got that thorough understanding, the issue with libraries means that it can be kicked out from underneath you by a library update.

(Such insufficient understanding of program behavior is almost certainly the root cause of a great many of the SELinux issues that got fixed here.)

This complexity is inherent in trying to understand program behavior in the unconfined environment of a general Unix system, where programs can touch devices in /dev, configuration files under /etc, run code from libraries in /lib, run helper programs from /usr/bin, poke around in files in various places in /var/log and /var, maybe read things from /usr/lib or /usr/share, make network calls to various services, and so on. All the while they're not supposed to be able to look at many things from those places or do many 'wrong' operations. Your program that does DNS lookups likely needs to be able to make TCP connections to port 53, but you probably don't want it to be able to make TCP connections to port 25 (or 22). And maybe it needs to make some additional connections to local services, depending on what NSS libraries got loaded by glibc when it parsed /etc/nsswitch.conf.

(Cryptography libraries have historically done some really creative and crazy things on startup in the name of trying to get some additional randomness, including reading /etc/passwd and running ps and netstat. Yes, really (via).)
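To give a concrete feel for how fine-grained this gets, here is roughly what the network half of the DNS example above looks like in SELinux's policy language. This is only a sketch, not a working policy; mydaemon_t is a made-up domain, and a real policy needs a good deal more around it:

    # let the (hypothetical) mydaemon_t domain make TCP connections to DNS ports
    allow mydaemon_t dns_port_t:tcp_socket name_connect;
    # connections to smtp_port_t, ssh_port_t, and so on stay denied simply
    # because no rule allows them

Every domain, port type, and permission here has to be exactly right for the program to keep working, which is the understanding problem again in concrete form.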

SELinux can be simple, but it requires massive reorganization of a typical Linux system and application stack. For example, life would be much simpler if all confined services ran inside defined directory trees and had no access to anything outside their tree (ie everything was basically chroot()'d or close to it); then you could write really simple file access rules (or at least start with them). Similar things could be done with services provided to applications (for example, 'all logging must be done through this interface'), requirements to explicitly document required incoming and outgoing network traffic, and so on.

(What all of these do is make it easier to understand expected program behavior, either by limiting what programs can do to start with or by requiring them to explicitly document their behavior in order to have it work at all.)
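To make the 'defined directory trees' idea a little more concrete: in that world, the entire file labeling for a confined service could boil down to a file context rule or two in the style of SELinux's .fc files, plus a handful of allow rules. This is hypothetical; /srv/mydaemon and mydaemon_data_t are made-up names:

    # label everything under the service's tree with a single type
    /srv/mydaemon(/.*)?    gen_context(system_u:object_r:mydaemon_data_t,s0)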

Sidebar: the configuration change problem

The problem gets much worse when you allow system administrators to substantially change the behavior of programs in unpredictable ways by changing their configurations. There is no scalable automated way to parse program configuration files and determine what they 'should' be doing or accessing based on the configuration, so now you're back to requiring people to recreate that understanding of program behavior, or at least a fragment of it (the part that their configuration changes affected).

This generously assumes that all points where sysadmins can change program configuration come prominently marked with 'if you touch this, you need to do this to the SELinux setup'. As you can experimentally determine today, this is not the case.

SELinuxInherentlyComplex written at 02:37:04

2016-05-25

SELinux is beyond saving at this point

SELinux has problems. It has a complexity problem (in that it is quite complex), it has technical problems with important issues like usability and visibility, it has pragmatic problems with getting in the way, and most of all it has a social problem. At this point, I no longer believe that SELinux can be saved and become an important part of the Linux security landscape (at least if Linux remains commonly used).

The fundamental reason why SELinux is beyond saving at this point is that after something like a decade of SELinux's toxic mistake, the only people who are left in the SELinux community are the true believers, the people who believe that SELinux is not a sysadmin usability nightmare, that those who disable it are fools, and so on. A narrowing community is what naturally happens when you double down on calling other people names; if people call you an idiot for questioning the SELinux way, well, you generally leave.

If the SELinux community was going to change its mind about these issues, the people involved have had years of opportunities to do so. Yet the SELinux ship sails on pretty much as it ever has. These people are never going to consider anything close to what I once suggested in order to change course; instead, I confidently expect them to ride the 'SELinux is totally fine' train all the way into the ground. I'm sure they will be shocked and upset when something like OpenBSD's pledge() is integrated either in Linux libraries or as a kernel security module (or both) and people start switching to it.
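For contrast, here is a minimal sketch of what using pledge() looks like on OpenBSD. It's illustrative only; I haven't compiled it, and the promise strings obviously depend on what the program actually does:

    #include <unistd.h>
    #include <err.h>

    int
    main(void)
    {
            /* After this call the process can only do stdio, read files,
               and make DNS lookups; anything else is fatal. */
            if (pledge("stdio rpath dns", NULL) == -1)
                    err(1, "pledge");

            /* ... the actual program, now confined ... */
            return 0;
    }

The appeal of this model is that the policy lives in the program itself, written by the people who actually know what the program does.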

(As always, real security is people, not math. A beautiful mathematical security system that people don't really use is far less useful and important than a messy, hacky one that people do use.)

(As for why I care about SELinux despite not using it and thinking it's the wrong way, see this. Also, yes, SELinux can do useful things if you work hard enough.)

SELinuxBeyondSaving written at 01:11:01

2016-05-04

The better way to clear SMART disk complaints, with safety provided by ZFS

A couple of months ago I wrote about clearing SMART complaints about one of my disks by very carefully overwriting sectors on it, and how ZFS made this kind of safe. In a comment, Christian Neukirchen recommended using hdparm --write-sector to overwrite sectors with read errors instead of the complicated dance with dd that I used in my entry. As it happens, that disk coughed up a hairball of smartd complaints today, so I got a chance to go through my procedures again and the advice is spot on. Using hdparm makes things much simpler.

So my revised steps are:

  1. Scrub my ZFS pool in the hopes that this will make the problem go away. It didn't, which means that any read errors in the partition for the ZFS pool are in space that ZFS shouldn't be using.

  2. Use dd to read all of the ZFS partition. I did this with 'dd if=/dev/sdc7 of=/dev/null bs=512k conv=noerror iflag=direct'. This hit several bad spots, each of which produced kernel errors that included a line like this:
    blk_update_request: I/O error, dev sdc, sector 1748083315
    

  3. Use hdparm --read-sector to verify that this is indeed the bad sector:
    hdparm --read-sector 1748083315 /dev/sdc
    

    If this is the correct sector, hdparm will report a read error and the kernel will log a failed SATA command. Note that this is not a normal disk read, as hdparm is issuing a low-level read, so you don't get a normal error message; instead you get something like this:

    ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    ata3.00: irq_stat 0x40000001
    ata3.00: failed command: READ SECTOR(S) EXT
    ata3.00: cmd 24/00:01:73:a2:31/00:00:68:00:00/e0 tag 3 pio 512 in
             res 51/40:00:73:a2:31/00:00:68:00:00/00 Emask 0x9 (media error)
    [...]
    

    The important thing to notice here is that you don't get the sector reported (at least not in decoded form), so you have to rely on getting the sector number correct in the hdparm command instead of being able to cross-check it against earlier kernel logs.

    (Sector 1748083315 is 0x6831a273 in hex. All the bytes are there in the cmd part of the message, but clearly shuffled around.)

  4. Use hdparm --write-sector to overwrite the sector, forcing it to be spared out:
    hdparm --write-sector 1748083315 <magic option> /dev/sdc
    

    (hdparm will tell you what the hidden magic option you need is when you use --write-sector without it.)

  5. Scrub my ZFS pool again and then re-run the dd to make sure that I got all of the problems.

I was pretty sure I'd gotten everything even before the re-scrub and the re-dd scan, because smartd reported that there were no more currently unreadable (pending) sectors or offline uncorrectable sectors, both of which it had been complaining about before.
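If you want to double-check this by hand instead of waiting for smartd, something like the following should do it. I'm going from memory on the exact attribute names, which can vary a bit between drives; both raw values should be back to 0:

    smartctl -A /dev/sdc | egrep 'Current_Pending_Sector|Offline_Uncorrectable'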

This was a lot easier and more straightforward to go through than my previous procedure, partly because I can directly reuse the sector numbers from the kernel error messages without problems and partly because hdparm does exactly what I want.

There's probably a better way to scan the hard drive for read errors than dd. I'm a little bit nervous about my 512 KB block size here potentially hiding a second bad sector that's sufficiently close to the first, but especially with direct IO I think it's a tradeoff between speed and thoroughness. Possibly I should explore how well the badblocks program works here, since it's the obvious candidate.
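If I do try badblocks, I expect the (default, read-only) scan to look something like this. This is untested on my part; -b 512 matches the drive's 512-byte logical sectors and -s just shows progress:

    badblocks -b 512 -s /dev/sdc7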

(These days I force dd to use direct IO when talking to disks because that way dd does much less damage to the machine's overall performance.)

(This is the kind of entry that I write because I just looked up my first entry for how to do it again, so clearly I'm pretty likely to wind up doing this a third time. I could just replace the drive, but at this point I don't have enough drive bay slots in my work machine's case to do this easily. Also, I'm a peculiar combination of stubborn and lazy where it comes to hardware.)

ClearingSMARTComplaintsII written at 00:19:21

2016-05-02

How I think you set up fair share scheduling under systemd

When I started writing this entry, I was going to say that systemd automatically does fair share scheduling between users and then describe the mechanisms that make that work. However, this turns out to be false as far as I can see; systemd can easily do fair share scheduling, but it doesn't do this by default.

The basic mechanics of fair share scheduling are straightforward. If you put all of each user's processes into a separate cgroup it happens automatically. Well. Sort of. You see, it's not good enough to put each user into a separate cgroup; you have to make it a CPU accounting cgroup, and a memory accounting cgroup, and so on. Systemd normally puts all processes for a single user under a single cgroup, which you can see in eg systemd-cgls output and by looking at /sys/fs/cgroup/systemd/user.slice, but by default it doesn't enable any CPU or memory or IO accounting for them. Without those enabled, the traditional Linux (and Unix) behavior of 'every process for itself' still applies.
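One quick way to see whether accounting is actually on for a particular user's slice is to ask systemd for the relevant properties. This assumes the usual user-$UID.slice naming; 1000 is just an example UID:

    systemctl show -p CPUAccounting -p BlockIOAccounting user-1000.slice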

(You can still use systemd-run to add your own limits here, but I'm not quite sure how this works out.)
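For example, I believe something along these lines would cap a single command's CPU usage, although I haven't tested it and 'some-command' is just a placeholder:

    systemd-run --scope -p CPUQuota=50% some-command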

Now, I haven't tested the following, but from reading the documentation it seems that what you need to do to get fair share scheduling for users is to enable DefaultCPUAccounting and DefaultBlockIOAccounting for all user units by creating an appropriate file in /etc/systemd/user.conf.d, as covered in the systemd-user.conf manpage and the systemd.resource-control manpage. You probably don't want to turn this on for system units, or at least I wouldn't.
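Concretely, I believe such a drop-in would look something like this. It's untested, and the file name here is my own invention:

    # e.g. /etc/systemd/user.conf.d/10-fairshare.conf
    [Manager]
    DefaultCPUAccounting=yes
    DefaultBlockIOAccounting=yes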

I don't think there's any point in turning on DefaultMemoryAccounting. As far as I can see there is no kernel control that limits a cgroup's share of RAM, just the total amount of RAM it can use, so cgroups just can't enforce fair share scheduling of RAM the way you can for CPU time (unless I've overlooked something here). Unfortunately, missing fair share memory allocation definitely hurts the overall usefulness of fair share scheduling; if you want to ensure that no user can take an 'unfair' share of the machine, it's often just as important to limit RAM as CPU usage.

(Having discovered this memory limitation, I suspect that we won't bother trying to enable fair share scheduling in our Ubuntu 16.04 installs.)

SystemdFairshareScheduling written at 23:11:23
