Why Unix doesn't have user-changeable namespaces

November 10, 2012

Today I was reading "Plan 9 mounts and dependency injection", and in a footnote ran across this:

For the longest time, Linux did not provide per-process mount namespaces, and even today this feature is not available to unprivileged users — Plan 9, in contrast, had this feature available from the very beginning to all users.

As it happens, there's an excellent reason why Unix (not just Linux) doesn't support this and why Plan 9 does, and it's the same reason that chroot() is a privileged system call: Plan 9 does not have setuid, and Unix does.

Imagine that Unix had this feature and still had setuid, and you would like root privileges. No problem; make a custom namespace in which /etc holds your own versions of /etc/shadow, /etc/group, and /etc/sudoers, with known passwords and with you listed as authorized. Now run sudo. Done.
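To make the shape of this attack concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the Namespace class, its bind() method, the toy_sudo() function, and the user name mallory are all hypothetical, and the one-line sudoers parsing has nothing to do with how the real sudo works. The only point is that a privileged program that looks up /etc/sudoers through a caller-controlled namespace will read whatever the caller put there.

```python
# Toy model only: a "namespace" maps paths to file contents.
SYSTEM_FILES = {
    "/etc/sudoers": "root ALL=(ALL) ALL\n",
}


class Namespace:
    """A per-process view: private bindings layered over the system files."""

    def __init__(self, system_files):
        self.system_files = system_files
        self.bindings = {}

    def bind(self, path, contents):
        # Hypothetical unprivileged operation: note the absence of any
        # permission check, which is the whole problem.
        self.bindings[path] = contents

    def read(self, path):
        # A private binding, if present, hides the system's file.
        return self.bindings.get(path, self.system_files[path])


def toy_sudo(namespace, invoking_user):
    """Stand-in for a setuid program that trusts /etc/sudoers."""
    for line in namespace.read("/etc/sudoers").splitlines():
        fields = line.split()
        if fields and fields[0] == invoking_user:
            return "running as root"
    return "permission denied"


ns = Namespace(SYSTEM_FILES)
before = toy_sudo(ns, "mallory")
ns.bind("/etc/sudoers", "mallory ALL=(ALL) ALL\n")
after = toy_sudo(ns, "mallory")
print(before, "->", after)
```

Notice that the system's own copy of the file is never modified. The attacker only changes which file the privileged program sees.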

In Unix, the practical security of setuid programs relies on control of the filesystem. There is a huge raft of ways to subvert almost all setuid programs if you can control the contents of all files that they access, and that is exactly what unprivileged, per-process namespaces give you. While it might be theoretically possible for a setuid program to still be secure in this environment (with a tiny bit of kernel support), no setuid program today is going to be secure there, for the simple reason that a setuid Unix program is entirely entitled to believe that /etc/shadow is under the system administrator's control, because that is the Unix permission model.

(I'm far from convinced that it's even theoretically possible for setuid programs to be secure in this environment, but I'm not totally sure it's impossible so I'm being conservative. What is sure is that any Unix system with user-controlled namespaces would need to totally rewrite all setuid programs.)

You could try to fix this by saying that setuid processes (and all processes that they start) don't see the user's customized namespace but some sort of standardized system-wide namespace. But this is a terrible solution in practice, one that's going to endlessly surprise users, because it means that setuid programs can wind up seeing an entirely different view of the system than you do. Since users do not necessarily know what's setuid and what isn't (and it changes over time anyways), they're going to experience an environment where some things work and others don't, apparently at random.

The end result of all of this is that all Unix system calls that rearrange the namespace that programs see must be privileged, because they allow you to compromise system security.

Plan 9 doesn't have setuid (because it has a real distributed filesystem), so it has none of these problems. You can rearrange your namespace freely because there's nothing you can subvert with a rearranged namespace.

Sidebar: the direct way to subvert almost all setuid programs with this

Modifying /etc/sudoers or /etc/shadow is simple but it's not the most direct way to subvert setuid programs. Almost all setuid programs on a modern system are dynamically linked, and a dynamically linked program loads code from outside files (both the initial runtime loader and various libraries). So build your own hacked version of the runtime loader that ignores the program and does whatever you want, change the namespace to put it on top of the normal loader's filename, and run a setuid program. Any precautions the program itself takes are now completely irrelevant; you have control before its code even starts executing.
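To see why the loader is such a direct target, it helps to look at where its name comes from. On ELF systems such as Linux, a dynamically linked binary records the pathname of its runtime loader in a PT_INTERP program header, and the loader is found by looking up that pathname. The following Python sketch extracts that path. The function name elf_interpreter is my own, and the code assumes a 64-bit little-endian ELF file and does no bounds checking, so treat it as an illustration rather than a robust parser.

```python
import struct

PT_INTERP = 3  # program header type that names the runtime loader


def elf_interpreter(data):
    """Return the loader path recorded in an ELF image, or None.

    Sketch only: handles 64-bit little-endian ELF and trusts the
    offsets it finds in the file.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")

    # ELF64 header fields: e_phoff at 0x20, e_phentsize and e_phnum at 0x36.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)

    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # ELF64 program header: p_type at +0, p_offset at +8, p_filesz at +32.
        (p_type,) = struct.unpack_from("<I", data, base)
        (p_offset,) = struct.unpack_from("<Q", data, base + 8)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None  # no PT_INTERP entry, e.g. a statically linked binary
```

You could call this as elf_interpreter(open(path, "rb").read()) on any setuid binary you can read. On a typical x86-64 Linux system the result is likely to be something like /lib64/ld-linux-x86-64.so.2, and that pathname is exactly what a rearranged namespace would let you cover with your own file.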

This is why I say you need a tiny bit of kernel support to have even a theoretical chance of secure setuid programs in the face of such changeable namespaces; the kernel has to block this somehow for setuid programs or the game is over before it even starts.

Sidebar: shooting down a potential limited solution

One potential bandaid to preserve some unprivileged namespace changing would be to say that you can only change the binding of 'mount points' that you own; you can change what $HOME/bin is, but not /etc. The argument for this is that you can already change $HOME/bin around. However, I'm not convinced that this is secure in this form. The problem with a straightforward implementation is that you can cause files owned by other people to appear at arbitrary points under your own control (even if those files are not on the same filesystem as your directories and files). This is exactly the kind of thing that Linux has been busy preventing lately, which is a strong argument that it is dangerous in practice.

The apparently safe, fully restricted version requires you to own both the source and the target of the namespace operation. I'm not convinced that this is particularly useful.
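As a rough illustration of what that fully restricted rule would check, here is a small Python sketch. The function may_bind and its uid parameter are hypothetical names of mine, not any real kernel or library interface, and a real implementation would have to make this check inside the kernel and deal with races that this sketch ignores.

```python
import os


def may_bind(source, target, uid=None):
    """Toy policy check: allow a bind only if `uid` owns both ends.

    Hypothetical helper for illustration; no real system exposes this.
    """
    if uid is None:
        uid = os.getuid()
    # os.stat follows symlinks, so the check applies to the objects
    # that would actually be bound, not to a link pointing at them.
    source_owner = os.stat(source).st_uid
    target_owner = os.stat(target).st_uid
    return source_owner == uid and target_owner == uid
```

Under this rule, binding one of your own directories over $HOME/bin would be allowed, while anything involving a root-owned /etc on either side would be refused for an ordinary user.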

Written on 10 November 2012.

Last modified: Sat Nov 10 00:43:45 2012