Wandering Thoughts archives

2015-10-22

CPython's trust of bytecode is not a security problem

Yesterday I wrote about how CPython trusts bytecode so much that you can use it to read or write arbitrary memory. In comments, Ewen McNeil had a typical reaction to this:

It appears this means if you can get arbitrary Python execution (eg, unwisely trusting YAML, XML, pickle, etc...), then you can probably get arbitrary memory read/write in the Python process, which is a fairly short step away from arbitrary assembly code execution.

This is true, but it also misunderstands the security situation of Python bytecode. Even without this issue, it is game over in general if an attacker can load arbitrary bytecode into your Python process. The obvious weakness is that ctypes is part of the standard library these days and it can also be used to give you this level of access to memory without any need to corrupt the bytecode interpreter. But even without ctypes an attacker has plenty of options to achieve binary code execution. They can transfer a binary, write it out, and then execute it. They can transfer a native code Python module (in .so form), manipulate the Python load path, and then import it (which gets them code execution even in the Python process). They can run other existing vulnerable binaries on your system and exploit their bugs. And so on.
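To make the ctypes point concrete, here is a minimal sketch of reading and writing process memory through nothing but an integer address. The `peek` and `poke` helpers and the demo buffer are my own illustrative names, not anything from the linked articles, and the buffer is deliberately one that ctypes itself owns, so the demo is harmless. Pointing the same calls at an arbitrary address is what makes this dangerous.

```python
import ctypes

# A ctypes-owned buffer stands in for "some memory in this process".
buf = ctypes.create_string_buffer(b"secret", 16)
addr = ctypes.addressof(buf)  # just a plain Python integer


def peek(address, length):
    """Read `length` raw bytes starting at an integer address (hypothetical helper)."""
    return ctypes.string_at(address, length)


def poke(address, data):
    """Overwrite memory at an integer address with `data` (hypothetical helper)."""
    ctypes.memmove(address, data, len(data))


before = peek(addr, 6)
poke(addr, b"HACKED")
after = peek(addr, 6)
print(before, after)
```

Nothing here touches the bytecode interpreter at all; it is ordinary, documented standard library behaviour.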

You can certainly try to stop this by creating a Python environment that blocks access to the Python features necessary for this. The problem is that there have proven to be many features that can be exploited to help here and many paths through Python to reach them. The runtime environment of Python is a complex, intertangled thing, and all attackers need is one crack that lets them bootstrap a reference to, say, the os module. And there are a lot of potential cracks.
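As an illustration of one such crack, here is a sketch of a well-known trick for CPython 3: an expression evaluated with no builtins and no names at all can still walk from a tuple literal up to `object`, list its subclasses, and pull the `os` module's globals out of a class defined in `os.py`. The `run_sandboxed` helper and `PAYLOAD` are hypothetical names of mine, and the reliance on `os._wrap_close` is an assumption about CPython's current `os.py`, not a stable interface.

```python
import os  # used by the host only to check the result; the payload never names it

# Walk tuple -> object -> all subclasses, then find a class defined in os.py.
PAYLOAD = (
    "[c for c in ().__class__.__base__.__subclasses__()"
    " if c.__name__ == '_wrap_close'][0].__init__.__globals__"
)


def run_sandboxed(expr):
    """Evaluate `expr` with empty builtins and no names (a naive 'sandbox')."""
    return eval(expr, {"__builtins__": {}}, {})


# The obvious route is blocked: __import__ is not reachable by name.
try:
    run_sandboxed("__import__('os')")
    direct_import_blocked = False
except NameError:
    direct_import_blocked = True

# The indirect route still works and yields the os module's namespace.
leaked = run_sandboxed(PAYLOAD)
print(direct_import_blocked, leaked["getpid"]())
```

Blocking this particular class just sends the attacker looking for another one whose methods have useful `__globals__`.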

(Python used to have a restricted execution module, rexec. It was disabled in Python 2.3 because it had basically unfixable holes.)

The simple truth is that Python is not a safe execution environment for untrusted code. The only important thing about bytecode being able to read and write arbitrary memory all by itself is that it shows how impossible the job of securing CPython is. Even if you managed to reliably cut off all access to modules and code that could be used to escape your sandbox at the Python level, you would still have to audit and fix the innards of the bytecode interpreter itself to be safe.

This is why I say that this trust of bytecode is not a security problem; it doesn't really make the situation any worse than it already is. It's just an amusingly baroque alternate path to a security issue that is already there in general.

BytecodeIsTrustedII written at 00:12:41

2015-10-21

Python bytecode is quite heavily trusted by CPython

I've written before that Python bytecode is not secure, and at the time I said:

[...] I wouldn't be surprised if hand-generating crazy instruction sequences could do things like crash CPython (in fact, I'm pretty confident that doing this is relatively trivial) and lead to arbitrary code execution. [...]

It turns out that I was exactly correct here, and it's actually been both found and demonstrated. Start with this tweet:

Python devs will hate you for it! One weird trick to directly access python's memory from the interpreter: [gist]

There's a brief explanation and then you can read the details of how CPython bytecode can be used to read and write arbitrary memory.

As that article notes, this is not a bug or at least not something the Python developers consider a bug. And for what it's worth, I agree with them. The CPython bytecode interpreter deliberately chooses to gain some extra speed by omitting checks that are only necessary if either something has gone terribly wrong with bytecode generation or you are loading malicious bytecode. LOAD_CONST is a hot path in a very important optimization and there are undoubtedly any number of other issues lurking in the undergrowth here; closing this hole would probably not make loading untrusted CPython bytecode materially safer and it probably would exact a slowdown.

(As a start, if you're even going to consider doing that, it's clear that you need to at least audit the CPython bytecode interpreter to try to find other issues. You probably also want a pre-loading bytecode validation pass.)
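To give a feel for what one small piece of such a validation pass might look like, here is a rough sketch that checks a single property: every LOAD_CONST argument must index inside the constants tuple. The function name `find_bad_const_loads` is hypothetical, and the sketch assumes the two-byte "wordcode" layout that CPython has used since 3.6. It looks at only one opcode; a real validator would have to cover every opcode that indexes a table, plus jump targets, stack depth, and much more.

```python
import dis

LOAD_CONST = dis.opmap["LOAD_CONST"]
EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]


def find_bad_const_loads(co_code, n_consts):
    """Return (offset, index) pairs for LOAD_CONST args outside the consts tuple.

    Assumes 2-byte instructions (opcode, arg), with EXTENDED_ARG prefixes
    supplying the higher bits of the following instruction's argument.
    """
    problems = []
    ext = 0
    for offset in range(0, len(co_code) - 1, 2):
        op = co_code[offset]
        arg = co_code[offset + 1] | ext
        if op == EXTENDED_ARG:
            ext = arg << 8  # carry the high bits into the next instruction
            continue
        ext = 0
        if op == LOAD_CONST and arg >= n_consts:
            problems.append((offset, arg))
    return problems


def sample():
    x = "hello"
    return x


# Compiler-generated bytecode should pass; hand-made bytecode may not.
ok = find_bad_const_loads(sample.__code__.co_code, len(sample.__code__.co_consts))
bad = find_bad_const_loads(bytes([LOAD_CONST, 3]), 2)
print(ok, bad)
```

The check works on the raw bytes and a count rather than on a code object, so suspicious bytecode can be examined before anything is constructed from it.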

One corollary of this is that bytecode rewriting is potentially dangerous (even if you have good intentions). A sufficiently badly rewritten bytecode sequence may not merely malfunction at the Python level, it's possible that it could crash or corrupt the CPython interpreter.

(On the other hand, if you're rewriting bytecode and running the result in production you probably really need whatever your rewriting enables. Test thoroughly, but if you've got to rewrite bytecode, well, you've got to. At least CPython gives you the freedom if you absolutely need it.)

BytecodeIsTrusted written at 02:08:07
