CPython's trust of bytecode is not a security problem
Yesterday I wrote about how CPython trusts bytecode so much that you can use it to read or write arbitrary memory. In comments, Ewen McNeil had a typical reaction to this:
"It appears this means if you can get arbitrary Python execution (eg, unwisely trusting YAML, XML, pickle, etc...), then you can probably get arbitrary memory read/write in the Python process, which is a fairly short step away from arbitrary assembly code execution."
This is true, but it also misunderstands the security situation
of Python bytecode. Even without this issue, it is game over
in general if an attacker can load arbitrary bytecode into
your Python process. The obvious weakness is that ctypes is part of the standard
library these days and it can also be used to give you this level of
access to memory without any need to corrupt the bytecode interpreter.
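
As a minimal sketch of the ctypes point, the snippet below reads and then overwrites memory through nothing more than an integer address. It deliberately targets a buffer that it allocates itself so that it is harmless to run; the names buf and addr are purely illustrative, and the same calls accept any address the process can reach.

```python
import ctypes

# Allocate a small, harmless buffer of our own to poke at.
buf = ctypes.create_string_buffer(b"hello")

# addressof() gives the raw address as an ordinary Python int.
addr = ctypes.addressof(buf)

# Read five bytes directly from that address.
before = ctypes.string_at(addr, 5)

# Overwrite the first byte through the raw address, bypassing buf entirely.
ctypes.memmove(addr, b"J", 1)
after = buf.value

print(before, after)  # b'hello' b'Jello'
```

Nothing in these calls checks that the address belongs to a ctypes object; pointing them at an invalid address will typically crash the interpreter rather than raise an exception.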
But even without ctypes an attacker has plenty of options to achieve
binary code execution. They can transfer a binary, write it out, and
then execute it. They can transfer a native code Python module (in
shared library form), manipulate the Python load path, and then
import it (which gets them code execution even in the Python process).
They can run other existing vulnerable binaries on your system and
exploit their bugs.
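
The load path route can be sketched in a few lines. This is an illustration under stated assumptions rather than a real attack: a tiny pure-Python file stands in for the compiled native module, and the function name import_dropped_module and the module name dropped_payload_demo are hypothetical. The mechanism (write a file somewhere writable, put that directory on sys.path, import) is the same either way.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_dropped_module(name, source):
    """Write a module into a fresh directory, then import it via sys.path."""
    drop_dir = Path(tempfile.mkdtemp())
    # A real attacker would write a compiled extension module here;
    # a plain .py file is used as a harmless stand-in.
    (drop_dir / f"{name}.py").write_text(source)
    # Put the drop directory at the front of the load path.
    sys.path.insert(0, str(drop_dir))
    # Make sure the import machinery notices the newly created file.
    importlib.invalidate_caches()
    return importlib.import_module(name)


mod = import_dropped_module(
    "dropped_payload_demo",
    "MARKER = 'ran inside the process'\n",
)
print(mod.MARKER)  # ran inside the process
```

The module body runs at import time inside the importing process, which is why an imported native module gets code execution there without starting any new program.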
You can certainly try to stop this by creating a Python environment
that blocks access to the Python features necessary for this. The
problem is that there have proven to be many features that can be
exploited to help here and many paths through Python to reach them.
The runtime environment of Python is a complex, intertangled thing,
and all attackers need is one crack that lets them bootstrap a
reference to, say, the os module. And there are a lot of potential
cracks.
(Python used to have a restricted execution module. As you can see, it was disabled in Python 2.3 because it had basically unfixable holes.)
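
To make the idea of a single crack concrete, here is a sketch of one widely known path from an innocuous object to the os module without ever writing an import statement. It assumes CPython and assumes that at least one loaded class has an __init__ written in Python, which is normally the case; the function name find_os_without_import is illustrative only.

```python
def find_os_without_import():
    """Walk from an empty tuple to the os module with no import statement."""
    # Any object leads back to `object` through its class hierarchy.
    root = ().__class__.__base__
    for cls in root.__subclasses__():
        init = vars(cls).get("__init__")
        # Only functions written in Python carry a __globals__ dict.
        globs = getattr(init, "__globals__", None)
        if not isinstance(globs, dict):
            continue
        builtins_ns = globs.get("__builtins__")
        if builtins_ns is None:
            continue
        # __builtins__ may be either a dict or the builtins module itself.
        if isinstance(builtins_ns, dict):
            importer = builtins_ns["__import__"]
        else:
            importer = builtins_ns.__import__
        return importer("os")
    return None


found = find_os_without_import()
print(found.__name__)  # os
```

Blocking this particular route does not help much, since the same walk works through many other classes and attributes.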
The simple truth is Python is not a safe execution environment for untrusted code. The only important thing about bytecode being able to read and write arbitrary memory all by itself is that it shows how impossible the job of securing CPython is. Even if you managed to reliably cut off all access to modules and code that could be used to escape your sandbox at the Python level, you would have to audit and fix the innards of the bytecode interpreter itself to be safe.
This is why I say that this trust of bytecode is not a security problem; it doesn't really make the situation any worse than it already is. It's just an amusingly baroque alternate path to a security issue that is already there in general.