Python bytecode is quite heavily trusted by CPython
I've written before that Python bytecode is not secure, and at the time I said:
[...] I wouldn't be surprised if hand-generating crazy instruction sequences could do things like crash CPython (in fact, I'm pretty confident that doing this is relatively trivial) and lead to arbitrary code execution. [...]
It turns out that I was exactly correct here, and it's actually been both found and demonstrated. Start with this tweet:
Python devs will hate you for it! One weird trick to directly access python's memory from the interpreter: [gist]
There's a brief explanation and then you can read the details of how CPython bytecode can be used to read and write arbitrary memory.
As that article notes, this is not a bug, or at least not something the Python developers consider a bug. And for what it's worth, I agree with them. The CPython bytecode interpreter deliberately chooses to gain some extra speed by omitting checks that are only necessary if either something has gone terribly wrong with bytecode generation or you are loading malicious bytecode. This particular issue is in a hot path in a very important optimization, and there are undoubtedly any number of other issues lurking in the undergrowth here; closing this hole would probably not make loading untrusted CPython bytecode materially safer and it probably would exact a slowdown.
(As a start, if you're even going to consider doing that, it's clear that you need to at least audit the CPython bytecode interpreter to try to find other issues. You probably also want a pre-loading bytecode validation pass, too.)
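To give a rough idea of what one small piece of a pre-loading validation pass might look like, here is a minimal sketch that checks a single property: every instruction that indexes the constants table uses an index that is in range. The function name bad_const_indexes is made up for this illustration, and the decoding assumes the two-byte "wordcode" instruction layout that CPython has used since 3.6. A real validator would need to check far more than this (name and local variable indexes, jump targets, stack depth, and so on), so treat it as an illustration of the idea rather than anything that makes untrusted bytecode safe.

```python
import opcode

EXTENDED_ARG = opcode.opmap["EXTENDED_ARG"]
# Opcodes whose argument is an index into co_consts.
CONST_OPS = frozenset(opcode.hasconst)


def bad_const_indexes(raw, n_consts):
    """Return (offset, arg) for each constant access that is out of range.

    `raw` is the bytecode (for example code.co_code) and `n_consts` is the
    size of the constants table. Assumes two-byte instructions (CPython 3.6+).
    """
    problems = []
    extended = 0
    for offset in range(0, len(raw) - 1, 2):
        op = raw[offset]
        arg = raw[offset + 1] | extended
        if op == EXTENDED_ARG:
            # Carries the high bits of the next instruction's argument.
            extended = arg << 8
            continue
        extended = 0
        if op in CONST_OPS and arg >= n_consts:
            problems.append((offset, arg))
    return problems


# Checking a normally compiled function.
def sample():
    return "hello"

code = sample.__code__
print(bad_const_indexes(code.co_code, len(code.co_consts)))

# A hand-built instruction that asks for constant 5 out of a 2-entry table.
crafted = bytes([opcode.opmap["LOAD_CONST"], 5])
print(bad_const_indexes(crafted, 2))
```

Note that the sketch only inspects the bytes and never executes them, which is the point of doing this before loading.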
One corollary of this is that bytecode rewriting is potentially dangerous (even if you have good intentions). A sufficiently badly rewritten bytecode sequence may not merely malfunction at the Python level, it's possible that it could crash or corrupt the CPython interpreter.
(On the other hand, if you're rewriting bytecode and running the result in production you probably really need whatever your rewriting enables. Test thoroughly, but if you've got to rewrite bytecode, well, you've got to. At least CPython gives you the freedom if you absolutely need it.)
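As a deliberately tame illustration of that freedom, the following sketch swaps one constant in a function's code object using CodeType.replace() (which assumes Python 3.8 or later). The function greet is made up for the example. The rewrite is safe only because the constants tuple keeps its length, so every index already baked into the bytecode still points at a valid entry; as far as I know, nothing in CPython would have objected if the new tuple had been too short, which is exactly the kind of mistake that can go wrong below the Python level.

```python
def greet():
    return "hello"

old = greet.__code__
# Replace one constant with another, keeping the tuple the same length so
# that the indexes used by the existing bytecode remain valid.
new_consts = tuple("goodbye" if c == "hello" else c for c in old.co_consts)
greet.__code__ = old.replace(co_consts=new_consts)

print(greet())  # prints "goodbye"
```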
Written on 21 October 2015.