Wandering Thoughts archives


Python bytecode is quite heavily trusted by CPython

I've written before that Python bytecode is not secure, and at the time I said:

[...] I wouldn't be surprised if hand-generating crazy instruction sequences could do things like crash CPython (in fact, I'm pretty confident that doing this is relatively trivial) and lead to arbitrary code execution. [...]

It turns out that I was exactly correct here, and it's actually been both found and demonstrated. Start with this tweet:

Python devs will hate you for it! One weird trick to directly access python's memory from the interpreter: [gist]

There's a brief explanation and then you can read the details of how CPython bytecode can be used to read and write arbitrary memory.

As that article notes, this is not a bug, or at least not something the Python developers consider a bug. For what it's worth, I agree with them. The CPython bytecode interpreter deliberately chooses to gain some extra speed by omitting checks that are only necessary if either something has gone terribly wrong with bytecode generation or you are loading malicious bytecode. LOAD_CONST is on a hot path, so skipping the check there is an important optimization, and there are undoubtedly any number of other issues lurking in the undergrowth here. Closing this one hole would probably not make loading untrusted CPython bytecode materially safer, and it would probably exact a slowdown.
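
To make it concrete what kind of check is being skipped, here is a small sketch using the standard dis module. It only inspects a harmless function; the names sample and const_operands are made up for illustration. It lists every instruction whose operand is an index into the code object's co_consts tuple. In compiler-generated bytecode each index is in range, and as I understand the write-up, that is exactly the property the interpreter takes on trust instead of verifying at run time.

```python
import dis


def sample():
    text = "hello, bytecode"
    return text


def const_operands(code):
    """Return (opname, index, value) for each instruction that reads co_consts."""
    found = []
    for instr in dis.get_instructions(code):
        # dis.hasconst lists the opcodes whose operand is an index into co_consts.
        if instr.opcode in dis.hasconst:
            found.append((instr.opname, instr.arg, code.co_consts[instr.arg]))
    return found


for opname, index, value in const_operands(sample.__code__):
    print(opname, index, repr(value))
```

Exactly which opcodes show up depends on the CPython version, which is one more reason to treat this as an illustration and not a reference.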

(As a start, if you're even going to consider doing that, it's clear that you need to at least audit the CPython bytecode interpreter to try to find other issues. You would probably also want a bytecode validation pass that runs before anything is loaded.)
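
As a very rough idea of what one small piece of such a validation pass might look like, the sketch below checks only that constant-loading operands stay inside the constant table. The function name bad_const_loads and its consts parameter are my own inventions, and a real validator would have to cover far more (stack depth, jump targets, name and local indices, and so on). Rather than build a genuinely malformed code object, the demonstration checks a normal function against a deliberately truncated constant table.

```python
import dis
import types


def bad_const_loads(code, consts=None):
    """Return (co_name, offset, opname, index) for out-of-range constant operands.

    `consts` defaults to code.co_consts; passing another tuple simulates a
    tampered constant table without constructing a dangerous code object.
    """
    if consts is None:
        consts = code.co_consts
    problems = []
    for instr in dis.get_instructions(code):
        if instr.opcode in dis.hasconst and not (0 <= instr.arg < len(consts)):
            problems.append((code.co_name, instr.offset, instr.opname, instr.arg))
    # Nested functions, lambdas, and comprehensions are stored as code
    # objects inside co_consts, so check those too.
    for const in consts:
        if isinstance(const, types.CodeType):
            problems.extend(bad_const_loads(const))
    return problems


def greet():
    message = "hello"
    return message


clean = bad_const_loads(greet.__code__)                 # compiler output
truncated = bad_const_loads(greet.__code__, consts=())  # simulated tampering
print(len(clean), len(truncated))
```

Passing this one check says nothing about whether the bytecode is safe overall; it only rules out one specific class of bad operand.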

One corollary of this is that bytecode rewriting is potentially dangerous, even if you have good intentions. A sufficiently badly rewritten bytecode sequence may not merely malfunction at the Python level; it could crash or corrupt the CPython interpreter itself.
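
For illustration, here is about the most benign rewrite I can think of, assuming Python 3.8 or later for code.replace(). It swaps one constant for another and leaves the instructions alone, so every operand still points at a valid slot. The names greet and patched_greet are hypothetical. The thing to notice is that nothing in this process checks the new constant table against the bytecode that uses it; keeping the two consistent is entirely the rewriter's job.

```python
import types


def greet():
    message = "hello"
    return message


old = greet.__code__

# Keep the tuple the same length so every existing constant index stays valid.
new_consts = tuple("goodbye" if c == "hello" else c for c in old.co_consts)
patched_code = old.replace(co_consts=new_consts)

# Wrap the rewritten code object in a new function sharing greet's globals.
patched_greet = types.FunctionType(patched_code, greet.__globals__, "patched_greet")

print(greet(), patched_greet())
```

Had the new tuple been shorter than the old one, the code object would no longer match its own bytecode, which is the sort of mistake that can take you out of Python-level errors and into interpreter-level trouble.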

(On the other hand, if you're rewriting bytecode and running the result in production you probably really need whatever your rewriting enables. Test thoroughly, but if you've got to rewrite bytecode, well, you've got to. At least CPython gives you the freedom if you absolutely need it.)

python/BytecodeIsTrusted written at 02:08:07

