Python bytecode is quite heavily trusted by CPython

October 21, 2015

I've written before that Python bytecode is not secure, and at the time I said:

[...] I wouldn't be surprised if hand-generating crazy instruction sequences could do things like crash CPython (in fact, I'm pretty confident that doing this is relatively trivial) and lead to arbitrary code execution. [...]

It turns out that I was exactly correct here, and it's actually been both found and demonstrated. Start with this tweet:

Python devs will hate you for it! One weird trick to directly access python's memory from the interpreter: [gist]

There's a brief explanation and then you can read the details of how CPython bytecode can be used to read and write arbitrary memory.

As that article notes, this is not a bug, or at least not something the Python developers consider a bug. And for what it's worth, I agree with them. The CPython bytecode interpreter deliberately chooses to gain some extra speed by omitting checks that are only necessary if either something has gone terribly wrong with bytecode generation or you are loading malicious bytecode. LOAD_CONST is on a hot path, so leaving its operand unchecked is a very important optimization, and there are undoubtedly any number of other issues lurking in the undergrowth here. Closing this one hole would probably not make loading untrusted CPython bytecode materially safer, and it probably would exact a slowdown.
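To make the LOAD_CONST point concrete, here is a minimal sketch that uses the standard dis module to show that the instruction's operand is nothing more than an integer index into the code object's co_consts tuple. The function name sample is made up for illustration, and the exact instruction sequence varies between CPython versions.

```python
import dis


def sample():
    # A string constant; the compiler stores it in co_consts and emits
    # a LOAD_CONST instruction that refers to it by position.
    message = "hello, bytecode"
    return message


code = sample.__code__

# Collect (offset, raw operand, resolved constant) for every LOAD_CONST.
const_loads = [
    (ins.offset, ins.arg, ins.argval)
    for ins in dis.get_instructions(code)
    if ins.opname == "LOAD_CONST"
]

for offset, index, value in const_loads:
    print(f"offset {offset}: LOAD_CONST {index} -> co_consts[{index}] = {value!r}")
```

The compiler only ever emits indexes that are in range, which is why the interpreter can get away with trusting them; nothing in the bytecode itself guarantees it.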

(As a start, if you're even going to consider doing that, it's clear that you need to at least audit the CPython bytecode interpreter to try to find other issues. You probably also want a pre-loading bytecode validation pass.)
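As a rough illustration of what one check in such a validation pass might look like, the sketch below walks a code object's raw bytecode and reports any LOAD_CONST whose index falls outside co_consts. Everything here is hypothetical (bad_const_loads and sample are names invented for this example), it assumes the two-bytes-per-instruction wordcode format used since CPython 3.6, and it covers exactly one opcode, so it is nowhere near enough to make untrusted bytecode safe.

```python
import dis
import types

LOAD_CONST = dis.opmap["LOAD_CONST"]
EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]


def bad_const_loads(code):
    """Return (function name, offset, index) for out-of-range LOAD_CONSTs.

    Hypothetical helper: checks only LOAD_CONST, nothing else.
    """
    problems = []
    raw = code.co_code
    ext = 0  # high bits accumulated from preceding EXTENDED_ARG prefixes
    for offset in range(0, len(raw), 2):
        op = raw[offset]
        arg = raw[offset + 1] | ext
        ext = (arg << 8) if op == EXTENDED_ARG else 0
        if op == LOAD_CONST and arg >= len(code.co_consts):
            problems.append((code.co_name, offset, arg))
    # Nested functions, lambdas and comprehensions live in co_consts too.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            problems.extend(bad_const_loads(const))
    return problems


def sample():
    message = "hello, bytecode"
    return message


# Build a deliberately inconsistent code object by dropping its constants.
# It is only inspected here and must never be executed.
tampered = sample.__code__.replace(co_consts=())

print(bad_const_loads(sample.__code__))
print(bad_const_loads(tampered))
```

The checker reads co_code directly rather than going through dis.get_instructions, since the latter tries to look the constant up itself and so is not a good fit for code objects that may be inconsistent.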

One corollary of this is that bytecode rewriting is potentially dangerous, even if you have good intentions. A sufficiently badly rewritten bytecode sequence may not merely malfunction at the Python level; it could crash or corrupt the CPython interpreter itself.

(On the other hand, if you're rewriting bytecode and running the result in production you probably really need whatever your rewriting enables. Test thoroughly, but if you've got to rewrite bytecode, well, you've got to. At least CPython gives you the freedom if you absolutely need it.)

Comments on this page:

By Ewen McNeill at 2015-10-21 06:10:33:

It appears this means if you can get arbitrary Python execution (eg, unwisely trusting YAML, XML, pickle, etc...), then you can probably get arbitrary memory read/write in the Python process, which is a fairly short step away from arbitrary assembly code execution.

I guess one could take the point of view that it's "not a bug" by itself, but it's at least a very helpful stepping stone in chaining exploits together. At minimum anything exposed to untrusted data probably needs to hide all the Python means of creating arbitrary bytecode... And probably needs other process protection mechanisms in place too.


By cks at 2015-10-22 00:20:51:

The long answer is in BytecodeIsTrustedII, but the short answer is that if you let people give you Python code to run you have already given them the keys to the kingdom. That mere bytecode can be used to access arbitrary memory just shows how hard the job of creating a secure Python environment would be.



Last modified: Wed Oct 21 02:08:07 2015