Wandering Thoughts archives

2013-11-25

From CPython bytecode up to function objects (in brief)

Python bytecode is the low-level heart of (C)Python; it's what the CPython interpreter actually processes in order to run your Python code. The dis module's documentation is the best source of information on examining bytecode and on the bytecodes themselves. But CPython doesn't just run bytecode in isolation. In practice bytecode is always part of some other object, partly because bytecode by itself is not self-contained; it relies on various other things for context.

Bytecode by itself looks like this:

>>> fred.func_code.co_code
'|\x00\x00G|\x01\x00GHd\x00\x00S'

(That's authentic bytecode; you can feed it to dis.dis() to see what it means in isolation.)
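On a current Python 3 the same experiment is spelled a little differently: func_code became __code__, and co_code is a bytes object rather than a str. The following is a minimal sketch with a hypothetical Python 3 stand-in for fred (hand-decoding the Python 2 bytecode above suggests it was roughly `def fred(a, b): print a, b`). The exact instructions dis prints vary between CPython versions.

```python
import dis

# Hypothetical Python 3 stand-in for fred (assumed: prints its two arguments).
def fred(a, b):
    print(a, b)

raw = fred.__code__.co_code  # Python 2 spelling: fred.func_code.co_code
print(type(raw).__name__, len(raw))

# Raw bytes alone: dis can show opcodes and numeric arguments, but it has
# no co_consts/co_names/co_varnames to say what those numbers refer to.
dis.dis(raw)

print("-" * 40)

# Disassembling the function uses its code object, so local names,
# globals and constants show up in the output.
dis.dis(fred)
```

Comparing the two listings shows the point of the next paragraph: the second one can only name things because the code object supplies them.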

I believe that Python bytecode is always found embedded in a code object. Code objects have two sorts of additional attributes: attributes that provide the surrounding context the bytecode itself needs, and attributes that just carry information about the code that's useful for debugging. Examples of context attributes are co_consts, a tuple of constants used in the bytecode, and co_nlocals, the number of local variables that the code uses. Examples of information attributes are co_filename, co_firstlineno, and even co_varnames (which tells you what local variable N is called). Note that the context attributes are absolutely essential; bytecode is not self-contained and cannot be run in isolation without them. Many bytecodes simply do things like, say, 'load constant 0'; if you don't know what constant 0 is, you're not going to get far with the bytecode. It is the code object that tells you this necessary stuff.
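Here is a small Python 3 sketch of that split, using a hypothetical function greet(). The final loop shows the 'load constant 0' point directly: an instruction that uses a constant carries only an index, and the value comes from the code object's co_consts. The precise contents of co_consts and the opcode names differ between CPython versions, so treat the printed output as illustrative.

```python
import dis

def greet(name):
    greeting = "hello"
    return greeting + ", " + name

code = greet.__code__

# Context attributes: the bytecode cannot be interpreted without these.
print("co_consts: ", code.co_consts)
print("co_nlocals:", code.co_nlocals)

# Information attributes: mostly for humans, debuggers and tracebacks.
print("co_varnames:", code.co_varnames)
print("defined in", code.co_filename, "at line", code.co_firstlineno)

# An instruction that takes a constant carries only an index; the value
# itself lives in the code object's co_consts tuple.
for ins in dis.get_instructions(code):
    if ins.opcode in dis.hasconst:
        print(f"{ins.opname} {ins.arg} -> {code.co_consts[ins.arg]!r}")
```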

Most code objects are embedded in function objects (as the func_code attribute). Function objects supply some additional context attributes that are specific to using a piece of code as a function, as well as another collection of information about the function (most prominently func_doc, the function's docstring if any). As it happens, all of the special function attributes are documented reasonably well in the official Python data model, along with code objects and much more.
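In Python 3 the func_* names were renamed to dunder attributes (func_doc is __doc__, func_defaults is __defaults__, func_globals is __globals__, and func_code is __code__). The sketch below, with a hypothetical function scale(), shows some of the context a function adds on top of its code object, and uses types.FunctionType to wrap the very same code object in a second function with a different default argument.

```python
import types

def scale(x, factor=2):
    """Multiply x by factor."""
    return x * factor

print(scale.__doc__)                    # Python 2: scale.func_doc
print(scale.__defaults__)               # Python 2: scale.func_defaults -> (2,)
print(scale.__globals__ is globals())   # Python 2: scale.func_globals

# The same code object, wrapped in a second function object that supplies
# a different name and a different default for factor.
triple = types.FunctionType(scale.__code__, globals(), "triple", (3,))
print(triple(5), scale(5))                # 15 10
print(triple.__code__ is scale.__code__)  # True
```

The defaults live on the function object, not the code object, which is why the two functions can share bytecode yet behave differently.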

(Because I just looked it up, the mysterious func_dict property is another name for a function's __dict__ attribute, which is used to allow you to add arbitrary properties to a function. See PEP 232. Note that functions don't actually have a dictionary object attached to func_dict until you look at it or otherwise need it.)
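A tiny Python 3 illustration of PEP 232 function attributes, with a hypothetical function handler(); Python 3 dropped the func_dict alias, so __dict__ is the only spelling.

```python
def handler():
    pass

print(handler.__dict__)   # {} for a freshly defined function
handler.priority = 10     # an arbitrary attribute, stored in __dict__
print(handler.__dict__)   # {'priority': 10}
print(handler.priority)   # 10
```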

Function objects themselves are frequently found embedded in instance method objects, which are used for methods on classes (whether bound to an object that's an instance of the class or unbound). But that's as far up the stack as I want to go today, and anyway instance method objects only have three attributes (im_func, im_self, and im_class), all of them pretty obvious.

(If you have a class A with a method function fred, A.fred is actually an (unbound) instance method object. The fred function itself is A.fred.im_func, or if you want, A.__dict__["fred"].)
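Python 3 changed this part: unbound methods are gone, so A.fred is just the plain function, and bound methods expose __func__ and __self__ instead of im_func and im_self (there is no direct im_class equivalent). A minimal sketch using the class A and method fred from the text, with a made-up method body:

```python
class A:
    def fred(self):
        return "fred called"

a = A()
bound = a.fred

# Python 3 dropped unbound methods: A.fred is just the plain function.
print(A.fred is A.__dict__["fred"])   # True
# A bound method wraps the function together with the instance.
print(type(bound).__name__)           # method
print(bound.__func__ is A.fred)       # True  (Python 2: im_func)
print(bound.__self__ is a)            # True  (Python 2: im_self)
print(bound())                        # fred called
```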

Note that not all code objects are embedded in function objects. For example, if you call compile(), what you get back is a bare code object. I suspect that module-level code winds up as a code object before getting run by the interpreter, but I haven't looked at the interpreter source to check, so don't quote me on that.
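A short Python 3 sketch of compile() returning a bare code object, which exec() can then run with a plain dict as its globals. The second half shows that a def statement in compiled source ends up as a nested code object in co_consts; the function object itself is only created when that code runs. The source strings are made up for illustration.

```python
import types

source = "x = 6 * 7\nprint(x)\n"
code = compile(source, "<example>", "exec")

print(isinstance(code, types.CodeType))  # True: a bare code object, no function
print(code.co_filename)                  # <example>

namespace = {}
exec(code, namespace)                    # runs the code; prints 42
print(namespace["x"])                    # 42

# A def inside compiled source becomes a nested code object constant.
mod = compile("def f():\n    return 1\n", "<mod>", "exec")
nested = [c for c in mod.co_consts if isinstance(c, types.CodeType)]
print([c.co_name for c in nested])       # ['f']
```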

(This entry was inspired by reading this introduction to the CPython interpreter (via Hacker News), which goes at things from the other direction.)

python/BytecodeToFunctions written at 23:11:13

Track your disk failures

Here is something that we've been learning the hard way: if you have any sort of fileserver environment with a significant number of disks (and maybe even if you don't), you should be tracking all of your disk failures. What this tracking is for is identifying failure patterns in your environment, things like whether certain sorts of disks fail more often, or disks in certain enclosures, and so on.

The very basic information you should record is full details for every disk failure. What I'd record today is when it happened, what sort of disk it was, what enclosure and bay it failed in, and how it failed (read errors, write errors, total death, IO got really slow, or however it happened). You might also want to track SMART attributes and note whether you got any sort of SMART notices beforehand (in the extreme, you'd track SMART notices too). If you can, also record how old the disk was (based on warranty status and perhaps date of manufacture information). This doesn't need any sort of complicated database system; a text file is fine, but you should record the main information in a form that can be extracted with grep and awk.
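One way to get a grep- and awk-friendly log is one line per failure with a fixed set of whitespace-separated fields. The sketch below is a minimal Python illustration under that assumption; the field names, their order, and the example values (host fs3, enclosure encl-B, disk-model-A) are all hypothetical.

```python
# Hypothetical field layout: one failure per line, fields separated by
# single spaces so that awk's default field splitting ($1, $2, ...) works.
FIELDS = ("date", "host", "enclosure", "bay", "model", "failure_mode")

def format_entry(**entry):
    """Return one log line for a disk failure, in FIELDS order."""
    missing = [name for name in FIELDS if name not in entry]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    values = [str(entry[name]) for name in FIELDS]
    for name, value in zip(FIELDS, values):
        # Whitespace inside a value would shift every later awk field.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"field {name!r} must be non-empty with no whitespace")
    return " ".join(values)

line = format_entry(date="2013-11-20", host="fs3", enclosure="encl-B",
                    bay=7, model="disk-model-A", failure_mode="read-errors")
print(line)  # 2013-11-20 fs3 encl-B 7 disk-model-A read-errors
```

With this layout, `awk '{print $5}' failures.log | sort | uniq -c` counts failures per disk model and `grep encl-B failures.log` pulls out everything from one enclosure (failures.log being a hypothetical file name).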

(If you have external disk enclosures, keeping such a log may also raise the issue of consistent identification for them. Locally we have swapped some enclosures around when various things happen, which at the very least means you're going to want to note in the log that 'host X had its enclosure swapped here'.)

Once you have the core information logged, you should also keep track of some aggregated failure information (instead of just having people generate it on demand from the log). I would track at least failures by disk type and failures by enclosure, because these are the two things most likely to show correlated failures (i.e., where one sort of disk is bad or one enclosure has a problem you may have overlooked). Update this aggregated information any time you add something to the log, either by hand or by auto-generating the aggregated stats from the log.
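Auto-generating the aggregates can be as small as a counting pass over the log. This sketch assumes a hypothetical one-line-per-failure format (date, host, enclosure, bay, disk model, failure mode, whitespace-separated, with '#' lines for free-form notes). It counts failures by disk model and by enclosure, keying enclosures by host as well on the assumption that enclosure names are only unique per host; the sample log is made up.

```python
from collections import Counter

# Hypothetical sample log: date host enclosure bay model failure-mode.
SAMPLE_LOG = """\
2013-09-02 fs1 encl-A 3 disk-model-A read-errors
2013-10-14 fs2 encl-C 9 disk-model-B total-death
# fs2 bay 9 had SMART warnings for a week before it died
2013-11-20 fs1 encl-A 5 disk-model-A write-errors
"""

def aggregate(log_text):
    """Count failures by disk model and by (host, enclosure)."""
    by_model = Counter()
    by_enclosure = Counter()
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and free-form notes
        _date, host, enclosure, _bay, model, _mode = line.split()
        by_model[model] += 1
        by_enclosure[(host, enclosure)] += 1
    return by_model, by_enclosure

by_model, by_enclosure = aggregate(SAMPLE_LOG)
for model, count in by_model.most_common():
    print(f"{count:3d}  {model}")
for (host, enclosure), count in by_enclosure.most_common():
    print(f"{count:3d}  {host}/{enclosure}")
```

Running something like this every time the log changes, and saving its output next to the log, is one way to keep the aggregates from drifting out of date.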

(This may sound obvious to some people but trust me, it's an easy thing to overlook or just not think about when you're starting out on a grand fileserver adventure.)

sysadmin/TrackYourDiskFailures written at 00:34:42



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.