Wandering Thoughts archives


What's happening when you change True and False in Python 2

Today I read Giedrius Statkevičius' What is Actually True and False in Python? (via), which talks about the history of how True and False aren't fixed constants until Python 3 and thus how you can change them in Python 2. But what does it really mean to do this? So let's dive right in to the details in an interactive Python 2 session.

As seen in Statkevičius' article, reversing True and False is pretty straightforward:

>>> int(True)
1
>>> True, False = False, True
>>> int(True)
0

Does this change what boolean comparisons actually return, though?

>>> int((0 == 0) == True)
0
>>> (0 == 0) == True
False
>>> (0 == 0) == False
True
>>> (0 == 0) is False
True

It doesn't, and this is our first clue to what is going on. We haven't changed the Python interpreter's view of what True and False are, or the actual bool objects that are True and False; we've simply changed what the names True and False refer to. Basically we've done 'fred, barney = False, True' but (re)using names that code expects to have a certain meaning. Our subsequent code is using our redefined True and False names because Python looks up what names mean dynamically, as the code runs, so if you rebind a name that rebinding takes immediate effect.
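In Python 3, True and False are keywords and can't be rebound, so here's a minimal Python 3 sketch of the same two points using ordinary names. fred and barney come from the paragraph above; answer and check() are hypothetical names made up for illustration. Binding a name to a bool doesn't touch the bool object, and a function looks up global names each time it runs, so a rebinding takes effect immediately.

```python
# Rebinding names never changes the two bool objects themselves.
fred, barney = False, True
assert (0 == 0) is barney      # comparisons still return the real True object
assert (0 == 0) is not fred

# Global names are looked up when the code runs, not when it is defined.
def check():
    return answer              # 'answer' is resolved in module globals on each call

answer = 1
first = check()                # 1
answer = 2
second = check()               # 2: the rebinding is seen immediately
```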

This is also why the truth values being printed are correct; the bool objects themselves are printing out their truth value, and since that truth value hasn't changed we get the results we expect:

>>> True, False
(False, True)

But what names have we changed?

>>> (0 == 0) is __builtins__.True
True
>>> True is __builtins__.False
True
>>> globals()["True"]
False

This tells us the answer: our assignment added True and False global variables to our module's namespace, bound to the builtin False and True objects respectively. This means that our redefined True and False are only visible in our own namespace. Code in other modules is unaffected, as we've only shadowed the builtin names inside our own module.

(An interactive Python session has its own little module-level namespace.)
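The same shadowing can be sketched in Python 3 with a builtin that can still be rebound there, such as abs(). This is a minimal illustration, not anything from the original session: make_module() is a hypothetical helper that builds throwaway modules from source text, and the modules 'a' and 'b' are made up. Only the module with its own global abs sees the shadowed version.

```python
import types

def make_module(name, source):
    """Build a throwaway module from source text (illustration only)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)   # exec inserts __builtins__ for us
    return module

# Module 'a' shadows the builtin abs() with a module-level global.
a = make_module("a", "abs = lambda x: 'shadowed'\ndef f(x):\n    return abs(x)\n")
# Module 'b' doesn't, so its abs() lookup falls through to the builtins.
b = make_module("b", "def f(x):\n    return abs(x)\n")

print(a.f(-3))   # shadowed
print(b.f(-3))   # 3
```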

To see that this is true, we need a helper module, tst.py, with a single function:

def istrue(val):
    if val == True:
        print "Yes"
    else:
        print "No"


>>> import tst
>>> tst.istrue(True)
No
>>> tst.istrue(0 == 0)
Yes

But we don't have to restrict ourselves to just our own module. So let's redefine the builtin versions instead, which will have a global effect. First, let's clear out our 'module' versions of those names:

>>> del True; del False

Then redefine them globally:

>>> __builtins__.True, __builtins__.False = (0 == 1), (0 == 0)
>>> (0 == 0) is True
False

We can verify that these are no longer in our own namespace:

>>> globals()["True"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'True'

We reuse our helper module to show that we've now made a global change:

>>> tst.istrue(0 == 0)
No

But of course:

>>> tst.istrue(True)
Yes

Changing __builtins__.True has changed the True that all modules see, unless they deliberately shadow the builtin True with their own module-level True. Unlike before, True now means the same thing in our interactive session and in the tst module.
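Here's a hedged Python 3 sketch of both halves of that rule, using a made-up name SHARED instead of True. One caveat: in CPython, `__builtins__` is the builtins module only in `__main__` and may be a plain dict elsewhere, so the portable spelling is `import builtins` in Python 3 (`import __builtin__` in Python 2). make_module() is the same hypothetical helper as before.

```python
import builtins  # Python 2 calls this module __builtin__
import types

def make_module(name, source):
    """Build a throwaway module from source text (illustration only)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

SOURCE = "def get():\n    return SHARED\n"   # SHARED is a made-up name
a = make_module("a", SOURCE)
b = make_module("b", SOURCE)

builtins.SHARED = "from builtins"   # visible to every module...
b.SHARED = "b's own"                # ...unless it has its own global
try:
    results = (a.get(), b.get())
finally:
    del builtins.SHARED             # don't leave the change lying around

print(results)   # ('from builtins', "b's own")
```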

Since modules are mutable, we can actually fix tst.istrue from the outside:

>>> tst.True = (0 == 0)
>>> tst.istrue(0 == 0)
Yes
>>> tst.True
True

Now the tst module has its own module-global True name with the correct value and tst.istrue works correctly again. However, we're back to a difference in what True means in different modules:

>>> tst.istrue(True)
No
>>> False is tst.True
True

(Since our interactive session's 'module' has no name binding for False, it uses the binding in the builtins, which we made point to the True boolean object. However tst has its own name binding for True, which also points to the True boolean object. Hence our False is tst's True. Yes, this gets confusing fast.)

As noted in Statkevičius' article, Python only ever has two bool objects, one True and one False. These objects are immutable (and known by the CPython interpreter), and so we can't change the actual truth value of comparisons, what gets printed by the bool objects, and so on. All we can do is change what the names True and False mean at various levels; in a function (not shown here), for an entire module, or globally through the builtins.
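A quick Python 3 check (it behaves the same in Python 2) that bool() and comparisons only ever hand back those two existing objects, and that bool is a subclass of int, which is why int(True) is 1:

```python
# bool() and comparisons only ever return the two existing objects.
results = [bool(x) for x in range(10)] + [1 == 1, 1 == 2, not 0]
assert all(r is True or r is False for r in results)
assert len({id(r) for r in results}) == 2   # exactly two distinct objects

# bool is a subclass of int.
assert issubclass(bool, int)
assert True == 1 and False == 0
```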

(Technically there are a few more namespaces we could fiddle with.)

As a side note, we can't subclass bool to make a thing that is considered a boolean yet has different behavior. If we try it, CPython 2 tells us:

TypeError: Error when calling the metaclass bases
    type 'bool' is not an acceptable base type

This is an explicitly coded restriction; the C-level bool type doesn't allow itself to be subclassed.

(Technically it's coded by omitting a 'this can be a base type' flag from the C-level type flags for the bool type, but close enough. There are a number of built-in CPython types that can't be subclassed because they omit this flag.)
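On CPython you can see that flag from Python. This sketch assumes the flag is Py_TPFLAGS_BASETYPE, defined as `1 << 10` in CPython's Include/object.h, and reads it through the CPython-specific `__flags__` attribute of type objects; can_subclass() is a hypothetical helper. It runs on Python 3, where the error message is the same minus the metaclass preamble.

```python
Py_TPFLAGS_BASETYPE = 1 << 10   # value from CPython's Include/object.h

def can_subclass(tp):
    """True if the C-level type allows itself to be used as a base class."""
    return bool(tp.__flags__ & Py_TPFLAGS_BASETYPE)

error = None
try:
    class MyBool(bool):
        pass
except TypeError as err:
    error = str(err)

print(can_subclass(int), can_subclass(bool))   # True False
print(error)   # type 'bool' is not an acceptable base type
```

NoneType (`type(None)`) is another built-in type that omits the flag.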

We can change the True and False names to point to non-bool objects if we want. If you take this far enough, you can arrange to get interesting errors and perhaps spectacular explosions:

>>> __builtins__.False = set("a joke")
>>> (0 != 0) == False
False
>>> d = {}
>>> d[False] = False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'

For maximum fun, arrange for True and False to be objects that are deliberately uncomparable and can't be converted to booleans (in Python 2, this requires raising an error in your __eq__ and __nonzero__ methods).
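A minimal sketch of such an object; Hostile and explodes() are hypothetical names. It defines both `__bool__` (Python 3) and `__nonzero__` (Python 2) so it misbehaves under either version. In Python 3, defining `__eq__` without `__hash__` also makes instances unhashable, much like the set above.

```python
class Hostile(object):
    """Refuses both comparison and truth testing (illustration only)."""
    def __eq__(self, other):
        raise TypeError("Hostile objects can't be compared")
    __ne__ = __eq__

    def __bool__(self):            # Python 3 spelling
        raise TypeError("Hostile objects have no truth value")
    __nonzero__ = __bool__         # Python 2 spelling

def explodes(action):
    """Return True if calling action() raises TypeError."""
    try:
        action()
    except TypeError:
        return True
    return False

h = Hostile()
print(explodes(lambda: h == False))   # True
print(explodes(lambda: bool(h)))      # True
print(explodes(lambda: {h: 1}))       # True: unhashable in Python 3
```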

(I've used False here because many objects in Python 2 are considered to be boolean True. In fact, by default almost all objects are; you have to go out of your way to make something False.)
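To illustrate, a plain class instance is true by default. Falseness has to be arranged explicitly, either through `__len__` returning 0 or through `__bool__` (`__nonzero__` in Python 2). The class names here are made up:

```python
class Plain(object):
    pass

class Empty(object):
    def __len__(self):
        return 0                   # truth testing falls back to __len__

class Falsy(object):
    def __bool__(self):            # Python 3 spelling
        return False
    __nonzero__ = __bool__         # Python 2 spelling

print(bool(Plain()), bool(Empty()), bool(Falsy()))   # True False False
```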

python/ChangingTrueDetails written at 20:46:42

What ZFS gang blocks are and why they exist

If you read up on ZFS internals, sooner or later you will run across references to 'gang blocks'. For instance, they came up when I talked about what's in a DVA, where DVAs have a flag to say that they point to a gang block instead of a regular block. Gang blocks are vaguely described as being a way of fragmenting a large logical block into a bunch of separate sub-blocks.

A more on-point description can be found in the (draft) ZFS on-disk specification (PDF, via) or the source code comments about them in zio.c. I'll selectively quote from zio.c because it's easier to follow:

A gang block is a collection of small blocks that looks to the DMU like one large block. When zio_dva_allocate() cannot find a block of the requested size, due to either severe fragmentation or the pool being nearly full, it calls zio_write_gang_block() to construct the block from smaller fragments.

A gang block consists of a gang header and up to three gang members. The gang header is just like an indirect block: it's an array of block pointers. It consumes only one sector and hence is allocatable regardless of fragmentation. The gang header's bps point to its gang members, which hold the data.


Gang blocks can be nested: a gang member may itself be a gang block. Thus every gang block is a tree in which root and all interior nodes are gang headers, and the leaves are normal blocks that contain user data. The root of the gang tree is called the gang leader.
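The tree structure that comment describes can be modeled in a few lines. This is a conceptual Python sketch, not ZFS code: Leaf, GangHeader, logical_size() and header_count() are hypothetical names, and sizes are just numbers. It shows how the DMU sees one block of the total size while the tree holds gang headers as interior nodes and normal blocks as leaves.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    size: int                      # bytes of user data in a normal block

@dataclass
class GangHeader:
    members: List["Node"]          # the header's block pointers

    def __post_init__(self):
        if not 1 <= len(self.members) <= 3:
            raise ValueError("a gang header holds one to three members")

Node = Union[Leaf, GangHeader]

def logical_size(node: Node) -> int:
    """Total user data under this node, as the DMU would see it."""
    if isinstance(node, Leaf):
        return node.size
    return sum(logical_size(m) for m in node.members)

def header_count(node: Node) -> int:
    """Gang headers in the tree: the gang leader plus interior nodes."""
    if isinstance(node, Leaf):
        return 0
    return 1 + sum(header_count(m) for m in node.members)

KB = 1024
tree = GangHeader([Leaf(64 * KB),
                   GangHeader([Leaf(16 * KB), Leaf(16 * KB)])])
print(logical_size(tree) // KB, header_count(tree))   # 96 2
```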

A 'gang header' contains three full block pointers, some padding, and then a trailing checksum. The whole thing is sized so that it takes up only a single 512-byte sector; I believe this means that gang headers in ashift=12 vdevs waste a bunch of space, or at least leave the remaining 3.5 Kb unused.
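The size arithmetic can be checked with ctypes. This layout mirrors my understanding of the zio_gbh_phys_t and zio_eck_t definitions in the ZFS headers: a 128-byte block pointer, 11 words of filler, and a 40-byte trailer holding a magic number plus a 256-bit checksum. Treat the exact field names and the filler count as assumptions. Block pointers are modeled as opaque words.

```python
import ctypes

class Blkptr(ctypes.Structure):
    # a ZFS block pointer is 128 bytes; modeled here as opaque words
    _fields_ = [("words", ctypes.c_uint64 * 16)]

class ZioEck(ctypes.Structure):
    _fields_ = [("zec_magic", ctypes.c_uint64),
                ("zec_cksum", ctypes.c_uint64 * 4)]   # 256-bit checksum

class ZioGbhPhys(ctypes.Structure):
    _fields_ = [("zg_blkptr", Blkptr * 3),             # three gang members
                ("zg_filler", ctypes.c_uint64 * 11),   # padding
                ("zg_tail", ZioEck)]                    # trailing checksum

def unused_bytes(ashift):
    """Bytes left over when a gang header gets a 2**ashift allocation."""
    return (1 << ashift) - ctypes.sizeof(ZioGbhPhys)

print(ctypes.sizeof(ZioGbhPhys))          # 512
print(unused_bytes(9), unused_bytes(12))  # 0 3584
```

3584 bytes is the 3.5 Kb left unused on an ashift=12 vdev.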

To understand more about gang blocks, we need to understand why they're needed. As far as I know, this comes down to the fact that ZFS files only ever have a single (logical) block size. If a file is less than the recordsize (usually 128 Kb), it's in a single logical block of the appropriate power of two size; once it hits recordsize or greater, it's in a number of recordsize'd blocks. This means that writing new data to most files normally requires allocating some size of contiguous block (up to 128 Kb, but less if the data you're writing is compressible).

(I believe that there is also metadata that's always unfragmented and may be in blocks up to 128 Kb.)

However, ZFS doesn't guarantee that a pool always has free 128 Kb chunks available, or in fact any particular size of chunk. Instead, free space can be fragmented; you might be unfortunate enough to have many gigabytes of free space, but all of it in fragments that were, say, 32 Kb and smaller. This is where ZFS needs to resort to gang blocks, basically in order to lie to itself about still writing single large blocks.

(Before I get too snarky, I should say that this lie probably simplifies the life of higher level code a fair bit. Rather than have a whole bunch of data and metadata handling code that has to deal with all sorts of fragmentation, most of ZFS can ignore the issue and then lower level IO code quietly makes it all work. Actually using gang blocks should be uncommon.)

All of this explains why the gang block bit is a property of the DVA, not of anything else. The DVA is where space gets allocated, so the DVA is where you may need to shim in a gang block instead of getting a contiguous chunk of space. Since different vdevs generally have different levels of fragmentation, whether or not you have a contiguous chunk of the necessary size will often vary from vdev to vdev, which is the DVA level again.

One quiet complication created by gang blocks is that according to comments in the source code, the gang members may not wind up on the same vdev as the gang header (although ZFS tries to keep them on the same vdev because it makes life easier). This is different from regular blocks, which are always only on a single vdev (although they may be spread across multiple disks if they're on a raidz vdev).

Gang blocks have some space overhead compared to regular blocks (in addition to being more fragmented on disk), but how much depends heavily on the situation. Because each gang header can only point to three gang member blocks, you may wind up needing multiple levels of nested gang blocks if you have an unlucky combination of fragmented free space and a large block to write. As an example, suppose that you need to write a 128 Kb block and the pool only has 32 Kb chunks free. 128 Kb requires four 32 Kb chunks, which is more than a single gang header can point to, so you need a nested gang block: for instance, the top-level gang header can point to two 32 Kb chunks plus a second gang header, which in turn points to the remaining two 32 Kb chunks. Your overhead is two sectors, one for each gang header. If the pool were more heavily fragmented, you'd need more levels of nested gang blocks and the overhead would go up. If the pool had had a single 64 Kb chunk left, you could have written the 128 Kb with two 32 Kb chunks and the 64 Kb chunk, and thus not needed the nested gang block with its additional gang header.
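The reasoning in that example can be turned into a toy planner. This is a hypothetical sketch that follows the logic of the paragraph, not ZFS's actual allocator; the real zio_write_gang_block() chooses member sizes its own way. Sizes are in Kb. Gang headers are assumed to always be allocatable, since they need only one sector, so they don't come out of the free list.

```python
def plan_gang(size, free):
    """Toy gang-block planner (hypothetical; not ZFS's real algorithm).

    size: amount of data to write; free: list of free contiguous chunk
    sizes in the same units, consumed as chunks are used.
    Returns (gang_headers_needed, leaf_chunk_sizes).
    """
    fits = [c for c in free if c >= size]
    if fits:                       # a contiguous chunk exists: no gang block
        chunk = min(fits)          # best fit
        free.remove(chunk)
        return 0, [chunk]

    headers, leaves, remaining = 1, [], size   # count this gang header
    for slot in range(3):          # a gang header has three block pointers
        if remaining == 0:
            break
        fits = [c for c in free if c >= remaining]
        if fits:                   # the rest fits in one chunk
            chunk = min(fits)
            free.remove(chunk)
            leaves.append(chunk)
            remaining = 0
        elif slot < 2:             # take the biggest fragment available
            if not free:
                raise RuntimeError("out of free space")
            chunk = max(free)
            free.remove(chunk)
            leaves.append(chunk)
            remaining -= chunk
        else:                      # last pointer: nest another gang block
            sub_headers, sub_leaves = plan_gang(remaining, free)
            headers += sub_headers
            leaves += sub_leaves
            remaining = 0
    return headers, leaves

print(plan_gang(128, [32, 32, 32, 32]))   # (2, [32, 32, 32, 32])
print(plan_gang(128, [64, 32, 32]))       # (1, [64, 32, 32])
```

With only 16 Kb chunks free, the same 128 Kb write needs four gang headers in this model. That is the "more fragmentation, more overhead" effect.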

(Because ZFS only uses a gang block when the space required isn't available in a contiguous block, gang blocks are absolutely sure to be scattered on the disk.)

PS: As far as I can see, a pool doesn't keep any statistics on how many times gang blocks have been necessary or how many there currently are in the pool.

solaris/ZFSGangBlocks written at 02:55:39
