2018-01-06
What's happening when you change True and False in Python 2
Today I read Giedrius Statkevičius' What is Actually True and False in Python? (via), which talks about the history of how True and False only became fixed constants in Python 3 and thus how you can change them in Python 2. But what does it really mean to do this? Let's dive right into the details in an interactive Python 2 session.
As seen in Statkevičius' article, reversing True and False is pretty straightforward:
>>> int(True)
1
>>> True, False = False, True
>>> int(True)
0
Does this change what boolean comparisons actually return, though?
>>> int((0 == 0) == True)
0
>>> (0 == 0) == True
False
>>> (0 == 0) == False
True
>>> (0 == 0) is False
True
It doesn't, and this is our first clue to what is going on. We haven't changed the Python interpreter's view of what True and False are, or the actual bool objects that are True and False; we've simply changed what the names True and False refer to. Basically we've done 'fred, barney = False, True' but (re)using names that code expects to have a certain meaning. Our subsequent code is using our redefined True and False names because Python looks up what names mean dynamically, as the code runs, so if you rebind a name, that rebinding takes immediate effect.
This is also why the truth values being printed are correct; the bool objects themselves are printing out their truth value, and since that truth value hasn't changed, we get the results we expect:
>>> True, False
(False, True)
But what names have we changed?
>>> (0 == 0) is __builtins__.True
True
>>> True is __builtins__.False
True
>>> globals()["True"]
False
This tells us the answer, which is that we've added True and False global variables in our module's namespace by copying False and True values from the global builtins. This means that our redefined True and False are only visible in our own namespace. Code in other modules will be unaffected, as we've only shadowed the builtin names inside our own module.
(An interactive Python session has its own little module-level namespace.)
To see that this is true, we need a tst helper module with a single function:
def istrue(val):
    if val == True:
        print "Yes"
    else:
        print "No"
Then:
>>> import tst
>>> tst.istrue(True)
No
>>> tst.istrue(0 == 0)
Yes
But we don't have to restrict ourselves to just our own module. So let's redefine the builtin versions instead, which will have a global effect. First, let's clear out our 'module' versions of those names:
>>> del True; del False
Then redefine them globally:
>>> __builtins__.True, __builtins__.False = (0 == 1), (0 == 0)
>>> (0 == 0) is True
False
We can verify that these are no longer in our own namespace:
>>> globals()["True"] Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 'True'
We reuse our helper module to show that we've now made a global change:
>>> tst.istrue(0 == 0)
No
But of course:
>>> tst.istrue(True)
Yes
Changing __builtins__.True has changed the True that all modules see, unless they deliberately shadow the builtin True with their own module-level True. Unlike before, True now means the same thing in our interactive session and in the tst module.
Since modules are mutable, we can actually fix tst.istrue from the outside:
>>> tst.True = (0 == 0)
>>> tst.istrue(0 == 0)
Yes
>>> tst.True
True
Now the tst module has its own module-global True name with the correct value and tst.istrue works correctly again. However, we're back to a difference in what True means in different modules:
>>> tst.istrue(True)
No
>>> False is tst.True
True
(Since our interactive session's 'module' has no name binding for False, it uses the binding in the builtins, which we made point to the True boolean object. However, tst has its own name binding for True, which also points to the True boolean object. Hence our False is tst's True. Yes, this gets confusing fast.)
As noted in Statkevičius' article, Python only ever has two bool objects, one True and one False. These objects are immutable (and known by the CPython interpreter), and so we can't change the actual truth value of comparisons, what gets printed by the bool objects, and so on. All we can do is change what the names True and False mean at various levels: in a function (sketched below), for an entire module, or globally through the builtins.
(Technically there's a few more namespaces we could fiddle with.)
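Since the function-level case isn't demonstrated above, here is a minimal Python 2 sketch of it; the function name is my own arbitrary choice, and this assumes a fresh session where we haven't already rebound True or False anywhere else:

def confused():
    # 'True' here is just an ordinary local name, so rebinding it only
    # affects this one function.
    True = (0 == 1)
    if (0 == 0) == True:
        print "comparisons themselves changed"
    else:
        print "only this function's True changed"

confused()                 # prints "only this function's True changed"
print (0 == 0) == True     # still True; the module and builtin names are untouched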
As a side note, we can't subclass bool to make a thing that is considered a boolean yet has different behavior. If we try it, CPython 2 tells us:
TypeError: Error when calling the metaclass bases
    type 'bool' is not an acceptable base type
This is an explicitly coded restriction; the C-level bool type doesn't allow itself to be subclassed.
(Technically it's coded by omitting a 'this can be a base type' flag from the C-level type flags for the bool type, but close enough. There are a number of built-in CPython types that can't be subclassed because they omit this flag.)
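For concreteness, the sort of interactive session that produces the error above looks like this (the class name is just my arbitrary example):

>>> class mybool(bool):
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    type 'bool' is not an acceptable base type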
We can change the True and False names to point to non-bool objects if we want. If you take this far enough, you can arrange to get interesting errors and perhaps spectacular explosions:
>>> __builtins__.False = set("a joke")
>>> (0 != 0) == False
False
>>> d = {}
>>> d[False] = False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
For maximum fun, arrange for True and False to be objects that are deliberately uncomparable and can't be converted to booleans (in Python 2, this requires raising an error in your __eq__ and __nonzero__ methods).
(I've used False here because many objects in Python 2 are considered to be boolean True. In fact, by default almost all objects are; you have to go out of your way to make something False.)
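As a sketch of that 'maximum fun' option, here's one way such an object could look in Python 2; the class name, error messages, and the choice of RuntimeError are all just my illustrative picks:

class Unusable(object):
    # Blow up on any equality comparison.
    def __eq__(self, other):
        raise RuntimeError("refusing to be compared")
    __ne__ = __eq__
    # Blow up on truth testing; Python 2 uses __nonzero__ for this.
    def __nonzero__(self):
        raise RuntimeError("refusing to be truth-tested")

# Hypothetically, in an interactive session:
#   __builtins__.True = Unusable()
#   __builtins__.False = Unusable()
# after which any code that compares things against True or False, or
# that truth-tests them, will die with a RuntimeError.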
What ZFS gang blocks are and why they exist
If you read up on ZFS internals, sooner or later you will run across references to 'gang blocks'. For instance, they came up when I talked about what's in a DVA, where DVAs have a flag to say that they point to a gang block instead of a regular block. Gang blocks are vaguely described as being a way of fragmenting a large logical block into a bunch of separate sub-blocks.
A more on-point description can be found in the (draft) ZFS on-disk specification (PDF, via) or the source code comments about them in zio.c. I'll selectively quote from zio.c because it's easier to follow:
A gang block is a collection of small blocks that looks to the DMU like one large block. When zio_dva_allocate() cannot find a block of the requested size, due to either severe fragmentation or the pool being nearly full, it calls zio_write_gang_block() to construct the block from smaller fragments.
A gang block consists of a gang header and up to three gang members. The gang header is just like an indirect block: it's an array of block pointers. It consumes only one sector and hence is allocatable regardless of fragmentation. The gang header's bps point to its gang members, which hold the data.
[...]
Gang blocks can be nested: a gang member may itself be a gang block. Thus every gang block is a tree in which root and all interior nodes are gang headers, and the leaves are normal blocks that contain user data. The root of the gang tree is called the gang leader.
A 'gang header' contains three full block pointers, some padding, and then a trailing checksum. The whole thing is sized so that it takes up only a single 512-byte sector; I believe this means that gang headers in ashift=12 vdevs waste a bunch of space, or at least leave the remaining 3.5 Kb unused.
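As a quick back-of-the-envelope check of that layout, here's the arithmetic; the 128-byte block pointer size and the 40-byte embedded checksum tail are my assumptions, not figures stated in this entry:

SECTOR = 512       # a gang header is sized to one 512-byte sector
BLKPTR = 128       # assumed size of one ZFS block pointer
CKSUM_TAIL = 40    # assumed size of the trailing embedded checksum

nptrs = (SECTOR - CKSUM_TAIL) // BLKPTR
padding = SECTOR - CKSUM_TAIL - nptrs * BLKPTR
print nptrs, padding    # -> 3 88: three block pointers plus 88 bytes of padding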
To understand more about gang blocks, we need to understand why they're needed. As far as I know, this comes down to the fact that ZFS files only ever have a single (logical) block size. If a file is less than the recordsize (usually 128 Kb), it's in a single logical block of the appropriate power of two size; once it hits recordsize or greater, it's in a number of recordsize'd blocks. This means that writing new data to most files normally requires allocating some size of contiguous block (up to 128 Kb, but less if the data you're writing is compressible).
(I believe that there is also metadata that's always unfragmented and may be in blocks up to 128 Kb.)
However, ZFS doesn't guarantee that a pool always has free 128 Kb chunks available, or in fact any particular size of chunk. Instead, free space can be fragmented; you might be unfortunate enough to have many gigabytes of free space, but all of it in fragments that are, say, 32 Kb and smaller. This is where ZFS needs to resort to gang blocks, basically in order to lie to itself about still writing single large blocks.
(Before I get too snarky, I should say that this lie probably simplifies the life of higher level code a fair bit. Rather than have a whole bunch of data and metadata handling code that has to deal with all sorts of fragmentation, most of ZFS can ignore the issue and then lower level IO code quietly makes it all work. Actually using gang blocks should be uncommon.)
All of this explains why the gang block bit is a property of the DVA, not of anything else. The DVA is where space gets allocated, so the DVA is where you may need to shim in a gang block instead of getting a contiguous chunk of space. Since different vdevs generally have different levels of fragmentation, whether or not you have a contiguous chunk of the necessary size will often vary from vdev to vdev, which is the DVA level again.
One quiet complication created by gang blocks is that according to comments in the source code, the gang members may not wind up on the same vdev as the gang header (although ZFS tries to keep them on the same vdev because it makes life easier). This is different from regular blocks, which are always only on a single vdev (although they may be spread across multiple disks if they're on a raidz vdev).
Gang blocks have some space overhead compared to regular blocks (in addition to being more fragmented on disk), but how much is quite dependent on the situation. Because each gang header can only point to three gang member blocks, you may wind up needing multiple levels of nested gang blocks if you have an unlucky combination of fragmented free space and a large block to write. As an example, suppose that you need to write a 128 Kb block and the pool only has 32 Kb chunks free. 128 Kb requires four 32 Kb chunks, which is more than a single gang header can point to, so you need a nested gang block; your overhead is two sectors for the two gang headers needed. If the pool was more heavily fragmented, you'd need more nested gang blocks and the overhead would go up. If the pool had a single 64 Kb chunk left, you could have written the 128 Kb with two 32 Kb chunks and the 64 Kb chunk and thus not needed the nested gang block with its additional gang header.
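To make that arithmetic concrete, here's a simplified Python model of how many gang headers a write needs. It assumes every fragment is the same size and that headers are packed as tightly as possible, which is my simplification rather than a description of what the allocator actually does:

import math

def gang_headers_needed(block_kb, fragment_kb):
    # How many equal-sized fragments the logical block splits into.
    fragments = int(math.ceil(float(block_kb) / fragment_kb))
    if fragments <= 1:
        return 0    # it fits contiguously; no gang block at all
    if fragments <= 3:
        return 1    # a single gang header can point at everything
    # With h headers there are 3*h pointer slots, but h-1 of those
    # slots are spent pointing at the nested headers themselves.
    return int(math.ceil((fragments - 1) / 2.0))

print gang_headers_needed(128, 32)   # -> 2, the example above
print gang_headers_needed(128, 16)   # -> 4; worse fragmentation, more overhead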
(Because ZFS only uses a gang block when the space required isn't available in a contiguous block, gang blocks are absolutely sure to be scattered on the disk.)
PS: As far as I can see, a pool doesn't keep any statistics on how many times gang blocks have been necessary or how many there currently are in the pool.