2018-07-15
When I'll probably be able to use Python assignment expressions
The big recent Python news is that assignment expressions have been accepted for Python 3.8. This was apparently so contentious and charged a process that in its wake Guido van Rossum has stepped down as Python's BDFL. I don't have any strong feelings on assignment expressions for reasons beyond the scope of this entry, but today I want to think about how soon I could possibly use them in my Python code, and then how soon I could safely use them (ie how soon they will be everywhere I care about). The answers to out to be surprising, at least to me (it's probably not to experienced Python hands).
The nominal Python 3.8 release schedule is set out in PEP 569. According to it, Python 3.8 is planned to be released in October of 2019; however, there's some signs that the Python people want to move faster on this (see this LWN article). If Python sticks to the original timing, Python 3.8 might make Ubuntu 20.04 LTS (released in April 2020 but frozen before then) and would probably make the next Fedora release if Fedora keeps to their current schedule and does a release in May of 2020. So at this point it looks like the earliest I'd be able to use assignment expressions is in about two years. If Python moves up the 3.8 release schedule significantly, it might make one Fedora release earlier (the fall 2019 release), making that about a year and a half before I could think about it.
There are many versions of 'can safely use' for me, but I'll pick the one for work. There 'safely use' means that they're supported by the oldest Ubuntu LTS release I need to run the Python code on. We're deploying long-lived Ubuntu 18.04 machines now that will only be updated starting in 2022, so if Python 3.8 makes Ubuntu 20.04 that will be when I can probably start thinking about it, because everything will be 2020 or later. That's actually a pretty short time to safe use as these things go, but that's a coincidence due to the release timing of Python 3.8 and Ubuntu LTS versions. If Python 3.8 misses Ubuntu 20.04 LTS, I'd have to wait another two years (to 2024) unless I only cared about running my code on Ubuntu 22.04.
Of course, I'm projecting things four to six years into the future and that's dangerous at the best of times. We've already seen that Python may change its release timing, and who knows about both Ubuntu and Fedora.
(It seems a reasonably safe guess that I'll still be using Fedora on my desktops over that time period, and pretty safe that we'll still be using Ubuntu LTS at work, but things could happen there too.)
The reason that all of this was surprising to me was that I assumed Python 3.8 was further along in its development if controversial and much argued over change proposals were getting accepted for it. I guess the arguments started well before Python 3.7 was released, which makes sense given the 3.7 release schedule; 3.7 was frozen at the end of January, so everyone could start arguing about 3.8 no later than then.
(The official PEP has an initial date of the end of February, but I've heard it was in development and being discussed before then, just not formalized yet as a PEP.)
PS: If Debian keeps to their usual release schedule, it looks like Python 3.8 on its original schedule would miss the next Debian stable version (Debian 10). It would probably miss it even on an aggressive release schedule that saw Python 3.8 come out only a year after 3.7, since 3.7 was released only a few weeks ago.
Understanding ZFS System Attributes
Like most filesystems, ZFS faces the file attribute problem. It has a bunch of file attributes, both visible ones like the permission mode and the owner and internal ones like the parent directory of things and file generation number, and it needs to store them somehow. But rather than using fixed on-disk structures like everyone else, ZFS has come up with a novel storage scheme for them, one that simultaneously deals with both different types of ZFS dnodes wanting different sets of attributes and the need to evolve attributes over time. In the grand tradition of computer science, ZFS does it with an extra level of indirection.
Like most filesystems, ZFS puts these attributes in dnodes using
some extra space (in what is called the dnode 'bonus buffer').
However, the ZFS trick is that whatever system attributes a dnode
has are simply packed into that space without being organized into
formal structures with a fixed order of attributes. Code that uses
system attributes retrieves them from dnodes indirectly by asking
for, say, the ZPL_PARENT
of a dnode; it never cares exactly how
they're packed into a given dnode. However, obviously something
does.
One way to implement this would be some sort of tagged storage, where each attribute in the dnode was actually a key/value pair. However, this would require space for all of those keys, so ZFS is more clever. ZFS observes that in practice there are only a relatively small number of different sets of attributes that are ever stored together in dnodes, so it simply numbers each distinct attribute layout that ever gets used in the dataset, and then the dnode just stores the layout number along with the attribute values (in their defined order). As far as I can tell from the code, you don't have to pre-register all of these attribute layouts. Instead, the code simply sets attributes on dnodes in memory, then when it comes time to write out the dnode in its on-disk format ZFS checks to see if the set of attributes matches a known layout or if a new attribute layout needs to be set up and registered.
(There are provisions to handle the case where the attributes on a dnode in memory don't all fit into the space available in the dnode; they overflow to a special spill block. Spill blocks have their own attribute layouts.)
I'm summarizing things a bit here; you can read all of the details and more in a big comment at the start of sa.c.
As someone who appreciates neat solutions to thorny problems, I quite admire what ZFS has done here. There is a cost to the level of indirection that ZFS imposes, but once you accept that cost you get a bunch of clever bonuses. For instance, ZFS uses dnodes for all sorts of internal pool and dataset metadata, and these dnodes often don't have any use for conventional Unix file attributes like permissions, owner, and so on. With system attributes, these metadata dnodes simply don't have those attributes and don't waste any space on them (and they can use the same space for other attributes that may be more relevant). ZFS has also been able to relatively freely add attributes over time.
By the way, this scheme is not quite the original scheme that ZFS used. The original scheme apparently had things more hard-coded, but I haven't dug into it in detail since this has been the current scheme for quite a while. Which scheme is in use depends on the ZFS pool and filesystem versions; modern system attributes require ZFS pool version 24 or later and ZFS filesystem version 5 or later. You probably have these, as they were added to (Open)Solaris in 2010.