ZFS on Linux's development version now has much better pool recovery for damaged pools
Back in March, I wrote about how much better ZFS pool recovery was coming, along with what turned out to be some additional exciting features, such as the long-awaited feature of shrinking ZFS pools by removing vdevs. The good news for people using ZFS on Linux is that most of both features have very recently made it into the ZFS on Linux development source tree. This is especially relevant and important if you have a damaged ZFS on Linux pool that either doesn't import or panics your system when you do import it.
These changes aren't yet in any ZFS on Linux release and I suspect that they won't appear until 0.8.0 is released someday (ie, they won't be ported into the current 0.7.x release branch). However, it's fairly easy to build ZFS on Linux from source if you need to temporarily run the latest version in order to recover or copy data out of a damaged pool that you can't otherwise get at. I believe that some pool recovery can be done as a one-time import and then you can revert back to a released version of ZFS on Linux to use the now-recovered pool, but certainly not all pool import problems can be repaired like this.
(As far as vdev removal goes, it currently requires permanently using a version of ZFS that supports it, because it adds a device_removal feature to your pool that will never deactivate. This may change at some point in the future, but I wouldn't hold my breath. It seems miraculous enough that we've gotten vdev removal after all of these years, even if it's only for single devices and mirror vdevs.)
I haven't tried out either of these features, but I am running a recently built development version of ZFS on Linux with them included and nothing has exploded so far. As far as things go in general, ZFS on Linux has a fairly large test suite and these changes added tests along with their code. And of course they've been tested upstream and OmniOS CE had enough confidence in them to incorporate them.
Sorting out some of my current views on operator overloading in general
Operator overloading is a somewhat controversial topic in programming language design and programming language comparisons. To somewhat stereotype both sides, one side thinks that it's too often abused to create sharp-edged surprises where familiar operators do completely surprising things (such as << in C++ IO). The other side thinks that it's a tool that can be used to create powerful advantages when done well, and that its potential abuses shouldn't cause us to throw it out entirely.
In general, I think that operator overloading can be used for at least three things:
- implementing the familiar arithmetic operations on additional types of numbers or very strongly number-like things, where the new implementations respect the traditional arithmetic properties of the operators; complex numbers are the classic example.
- implementing these operations on things which already use these operators in their written notation, even if how the operators are used doesn't (fully) preserve their usual principles. Matrix multiplication is not commutative, for example, but I don't think many people would argue against using * for it in a programming language.
- using these operators simply for convenient, compact notation in ways that have nothing to do with arithmetic, mathematical notation, or their customary uses in written form for the type of thing you're dealing with.
I don't think anyone disagrees with the use of operator overloading
for the first case. I suspect that there is some but not much
disagreement over the second case. It's the third case that I think
people are likely to be strongly divided over, because it's by far
the most confusing one. As an outside reader of the code, even
once you know the types of objects involved, you don't know anything
about what's actually happening; you have to read the definition
of what that type does with that operator. This is the 'say what?' reaction that people have when they run into things like << in C++ IO and % with Python strings.
Languages are partly a cultural thing, not purely a technical one, and operator overloading (in its various sorts) can be a better or a worse fit for different languages. Operator overloading probably would clash badly with Go's culture, for example, even if you could find a good way to add it to the language (and I'm not sure you could without transforming Go into something relatively different).
(Designing operator overloading into your language pushes its culture in one direction but doesn't necessarily dictate where you wind up in the end. And there are design decisions that you can make here that will influence the culture, for example requiring people to define all of the arithmetic operators if they define any of them.)
Since I'm a strong believer in both the pragmatic effects and aesthetic power of syntax, I believe that even operator overloading purely to create convenient notation for something can be a good use of operator overloading in the right circumstances and given the right language culture. Generally the right circumstances are going to be when the operator you're overloading has some link to what the operation is doing. I admit that I'm biased here, because I've used the third sort of operator overloading from time to time in Python and I think it made my code easier to read, at least for me (and it certainly made it more compact).
(For example, I once implemented '-' for objects that were collections of statistics, most (but not all) of them time-dependent. Subtracting one object from another gave you an object that had the delta from one to the other, which I then processed and printed.)
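A minimal sketch of that sort of '-' overload; the Stats class and its counter names here are invented for illustration, not my actual code:

```python
# A sketch of the stats-delta '-' overload described above.
class Stats:
    def __init__(self, **counters):
        self.counters = counters

    def __sub__(self, other):
        # Subtracting one snapshot from another yields the delta between them.
        return Stats(**{k: v - other.counters.get(k, 0)
                        for k, v in self.counters.items()})

before = Stats(packets=1000, errors=2)
after = Stats(packets=1500, errors=3)
delta = after - before
print(delta.counters)   # {'packets': 500, 'errors': 1}
```

Compact, and (at least to me) quite readable once you know what subtraction means for these objects.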
In thinking about this now, one thing that strikes me is that an advantage of operators over function calls is that operators tend to be written with whitespace, whereas function calls often run everything together in a hard to read blur. We know that whitespace helps readability, so if we're going to lean heavily on function calls in a language (including in the form of method calls), perhaps we should explore ways of adding whitespace to them. But I'm not sure whitespace alone is quite enough, since operators are also distinct from letters.
(I believe this is where a number of functional languages poke their heads up.)
Notice to web spiders: an email address in your user-agent isn't good enough
Every so often I turn over a rock here at Wandering Thoughts by looking at what IP addresses are making a lot of requests.
Most of the time that's Bing's bot, but
every so often something else floats to the top of the list, and
generally it's not something that leaves a favorable impression.
Today's case was clearly a web spider, from IP address 126.96.36.199 (which currently resolves to 'getzonefile.commedia.io') and with the User-Agent of "Mozilla/5.0 (compatible; Go-http-client/1.1; +firstname.lastname@example.org)".
This has caused me to create a new rule for web spiders: just having an email address in your User-Agent is not good enough, and in fact will almost certainly cause me to block that spider on contact.
What the User-Agent of a web spider is supposed to include is a website URL where I can read about what your web spider is and what benefit I get from allowing it to crawl Wandering Thoughts. Including an email address does not provide me with this information, and it doesn't even provide me with a meaningful way of reporting problems or complaining about your web spider, because in today's spam-laden Internet environment the odds that I'm going to send email to some random address are zero (especially to complain about something that it is nominally doing).
Of course, it turns out that this is not the only such User-Agent that I've seen (and blocked). Other ones that have shown up in recent logs are:
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36; +email@example.com"
"Mozilla/5.0 (compatible; um-LN/1.0; mailto: firstname.lastname@example.org)"
The MauiBot crawler is apparently reasonably well complained-about. I haven't found any particular mentions of the 'infegy.com' one from casual searches, but it's probably real (in one sense) given infegy.com's website.
(I also found one feed fetcher that appears to be pulling my feed with a User-Agent that lists an email address and a program name of 'arss/testing', but I've opted not to block it for now or mention the email address. If its author is reading this, you need a URL in there too.)
I'm not sure what web spider authors are thinking when they set their User-Agents up this way, and frankly I don't care (just as I don't care whether these email addresses are genuine and functional, or simply made up and bogus). On the one hand they are admitting that this is a web spider at work, but on the other hand they're fumbling at informing web server operators about their spiders.
PS: I'm aware that blocking web spiders this way is a quixotic and never-ending quest. There are a ton of nasty things out there, even among the ones that more or less advertise themselves. But sometimes I do these things anyway, because once I've turned over a rock I'm not good at looking away.
Python modules use operator overloading in two different ways
In Python (as elsewhere), there are at least two different things that people use operator overloading for. That there's more than one thing makes a difference, because some patterns for designing how operator overloading works aren't sufficiently general to handle both things; if you want to serve both groups, you need to design a more general mechanism than you might expect, one that delegates more power to objects.
The first use of operator overloading is to extend operators so that they work (in the traditional ways) on objects that they wouldn't normally work on. The classical examples of this are complex numbers and rational numbers (both of which Python has in the standard library), and in general various sorts of things built with numbers and numeric representations. However you can go beyond this, to objects that aren't strictly numeric but which can use at least some of the traditional numeric operators in ways that still obey the usual rules of arithmetic and make sense. Python sets implement some numeric operations in ways that continue to make sense and are unsurprising.
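Python sets are a handy concrete example of this kind of unsurprising reuse:

```python
# Python sets reuse 'arithmetic' operators in ways that still make sense:
a = {1, 2, 3}
b = {2, 3, 4}

print(a | b)   # union: {1, 2, 3, 4}
print(a & b)   # intersection: {2, 3}
print(a - b)   # difference: {1}
```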
The second use is to simply hijack the operators in order to do something convenient for your objects with a handy symbol for it. Sometimes these operations are vaguely related to their numeric equivalents (such as string multiplication, where "a" * 4 gets you "aaaa"), but sometimes they have nothing to do with it. The classic example of the latter is the string % operator, which has nothing at all to do with arithmetic but instead formats a string according to its % formatting codes. Using the % operator for this is certainly convenient and it has a certain mnemonic value and neatness factor, but it definitely has nothing to do with %'s normal use.
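Both flavors in miniature:

```python
# '*' on strings is at least vaguely numeric (repetition), while '%'
# has nothing to do with arithmetic at all:
print("a" * 4)                    # aaaa
print("%s: %d" % ("errors", 42))  # errors: 42
```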
Now, let us consider the case of Python not allowing you to overload boolean AND and OR. In a comment on that entry, Aneurin Price said:
I'm not at all convinced by this argument. My expectation for this hypothetical __band__ is that it would be called after evaluating a and finding it truthy, at which point b is evaluated either way. [...]
This is definitely true if you think of operator overloading as only for the first case. But, unfortunately for the design of overloading AND and OR, this is not all that people would like to use it for. My understanding is that ORMs such as Django's and SQLAlchemy would like to intercept AND and OR in order to build up complicated conditional SQL queries with, essentially, a DSL based on Python expressions. In this DSL, they would like to be able to write something like:
Q.descfield.startswith("Who") or Q.descfield.startswith("What")
This wouldn't evaluate or produce any sort of truth value; instead it
would produce an object representing a pending SQL query with a
clause that encoded this OR condition. Later you'd execute the SQL query
to produce the actual results.
If operator overloading for AND and OR paid any attention to the nominal truth value of the left expression, there would be no way to make this work. Instead, allowing general overloading of AND and OR requires allowing the left side expression to hijack the process before any truth value is taken. In general, operator overloading that allows for this sort of usage needs to allow for this sort of early hijacking; fortunately this comes naturally for the arithmetic operators.
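To make this concrete, here's a sketch of the trick ORMs use today: since 'and' and 'or' can't be overloaded, they overload the bitwise '&' and '|' operators instead. This Q class is a simplified invention for illustration, not Django's or SQLAlchemy's actual implementation:

```python
# A toy expression-builder in the style of ORM query DSLs.
class Q:
    def __init__(self, expr):
        self.expr = expr

    def __or__(self, other):
        # No truth value is ever taken; we just build a bigger query tree.
        return Q("(%s OR %s)" % (self.expr, other.expr))

    def __and__(self, other):
        return Q("(%s AND %s)" % (self.expr, other.expr))

def startswith(field, prefix):
    return Q("%s LIKE '%s%%'" % (field, prefix))

q = startswith("descfield", "Who") | startswith("descfield", "What")
print(q.expr)   # (descfield LIKE 'Who%' OR descfield LIKE 'What%')
```

The crucial property is that __or__ runs on the left object without Python ever asking either side for its truth value, which is exactly what overloading 'or' itself could not give you.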
(I'm not sure Python has truly general support for mixing unusual numerical types together, but then such general support is probably very hard to implement. I think you want to be able to express a compatibility table, where each type can say that its overloads handle certain other types or types that have certain properties or something. Otherwise getting your rational number type to interact well with my Point type gets really complicated really fast, if not impossible.)
Yubico fails to care that people give you email addresses for specific purposes
A while back, Yubico had a little security issue that forced it to replace any number of Yubikey 4s, including mine. In order to do this, they required people to give them an email address so they could send you some necessary information; following my usual practice I gave them a tagged, individualized address. Today I received email to that address, received from the server of a domain called 'mktomail.com', that started out like this:
Subject: Passwordless authentication is here
Yubico scales across enterprise
Passwords are out. You're in!
The passwordless evolution of the FIDO U2F standard has arrived with FIDO2. [... marketing materials removed with prejudice ...]
You are receiving this email because you made a Yubico purchase or contacted Yubico.
I'm sorry, that's not how this works. In the normal course of events, people do not give you email addresses to do with as you will; people give you email addresses for specific purposes. In this case, I gave Yubico an email address to get a defective product fixed, but one might report a bug, contact product support, or perform other limited interactions with the company. These specific and limited purposes do not include 'receive unsolicited commercial marketing emails'.
Of course, the marketing department does not want to hear this. The marketing department wants to use every plausible address it can get its hands on. People these days vaguely get that you usually cannot buy addresses from other people without getting badly burned, but they keep thinking that other addresses are fair game, regardless of the purpose for which they were originally handed to the company.
Some of the time, the company supports the marketing department, as it did at Yubico, and these addresses get used outside of the purpose they were given to the company for. At that point the company betrays the trust of the people who handed over their email addresses in good faith and pisses off some number of people who have interacted with the company in the past, some of whom have actually bought its products. The results are predictable, as is the resulting form-letter evasion.
(When enough companies do this sort of thing for long enough, you get things like the EU's GDPR, which will likely make this conduct illegal. Sadly it is probably not illegal under Canada's anti-spam legislation, and anyway I expect Yubico to ignore the GDPR issues until they or someone else visible gets slapped with a nice fine for this sort of thing.)
Sadly I have no idea what is a viable alternative to Yubikeys, but at least we're not likely to buy any more any time soon.
How we're going to be doing custom NFS mount authorization on Linux
We have a long standing system of custom NFS mount authorization on our current OmniOS-based fileservers. This system has been working reliably for years, but our next generation of fileservers will use a different OS, almost certainly Linux, and our current approach doesn't work on Linux, so we had to develop a new one.
One of the big attributes of our current system is that it doesn't require the clients to do anything special; they do NFS mount requests or NFS activity, and provided that their SSH daemon is running, they get automatically checked and authorized. This is important to making the system completely reliable, which is very important if we're going to use it for our own machines (which are absolutely dependent on NFS working). However, the goals of our NFS authorization have shifted so that we no longer require this for our own machines. In light of that, we decided to adopt a more straightforward approach on Linux, one that requires client machines to explicitly do a manual step on boot before they could get NFS access.
The overall 'authorization' system works via firewall rules, where only machines in a particular ipset table can talk to the NFS ports on the fileserver. Control over actual NFS mounts and NFS level access is still done through the normal NFS export permissions and so on, but you have to be in the ipset table in order to even get that far. To get authorized, ie to get added to the ipset table, your client machine makes a connection to a specific TCP port on the fileserver. This ends up causing a Go program to make a connection to the SSH server on the client machine and verify its host key against a known_hosts file that we maintain; if the key verifies, we add the client's IP address to the ipset table, and if it fails to verify, we explicitly remove the client's IP address from the table.
(This connection can be done as simply as 'nc FILESERVER PORT </dev/null >/dev/null'. In practice clients may want to record the output from the port, because we spit out status messages, including potentially important ones about why a machine failed verification. We syslog them too, but those syslog logs aren't accessible to other people.)
This Go program can actually check and handle multiple IP addresses at once (doing so in parallel). In this mode, it runs from cron every few minutes to re-verify all of the currently authorized hosts. The program is sufficiently fast that it can complete this full re-verification in under a second (and with negligible resource usage); in practice, the speed limit is how long of a timeout we use to wait for machines to respond.
To handle fileserver reboots, verified IPs are persistently recorded by touching a file (with the name of their IP address) in a magic directory. On boot and on re-verification, we merge all of the IPs from this directory with the IPs from the ipset table and verify them all. Any IPs that pass verification but aren't in the ipset table are added back to the table (and any IPs in the ipset table but not recorded on disk are persisted to disk), which means that on boot all IPs will be re-added to the ipset table without the client having to do anything.
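The boot-time reconciliation described above is just set arithmetic. Here's a sketch of the logic, with invented helper names and IP addresses rather than our actual code:

```python
# Reconciling on-disk state with the live ipset table, as plain set math.
def merge_for_reverify(recorded, in_ipset):
    # On boot and on re-verification, check everything seen in either place.
    return recorded | in_ipset

def after_verification(verified, recorded, in_ipset):
    add_to_ipset = verified - in_ipset      # passed checks, missing from table
    persist_to_disk = in_ipset - recorded   # in the table, not yet on disk
    return add_to_ipset, persist_to_disk

recorded = {"10.0.0.5", "10.0.0.6"}   # files in the state directory
in_ipset = {"10.0.0.6", "10.0.0.7"}   # current ipset table members
print(sorted(merge_for_reverify(recorded, in_ipset)))
# ['10.0.0.5', '10.0.0.6', '10.0.0.7']
```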
Clients theoretically don't have to do anything once they've booted and been authorized, but because things can always go wrong we're going to recommend that they re-poke the magic TCP port every so often from cron, perhaps every five or ten minutes. That will insure that any NFS outage should have a limited duration and thus hopefully a limited impact.
(In theory the parallel Go checker is so fast that we could just extract all of the client IPs from our known_hosts file and always try to verify them, say, once every fifteen minutes. In practice I think we're unlikely to do this because there are various potential issues and it's probably unlikely to help much in practice.)
We're probably going to provide people with a little Python program that automatically does the client side of the verification for all current NFS mounts and all mounts in /etc/fstab, and then logs the results and so on. This seems more friendly than asking all of the people involved to write their own set of scripts or commands for this.
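A minimal sketch of what that client-side poke could look like in Python; the hostnames, port, and timeout are hypothetical:

```python
# Programmatic equivalent of 'nc FILESERVER PORT </dev/null >/dev/null',
# except that we capture the fileserver's status messages for logging.
import socket

def poke(fileserver, port, timeout=10):
    try:
        with socket.create_connection((fileserver, port), timeout=timeout) as s:
            s.shutdown(socket.SHUT_WR)     # we have nothing to send
            return s.makefile().read()     # read status messages until EOF
    except OSError as e:
        return "connection failed: %s" % e

# A real client would run something like this from cron for every
# fileserver it mounts from, e.g.:
#   print(poke("fileserver.example.com", 7777))
```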
PS: Our own machines on trusted subnets are handled by just having a blanket allow rule in the firewall for those subnets. You only have to be in the ipset table if you're not on one of those subnets.
One reason why Python doesn't let you overload the boolean AND and OR operations
Recently I read Kurt Rose's DISappearing and (via Planet Python), where Kurt noted that Python doesn't have methods that let you override the boolean and and or operations on your class objects. As it happens, there's a really good reason for this, which is that Python would require a new fundamental data type in order to make it really work.

and and or have the extremely valuable property of short-circuiting evaluation, where if you write, say, 'a() and b()' and a() evaluates to false, Python will not even call b().
Let's imagine a hypothetical world in which Python allows you to do this overriding and the boolean operators still preserve this short circuiting. As usual, if you write 'a and b', this will (at least some of the time) translate into a call to an override method on a, let's call it __band__, and the __band__ method will receive an additional argument that represents the right hand side:

  class AClass:
      def __band__(self, right):
          ....
Now here is the big question: what's the type of right in this method? The obvious answer is that right is the value we get from evaluating the right hand side expression; if you write 'a & b()', this is roughly the same as a.__and__(b()). However this can't be how __band__ works, because that would mean no more short-circuiting; the moment a had a __band__ method, writing 'a and b()' would call b() all of the time. To preserve short-circuiting, right has to be some type that represents the right hand side expression in an un-evaluated form.
However, Python has no such type today. Closures sort of come close,
but they create additional effects and do things like appear in Python
exception backtraces. This means that adding override methods for
boolean operations would require either discarding short-circuiting (and making right be the evaluation result) or figuring out and introducing
a new, relatively complex type in Python just to support this.
(Continuations are sort of what you'd need but I think they're not quite what you want, or at least you need a continuation that captures only the right side expression.)
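To illustrate why the unevaluated right hand side is the crux, here's what it looks like if you simulate the hypothetical __band__ protocol by hand, passing the right side as an explicit closure. The Lazy wrapper, the band() helper, and the Tracer class are all inventions for illustration, not real Python machinery:

```python
# Python has no unevaluated-expression type, so the closest stand-in
# today is an explicit closure (thunk).
class Lazy:
    """Wraps a not-yet-evaluated right hand side."""
    def __init__(self, thunk):
        self._thunk = thunk
    def eval(self):
        return self._thunk()

def band(left, right_thunk):
    # Rough simulation of what 'left and <right>' might do with a
    # hypothetical __band__ that receives the right side unevaluated.
    if hasattr(type(left), "__band__"):
        return type(left).__band__(left, Lazy(right_thunk))
    return left and right_thunk()

class Tracer:
    def __bool__(self):
        return False
    def __band__(self, right):
        # Short-circuit: a falsy left side never forces the right side.
        return False if not self else right.eval()

calls = []
def rhs():
    calls.append("evaluated")
    return True

print(band(Tracer(), rhs))   # False
print(calls)                 # [] -- the right side was never evaluated
```

Having to write the right side as a lambda or function is exactly the awkwardness that a real language feature would need to hide, and that's where the new fundamental type would come in.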
The other problem with such a right type is that you'd want to be able to peer inside it relatively easily. After all, the entire purpose of implementing your own __band__ method is so that you can do something different from a plain boolean and when the right hand side is some special thing. If all you're going to do is:

  def __band__(self, right):
      if not bool(self):
          return False
      else:
          return right.eval()

then there's not really any point in having a __band__ at all, especially given the general complexity that adding all of this to Python would involve.
(This is of course not necessarily the only reason for Python to fence off boolean operations as things that you absolutely can't override. You can certainly argue that they should be inviolate and not subject to clever redefinitions simply as a matter of principle.)
Firefox turns out to need some degree of 'autoplay' support
When I wrote some notes about Firefox's current media autoplay settings, I said, about the central function in Firefox that decides whether or not media can autoplay:
Since I don't want videos to ever autoplay, [...] I may someday try making the entire function just immediately return false.
I did this experiment and I can now report that the result is a failure. Completely disabling the function, ie making it always return false, results in a Firefox that won't play video at all. It won't play YouTube videos, which doesn't entirely surprise me, but it also won't even play directly loaded .mp4 files. The videos load, but clicking on the appropriate 'play' button or control does nothing and the video never starts going.
The situation with bare .mp4 files surprises me a little bit, since Firefox is presumably putting up the player controls itself, so it can know for sure whether or not you clicked on the 'play' button. Based on some quick spelunking in the Firefox source code, it appears that clicking the 'play' control winds up calling the media's .play() method. This is the same fundamental path that in-page players use, so it's not that surprising that it goes through the same autoplay checks; they appear to be implemented directly in the .play() handling code and apply to anything that calls it, regardless of where from or why.
This leaves me less surprised about the name and behavior of the media.autoplay.enabled preference and the other stuff involved here. Given that Firefox needs there to be some 'autoplay', clearly it could never be the case that setting it to false disabled everything here, because then video (and probably audio) would never play. That's clearly not what people want.
Modern Unix GUIs now need to talk to at least one C library
I've written before about whether the C runtime and library are a legitimate part of the Unix API. The question matters because some languages want to be as self contained as possible on Unix (Go is one example), and so they don't want to have to use anything written in C if at all possible. However, I recently realized that regardless of the answer to this question, it's essentially impossible to do a good quality, modern Unix GUI without using at least one C library.
This isn't because you need to use a toolkit like Gtk+ or Qt, or that you need a C library in order to, for example, speak the X protocol (or Wayland's protocol). You can write a new toolkit if you need to and people have already reimplemented the X protocol in pure non-C languages. Instead the minimal problem is fonts.
Modern fonts are selected and especially rendered in your client, and they're all TrueType fonts. Doing a high quality job of rendering TrueType fonts is extremely complicated, which is why everyone uses the same library for this, namely FreeType. FreeType is written in C, so if you want to use it, you're going to be calling a C library (and it will call on some additional services from something like the C runtime, although apparently you can shim in your own versions of some parts of it).
(Selecting fonts is also a reasonably complicated job, especially if you want to have your fonts match with the rest of the system and be specified in the same way. That's another C library, fontconfig.)
There's no good way out from calling FreeType. Avoiding it requires either abandoning the good modern fonts that users want your UI to have, implementing your own TrueType renderer that works as well as FreeType (and updating it as FreeType improves), or translating FreeType's C code into your language (and then re-translating it every time a significant FreeType update comes out). The latter two are theoretically possible but not particularly practical; the first means that you don't really have a modern Unix GUI program.
(I don't know enough about Wayland to be sure, but it may make this situation worse by essentially requiring you to use Mesa in order to use OpenGL to get decent performance. With X, you can at least have the server do much of the drawing for you by sending X protocol operations; I believe that Wayland requires full client side rendering.)
The direct consequence of this is that there will never be a true pure Go GUI toolkit for Unix that you actually want to use. If the toolkit is one you want to use, it has to be calling FreeType somewhere and somehow; if it isn't calling FreeType, you don't want to use it.
(It's barely possible that the Rust people will be crazy enough to either write their own high-quality equivalent of FreeType or automatically translate its C code into Rust. I'm sure there are people who look at FreeType and want a version of it with guaranteed memory safety and parallel rendering and so on.)
Why you can't put zero bytes in Unix command line arguments
One sensible reaction to all of the rigmarole with 'grep -P' I went through in yesterday's entry in order to search for a zero byte (a null byte) is to ask why I didn't just use a zero byte in the command line argument:
fgrep -e ^@ -l ...
(Using the usual notation for a zero byte.)
You can usually type a zero byte directly at the terminal, along with a number of other unusual control characters (see my writeup of this here), and failing that you could write a shell script in an editor and insert the null byte there. Ignoring character set encoding issues for the moment, this works for any other byte, but if you try it you'll discover that it doesn't work for the zero byte. If you're lucky, your shell will give you an error message about it; if you're not, various weird things will happen. This is because the zero byte can't ever be put into command line arguments in Unix.
The reason why is ultimately simple. This limitation exists because the Unix API is fundamentally a C API (whether or not the C library and runtime are part of the Unix API), and in C, strings are terminated by a zero byte. When Unix programs such as the shell pass command line arguments to the kernel as part of the exec*() family of system calls, they do so as an array of null-terminated C strings; if you try to put a zero byte in there as data, it will just terminate that command line argument early (possibly reducing it to a zero-length argument, which is legal but unusual). When Unix programs start they receive their command line arguments as an array of C strings (in C, the argv argument to main()), and again a zero byte passed in as data would be seen as terminating that argument early.
This is true whether or not your shell and the program you're trying to run are written in C. They can both be written in modern languages that are happy to have zero bytes in strings, but the command line arguments moving between them are being squeezed through an API that requires null-terminated strings. The only way around this would be a completely new set of APIs on both sides, and that's extremely unlikely at this point.
Because filenames are also passed to the kernel as C strings, they too can't contain zero bytes. Neither can environment variables, which are passed between programs (through the kernel) as another array of C strings.
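You can watch this restriction surface even from a high-level language. Python, for example, refuses to pass such strings through either API at all (a small demonstration, assuming a Unix system with /bin/true):

```python
# Python surfaces the C-level restriction directly: both the exec*()
# path and the filename path reject strings containing a zero byte.
import subprocess

try:
    subprocess.run(["/bin/true", "argument\0with-null"])
except ValueError as e:
    print("argument rejected:", e)

try:
    open("file\0name")
except ValueError as e:
    print("filename rejected:", e)
```

Both attempts fail before anything ever reaches the kernel, because Python knows the strings can't survive the trip through null-terminated C strings intact.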
As a corollary, certain character set encodings really don't work as locales on Unix because they run into this. Any character set encoding that can generate zero bytes as part of its characters is going to have serious problems with filenames and command line arguments; one obvious example of such a character set is UTF-16. I believe the usual way for Unixes to deal with a filesystem that's natively UCS-2 or UTF-16 is to encode and decode to UTF-8 somewhere in the kernel or the filesystem driver itself.