Wandering Thoughts

2017-09-26

Using zdb to peer into how ZFS stores files on disk

If you've read much about ZFS and ZFS performance tuning, one of the things you'll have run across is the ZFS recordsize. The usual way it's described is, for example (from here):

All files are stored either as a single block of varying sizes (up to the recordsize) or using multiple recordsize blocks.

For reasons beyond the scope of this entry, I was wondering if this was actually true. Specifically, suppose you're using the default 128 Kb recordsize and you write a file that is 160 Kb at the user level (128 Kb plus 32 Kb). The way recordsize is usually described implies that ZFS writes this on disk as two 128 Kb blocks, with the second one mostly empty.

It turns out that we can use zdb to find out the answer to this question and other interesting ones like it, and it's not even all that painful. My starting point was Bruning Questions: ZFS Record Size, which has an example of using zdb on a file in a test ZFS pool. We can actually do this with a test file on a regular pool, like so:

  • Create a test file:
    cd $HOME/tmp
    dd if=/dev/urandom of=testfile bs=160k count=1
    

    I'm using /dev/urandom here to defeat ZFS compression.

  • Use zdb -O to determine the object number of this file:
    ; zdb -O ssddata/homes cks/tmp/testfile
      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
     1075431    2   128K   128K   163K     512   256K  100.00  ZFS plain file
    

    (Your version of zdb may be too old to have the -O option, but it's in upstream Illumos and ZFS on Linux.)

  • Use zdb -ddddd to dump detailed information on the object:
    # zdb -ddddd ssddata/homes 1075431
    [...]
         0  L0 0:7360fc5a00:20000 20000L/20000P F=1 B=3694003/3694003
     20000  L0 0:73e6826c00:8400 20000L/8400P F=1 B=3694003/3694003
    
         segment [0000000000000000, 0000000000040000) size  256K
    

    See Bruning Questions: ZFS Record Size for information on what the various fields mean.

    (How many ds to use with the -d option for zdb is sort of like explosives; if it doesn't solve your problem, add more -ds until it does. This number of ds works with ZFS on Linux for me but you might need more.)

What we have here is two on-disk blocks. One is 0x20000 bytes long, or 128 Kb; the other is 0x8400 bytes long, or 33 Kb. I don't know why it's 33 Kb instead of 32 Kb, especially since zdb will also report that the file has a size of 163840 (bytes), which is exactly 160 Kb as expected. It's not the ashift on this pool, because this is the pool I made a little setup mistake on so it has an ashift of 9.
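If you want to double-check the hex arithmetic on those zdb fields, a quick Python sketch will do it (this is just hex-to-decimal conversion using the sizes from my zdb output above, nothing ZFS-specific):

# The size field of each block pointer is in hex; convert the two
# physical sizes and report them in bytes and Kb.
for label, hexsize in [("first block", "20000"), ("second block", "8400")]:
    nbytes = int(hexsize, 16)
    print("%s: 0x%s is %d bytes, %.1f Kb" % (label, hexsize, nbytes, nbytes / 1024.0))

This reports 131072 bytes (128.0 Kb) for the first block and 33792 bytes (33.0 Kb) for the second.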

Based on what we see here it certainly appears that ZFS will write a short block at the end of a file instead of forcing all blocks in the file to be 128 Kb once you've hit that point. However, note that this second block still has a logical size of 0x20000 bytes (128 Kb), so logically it covers the entire recordsize. This may be part of why it takes up 33 Kb instead of 32 Kb on disk.

That doesn't mean that the 128 Kb recordsize has no effect; in fact, we can show why you might care with a little experiment. Let's rewrite 16 Kb in the middle of that first 128 Kb block, and then re-dump the file layout details:

; dd if=/dev/urandom of=testfile conv=notrunc bs=16k count=1 seek=4
# zdb -ddddd ssddata/homes 1075431
[...]
     0  L0 0:73610c5a00:20000 20000L/20000P F=1 B=3694207/3694207
 20000  L0 0:73e6826c00:8400 20000L/8400P F=1 B=3694003/3694003

As you'd sort of expect from the description of recordsize, ZFS has not split the 128 Kb block up into some chunks; instead, it's done a read-modify-write cycle on the entire 128 Kb, resulting in an entirely new 128 Kb block and 128 Kb of read and write IO (at least at a logical level; at a physical level this data was probably in the ARC, since I'd just written the file in the first place).

Now let's give ZFS a slightly tricky case to see what it does. Unix files can have holes, areas where no data has been written; the resulting file is called a sparse file. Traditionally holes don't result in data blocks being allocated on disk; instead they're gaps in the allocated blocks. You create holes by writing beyond the end of file. How does ZFS represent holes? We'll start by making a 16 Kb file with no hole, then give it a hole by writing another 16 Kb at 96 Kb into the file.

; dd if=/dev/urandom of=testfile2 bs=16k count=1
# zdb -ddddd ssddata/homes 1078183
[...]
     0 L0 0:7330dcaa00:4000 4000L/4000P F=1 B=3694361/3694361

      segment [0000000000000000, 0000000000004000) size   16K

Now we add the hole:

; dd if=/dev/urandom of=testfile2 bs=16k count=1 seek=6 conv=notrunc
[...]
# zdb -ddddd ssddata/homes 1078183
[...]
     0 L0 0:73ea07a400:8200 1c000L/8200P F=1 B=3694377/3694377

      segment [0000000000000000, 000000000001c000) size  112K

The file started out as having one block of (physical on-disk) size 0x4000 (16 Kb). When we added the hole, it was rewritten to have one block of size 0x8200 (32.5 Kb), which represents 112 Kb of logical space. This is actually interesting; it means that ZFS is doing something clever to store holes that fall within what would normally be a single recordsize block. It's also suggestive that ZFS writes some extra data to the block over what we did (the .5 Kb), just as it did with the second block in our first example.

(The same thing happens if you write the second 16 Kb block at 56 Kb, so that you create a 64 Kb long file that would be one 64 Kb block if it didn't have a hole.)
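For what it's worth, you don't need dd to make this sort of sparse test file; here's a minimal Python sketch that does the same thing (the file name and the 16 Kb and 96 Kb offsets are just the values from my example, with os.urandom standing in for /dev/urandom):

import os

CHUNK = 16 * 1024

# Write 16 Kb of random data, then seek to 96 Kb (the same as dd's
# seek=6 with bs=16k) and write another 16 Kb, leaving a hole between.
with open("testfile2", "wb") as f:
    f.write(os.urandom(CHUNK))
    f.seek(6 * CHUNK)
    f.write(os.urandom(CHUNK))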

Now that I've worked out how to use zdb for this sort of exploration, there's a number of questions about how ZFS stores files on disks that I want to look into at some point, including how compression interacts with recordsize and block sizes.

(I should probably also do some deeper exploration of what the various information zdb is reporting means. I've poked around with zdb before, but always in very 'heads down' and limited ways that didn't involve really understanding ZFS on-disk structures.)

solaris/ZFSZdbForFileAnalysis written at 01:18:03

2017-09-25

What I use printf for when hacking changes into programs

I tweeted:

Once again I've managed to hack a change into a program through brute force, guesswork, determined grepping, & printf. They'll get you far.

First off, when I say that I hacked a change in, I don't mean that I carefully analyzed the program and figured out the correct and elegant place to change the program's behavior to what I wanted. I mean that I found a spot where I could add a couple of lines of code that reset some variables and when the dust settled, it worked. I didn't carefully research my new values for those variables; instead I mostly experimented until things worked. That's why I described this as being done with brute force and guesswork.

The one thing in my list that may stand out is printf (hopefully the uses of grep are pretty obvious; you've got to find things in the source code somehow, since you're certainly not reading it systematically). When I'm hacking a program up like this, the concise way to describe what I'm using printf for is I use printf to peer into the dynamic behavior of the program.

In theory how the program behaves at runtime is something that you can deduce from understanding the source code, the APIs that it uses, and so on. This sort of understanding is vital if you're actively working on the program and you want relatively clean changes, but it takes time (and you have to actually know the programming languages involved, and so on). When I'm hacking changes into a program, I may not even know the language and I certainly don't have the time and energy to carefully read the code, learn the APIs, and so on. Since I'm not going to deduce this from the source code, I take the brute force approach of modifying the program so that it just tells me things like whether some things are called and what values variables have. In other words, I shove printf calls into various places and see what they report.
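In Python, for instance, what I scatter through the code is usually nothing more sophisticated than this (the function and its arguments here are made up for illustration):

import sys

def frobnicate(path, limit):
    # Temporary debugging output: are we even getting here, and with what?
    sys.stderr.write("DBG: frobnicate: path=%r limit=%r\n" % (path, limit))
    return len(path) <= limit

frobnicate("/etc/passwd", 64)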

I could do the same thing with a debugger, but generally I find printf-based debugging easier and often I'd have to figure out how to hook up a debugger to the program and then make everything work right. For that matter, I may not have handy a debugger that works well with whatever language the program happens to use. Installing and learning a new debugger just to avoid adding some printfs (or the equivalent) is rather a lot of yak shaving.

programming/PrintfToSeeDynamicBehavior written at 00:14:53

2017-09-24

Reading code and seeing what you're biased to see, illustrated

Recently I was reading some C code in systemd, one of the Linux init systems. This code is run in late-stage system shutdown and is responsible for terminating any remaining processes. A simplified version of the code looks like this:

void broadcast_signal(int sig, [...]) {
   [...]
   kill(-1, SIGSTOP);

   killall(sig, pids, send_sighup);

   kill(-1, SIGCONT);
   [...]
}

At this point it's important to note that the killall() function manually scans through all remaining processes. Also, this code is called to send either SIGTERM (plus SIGHUP) or SIGKILL to all or almost all processes.

The use of SIGSTOP and SIGCONT here is a bit unusual, since you don't need to SIGSTOP processes before you kill them (or send them signals in general). When I read this code, what I saw in their use was an ingenious way of avoiding any 'thundering herd' problems when processes started being signalled and dying, so I wrote it up in yesterday's entry. I saw this, I think, partly because I've had experience with thundering herd wakeups in response to processes dying and partly because in our situation, the remaining processes are stalled.

Then in comments on that entry, Davin noted that SIGSTOPing everything first also did something else:

So, I would think it's more likely that the STOP/CONT pair are designed to create a stable process tree which can then be walked to build up a list of processes which actually need to be killed. By STOPping all other processes you prevent them from forking or worse, dieing and the process ID being re-used.

If you're manually scanning the process list in order to kill almost everything there, you definitely don't want to miss some processes because they appeared during your scan. Freezing all of the remaining processes so they can't do inconvenient things like fork() thus makes a lot of sense. In fact, it's quite possible that this is the actual reason for the SIGSTOP and SIGCONT code, and that the systemd people consider avoiding any thundering herd problems to be just a side bonus.

When I read the code, I completely missed this use. I knew all of the pieces necessary to see it, but it just didn't occur to me. It took Davin's comment to shift my viewpoint, and I find that sort of fascinating; it's one thing to know intellectually that you can have a too-narrow viewpoint and miss things when reading code, but another thing to experience it.

(I've had the experience where I read code incorrectly, but in this case I was reading the code correctly but missed some of the consequences and their relevance.)

programming/CodeReadingNarrowness written at 00:53:22

2017-09-23

A clever way of killing groups of processes

While reading parts of the systemd source code that handle late stage shutdown, I ran across an oddity in the code that's used to kill all remaining processes. A simplified version of the code looks like this:

void broadcast_signal(int sig, [...]) {
   [...]
   kill(-1, SIGSTOP);

   killall(sig, pids, send_sighup);

   kill(-1, SIGCONT);
   [...]
}

(I've removed error checking and some other things; you can see the original here.)

This is called to send signals like SIGTERM and SIGKILL to everything. At first the use of SIGSTOP and SIGCONT puzzled me, and I wondered if there was some special behavior in Linux if you SIGTERM'd a SIGSTOP'd process. Then the penny dropped; by SIGSTOPing processes first, we're avoiding any thundering herd problems when processes start dying.

Even if you use kill(-1, <signal>), the kernel doesn't necessarily guarantee that all processes will receive the signal at once before any of them are scheduled. So imagine you have a shell pipeline that's remained intact all the way into late-stage shutdown, and all of the processes involved in it are blocked:

proc1 | proc2 | proc3 | proc4 | proc5

It's perfectly valid for the kernel to deliver a SIGTERM to proc1, immediately kill the process because it has no signal handler, close proc1's standard output pipe as part of process termination, and then wake up proc2 because now its standard input has hit end-of-file, even though either you or the kernel will very soon send proc2 its own SIGTERM signal that will cause it to die in turn. This and similar cases, such as a parent waiting for children to exit, can easily lead to highly unproductive system thrashing as processes are woken up unnecessarily. And if a process has a SIGTERM signal handler, the kernel will of course schedule it to wake up and may start it running immediately, especially on a multi-core system.

Sending everyone a SIGSTOP before the real signal completely avoids this. With all processes suspended, all of them will get your signal before any of them can wake up from other causes. If they're going to die from the signal, they'll die on the spot; if they're not going to die (because you're starting with SIGTERM or SIGHUP and they block or handle it), they'll only get woken up at the end, after most of the dust has settled. It's a great solution to a subtle issue.

(If you're sending SIGKILL to everyone, most or all of them will never wake up; they'll all be terminated unless something terrible has gone wrong. This means this SIGSTOP trick avoids ever having any of the processes run; you freeze them all and then they die quietly. This is exactly what you want to happen at the end of system shutdown.)
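To make the shape of the trick concrete, here's a rough Python sketch of the same SIGSTOP / signal / SIGCONT pattern (purely illustrative; run as root it really will stop and signal every process it can, and on Linux kill(-1, ...) skips the calling process and PID 1):

import os, signal

def broadcast_signal(sig):
    # Freeze everything first; with all processes stopped, none of them
    # can wake up and start running before the real signal reaches them.
    os.kill(-1, signal.SIGSTOP)
    try:
        os.kill(-1, sig)
    finally:
        # Un-freeze whatever survived the signal.
        os.kill(-1, signal.SIGCONT)

#broadcast_signal(signal.SIGTERM)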

unix/ProcessKillingTrick written at 02:42:54

2017-09-22

Using a watchdog timer in system shutdown with systemd (on Ubuntu 16.04)

In Systemd, NFS mounts, and shutting down your system, I covered how Mike Kazantsev pointed me at the ShutdownWatchdogSec setting in system.conf as a way of dealing with our reboot hang issues. I also alluded to some issues with it. We've now tested and deployed a setup using this, so I want to walk through how it works and what its limitations are. As part of that I need to talk about how systemd actually shuts down your system.

Under systemd, system shutdown happens in two stages. The first stage is systemd stopping all of the system units that it can, in whatever way or ways they're configured to stop. Some units may fail to stop here and some processes may not be killed by their unit's 'stop' action(s), for example processes run by cron. This stage is the visible part of system shutdown, the bit that causes systemd to print out all of its console messages. It ends when systemd reaches shutdown.target, which is when you get console messages like:

[...]
[ OK ] Stopped Remount Root and Kernel File Systems.
[ OK ] Stopped Create Static Device Nodes in /dev.
[ OK ] Reached target Shutdown.

(There are apparently a few more magic systemd targets and services that get invoked here without producing any console messages.)

The second stage starts when systemd transfers control (and the role of being PID 1) to the special systemd-shutdown program in order to do the final cleanup and shutdown of the system (the manual page describes why it exists and you can read the actual core code here). Simplified, systemd-shutdown SIGTERMs and then SIGKILLs all remaining processes and then enters a loop where it attempts to unmount any remaining filesystems, deactivate any remaining swap devices, and shut down remaining loop and DM devices. If all such things are gone or systemd-shutdown makes no progress at all, it goes on to do the actual reboot. Unless you turn on systemd debugging (and direct it to the console), systemd-shutdown is completely silent about all of this; it prints nothing when it starts and nothing as it runs. Normally this doesn't matter because it finishes immediately and without problems.

Based on the manpage, you might think that ShutdownWatchdogSec limits the total amount of time a shutdown can take and thus covers both of these stages. This is not the case; the only thing that ShutdownWatchdogSec does is put a watchdog timer on systemd-shutdown's end-of-things work in the second stage. Well, sort of. If you read the manpage, you'd probably think that the time you configure here is the time limit on the second stage as a whole, but actually it's only the time limit on each of those 'try to clean up remaining things' loops. systemd-shutdown resets the watchdog every time it starts a trip through the loop, so as long as it thinks it's making some progress, your shutdown can take much longer than you expect in sufficiently perverse situations. Or rather I should say your reboot. As the manual page specifically notes, the watchdog shutdown timer only applies to reboots, not to powering the system off.

(One consequence of what ShutdownWatchdogSec does and doesn't cover is that for most systems it's safe to set it to a very low timeout. If you get to the systemd-shutdown stage with any processes left, so many things have already been shut down that those processes are probably not going to manage an orderly shutdown in any case. We currently use 30 seconds and that's probably far too generous.)

To use ShutdownWatchdogSec, you need a kernel watchdog timer; you can tell if you have one by looking for /dev/watchdog and /dev/watchdogN devices. Kernel watchdog timers are created by a variety of modules that support a variety of hardware watchdogs, such as iTCO_wdt for the Intel TCO WatchDog that you probably have on your Intel-based server hardware. For our purposes here, the simplest and easiest to use kernel watchdog module is softdog, a software watchdog implemented at the kernel level. Softdog has the limitation that it doesn't help if the kernel itself hangs, which we don't really care about, but the advantage that it works everywhere (including in VMs) and seems to be quite reliable and predictable.

Some Linux distributions (such as Fedora) automatically load an appropriate kernel watchdog module depending on what hardware is available. Ubuntu 16.04 goes to the other extreme; it extensively blacklists all kernel watchdog modules, softdog included, so you can't even stick something in /etc/modules-load.d. To elide a long discussion, our solution to this was a new cslab-softdog.service systemd service that explicitly loaded the module using the following:

[Service]
Type=oneshot
RemainAfterExit=True
ExecStart=/sbin/modprobe softdog

With softdog loaded and ShutdownWatchdogSec configured, systemd appears to reliably reboot my test VMs and test hardware in situations where systemd-shutdown previously hung. It takes somewhat longer than my configured ShutdownWatchdogSec, apparently because softdog gives you an extra margin of time just in case, probably 60 seconds based on what modinfo says.

Sidebar: Limiting total shutdown time (perhaps)

As noted in comments on my first entry on our reboot problems, reboot.target and poweroff.target both normally have a JobTimeoutSec of 30 minutes. If my understanding of systemd is correct, setting any JobTimeoutSec here is supposed to force a reboot or poweroff if the first stage of shutdown takes that long (because rebooting is done by attempting to activate reboot.target, which is a systemd 'job', which causes the job timeout to matter).

Although I haven't tested it yet, this suggests that combining a suitably short JobTimeoutSec on reboot.target with ShutdownWatchdogSec would limit the total time your system will ever spend rebooting. Picking a good JobTimeoutSec value is not obvious; you want it long enough that daemons have time to shut down in an orderly way, but not so long that you end up going off to the machine room anyway. 30 minutes is clearly too long for us, but 30 seconds would probably be too short for most servers.

linux/SystemdShutdownWatchdog written at 02:28:17

2017-09-21

My potential qualms about using Python 3 in projects

I wrote recently about why I didn't use the attrs module for a recent project; the short version is that it would have forced my co-workers to learn about it in order to work on my code. Talking about this brings up a potentially awkward issue, namely Python 3. Just like the attrs module, working with Python 3 code involves learning some new things and dealing with some additional concerns. In light of this, is using Python 3 in code for work something that's justified?

This issue is relevant to me because I actually have Python 3 code these days. For one program, I had a concrete and useful reason to use Python 3 and doing so has probably had real benefits for our handling of incoming email. But for other code I've simply written it in Python 3 because I'm still kind of enthused about it and everyone (still) does say it's the right thing to do. And there's no chance that we'll be able to forget about Python 2, since almost all of our existing Python code uses Python 2 and isn't going to change.

However, my tentative view is that using Python 3 is a very different situation than the attrs module. To put it one way, it's quite possible to work with Python 3 without noticing. At a superficial level and for straightforward code, about the only difference between Python 3 and Python 2 is print("foo") versus print "foo". Although I've said nasty things about Python 3's automatic string conversions in the past, they do have the useful property that things basically just work in a properly formed UTF-8 environment, and most of the time that's what we have for sysadmin tools.

(Yes, this isn't robust against nasty input, and some tools are exposed to that. But many of our tools only process configuration files that we've created ourselves, which means that any problems are our own fault.)

Given that you can do a great deal of work on an existing piece of Python code without caring whether it's Python 2 or Python 3, the cost of using Python 3 instead of Python 2 is much lower than, for example, the cost of using the attrs module. Code that uses attrs is basically magic if you don't know attrs; code in Python 3 is just a tiny bit odd looking and it may blow up somewhat mysteriously if you do one of two innocent-seeming things.

(The two things are adding a print statement and using tabs in the indentation of a new or changed line. In theory the latter might not happen; in practice, most Python 3 code will be indented with spaces.)

In situations where using Python 3 allows some clear benefit, such as using a better version of an existing module, I think using Python 3 is pretty easily defensible; the cost is very likely to be low and there is a real gain. In situations where I've just used Python 3 because I thought it was neat and it's the future, well, at least the costs are very low (and I can argue that this code is ready for a hypothetical future where Python 2 isn't supported any more and we want to migrate away from it).

Sidebar: Sometimes the same code works in both Pythons

I wrote my latest Python code as a Python 3 program from the start. Somewhat to my surprise, it runs unmodified under Python 2.7.12 even though I made no attempt to make it do so. Some of this is simply luck, because it turns out that I was only ever invoking print() with a single argument. In Python 2, print("fred") is seen as 'print ("fred")', which is just 'print "fred"', which works fine. Had I tried to print() multiple arguments, the output would not have been what I wanted (Python 2 would have printed them as a tuple).

(I have only single-argument print()s because I habitually format my output with % if I'm printing out multiple things. There are times when I'll deviate from this, but it's not common.)
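Here's a concrete illustration of how narrow my luck was (this little example is mine, not code from the actual program):

# Fine under both Python 2 and Python 3: a single argument, formatted with %.
print("%s owns %d files" % ("cks", 42))

# Diverges: Python 3 prints 'cks 42', while Python 2 parses this as a
# parenthesized expression and prints the tuple ('cks', 42).
print("cks", 42)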

python/Python3LearningQualms written at 01:35:57

2017-09-20

Wireless is now critical (network) infrastructure

When I moved over to here a decade or so ago, we (the department) had a wireless network that was more or less glued together out of spare parts. One reason for this, beyond simply money, is that wireless networking was seen as a nice extra for us to offer to our users and thus not something we could justify spending a lot on. If we had to prioritize (and we did), wired networking was much higher up the heap than wireless. Wired networking was essential; the wireless was merely nice to have and offer.

I knew that wireless usage had grown and grown since then, of course; anyone who vaguely pays attention knows that, and the campus wireless people periodically share eye-opening statistics on how many active devices there are. You see tablets and smartphones all over (and I even have one of my own these days, giving me a direct exposure), and people certainly like using their laptops with wifi (even in odd places, although our machine room no longer has wireless access). But I hadn't really thought about the modern state of wireless until I got a Dell XPS 13 laptop recently and then the campus wireless networking infrastructure had some issues.

You see, the Dell XPS 13 has no onboard Ethernet, and it's not at all alone in that; most modern ultrabooks don't, for example. Tablets are obviously Ethernet-free, and any number of people around here use one as a major part of their working environment. People are even actively working through their phones. If the wireless network stops working, all of these people are up a creek and their work grinds to a halt. All of this has quietly turned wireless networking into relatively critical infrastructure. Fortunately our departmental wireless network is in much better shape now than it used to be, partly because we outsourced almost all of it to the university IT people who run the campus wireless network.

(We got USB Ethernet dongles for our recent laptops, but we're sysadmins with unusual needs, including plugging into random networks in order to diagnose problems. Not everyone with a modern laptop is going to bother, and not everyone who gets one is going to carry it around or remember where they put it or so on.)

This isn't a novel observation but it's something that's snuck up on me and before now has only been kind of an intellectual awareness. It wasn't really visceral until I took the XPS 13 out of the box and got to see the absence of an Ethernet port in person.

(The USB Ethernet dongle works perfectly well but it doesn't feel the same, partly because it's not a permanently attached part of the machine that is always there, the way the onboard wifi is.)

sysadmin/WirelessCriticalInfrastructure written at 01:22:01

2017-09-18

Looking back at my mixed and complicated feelings about Solaris

So Oracle killed Solaris (and SPARC) a couple of weeks ago. I can't say this is surprising, although it's certainly sudden and underhanded in the standard Oracle way. Back when Oracle killed Sun, I was sad for the death of a dream, despite having had ups and downs with Sun over the years. My views about the death of Solaris are more mixed and complicated, but I will summarize them by saying that I don't feel very sad about Solaris itself (although there are things to be sad about).

To start with, Solaris has been dead for me for a while, basically ever since Oracle bought Sun and certainly since Oracle closed the Solaris source. The Solaris that the CS department used for years in a succession of fileservers was very much a product of Sun the corporation, and I could never see Oracle's Solaris as the same thing or as a successor to it. Hearing that Oracle was doing things with Solaris was distant news; it had no relevance for us and pretty much everyone else.

(Every move Oracle made after absorbing Sun came across to me as a 'go away, we don't want your business or to expand Solaris usage' thing.)

But that's the smaller piece, because I have some personal baggage and biases around Solaris itself due to my history. I started using Sun hardware in the days of SunOS, where SunOS 3 was strikingly revolutionary and worked pretty well for the time. It was followed by SunOS 4, which was also quietly revolutionary even if the initial versions had some unfortunate performance issues on our servers (we ran SunOS 4.1 on a 4/490, complete with an unfortunate choice of disk interconnect). Then came Solaris 2, which I've described as a high speed collision between SunOS 4 and System V R4.

To people reading this today, more than a quarter century removed, this probably sounds like a mostly neutral thing or perhaps just messy (since I did call it a collision). But at the time it was a lot more. In the old days, Unix was split into two sides, the BSD side and the AT&T System III/V side, and I was firmly on the BSD side along with many other people at universities; SunOS 3 and SunOS 4 and the version of Sun that produced them were basically our standard bearers, not only for BSD's superiority at the time but also their big technical advances like NFS and unified virtual memory. When Sun turned around and produced Solaris 2, it was viewed as being tilted towards being a System V system, not a BSD system. Culturally, there was a lot of feeling that this was a betrayal and Sun had debased the nice BSD system they'd had by getting a lot of System V all over it. It didn't help that Sun was unbundling the compilers around this time, in an echo of the damage AT&T's Unix unbundling did.

(Solaris 2 was Sun's specific version of System V Release 4, which itself was the product of Sun and AT&T getting together to slam System V and BSD together into a unified hybrid. The BSD side saw System V R4 as 'System V with some BSD things slathered over top', as opposed to 'BSD with some System V things added'. This is probably an unfair characterization at a technical level, especially since SVR4 picked up a whole bunch of important BSD features.)

Had I actually used Solaris 2, I might have gotten over this cultural message and come to like and feel affection for Solaris. But I never did; our 4/490 remained on SunOS 4 and we narrowly chose SGI over Sun, sending me on a course to use Irix until we started switching to Linux in 1999 (at which point Sun wasn't competitive and Solaris felt irrelevant as a result). By the time I dealt with Solaris again in 2005, open source Unixes had clearly surpassed it for sysadmin usability; they had better installers, far better package management and patching, and so on. My feelings about Solaris never really improved from there, despite increasing involvement and use, although there were aspects I liked and of course I am very happy that Sun created ZFS, put it into Solaris 10, and then released it to the world as open source so that it could survive the death of Sun and Solaris.

The summary of all of that is that I'm glad that Sun created a number of technologies that wound up in successive versions of Solaris and I'm glad that Sun survived long enough to release them into the world, but I don't have fond feelings about Solaris itself the way that many people who were more involved with it do. I cannot mourn the death of Solaris itself the way I could for Sun, because for me Solaris was never a part of any dream.

(One part of that is that my dream of Unix was the dream of workstations, not the dream of servers. By the time Sun was doing interesting things with Solaris 10, it was clearly not the operating system of the Unix desktop any more.)

(On Solaris's death in general, see this and this.)

solaris/SolarisMixedFeelings written at 23:34:48

Sorting out the world of modern USB (at least a bit)

Part of thinking about new machines for home and work is figuring out what motherboard I want, and part of that is figuring out what I want and need in motherboard features. I've looked into how many SATA ports I want and what it will take to drive a 4K monitor with onboard graphics, so now I've been trying to figure out USB ports. Part of this is trying to understand the different sorts of USB ports that there are and what you can do with them.

(This would be easier if I'd kept up with all of the twists and turns in PC hardware standards, but I haven't.)

USB is a vast and complicated world, with both a set of signalling standards (the old USB 2.0, USB 3.0, and now USB 3.1 aka USB 3.1 gen 2) and a set of port shapes and sizes (the original USB-A and now USB-C) that may be combined in various ways. Fortunately I'm only interested in modern and non-perverse motherboards, so for me I believe that it breaks down this way:

  • old fashioned USB 2.0 ports (with black USB-A connectors) are too slow for disks but are (probably) fine for things like keyboards and mice. But I only need a few of these, and there's no need to have any USB 2.0 only ports if I have enough better USB ports.

  • USB 3.0 ports (often using blue USB-A connectors) are good enough for general usage (theoretically including disks) but are not the latest hotness. USB 3.0 is old enough that any decent modern (desktop) motherboard should really include a bunch of USB 3.0 ports. Even inexpensive H270 based motherboards have a number of them.

    USB 3.0 is not infrequently called 'USB 3.1 gen 1' in advertising and product specifications. This is technically correct but practically misleading, because it's not the type of USB 3.1 that you and I want if we care about USB 3.1.

  • USB 3.1 ports are either USB-C or USB-A, and you may need to look for things specifically described as 'USB 3.1 gen 2'. It's the latest hotness with the fastest connection speed (twice that of USB 3.0 aka USB 3.1 gen 1), but the more that I look the less I'm sure that this will matter to me for the next five years or so.

Then there is USB-C, the new (and small) connector standard for things. When I started writing this entry I thought life was simple and modern USB-C ports were always USB 3.1 (gen 2), but this is not actually the case. It appears not uncommon for H270 and Z270 based motherboards to have USB-C ports that are USB 3.0, not USB 3.1 (gen 2). It seems likely that over time more and more external devices will expect you to have USB-C connectors even if they don't use USB 3.1 (gen 2), which strongly suggests that any motherboard I get should have at least one USB-C port and ideally more.

(The state of connecting USB-C devices to USB-A ports is not clear to me. According to the Wikipedia page on USB-C, you aren't allowed to make an adaptor with a USB-C receptacle and a USB-A connector that will plug into a USB-A port. On the other hand, you can find a lot of cables with a USB-A connector on one end and a USB-C connector on the other end that are advertised as letting you connect USB-C devices to older USB-A devices, and some of them appear to support USB 3.1 gen 2 USB-A ports. There are devices that you plug USB-C cables into, and devices that basically have a USB-C cable or connector coming out of them; the former you can convert to USB-A but the latter you can't.)

USB-C ports may support something called alternate mode, where some of the physical wires in the port and the cable are used for another protocol instead of USB. Standardized specifications for this theoretically let your USB-C port be a DisplayPort or Thunderbolt port (among others). On a desktop motherboard, this seems far less useful than simply having, say, a DisplayPort connector; among other advantages, this means you get to drive your 4K display at the same time as you have a USB-C thing plugged in. As a result I don't think Alternate Mode support matters to me, which is handy because it seems to be very uncommon on desktop motherboards.

(Alternate Mode support is obviously attractive if you have limited space for connectors, such as on a laptop or a tablet, because it may let you condense multiple ports into one. And USB-C is designed to be a small connector.)

Intel's current H270 and Z270 chipsets don't appear to natively support USB 3.1 gen 2. This means that any support for it on latest-generation motherboards is added by the motherboard vendor using an add-on controller chipset, and I think you're unlikely to find it on inexpensive motherboards. It also means that I get to search carefully to find motherboards with genuine USB 3.1 gen 2, which is being a pain in the rear so far. An alternate approach would be to get USB 3.1 gen 2 through an add-on PCIE card (based on information from here); this might be a lot less of a pain than trying to find and select a suitable motherboard.

(As for how many of each type of port I need or want, I haven't counted them up yet. My current bias is towards at least two USB 3.1 gen 2 ports, at least one USB-C port, and a bunch of USB 3.0 ports. I probably have at least four or five USB 2.0 devices to be plugged in, although some can be daisy-chained to each other. I'm a little surprised by that count, but these things have proliferated while I wasn't paying attention. Everything is USB these days.)

tech/SortingOutModernUSB written at 00:27:30

2017-09-17

Why I didn't use the attrs module in a recent Python project

I've been hearing buzz about the attrs Python module for a while (for example). I was recently writing a Python program where I had some structures and using attrs to define the classes involved would have made the code shorter and more obvious. At first I was all fired up to finally use attrs, but then I took a step back and reluctantly decided that doing so would be the wrong choice.

You see, this was code for work, and while my co-workers can work in Python, they're not Python people in the way that I am. They're certainly not up on the latest Python things and developments; to them, Python is a tool and they're happy to let it be if they don't need to immerse themselves in it. Naturally, they don't know anything about the attrs module.

If I used attrs, the code would be a bit shorter (and it'd be neat to actually use it), but my co-workers would have to learn at least something about attrs before they could understand my code to diagnose problems, make changes, or otherwise work on it. Using straightforward structure-style classes is boring, but it's not that much more code and it's code that uses a well-established idiom that pretty much everyone is already familiar with.
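To make the tradeoff concrete, here's roughly what the two versions of a simple structure-style class look like (an illustrative sketch with made-up names, not the actual code from my program):

import attr

# The attrs version: shorter, but magic if you've never seen attrs.
@attr.s
class HostInfo(object):
    name = attr.ib()
    ip = attr.ib()
    netgroups = attr.ib(default=attr.Factory(list))

# The boring version that everyone can read at a glance.
class PlainHostInfo(object):
    def __init__(self, name, ip, netgroups=None):
        self.name = name
        self.ip = ip
        self.netgroups = netgroups if netgroups is not None else []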

Given this situation, I did the responsible thing and decided that my desire to play around with attrs was in no way a sufficient justification for inflicting another Python module on my co-workers to learn. Boring straightforward code has its advantages.

I can think of two things that would change this calculation. The first is if I needed more than just simple structure-style classes, so that attrs was saving me a significant chunk of code and making the code that remained much clearer. If I come out clearly ahead with attrs even after adding explanatory comments for my co-workers (or future me), then attrs is much more likely to be a win overall instead of just an indulgence.

(I think that the amount of usage and the size of the codebase matters too, but for us our codebases are small since we're just writing system utility programs and so on in Python.)

The second is if attrs usage becomes relatively widespread, so that my co-workers may well be encountering it in other people's Python code that we have to deal with, in online documentation, and so on. Then using attrs would add relatively little learning overhead and might even have become the normal idiom. This is part of why I feel much more free to use modules in the standard library than third-party modules; the former are, well, 'standard' in at least some sense.

(Mind you, these days I'm sufficiently out of touch with the Python world that I'm not sure how I'd find out if attrs was a big, common thing. Perhaps if Django started using and recommending it.)

python/AttrsLearningProblem written at 01:45:54
