Wandering Thoughts: Recent Entries For range/11-20

Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.

2013-05-15

Why I've so far been neglecting functional programming languages

Functional programming languages are in many ways the latest hotness and so for years I've been making off and on runs at things like yet another explanation of monads (which I think I sort of understand by now) and similar topics. Despite this, so far I've been almost completely uninterested in actually trying to write a functional program or exploring an FP language.

The big problem for me is that as far as I can tell, the kind of programs I usually work with are exactly the kind of programs that functional programming is stereotypically a bad fit with. The stereotype I've absorbed is that functional programming is quite a good fit for computation but not a good fit for IO, because IO intrinsically has side effects. Unfortunately most of what I write is all about IO and has little or no computation. Bashing a squarish peg into a roundish hole is unlikely to tell me anything particularly meaningful about how nice the language is to work in; what I really need is a roundish peg, a computational problem, and those are relatively scarce around here.

(It's possible that I'm not looking hard enough. For example, I do periodically want to do things like log analysis or event reassembly, where the original data could just as well be a predefined data structure in the program instead of processed from logfiles on disk. I suspect that a functional language would handle these fine, maybe better than ad-hoc hackery in awk, Python, or whatever. If I was really crazy I would try rewriting the logic in our ZFS spares handling system in an FP language to see if it got clearer; it's fundamentally a series of transformations of a tree and then some analysis of the result. The result might even be more testable.)

programming/WhyNotFunctional written at 00:56:36

2013-05-13

My language irritations with Go (so far) and why I'm wrong about them

The great thing about an evolving language is that if you're slow enough about writing up your irritations with it, some of them can wind up fixed (or part fixed). So this list is somewhat shorter than it was when I originally wrote my first Go program, and none of the irritations are major. Also, I will reluctantly concede that Go has good engineering reasons for all of them.

My largest single irritation is that break acts on switch and select; I expected it to act only on any enclosing control structure, so that you could write something like:

for {
   select {
   case <-mchan:
      // message silently swallowed
   case <-schan:
      // meant to leave the for loop; in Go this only leaves the select
      break
   }
}

Instead you have to invent a boolean loop condition or label the loop and use a labeled break. I understand why Go does this; it enables you to exit early out of a switch or select case instead of having to wrap everything in ever increasing levels of nesting. This is likely especially important because Go uses explicit error checking (which would otherwise force those nested if blocks).
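
For what it's worth, a labeled break is one way to get the behavior the loop above was after without a boolean flag. This is only a minimal sketch: mchan and schan are the channels from the example, while the drain function, its counter, and the Loop label are made up for illustration.

```go
package main

import "fmt"

// drain swallows messages from mchan until schan delivers a stop signal,
// then returns how many messages it swallowed. The name and signature are
// hypothetical, for illustration only.
func drain(mchan <-chan string, schan <-chan struct{}) int {
	swallowed := 0
Loop:
	for {
		select {
		case <-mchan:
			// message silently swallowed
			swallowed++
		case <-schan:
			// A bare 'break' here would only leave the select;
			// the label makes it leave the enclosing for loop.
			break Loop
		}
	}
	return swallowed
}

func main() {
	mchan := make(chan string)
	schan := make(chan struct{})
	done := make(chan int)
	go func() { done <- drain(mchan, schan) }()

	// The channels are unbuffered, so each send completes only once
	// drain has received the message.
	mchan <- "one"
	mchan <- "two"
	close(schan)
	fmt.Println("swallowed:", <-done)
}
```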

The issue that got partially fixed is Go's return requirements. When I wrote the original version of my program the natural form of one function was a big switch with a number of specific cases and then a default: to catch the rest; however, the original rules required a surplus return at the end of the function, which irritated me by forcing me to move the default case to the end of the function, obscuring the logic. The Go 1.1 changes make my particular case okay but I believe there remain cases where you need an unreachable ending return (or panic) to make the compiler happy.
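
To make both halves of this concrete, here is a small sketch based on my reading of the Go 1.1 'terminating statement' rules. The classify and sign functions are invented for illustration and are not taken from the original program.

```go
package main

import "fmt"

// classify keeps its default case in the middle of the switch. Under the
// Go 1.1 rules the switch is a terminating statement (it has a default
// case and every case ends in a return), so no trailing return is needed.
func classify(n int) string {
	switch {
	case n < 0:
		return "negative"
	default:
		return "positive"
	case n == 0:
		return "zero"
	}
}

// sign covers every possible int, but the final 'else if' has no else
// branch, so the if chain is not a terminating statement. The compiler
// insists on the unreachable panic (or a return) at the end.
func sign(n int) int {
	if n < 0 {
		return -1
	} else if n == 0 {
		return 0
	} else if n > 0 {
		return 1
	}
	panic("unreachable")
}

func main() {
	for _, n := range []int{-3, 0, 7} {
		fmt.Println(n, classify(n), sign(n))
	}
}
```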

You can make an argument that the original and current state of affairs are good software engineering. If the compiler did true reachability analysis it'd increase the number of cases where an innocent looking change to some part of the code would suddenly make the return coverage not be complete and thus produce potentially odd messages about missing returns. The current brute force rules protect against this and lead Go programmers to write in a certain sort of consistent style.

My final issue is my perennial one of being unable to cleanly cancel IO being done by goroutines, breaking them out of things so that they can see a death signal from outside. You can argue that this is a bug in the runtime, but the problem with this is that everything that calls an IO operation then needs to be aware of this particular error case (and catch it, and propagate it up the call stack in whatever way is appropriate). A good start to making it a bug in the runtime would be for the runtime to define a specific error for 'IO attempted on closed connection' and for absolutely everything to use it.

(As it stands, the net package doesn't even define a publicly visible error instance for this case, although it does define one internally. It's my personal view that this beautifully illustrates why this is a general language problem; while you can 'solve' it in code, it requires absolutely everyone to get it right and, well, they clearly don't.)
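
(Later Go releases did grow a public error for this case, net.ErrClosed, which was added in Go 1.16. What follows is a minimal sketch of the 'close it from outside' approach, assuming a release that has it. The readLoop and cancelByClosing names are hypothetical and the loopback TCP setup is only there to make the sketch self-contained.)

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// readLoop discards incoming data until a read fails. It returns nil when
// the failure is the connection having been closed on our side (the death
// signal) and the error itself for anything else.
func readLoop(conn net.Conn) error {
	buf := make([]byte, 512)
	for {
		if _, err := conn.Read(buf); err != nil {
			if errors.Is(err, net.ErrClosed) {
				return nil
			}
			return err
		}
	}
}

// cancelByClosing sets up a loopback TCP connection, hands one end to a
// goroutine that reads from it, and then closes that end from outside to
// break the goroutine out of Read. It returns whatever readLoop reported.
func cancelByClosing() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	defer client.Close()

	server, err := ln.Accept()
	if err != nil {
		return err
	}

	done := make(chan error, 1)
	go func() { done <- readLoop(server) }()

	// Closing the connection from outside is what makes the Read fail.
	server.Close()
	return <-done
}

func main() {
	fmt.Println("read loop result:", cancelByClosing())
}
```

Note that every caller of Read still has to know to check for this specific error, which is the complaint above.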

Again this is a software engineering tradeoff. Both the semantics and the runtime implementation of goroutines are undoubtedly vastly simplified because you don't have to worry about being able to signal or cancel a goroutine from outside itself. Outside of the program exiting, all of the interactions that a goroutine has with the outside world are initiated by itself, on its own terms. This makes it much easier to reason about the effects of a goroutine, especially if it's careful not to use global state.

programming/GoLanguageIrritations written at 23:39:13

The Unix philosophy is not an end in itself

Today I feel like opening a can of worms that I've alluded to before.

Here is something very important about the Unix philosophy (regardless of what exactly that is): the Unix philosophy was not conceived as an empty philosophy that was an end in itself. Instead it is above all a theory about how to make computers easy, powerful, and useful. This philosophy (or at least the things built by people following it at Bell Labs and elsewhere) has been extraordinarily successful, and I'm not just talking about Unix; concepts first pioneered in Unix and C now form core pieces of pretty much every computer system in the world.

But it's possible to take this too far. To put it one way, it's my strong view that the core goal of Unix is to be useful, not to be philosophically pure. The underlying purpose comes first and fitting how to be useful into 'the Unix way of doing things' comes second. If Unix has to be non-Unixy for a while (or even permanently) in order to be useful, then, well, I pick usefulness. Excessive minimalism and 'Unixness' for the sake of minimalism and Unixness is a kind of masochism.

(Of course the devil is in the details, as it always is. It's certainly possible to ruin Unix without getting anything worth it in exchange.)

What this biases me towards is an environment where one solves the problem first and then tries to make it fit into the traditional 'Unix way' second. Which is why part of me thinks that GNU sort's -h option is perfectly fine because it solves a real problem (and solves it now).

(The counterargument is that Unix cannot be all things to all people. As with all systems, at some point you have to draw a line and say 'this doesn't fit, you need to go elsewhere'. I don't know how to balance this. I do know that a certain amount of griping about 'the one true Unix way' and how (some) modern Unixes are ruining it reminds me an awful lot of the griping of Lisp adherents at the rise of Unix, and for that matter the griping of Unix people (myself sometimes included) at the rise of Windows and Macs.)

unix/UnixPhilosophyPurpose written at 00:29:34

2013-05-11

The consequences of importing a module twice

Back when I wrote about Python's relative import problem, I mentioned that only actually importing a module once can be important due to Python's semantics. Today I feel like discussing what these are and how much they can matter.

The straightforward thing that goes wrong if you manage to import a module twice (under two different names) is that any code in the module gets run twice, not once. Modules that run active code on import assume that this code is only going to be run once; running it again may result in various sorts of malfunctions.

At one level, modules that run code on import are relatively rare because people understand it's bad form for a simple import to have big side effects. At another level, various frameworks like Django effectively run code on module import in order to handle things like setting up models and view forms and so on; it's just that this code isn't directly visible in your module because it's hiding in framework metaclasses. But this issue is a signpost to the really big thing: function and class definitions are executable statements that are run at import time. The net effect is that when you import a module a second time the new import has a completely distinct set of functions, classes, exceptions, sentinel objects, and so on. They look identical to the versions from the first import but as far as Python is concerned they are completely distinct; fred.MyCls is not the same thing as mymod.fred.MyCls.

(This is the same effect that you get when you use reload() on a module.)

However, my guess is that this generally won't matter. Most Python code uses duck typing and the two distinct classes are identical as far as that goes. Use of things like specific exceptions, sentinel values, and imported classes is probably going to be confined to the modules that directly imported the dual-imported module and thus mostly hidden from the outside world (for example, it's usually considered bad manners to leak exceptions from a module that you imported into the outside world). In many cases even the objects from the imported module are going to be significantly confined to the importing module.

(One potentially bad thing is that if the module has an internal cache of some sort, you will get two copies of the cache and thus perhaps twice the memory use.)

python/DualImportProblems written at 22:16:08

2013-05-10

Illustrating the tradeoff of security versus usability

One of the sessions of the university's yearly technical conference that I went to today was on two-factor authentication using USB crypto tokens (augmented by software on the client). In the talk, it came up that token-aware software can notice when the USB token is removed and do things like de-authenticate you or break a VPN connection. It struck me that this creates a perfect illustration of the tradeoff between security and usability, which I will frame through a question:

When the screen locker activates, should a token-aware application break its authenticated connection to whatever it's talking to and deauthenticate the user, forcing them to reauthenticate by re-entering their token PIN when they come back to the machine? This is clearly the most secure option; otherwise there's no proof that the person who unlocked the screen and is now using the computer is the person who owns the USB token and passed the two-factor authentication earlier.

Some people are enthusiastically saying 'yes' right now. Now, imagine that you're using this two-factor system to authenticate your SSH connections to your servers. Does your opinion change? In fact, does your opinion change about how the system should behave if the token is removed?

The usability issue is pretty simple: tearing down VPNs and breaking SSH sessions and logging you out of applications is secure but disruptive. In some situations it would be actively dangerous, because you'd be interrupting something halfway through an operation (although in this sort of environment all sysadmins would rapidly start using screen or tmux everywhere in self defense). You probably don't want this disruption every time you step away from your machine to go to the office coffee pot, the washroom, or whatever. At the same time you don't want to leave your machine exposed with its screen unlocked.

(In fact the most secure thing to do would be to both lock your screen and take the USB crypto token with you. This is also likely to be maximally disruptive.)

It's worth noting that the more you use your USB token, the more disruptive this is. This is especially punishing to the power users who run authenticated applications all the time and who often or always have multiple ones active at once, possibly with complex state (such as sysadmins with SSH sessions). Unfortunately these may be exactly the people you want to be most secure.

It's tempting to say that the way to improve this situation is to improve the usability by suspending secured sessions instead of breaking them and deauthenticating the user; then users merely have to re-enter their PIN (hopefully only once) instead of re-opening all their secured applications and re-establishing their VPN and SSH connections and so on. In theory you can make this work. In practice, doing this securely requires that the server side of everything supports the equivalent of screen, letting you disconnect and later reconnect.

(If the suspension is done only by client software, bad guys can use various physical attacks to compromise an exposed machine, bypass the client suspension, and directly use the established VPN, SSH session, or whatever. You need the server software to force the client to re-authenticate.)

PS: I suspect that you can predict the result of having screen locker activation break sessions and deauthenticate people. For that matter, you can likely predict the result of having this happen when the USB token is removed (and it involves a surprising number of unattended USB tokens, especially in areas that people feel are physically secure (like lockable single-person offices)).

tech/SecurityVsUsabilityToken written at 23:39:17

Disk IO is what shatters the VM illusion for me right now

I use VMs on my office workstation as a far more convenient substitute for real hardware. In theory I could assemble a physical test machine or a group of them, hook them all up, install things on them, and so on; in practice I virtualize all of that. This means that what I want is the illusion of separate machines and for the most part that's what I get.

However, there's one area where the illusion breaks down and exposes that all of these machines are really just programs on my workstation, and that's disk IO. Because everything is on spinning rust right now (and worse, most of it is on a common set of spinning rust), disk IO in a VM has a clear and visible impact on me trying to do things on my workstation (and vice versa but I generally don't care as much about that). Unfortunately doing things like (re)installing operating systems and performing package updates does a lot of disk IO, often random disk IO.

(In practice neither RAM nor CPU usage breaks the illusion, partly because I have a lot of both and VMs don't claim all that much of either. It also helps that the RAM is essentially precommitted the moment I start a VM.)

The practical effect is that I generally have to restrict myself to one disk IO intensive thing at once, regardless of where it's happening. This is not exactly a fatal problem, but it is both irritating and a definite crack in the otherwise pretty good illusion that those VMs are separate machines.

(The illusion is strengthened because I don't interact with them through their nominal 'hardware' console; I do basically everything by ssh'ing in to them. This always seems a little bit Ouroboros-recursive, especially since they have an independent network presence.)

sysadmin/ShatteringVMIllusion written at 02:26:02

2013-05-08

Thoughts on when to replace disks in a ZFS pool

One of the morals that you can draw from our near miss that I described in yesterday's entry, where we might have lost a large pool if things had gone a bit differently, is that the right time to replace a disk with read errors is TODAY. Do not wait. Do not put it off because things are going okay and you see no ZFS-level errors after the dust settles. Replace it today because you never know what is going to happen to another disk tomorrow.

Well, maybe. Clearly the maximally cautious approach is to replace a disk any time it reports a hard read error (ie one that is seen at the ZFS layer) or SMART reports an error. But the problem with this for us is that we'd be replacing a lot of disks and at least some of them may be good (or at least perfectly workable). For read errors, our experience is that some but not all reported read errors are transient errors in that they don't happen again if you do something like (re)scrub the pool. And SMART error reports seem relatively uncorrelated with actual errors reported by the backend kernels or seen by ZFS.

In theory you could replace these potentially questionable disks, test them thoroughly, and return them to your spares pool if they pass your tests. In practice this would add more and more questionable disks to your spares pool and, well, do you really trust them completely? I wouldn't. This leaves either demoting them to some less important role (if you have one that can use a potentially significant number of disks, and maybe you do) or trying to return them to the vendor for a warranty claim (and I don't know if the vendor will take them back under that circumstance).

I don't have a good answer to this. Our current (new) approach is to replace disks that have persistent read errors. On the first read error we clear the error and schedule a pool scrub; if the disk then reports more read errors (during the scrub, before the scrub, or in the next while after the scrub), it gets replaced.
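
The decision rule above can be sketched as a tiny state tracker. This is an illustration only, not our actual tooling: the Policy type, the day-number timestamps, and especially the WatchDays window (standing in for 'the next while') are all assumptions made up for the example.

```go
package main

import "fmt"

// Action is what to do in response to a reported read error.
type Action string

const (
	ClearAndScrub Action = "clear the error and scrub the pool"
	Replace       Action = "replace the disk"
)

// Policy remembers, per disk, the day of the read error that started the
// current watch period. WatchDays is an assumed length for that period.
type Policy struct {
	WatchDays  int
	firstError map[string]int // disk name -> day the watch period started
}

// NewPolicy returns a Policy with the given (assumed) watch period.
func NewPolicy(watchDays int) *Policy {
	return &Policy{WatchDays: watchDays, firstError: make(map[string]int)}
}

// OnReadError decides what to do about a read error on disk seen on the
// given day. A first error (or one long after the last watch period)
// triggers a clear and scrub; a repeat inside the watch period means the
// errors are persistent and the disk gets replaced.
func (p *Policy) OnReadError(disk string, day int) Action {
	first, watching := p.firstError[disk]
	if watching && day-first <= p.WatchDays {
		return Replace
	}
	p.firstError[disk] = day
	return ClearAndScrub
}

func main() {
	p := NewPolicy(30)
	fmt.Println("disk-a day 1: ", p.OnReadError("disk-a", 1))
	fmt.Println("disk-a day 3: ", p.OnReadError("disk-a", 3))
	fmt.Println("disk-b day 3: ", p.OnReadError("disk-b", 3))
	fmt.Println("disk-b day 90:", p.OnReadError("disk-b", 90))
}
```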

(This updates some of our past thinking on when to replace disks. The general discussion there is still valid.)

solaris/ZFSDiskReplacementWhen written at 22:24:52

How ZFS resilvering saved us

I've said nasty things about ZFS before and I'll undoubtedly say some in the future, but today, for various reasons, I want to take the positive side and talk about how ZFS has saved us. While there are a number of ways that ZFS routinely saves us in the small, there's been one big near miss that stands out.

Our fundamental environment is ZFS pools with vdevs of mirror pairs of disks. This setup costs space but, among other things, it's safe from multi-disk failures unless you lose both sides of a single mirror pair (at which point you've lost a vdev and thus the entire pool). One day we came very close to that: one side of a mirror pair died more or less completely and then, as we were resilvering on to a spare disk, the other side of the mirror started developing read errors. This was especially bad because read errors generally had the effect of locking up this particular fileserver (for reasons we don't understand). That in turn was a serious problem because in Solaris 10 update 8, rebooting a locked up fileserver causes the pool resilver to lose all progress to date and start again from scratch.

ZFS resilver saved us here in two ways. The obvious way is that it didn't give up on the vdev when the second disk had some read errors. Many RAID systems would have shrugged their shoulders, declared the second disk bad too, and killed the RAID array (losing all data on it). ZFS was both able and willing to be selective, declaring only specific bits bad instead of ejecting the whole disk and destroying the pool.

(We were lucky in that no metadata was damaged, only file contents, and we had all of the damaged files in backups.)

The subtle way is how ZFS let us solve the problem of successfully resilvering the pool despite the fileserver's 'eventually lock up after enough read errors' behavior. Because ZFS told us what the corrupt files were when it found them and because ZFS only resilvers active data, we could watch the pool's status during the resilver, see what files were reported as having unrepairable problems, and then immediately delete them; this effectively fenced the bad spots on the disk off from the fileserver so that it wouldn't trip over them and explode (again). With a traditional RAID system and a whole-device resync it would have been basically impossible to fence the RAID resync away from the bad disk blocks. At a minimum this would have made the resync take much, much longer.

The whole experience was very nerve-wracking, because we knew we were only one glitch away from ZFS destroying a very large pool. But in the end ZFS got us through and we were able to tell users that we had very strong assurances that no other data had been damaged by the disk problems.

solaris/ZFSResilverSave written at 00:15:12

2013-05-07

Python's relative import problem

Back in this entry I bemoaned the fact that Python's syntax for relative imports ('from . import fred') is only valid inside modules. The reason to have it valid outside modules is fairly straightforward; it would allow you to import and run the same Python code whether or not you were doing 'import module.thing' from outside the module's directory or sitting inside the module's directory doing 'import thing'. The way things are in Python today, once you start using relative imports in your code it can only be used as a module (which has implications for it being somehow on your Python path and so on even while you're coding).

Unfortunately for me, I suspect that this restriction is not arbitrary. The problem that Python is probably worrying about is importing the same submodule twice under different names. The official Python semantics are that there is only one copy of a particular (sub)module and its module level code is run only once, even if the module is imported multiple times; imports after the first one simply return a cached reference.

(These semantics are important in a number of situations that may not be obvious, due to Python's execution model.)

However, Python has opted to do this based on the apparent (full) module name, not based on (say) remembering the file that a particular module was loaded from and not reloading the file. When you do a relative import inside a module, Python knows the full name of the new submodule you're importing (because it knows the full, module-included name of the code doing the relative import). When you do a relative import outside a module, Python has no such knowledge but it knows that in theory this code is part of a module. This opens up the possibility of double-importing a submodule (once under its full name and once under whatever magic name you make up for a non-module relative import). Python opts to be safe and block this by refusing to do a relative import unless it can (reliably) work out the absolute name.

(There are still plenty of ways to import a module twice but they all require you to actively do something bad, like add both a directory and one of its subdirectories to your Python path. Sadly this is quite easy because Python will automatically add things to the Python path for you under some common circumstances.)

python/RelativeImportProblem written at 00:54:18

2013-05-05

Unix is not necessarily Unixy

As I've written about before, in some quarters there is a habit of saying that everything added to Unix needs to be 'Unixy'. One of the many problems with this is that a number of aspects of Unix itself are not 'Unixy'. I don't mean that in a theoretical way, where we debate about whether a particular API or approach is really 'Unixy'. I mean that in a concrete sense, in that Bell Labs, generally regarded as the home of Unix and the people who understand its essential nature best, built various things differently than mainline Unix. In some cases they did this after mainline Unix had established something, which is a clear sign that they felt that other Unix developers had gotten it wrong.

(In the end their vision of the right way to do things was so extreme that they started over from scratch so they didn't have to worry about backwards compatibility. The result of that was Plan 9.)

The easiest place to see this is in the approach that Bell Labs took to networking. Unfortunately I don't believe that manual pages from post-V7 Research Unix are online, but the next best thing is the networking manual pages for Plan 9 (which has essentially the same interface from what I understand). Plan 9 networking is completely different from the BSD sockets API that is now the Unix standard; it is in large part much more high level. You can read about it in the Plan 9 dial(2) manpage, and a version of this interface without the Plan 9 bits has resurfaced in the Go net package's Dial() and Listen() APIs.
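
As a rough illustration of how high level this style of interface is, here is a sketch using Go's net.Listen and net.Dial, where the network and the address are just strings and there is no socket setup to be seen. The echoOnce helper is made up for the example, and the loopback address with port 0 is only there so that the sketch is self-contained.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// echoOnce starts a one-shot echo server on a loopback port picked by the
// system, dials it, sends msg, and returns what came back. It is a
// hypothetical helper for illustration.
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	// Server side: accept a single connection and echo everything back.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		io.Copy(conn, conn)
	}()

	// Client side: one call with a network name and an address string.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()

	if _, err := io.WriteString(conn, msg); err != nil {
		return "", err
	}
	reply := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, reply); err != nil {
		return "", err
	}
	return string(reply), nil
}

func main() {
	reply, err := echoOnce("hello")
	fmt.Println(reply, err)
}
```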

You can certainly argue that these APIs are fundamentally not comparable to the BSD sockets API because they're on a different level (the BSD sockets API is a kernel API, while most of the Plan 9 API is implemented in library code). But in a sense this is beside the point, which is that the Plan 9 API is how Bell Labs thought programs should do networking.

(You can also argue that the Plan 9 API is insufficient in practice and that programs need and want more control over networking than it offers. I'm sympathetic to this argument but it does open up a can of worms about when one should discount the Bell Labs view on 'what is Unix' and what can replace it.)

unix/UnixIsNotUnixy written at 23:37:01

This is part of CSpace, and is written by ChrisSiebenmann.
Twitter: @thatcks