Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.
2013-05-15
Why I've so far been neglecting functional programming languages

Functional programming languages are in many ways the latest hotness and so for years I've been making off and on runs at things like yet another explanation of monads (which I think I sort of understand by now) and similar topics. Despite this, so far I've been almost completely uninterested in actually trying to write a functional program or exploring an FP language.

The big problem for me is that as far as I can tell, the kind of programs I usually work with are exactly the kind of programs that functional programming is stereotypically a bad fit with. The stereotype I've absorbed is that functional programming is quite a good fit for computation but not a good fit for IO, because IO intrinsically has side effects. Unfortunately most of what I write is all about IO and has little or no computation. Bashing a squarish peg into a roundish hole is unlikely to tell me anything particularly meaningful about how nice the language is to work in; what I really need is a roundish peg, a computational problem, and those are relatively scarce around here.

(It's possible that I'm not looking hard enough. For example, I do periodically want to do things like log analysis or event reassembly, where the original data could just as well be a predefined data structure in the program instead of processed from logfiles on disk. I suspect that a functional language would handle these fine, maybe better than ad-hoc hackery in awk, Python, or whatever. If I was really crazy I would try rewriting the logic in our ZFS spares handling system in an FP language to see if it got clearer; it's fundamentally a series of transformations of a tree and then some analysis of the result. The result might even be more testable.)
programming/WhyNotFunctional written at 00:56:36
2013-05-13
My language irritations with Go (so far) and why I'm wrong about them

The great thing about an evolving language is that if you're slow enough about writing up your irritations with it, some of them can wind up fixed (or partly fixed). So this list is somewhat shorter than it was when I originally wrote my first Go program, and none of the irritations are major. Also, I will reluctantly concede that Go has good engineering reasons for all of them.

My largest single irritation is that

Instead you have to invent a boolean loop condition. I understand why Go does this; it enables you to exit early out of a

The issue that got partially fixed is Go's return requirements. When I wrote the original version of my program the natural form of one function was a big switch with a number of specific cases and then a

You can make an argument that the original and current state of affairs are good software engineering. If the compiler did true reachability analysis it'd increase the number of cases where an innocent looking change to some part of the code would suddenly make the

My final issue is my perennial one of being unable to cleanly cancel IO being done by goroutines, breaking them out of things so that they can see a death signal from outside. You can argue that this is a bug in the runtime, but the problem with this is that everything that calls an IO operation then needs to be aware of this particular error case (and catch it, and propagate it up the call stack in whatever way is appropriate). A good start to making it a bug in the runtime would be for the runtime to define a specific error for 'IO attempted on closed connection' and for absolutely everything to use it. (As it stands, the

Again this is a software engineering tradeoff. Both the semantics and the runtime implementation of goroutines are undoubtedly vastly simplified because you don't have to worry about being able to signal or cancel a goroutine from outside itself. Outside of the program exiting, all of the interactions that a goroutine has with the outside world are initiated by itself, on its own terms. This makes it much easier to reason about the effects of a goroutine, especially if it's careful not to use global state.
The Unix philosophy is not an end to itself

Today I feel like opening a can of worms that I've alluded to before. Here is something very important about the Unix philosophy (regardless of what exactly that is): the Unix philosophy was not conceived as an empty philosophy that was an end to itself. Instead it is above all a theory about how to make computers easy, powerful, and useful. This philosophy (or at least the things built by people following it at Bell Labs and elsewhere) has been extraordinarily successful, and I'm not just talking about Unix; concepts first pioneered in Unix and C now form core pieces of pretty much every computer system in the world.

But it's possible to take this too far. To put it one way, it's my strong view that the core goal of Unix is to be useful, not to be philosophically pure. The underlying purpose comes first and fitting how to be useful into 'the Unix way of doing things' comes second. If Unix has to be non-Unixy for a while (or even permanently) in order to be useful, then, well, I pick usefulness. Excessive minimalism and 'Unixness' for the sake of minimalism and Unixness is a kind of masochism.

(Of course the devil is in the details, as it always is. It's certainly possible to ruin Unix without getting anything worth it in exchange.)

What this biases me towards is an environment where one solves the problem first and then tries to make it fit into the traditional 'Unix way' second. Which is why part of me thinks that GNU sort's

(The counterargument is that Unix cannot be all things to all people. As with all systems, at some point you have to draw a line and say 'this doesn't fit, you need to go elsewhere'. I don't know how to balance this. I do know that a certain amount of griping about 'the one true Unix way' and how (some) modern Unixes are ruining it reminds me an awful lot of the griping of Lisp adherents at the rise of Unix, and for that matter the griping of Unix people (myself sometimes included) at the rise of Windows and Macs.)
unix/UnixPhilosophyPurpose written at 00:29:34
2013-05-11
The consequences of importing a module twice

Back when I wrote about Python's relative import problem, I mentioned that only actually importing a module once can be important due to Python's semantics. Today I feel like discussing what these are and how much they can matter.

The straightforward thing that goes wrong if you manage to import a module twice (under two different names) is that any code in the module gets run twice, not once. Modules that run active code on import assume that this code is only going to be run once; running it again may result in various sorts of malfunctions.

At one level, modules that run code on import are relatively rare because people understand it's bad form for a simple import to have big side effects. At another level, various frameworks like Django effectively run code on module import in order to handle things like setting up models and view forms and so on; it's just that this code isn't directly visible in your module because it's hiding in framework metaclasses. But this issue is a signpost to the really big thing: function and class definitions are executable statements that are run at import time. The net effect is that when you import a module a second time the new import has a completely distinct set of functions, classes, exceptions, sentinel objects, and so on. They look identical to the versions from the first import but as far as Python is concerned they are completely distinct;

(This is the same effect that you get when you use

However, my guess is that this generally won't matter. Most Python code uses duck typing and the two distinct classes are identical as far as that goes. Use of things like specific exceptions, sentinel values, and imported classes is probably going to be confined to the modules that directly imported the dual-imported module and thus mostly hidden from the outside world (for example, it's usually considered bad manners to leak exceptions from a module that you imported into the outside world). In many cases even the objects from the imported module are going to be significantly confined to the importing module.

(One potentially bad thing is that if the module has an internal cache of some sort, you will get two copies of the cache and thus perhaps twice the memory use.)
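Here is a minimal sketch of the 'completely distinct' effect, assuming Python 3 (the importlib.util calls postdate this entry) and a throwaway module file. The names demo, pkg_demo, Oops, fail, and load_as are all made up for illustration. It loads one source file under two module names and then checks whether the two copies of the exception class have anything to do with each other.

```python
import importlib.util
import os
import tempfile

# Source for a small hypothetical module that defines its own exception.
SOURCE = '''
print("module body running as", __name__)

class Oops(Exception):
    pass

def fail():
    raise Oops("raised by " + __name__)
'''


def load_as(name, path):
    """Run the file at path as a fresh module called name.

    This deliberately does not consult sys.modules, so every call
    re-executes the module body.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.py")
    with open(path, "w") as f:
        f.write(SOURCE)
    first = load_as("demo", path)       # body runs once
    second = load_as("pkg_demo", path)  # body runs again

# Same source text, but two separate class objects.
same_class = first.Oops is second.Oops

# An exception from one copy is not caught by the other copy's class.
try:
    second.fail()
    caught_by_first = None
except first.Oops:
    caught_by_first = True
except second.Oops:
    caught_by_first = False
```

With this sketch, `same_class` and `caught_by_first` both come out false: the exception raised by the second copy sails straight past `except first.Oops`. Code that only duck types the objects would not notice any difference.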
python/DualImportProblems written at 22:16:08
2013-05-10
Illustrating the tradeoff of security versus usability

One of the sessions of the university's yearly technical conference that I went to today was on two-factor authentication using USB crypto tokens (augmented by software on the client). In the talk, it came up that token-aware software can notice when the USB token is removed and do things like de-authenticate you or break a VPN connection. It struck me that this creates a perfect illustration of the tradeoff between security and usability, which I will frame through a question:

When the screen locker activates, should a token-aware application break its authenticated connection to whatever it's talking to and deauthenticate the user, forcing them to reauthenticate by re-entering their token PIN when they come back to the machine?

This is clearly the most secure option; otherwise there's no proof that the person who unlocked the screen and is now using the computer is the person who owns the USB token and passed the two-factor authentication earlier. Some people are enthusiastically saying 'yes' right now.

Now, imagine that you're using this two-factor system to authenticate your SSH connections to your servers. Does your opinion change? In fact, does your opinion change about how the system should behave if the token is removed?

The usability issue is pretty simple: tearing down VPNs and breaking SSH sessions and logging you out of applications is secure but disruptive. In some situations it would be actively dangerous, because you'd be interrupting something halfway through an operation (although in this sort of environment all sysadmins would rapidly start using

(In fact the most secure thing to do would be to both lock your screen and take the USB crypto token with you. This is also likely to be maximally disruptive.)

It's worth noting that the more you use your USB token, the more disruptive this is. This is especially punishing to the power users who run authenticated applications all the time and who often or always have multiple ones active at once, possibly with complex state (such as sysadmins with SSH sessions). Unfortunately these may be exactly the people you want to be most secure.

It's tempting to say that the way to improve this situation is to improve the usability by suspending secured sessions instead of breaking them and deauthenticating the user; then users merely have to re-enter their PIN (hopefully only once) instead of re-opening all their secured applications and re-establishing their VPN and SSH connections and so on. In theory you can make this work. In practice, doing this securely requires that the server side of everything supports the equivalent of screen, letting you disconnect and later reconnect.

(If the suspension is done only by client software, bad guys can use various physical attacks to compromise an exposed machine, bypass the client suspension, and directly use the established VPN, SSH session, or whatever. You need the server software to force the client to re-authenticate.)

PS: I suspect that you can predict the result of having screen locker activation cause sessions to be broken and people to be deauthenticated. For that matter, you can likely predict the result of having this happen when the USB token is removed (and it involves a surprising number of unattended USB tokens, especially in areas that people feel are physically secure (like lockable single-person offices)).
Disk IO is what shatters the VM illusion for me right now

I use VMs on my office workstation as a far more convenient substitute for real hardware. In theory I could assemble a physical test machine or a group of them, hook them all up, install things on them, and so on; in practice I virtualize all of that. This means that what I want is the illusion of separate machines and for the most part that's what I get.

However, there's one area where the illusion breaks down and exposes that all of these machines are really just programs on my workstation, and that's disk IO. Because everything is on spinning rust right now (and worse, most of it is on a common set of spinning rust), disk IO in a VM has a clear and visible impact on me trying to do things on my workstation (and vice versa but I generally don't care as much about that). Unfortunately, doing things like (re)installing operating systems and performing package updates involves a lot of disk IO, often random disk IO.

(In practice neither RAM nor CPU usage breaks the illusion, partly because I have a lot of both and VMs don't claim all that much of either. It also helps that the RAM is essentially precommitted the moment I start a VM.)

The practical effect is that I generally have to restrict myself to one disk IO intensive thing at once, regardless of where it's happening. This is not exactly a fatal problem, but it is both irritating and a definite crack in the otherwise pretty good illusion that those VMs are separate machines.

(The illusion is increased because I don't interact with them with their nominal 'hardware' console; I do basically everything by ssh'ing in to them. This always seems a little bit Ouroboros-recursive, especially since they have an independent network presence.)
sysadmin/ShatteringVMIllusion written at 02:26:02
2013-05-08
Thoughts on when to replace disks in a ZFS pool

One of the morals that you can draw from our near miss that I described in yesterday's entry, where we might have lost a large pool if things had gone a bit differently, is that the right time to replace a disk with read errors is TODAY. Do not wait. Do not put it off because things are going okay and you see no ZFS-level errors after the dust settles. Replace it today because you never know what is going to happen to another disk tomorrow.

Well, maybe. Clearly the maximally cautious approach is to replace a disk any time it reports a hard read error (ie one that is seen at the ZFS layer) or SMART reports an error. But the problem with this for us is that we'd be replacing a lot of disks and at least some of them may be good (or at least perfectly workable). For read errors, our experience is that some but not all reported read errors are transient errors in that they don't happen again if you do something like (re)scrub the pool. And SMART error reports seem relatively uncorrelated with actual errors reported by the backend kernels or seen by ZFS.

In theory you could replace these potentially questionable disks, test them thoroughly, and return them to your spares pool if they pass your tests. In practice this would add more and more questionable disks to your spares pool and, well, do you really trust them completely? I wouldn't. This leaves either demoting them to some less important role (if you have one that can use a potentially significant number of disks, and maybe you do) or trying to return them to the vendor for a warranty claim (and I don't know if the vendor will take them back under that circumstance).

I don't have a good answer to this. Our current (new) approach is to replace disks that have persistent read errors. On the first read error we clear the error and schedule a pool scrub; if the disk then reports more read errors (during the scrub, before the scrub, or in the next while after the scrub), it gets replaced.

(This updates some of our past thinking on when to replace disks. The general discussion there is still valid.)
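The 'persistent read errors' rule can be written down as a tiny decision function. This is only a sketch under stated assumptions, not our actual tooling: the function name, the event list format, and the disk names are hypothetical, and it simplifies 'in the next while after the scrub' into 'any later read error at all'.

```python
def disks_to_replace(read_error_events):
    """Apply the two-strike rule to a time-ordered list of disk names.

    Each entry stands for one read error seen at the ZFS layer.
    Assumption: probation never expires, which is stricter than
    'in the next while after the scrub'.
    """
    on_probation = set()  # first error seen; error cleared, scrub scheduled
    replace = []
    for disk in read_error_events:
        if disk not in on_probation:
            # First strike: this is where the real system would clear
            # the error and schedule a pool scrub.
            on_probation.add(disk)
        elif disk not in replace:
            # Second strike: the errors are persistent, so replace it.
            replace.append(disk)
    return replace


# Hypothetical disk names; only the one with repeated errors is flagged.
flagged = disks_to_replace(["c1t0d0", "c2t0d0", "c1t0d0", "c1t0d0"])
```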
solaris/ZFSDiskReplacementWhen written at 22:24:52
How ZFS resilvering saved us

I've said nasty things about ZFS before and I'll undoubtedly say some in the future, but today, for various reasons, I want to take the positive side and talk about how ZFS has saved us. While there are a number of ways that ZFS routinely saves us in the small, there's been one big near miss that stands out.

Our fundamental environment is ZFS pools with vdevs of mirror pairs of disks. This setup costs space but, among other things, it's safe from multi-disk failures unless you lose both sides of a single mirror pair (at which point you've lost a vdev and thus the entire pool). One day we came very close to that: one side of a mirror pair died more or less completely and then, as we were resilvering on to a spare disk, the other side of the mirror started developing read errors. This was especially bad because read errors generally had the effect of locking up this particular fileserver (for reasons we don't understand). The lockups in turn were particularly bad because in Solaris 10 update 8, rebooting a locked up fileserver causes the pool resilver to lose all progress to date and start again from scratch.

ZFS resilver saved us here in two ways. The obvious way is that it didn't give up on the vdev when the second disk had some read errors. Many RAID systems would have shrugged their shoulders, declared the second disk bad too, and killed the RAID array (losing all data on it). ZFS was both able and willing to be selective, declaring only specific bits bad instead of ejecting the whole disk and destroying the pool.

(We were lucky in that no metadata was damaged, only file contents, and we had all of the damaged files in backups.)

The subtle way is how ZFS let us solve the problem of successfully resilvering the pool despite the fileserver's 'eventually lock up after enough read errors' behavior. Because ZFS told us what the corrupt files were when it found them and because ZFS only resilvers active data, we could watch the pool's status during the resilver, see what files were reported as having unrepairable problems, and then immediately delete them; this effectively fenced the bad spots on the disk off from the fileserver so that it wouldn't trip over them and explode (again). With a traditional RAID system and a whole-device resync it would have been basically impossible to fence the RAID resync away from the bad disk blocks. At a minimum this would have made the resync take much, much longer.

The whole experience was very nerve-wracking, because we knew we were only one glitch away from ZFS destroying a very large pool. But in the end ZFS got us through and we were able to tell users that we had very strong assurances that no other data had been damaged by the disk problems.
solaris/ZFSResilverSave written at 00:15:12
2013-05-07
Python's relative import problem

Back in this entry I bemoaned the fact that Python's syntax for relative imports ('

Unfortunately for me, I suspect that this restriction is not arbitrary. The problem that Python is probably worrying about is importing the same submodule twice under different names.

The official Python semantics are that there is only one copy of a particular (sub)module and its module level code is run only once, even if the module is imported multiple times; imports after the first one simply return a cached reference. (These semantics are important in a number of situations that may not be obvious, due to Python's execution model.) However, Python has opted to do this based on the apparent (full) module name, not based on (say) remembering the file that a particular module was loaded from and not reloading the file.

When you do a relative import inside a module, Python knows the full name of the new submodule you're importing (because it knows the full, module-included name of the code doing the relative import). When you do a relative import outside a module, Python has no such knowledge but it knows that in theory this code is part of a module. This opens up the possibility of double-importing a submodule (once under its full name and once under whatever magic name you make up for a non-module relative import). Python opts to be safe and block this by refusing to do a relative import unless it can (reliably) work out the absolute name.

(There are still plenty of ways to import a module twice but they all require you to actively do something bad, like add both a directory and one of its subdirectories to your Python path. Sadly this is quite easy because Python will automatically add things to the Python path for you under some common circumstances.)
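The 'cached by name, not by file' behaviour can be seen with a small experiment. This is a sketch assuming Python 3 and a temporary directory; the package and module names (dualdemo_pkg, dualdemo_sub) are invented for the example. It does the 'something bad' from the last paragraph by putting both a directory and one of its subdirectories on sys.path.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Lay out a hypothetical package:  top/dualdemo_pkg/dualdemo_sub.py
    pkgdir = os.path.join(top, "dualdemo_pkg")
    os.mkdir(pkgdir)
    open(os.path.join(pkgdir, "__init__.py"), "w").close()
    with open(os.path.join(pkgdir, "dualdemo_sub.py"), "w") as f:
        f.write("print('module code running as', __name__)\n")

    # Both the top directory and the package directory go on the path.
    sys.path[:0] = [top, pkgdir]
    importlib.invalidate_caches()
    try:
        import dualdemo_pkg.dualdemo_sub           # runs the module code
        import dualdemo_pkg.dualdemo_sub as again  # same name: cached
        import dualdemo_sub                        # new name: runs again
    finally:
        sys.path.remove(top)
        sys.path.remove(pkgdir)

    # Check this while the files still exist on disk.
    same_file = os.path.samefile(dualdemo_pkg.dualdemo_sub.__file__,
                                 dualdemo_sub.__file__)

# The repeat import by the same name returned the cached module object...
cached = again is sys.modules["dualdemo_pkg.dualdemo_sub"]
# ...but the same file under a second name is a second module object.
distinct = dualdemo_sub is not dualdemo_pkg.dualdemo_sub
```

In this sketch the module's print runs twice, once per name, even though `same_file` confirms that both modules came from one file.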
2013-05-05
Unix is not necessarily Unixy

As I've written about before, in some quarters there is a habit of saying that everything added to Unix needs to be 'Unixy'. One of the many problems with this is that a number of aspects of Unix itself are not 'Unixy'.

I don't mean that in a theoretical way, where we debate about whether a particular API or approach is really 'Unixy'. I mean that in a concrete sense, in that Bell Labs, generally regarded as the home of Unix and the people who understand its essential nature best, built various things differently than mainline Unix. In some cases they did this after mainline Unix had established something, which is a clear sign that they felt that other Unix developers had gotten it wrong.

(In the end their vision of the right way to do things was so extreme that they started over from scratch so they didn't have to worry about backwards compatibility. The result of that was Plan 9.)

The easiest place to see this is in the approach that Bell Labs took to networking. Unfortunately I don't believe that manual pages from post-V7 Research Unix are online, but the next best thing is the networking manual pages for Plan 9 (which has essentially the same interface from what I understand). Plan 9 networking is completely different from the BSD sockets API that is now the Unix standard; it is in large part much more high level. You can read about it in the Plan 9 dial(2) manpage, and a version of this interface without the Plan 9 bits has resurfaced in the Go

You can certainly argue that these APIs are fundamentally not comparable to the BSD sockets API because they're on a different level (the BSD sockets API is a kernel API, while most of the Plan 9 API is implemented in library code). But in a sense this is beside the point, which is that the Plan 9 API is how Bell Labs thought programs should do networking.

(You can also argue that the Plan 9 API is insufficient in practice and that programs need and want more control over networking than it offers. I'm sympathetic to this argument but it does open up a can of worms about when one should discount the Bell Labs view on 'what is Unix' and what can replace it.)
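To make the difference in level concrete, here is a rough sketch in Python rather than Plan 9 C. It is not Plan 9's dial() and makes no claim about its exact semantics; it only contrasts spelling out the BSD sockets steps by hand with a single 'give me a connection to this host and port' call, for which Python's standard socket.create_connection() stands in. The helper names and the loopback test server are made up for the example.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection on listener and send a one-line greeting."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"hello\n")


def connect_bsd_style(host, port):
    """Do by hand what the BSD sockets API leaves to the caller."""
    last_error = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        try:
            sock.connect(addr)
            return sock
        except OSError as e:
            last_error = e
            sock.close()
    raise last_error


def connect_dial_style(host, port):
    """One high-level call; name lookup and retries happen in the library."""
    return socket.create_connection((host, port))


# A throwaway server on the loopback interface, on a port the OS picks.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(2)
host, port = listener.getsockname()

replies = []
for connect in (connect_bsd_style, connect_dial_style):
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    with connect(host, port) as sock, sock.makefile("rb") as f:
        replies.append(f.readline())
    server.join()
listener.close()
```

Both functions end up with the same connected socket. The point is only how much of the work the calling program has to know about.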
unix/UnixIsNotUnixy written at 23:37:01
These are my WanderingThoughts. This is part of CSpace, and is written by ChrisSiebenmann.