Wandering Thoughts archives


Why I can't see IPv6 as a smooth or fast transition

Today I got native IPv6 up at home. My home ISP had previously been doing tunneled IPv6 (over IPv4), except that I'd turned my tunnel off back in June for some reason (I think something broke and I just shrugged and punted). I enjoyed the feeling of doing IPv6 right for a few hours, and then, well:

@thatcks: The glorious IPv6 future: with IPv6 up, Google searches sometimes just cut off below the initial banner and search box.
For bonus points, the searches aren't even going over IPv6. Tcpdump says Google appears to RSET my HTTPS TCPv4 connections sometimes.

(Further staring at packet traces makes me less certain of what's going on, although there are definitely surprise RST packets in there. Also, when I said 'IPv6 up', I was being imprecise; what makes a difference is only whether or not I have an active IPv6 default route so that my IPv6 traffic can get anywhere. Add the default route (out my PPPoE DSL link) and the problems start to happen; delete it and everything is happy.)

Every so often someone says that the networking world should get cracking on the relatively simple job of adopting and adding IPv6 everywhere. Setting aside anything else involved, what happened to me today is why I laugh flatly at anyone who thinks this. IPv6 is simple only if everything works right, but we have plenty of existence proofs that it does not. Enabling IPv6 in a networking environment is a great way to have all sorts of odd problems come crawling out of the woodwork, some of which don't seem like they have anything to do with IPv6 at all.

It would be nice if these problems and stumbling points didn't happen, and certainly in the nice shiny IPv6 story they're not supposed to. But they do, and combined with the fact that IPv6 is often merely nice, not beneficial, I think many networks won't be moving very fast on IPv6. This makes a part of me sad, but it's the same part of me that thinks that problems like mine just shouldn't happen.

(I don't think I'm uniquely gifted in stumbling over IPv6 related problems, although I certainly do seem to have bad luck with it.)

tech/IPv6ComplicationsAgain written at 03:09:58; Add Comment


Maybe I should try to find another good mini keyboard

As I've mentioned a few times before, I've been using one particular mini keyboard for a very long time now and I've become very attached to it. It's thoroughly out of production (although I have spares) and worse, it uses a PS/2 interface which presents problems in the modern world. One solution is certainly to go to a lot of work to keep on using it anyways, but I've been considering if perhaps I shouldn't try to find a modern replacement instead.

Some people are very attached to very specific keyboards for hard to replicate reasons; just ask any strong fan of the IBM Model M. But I'm not really one of them. I'm attached to having a mini keyboard that's not too minimal (the Happy Hacking keyboard is too far) and has a reasonably sensible key layout, and I'd like to not have space eaten up by a Windows key that I have no use for, but I'm not attached to the BTC-5100C itself. It just happened to be the best mini keyboard we found back fifteen or more years ago when we looked around for them, or at least the best one that was reasonably widely available and written about.

The keyboard world has come a long way in the past fifteen years or so. The Internet has really enabled enthusiasts to connect with each other and for specialist manufacturers to serve them and to spread awareness of their products, making niche products much more viable and thus available. And while I like the BTC-5100C, I suspect that it is not the ultimate keyboard in terms of key feel and niceness for typing; even at the time it was new, it was not really a premium keyboard. In specific, plenty of people feel that mechanical keyboards are the best way to go and there are certainly any number of mechanical mini keyboards (as I've seen on the periodic occasions when I do Internet searches about this).

So I've been considering trying a USB mechanical mini keyboard, just as I've sometimes toyed with getting a three button mouse with a scroll wheel. So far what's been stopping me has been the same thing in both cases, namely how much these things cost. I think I'm willing to pay $100 for a good keyboard I like that'll probably last the near side of forever, but it's hard to nerve myself up to spending that much money without being certain first.

(Of course, some or many places offer N-day money back guarantees. While shipping things back is likely to be kind of a pain, perhaps I should bite the bullet and just do it. Especially since I have a definite history of hesitating on hardware upgrades that turn out to be significant. One of the possible keyboards is even Canadian.)

(Of course there's a Reddit board for mechanical keyboards. I'll have to read through their pages.)

Sidebar: What I want in a mini keyboard layout

Based on my experiences with trying out a Happy Hacking keyboard once (and a few other mini keyboards), my basic requirements are:

  • a separate row of function keys for F1 through F10. I simply use function keys too much to be satisfied with a very-mini layout that only has four rows (numbers and then the Q/A/Z letter rows) and gets at function keys via an 'FN' modifier key.

  • actual cursor keys; again, I use them too much to be happy having to shift with something to get them.

  • Backspace and Delete as separate keys. I can live with shifted Insert.
  • Esc as a real (unshifted) key. Vi people know why.

  • A SysRq key being available somehow, as I want to keep on being able to use Linux's magic SysRq key combos. This implies that I actually have to be able to use Alt + SysRq + letters and numbers.

    (I may have to give this up.)

(I think this is called a '75%' layout on Reddit.)

A sensible location for Esc would be nice but frankly I've given up on that; people have been moving Esc off to the outer edges of the keyboard for decades. The last keyboard I saw with a good layout there was the NCD Unix keyboard (which I now consider too big).

The good thing about having these basic requirements is that I can actually rule out a lot of keyboards based purely on looking at pictures of them, without having to hunt down reviews or commentary or the like.

tech/MiniKeyboardContemplation written at 02:14:35; Add Comment


On not having people split their time between different bosses

In some places, it is popular (or occasionally done) to say something like 'well, this area only has the money for 1/3rd of a sysadmin, and this area has the money for 2/3rds of a sysadmin, so I know: we'll hire one sysadmin and split her up'. It is my personal view that this is likely to be a mistake, especially as often implemented. There are at least two pathologies you can run into here.

The basic pathology is that humans are frequently terrible at tracking their own time, so it is quite likely that you are not going to wind up with the time split that you intended. Without strong work against it, it's easy to get pulled towards one side because it's more interesting, clearly needs you more, or the like, and then have that side take over a disproportionate amount of your time. Perhaps time splitting might go well if your one sysadmin is a senior sysadmin with a lot of practical experience at doing this and a collection of tools and tricks for making it work. If your one sysadmin is a junior sysadmin thrown into the lion's cage with no support, guidance, tools, or monitoring, well, you're probably going to get about the results that you should expect.

The more advanced pathology is that you are putting the sysadmin in the unhappy position of having to tell people no for purely bureaucratic reasons (or to go over and above their theoretical work hours), because sooner or later one of the areas is going to want more work than fits in the X amount of the sysadmin that they are entitled to. At that point the sysadmin is supposed to say 'sorry, I switch over to area Q now, I know that you feel that your work is quite important, maybe more important than area Q's work, but I am not supposed to spend any more time on you until next week'. This is going to make people unhappy with the sysadmin, which is a stressful and unpleasant experience for them. People don't like inflicting those experiences on themselves.

(The actual practical result is likely to be either overwork or that once again the actual time split is not the time split you intended.)

I feel strongly that the consequence of both pathologies is that management or at least team leadership should be deeply involved in any such split-sysadmin situation. Management should be the ones saying 'no' to areas (and taking the heat for it), not sysadmins, and management should be monitoring the situation (and providing support and mentoring) to make sure the time is actually winding up being split the way it's intended.

(There are structural methods of achieving this, such as having areas 'purchase' X hours of work through budget/chargeback mechanisms, but they have their own overheads such as time tracking software.)

If you like, of course, you can instead blame the sysadmin for doing things wrong or not properly dividing her time or the like. This is the 'human error' explanation of problems and as always it is not likely to give you a solution to the problem. It will give you a great excuse to fire people, though. Maybe that's what you actually want.

sysadmin/MyViewTimeSplittingBad written at 01:02:34; Add Comment


Wikitext not having errors creates a backtracking problem

In the past I've written about some of the pain points of parsing wikitext and called out how there aren't conventional parsing errors in running wikitext, just things that turn out to be plain text instead of complete wikitext markup. Some of the consequences of this may not be obvious, and in fact they weren't obvious to me until I tried to make an overly ambitious change to DWiki's markup to HTML conversion code.

The obvious problem that 'no errors' creates is that you will have to either accept closing off incomplete markup or do lookahead to verify that you seem to have a complete entity, or both. If your markup denotes links as '[[ ... ]]', you probably want to look ahead for a ']]' before you start processing a '[[' as a link. Unfortunately doing lookahead correctly is quite hard if your wikitext permits various sorts of nested constructs. Consider DWikiText, which also has '(( ... ))' to quote uninterpreted text in a monospaced font, and then parsing the following:

This example [[looks like ((it has an ending ]])) but it doesn't.

Purely textual lookahead for a ']]' gets fooled here. So let's assume we're going to get fooled sooner or later and handle this better. Rather than trying to rely on fallible lookahead, if we reach the end of a paragraph with an unclosed entity we'll go back to the start of the entity and turn it into plain text.

Unfortunately this has problems too, because something retroactively becoming plain text may change the meaning of other text after that point. Consider this contrived example:

Lorem ipsum ((this *should be emphasis* because the '((' isn't closed and thus is plain text.

If you start out parsing the (( as real, the *'s are plain text. But once the (( is just plain text, they should be creating italics for emphasis. To really retroactively change the (( to plain text, you may need to backtrack all text processing since then and redo it. And backtracking is something conventional parsing technology is generally not designed for; in fact, conventional parsing technology usually avoids it like the plague (along with aggressive lookahead).

(I think the lookahead situation gets somewhat better if you look ahead in the token stream instead of in plain text, but it's still not great. You're basically parsing ahead of your actual parse, and you'd better keep both in sync. Backtracking your actual parsing is probably better.)

All of this has caused me to feel that parsing running wikitext in a single pass is not the best way to do it. Instead I have a multi-pass approach in mind (and have for some time), although I'm not entirely convinced it's right either. I probably won't know unless (and until) I actually implement it, which is probably unlikely.

(An alternate approach would be to simply have backtracking in a conventional recursive descent parser; every time you hit a 'parse error', the appropriate construct being parsed would turn its start token into plain text and continue the parsing from there. Unfortunately this feels like it could be vulnerable to pathological behavior, which is a potential issue for a parser that may be handed user-controlled input in the form of eg comments.)
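
To make that alternate approach concrete, here is a minimal sketch of it in Go (not DWiki's actual code, which is Python), for a toy markup with just the constructs from my examples. When a closer never shows up, the caller demotes the opener to plain text and keeps parsing right after it; note that everything after a demoted opener gets parsed again, which is exactly where the potential for pathological behavior comes from.

package main

import (
   "fmt"
   "strings"
)

// parse renders src to HTML-ish text. Unclosed openers are demoted to
// plain text instead of being treated as errors.
func parse(src string) string {
   out, _ := parseUntil(src, "")
   return out
}

// parseUntil parses src until it sees closer ("" means end of input).
// It returns the rendered text plus how many bytes of src it consumed;
// a consumed count of -1 means the closer was never found.
func parseUntil(src, closer string) (string, int) {
   var b strings.Builder
   i := 0
   for i < len(src) {
      if closer != "" && strings.HasPrefix(src[i:], closer) {
         return b.String(), i + len(closer)
      }
      opener, closing, tag := "", "", ""
      switch {
      case strings.HasPrefix(src[i:], "(("):
         opener, closing, tag = "((", "))", "code"
      case strings.HasPrefix(src[i:], "[["):
         opener, closing, tag = "[[", "]]", "a"
      case strings.HasPrefix(src[i:], "*"):
         opener, closing, tag = "*", "*", "em"
      }
      if opener == "" {
         b.WriteByte(src[i])
         i++
         continue
      }
      inner, used := parseUntil(src[i+len(opener):], closing)
      if used < 0 {
         // The closer never showed up: demote the opener to plain
         // text and carry on parsing right after it, so markup later
         // in the paragraph still gets its normal meaning.
         b.WriteString(opener)
         i += len(opener)
         continue
      }
      b.WriteString("<" + tag + ">" + inner + "</" + tag + ">")
      i += len(opener) + used
   }
   if closer != "" {
      return "", -1
   }
   return b.String(), i
}

func main() {
   fmt.Println(parse("Lorem ipsum ((this *should be emphasis* because the '((' isn't closed."))
   fmt.Println(parse("This example [[looks like ((it has an ending ]])) but it doesn't."))
}

Even this toy version gets both of my examples right: the unclosed (( turns into plain text and the *'s become real emphasis, while the [[ gets demoted because its ']]' is hidden inside the (( ... )) quoting.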

PS: How I stubbed my toe on this issue was basically trying to do this sort of 'convert back to plain text' for general unclosed font changes in DWikiText. When I did this outside of a limited context, it blew up in my face.

programming/WikitextNoErrorsBacktracking written at 02:10:30; Add Comment


I've decided I'll port DWiki to Python 3, but not any time soon

At this point I have only one significant Python 2 program that I care deeply about and that is DWiki, the pile of code that underlies Wandering Thoughts. What to do about DWiki in light of Python 3 has been something that has been worrying and annoying me for some time, because doing a real port (as opposed to a quick bring-up experiment) is going to involve a lot of wrestling with old code and Unicode conversion issues. Recently I've come around to a conclusion about what I plan to do about the whole issue (perhaps an obvious one).

In the end neither grimly staying on Python 2 forever nor rewriting DWiki in something else (eg in Go) is a viable plan, which leaves me with the default: sooner or later I'll port DWiki to Python 3. However I don't expect to do this any time soon, for two reasons. The first is that Python 3 itself is still being developed and in fact the Python landscape as a whole is actively evolving. As a result I'd rather defer a port until things have quieted down and gotten clearer in a few years (who knows, perhaps I'll want to explicitly revise DWiki to be PyPy-friendly by then). As far as I'm concerned the time to port to Python 3 is when it's gotten boring, because then I can port once and not worry about spending the next few years keeping up with exciting improvements that I'd like to revise my code to take advantage of.

The second reason is more pragmatic but is related to the rapid rate of change in Python 3, and it is that the systems I want to run DWiki on are inevitably going to be behind the times on Python 3 versions. Right now, the rapid rate of improvements in Python 3 means that being behind the times leaves you actively missing out on desirable things. In a few years hopefully that will be less so and a Python 3 version that was frozen a year or three ago will not be so much less attractive than a current version. This too is part of Python 3 slowing down and becoming boring.

(If you are saying 'who freezes Python 3 versions at something a few years old?', you haven't looked at long term support Linux distributions or considered how long people will run eg older FreeBSD versions. There is a long and slow pipeline from the latest Python 3 release to when it appears in OS distributions that many people are using, as I've covered before.)

I don't have any particular timeline on DWiki's Python 3 port except that I don't intend or expect to do this within, oh, the next three years. Probably I'll start looking at this seriously about the time the Python developers start clearing their throats and trying to once again persuade everyone that 2.7 support will be dropped soon, this time for sure. A clear slowdown in Python 3 development plus OS distros catching up to current versions might push that to sooner, but probably not much sooner.

Hopefully thinking through all of this and writing it down means that I can stop worrying about DWiki's future every so often. I may not be doing anything about it, but at least I now have a reasonable plan (and I've kind of made my peace with the idea of going through all the effort to get a production quality version of DWiki running under Python 3 (and yes, the amount of effort it's going to take still irritates me and probably always will)).

(Although every so often I toy with the idea of a from-scratch rewrite of DWiki in Go that addresses various things I'd do differently this time around, the reality is that DWiki's creation took place in unusual circumstances that I'm unlikely to repeat any time soon.)

python/DWikiPython3Someday written at 00:57:00; Add Comment


Do we want to continue using a SAN in our future fileserver design?

Yesterday I wrote that we're probably going to need a new Linux iSCSI target for our next generation of fileservers, which I optimistically expect us to begin working on in 2018 (when the current ones will be starting to turn four years old). But as I mentioned in an aside, there's a number of things up in the air here and one of them is the big question of whether we want to keep on using any sort of SAN at all or move to entirely local storage.

We had a number of reasons originally for using an iSCSI SAN, but in practice many of them never got used much. We've made minimal use of failover, we've never expanded a fileserver's storage use beyond the pair of backends that 'belong' to it, and while surviving single backend failures was great (cf), a large part of those backend failures was because we bought inexpensive hardware. If our current, significantly better generation of hardware survives to 2018 without similar large scale failures, I think there could be a real question about carrying on the model.

I've written about our general tradeoffs of a SAN versus disk servers and they remain applicable. However, two things have changed since writing that last year. The first is that we now have operational experience with a single fileserver that has a pile of disk space and a pile of load on it, and our experience overall is that we wish it was actually two fileservers instead. Even when we aren't running into OmniOS issues, it is the fileserver that is most prone to have problematic load surges and so on, simply because it has so much disk space and activity on it. One thing this has done is change my opinion about how big a disk server we'd want to have; instead of servers as big as our current fileservers with their paired backends, I now think that servers roughly half the size would be a good choice (ie, with 8 pairs of data disks).

The second is that I now believe we're going to have a real choice of viable OSes to run ZFS on in 2018, and specifically I expect that to include Linux. If we don't need iSCSI initiator support, need only a medium number of disks, and are willing to pay somewhat extra for good hardware (as we did this generation by avoiding ESATA), then I think hardware support in our OS of choice is likely to be much less of an issue. Put bluntly, both Linux and FreeBSD should support whatever disk controller hardware we use and it's reasonably likely that OmniOS will as well.

There are unquestionably downsides to moving away from a SAN (as I covered, and also). But there are also attractive simplifications, cost savings, and quite possibly performance increases (at least in an all-SSD environment). Moving away from a SAN is in no way a done deal (especially since we like the current environment and it's been quite good for us) and a lot can (and will) change between now and 2018, but the thought is now actively in my mind in a way that it wasn't before.

(Of course, part of this is that occasionally I play around with crazy and heretical what-if thoughts about our architecture and systems. 'What if we didn't use a SAN' is just one iteration of this.)

sysadmin/NonSANPossibleFuture written at 01:20:47; Add Comment


We're probably going to need new Linux iSCSI target software

When I think ahead to our theoretical 2018 fileserver refresh, one of my thoughts is that we're probably going to need new iSCSI target software. We're currently using IET, and while we like it and there is nothing deeply wrong with it, I have to admit that it lacks some moderately important features and the pace of its development is what could politely be called 'quiet'. In fact it's sufficiently quiet that I don't know if IET will be adapted to future Linux kernels, and by 2018 even 'enterprise' long term support distros will likely be using future kernels.

If we're going to change iSCSI target software the obvious choice is the LIO target, which is the current in-kernel implementation and hopefully also the future one (the kernel changed implementations once already). There are other alternatives (the ArchLinux wiki has a decent overview), but none of them seem compelling enough to go outside the standard kernel and thus outside what Linux distributions will package the tools for and support as (relatively) standard.

(On the flipside, I haven't conducted any sort of deep evaluation of the other options. I wasn't impressed with anything apart from IET in my original evaluation, but that was years ago.)

I've looked into LIO some and I can't say I'm terribly enthused, for two reasons. LIO configuration is rather complicated, and it really wants to be done through a command line tool instead of a configuration file (and an interactive one at that); the latter is a bad flaw that I've written about before. LIO's tool saves the resulting live configuration in JSON file(s), and in theory you can create the file yourself by hand. LIO also has a Python API, rtslib, so another option would be to create our own program to set up the iSCSI target configuration (either once or on boot) from a simpler file format.

At some point I'm going to need to test and experiment with LIO. However I don't know if it's worthwhile to do it just yet as opposed to about two years from now, since a lot can change in that sort of time.

(In a way, worrying about specific software is silly at this point. Things in the open source world can change drastically over two years and anyways there are any number of things that are up in the air about a future fileserver design. I just think about this now because I've wound up thinking that IET is getting long in the tooth and kind of neglected by now, so we're going to have to do something about it sooner or later.)

linux/NewLinuxISCSITargetThoughts written at 03:08:25; Add Comment


One thing I'm hoping for in our third generation fileservers

If all goes according to my vague schedule, we should be at least starting to plan our third generation of fileservers in 2018, when our second generation fileservers are four years old. 2018 is not all that far off now, so every so often I think a bit about what interesting things might come up from the evolution of technology over the next few years.

Some things are obvious. I certainly hope our entire core network is reliable 10G (copper) Ethernet by 2018, for example, and I optimistically hope for at least doubling and ideally quadrupling the memory in fileservers (from 64 GB to 128 GB or 256 GB). And it's possible that we'll be completely blindsided by some technology shift that's currently invisible (eg a large scale switch from x86 to ARM).

(I call a substantial increase in RAM optimistic because RAM prices have been remarkably sticky for several years now.)

One potential change I'm really looking forward to is moving to all-SSD storage. Running entirely on SSDs would likely make a clear difference to how responsive our fileservers are (especially if we go to 10G Ethernet too), and with the current rate of SSD evolution it doesn't seem out of the bounds of reality. Certainly one part of this is that the SSD price per GB of storage keeps falling, but even by 2018 I'll be surprised if it's as cheap as relatively large HDs. Instead one reason I think it might be feasible for us is that the local demand for our storage just hasn't been growing all that fast (or at least people's willingness to pay for more storage seems moderate).

So let me put some relatively concrete numbers on that. Right now we're using 2 TB HDs and we have only one fileserver that's got more than half its space allocated. If space growth stays modest through 2018, we could likely replace the 2 TB HDs with, say, 3 TB SSDs and still have growth margin left over for the next four or five years. And in 2018, continued SSD price drops could easily make such SSDs cost about as much as what we've been paying for good 2TB 7200 RPM HDs. Even if they cost somewhat more, the responsiveness benefits of an all-SSD setup are very attractive.

(At a casual check, decent 2TB SSDs are currently somewhere around 4x to 5x more expensive than what we paid for our 2 TB HDs. Today to the start of 2018 gives them two years and a bit to cover that price ground; that requires SSD prices to roughly halve every year, which may be a bit aggressive.)

tech/SSDFileserverHope written at 02:04:19; Add Comment


The Go 'rolling errors' pattern, in function call form

One of the small annoyances of Go's explicit error returns is that the basic approach of checking error returns at every step gets tedious when all the error handling is actually the same. You wind up with the classic annoying pattern of, say:

s.f1, err = strconv.ParseUint(fields[1], 10, 64)
if err != nil {
   return nil, err
}
s.f2, err = strconv.ParseUint(fields[2], 10, 64)
if err != nil {
   return nil, err
}
[... repeat ...]

Of course, any good lazy programmer who is put into this starting situation is going to come up with a way to aggregate that error handling together. Go programmers are no exception, which has led to what I'll call a generic 'rolling errors' set of patterns. The basic pattern, as laid out in Rob Pike's Go blog entry Errors are values, is that as you do a sequence of operations you keep an internal marker of whether errors have occurred; at the end of processing, you check it and handle any error then.

Rob Pike's examples all use auxiliary storage for this internal marker (in one example, in a closure). I'm a lazy person so I tend to externalize this auxiliary storage as an extra function argument, which makes the whole thing look like this:

func getInt(field string, e error) (uint64, error) {
   i, err := strconv.ParseUint(field, 10, 64)
   if err != nil {
      return i, err
   }
   return i, e
}

func .... {

   var err error
   s.f1, err = getInt(fields[1], err)
   s.f2, err = getInt(fields[2], err)
   s.f3, err = getInt(fields[3], err)

   if err != nil {
      return nil, err
   }

This example code does bring up something you may want to think about in 'rolling errors' handling, which is what operations you want to do once you hit an error and which error you want to return. Sometimes the answer is clearly 'stop doing operations and return the first error'; other times, as with this code, you may decide that any of the errors is okay to return and it's simpler if the code keeps on doing operations (it may even be better).

(In retrospect I could have made this code just as simple while still stopping on the first error, but it didn't occur to me when I put this into a real program. In this case these error conditions are never expected to happen, since I'm parsing what should be numeric fields that are in a system generated file.)
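
For what it's worth, one way to get 'stop on the first error' behavior while keeping the call sites just as simple is to have getInt check the rolling error before doing any work. This is a sketch of the idea, not the code from my real program:

func getInt(field string, e error) (uint64, error) {
   // If an earlier step already failed, do nothing and pass its
   // error along unchanged.
   if e != nil {
      return 0, e
   }
   return strconv.ParseUint(field, 10, 64)
}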

As an obvious corollary, this 'rolling errors' pattern doesn't require using error itself. You can use it with any running or accumulated status indicator, including a simple boolean.

(Sometimes you don't need the entire infrastructure of error to signal problems. If this seems crazy, consider the case of subtracting two accumulating counters from each other to get a delta over a time interval where a counter might roll over and make this delta invalid. You generally don't need details or an error message here, you just want to know if the counter rolled over or not and thus whether or not you want to disregard this delta.)
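
As an illustrative sketch of that (with made-up structure and field names), each delta helper here takes and returns a rolling 'ok' flag and the caller only looks at it once at the end:

// NetCounters is a stand-in for whatever holds the raw accumulating counters.
type NetCounters struct {
   RxBytes, TxBytes uint64
}

// delta returns cur - prev. If the counter appears to have rolled over
// (cur < prev), it forces ok to false; otherwise it passes the incoming
// ok value through unchanged.
func delta(cur, prev uint64, ok bool) (uint64, bool) {
   if cur < prev {
      return 0, false
   }
   return cur - prev, ok
}

func rxTxDeltas(before, now NetCounters) (uint64, uint64, bool) {
   ok := true
   drx, ok := delta(now.RxBytes, before.RxBytes, ok)
   dtx, ok := delta(now.TxBytes, before.TxBytes, ok)
   // If ok is false here, some counter rolled over and the caller
   // should disregard this interval's deltas entirely.
   return drx, dtx, ok
}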

programming/GoRollingErrors written at 00:23:46; Add Comment


When chroot() started to confine processes inside the new root

Writing about the somewhat surprising history of chroot() did leave me with one question: when did chroot() start to confine processes inside the new root directory hierarchy? This is an interesting moment because it marks the point where chroot() stops being a little hack to help emulation and instead turns into a security feature.

(The first use of chroot() as a security feature seems to be in the 4.2BSD ftpd, as covered in the first entry. I can't be completely sure of this because I can't find an easily searchable version of the tuhs.org 4.1c BSD tree.)

Early versions of chroot() appear to be trivially escapable by things like 'cd /; cd ..', which puts you in the parent of the nominal root directory. A version of the chroot() system call that did not allow this appears in 4.1c BSD; you can see the code in namei(). Unlike the 4BSD version of the same code, this code specifically checks to see if you are trying to look up '..' at the chroot root directory, and remaps the result if you are.

I don't know for sure why this change appeared in 4.1c BSD, but it's possible to speculate. The 4BSD namei() is essentially the same as the V7 namei(), but the 4.1c BSD namei() is significantly changed in several ways (for example, it has a lot more comments). 4.1c BSD is the first appearance of two significant changes related to namei(); it's when BSD introduced both a rename() system call and the BSD FFS. It also seems to have seen a significant reorganization of the kernel source code away from its previous V7-like appearance. So I suspect that when the BSD people were changing namei() around anyways because of other changes, they noticed and fixed the chroot escape. With the chroot escape fixed, it was then used as a security feature in the 4.2BSD ftpd.

(The history portion of the Wikipedia page on chroot is no help, because it's clearly wrong unless you creatively reinterpret what it's saying. chroot() was not 'added' to BSD at any point, because BSD inherited it from V7 from the start. This bit of history appears to come from the references section of FreeBSD's Jails: Confining the omnipotent root (via) from 2000 and may refer either to the addition of a chroot(2) manpage or the namei() changes.)

Sidebar: The peculiar history of chroot() documentation

In V7, as I discovered, chroot() is documented in the chdir() manpage. However, while 32V, 3BSD, and 4BSD all still have the chroot() system call, documentation for it has disappeared from their chdir() manpages. A chroot() manpage (re)appears only in 4.1c BSD.

The 32V chdir() manpage seems to be the V7 manpage with the chroot() documentation removed (and it definitely isn't the V6 chdir() manpage). It may be that the chroot() stuff was removed because the 32V people thought it was a hack that was better off not being documented, or maybe 32V got their manpages from an earlier version of V7 that didn't have the chroot() addition.

unix/ChrootHistoryII written at 02:16:10; Add Comment

