Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.
2010-09-09 Go, IPv6, and single binding machines
The current libraries for the Go language and their built in tests strongly believe that you can talk to IPv4 addresses through IPv6 sockets, which is not necessarily the case. This is a known issue (see also), and is more than somewhat inconvenient on a machine with dual binding turned off, such as my workstation, as Go will not install from source unless all its tests pass.
(Since Debian has apparently changed their minds about dual binding, this may not affect very many people. I maintain that it should, although it's now a quixotic battle that I am probably not going to win any time soon.)
If this affects you, the simple fix is probably to just apply the
patch from Joerg Sonnenberger that's (currently)
at the end of Go issue 679. I opted for a
slightly different fix, because I wanted to force Go to use IPv4
sockets where possible. Thus, I forced (A more thorough fix for While this is an incomplete hack with some limits, I think it is
generally going to do what I want from Go even with servers, provided that I am careful
(basically I can't mix an explicit IPv4 server with a Go-based IPv6
one). A better fix would be to change the code to explicitly force
(One of the attractions of Go is that it looks familiar enough to me that I can fumble my way through this sort of chainsaw modification and usually get it to work.)
As a side note: since OpenBSD doesn't allow dual binding under any circumstances, this is going to be a real issue if anyone ever attempts to port Go to OpenBSD. I suspect that the solution will be to turn off a bunch of tests.
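A minimal sketch of the 'use IPv4 sockets where possible' idea, written in Python rather than Go. This is not the patch from the Go issue or the modification described above; `family_for` is a hypothetical helper that only handles literal IP addresses. The point is simply that on a single binding machine an IPv4 destination needs an `AF_INET` socket, so the family is chosen from the address instead of defaulting to IPv6 and relying on IPv4-mapped addresses.

```python
import ipaddress
import socket


def family_for(host):
    """Pick the socket family that matches a literal IP address.

    Hypothetical helper: IPv4 addresses (and IPv4-mapped IPv6 addresses
    such as ::ffff:192.0.2.1) get AF_INET, everything else gets AF_INET6.
    """
    addr = ipaddress.ip_address(host)
    if addr.version == 4:
        return socket.AF_INET
    # An IPv4-mapped address is really an IPv4 destination in disguise;
    # with dual binding turned off an IPv6 socket cannot reach it.
    if addr.ipv4_mapped is not None:
        return socket.AF_INET
    return socket.AF_INET6


def make_socket(host):
    """Create (but do not connect) a TCP socket of the right family."""
    return socket.socket(family_for(host), socket.SOCK_STREAM)
```

Calling `family_for("192.0.2.1")` gives `socket.AF_INET`, while `family_for("2001:db8::1")` gives `socket.AF_INET6`.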
2010-09-07 My new view of DomainKeys
Now that I actually understand how DKIM works, I have a much better view of it than I used to and as a result it's much more attractive and likely to be implemented here some day.
The way I choose to look at it is that for us, it is essentially a lightweight anti-tampering feature. We can sign outgoing messages to transparently add some basic integrity protection for our users' mail, and verify inbound DKIM signatures for a basic integrity check. This makes using it a lot like our existing habit of using TLS SMTP whenever possible, ie it does some good for frustrating bad people and basically no harm.
(We're not going to be able to actually start using DKIM any time soon, because all of our mail servers are running Exim versions that are too old to have DKIM support. The inbound gateway is due to be upgraded soon since it's still running Ubuntu 6.06, but the mail submission machine is running RHEL 5 which is good for years yet. And supporting DKIM is in no way important enough to justify compiling a local version of Exim from source.)
The one fly in the ointment would be if people implementing DKIM checkers (whether in MTAs or MUAs) had made my mistake and are reading more into DKIM than is actually there, especially if they're treating it as a kind of SPF. In that case, adding DKIM DNS data and DKIM signatures to some of our outgoing email might harm the delivery prospects of email from us and our users that wasn't DKIM signed by us, as would happen for, eg, email that our users send from GMail using their CSLab addresses. Hopefully this is not the case.
(Making it harder for our users to use GMail would be, to put it one way, an extremely unpopular move.)
Sidebar: Where we would put DKIM in our incoming email flow
Our inbound spam and virus filtering can alter the message body (if
a virus is removed) and the (Logically this implies that we should re-sign messages when we forward them, but I'm not quite comfortable with that for various reasons.) If we do DKIM validation after the filtering, all virus-cleaned messages
will fail DKIM checks and most spam-scored messages will (the
2010-09-06 Sorting out DomainKeys and understanding its limits
Okay, first a disclaimer: most everyone talks about DomainKeys, but it is formally DomainKeys Identified Mail (DKIM). Plain 'DomainKeys' is the name of the earlier specification that was folded into DKIM.
Until I started digging into this in detail, I had the basic idea that
DKIM signed email header fields, crucially including the It turns out that this is totally wrong, and the Wikipedia DKIM page even sort of explains how it is wrong, if you read it carefully. Simplified, DKIM is nothing more and nothing less than a way
of letting a domain take authoritative responsibility for a
'message', where the message is the email body plus selected message
headers (which headers are up to the DKIM signer, but the RFC requires Thus, when GMail DKIM-signs their outgoing email all they are saying
is 'this really originated on GMail, and you can verify that it has
not been tampered with in transit'. They are not saying anything about
whether the email really came from who it claimed to come from in
the (As it happens, GMail does try to verify It follows that you cannot use DKIM for two useful things without outside knowledge:
As far as I can tell, without such advance policy knowledge there are only two useful things that you can do with a DKIM signature. First, if there is a signature but it does not validate, either the message has been tampered with in transit (possibly accidentally, possibly due to having a virus sliced out of it by someone's mail filters) or the header has been forged. Second, if the signature does validate you theoretically have someone to blame if the message is spam or otherwise bad (not that this does you much good in practice).
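To make the 'a domain takes responsibility for a message' point concrete, here is a small Python sketch that pulls apart the tags of a DKIM-Signature header and compares the signing domain with the domain in the From: address. It does no cryptographic verification at all; `parse_dkim_tags` and `dkim_summary` are hypothetical helpers, and the sample header is made up, with placeholder values where the real hashes would be.

```python
def parse_dkim_tags(header_value):
    """Split a DKIM-Signature value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        name, sep, value = part.partition("=")
        if not sep:
            continue  # empty piece, e.g. after a trailing semicolon
        # Remove folding whitespace inside the value.
        tags[name.strip()] = "".join(value.split())
    return tags


def dkim_summary(header_value, from_domain):
    """Report who signed, which headers are covered, and whether the
    signing domain (d=) happens to match the From: domain."""
    tags = parse_dkim_tags(header_value)
    signed = [h.lower() for h in tags.get("h", "").split(":") if h]
    return {
        "signing_domain": tags.get("d"),
        "signed_headers": signed,
        "from_is_signed": "from" in signed,
        "signer_matches_from": tags.get("d", "").lower() == from_domain.lower(),
    }


# Made-up example: GMail signs a message whose From: is at another domain.
SAMPLE = ("v=1; a=rsa-sha256; d=gmail.com; s=selector; "
          "h=from:to:subject:date; bh=PLACEHOLDER; b=PLACEHOLDER")
info = dkim_summary(SAMPLE, "cs.example.edu")
```

Here the From: header is covered by the signature, yet the signer is gmail.com and not the From: domain. A valid signature would only say that GMail vouches for the message as a whole, which is exactly why it cannot be read as a kind of SPF without outside policy knowledge.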
2010-09-05 An observation from changing my password
I've changed my password at work, or started to change it at least (this will be an extended process). Doing this has reinforced some things that I know but rarely think about, and exposed a surprising inconvenience in how I do things.
The big thing is that you don't really remember how many machines you have accounts on until you try to work out how many different places you need to change your password. This is not really an issue for users (if we sysadmins are doing our job right, they change their password once and it magically propagates everywhere), but as a sysadmin I have access to all sorts of isolated machines that are not part of our password propagation system. Which means that I get to change my password on all of them, assuming that I can remember what they all are.
(In looking at this, I see that
The surprising inconvenience is that I have set up ssh identities to give me passwordless access to my account on most machines; in fact, a lot of my usual environment relies on it. This did not strike me as a problem until I changed my password and suddenly started wanting to type the new one as much as possible to reinforce it in my mind and my fingers. Suddenly all of that passwordless access was inconvenient as well as convenient, since it meant that I'm really not typing my password all that much.
This has both surprised and amused me, because sometimes I am easily amused by the perversities of life.
(Turning my ssh identities off completely would likely make various
parts of my environment explode in even less convenient ways, so I've
resorted to modifying an (2 comments.)
sysadmin/PasswordChangeNotes written at 23:57:53
A plan to deal with my feed reader problem
I have a feed reader problem, one that has long ago reached epic levels: in practice, I'm not actually really reading feed entries. For years, Liferea has been telling me that I have thousands of unread entries and I have been ignoring them. I think it's time to declare feed reader bankruptcy (which is much like email bankruptcy) and deal honestly with the results.
(This will be a bit traumatic, because I'm somewhat obsessive about some things. It hurts to consciously and deliberately throw away unread entries.)
In thinking about this, I have realized that I have two sorts of feeds that I follow: casual reading feeds, that I keep around so that I have something to browse when I'm feeling bored and want to poke at their topic, and feeds that I am strongly interested in and want to read all or almost all of, even if it takes me a while. If I'm being honest about it, almost all of the feeds I currently have in Liferea are casual reading feeds (which is one reason I keep not reading them).
So here's my current plan for dealing with all of this:
A certain amount of the casual feeds are simply going to be discarded (a process that I've already started); I'll trust that anything worth reading that they produce will show up on the usual link sources that I browse (such as Hacker News).
The rest of them will go into Google Reader, because Google Reader will quietly expire old unread entries for me. Throwing away old entries to keep the volume manageable is exactly the behavior I want for casual feeds. (Google Reader is also better for casual browsing because I can use it from anywhere. Liferea is tied to a particular machine.)
My important feeds will stay in Liferea where I can exert more control over them, for example deciding exactly when they expire (or don't). I will probably also find some feeds that are more convenient to read in Liferea than in Google Reader.
If I do this right I will have only a relatively small number of feeds in Liferea, and they will generally not have many unread entries.
I'm not sure that this will actually work, but I'll have to see how it goes. Something certainly needs to change; thousands of unread feed entries that just keep expiring off the bottom of feeds just don't work.
(They 'work' in one sense, but they create a kind of mental pressure that makes me avoid having much to do with them. Right now I avoid entire categories of feeds in Liferea because of all of the unread entries.)
PS: if I'm being honest with myself, I should probably throw away at least half of my casual feeds. Many of them were added because they looked sort of interesting, way back in my early days of feed reading enthusiasm when I felt that I had a lot more time for this. Rather than putting them in Google Reader only to ignore them, I should just save them in a file somewhere.
(This reminds me rather vividly of mailing lists, and if I go far enough back, Usenet. I went through much the same pattern with them that I am going through with feeds now, and if I got into something like Twitter I suspect that I would go through the same pattern with it too.)
(2 comments.)
tech/DealingWithMyFeeds written at 00:30:37
2010-09-04 The laziness of a programmer, illustrated
At work, I have fallen into the bad habit of keeping a lot of iconified Firefox windows around, full of various things that I am going to read sometime (honest). As I've mentioned before, I have all of these iconified windows very carefully placed and organized so that I can find them again and keep track of them.
Naturally, this makes quitting and restarting Firefox kind of a pain. I have Firefox set to preserve all of the active windows and tabs over restarts, but it doesn't preserve the positions of the iconified windows (and it doesn't entirely preserve the regular window position either); any time I have to start Firefox again I have to re-position all of those icons. Generally this means that I don't; I never exit Firefox unless I'm forced to, because it's such a pain to get everything set up again.
(Which implies that I never log out, either; I just leave my screen locked.)
Recently I got tired of this (in the aftermath of my Fedora 13 upgrade, I've been restarting things more than usual). Thus I decided that clearly there had to be a way to fish around in the depths of X to find the current icon positions, so I could write a quick script that recorded them in a file and then shuffled the icons back into the right spots for me.
(This is less crazy than it sounds; I already have command line utilities to reposition windows, and X comes with a fair number of commands to poke at various aspects of window state.)
I'll cut to the chase: yes, except that it wasn't exactly a quick script. The most convenient way of doing this turned out to be writing an FVWM module in Perl that finds out all of this information and writes a file of FVWM commands that can be loaded back in to FVWM to (re)position and (re)iconify all of my Firefox windows just right.
In the process of doing this I had to remember my Perl, look up a certain amount of Perl's OO support (my last serious Perl programming pretty much predates it), and figure out how to work with FVWM's underdocumented Perl bindings. (FVWM has no current Python bindings for would-be module authors.)
But all of this was less work than continuing to re-position all of my Firefox windows by hand. Honest.
(The resulting module is sort of theoretically general. If you are really interested, see here. As a bonus, you get to laugh at my hack-job Perl.)
(3 comments.)
programming/ProgrammerLaziness written at 01:10:22
2010-09-03 Finally understanding the attraction of AJAX
I'll admit it; I'm slow sometimes. For a very long time now I haven't really gotten why people keep sprinkling AJAX over their web pages (partly because I assiduously use NoScript and so mostly don't see it). Oh, I understood that you needed it to create actual applications on the web and that it could be convenient for making vaguely friendly things, but I didn't really understand it in the context of relatively ordinary web apps like DWiki.
But my recent thinking about my comment form design mistake has finally fixed that. Here is my recent insight in a nutshell:
In a conventional non-AJAX web interface, any significant action forces a (full) page reload. This creates a visible page refresh except in extremely ideal circumstances and in general means that the user has to find their place again and reorient themselves. This is sort of tolerable if what the user is working on fits entirely inside their browser window; it's fairly horrible if it doesn't and they have to actively scroll around to find where they were before. This is the core problem I have with a revision to my comment form design; I'm pretty sure that people would get lost among everything else going on.
(The ideal circumstances are that you're using fragment identifiers in the URL, the browser accurately repositions things back at the fragment identifier, and the entire system loads the new page so fast that there is no visible flicker.)
In an AJAX web interface the user can perform actions without this lurching jump. For example, when they click on 'add comments', they don't get yanked to a new page; instead, a comment form unfolds right then and there in front of them. This is less confusing in two ways. First, it is happening right in front of you, clearly visible. Second, it is the only thing that is happening; you don't have to pick out the significant change from all of the other flickering and movement and so on that's going on as the page reloads.
This creates a more fluid, less disorienting interface, one that is easier and faster to work with because you spend more time doing what you're interested in and less time finding your place again every so often. In a sense, the result is much closer to a direct manipulation interface than a standard, non-AJAX web page can manage.
I don't think that there's any way to pull this off without AJAX; you really need some way to do a partial page content update without anything else flickering or moving. That's just not something that browsers offer (you don't even get it on plain user-initiated page refresh).
(I suspect that this is old hat for people in the field, but all of it only clicked for me when I started really thinking over the problem of people getting lost in my comment form under various circumstances, cf TemplateLimitations.)
PS: looking backwards, this makes me slightly more sympathetic to old HTML frames. Although they were almost never used this way, you can argue that they were a crude first attempt at the sort of limited page update you'd need to pull this off.
(4 comments.)
web/FinallyGettingAJAX written at 01:11:31
2010-09-02 Why Python's
| rp_filter value | Effect |
| 0 | No filtering is done. |
| 1 | Packets are discarded if they come in on any interface except the one that a reply to the source IP would go out on. |
| 2 | Packets are discarded if a reply to the source IP could not be sent out any interface. |
(A more formal description is in ip-sysctl.txt in the kernel documentation. Like all interface sysctls, it can be set separately for each interface, as a default, and for all interfaces.)
I don't understand how this can possibly work. Well, I understand how it works, I just don't understand how it can possibly do any good in most configurations. And I don't understand how a setting of '1' can possibly work at all in multihomed configurations where the multihomed machine is not the sole router for every network it's connected to that is not where its default route points.
First, as far as I can tell a setting of '2' is equivalent to '0' if you have a default route set (the usual case). With a default route set, all source IPs are reachable and so '2' will never discard packets, which is exactly the same as '0'.
For a machine with a single network interface and a default route, all settings are equivalent (for the same reason as above; all source IPs are reachable through your single interface). If you do not have a default route, either '1' or '2' will discard packets that come from networks you do not have routes for.
It is the multihomed case where things explode. Suppose that you have
a multihomed host with two network interfaces, net-1 and net-2, with
IP-1 on net-1 and IP-2 on net-2. With an rp_filter value of 1, a
machine on net-2 cannot talk to this machine's IP-1 address unless the
packets pass through the multihomed machine on the way to net-1, ie the
multihomed machine is the router for the net-2 machine. If the packets
go through another router, they will arrive on the multihomed machine's
net-1 interface but the replies would go out the net-2 interface, so
they fail the check.
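The multihomed failure can be walked through with a toy model in Python. This is a deliberately simplified sketch of the three settings as described in the table, not the kernel's actual code; the interface names, the networks and the helpers `best_route` and `rp_filter_accepts` are all made up for illustration.

```python
import ipaddress


def best_route(routes, src):
    """Longest-prefix match: the interface a reply to src would leave on,
    or None if there is no route to src at all."""
    addr = ipaddress.ip_address(src)
    matches = [(net, iface) for net, iface in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rp_filter_accepts(mode, routes, src, in_iface):
    """Would a packet from src arriving on in_iface be accepted?"""
    out_iface = best_route(routes, src)
    if mode == 0:
        return True                    # no filtering
    if mode == 1:
        return out_iface == in_iface   # reply must use the arrival interface
    if mode == 2:
        return out_iface is not None   # reply must be routable somewhere
    raise ValueError("unknown rp_filter mode: %r" % (mode,))


# Hypothetical multihomed host: net-1 on eth0, net-2 on eth1, and a
# default route that points out eth0.
routes = [
    (ipaddress.ip_network("10.0.1.0/24"), "eth0"),
    (ipaddress.ip_network("10.0.2.0/24"), "eth1"),
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),
]

# A net-2 machine talks to our net-1 address through some other router,
# so its packets show up on eth0 even though replies would leave via eth1.
via_other_router = rp_filter_accepts(1, routes, "10.0.2.5", "eth0")
direct = rp_filter_accepts(1, routes, "10.0.2.5", "eth1")
```

In this model `via_other_router` is False and `direct` is True, which is the situation described above. With the default route present, mode 2 accepts both cases, just as mode 0 does.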
Effectively this creates a bad version of an isolated interface, with the packet reachability restrictions but without the multiple split routing tables that make multihomed hosts actually work. As a bonus it hides the restriction deep in the networking sysctls, where you have to be an expert to find it.
(I suppose that there are some advantages to this half-hearted approach, in that it avoids some limits in the policy based routing version of it.)
By the way, I stumbled over this courtesy of Ubuntu 10.04 setting
rp_filter to 1 by default. We have multihomed non-routing machines,
and when we set up an Ubuntu 10.04 test version things promptly
exploded. If I were not already suspicious of network sysctls, we could
have spent quite a lot of time trying to find out just why the machine
was ignoring certain sorts of network traffic.
(As it was I did 'sysctl -a | fgrep net. | sort' on both a 10.04
and an 8.04 machine and then looked for settings that were different.
Ubuntu 10.04 may not be the first version that sets this, but 8.04
definitely didn't.)
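The same comparison can be sketched in Python instead of eyeballing two sorted listings. This assumes the usual 'name = value' lines that 'sysctl -a' prints; `parse_sysctl` and `changed_settings` are hypothetical helpers, and the two sample dumps are invented for illustration, not captured from real 8.04 or 10.04 machines.

```python
def parse_sysctl(text):
    """Turn 'name = value' lines into a dict, skipping anything else."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def changed_settings(old_text, new_text, prefix="net."):
    """Return {name: (old, new)} for settings under prefix that differ.
    A setting missing on one side shows up with None on that side."""
    old, new = parse_sysctl(old_text), parse_sysctl(new_text)
    names = sorted(k for k in old.keys() | new.keys() if k.startswith(prefix))
    return {k: (old.get(k), new.get(k))
            for k in names if old.get(k) != new.get(k)}


# Invented sample data standing in for the two machines' output.
OLD = "net.ipv4.conf.all.rp_filter = 0\nnet.ipv4.ip_forward = 0\n"
NEW = "net.ipv4.conf.all.rp_filter = 1\nnet.ipv4.ip_forward = 0\n"
changed = changed_settings(OLD, NEW)
```

With the sample data, `changed` contains only the rp_filter entry, mapped to its old and new values.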
PS: a much more useful version of this sysctl would be a 'private' flag on interfaces. If an interface had the private flag set, packets with a source IP address that was routed through that interface would only be accepted on that interface; all other interfaces would discard such packets.
2010-08-30
I spent part of today writing a quick one-off data conversion program. The core of it was a function that filtered items from a list through a number of things in order to sort them into the right category. Once the dust settled on all of the sorting needed, the function had quite a lot of stock arguments, things that didn't vary from call to call in my program. In fact, an unwieldy number of them.
There are at least three vaguely Pythonic options for how to deal with this (plus how I actually did), but what interests me in retrospect is the one answer that I didn't even think about. Namely, global variables.
There are all sorts of reasons to avoid global variables in general, but this was a one-off program and if I'm being honest, that's what all of those stock parameters really were. I was making them local variables in the calling function and then passing them in to the classifying function not so much because it was a good idea but because that's what I do in Python. I just don't use global variables very much even when they'd arguably make sense, and when I do use them I feel irritated.
As best I can tell, what does it is the pesky global keyword. Having
to declare variables global any time I want to rebind them adds just
enough extra friction to using global variables in practice that I would
rather not bother and instead pass lots of things around as parameters.
I generally resort to global variables only when passing the same
information as parameters would add arguments to too many layers of
function calls.
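A small illustration of that friction, using a made-up counter. Reading a global from inside a function needs no declaration, but rebinding it does; without the declaration the assignment makes the name local and the function fails at runtime.

```python
counter = 0


def peek():
    # Reading a global works without any declaration.
    return counter


def bump_without_global():
    # The assignment makes 'counter' a local name in this function, so
    # this raises UnboundLocalError when it is called.
    counter += 1
    return counter


def bump():
    # The extra line that has to appear in every function that rebinds it.
    global counter
    counter += 1
    return counter
```

Calling `bump()` twice returns 1 and then 2, while `bump_without_global()` raises UnboundLocalError and leaves the global untouched.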
(This is the situation where you have four or five layers of function calls and some of the stuff down at the leaves wants to gather some expensive piece of information only once. The nominally logical thing to do is to call the 'gather information' function once at the start of your program and then pass the parameter all the way down to the leaves, but that means you have to pass the information object through all of the intermediate layers, where all it does is clutter up parameter lists. Really, you want to put it in a global variable, especially if you have several different clusters of these functions that want different chunks of information; passing the information they need down as parameters doesn't scale.)
Part of the friction is the annoyance of the extra line in any function
that will rebind the global variable. But another part is just having to
think about it at all, partly because I sort of consider global to be
a wart (especially because I know what the bytecode is doing behind
the scenes).
(Global's not really a wart, but that's another entry.)
The three Python options that immediately come to mind are:
Since this was a quick hack, I was lazy and did the poor man's structure: I made a tuple with all of the stock parameters and just passed in the tuple (and then unpacked it in the classifying function). This is less aesthetically pleasing than a structure, but also less code, and it is the obvious next step when one's parameter list spirals out of control and most of it is the same from call to call.
(My eventual code had two arguments that varied from call to call and six that were the same, packed into a tuple. I'm sure that this is a code smell, but it was a quick hack.)
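The shape of the poor man's structure, as a hypothetical miniature. The real program is not shown here, so the `classify` function, its three stock parameters and the sample items are all invented for illustration; the pattern is just 'pack the unchanging arguments into one tuple, unpack them at the top of the function'.

```python
def classify(item, seen, stock):
    """Sort one item into a category.

    item and seen vary from call to call; stock is the tuple of
    parameters that stay the same for the whole run.
    """
    prefix, min_len, skip = stock   # unpack the stock parameters
    if item in skip or item in seen:
        return "ignored"
    seen.add(item)
    if item.startswith(prefix):
        return "special"
    return "long" if len(item) >= min_len else "short"


# Built once by the caller, then passed unchanged to every call.
STOCK = ("x-", 5, frozenset({"junk"}))

seen = set()
results = [classify(i, seen, STOCK)
           for i in ["x-ray", "junk", "apple", "fig", "fig"]]
```

The cost of the tuple is that its order is an unwritten contract between the caller and the unpacking line; a real structure would name the fields, at the price of a bit more code.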
These are my WanderingThoughts
This is part of CSpace, and is written by ChrisSiebenmann.