Wandering Thoughts: Recent Entries

Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.

2010-09-09

Go, IPv6, and single binding machines

The current libraries for the Go language and their built-in tests strongly believe that you can talk to IPv4 addresses through IPv6 sockets, which is not necessarily the case. This is a known issue (see also), and is more than somewhat inconvenient on a machine with dual binding turned off, such as my workstation, as Go will not install from source unless all its tests pass.

(Since Debian has apparently changed their minds about dual binding, this may not affect very many people. I maintain that it should, although it's now a quixotic battle that I am probably not going to win any time soon.)

If this affects you, the simple fix is probably to just apply the patch from Joerg Sonnenberger that's (currently) at the end of Go issue 679. I opted for a slightly different fix, because I wanted to force Go to use IPv4 sockets where possible. Thus, I forced preferIPv4 in src/pkg/net/ipsock.go to true and applied only his patch to src/pkg/net/sock.go to always turn off IPV6_V6ONLY on IPv6 sockets.

(A more thorough fix for preferIPv4 would be to test if the kernel lets you bind IPv4 addresses to an IPv6 socket. But I didn't feel like going to that much effort for what is ultimately a quick hack that the Go maintainers are unlikely to support.)
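The kernel test mentioned above can be sketched outside of Go. This is a rough illustration in Python, not what the Go net package does; the helper name can_dual_bind is made up here. It creates an IPv6 socket, explicitly turns IPV6_V6ONLY off, and then tries to bind to the IPv4-mapped form of the loopback address.

```python
import socket

def can_dual_bind():
    """Report whether an IPv6 socket will accept an IPv4-mapped address.

    Hypothetical helper for illustration only.
    """
    v6only = getattr(socket, "IPV6_V6ONLY", None)
    if not socket.has_ipv6 or v6only is None:
        return False
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return False              # no usable IPv6 support at all
    try:
        # Ask for dual binding, then try an IPv4 address in mapped form.
        sock.setsockopt(socket.IPPROTO_IPV6, v6only, 0)
        sock.bind(("::ffff:127.0.0.1", 0))
        return True
    except OSError:
        return False              # the kernel refused one of the two steps
    finally:
        sock.close()

print("dual binding usable:", can_dual_bind())
```

A system that never allows dual binding should come back False here, which is the signal that a preferIPv4-style setting would want.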

While this is an incomplete hack with some limits, I think it is generally going to do what I want from Go even with servers, provided that I am careful (basically I can't mix an explicit IPv4 server with a Go-based IPv6 one). A better fix would be to change the code to explicitly force IPV6_V6ONLY off only when we are using IPv6 sockets with IPv4 addresses, and I may try that fix at some point when I feel more ambitious about hacking up the innards of Go packages.

(One of the attractions of Go is that it looks familiar enough to me that I can fumble my way through these sorts of chainsaw modifications and usually get them to work.)

As a side note: since OpenBSD doesn't allow dual binding under any circumstances, this is going to be a real issue if anyone ever attempts to port Go to OpenBSD. I suspect that the solution will be to turn off a bunch of tests.

linux/GoIpv6DualBinding written at 00:00:17

2010-09-07

My new view of DomainKeys

Now that I actually understand how DKIM works, I have a much better view of it than I used to and as a result it's much more attractive and likely to be implemented here some day.

The way I choose to look at it is that for us, it is essentially a lightweight anti-tampering feature. We can sign outgoing messages to transparently add some basic integrity protection for our users' mail, and verify inbound DKIM signatures for a basic integrity check. This makes using it a lot like our existing habit of using TLS SMTP whenever possible, ie it does some good for frustrating bad people and basically no harm.

(We're not going to be able to actually start using DKIM any time soon, because all of our mail servers are running Exim versions that are too old to have DKIM support. The inbound gateway is due to be upgraded soon since it's still running Ubuntu 6.06, but the mail submission machine is running RHEL 5 which is good for years yet. And supporting DKIM is in no way important enough to justify compiling a local version of Exim from source.)

The one fly in the ointment would be if people implementing DKIM checkers (whether in MTAs or MUAs) had made my mistake and are reading more into DKIM than is actually there, especially if they're treating it as a kind of SPF. In that case, adding DKIM DNS data and DKIM signatures to some of our outgoing email might harm the delivery prospects of email from us and our users that wasn't DKIM signed by us, as would happen for, eg, email that our users send from GMail using their CSLab addresses. Hopefully this is not the case.

(Making it harder for our users to use GMail would be, to put it one way, an extremely unpopular move.)

Sidebar: Where we would put DKIM in our incoming email flow

Our inbound spam and virus filtering can alter the message body (if a virus is removed) and the Subject: header (if a virus is removed or spam is detected). My current opinion is that we should thus do DKIM signature validation before spam and virus filtering, essentially treating it as a validation of the message state before we started mucking around with the message. I can justify this from a message integrity view by saying that our alterations are automatically okay (well, from our perspective).

(Logically this implies that we should re-sign messages when we forward them, but I'm not quite comfortable with that for various reasons.)

If we do DKIM validation after the filtering, all virus-cleaned messages will fail DKIM checks and most spam-scored messages will (the Subject: header is usually part of the signature). I don't think that that's useful, whereas it could be somewhat useful to know if a spam or virus message was altered in transit (or sent by a spammer that forged the DKIM signature) or clearly came that way originally.
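The ordering argument can be seen with a toy model. The sketch below is not real DKIM (there are no keys, no canonicalization, and no DNS lookups); it just uses a SHA-256 digest over the Subject: header and the body as a stand-in for the signature, and the function names and the '[SPAM]' tag are invented for illustration.

```python
import hashlib

def toy_sign(subject, body):
    """Stand-in for a DKIM signature: a digest over Subject: and the body."""
    data = (subject + "\n\n" + body).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def toy_verify(subject, body, signature):
    return toy_sign(subject, body) == signature

def spam_filter(subject, body):
    """Hypothetical filter that tags the Subject: of spam-scored mail."""
    return "[SPAM] " + subject, body

subject, body = "Cheap watches", "Buy now."
signature = toy_sign(subject, body)        # made by the sending domain

# Check first, then filter: the check sees the message as it arrived.
valid_before = toy_verify(subject, body, signature)

# Filter first, then check: our own Subject: change breaks the signature.
new_subject, new_body = spam_filter(subject, body)
valid_after = toy_verify(new_subject, new_body, signature)

print("valid before filtering:", valid_before)
print("valid after filtering: ", valid_after)
```

Checking before filtering means a failure says something about what happened to the message before it reached us, not about our own changes.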

spam/DKIMView written at 23:30:31

2010-09-06

Sorting out DomainKeys and understanding its limits

Okay, first a disclaimer: most everyone talks about DomainKeys, but it is formally DomainKeys Identified Mail (DKIM). Plain 'DomainKeys' is the name of the earlier specification that was folded into DKIM.

Until I started digging into this in detail, I had the basic idea that DKIM signed email header fields, crucially including the From:, in order to assert that the email had been sent from a valid mail server for the sender's domain. In other words, I thought it was the message header version of SPF (which attempts to authorize the envelope sender); because it worked on message headers instead of the envelope sender, it avoided SPF's problems with forwarded email.

It turns out that this is totally wrong, and the Wikipedia DKIM page even sort of explains how it is wrong, if you read it carefully.

Simplified, DKIM is nothing more and nothing less than a way of letting a domain take authoritative responsibility for a 'message', where the message is the email body plus selected message headers (which headers are up to the DKIM signer, but the RFC requires From: to be included). Note especially that this responsible domain does not necessarily have anything at all to do with the domain of the From: header (and the RFC update is very explicit about this); you can't conclude that the From: is valid or honest from the presence of a valid DKIM signature, or conversely that the From: header is forged because the message is lacking a DKIM signature from the domain.
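To make the separation concrete, here is a small sketch that pulls the signing domain (the d= tag) out of a DKIM-Signature header value and compares it with the domain in From:. The header values are made up, the selector and signature data are placeholders, and the helper names are hypothetical; this does no cryptographic verification at all.

```python
from email.utils import parseaddr

def dkim_signing_domain(dkim_header_value):
    """Return the d= (signing domain) tag of a DKIM-Signature value, if any."""
    for part in dkim_header_value.split(";"):
        tag, sep, value = part.partition("=")
        if sep and tag.strip() == "d":
            return value.strip().lower()
    return None

def from_domain(from_header_value):
    """Return the domain part of the address in a From: header value."""
    address = parseaddr(from_header_value)[1]
    return address.rpartition("@")[2].lower()

# Made-up example values; the bh= and b= data is a placeholder, not real.
dkim_value = "v=1; a=rsa-sha256; d=gmail.com; s=example; h=from:subject; bh=xxx; b=yyy"
from_value = "A User <user@cs.example.edu>"

signer = dkim_signing_domain(dkim_value)
author = from_domain(from_value)
print("signed by:", signer)
print("From: domain:", author)
print("same domain:", signer == author)
```

A mismatch like this one is perfectly legitimate under DKIM; it only says which domain is taking responsibility for the message.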

Thus, when GMail DKIM-signs their outgoing email all they are saying is 'this really originated on GMail, and you can verify that it has not been tampered with in transit'. They are not saying anything about whether the email really came from who it claimed to come from in the From: header, at least not in the DKIM signature; what From: addresses GMail lets people use is a local policy issue that is outside of what DKIM has anything to say about.

(As it happens, GMail does try to verify From: addresses before it lets people use them.)

It follows that you cannot use DKIM for two useful things without outside knowledge:

  • you cannot use DKIM to verify that email From: your bank is actually from your bank, unless you already know that your bank always sends email with a DKIM signature pointing to its own domain and thus that email without a DKIM signature or with a valid one that points to another domain is invalid by policy.

  • you cannot use DKIM to verify that email really originated at a domain's mail servers unless you already know that the domain always, without any exceptions at all, DKIM-signs all outgoing email and thus an email message with anything but a valid signature from that domain is invalid by policy.

As far as I can tell, without such advance policy knowledge there are only two useful things that you can do with a DKIM signature. First, if there is a signature but it does not validate, either the message has been tampered with in transit (possibly accidentally, possibly due to having a virus sliced out of it by someone's mail filters) or the header has been forged. Second, if the signature does validate you theoretically have someone to blame if the message is spam or otherwise bad (not that this does you much good in practice).
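The cases above can be written down as a small decision table. This is only a sketch of the reasoning, with invented names; domain_always_signs stands for the outside policy knowledge that DKIM itself cannot supply.

```python
def dkim_conclusion(has_signature, validates, domain_always_signs=False):
    """Return what a DKIM check result supports concluding.

    domain_always_signs is outside policy knowledge, assumed to be
    obtained some other way. All names here are illustrative.
    """
    if has_signature and validates:
        return "signing domain is responsible for the message"
    if has_signature:
        return "tampered with in transit, or forged"
    if domain_always_signs:
        return "invalid by policy"
    return "no conclusion"

print(dkim_conclusion(True, True))
print(dkim_conclusion(True, False))
print(dkim_conclusion(False, False))
print(dkim_conclusion(False, False, domain_always_signs=True))
```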

spam/UnderstandingDKIM written at 22:50:53

2010-09-05

An observation from changing my password

I've changed my password at work, or started to change it at least (this will be an extended process). Doing this has reinforced some things that I know but rarely think about, and exposed a surprising inconvenience in how I do things.

The big thing is that you don't really remember how many machines you have accounts on until you try to work out how many different places you need to change your password. This is not really an issue for users (if us sysadmins are doing our job right, they change their password once and it magically propagates everywhere), but as a sysadmin I have access to all sorts of isolated machines that are not part of our password propagation system. Which means that I get to change my password on all of them, assuming that I can remember what they all are.

(In looking at this, I see that usermod on Linux machines actually has an option to just staple a new encrypted password into place. This reduces the problem to running a command as root on most of those machines, which is a mostly solved problem around here. In fact, I was already using 'run a command everywhere' to check /etc/shadow to see if I'd updated my password by looking at the last-changed field.)
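The last-changed check can be sketched in a few lines. In /etc/shadow the third colon-separated field is the date of the last password change, stored as days since 1970-01-01. The sample line below is made up (the login and hash are not real), and the helper name is hypothetical.

```python
from datetime import date, timedelta

def shadow_last_changed(shadow_line):
    """Return (login, date of last password change) for one shadow line."""
    fields = shadow_line.rstrip("\n").split(":")
    login, last_changed = fields[0], fields[2]
    if not last_changed:
        return login, None        # the field can be empty
    return login, date(1970, 1, 1) + timedelta(days=int(last_changed))

# Made-up entry; a real one would come from /etc/shadow, read as root.
sample = "someuser:$6$notarealhash:14857:0:99999:7:::"
print(shadow_last_changed(sample))
```

Run across every machine, this makes it easy to spot the ones where the password is still the old one.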

The surprising inconvenience is that I have set up ssh identities to give me passwordless access to my account on most machines; in fact, a lot of my usual environment relies on it. This did not strike me as a problem until I changed my password and suddenly started wanting to type the new one as much as possible to reinforce it in my mind and my fingers. Suddenly all of that passwordless access was inconvenient as well as convenient, since it meant that I'm really not typing my password all that much. This has both surprised and amused me, because sometimes I am easily amused by the perversities of life.

(Turning my ssh identities off completely would likely make various parts of my environment explode in even less convenient ways, so I've resorted to modifying an ssh cover script I already had lying around to turn this off, and using the cover script periodically just to reinforce things. You might wonder why I have an ssh cover script lying around, one that I do not mind hacking up this way; the answer is that it's set up to ignore my known-hosts file, which is very convenient when you keep reinstalling virtual machines that you want to ssh in to.)
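A cover script along these lines could look roughly like the following. This is a guess at the shape of such a script, not the actual one; the function name and the host name are invented. It only builds the command line, using standard OpenSSH -o options to skip the known-hosts file and to turn off public key authentication.

```python
def ssh_cover_argv(host, use_identities=True, ignore_known_hosts=True):
    """Build an ssh command line for a hypothetical cover script."""
    argv = ["ssh"]
    if ignore_known_hosts:
        # Don't record or check host keys (handy for reinstalled VMs).
        argv += ["-o", "UserKnownHostsFile=/dev/null",
                 "-o", "StrictHostKeyChecking=no"]
    if not use_identities:
        # Force a password prompt by disabling public key authentication.
        argv += ["-o", "PubkeyAuthentication=no"]
    argv.append(host)
    return argv

print(" ".join(ssh_cover_argv("testvm", use_identities=False)))
```

A real script would finish by handing the list to something like os.execvp("ssh", argv).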

sysadmin/PasswordChangeNotes written at 23:57:53

A plan to deal with my feed reader problem

I have a feed reader problem, one that has long since reached epic levels: in practice, I'm not really reading feed entries. For years, Liferea has been telling me that I have thousands of unread entries and I have been ignoring them. I think it's time to declare feed reader bankruptcy (which is much like email bankruptcy) and deal honestly with the results.

(This will be a bit traumatic, because I'm somewhat obsessive about some things. It hurts to consciously and deliberately throw away unread entries.)

In thinking about this, I have realized that I have two sorts of feeds that I follow: casual reading feeds, that I keep around so that I have something to browse when I'm feeling bored and want to poke at their topic, and feeds that I am strongly interested in and want to read all or almost all of, even if it takes me a while. If I'm being honest about it, almost all of the feeds I currently have in Liferea are casual reading feeds (which is one reason I keep not reading them).

So here's my current plan for dealing with all of this:

A certain number of the casual feeds are simply going to be discarded (a process that I've already started); I'll trust that anything worth reading that they produce will show up on the usual link sources that I browse (such as Hacker News). The rest of them will go into Google Reader, because Google Reader will quietly expire old unread entries for me. Throwing away old entries to keep the volume manageable is exactly the behavior I want for casual feeds.

(Google Reader is also better for casual browsing because I can use it from anywhere. Liferea is tied to a particular machine.)

My important feeds will stay in Liferea where I can exert more control over them, for example deciding exactly when they expire (or don't). I will probably also find some feeds that are more convenient to read in Liferea than in Google Reader. If I do this right I will have only a relatively small number of feeds in Liferea, and they will generally not have many unread entries.

I'm not sure that this will actually work, but I'll have to see how it goes. Something certainly needs to change; thousands of unread feed entries that just keep expiring off the bottom of feeds just don't work.

(They 'work' in one sense, but they create a kind of mental pressure that makes me avoid having much to do with them. Right now I avoid entire categories of feeds in Liferea because of all of the unread entries.)

PS: if I'm being honest with myself, I should probably throw away at least half of my casual feeds. Many of them were added because they looked sort of interesting, way back in my early days of feed reading enthusiasm when I felt that I had a lot more time for this. Rather than putting them in Google Reader only to ignore them, I should just save them in a file somewhere.

(This reminds me rather vividly of mailing lists, and if I go far enough back, Usenet. I went through much the same pattern with them that I am going through with feeds now, and if I got into something like Twitter I suspect that I would go through the same pattern with it too.)

tech/DealingWithMyFeeds written at 00:30:37

2010-09-04

The laziness of a programmer, illustrated

At work, I have fallen into the bad habit of keeping a lot of iconified Firefox windows around, full of various things that I am going to read sometime (honest). As I've mentioned before, I have all of these iconified windows very carefully placed and organized so that I can find them again and keep track of them.

Naturally, this makes quitting and restarting Firefox kind of a pain. I have Firefox set to preserve all of the active windows and tabs over restarts, but it doesn't preserve the positions of the iconified windows (and it doesn't entirely preserve the regular window position either); any time I have to start Firefox again I have to re-position all of those icons. Generally this means that I don't; I never exit Firefox unless I'm forced to, because it's such a pain to get everything set up again.

(Which implies that I never log out, either; I just leave my screen locked.)

Recently I got tired of this (in the aftermath of my Fedora 13 upgrade, I've been restarting things more than usual). Thus I decided that clearly there had to be a way to fish around in the depths of X to find the current icon positions, so I could write a quick script that recorded them in a file and then shuffled the icons back into the right spots for me.

(This is less crazy than it sounds; I already have command line utilities to reposition windows, and X comes with a fair number of commands to poke at various aspects of window state.)

I'll cut to the chase: yes, except that it wasn't exactly a quick script. The most convenient way of doing this turned out to be writing an FVWM module in Perl that finds out all of this information and writes a file of FVWM commands that can be loaded back in to FVWM to (re)position and (re)iconify all of my Firefox windows just right. In the process of doing this I had to remember my Perl, look up a certain amount of Perl's OO support (my last serious Perl programming pretty much predates it), and figure out how to work with FVWM's underdocumented Perl bindings.

(FVWM has no current Python bindings for would-be module authors.)

But all of this was less work than continuing to re-position all of my Firefox windows by hand. Honest.

(The resulting module is sort of theoretically general. If you are really interested, see here. As a bonus, you get to laugh at my hack-job Perl.)

programming/ProgrammerLaziness written at 01:10:22

2010-09-03

Finally understanding the attraction of AJAX

I'll admit it; I'm slow sometimes. For a very long time now I haven't really gotten why people keep sprinkling AJAX over their web pages (partly because I assiduously use NoScript and so mostly don't see it). Oh, I understood that you needed it to create actual applications on the web and that it could be convenient for making vaguely friendly things, but I didn't really understand it in the context of relatively ordinary web apps like DWiki.

But my recent thinking about my comment form design mistake has finally fixed that. Here is my recent insight in a nutshell:

AJAX lets you do things without page changes and refreshes, so you can preserve the user's context on the page and make them less confused.

In a conventional non-AJAX web interface, any significant action forces a (full) page reload. This creates a visible page refresh except in extremely ideal circumstances and in general means that the user has to find their place again and reorient themselves. This is sort of tolerable if what the user is working on fits entirely inside their browser window; it's fairly horrible if it doesn't and they have to actively scroll around to find where they were before. This is the core problem I have with a revision to my comment form design; I'm pretty sure that people would get lost among everything else going on.

(The ideal circumstances are that you're using fragment identifiers in the URL, the browser accurately repositions things back at the fragment identifier, and the entire system loads the new page so fast that there is no visible flicker.)

In an AJAX web interface the user can perform actions without this lurching jump. For example, when they click on 'add comments', they don't get yanked to a new page; instead, a comment form unfolds right then and there in front of them. This is less confusing in two ways. First, it is happening right in front of you, clearly visible. Second, it is the only thing that is happening; you don't have to pick out the significant change from all of the other flickering and movement and so on that's going on as the page reloads.

This creates a more fluid, less disorienting interface, one that is easier and faster to work with because you spend more time doing what you're interested in and less time finding your place again every so often. In a sense, the result is much closer to a direct manipulation interface than a standard, non-AJAX web page can manage.

I don't think that there's any way to pull this off without AJAX; you really need some way to do a partial page content update without anything else flickering or moving. That's just not something that browsers offer (you don't even get it on plain user-initiated page refresh).

(I suspect that this is old hat for people in the field, but all of it only clicked for me when I started really thinking over the problem of people getting lost in my comment form under various circumstances, cf TemplateLimitations.)

PS: looking backwards, this makes me slightly more sympathetic to old HTML frames. Although they were almost never used this way, you can argue that they were a crude first attempt at the sort of limited page update you'd need to pull this off.

web/FinallyGettingAJAX written at 01:11:31

2010-09-02

Why Python's global is necessary

When I started out programming in Python, I didn't really like global. For a long time I considered it unaesthetic, annoying, and on the whole an irritating wart of the bytecode implementation. As I mentioned recently, I have come around to a different view of global, and it goes like this.

If you want to have both global variables and lexically scoped local variables, you have to be able to tell whether a given name being assigned to in a function is a local or a global variable at the time that the function is being defined. Assuming that you want as much as possible of this to be implicit for various reasons, there are three relatively reasonable choices that I can think of:

  • you must declare globals explicitly; otherwise names are local.
  • you must declare locals explicitly; otherwise names are global.
  • the decision is made implicitly by what global names already exist when the function is being defined; a name that exists globally is taken as a global variable, and otherwise the name is taken as a local variable.

(If a name is never assigned to within a function but only read from, it's either a global variable or a 'use of an undefined value' error. Python opts to consider it a global variable.)
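Python's choice here is visible on the compiled function. The sketch below uses CPython's code object attributes (co_varnames for local names, co_names for names looked up globally); the variable and function names are just for illustration.

```python
hits = 0

def reads_only():
    return hits + 1      # never assigned here, so 'hits' is a global lookup

def assigns():
    hits = 1             # assigned with no declaration, so 'hits' is local
    return hits

def declares():
    global hits          # explicit declaration: the assignment hits the global
    hits = 5
    return hits

for func in (reads_only, assigns, declares):
    code = func.__code__
    print(func.__name__, "locals:", code.co_varnames, "globals:", code.co_names)
```

The decision has already been made by the time the def statement finishes; nothing about it waits until the function is called.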

The third option is fragile (and un-Pythonic). This leaves you with a choice between the first and the second options, and either way you are going to need a keyword for it. Python makes the decision that writing to global variables will be rare and so it forces you to declare them explicitly; local variables, the common case, are handled implicitly. So it needs global, because having local instead would be worse (and having neither would be much worse).
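In practice the rule shows up like this: without the declaration an assignment makes the name local for the whole function, so reading it first fails, while global makes the same statement rebind the module-level variable. A minimal example with made-up names:

```python
total = 10

def broken_increment():
    total = total + 1    # 'total' is local here and is read before being set
    return total

def working_increment():
    global total         # the assignment now rebinds the module-level name
    total = total + 1
    return total

try:
    broken_increment()
except UnboundLocalError as err:
    error_seen = True
    print("without global:", err)
else:
    error_seen = False

result = working_increment()
print("with global:", result, total)
```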

(This decision might be either a pragmatic one, based on what was expected to be common, or a philosophical choice to make global variables more inconvenient in the hopes of making them less common. I don't know the Python history involved, so I have no idea which it was.)

Other languages make different choices here, sometimes for philosophical reasons that come down on the other side and sometimes just for historical ones (eg, if they started out without local variables or lexical scoping at all).

Sidebar: the many problems with the fully implicit option

The core problem with the fully implicit option, why it is fragile in many ways, is that it makes the meaning of a function dependent on its surrounding context. You can't just read a function and know what it does and what it manipulates; instead you have to know what global names exist when the function is defined.

One consequence of this is that anything that changes what global names are defined can change the meaning of the function. In a language like Python where function definition is an ordinary executable statement, one done immediately when encountered, merely moving a function definition forward or backwards inside a file could change the function's meaning even without any other code changes (as you move it before or after where global names are created or even deleted).
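Python doesn't work this way, so the fragility can only be simulated. The sketch below models the hypothetical fully implicit rule as a plain function: given the names a function assigns to and the global names that exist when it is defined, it says how each name would be treated. Everything here is invented for illustration.

```python
def implicit_scopes(assigned_names, globals_defined_so_far):
    """Classify names the way the hypothetical fully implicit rule would."""
    return {
        name: "global" if name in globals_defined_so_far else "local"
        for name in sorted(assigned_names)
    }

assigned_in_function = {"count", "tmp"}

# The same function text, defined at two different points in a file:
# once before 'count' exists as a global, and once after.
defined_early = implicit_scopes(assigned_in_function, set())
defined_late = implicit_scopes(assigned_in_function, {"count"})

print("defined early:", defined_early)
print("defined late: ", defined_late)
```

Moving the definition changes what 'count' means even though not a character of the function itself changed.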

python/WhyGlobalNecessary written at 00:00:26

2010-08-31

I don't understand how net.ipv4.conf.*.rp_filter can work

First, the background. net.ipv4.conf.*.rp_filter controls some IP address source validation filtering done on incoming IPv4 packets. It has three values:

  • 0: no filtering is done.
  • 1: packets are discarded if they come in on any interface except the one that a reply to the source IP would go out on.
  • 2: packets are discarded if a reply to the source IP could not be sent out any interface.

(A more formal description is in ip-sysctl.txt in the kernel documentation. Like all interface sysctls, it can be set separately for each interface, as a default, and for all interfaces.)

I don't understand how this can possibly work. Well, I understand how it works, I just don't understand how it can possibly do any good in most configurations. And I don't understand how a setting of '1' can possibly work at all in multihomed configurations unless the multihomed machine is the sole router for every network it's connected to, apart from the network that its default route points to.

First, as far as I can tell a setting of '2' is equivalent to '0' if you have a default route set (the usual case). With a default route set, all source IPs are reachable and so '2' will never discard packets, which is exactly the same as '0'.

For a machine with a single network interface and a default route, all settings are equivalent (for the same reason as above; all source IPs are reachable through your single interface). If you do not have a default route, either '1' or '2' will discard packets that come from networks you do not have routes for.

It is the multihomed case where things explode. Suppose that you have a multihomed host with two network interfaces, net-1 and net-2, with IP-1 on net-1 and IP-2 on net-2. With an rp_filter value of 1, a machine on net-2 cannot talk to this machine's IP-1 address unless the packets pass through the multihomed machine on the way to net-1, ie the multihomed machine is the router for the net-2 machine. If the packets go through another router, they will arrive on the multihomed machine's net-1 interface but the replies would go out the net-2 interface, so they fail the check.
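The behaviour described so far can be modelled in a few lines. This is a simplified simulation of the three rp_filter settings as described above, not the kernel's actual code; the networks, interface names, and addresses are all made up, and the routing lookup is a plain longest-prefix match.

```python
import ipaddress

# Hypothetical multihomed host: (destination network, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("10.1.0.0/24"), "eth-net1"),   # attached net-1
    (ipaddress.ip_network("10.2.0.0/24"), "eth-net2"),   # attached net-2
    (ipaddress.ip_network("0.0.0.0/0"), "eth-net1"),     # default route
]

def reply_interface(source_ip, routes):
    """Longest-prefix match: the interface a reply to source_ip would use."""
    addr = ipaddress.ip_address(source_ip)
    matches = [(net.prefixlen, iface) for net, iface in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0])[1]

def rp_filter_accepts(mode, source_ip, arrival_iface, routes=ROUTES):
    """Would a packet from source_ip arriving on arrival_iface be kept?"""
    out_iface = reply_interface(source_ip, routes)
    if mode == 0:
        return True
    if mode == 1:
        return out_iface == arrival_iface
    if mode == 2:
        return out_iface is not None
    raise ValueError("rp_filter must be 0, 1 or 2")

# A net-2 machine reaching IP-1 through some other router: the packet
# arrives on the net-1 interface, but the reply would leave via net-2.
print(rp_filter_accepts(1, "10.2.0.50", "eth-net1"))
# With a default route, setting 2 accepts any source at all.
print(rp_filter_accepts(2, "192.0.2.7", "eth-net2"))
# Without the default route, setting 2 starts discarding things.
print(rp_filter_accepts(2, "192.0.2.7", "eth-net2", routes=ROUTES[:2]))
```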

Effectively this creates a bad version of an isolated interface, with the packet reachability restrictions but without the multiple split routing tables that make multihomed hosts actually work. As a bonus it hides the restriction deep in the networking sysctls, where you have to be an expert to find it.

(I suppose that there are some advantages to this half-hearted approach, in that it avoids some limits in the policy based routing version of it.)

By the way, I stumbled over this courtesy of Ubuntu 10.04 setting rp_filter to 1 by default. We have multihomed non-routing machines, and when we set up an Ubuntu 10.04 test version things promptly exploded. If I was not already suspicious of network sysctls, we could have spent quite a lot of time trying to find out just why the machine was ignoring certain sorts of network traffic.

(As it was I did 'sysctl -a | fgrep net. | sort' on both a 10.04 and an 8.04 machine and then looked for settings that were different. Ubuntu 10.04 may not be the first version that sets this, but 8.04 definitely didn't.)
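The same comparison can be done with a short script instead of by eye. This is a sketch under the assumption that the output from each machine has been saved as text; the two sample excerpts are made up, and the function names are hypothetical. Unlike the fgrep, it only keeps settings whose names start with 'net.'.

```python
def parse_sysctl(text):
    """Turn 'sysctl -a' style output into a dict of net. settings."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith("net."):
            settings[key.strip()] = value.strip()
    return settings

def changed_settings(old, new):
    """Settings whose values differ, or that exist on only one machine."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }

# Made-up excerpts standing in for saved output from the two machines.
old_output = "net.ipv4.conf.all.rp_filter = 0\nnet.ipv4.ip_forward = 0\n"
new_output = "net.ipv4.conf.all.rp_filter = 1\nnet.ipv4.ip_forward = 0\n"

differences = changed_settings(parse_sysctl(old_output), parse_sysctl(new_output))
print(differences)
```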

PS: a much more useful version of this sysctl would be a 'private' flag on interfaces. If an interface had the private flag set, packets with a source IP address that was routed through that interface would only be accepted on that interface; all other interfaces would discard such packets.

linux/RpfilterPuzzle written at 14:24:29

2010-08-30

My avoidance of Python global variables

I spent part of today writing a quick one-off data conversion program. The core of it was a function that filtered items from a list through a number of things in order to sort them into the right category. Once the dust settled on all of the sorting needed, the function had quite a lot of stock arguments, things that didn't vary from call to call in my program. In fact, an unwieldy number of them.

There are at least three vaguely Pythonic options for how to deal with this (plus how I actually did), but what interests me in retrospect is the one answer that I didn't even think about. Namely, global variables.

There are all sorts of reasons to avoid global variables in general, but this was a one-off program and if I'm being honest, that's what all of those stock parameters really were. I was making them local variables in the calling function and then passing them in to the classifying function not so much because it was a good idea but because that's what I do in Python. I just don't use global variables very much even when they'd arguably make sense, and when I do use them I feel irritated.

As best I can tell, what does it is the pesky global keyword. Having to declare variables global any time I want to rebind them adds just enough extra friction to using global variables in practice that I would rather not bother and instead pass lots of things around as parameters. I generally resort to global variables only when passing the same information as parameters would add arguments to too many layers of function calls.

(This is the situation where you have four or five layers of function calls and some of the stuff down at the leaves wants to gather some expensive piece of information only once. The nominally logical thing to do is to call the 'gather information' function once at the start of your program and then pass the parameter all the way down to the leaves, but that means you have to pass the information object through all of the intermediate layers, where all it does is clutter up parameter lists. Really, you want to put it in a global variable, especially if you have several different clusters of these functions that want different chunks of information; passing the information they need down as parameters doesn't scale.)
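As a small sketch of that situation (all of the names are invented, and the 'expensive' step is faked), the information is gathered once into a global and read at the leaf, so the layers in between never have to mention it:

```python
_site_info = None            # filled in once, then read by leaf functions

def gather_site_info():
    """Stand-in for the expensive 'gather information' step."""
    return {"domain": "example.org"}

def load_site_info():
    global _site_info        # needed because this function rebinds the global
    _site_info = gather_site_info()

def leaf():
    # Only reading the global, so no declaration is needed here.
    return "mail for " + _site_info["domain"]

def middle():
    return leaf()            # intermediate layers carry no extra parameter

def top():
    return middle()

load_site_info()             # once, at the start of the program
print(top())
```

Only the one function that sets the variable pays the cost of the global declaration.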

Part of the friction is the annoyance of the extra line in any function that will rebind the global variable. But another part is just having to think about it at all, partly because I sort of consider global to be a wart (especially because I know what the bytecode is doing behind the scenes).

(Global's not really a wart, but that's another entry.)

Sidebar: the three options that I am thinking of

The three Python options that immediately come to mind are:

  • embed the classifying function into its caller as a closure, giving it direct access to all of what used to be the stock parameters. This feels like a hack to me, and I don't like the extra level of indentation.

  • make the classifying function a method on a class which otherwise had all of the stock parameters as instance variables. It's probably the classical solution but it feels completely artificial to me.

  • make a structure to hold all of the stock parameters, then just pass the structure instead of all of the parameters separately.
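For comparison, here is roughly what the three options look like for a toy classifier. The classification rule and the two stock parameters (big_threshold and skip_names) are invented stand-ins; the real program had more of them.

```python
from collections import namedtuple

# Option 1: a closure embedded in its caller, seeing the stock values directly.
def convert_with_closure(items, big_threshold, skip_names):
    def classify(name, size):
        if name in skip_names:
            return "skip"
        return "big" if size >= big_threshold else "small"
    return [classify(name, size) for name, size in items]

# Option 2: a class holding the stock parameters as instance variables.
class Classifier:
    def __init__(self, big_threshold, skip_names):
        self.big_threshold = big_threshold
        self.skip_names = skip_names

    def classify(self, name, size):
        if name in self.skip_names:
            return "skip"
        return "big" if size >= self.big_threshold else "small"

# Option 3: one structure passed in place of the separate parameters.
Stock = namedtuple("Stock", ["big_threshold", "skip_names"])

def classify_with(stock, name, size):
    if name in stock.skip_names:
        return "skip"
    return "big" if size >= stock.big_threshold else "small"

items = [("data", 250), ("tmp", 5)]
closure_result = convert_with_closure(items, 100, {"tmp"})
method_result = [Classifier(100, {"tmp"}).classify(n, s) for n, s in items]
struct_result = [classify_with(Stock(100, {"tmp"}), n, s) for n, s in items]
print(closure_result, method_result, struct_result)
```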

Since this was a quick hack, I was lazy and did the poor man's structure: I made a tuple with all of the stock parameters and just passed in the tuple (and then unpacked it in the classifying function). This is less aesthetically pleasing than a structure, but also less code, and it is the obvious next step when one's parameter list spirals out of control and most of it is the same from call to call.

(My eventual code had two arguments that varied from call to call and six that were the same, packed into a tuple. I'm sure that this is a code smell, but it was a quick hack.)
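The shape of that, with two varying arguments and six stock values packed into a tuple, is something like the following. The actual parameters aren't given above, so every name and rule here is made up.

```python
def classify(name, size, stock):
    # Unpack the six stock values that are the same on every call.
    (min_size, max_size, skip_names,
     overrides, default_category, verbose) = stock
    if name in skip_names:
        category = "skip"
    elif name in overrides:
        category = overrides[name]
    elif min_size <= size <= max_size:
        category = "normal"
    else:
        category = default_category
    if verbose:
        print(name, "->", category)
    return category

# Built once in the caller, then passed along on every call.
stock = (10, 1000, {"tmp"}, {"README": "docs"}, "odd", False)

items = [("tmp", 5), ("README", 50), ("data", 200), ("huge", 5000)]
results = [classify(name, size, stock) for name, size in items]
print(results)
```

The weak point is that the packing order and the unpacking order have to be kept in sync by hand, which is what a real structure would fix.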

python/AvoidingGlobals written at 22:47:07

These are my WanderingThoughts

This is part of CSpace, and is written by ChrisSiebenmann.

