Wandering Thoughts archives

2016-06-05

My approach for inspecting Go error values

Dave Cheney sort of recently wrote Don't just check errors, handle them gracefully, where he strongly suggested that basically you should never check the actual values of errors. This is generally a great idea, but sometimes you don't have a choice. For example, for a long time it was the case that the only way to tell if your DNS lookup had hit a temporary DNS error (such as 'no authoritative servers for this domain are responding') or a permanent one ('this name doesn't exist') was to examine the specific error that you received. While net.DNSError had a .Temporary() function, it didn't return true in enough cases; you had to go digging deeper to know.

(This was Go issue 8434 and has since been fixed, although it took a while.)

When I had to work around this issue (in code that I suppose I should now remove), I was at least smart enough to try the official way first:

 var serverrstr = "server misbehaving"

 func isTemporary(err error) bool {
     if e, ok := err.(*net.DNSError); ok {
         if e.Temporary() || e.Err == serverrstr {
             return true
         }
     }
     return false
 }

Checking the official way first made it so that once this issue was resolved, my code would immediately start relying on the official way. Checking the error string only for net.DNSError errors made sure that I wouldn't get false positives from other error types, which seemed like a good idea at the time.

When I wrote this code I felt reasonably smart about it; I thought I'd done about as well as I could. Then Dave Cheney's article showed me that I wasn't quite doing this right; as he says in one section ('Assert errors for behaviour, not type'), I should have really checked for .Temporary() through an interface instead of just directly checking the error as a net.DNSError. After all, maybe someday net.LookupMX() and company will return an additional type of error in some circumstances that has a .Temporary() method; if that would happen, my code here wouldn't work right.

(I even put some comments in musing about the idea, but then rejected it on the grounds that the current net package code didn't do that so there didn't seem to be any point. In retrospect that was the wrong position to take, because I wasn't thinking about potential future developments in the net package.)

I'm conflicted over whether to type-assert to specific error types if you have to check the actual error value in some way (as I do here). I think it comes down to which way is safer for the code to fail. If you check the error's string through its Error() method, future changes in the code you're calling may cause you to match on things that aren't the specific error type you're expecting. Sometimes this will be the right answer and sometimes it will be the wrong one, so you have to weigh the harm of a false positive against the harm of a false negative.

programming/GoInspectingErrors written at 01:28:22; Add Comment

2016-06-04

The (Unix) shell is not just for running programs

In the Reddit comments on yesterday's entry, I ran across the following comment:

No. The shell literally has the sole purpose of running external programs. Anything more is extra.

The V1 shell read a line, split on whitespace, and executed the command from /bin. You could change the current directory from in the shell, that was it.

On any version of Unix as far back as at least V7, this is false. The Unix shell may have started out simply being a way to run programs, but it long ago stopped being just that. Since the V7 shell is a ground up rewrite, one cannot even argue that the shell simply drifted into these additional features for convenience. The V7 shell was consciously designed from scratch, and as part of that design it included major programming features including control flow constructs drawn directly from the general Algol line of computer language design. Inclusion of these programming features is not an accident and not a drift over time; it is a core part of the shell's design and thus its intended purpose. The V7 shell is there both to run programs and to write programs (shell scripts), and this is completely intended.

(In terms of control flow, I'm thinking here of if, while, and for, and there's also case.)

In short, the shell as, in part, a programming language is part of Unix's nature from at least the first really popular Unix version (V7 became the base of many further lines of Unix). To the extent that the Unix design ethos or philosophy exists as a coherent thing, it demonstrably includes a strongly programmable shell.

You can make an argument that the V6 shell (the 'Mashey shell') shows this too, but it was apparently a derivative of and deliberately backwards compatible with the original 'just run things' Thompson shell. The V7 Bourne shell is a clear, from scratch break with the original Thompson shell, and it was demonstrably accepted by Research Unix as being, well, proper Unix.

(If you want even more proof that Research Unix's view of the shell includes programming, the shell was reimplemented once again for Version 10 and Plan 9 in the form of Tom Duff's rc shell and, you guessed it, that included programmability too, this time with more C-like syntax instead of the Algol-like syntax of the Bourne shell.)

(You can argue that this conjoining of 'just run programs for people' and 'write shell scripts' in a single program is a mistake and these roles should be split apart into two programs, but that's a different argument. I happen to think that it's also wrong, and on more than one level.)

unix/ShellNotJustProgramRunner written at 01:28:04; Add Comment

2016-06-03

One thing that makes the Bourne shell an odd language

In many ways, the Bourne shell is a relatively conventional programming language. It has a few syntactic abnormalities, a few flourishes created by the fact that it is an engine for running programs (although other languages have featured equivalents of $(...) in the form of various levels of 'eval' functionality), and a different treatment of unquoted words, but the overall control structure is an extremely familiar Algol-style one (which is not surprising, since Steve Bourne really liked Algol).

But the Bourne shell does have one thing that clearly makes it an odd language, namely that it has outsourced what are normally core language functions to external programs. Or rather it started out in its original version by outsourcing those functions; versions of the Bourne shell since then have pulled them back in in various ways. Here I am thinking of both evaluating conditionals via test aka [ and arithmetic via expr (which also does some other things too).

(Bourne shells have had test as a builtin for some time (sometimes with some annoyances) and built in arithmetic is often present these days as $((...)).)

There's no reason why test has to be a separate program and neither test nor expr seems to have existed in Research Unix V6, so they both appeared in V7 along with the Bourne shell itself. They aren't written in BourneGol, so they may not have been written by Steve Bourne himself, but at least test was clearly written as a companion program (the V7 Bourne shell manpage explicitly mentions it, among other things).

I don't know why the original Bourne shell made this decision. It's possible that it was simply forced by the limitations of the PDP-11 environment of V7. Maybe a version of the Bourne shell with test and/or expr built into the main shell code would have either been too big or just considered over-bloated for something that would mostly be used interactively (and thus not be using test et al very often). Or possibly they were just easier to write as separate programs (the V7 expr is just a single yacc file).

Note that there are structural reasons in the Bourne shell to make if et al conditions be the result of commands, instead of restricting them to (only) be actual conditions. But the original Bourne shell could have done this with test or the equivalent as a built-in command, and it certainly has other built in commands. Perhaps test needing to be an actual command was one of the things that pushed it towards not being built in. You can certainly see a spirit of minimalism at work here if you want to (although I have no idea if that's the reason).

(This expands on a tweet of mine.)

Sidebar: It's not clear when test picked up its [ alias

Before I started writing this entry, I expected that test was also known as [ right from the beginning in V7. Now I'm not so sure. On the one hand, the actual V7 shell scripts I can find eg here consistently use test instead of [ and the V7 compile scripts don't seem to create a [ hardlink. On the other hand, the V7 test source already has special magic handling if it's invoked as [.

(There are V7 disk images out there that you can boot up on a PDP-11 emulator, so in theory I could fire one up and see if it has a /bin/[. In practice I'm not that energetic.)

unix/BourneShellOutsourcedBits written at 01:25:41; Add Comment

2016-06-02

Spammers can abandon SMTP connections not infrequently

As a result of looking at my SMTP session logs, one of the things that I've started tracking on my 'sinkhole' spamtrap SMTP server is how many senders reach the point where they actively get rejected by my server versus how many senders just disconnect with incomplete sessions where everything has gone fine up to that point. My SMTP session logging said that at least some just gave up, but I wasn't sure how many did this.

(Under normal circumstances you'd expect real sending mailers to almost never just abandon an incomplete session. It's not 'never' because there will always be some sending mailers that have their machine reboot out from underneath them or the like as they're trying to send out a message, but this is not exactly common so it should be very low.)

My results so far are early and somewhat incomplete, but I'll give you representative numbers anyways. The numbers I have handy right now are that over the past two and a half days, I've seen 123 abandoned sessions to 440 sessions with refused SMTP commands, or about a fifth of the sessions are just being abandoned. I don't particularly have data on where the sessions are being abandoned, but my SMTP logs say that some senders drop the connection while I'm sending my initial SMTP greeting banner and some drop it as I answer their EHLO or HELO.

Now, I don't and can't know why senders are choosing to abandon their SMTP sessions to my sinkhole server. But one thing that my server does is trickle out its SMTP replies rather slowly (including the initial banner), specifically at a rate of one character every tenth of a second. I took this idea from OpenBSD's spamd, but when I put it in I didn't really expect it to do anything. It may be that I'm wrong here and there is a not insignificant amount of spammer software that either specifically recognizes this behavior or simply isn't interested in wasting its time on too-slow mailers.

(I don't yet feel like experimenting by turning this feature off to see if the number of abandoned sessions drops almost to zero.)

Applications of this to real, non-sinkhole mailers are left as an exercise. As far as I know, no real sending mailer cares about somewhat slow responses at this level, but I admit I haven't exactly attempted to get every major ISP and so on to send my sinkhole server some email just to see if it would work. Big places like Google and Outlook don't seem to have had any problems coping with my sinkhole server, for what that's worth.

Sidebar: what I consider an abandoned session versus a rejected one

A session counts as 'rejected' if the most recent valid HELO/EHLO, MAIL FROM, RCPT TO, DATA or final '.' on messages was either 5xx'd or 4xx'd. This doesn't consider QUIT, RSET, or other similar commands and it doesn't consider out of sequence commands. A session counts as 'abandoned' if it got 'go ahead' 2xx/354 responses to every valid, in-sequence SMTP command it tried but the sender either closed the TCP connection or sent a QUIT.
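As a sketch, these rules boil down to a small decision procedure (the `session` struct and its fields are my own illustration of the state involved, not real code from my server):

```go
package main

import "fmt"

// session records the minimal per-connection state needed to apply the
// classification rules: whether any valid, in-sequence command was
// refused, and whether a message was fully accepted.
type session struct {
	sawRejection bool // a valid, in-sequence command drew a 4xx/5xx
	completed    bool // a message was fully accepted
}

// classify applies the rules above: any rejection makes the session
// 'rejected'; otherwise a session that never finished a message but got
// only go-ahead responses counts as 'abandoned'.
func classify(s session) string {
	switch {
	case s.sawRejection:
		return "rejected"
	case !s.completed:
		return "abandoned"
	default:
		return "delivered"
	}
}

func main() {
	fmt.Println(classify(session{sawRejection: true})) // rejected
	fmt.Println(classify(session{}))                   // abandoned
}
```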

Sessions with things like TLS setup failures don't count as either abandoned or rejected. I see some amount of those, some for sad reasons.

spam/SpammersAbandonSMTPSessions written at 00:17:45; Add Comment

2016-05-31

Understanding the modern view of security

David Magda wrote a good and interesting question in a comment on my entry on the browser security dilemma:

I'm not sure why they can't have an about:config item called something like "DoNotBlameFirefox" (akin to Sendmail's idea).

There is a direct answer to this question (and I sort of wrote it in my comment), but the larger answer is that there has been a broad change in the consensus view of (computer) security. Browsers are a microcosm of this shift and also make a great illustration of it.

In the beginning, the view of security was that your job was to create a system that could be operated securely (often but not always it was secure by default) and give it to people. Where the system ran into problems or operating issues, it would tell people and give them options for what to do next. In the beginning, the diagnostics when something went wrong were terrible (which is a serious problem), but after a while people worked on making them better, clearer, and more understandable by normal people. If people chose to override the security precautions or operate the systems in insecure ways, well, that was their decision and their problem; you trusted people to know what they were doing and your hands were clean if they didn't. Let us call this model the 'Security 1' model.

(PGP is another poster child for the Security 1 model. It's certainly possible to use PGP securely, but it's also famously easy to screw it up in dozens of ways such that you're either insecure or you leak way more information than you intend to.)

The Security 1 model is completely consistent and logical and sound, and it can create solid security. However, like the 'Safety-I' model of safety, it has a serious problem: it not infrequently doesn't actually yield security in real world operation when it is challenged with real security failures. Even when provided with systems that are secure by default, people will often opt to operate them in insecure ways for reasons that make perfect sense to the people on the spot but which are catastrophic for security. Browser TLS security warnings have been ground zero for illustrating this; browser developers have experimentally determined that there is basically no level of strong warnings that will dissuade enough people from going forward to connect to what they think is eg Facebook. There are all sorts of reasons for this, including the vast prevalence of false positives in security alerts and the barrage of warning messages that we've trained people to click through because they're just in the way in the end.

The security failures of the resulting total system of 'human plus computer system' are in one sense not the fault of the designers of the computer system, any more than it is your fault if you provide people with a saw and careful instructions to use it only on wood and they occasionally saw their own limbs off despite your instructions, warnings, stubbornly attached limb guards, and so on. At the same time, the security failures are an entirely predictable failure of the total system. This has resulted in a major shift in thinking about security, which I will call 'Security 2'.

In Security 2 thinking, it is not good enough to have a secure system if people will wind up operating it insecurely. What matters and the goal that designers must focus on is making the total system operate securely, even in adverse conditions; another way to put this is that the security goal has become protecting people in the real world. As a result, a Security 2 focused designer shouldn't allow security overrides to exist if they know those overrides will wind up being (mis)used in a way that defeats the overall security of the system. It doesn't matter if the misuse is user error on the part of the people using the security system; the result is still an insecure total system and people getting owned and compromised, and the designer has failed.

Security 2 systems are designed not necessarily so much to be easy to use as to be hard or impossible to screw up in such a way that you get owned (although often this means making them easy to use too). For example, all the time, automatic end to end encryption of messages in an instant messaging system is a Security 2 feature; optional, must be selected or turned on by hand end to end encryption of messages is a Security 1 feature.

Part of the browser shift to a Security 2 mindset has been to increasingly disallow any and all ways to override core security precautions, including being willing to listen to websites over users when it comes to TLS failures. This is pretty much what I'd expect from a modern Security 2 design, given what we know about actual user behavior.

(The Security 2 mindset raises serious issues when it intersects with user control over their own devices and software, because it more or less inherently involves removing some of that control. For example, I cannot tell modern versions of Firefox to do my bidding over some TLS failures without rebuilding them from source with increasing amounts of hackery applied.)

tech/UnderstandingModernSecurity written at 23:03:58; Add Comment

2016-05-30

The browser security dilemma

So Pete Zaitcev ran into the failure mode of modern browsers being strict about security, which is that the browser locks you out of something that you need to access. The only thing I'm much surprised about is that it happened to Pete Zaitcev before it happened to me. On the one hand, this is really frustrating when it happens to you; on the other hand, the browsers are caught on the horns of a real security dilemma here.

To simplify, there are two sorts of browser users; let us call them sysadmins and ordinary people. Sysadmins both know what they're doing and deal with broken cryptography things on a not infrequent basis, such as device management websites that only support terribly outdated cryptography (say SSLv3 only), or have only weak certificates or keys (512 bits only, yes really), or their certificate has long since expired and is for the wrong name anyways. As a result, sysadmins both want ways to override TLS failures and can (in theory) be trusted to use them safely. By contrast, ordinary people both don't normally encounter broken cryptography and don't really know enough to handle it safely if they do.

In an ideal world, a browser would be able to tell which sort of person you were and give you an appropriate interface. In this less than ideal world, what browser vendors have discovered is that if you expose a 'sysadmin' interface in basically any way, ordinary people will eventually wind up using it for TLS failures that they definitely should not override. It doesn't matter how well you hide it; sooner or later someone will find it and write it up on the Internet and search engines will index it and people will search for it and navigate the ten steps necessary to enable it (and ignore your scary warnings in the process). If we have learned anything, we've learned that people are extremely motivated to get to their websites and are willing to jump through all sorts of hoops to do so. Even when this is a terrible idea.

Since ordinary people vastly outnumber sysadmins, browsers are increasingly opting to throw sysadmins under the bus (ie, completely not supporting our need to override these checks some of the time). At the moment, some major browsers are less strict than others, but I suspect that this will pass and sooner or later Chrome too will give me and Pete Zaitcev no option here. Maybe we'll still be able to rely on more obscure things (on Linux) like Konqueror, at least if they're functional enough to handle the device management websites and IPMIs and so on that I need to deal with.

(Failing that, there may come a day where I keep around an ancient copy of Firefox to handle such sites, in just the same way that I keep around an ancient copy of Java to deal with various Java based 'KVM over IP' IPMI things. Don't worry, my ancient Java isn't wired up as an applet and only works in a non-default browser setup in the first place.)

web/BrowserSecurityDilemma written at 22:58:28; Add Comment

'Command line text editor' is not the same as 'terminal-based text editor'

A while back, I saw a mention of what was called a new command line text editor. My ears perked up, and then I was disappointed:

Today's irritation: people who say 'command line text editor' when they mean 'terminal/cursor-based text editor'.

I understand why the confusion comes up, I really do; an in-terminal full screen editor like vi generally has to be started from the command line instead of eg from a GUI menu or icon. But for people like me, the two are not the same and another full screen, text based editor along the lines of vi (or nano or GNU Emacs without X) is not anywhere near as interesting as a new real command line text editor is (or would be).

So, what do people like me mean by 'command line text editor'? Well, generally some form of editor that you use from the command line but that doesn't take over your terminal screen and have you cursor around it and all that. The archetype of interactive command line text editors is ed, but there are other editors which have such a mode (sam has one, for example, although it's not used very much in practice).

Now, a lot of the nominal advantages of ed and similar things are no longer applicable today. Once upon a time they were good for things like low bandwidth connections where you wanted to make quick edits, or slow and heavily loaded machines where you didn't want to wait for even vi to start up and operate. These days this is not something that most people worry about, and full screen text editors undeniably make life easier on you. Paradoxically, this is a good part of why I would be interested in a new real command line editor. Anyone who creates one in this day and age probably has something they think it does really well to make up for not being a full screen editor, and I want to take a look at it to see this.

I also think that there are plausible advantages of a nice command line text editor. The two that I can think of are truly command line based editing (where you have commands or can easily build shell scripts to do canned editing operations, and then you invoke the command to do the edit) and quick text editing in a way that doesn't lose the context of what's already on your screen. I imagine the latter as something akin to current shell 'readline' command line editing, which basically uses only a line or two on the screen. I don't know if either of these could be made to work well, but I'd love to see someone try. It would certainly be different from what we usually get.

(I don't consider terminal emulator alternate screens to be a solution to the 'loss of context' issue, because you still can't see the context at the same time as your editing. You just get it back after you quit your editor again.)

unix/CommandLineTextEditors written at 00:16:19; Add Comment

2016-05-29

What does 'success' mean for a research operating system?

Sometimes people talk about how successful (nor not successful) an operating system has been, when that operating system was created as a research project instead of a product. One of the issues here is that there are several different things that people can mean by a research OS being a success. In particular, I think that there are at least four sorts of it:

  • The OS actually works and thus serves as a proof of concept for the underlying ideas that motivated this particular research OS variation. What 'works' means may vary somewhat, since research projects rarely reach production status; generally you get some demos running acceptably fast.

    Having your research OS actually work is about the baseline definition of success. It means that your ideas don't conflict with each other, can be made to work acceptably, and don't require big compromises to be implemented.

  • The OS works well enough and is attractive enough that people in your research group can and do build things on it and actively use it. If it's a general purpose OS, people voluntarily and productively use it for everyday activity; if it's a specialized real time or whatever OS, people voluntarily build their own projects on top of it and have them work.

    A research OS that has reached this sort of success is more than just a technology demonstration and proving ground. It can do real things.

  • At least some of your OS's ideas are attractive enough that they get implemented in other OSes or at least clearly influence the development of other OSes. This is especially so if your ideas propagate to production OSes in some form or other (often in a somewhat modified and less pure form, because that's just how things go).

    (As anyone who's familiar with academic research knows, a lot of research is basically not particularly influential. Being influential means you've achieved more success than usual.)

  • Some form of your research OS winds up being used by outside people to do real work; it becomes a 'success' in the sense of 'it is out in the real world doing things'. Sometimes this is your OS relatively straight, sometimes it's a heavily adapted version of your work, and I'm sure that there have been cases where companies took the ideas and redid the implementation.

Most research OSes reach the first level of success, or at least most that you ever hear about (the research community rarely publishes negative results, among other issues). Or at least they reach the appearance of it; there may be all sorts of warts under the surface in practice in terms of performance, reliability, and so on. On the other hand some research OSes are genuine attempts to achieve genuinely usable, reliable, and performant results in order to demonstrate that their ideas are not merely possible but are actively practical.

It's quite rare for a research OS to reach the fourth level of success of making it into the real world. There are not many 'real world' OSes in the first place and there are very large practical obstacles in the way. To put it one way, there is a lot of non-research work involved in making something a product (even a free one).

(In general purpose OSes, I think only two research OSes have made a truly successful jump into the real world from the 1970s onwards, although it's probably been tried with a few more. I don't know enough about the real time and embedded computing worlds to have an idea there.)

tech/SuccessForResearchOSes written at 01:15:23; Add Comment

2016-05-28

A problem with using old OmniOS versions: disconnection from the community

One of the less obvious problems with us probably never doing another OmniOS upgrade is that I'm clearly going to become more and more disconnected from the OmniOS community. This is only natural, since most or almost all of the community is using recent versions; as time goes on, those versions and the version we're running are only going to drift more and more apart.

(It's true that OmniOS r151014 is an OmniOS LTS release, supported through early 2018 per here. But in practice I expect that most OmniOS people will be running one of the more up to date stable releases instead, since they won't have our upgrade concerns.)

Being disconnected from the community makes me sad, because the OmniOS community is one of the great parts about OmniOS. There are several dimensions to this disconnection. First, the more disconnected I am from the community, the less I'll be able to give back to it, the less I can contribute answers or information or whatever. Giving back to the community is something that I would like to do for all sorts of reasons (including that I plain like being able to contribute).

Obviously, the more distant we are from what the community is running, the less the community can help us with advice and information and all of that if we run into issues or just have questions about how best to do something or what the community's experiences are. At best they may be able to tell us how things would look or would be done on a newer version of OmniOS. Of course, some things only change slowly, but I suspect that there is only going to be more and more of a gap here over time. I don't want to put too much weight on this; I'm very grateful for the help that the community has given us, but at the same time it's not help that I think we should count on and significantly factor into our plans.

(To put it one way, community help comes from the goodness of its heart and is best considered a pleasant surprise instead of a guarantee or an entitlement. I don't know if all of this makes sense to anyone but me, though.)

Finally, I'll just plain be paying less attention to the community and drifting away from it. It's inevitable; more and more, community discussions will be about things that aren't relevant to our version and that I can't contribute to. If people have problems or questions, I'll only have outdated information or more and more uninformed opinions. That's a recipe for disengagement, even from a nice community.

Having written all of this, I think that what I should do is build one experimental OmniOS server to keep up to date. It doesn't have to use our fileserver hardware; for a lot of things, any old server running OmniOS will serve to keep me at least somewhat current. As a bonus it will provide me with a platform to test things on the current OmniOS version (whatever that is at the time).

(We have enough spare SSDs for our current fileservers so that I could take the test fileserver and build a system SSD set for the current OmniOS, just so I have it around. We did this sort of back and forth OmniOS version testing during our transition to r151014, so we actually have a template for it.)

solaris/OmniOSCommunityDisconnect written at 00:35:28; Add Comment

2016-05-27

Your overall anti-spam system should have manual emergency blocks

We mostly rely on a commercial anti-spam system for our incoming spam filtering (as described here), and many other people rely on a variety of open source options for their spam filtering. This generally works very well, with us (and you) getting to offload the work of maintaining a high quality anti-spam system to other people (and it's certainly a lot of work). But not always (and not just because it malfunctions). The realities of life are that sooner or later you will be hit by a spam run that your anti-spam system doesn't recognize, either because the spam run is really new or because it's pretty specific to you.

Much of the time, you can shrug your shoulders and let this go. No anti-spam system is perfect and one of the tradeoffs you make when relying on a third-party system is that it's broadly out of your hands (sometimes this is an advantage). But some of the time this isn't going to be good enough; either the volume or the threat to your users will be so high that you can't just sit on your hands.

(Modern ransomware is making this clear by creating a potentially very high cost of allowing some things through.)

When this day comes to pass, you'll want to have the ability to step in and block the traffic even though your automated anti-spam system is happy with it. This can take many forms, depending on how you want to handle it; you could figure out how to write custom rules for your anti-spam system (so you can outright block certain sorts of files or certain URLs or whatever), or you can build blocking features into your mailer configuration itself, or any number of other options.

Having been through having to do this on the fly during an emergency, my strong suggestion is that you build the infrastructure for these manual blocks now, before you need them. It's some additional up front work and if you're lucky you may never need it, but doing it now when you have time to plan and test and figure out the best way to do things beats having to do it on the fly, under pressure.

Sidebar: What I think you should have manual blocks for

On the one hand attacker ingenuity is very deep, but on the other hand certain patterns repeat over and over again. So my view is that you can probably cover most ground with the ability to put in place manual blocks against sending IPs, sending domains, file extensions (including inside file containers like ZIP files), and whole and partial URLs (for phishing campaigns). You might also want a general message header and body regular expression matching system, but that's starting to feel like scope creep to me.

(Of course real scope creep would be to start by creating a general, generic framework for writing relatively arbitrary manual blocks on message attributes.)

spam/PlanForManualSpamBlocks written at 01:43:55; Add Comment

