Wandering Thoughts


When the SSH protocol does and doesn't do user authentication

A while back, I was reading this lobste.rs posting and ran into what struck me as an interesting question from evaryont:

I haven’t taken a deep look at this, but doesn’t OpenSSH need frequent access to the [user] secret key material? [...]

My gut reaction was that it didn't, but gut reactions are dangerous in cryptographic protocols so I decided to find out for sure. This turned out to be more involved than I expected, partly because the various RFCs describing the SSH protocol use somewhat confusing terminology.

SSH is composed of three protocols, which are nominally stacked on top of each other: the SSH transport protocol, the SSH user authentication protocol, and the SSH connection protocol. It is the connection protocol that actually gets things like interactive logins and port forwarding done; everything else is just there to support it. However, these protocols are not nested in a direct sense; instead they are sort of sequenced, one after the other, and their actual operation all happens at the same level.

What flows over an SSH connection is a sequence of messages (in a binary packet format); each message has a type (a byte) and different protocols use different ranges of message types for their various messages. When the SSH connection starts, these packets are unencrypted, so the first job is to use the SSH transport protocol to work out session keys and switch on encryption (and in the process the client obtains the server's host key). Once the connection is encrypted, the next thing the client must do is request a service, and in basically all situations the client is going to request (and the server is going to require) that this be the user authentication service.
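As a concrete illustration of how all three protocols share a single message stream, here's a sketch (not a real SSH implementation) of the way RFC 4251 partitions the one-byte message numbers among them:

```python
# A sketch of RFC 4251 section 7's message number ranges. Because each
# protocol owns a distinct range, messages from all three protocols can
# travel interleaved in the same encrypted packet stream.
def ssh_message_protocol(msg_type: int) -> str:
    """Map an SSH message number (a byte) to the (sub)protocol that owns it."""
    if 1 <= msg_type <= 49:
        return "transport"       # generic transport, negotiation, key exchange
    if 50 <= msg_type <= 79:
        return "userauth"        # user authentication protocol
    if 80 <= msg_type <= 127:
        return "connection"      # connection protocol (channels, requests)
    return "reserved/local"      # 128-255: reserved and local extensions

print(ssh_message_protocol(20))  # SSH_MSG_KEXINIT -> transport
print(ssh_message_protocol(50))  # SSH_MSG_USERAUTH_REQUEST -> userauth
print(ssh_message_protocol(90))  # SSH_MSG_CHANNEL_OPEN -> connection
```

This is also why transport-level re-keying can happen in the middle of, say, connection protocol traffic: the receiver dispatches on the message number, not on some nesting of protocols.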

The user authentication service (and protocol) has its own set of messages that are exchanged over the now-encrypted connection between client and server. User authentication is obviously where you initially need the user's secret key material (once challenged by the server). As part of requesting user authentication the client specifies a service name, which is the service to 'start' for the user after user authentication is complete. What starting the service really means is that the server and client will stop sending user authentication messages to each other and start sending connection protocol messages back and forth.

The SSH transport protocol has explicit provisions for re-keying the connection; since all of the SSH (sub)protocols are just sending messages in the overall message stream, this re-keying can happen at any time during another protocol's message flow. Server and client implementations will probably arrange to make it transparent to higher-level code that handles things like the connection protocol. By contrast, the SSH user authentication protocol has no provisions for re-authenticating you; in fact the protocol explicitly states that after user authentication is successful, all further authentication related messages received should be silently ignored. Once user authentication is complete, control is transferred to the SSH connection protocol and that's it.

So that brings us around to our answer: SSH only needs user secret key material once, at the start of your session. Once authenticated you can never be challenged to re-authenticate, and re-keying the overall encryption is a separate thing entirely (and doesn't use your secret keys, although the initial session keys can figure into the user authentication process).

Sidebar: The terminology confusion

I say that the SSH RFCs use confusing terminology because they often say things like that the SSH connection protocol 'run[s] on top of the SSH transport layer and user authentication protocols'. This is sort of true in a logical sense, but it is not true in a wire sense. SSH runs on top of TCP which runs on top of IP in a literal and wire sense, in that SSH data is inside TCP data which is inside IP data. But the various SSH protocols are all messages at the same level in the TCP stream and are not 'on top of' each other in that way. This is especially true of user authentication and the connection protocol, because what really happens is that first user authentication is done and then we shift over to the connection protocol.

tech/SSHWhenUserAuthentication written at 23:37:31


I'm cautiously optimistic about the new OmniOS Community Edition

You may recall that back in April, OmniTI suspended active development of OmniOS, leaving its future in some doubt and causing me to wonder what we'd do about our current generation of fileservers. There was a certain amount of back and forth on the OmniOS mailing list, but in general nothing concrete happened about, say, updates to the current OmniOS release, and people started to get nervous. Then just over a week ago, the OmniOS Community Edition was announced, complete with OmniOSCE.org. Since then, they've already released one weekly update (r151022i) with various fixes.

All of this leaves me cautiously optimistic for our moderate term needs, where we basically need a replacement for OmniOS r151014 (the previous LTS release) that gets security updates. I'm optimistic for the obvious reason, which is that things are really happening here; I'm cautious because maintaining a distribution of anything is a bunch of work over time and it's easy to burn out people doing it. I'm hopeful that the initial people behind OmniOS CE will be able to get more people to help and spread the load out, making it more viable over the months to come.

(I won't be one of the people helping, for previously discussed reasons.)

We're probably not in a rush to try to update from r151014 to the OmniOS CE version of r151022. Building out a new version of OmniOS and testing it takes a bunch of effort, the process of deployment is disruptive, and there's probably no point in doing too much of that work until the moderate term situation with OmniOS CE is clearer. For a start, it's not clear to me if OmniOS CE r151022 will receive long-term security updates or if users will be expected to move to r151024 when it's released (and I suppose I should ask).

For our longer term needs, ie the next generation of fileservers, a lot of things are up in the air. If we move to smaller fileservers we will probably move to directly attached disks, which means we now care about SAS driver support, and in general there's been the big question of good Illumos support for 10G-T Ethernet hardware (which I believe is still not there today for Intel 10G-T cards, or at least I haven't really seen any big update to the ixgbe driver). What will happen with OmniOS CE over the longer term is merely one of the issues in play; it may turn out to be important, or it may turn out to be irrelevant because our decision is forced by other things.

solaris/OmniOSCECautiousOptimism written at 23:47:29

HTTPS is a legacy protocol

Ted Unangst recently wrote moving to https (via), in which he gave us the following (in his usual inimitable style):

On the security front, however, there may be a few things to mention. Curiously, some browsers react to the addition of encryption to a website by issuing a security warning. Yesterday, reading this page in plaintext was perfectly fine, but today, add some AES to the mix, and it’s a terrible menace, unfit for even casual viewing. But fill out the right forms and ask the right people and we can fix that, right?

(You may have trouble reading Ted's post, especially on mobile devices.)

One way to look at this situation is to observe that HTTPS today is a legacy protocol, much like every other widely deployed Internet protocol. Every widely deployed protocol, HTTPS included, is at least somewhat old (because it takes time to get widely deployed), and that means that they're all encrusted with at least some old decisions that we're now stuck with in the name of backwards compatibility. What we end up with is almost never what people would design if they were to develop these protocols from scratch today.

A modern version of HTTP(S) would probably be encrypted from the start regardless of whether the web site had a certificate, as encryption has become much more important today. This isn't just because we're in a post-Snowden world; it's also because today's Internet has become a place full of creepy ad-driven surveillance and privacy invasion, where ISPs are one of your enemies. When semi-passive eavesdroppers are demonstrably more or less everywhere, pervasive encryption is a priority for a whole bunch of people for all sorts of reasons, both for privacy and for integrity of what gets transferred.

But here on the legacy web, our only path to encryption is with HTTPS, and HTTPS comes with encryption tightly coupled to web site authentication. In theory you could split them apart by fiat with browser and web server cooperation (eg); in practice there's a chicken and egg problem with how old and new browsers interact with various sorts of sites, and how users and various sorts of middleware software expect HTTP and HTTPS links and traffic to behave. At this point there may not be any way out of the tangle of history and people's expectations. That HTTPS is a legacy protocol means that we're kind of stuck with some things that are less than ideal, including this one.

(I don't know what the initial HTTPS and SSL threat model was, but I suspect that the Netscape people involved didn't foresee anything close to the modern web environment we've wound up with.)

So in short, we're stuck with a situation where adding some AES to your website does indeed involve making it into a horrible menace unless you ask the right people. This isn't because it's particularly sensible; it's because that's how things evolved, for better or worse. We can mock the silliness of the result if we want to (although every legacy protocol has issues like this), but the only real way to do better is to destroy backwards compatibility. Some people are broadly fine with this sort of move, but a lot of people aren't, and it's very hard to pull off successfully in a diverse ecology where no single party has strong control.

(It's not useless to point out the absurdity yet again, not exactly, but it is shooting well-known fish in a barrel. This is not a new issue and, as mentioned, it's not likely that it's ever going to be fixed. But Ted Unangst does write entertaining rants.)

web/HTTPSLegacyProtocol written at 00:30:53


I've become resigned to Firefox slowly leaking memory

Over the years I've written a number of things here about how my Firefox setup seems to be fairly fragile as far as memory usage goes, in that any number of addons or other changes seem to cause it to leak memory, often rapidly. Sometimes there have been apparently innocuous changes in addons I use, like NoScript, that cause a new version of the addon to make my Firefox sessions explode.

(I've actually bisected some of those changes down to relatively innocent changes and found at least one pattern in addon and even core Firefox JavaScript that seems to cause memory leaks, but I'm not sure I believe my results.)

For a long time I held out hope that if I only found the right combination of addons and options and so on, I could get my Firefox to have stable memory usage over a period of a week or more (with a fixed set of long-term browser windows open, none of which run JavaScript). But by now I've slowly and reluctantly come around to the idea that that's not going to happen. Instead, even with my best efforts I can expect Firefox's Resident Set Size to slowly grow over a few days from a starting point of around 600 to 700 MBytes, eventually crossing over the 1 GB boundary, and then I'll wind up wanting to restart it once I notice.

The good news is that Firefox performance doesn't seem to degrade drastically at this sort of memory usage. I haven't kept close track of how it feels, but it's certainly not the glaringly obvious responsiveness issues that used to happen to me. Instead I wind up restarting Firefox because it's just using too much of my RAM and I want it to use less.

(It's possible that Firefox's performance would degrade noticeably if I let it keep growing its memory usage, which of course is one reason not to.)

Restarting Firefox is not too much of a pain (I built a tool to help a while back), but it makes me vaguely unhappy despite my resignation. Software should be better than this, but apparently it isn't and I just get to live with it. Restarting Firefox feels like giving in, but not restarting Firefox is clearly just tilting at windmills.

Sidebar: The JavaScript pattern that seemed to leak

The short version is 'calling console.log() with an exception object'. The general pattern seemed to be:

try {
  [... whatever ...]
} catch (e) {
  console.log(e);
}
My theory is that this causes the browser-wide Firefox developer console to capture the exception object, which in turn presumably captures a bunch of JavaScript state, variables, and code, and means that none of them can be garbage collected the way they normally would be. Trigger such exceptions very often and there you go.

Replacing the console.log(e) with 'console.log("some-message")' seemed to usually make the prominent leaks go away. The loss of information was irrelevant; it's not as if I'm going to debug addons (or core Firefox code written in JavaScript). I never even look at the browser console.

It's possible that opening the browser console every so often and explicitly clearing it would make my memory usage drop. I may try that the next time I have a bloated-up Firefox, just to see. It's also possible that there's a magic option that causes Firefox to just throw away everything sent to console.log(), which would be fine by me.

web/FirefoxResignedToLeaks written at 00:20:23


Python's complicating concept of a callable

Through Planet Python I recently wound up reading this article on Python decorators. It has the commendable and generally achieved goal of a clear, easily followed explanation of decorators, starting out by talking about how functions can return other functions and then defining decorators as:

A decorator is a function (such as foobar in the above example) that takes a function object as an argument, and returns a function object as a return value.

Some experienced Python people are now jumping up and down to say that this definition is not complete and thus not correct. To be complete and correct, you have to change 'function' to 'callable'. These people are correct, but at the same time this correctness creates a hassle in this sort of explanation.

In Python, it's pretty easy to understand what a function is. We have an intuitive view that pretty much matches reality; if you write 'def foobar(...):' (in module scope), you have a function. It's not so easy to inventory and understand all of the things in Python that can be callables. Can you do it? I'm not sure that I can:

  • functions, including lambdas
  • classes
  • objects if they're instances of a class with a __call__ special method
  • bound methods of an object (eg anobj.afunc)
  • methods of classes in general, especially class methods and static methods

(I don't think you can make modules callable, and in any case it would be perverse.)
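To make that inventory concrete, here's a small sketch showing several of these flavors of callable side by side (all the names here are made up for illustration):

```python
# Several kinds of Python callables: a function, a lambda, a class,
# an instance with __call__, a bound method, a class method, and a
# static method. callable() reports True for all of them.
def func(x):
    return x + 1

lam = lambda x: x + 1

class Adder:
    def __init__(self, n=0):
        self.n = n
    def __call__(self, x):       # instances become callable via __call__
        return x + self.n
    @classmethod
    def cls_method(cls):
        return cls
    @staticmethod
    def static_method():
        return 42

adder = Adder(1)

for thing in (func, lam, Adder, adder, adder.__call__,
              Adder.cls_method, Adder.static_method):
    assert callable(thing)

print(func(1), lam(1), adder(1))  # 2 2 2
print(Adder(2)(1))                # calling the class, then the instance -> 3
```

Any of these could be handed to a decorator-style function that expects 'a callable', which is exactly why the precise word complicates the simple explanation.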

Being technically correct in this sort of explanation exacts an annoying price. Rather than say 'function', you must say 'callable' and then probably try to give some sort of brief and useful explanation of just what a callable is beyond a function (which should cover at least classes and callable objects, which are the two common cases for things like decorators). This is inevitably going to complicate your writing and put at least a speed bump in the way for readers.

The generality of Python accepting callables for many things is important, but it does drag a relatively complicated concept into any number of otherwise relatively straightforward explanations of things. I don't have any particular way to square the circle here; even web specific hacks like writing function as an <ABBR> element with its own pseudo-footnote seem kind of ugly.

(You could try to separate the idea of 'many things can be callable' from the specifics of just what they are beyond functions. I'm not sure that would work well, but I'm probably too close to the issue; it's quite possible that people who don't already know about callables would be happy with that.)

python/ComplicatingCallableConcept written at 00:58:52


Link: NASA DASHlink - Real System Failures

The Observed Failures slide deck from NASA DASHlink (via many places) is an interesting and even alarming collection of observed failures in hardware and software, mostly avionics related. I find it both entertaining and a useful reminder that all of this digital stuff is really analog underneath, and that leads to interesting failure modes. Lest you think that all of these were hardware faults and we software people can be smug, well, not really. There are more; read the whole thing, as they say.

links/NASAObservedFailures written at 22:15:39

Why I think Emacs readline bindings work better than Vi ones

I recently saw a discussion about whether people used the Emacs bindings for readline editing or the Vi bindings (primarily in shells, although there are plenty of other places that use readline). The discussion made me realize that I actually had some opinions here, and that my view was that Emacs bindings are better.

(My personal background is that vim has been my primary editor for years now, but I use Emacs bindings in readline and can't imagine switching.)

The Emacs bindings for readline aren't better because Emacs bindings are better in general (I have no opinion on that for various reasons). Instead, they're better here because the nature of Emacs bindings makes going back and forth between entering text and editing text easy, especially without errors. This is because Emacs bindings don't reuse normal characters. Vi gains a certain amount of its power and speed from reusing normal letters for editing commands (especially lower case letters, which are the easiest to type), while Emacs exiles all editing commands to special key sequences. Vi's choice is fine for large scale text editing, where you generally spend substantial blocks of time first entering text and then editing it, but it is not as great if you're constantly going back and forth over short periods of time, which is much more typical of how I do things in a single command line. The vi approach also opens you up to destructive errors if you forget that you're in editing mode. With Emacs bindings there is no such back and forth switching or confusion (well, mostly no such, as there are still times when plain letters are special or control and meta characters aren't).

Another way of putting this is that Emacs bindings at least feel like they're optimized for quickly making small edits, while vi ones feel more optimized for longer, larger-scale edits. Since typo-fixes and the like are most of what I do with command line editing, it falls into the 'small edits' camp where Emacs bindings shine.

Sidebar: Let's admit to the practical side too

Readline defaults to Emacs style bindings. If you only use a few readline programs on a few systems, it's probably no big deal to change the bindings (hopefully they all respect $HOME/.inputrc). But I'm a sysadmin, and I routinely use many systems (some of them not configured at all) as many users (me, root, and others). Trying to change all of those readline configurations is simply not feasible, plus some programs use alternate readline libraries that may not have switchable bindings.
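For what it's worth, where a program does respect $HOME/.inputrc, the switch itself is a one-line setting:

```
# ~/.inputrc -- readline defaults to Emacs bindings; programs that
# honor this file can be switched to vi bindings with one line:
set editing-mode vi
```

The practical problem isn't the setting; it's getting that file (or its equivalent) onto every system and into every account you work in.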

In this overall environment, sticking with the default Emacs bindings is far easier and thus I may be justifying to myself why it 'makes sense' to do so. I do think that Emacs bindings make quick edits easier, but to really be sure of that I'd have to switch a frequently used part of my environment to vi bindings for long enough to give it a fair shake, and I haven't ever tried that.

As a related issue, my impression is that using Emacs bindings has become the default in basically anything that offers command line editing, even if it's not using readline at all and has reimplemented it from scratch. This provides its own momentum for sticking with Emacs bindings, since you're going to run into them sooner or later no matter how you set your shell et al.

unix/EmacsForReadline written at 00:24:26


Why upstreams can't document their program's behavior for us

In reaction to SELinux's problem of keeping up with app development, one obvious suggestion is to have upstreams do this work instead. A variant of this idea is what DrScriptt suggested in a comment on that entry:

I would be interested in up stream app developers publishing things about their application, including what it should be doing. [...]

Setting aside the practical issue that upstream developers are not interested in spending their time on this, I happen to believe that there are serious and probably unsolvable problems with this idea even in theory.

The first issue is that the behavior of a sophisticated modern application (which are what we most care about confining well) is actually a composite of at least four different sources of behavior and behavior changes: the program itself, the libraries it uses, how a particular distribution configures and builds both of these, and how individual systems are configured. Oh, and as covered, this is really not 'the program' and 'the libraries', but 'the version of the program and the libraries used by a particular distribution' (or when the app was built locally).

In most Linux systems, even simple looking operations can go very deep here. Does your program call gethostbyname()? If so, what files it will access and what network resources it attempts to contact cannot be predicted in advance without knowing how nsswitch.conf (and other things) are configured on the specific system it's running on. The only useful thing that the upstream developers can possibly tell you is 'this calls gethostbyname(), you figure out what that means'. The same is true for calls like getpwuid() or getpwnam(), as well as any number of other things.
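As an illustration of how deep this goes, here's a sketch (with a made-up sample configuration) of the kind of per-system information that decides what a gethostbyname() call will actually consult:

```python
# A sketch of why upstream developers can't predict gethostbyname()'s
# behavior: the lookup sources come from the local /etc/nsswitch.conf,
# which only the individual system knows. We just parse a sample here.
def hosts_sources(nsswitch_text: str) -> list[str]:
    """Return the configured lookup sources for the 'hosts' database."""
    for line in nsswitch_text.splitlines():
        line = line.split('#', 1)[0].strip()   # drop comments
        if line.startswith('hosts:'):
            return line.split(':', 1)[1].split()
    return []

sample = """
# /etc/nsswitch.conf (an example; real files vary from system to system)
passwd: files sss
hosts:  files mdns4_minimal [NOTFOUND=return] dns
"""
print(hosts_sources(sample))
# -> ['files', 'mdns4_minimal', '[NOTFOUND=return]', 'dns']
```

Depending on that one line, the same library call may read local files, talk to a multicast DNS responder, or contact remote DNS servers; no upstream documentation can pin that down in advance.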

The other significant issue is that when prepared by an upstream, this information is essentially a form of code comments. Without a way for upstreams to test and verify the information, it's more or less guaranteed to be incomplete and sometimes outright wrong (just as comments are incomplete and periodically wrong). So we're asking upstreams to create security sensitive documentation that can be predicted in advance to be partly incorrect, and we'd also like it to be detailed and comprehensive (since we want to use this information as the basis for a fine-grained policy on things like what files the app will be allowed access to).

(I'm completely ignoring the very large question of what format this information would be in. I don't think there's any current machine-readable format that would do, which means either trying to invent a new one or having people eventually translate ad-hoc human readable documentation into SELinux policies and other things. Don't expect the documentation to be written with specification-level rigor, either; if nothing else, producing that grade of documentation is fairly expensive and time-consuming.)

linux/AppBehaviorDocsProblem written at 01:18:05


Some people feel that all permanent SMTP failures are actually temporary

It all started with a routine delivery attempt to my sinkhole SMTP server that I use as a spamtrap:

remote at 2017-07-03 13:30:28
220 This server does not deliver email.
EHLO mail.travelshopnews7.com
MAIL FROM:<newsletter@travelshopnews7.com>
250 Okay, I'll believe you for now
RCPT TO:<redacted@redacted>
250 Okay, I'll believe you for now
354 Send away
. <end of data>
554 Rejected with ID 433b5458d9d3e8a93020aca44406d2ec1d8ba82a
221 Goodbye

That ID is the hash of the whole message and its important envelope information (including the sending IP). So far, so normal, and these people stood out in a good way by actually QUITing instead of just dropping the connection. But then:

remote at 2017-07-03 13:37:10
220 This server does not deliver email.
EHLO mail.travelshopnews7.com
554 Rejected with ID 433b5458d9d3e8a93020aca44406d2ec1d8ba82a

They re-delivered the exact same message again. And again. And again. In less than 24 hours (up to July 4th at 9:28 am) they did 21 deliveries, despite getting a permanent refusal after each DATA. At that point I got tired of logging repeated deliveries for the same message and put them in a category of earlier blocks:

remote at 2017-07-04 10:38:10
220 This server does not deliver email.
EHLO mail.travelshopnews7.com
MAIL FROM:<newsletter@travelshopnews7.com>
550 Bad address
RCPT TO:<redacted@redacted>
503 Out of sequence command

You can guess what happened next:

remote at 2017-07-04 11:48:36
220 This server does not deliver email.
EHLO mail.travelshopnews7.com
MAIL FROM:<newsletter@travelshopnews7.com>
550 Bad address
RCPT TO:<redacted@redacted>
503 Out of sequence command

They didn't stop there, of course.

remote at 2017-07-13 15:37:31
220 This server does not deliver email.
EHLO mail.travelshopnews7.com
MAIL FROM:<newsletter@travelshopnews7.com>
550 Bad address
RCPT TO:<redacted@redacted>
503 Out of sequence command

Out of curiosity I switched things over so that I'd capture their message again and it turns out that they're still sending, although they've now switched over to trying to deliver a different message. Apparently they do have some sort of delivery expiry, presumably based purely on the message's age and totally ignoring SMTP status codes.

(As before they're still re-delivering their new message despite the post-DATA permanent rejection; so far, it's been two more deliveries of the exact same message.)

These people are not completely ignoring SMTP status codes, because they know that they didn't deliver the message so they'll try again. Well, I suppose they could be slamming everyone with dozens or hundreds of copies of every message even when the first copy was successfully delivered, but I don't believe they'd be that bad. This may be an optimistic assumption.
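For contrast, here's a sketch of what a well-behaved sender would do with these codes; per RFC 5321, the first digit of an SMTP reply encodes whether a failure is transient (4xx) or permanent (5xx):

```python
# A sketch of the retry decision the mailer above is failing to make:
# RFC 5321 reply codes put permanence in the first digit, so only 4xx
# (transient) failures are worth retrying. 5xx means give up.
def should_retry(reply_code: int) -> bool:
    """Decide whether a delivery failure is worth retrying later."""
    if 200 <= reply_code <= 299:
        return False    # success; retrying would send duplicates
    if 400 <= reply_code <= 499:
        return True     # transient failure: queue and try again later
    return False        # 5xx (and anything else): permanent, stop trying

print(should_retry(451))  # a temporary local problem -> True
print(should_retry(554))  # the 'Rejected with ID ...' above -> False
print(should_retry(550))  # the 'Bad address' above -> False
```

Retrying only on 4xx is precisely the behavior that the repeated deliveries above show this software doesn't have.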

(Based on what shows up on www.<domain>, they appear to be running something called 'nuevoMailer v.6.5'. The program's website claims that it's 'a self-hosted email marketing software for managing mailing lists, sending email campaigns and following up with autoresponders and triggers'. I expect that their view of 'managing mailing lists' does not include 'respecting SMTP permanent failures' and is more about, say, conveniently importing massive lists of email addresses through a nice web GUI.)

spam/IgnoringSMTPFailures written at 20:48:07

Link: ZFS Storage Overhead

ZFS Storage Overhead (via) is not quite about what you might think. It's not about, say, the overhead added by ZFS's RAIDZ storage (where there are surprises); instead it's about some interesting low level issues of where space disappears to even in very simple pools. The bit about metaslabs was especially interesting to me. It goes well with Matthew Ahrens' classic ZFS RAIDZ stripe width, or: How I Learned to Stop Worrying and Love RAIDZ, which is endlessly linked and cited for very good reasons.

links/ZFSStorageOverhead written at 12:25:07

SELinux's problem of keeping up with general Linux development

Fedora 26 was released on Tuesday, so today I did my usual thing of doing a stock install of it in a virtual machine as a test, to see how it looks and so on. Predictable things ensued with SELinux. In the resulting Twitter conversation, I came to a realization:

It seems possible that the rate of change in what programs legitimately do is higher than the rate at which SELinux policies can be fixed.

Most people who talk about SELinux policy problems, myself included, usually implicitly treat developing SELinux policies as a static thing. If only one could understand the program's behavior well enough one could write a fully correct policy and be done with it, but the problem is that fully understanding program behavior is very hard.

However, this is not actually true. In reality, programs not infrequently change their (legitimate) behavior over time as new versions are developed and released. There are all sorts of ways this can happen; there are new features in the program, changes to how the program itself works, changes in how libraries the program uses work, changes in what libraries the program uses, and so on. When these changes in behavior happen (at whatever level and for whatever reason), the SELinux policies need to be changed to match them in order for things to still work.

In effect, the people developing SELinux policies are in a race with the people developing the actual programs, libraries, and so on. In order to end up with a working set of policies, the SELinux people have to be able to fix them faster than upstream development can break them. It would certainly be nice if the SELinux people could win this race, but I don't think it's at all guaranteed. Certainly with enough churn in enough projects, you could wind up in a situation where the SELinux people simply can't work fast enough to produce a full set of working policies.

As a corollary, this predicts that SELinux should work better in a distribution environment that rigidly limits change in program and library versions than in one that allows relatively wide freedom for changes. If you lock down your release and refuse to change anything unless you absolutely have to, you have a much higher chance of the SELinux policy developers catching up to the (lack of) changes in the rest of the system.

This is a more potentially pessimistic view of SELinux's inherent complexity than I had before. Of course I don't know if SELinux policy development currently is in this kind of race in any important way. It's certainly possible that SELinux policy developers aren't having any problems keeping up with upstream changes, and what's really causing them these problems is the inherent complexity of the job even for a static target.

One answer to this issue is to try to change who does the work. However, for various reasons beyond the scope of this entry, I don't think that having upstreams maintain SELinux policies for their projects is going to work very well even in theory. In practice it's clearly not going to happen (cf) for good reasons. As is traditional in the open source world, the people who care about some issue get to be the ones to do the work to make it happen, and right now SELinux is far from a universal issue.

(Since I'm totally indifferent about whether SELinux works, I'm not going to be filing any bugs here. Interested parties who care can peruse some logs I extracted.)

linux/SELinuxCatchupProblem written at 01:19:14
