2016-05-31
Understanding the modern view of security
David Magda asked a good and interesting question in a comment on my entry on the browser security dilemma:
I'm not sure why they can't have an about:config item called something like "DoNotBlameFirefox" (akin to Sendmail's idea).
There is a direct answer to this question (and I sort of wrote it in my comment), but the larger answer is that there has been a broad change in the consensus view of (computer) security. Browsers are a microcosm of this shift and also make a great illustration of it.
In the beginning, the view of security was that your job was to create a system that could be operated securely (often but not always it was secure by default) and give it to people. Where the system ran into problems or operating issues, it would tell people and give them options for what to do next. At first the diagnostics when something went wrong were terrible (which is a serious problem), but after a while people worked on making them better, clearer, and more understandable by normal people. If people chose to override the security precautions or operate the systems in insecure ways, well, that was their decision and their problem; you trusted people to know what they were doing and your hands were clean if they didn't. Let us call this the 'Security 1' model.
(PGP is another poster child for the Security 1 model. It's certainly possible to use PGP securely, but it's also famously easy to screw it up in dozens of ways such that you're either insecure or you leak way more information than you intend to.)
The Security 1 model is completely consistent and logical and sound, and it can create solid security. However, like the 'Safety-I' model of safety, it has a serious problem: it not infrequently doesn't actually yield security in real world operation when it is challenged with real security failures. Even when provided with systems that are secure by default, people will often opt to operate them in insecure ways for reasons that make perfect sense to the people on the spot but which are catastrophic for security. Browser TLS security warnings have been ground zero for illustrating this; browser developers have experimentally determined that there is basically no level of strong warnings that will dissuade enough people from going forward to connect to what they think is eg Facebook. There are all sorts of reasons for this, including the vast prevalence of false positives in security alerts and the barrage of warning messages that we've trained people to click through because they're just in the way in the end.
The security failures of the resulting total system of 'human plus computer system' are in one sense not the fault of the designers of the computer system, any more than it is your fault if you provide people with a saw and careful instructions to use it only on wood and they occasionally saw their own limbs off despite your instructions, warnings, stubbornly attached limb guards, and so on. At the same time, the security failures are an entirely predictable failure of the total system. This has resulted in a major shift in thinking about security, which I will call 'Security 2'.
In Security 2 thinking, it is not good enough to have a secure system if people will wind up operating it insecurely. What matters, and what designers must focus on, is making the total system operate securely, even in adverse conditions; another way to put this is that the security goal has become protecting people in the real world. As a result, a Security 2 focused designer shouldn't allow security overrides to exist if they know those overrides will wind up being (mis)used in a way that defeats the overall security of the system. It doesn't matter if the misuse is user error on the part of the people using the security system; the result is still an insecure total system, with people getting owned and compromised, and the designer has failed.
Security 2 systems are designed not necessarily so much to be easy to use as to be hard or impossible to screw up in such a way that you get owned (although often this means making them easy to use too). For example, always-on, automatic end to end encryption of messages in an instant messaging system is a Security 2 feature; end to end encryption that is optional and must be selected or turned on by hand is a Security 1 feature.
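To make the contrast concrete at the code level, here is an illustrative sketch (mine, not from any particular system's design) using Go's standard TLS package, which genuinely does expose a Security 1 style override knob:

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// Security 1 style: the API offers an explicit override and trusts the
// caller to use it wisely. Go's crypto/tls really does have this knob;
// setting it skips certificate verification entirely.
func dialSecurity1(addr string, override bool) (*tls.Conn, error) {
	return tls.Dial("tcp", addr, &tls.Config{
		InsecureSkipVerify: override, // 'DoNotBlameMe' in a single field
	})
}

// Security 2 style: there is simply no override to misuse. A failed
// certificate verification is a failed connection, full stop.
func dialSecurity2(addr string) (*tls.Conn, error) {
	return tls.Dial("tcp", addr, &tls.Config{})
}

func main() {
	// expired.badssl.com serves a deliberately expired certificate.
	if _, err := dialSecurity2("expired.badssl.com:443"); err != nil {
		fmt.Println("refused to proceed:", err)
	}
}
```

The Security 2 answer to dialSecurity1 is not better documentation for the override; it's that the parameter doesn't exist in the first place.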
Part of the browser shift to a Security 2 mindset has been to increasingly disallow any and all ways to override core security precautions, including being willing to listen to websites over users when it comes to TLS failures. This is pretty much what I'd expect from a modern Security 2 design, given what we know about actual user behavior.
(The Security 2 mindset raises serious issues when it intersects with user control over their own devices and software, because it more or less inherently involves removing some of that control. For example, I cannot tell modern versions of Firefox to do my bidding over some TLS failures without rebuilding them from source with increasing amounts of hackery applied.)
2016-05-29
What does 'success' mean for a research operating system?
Sometimes people talk about how successful (or not successful) an operating system has been, when that operating system was created as a research project instead of a product. One of the issues here is that there are several different things that people can mean by a research OS being a success. In particular, I think that there are at least four levels of success:
- The OS actually works and thus serves as a proof of concept for the
underlying ideas that motivated this particular research OS
variation. What 'works' means may vary somewhat, since research
projects rarely reach production status; generally you get some
demos running acceptably fast.
Having your research OS actually work is about the baseline definition of success. It means that your ideas don't conflict with each other, can be made to work acceptably, and don't require big compromises to be implemented.
- The OS works well enough and is attractive enough that people
in your research group can and do build things on it and actively
use it. If it's a general purpose OS, people voluntarily and
productively use it for everyday activity; if it's a specialized
real time or whatever OS, people voluntarily build their own
projects on top of it and have them work.
A research OS that has reached this sort of success is more than just a technology demonstration and proving ground. It can do real things.
- At least some of your OS's ideas are attractive enough that they
get implemented in other OSes or at least clearly influence the
development of other OSes. This is especially so if your ideas
propagate to production OSes in some form or other (often in a
somewhat modified and less pure form, because that's just how
things go).
(As anyone who's familiar with academic research knows, a lot of research is basically not particularly influential. Being influential means you've achieved more success than usual.)
- Some form of your research OS winds up being used by outside people to do real work; it becomes a 'success' in the sense of 'it is out in the real world doing things'. Sometimes this is your OS relatively straight, sometimes it's a heavily adapted version of your work, and I'm sure that there have been cases where companies took the ideas and redid the implementation.
Most research OSes reach the first level of success, or at least most that you ever hear about do (the research community rarely publishes negative results, among other issues). Or at least they reach the appearance of it; there may be all sorts of warts under the surface in terms of performance, reliability, and so on. On the other hand, some research OSes are serious attempts to achieve genuinely usable, reliable, and performant results in order to demonstrate that their ideas are not merely possible but actively practical.
It's quite rare for a research OS to reach the fourth level of success of making it into the real world. There are not many 'real world' OSes in the first place and there are very large practical obstacles in the way. To put it one way, there is a lot of non-research work involved in making something a product (even a free one).
(In general purpose OSes, I think only two research OSes have made a truly successful jump into the real world from the 1970s onwards, although it's probably been tried with a few more. I don't know enough about the real time and embedded computing worlds to have an idea there.)
2016-05-14
IPv6 is the future of the Internet
I say, have said, and will say a lot of negative things about IPv6 deployment and usability. I'm on record as believing that large scale IPv6 usage will cause lots of problems in the field, with all sorts of weird failures and broken software (and some software that is not broken as such but is IPv4 only), and that in practice lots of people will be very slow to update to IPv6 and there will be plenty of IPv4 only places for, oh, the next decade or more.
But let me say something explicitly: despite all that, I believe that IPv6 is the inevitable future of the Internet. IPv6 solves real problems, those problems are getting more acute over time, the deployment momentum is there, and sooner or later people will upgrade. I don't have any idea of how soon this will happen ('not soon' is probably still a good bet), but over time it's clear that more and more traffic on the Internet will be IPv6, despite all of the warts and pain involved. The transition will be slow, but at this point I believe it's long since become inevitable.
(Whether different design and deployment decisions could have made it happen faster is an academically interesting question but probably not one that can really be answered today, although I have my doubts.)
This doesn't mean that I'm suddenly going to go all in on moving to IPv6. I still have all my old cautions and reluctance about that. I continue to think that the shift will be a bumpy road and I'm not eager to rush into it. But I do think that I should probably be working more on it than I currently am. I would like not to be on the trailing edge, and sooner or later there are going to be IPv6 only services that I want to use.
(IPv6 only websites and other services are probably inevitable, but I don't know how soon we can expect them. Anything popular going IPv6 only will probably come at the trailing edge of the shift, but I wouldn't be surprised to see a certain sort of tech-oriented website go IPv6 only earlier than that as a way of making a point.)
As a result, I now feel that I should be working to move my software and my environment towards using IPv6, or at least towards being something that I can make IPv6 enabled. In part this means looking at programs and systems I'm using that are IPv4 only and considering what to do about them. Hopefully it will also mean making a conscious effort not to write IPv4 only code in the future, even if that code is easier.
(I would say 'old programs', but I have recently written something that's sort of implicitly IPv4 only because it contains embedded assumptions about eg doing DNS blocklist lookups.)
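To illustrate what 'not IPv4 only' actually means at the code level, here is a minimal Go sketch (my own illustration; the hostname is just a placeholder). Often the entire difference comes down to not hard-coding the address family:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// IPv4 only: this will never use an IPv6 address, even on a host
	// that has perfectly good IPv6 connectivity.
	// conn, err := net.Dial("tcp4", "www.example.com:443")

	// Address family agnostic: the resolver and dialer pick whichever
	// family actually works for this destination.
	conn, err := net.Dial("tcp", "www.example.com:443")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected via", conn.RemoteAddr())

	// The same applies to listening: "tcp" on an unspecified address
	// typically gets you a dual-stack listener on modern systems.
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```

In Go the plain "tcp" network covers both families, so writing agnostic code is mostly a matter of not reaching for "tcp4" out of habit; other languages and libraries make you work harder for this.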
Probably I should attempt to embark on another program of learning about IPv6. I've tried that before, but it's proven to have the same issue for me as learning programming languages; without an actual concrete project, I just can't feel motivated about learning the intricacies of IPv6 DHCP and route discovery and this and that and the other. But probably I can look into DNS blocklists in the world of IPv6 and similar things; I do have a project that could use that knowledge.
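Since I mentioned DNS blocklists, here is a sketch of the lookup-name difference between the two worlds (again in Go; the zone is merely a familiar example). IPv4 DNSBL queries use reversed dotted octets, while the IPv6 convention (per RFC 5782) is the reversed-nibble layout also used by ip6.arpa:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// dnsblName builds the DNS name to query for an address against a
// blocklist zone. IPv4 uses reversed dotted octets; IPv6 blocklists
// use the reversed-nibble form, which makes query names much longer.
func dnsblName(addr, zone string) string {
	ip := net.ParseIP(addr)
	if ip == nil {
		return "" // invalid address; real code would return an error
	}
	if v4 := ip.To4(); v4 != nil {
		return fmt.Sprintf("%d.%d.%d.%d.%s", v4[3], v4[2], v4[1], v4[0], zone)
	}
	v6 := ip.To16()
	nibbles := make([]string, 0, 32)
	// Walk the bytes backwards, emitting low nibble before high nibble,
	// so the least significant nibble of the address comes first.
	for i := 15; i >= 0; i-- {
		nibbles = append(nibbles, fmt.Sprintf("%x.%x", v6[i]&0xf, v6[i]>>4))
	}
	return strings.Join(nibbles, ".") + "." + zone
}

func main() {
	fmt.Println(dnsblName("192.0.2.1", "zen.spamhaus.org"))
	fmt.Println(dnsblName("2001:db8::1", "zen.spamhaus.org"))
}
```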
2016-05-08
Issues in fair share scheduling of RAM via resident set sizes
Yesterday I talked about how fair share allocation of things needs a dynamic situation and how memory is not necessarily all that dynamic and flow-based. One possible approach to fair share allocation of memory is to do it based on Resident Set Size (RSS). If you look at things from the right angle, RSS is sort of a flow, in that the kernel and user programs already push it back and forth dynamically.
(Let's ignore all of the complications introduced on modern systems by memory sharing.)
While there has been some work on various fair share approaches to RSS, I think that one issue limiting the appeal here is that significantly constraining RSS often has serious undesirable side effects. Every program has a 'natural' RSS, which is the RSS at which it only infrequently has to page back in something that's been removed from its set of active memory. If you clamp a program's RSS below this value (and actually evict things from RAM), the program will start trying to page memory back in at a steadily increasing rate. Eventually you can clamp the program's RSS so low that it makes very little forward progress in between all of the page-ins of things it needs.
Up until very recently, all of this page-in activity had another serious effect: it ate up a lot of IO bandwidth to your disks. More exactly, it tended to eat up your very limited random IO capacity, since these sort of page-ins are often random IO. So if you pushed a program into having a small enough RSS, the resulting paging would kill the ability of pretty much all programs to get IO done. This wasn't technically swap death, but it might as well have been. To escape this, the kernel probably needs to limit not just the RSS but also the paging rate; a program that was paging heavily would wind up going to sleep more and more of the time in order to keep its load impact down.
(These days a SSD based system might have enough IO bandwidth and IOPS to not care about this.)
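To make the 'limit the paging rate' idea concrete, here is a toy user-space approximation for Linux, purely my own sketch (a real implementation would live in the kernel). It watches a process's cumulative major fault count from /proc/<pid>/stat and stops the process for a while whenever it pages in too fast:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
	"time"
)

// majorFaults reads the cumulative major page fault count for a pid
// from /proc/<pid>/stat (field 12, counting from 1). We parse after
// the closing ')' so a space in the command name doesn't break us.
func majorFaults(pid int) (uint64, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return 0, err
	}
	s := string(data)
	rest := s[strings.LastIndex(s, ")")+2:]
	fields := strings.Fields(rest)
	// rest starts at field 3 (state), so majflt (field 12) is fields[9].
	return strconv.ParseUint(fields[9], 10, 64)
}

// throttle is a toy version of the idea in the text: if a process pages
// in faster than some budget, put it to sleep for a while so its paging
// can't monopolize the disks.
func throttle(pid int, faultsPerSec uint64) {
	prev, _ := majorFaults(pid)
	for {
		time.Sleep(time.Second)
		cur, err := majorFaults(pid)
		if err != nil {
			return // the process has gone away
		}
		if cur-prev > faultsPerSec {
			syscall.Kill(pid, syscall.SIGSTOP)
			time.Sleep(2 * time.Second) // a real design would grow this
			syscall.Kill(pid, syscall.SIGCONT)
		}
		prev = cur
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: throttle <pid>")
		return
	}
	pid, _ := strconv.Atoi(os.Args[1])
	throttle(pid, 100) // budget of ~100 major faults a second
}
```

The SIGSTOP/SIGCONT hammer is crude, but it captures the essential shape: a heavy pager spends more and more of its time asleep instead of monopolizing the disks.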
All of this is doable but it's also complicated, and it doesn't get you the sort of more or less obviously right results that fair share CPU scheduling does. I suspect that this has made fair share RSS allocation much less attractive than simpler things like CPU scheduling.
2016-05-07
'Fair share' scheduling pretty much requires a dynamic situation
When I was writing about fair share scheduling with systemd the other day, I rambled in passing about how I wished Linux had fair share memory allocation. Considering what fair share memory allocation would involve set off a cascade of actual thought, and so today I have what is probably an obvious observation.
In general what we mean by 'fair share' scheduling or allocation is something where your share of a resource is not statically assigned but is instead dynamically determined based on how much other people also want. Rather than saying that you get, say, 25% of the network bandwidth, we say that you get 1/Nth of it, where N is how many consumers want network bandwidth. Fair share scheduling is attractive because it's 'fair' (no one gets to over-consume a resource), because it doesn't require setting hard caps or allocations in advance, and because it responds to usage on the fly.
But given this, fair share scheduling really needs to be about something dynamic, something that can easily be adjusted on the fly from moment to moment and where current decisions are in no way permanent. Put another way, fair share scheduling wants to be about dividing up flows; the flow of CPU time, the flow of disk bandwidth, the flow of network bandwidth, and so on. Flows are easy to adjust; you (the fair share allocator) just give the consumers more or less this time around. If more consumers show up, the part of the flow that everyone gets becomes smaller; if consumers go away, the remaining ones get a larger share of the flow. The dynamic nature of the resource (or of the use of the resource) means that you can always easily reconsider and change how much of it the consumers get.
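A toy sketch may make the convenience of flows clearer: each tick, the allocator just recomputes 1/Nth over whoever is currently active, and nothing from the previous tick has to be clawed back (a minimal Go illustration, not a real scheduler):

```go
package main

import "fmt"

// fairShares divides a per-tick flow (say, bandwidth) evenly among the
// currently active consumers. Because nothing carries over between
// ticks, consumers arriving or leaving changes everyone's share
// immediately; there is no old allocation to reclaim.
func fairShares(total float64, active []string) map[string]float64 {
	shares := make(map[string]float64, len(active))
	for _, c := range active {
		shares[c] = total / float64(len(active))
	}
	return shares
}

func main() {
	fmt.Println(fairShares(100, []string{"a", "b"}))      // 50 each
	fmt.Println(fairShares(100, []string{"a", "b", "c"})) // ~33 each
}
```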
If you don't have something that's dynamic like this, well, I don't think that fair share scheduling or allocation is going to be very practical. If adjusting current allocations is hard or ineffective (or even just slow), you can't respond very well to consumers coming and going and thus the 'fair share' of the resource changing.
The bad news here is pretty simple: memory is not very much of a flow. Nor is, say, disk space. With relatively little dynamic flow nature to allocations of these things, they don't strike me as things where fair share scheduling is going to be very successful.