2012-04-25
Models of providing computing access in a university department
Suppose that you are a university department, such as the Department of Computer Science. You need to provide access to computing services (and perhaps computing services themselves) to your members, especially to graduate students. Or to put it pragmatically: when a new graduate student shows up, how do they get on your network and get access to whatever general services you offer?
(These days the question to ask is 'where do they get whatever machine they use to connect to the network?')
There are at least three ways that we (as a department) could provide computing access to people: it could be provided by the department as a whole, by the professor or research group that people are associated with, or by the individual person themselves. In the old days when computing access was expensive, it was basically always provided by the department or by research groups (in the form of labs, terminals in graduate student space, and so on). These days computing access is cheap enough that in theory you could require new graduate students to buy their own machine in the same way that they have to buy their own textbooks.
(For undergraduate computing the choices get reduced to the department providing computer labs or undergrads being forced to buy their own machines.)
All of these approaches have problems. With departmentally provided computers, the problem is cost and thus the quality of machines that people will get; unless you are very well endowed, you can't afford to buy everyone nice machines and keep them up to date. With group provided computers, the problem is inequality; different research groups may have vastly different funding levels and so their graduate students (and professors) will get machines of quite different quality levels. With personally provided computers, the problem is the cost to the graduate students and what happens if the machine breaks (or is stolen).
(It's generally considered bad form to force graduate students to leave because they can't afford to replace their laptop and so can't get back on your network to do their work. If nothing else, good professors aren't going to let it happen to good graduate students.)
Of course the models are not pure and have never really been. Grant funding has always meant that professors and research groups sometimes bought (or were given) their own computing, even if the department provided some basic level of computing support. And these days a certain number of new graduate students (and professors) will show up with their own computers which they are quite attached to and are not interested in replacing, thanks.
(One of the differences between the corporate world and academia is that in academia you are not in any real position to tell new people that they can't bring in their own personal computers and put them on your network. Among other things, it theoretically saves a bunch of money and universities can be all about saving money.)
By the way, I don't think there's a single correct answer to this general issue. Every department in every university is probably going to evolve its own scheme that suits the local culture, funding levels and sources, and politics. We (the Department of Computer Science) have our own (current) answers, which this entry has grown far too long to contain.
Sidebar: computing access versus core computing services
I'm drawing a distinction here between access to the department's computing services and the core computing services themselves (if any). Core computing services are shared by everyone in the department and can generally be sensibly used even by groups who have their own computing as well. Computing access is generally one machine per person; if your professor provides you with a nice laptop, you have no use for the clapped out desktop that the department may have planted on your assigned desk.
What this means is that it's much easier for the department to provide a useful backstop of basic computing services than it is for the department to provide a sensible backstop of computing access, because the computing access can easily go completely unused (which leaves the department buying a lot of basic desktops that no one actually wants).
2012-04-24
Universities and their non-employees
In an ordinary conventional company, everyone there matters to the company (at least in theory) and if someone can't do their work it's a real problem; the company is losing money for every minute they sit idle. At a minimum the company is losing their salary, and generally it will be losing more than that in forgone profit (because, again in theory, everyone is making some sort of contribution to the company's bottom line or they wouldn't be there).
(In practice you can have employees who add negative value, where having them sit idle would actually be a net win. But this is a pathology and we'll ignore it.)
Universities are not like that. In particular, universities are full of people who do not matter to them in this way (at least individually; they do collectively). A university is fundamentally indifferent to whether an individual undergraduate student or graduate student can do their work productively, or even at all; except perhaps in a vague way, the university is not losing money or forgoing profit by letting them sit idle in the way that a company would be with an employee. Indeed, a university expects a certain number of these people to fail to do work as a matter of course.
(A university still has people that matter to it in the same way that employees matter to companies, but generally the university staff and professors are dwarfed by the undergraduate and graduate student population.)
This has important consequences for IT support (and indeed various other sorts of support and working conditions).
In a company, not providing computing support to people is ultimately cutting off your nose to spite your face; regardless of whose theoretical fault their problem is, the practical effect is that the company is losing money when they can't work. It's possible to have rational business reasons for denying support, but it is always at least a little bit destructive to the value of the company. This makes issues like the Bring Your Own Device debate very sharp, because you need to get people working almost no matter what.
In a university, not providing (much) computing support to certain people is perfectly viable because it is ultimately not the university's problem if they can't get their work done; it is their problem. Only a relatively small portion of the university population must be supported at all costs: the staff and at least some of the professors. The majority can be more or less marooned on their own, given just enough support so that enough of them can get enough work done to keep the engines of the university turning over (and to keep the students from revolting in protest at terrible working conditions).
Or in short: in a university, it's viable to not support people. In a university, but not in a company, it's perfectly workable to tell a lot of people 'tough luck, you're on your own, if you can't make it work that's your problem'.
(This is related to how universities do not have a return on investment, and is part of my slow running series on how universities are peculiar.)
2012-04-15
My view on why CISC (well, x86) won over RISC in the end
One of Tanenbaum's now-famous three predictions in the Tanenbaum/Torvalds debate was that x86 would die out and be replaced by RISC architectures. Some of you may be tempted to immediately laugh at this prediction, but before you do that you should remember the historical context; specifically, that this prediction was made in 1992. Back then this looked like a very good bet, one so good that Intel made it itself with the Itanium. RISC was everywhere, the x86 architecture was low end and not performing very well, and so on. So what happened on the way between 1992 and now?
In my view, there are two major reasons and one moderate reason why the x86 architecture has proven to be evergreen and why RISC never overtook it, especially traditional 1992-style RISC (note that ARM is not a 1992-style RISC).
The first reason is that people (vastly) underestimated how far x86 performance could be pushed with enough effort. I'm not sure if specialists in 1992 could have made a good prediction of future x86 performance, but certainly the general computing perception was that x86 performance and CISC performance in general were fundamentally limited by the complexity (and limitations) of the architecture. RISC was designed to go real fast, and CISC just wasn't.
The second reason is that people vastly underestimated how much money there would be in increasing x86 performance. Pushing x86 performance to its current heights has required absurdly large amounts of money; back in 1992 you had to be at least somewhat contrarian and open-minded to see that for various reasons x86 was going to have this huge amount of money available to improve it. I think it might have been possible to see the wave coming, but it certainly was far from obvious or easy.
(Once again, note that even Intel missed this as late as 1994, when it started the whole Itanium dead end.)
The moderate reason is the issue of memory bandwidth and latency. Traditional RISC designs have larger instructions than x86 and need more of them to get the same amount of work done; this puts more pressure on caches and demands more memory bandwidth. Memory access speed was not seen as a big concern back in 1992, but it has become an increasingly important one since then and this favours architectures with compact instruction encodings (even if this results in them being far from regular and easy to decode).
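As a back-of-the-envelope illustration of the instruction density point (every number here is an assumption picked for illustration, not a measurement): suppose the average x86 instruction is around 3 bytes, a classic fixed-width RISC instruction is 4 bytes, and the RISC needs roughly 1.3 instructions per x86 instruction because its load/store architecture needs extra instructions for memory operands. Then for the same amount of work the RISC instruction stream is noticeably fatter, and all of those extra bytes have to flow through the instruction cache and the memory system:

    /* Back-of-the-envelope sketch of the instruction density argument.
       All of these numbers are assumptions for illustration, not measurements. */
    #include <stdio.h>

    int main(void) {
        double x86_avg_bytes = 3.0;  /* assumed average x86 instruction size */
        double risc_bytes    = 4.0;  /* fixed instruction size of a classic RISC */
        double risc_per_x86  = 1.3;  /* assumed RISC instructions per x86 instruction */

        double x86_stream  = x86_avg_bytes;              /* bytes fetched per unit of work */
        double risc_stream = risc_bytes * risc_per_x86;  /* bytes fetched for the same work */

        printf("x86 instruction bytes per unit of work:  %.1f\n", x86_stream);
        printf("RISC instruction bytes per unit of work: %.1f\n", risc_stream);
        printf("extra fetch bandwidth and cache pressure: %.0f%%\n",
               (risc_stream / x86_stream - 1.0) * 100.0);
        return 0;
    }

With these made-up numbers the RISC instruction stream comes out roughly 70% larger; the exact figure doesn't matter, just that whatever gap exists gets more painful as memory becomes relatively slower.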
(There is a more general argument that RISC was always a larger bet than it appeared, but that will take another entry.)
2012-04-12
Tanenbaum was wrong in the Tanenbaum-Torvalds debate
One reaction to the Stackoverflow question that asked why Tanenbaum was wrong in his predictions for the future in the Tanenbaum/Torvalds debate was for people to say that Tanenbaum was actually right or mostly right (although not as fast as he expected). This is incorrect.
The question quotes three Tanenbaum predictions:
- Microkernels are the future
- x86 will die out and RISC architectures will dominate the market
- (5 years from then) everyone will be running a free GNU OS
The usual defense of the first prediction is to point to the growing use of virtualization hypervisors and claim that this counts. It doesn't; hypervisors are not microkernels. Neither are kernels with loadable device drivers and other sorts of modularity; well structured kernels are not anywhere near the same thing as microkernels. When Tanenbaum was talking about microkernels here, he really meant microkernels.
I've seen two defenses of the second prediction. The first is to point to the growth of ARM-based devices, and the second is to note that the microcode and internal execution units of x86 chips are increasingly RISCy; the x86 instruction set is sort of a CISC frontend to a RISC. Mounting either defense either misses or deliberately ignores the historical context of Tanenbaum's prediction (and ignores the first part of it). At the time he made it, Tanenbaum was clearly predicting the replacement of x86 chips and the ugly x86 architecture in general purpose computers by RISC chips. He was not making a wide-ranging prediction about a future world with lots of smart devices (many of them RISC-based), and he was not predicting the future of the inner bits of the x86 implementation. He was predicting that your desktop computer, your laptop, and your servers would all use some RISC chipset and not run x86 code. This has clearly not come to pass and doesn't seem likely to come to pass any time soon.
Anyone who wants to argue that the growing number of Android devices is starting to validate Tanenbaum's third prediction is similarly misreading it. At the time and place Tanenbaum made the prediction, he did not mean 'in non-computer smart devices'; he meant on your computers (desktop, server, etc). Even if we allow Linux to meet his criteria (which is unlikely to be what he meant at the time, since he was arguing against Torvalds), this is a clear failure.
(I'm ignoring the time limit given in the third prediction for fuzzy reasons. If you don't, the prediction clearly and unarguably failed.)
PS: I don't think that we should be harsh on Tanenbaum for getting these predictions wrong. Among other reasons, remember the context of the entire thing; Tanenbaum was having a casual argument on Usenet, not conducting a carefully thought out academic debate. Only through an odd series of happenstances has this episode been exhumed years later to be scrutinized minutely.
Hypervisors are not microkernels
As a result of a Stackoverflow question on the Tanenbaum/Torvalds debate I got to see a certain number of people claim that (virtualization) hypervisors are more or less microkernels, certainly enough so to make one of Tanenbaum's predictions (ultimately) correct. This is wrong. It's true that hypervisors and microkernels can both sit 'underneath' normal monolithic kernels as a low-level layer, and both can potentially be 'small' for some vague definition of small. But those are about the only solid points of similarity, and once you look at their purposes and how they are supposed to be used they become very different.
Microkernels are intended to create a minimal set of low-level operations that would be used to build an operating system. While it's popular to slap a monolithic kernel on top of your microkernel, this is not how microkernel-based OSes are supposed to be; a real microkernel OS should have lots of separate pieces that use the microkernel services to work with each other. Using a microkernel as not much more than an overgrown MMU and task switching abstraction layer for someone's monolithic kernel is a cheap hack driven by the needs of academic research, not how microkernels are supposed to be used.
(There have been a few real microkernel OSes, such as QNX; Tanenbaum's Minix is or was one as well.)
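To make that intended structure concrete, here is a toy sketch of the shape of a microkernel system: the 'kernel' does nothing but move messages, and a user-level 'file server' process answers requests from a client process. This is only an analogy written in ordinary POSIX C; pipes and fork() stand in for the kernel's message transport, and the message format is made up, not any real microkernel's API.

    /* Toy sketch of the microkernel shape: the "kernel" only moves messages,
       and a user-level "file server" answers requests from a client.
       Plain POSIX pipes and fork() stand in for the kernel's message transport;
       this is an analogy, not any real microkernel's API. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    struct msg {            /* the only thing the "kernel" understands */
        char op[8];         /* e.g. "read" */
        char payload[56];   /* request or reply data */
    };

    int main(void) {
        int to_server[2], to_client[2];
        pipe(to_server);    /* client -> server channel */
        pipe(to_client);    /* server -> client channel */

        if (fork() == 0) {  /* user-level "file server" process */
            struct msg m;
            read(to_server[0], &m, sizeof m);        /* receive a request */
            char path[sizeof m.payload];
            memcpy(path, m.payload, sizeof path);
            snprintf(m.payload, sizeof m.payload,
                     "contents of %s", path);        /* pretend to do the read */
            write(to_client[1], &m, sizeof m);       /* send the reply back */
            _exit(0);
        }

        /* the "client", which in a real system could itself be another server */
        struct msg m;
        memset(&m, 0, sizeof m);
        strcpy(m.op, "read");
        strcpy(m.payload, "/etc/motd");
        write(to_server[1], &m, sizeof m);           /* send the request */
        read(to_client[0], &m, sizeof m);            /* wait for the reply */
        printf("client got: %s\n", m.payload);

        wait(NULL);
        return 0;
    }

In a real microkernel the send and receive would be kernel traps and there would be many such servers (filesystems, drivers, the network stack), but the important part is the shape: everything above the message-passing core is an ordinary, separate piece.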
By contrast, hypervisors virtualize and emulate hardware at various levels of abstraction. This involves providing some of the same things that microkernels do (eg memory isolation, scheduling), but people interact with hypervisors in very different ways than they interact with microkernels. Even with 'cooperative' hypervisors, where the guest OSes must be guest-aware and make explicit calls to the hypervisor, the guests are far more independent, self-contained, and isolated than they would be in a microkernel. With typical 'hardware emulating' hypervisors this is even more the case because much or all of the interaction with the hypervisor is indirect, done by manipulating emulated hardware and then having the hypervisor reverse engineer your manipulations. As a consequence, guest-to-guest communication delays are likely to be several orders of magnitude worse than IPC delays between processes in a microkernel.
Or in short: writing a microkernel-style OS on top of a hypervisor is pretty clearly an absurd notion. This shows that hypervisors are not a species of microkernel.
(I'm sure this is plenty obvious to most people as it is, but I feel like writing it down here in order to be explicit.)