2013-09-29
Universities and long term perspectives
I expect that in many places my habit of looking five or ten years into the future for things I'm considering would be seen as more than a little bit daft and pointless. In a company, the problems you have in five or ten years are likely to look much different from the ones you have today; if a system has survived at all, it's quite likely to have grown or otherwise changed drastically as the business scaled up or mutated (this is especially true in startups and other small companies that are trying hard to grow).
This is not the case at universities. It is a rare department that changes so much in even ten years as to require its existing IT systems to change dramatically. Departments just generally don't get drastically larger or smaller or different; the fundamental things that they do haven't changed for a long time (and are unlikely to change in the future) and changes in scale tend to be quite slow. A department doubling in size is considered really drastic growth.
(Universities also naturally have a significant amount of stability in who is there. Professors generally have tenure and so stay for decades while graduate students frequently take half a decade or more from when they start to when they get their PhD and leave. Even the undergraduates are generally here for four or five years.)
The result is that software and large scale systems can routinely live for a long time, and in fact the long-term people here like this stability in the environment around them. Of course in an ideal environment you'll turn over the physical hardware reasonably frequently, but your overall design can stay the same because the work it needs to do is the same and the scope and scale haven't changed drastically. Real shifts in technology or in the demand for services are relatively infrequent (at least in my experience).
(Of course you don't have to keep systems going for five or ten years, but change for the sake of change and nothing more is generally not a good thing. In some ways this is a luxury and in some ways this is a burden.)
2013-09-23
The FTE pricing gamble (for vendors)
A popular move with a number of vendors that we've tried to deal with is to offer us site licenses based (only) on how many FTEs we have (for those who have been lucky enough not to encounter this term, it means how many 'full time equivalent' people you have). In the process of this I've come to feel that offering FTE-based pricing is a high stakes gamble for a vendor, one that doesn't necessarily work in their favour. One way to put it is that FTE-based pricing is an artificial attempt to make us use the product everywhere.
In an FTE pricing model the actual per-unit price of the product depends on how widely you use it. If you use it everywhere or nearly everywhere then its price can be quite low. If you use it in only a few places and only for a few people, its effective price is very high. Compounding this is that FTE-based licensing is generally priced with the assumption (either implicit or explicit) that the product will be widely used.
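To make the arithmetic concrete, here is a tiny sketch with entirely made-up numbers (the FTE count, the site price, and the adoption levels are all hypothetical):

    # Effective per-user cost of an FTE-based site license at various
    # levels of actual adoption. All numbers here are made up.
    FTES = 1000            # hypothetical organization size
    SITE_PRICE = 20000.0   # hypothetical annual site price, ie $20/FTE

    for actual_users in (1000, 500, 100, 10):
        per_user = SITE_PRICE / actual_users
        print(f"{actual_users:4d} actual users: ${per_user:8,.2f} per user/year")

At full adoption the product costs $20 a year per actual user; at ten actual users the very same invoice works out to $2,000 a year per actual user.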
(Another way to put this comes from Windows licensing; you pay an 'FTE-based product tax' for every person whether or not they use the product. The vendor's goal is what Microsoft's was: to make any alternative the more expensive choice because you are already paying for the vendor's product.)
Some vendors come to us with FTE pricing when we already use their product widely or near universally. These vendors can get away with FTE pricing even if it doesn't save us money (and sometimes it makes internal political sense). But other vendors have come to us with FTE pricing when their product is not so widely used (and the vendor should know this), or even worse is used only a bit (or not yet used at all). These vendors are playing a very high stakes gamble: they are betting that they can force us to pay their price and as a result push their product throughout the organization. This can and does backfire, and when it backfires it often does so violently. For obvious reasons, this goes especially badly when a vendor is trying to change a much cheaper long-standing pricing model to an FTE model.
(You would think that vendors would avoid doing something like this, but apparently not. I've been a (distant) spectator to just such a backfire recently, which is why this issue is on my mind.)
By the way, this FTE pricing model is especially dangerous with a larger organization because the absolute dollars involved are much bigger. If the only organizational unit a vendor will license for is 'the entire University of Toronto', well, I believe we are at something over 10,000 FTEs. You can imagine what that does to prices, among other things.
Sidebar: the problem with expensive university-wide licenses here
In some universities, IT and the IT budget are centrally provided and centrally managed. That is not the case here; faculties and departments and groups fund their IT individually. This means that there is generally no one place to fund an expensive university-wide license in one decision; instead it must be funded by running around to lots of different people to get them to chip in. This takes a lot of time and work and injects a lot of uncertainty into the process, especially if it must be renewed on a year to year basis (since next year a particular department may not have the budget or may decide that they don't have that much need and so on and so forth).
2013-09-19
Load is a whole system phenomenon
Here's something obvious: load and its companion, overload, are created by everything that's going on on your system at once. Oh, sure, some subset of the activity can be saturating a particular resource, but in general (and without quota-based things like Linux's cgroups) it is the sum of all activity (or all relevant activity) that matters.
So far this probably all sounds very obvious, and it is. But there's a big corollary: if you want to limit load, you must take a global perspective on activity. If you have ten things that each could create load, you can't limit overall system load just by limiting those ten things individually and in isolation from each other. A 'reasonable load' for one thing by itself is not necessarily reasonable when all ten are loaded at once. If you have no dynamic global system the best you can do is to assign static quotas such that each thing gets a limit of (say) 1/10th of the machine and can't use more even when the system is otherwise idle.
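As a sketch of what the static-quota fallback looks like (all numbers and names here are made up, not any particular system's):

    # Static quotas: ten things, each hard-capped at 1/10th of a
    # hypothetical budget of 100 worker slots, checked in isolation.
    TOTAL_SLOTS = 100
    THINGS = 10
    PER_THING_CAP = TOTAL_SLOTS // THINGS   # 10 slots each, always

    def admit(my_active_count: int) -> bool:
        # Each thing looks only at its own activity; it cannot know
        # whether the other nine are idle or saturated.
        return my_active_count < PER_THING_CAP

The safety comes at a cost: when only one thing is busy it still tops out at its 10 slots while the other 90 sit idle.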
Now this comes with an exception: if all activity funnels through one central point at some point in processing, you can (sometimes) put load limits on that single point and be done. That's because the single point implicitly has a global view of the load; it 'knows' what the global total load is because it sees all traffic.
All of this sounds hopelessly abstract, so let's talk web servers and web applications. Suppose you have a web server serving ten web apps, each of which is handled by its own separate daemon. You want your machine to not explode under load, no matter what that load is. Can you get this by just putting individual limits on each web app (eg 'only so much concurrency at once')? My answer is 'not unless you're going to use low limits', at least if demand for the apps is unpredictable. To do this properly you need some central point to apply a whole system view and whole system limits. One such spot might be the front-end web server; another might be a daemon that handles or at least monitors all web apps at once.
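Here is a minimal sketch of that central-point approach, assuming a threaded front end that all requests pass through (handle_request and app.process are hypothetical names, not any real framework's API):

    import threading

    # One shared semaphore at the front end sees all traffic, so total
    # concurrency across every app can never exceed the global limit,
    # however unevenly demand is split between the apps.
    GLOBAL_LIMIT = 100   # hypothetical whole-machine budget
    slots = threading.BoundedSemaphore(GLOBAL_LIMIT)

    def handle_request(app, request):
        with slots:          # blocks when the whole machine is busy
            app.process(request)

Because the one semaphore is shared, a single busy app can use the whole budget when the others are idle, which is exactly what static per-app quotas can't give you.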
In short, now you know why I feel that separate standalone daemons are the wrong approach for scalable app deployment. Separate daemons mean separate limits, and you can't configure those sensibly without risking blowing up your machine under load. The more apps you have the worse this gets (because each app's 'safe' share of the machine gets smaller).
2013-09-15
Identities, trust, and work
As part of thinking about 'web of trust' systems, I've recently come to think that there are effectively two sorts of identities on the Internet. For lack of a better terminology I will call these 'internal' and 'external'.
An external identity is an identity that is linked to something in the outside Internet world. In one sense, the identity exists to assert continuity: that this new work comes from the same person as a series of previous work. 'Trust' for such an identity within your identity system is essentially meaningless; people don't care that Linus Torvalds' GPG key has lots of signatures, they care that it continues to sign Linux kernel releases and that the 'Linus Torvalds' on kernel mailing lists doesn't denounce it as forged and so on. The work done in the name of the identity is its proof and source of trust.
An internal identity is an identity without this property. Its only significant existence is within your identity system and it is otherwise free-floating, not tied to something else out on the Internet that people care about or look at. Trust for these identities is necessarily created within your identity system because there is nothing else to do it; there is nothing significant on the Internet to say 'yes, this is my identity'.
Internal identities are necessarily much more vulnerable than external identities because there is nothing else there; your identity system is it.
Man in the middle attacks are possible on unsupported external identities in situations where you can actually do two-way impersonation and keep it up. When it comes to personal identities I think that this is rare. Other sorts of identities are much more attackable this way and so need stronger internal support from your identity system; here the 'trust' your identity system needs to create is that you are talking to the real thing, not an imposter in the middle.
2013-09-14
A basic overview of SAS and using SATA with SAS
I have been busy having a somewhat painful learning experience about SAS, especially bearing on putting SATA disks behind SAS. In my usual way I'm going to write down what I've learned so that it will stick.
SAS is yet another disk interconnect technology. It came out of SCSI (as opposed to SATA coming out of ATA) and so it is generally more large-scale, enterprisey, and costly. You can, at least in theory, use SATA disks in (some) SAS systems.
(For my purposes here I'm going to skip over the actual and claimed differences between SAS disks and SATA disks of the same basic specification. You can find partisans on both sides and I lack the knowledge to have an informed opinion.)
Like SATA, SAS is fundamentally a directly connected point to point system; one SAS port, one disk. However, normal SAS connectors are 'multi-lane', which is to say that they have the actual wiring for several SAS ports bundled into one physical cable and connector. The essentially standard connectors (such as 'IPASS', aka SFF-8087) have four lanes, so one connector on your SAS card or SAS-equipped motherboard can connect to four drives (through various things in the middle to break out the lanes). SAS also has standard support for SAS expanders, which are very roughly the equivalent of SATA port multipliers except that they support many more drives and generally work better.
Many cases that support SAS drives (such as the SuperMicro SC836 series) have a SAS backplane that sits between the drives and the rest of the world (such as your server motherboard). There are at least three general ways that such a backplane can work: it can have one or more SAS expanders, it can wire all drives through with completely separate connectors (aka '1:1'), or it can have IPASS connectors on the outside and break out each lane to a drive on the inside (aka '4:1'). If the backplane has SAS expanders you need only a few SAS ports to talk to all of the disks (often only one or maybe two). Otherwise you need as many SAS ports as you have drives.
(I think that a JBOD SAS disk enclosure is almost always going to have a SAS expander or two so that you only have to connect up one or two external cables. A server chassis may use any of the three options and your vendor may even let you choose. A full discussion of which one you might want when is beyond both my experience and the scope of this entry.)
You can plug SATA drives into SAS drive connectors (although not vice versa). While the two sorts of drives use completely different protocols, a properly functioning SAS environment will arrange to speak SATA to SATA drives. If you are not using a SAS expander I believe that the host SAS controller does this directly. If you are using a SAS expander there is STP (the SATA Tunneled Protocol) and I believe that your host SAS controller talks STP to the SAS expander, which turns it into SATA when it talks to the drive(s). On the Internet you can read all sorts of bad things about using SATA drives in SAS systems but my impression is that what creates most of this badness is putting SATA drives behind SAS expanders (see eg this). SATA drives that are directly connected to host SAS controllers seem to be reported as working okay.
As far as I can tell this means that JBOD SAS disk enclosures with SATA disks are probably not recommended but SAS server cases may be okay depending on what their SAS backplane is. You want a non-expander option, which will probably have the inconvenient side effect that you'll need an additional SAS controller card or two (8-port cards seem to be the standard). This may have implications for your long term storage expansion plans.
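The controller arithmetic for a non-expander backplane is simple enough to sketch (the 16-bay chassis is hypothetical; the four-lanes-per-connector and eight-ports-per-card figures are from above):

    import math

    # Ports needed for a hypothetical 16-bay chassis with a non-expander
    # backplane: one SAS lane (port) per drive, four lanes per IPASS
    # connector, eight ports per typical controller card.
    drives = 16
    connectors = math.ceil(drives / 4)   # -> 4 IPASS connectors
    cards = math.ceil(drives / 8)        # -> 2 8-port controller cards
    print(connectors, "IPASS connectors,", cards, "controller cards")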
(SAS disks remain sufficiently more expensive than SATA disks that we can't deal with the whole mess by just using SAS disks.)