Wandering Thoughts archives

2007-07-22

Universities are not businesses: an implication

One of the peculiarities of the university environment is that universities are not businesses. Well, yes, of course. But consider one of the implications: universities do not directly make money.

In a company, you can justify a more expensive thing on the grounds that it nevertheless makes the company more money, either directly or indirectly (through increased productivity, for example). Because a company makes money, what matters is not the absolute expense but the profit/expense ratio.

(It matters so much that business has a special term for it, and spends a great deal of time and effort working out the numbers.)

Without profits, the pressure in a university is generally for low expenses, period. Sometimes (if you are lucky) you can get people to take a global view, justifying one expense on the grounds that it reduces another one by more, but this can be complicated if the expenses involved cross organizational boundaries (where group A is paying more so that group B can save money).

(While universities make money, most of it arrives in ways that are difficult to link to anything specific that the university did. Generally, no one can tell why undergraduate enrollment is up 20% or the like.)

Of course this is not unique to universities; many companies have portions that don't earn money but are just necessary overhead, and I expect that they are subject to many of the same pressures.

UniversitiesAndROI written at 23:14:07

2007-07-15

Problems I see with the ATA-over-Ethernet protocol

I've been experimenting with AoE lately, and as a result I've been looking at the protocol more than I did in my earlier exposure. Unfortunately, the more I look at the AoE protocol, the more uncomfortable I get.

The AoE protocol is quite simple; requests and replies are simple Ethernet frames, and a request's result must fit in a single reply packet. This means that the maximum read and write sizes per request are bounded by the size of the Ethernet frame, and thus on a normal Ethernet the maximum is 1K per request. (AoE does all IO in 512-byte sectors.)
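
As a rough sketch of where the 1K figure comes from, the arithmetic can be written out as below. The header sizes (a 10-byte AoE header and a 12-byte ATA header after the Ethernet header) and the function names are assumptions made for illustration, not something stated above; check them against the AoE specification before relying on them.

```python
SECTOR_SIZE = 512      # AoE does all IO in 512-byte sectors
AOE_HEADER = 10        # assumed size of the AoE header, in bytes
ATA_HEADER = 12        # assumed size of the ATA command header, in bytes


def sectors_per_frame(mtu):
    """Whole sectors that fit in one frame with the given payload size (MTU)."""
    payload = mtu - AOE_HEADER - ATA_HEADER
    return payload // SECTOR_SIZE


def max_io_bytes(mtu=1500):
    """Largest read or write a single AoE request can carry."""
    return sectors_per_frame(mtu) * SECTOR_SIZE


if __name__ == "__main__":
    print(max_io_bytes(1500))
```

Under these assumptions a standard 1500-byte MTU has room for two whole sectors and not quite a third, which is the 1K per request limit.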

So, the problems I see:

  • AoE effectively requires the target to do buffering in order to bridge the gap between AoE's small requests and the large IO requests that modern disk systems need to see to get decent performance.

    Buffering writes makes targets less transparent and more dangerous. Requiring read buffering means that target performance goes down dramatically if the target can't do it, either because it can't predict the necessary readahead pattern or because it's run out of spare memory.

    (I am especially worried about readahead prediction because we will be using this for NFS servers that are used by a lot of people at once, so the targets will see what looks like random IO. I do not expect target-based readahead to do at all well in that situation.)

  • because AoE uses such small requests and replies, it must send and receive a huge number of packets a second to get full bandwidth. For example, on a normal Ethernet getting 100 Mbytes/sec of read bandwidth requires handling over 200,000 packets per second (about 100,000 pps sent and 100,000 pps received).

    This is a problem because most systems are much better at handling high network bandwidth than they are at handling high numbers of packets per second. (And historically, the pps rate machines can handle has grown more slowly than network bandwidth has.)
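
The packet rate in the second point can be checked with a small calculation. This is only a sketch: it assumes 1K of data per request, one reply packet per request packet, and that 'Mbytes' means 2**20 bytes. The function name is made up for illustration.

```python
import math


def packets_per_second(bytes_per_second, bytes_per_request=1024):
    """Estimate AoE packet rates for a streaming read.

    Assumes every request packet gets exactly one reply packet and that
    each reply carries bytes_per_request bytes of data.
    """
    requests = math.ceil(bytes_per_second / bytes_per_request)
    return {"sent": requests, "received": requests, "total": 2 * requests}


if __name__ == "__main__":
    # 100 Mbytes/sec, taking a Mbyte as 2**20 bytes (an assumption).
    print(packets_per_second(100 * 2**20))
```

With decimal megabytes instead, the total comes out a little under 200,000 packets per second; the order of magnitude is the same either way.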

The packets per second issue probably only really affects reads; there are few disk systems that can sustain 100 Mbytes/sec of writes, but it is not difficult to build one that can do 100 Mbytes/sec of reads.

(And the interesting thing for us is to build a system that will still manage to use the full network bandwidth when it is not one streaming read but 30 different people each doing their own streaming reads, all being mixed together on the target.)

I find all of this unfortunate. I would like to like AoE, because it has an appealing simplicity; however, I'm a pragmatist, so simplicity without performance is not good enough.

Sidebar: the buffer count problem

There's a third, smaller problem. The 'Buffer Count' in the server configuration reply (section 3.2 of the AoE specification) cannot mean what it says it means. The protocol claims that this is a global limit, that it is:

The maximum number of outstanding messages the server can queue for processing.

The problem is that one initiator has no idea how many messages other initiators are currently sending the server. So this has to actually be the number of outstanding messages a single initiator can send the server, and it is the server's responsibility to divide up a global pool among all of the initiators.

(In practice this means that the server needs to be manually configured to know how many initiators it has.)
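
A minimal sketch of that division, assuming the server simply splits its global pool evenly among a manually configured number of initiators. The function and parameter names are hypothetical, and a real target could use some other policy.

```python
def per_initiator_buffer_count(global_buffers, initiators):
    """Buffer Count to advertise to each initiator.

    Splits a global pool of message buffers evenly, rounding down so the
    initiators together can never have more messages outstanding than the
    server can actually queue.
    """
    if initiators < 1:
        raise ValueError("need at least one initiator")
    return global_buffers // initiators


if __name__ == "__main__":
    print(per_initiator_buffer_count(64, 4))
```

Rounding down is the conservative choice here; it can leave a few buffers unused, but it never promises more than the pool holds.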

AOEProtocolProblems written at 23:39:52

2007-07-05

What OpenID is good for

Given my earlier entry that talked about OpenID's limitations, one might wonder what it's good for. There are a number of good uses for it that I can think of:

  • it's a better signature than just asking people to leave their name or their website, in that it's harder for other people to forge and it does more to tell readers who the signer is.

  • an OpenID identity makes a better password than yet another piece of text that both you and the user have to keep track of, and if you want to you can also use it for the user's login, so that they don't have to keep track of that either.

    This means that you can make a very lightweight user registration system by just asking people for their OpenID; they don't have to go through the hassle of coming up with an email address for you, or a login name, or a password.

    (Of course one motive for collecting people's email addresses when they register is so you can later email them marketing stuff, but this is one big reason why people are so reluctant to give them to you.)

  • you can easily give accounts on your website to specific people who have OpenIDs, especially if you know their OpenIDs already.

Another way to put this is that OpenID means users have to keep track of fewer identities; instead of a mostly separate identity per website, they just have an OpenID identity. At the extreme, websites don't even ask you to register, they just give you opportunities to naturally use an OpenID, give you a cookie to keep track of it, and then start offering you additional features as long as you're 'logged in'.

(The clever way would be to give you two cookies, one for your session and one long-term one that just marked that you had given the website an OpenID at some point. Then later if you visit with only the long-term cookie, the website can show you a more prominent 'log in here with your OpenID to re-establish all your personalizations' login box.)
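
The two-cookie idea can be sketched as a small decision function. The cookie names and return values here are invented for illustration; a real site would also need to validate the session cookie rather than just check that it is present.

```python
SESSION_COOKIE = "session"        # hypothetical short-lived session cookie
SEEN_OPENID_COOKIE = "had_openid" # hypothetical long-term marker cookie


def login_prompt(cookies):
    """Decide which login box to show, given the request's cookies.

    Returns "none" for a logged-in visitor, "prominent" for someone who
    has given the site an OpenID before but has no current session, and
    "standard" for everyone else.
    """
    if SESSION_COOKIE in cookies:
        return "none"
    if SEEN_OPENID_COOKIE in cookies:
        return "prominent"
    return "standard"


if __name__ == "__main__":
    print(login_prompt({"had_openid": "1"}))
```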

OpenIDUses written at 23:26:25

2007-07-04

What OpenID is (and is not)

Put simply, OpenID lets you prove that you are associated with a URL. More specifically, it is a protocol for letting your website ask the remote URL if some visitor is associated with it.

This neatly points to the issue with putting too much weight on someone just having an OpenID: you have no idea how the remote URL makes that decision. It is perfectly possible to create an OpenID server that always says 'yes, that person is associated with me' when asked, and in fact it's been done.

This means that an OpenID in general is only a weak identity; anyone can have one or many and a given identity may have any number of people using it, much like a website login posted to bugmenot.

(This is ultimately why LiveJournal considers 'people with an OpenID' to be in the same class as entirely anonymous users, because they are. Someone with an OpenID has just gone to slightly more work than the completely anonymous people.)

If you want stronger identity information about people, you need to restrict what sorts of OpenID remote URLs you accept, because then you can know more about the policies those URLs use. The ultimate case of this is using known OpenIDs to identify specific people instead of forcing them to get a new identity on your site.

(As has been noted by Simon Willison, you may still want to ask people to register, but OpenID can save them from having to make up a new account name and password for you.)
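
One simple way to restrict which OpenID URLs you accept is a host allowlist, sketched below. This is an assumption about how such a restriction might look, not a description of any particular OpenID library; the host names are placeholders, and the check only looks at the URL itself, so it has to sit in front of (not replace) the actual OpenID verification.

```python
from urllib.parse import urlsplit

# Placeholder hosts whose identity policies you know and trust.
TRUSTED_HOSTS = {"openid.example.org", "id.example.com"}


def is_trusted_openid(url, trusted_hosts=TRUSTED_HOSTS):
    """Return True if the OpenID URL is on a host we have decided to trust."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    return host in trusted_hosts


if __name__ == "__main__":
    print(is_trusted_openid("https://openid.example.org/alice"))
```

Identifying specific people is then the narrowest version of this: instead of a set of hosts, you keep a set of complete OpenID URLs that you already know.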

WhatOpenIDIs written at 23:03:16

