2009-10-27
A personal experience of web browsers making bad text editors
I wrote about this topic ages ago, but I just had a very illuminating personal experience with this, and I feel like sharing it.
The one place that I pretty much have to use web-based editing is writing replies to comments here on WanderingThoughts (unless I turn a reply into an entry, which I do from time to time). Today I was flailing away trying to write such a reply (on yesterday's entry), and I just couldn't make things come together right; I knew generally what I wanted to say, but every time I started typing away it didn't come out right. And I didn't have undo, which was cramping my writing style for reasons that don't fit into this entry.
So I gave up. Instead of fighting the comment textbox, I iconified the
browser, opened up a terminal window, fired up 'vi /tmp/scratch', and
just drafted my comment in there. When I had it done enough, I copied
and pasted it back into the browser's comment textbox.
The end result is the most substantive comment I think I've written for some time. And writing it felt like it was a lot less effort than writing substantive comments here usually is.
You would think that this would not surprise me, but as I said, this was an illuminating experience; I had not expected the drawback of writing in a web browser to be quite so personal and blunt. One of the things that I'm taking away from this is that it's too easy for friction to be invisible. Another thing is that I need to look much more seriously at the Firefox extension It's All Text.
(For regular people, looking at Firefox extensions is easy. For me, not so much for some of them; my primary Firefox is still an ancient, personally compiled version that I stick with because it's the last version that supports bitmapped X fonts, and I am in love with a particular X font for browsing. Someday I will have to surrender and change browser fonts, but not today.)
2009-10-17
Automated web software should never fill in the Referer header
Yesterday, I noticed that Yahoo Pipes does a really irritating thing: if
someone has asked it to pull a syndication feed, it puts the Yahoo Pipes
info page about the feed into the Referer header of its feed requests.
Wrong.
I feel very strongly that no automated web software should fill in
the Referer header, ever. In practice and custom (if not in the
spec), Referer has a very well defined meaning; it is there to tell
webmasters where a real human visitor came from. If you do not have
a real human activating a link right then, you do not get to fill in
Referer. In practice, this means that if you are not a web browser
(and Yahoo Pipes is not), you do not get to ever use Referer.
Why not? Simple. You do not get to use Referer because doing so
makes the lives of webmasters harder. If software sprays irrelevant
and inaccurate information all over Referer, webmasters have to work
harder to remove it when they look at and analyze their logs. Making
webmasters work harder irritates them, and it's also pure wasted time on
their part.
Yes, it's useful to tell people this information. However, like other
people who want to convey this same information, Yahoo Pipes should
put it into the customary and appropriate place for it, that being the
User-Agent field. All sorts of feed aggregators and fetchers already
put this information there, so YP would have lots of company. The
potential argument that this makes the information harder to extract is
incorrect; since this use of Referer is not standardized, webmasters
need custom parsing code to extract the information regardless of where
you put it.
(Considering that YP uses a User-Agent field whose entire contents are
'Yahoo Pipes 1.0', they have lots of room to add other things. Like,
say, the URL of an overall information page about their software
agent and how it behaves.)
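As a concrete illustration, here is a rough Python sketch of what a
polite automated fetch might look like; the agent name and info URL are
invented for the example, but the point is that everything identifying
the fetcher goes into User-Agent and nothing at all goes into Referer:

    # A hypothetical well-behaved feed fetcher: identify yourself (and
    # point at an information page) in User-Agent, and send no Referer.
    import urllib.request

    req = urllib.request.Request(
        "https://example.org/blog/atom.xml",
        headers={
            "User-Agent": "ExampleFetcher/1.0 (+https://example.org/fetcher-info)",
        },
    )
    with urllib.request.urlopen(req) as resp:
        feed = resp.read()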
(I have written about this before, but that was only in the context of web crawlers, not general web software.)
2009-10-13
There are two different uses of conditional GETs
There are two different (although overlapping) uses of conditional GETs, especially with the HTTP ETag header: reducing bandwidth and reducing your computation. A successful conditional GET always reduces bandwidth usage, and it may also let you skip doing expensive operations to compute the requested page.
Reducing bandwidth is useful in general because it improves the user experience (although these days there is evidence that the big time hit comes from the number of requests, not their size), but it probably doesn't help your servers very much; most websites are not constrained by their outgoing bandwidth (although cloud services may change this). Reducing computation helps your servers for the obvious reason.
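To make the mechanics concrete, here is a minimal Python sketch of the
server side of the exchange, with hypothetical get_etag() and render()
helpers; a matching If-None-Match gets a 304 with no body, which is
where the bandwidth saving comes from:

    # Minimal sketch of serving a conditional GET. get_etag() and
    # render() are hypothetical; the shape of the exchange is the point.
    def respond(request, get_etag, render):
        etag = get_etag(request.path)
        if request.headers.get("If-None-Match") == etag:
            return (304, {"ETag": etag}, b"")          # no body sent
        return (200, {"ETag": etag}, render(request.path))

Whether this also saves you computation depends entirely on how cheaply
get_etag() can answer, which is what the rest of this entry is about.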
This implies that you need to think about what your goal is before
you start implementing conditional GET in your application. The most
straightforward and general ways of computing ETag values are all
post-facto approaches (you get the ETag value for a page as part
of generating the page itself), but obviously these only reduce your
bandwidth, not your computation.
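For example, a post-facto ETag might simply be a hash of the rendered
output, along the lines of this sketch (render() is again a hypothetical
helper that returns the page as bytes):

    # Post-facto ETag: render the page first, derive the ETag from the
    # output. A match avoids the transfer but not the rendering work.
    import hashlib

    def respond_post_facto(request, render):
        body = render(request.path)    # the full, possibly expensive, work
        etag = '"%s"' % hashlib.md5(body).hexdigest()
        if request.headers.get("If-None-Match") == etag:
            return (304, {"ETag": etag}, b"")
        return (200, {"ETag": etag}, body)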
If your goal is reducing your computation, you need to be able to check
(and possibly generate) ETag values with as little work as possible;
this implies things about your internal application architecture and
how you work out ETag values. For example, the traditional black box,
highly dynamic page templating system is not very suitable for such an
environment, because it's hard (or sometimes impossible) to check if
it's going to give you the same output without running it.
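One illustration of what 'as little work as possible' can look like: if
a page is backed by a file (or some similarly cheap-to-inspect piece of
state), the ETag can be derived from its metadata and checked before any
rendering happens. A rough sketch, with hypothetical path_to_file() and
render() helpers:

    # Cheap ETag check: derive the validator from file metadata (mtime
    # and size here) and only render on a miss.
    import os

    def respond_cheap_check(request, path_to_file, render):
        st = os.stat(path_to_file(request.path))
        etag = '"%d-%d"' % (st.st_mtime_ns, st.st_size)
        if request.headers.get("If-None-Match") == etag:
            return (304, {"ETag": etag}, b"")   # no rendering done at all
        return (200, {"ETag": etag}, render(request.path))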
(The other obvious approach to fast ETag checking is a cache with
explicit invalidation, but then you have to track which pages are
changed by some low-level change, and this too has implications
for your architecture.)
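A rough sketch of the bookkeeping that this involves, with a
hypothetical mapping from low-level data items to the pages that depend
on them:

    # Explicit invalidation: remember each page's current ETag, and when
    # a low-level data item changes, drop the cached ETags of every page
    # that depends on it. Maintaining pages_using is the hard part.
    etag_cache = {}     # page path -> current ETag
    pages_using = {}    # data item -> set of page paths that depend on it

    def invalidate(data_item):
        for page in pages_using.get(data_item, ()):
            etag_cache.pop(page, None)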