Wandering Thoughts archives

2007-01-29

A gotcha with <textarea>

Textareas are one of those treacherous areas of web programming, because it is really easy to get them 95% right and then never notice that you've fumbled the remaining 5%. The problem case is textareas with initial content, for example blog comment previews; what almost completely works is to just put the raw content into the textarea in your HTML. This approach makes intuitive sense and even works fine if you test by throwing markup in, like '<h1>this is a test</h1>'.

There are only two noticeable problems with this, both of them obscure:

  • any valid entity references in the text will be decoded to their real character values, so '&lt;' turns into '<'.
  • if there's a literal '</textarea>' in the text, it will become the end of the textarea (and your page layout may explode).

Since most people using your website don't do either of these, the simple solution works almost all of the time.

The real problem is that people develop the wrong mental model of what <textarea> does. They think (just as I thought when I started to write DWiki) that <textarea> means 'the browser treats as completely literal all the initial content that I insert here'. The defect with this mental model is exposed by putting a '</textarea>' in the initial content you insert into a textarea: how is the browser supposed to tell the </textarea> you inserted (that it is supposed to ignore) apart from the real </textarea> in your HTML that closes the textarea? The answer is that it can't, and thus that the mental model is wrong.

What is actually going on is that browsers treat the contents of <textarea> as what the HTML 4.01 specification calls #PCDATA: 'document text', in which character entities are allowed and interpreted (technically markup is forbidden; in practice browsers treat it as literal text). It has to be this way; since HTML has no other quoting mechanism besides character entities, allowing character entities is the only way to escape your inserted '</textarea>' so it doesn't terminate the textarea.

This means that you need to quote at least some things in your textarea initial content; minimally '&' and '<', but if you already have a general HTML quoting function (and you should), just use it and be done. (The browser will strip this quoting when it creates the actual initial contents, and thus you will get back the unquoted version when the user POSTs for the next round.)
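The quoting described above can be sketched in a few lines of Python (the language DWiki itself is written in); this is a minimal illustration using the standard library's `html.escape`, not DWiki's actual quoting function, and `textarea_with_content` is a hypothetical name:

```python
import html

def textarea_with_content(text):
    # Escaping '&' and '<' is the bare minimum; html.escape also
    # escapes '>' and quote characters, which is harmless inside a
    # textarea. This turns a literal '</textarea>' in the user's text
    # into '&lt;/textarea&gt;', so it can no longer close the element.
    return "<textarea>%s</textarea>" % html.escape(text)

# A user comment containing both problem cases: an entity reference
# and a literal closing tag.
print(textarea_with_content("use &lt; for '<', and </textarea> is safe"))
```

The browser undoes this escaping when it builds the textarea's initial value, so the user sees (and re-submits) the original unquoted text, exactly as the entry describes.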

TextareaGotcha written at 23:57:52; Add Comment

How to have your web spider irritate me intensely (part 2)

In the spirit of previous cleverness, here's a simple new technique:

Have your web spider make up random Referer headers.

This wasn't Referer spamming, since the websites in the Referer headers were completely random URLs, apparently drawn from legitimate sites around the Internet (often repeated). Nor were the websites ones that actually linked to us, or had any relationship to the URLs that were being crawled.

Even in low volume this is a sure-fire ticket to our kernel level IP filters, since it ensures that we're mostly unable to get anything useful from our Referer logs without a lot of additional work and is therefore deeply irritating.

Today's offender is the IP address 212.52.80.101, which is an unnamed iol.it IP address; it is using a User-Agent value of 'Mozilla/5.0 (arianna.libero.it,ariannaadm@pisa.iol.it)'. It does seem to have requested robots.txt, but of course the User-Agent string gives no clues as to what User-Agent setting in robots.txt will turn it off. Ironically it appears to respect nofollow, unlike many other better-behaved web spiders.

HowToGetYourSpiderBannedIV written at 12:56:32; Add Comment

2007-01-28

Why DWiki doesn't use fully REST-ful URLs

REST is a style of web application writing where, among other things, you use simple structured URLs to represent resources instead of heavily parameterized ones. For example, 'http://example.com/users/cks/' is a RESTful URL but 'http://example.com/users?name=cks' is not.

(RESTful URLs are virtuous for a number of reasons, including being less alarming to search engines and being simpler, so it's easier for people to remember them and pass them around and use them. On the other hand, non-RESTful URLs map more directly to what the web application is actually doing, so the application doesn't need to decode and crack apart the URL to determine what to do.)

DWiki URLs are mostly but not entirely RESTful; things like the oldest 10 blog entries are '.../blog/oldest/10/', but actions like adding a comment use URLs like '.../Entry?writecomment'. I chose to use URL parameters for handling actions because that way I could guarantee there never would be a name collision between an action and the name of a real page.

This name collision issue comes up because a fully REST approach overloads the URL; it both names a resource and specifies what you want to do with it. If a given URL can have both sub-resources and things done to it, you have a potential for name collision, and either way you lose. Since at least some DWiki URLs have this potential problem, I opted to punt and go with explicit URL parameters for actions. (Well, usually. Logging in to DWiki uses a synthetic page with a name that can never be valid. I could have given actions similar illegal page names, but that would have made their URLs look ugly.)

For URLs that are more user visible, like '.../blog/oldest/10/' and '.../blog/2007/01/', I decided that I wanted pretty URLs more than I wanted to avoid the chance of name collision. Since these are only alternate views for resources that you can get at already, they just turn off if there's a name collision with a real page.

In hindsight the one blemish in the action approach is that 'show page with comments' is an action, but is for something that users will routinely see (and thus see the uglier URL). Since only real pages (not directories) can have comments, it would have been unambiguous to use REST URLs like '.../Entry/withcomments' instead of the current approach of '.../Entry?showcomments'.

(As a corollary, any action that only applies to real pages could be done that way. But I prefer to keep action handling uniform, even at the cost of somewhat uglier URLs.)

RESTNameCollisions written at 23:03:30; Add Comment

2007-01-20

Browsers are the wrong place to report HTML validation errors

A popular idea for dealing with 'malformed' HTML is to have the browsers warn users about it (the most recent example I've run across is in comments here), on the theory that this will cause authors to make their HTML validate. Unfortunately, doing this is about as useful as showing error pukes to website visitors, and for the same reason: it is reporting the problem to the wrong person.

Almost everyone visiting your site is a visitor, not the site's author. It follows that almost every time this hypothetical 'page is malformed' error would go off it would go off to a visitor, who can't do anything about the problem, instead of to the site's author (who can).

The usual retort is that the site's author can visit the page as the final step in publishing and see the warning and do something about it. This is a marvelous theory, but (I argue) incorrect in fact, in part because it assumes that site authors actually bother to check their work, and in part because it assumes that site authors are going to notice a little status notice any more than they notice any of the other little broken things that they let slip by now.

(And if site authors do care about validated HTML they are probably already using one of the validation tools to check their pages, and this feature would not be a particularly big bonus to them.)

This is also a terrible feature from a pragmatic user interface point of view: on today's Internet, it would be the boy who cries wolf all the time, because a rather large number of the pages out there do not pass validation. Such a warning notice would be on a lot; if it is intrusive it gets in your face almost all the time (about something you can't do anything about), and if it's not intrusive it's pretty much a noisy waste of space. This is not a winning user interface element.

(But if you really want it, you can get Firefox extensions that do this.)

ValidatingBrowsers written at 00:29:18; Add Comment

2007-01-12

The easy way to get me to not comment on a weblog

It's simple: just don't offer any sort of comment preview option.

This ensures that I will never leave a comment, no matter how much I have something I want to say. Without the chance to preview my work before I'm committed to it, I feel too nervous and it becomes too much of an annoyance.

This is one of those blog software things that really puzzle me. It is not as if blogs all format comments in the same way, so I would think that software authors would see the attraction of people being able to check that they got it right before they post their comment. (For bonus points, a number of the 'no comment preview' blogs I've run across also have no explanation of how your comment text will get formatted, or how to include links, or etc.)

This isn't the only reason to want to see a comment preview, of course. In many cases, comment preview is the first time you see the whole text of your comment at once, since most comment form textareas are dinky and grow scrollbars once you have more than a little bit of text (which makes it much harder to get a sense of your comment as a coherent whole, since you have to hold everything out of the scroll area in your mind).

Another effect I've personally experienced is that it's hard to judge how big a paragraph or block of text will be in a comment textarea, because it is in a different font than the final version (textareas are all in monospace, and often in a different size). What feels right in the textarea not infrequently turns out to be too small and too little text when I preview it. (This can be exacerbated by links and the like, which take up extra space in the source but not in the final result.)

(And I just plain benefit from being able to step back a bit and reread my draft comment; often what looked good when I wrote it is not so hot once I preview it. I revise my draft comments a lot.)

While I have a rather limited sample, looking at comment form usage patterns here on WanderingThoughts suggests that I am not alone in this; a number of commentators here seem to go through multiple preview drafts. (Not all of them, though; some are more write and go people.)

WeblogNoComment written at 23:30:11; Add Comment



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.