2012-01-30 HTML is not an SGML dialect and never really has been

There is a persistent story that makes the rounds in the web specification world (for example, in this otherwise realistic article on XHTML) that HTML is an SGML dialect but web browsers persistently mishandle and mis-parse certain SGML features such as minimization. Although I have pandered to this belief before, it is false in practice and in reality.

HTML is really a documentation standard; the standard followed existing practice rather than preceding it. In the very beginning, people just created browsers and a vague format that the browsers understood. This format was inspired by SGML, but it was never an SGML dialect and as such it never had various obscure SGML features. At some point, when people in the W3C were writing down the HTML standard of the time (or perhaps evolving it), they decided to 'fix' this obvious omission by writing into the new version of the HTML specification that it was an SGML dialect.

(Looking at the historical specifications via Wikipedia, this appears to go as far back as HTML 2.0.)

You can guess what happened next. All of the browsers of the time promptly ignored this new bit of the standard, and pretty much every browser written since then has as well; none of them has ever parsed HTML as SGML, with support for all of the little odd SGML features that that implies. HTML may be an SGML dialect as far as the W3C standards and their validator are concerned, but it is not in real life, and anyone who writes HTML believing otherwise is going to have problems.

As you might expect, HTML5 very firmly puts a stake in this particular issue; the current spec draft says explicitly (emphasis mine):
Perhaps someday all of the common HTML validators will be updated to understand HTML as it really is.
2012-01-19 How not to do repeated fields in web forms

There's a certain sort of web form which really wants to make sure that you've entered something correctly, so it asks you to enter it twice in two different fields. You've probably seen this in some web form sooner or later; this is the 'please enter your password again in this field too' or 'please re-enter your email address' field. I tend to think that this is bad on its own, but I've now seen an even worse implementation of this basic idea, which I'll call an anti-confirmation field, one that's practically designed to create errors.

What the people behind this did was quite simple: they made it so that their second fields would not accept pasted input (probably using JavaScript, which I had on because I didn't feel like finding out which bits of the registration process required it). I had to retype both my email address and my password by hand, which was especially annoying because I was pasting both of them from elsewhere. I call this an anti-confirmation field because of course retyping things by hand is more error-prone than pasting things in; in fact, I twice made a mistake retyping the password.

(My web password for this site was a strong random password, as usual. Random jumbles are hard to transcribe accurately by hand, especially when they jump back and forth between character case.)

I suspect that the website designers justified this by saying that they were worried about people entering a bad email address by hand in the first field and then 'confirming' it by just copying and pasting it into the second field. However, even at its best this logic doesn't work for password fields, since browsers don't let you copy the plaintext content of a password field once you've entered it. I also suspect that the designers do not have any actual data on how many genuine errors this prevents (versus how many artificial errors are created).
Sidebar: how to measure the numbers

Assuming that you've committed yourself to (anti-)confirmation fields in the first place, you just need to track field values across time when a submission fails because of mismatched fields. In a transcription error the first of the two fields will turn out to be correct (ie, the same as the final submitted value) and the second field will change. In a genuine error the first field will be different between the failed submission and a subsequent valid one.

Doing this with email addresses raises basically no security issues. If you do this with the password field you'll want to one-way hash them somehow in your tracking data.

(5 comments.)
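The classification described in this sidebar can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `TRACKING_KEY` secret, and the choice of HMAC-SHA256 as the one-way hash are all hypothetical and are not taken from any real site's code.

```python
import hashlib
import hmac

# Hypothetical server-side secret used to key the hashes (an assumption,
# not something the entry specifies).
TRACKING_KEY = b"replace-with-a-real-secret"


def fingerprint(value: str) -> str:
    """One-way keyed hash, so raw passwords never land in the tracking data."""
    return hmac.new(TRACKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def classify_failure(failed_first: str, final_first: str) -> str:
    """Classify one mismatched-fields failure.

    failed_first: fingerprint of the FIRST field in the failed submission.
    final_first:  fingerprint of the first field in the later valid submission.
    """
    if failed_first == final_first:
        # The first field was right all along; only the retyped copy was wrong.
        return "transcription"
    # The first field itself changed, so the check caught a real mistake.
    return "genuine"
```

Counting the two outcomes over time gives the ratio of artificial errors to genuine ones. Email addresses could be compared directly instead of being hashed, since the sidebar notes they raise basically no security issues.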
AntiConfirmationFields written at 22:59:00
2012-01-17 The first browser blinks on XHTML parsing

I'm late to the party, but Opera has decided to stop strict parsing of XHTML (via Sam Ruby):
I have long said that draconian XHTML error handling is an unstable equilibrium and it would only last as long as all of the browser vendors didn't blink. Well, Opera has blinked; they've picked the user friendly alternative over the strictly standards compliant one (or semi-strict, since they apparently already offered an option to reinterpret the page as HTML, unlike eg Firefox). It now remains to be seen how long it will be before other browser vendors do the same thing.

(I expect Firefox to be the last holdout because Firefox people are in some ways very user hostile in the name of doing 'the right thing'.)

While this Opera blog entry was about a development snapshot, the announcement for Opera 11.60 mentions this as a feature of 11.60. So this is now out there in the wild in a general release browser.

(Now I'm wondering if someone has made or could make a Firefox extension to do the same thing.)
2011-12-31 Why CA-based SSL is not likely to be replaced any time soon

A whole lot of things have been written this year about better versions of SSL designed to get away from the many practical weaknesses of the current CA-based SSL model (you know, the one where CAs have terrible business practices, get compromised, and get leaned on by governments). Unfortunately, I don't think that any of them are likely to catch on any time soon, because of what is essentially the inverse of the false positives problem.

The reality is that forged SSL certificates are in general a very infrequent occurrence. All (or almost all) of the proposed solutions to them are at least moderately difficult; they involve additional code, additional infrastructure, additional TCP connections during SSL connections, disclosure risks for what you're making SSL connections to, and so on. In short, the proposed solutions add pain.

And the reality of the world is that taking on pain (possibly a lot of it) in order to solve a rare problem is not a successful sales pitch anywhere outside of the mathematical side of security and cryptography. In particular, browser vendors are going to be naturally unenthused about anything that makes SSL connections worse in practice (slower, failing more often, etc) in order to deal with a rare occurrence.

The corollary of this is that any realistic replacement for CA-based SSL must be cheap and simple overall. My impression is that the only possible candidate for this is SSL certificate information in DNSSec-signed DNS records. This has the virtue that it needs almost no extra connections or queries, does not require any outside infrastructure, and does not disclose your browsing to third parties. It can also be deployed incrementally.

(It has the drawback that it only works for sites that have a DNSSec trust path to the root. If your TLD has not gone DNSSec yet, you lose even if you're all ready yourself.)
2011-12-28 Blog entries should have visible dates (usually)

There are any number of blog packages (and blog design templates) that don't make the date of entries very obvious; it's not in the URL, it's not at the top of the page, and perhaps it's hiding in small type down at the bottom of the page. I've come to feel that this is a significant mistake, because of a combination of real blog usability and how blogs are written.

How blogs are written means that you probably never go back to revise and update old entries, even if the information in them is now out of date. However, many visitors are arriving at your entries through web searches and these searches will happily return those old entries, whether or not their information is still useful. So, how does a reader figure out whether what they're reading is still good or is now probably out of date? Their best clue is when the entry was written.

This is why I've come to believe that blog entries should have the date visible 'above the fold', somewhere around the entry title and other details. I don't think it has to be prominent, but I do think you should be able to find it easily when you look.

I also think it's possible to be clever about this. To view it one way, the older an entry is, the more important its date becomes. This implies that you can start out with the date not very visible, perhaps in small type and mostly faded out, and then on older entries you can progressively increase the date's visibility by doing things like fading it in and making its text bigger. How fast this should happen depends on how fast you think entries potentially go stale.

A corollary is that some blogs don't need to do this at all, either because the content of their entries already makes it clear when they were written or because the content essentially never goes stale. My ire about blogs without dates is primarily directed at things like, say, blogs with technical information.
(I'm aware that WanderingThoughts fails on this; the date isn't in the URL and is only visible down at the bottom in small sized text. I'm going to have to think about how to fix that, but it's potentially complex in DWiki's architecture.)
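As a rough illustration of the 'fade the date in as the entry ages' idea from this entry, here is a minimal Python sketch. Everything specific in it is an assumption made for the example: the function name `date_style`, the one-year default for when an entry counts as fully stale, and the particular opacity and font-size ranges. A real blog template would choose its own numbers.

```python
from datetime import date


def date_style(written: date, today: date, stale_after_days: int = 365) -> dict:
    """Return illustrative styling for an entry's date based on the entry's age.

    New entries get a small, faded date; the date grows and fades in linearly
    until the entry is `stale_after_days` old, after which it stays fully visible.
    """
    age_days = max((today - written).days, 0)
    # Fraction of the way to "fully stale", clamped to the range [0, 1].
    staleness = min(age_days / stale_after_days, 1.0)
    return {
        "opacity": round(0.4 + 0.6 * staleness, 2),       # 0.4 (faded) up to 1.0
        "font_size_em": round(0.8 + 0.4 * staleness, 2),  # 0.8em (small) up to 1.2em
    }
```

The `stale_after_days` parameter is where 'how fast you think entries potentially go stale' would be expressed; a blog whose content goes out of date quickly would use a smaller value.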
2011-12-14 Practical issues with REST's use of the Accept header

In full blown REST, a single resource can have multiple representations (formats). You decide which representation to ask for and to serve by what is in the HTTP

The issue I have with this in practice is that it makes the Accept header part of the name of the specific representation of the resource (okay, this is really the MIME type). You cannot talk about, for example, 'the URL of the Atom feed'; you must specify both the URL and the type when you are talking about it.

By 'talk about' I mean more than just 'share with people'. Any time you want to retrieve a specific representation of the resource, the tools you use to do this must be told the type as well as the URL, or must default to the right type. (And if the tool doesn't support specifying the type or has an awkward procedure for this, you lose.)

This is a problem. To start with, we simply don't have an agreed on notation for 'a URL in a specific type' in the same sense that we have a notation for 'a URL'. Names and notation are important, and lack of good notation is a serious drawback because (among other things) it gets in the way of communication: communication between people, communication between people and programs, and communication between programs (and even inside programs).

Of course, REST has a good reason for doing this. From my outsider perspective, the problem REST is solving is auto-discovering how to retrieve a specific representation of a resource. If you have a base URL, in a pure REST environment you now know how to retrieve any representation of the URL that's available. Want a JSON or an Atom feed version of URL <X>? Request <X> with an

Of course this raises questions in practice about what the equivalent of a URL is in another representation. For example, what is the proper Atom feed representation of the front page of a blog that shows only the most recent five entries: is it an Atom feed with only the five entries on the front page, or is it a full-sized Atom feed with more entries? I suspect that the proper REST answer is the former while most people would consider the latter to be more useful.

I'm a pragmatist. I care a lot about clear names and easy communication, and I live in a world of imperfect tools (many of which were designed before the

(One comment.)
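To make the 'you need both the URL and the type' point concrete, here is a small Python sketch using the standard library's `urllib.request`. The URL `http://example.org/blog/` and the helper name `build_request` are placeholders invented for illustration, and whether any given server actually honors the Accept header this way depends entirely on that server.

```python
import urllib.request


def build_request(url: str, mime_type: str) -> urllib.request.Request:
    """Build a GET request for one specific representation of a resource.

    In content-negotiated REST the 'name' of a representation is effectively
    the (url, mime_type) pair; the URL alone is not enough.
    """
    return urllib.request.Request(url, headers={"Accept": mime_type})


# Same (hypothetical) URL, two different representations.
atom_req = build_request("http://example.org/blog/", "application/atom+xml")
json_req = build_request("http://example.org/blog/", "application/json")

# Actually fetching one would be: urllib.request.urlopen(atom_req)
```

Note that every caller has to carry the MIME type around next to the URL, which is exactly the naming and tooling burden the entry is describing.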
PracticalRESTAccept written at 23:51:44
2011-12-03 Why I have comments here

Recently I read Comments Off (via) and what struck me about Matt Gemmell's framing of the issue is that he seems to view having comments as something that you do for your blog's readers. What this says to me is that Matt Gemmell and I have significantly different views of comments.

Let me tell you a little secret: I don't have comments here for my readers, at least not primarily; I have comments here for me. If that ever were to change, if I stopped feeling that comments were a benefit to me instead of just (possibly) my readers, then I'd stop having them (either quietly or not).

(If I had comments primarily for my readers, I would probably structure things here to make them more prominent.)

There are several ways that comments are a benefit for me. To start with, I enjoy reading them. People's comments here teach me things, they give me ideas, they correct my mistakes, they keep me on my toes and in general keep me honest, they're periodically entertaining, and they make me change my mind every so often (that's a random example). On top of that, I have received a few comments that are hugely valuable in their own right, to the point where I feel they have pretty much justified the entire effort to implement and manage comments here all by themselves.

(As examples, a comment led me to pca, which has for years been the only sane way to manage patches on Solaris. Another comment introduced me to dmenu, which in less than a year has significantly changed how I use my long standing environment. Both of these changed my computing life significantly for the better and I doubt I'd have found either on my own.)

If you don't feel that comments on your blog are there for you, if you're really only supporting comments for your readers, then I tend to agree with what Matt Gemmell says. I think that it's hard to arrange a setup where comments really benefit your readers, and doing so takes a lot of work and a certain amount of luck (plus you're probably going to have to become a community manager in addition to a blogger). In today's world, not supporting comments in this situation makes a lot of sense.

(To be very clear: I don't think that comments here degrade the experience for my readers. I just think that they're probably a neutral thing because I expect that most people don't read comments. And I can't expect them to; unless you go to a lot of effort to build a community that people find appealing, people generally are going to be coming to your blog because they want to read your writing, not other people's. (And I am no exception to this.))

However, as I've written before I do feel that there are practical reasons that people like comments, although maybe those reasons are weaker in the modern web of Twitter and Facebook and so on.

(Matt Gemmell is clearly aware of the downsides of his decision, as shown by the followup email exchange he has at the bottom of his entry.)

Sidebar: an honorable mention

In the 'comments that justified having comments' category: if I read weighty things more promptly I would be able to count the pointer to Russ Cox's first article on regexps that was left here. Instead I only read the whole series when the second and third parts appeared a couple of years later and made Hacker News.

(2 comments.)
WhyCommentsHere written at 00:59:47
2011-11-28 The login name problem

I have been vaguely considering getting a Twitter account or two for a while, but so far haven't done so. As before, the big stumbling block is that Twitter makes you pick a username and none of the ones that I find even vaguely attractive are still available. My usual login name is taken, as normal (it goes fast on many services). So is my last name (probably by a relative), my first name, a variant I sometimes use, a variant of my last name, and I don't feel like going on any more (it's too depressing). I can find only one vaguely appealing variant that isn't already taken (and I'm not going to say what it is).

Part of the problem is that people on Twitter use your username, which places a premium on short and memorable ones (especially given the size limit on tweets). But beyond that it's not just that Twitter requires me to pick a username per se, it's that it requires me to pick a public, more or less permanent identifier for myself. This is a fundamental problem because, as always, good names are hard. They are hard for people to come up with and there's only a limited supply of them.

(Twitter apparently allows you to change your username, but I suspect that that orphans your old Twitter URL and probably confuses people who knew you under the old name.)

Doing this is generally not actually necessary in most web services. Flickr, Facebook, and Google Plus (despite its serious flaws) all get this right; you can start using each service without creating such a permanent identifier. Oh, Flickr and Facebook have optional permanent IDs (and G+ may as well someday), but they really are optional; you can use the service for years (even as a paying customer) without having to commit yourself to one, and everything works fine. The most that happens is that the URL for your stuff is somewhat uglier than it could otherwise be.

(To be fair, all of these services make you give (or pick) your name and Flickr makes you pick a username. However, you probably already have a name you want to use and you can change all of this stuff if you want to. I've seen people rename themselves on Flickr all the time, not infrequently to add temporary status messages.)

The usual reason to force your users to pick login names is to generate URLs for them. However, Flickr shows that this isn't necessary; you can generate ugly URLs for now and let users improve them to nice URLs later when they make up their minds. Flickr even has convenient ways of referring to people who have not done so.

(To be fair, what differentiates Twitter from Flickr here is that Twitter wants people to be able to enter tweets as essentially plain text from outside itself; Flickr is content to require you to use its special markup to refer to other Flickr accounts.)

(One comment.)
TheLoginProblem written at 00:30:21
2011-11-05 Understanding Apache's Allow, Deny, and Order directives

Suppose that you want to add some IP access restrictions to your web server, and you're using Apache. Apache supports this with its The first thing to understand about Thus we get the template for denying bad sources:

    Order allow,deny
    Allow from all
    Deny from BADIP1
    Deny from BADIP2

And the template for selectively allowing some sources:

    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
    Allow from GOODIP1

If you are a firewall person you are now wondering what the default policy is if there is no explicit match with either an The default All of this is in the documentation for
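To make the evaluation rules behind the two templates concrete, here is a small Python model of how Apache 2.2 documents `Order` as combining `Allow` and `Deny`. This is only a sketch, not Apache's code: the names `matches` and `access_allowed` are invented for illustration, and it handles only literal IP addresses, CIDR networks, and `all` as sources (no hostnames, no `Mutual-failure`).

```python
import ipaddress


def matches(ip: str, sources: list) -> bool:
    """True if ip matches any 'from' source: 'all', a single IP, or a CIDR network."""
    addr = ipaddress.ip_address(ip)
    for src in sources:
        if src == "all":
            return True
        if addr in ipaddress.ip_network(src, strict=False):
            return True
    return False


def access_allowed(order: str, allow: list, deny: list, ip: str) -> bool:
    """Model the documented Apache 2.2 'Order' semantics for one client IP."""
    allowed = matches(ip, allow)
    denied = matches(ip, deny)
    if order == "allow,deny":
        # Allow is checked first, then Deny; Deny wins on overlap and a
        # client that matches nothing is denied by default.
        return allowed and not denied
    if order == "deny,allow":
        # Deny is checked first, then Allow; Allow wins on overlap and a
        # client that matches nothing is allowed by default.
        return allowed or not denied
    raise ValueError("unsupported Order value: " + order)
```

For example, with the second template, `access_allowed("deny,allow", ["127.0.0.1"], ["all"], "127.0.0.1")` is true while any other address is refused, because the explicit `Allow` overrides `Deny from all` only for the listed source.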
2011-11-04 More on my Firefox 7 extensions

Rather than try to answer a number of comments on my original entry in more comments, I'm going to promote my replies to an entry (or more than one), and along with it I have some updates on my extensions due to things I discovered because of the comments.
I actually like sending Referer information. To my mind it's the right thing to do from a social perspective; it's basically a form of giving credit where credit is due. There are occasions when I want to suppress it, but they're rare (and I have a manual workaround when this comes up). This probably makes me a peculiar person, and certainly Referer is increasingly degrading in the face of the modern social web (but that's another entry).
For me, the most important feature of the old status bar was that it displayed a great deal of a link's target in a way that did not overlap with the content text. In the process it promoted content readability by creating visual separation between the content and the end of the window without using up too much space (less than a line of text in my usual font). This is not available in the new Add-On bar, and thus as far as I'm concerned makes the Add-On bar mostly a waste of space unless I really need to get at an addon's controls. (This is why I called it an effective disappearance.) Which leads me to a new essential extension, courtesy of another commentator: Status-4-Evar restores the old display of link targets, among other features. I believe that the status bar is now slightly taller than it used to be, but I can live with this since I can actually see where links are going once again. A useful page load status is nice to have back too. (I care about this more than you might think.)
There are two parts to this. One way to put it is that I consider the all or nothing nature of NoScript to be a feature. Before NoScript even existed I worked with JavaScript entirely off, turning it on only when absolutely necessary. Thus for me NoScript is a way of conveniently temporarily turning on a limited amount of JavaScript, instead of (temporarily) turning on all of it. I don't use the default NoScript list of permanently whitelisted websites, as I consider it to be far too permissive.

(I just checked my prefs and right now my permanent whitelist is YouTube and some internal sites at work. I think that I can trust the latter. The former is just laziness.)

At the same time Ghostery does look interesting, because it stops a lot more than just JavaScript. Unfortunately I think it's too noisy for me to use, because it really wants me to pay attention to it so that it can horrify me with how much I'm being tracked. Well, I already know that I'm being tracked a lot; I just want not to be tracked.

(Even in 'status bar only' form Ghostery keeps changing what it looks like by displaying a count of issues. That's too noisy; it should just alter the icon a bit to show 'no bugs' versus 'some bugs, you can pull up the menu if this surprises you'.)

Another commentator suggested doing cookie management by forcing almost all cookies to be session cookies (except whitelisted ones). This is an attractive notion for people who close their browsers all the time, but the problem for me is that I strive to keep a single Firefox session running for months. 'Session' cookies thus would persist for potentially far longer than I want.

(In practice most of my cookie management (ie, cookie discarding) is done in a filtering proxy. I mostly have a cookie management extension to deal with https sites and any cookies planted on me by JavaScript that I have to run.)

Now, an update on cookie extensions. It turns out that there is a Firefox 4 version of CookieSafe, with a page here and the actual source here (it requires manual installing). Since this appears to work and I found a number of limitations of CookieMonster once I started really looking at it, I've now reverted to this version of CookieSafe.

(2 comments.)
Firefox7ExtensionsII written at 01:29:48
These are my WanderingThoughts. This is part of CSpace, and is written by ChrisSiebenmann. Atom feeds are available; see the bottom of most pages.