2013-01-27
How the modern web 2.0 social web irritates me by hiding discussions
Recently a discussion of my entry on what systemd gets right broke out on Google+, including both interesting stuff and some things that I want to respond to. What sucks about the modern Web 2.0 social web is that I found this discussion basically only through luck.
Oh, I knew that someone on Google+ had linked to my entry and had a bunch of readers; I could see the Referers from plus.url.google.com come rolling into my web server logs. But in common with a lot of other Web 2.0 sites, the Referer values were of absolutely no use for backtracking to the actual discussion; they were encoded and basically generic (if I visited one of them I wound up on a little interstitial 'you are about to visit an outside website, are you sure?' page).
This is more severe than my earlier irritation with Twitter about this in that G+ is actually hiding a real discussion from me (okay, people have real discussions on Twitter too but perhaps not quite as much), a discussion that in many other circumstances might have happened in the comments section of my entry (where I could directly see it and address it). Of course this is partly the result of a deliberate design decision on G+'s part; G+ wants you to have your discussion on G+, not on some outside site. From Google's perspective what happened here is not a bug but a feature.
(This is unlike a similar issue with Facebook because the discussion here on G+ is public, not private.)
This is nothing new, of course. I just feel like grumbling about it since I was so directly reminded of it.
Sidebar: How I found this discussion
Someone in my Twitter stream had linked to an earlier Lennart Poettering G+ post that reacted to the whole 'FLOS vs Unix' thing and mentioned systemd in relation to that. It struck me that if anyone on G+ was going to link to my entry at this point it might well be Poettering, so I backtracked to his G+ page and there it was. Had his earlier post not appeared on my radar I would have had no real clue.
(I knew that Poettering knew about my entry because he left some comments on it.)
2013-01-16
How I drafted (okay, wrote) an entry in public by accident
Since I tweeted about this recently, I might as well explain myself.
Sometimes there are small drawbacks to the perpetually popular file based approach to blog engines. One of them is the question of how you handle draft entries, ie entries that you're in the process of writing and that you aren't yet ready to publish. The hairshirt approach is to not do anything about them at all; your blog is only for published entries and you have to write drafts entirely outside of it. This is simple but has one to three drawbacks, depending on what you write your entries in.
If you write in HTML, it only has the problem that links to other entries you've written probably have to use a totally different style in your draft than what would be ideal in a published entry. How different they have to be depends on where you draft and preview your entry. If you write in a simple markup language you also have the problem of how you render your draft into HTML form (a job that the blog engine usually does for you one way or another). If you write in a wiki-language that has short names for links within your site you may have a third problem of getting those links to resolve properly in the rendering process.
Thus it's very attractive to have a private area of your file based blog where you can write drafts 'inside' the blog. This handles the rendering (if necessary) and displaying for you, allows you to use the exact same content and markup that you will use for the published entry, and with some moderate magic can resolve all links correctly. Wandering Thoughts is no exception; I long ago created a sort of access-restricted drafts area within CSpace (the (slightly) larger wiki environment that contains the blog).
But this means that there's an important thing you need to do before you type something like 'vi BlogspotWebFail': make sure that you're in your drafts directory.
If you skip this step and you're outside the blog's directory hierarchy entirely, there's no particular harm done; you just won't get previews. But if you happen to be inside your blog's directory hierarchy, well, a straightforward file based blog engine will happily go 'oh, a file, this is an entry' and make it publicly visible. Bonus points are awarded if you happen to be drafting the entry within your blog's directory hierarchy but at a different place than the final entry will go.
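To make the failure mode concrete, here's a minimal sketch in Python of the core logic of such a straightforward file based engine (a hypothetical illustration with a made-up content root, not the actual code behind Wandering Thoughts): anything that exists as a file under the content area is a published entry, with no notion of 'draft'.

    import os

    CONTENT_ROOT = "/var/www/blog/pages"    # hypothetical content root

    def published_entries(root=CONTENT_ROOT):
        """Yield the relative path of every file under the content root.

        A naive file based engine treats each of these as a published,
        publicly visible entry; there is no 'draft unless marked' state."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.startswith("."):
                    continue    # skip dotfiles such as editor swap files
                full = os.path.join(dirpath, name)
                yield os.path.relpath(full, root)

    # Save a draft anywhere under CONTENT_ROOT by accident and it shows
    # up here, and thus on the web, in whatever directory you were in.
    for entry in published_entries():
        print(entry)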
This is what I managed to do by accident. As a result, I wrote this entry in public, in the wrong place, and of course a version (or perhaps several versions) of the entry propagated into my syndication feed and on to at least one planet site (and a phantom version of it is likely still there in some people's feed readers, partly since syndication feeds don't have any way of retracting entries).
(I might have noticed earlier than I did if I'd tried to preview the entry while I was writing it, but I just wrote it in a big burst. Instead I only noticed at the end when I went to spellcheck it in another window; since the other window was in the drafts directory, I got a 'no such file' error and then a sudden sinking feeling.)
So. Yeah. Sorry about that, for anyone who saw oddities in the syndication feed for a while.
2013-01-14
Why JavaScript (or something like it) is in demand
I've recently formed an opinion about why people keep wanting JavaScript, or perhaps I should say instead 'come to a realization'. Put simply, people want to increase the interactivity and responsiveness of their websites.
Let's rewind to the old days of basic HTML websites, which you have to admit are neither very interactive nor very responsive. You interact with such sites pretty much only by clicking on links (or form submission buttons) and they respond by giving you a new web page, which is usually both disruptive and not really fast; your browser window clears then fills in, things may shuffle around, your position on the page almost certainly changes even if the page looks more or less the same, and so on. This is workable but very limited and not particularly attractive; in many ways it goes against almost every tenet of good GUI design that we know.
(Part of the appeal of frames was that they were less disruptive this way because not all of the web page shuffled itself around when you clicked on links.)
Modern HTML and CSS improved this by adding support for a certain amount of standardized interactive behaviors and elements (eg you can now make pure HTML/CSS dropdown menus); because they're rendered in the browser as changes to page appearance they have basically instant responsiveness and don't require your browser to fetch a new page (with all that that implies). But this support can only go so far because in practice there is a fundamental conflict between simplicity and comprehensiveness (and thus flexibility) in specifying interactivity. The more options for interactivity you offer to people, the more complex the descriptions of those options need to be. Fully general interactivity requires a very general way to specify the logic involved, to describe what happens and when.
A very general specification of logic is called 'a programming language'. You cannot have general interactivity without a general language. The more restricted you make your language the less interactivity you can create. If you want as much interactivity as people can design, you need a general language. Hence, well, JavaScript.
(In this sense HTML plus CSS is a language, although an odd and rather restricted one.)
Note that this has nothing directly to do with browser vendors refusing to implement enough specifically hard-coded interactivity features in HTML and CSS. Even if they wanted to, they simply can't make HTML plus CSS powerful enough to cover the union of what everyone wants without turning it into a programming language. And a lot of language design experience suggests that doing so would actually suck badly (see, among other things, XSLT versus the programming language of your choice, and note that XSLT is Turing-complete).
(It also doesn't directly have to do with browser vendors not wanting to implement interactivity features at all. Implementing JavaScript plus all of its DOM and event hooks is implementing interactivity features, it's just not as visibly so as, say, new CSS properties. Implementing better JavaScript may also be a better use of limited browser engineering time because the browser developers can offload developing specific interactive behaviors to page designers.)
Now, you can have a vision of browsers as environments that should have deliberately limited interactivity and responsiveness and then argue from this vision that they should not support JavaScript and other ways of enabling 'too much' interactivity. There are even some reasonably rational arguments you can make for this position. However, this argument is doomed to be very unpopular because people want to make pretty much everything more interactive and responsive (since that's a great way of improving the whole experience of using almost anything).
(This realization is undoubtedly obvious and well-trodden ground among the general web development community. I'm slow sometimes and I'm only an occasional tourist in this area.)
Sidebar: one potential rational argument against flexibility
The short version: common and unified interfaces. The more flexible you make interactivity, the more different websites can be from each other. You can argue that a great benefit of the browser over other interfaces is that everything has a common UI (especially in old school HTML) and that browsers should continue to act to enforce a relatively common interface by (among other things) limiting people's ability to create non-standard interactivity.
(Standard interactivity is offered through the browser's HTML and CSS, because the browser can control what you can do and how the result looks to give everything a uniform appearance and interface.)
If you want to make this argument you should be prepared to also argue against a number of current HTML and CSS design options because they do just as much damage to a common 'web UI' (if not more). For example, every CSS option that allows designers to confuse people about what is and isn't a link.
Sidebar: the purpose of CSS and HTML in this model
In this model, CSS (and HTML with it) has two purposes. The first is to be the rendering engine for the (JavaScript-based) interactivity logic. The second is to encapsulate sufficiently common patterns of interactivity into a form where you can express your intentions directly (and then the browser can execute them directly), much like how and why people add new functions to a standard library.
(As a consequence, if all you want is sufficiently common interactivity you can create it without needing JavaScript at all.)
My impression is that you can see both sorts of additions in modern CSS.
(I think I may have now horrified all of the designers who care about CSS. I've sort of horrified myself, honestly.)
Good JavaScript usage is a good thing
A commentator on yesterday's entry wrote the standard sort of anti-JavaScript note. Quoted in part:
[...] 99.9% of websites do not need scripting. (Perhaps 100% -- google maps should be a standalone program.) I am convinced JS is a make-work phenomenon by a large amount of underemployed web-monkeys, desperate to create "work" that is not needed. [...]
I disagree strongly. Despite my grumbles yesterday and despite my use of NoScript to mostly block JavaScript in my own browser, I am all for well-done JavaScript usage. There are two things that JavaScript is really good for.
The first and most obvious is enabling interactive web applications that would not be possible otherwise. Google Maps is a good example, as are a number of other highly interactive and responsive websites (some of them operated by Google). The second is for augmenting already functional websites with additional useful interactivity, a topic I have some experience with. There are any number of websites that do this (if you browse with JavaScript on you probably don't even notice this).
It is no accident that I've emphasized interactivity in both cases. The problem that JavaScript fixes is that without it your only real way of interacting with a web page and a web server is by clicking on things and generally getting a new web page. Clicking on things is a fine way of getting stuff done, but the problem is that it is obtrusive, limited, and explicit. There is a universe of other ways of interacting with an application (eg clicking and dragging) and another universe of automatic actions that an application might want to take without requiring the user to explicitly invoke things with a click.
What's bad about JavaScript is threefold. First, people have used it to do evil things. Second, people have used it to create terrible, obnoxious interfaces. And third, what I ranted about yesterday, people have used it to destroy otherwise functional websites. But none of these are intrinsic to JavaScript itself. JavaScript is a tool that can be used well or used badly.
Sidebar: Why I disagree that Google Maps should be a program
I will put it simply: if Google Maps were a program, it's likely that I would not be able to use it and it's even more likely that people using FreeBSD would not be able to use it. The advantage of JavaScript applications is that they are more or less portable more or less for free.
(This is not just portability between operating systems. It is also portability between Debian, Fedora, Ubuntu, SUSE, and so on, and between old distribution releases and new ones.)
2013-01-13
Blogspot's massive web 1.0 failure
Once upon a time Blogspot was a popular independent blogging system (cf), one of many at the time. It didn't entirely prosper, so Google bought it and folded it into the massive Google empire. Somewhat recently (ie in the past year or so) Google started rolling out some changes that I happen to think are a terrible idea, and I've finally reached the point where I feel like ranting about them. Well, especially about the larger and most recent change.
Put simply, some Blogspot blogs actively require JavaScript in order to get any content at all. If you visit such a blog without JavaScript, you get basically nothing (okay, a list of useless links). This is not a case where the content is hidden or mangled until JavaScript sets up the page structures; this is a case where the content is not there at all until it's loaded with JavaScript. As such it's the very opposite of 'graceful degradation'.
(Making it worse for me is that such blogs don't work in my main Firefox for some reason even if I turn JavaScript fully on. I'm not sure why but the whole situation irritates me so much that I don't care about such blogs.)
This strikes me as extremely stupid. In my opinion it's hard to get more 'Web 1.0' than blogs; blogs are, well, text. And these days, pictures. It is not as if blogs are a Web 2.0 thing, like Google Maps or some complex interactive web application. What does dynamically loading all content on the fly through AJAX actually do for a blog, especially when you still give each entry its own URL?
(I'm sure it enables some fancy-dancy design tricks and so on. I'm just questioning if said fancy design tricks are actually worth losing graceful degradation through a great big middle finger to all of the people who do not see JavaScript-based content.)
Since there are still plenty of Blogspot blogs that don't work this way (for example, SysAdvent), I assume that this is some new blog 'theme' that Blogspot has rolled out and that (for now) people have to opt in to. Unfortunately I see an ever-increasing number of Blogspot blogs that suffer from this (which ensures that I don't read their entries when they get shared on Twitter et al).
By the way, I'm aware that I'm an old fogie here and that 'everyone' (for suitable statistical values of everyone) has JavaScript turned on. I'm sure that Google, land of measurement and statistics, has plenty of numbers that show that a minuscule number of Blogspot visitors are no-JavaScript people.
(If you feel like a conspiracy theory, note that a bunch of money-earning Google features, including some parts of Google Adsense, require JavaScript. Google thus has a motive for pushing people to enable JS for at least Google's domains.)
Sidebar: the smaller Blogspot change
The first and smaller change was that all blogspot.com URLs started redirecting non-US visitors to a country-specific version of blogspot (for that URL); if you visited, say, fred.blogspot.com/entry from Canada you got redirected to fred.blogspot.ca/entry. Allegedly this is so that Google could more easily do country-specific blog blocking if they were legally required to do so. The two problems with this change are that it basically destroys URL history for people outside the US and that it results in a proliferation of URLs for the exact same content. Making the latter situation worse is the fact that if you visit a country specific URL you don't get redirected to the 'proper' version for you; you stay on that country specific URL.
(If you follow people outside the US on modern social networks and they share URLs with you, you may have noticed a proliferation of .ca or .co.uk or whatever blogspot URLs. This is why.)
2013-01-06
24 hours of Atom feed requests here
Because I'm interested in this sort of thing, I decided to generate some statistics on 24 hours of Atom syndication feed requests for Wandering Thoughts. Mostly I'm going to report the relatively raw numbers, although later I'll probably do detailed analysis of one aspect.
The big numbers:
- 2,671 HTTP requests, or one every 32 seconds if distributed evenly.
(They seem to have been reasonably evenly distributed through the day, although there are some shifts between hours; the peak hour was between 6am and 7am Eastern.)
- Requests were made for 95 different feed URLs. Mostly this shows the
hazards of excessive generality; there are a lot of web crawlers that
request Atom feeds for useless virtual directories and so on.
- The most popular feed is the main blog feed (2,010 requests), very distantly followed by the python category (185 requests), the tech category (72 requests), and so on.
The rest of this analysis is going to focus on the 2,010 requests for the main blog feed so that I don't have to worry about the effects of random crawler requests for random feeds.
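(For the curious, none of these numbers require anything more clever than counting fields in the web server's access log. The following Python sketch shows the general idea, assuming an Apache-style combined log format, a made-up log file name, a placeholder feed URL, and a log that has already been narrowed down to syndication feed requests; it's an illustration, not the script I actually used.)

    import collections
    import re

    LOGFILE = "access.log"        # assumed log file name
    MAIN_FEED = "/blog/atom"      # placeholder for the main feed's URL

    # Rough match for an Apache combined log line:
    #   host ident user [date] "METHOD url HTTP/x.y" status size "referer" "agent"
    LOGRE = re.compile(r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
                       r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]*)" '
                       r'(?P<status>\d{3}) \S+')

    by_url = collections.Counter()
    methods = collections.Counter()
    statuses = collections.Counter()
    protos = collections.Counter()

    with open(LOGFILE) as fp:
        for line in fp:
            m = LOGRE.match(line)
            if not m:
                continue
            by_url[m.group("url")] += 1
            if m.group("url") == MAIN_FEED:
                methods[m.group("method")] += 1
                statuses[m.group("status")] += 1
                protos[m.group("proto")] += 1

    print("requests per feed URL:", by_url.most_common(5))
    print("main feed methods:", methods)
    print("main feed HTTP versions:", protos)
    print("main feed status codes:", statuses)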
Almost all of those requests were GET requests; 1,986 GETs to
24 HEADs. Every HEAD request was from Google Producer.
Given the documentation at that link, I have no idea why it's issuing
HEAD requests for my actual Atom feed (but good luck getting anyone
in Google to explain).
HTTP/1.1 was by far the most dominant HTTP protocol; there were 1,528 HTTP/1.1 requests to 482 HTTP/1.0 requests.
Broken down by HTTP response codes, there were 1,312 304's (content not modified) to 646 200's and 52 403 permission denials. All of the 403's were for what my code identified as bad web robots (which are not supposed to crawl my Atom feeds), and most of them (48) were from a single IP address (178.63.170.37) with a bad user-agent.
Those 2,010 requests came from only 200 different IP addresses, although no single IP was a huge traffic source; the most active single IP made 133 connections and many made fewer, while 85 IP addresses made only one request. 111 different IP addresses made requests that got 304 Not Modified responses, of which 35 were IP addresses that made only one request. This is not really great: of the 115 IPs that made multiple requests, only 76 ever got a 304 back, which implies that about a third of them probably don't implement proper conditional GET support. I'm sad to see that the most prolific source of requests also didn't seem to support conditional GET; all of its requests got status 200 responses.
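The 'probably don't implement conditional GET' inference here is simply that an IP which fetched the feed more than once and never got a 304 back almost certainly isn't sending If-Modified-Since or If-None-Match headers. Here's a small sketch of that check in Python, operating on already-parsed (IP, status) pairs with made-up example addresses:

    import collections

    def suspect_ips(records):
        """Given (ip, status) pairs for main feed requests, return the IPs
        that made more than one request but never got a 304 back, ie the
        ones that probably don't do conditional GET."""
        per_ip = collections.defaultdict(list)
        for ip, status in records:
            per_ip[ip].append(status)
        return sorted(ip for ip, sts in per_ip.items()
                      if len(sts) > 1 and "304" not in sts)

    # Made-up example: the first IP polls repeatedly and never gets a 304,
    # so it gets flagged; the second does conditional GET properly.
    sample = [("192.0.2.1", "200"), ("192.0.2.1", "200"), ("192.0.2.1", "200"),
              ("192.0.2.2", "304"), ("192.0.2.2", "200"), ("192.0.2.3", "200")]
    print(suspect_ips(sample))      # prints ['192.0.2.1']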
(Most of the other active sources seem to have gotten plenty of
status 304 responses, which is what should happen; if you're going
to poll an Atom feed frequently, you should implement support for
conditional GET.)
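For anyone writing feed fetching software and wondering what 'support for conditional GET' means in practice: remember the ETag and Last-Modified values from your previous fetch and send them back as If-None-Match and If-Modified-Since; if the server answers with a 304, nothing has changed and you reuse your cached copy. Here is a rough sketch using Python's standard library (with a placeholder URL; a real poller would also persist the validators between runs and send Accept-Encoding: gzip to get HTTP compression):

    import urllib.error
    import urllib.request

    FEED_URL = "https://example.com/blog/atom"    # placeholder URL

    def fetch_feed(url, etag=None, last_modified=None):
        """Fetch a feed with conditional GET.

        Returns (body, etag, last_modified); body is None when the server
        answers 304 Not Modified."""
        req = urllib.request.Request(url)
        if etag:
            req.add_header("If-None-Match", etag)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            resp = urllib.request.urlopen(req)
        except urllib.error.HTTPError as e:
            if e.code == 304:
                # Unchanged: keep the cached feed and the old validators.
                return None, etag, last_modified
            raise
        body = resp.read()
        return (body,
                resp.headers.get("ETag", etag),
                resp.headers.get("Last-Modified", last_modified))

    # The first poll gets the full feed plus validators; later polls send
    # the validators back and normally get a cheap 304 instead.
    body, etag, lastmod = fetch_feed(FEED_URL)
    body, etag, lastmod = fetch_feed(FEED_URL, etag, lastmod)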
Of the 646 requests that got status 200 responses with content, 558 clearly came from clients that support HTTP compression while 64 clearly came from clients that don't (I can tell by the response sizes). These no-compression requests came from 15 different IPs, several of which made multiple requests. A number of those sources appear to be disguised web spiders.
(I am pretty certain that anything that claims to be running 64-bit Ubuntu with Firefox 8.0 is kind of shading the truth. A lot.)
In other trivia, it appears that the most popular User-Agent value for people pulling the feed that day was 'user-agent' (sic). I think the sources using that value are probably legitimate, at least for that day. They didn't do conditional GET, though, which makes them somewhat annoying. (Both of the IP addresses involved did do HTTP compression.)