2015-10-02
What creates a good wikitext dialect depends on how it's going to be used
One of the things I've learned over the course of writing DWiki and then using it here on Wandering Thoughts is that what makes a good wikitext is partly context-dependent. In this case, 'context' means 'what sort of things the wiki is going to be used to write about'. This comes about because of two things: the rule that a good wikitext dialect doesn't get in your way, and the limited supply of special characters that we have available to use as markup characters. Good wikitext markup is short, distinctive, and uncommon in your regular text. And of course what is common and uncommon in your text can depend on what you're writing about.
There are two corollaries of this. The first is that whether something is a good or bad wikitext dialect for you depends on what you're going to use it for. I feel strongly that there is no wikitext dialect that is even more or less universally good (and probably won't be until we embrace the use of uncommon Unicode characters for markup). The second is that what looks like a good wikitext dialect can turn bad (or at least 'not as good') if what you're going to use it for changes over time.
As I've mentioned before, a glaring example of this in DWikiText here is the _ character, which is used to create monospaced computer text. I originally created DWiki to write sysadmin documentation. This has the twin features that you frequently want to mark off literal computer input and output (conventionally set in monospace) and that _ is a relatively uncommon character in things like command lines. Thus it seemed to make good sense to use _ as a short, unobtrusive, yet visually distinctive marker of such text. Then I took up blogging with DWiki and started writing about programming and code; of course _ is extremely common in identifiers, and this causes heartburn for both me and commentators (errant _'s are in fact one of the most common errors here, and I've committed various hacks to try to sidestep them in common cases).
(This origin is also why bold is created with ~~ instead of the shorter single ~. Single ~'s occur periodically when you write about Unix paths. Also, looking backwards I have to admit that it turns out there are a certain number of sysadmin things that involve _'s, because people put them in file names, kernel parameters, and so on as word separators. Linux /proc is full of them, for example. This makes _ a somewhat worse choice than it seemed to me at the time, although I still don't know what I'd have used instead.)
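To make the underscore problem concrete, here's a toy illustration. This is not DWiki's actual parser, just a minimal sketch of the general kind of rule involved; a naive 'wrap monospaced text in _'s' rule happily fires on the underscores inside ordinary identifiers:

    import re

    # Toy rule, not DWiki's real one: _text_ becomes monospaced text.
    CODE_RE = re.compile(r'_(.+?)_')

    def render(line):
        return CODE_RE.sub(r'<code>\1</code>', line)

    # Intended use: marking off literal command input.
    print(render("run _fsck -n_ first"))
    # Failure mode: an ordinary identifier gets chopped up as markup.
    print(render("see the my_special_var variable"))

The second call is exactly the sort of errant-_ problem I'm talking about: underscores that are part of an identifier get treated as markup.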
(This is something that I've sort of written about before as parts of other entries but that I now think is important enough to give a full entry to, just to make it explicit and visible.)
2015-09-30
There are good and bad wikitext dialects
One of the things about wikitext dialects that I've essentially asserted in passing is that there are 'good' and 'bad' dialects. Perhaps this raises your eyebrows, either because you think all wikitext dialects are a bad idea or because you don't see what creates such a difference.
There are two possible criteria for goodness in wikitext (or in general any simple markup scheme), depending on why you're using a wikitext. For a markup aimed at empowering beginners, what probably matters is how simple, straightforward, and error-free it is. This ties into your desired style, in that it should be both easy and obvious to create the kind of content that you want, marked up and styled in the ways that you want.
For a markup aimed at smoothing the way of frequent authors, what matters is how unobtrusive and smooth the markup is. The more that you have to spray large amounts of special text all over your content (and the harder those bits of markup are to tell apart), the more the markup is getting in your way and the less it's giving you compared to just using normal HTML. This leads me to feel that good wikitext here either looks close to what you'd write in plain text or is simple and terse. The goal in both cases is to minimize the extra friction of adding and using markup.
(I feel that aesthetics matter here because you don't just write your text, you also read it as you're writing and revising. A wikitext dialect that is obtrusive or ugly winds up obscuring your actual content with its markup. If you can't easily read your content for flow in its un-rendered form, well, that's a kind of friction.)
A bad wikitext dialect is one that moves away from these virtues. It's obtrusive and verbose; it's complicated and perhaps hard to keep straight; it makes it hard to decode what the markup means at a glance (due to eg using a bunch of very similar markup characters). It gets in your way. It may make it too easy to put markup mistakes in your text or too hard to find and fix them. Overall, it contorts and distorts your writing process.
(My fuzzy view is that a wikitext dialect being incomplete doesn't necessarily make it bad. In the abstract incompleteness just makes a wikitext unfit for some purposes, but then if you're forced to use the wikitext anyways for these purposes it can turn into something that's getting in your way and thus is bad. I wave my hands.)
2015-09-19
Experimenting with Firefox's 'Reader' mode (or view)
In an exchange of comments on my entry on blocking floating web page elements, Nolan mentioned Firefox's Reader mode and I said that I'd had bad experiences with it and used 'view page in no style' instead. Well, let me take that back. When I last looked at it there were various glitches and issues that convinced me I didn't want to use it (in fact I banished it from my Firefoxes by setting reader.parse-on-load.enabled to false in about:config). But this week, sparked largely by Nolan's comment, I've been giving it another try and it's improved to the point where it's actually reasonably decent now.
(If you use NoScript, as I do, note that you have to explicitly whitelist 'about:reader' or the Reader mode displays an almost totally blank page.)
Reader mode doesn't work on everything and it's not perfect, but what I'm finding is that it often does a more attractive job than the blunt hammer of 'view page in no style'. In part this is because Reader mode makes a whole bunch of page elements go away entirely instead of leaving me to scroll through a mess of unstyled menus and top bars and so on. And in part it's because Reader mode's formatting is somewhat more aesthetic than the almost plain-text sprawl of 'no style' (which works well for some but not all web pages).
However, there is one fly in the ointment: Reader view styles unvisited and visited links the same, which irritates me. This is a real usability issue for some but not all sites, since the visited/unvisited distinction lets me tell things I haven't read (and might want to go on to read) apart from things I already have read. I've hacked a CSS modification for this into my custom Firefox build, but for some reason it's not entirely successful; some visited links still show as if they're unvisited.
(The Firefox Reader CSS source is quite clear about this; all forms of links are styled the same, explicitly including visited links.)
For a fair number of things this doesn't matter because I'm highly unlikely to follow even unread links to explore more; this is typical of things like news articles. But if I've wound up on an interesting blog entry that has hideous styling (which is far more common than I'd like), I'm much more likely to explore further and so I want that visited/unvisited marker.
(I note, apropos of the ongoing controversy over the release of iOS 9 and the popularity of adblockers for it, that Reader view also strips out ads. Since you have to load the page normally first, ad networks will still get to count a page-load-based ad impression and of course spy on you as much as possible. Unless you use an adblocker as well, which I think you should.)
Sidebar: What the unvisited-links problem may be in Reader view
I got curious and used the built-in Firefox debugger tools to look at the links in the raw HTML of the Reader view (you can't just do 'view source', because the HTML is dynamically inserted). I've been testing my patched Firefox on Wandering Thoughts itself (it's an easy source of content with visited URLs), and what my exploration suggests is that I may be running into problems because Reader view is re-encoding my URLs, changing the /~cks/ part of the URL to /%7Ecks/. In theory this should not make any difference to whether or not they are visited (the two URLs are the same), but in practice Firefox's history system seems to consider them different URLs.
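Here's a small Python illustration of the mismatch (the paths are made up; only the /~cks/ versus /%7Ecks/ spelling matters). The two forms decode to the same thing, but compared as raw strings, roughly the way an exact-match lookup would see them, they differ:

    from urllib.parse import unquote

    # Made-up paths for illustration; only the ~ versus %7E spelling matters.
    original = "/~cks/some/entry"
    reencoded = "/%7Ecks/some/entry"

    # As raw strings the two spellings are different...
    print(original == reencoded)                    # False
    # ...but they decode to the same URL path.
    print(unquote(original) == unquote(reencoded))  # True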
(This is where I sigh a lot at the combination of technical issues colliding here.)
The good news is that this is unlikely to be an issue on most sites that I want to use Reader mode on. If I'd used something else as a test case I might not even have noticed it at all.
2015-09-13
I should have started blocking web page elements well before now
uBlock Origin has a feature where it adds a 'Block element' option to Firefox's popup page menu (and I think other adblockers do too). For a long time I ignored it; after all, I already had an adblocker (several of them, in fact), so what need did I have for yet another way of blocking ads, a manual one at that? Let me tell you, I was wrong. I was wrong about ignoring it, and wrong about what I should be using it on. I discovered this because I recently started using it, and it's made a real difference in my experience of the web.
The secret is that the thing to use 'Block element' on is not ads, it's all of those floating bits and pieces that modern websites like to put on top of their content. 'Please subscribe', 'please tweet this', 'please see this other thing'; for some reason far too many websites are far too in love with little messages and calls to action of that sort. What makes these things especially irritating is that they don't run the full width of the page. This screws up page-based scrolling and acts as a constant partial intrusion into the side of the content.
(Conventional overlaid headers and footers run the full width of the page, which I usually find less obtrusive. They may make me unhappy by taking away content space, but they no longer cause problems with page-based scrolling.)
Before I started this experiment of blocking all of those floating elements, I didn't realize how much they irritated me and degraded my browsing experience on the sites they appeared on. Now that I'm aggressively getting rid of them, well, I've been surprised at how much of a difference it makes in how I feel about some sites. They seem to have been yet another one of those low-level irritations that I didn't realize were nagging away at me until I got rid of them.
(Perhaps they are less obtrusive for other people because other people browse with much wider browser windows than I do, so the floaters aren't necessarily over the content but instead over unimportant side elements. Or perhaps the people designing these sites just don't care.)
2015-08-23
I think you should get your TLS configuration advice from Mozilla
If you decide that you care about having good TLS support in, say, a web server and look around, there are a lot of places that will tell you all about what configuration you should have in order to be secure and widely available and so on. Old ones live on in their dusty now-inaccuracy (TLS configuration advice has a half life of six months at most) and new ones spring up every so often. Many of them contradict each other in whole or in part. The whole thing is one of the frustrations of good TLS in practice.
Given this, I've wound up with the strong opinion that you should be getting your TLS configuration advice from the Mozilla server side TLS configuration guide. It's certainly become my primary source of configuration guidelines and I've been happy with the results.
(Other worthwhile resources are the Mozilla web server config generator and the Qualys SSL Server Test. Note that I've seen some people disagree with the SSL server test's scoring of some things.)
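(As a purely illustrative sketch of what 'applying such guidance' can look like in practice, here is roughly how you might build a server-side TLS context in Python. The cipher string and certificate paths below are placeholders I made up, not Mozilla's recommendations; get the real values from the Mozilla configuration generator for your software and desired compatibility level.)

    import ssl

    # Sketch only: placeholder cipher list and file paths, not Mozilla's
    # actual recommendations.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Refuse old protocol versions.
    ctx.options |= ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3 | ssl.OP_NO_TLSv1
    # Restrict the cipher suites on offer.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+AES:!aNULL:!MD5")
    # Placeholder certificate and key paths.
    ctx.load_cert_chain(certfile="server.crt", keyfile="server.key")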
The advantage of Mozilla's guide isn't just that it seems to be good advice. It has two important virtues beyond that, virtues that I feel make it trustworthy. First, it's actively maintained by people who know what they're doing. Second, it's such a visible and public resource that I think any bad advice it has is very likely to produce reactions from knowledgeable outsiders. Some random person writing an article with bad TLS advice is yawn worthy; there might be a little snark on Twitter but that's probably it. Mozilla getting it wrong? You're very likely to hear a lot of noise about that.
Other TLS configuration advice may be perfectly good, well maintained, and written by people who know what they're doing (although my experience leads me to believe that it often isn't). But as an outsider it's much harder to tell if this is the case and to spot if (and when) it stops being so, which makes using the advice potentially dangerous.
2015-08-02
My view on the potential death of the ad-supported web
Partly due to the impending release of iOS 9, a certain amount of angst has been written lately about a potential future of increasingly widespread adblocking on the web. One of the things often written in such articles is more or less 'how will you like it if widespread adblocking kills the ad-supported web?' Well, it's funny you should mention that.
I'm sure that in the short term I would hate the decline of the ad supported web. Like pretty much everyone, I visit plenty of ad supported websites every day and use a certain number of ad supported services like Twitter. Having them go away or become paywalled would be disruptive and quite unwelcome; it's already annoying enough when I follow a link and hit a paywall and the more often that happens the more annoying it would be.
But in the long term? In the long term I'd be fine, and such a shutdown would probably even be good for me. The reality is that essentially all of the ad supported sites I visit are diversions. They're entertaining and informative and amusing and above all absorbing, because that's what the modern web has driven such sites to be, but they're not essential or even important; they're just how I pass time on the Internet right now. These sites are very good at getting me to visit and to spend time on them, but while that is (currently) good for the sites that is not necessarily good for me. In many ways I'm a rat pressing a lever for an intermittent reward, even if the reward is fun; it's almost all a giant distraction that drains my time in little increments.
(This includes newspaper sites, by the way. Knowing the news, especially in detail and up to the minute, is not essential or even important for me or many other people.)
The sites and services that I really care about are almost entirely boutique products of passion, and they're mostly going to continue for as long as that passion lasts. Oh, some would die when their 'free' ad-subsidized hosting dried up, but the cost of hosting your own website has fallen to amazingly cheap levels today. A good number of the people who care would continue in various ways and forms.
The hard reality is that the Internet was a perfectly fine place in the days before ad-supported things were a thing. The Internet inhabitants back then found plenty of ways to spend their time, just as we do today; they were just different ways. In fact many of those old diversions are still around today, lurking in the corners and ready to be revived if needed. To the extent that the Internet was less diverting in the old days, well, I got other things done, often more in-depth things than constantly following Twitter and other sources of chatter and diverting links. I wouldn't entirely mind going back to that world, even if I lack the willpower to move there on my own.
(If Twitter went away, for example, I'd expect several of my online communities there would wind up on IRC channels. Or someone would put together a boutique version of Twitter for the small community. What makes Twitter hard is not the basic features, it's the scale. Drop the scale and you can support a few thousand people on a cheap virtual server.)
By the way, I have to admit that all of this rests on the assumption that not absolutely all of the ad supported Internet will dry up, just a lot of it. I would be fairly badly affected if Internet search stopped existing, and that's ad supported. But I don't think adblockers have any chance of killing that, whereas they do have a chance of killing even things like major newspaper websites.
(People who use GMail et al are also probably fairly safe, and things like Github are not ad-supported. As for eg Firefox development, well, we'll have to cross our fingers there.)
2015-07-31
The future problem with Firefox's Electrolysis project
Firefox Electrolysis ('e10s' for short) is a project to push Firefox towards a multiprocess model like Chrome's. This is both a daunting amount of work and a praiseworthy goal with a number of benefits, but there is a problem lurking in the future and that is Firefox addons.
The direct problem is that any number of addons are not Electrolysis compatible for technical reasons. Firefox developers have partly worked around this with shims, but shims are an incomplete solution and can't make all addons work. Checking arewee10syet makes for depressing reading much of the time; a great many popular extensions are not working under Electrolysis (including NoScript, one of my critical extensions). It seems quite likely that a number of reasonably popular extensions will never be updated to be Electrolysis compatible, and so people will be faced with a choice between not getting Electrolysis and abandoning them (the likely choice here being 'don't go e10s').
(The popularity of an addon has no relationship with the attention and spare time of its developer(s). There are any number of popular addons that have basically been abandoned by their developers.)
The indirect problem is that at some point Mozilla is going to want to turn Electrolysis on by default in a released Firefox version. In a straightforward version of the switch, some number of reasonably popular extensions will partially or completely stop working. If people are lucky this will be obvious, so at least they know that they have a different browser now; if people are unlucky, the extension will quietly stop doing whatever it does, which is bad if that is, say, 'protecting me from some sort of bad stuff'. There are various things Firefox could do here to avoid silent breakage, like not enabling Electrolysis unless all your addons are known to be e10s compatible or warning you about some addons perhaps breaking, but none of the options are particularly good ones.
(Well, they're not particularly good ones if Mozilla's goal is widespread Electrolysis adoption. Mozilla could take the safe conservative approach if they wanted to; I just don't think they will, based on past behavior.)
When this future comes to pass, knowledgeable people can go in and turn off Electrolysis in order to get a fully working browser back (at least one hopes). Other people, well, I suspect we're going to see a lot of quietly or loudly upset people and Firefox is going to leak some more browser share as well as seeing some more people turn off Firefox automatic updates (with the resulting damage to security).
2015-07-18
Some data on how long it is between fetches of my Atom feed
Recently I became interested in a relatively simple question: on average, how much time passes between two fetches of the Atom feed for Wandering Thoughts? Today I want to give some preliminary answers to that. To make life simple, I'm looking only at the blog's main feed and I'm taking 24 hours of data over Friday (local time). Excluding feed fetch attempts that are blocked for some reason, I get the following numbers:
- the straight average is one fetch every 12.9 seconds (with a standard deviation of 13.7).
- the median is one fetch every 9 seconds.
- the longest gap between two feed requests was 130 seconds.
- 90% of the inter-request gaps were 31 seconds or less, 75% were 18 seconds or less, and 25% were 3 seconds or less.
- 6% of the feed fetch requests came at the same time (to the second) as another request; the peak number of fetches in one second is four, which happened several times.
- 7.5% came one second after the previous request (and this is the mode, the most common gap), 6% two seconds, 6% three seconds, and 5.5% four seconds. I'm going to stop there.
Of course averages are misleading; a thorough workup here would involve gnuplot and peering at charts (and also more than just 24 hours of data).
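(For what it's worth, the core of such a workup is simple. Here's a rough sketch of the calculation involved, not the actual script I used; the log timestamp format and sample values are made up for illustration.)

    from datetime import datetime
    from statistics import mean, median, stdev

    # Sketch: given timestamps of successful feed fetches (in a made-up
    # Apache-style format), compute the gaps between consecutive fetches
    # and summarize them.
    def gap_stats(timestamps):
        times = sorted(datetime.strptime(t, "%d/%b/%Y:%H:%M:%S")
                       for t in timestamps)
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        return {"average": mean(gaps), "stdev": stdev(gaps),
                "median": median(gaps), "max": max(gaps)}

    # Example with made-up timestamps:
    print(gap_stats(["17/Jul/2015:10:00:01", "17/Jul/2015:10:00:04",
                     "17/Jul/2015:10:00:13", "17/Jul/2015:10:00:44"]))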
This is an interesting question partly because every so often people accidentally publish a blog entry and then want to retract it. Retraction is difficult in the face of syndication feeds; once an entry has started to appear in people's syndication feed fetches, you can no longer just remove it. My numbers suggest strongly that even moderately popular blogs have very little time before this starts happening.
2015-07-04
Googlebot and Feedfetcher are still aggressively grabbing syndication feeds
Somewhat more than a year ago I wrote about how I'd detected Googlebot aggressively crawling my syndication feeds, despite them being marked as 'stay away'. At the time I was contacted by someone from Google about this and forwarded various information about it.
Well, you can probably guess what happened next: nothing. It is now more than a year later and Googlebot is still determinedly pounding away at fetching my syndication feed. It made 25 requests for it yesterday, all of which got 403s as a result of me blocking it back then, and in fact it is still trying on the order of 25 times a day despite getting 403s on all of its requests for this URL for literally more than a year.
(At least it seems to be down to only trying to fetch one feed URL.)
Also, because I was looking: back what is now more than a year and a half ago, I discovered that Google Feedfetcher was still fetching feeds, and as a result I blocked it. Well, that's still happening too. Based on the last 30 days or so, Google Feedfetcher is making anywhere between four and ten attempts a day. And yes, that's despite getting 403s for more than a year and a half. Apparently those don't really discourage Google's crawling activities if Google really wants your data.
I'd like to say that I'm surprised, but I'm not in the least bit. Google long ago stopped caring about being a good Internet citizen, regardless of what its propaganda may say. These days the only reason to tolerate it and its behavior is because you have no choice.
(As far as I can tell it remains the 800 pound gorilla of search traffic, although various things make it much harder for me to tell these days.)
Sidebar: The grumpy crazy idea of useless random content
If I were a real crazy person, it would be awfully tempting to divert Google's feed requests to something that fed them an endless or at least very large reply. It would probably want to be machine-generated, valid Atom feed entries full of more or less random content. There are of course all sorts of tricks that could be played here, like embedding honeypot URLs on a special web server and seeing if Google shows up to crawl them.
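Just to sketch how little effort the junk-generation part would take (this is purely illustrative, nothing like it runs here, and the tag URI and word list are made up):

    import random
    from datetime import datetime, timezone

    # Toy sketch: churn out valid-looking Atom entries full of random filler.
    WORDS = ["random", "filler", "content", "nothing", "useful", "here"]

    def junk_entry(n):
        now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        title = " ".join(random.choices(WORDS, k=4))
        body = " ".join(random.choices(WORDS, k=60))
        return ("  <entry>\n"
                f"    <title>{title}</title>\n"
                f"    <id>tag:example.org,2015:junk/{n}</id>\n"
                f"    <updated>{now}</updated>\n"
                f'    <content type="text">{body}</content>\n'
                "  </entry>")

    # An 'endless' feed would just keep emitting entries like this.
    print(junk_entry(1))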
I don't care enough to do this, though. I have other fish to fry in my life, even if this stuff makes me very grumpy when I wind up looking at it.
2015-06-12
My pragmatic view of HTTPS versus caching
One of the criticisms of going all HTTPS on the web is that it pretty much destroys caching. As Aristotle Pagaltzis commented on my entry, caching somewhat obscures traffic flow by itself (depending on where the cache is and who is watching), and as other people have commented in various places (cf), caching can serve valuable bandwidth reduction purposes. I and other people advocating an all HTTPS world should not ignore or understate this. On the contrary we should be explicit and admit that by advocating all HTTPS we are throwing some number of people who use caching under the bus.
The problem is that there is no good choice; regardless of what we choose here someone is getting thrown under the bus. If we go HTTPS and lose caching, we throw cache users under the bus. But everything that stays HTTP throws a significant number of other people under the bus of ISP traffic inspection, interception, tampering, and general monetization through various means. Our only choice is who gets thrown under in what circumstances, and what the effects of getting run over by the bus are. We cannot in any way pretend that there are no downsides of staying with HTTP, because there clearly are and they are happening today.
The effects of losing caching are mostly that for some people web browsing gets slower and perhaps more expensive due to bandwidth charges. The effects of losing privacy and content integrity are that for lots of people, well, they lose privacy, have their activities tracked quite intrusively, have advertising shoved down their throat and sometimes have their browsing weaponized and so on.
Faced with this tradeoff, I pick throwing people using caching under the bus of slower access. Sorry, cache users, I regret that you're going to have this happen to you (at least until people develop some more sophisticated HTTPS-capable caches and systems), but as far as I'm concerned it's clearly the lesser of two evils (as seen from my position, which is biased in some ways).
(I will not go so far as saying that cache users who insist that everyone else continue to have traffic intercepted, monitored, and monetized in order for the cache users to have an easier time are being selfish, partly because of the cost issues. But sometimes I do sort of feel that way.)