2015-04-30
I'm considering ways to mass-add URLs to Firefox's history database
I wrote yesterday about how I keep my browser history forever, because it represents the memory of what I've read. A corollary of this is that it bugs me if things I've read don't show up as visited URLs. For example, if all of the blog entries and so on here at Wandering Thoughts were to turn unvisited tomorrow, that'd make me twitch every time I read something here and saw a blue link that should instead be visited purple.
(One of the reasons for this is that links showing visited purple is a sign that they point to the right place. Under normal circumstances, if links on Wandering Thoughts suddenly go blue, something has probably broken. And when I'm drafting entries, a nominal link to an older entry that shows blue is a sign that I got the link wrong.)
Which winds up with the problem: Wandering Thoughts and indeed this entire site is in the process of moving from HTTP to HTTPS. The HTTP versions of all of the entries and so on are in my Firefox history database, but Firefox properly considers the HTTPS version to be a completely different URL and so not in the history. So, all of a sudden, all of my entries and links and so on are unvisited blue. At one level this is not a problem. After all, I know that I've read them all (I wrote them). In theory, I could leave everything here alone, then maybe re-visit links one by one as I use them in new entries or otherwise run across them. But the whole situation bugs me; by now, seeing all the links be purple is reassuring and the way things should be, while blue links here make me twitch.
Conceptually the fix is simple. All I have to do is get every HTTP URL for here out of my existing history database, mechanically turn the 'http:' into 'https:', and then add all of the new URLs to Firefox's history database. All of the last visited and so on values can be exactly copied from the HTTP version of the URL. The only problem is that as far as I know there is no tool or extension for doing this.
(There are plenty of addons for removing history entries, which is of course exactly the opposite of what I want.)
These days, Firefox's history is in a SQLite database (places.sqlite in your profile directory). There are plenty of tools and packages to manipulate SQLite databases, which leaves me with merely the problem of figuring out what actually goes into a history entry in concrete detail (and then calculating everything that isn't obvious). So all of this is achievable, but on the other hand it's clearly going to be a bunch of work.
(While the Places database is documented,
parts of this documentation are out of date. In particular, current Firefox
places.sqlite has a unique guid field in the moz_places table.)
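The read-only half of this (pulling out the HTTP URLs and working out their HTTPS twins) can be sketched in Python with the standard sqlite3 module. This is a minimal sketch, not a finished tool: it assumes only that moz_places has url and last_visit_date columns, the function name and the host you pass in are made up for illustration, and it deliberately does not try to write new history entries, since that is exactly the part that still needs figuring out.

```python
import sqlite3


def https_candidates(conn, host):
    """Return (http_url, https_url, last_visit_date) for every HTTP URL on host.

    Assumption: moz_places has 'url' and 'last_visit_date' columns.
    Nothing is written back to the database.
    """
    prefix = "http://" + host + "/"
    # Compare a fixed-length prefix instead of using LIKE, so that '%' or
    # '_' characters in the prefix are not treated as wildcards.
    rows = conn.execute(
        "SELECT url, last_visit_date FROM moz_places "
        "WHERE substr(url, 1, ?) = ? ORDER BY url",
        (len(prefix), prefix),
    ).fetchall()
    return [
        (url, "https://" + url[len("http://"):], last_visit)
        for url, last_visit in rows
    ]
```

The safe way to try this is on a copy of places.sqlite made while Firefox is not running, for example with sqlite3.connect("places-copy.sqlite") (a hypothetical file name), rather than on the live database.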
PS: The other obvious nefarious hack is to literally rewrite the
URLs in all current history entries to be 'https:' instead of
'http:', possibly by dumping and then reloading the moz_places
table. Assuming that you can change the URL scheme without invalidating
any linkages in the database, this is simple. Unfortunately it has
a brute force inelegance that makes me grumpy; it's clearly the
expedient fix instead of the right one.
Why I have a perpetual browser history
I've mentioned in passing that I keep my browser's history database basically forever, and I've also kind of mentioned that it drives me up the wall when web sites make visited links and unvisited links look the same. These two things are closely related.
Put simply, the visited versus unvisited distinction between links is a visible, visual representation of your current state of dealing with a (good) site. A visited link tells you 'yep, I've been there, no need to visit again'; an unvisited link tells you that you might want to go follow it. This representation of state is very important because otherwise we must fall back on our fallible, limited, and easily fooled human memories to try to keep track of what we've read and haven't read. This fallback is both error-prone and a cognitive load; mental effort you're spending to keep track of what you've read is mental effort you can't use on reading.
Of course this doesn't work on all sites (and doesn't work all the time even on 'good' sites). I'm sure you can come up with any number of sites and any number of ways that this breaks down, and so the visited versus unvisited state of a page is not important or useful information. But it works well enough on enough sites to be extremely useful in practice, at least for me.
And this is why I want my browser history to last forever. My browser history is the collected state representation of what I have and haven't read. It tracks things not just now, in my currently active browsing session as I work through something, but also back through time, because I don't necessarily forget things I've read long ago (but at the same time I don't necessarily remember them well enough to be absolutely confident that I've already read them). For that matter, I don't always get through big or deep sites in one go, so again the visited link history is a history of how far I've gotten in archives or reference articles or the like.
There is nothing else on the web that can give me this state recall, nothing else that serves to keep track of 'how far have I gotten' and 'have I already seen this'. The web without it is a much more spastic and hyperactive place. It's a relatively more hyperactive place if I only have a short-term state recall; I really do want mine to last basically forever.
(In fact for me anything without a read versus unread state indicator is an irritatingly spastic and hyperactive place. All sorts of things are vastly improved by having it, and lack of it causes me annoyance (and that example is on the web).)
2015-04-10
My Firefox 37 extensions and addons (sort of)
A lot has changed in less than a year since I last tried to do a comprehensive inventory of my extensions, so I've decided it's time for an update since things seem to have stabilized for the moment. I'm labeling this as for Firefox 37 since that's the just out latest version, but I'm actually running Firefox Nightly (although for me it's more like 'Firefox Weekly', since I only bother quitting Firefox to switch to the very latest build once in a while). I don't think any of these extensions work better in Nightly than in Firefox 37 (if anything, some of them may work better in F37).
Personally I hope I'm still using this set of extensions a year from now, but with Firefox (and its addons) you never know.
Safe browsing:
- NoScript
to disable JavaScript for almost everything. In a lot of cases I don't
even bother with temporary whitelisting; if a site looks like it's
going to want lots of JavaScript, I just fire it up in my Chrome
Incognito environment.
NoScript is about half of my Flash blocking, but is not the only thing I have to rely on these days.
- FlashStopper
is the other half of my Flash blocking and my current solution
to my Flash video hassles on YouTube,
after FlashBlock ended up falling over. Note that contrary to what
its name might lead you to expect, FlashStopper blocks HTML5 video
too, with no additional extension needed.
(In theory I should be able to deal with YouTube with NoScript alone, and this even works in my testing Firefox. Just not in my main one for some reason. FlashStopper is in some ways nicer than using NoScript for this; for instance, you see preview pictures for YouTube videos instead of a big 'this is blocked' marker.)
- µBlock
has replaced the AdBlock family as my ad blocker. As mentioned I mostly have this because throwing out YouTube
ads makes YouTube massively nicer to use. Just as other people have
found, µBlock clearly takes up the least memory out of all of the
options I've tried.
(While I'm probably not all that vulnerable to ad security issues, it doesn't hurt my mood that µBlock deals with these too.)
- CS Lite Mod is my current 'works on modern Firefox versions' replacement for CookieSafe after CookieSafe's UI broke for me recently (I needed to whitelist a domain and discovered I couldn't any more). It appears to basically work just like CookieSafe did, so I'm happy.
I've considered switching to Self-Destructing Cookies, but how SDC mostly works is not how I want to deal with cookies. It would be a good option if I had to use a lot of cookie-requiring sites that I didn't trust for long, but I don't; instead I either trust sites completely or don't want to accept cookies from them at all. Maybe I'm missing out on some conveniences that SDC would give me by (temporarily) accepting more cookies, but so far I'm not seeing it.
My views on Ghostery haven't changed since last time. It seems especially pointless now that I'm using µBlock, although I may be jumping to assumptions here.
User interface (in a broad sense):
- FireGestures.
I remain absolutely addicted to controlling my browser with
gestures and this works great.
(Lack of good gestures support is the single largest reason I won't be using Chrome regularly any time soon (cf).)
- It's All Text!
handily deals with how browsers make bad editors. I use it a bunch these days, and in
particular almost all of my comments here on Wandering Thoughts are now written with it, even relatively short ones.
- Open in Browser because most of the time I do not want to download a PDF or a text file or a whatever, I want to view it right then and there in the browser and then close the window to go on with something else. Downloading things is a pain in the rear, at least on Linux.
(I wrote more extensive commentary on these addons last time. I don't feel like copying it all from there and I have nothing much new to say.)
Miscellaneous:
- HTTPS Everywhere basically
because I feel like using HTTPS more. This sometimes degrades or
breaks sites that I try to browse, but most of my browsing is not
particularly important so I just close the window and go do something
else (often something more productive).
- CipherFox gives me access to some more information about TLS connections, although I'd like a little bit more (like whether or not a connection has perfect forward secrecy). Chrome gets this right even in the base browser, so I wish Firefox could copy them and basically be done.
Many of these addons like to plant buttons somewhere in your browser window. The only one of these that I tolerate is NoScript's, because I use that one reasonably often. Everyone else's button gets exiled to the additional dropdown menu where they work pretty fine on the rare occasions when I need them.
(I would put more addon buttons in the tab bar area if they weren't colourful. As it is, I find the bright buttons too distracting next to the native Firefox menu icons I put there.)
I've been running this combination of addons in Firefox Nightly sessions that are now old enough that I feel pretty confident that they don't leak memory. This is unlike any number of other addons and combinations that I've tried; something in my usage patterns seems to be really good at making Firefox extensions leak memory. This is one reason I'm so stuck on many of my choices and so reluctant to experiment with new addons.
(I would like to be able to use Greasemonkey and Stylish but both of them leak memory for me, or at least did the last time I bothered to test them.)
PS: Firefox Nightly has for some time been trying to get people to try out Electrolysis, their multi-process architecture. I don't use it, partly because any number of these extensions don't work with it and probably never will. You can apparently check the 'e10s' status of addons here; I see that NoScript is not e10s ready, for example, which completely rules out e10s for me. Hopefully Mozilla won't be stupid enough to eventually force e10s and thus break a bunch of these addons.
2015-04-08
Your entire download infrastructure needs to use HTTPS
Let's start with something that I tweeted:
Today's security sadface: joyent's Illumos pkgsrc download page is not available over https, so all those checksums/etc could be MITMd.
Perhaps it is not obvious what's wrong here. Well, let's work backwards.
The Joyent pkgsrc bootstrap tar archive is served over plain HTTP, so a
man in the middle attacker can serve us a compromised tarball when we
use curl to fetch it. That's obvious, and the page gives us a SHA1
checksum and a PGP key to verify the tarball. But the page itself is
served over plain HTTP, so the man in the middle attacker could
alter it too so it has the SHA1 checksum of their compromised tarball.
So surely the PGP verification will save us? No, once again we are
undone by HTTP; both the PGP key ID and the detached PGP ASCII signature
are served over HTTP, so our MITM attacker can alter the page to have a
different PGP key ID and then serve us a detached PGP ASCII signature
made with it for their compromised tarball.
(Even if retrieving the PGP key itself from the keyserver is secure, the attacker can easily insert their own key with a sufficiently good looking email address and so on. Or maybe even a fully duplicated email address and other details.)
There's a very simple rule that everyone should follow here: every step of a download process needs to be served over HTTPS. For instance, even without PGP keys et al in the picture it isn't sufficient to serve just the tarball over HTTPS, because a MITM attacker can rewrite the plaintext 'download things here' page to tell you to download the package over HTTP and then they have you. The entire chain needs to be secure (and forced that way) and from as far upstream in the process as you can manage (eg from the introductory pkgsrc landing page on down, because otherwise the attacker changes the landing page to point to a HTTP download page that they supply and so on).
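As a rough illustration of the rule, here is a small Python sketch that takes the chain of URLs someone follows to get a package (landing page, download page, tarball, signature) and reports every step that is not HTTPS. The function name and the URLs in the usage note are hypothetical; this only checks the scheme of each URL and says nothing about redirects or certificate validity.

```python
from urllib.parse import urlsplit


def insecure_steps(chain):
    """Return the URLs in a download chain that are not served over HTTPS.

    'chain' is a list of URLs in the order a person would follow them.
    A single non-HTTPS step is enough for a MITM attacker to take over
    everything downstream of it.
    """
    return [url for url in chain if urlsplit(url).scheme.lower() != "https"]
```

For a chain like ["https://pkgsrc.example.org/", "http://pkgsrc.example.org/install/", "http://pkgsrc.example.org/bootstrap.tar.gz"], this returns the last two URLs; the secure landing page does not help once it links to a plaintext download page.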
Of course, having some HTTPS is better than none; it at least makes attackers work harder if they have to not just give you a different tarball than you asked for but also alter a web page in flight (but don't fool yourself that this is much more work, not with modern tools). And it's good to not rely purely on HTTPS by itself; SHA1 checksums and PGP signatures are at least cross-verification and can detect certain sorts of problems.
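For the cross-verification side, checking a downloaded file against a published SHA1 checksum can be sketched like this in Python. The function name is made up for illustration, and the check is only as trustworthy as the channel the expected checksum came over; if the checksum was read off a plain HTTP page, a match proves nothing about an attacker.

```python
import hashlib
import hmac


def sha1_matches(path, expected_hex):
    """Return True if the file at 'path' has the given SHA1 hex digest."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in blocks so that large tarballs need not fit in memory.
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    # Normalize whitespace and case in the published value before comparing.
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```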
By the way, in case you think that this is purely theoretical, see the case of some Tor exit nodes silently patching nasty stuff into binaries fetched through them with HTTP. And I believe that there are freely available tools that will do on the fly alterations to web pages they detect you fetching over insecure wireless networks.
(I don't feel I'm unfairly picking on Joyent here because clearly they care not just about the integrity of the tarball but also its security, since they give not just a SHA1 (which might just be for an integrity check) but also a PGP key ID and a signature checking procedure.)
2015-04-06
What adblockers block
The thing about adblockers is that they don't really block ads; determining what is and isn't an ad is an AI problem and we're nowhere near solving those. So what adblockers really block is signs and patterns that designate or suggest ads. The primary patterns that adblockers can use are URLs of resources being requested (such as images and other additional content) and the surrounding HTML context of these requests (including things like CSS classes and IDs).
(Many adblockers will allow you to inspect the patterns that they use in their preferences or advanced configuration system.)
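The URL half of this pattern matching can be sketched in a few lines of Python. This is a toy, not how any particular adblocker works: the pattern list, the wildcard syntax (plain shell-style globs from the standard fnmatch module), and all of the domain names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: one for an ad network's server, one for a common path.
BLOCK_PATTERNS = [
    "*://ads.example.net/*",
    "*/banners/*",
]


def is_blocked(url, patterns=BLOCK_PATTERNS):
    """Return True if the request URL matches any blocking pattern.

    Note that this looks only at the URL; it has no idea whether the
    resource is actually an ad.
    """
    return any(fnmatchcase(url, pattern) for pattern in patterns)
```

With these rules, "http://ads.example.net/x.png" is blocked, but a hand-made ad image served as "https://janes-fishing.example/img/bait-shop.jpg" goes straight through, because nothing about its URL looks like an ad.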
No set of heuristics and patterns can possibly be complete. So in practice adblockers only block major sources of ads, because these are the sources of ads where the work of writing rules really pays off in a reduction of ads. In other words, adblockers mostly block decent-sized ad networks and pervasive places with ads like Facebook, Google, and YouTube.
Unless someone really goes out of their way to write rules, adblockers
do not and will never block hand-crafted ads on small sites; there
are no non-AI heuristics that can reliably figure out which bits
of Jane's Fishing Information are ads and which bits aren't. Similarly,
adblockers mostly don't block the various small scale ad networks
that are active in niche areas like online webcomics, ultimately
because they haven't annoyed anyone enough to write (and update)
the blocking rules necessary.
The direct corollary of this is that even pervasive use of adblockers cannot kill advertising on the web. The only thing they can kill is mindless, computer-targeted advertising at mass scale. Such large scale advertising is attractive to a lot of people for a lot of reasons, but it is not the only advertising model for sustaining modest websites through ads.
(Adblockers can kill advertising on large sites because even if the large sites do entirely custom advertising systems, they are large enough that people will find it worthwhile to write the rules necessary.)
PS: There's an entirely separate discussion about whether adblockers can work in the long term if advertising people get determined enough. Ultimately the system of HTML, CSS, and JavaScript that displays ads on web pages is Turing-complete and so can be obfuscated in a nearly endless number of ways if website developers want to put up with the resulting complexity.
PPS: As it happens, large scale advertising networks are already often not an attractive model for modest websites with dedicated audiences because of various fundamental drawbacks in the model (like lack of control over what ads show on your website).
2015-04-05
A note on the argument about the 'morality' of adblockers
While adblockers make some people quite happy, there are others that consider them immoral; see for example this tweet. Let's set aside the security issues and other counter-arguments to note something important: much as in another case, it's extremely disingenuous to discuss morality here without mentioning the blatant amorality of advertising on the web itself. To put it simply, the ad industry and its supporters are coming to the table with extremely unclean hands.
By and large, the story of web advertising and ad companies and networks is a story of organizations aggressively and unapologetically tracking and intruding on people for years. At every turn web advertisers have done their best to obtain more information on more people, to mine this for as much creepy insight as they could, to make as much money from it as possible, and to never ever ask people for permission or even inform them. At every turn, the ad industry's view has been that if they could get away with something it was all good, especially if it was legal. Morality has never entered the picture.
The ad industry has spent years cultivating a 'fuck you' attitude where they would do everything that was within their technical capabilities to spy on people and shovel ads on top of them. To now suddenly be concerned about the 'morality' of what other people do is the height of hypocrisy. The ad industry has lived by the sword of 'technical capabilities are all that matters' (to the detriment of basically everyone else on the Internet), so it's only fair that they may now die on that sword, like it or not. Adblockers are possible, so by the ad industry's own conduct they're allowed.
(Since the ad industry has no morality it of course doesn't care about its own hypocrisy here; it will bleat whatever bleatings stand some chance of keeping its exploitative business model from collapsing. But bystanders should be listening to these bleatings with a full understanding.)