Wandering Thoughts

2023-01-25

You should back up the settings for your Firefox addons periodically

Today I had some unexpected excitement with two of my core Firefox addons in my main browser, where either or both of uBlock Origin and uMatrix apparently stopped working and stopped everything else from working along with them, since they're on the critical path for getting web pages. I eventually got everything working again, but I wound up needing to remove and then reinstall uBlock Origin and uMatrix from scratch, both of which have complex configurations. This is where I discovered that my most recent backups of those settings were from 2020 (and from a different machine). Oops.

(I have full filesystem backups, but as far as I know you can't easily extract an addon's settings from a Firefox profile directory, so I would have had to completely restore my entire Firefox profile.)

Many of my Firefox addons have some sort of configuration settings, and yours probably do too (if you use addons). uMatrix and uBlock Origin have a collection of filtering settings, Foxy Gestures has my gesture customizations, Stylus has a bunch of styles, Cookie AutoDelete knows which cookies I don't want to delete, and so on. All of these would be annoying or painful to have to recreate from scratch, and all of these addons offer a way to back up ('export') and restore ('import') their settings. I've done that before (although not for all of my addons), but up until now I've only been doing it very sporadically, as in once every few years (even though my settings for some extensions change much more often than that).

That's why I say back up your Firefox addon settings every so often. You never know when you may need to remove and then re-install an addon, and you can even do it accidentally (for reasons out of the scope of this entry, I once accidentally removed uBlock Origin). It'll also make it much less painful if you ever have to completely redo your Firefox profile. And you can also use your backups to set up a new instance of Firefox elsewhere, for example on a different machine. Unfortunately you have to do this by hand, and addons don't have a consistent process for it, which for most people (me included) gets in the way of doing it regularly.

(My initial fediverse post about backing up settings was just about uBlock Origin and uMatrix, which are the addons where things change the most frequently, but the more I thought about it the more I realized it applied to most of my other addons too, especially Stylus.)

PS: Since I did the Internet searches, see this answer, this answer, and this question and answer for information on where Firefox stores an addon's data (including your settings). The short version of all of those answers is that you probably don't want to try doing this unless you're really desperate to get the data out, although it's technically possible.
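If you do want to poke at this anyway, here's a rough sketch of where to look, based on those answers. Everything here is hedged: the profile directory name is a placeholder, and the exact on-disk layout depends on your Firefox version and on how the addon stores its data.

# Placeholder profile path; yours will be named differently.
profile=~/.mozilla/firefox/XXXXXXXX.default-release

# The mapping from addon IDs to the internal UUIDs used on disk is kept in
# the extensions.webextensions.uuids preference in prefs.js.
grep extensions.webextensions.uuids "$profile/prefs.js"

# An addon's storage.local data (when it's IndexedDB backed) should then be
# somewhere under a moz-extension directory named after that internal UUID.
ls -d "$profile"/storage/default/moz-extension+++*/idb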

FirefoxAddonsBackUpSettings written at 23:06:17; Add Comment

2023-01-21

How Prometheus makes good use of the HTTP Accept: header

Over on the Fediverse, Simon Willison asked if the HTTP Accept: header was a good idea, which he later narrowed down to APIs and HTML content, excluding media (video, images, etc). I realized that I knew of a good example for APIs, which is how Prometheus metrics exporters use Accept to determine what format they'll report metrics in (although it turns out that I was a bit wrong in my Fediverse post).

Prometheus metrics exporters are queried ('scraped') by Prometheus and respond with metrics in some format. Historically there has been more than one format, as sort of covered in Exposition Formats; currently there are two text ones (Prometheus native and OpenMetrics) and one binary one (with some variations). The text based formats are easy to generate and serve by pretty much anything, while the binary format is necessary for some new things (and may have been seen as more efficient in the past). A normal metrics exporter (a 'client' in a lot of Prometheus jargon) that supports more than one format will choose which format to reply with based on the query's HTTP Accept header, defaulting to the text based format.
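As a concrete illustration, here is roughly how you can ask an exporter for each text format yourself with curl. This sketch assumes an exporter listening on localhost:9100 (the usual node_exporter port); whether you actually get OpenMetrics back depends on the exporter and the client library version it was built with.

# Prometheus native text format (the usual default):
curl -H 'Accept: text/plain; version=0.0.4' http://localhost:9100/metrics

# OpenMetrics text format, if the exporter supports it:
curl -H 'Accept: application/openmetrics-text; version=1.0.0' http://localhost:9100/metrics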

Supporting multiple metrics formats at one URL has a number of advantages, especially since everything can produce and consume one of the text formats. People setting up Prometheus servers and clients don't have to care about what format each of them supports in order to set the scrape URL, as they would if the format was in the URL (eg, '/metrics/promtext' instead of '/metrics'). Prometheus and other scrapers don't have to make multiple requests in order to discover the best format they want to use, the way they would have to if the starting URL simply returned an index of format options. And the format used is ultimately under the control of the client more than the server, so a metrics exporter can shift between formats if it needs to (for example if you reconfigure it to report something only supported in one format).

(Currently the wire formats can be found listed and described a bit in common/expfmt/expfmt.go. A Prometheus server may scrape hundreds or thousands of targets every fifteen to thirty seconds, so extra HTTP requests could hurt.)

I suspect that Prometheus isn't the only HTTP based API using the Accept header to transparently choose the best format option for sending data, or to allow seamless upgrades of the supported formats over time. As a system administrator who doesn't want to have to work out, configure, and update format specific endpoint URLs by hand, I fully support this.

(In practice the result of forcing system administrators to set up format specific URLs by hand is probably that the format used for transport is basically fixed once configured, even if both sides are later upgraded to support a better option. This is especially the case if different clients are updated at different times.)

As a side note, this only really works in a pull model instead of a push one. If you push, it's more difficult to ask the other end what (shared) format it would like you to send. A pull model such as Prometheus's provides a natural way to negotiate this, since the puller sends what formats they can accept and then the data source can pick the one it wants out of that.

HTTPAcceptAndPrometheus written at 21:59:47; Add Comment

2023-01-17

An aggressive, stealthy web spider operating from Microsoft IP space

For a few days, I've been noticing some anomalies in some metrics surrounding Wandering Thoughts, but nothing stood out as particularly wrong and my usual habit of looking at the top IP addresses requesting URLs from here didn't turn up anything. Then today I randomly wound up looking at the user-agents of things making requests here and found something unpleasant under the rock I'd just turned over:

Today I discovered that there appears to be a large scale stealth web crawler operating out of Microsoft IP space with the forged user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15', which I believe is a legitimate UA. Current status: working out how to block this in Apache .htaccess.

By the time I noticed it today, this spider had made somewhere over 25,000 requests in somewhat over twelve hours, or at least it had with that specific user agent (it's hard to see if it used other ones with all of the volume). It made these requests from over 5,800 different IPs; over 600 of these IPs are on the SBL CSS and one of them is SBL 545445 (a /32 phish server). All of these IP addresses are in various networks in Microsoft's AS 8075, and of course none of them have reverse DNS. As you can tell from the significant number of IPs, most IPs made only a few requests and even the most active ones made no more than 20 (today, by the time I cut them off). This is a volume level that will fly under the radar for anyone's per-IP ratelimiting.

(Another person reported a similar experience including the low volume per IP. Also, I assume that there is some Microsoft cloud feature for changing your outgoing IP all the time that this spider is exploiting, as opposed to the spider operator having that many virtual machines churning away in Microsoft's cloud.)
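For what it's worth, the basic tallying involved here doesn't need anything fancy. Here's a sketch with standard tools, assuming Apache's 'combined' log format and a placeholder log path (your setup will differ):

# The user-agent is the sixth double-quoted field in the 'combined' format.
awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head

# How many distinct IPs sent this particular user agent:
grep -F 'Version/15.1 Safari/605.1.15' /var/log/apache2/access.log | awk '{print $1}' | sort -u | wc -l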

This spider seems to have only shown up about five or six days ago. Before then this user agent had no particular prominence in my logs, but in the past couple of days it's gone up to almost 50,000 requests a day. At that request volume most of it is spidering or re-spidering uselessly duplicated content; Wandering Thoughts doesn't have that many unique pages.

This user agent is for Safari 15.1, which was released more than a year ago (apparently October 27th, 2021, or maybe a few days before), and as such is rather out of date by now. Safari on macOS is up to Safari 16, and Safari 15 was (eventually) updated to 15.6.1. I don't know why this spider picked such an out of date user agent to forge, but it's convenient; any actual person still running Safari 15.1 needs to update it anyway to pick up security fixes.

(For the moment, the best I could do with my eccentric setup here was to block anyone using the user agent. Blocking by IP address range is annoying, seeing as today's lot of IP addresses are spread over 20 /16s.)
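One handy way to check a user agent block from the outside is to make a request with that exact user agent and see what status comes back; this is just a sketch with a placeholder URL, and what your block actually returns depends on how you implemented it:

curl -s -o /dev/null -w '%{http_code}\n' -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15' https://example.org/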

Sidebar: On the forging of user agents

On the Fediverse, I was asked if it wasn't the case that all user-agent strings were forged in some sense, since these days they're mostly about a statement of compatibility. My off the cuff answer encapsulates something that I want to repeat here:

There is a widespread de facto standard that spiders, crawlers, and other automated agents must report themselves in their user-agent instead of pretending to be browsers.

To put it one way, humans may impersonate each other, but machines do not get to impersonate humans. Machines who try to are immediately assumed to be up to no good, with ample historical reasons to make such an assumption.

(See also my views on what your User-Agent header should include and why.)

The other thing about this is that compatibility is a matter for browsers, not spiders. If your spider claims to be 'compatible' with Googlebot, what you're really asking for is any special treatment people give Googlebot.

(Sometimes this backfires, if people are refusing things to Googlebot.)

AggressiveStealthyWebSpider written at 22:57:08; Add Comment

2023-01-12

A browser tweak for system administrators doing (web) network debugging

As a system administrator (and sometimes an ordinary user of the web), I periodically find myself trying to work out why I or people around here can't connect to some website or, sometimes, a portion of the website doesn't work. It turns out that there's a tweak you can make to Firefox and Chrome (and probably other browsers) that makes this somewhat easier to troubleshoot.

(We once had an incident where Google Cloud Platform stopped talking to some of our IPs. Some websites host only a portion of their assets or web application in GCP, so people could load a website's front pages (hosted outside of GCP) but trying to go further or do things in the web app would fail (when it touched GCP and GCP said 'nope'). Even figuring out what was going on took some people here rather a lot of work.)

Modern web browsers have a 'Web Developer Tools' environment that includes a Network tab that will tell you about the requests the current page is doing. By default the information the Network tab presents is focused on the interests of web developers and so lacks some information that system administrators find very helpful. However, you can customize it, and in particular you can make it also show the (HTTP) protocol being used and the remote IP, which are extremely useful for people like me.

To do this, call up Web Developer Tools with, for example, Ctrl+Shift+I. Switch to the Network tab if you're not already on it, and make a request so that the tab displays some data and you can see the column headers. Right click on the column headers and turn on the Protocol and Remote IP columns. Turning on the Scheme column is optional (it will probably mostly be 'https') but will let you spot websocket connections if you want to check or verify that you have one. Knowing the HTTP protocol is important these days because HTTP/3 is an entirely different transport and may run into firewall issues that HTTP/2 and HTTP/1.1 don't.

(This isn't relevant if you've turned HTTP/3 off in your browser, but then your users probably don't have it turned off and you may need to emulate their setup.)
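A related quick check on whether HTTP/3 gets through from a particular network at all is curl, although this only works if your curl was built with HTTP/3 support (many distribution packages aren't); the URL here is just an example of a site that speaks HTTP/3:

# 'curl -V' will list HTTP3 in its Features line if your build supports it.
curl -svo /dev/null --http3 https://www.cloudflare.com/ 2>&1 | grep -i 'HTTP/3'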

In an ideal world there would be a way to get your browser to tell you about all currently open or in-flight network connections, both the low level details of where you're connecting to (and how) and the high level details of what protocol the browser is trying to speak over the connection, what web request it's trying to satisfy, and so on. Firefox has about:networking, but generally this gives only low-level details in a useful form. Chrome can capture and export logs via chrome://net-export, there's also chrome://net-internals (but it didn't do much for me), and maybe there are other things lurking in chrome://chrome-urls.

(In Firefox, for example, I can see that Firefox is holding open an HTTPS connection to an AWS IP and periodically doing stuff with it, and tcpdump confirms this, but about:networking won't tell me what host name this is for or what web page it's associated with. This is probably some Mozilla internal service, but finding out that it might be 'push.services.mozilla.com' took an absurd amount of work.)

(All of this was sparked by an issue I incorrectly blamed on HTTP/3, which led me to the Cloudflare blog on how to test HTTP/3, which taught me about the Web Developer Tools trick for the protocol.)

PS: Firefox will at least let you get a global view of (new) network activity that it knows about, via the Network tab of the "Browser Toolbox" (Ctrl+Shift+Alt+I). You want to pick 'Multiprocess (slower)'. I believe this will also let you temporarily disable the cache globally, across all windows and tabs.

BrowserNetworkDebuggingTweak written at 23:17:44; Add Comment

2022-12-31

Going from a Firefox preference to the underlying configuration setting

Suppose, not entirely hypothetically, that you have a Firefox "Preferences" option and you'd like to know for sure what about:config setting corresponds to it. One way to do this is to look it up in the Firefox source code, which is probably most readily done online through searchfox.org. This has to be done in two steps because there's a little bit of indirection in the Firefox code base (due to localization).

First, search for the full text of the Preferences option, such as "Autofill logins and passwords", but you have to do it without quotes (unlike typical Internet search engine behavior). This will typically get you at least a file under browser/locales/en-US (assuming you're using the English text), in this case browser/locales/en-US/browser/preferences/preferences.ftl. Click on the (somewhat hard to see) line number and you'll jump right to where it's mentioned in the file, and you'll get something like (in this case):

# Checkbox which controls filling saved logins into fields automatically when they appear, in some cases without user interaction.
forms-fill-logins-and-passwords =
    .label = Autofill logins and passwords
    .accesskey = i

Now take that name ('forms-fill-logins-and-passwords') and search for it in the code. This should get you at least two hits in the main Firefox code, one in the file you just found it in and one in, in this case, browser/components/preferences/privacy.inc.xhtml. Once again you can click on the little line numbers to jump to the right spot in the file, which will give you a blob of XML that looks like:

<checkbox id="passwordAutofillCheckbox"
    data-l10n-id="forms-fill-logins-and-passwords"
    search-l10n-id="forms-fill-logins-and-passwords.label"
    preference="signon.autofillForms"
    flex="1" />

The 'preference=' attribute is the about:config setting name that this option controls.
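If you have a local checkout of mozilla-central, the same two-step search works with plain grep; this is just a sketch using the strings and directories from the example above:

# Step one: find the localization entry for the preference's label text.
grep -rn 'Autofill logins and passwords' browser/locales/en-US/

# Step two: find where that l10n ID is used in the preferences UI.
grep -rn 'forms-fill-logins-and-passwords' browser/components/preferences/

# The reverse direction: start from the about:config setting name.
grep -rn 'signon.autofillForms' browser/components/preferences/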

Not all Preferences options and about:config settings are related in as straightforward a way as this, but this works a lot of the time.

You can also work the process in reverse. Given an about:config setting, you can search for it, find the relevant preferences/ XML (if there is any), get the localization label, and find the text of the label in your language. Sometimes simply knowing the section of Firefox's Preferences that controls the about:config setting is a good lead, especially once you throw in the l10n ID name.

(When you start with an about:config setting, you'll get a lot more hits because you also find where that setting is used and looked at.)

FirefoxPreferenceToConfigSetting written at 22:45:18; Add Comment

2022-12-30

Disabling automatic form autofilling in Firefox (which is now simple)

If you allow Firefox to memorize your logins and passwords on web sites, by default Firefox will automatically pre-populate login forms with them. However, for years I've had this turned off, probably no later than when I read 2017's No boundaries for user identities: Web trackers exploit browser login managers. I was going to say that there's no Preferences option to control this, but it turns out that there is these days, in "Privacy & Security"'s "Logins and passwords" section as 'autofill logins and passwords'. This controls the about:config setting signon.autofillForms. If you untick this option (or set the value to false from the default true), you need to click or otherwise select the field before you'll get the option to autofill it.

(In other articles about this general issue, there's 2021's You should turn off autofill in your password manager (via a Firefox issue) and probably others.)

There is a long standing open Firefox bug, bug #1427543 ('make signon.autofillForms = false the default'). The bug doesn't seem to have gotten much commentary or attention over the five years it's existed, so it's not clear if Firefox will ever act to change the default (or definitively reject the idea, for that matter). These days there is a related preference specifically for HTTP login forms, signon.autofillForms.http, which defaults to false (off) now (bug #1217152).

(There's also a separate 'signon.autofillForms.autocompleteOff' preference, which is described in bug #1531135 Add a pref to not autofill in password fields with an autocomplete field name of "off".)

I'm glad that Firefox picked up an explicit Preferences option for this, since it means I don't have to remember (or hunt down) the name of the about:config setting the way I used to back in the day. Admittedly I don't need to do it very often, but I do set up a new Firefox instance or profile every so often and this is one of the changes I want to make.
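One low-effort way to carry the setting over to a new profile is to put the preference into user.js in the profile directory, which Firefox reads at startup; a minimal sketch, with a placeholder profile path:

# Append the pref to user.js in the (placeholder) profile directory.
echo 'user_pref("signon.autofillForms", false);' >> ~/.mozilla/firefox/XXXXXXXX.default-release/user.js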

(Until I did my research for this entry I didn't know about the Preferences option, so I was mostly writing it to capture the name of the about:config setting for future use.)

FirefoxDisableAutofill written at 23:31:55; Add Comment

2022-12-22

The power of URLs you can use with query parameters and an HTTP GET request

I recently wrote about some aspects of my dmenu setup, including using a custom $PATH that contains a bunch of little utility scripts. These little scripts are an important part of making my dmenu setup so useful and a crucial building block of my environment, but in turn a lot of them are enabled by something else. In practice, a lot of what I do with dmenu is to open a Firefox window on some sort of URL, and in turn this relies on being able to create URLs that do useful things, like perform a web search.

Some of those URLs can be readily formed because they have a predictable structure, such as packages in the Python standard library or packages in the Go standard library. If you know the name of a standard package in either language, it's easy to create the URL for it, which in turn makes it easy to write a little script that will open a new Firefox window showing a particular package's documentation (especially if you don't bother with error checking if the package actually exists).
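As an illustration of how little such a script needs to be, here's a minimal sketch for the Python case; the script name is made up, there's deliberately no error checking, and it assumes 'firefox' is on your $PATH:

#!/bin/sh
# pydoc-web: open a Python standard library module's documentation in a new
# Firefox window, eg 'pydoc-web os'. A nonexistent module just gets a 404.
exec firefox -new-window "https://docs.python.org/3/library/$1.html"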

However, not all things in the world can reasonably be addressed with plain URLs, for various reasons. For other things, it's both useful and powerful to be able to address them using URLs that have query parameters attached. URLs with query parameters aren't the only way to pass such extra information around (for example, you could use HTML forms and pass them as form parameters in a POST), but they have the great virtue that it's easy to open browser windows on the results, since from the browser's perspective they're just another URL to make a GET request on.

(Many other tools also have a much easier time with GET requests than with other forms of requests.)
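A web search script is the query parameter version of the same idea. The sketch below assumes you have jq around to do the URL-encoding (anything else that percent-encodes a string works just as well) and uses DuckDuckGo's search URL as the example:

#!/bin/sh
# websearch: do a web search on the arguments by opening a Firefox window
# on a GET URL with the terms passed as a query parameter.
q=$(printf '%s' "$*" | jq -sRr @uri)
exec firefox -new-window "https://duckduckgo.com/?q=$q"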

What this says to me is that if you're setting up a local web application, if at all possible you should make it accessible either through URL patterns (in the simple case) or through URLs with query parameters (in the more complex pattern of search). This is the traditional web application implementation of searching or looking things up, but it's not necessarily universal and sometimes it can feel like more work (your framework might encourage use of POST with forms, for example, and make GET based usage more difficult; the Django framework is a bit like this).

(There are even public websites that are exceptions and want you to use POST, although sometimes you can make GET queries with query parameters anyway.)

This is in a sense obvious, but it's the first time I've put it all together in my head and realized how important support for GET query parameters is for my collection of little dmenu scripts (both for web searches and to do things like search some of our local documentation).

URLPowerOfQueryParameters written at 22:44:03; Add Comment

2022-12-04

How to lose some of your tabs in Firefox 107+ (and possibly earlier)

Recently, over on the Fediverse I said:

In Firefox 107 (and possibly earlier), if you convert a browser window into a tab in another window, quit Firefox, and restart, you lose the window-to-tab tab. It doesn't even make it into the session store. This is a potential 'data' loss, in that you can lose URLs that you wanted to read/etc.

(This is filed as bug #1801952.)

If you quit and restart Firefox normally, you have to have 'Open previous windows and tabs' turned on in Preferences in order to see this; I don't know how common that is (although it's something that I think of as essential for Firefox usage). If your Firefox just crashes (or is abruptly shut down when your session ends), this can (well, will) happen when you restart and Firefox tries to recover your aborted session.

This comes up for me because I default to opening URLs in new windows instead of in tabs, but periodically I open a burst of new windows that actually should be grouped together, so I dock all but one of those windows as tabs in the first window. People who default to tabs probably won't see this, since it's likely to be rare that you use a new window, never mind dock a window back into a tab.

You can still 'dock' URLs from windows into tabs, by opening a new tab and then pasting the URL from the window into the tab. This does preserve the new tab in Firefox's session history, although you only get the URL and not the history of the window it came from (if that history is important).

(Dumping a modern Firefox session store to inspect it or verify issues like this is somewhat involved. See for example this. I have my own tool for it, but it's a hacked up thing.)
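For the record, the session store files are LZ4-compressed JSON with a small Mozilla-specific header (the 8-byte 'mozLz40\0' magic) in front, so one hedged way to dump one is a couple of lines of Python, assuming you have the third-party 'lz4' package installed and adjusting the placeholder profile path:

python3 -c '
import sys, lz4.block
data = open(sys.argv[1], "rb").read()
# Skip the 8-byte mozLz4 magic; the LZ4 block (with its size prefix) follows.
print(lz4.block.decompress(data[8:]).decode("utf-8"))
' ~/.mozilla/firefox/XXXXXXXX.default-release/sessionstore-backups/recovery.jsonlz4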

Many Firefox issues are relatively easy to track down to a specific change using the excellent mozregression tool, and I normally do that. Unfortunately, mozregression normally expects you to be able to tell it if a given revision is good or bad with a single run of Firefox, and in this case it takes either two runs or manual inspection of the saved session state in whatever profile directory mozregression is using. So far I haven't been energetic enough to slog through the extra work that would be required to find the specific commit.

Firefox107HowToLoseTabs written at 22:49:03; Add Comment

2022-12-02

Apache 2.4's event MPM and oddities with ServerLimit

I mentioned recently that we were hopefully going to move away from the Apache prefork MPM when we upgraded our primary web server from Ubuntu 18.04 to Ubuntu 22.04. We had tried to switch over to the event MPM back in 18.04, but had run into problems and had reverted to the prefork MPM as a quick fix. Specifically, we had run into Apache stopping serving requests and reporting the following in the error log:

[mpm_event:error] [pid <nnn>:tid <large number>] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.

(The message really doesn't have a space between the two sentences. Maybe someday it will.)

As covered in the MaxRequestWorkers documentation, the starting or minimum number of server processes you need is MRW divided by your ThreadsPerChild value. If you set MRW to 300 and TPC to 25 (as we do), this is 12 server processes. However, the ServerLimit documentation also says:

With event, increase this directive if the process number defined by your MaxRequestWorkers and ThreadsPerChild settings, plus the number of gracefully shutting down processes, is more than 16 server processes (default).

How many gracefully shutting down processes are you likely to have? Who knows, although it may be (strongly) influenced by your setting for MaxConnectionsPerChild, if you have one. This is one possible source of our problem with the event MPM on Ubuntu 18.04, although at the time we also found a suggestive Apache bug and generally didn't trust the event MPM enough to keep trying with it.

Yesterday, when we upgraded our central web server to Ubuntu 22.04 and an Apache configuration that uses the event MPM, we got an unpleasant surprise. Our problem from 18.04 came back overnight, and in fact the error message I quoted above comes from our 22.04 error.log. This time around we haven't reverted back to the prefork MPM; instead we're trying various things to make the event MPM work.

One thing we're trying is that, well, maybe the event MPM doesn't have a bug here, it just has more processes that are shutting down gracefully than we expect. So we've raised the ServerLimit from the default of 16 to 32. The other thing we're trying is that we've turned off our use of Apache's mod_qos. Although there have been other reasons for using it in the past, today we use it to deal with our file serving problem, where we have a few sets of large files that are requested in bulk by often slow clients. One of the reasons we wanted to switch to the event MPM is that it should handle these much better than the prefork MPM (which must use a process for each of them). If our theory is correct, we can afford to operate without the ratelimits from mod_qos.

(If our theory is wrong, our alerts are going to let us know the next time these files are unusually popular, as opposed to their regular 25 Mbyte/second level of popularity.)
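For reference, here's a sketch of the relevant mpm_event settings discussed above; this is not our exact configuration, just the directives involved (on Ubuntu these normally live in mods-available/mpm_event.conf):

<IfModule mpm_event_module>
    ThreadsPerChild          25
    MaxRequestWorkers       300
    # Raised from the default of 16 to leave room for gracefully
    # shutting down processes.
    ServerLimit              32
</IfModule>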

ApacheEventMPMAndServerLimit written at 22:55:15; Add Comment

2022-11-21

Using curl to test alternate (test) servers for a web site

One of the perpetual issues in system administration is that we have a new version of some web site of ours to test, for example because we're upgrading the server's operating system from Ubuntu 18.04 to 22.04. In many cases this is a problem because the web server's configuration wants to be 'example.org' but you've installed it on 'test.internal' because 'example.org' points to your current production server. Two traditional approaches to this are to modify your local /etc/hosts (or equivalent) to claim that 'example.org' has the IP address of 'test.internal', or to change the web server's Apache (or nginx or etc) configuration so that it believes it's 'test.internal' (or some suitable name) instead of 'example.org'.

As I learned today, curl has an option to support this sort of mismatch between the server's official name and where it actually is. Actually it has more than one of them, but let's start with --resolve:

curl --resolve example.org:443:<IP of server> https://example.org/

As covered in the curl manual page, the --resolve option changes the IP address associated with a given host and port combination. For HTTP requests, this affects both the HTTP Host header and the TLS SNI (for HTTPS connections). You can give multiple --resolve options if you want to, and it takes wildcards so you can do slightly crazy things like:

curl -L --resolve *:443:<server-IP> https://example.org/redir

As mentioned by Eric Nygren, this can also be used to test an IPv6 IP before you publish it in DNS (or an alternate IPv4 IP).

Curl also has the --connect-to option, which is potentially more powerful although somewhat more verbose. It has two advantages over --resolve that I can see: it will take a host name instead of an IP address, and it lets you change the port (which you might need to do in order to talk to some backend server). You can wildcard everything, although in a different syntax than with --resolve, so our two examples are:

curl --connect-to example.org:443:test.internal: https://example.org/
curl -L --connect-to :443:test.internal: https://example.org/redir

You can also omit the original port, for example if you want to test HTTP to HTTPS redirection on your new test server:

curl -L --connect-to example.org::test.internal: http://example.org/

Having learned the distinction, I'll probably mostly use --connect-to because while it's slightly longer and more complicated, it's also more convenient to be able to use the test server's hostname instead of having to keep looking up its IP.
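A related convenience is that curl's verbose output will show which host it actually connected to and the TLS certificate it was handed, which is a quick way to double check that you really reached the test server; the exact messages vary a bit between curl versions:

curl -v --connect-to example.org:443:test.internal: https://example.org/ -o /dev/null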

For more reading, there's Daniel Stenberg's curl another host, which also covers merely changing the Host: header.

As far as I know, curl has no option to specifically change the TLS SNI by itself, although possibly you could achieve the same effect by artfully combining --resolve (or --connect-to) and explicitly setting the Host: header. Probably there's no case where you'd want to do this (or Apache would let you do it). You can always use curl's --insecure option to ignore TLS certificate errors.

CurlTestingAlternateServer written at 21:52:58; Add Comment
