Wandering Thoughts

2021-03-05

How not to use Apache's RewriteRule directive in a reverse proxy

Recently we needed to set up a reverse proxy (to one of our user-run web servers) that supported WebSockets for a socket.io based user application. Modern versions of Apache have a mod_proxy_wstunnel module for this, and you can find various Apache configuration instructions for how to use it in places like Stack Overflow. The other day I shot my foot off by not following those instructions exactly.

What I wrote was a configuration stanza that looked like this:

RewriteCond %{REQUEST_URI} ^/socket.io [NC]
RewriteCond %{QUERY_STRING} transport=websocket [NC]
# This is where my mistake is:
RewriteRule (.*) "ws://ourhost:port/$1" [P,L]

During some debugging I discovered that this was causing our main Apache server to make requests to the backend server that looked like 'GET //socket.io/.... HTTP/1.1'. The user's application was very unhappy with the leading double slash, as well it might be.

This is my old friend, how not to use ProxyPass, back in another form. The problem is that we aren't lining up the leading slash of the original path with the trailing slash of the proxy target; we're taking the entire path of the request (with its leading /) and tacking it on after another slash. The correct version of the RewriteRule, as the Apache documentation will show you, is:

RewriteRule ^/?(.*) "ws://ourhost:port/$1" [P,L]

In my example the '?' in the regular expression pattern is unnecessary since this rewrite rule can't trigger unless the request has a leading slash, but the mod_proxy_wstunnel version doesn't require such a match in its rewrite conditions. On the other hand, I'm not sure I want to enable 'GET socket.io/...' to actually work; all paths in GET requests should start with a slash.
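
Putting the pieces together, the working stanza ends up looking something like this. This is a sketch rather than our exact configuration: 'ourhost:port' stands in for the real backend, and it assumes mod_rewrite and mod_proxy_wstunnel are already loaded.

# Reverse proxy socket.io WebSocket upgrade requests to the backend.
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/socket.io [NC]
RewriteCond %{QUERY_STRING} transport=websocket [NC]
# Strip any leading slash before re-adding it in the target URL.
RewriteRule ^/?(.*) "ws://ourhost:port/$1" [P,L]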

PS: This is a ws: reverse proxy instead of a wss: reverse proxy because we don't support TLS certificates for people's user-run web servers (they would be quite difficult to provide and manage). The virtual host that is reverse proxied to a user-run web server can support HTTPS, and the communication between the main web server and the user-run web server happens over our secure server room network.

Sidebar: How I think I made this mistake

We initially tried to get this person's reverse proxied environment working inside a <Location> block for their personal home page on our main server, where the public path was something like '/~user/thing/'. In this situation I believe that what would have been the extra leading slash had already been removed by Apache's general matching, and so the first pattern would have worked. For various reasons we then shifted them over to a dedicated virtual host with no <Location> block, and so the '(.*)' pattern was suddenly scooping up the leading / after all.

web/ApacheProxyRewriteRule written at 23:57:44; Add Comment

2021-03-04

Systemd needs (or could use) a linter for unit files

Today I made a discovery:

Today's Fedora packaging failure: /usr/lib/systemd/system/lttng-session.service (from lttng-tools) is a PGP key, not a systemd .service unit. (Insert joke about people once again not managing to use PGP properly)

Yes, bug filed: #1935426

I discovered this because I was watching 'journalctl -f' while I upgraded my office workstation to Fedora 33 in my usual way. The upgrade process causes systemd to re-examine your unit files and complain about things. Most of the complaints were normal ones like:

mcelog.service:8: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether
xinetd.service:10: PIDFile= references a path below legacy directory /var/run/, updating /var/run/xinetd.pid → /run/xinetd.pid; please update the unit file accordingly.

But mixed in with those complaints I noticed the much more unusual:

lttng-sessiond.service:1: Assignment outside of section. Ignoring.
lttng-sessiond.service:3: Assignment outside of section. Ignoring.
[.. for a bunch more lines ..]

That made me curious what was in the file, whereupon I discovered that it was actually a PGP public key instead of a systemd unit file.

We can laugh at this mistake because it's funny in several different ways (given that it involves PGP for extra spice). But it's actually pointing out a systematic problem, one that is also illustrated by those other messages about other systemd units, which is that there's no easy way to check your unit files to see if systemd is happy with them. In other words, there is no linter for systemd unit files.

If there were a linter, none of these problems would be there, or at least any that were still present would be ones that Fedora (or any other Linux distribution) had decided were actually okay. With a linter, Linux distributions could make it a standard packaging rule (in whatever packaging system they use) that all systemd units in a package had to pass the linter; this would have automatically detected the lttng-tools problem, probably among others. Without a linter, the only way to detect systemd unit problems is to install and enable the units and see if systemd complains. This is not something that's easy to automate, especially during package builds, and so it's fallible and limited.

(Because of that it invites people to file bugs for things that may not be bugs. Are these issues with the PIDFile location actual oversights in the packaging or an area where Fedora's standard doesn't line up with the systemd upstream? I can't tell.)

An automatically applied linter would be especially useful for the less frequently used packages and programs, where an issue has a much easier time lurking undetected. Probably not very many people have lttng-tools even installed on Fedora, and clearly not very many of them use things that require the lttng-sessiond service.
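
As a sketch of how little it would take to catch the grossest failures, here's a toy check in Python (not a real linter, and not how systemd actually parses units) that just verifies a unit file is at least INI-shaped; it would have flagged the lttng-sessiond.service PGP key on its first line:

#!/usr/bin/python3
# Toy check: flag 'unit files' that have content before any [Section]
# header, which mirrors systemd's "Assignment outside of section" complaint.
import sys

def looks_like_unit(path):
    in_section = False
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, 1):
            line = line.strip()
            if not line or line.startswith(("#", ";")):
                continue            # blank lines and comments are fine anywhere
            if line.startswith("[") and line.endswith("]"):
                in_section = True   # a [Unit], [Service], etc header
            elif not in_section:
                return False, lineno
    return True, None

for path in sys.argv[1:]:
    ok, lineno = looks_like_unit(path)
    if not ok:
        print("%s:%d: content before any [Section] header" % (path, lineno))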

PS: This isn't the only area where systemd's standards have changed and some systemd bit now complains. For example, systemd-tmpfiles complains about various things that want to clean up bits in /var/run.

linux/SystemdUnitLinterNeed written at 23:53:08; Add Comment

2021-03-03

What signal an RJ-45 serial connection is (probably) missing

We have a central (serial) console server, which means that we have a bunch of serial lines running from the serial consoles of servers to various Digi Etherlites. However, we do not use actual serial cables for this. Probably like most places that deal with a lot of serial lines, we actually use Ethernet cables. On one end they plug directly into the Etherlites; on the other end we use what are generally called 'DB9 to RJ45 modular adapters'. All of this is routine and I basically don't think about it, so until I recently read Taking This Serially it didn't occur to me to wonder what pin was left out in this process.

As Taking This Serially covers, DB-9 (really DE-9) has nine pins, all of them used for serial signals:

  1. Data carrier detect (DCD)
  2. Rx
  3. Tx
  4. Data terminal ready (DTR)
  5. Ground
  6. Data set ready (DSR)
  7. Request to send (RTS)
  8. Clear to send (CTS)
  9. Ring indicator

Ethernet cables and RJ-45 connectors give you at most eight wires to use, so at least one of the nine DB-9 serial signals must be dropped (or otherwise altered). If you consult online descriptions of typical DB-9 to RJ-45 wiring, you will wind up at the obvious answer: RJ-45 serial connections typically drop the Ring Indicator.

The actual situation is rather more complicated. There are multiple different ways to connect the eight RJ-45 wires to the nine DB-9 pins, and only some of the ones covered on Wikipedia drop the Ring Indicator. One, EIA/TIA-561, combines it with DSR. Or at least that's how EIA/TIA-561 is described on Wikipedia and in some sources; others don't mention DSR at all. And for the signals that are passed through, each different scheme has its own set of pin assignments for what signal is on what RJ-45 pin. Or your hardware may be different from all of the options listed on Wikipedia, as Digi's online documentation suggests is the case for our Etherlites.

(Having looked this up, I now understand why we always buy the 'wire your own' DB-9 to RJ-45 modular connectors instead of the prewired ones. As always, the problem with serial connections is that there are so many different standards.)

tech/RJ45SerialMissingSignal written at 19:03:32; Add Comment

2021-03-02

The Python ctypes security issue and Python 2

In the middle of February, the Python developers revealed that Python had been affected by a buffer overflow security issue, CVE-2021-3177. The relatively full details are covered in 'ctypes: Buffer overflow in PyCArg_repr', and conveniently the original bug report has a very simple reproduction that can also serve to test your Python to see if it's fixed:

$ python2
Python 2.7.17 (default, Feb 25 2021, 14:02:55) 
>>> from ctypes import *
>>> c_double.from_param(1e300)
*** buffer overflow detected ***: python2 terminated
Aborted (core dumped)

(A fixed version will report '<cparam 'd' (1e+300)>' here.)

The official bug report only covers Python 3, because Python 2.7 is not supported any more, but as you can see here the bug is present in Python 2 as well (this is the Ubuntu 18.04 version, which is unfixed for reasons).
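
If you want to check interpreters non-interactively (for example, across a bunch of machines), a small wrapper along these lines works. This is a sketch that simply runs the reproduction in a subprocess; it assumes the interpreter to test is named on the command line, defaulting to 'python2':

#!/usr/bin/python3
# Report whether an interpreter aborts on the CVE-2021-3177 reproduction.
# A fixed interpreter prints something like "<cparam 'd' (1e+300)>" and
# exits 0; a vulnerable one dies from the buffer overflow check (SIGABRT).
import subprocess
import sys

REPRO = "from ctypes import c_double; print(c_double.from_param(1e300))"

def vulnerable(interpreter):
    proc = subprocess.run([interpreter, "-c", REPRO],
                          capture_output=True, text=True)
    return proc.returncode != 0

if __name__ == "__main__":
    interp = sys.argv[1] if len(sys.argv) > 1 else "python2"
    print("%s: %s" % (interp,
                      "VULNERABLE" if vulnerable(interp) else "looks fixed"))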

I'm on record as saying that it was very unlikely for security issues to be discovered in Python 2 after this long. Regardless of how significant this issue is in practice, I was and am wrong. A buffer overflow had lurked in the standard Python library, including in Python 2, and was only discovered after official support for Python 2 had stopped. There have been other recent security issues in Python 3, per Python security vulnerabilities, and some of them may also apply to Python 2 and be significant for you.

(Linux distributions are still fixing issues like this in Python 2. Well, more or less. Ubuntu hasn't worked out a successful fix for 18.04 and hasn't even tried one for 20.04, but Fedora has fixed the issue.)

This CVE is not an issue for our Python 2 code, where we don't use ctypes. But it does make me somewhat more concerned about our remaining Python 2 programs, for the simple reason that I was wrong about one of my beliefs about Python 2 after its end of support. To use a metaphor, what I thought was a strong, well-inspected pillar has turned out to have some previously unnoticed cracks of a sort that matter, even if they've not yet been spotted in an area that's load-bearing for us. Also, now I should clearly be keeping an eye on Python security issues and testing new ones (if possible) to see if they apply to Python 2. If they do, we'll need to explicitly consider what programs of ours might be affected.

(The answer is often likely to be 'no programs are affected', but we can no longer take for granted that the issues are not serious and don't affect Python 2 or us.)

As far as the severity of this issue goes, on the one hand buffer overruns are quite bad, but on the other hand this is in what is a relatively obscure corner of Python for most people. This is not the sort of Python security issue that would let people break ordinary Python 2 programs (and I still think that those are very unlikely by now). But I'm a bit biased here, since we're not going to drop everything and port all of our remaining Python 2 programs to Python 3 right now (well, not unless we absolutely have to).

(People's views of the severity may vary; these are just mine.)

PS: To be explicit, this issue has not changed my view that it's reasonable (and not irresponsible) to continue running Python 2 programs and code. This is not a great sign for people who use ctypes, but it's not a fatal vulnerability or a major problem sign.

python/CTypesSecurityIssue written at 22:13:17; Add Comment

2021-03-01

Link: Taking This Serially

j. b. crawford's Taking This Serially (via) is about the history of serial connections and RS-232, including the various connectors used and the complications that ensued with actually connecting things serially. I have a certain amount of history with serial connections, so I found this interesting.

(Also, looking at the article's pinouts for DE9/DB9 serial connectors made me realize something, but that's another blog entry.)

links/TakingThisSerially written at 23:53:16; Add Comment

The balance of power between distributions and software authors

In a comment on my entry on how modern software controls dependencies because it helps software authors, Peter Donis raised a good and interesting point:

I understand the issue you are describing for software authors, but as a software user, if bundling static dependencies instead of using shared libraries means security vulnerabilities in software I'm using take longer to get fixed, then I'll just have to stop using that software altogether. And for the vast majority of software I use, I don't get it directly from the author; I get it from my distro, so what matters to me is how easily the distro can fix issues and push updates to me.

And the distro maintainers know that, and they know that users like me are depending on the distros to get those vulnerabilities fixed and push updates in a timely manner; so if particular software programs make it hard for them to do that, they have an incentive to stop packaging those programs. It seems to me that, if push comes to shove, the distros have the upper hand here. [...]

My view is that there is a somewhat complicated balance of power involved between software authors and distributions, all starting with the people who are actually using the software. This balance of power used to lie strongly with distributions, but various aspects of modern software have shifted it toward software authors.

As a person using software, I want to, well, use the software, with as little hassle and as few security issues, bugs, and other problems as possible. Like most people I'm broadly indifferent to how this is done, because my goal is to use the software. However, how much aggravation people will put up with to use software depends on how important it is to them, and I'm no exception to this.

(This balance is individual, not universal; different people and organizations have different needs and priorities.)

If the software is not all that critical or important to me, my threshold for aggravation is low. In fact if the aggravation level is too high I may not even use the software. Here the distribution mostly holds the balance of power, as in Peter Donis' comment, because obtaining software through the distribution is often much easier (although that's changing). If the distribution refuses to package a particular piece of software and it's not that important, I may well do without; if it only packages an old version, that version can be good enough. Software authors may need to play along with what the distributions want or be left out in the cold.

But if the software is important or critical to us, our threshold of aggravation is very high. Doing without or living with bugs fixed upstream in more recent versions becomes less and less of an option or is outright off the table. If we can use a distribution version we generally will, because that's less work, but if we have to we will abandon the distribution and work directly with the upstream version in one way or another. Here the software author holds the balance of power. The distributions can either go along with how the software author is working or be bypassed entirely.

(As an illustration of this power, we install the upstream versions of Prometheus and rspamd instead of the Ubuntu versions. And we use Grafana despite it not being packaged in Ubuntu 18.04.)

This means that one of the factors in the balance of power is what sort of software it is, which goes a long way to determining how critical it is to people. The authors of a utility program usually have a lot less leverage than the authors of important services. Authors of important services are also much more exposed to people's anger if distributions make mistakes, as they do every so often, which gives these software authors some biases.

Modern software ecologies like Rust, Python, NPM, and Go have shifted this balance of power in an important way, especially for things like utility programs, because all of them provide a simple way for people to install a program straight from its authors, bypassing the distribution. I don't think that the ecologies are deliberately trying to make distributions obsolete, but they do have a strong motivation to remove the distributions as gatekeepers. No distribution will ever package all of the software that's written in any language, so the easier the language makes it to install software outside of a distribution, the more software written in it will be used and spread.

(These ecologies do somewhat less well at letting you easily keep the programs you've installed up to date, but I suspect that that will come with time. Also, pragmatically a lot of people don't really worry about that for a lot of programs, especially utility programs.)

Sidebar: The historical balance of power shift

In the old days, it was often difficult (and time consuming) to either compile your own version of something written in C (or C++) or obtain a working binary from upstream. For building, there was no standard build system, you often had to pick a bunch of install-time options, and you might need to do this for a whole collection of dependencies (which you would need to install somewhere the main program's build could find them). For binaries, you had to deal with shared library versions and ABI issues. Distributions did all of this work for you, and it was hard work.

Modern software environments like Rust, Go, NPM, and Python change both the binary and the build-your-own side of this. Rust and Go statically link everything by default and produce binaries that are often universal across, e.g., Linux on 64-bit x86, making it basically trouble free to get the upstream one. For building your own, all of them have single commands that fetch and build most programs for you, including the right versions of all of their dependencies.

tech/SoftwareAndDistroPower written at 23:01:21; Add Comment

2021-02-28

Dot-separated DNS name components aren't even necessarily subdomains, illustrated

I recently wrote an entry about my pragmatic sysadmin view on subdomains and DNS zones. At the end of the entry I mentioned that we had a case where we had DNS name components that didn't create what I thought of as a subdomain, in the form of the hostnames we assign for the IPMIs of our servers. These names are in the form '<host>.ipmi.core.sandbox' (in one of our internal sandboxes), but I said that 'ipmi.core.sandbox' is neither a separate DNS zone nor something that I consider a subdomain.

There's only one problem with this description: it's wrong. It's been so long since I actually dealt with an IPMI hostname that I mis-remembered our naming scheme for them, which I discovered when I needed to poke at one by hand the other day. Our actual IPMI naming scheme puts the 'ipmi' bit first, giving us host names of the form 'ipmi.<host>.core.sandbox' (as before, this is the name of the IPMI for <host>; the host itself doesn't have an interface on the core.sandbox subnet).

What this naming scheme gives us is middle name components that clearly don't create subdomains in any meaningful sense. If we have host1, host2, and host3 with IPMIs, we get the following IPMI names:

ipmi.host1.core.sandbox
ipmi.host2.core.sandbox
ipmi.host3.core.sandbox

It's pretty obviously silly to talk about 'host1.core.sandbox' being a subdomain, much more so than 'ipmi.core.sandbox' in my first IPMI naming scheme. These names could as well be 'ipmi-<host>'; we just picked a dot instead of a dash as a separator, and dot has special meaning in host names. The 'ipmi.core.sandbox' version would at least create a namespace in core.sandbox for IPMIs, while this version has no single namespace for them, instead scattering the names all over.

(The technicality here is DNS resolver search paths. You could use 'host1.core.sandbox' as a DNS search path, although it would be silly.)
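
In the actual DNS data, these IPMI names are just ordinary address records; as a sketch with invented addresses (and assuming relative names under core.sandbox), the relevant lines would look like:

; In the core.sandbox data, each IPMI is simply another A record.
ipmi.host1    IN A    10.27.0.11
ipmi.host2    IN A    10.27.0.12
ipmi.host3    IN A    10.27.0.13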

PS: Tony Finch also wrote about "What is a subdomain?" in an entry that's worth reading, especially for historical and general context.

sysadmin/SubdomainsAndDNSZonesII written at 22:39:58; Add Comment

2021-02-27

My Firefox addons as of Firefox 86 (and the current development version)

I was recently reminded that my most recent entry on what Firefox addons I use is now a bit over a year old. Firefox has had 14 releases since then, and it feels like the start of January 2020 was an entirely different age, but my Firefox addons have barely changed in the year and a bit since that entry. Since they have changed a small amount, I'll repeat the whole list just so I have it in one spot for the next time around.

My core addons, things that I consider more or less essential for my experience of Firefox, are:

  • Foxy Gestures (Github) is probably still the best gestures extension for me for modern versions of Firefox (but I can't say for sure, because I no longer investigate alternatives).

    (I use some custom gestures in my Foxy Gestures configuration that go with some custom hacks to my Firefox to add support for things like 'view page in no style' as part of the WebExtensions API.)

  • uBlock Origin (Github) is my standard 'block ads and other bad stuff' extension, and also what I use for selectively removing annoying elements of pages (like floating headers and footers).

  • uMatrix (Github) is my primary tool for blocking Javascript and cookies. uBlock Origin could handle the Javascript, but not really the cookies as far as I know, and in any case uMatrix gives me finer control over Javascript, which I think is a better fit with how the web does Javascript today.

  • Cookie AutoDelete (Github) deals with the small issue that uMatrix doesn't actually block cookies, it just doesn't hand them back to websites. This is probably what you want in uMatrix's model of the world (see my entry on this for more details), but I don't want a clutter of cookies lingering around, so I use Cookie AutoDelete to get rid of them under controlled circumstances.

    (However unaesthetic it is, I think that the combination of uMatrix and Cookie AutoDelete is necessary to deal with cookies on the modern web. You need something to patrol around and delete any cookies that people have somehow managed to sneak in.)

  • Stylus (Github) has become necessary for me after Google changed their non-Javascript search results page to basically be their Javascript search results without Javascript, instead of the much nicer and more useful old version. I use Stylus to stop search results escaping off the right side of my browser window.

Additional fairly important addons that would change my experience if they weren't there:

  • Textern (Github) gives me the ability to edit textareas in a real editor. I use it all the time when writing comments here on Wandering Thoughts, but not as much as I expected in other places, partly because increasingly people want you to write things with all of the text of a paragraph run together in one line. Textern only works on Unix (or maybe just Linux) and setting it up takes a bit of work because of how it starts an editor (see this entry), but it works pretty smoothly for me.

    (I've changed its key sequence to Ctrl+Alt+E, because the original Ctrl+Shift+E no longer works great on Linux Firefox; see issue #30. Textern itself shifted to Ctrl+Shift+D in recent versions.)

  • Cookie Quick Manager (Github) allows me to inspect, manipulate, save, and reload cookies and sets of cookies. This is kind of handy every so often, especially saving and reloading cookies.

The remaining addons I use I consider useful or nice, but not all that important on the large scale of things. I could lose them without entirely noticing the difference in my Firefox:

  • Open in Browser (Github) allows me to (sometimes) override Firefox's decision to save files so that I see them in the browser instead. I mostly use this for some PDFs and some text files. Sadly its UI isn't as good and smooth as it was in pre-Quantum Firefox.

    (I think my use of Open in Browser is fading away. Most PDFs and other things naturally open in the browser these days, perhaps because web sites have gotten grumpy feedback over forcing you to download them.)

  • Certainly Something (Github) is my TLS certificate viewer of choice. I occasionally want to know the information it shows me, especially for our own sites. The current Firefox certificate information display is almost as good as Certainly Something, but it's much less convenient to get to.

  • HTTP/2 Indicator (Github) does what it says; it provides a little indicator as to whether HTTP/2 was active for the top-level page.

  • ClearURLs (GitLab) is my current replacement for Link Cleaner after the latter stopped being updated. It cleans various tracking elements from URLs, like those 'utm_*' query parameters that you see in various places. These things are a plague on the web so I'm glad to do my little bit to get rid of them.

  • HTTPS Everywhere, basically just because. But in a web world where more and more sites are moving to using things like HSTS, I'm not sure HTTPS Everywhere is all that important any more.

As I've done for a long time now, I actually use the latest beta versions of uBlock Origin and uMatrix. I didn't have any specific reason for switching to them way back when; I think I wanted to give back a bit by theoretically testing beta versions. In practice I've never noticed any problems or issues.

I have some Firefox profiles that are for when I want to use Javascript (they actually use the official Mozilla Linux Firefox release these days, which I recently updated to Firefox 86). In these profiles, I also use Decentraleyes (also), which is a local CDN emulation so that less of my traffic is visible to CDN operators. I don't use it in my main Firefox because I'm not certain how it interacts with my setup of blocking (most) Javascript, and also because much of what's fetched from CDNs is Javascript, which obviously isn't applicable to me.

(There are somewhat scary directions in the Decentraleyes wiki on making it work with uMatrix. I opted to skip them entirely.)

web/Firefox86Addons written at 23:27:03; Add Comment

My pragmatic sysadmin view on subdomains and DNS zones

Over on Twitter, Julia Evans had an interesting poll and comment:

computer language poll: is mail.google.com a subdomain of google.com? (not a trick question, no wrong answers, please don't argue about it in the replies, I'm just curious what different people think the word "subdomain" means :) )

the ambiguity here is that mail.google.com doesn't have its own NS/SOA record. An example of a subdomain that does have those things is alpha.canada.ca -- it has a different authoritative DNS server than canada.ca does.

This question is interesting to me because I had a completely different view of it than Julia Evans did. For me, NS and SOA DNS records are secondary things when thinking about subdomains, down at the level of the mechanical plumbing that you sometimes need. This may surprise people, so let me provide a quite vivid local example of why I say that.

Our network layout has a bunch of internal subnets using RFC 1918 private IP address space, probably like a lot of other places. We call these 'sandbox' networks, and generally each research group has one, plus there are various other ones for our internal use. All of these sandboxes have host names under an internal pseudo-TLD, .sandbox (yes, I know, this is not safe given the explosion in new TLDs). Each different sandbox has a subdomain in .sandbox and then its machines go in that subdomain, so we have machines with names like sadat.core.sandbox and lw-staff.printers.sandbox.

However, none of these subdomains are DNS zones with their own SOA and NS records. Instead we bundle all of the sandboxes together into one super-sized .sandbox zone that has everything. One of the reasons for this is that we do all of the DNS for all of these sandbox subdomains, so all of those hypothetical NS and SOA records would just point to ourselves (and possibly add pointless extra DNS queries to uncached lookups).
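
To make this concrete, here's a stripped-down sketch of the idea (with an invented nameserver name and invented RFC 1918 addresses): a single 'sandbox.' zone whose data directly contains all of the per-sandbox names, with no delegation anywhere.

$ORIGIN sandbox.
$TTL 1h
@                  IN SOA  ns.sandbox. hostmaster.sandbox. (
                           2021030101 1h 15m 2w 1h )
                   IN NS   ns.sandbox.
ns                 IN A    10.0.0.2
sadat.core         IN A    10.1.1.10
lw-staff.printers  IN A    10.2.2.20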

I think most system administrators would consider these sandbox subdomains to be real subdomains. They are different namespaces (including for DNS search domains), they're operated by different groups with different naming policies, we update them separately (each sandbox has its own DNS file), and so on. But at the mechanical level of DNS zones, they're not separate zones.

But this still leaves a question about mail.google.com: is it a subdomain or a host? For people outside of Google, this is where things get subjective. A (DNS) name like 'www.google.com' definitely feels like a host, partly because in practice it's unlikely that people would ever have a <something>.www.google.com. But mail.google.com could quite plausibly someday have names under it as <what>.mail.google.com, even if it doesn't today. So to me it feels more like a subdomain even if it's only being used as a host today.

(People inside Google probably have a much clearer view of what mail.google.com is, conceptually. Although even those views can drift over time. And something can be both a host and a subdomain at once.)

Because what I consider a subdomain depends on how I think about it, we have some even odder cases where we have (DNS) name components that I don't think of as subdomains, just as part of the names of a group of hosts. One example is our IPMIs for machines, which we typically call names like '<host>.ipmi.core.sandbox' (for the IPMI of <host>). In the DNS files, this is listed as '<host>.ipmi' in the core.sandbox file, and I don't think of 'ipmi.core.sandbox' as a subdomain. The DNS name could as well be '<host>-ipmi' or 'ipmi-<host>', but I happen to think that '<host>.ipmi' looks nicer.

(What is now our IPMI network is an interesting case of historical evolution, but that's a story for another entry.)

sysadmin/SubdomainsAndDNSZones written at 00:59:34; Add Comment

2021-02-25

The HTTP Referer header is fading away (at least as a useful thing)

The HTTP Referer header on requests is famously misspelled (it should be Referrer), and also famously not liked because of privacy and security concerns. The privacy and security concerns are especially strong with external ('cross-origin') Referers, which are also the ones that many people find most useful, because they tell you where visitors to your pages are coming from and let you find places where people have linked to you or are mentioning you.

I've been reading my Referer logs for essentially as long as Wandering Thoughts has existed, and over the years (and especially lately) it's become clear to me that the Referer header is fading away. Fewer requests have Referer headers, and many of the values that are there aren't all that useful (at least to me). Some of this is the general issue of social media and web applications, where most everything from a place like Twitter either has 'https://twitter.com/' (if the person came from web twitter) or nothing (if they came from a Twitter client). Others seem to be specific choices made by sites. For example, a lot of search engines now arrange things so that the Referer you see is their main URL and doesn't have any information on what the person searched for that led them to your pages.

(Probably an increasing number of people are also using browser extensions that block or spoof Referer, although I don't know if this is common.)

Referer is clearly going to fade away even more in the future. The push started with the Referrer-Policy header, which gave web server operators a simple way to mostly control the outbound Referer from any links on their web pages (without having to touch the HTML). Now browsers are in the process of moving to a stricter default behavior, called 'strict-origin-when-cross-origin'; this sends only the origin (the website), omitting the path and the query string. A switch to this default would make almost all websites behave the way that Twitter and some search engines do (although for different reasons).
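
For web server operators, setting a policy is close to a one-liner. For example, in Apache with mod_headers enabled, a site that wanted to explicitly adopt the coming default could use something like this (a sketch, not necessarily what we do):

# Send only the origin for cross-origin requests made from our pages.
Header always set Referrer-Policy "strict-origin-when-cross-origin"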

In theory web sites could set a Referrer-Policy to revert back to the current state of affairs once the browser default changes. In practice most websites will never change the default (and many of the ones that do might make it stricter, likely going to 'same-origin'). And someday browsers will probably ratchet privacy and security one step further, so that by default they only send Referer headers to the same site.

When that happens, I'll definitely miss the old days when I could see where my visitors were coming from. Discovering various articles and people who've linked to my entries has definitely made me happy and sometimes given me valuable information and feedback. But the arrow of time on the web points to more privacy, and there's not much we can do about that; the bad actors vastly outnumber the good ones.

web/FadingHTTPReferer written at 23:35:33; Add Comment
