Wandering Thoughts

2021-03-05

How not to use Apache's RewriteRule directive in a reverse proxy

Recently we needed to set up a reverse proxy (to one of our user-run web servers) that supported WebSockets for a socket.io-based user application. Modern versions of Apache have a mod_proxy_wstunnel module for this, and you can find various Apache configuration instructions for how to use it in places like Stack Overflow answers. The other day I shot my foot off by not following those instructions exactly.

What I wrote was a configuration stanza that looked like this:

RewriteCond %{REQUEST_URI} ^/socket.io [NC]
RewriteCond %{QUERY_STRING} transport=websocket [NC]
# This is where my mistake is:
RewriteRule (.*) "ws://ourhost:port/$1" [P,L]

During some debugging I discovered that this was causing our main Apache server to make requests to the backend server that looked like 'GET //socket.io/.... HTTP/1.1'. The user's application was very unhappy with the leading double slash, as well it might be.

This is my old friend, 'how not to use ProxyPass', back in another form. The problem is that we aren't matching up the leading slashes between the original path and the proxied path; we're taking the entire path of the request (with its leading /) and putting it after another slash. The correct version of the RewriteRule, as the Apache documentation will show you, is:

RewriteRule ^/?(.*) "ws://ourhost:port/$1" [P,L]

In my example the '?' in the regular expression pattern is unnecessary, since this rewrite rule can't trigger unless the request has a leading slash (my first RewriteCond requires one), but the documented mod_proxy_wstunnel version doesn't require such a match in its rewrite conditions. On the other hand, I'm not sure I want to enable 'GET socket.io/...' to actually work; all paths in GET requests should start with a slash.
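
For completeness, here's a sketch of the sort of full stanza this winds up as part of. The backend host and port are placeholders just as in the snippets above, and it assumes mod_rewrite, mod_proxy, mod_proxy_http, and mod_proxy_wstunnel are all loaded:

RewriteEngine On
# Proxy socket.io WebSocket upgrade requests to the backend over ws:
RewriteCond %{REQUEST_URI} ^/socket.io [NC]
RewriteCond %{QUERY_STRING} transport=websocket [NC]
RewriteRule ^/?(.*) "ws://ourhost:port/$1" [P,L]

# Everything else is reverse proxied as ordinary HTTP.
ProxyPass "/" "http://ourhost:port/"
ProxyPassReverse "/" "http://ourhost:port/"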

PS: This is a ws: reverse proxy instead of a wss: reverse proxy because we don't support TLS certificates for people running user-run web servers (they would be quite difficult to provide and manage). The virtual host that is reverse proxied to a user-run web server can support HTTPS, and the communication between the main web server and the user-run web server happens over our secure server room network.

Sidebar: How I think I made this mistake

We initially tried to get this person's reverse-proxied environment working inside a <Location> block for their personal home page on our main server, where the public path was something like '/~user/thing/'. In this situation I believe that what would be the extra leading slash has already been removed by Apache's general matching, and so the first pattern would have worked. For various reasons we then shifted them over to a dedicated virtual host with no <Location> block, and so suddenly the '(.*)' pattern was scooping up the leading / after all.

ApacheProxyRewriteRule written at 23:57:44; Add Comment

2021-02-27

My Firefox addons as of Firefox 86 (and the current development version)

I was recently reminded that my most recent entry on what Firefox addons I use is now a bit over a year old. Firefox has had 14 releases since then, and it feels like the start of January 2020 was an entirely different age, but my Firefox addons have barely changed in the year and a bit since that entry. Since they have changed a very small amount, I'll repeat the whole list just so I have it in one spot for the next time around.

My core addons, things that I consider more or less essential for my experience of Firefox, are:

  • Foxy Gestures (Github) is probably still the best gestures extension for me for modern versions of Firefox (but I can't say for sure, because I no longer investigate alternatives).

    (I use some custom gestures in my Foxy Gestures configuration that go with some custom hacks to my Firefox to add support for things like 'view page in no style' as part of the WebExtensions API.)

  • uBlock Origin (Github) is my standard 'block ads and other bad stuff' extension, and also what I use for selectively removing annoying elements of pages (like floating headers and footers).

  • uMatrix (Github) is my primary tool for blocking Javascript and cookies. uBlock Origin could handle the Javascript, but not really the cookies as far as I know, and in any case uMatrix gives me finer control over Javascript which I think is a better fit with how the web does Javascript today.

  • Cookie AutoDelete (Github) deals with the small issue that uMatrix doesn't actually block cookies, it just doesn't hand them back to websites. This is probably what you want in uMatrix's model of the world (see my entry on this for more details), but I don't want a clutter of cookies lingering around, so I use Cookie AutoDelete to get rid of them under controlled circumstances.

    (However unaesthetic it is, I think that the combination of uMatrix and Cookie AutoDelete is necessary to deal with cookies on the modern web. You need something to patrol around and delete any cookies that people have somehow managed to sneak in.)

  • Stylus (Github) has become necessary for me after Google changed their non-Javascript search results page to basically be their Javascript search results without Javascript, instead of the much nicer and more useful old version. I use Stylus to stop search results escaping off the right side of my browser window.

Additional fairly important addons that would change my experience if they weren't there:

  • Textern (Github) gives me the ability to edit textareas in a real editor. I use it all the time when writing comments here on Wandering Thoughts, but not as much as I expected on other places, partly because increasingly people want you to write things with all of the text of a paragraph run together in one line. Textern only works on Unix (or maybe just Linux) and setting it up takes a bit of work because of how it starts an editor (see this entry), but it works pretty smoothly for me.

    (I've changed its key sequence to Ctrl+Alt+E, because the original Ctrl+Shift+E no longer works great on Linux Firefox; see issue #30. Textern itself shifted to Ctrl+Shift+D in recent versions.)

  • Cookie Quick Manager (Github) allows me to inspect, manipulate, save, and reload cookies and sets of cookies. This is kind of handy every so often, especially saving and reloading cookies.

The remaining addons I use I consider useful or nice, but not all that important on the large scale of things. I could lose them without entirely noticing the difference in my Firefox:

  • Open in Browser (Github) allows me to (sometimes) override Firefox's decision to save files so that I see them in the browser instead. I mostly use this for some PDFs and some text files. Sadly its UI isn't as good and smooth as it was in pre-Quantum Firefox.

    (I think my use of Open in Browser is fading away. Most PDFs and other things naturally open in the browser these days, perhaps because web sites have gotten grumpy feedback over forcing you to download them.)

  • Certainly Something (Github) is my TLS certificate viewer of choice. I occasionally want to know the information it shows me, especially for our own sites. The current Firefox certificate information display is almost as good as Certainly Something, but it's much less convenient to get to.

  • HTTP/2 Indicator (Github) does what it says; it provides a little indicator as to whether HTTP/2 was active for the top-level page.

  • ClearURLs (GitLab) is my current replacement for Link Cleaner after the latter stopped being updated. It cleans various tracking elements from URLs, like those 'utm_*' query parameters that you see in various places. These things are a plague on the web so I'm glad to do my little bit to get rid of them.

  • HTTPS Everywhere, basically just because. But in a web world where more and more sites are moving to using things like HSTS, I'm not sure HTTPS Everywhere is all that important any more.

As I've done for a long time now, I actually use the latest beta versions of uBlock Origin and uMatrix. I didn't have any specific reason for switching to them way back when; I think I wanted to give back a bit by theoretically testing beta versions. In practice I've never noticed any problems or issues.

I have some Firefox profiles that are for when I want to use Javascript (they actually use the official Mozilla Linux Firefox release these days, which I recently updated to Firefox 86). In these profiles, I also use Decentraleyes (also), which is a local CDN emulation so that less of my traffic is visible to CDN operators. I don't use it in my main Firefox because I'm not certain how it interacts with my setup of blocking (most) Javascript, and also much of what's fetched from CDNs is Javascript, which obviously isn't applicable to me.

(There are somewhat scary directions in the Decentraleyes wiki on making it work with uMatrix. I opted to skip them entirely.)

Firefox86Addons written at 23:27:03; Add Comment

2021-02-25

The HTTP Referer header is fading away (at least as a useful thing)

The HTTP Referer header on requests is famously misspelled (it should be Referrer), and also famously not liked because of privacy and security concerns. The privacy and security concerns are especially strong with external ('cross-origin') Referers, which are also the ones that many people find most useful, because they tell you where visitors to your pages are coming from and let you find places where people have linked to you or are mentioning you.

I've been reading my Referer logs for essentially as long as Wandering Thoughts has existed, and over the years (and especially lately) it's become clear to me that the Referer header is fading away. Fewer requests have Referer headers, and many of the values that are there aren't all that useful (at least to me). Some of this is the general issue of social media and web applications, where most everything from a place like Twitter either has 'https://twitter.com/' (if the person came from web twitter) or nothing (if they came from a Twitter client). Others seem to be specific choices made by sites. For example, a lot of search engines now arrange things so that the Referer you see is their main URL and doesn't have any information on what the person searched for that led them to your pages.

(Probably an increasing number of people are also using browser extensions that block or spoof Referer, although I don't know if this is common.)

Referer is clearly going to fade away more in the future. This effort started with the Referrer-Policy header, which gave web server operators a simple way to mostly control the outbound Referer from any links on their web pages (without having to touch the HTML). Now the browsers are in the process of moving to a stricter default behavior, called 'strict-origin-when-cross-origin'; for cross-origin requests, this sends only the origin (the website), omitting the path and the query string. A switch to this default would make almost all websites behave the way that Twitter and some search engines do (although for different reasons).
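
As a side note, Referrer-Policy is just a response header, so a site can pick a policy explicitly instead of relying on browser defaults. A minimal Apache sketch (assuming mod_headers is enabled) would be:

Header always set Referrer-Policy "strict-origin-when-cross-origin"

The old de facto browser default behaved like 'no-referrer-when-downgrade', which sends the full URL except when going from an HTTPS page to an HTTP one.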

In theory web sites could set a Referrer-Policy to revert back to the current state of affairs once the browser default changes. In practice most websites will never change the default (and many of the ones that do might make it stricter, likely going to 'same-origin'). And someday browsers will probably ratchet privacy and security one step further, so that by default they only send Referer headers to the same site.

When that happens, I'll definitely miss the old days when I could see where my visitors were coming from. Discovering various articles and people who've linked to my entries has definitely made me happy and sometimes given me valuable information and feedback. But the arrow of time on the web points to more privacy, and there's not much we can do about that; the bad actors vastly outnumber the good ones.

FadingHTTPReferer written at 23:35:33; Add Comment

2021-02-17

When browsers (or at least Firefox) send HTTP Basic Authentication headers

We're long-term fans of using HTTP Basic Authentication in Apache, but while I know how to configure it in Apache (and even how to log out of it in Firefox), I haven't really looked into some of the finer details of how it works. In particular, until recently I hadn't looked into when the browser (or at least Firefox) sends the Authorization header in HTTP(S) requests and when it doesn't.

The simple story of how HTTP Basic Authentication (also) works is that when your browser requests a URL protected by Basic Authentication, Apache will answer with an HTTP 401 status and some additional headers. If your browser has relevant credentials cached, it will re-issue the HTTP request with an Authorization header added. If your browser doesn't have the credentials, it will prompt you for login information (in a process that's recently been improved) and then re-issue the request.
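
In concrete terms, the initial exchange looks something like this (headers heavily trimmed; the hostname, realm, and credentials are made up, and the base64 blob is just 'user:secret' encoded):

GET /grafana/ HTTP/1.1
Host: mon.example.org

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Monitoring"

GET /grafana/ HTTP/1.1
Host: mon.example.org
Authorization: Basic dXNlcjpzZWNyZXQ=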

Of course this simple story would be rather bad for responsiveness, since it implies that the browser would make two HTTP requests for every URL protected by HTTP Basic Authentication (one without any authorization, which would get a 401, and then a retry with authorization). So browsers don't do that. Instead to some degree they treat the Authorization header like a cookie and preemptively send it along for at least some requests to your website. The question I was curious about was how broadly Firefox did that. Unfortunately for us, the answer is that Firefox doesn't send the Authorization header very broadly.

(This is an appropriate choice for security, of course.)

As far as I can tell from some simple experimentation, Firefox will preemptively send Authorization for any URL under a directory on your site where it's been challenged for HTTP Basic Authentication before (in the same Basic Authentication realm and so on). It won't preemptively send Authorization outside of the hierarchy under those directories. That's kind of abstract, so here's a concrete example.

Suppose I have a website with URLs (among others) of:

/grafana/
/grafana/d/overview/
/grafana/d/pingstatus/
/grafana/d/downhosts/
/alertmanager/
/statics/

All of these URLs other than the /statics/ hierarchy are protected by the same HTTP Basic Authentication, configured once for /grafana/ and everything underneath it and once for /alertmanager/ (and everything underneath it).
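
In Apache terms, that configuration is along these lines (a minimal sketch; the realm name and password file path are made up, and the real configuration also has the actual reverse proxying and so on in it):

<Location "/grafana/">
    AuthType Basic
    AuthName "Monitoring"
    AuthBasicProvider file
    AuthUserFile "/etc/apache2/monitoring-htpasswd"
    Require valid-user
</Location>

<Location "/alertmanager/">
    AuthType Basic
    AuthName "Monitoring"
    AuthBasicProvider file
    AuthUserFile "/etc/apache2/monitoring-htpasswd"
    Require valid-user
</Location>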

If I request the /grafana/d/overview/ dashboard in a clean session, I will get a 401 and then have to authenticate. If I then request the /grafana/d/pingstatus/ dashboard, Firefox will not preemptively send Authorization, because it's not in or under the first URL; instead it will get a 401 and then re-send the request. If I go to /grafana/ (the top level) Firefox will get a 401 again, but now if I go on to /grafana/d/downhosts/, Firefox will preemptively send Authorization because it's under a URL that Firefox has been challenged on.

(If /grafana/d/overview was a page instead of a directory, requesting /grafana/d/pingstatus afterward would preemptively send an Authorization header, because they would both be under the /grafana/d/ directory.)

If I request /alertmanager/ or /statics/ after all of this, my Firefox won't send a preemptive Authorization because both of them are outside of /grafana/. Requesting /alertmanager/ without authentication will get a 401 and Firefox will resend the request with the Authorization header, but Firefox will never request /statics/ with an Authorization header. The /statics/ URL is outside of all HTTP Basic authentication directories and the web server itself will never reply with a 401 to trigger Firefox's sending of Authorization.

(If you want to think of it in cookie terms, I believe this is what would happen if the web server set a cookie with a Path= of the initial URL directory and then could add more paths as you clicked around the site.)

In HTTP, the server's HTTP 401 reply to the browser contains no (reliable) information that the browser can use to determine what URL hierarchy is covered by authentication. The HTTP server has no way of telling the browser 'this challenge is for all of /grafana/' (even though Apache knows that); it just gives 401s for all of those URLs when Firefox sends requests without an Authorization header. Eventually Firefox hopefully learns all of the URLs that need Basic Authentication that you (and Grafana) are actually using.

BasicAuthWhenSent written at 00:16:24; Add Comment

2021-02-07

Strict SameSite web cookie policies probably don't do much for us

I recently read The great SameSite confusion (via). To summarize badly, this article is about how it's easy to misunderstand exactly what the SameSite cookie attribute does, due to a distinction that web security draws between the ideas of 'origin' and 'site'. The simple version of the difference is that 'origin' means the exact website (the same scheme, host, and port), while 'site' simply means the registrable domain name (which is determined based on the public suffix list). For a lot of organizations this distinction may be relatively small. However, here at the University of Toronto the distinction is very big. As a consequence, the effects of a strong SameSite policy (whether set explicitly or through browser defaults) are relatively modest for our websites.

The University of Toronto mostly uses the domain 'utoronto.ca'. However, we have a great many organizational units (faculties, departments, groups, and so on), and these mostly have websites with names that are either direct sub-names of utoronto.ca (such as 'utcc.utoronto.ca') or names in subdomains, such as 'www.utsc.utoronto.ca'. All of these are very different websites, run by very different groups with very different security policies and so on, but they're all the same 'site' as far as SameSite is concerned. As a result, even the strictest SameSite policy won't prevent cookies from leaking from one of these websites to another.

Fortunately, all is not lost for our (potential) desire to keep from leaking cookies between our different websites. Cookies have always had a basic limitation on what hosts they're sent to, as covered in the relevant MDN section. If someone with a website here sets a cookie without a Domain attribute, it's a host-only cookie and won't be sent to any other website at all, not even their own sub-domains. To leak a cookie to all University of Toronto websites, you would have to explicitly set 'Domain=utoronto.ca'. Well, you or the framework you're using would have to do that, since people often don't manually set cookies. Hopefully there aren't many frameworks that default their cookie domains to the 'site' (as the web defines it).
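
As a hypothetical illustration (the cookie name is made up), a web application on www.utsc.utoronto.ca might set either of:

Set-Cookie: appsession=x; Path=/; Secure
Set-Cookie: appsession=x; Domain=utoronto.ca; Path=/; Secure

The first is confined to www.utsc.utoronto.ca itself; the second will be offered to every utoronto.ca website, which is almost certainly not what anyone here wants.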

(One corollary of this is that there is a lot of scope for 'same-site' requests in general on University of Toronto websites, for both good and, unfortunately, evil. If an attacker wants to find a place to put some content that will be same-site for an important UofT website, they're probably not going to have much trouble. There are even various subdomain names that look very similar to each other.)

SameSiteCookiesForUs written at 23:45:27; Add Comment

2021-01-18

Where Firefox's text encoding menus are

I recently read Text Encoding Menu in 2021 (via), on Firefox's menu (or menus) for controlling what text encoding a web page or other resource is interpreted in. The menu that Henri Sivonen described puzzled me, because it wasn't the text encoding menu that I'm familiar with and the menu entries listed didn't initially match mine.

Firefox's 'Text Encoding' menu (or menus) lets you select or in some cases alter the text encoding of documents that the browser is displaying. In the past this was an important option; content was actually in all sorts of character sets, it wasn't necessarily served with character set information (or with the right information), browsers didn't necessarily do a good job of guessing or detecting character sets, and so on. These days a lot of content is in UTF-8 or is served with the correct character set information, and browsers have gotten better at detecting character sets for the remaining content, so the menu is much less frequently needed or used (as the article covers).

(For the latter, see Henri Sivonen's chardetng: A More Compact Character Encoding Detector for the Legacy Web, which is honestly fascinating.)

Firefox has two versions of this menu. The first version is the conventional one, accessed through the regular menus at View → Text Encoding. The second version is accessed through an icon that you can add after the URL bar, put in the URL bar overflow menu, or, with some careful dragging, add to the right side of the tab bar. This customization environment is accessed by right-clicking in either the tab bar or in the URL bar outside the actual URL area and picking 'Customize'.

However, both versions are almost always going to be disabled, because changing text encoding is disabled if the server provided character set information and that's almost always the case these days. The View → Text Encoding menu is disabled by greying out the entire menu option. The Text Encoding icon version is disabled by not letting you change whatever the encoding is; you can bring up the menu but all of the other options are greyed out.

(If you are looking at a local file on Linux, the 'server' seems to be your locale setting if the file has no other information, such as being a plain text file or a HTML file with no character set declaration. This makes a certain amount of sense.)

The exact version of the Text Encoding menu that Henri Sivonen describes is only in Firefox Nightly right now, not in Firefox 84 (and it needs a recent version of Nightly). Firefox 84 lacks at least the 'Automatic' entry that Sivonen listed and there may be differences in behavior (I haven't cross-compared the menus or the behavior).

(This is one of those entries that I write because I got confused and dug into this. The perpetually greyed out View → Text Encoding menu was especially puzzling, since I managed to not notice that my Text Encoding icon was equally non-useful even though it would display a menu.)

FirefoxTextEncodingMenus written at 23:15:39; Add Comment

2021-01-17

Password managers automate checking the website address for you

I recently read Terence Eden's That’s not how 2FA works (via). In it, there's an insight I hadn't realized about what password managers do for you:

A password manager stores your passwords. But it also stores the web address of site’s login page. If you visit githud, the password manager won’t prompt you to use the login details for github.

Let's rephrase this: password managers automate checking the website's URL. We all know that we should check the URL to make sure we're really on Github or Twitter or wherever before we enter our login and password, but we don't always do that and even when we try, humans are really bad at seeing the one exception in a thousand normal cases.

(Unless we're very lucky, we see what we expect to see and we expect to see the usual URL and website name. This is true even on browsers that still show the full URL instead of some shortened or abstracted version or a name from the TLS certificate.)

As part of remembering long passwords for you, password managers do the thing that computers are so good at; they automate this check so that it's always done and reliable. They also do it in the best way possible for this sort of security, because it's not an extra check, it's an inherent part of their password lookup. Since it's not an extra check, there's no 'are you sure you want to' option (that people will always say yes to) and it's easy to explain to people why they're not getting to log in to where they expect.
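
To make the idea concrete, here's a minimal sketch in Python of the kind of lookup involved. This is an illustration only, not how any real password manager is implemented, and the vault contents and the lookalike domain are made up:

from urllib.parse import urlsplit

# A toy 'vault', keyed by the exact origin the credentials were saved for.
vault = {
    "https://github.com": ("myuser", "long-random-password"),
}

def credentials_for(url):
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        origin += f":{parts.port}"
    # A lookalike such as https://githud.example simply isn't a key in the
    # vault, so nothing is offered and there's no prompt to click through.
    return vault.get(origin)

credentials_for("https://github.com/login")      # -> the saved credentials
credentials_for("https://githud.example/login")  # -> None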

I hadn't thought of this aspect of password managers before now. I'd always thought of them just as a way to remember my long random passwords, without associating this with verifying the website.

(Well, every so often I got reminded of the website matching side, on the rare occasions that websites changed their login subdomain and procedure so much that my browser's memorized passwords no longer matched and I had to fix it. But websites seem to do that much less these days, perhaps because so many people are using password managers and so get irritated at websites that break them.)

PasswordManagersAlwaysCheck written at 23:10:33; Add Comment

2021-01-03

The modern web and (alleged) software stagnation in the past few decades

I was recently reading The Great Software Stagnation (via), which puts forward a simple thesis:

Software is eating the world. But progress in software technology itself largely stalled around 1996. [...]

I have a number of reactions to that, but one of them is that one specific yet obvious area of software technology has progressed hugely in the last 24 years, or even the last ten, and that is the 'application' web (which these days is not just Javascript but also CSS and HTML features that allow interactivity, animation, and so on). What you can do with the web today is quietly astounding, not just by the web's old standards but by any standard.

Back in 1996, software technology might have allowed you to build a global, high detail map as an application that was delivered on CD-ROM (not DVD, not in 1996). But you definitely wouldn't have been able to have that map on almost any device, and have it updated frequently, and have high resolution satellite views of much of the west included (and probably not all of the map details, either). Nor would you probably have been able to include interactive, highly responsive route planning, including for bicycling.

(If you consider the backend systems for this web application as well, much of the software technology necessary to operate them likely postdates 1996 as well.)

Maps are everyone's go-to example of web application technology, but I have another one that is much closer to home for me. Here in 2021, I can easily deliver to my co-workers (in a very small organization) a whole set of custom monitoring dashboards with custom graphs, information tables, and other visualization displays that I can update frequently and that are available on basically any computer you care to name (this would be our Grafana dashboards). There's an entire ecology of software technologies that enables all of this, and almost none of them existed in 1996 in any meaningful form.

(I will argue that not even Javascript existed in 1996 in meaningful form; the Javascript of 1996 is significantly different from the Javascript of these past five years or so.)

Could you have theoretically done this in 1996? Yes. Could I have practically done this in 1996? No. The web's software technologies have made it possible to build this and the sea change in the viability of the web itself has made it possible to deliver this (including ongoing updates to how the dashboards work, adding new dashboards, and so on).

(There were monitoring dashboards in 1996, and I know the university had some of them, watched by operators in our central machine room. But they were not delivered over the web, and I'm pretty certain they were very expensive enterprise software and much more time consuming (and expensive) to customize and operate than our setup.)

These are not the only web applications that more or less couldn't have existed in 1996 in any form. Even some ostensibly relatively plain websites could not have existed with 1996 software technology even if you gave them 2020 hardware technology, because of their sheer scope. People have been talking to each other over the Internet for a long time (as I'm very familiar with), but Twitter's global scale and activity create a whole new set of problems that require post-1996 software technology to deal with, often in areas that are genuinely new.

(Much of this software technology is less obviously sexy than new languages with new models of programming. But it's also quite sophisticated and represents real progress in the state of the art in things like distributed consensus and large scale data stores.)

In looking at all of this, I'm strongly reminded of another article I read recently, Dan Luu's Against essential and accidental complexity. This basically takes the other side of the 'things have stalled' argument by walking through some drastic changes in programmer productivity over the past several decades. Dan Luu's starting point is roughly 1986 for reasons covered in the article, but many of the changes Luu points to are from after 1996.

PS: Another web-related area that software technology has made huge strides in since 1996 is almost everything related to cryptography. My strong impression is that much of this progress has been driven by the existence of the HTTPS web, due to the web being where most cryptography is used (or more broadly, TLS, which is driven by the web even if it's used beyond it).

WebVsSoftwareStagnation written at 01:38:11; Add Comment

2020-12-14

Chrome is getting its own set of Certificate Authority roots

The news of the semi-recent time interval is that Google has decided that Chrome will have its own program to decide what CA root certificates it trusts, the Chrome Root Program. Up until this decision, Chrome has mostly or entirely used the system root certificate store, at least on Windows and macOS. This has made it the odd browser out among the four remaining major browsers; Firefox uses its own CA root set, while Safari and Microsoft Edge are basically co-developed with their OSes and presumably have major input on what is in the macOS and Windows CA root sets.

(The exception is Chrome on iOS, where Google is forced to use the iOS root store because of how all web browsers have to operate there.)

Google is operating this root program independently of Mozilla, but will apparently be reusing parts of the work Mozilla does in public for Firefox's CA root program. This isn't really surprising; each project can be expected to make decisions that fit its particular circumstances. In practice I would expect major CA root certificates to be in both programs unless something terrible happens, such as a CA needing to be de-trusted.

There is one important but non-obvious consequence of Chrome's shift here. Like Mozilla, the Chrome Root Program specifically requires CAs to report incidents to Google; failure to report can result in removal from the Chrome Root Program and thus your certificates stopping working in Chrome. In the past, CAs might have decided to play fast and loose with Mozilla's reporting requirements, on the grounds that Firefox is a small percentage of the browser market and they could let it slide. Chrome has more influence and power here and so represents a bigger stick.

(Apple and Microsoft probably have reporting requirements, but I suspect they are less hard-assed about it than Mozilla is. I suspect Chrome is going to be as hard-assed as Mozilla is.)

Another possible effect is on Android in the future. My understanding is that these days, Chrome is updated on Android devices independently of the Android OS version. Since Chrome will now pull in its own set of CA roots, it presumably won't have to care about whether or not Android's set of roots is out of date (generally because the device is running an old Android version that doesn't get updates any more). Since we're facing an impending doom of exactly this, I can understand Chrome wanting to mitigate it as fast as possible.

(There's also the delicate issue of not trusting old versions of OS provided TLS libraries to do certificate verification properly in a world of cross signed certificates, multiple certificate chains, and so on. We saw a bunch of problems with that in the AddTrust External CA Root expiry. The more you roll your own TLS certificate verification code, as Chrome is doing, I suspect that the more you want to control CA root certificates and the data you keep about them.)

ChromeOwnCARoots written at 01:38:01; Add Comment

2020-12-06

The deprecation of FTP in browsers and its likely effects on search engines

One of the things going on in web browsers over time is that they're in the process of removing support for FTP; for instance, Firefox once planned to do it this summer, and Chrome may already have removed it. The obvious reason cited by Mozilla and Google for this is that use of ftp: URLs is very uncommon in web browsers (and on the web), and the FTP client implementation is a bunch of old code that must be carried around just for this. Another reason is probably that the web as a whole is increasingly moving to encrypted communications, and even though FTP theoretically supports a TLS-enabled version called FTPS, in practice only a vanishingly small number of FTP sites actually support it.

As a sysadmin and someone who periodically goes digging for old documentation, I have some feelings and worries about this. The direct issue is that browsers are often one of the friendliest interfaces for digging through FTP sites; they offer convenient forward and backward navigation, visual display, and even multiple tabs (or windows). Terminal FTP clients (the general state of the art on Unix) are nowhere near as nice. However, this is the smaller of my concerns.

My larger concern is the issue of finding FTP sites, or finding out that an FTP site has documentation I want. Generally I don't go to an FTP site and start hunting through it; instead, I do an Internet search and discover that some ancient thing on an old FTP site is the only source of what I want. Succeeding in these searches relies on Internet search engines crawling and indexing FTP sites.

The major use of Internet search engines comes from browsers, and search engines are highly motivated to display only results that the browsers can actually use. If a browser can't use FTP URLs, a search engine has a reason to at least lower the priority of those URLs and may want to remove them entirely. As FTP URLs become lower and lower priority and get displayed less and less in results, search engines have fewer and fewer reasons to crawl them at all. And at the end of this process, I can no longer find old documentation on old FTP sites through web searches.

(As FTP sites stop being indexed, accessed, or usable in browsers, people also start running out of reasons to keep them operating. Many of the most valuable FTP sites for me are ones that are historical relics, and apparently survive primarily on benign neglect. Their contents are highly unlikely to be moved to HTTP sites; instead it's more likely that the contents will be discarded entirely.)

I don't expect this to happen imminently. It will probably take years before all of the infrastructure is turned off by some of the players, based on past experience. But I wouldn't be surprised if it's hard to do searches that return FTP URLs within five years, if not sooner.

FTPDeprecationAndSearching written at 23:49:04; Add Comment
