Wandering Thoughts archives

2020-12-06

The deprecation of FTP in browsers and its likely effects on search engines

One of the things going on in web browsers is that they're in the process of removing support for FTP; for instance, Firefox once planned to do it this summer and Chrome may already have removed it. The obvious reason cited by Mozilla and Google for this is that use of ftp: URLs is very uncommon in web browsers (and on the web), and the FTP client implementation is a bunch of old code that must be carried around just for this. Another reason is probably that the web as a whole is increasingly moving to encrypted communications, and even though FTP theoretically has a TLS-enabled version called FTPS, in practice only a vanishingly small number of FTP sites actually support it.

As a sysadmin and someone who periodically goes digging for old documentation, I have some feelings and worries about this. The direct issue is that browsers are often one of the friendliest interfaces for digging through FTP sites; they offer convenient forward and backward navigation, visual display, and even multiple tabs (or windows). Terminal FTP clients (the general state of the art on Unix) are nowhere near as nice. However, this is the smaller of my concerns.

My larger concern is the issue of finding FTP sites, or finding that an FTP site has documentation I want. Generally I don't go to an FTP site and start hunting through it; instead, I do an Internet search and discover that some ancient thing on an old FTP site is the only source of what I want. Succeeding in these searches relies on the Internet search engines crawling and indexing FTP sites.

The major use of Internet search engines comes from browsers, and search engines are highly motivated to display only results that the browsers can actually use. If a browser can't use FTP URLs, a search engine has a reason to at least lower the priority of those URLs and may want to remove them entirely. As FTP URLs become lower and lower priority and get displayed less and less in results, search engines have fewer and fewer reasons to crawl them at all. And at the end of this process, I can no longer find old documentation on old FTP sites through web searches.

(As FTP sites stop being indexed, accessed, or usable in browsers, people also start running out of reasons to keep them operating. Many of the most valuable FTP sites for me are ones that are historical relics, and apparently survive primarily on benign neglect. Their contents are highly unlikely to be moved to HTTP sites; instead it's more likely that the contents will be discarded entirely.)

I don't expect this to happen imminently. Based on past experience, it will probably take years before some of the players turn off all of their infrastructure. But I wouldn't be surprised if it's hard to do searches that return FTP URLs within five years, if not sooner.

web/FTPDeprecationAndSearching written at 23:49:04

Linux's hostname -s switch is now safe for many people, but the situation is messy

Slightly over a decade ago I wrote an entry about our discovery that 'hostname -s' sometimes did DNS lookups, depending on the exact version involved. We discovered this the hard way, when our DNS lookups failed at one point and suddenly 'hostname -s' itself started failing unexpectedly. We recently had a reason to use 'hostname -s' again, which caused me to remember this old issue and check the current situation. The good news is that common versions of hostname now don't do DNS lookups.

Well, probably, because it turns out that the Linux situation with hostname is much more complicated and tangled than I had any idea of before I started looking. It appears that there are no fewer than four sources for hostname, and which version you wind up using can depend on your Linux. On top of that, the source you're probably using is distributed in an unusual way that makes it hard for me to say exactly when its 'hostname -s' became safe. So let's start with the basics.

If you check with 'rpm -qf /usr/bin/hostname' or 'dpkg -S /usr/bin/hostname' on appropriate systems (Fedora, CentOS, Debian, and Ubuntu), you will probably find that the version of hostname you're using comes from a 'hostname' package. This package has no upstream as such, and no source repository; the canonical source seems to be the Debian package. Old versions of its source can be found in its part of debsources. This version has handled 'hostname -s' correctly since somewhere between 2.95 (which doesn't) and 3.04 (which does).

(Based on the information shown in its part of debsources, hostname 2.95 was part of Debian 5.0 (Lenny), released in 2009, and hostname 3.04 was part of Debian 6.0 (Squeeze), released in 2011.)
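The package check described above can be wrapped in a small script that tries whichever package manager is present. This is only a sketch: hostname_owner is a made-up helper name, and on some systems the path that 'command -v' reports may not be the path the package database records, in which case it will report an unknown owner.

```shell
# hostname_owner: hypothetical helper that reports which package
# (if any) owns the hostname binary found on $PATH.
hostname_owner() {
    path=$(command -v hostname) || { echo "hostname not found"; return 0; }

    # RPM-based systems (Fedora, CentOS): rpm -qf names the owning package.
    if command -v rpm >/dev/null 2>&1; then
        owner=$(rpm -qf "$path" 2>/dev/null) && { echo "$owner"; return 0; }
    fi

    # dpkg-based systems (Debian, Ubuntu): dpkg -S searches for the path.
    if command -v dpkg >/dev/null 2>&1; then
        owner=$(dpkg -S "$path" 2>/dev/null) && { echo "$owner"; return 0; }
    fi

    echo "unknown owner for $path"
}

hostname_owner
```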

Arch Linux seems to use a hostname that comes from the GNU inetutils project. The relevant code currently appears to do a DNS lookup if you use '-s', but it will proceed if the DNS lookup fails instead of erroring out (the way the hostname of a decade ago behaved). This does mean that under some conditions your 'hostname -s' command may stall for some time while its DNS lookup times out, instead of running essentially instantly.
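If a script only needs the short name and you don't want to depend on which hostname you have, one option is to skip 'hostname -s' and trim the kernel's node name yourself. Here is a minimal sketch in POSIX sh; short_name_of is a hypothetical helper name, and I'm not claiming this is how any particular hostname implementation does it internally.

```shell
# short_name_of: hypothetical helper that prints its argument with
# everything from the first '.' onward removed.
short_name_of() {
    printf '%s\n' "${1%%.*}"
}

# 'uname -n' reports the node name straight from the kernel, so no
# resolver (and thus no DNS lookup) is involved.
short_name_of "$(uname -n)"
```

This assumes the kernel hostname is what you want the short name of. If the machine's hostname is already unqualified, the function returns it unchanged.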

The Linux manpages project has two manpages online for hostname (1, 2). The default one is from net-tools, and the other one is from GNU coreutils. The GNU coreutils version has no '-s' option (or other commonly supported ones), and as a result I would be surprised if many Linuxes used it. The net-tools version is apparently the original upstream of the plain hostname package version. Based on the Fedora 11 bug report about this, a decade ago Fedora was using the net-tools version of hostname (I don't know about Debian). The current net-tools version of hostname.c now bypasses DNS lookups when used with '-s', a change that was made in 2015.

(While Fedora still packages net-tools, their package only has a few of its binaries. And apparently net-tools as a whole may be basically unmaintained; the last commits in the repository seem to be from 2018, and it was 2016 when it was particularly actively developed.)

linux/HostnameSwitchFine written at 00:44:36

