2020-12-06
The deprecation of FTP in browsers and its likely effects on search engines
One of the things going on in web browsers is that they're in the
process of removing support for FTP; for instance, Firefox once
planned to do it this summer and Chrome may already have removed
it. The obvious
reason cited by Mozilla and Google for this is that use of ftp:
URLs is very uncommon in web browsers (and on the web), and the FTP
client implementation is a bunch of old code that must be carried
around just for this. Another reason is probably that the web as a
whole is increasingly moving to encrypted communications, and even
though FTP theoretically has a TLS-enabled version called FTPS, in
practice only a vanishingly small number of FTP sites actually
support it.
As a sysadmin and someone who periodically goes digging for old documentation, I have some feelings and worries about this. The direct issue is that browsers are often one of the friendliest interfaces for digging through FTP sites; they offer convenient forward and backward navigation, visual display, and even multiple tabs (or windows). Terminal FTP clients (the general state of the art on Unix) are nowhere near as nice. However, this is the smaller of my concerns.
My larger concern is the issue of finding FTP sites, or finding that an FTP site has documentation I want. Generally I don't go to an FTP site and start hunting through it; instead, I do an Internet search and discover that some ancient thing on an old FTP site is the only source of what I want. Succeeding in these searches relies on the Internet search engines crawling and indexing FTP sites.
The major use of Internet search engines comes from browsers, and search engines are highly motivated to display only results that the browsers can actually use. If a browser can't use FTP URLs, a search engine has a reason to at least lower the priority of those URLs and may want to remove them entirely. As FTP URLs become lower and lower priority and get displayed less and less in results, search engines have fewer and fewer reasons to crawl them at all. And at the end of this process, I can no longer find old documentation on old FTP sites through web searches.
(As FTP sites stop being indexed, accessed, or usable in browsers, people also start running out of reasons to keep them operating. Many of the most valuable FTP sites for me are ones that are historical relics, and apparently survive primarily on benign neglect. Their contents are highly unlikely to be moved to HTTP sites; instead it's more likely that the contents will be discarded entirely.)
I don't expect this to happen imminently. Based on past experience, it will probably take years before some of the players turn off all of the infrastructure. But I wouldn't be surprised if it's hard to do searches that return FTP URLs within five years, if not sooner.
Linux's hostname -s switch is now safe for many people, but the situation is messy
Slightly over a decade ago I wrote an entry about our discovery
that 'hostname -s' sometimes did DNS lookups, depending on the exact
version involved. We discovered this the hard way, when our DNS
lookups failed at one point and suddenly 'hostname -s' itself started
failing unexpectedly. We recently had a reason to use 'hostname -s'
again, which caused me to remember this old issue and check the
current situation. The good news is that common versions of hostname
now don't do DNS lookups.
Well, probably, because it turns out that the Linux situation with
hostname is much more complicated and tangled than I had any idea of
before I started looking. It appears that there are no fewer than
four sources for hostname, and which version you wind up using
can depend on your Linux. On top of that, the source you're probably
using is distributed in an unusual way that makes it hard for me
to say exactly when its 'hostname -s' became safe. So let's
start with the basics.
If you check with 'rpm -qf /usr/bin/hostname' or 'dpkg -S
/usr/bin/hostname' on appropriate systems (Fedora, CentOS, Debian,
and Ubuntu), you will probably find that the version of hostname
you're using comes from a 'hostname' package. This package has no
upstream as such, and no source repository; the canonical source
seems to be the Debian package. Old versions of
its source can be found in its part of debsources. This version has
handled 'hostname -s' correctly since somewhere between 2.95
(which doesn't) and 3.04 (which does).
(Based on the information shown in its part of debsources, hostname 2.95 was part of Debian 5.0 (Lenny), released in 2009, and hostname 3.04 was part of Debian 6.0 (Squeeze), released in 2011.)
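As a rough illustration of the package check described above, here is a minimal Python sketch that picks which of the two query commands to use. The name owner_query_command and its 'which' parameter are hypothetical, made up for this sketch, and it assumes that finding rpm or dpkg on $PATH tells you which package manager actually owns your files, which won't be true on every system.

```python
import shutil


def owner_query_command(path="/usr/bin/hostname", which=shutil.which):
    """Return the argv for asking which package owns `path`, or None.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # Assumption: an rpm binary on $PATH means an RPM-based system
    # (Fedora, CentOS). A machine with both tools installed would need
    # a smarter check than this.
    if which("rpm"):
        return ["rpm", "-qf", path]
    if which("dpkg"):
        return ["dpkg", "-S", path]
    return None
```

The returned list can be handed to subprocess.run(); a None result means neither tool was found. The path is a parameter rather than a constant because on some systems the binary may be registered under a different path, such as /bin/hostname.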
Arch Linux seems to use a hostname that comes from the GNU inetutils
project. The relevant code currently appears to do a DNS lookup if
you use '-s', but it will proceed if the DNS lookup fails instead of
erroring out (the way the hostname of a decade ago behaved). This
does mean that under some conditions your 'hostname -s' command may
stall for some time while its DNS lookup times out, instead of
running essentially instantly.
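If a script has to call 'hostname -s' on a system where it might stall like this, one defensive option is to run it with a timeout. The following is a minimal Python sketch under that assumption; try_short_hostname, its parameters, and the two second default are all hypothetical choices for illustration, not anything the hostname implementations provide.

```python
import subprocess


def try_short_hostname(cmd=("hostname", "-s"), timeout=2.0):
    """Run `cmd` and return its trimmed output, or None on any failure.

    Failure here covers a missing binary, a non-zero exit status, and
    the command taking longer than `timeout` seconds (for example
    while a DNS lookup times out).
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if this expires
            check=True,       # non-zero exit raises CalledProcessError
        )
    except (OSError, subprocess.SubprocessError):
        return None
    name = result.stdout.strip()
    return name or None
```

Returning None instead of raising leaves it up to the caller to decide what to do when the command is slow or broken, such as falling back to some other source for the name.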
The Linux manpages project has two manpages online for hostname
(1, 2). The default one is from net-tools, and the other one is from
GNU coreutils. The GNU coreutils version has no '-s' option (or
other commonly supported ones), and as a result I would be surprised
if many Linuxes used it. The net-tools version is apparently the
original upstream of the plain hostname package version. Based on
the Fedora 11 bug report about this, a decade ago Fedora was using
the net-tools version of hostname (I don't know about Debian). The
current net-tools version of hostname.c now bypasses DNS lookups
when used with '-s', a change that was made in 2015.
(While Fedora still packages net-tools, their package only has a few of its binaries. And apparently net-tools as a whole may be basically unmaintained; the last commits in the repository seem to be from 2018, and 2016 is when it was last particularly actively developed.)
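If all you want is the short name and you would rather not care which hostname you have, one option is to skip the command entirely. Here is a minimal sketch in Python, assuming that the system's hostname is either already short or a dotted name whose first label is the part you want. The name short_hostname is made up for this sketch, and this is not a claim about what any of the hostname implementations do internally.

```python
import socket


def short_hostname(full_name=None):
    """Return the part of a hostname before the first dot.

    With no argument this uses socket.gethostname(), which asks the
    system for its configured hostname and does not do a DNS lookup.
    """
    name = socket.gethostname() if full_name is None else full_name
    # Everything before the first '.'; a name with no dots is returned
    # unchanged.
    return name.split(".", 1)[0]
```

For example, short_hostname("apps0.cs.example.org") returns "apps0". Because nothing here touches DNS, it keeps working (and keeps being fast) when your DNS resolution is broken.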