More potential problems for people with older browsers
I've written before that keeping your site accessible to very old browsers is non-trivial because of issues like them not necessarily supporting modern TLS. However, there's another problem that people with older browsers are likely to be facing, unless circumstances on the modern web change. I said on the Fediverse:
Today in unfortunate web browser developments: I think people using older versions of browsers, especially Chrome, are going to have increasing problems accessing websites. There are a lot of (bad) crawlers out there forging old Chrome versions, perhaps due to everyone accumulating AI training data, and I think websites are going to be less and less tolerant of them.
(Mine sure is currently, as an experiment.)
(By 'AI' I actually mean LLM.)
I covered some request volume information yesterday, and it (along with things I've seen today) strongly suggests that there is a lot of undercover scraping activity going on. Much of that scraping activity uses older browser User-Agents, often very old ones, which means that people who don't like it are probably going to put more and more roadblocks in the way of anything presenting those old User-Agent values (there are already open source projects designed to frustrate LLM scraping, and there will probably be more in the future).
(Apparently some LLM scrapers start out with honest User-Agents but then switch to faking them if you block their honest versions.)
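To make the 'roadblocks' concrete, here's a minimal sketch of what such a block could look like, written as Go HTTP middleware that rejects requests claiming an old Chrome major version. The cutoff version, the redirect target, and the idea of deciding purely on the version number are illustrative assumptions for this sketch, not what any particular site (including this one) actually does.

```go
// Sketch: reject requests whose User-Agent claims an old Chrome version.
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"strconv"
)

// Matches the claimed Chrome major version in a User-Agent string.
var chromeVer = regexp.MustCompile(`Chrome/(\d+)\.`)

// minChrome is an arbitrary illustrative cutoff; a real site would tune
// this and probably combine it with other signals.
const minChrome = 100

func blockOldChrome(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if m := chromeVer.FindStringSubmatch(r.UserAgent()); m != nil {
			if v, err := strconv.Atoi(m[1]); err == nil && v < minChrome {
				// Redirecting to an explanation page is kinder than a bare 403.
				http.Redirect(w, r, "/old-browser-note", http.StatusTemporaryRedirect)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	http.ListenAndServe(":8080", blockOldChrome(mux))
}
```

The obvious cost of anything along these lines is that it hits real people with genuinely old browsers just as hard as it hits scrapers forging old User-Agents, which is the whole problem.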
There's no particular reason why scraping software can't use current User-Agent values, but that means updating it every so often when new browser versions come out, and so far the people running these scrapers haven't bothered. Much as email anti-spam efforts changed email spammer behavior, this may change if enough websites start reacting to old User-Agents, but I suspect that it will take a while for that to come to pass. Instead I expect the reaction to be a smaller scale, distributed effort from 'unimportant' websites that are getting overwhelmed, like LWN (see the mention of this in their 'what we haven't added' section).
Major websites probably won't outright reject old browsers, but I suspect that they'll start throwing an increasing number of blocks in the way of 'suspicious' browser sessions with those User-Agents. This will probably include CAPTCHAs and other such measures that they already use some of the time. CAPTCHAs aren't particularly effective at stopping bad actors in practice, but they're the hammer that websites already have, so I'm sure they'll be used on this nail.
Another thing that I suspect will start happening is that more sites will start insisting that you run some JavaScript to pass a test in order to access them (whether this is an explicit CAPTCHA or just passive JavaScript that has to execute). This will stop LLM scrapers that don't run JavaScript, which is not all of them, and force the others to spend a certain amount of CPU and memory, driving up the aggregate cost of scraping your site dry. This will of course adversely affect people without JavaScript in their browser and those of us who choose to disable it for most sites, but that will be seen as the lesser evil by people who do this. As with anti-scraper efforts, there are already open source projects for this.
(This is especially likely to happen if LLM scrapers modernize their claimed User-Agent values to be exactly like current browser versions. People are going to find some defense.)
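As a rough illustration of the 'passive JavaScript that has to execute' idea, here is a minimal sketch in Go: the first visit gets a tiny page whose inline JavaScript sets a cookie and reloads, and only requests carrying a valid cookie get the real content. The cookie name, the HMAC scheme, and the absence of any expiry or proof-of-work are all simplifying assumptions made for the sketch; real projects in this space do considerably more work to make the check expensive to fake.

```go
// Sketch: gate content behind a trivial JavaScript-set cookie.
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net"
	"net/http"
)

var secret = []byte("change-me") // illustrative only

// expected derives a per-client token that the server can recompute,
// so no per-client state has to be kept.
func expected(r *http.Request) string {
	host, _, _ := net.SplitHostPort(r.RemoteAddr)
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(host))
	return hex.EncodeToString(mac.Sum(nil))
}

func jsGate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if c, err := r.Cookie("jsgate"); err == nil && c.Value == expected(r) {
			next.ServeHTTP(w, r) // challenge already passed
			return
		}
		// No valid cookie: send JavaScript that sets it and retries. A real
		// implementation would make the value hard to obtain without actually
		// executing the script (for example via proof-of-work).
		w.Header().Set("Content-Type", "text/html")
		fmt.Fprintf(w, `<script>
document.cookie = "jsgate=%s; path=/";
location.reload();
</script><noscript>This site requires JavaScript.</noscript>`, expected(r))
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "actual content")
	})
	http.ListenAndServe(":8080", jsGate(mux))
}
```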
PS: I've belatedly made the Wandering Thoughts blocks for old browsers redirect people to a page about the situation. I've also added a similar page for my current block of most HTTP/1.0 requests.