Some unusual and puzzling bad requests for my CSS stylesheet

February 2, 2020

Anyone who has looked at their web server's error logs knows that there's some weird stuff out there (as well as the straightforward bad stuff). Looking at the error logs for Wandering Thoughts recently turned up some people who apparently have unusual ideas of how to parse HTML to determine the URL for my CSS stylesheet.

Wandering Thoughts has what I think of as a standard <link> element in its <head> for my CSS stylesheet:

<link href="/~cks/dwiki/dwiki.css" rel="stylesheet" type="text/css"> 

Pretty much every browser in existence will parse this and request my CSS. What I saw in the error logs was this:

File does not exist: <path>/dwiki.css" rel="stylesheet" type="text

This certainly looks like the clients making this request took the entire contents of the <link> from the first quote to the very last one and decided it was the actual URL.

There's nothing particularly bad about what the sources of these badly parsed requests seem to have been doing; they seem to be reading entries here, at reasonable volumes (sometimes only one entry). Some but not all of them request the web server's favicon.

I'm puzzled about what the underlying source of these requests could be. I'm pretty sure that it's common to have CSS <link>s where the href to the stylesheet is not the last (quoted) attribute, so any browser or browser-like thing that mis-parsed <link>s this way would fail to work on a wide variety of sites, not just mine. The obvious suspicion is that whatever is making the request doesn't actually care about the CSS and doesn't use it, which would make the bad parsing and the subsequent request failure unimportant. But as mentioned, the IPs making these requests don't show any signs of being up to anything bad.
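For contrast, here is a minimal sketch of how a client that uses a real HTML parser would find the stylesheet URL no matter where the href sits among the attributes. It uses Python's standard html.parser module; the class and function names (StylesheetFinder, stylesheet_hrefs) are made up for illustration, and this is not a claim about what any of these clients actually run.

```python
from html.parser import HTMLParser


class StylesheetFinder(HTMLParser):
    """Collect href values from <link rel="stylesheet"> elements.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # attrs is a list of (name, value) pairs; order does not matter here.
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "stylesheet" in rels and a.get("href"):
            self.hrefs.append(a["href"])


def stylesheet_hrefs(html):
    """Return the stylesheet URLs found in a chunk of HTML."""
    finder = StylesheetFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


if __name__ == "__main__":
    # href first, as on Wandering Thoughts.
    print(stylesheet_hrefs(
        '<link href="/~cks/dwiki/dwiki.css" rel="stylesheet" type="text/css">'))
    # href last, as on many other sites (a made-up URL).
    print(stylesheet_hrefs(
        '<link rel="stylesheet" type="text/css" href="/x.css">'))
```

Because the parser hands over attributes as name and value pairs, the position of href in the element makes no difference to the result.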

The good news (if these are real people with real browsers) is that Wandering Thoughts mostly doesn't depend on its CSS stylesheet. A completely unstyled version looks almost the same as the usual one (which is also good for people reading entries in a syndication feed reader).

Sidebar: A little more detail on the sources

I saw these requests from several IPs (although at different activity levels); at least one of the IPs was a residential cablemodem IP. They had several different user-agents, including at least:

Mozilla/5.0 (Linux; Android 8.1.0; Mi A2 Build/OPM1.171019.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91 Mobile Safari/537.36

Mozilla/5.0 (Linux; Android 9; SM-N960F Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/76.0.3809.132 Mobile Safari/537.36

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/61.0.3163.98 Safari/537.36

(Of course all of this could be coming from a real browser that just cloaks its user-agent string to foil fingerprinting and tracking.)

I haven't tried to trawl my server logs to see if these particular user agents show up somewhere else.


Comments on this page:

By Ricky at 2020-02-03 10:07:33:

It's a stab in the dark, but maybe something doesn't like the tilde for some reason? Is there some language where a tilde introduces some sort of special mode in string parsing? The requested path isn't actually the entire remainder of the "link" tag, as it cuts off at the slash in "text/css", which is why I think it might be some sort of parsing issue...

By SteveB at 2020-02-04 06:29:35:

Could this be the result of a script trying to parse HTML with REs?

They try parsing the link tag with something naive like

   /<link href="(.*)">/

and maybe it works for some simple cases, so they start using it to read the whole intarwebs.
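The guess above is easy to try out. The following is a rough Python sketch that applies the pattern from this comment to the <link> element from the entry, next to a hypothetical variant that stops at the first closing quote. The helper name extract and the second pattern are assumptions added for illustration; nothing here shows what the real clients do.

```python
import re

# The <link> element quoted in the entry.
LINK = '<link href="/~cks/dwiki/dwiki.css" rel="stylesheet" type="text/css">'

# Pattern from the comment: (.*) is greedy, so it runs to the final '">'.
naive = re.compile(r'<link href="(.*)">')

# Hypothetical alternative: stop at the first closing double quote.
first_quote = re.compile(r'<link href="([^"]*)"')


def extract(pattern, html):
    """Return the first captured group, or None if there is no match."""
    m = pattern.search(html)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(extract(naive, LINK))
    print(extract(first_quote, LINK))
```

The greedy pattern captures everything from the start of the URL through type="text/css, which matches the logged path up to the final slash (the log line itself stops at "text"). The second pattern captures only /~cks/dwiki/dwiki.css, but it still only works when href is the first attribute.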

Written on 02 February 2020.

Last modified: Sun Feb 2 02:09:46 2020