Wandering Thoughts archives

2020-02-26

The browsers are probably running the TLS show now

The news of the time interval is that Apple is limiting TLS certificate lifetimes to 398 days for certificates issued from September 1st onward (also, also). This effectively bypasses the CA/Browser Forum, where Google put forward a ballot on this in 2019 but couldn't get it passed (also). Specifically, it was voted down by a number of CAs; some CAs voted in favor, as did all browser vendors. Now Apple has decided to demonstrate who has actual power in this ecosystem and has simply put their foot down. What CAs want and how they voted is now irrelevant.

(Since Apple has led the way on this and all browser vendors want to do this, I expect Chrome, Firefox, and probably Microsoft Edge to follow before the end of the year.)

I wouldn't be surprised if other developments in TLS start happening this way (and if it's Apple driving them, because Apple is in some ways in the best political position to do this). At the same time it's worth noting that this is a change from how things used to be (as far as I know). Up until now, browser vendors have generally been fairly careful to build consensus and to push CAs relatively lightly. If browser vendors are now going to be more aggressive about simply forcing CAs to do things, who knows what happens next.

At the same time, shortening the acceptable certificate validity period is the easiest change to force, because everyone can already issue and get shorter-lived certificates. The only way for a CA to not 'comply' with Apple's new policy would be to insist on issuing only long-lived certificates against its customers' wishes, and that's a great way to have those customers pack up and go to someone else. This is fundamentally different from a policy change that would require CAs to actively change their behavior, where the CAs could just refuse to do anything and basically dare the browser vendors to de-trust them all. On the third hand, Google more or less did force a behavior change by increasingly insisting on Certificate Transparency. Maybe we'll see more of that.
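
(As a practical aside, it's easy enough to see where any given site's current certificate falls relative to the new limit. The following is a minimal sketch of mine, not anything from Apple or the CAs; it uses only Python's standard library to fetch a server's leaf certificate and report its issued lifetime in days, with the 398-day threshold hard-coded from Apple's announced policy.)

# A sketch of checking a site's certificate lifetime against the 398-day
# limit, using only the Python standard library.
import socket
import ssl

def cert_lifetime_days(host, port=443):
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    issued = ssl.cert_time_to_seconds(cert["notBefore"])
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    return int((expires - issued) // 86400)

days = cert_lifetime_days("example.org")
print(days, "days:", "within 398" if days <= 398 else "over 398")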

(And in a world with Let's Encrypt, most everyone has an alternative option to commercial CAs. At least right now, it seems unlikely that a browser vendor would try to force a change that LE objected to, partly because LE is now such a dominant CA. Just like browsers, LE is sort of in a position to put its foot down.)

BrowsersRunningTLSNow written at 00:25:31

2020-02-05

The drawback of having a dynamic site with a lot of URLs on today's web

Wandering Thoughts, this blog, is dynamically generated from an underlying set of entries (more or less). As is common with dynamically generated sites, which often have a different URL structure than static sites, it has a lot of automatically generated URLs that provide various ways of viewing and accessing the underlying entries, and of course it creates links to those URLs in various places. Once upon a time this was generally fine and I didn't think much about it. These days my attitude has changed and I'm increasingly thinking about how to reduce the number of these automatically generated links (and perhaps to remove some of the URLs themselves).

The issue is that on the modern web, everything gets visited (even things behind links flagged as nofollow, although I wish otherwise). This includes automatically generated pages that are either pointless duplication or actively useless, for example because they don't have any actual content. If you have a highly dynamic web site that generates a lot of these URLs and links to them, sooner or later you'll waste time generating and serving these pages to robots that don't really care.

All of that sounds nicely abstract, so let me be concrete. Every entry on Wandering Thoughts can potentially have comments, and long ago I decided that I should provide syndication feeds for comments (as well as entries) and then put links to each page's relevant syndication feeds at the bottom of that page (pages that are directories have syndication feeds for both entries and comments). Plenty of my entries don't have any comments and so the comment syndication feed for them is empty, and even for entries that do have comments the feed is almost entirely static (since new comments are rare, especially on old entries). But because there are links to all of these comment syndication feeds, they get periodically fetched by crawlers.

(To put concrete numbers on this, in the past ten days over 2300 different entries here have had their comment syndication feeds retrieved, a number of them repeatedly. Some of the fetching is from web spiders that admit it, some of it may be from people clicking on links out of curiosity, and some of it is certainly from people and software that are cloaking their real activity. Over ten days, this is not gigantic.)
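
(If you want to get this sort of count from your own logs, something like the rough sketch below will do. It's not my actual log analysis, and the '/atomcomments/' path fragment is a made-up stand-in for whatever your comment feed URLs actually look like.)

# A rough sketch of counting distinct comment feed fetches in an
# Apache-style access log; '/atomcomments/' is an invented stand-in
# for the real feed URL scheme.
import re
from collections import Counter

feeds = Counter()
with open("access.log") as log:
    for line in log:
        m = re.search(r'"GET (\S*/atomcomments/\S*) HTTP', line)
        if m:
            feeds[m.group(1)] += 1

print(len(feeds), "distinct comment feeds,", sum(feeds.values()), "fetches")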

Once upon a time I would have had no qualms about exposing all of these comment syndication feed links as the right thing to do, even in the face of this. These days I'm not so sure. I'm not going to make the feeds themselves go away, but I am considering not putting in the links in at least some cases (for example, if there are no comments on the entry). That would at least reduce how many visible links Wandering Thoughts generates, and somewhat reduce the pointless crawling and indexing that people are doing.
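
(The change itself would be simple. Here's a hypothetical sketch, not DWiki's real code, where 'comment_count' and 'comment_feed_url' are invented attribute names; the idea is just to skip generating the link when an entry has no comments, so crawlers never see the URL in the first place.)

# A hypothetical sketch, not DWiki's actual code: only emit an entry's
# comment feed link if the entry has comments.
def comment_feed_link(entry):
    # 'comment_count' and 'comment_feed_url' are invented attribute names.
    if entry.comment_count == 0:
        return ""
    return '<a href="%s">Atom comments feed</a>' % entry.comment_feed_url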

(I've actually been quietly reducing the number of syndication feeds that Wandering Thoughts exposes for some time, but the previous cases have been easy ones. Interested parties can see the 'atomfeed-virt-only-*' settings in DWiki's configuration file.)

ManyURLsModernDrawback written at 22:59:33

2020-02-02

Some unusual and puzzling bad requests for my CSS stylesheet

Anyone who has looked at their web server's error logs knows that there's some weird stuff out there (as well as the straightforward bad stuff). Looking at the error logs for Wandering Thoughts recently turned up some people who apparently have unusual ideas of how to parse HTML to determine the URL for my CSS stylesheet.

Wandering Thoughts has what I think of as a standard <link> element in its <head> for my CSS stylesheet:

<link href="/~cks/dwiki/dwiki.css" rel="stylesheet" type="text/css"> 

Pretty much every browser in existence will parse this and request my CSS. What I saw in the error logs was this:

File does not exist: <path>/dwiki.css" rel="stylesheet" type="text

This certainly looks like the clients making this request took the entire contents of the <link> from the first quote to the very last one and decided it was the actual URL.
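
(You can reproduce this sort of result with a sufficiently careless 'parser'. Here's a small illustration of my guess at the kind of mistake involved, not anything I've seen in actual code, using a greedy regular expression that runs from the first quote after href= to the last quote in the tag.)

# A greedy regexp 'parse' of the <link> tag reproduces the observed bad URL.
import re

tag = '<link href="/~cks/dwiki/dwiki.css" rel="stylesheet" type="text/css">'

greedy = re.search(r'href="(.*)"', tag).group(1)
print(greedy)
# /~cks/dwiki/dwiki.css" rel="stylesheet" type="text/css

careful = re.search(r'href="(.*?)"', tag).group(1)
print(careful)
# /~cks/dwiki/dwiki.css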

There's nothing particularly bad about what the sources of these badly parsed requests have been doing; they appear to be reading entries here at reasonable volumes (sometimes only one entry). Some but not all of them request the web server's favicon.

I'm puzzled about what the underlying source of these requests could be. I'm pretty sure that it's common to have CSS <link>s where the href to the CSS stylesheet is not the last (quoted) attribute, so any browser or browser-like thing that mis-parsed <link>s this way wouldn't work on a wide variety of sites, not just mine. The obvious suspicion is that whatever is making the requests doesn't actually care about the CSS and doesn't use it, making the bad parsing and the subsequent request failure unimportant, but as mentioned the IPs making these requests don't show any signs of being up to anything bad.

The good news (if these are real people with real browsers) is that Wandering Thoughts mostly doesn't depend on its CSS stylesheet. A completely unstyled version looks almost the same as the usual one (which is also good for people reading entries in a syndication feed reader).

Sidebar: A little more detail on the sources

I saw these requests from several IPs (although at different activity levels); at least one of the IPs was a residential cablemodem IP. They had several different user-agents, including at least:

Mozilla/5.0 (Linux; Android 8.1.0; Mi A2 Build/OPM1.171019.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91 Mobile Safari/537.36

Mozilla/5.0 (Linux; Android 9; SM-N960F Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/76.0.3809.132 Mobile Safari/537.36

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/61.0.3163.98 Safari/537.36

(Of course all of this could be coming from a real browser that just cloaks its user-agent string to foil fingerprinting and tracking.)

I haven't tried to trawl my server logs to see if these particular user agents show up somewhere else.

BadlyParsedCSSRequests written at 02:09:46

