Wandering Thoughts

2023-05-19

The long life of Apache httpd 2.4

Here's something that hadn't struck me until I looked it up for this entry: Apache 2.4 is now more than ten years old. The first 2.4 release was made in early 2012 (per Wikipedia), and despite being that old, 2.4 remains the current version of the Apache web server (with 2.4.57 released in early April of this year, 2023). There is a '2.5' in development which will be released as 2.6, someday, but the current trunk 2.5/2.6 changes are somewhat underwhelming.

This isn't entirely new in the Apache (httpd) project, since the initial releases of 2.2 and 2.4 were a bit over six years apart, but it still feels like we've been using Apache 2.4 forever and are going to keep on using it for the foreseeable future. This isn't particularly a bad thing; for example, I've certainly got a lot of use over the years out of my Apache 2.4 expertise. But it does feel a little bit peculiar that such a core part of the web has stayed so stable for so long (I know that Apache is no longer the trendy web server, but we love it and I think it's still reasonably commonly used).

That Apache 2.4 is having a long life hasn't prevented the Apache project from advancing 2.4's capabilities over time, certainly in Apache modules and I believe in the core as well. One example of an addition to 2.4 is WebSocket support, which Wikipedia says was added in 2.4.5 (a version that was never actually released, so it first shipped in 2.4.6). Another case is HTTP/2, with mod_http2 added in 2.4.17. If Apache can keep evolving this way to support future web changes like HTTP/3 (assuming there's enough demand for them in Apache in the first place), it may be a long time before we see an Apache 2.6.
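
As a concrete illustration (a sketch, not our actual configuration), enabling HTTP/2 on an Apache server that has mod_http2 available comes down to a couple of directives:

# Load mod_http2; on Debian and Ubuntu style layouts this is normally
# done with 'a2enmod http2' rather than a literal LoadModule line.
LoadModule http2_module modules/mod_http2.so

# Prefer HTTP/2 on TLS connections, falling back to HTTP/1.1.
Protocols h2 http/1.1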

I don't think this is a bad thing, but it does seem like an unusual one. In a world where software versions churn on a regular basis, Apache 2.4 stands out as an unusually long-lived thing.

(Of course, version numbers and their meanings are somewhat arbitrary. Apache 2.4.0 from 2012 is not really the same piece of software as Apache 2.4.57 from 2023. Consider the changes in Multi-Processing Modules (MPMs) over 2.4's lifetime so far.)

Apache24LongLife written at 22:14:11

2023-05-12

The modern browser experience has some impressive subtle tricks

My current mouse has additional 'rocker' buttons, which Firefox (and probably other web browsers) map to going forward and back one page, just like the keyboard shortcuts of Alt + Right Arrow and Alt + Left Arrow. Over time I've become completely acclimatized to using them and having them just work, so it took a while for me to consciously notice a surprising situation where the 'back' rocker button worked.

A common UI for certain sorts of websites today is one where images are embedded as thumbnails and pop out to more or less full size in your browser when you click on them. This is the typical experience on Mastodon, for example. Once you've finished viewing the image, you can dismiss it in a variety of ways; hitting Escape is one de facto standard way. Another way that works on many websites is my 'back' rocker button, which feels so natural to use that it took me until recently to realize how odd it is that it works at all.

It would be natural for all forms of 'back one page' to work if popping out images this way took you to a different page, with a change of the URL in the URL bar and so on. But typically it doesn't, with the URL remaining unchanged, yet 'back' works anyway (and usually 'forward' won't work after you go back). So these websites are arranging to intercept the 'back' action or things that invoke it and do special handling, and the special handling comes across as so naturally the right thing to happen that I didn't even think about it until recently.
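
For illustration, here's a minimal sketch of the general pattern (my reconstruction in TypeScript, not any particular site's actual code): opening the image viewer pushes a history entry without changing the visible URL, and a 'popstate' listener turns the browser's 'back' action into closing the viewer.

// All of the names here are made up for the sketch.
let overlay: HTMLDivElement | null = null;

function openViewer(src: string): void {
  overlay = document.createElement("div");
  overlay.style.cssText = "position: fixed; inset: 0; background: rgba(0, 0, 0, 0.8)";
  const img = document.createElement("img");
  img.src = src;
  overlay.appendChild(img);
  document.body.appendChild(overlay);
  // pushState() with the URL argument omitted adds a history entry while
  // leaving the URL bar exactly as it was.
  history.pushState({ imageViewer: true }, "");
}

window.addEventListener("popstate", () => {
  // 'Back' (button, rocker, or Alt + Left Arrow) while the viewer is open
  // just closes it instead of leaving the page.
  if (overlay) {
    overlay.remove();
    overlay = null;
  }
});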

Of course, since this is the website doing it, different websites can have slightly different behavior (including a few websites that actually do change the visible URL). When there are multiple thumbnail images in a single post, it seems to be common to have the arrow keys cycle through them and for the 'back' action to exit this image mode entirely, instead of going to the previous image. On some websites, Alt + Left Arrow doesn't work although the 'back' browser button does.

But despite differences in specific behavior, the whole experience is natural enough that it feels irritating when a website with such pop-out images doesn't support 'back' to go back to what I was doing, and instead forces me to hit Escape or click their little 'close' button on the image. It may be magic but it's good magic.

(I assume that there are reasons this is only rarely implemented with an actual change to the visible URL.)

BrowsersBackImpressiveTricks written at 22:01:02

2023-04-26

Putting the 'User-Agent' in your web crawler's User-Agent

In the "that's not how you do it" category, here are two HTTP User-Agent values that I saw on Wandering Thoughts recently:

User-Agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0

That's right, these User-Agents have 'User-Agent' in them (at the start). This is not exactly a new development in crawler user-agents, since I saw it long ago when Wandering Thoughts was new, and possibly even before then (but if so, I didn't bother writing it down).

One option is that this is people being a little bit unclear on the concept of what should go in a User-Agent string. Another option, brought to mind by the 'User-Agent=' version, is that some crawler software is confusing about how you configure the user-agent it will use, such that people are treating a plain string field as something that needs a key=value form (or maybe a 'label: value' form, for the second user agent). Since a plain string configuration field will generally accept either of these forms, these people's configurations 'work' in the sense that the software runs.

As is traditional with software configuration, the people running the software could also simply be copying examples around via superstition. This would make extra sense for the first user-agent, since Firefox 12.0 has not been an even vaguely likely actual browser for just over a decade (it was apparently released April 24, 2012, which is more recent than I would have thought).

(Because both of these are obviously forged user-agents, I've followed my usual practice and arranged to block them from further access to Wandering Thoughts. In fact I've gotten around to blocking access for all user-agents that start with 'User-Agent'. Not that I expect it to make any real difference in the rain of stealth crawlers that poke at things here, but one does what one can, or at least what one feels grumpy about.)
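
For what it's worth, the blocking itself is straightforward in Apache. Here's a sketch of one way to do it with mod_rewrite; it's not necessarily exactly what I use here:

RewriteEngine On
# Refuse any request whose claimed User-Agent itself starts with 'User-Agent'.
RewriteCond "%{HTTP_USER_AGENT}" "^User-Agent" [NC]
RewriteRule "^" "-" [F]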

UserAgentInUserAgent written at 22:49:59

2023-04-05

Giving Firefox the tiniest URL bar bookmark 'star' button possible

Recently over on the Fediverse I mentioned I'd broken Ctrl-D to add new bookmarks in my Firefox:

TIL that I accidentally broke bookmarking on my primary Firefox. This is actually convenient because the only time I bookmark something on it is when I absentmindedly used Ctrl-D to try to close the window/tab. (My terminals close with Ctrl-D, my Firefox windows with Ctrl-W, I have a bunch of both, yes life is fun.)

One of the irritating ongoing changes in Firefox over the past N years is Firefox's slow but steady encroachment on the space in the URL bar. One of those encroachments is the 'star' button for bookmarking the current page (or unbookmarking it), which Mozilla added at one point and doesn't let you remove, even if you don't use bookmarking and want the space so you can actually see more of the page's URL. So after grousing about it, I wound up removing the star button in my custom Firefox build by commenting out the entire snippet of browser 'HTML' that defines it. It turns out that this breaks Ctrl-D, apparently because the normal pop-up bookmark dialog wants to manipulate the state of the star. With no star element at all to manipulate, Firefox starts raising internal Javascript errors.

I made a series of attempts to deal with this. First I patched in Javascript 'try { ... }' sections around the obvious bits that were raising errors. This made Ctrl-D work too well; it would flash up the normal dialog very briefly but not wait for input (ie, for me to cancel the bookmark). Next I tried having the star button and its box be present but be either 'display: none' or zero width. That didn't work either, with the same symptom of Ctrl-D working too well. At this point I almost gave up, but my stubbornness led me to the next obvious step of having the tiniest possible star button. It turned out that this worked, and the tiniest star button (well, button box) is 1px in size (I believe these are CSS pixels, so scaled by your HiDPI scaling factor if that's applicable).

After some refinement, the CSS that I wound up with is, in userChrome.css format (cf):

#star-button-box {
  width: 0;
  max-width: 0;
  min-width: 0;
  border-width: 0;
  padding: 0px 0px 0px 1px !important;
  margin: 0;
}

(I've verified that this works in userChrome.css, although in my own build I did it in browser/base/content/browser.css, and I innocently used 'padding: 1px;' so that version gave me a 2px bookmark star. The difference between 2px and 1px isn't much, so I may keep it as 2px so I have a bit more of a reminder that there is something there. Some of these 0 settings may be overkill.)

One of my overall lessons from this is that I should try to do more personal Firefox UI changes with CSS instead of with my historical approach of brute force, ie commenting out sections of the UI's definitions. And probably I should do more of these as userChrome.css changes instead of source code changes. Just because I have a hammer doesn't mean I need to use it for everything.

FirefoxTiniestBookmarkStar written at 22:53:23

2023-04-04

How to get a bigger font for Firefox's preview of link targets (sort of)

If you hover the mouse over a link in Firefox (or pretty much any other browser), the browser will show you the URL of the link target so you can see where you'll wind up if you click the link; this URL target appears down at the bottom of the page. Well, in theory, since there are a lot of things that can happen in between your click on a link and where you wind up. In Firefox, as far as I know that URL target is rendered in the browser's default user interface (UI) font and font size, which is normally your platform default. On Unix, Firefox uses GTK so this is the GTK system font default. Suppose, not hypothetically, that you would like the font of this URL target to be bigger so it's easier to read.

These days, Firefox's user interface is rendered with regular web technology of HTML and CSS (more or less, there are complications). This UI rendering can be customized through a userChrome.css file under your Firefox profile, at least if you turn on the special 'toolkit.legacyUserProfileCustomizations.stylesheets' preference. So in theory what you need to do is to find the CSS class or identifier for the URL target and then add some userChrome.css CSS to set the font size (or increase it).
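
If you haven't already turned that preference on, you can flip it in about:config or set it persistently in your profile's user.js; a minimal sketch:

// In <your profile directory>/user.js, or toggle the same preference
// in about:config.
user_pref("toolkit.legacyUserProfileCustomizations.stylesheets", true);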

For regular static UI elements that can be made to be always present, you can use Firefox's Browser Toolbox to introspect Firefox's UI HTML and CSS to determine the CSS information you need. For example, it's easy to discover that the URL bar's 'star' button to bookmark things has the CSS ID '#star-button-box' (or '#star-button' for the actual image). However, the link preview is a dynamic element; it appears only when you hover the mouse over a link, and so as far as I know there's no way to use the Browser Toolbox to pull out its details. Nor have I been able to work out what the relevant CSS ID or classes are from poking through the Firefox source code a bit.

(I would love to find a way of exploring such dynamic elements in the Browser Toolbox. Possibly there already is such a way, as I don't know my way around Firefox's developer and browser toolbox very well. It's not "Disable Popup Auto-Hide", though; the link target preview doesn't count as a popup, reasonably.)

However, you can use userChrome.css to increase the font size for the entire UI. I got my version of this from the Arch Wiki's Firefox Tweaks "change the interface font" section. My current version is:

* { font-size: 11pt; }

Possibly this is a sign that I should increase the default UI font size in GTK as a whole, but figuring out how to do that in my eccentric X environment is too much work right now.

(My current userChrome also has a XUL namespace declaration, but the Arch Wiki examples don't and since XUL Layout is gone, I suspect it's not necessary.)

PS: You can use the Browser Toolbox to shim in a font-size setting in order to try this out and figure out what size you want. This will work for the link target preview as well as regular UI elements. And you could probably fine-tune the CSS selector to exclude big ticket UI elements you didn't want to size up too much, such as tab titles.
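
For instance, something along these lines might keep tab titles smaller while bumping up everything else. This is a sketch; I haven't verified that '.tab-label' is the right class for tab titles (check it in the Browser Toolbox first), and the sizes are arbitrary:

* { font-size: 11pt; }
/* Class selectors beat the universal selector, so this wins for tab titles. */
.tab-label { font-size: 9pt !important; }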

FirefoxBiggerTargetURLFont written at 22:32:24

2023-04-01

Avoiding HTTP/3 (for a while) as a pragmatic default

I imagine that you've heard of HTTP/3 by now; as Wikipedia describes it, it's "the third major version of the Hypertext Transfer Protocol used to exchange information on the World Wide Web". I'm generally a quiet enthusiast of adopting new HTTP things when we can (we've been reasonably prompt to add HTTP/2 to our Apache web servers when it was possible), but with HTTP/3 I think we're likely to take a much more cautious and slow approach even once HTTP/3 support is available for Apache. This is because HTTP/3 is unlike previous versions in one important way.

One of the unusual things about HTTP/3 is that it doesn't use TCP but instead a new network transport protocol, QUIC, which operates over UDP. Operating over UDP instead of TCP has a number of consequences; for example, firewalls need adjustments to let 'QUIC' traffic through, and the path your QUIC traffic takes may be different from the path your TCP HTTP traffic takes. All of this creates many opportunities for different things to happen with HTTP/3 requests than with your TCP HTTP requests. Some of these different things will be one version working and the other not, and since HTTP/3 is the newer and less common version, it's the version most likely to not work.
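
As one concrete example of the firewall side, a web server whose inbound rules only allow TCP ports 80 and 443 needs an extra UDP rule before HTTP/3 can work at all. A sketch in Linux iptables terms (the details depend on your firewall setup):

# QUIC, and thus HTTP/3, runs over UDP, normally on port 443.
iptables -A INPUT -p udp --dport 443 -j ACCEPT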

We're not a large group of people, and we don't have a big environment where we have a lot of visibility into how traffic moves through the broader Internet. If there is an HTTP/3-specific networking issue between our web servers and people making requests to them, it's going to be basically opaque to us, especially if people reporting problems can't even see whether or not they're using HTTP/3 (which they probably can't; you have to go well out of your way to see this in Firefox, for example). With limited people and limited resources to debug problems, the conservative approach is to avoid such problems entirely by not offering HTTP/3 in its relatively early days.

How long will it take for HTTP/3 to be reliable for random people in random network environments (including reliably and detectably not working)? I don't know, but I certainly don't expect it to happen right away once HTTP/3 becomes available for Apache and other common web servers.

(I'm also uncertain about how much HTTP/3 usage there is among the big players like Google, Cloudflare, and so on. They matter because they have the resources to spot and track problems specific to HTTP/3, and to get network path problems resolved. If you can't reach us because of something in your ISP, we have a problem; if you can't reach GMail for the same reason, your ISP has a problem.)

PS: All of this is of course academic until Ubuntu's version of Apache supports HTTP/3. We're quite unlikely to switch web servers to get HTTP/3, even if Apache takes much longer than other web servers to add support.

AvoidingHTTP3ForNow written at 22:36:02

2023-03-25

Apache 2.4's event MPM can require more workers than you'd expect

When we upgraded from Ubuntu 18.04 to Ubuntu 22.04, we moved away from the prefork MPM to the event MPM. A significant reason for this shift is that our primary public web server has wound up with a high traffic level from people downloading things, often relatively large datasets (for example). My intuition was that much of the traffic was from low-rate connections that were mostly idle on the server as they waited for slow remote networks. The event MPM is supposed to have various features to deal with this sort of usage (as well as connections idle in the HTTP 'keep alive' state, waiting to see if there's more traffic).

Soon after we changed over we found that we had to raise various event MPM limits, and since then we've raised them twice more. This includes the limit on the number of workers, Apache's MaxRequestWorkers. Our Apache metrics say that when our MaxRequestWorkers setting was 1000, we managed to hit that limit with busy workers. We're now up to 2,000 workers on that web server, which on the one hand feels absurd to me but on the other hand, 1,000 clearly wasn't enough.
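
To give a sense of the knobs involved, here's a sketch of event MPM settings in the general vicinity of what this implies; the numbers are purely illustrative, not our actual configuration:

<IfModule mpm_event_module>
    # ServerLimit * ThreadsPerChild must cover MaxRequestWorkers.
    ServerLimit              80
    ThreadLimit              25
    ThreadsPerChild          25
    MaxRequestWorkers        2000
    MaxConnectionsPerChild   0
</IfModule>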

One possible reason for this is that I may have misunderstood how frequently connections are idle or in the state where, to quote the event MPM documentation, "the only remaining thing to do is send the data to the client". I had assumed (without testing) that once a connection was simply writing out data from a file to the client, it fell into this state, but possibly that only applies when Apache is buffering the last remaining data itself. Since the popular requests are multi-megabyte files, they'd spend most of their transfer with Apache still reading from the files. Certainly our captured metrics suggest that we don't see very many connections that Apache's status module reports as asynchronous connections that are writing things.

For our web server's current usage, these settings are okay. But they're unfortunately dangerous, because we allow people to run CGIs on this server, and the machine is unlikely to do well if we have even 1,000 CGIs running at the same time. In practice not many CGIs get run these days, so we're likely going to get away with it. Still, it makes me nervous and I wish we had a better solution.

(If it does become a problem I can think of some options, although they're generally terrible hacks.)

ApacheEventMPMManyWorkers written at 21:49:43

2023-03-15

The extra hazards of mutual TLS authentication (mTLS) in web servers

Today I wound up reading a 2018 Colm MacCárthaigh thread on mutual TLS (mTLS) (via, via). I was nodding vaguely along to the thread until I hit this point, which raised an issue I hadn't previously thought about:

Directly related to all of this is that it takes an enormous amount of code to do mTLS. An ordinary TLS server can more or less ignore X509 and ASN.1 - it doesn't need to parse or handle them. Turn on MTLS and all of a sudden it does!

In mutual TLS (authentication), clients send the server a certificate just as the server sends one to clients. The server can then authenticate the client, just as the client authenticates the server (hence 'mutual TLS'). In both cases, by 'authenticates' we mean verifying the TLS certificate and also extracting various pieces of information in it.
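
As a concrete illustration of what the server side looks like, here's roughly how you turn this on in Apache's mod_ssl. This is a sketch with a made-up CA file path, not a recommendation:

# Ask clients for a certificate and require that it verify against our CA.
SSLVerifyClient require
SSLVerifyDepth  2
SSLCACertificateFile /etc/apache2/mtls/client-ca.pem

# With StdEnvVars on, details from the verified client certificate show
# up in variables such as SSL_CLIENT_S_DN for applications to consume.
SSLOptions +StdEnvVars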

As we should all know by now, verifying TLS certificates is not a simple process. Nor is extracting information from TLS certificates; over the years there have been an assortment of bugs in this area, such as KDE missing the difference between text blobs and C strings. In a conventional TLS environment that complexity lives only in the TLS client, which must have an X.509 and thus ASN.1 parser in order to understand and verify the server's TLS certificate.

In mTLS, the server has to verify the client's TLS certificate as well and then generally extract information from it (often people want some sort of identification of the client). This means that it needs to parse and verify X.509 certificates, complete with an ASN.1 parser. This is a lot more code than before and it's directly exposed to unauthenticated clients (you can't verify a signed TLS certificate without extracting information about the Certificate Authority from it).

If you have a narrow, specific circumstance you can potentially write somewhat narrow code that (hopefully) rejects client certificates that contain any oddities that shouldn't be present in yours, and so simplify this situation somewhat. However, you can't do this in general purpose code, such as code in general web servers or language libraries; that code pretty much needs to handle all of the oddities that are legal in mTLS client certificates. That means ASN.1 parsing that's as full as for server certificates and more or less full TLS certificate verification. Most people will probably not use cross-signed CA intermediate certificates with partially expired certificate chains for their mTLS client certificates, but if it's allowed, well.

Pragmatically, there have been a fair number of OpenSSL security issues that were only exploitable if you could get an OpenSSL client to talk to a server that presented a specially crafted certificate or was otherwise compromised. With mTLS in your web server, congratulations, your web server is now a 'client'; these vulnerabilities may partially or completely apply to it. And your web server is generally a juicy target.

PS: mTLS in things that are merely using HTTPS as a transport is a somewhat different matter, although you still have new problems (such as those OpenSSL issues). My system administrator's view is that the extra code for mTLS is probably much less well tested in actual use and so probably has more bugs than plain HTTPS with some form of server certificate verification.

WebServerMTLSHazards written at 22:31:46

2023-02-23

The web single sign on versus availability problem

We have a slowly growing collection of internal web sites (for staff) that need authentication, and we also have a web single sign on (SSO) system. But despite having an SSO system, we barely use it and we keep using Apache based HTTP Basic Authentication instead, despite the annoyance and inconvenience of having to tell various websites the same thing. The fundamental problem that keeps us doing this is the tradeoff between an SSO system and availability.

An SSO system is very convenient when it works, but if and when it's down for some reason, it's a single point of failure that prevents you from using everything behind it (unless you happen to be lucky). Many of the staff things that we could protect with SSO are in fact exactly the things we most want to be available when the rest of our systems are falling over; for example, it would be pretty bad if our metrics system's dashboards weren't accessible during an outage. Grafana Loki? Our searchable archive of our worklogs? All things that we really, really want to have access to if at all possible, and that need access restrictions.

In an ideal world, you could use either authentication method, so you'd use SSO while it was working and be able to fall back to another way, such as HTTP Basic Authentication, if SSO wasn't up. However as far as I know Apache doesn't directly provide any easy way to do this, either as a choice you make in your browser or as something that happens automatically on the server side. The mechanics of HTTP Basic Authentication in practice would make a browser choice somewhat difficult, as the browser only knows to send HTTP Basic Authentication when challenged (more or less).

You can imagine a collection of hacks that would achieve this in a hypothetical web server (and possibly be something you could realize in Apache). There are two possible approaches I see. First, most SSO systems wind up setting a cookie in your browser for the site, so you could in theory create a separate web app on the site, protected by HTTP Basic Authentication, that 'forges' the SSO cookie itself. Second, you could have a similar HTTP Basic Authentication protected web application that sets a browser cookie that causes the authentication protected section of the site to switch from SSO to HTTP Basic Authentication. In both cases, your actual authentication protected web area would default to SSO, and you'd visit another, special URL on the site to authenticate with HTTP Basic Authentication instead.

(The first approach is likely more complicated but has the advantage that the complexity is concentrated in the web app, which has to be able to generate a valid local 'SSO has been done' cookie and any other stuff necessary, such as session database entries.)

You might be able to do this backward as well; if you can switch behavior on a cookie, you can switch on the SSO cookie instead. You'd probably want to make the authentication protected area switch to the full SSO setup at that point, so that if people have an outdated or made up SSO cookie, they get forced through the SSO re-authentication process. However, things will get complicated in your life if someone has an outdated SSO cookie while your SSO system isn't working.

SingleSignOnVsAvailability written at 21:55:10

2023-02-19

Using web server reverse proxying to deal with file access permissions

Perhaps we're peculiar, but one of the challenges we keep running into over and over again in our web servers is that someone wants to expose access-restricted Unix files to selected local people via the web, rather than making people log in and slog through the filesystem. The actual access restrictions are not a problem because we have a well developed system using HTTP Basic Authentication (and someday it will be extended to be a single sign on environment). However, this still leaves us with the challenge of giving the web server permission to read those files. Our traditional approach has been group membership, which has created several Apache UIDs that have a steadily expanding set of privileges.

Today, in a blinding flash of the obvious, I realized that an alternate approach to solving this problem is reverse proxies. For each set of files with Unix access restrictions, we can use or set up a login that specifically has access to them, then have that login run a simple web server that serves those files. Then the main web server reverse proxies to all of those sub-servers, with appropriate HTTP Basic Authentication or other access controls in front. Each sub-server has strictly limited access to its own files, and the main Apache server doesn't need to have access to anything (beyond the ability to talk to the sub-servers). Much as with our regular user-run web servers, a sub-server could run potentially dangerous things like PHP without endangering anyone else.
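
To make this concrete, here's a sketch of what one such area might look like in the front end Apache (assuming mod_proxy and mod_proxy_http are loaded); the names, paths, and port number are invented for the example:

# Access control happens in the front end Apache; the actual file
# serving happens in a sub-server that runs as a login with Unix
# permission to read the files.
<Location "/restricted-data/">
    AuthType Basic
    AuthName "Staff only"
    AuthBasicProvider file
    AuthUserFile /etc/apache2/htpasswd-staff
    Require valid-user

    ProxyPass        "http://127.0.0.1:8101/"
    ProxyPassReverse "http://127.0.0.1:8101/"
</Location>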

(We wouldn't want these sub-servers to be regular user-run web servers, because there are no internal access controls on who can talk to user-run web servers. You would definitely want access to the sub-servers to be limited to the main Apache.)

There are a number of plausible approaches to making sure that only the main Apache can talk to the sub-servers. Since one concern is potential cross-compromise (especially with PHP in the mix), we'd definitely want to get this right. For localhost traffic, Linux iptables can restrict things by the UID that generated the packet, or we might be able to reverse proxy over Unix domain sockets (although things are tricky there unless we want a lot of two-login groups). If the frontend Apache is on a different machine, we can restrict inbound network traffic to the sub-servers to be from that machine (and then restrict who can do outbound traffic on that machine, if we allow any access to it at all).
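
For the localhost case, here's a sketch of the sort of iptables rules involved; the UID and port number are made up:

# The 'owner' match only works on locally generated packets, so these
# go in the OUTPUT chain: let the front end Apache's UID talk to the
# sub-server's port, and reject everything else local.
iptables -A OUTPUT -o lo -p tcp --dport 8101 -m owner --uid-owner www-data -j ACCEPT
iptables -A OUTPUT -o lo -p tcp --dport 8101 -j REJECT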

We might not want to use Apache for most of these sub-servers, since all they need to do is serve files and Apache is still fairly heavy-weight for that. Something with a simple configuration and operation would probably be ideal for most cases. On the other hand, Apache is right there and we know how to set it up and operate it.

(What triggered this flash of the obvious was this Fediverse post by @anarcat.)

ReverseProxiesForFilePermissions written at 22:43:26
