Wandering Thoughts

2024-03-12

What do we count as 'manual' management of TLS certificates

Recently I casually wrote about how even big websites may still be manually managing TLS certificates. Given that we're talking about big websites, this raises a somewhat interesting question of what we mean by 'manual' and 'automatic' TLS certificate management.

A modern big website probably has a bunch of front end load balancers or web servers that terminate TLS, and regardless of what else is involved in their TLS certificate management it's very unlikely that system administrators are logging in to each one of them to roll over its TLS certificate to a new one (any more than they manually log in to those servers to deploy other changes). At the same time, if the only bit of automation involved in TLS certificate management is deploying a TLS certificate across the fleet (once you have it) I think most people would be comfortable still calling that (more or less) 'manual' TLS certificate management.

As a system administrator who used to deal with TLS certificates (back then I called them SSL certificates) the fully manual way, I see three broad parts to fully automated management of TLS certificates:

  • automated deployment, where once you have the new TLS certificate you don't have to copy files around on a particular server, restart the web server, and so on. Put the TLS certificate in the right place and maybe push a button and you're done.

  • automated issuance of TLS certificates, where you don't have to generate keys, prepare a CSR, go to a web site, perhaps put in your credit card information or some other 'cost you money' stuff, perhaps wait for some manual verification or challenge by email, and finally download your signed certificate. Instead you run a program and you have a new TLS certificate.

  • automated renewal of TLS certificates, where you don't have to remember to do anything by hand when your TLS certificates are getting close enough to their expiry time. (A lesser form of automated renewal is automated reminders that you need to manually renew.)
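As a concrete illustration of the renewal piece, here is a minimal sketch (in Python) of the kind of expiry check that automated renewal is built on. The host name and the 30 day threshold are made-up examples, and real tooling (Certbot and friends) does all of this for you; this is just to show how little is involved in the check itself.

    #!/usr/bin/env python3
    # A minimal sketch of the check behind automated renewal: how many days
    # until a site's TLS certificate expires, and is it time to kick off
    # (re-)issuance? The host name and threshold are made-up examples.
    import socket
    import ssl
    import time

    HOST = "www.example.org"   # hypothetical host to check
    RENEW_AT_DAYS = 30         # renew once fewer than 30 days remain

    def days_until_expiry(host, port=443):
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
        # 'notAfter' looks like 'Jun  1 12:00:00 2025 GMT'
        expires = ssl.cert_time_to_seconds(cert["notAfter"])
        return int((expires - time.time()) // 86400)

    if days_until_expiry(HOST) < RENEW_AT_DAYS:
        print("time to run the issuance step (by hand or otherwise)")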

As a casual thing, if you don't have fully automated management of TLS certificates I would say you had 'manual management' of them, because a human had to do something to make the whole process go. If I were trying to be precise and you had automated deployment but not the other two, I might describe you as having 'mostly manual management' of your TLS certificates. If you had automated issuance (and deployment) but no automated renewals, I might say you had 'partially automated' or 'partially manual' TLS certificate management.

(You can have automated issuance but not automated deployment or automated renewal and at that point I'd probably still say you had 'manual' management, because people still have to be significantly involved even if you don't have to wrestle with a TLS Certificate Authority's website and processes.)

I believe that at least some TLS Certificate Authorities support automated issuance of year-long certificates, but I'm not sure. Now that I've looked into it, I'm going to have to stop assuming that a website using a year-long TLS certificate is a reliable sign that it's not using automated issuance.

TLSCertsWhatIsManual written at 22:29:15; Add Comment

2024-02-18

Even big websites may still be manually managing TLS certificates (or close)

I've written before about how people's soon to expire TLS certificates aren't necessarily a problem, because not everyone manages their TLS certificates with Let's Encrypt style '30 days in advance' automated renewal and perhaps short-lived TLS certificates. For example, some places (like Facebook) have automation but seem to only deploy TLS certificates that are quite close to expiry. Other places at least look as if they're still doing things by hand, and recently I got to watch an example of that.

As I mentioned yesterday, the department outsources its public website to a SaaS CMS provider. While the website has a name under our domain for obvious reasons, it uses various assets that are hosted on sites under the SaaS provider's domain names (both assets that are probably generic and assets, like images, that are definitely specific to us). For reasons beyond the scope of this entry, we monitor the reachability of these additional domain names with our metrics system. This only checks on-campus reachability, of course, but that's still important even if most visitors to the site are probably from outside the university.

As a side effect of this reachability monitoring, we harvest the TLS certificate expiry times of these domains, and because we haven't done anything special about it, they get shown on our core status dashboard alongside the expiry times of TLS certificates that we're actually responsible for. The result of this was that recently I got to watch their TLS expiry times count down to only two weeks away, which is lots of time from one view while also alarmingly little if you're used to renewals 30 days in advance. Then they flipped over to a new year-long TLS certificate and our dashboard was quiet again (except for the next such external site that has dropped under 30 days).
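(This entry doesn't depend on any particular tooling, but as an illustration, in a Prometheus style setup with the blackbox exporter, which is an assumption on my part here, the countdown is a one-line query over the probe_ssl_earliest_cert_expiry metric:

    # days until each probed site's TLS certificate expires
    (probe_ssl_earliest_cert_expiry - time()) / 86400

    # alert-style condition: anything under 30 days
    (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30

The same countdown falls out of almost any monitoring system that does TLS probes.)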

Interestingly, the current TLS certificate was issued about a week before it was deployed, or at least its Not-Before date is February 9th at 00:00 UTC and it seems to have been put into use this past Friday, the 16th. One reason for this delay in deployment is suggested by our monitoring, which seems to have detected traces of a third certificate sometimes being visible, this one expiring June 23rd, 2024. Perhaps there were some deployment challenges across the SaaS provider's fleet of web servers.

(Their current TLS certificate is actually good for just a bit over a year, with a Not-Before of 2024-02-09 and a Not-After of 2025-02-28. This is presumably accepted by browsers, even though it's a bit over 365 days; I haven't paid attention to the latest restrictions from places like Apple.)

TLSCertsSomeStillManual written at 22:06:08; Add Comment

2024-02-17

We outsource our public web presence and that's fine

I work for a pretty large Computer Science department, one where we have the expertise and need to do a bunch of internal development and in general we maintain plenty of things, including websites. Thus, it may surprise some people to learn that the department's public-focused web site is currently hosted externally on a SaaS provider. Even the previous generation of our outside-facing web presence was hosted and managed outside of the department. To some, this might seem like the wrong decision for a department of Computer Science (of all people) to make; surely we're capable of operating our own web presence and thus should as a matter of principle (and independence).

Well, yes and no. There are two realities. The first is that a modern content management system is both a complex thing (to develop, and generally to operate and maintain securely) and a commodity, with many organizations able to provide good ones at competitive prices. The second is that both the system administration and the publicity side of the department only have so many people and so much time. Or, to put it another way, all of us have work to get done.

The department has no particular 'competitive advantage' in running a CMS website; in fact, we're almost certain to be worse at it than someone doing it at scale commercially, much like what happened with webmail. If the department decided to operate its own CMS anyway, it would be as a matter of principle (which principles would depend on whether the CMS was free or paid for). So far, the department has not decided that this particular principle is worth paying for, both in direct costs and in the opportunity costs of what that money and staff time could otherwise be used for.

Personally I agree with that decision. As mentioned, CMSes are a widely available (but specialized) commodity. Were we to do it ourselves, we wouldn't be, say, making a gesture of principle against the centralization of CMSes. We would merely be another CMS operator in an already crowded pond that has many options.

(And people here do operate plenty of websites and web content on our own resources. It's just that the group here responsible for our public web presence found it most effective and efficient to use a SaaS provider for this particular job.)

OutsourcedWebCMSSensible written at 21:39:20; Add Comment

2024-01-23

CGI programs have an attractive one step deployment model

When I wrote about how CGI programs aren't particularly slow these days, one of the reactions I saw was to suggest that one might as well use a FastCGI system to run your 'CGI' as a persistent daemon, saving you the overhead of starting a CGI program on every request. One of the practical answers is that FastCGI doesn't have as simple a deployment model as CGIs generally offer, which is part of their attraction.

With many models of CGI usage and configuration, installing a CGI, removing a CGI, or updating it is a single-step process; you copy a program into a directory, remove it again, or update it. The web server notices that the executable file exists (sometimes with a specific extension or whatever) and runs it in response to requests. This deployment model can certainly become more elaborate, with you directing a whole tree of URLs to a CGI, but it doesn't have to be; you can start very simple and scale up.
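To make the one-step nature concrete, here is about the smallest possible example; the file name and the CGI-enabled directory are whatever your web server is set up for, nothing special. You copy it in, make it executable, and it's deployed:

    #!/usr/bin/env python3
    # hello.cgi -- drop this into a CGI-enabled directory and 'chmod +x' it;
    # that's the entire deployment. A CGI writes its HTTP headers, a blank
    # line, and then the response body to standard output.
    print("Content-Type: text/plain")
    print()
    print("hello from a CGI")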

It's theoretically possible to make FastCGI deployment almost as simple as the CGI model, but I don't know if any FastCGI servers and web servers have good support for this. Instead, FastCGI and in general all 'application server' models almost always require at least a two-step configuration, where you configure your application in the application server and then configure the URL for your application in your web server (so that it forwards to your application server). In some cases, each application needs a separate server (FastCGI or whatever other mechanism), which means that you have to arrange to start and perhaps monitor a new server every time you add an application.
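As a sketch of what the two steps look like with Apache's mod_proxy_fcgi (the URL path and socket name here are invented), you need both the web server side and a separately managed application server process:

    # Step 1 (web server): map a URL to the application server's socket.
    ProxyPass "/myapp/" "unix:/run/myapp.sock|fcgi://localhost/"

    # Step 2 (application server): separately arrange for something to
    # listen on /run/myapp.sock, and for it to be restarted or reloaded
    # when the application changes, for example through a systemd service.

Contrast that with dropping a file into a directory.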

(I'm going to assume that the FastCGI server supports reliable and automatic hot reloading of your application when you deploy a change to it. If it doesn't then that gets more complicated too.)

If you have a relatively static application landscape, this multi-step deployment process is perfectly okay since you don't have to go through it very often. But it is more involved and it often requires some degree of centralization (for web server configuration updates, for example), while it's possible to have a completely distributed CGI deployment model where people can just drop suitably named programs into directories that they own (and then have their CGI run as themselves through, for example, Apache suexec). And, of course, it's more things to learn.

(CGI is not the only thing in the web language landscape that has this simple one step deployment model. PHP has traditionally had it too, although my vague understanding is that people often use PHP application servers these days.)

PS: At least on Apache, CGI also has a simple debugging story; the web server will log any output your CGI sends to standard error in the error log, including any output generated by a total failure to run. This can be quite useful when inexperienced people are trying to develop and run their first CGI. Other web servers can sometimes be less helpful.

CGIOneStepDeployment written at 22:55:08; Add Comment

2024-01-08

One of the things limiting the evolution of WebPKI is web servers

It's recently struck me that one of the things limiting the evolution of what is called Web PKI, the general infrastructure of TLS on the web (cf), is that it has turned out that in practice, almost anything that requires (code) changes to web servers is a non-starter. This is handily illustrated by the fate of OCSP Stapling.

One way to make Web PKI better is to make certificate revocation work better, which is to say more or less at all. The Online Certificate Status Protocol (OCSP) would allow browsers to immediately check if a certificate was revoked, but there is a huge raft of problems with that. The only practical way to deploy it is with OCSP Stapling, where web servers would include a proof from the Certificate Authority that their TLS certificate hadn't been revoked as of some recent time. However, to deploy OCSP Stapling, web servers and the environment around them needed to be updated to obtain OCSP responses from the CA and then include these responses as additional elements in the TLS handshake.
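(For example, in Apache's mod_ssl, turning stapling on is only a couple of directives once you're running a new enough version; the cache specification here is just the conventional example:

    # Outside any <VirtualHost>: somewhere to cache fetched OCSP responses.
    SSLStaplingCache "shmcb:logs/ssl_stapling(32768)"
    # For your TLS virtual hosts: fetch and staple OCSP responses.
    SSLUseStapling on

The catch, as covered below, is getting a new enough version deployed and getting people to turn it on.)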

Before I started writing this entry I was going to say that OCSP Stapling is notable by its absence, but this is not quite true. Using the test on this OpenSSL cookbook page suggests that a collection of major websites include stapled OCSP responses but also that at least as many major websites don't, including high profile destinations that you've certainly heard of. Such extremely partial adoption of OCSP Stapling makes it relatively useless in practice, because it means that no web client or Certificate Authority can feasibly require it (although a CA can technically issue certificates that require OCSP Stapling).
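(The check itself is easy to reproduce; 'openssl s_client' will show you whether a given server sent a stapled response. Substitute whatever site you're curious about for the example host:

    openssl s_client -connect www.example.org:443 -servername www.example.org -status </dev/null
    # look for either 'OCSP response: no response sent' or an
    # 'OCSP Response Data:' block in the output

This is essentially what the cookbook page's test does.)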

There are perfectly good reasons for this inertia in web server behavior. New code takes time to be written, released, become common in deployed versions of web server software, fixed, improved, released again, deployed again, and even then it often requires being activated through configuration changes. At any given time, most of the web servers in the world are running older code, sometimes very old code. Most people don't change their web server configuration (or their web server) unless they have to, and also they generally don't immediately adopt new things that may not work.

(By contrast, browsers are much easier to change; there are only a few sources of major browsers, and they can generally push out changes instead of having to wait for people to pull them in. It's relatively easy to get pretty high usage of some new thing in six months or a year, or even sooner if a few groups decide to push it.)

The practical result of this is that any improvement to Web PKI that requires web server changes is relatively unlikely to happen, and definitely isn't going to happen any time soon. The more you can hide things behind TLS libraries, the better, because then hopefully only the TLS libraries have to change (if they maintain API compatibility). But even TLS libraries mostly get updated passively, when people update operating system versions and the like.

(People can be partially persuaded to make some web server changes because they're stylish or cool, such as HTTP/2 and HTTP/3 support. But even then the code needs to get out into the world, and lots of people won't make the changes immediately or even at all.)

WebPKIEvolutionVsWebServers written at 21:40:31; Add Comment

2023-12-27

Web CGI programs aren't particularly slow these days

I recently read Reminiscing CGI scripts (via), which talked about CGI scripts and in passing mentioned that they fell out of favour for, well, let me quote:

CGI scripts have fallen out of favor primarily due to concerns related to performance and security. [...]

This is in one sense true. Back in the era when CGIs were pushed aside by PHP and other more complex deployment environments like Apache's mod_perl and mod_wsgi, their performance was an issue, especially under what was then significant load. But this isn't because CGI programs are intrinsically slow in an absolute sense; it was because computers in the early 00s were not very powerful and might even be heavily (over-)shared in virtual hosting environments. When the computers acting as web servers couldn't do very much in general, everything you could avoid making them do could make a visible difference, including not starting a separate program or two for each request.

Modern computers are much faster and more powerful than the early 00s servers where PHP shoved CGIs aside; even a low end VPS is probably as good or better, with more memory, more CPU, and almost always a much faster disk. And unsurprisingly, CGIs have gotten a lot faster and a lot better at handling load in absolute terms.

To illustrate this, I put together a very basic CGI in Python and Go, stuck them in my area on our general purpose web server, and tested how fast they would run. On our run of the mill Ubuntu web server, the Python version took around 17 milliseconds to run and the Go version around four milliseconds (in both cases when they'd been run somewhat recently). Because the CGIs are more or less doing nothing in both cases, this is pretty much measuring the execution overhead of running a CGI. A real Python CGI would take longer to start because it has more things to import, but even then it's not necessarily terribly slow.
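(If you want to get a rough version of this measurement yourself, curl can report request timings against whatever trivial CGI you put up; the URL here is a placeholder, and this times the whole request, TLS included, so it's an upper bound on the CGI's own overhead:

    # time a 'do nothing' CGI a few times so it's warm
    for i in 1 2 3 4 5; do
        curl -o /dev/null -s -w '%{time_total}\n' https://www.example.org/~you/hello.cgi
    done

Measuring the CGI's execution overhead alone takes server-side timing, but this gets you in the neighbourhood.)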

(As another data point, I have ongoing numbers for the response time of Wandering Thoughts, which is a rather complex piece of Python normally running as a CGI. On fairly basic (virtual) hardware, it seems to average about 0.17 seconds for the front page (including things like TLS overhead), which is down noticeably from a decade ago.)

Given that CGI scripts have their attractions for modest scale pages and sites, it's useful to know that CGI programs are not as terrible as they're often made out to be in old sources (or by people working from old sources). Using a CGI program is a perfectly good deployment strategy for many web applications (and you can take advantage of other general web server features).

(Yes, your CGI may slow down if you're getting a hundred hits a second. How likely is that to happen, and if it does, how critical is the slowdown? There are some environments where you absolutely want and need to plan for this, but also quite a number where you don't.)

CGINotSlow written at 23:01:48; Add Comment

2023-12-15

What /.well-known/ URL queries people make against our web servers

WebFinger is a general web protocol for obtaining various sorts of information about 'people' and things, including someone's OpenID Connect (OIDC) identity provider. For example, if you want to find things out about 'brad@example.org', you can make a HTTPS query to example.org for /.well-known/webfinger?resource=acct%3Abrad%40example.org and see what you get back. WebFinger is on my mind lately as part of me dealing with OIDC and other web SSO stuff, so I became curious to see if people out there (ie, spammers) were trying to use it to extract information from us.
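(What you get back, if the domain supports WebFinger, is a small JSON 'JRD' document. A hypothetical minimal reply might look like this; the avatar link is just an example of the sort of extra information people attach:

    {
      "subject": "acct:brad@example.org",
      "links": [
        {
          "rel": "http://webfinger.net/rel/avatar",
          "href": "https://example.org/brad/avatar.png"
        }
      ]
    }

Which 'rel' entries show up depends entirely on what the domain wants to advertise.)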

As we can see, WebFinger is just one of a number of things that use '/.well-known/<something>'; another famous one is Let's Encrypt's HTTP based challenge (HTTP-01), which looks for /.well-known/acme-challenge/<TOKEN> (over HTTP, not HTTPS, although I believe it accepts HTTP to HTTPS redirects). So I decided to look for general use of /.well-known/ to see what came up, and to my surprise there was rather more than I expected.

The official registry for this is Well-Known URIs at IANA. On the web server for our normal email domain (which is not our web server), by far the most common query was for '/.well-known/carddav', documented in RFC 6764. After that I saw some requests for '/.well-known/openpgpkey/policy', which is covered here and less clearly here, but which isn't an officially registered thing yet. Then there were a number of requests for '/.well-known/traffic-advice' from "Chrome Privacy Preserving Prefetch Proxy". This too isn't officially registered and is sort of documented here (and here), in this question and its answers, and in this blog entry. Apparently this is a pretty recent thing, probably dating from August 2023. Somewhat to my surprise, I couldn't see any use of WebFinger across the past week or so.
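(The tallying itself is nothing fancy; a few lines of Python over the access logs is enough to get the picture. The log path and the Apache common/combined log format are assumptions about your setup:

    #!/usr/bin/env python3
    # Count requests for /.well-known/ paths in an Apache-style access log.
    import collections
    import sys

    counts = collections.Counter()
    with open(sys.argv[1]) as logf:
        for line in logf:
            fields = line.split('"')
            if len(fields) < 2:
                continue
            # fields[1] is 'GET /path HTTP/1.1' in common/combined log format
            parts = fields[1].split()
            if len(parts) >= 2 and parts[1].startswith("/.well-known/"):
                counts[parts[1]] += 1

    for path, n in counts.most_common(20):
        print(n, path)

Anything from awk to your log aggregation system of choice works just as well.)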

On our actual web server, the picture is a bit different. The dominant query is for '/.well-known/traffic-advice', and then after that we get what look like security probes for several URLs:

/.well-known/class.api.php
/.well-known/pki-validation/class.api.php
/.well-known/pki-validation/cloud.php
/.well-known/pki-validation/
/.well-known/acme-challenge/class.api.php
/.well-known/acme-challenge/atomlib.php
/.well-known/acme-challenge/cloud.php
/.well-known/acme-challenge/
/.well-known/

(Although '/.well-known/pki-validation' is a registered Well-Known URI, I believe this use of it is as much of a security probe as the pokes at acme-challenge are.)

There was a bit of use of '/.well-known/assetlinks.json' and '/.well-known/security.txt', and a long tail of other things, only a few of them registered (and some of them possibly less obviously malicious than people looking for '.php' URLs).

(We did see some requests for Thunderbird's '/.well-known/autoconfig/mail/config-v1.1.xml', which perhaps we should support, although writing and validating a configuration file looks somewhat complicated.)

There weren't that many requests overall, which isn't really surprising given that we HTTP 404'd all of them. What's left is likely to be the residual automation that blindly tries no matter what and some degree of automated probes from attackers. I admit I'm a bit sad not to have found any for WebFinger itself, because it would be a bit nifty if attackers were trying to mine that (or we had people probing for OIDC IdPs, or some other WebFinger use).

WellKnownQueriesAgainstUs written at 23:05:46; Add Comment

2023-12-11

Seeing how fast people will probe you after you get a new TLS certificate

For reasons outside the scope of this entry I spent some time today setting up a new Apache-based web server. More specifically, I spent some time setting up a new virtual host on a web server I'd set up last Friday. Of course this virtual host had a TLS certificate, or at least was going to once I had Let's Encrypt issue me one. Some of the time I'm a little ad-hoc with the process of setting up a HTTPS site; I'll start out by writing the HTTP site configuration, get a TLS certificate issued, edit the configuration to add in the HTTPS version, and so on. This can make it take a visible amount of time between the TLS certificate being issued, and thus appearing in Certificate Transparency logs, and there being any HTTPS website that will respond if you ask for it.

This time around I decided to follow a new approach and pre-write the HTTPS configuration, guarding it behind an Apache <IfFile> check for the TLS certificate private key. This meant that I could activate the HTTPS site pretty much moments after Let's Encrypt issued my TLS certificate. I also gave this new virtual host its own set of logs, in fact two sets, one for the HTTP version and one for the HTTPS version. Part of why I did this is because I was curious how long after I got a TLS certificate it would be before people showed up to probe my new HTTPS site.
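(The <IfFile> trick needs a reasonably recent Apache, 2.4.34 or so, and looks roughly like this; the host name and file paths are stand-ins for the real ones:

    # The HTTPS virtual host only activates once the private key exists,
    # so this can be written and loaded before the certificate is issued.
    <IfFile "/etc/ssl/private/newhost.example.org.key">
    <VirtualHost *:443>
        ServerName newhost.example.org
        SSLEngine on
        SSLCertificateFile /etc/ssl/certs/newhost.example.org.fullchain.pem
        SSLCertificateKeyFile /etc/ssl/private/newhost.example.org.key
        CustomLog logs/newhost-ssl-access.log combined
    </VirtualHost>
    </IfFile>

Once the key and certificate land, a graceful restart brings the HTTPS site up.)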

(It's well known by now that all sorts of people monitor Certificate Transparency logs for new names to probe. These days CT logs also make new entries visible quite fast; it's easily possible to monitor the logs in near real time. My own monitoring, which is nowhere near state of the art, was mailing me less than five minutes after the certificate was issued.)

If you've ever looked at this yourself, you probably know the answer. It took roughly a minute before the first outside probes showed up (from a 'leakix.org' IP address). Interestingly, this also provoked some re-scans of the machine's first HTTPS website, which had been set up Friday (and whose name was visible in, for example, the IP address's reverse mapping). These scans were actually more thorough than the scans against the new HTTPS virtual host. The HTTP versions of both the base name and the new virtual host were also scanned at the same time (again, the base version more thoroughly than the new virtual host).

Our firewall logs suggest that the machine was getting hit with a higher rate of random connections than before the TLS certificate was issued, along with at least one clear port scan against assorted TCP ports. This clear port scan took a while to show up, only starting about twenty minutes after the TLS certificate was issued (an eternity if you're trying to be the one who compromises a newly exposed machine before it's fixed up).

At one level none of this is really surprising to me; I knew this sort of stuff happened and I knew it could happen rapidly. At another level there's a difference between knowing it and watching your logs as it happens live in front of you.

WebProbeSpeedNewTLSCertificate written at 22:17:58; Add Comment

2023-12-07

Mapping out my understanding of (web-based) single sign-on systems

Suppose, not entirely hypothetically, that you want to use some systems (perhaps external systems) that want you to have a 'single sign on' (SSO) system that they can use to authenticate you and your users. There are a number of good reasons for both sides to want this; you get better control and the outside system gets to outsource all of the hassles of managing authentication to you. To create this SSO setup, there are a number of pieces, and here is how I currently understand them.

The thing you want to end up with is an Identity Provider (IdP). Typical IdPs have two roles; they challenge users to authenticate (generally through a web browser) and perhaps approve giving this authentication information to other systems, and they provide authenticated identity information to other systems. They typically do their single sign on trick by putting a cookie in the browser to mark you as already authenticated, so when a system sends you to the IdP to get authenticated you just bounce right through without getting challenged. A garden variety IdP does all of this with HTTP(S) transactions, some of them from people's web browsers and some of them from systems to API endpoints (or from the IdP to other people's API endpoints).

An IdP needs to speak some protocol to systems that are getting authentication information from it. Two common protocols are SAML and OIDC (OpenID Connect) (also). Different IdP implementations speak different protocols; for example, SimpleSAMLphp primarily speaks SAML (as you might expect from the name), although now that I look, it can apparently also speak OIDC through an OIDC module. By contrast, Dex is purely an OIDC and OAuth2 IdP, while Keycloak will support all of SAML, OIDC, and OAuth2.

Naturally people have built bridges that do protocol translation between SAML and OIDC, so that if you have a SAML IdP already, you can provide OIDC to people (and perhaps vice versa). You can also 'bridge' between the same protocol, so (for example) Dex can use another OIDC IdP for authentication. I believe one reason to do this in general is to filter and winnow down the upstream IdP's list of users. Dex's documentation suggests another reason is to reformat the answers that the upstream OIDC IdP returns to something more useful to the systems using your IdP, and I'm sure there are other reasons.

(One obvious one is that if your IdP is basically an OIDC proxy for you, you don't have to register all of the systems and applications using your IdP with the upstream IdP. You register your IdP and then everything hides behind it. Your upstream IdP may or may not consider this a feature.)

An OIDC or SAML IdP needs to find out what users you have (and perhaps what administrative groups they're part of), and also authenticate them somehow. Often one or both can be outsourced to what Dex calls connectors. A popular source of both user information and user authentication is a LDAP server, which you may already have sitting around for other reasons. An IdP can also supplement outsourced authentication with additional authentication; for example, it might do password authentication against LDAP or Active Directory and then have an additional per-user MFA challenge that it manages in some database of its own.

(Some IdPs can manage users and their authentication entirely internally, but then you have to maintain, update, and protect their user and authentication database. If you already have an existing source of this, you might as well use it.)

OIDC also has a discovery protocol that is intended to let you find the OIDC IdP URLs for any particular user on any particular domain, so that a user can tell some system that they're 'me@example.org' and the system can find the right OIDC IdP from there. This discovery protocol is part of WebFinger, which means that to really run an OIDC IdP, you need something to answer WebFinger requests and provide the necessary data from RFC 7033. WebFinger isn't specific to OIDC (it's used on the Fediverse, for example) and some of the data people may want you to provide for them is user specific, like their avatar.
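(Concretely, the OIDC discovery case is a WebFinger query with a specific 'rel', and the interesting part of the answer is a single link pointing at the issuer. A hypothetical exchange for 'me@example.org' might look like this, with the names and URLs invented:

    GET /.well-known/webfinger?resource=acct%3Ame%40example.org&rel=http%3A%2F%2Fopenid.net%2Fspecs%2Fconnect%2F1.0%2Fissuer
    Host: example.org

    {
      "subject": "acct:me@example.org",
      "links": [
        {
          "rel": "http://openid.net/specs/connect/1.0/issuer",
          "href": "https://idp.example.org"
        }
      ]
    }

From the 'href' the client then fetches the IdP's own discovery document to get its actual endpoints.)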

(I believe that OIDC IdPs usually or always require clients to be registered and assigned a client secret, so this discoverability has some limits to what you can use it for.)

PS: It's possible to set up a very janky testing OIDC IdP, but there are ways that are almost as easy and somewhat more industrial and workable.

(I'm trying to understand this whole area for work reasons.)

MappingOutSSOAuthentication written at 23:03:16; Add Comment

2023-11-26

The HTML viewport mess

Over on the Fediverse, someone I follow recently wrote a basic HTML page, tried it in a (simulated) phone environment, and got a page with ant-sized text. Some of the people reading this already know what the issue is, which is that the page didn't include a magic <meta> tag to tell phones to do the sensible thing, although the tag is not generally described that way. It's a little bit absurd that in 2023 we still have this problem, but here we are.

As I understand it, the story goes like this. Back in the beginning of smartphones, they had very small screens in both physical size and pixel count (and they mostly still do). People found out that if the smartphone browser used the true device size for website HTML layout, what you almost always got was either a messy disaster (if the site used relative widths and heights, which would slice an already small size into extremely small pieces) or constant scrollbars (if the site used absolute widths and heights, which would generally be much wider than the device's screen size). So smartphone browsers evolved a hack, where they would do HTML layout as if they had a much larger 'reasonable' resolution, then shrink the entire rendered result down to fit on the screen and let people pinch to zoom in on portions of the tiny website so they could, for example, read text.

However, sometimes web sites were ready to render well on the small smartphone screens. To communicate this to smartphone browsers, the website had to include a special "viewport" <meta> tag in its HTML <head>. While the tag lets you specify a number of things, what you almost always want (especially for basic HTML) is 'width=device-width', which tells the smartphone to do layout at its native size (and thus is a promise from the website that it is prepared for this and will do sensible layout things). As a side effect of the smartphone browser doing layout at its native size and not shrinking the rendered result down, basic HTML text gets sensible (ie readable) font sizes.

Here in 2023, smartphone browsers are a sufficiently large traffic source that very few people can ignore them. The magical '<meta name="viewport" content="width=device-width">' tag is functionally almost universal (sometimes with assorted additions); you'd probably have to look hard to find a web page without it. However, for an assortment of reasons no one is willing to actually make that setting the default (not in smartphone browsers and not in, for example, HTML5). So if you write a basic HTML5 page by hand and forget this tag, a smartphone browser will jump back to 2005 or so and render your web page with text suitable for ants, when as a basic HTML page it would have worked just as well at the smartphone's true size.
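If you're writing such a basic page by hand, the fix is that one line in the <head>; a minimal complete page looks like this:

    <!DOCTYPE html>
    <html>
    <head>
      <title>A basic page</title>
      <!-- without this line, phone browsers lay the page out at a much
           wider assumed width and then shrink the result down -->
      <meta name="viewport" content="width=device-width">
    </head>
    <body>
      <p>Plain text at a readable size on phones too.</p>
    </body>
    </html>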

(In theory a HTML rendering engine could detect that you were using truly basic HTML with no widths specified and no use of things like tables, and then switch you to 'viewport=device-width' automatically because it would definitely work fine. In practice I doubt anyone wants to add that complexity for what is a very rare usage case.)

HTMLViewportMess written at 22:27:21; Add Comment
