2025-04-18
The clever tricks of OpenPubkey and OPKSSH
OPKSSH (also) is a clever way of using OpenID Connect (OIDC) to authenticate your OpenSSH sessions (it's not the only way to do this). How it works is sufficiently ingenious that I want to write it up, especially as one underlying part uses a general trick.
OPKSSH itself is built on top of OpenPubkey, which is a trick to associate your keypair with an OIDC token. When you perform OIDC authentication, what you get back (at an abstract level) is a signed set of 'claims' and, crucially, a nonce. The nonce is supplied by the client that initiated the OIDC authentication so that it can know that the ID token it eventually gets back actually comes from this authentication session and wasn't obtained through some other one. The client initiating OIDC authentication doesn't get to ask the OIDC identity provider (OP) to include other fields.
What OpenPubkey does is turn the nonce into a signature for a combination of your public key and a second nonce of its own, by cryptographically hashing these together through a defined process. Because the OIDC IdP is signing a set of claims that include the calculated nonce, it is effectively signing a signature of your public key. If you give people the signed OIDC ID token, your public key, and your second nonce, they can verify this (and you can augment the ID token you get back to get a PK Token that embeds this additional information).
(As I understand it, calculating the OIDC ID Token nonce this way is safe because it still includes a random value (the inner nonce) and due to the cryptographic hashing, the entire calculated nonce is still effectively a non-repeating random value.)
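The core hash trick can be sketched in a few lines. This is a simplified illustration of the idea, not OpenPubkey's exact encoding (the real protocol hashes a structured commitment that includes the signing algorithm and other fields):

```python
import hashlib
import secrets

def commitment_nonce(public_key_pem: str) -> tuple[str, str]:
    """Build an OIDC nonce that commits to a public key (simplified).

    Returns the nonce to send to the OIDC IdP plus the inner random
    nonce you keep, so the combined value is still unpredictable.
    """
    inner_nonce = secrets.token_hex(32)  # the client's own random value
    digest = hashlib.sha256((inner_nonce + public_key_pem).encode()).hexdigest()
    return digest, inner_nonce

def verify_commitment(nonce: str, public_key_pem: str, inner_nonce: str) -> bool:
    # Anyone holding the signed ID token, the public key, and the inner
    # nonce can recompute the hash and check it matches the token's nonce.
    recomputed = hashlib.sha256((inner_nonce + public_key_pem).encode()).hexdigest()
    return recomputed == nonce
```

Because the IdP signs the whole set of claims, signing a token whose nonce is this hash amounts to countersigning the public key embedded in it.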
To smuggle this PK Token to the OpenSSH server, OPKSSH embeds it as an additional option field in an OpenSSH certificate (called 'openpubkey-pkt'). The certificate itself is for your generated PK Token private key and is (self) signed with it, but this is all perfectly fine with OpenSSH; SSH clients will send the certificate off to the server as a candidate authentication key and the server will read it in. Normally the server would reject it since it's signed by an unknown SSH certificate authority, but OPKSSH uses a clever trick with OpenSSH's AuthorizedKeysCommand server option to get its hands on the full certificate, which lets it extract the PK Token, verify everything, and tell the SSH server daemon that your public key is the underlying OpenPubkey key (which you have the private key for in your SSH client).
Smuggling information through OpenSSH certificates and then processing them with AuthorizedKeysCommand is a clever trick, but it's specific to OpenSSH. Turning a nonce into a signature is a general trick that was eye-opening to me, especially because you can probably do it repeatedly.
2025-04-15
The DNS system isn't a database and shouldn't be used as one
Over on the Fediverse, I said something:
Thesis: DNS is not meaningfully a database, because it's explicitly designed and used today so that it gives different answers to different people. Is it implemented with databases? Sure. But treating it as a database is a mistake. It's a query oracle, and as a query oracle it's not trustworthy in the way that you would normally trust a database to be, for example, consistent between different people querying it.
It would be nice if we had a global, distributed, relatively easily queryable, consistent database system. It would make a lot of things pretty nice, especially if we could wrap some cryptography around it to make sure we were getting honest answers. However, the general DNS system is not such a database and can't be used as one, and as a result should not be pressed into service as one in protocols.
DNS is designed from the ground up to lie to you in unpredictable ways, and parts of the DNS system lie to you every day. We call these lies things like 'outdated cached data' or 'geolocation based DNS' (or 'split horizon DNS'), but they're lies, or at least inconsistent alternate versions of some truth. The same fundamental properties that allow these inconsistent alternate versions also allow for more deliberate and specific lies, and they also mean that no one can know with assurance what version of DNS anyone else is seeing.
(People who want to reduce the chance for active lies as much as possible must do a variety of relatively extreme things, like query DNS from multiple vantage points around the Internet and perhaps through multiple third party DNS servers. No, checking DNSSEC isn't enough, even when it's present (also), because that just changes who can be lying to you.)
Anything that uses the global DNS system should be designed to expect outdated, inconsistent, and varying answers to the questions it asks (and sometimes incorrect answers, for various reasons). Sometimes those answers will be lies (including the lie of 'that name doesn't exist'). If your design can't deal with all of this, you shouldn't be using DNS.
2025-04-09
The problem of general OIDC identity provider support in clients
I've written entries criticizing things that support using OIDC (OAuth2) authentication for not supporting it with general OIDC identity providers ('OPs' in OIDC jargon), only with specific (large) ones like Microsoft and Google (and often Github in tech-focused things). For example, there are almost no mail clients that support using your own IdP, and it's much easier to find web-based projects that support the usual few big OIDC providers and not your own OIDC OP. However, at the same time I want to acknowledge the practical problems with supporting arbitrary OIDC OPs in things, especially in things that ordinary people are going to be expected to set up themselves.
The core problem is that there is no way to automatically discover all of the information that you need to know in order to start OIDC authentication. If the person gives you their email address, perhaps you can use WebFinger to discover basic information through OIDC Identity Provider discovery, but that isn't sufficient by itself (and it also requires aligning a number of email addresses). In practice, the OIDC OP will require you to have a 'client identifier' and perhaps a 'client secret', both of which are essentially arbitrary strings. If you're a website, the OIDC standards require your 'redirect URI' to have been pre-registered with the OP. If you're a client program, hopefully you can supply some sort of 'localhost' redirect URI and have it accepted, but you may need to tell the person setting things up on the OIDC OP side that you need specific strings set.
(The client ID and especially the client secret are not normally supposed to be completely public; there are various issues if you publish them widely and then use them for a bunch of different things, cf.)
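One small part of this is at least mechanical: given an issuer URL, the OP's metadata lives at a standard well-known path (per OIDC Discovery). A minimal sketch:

```python
def discovery_url(issuer: str) -> str:
    """Return the OIDC provider metadata URL for an issuer, per OIDC
    Discovery: the issuer URL plus the standard well-known path."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"
```

Fetching that URL tells you the OP's endpoints and supported scopes, but notably not a client ID or client secret, which is exactly the gap described above.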
Getting the specific information you need, even just to know who the authenticated person is, isn't necessarily straightforward. You may have to ask for exactly the right information, neither too much nor too little, and you can't necessarily assume you know where a user or login name is; you may have to ask the person setting up the custom OIDC IdP where to get this. On the good side, there is at least a specific place for where people's email addresses are (but you can't assume that this is the same as someone's login).
(In OIDC terms, you may need to ask for specific scopes and then use a specific claim to get the user or login name. You can't assume that the always-present 'sub' claim is a login name, although it often is; it can be an opaque identifier that's only meaningful to the identity provider.)
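To see what claims a given IdP actually puts in its tokens, it's easy to look inside one; a JWT's payload is just unpadded base64url-encoded JSON. A minimal inspection-only sketch (it deliberately skips signature verification, which anything doing real authentication must not):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying it.

    Useful for checking which claims ('sub', 'email', and so on) an
    IdP includes; never skip verification when authenticating.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```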
Now imagine that you're the author of a mail client that wants to provide a user friendly experience to people. Today, the best you can do is provide a wall of text fields that people have to enter the right information into, with very little validation possible. If people get things even a little bit wrong, all you and they may see is inscrutable error messages. You're probably going to have to describe what people need to do and the information they need to get in technical OIDC terms that assume people can navigate their specific OIDC IdP (or that someone can navigate this for them). You could create a configuration file format for this where the OIDC IdP operator can write down all of the information, give it to the people using your software, and they import it (much like OpenVPN can provide canned configuration files), but you'll be inventing that format (cue xkcd).
If you have limited time and resources to develop your software and help people using it, it's much simpler to support only a few large, known OIDC identity providers. If things need specific setup on the OIDC IdP side, you can feasibly provide that in your documentation (since there's only a few variations), and you can pre-set everything in your program, complete with knowledge about things like OIDC scopes and claims. It's also going to be fairly easy to test your code and procedures against these identity providers, while if you support custom OIDC IdPs you may need to figure out how to set up one (or several), how to configure it, and so on.
2025-04-03
OIDC/OAuth2 as the current all purpose 'authentication hammer'
Today, for reasons, I found myself reflecting that OIDC/OAuth2 seems to have become today's all purpose authentication method, rather than just being a web authentication and Single Sign On system. Obviously you can authenticate websites with OIDC, as well as anything that you can reasonably implement using a website as part of things, but it goes beyond this. You can use OIDC/OAuth2 tokens to authenticate IMAP, POP3, and authenticated SMTP (although substantial restrictions apply), you can (probably) authenticate yourself to various VPN software through OIDC, there are several ways of doing SSH authentication with OIDC, and there's likely others. OIDC/OAuth2 is a supported SASL mechanism, so protocols with SASL support can in theory use OIDC tokens for authentication (although your backend has to support this, as I suppose do your clients). And in general you can pass OAuth2 tokens around somehow to validate yourself over some bespoke protocol.
On the one hand, this is potentially quite useful if you have an OIDC identity server (an 'OP'), perhaps one with some special custom authentication behavior. Once you have your special server, OIDC is your all purpose tool to get its special behavior supported everywhere (as opposed to having to build and hook up your special needs with bespoke behavior in everything, assuming that's even possible). It does have the little drawback that you wind up with OIDC on the brain and see OIDC as the solution to all of your problems, much like hammers.
(Another use of OIDC is to outsource all of your authentication and perhaps even identity handling to some big third party provider (such as Google, Microsoft/Office365, Github, etc). This saves you from having to run your own authentication and identity servers, manage your own Multi-Factor Authentication handling, and so on.)
On the other hand, the OIDC authentication flow is unapologetically web based, and in practice often needs a browser with JavaScript and cookies (cookies may be required in the protocol, I haven't checked). This means that any regular program that wants to use OIDC to authenticate you to something must either call up your browser somehow and then collect the result or it must embed a browser within itself in a little captive browser interface (where it's probably easier to collect the result). This has a variety of limitations and implications, especially if you want to authenticate yourself through OIDC on a server style machine where you don't even have a web browser you can readily run (or a GUI).
(There are awkward tricks around this, cf, or you can outsource part of the authentication to a trusted website that the server program checks in with.)
OIDC isn't the first or the only web authentication protocol; there's also at least SAML, which I believe predates it. But I don't think SAML caught on outside of (some) web authentication. Perhaps it's the XML, which has had what you could call 'some problems' over the years (also, which sort of discusses how SAML requires specific XML handling guarantees that general XML libraries don't necessarily provide).
2025-03-28
In universities, sometimes simple questions aren't simple
Over on the Fediverse I shared a recent learning experience:
Me, an innocent: "So, how many professors are there in our university department?"
Admin person with a thousand yard stare: "Well, it depends on what you mean by 'professor', 'in', and 'department'." <unfolds large and complicated chart>
In many companies and other organizations, the status of people is usually straightforward. In a university, things are quite often not so clear, and in my department all three words in my joke are in fact not a joke (although you could argue that two overlap).
For 'professor', there are a whole collection of potential statuses beyond 'tenured or tenure stream'. Professors may be officially retired but still dropping by to some degree ('emeritus'), appointed only for a limited period (but doing research, not just teaching), hired as sessional instructors for teaching, given a 'status-only' appointment, and other possible situations.
(In my university, there's such a thing as teaching stream faculty, who are entirely distinct from sessional instructors. In other universities, all professors are what we here would call 'research stream' professors and do research work as well as teaching.)
For 'in', even once you have a regular full time tenure stream professor, there's a wide range of possibilities for a professor to be cross appointed (also) between departments (or sometimes 'partially appointed' by two departments). These sort of multi-department appointments are done for many reasons, including to enable a professor in one department to supervise graduate students in another one. How much of the professor's salary each department pays varies, as does where the professor actually does their research and what facilities they use in each department.
(Sometimes a multi-department professor will be quite active in both departments because their core research is cross-disciplinary, for example.)
For 'department', this is a local peculiarity in my university. We have three campuses, and professors are normally associated with a specific campus. Depending on how you define 'the department', you might or might not consider Computer Science professors at the satellite campuses to be part of the (main campus) department. Sometimes it depends on what the professors opt to do, for example whether or not they will use our main research computing facilities, or whether they'll be supervising graduate students located at our main campus.
Which answers you want for all of these depends on what you're going to use the resulting number (or numbers) for. There is no singular and correct answer for 'how many professors are there in the department'. The corollary to this is that any time we're asked how many professors are in our department, we have to quiz the people asking about what parts matter to them (or guess, or give complicated and conditional answers, or all of the above).
(Asking 'how many professor FTEs do we have' isn't any better.)
PS: If you think this complicates the life of any computer IAM system that's trying to be a comprehensive source of answers, you would be correct. Locally, my group doesn't even attempt to track these complexities and instead has a much simpler view of things that works well enough for our purposes (mostly managing Unix accounts).
2025-03-16
OIDC claim scopes and their interactions with OIDC token authentication
When I wrote about how SAML and OIDC differed in sharing information, where SAML shares every SAML 'attribute' by default and OIDC has 'scopes' for its 'claims', I said that the SAML approach was probably easier within an organization, where you already have trust in the clients. It turns out that there's an important exception to this I didn't realize at the time, and that's when programs (like mail clients) are using tokens to authenticate to servers (like IMAP servers).
In OIDC/OAuth2 (and probably in SAML as well), programs that obtain tokens can open them up and see all of the information that they contain, either inspecting them directly or using a public OIDC endpoint that allows them to 'introspect' the token for additional information (this is the same endpoint that will be used by your IMAP server or whatever). Unless you enjoy making a bespoke collection of (for example) IMAP clients, the information that programs need to obtain tokens is going to be more or less public within your organization and will probably (or even necessarily) leak outside of it.
(For example, you can readily discover all of the OIDC client IDs used by Thunderbird for the various large providers it supports. There's nothing stopping you from using those client IDs and client secrets yourself, although large providers may require your target to have specifically approved using Thunderbird with your target's accounts.)
This means that anyone who can persuade your people to authenticate through a program's usual flow can probably extract all of the information available in the token. They can do this either on the person's computer (capturing the token locally) or by persuading people that they need to 'authenticate to this service with IMAP OAuth2' or the like and then extracting the information from the token.
In the SAML world, this will by default be all of the information contained in the token. In the OIDC world, you can restrict the information made available through tokens issued through programs by restricting the scopes that you allow programs to ask for (and possibly different scopes for different programs, although this is a bit fragile; attackers may get to choose which program's client ID and so on they use).
(Realizing this is going to change what scopes we allow in our OIDC IdP for program client registrations. So far I had reflexively been giving them access to everything, just like our internal websites; now I think I'm going to narrow it down to almost nothing.)
Sidebar: How your token-consuming server knows what created them
When your server verifies OAuth2/OIDC tokens presented to it, the minimum thing you want to know is that they come from the expected OIDC identity provider, which is normally achieved automatically because you'll ask that OIDC IdP to verify that the token is good. However, you may also want to know that the token was specifically issued for use with your server, or through a program that's expected to be used for your server. The normal way to do this is through the 'aud' OIDC claim, which has at least the client ID (and in theory your OIDC IdP could add additional entries). If your OIDC IdP can issue tokens through multiple identities (perhaps to multiple parties, such as the major IdPs of, for example, Google and Microsoft), you may also want to verify the 'iss' (issuer) field instead of or in addition to 'aud'.
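A sketch of the 'aud'/'iss' check, assuming the token's signature has already been verified and its claims decoded into a dict:

```python
def check_token_audience(claims: dict, expected_issuer: str,
                         allowed_client_ids: set) -> bool:
    """Check that verified token claims come from the IdP we expect
    ('iss') and were issued for a client we expect ('aud').

    Per the JWT spec, 'aud' may be a single string or a list of
    strings, so handle both forms.
    """
    if claims.get("iss") != expected_issuer:
        return False
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return any(a in allowed_client_ids for a in audiences)
```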
2025-03-15
Some notes on the OpenID Connect (OIDC) 'redirect uri'
The normal authentication process for OIDC is web-based and involves a series of HTTP redirects, interspersed with web pages that you interact with. Something that wants to authenticate you will redirect you to the OIDC identity server's website, which will ask you for your login and password and maybe MFA authentication, check them, and then HTTP redirect you back to a 'callback' or 'redirect' URL that will transfer a magic code from the OIDC server to the OIDC client (generally as a URL query parameter). All of this happens in your browser, which means that the OIDC client and server don't need to be able to directly talk to each other, allowing you to use an external cloud/SaaS OIDC IdP to authenticate to a high-security internal website that isn't reachable from the outside world and maybe isn't allowed to make random outgoing HTTP connections.
(The magic code transferred in the final HTTP redirect is apparently often not the authentication token itself but instead something the client can use for a short time to obtain the real authentication token. This does require the client to be able to make an outgoing HTTP connection, which is usually okay.)
When the OIDC client initiates the HTTP redirection to the OIDC IdP server, one of the parameters it passes along is the 'redirect uri' it wants the OIDC server to use to pass the magic code back to it. A malicious client (or something that's gotten a client's ID and secret) could do some mischief by manipulating this redirect URL, so the standard specifically requires that OIDC IdP have a list of allowed redirect uris for each registered client. The standard also says that in theory, the client's provided redirect uri and the configured redirect uris are compared as literal string values. So, for example, 'https://example.org/callback' doesn't match 'https://example.org/callback/'.
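The standard's comparison really is that literal, which a short sketch makes obvious (and which the mismatch example above falls out of directly):

```python
def redirect_uri_allowed(requested: str, registered: list) -> bool:
    """Decide whether a client-supplied redirect uri is acceptable.

    The OIDC standard calls for simple string comparison against the
    client's registered redirect uris: no normalization, so a trailing
    slash or a different port is a different URI.
    """
    return requested in registered
```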
This is straightforward when it comes to websites as OIDC clients, since they should have well defined callback urls that you can configure directly into your OIDC IdP when you set up each of them. It gets more hairy when what you're dealing with is programs as OIDC clients, where they are (for example) trying to get an OIDC token so they can authenticate to your IMAP server with OAuth2, since these programs don't normally have a website. Historically, there are several approaches that people have taken for programs (or seem to have, based on my reading so far).
Very early on in OAuth2's history, people apparently defined the special redirect uri value 'urn:ietf:wg:oauth:2.0:oob' (which is now hard to find or identify documentation on). An OAuth2 IdP that saw this redirect uri (and maybe had it allowed for the client) was supposed to not redirect you but instead show you an HTML page with the magic OIDC code displayed on it, so you could copy and paste the code into your local program. This value is now obsolete but it may still be accepted by some IdPs (you can find it listed for Google in mutt_oauth2.py, and I spotted an OIDC IdP server that handles it).
Another option is that the IdP can provide an actual website that does the same thing; if you get HTTP redirected to it with a valid code, it will show you the code on an HTML page and you can copy and paste it. Based on mutt_oauth2.py again, it appears that Microsoft may have at one point done this, using https://login.microsoftonline.com/common/oauth2/nativeclient as the page. You can do this too with your own IdP (or your own website in general), although it's not recommended for all sorts of reasons.
The final broad approach is to use 'localhost' as the target host for the redirect. There are several ways to make this work, and one of them runs into complications with the IdP's redirect uri handling.
The obvious general approach is for your program to run a little HTTP server that listens on some port on localhost, and capture the code when the (local) browser gets the HTTP redirect to localhost and visits the server. The problem here is that you can't necessarily listen on port 80, so your redirect uri needs to include the port you're listening on (eg 'http://localhost:7000'), and if your OIDC IdP is following the standard it must be configured not just with 'http://localhost' as the allowed redirect uri but the specific port you'll use. Also, because of string matching, if the OIDC IdP lists 'http://localhost:7000', you can't send 'http://localhost:7000/' despite them being the same URL.
(And your program has to use 'localhost', not '127.0.0.1' or the IPv6 loopback address; although the two have the same effect, they're obviously not string-identical.)
Based on experimental evidence from OIDC/OAuth2 client configurations, I strongly suspect that some large IdP providers have non-standard, relaxed handling of 'localhost' redirect uris such that their client configuration lists 'http://localhost' and the IdP will accept some random port glued on in the actual redirect uri (or maybe this behavior has been standardized now). I suspect that the IdPs may also accept the trailing slash case. Honestly, it's hard to see how you get out of this if you want to handle real client programs out in the wild.
(Some OIDC IdP software definitely does the standard compliant string comparison. The one I know of for sure is SimpleSAMLphp's OIDC module. Meanwhile, based on reading the source code, Dex uses a relaxed matching for localhost in its matching function, provided that there are no redirect uris registered for the client. Dex also still accepts the urn:ietf:wg:oauth:2.0:oob redirect uri, so I suspect that there are still uses out there in the field.)
If the program has its own embedded web browser that it's in full control of, it can do what Thunderbird appears to do (based on reading its source code). As far as I can tell, Thunderbird doesn't run a local listening server; instead it intercepts the HTTP redirection to 'http://localhost' itself. When the IdP sends the final HTTP redirect to localhost with the code embedded in the URL, Thunderbird effectively just grabs the code from the redirect URL in the HTTP reply and never actually issues a HTTP request to the redirect target.
The final option is to not run a localhost HTTP server and to tell people running your program that when their browser gives them an 'unable to connect' error at the end of the OIDC authentication process, they need to go to the URL bar and copy the 'code' query parameter into the program (or if you're being friendly, let them copy and paste the entire URL and you extract the code parameter). This allows your program to use a fixed redirect uri, including just 'http://localhost', because it doesn't have to be able to listen on it or on any fixed port.
(This is effectively a more secure but less user friendly version of the old 'copy a code that the website displayed' OAuth2 approach, and that approach wasn't all that user friendly to start with.)
PS: An OIDC redirect uri apparently allows things other than http:// and https:// URLs; there is, for example, the 'openid-credential-offer' scheme. I believe that the OIDC IdP doesn't particularly do anything with those redirect uris other than accept them and issue a HTTP redirect to them with the appropriate code attached. It's up to your local program or system to intercept HTTP requests for those schemes and react appropriately, much like Thunderbird does, but perhaps easier because you can probably register the program as handling all 'whatever-special://' URLs so the redirect is automatically handed off to it.
(I suspect that there are more complexities in the whole OIDC and OAuth2 redirect uri area, since I'm new to the whole thing.)
2025-03-12
The commodification of desktop GUI behavior
Over on the Fediverse, I tried out a thesis:
Thesis: most desktop GUIs are not opinionated about how you interact with things, and this is why there are so many GUI toolkits and they make so little difference to programs, and also why the browser is a perfectly good cross-platform GUI (and why cross-platform GUIs in general).
Some GUIs are quite opinionated (eg Plan 9's Acme) but most are basically the same. Which isn't necessarily a bad thing but it creates a sameness.
(Custom GUIs are good for frequent users, bad for occasional ones.)
Desktop GUIs differ in how they look and to some extent in how you do certain things and how you expect 'native' programs to behave; I'm sure the fans of any particular platform can tell you all about little behaviors that they expect from native applications that imported ones lack. But I think we've pretty much converged on a set of fundamental behaviors for how to interact with GUI programs, or at least how to deal with basic ones, so in a lot of cases the question about GUIs is how things look, not how you do things at all.
(Complex programs have for some time been coming up with their own bespoke alternatives to, for example, huge cascades of menus. If these are successful they tend to get more broadly adopted by programs facing the same problems; consider the 'ribbon', which got what could be called a somewhat mixed reaction on its modern introduction.)
On the desktop, changing the GUI toolkit that a program uses (either on the same platform or on a different one) may require changing the structure of your code (in addition to ordinary code changes), but it probably won't change how your program operates. Things will look a bit different, maybe some standard platform features will appear or disappear, but it's not a completely different experience. This often includes moving your application from the desktop into the browser (a popular and useful 'cross-platform' environment in itself).
This is less true on mobile platforms, where my sense is that the two dominant platforms have evolved somewhat different idioms for how you interact with applications. A proper 'native' application behaves differently on the two platforms even if it's using mostly the same code base.
GUIs such as Plan 9's Acme show that this doesn't have to be the case; for that matter, so does GNU Emacs. GNU Emacs has a vague shell of a standard looking GUI but it's a thin layer over a much different and stranger vastness, and I believe that experienced Emacs people do very little interaction with it.
2025-03-08
How SAML and OIDC differ in sharing information, and perhaps why
In practice, SAML and OIDC are two ways of doing third party web-based authentication (and thus a Single Sign On (SSO)) system; the web site you want to use sends you off to a SAML or OIDC server to authenticate, and then the server sends authentication information back to the 'client' web site. Both protocols send additional information about you along with the bare fact of an authentication, but they differ in how they do this.
In SAML, the SAML server sends a collection of 'attributes' back to the SAML client. There are some standard SAML attributes that client websites will expect, but the server is free to throw in any other attributes it feels like, and I believe that servers do things like turn every LDAP attribute they get from a LDAP user lookup into a SAML attribute (certainly SimpleSAMLphp does this). As far as I know, any filtering of what SAML attributes are provided by the server to any particular client is a server side feature, and SAML clients don't necessarily have any way of telling the SAML server what attributes they want or don't want.
In OIDC, the equivalent way of returning information is 'claims', which are grouped into 'scopes', along with basic claims that you get without asking for a scope. The expectation in OIDC is that clients that want more than the basic claims will request specific scopes and then get back (only) the claims for those scopes. There are standard scopes with standard claims (not all of which are necessarily returned by any given OIDC server). If you want to add additional information in the form of more claims, I believe that it's generally expected that you'll create one or more custom scopes for those claims and then have your OIDC clients request them (although not all OIDC clients are willing and able to handle custom scopes).
(I think in theory an OIDC server may be free to shove whatever claims it wants to into information for clients regardless of what scopes the client requested, but an OIDC client may ignore any information it didn't request and doesn't understand rather than pass it through to other software.)
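The client's side of scope selection is just a parameter in the authorization request. A sketch of building such a request URL; the endpoint, client values, and the custom 'our-ldap' scope are all placeholders for illustration:

```python
from urllib.parse import urlencode

def authorization_url(authorize_endpoint: str, client_id: str,
                      redirect_uri: str, extra_scopes: list,
                      state: str) -> str:
    """Build an OIDC authorization request URL.

    The 'scope' parameter is a space-separated list; 'openid' is
    mandatory for OIDC, and any additional (possibly custom) scopes
    name the groups of claims the client wants back.
    """
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(["openid"] + extra_scopes),
        "state": state,
    })
    return f"{authorize_endpoint}?{query}"
```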
The SAML approach is more convenient for server and client administrators who are working within the same organization. The server administrator can add whatever information to SAML responses that's useful and convenient, and SAML clients will generally automatically pick it up and often make it available to other software. The OIDC approach is less convenient, since you need to create one or more additional scopes on the server and define what claims go in them, and then get your OIDC clients to request the new scopes; if an OIDC client doesn't update, it doesn't get the new information. However, the OIDC approach makes it easier for both clients and servers to be more selective and thus potentially for people to control how much information they give to who. An OIDC client can ask for only minimal information by only asking for a basic scope (such as 'email') and then the OIDC server can tell the person exactly what information they're approving being passed to the client, without the OIDC server administrators having to get involved to add client-specific attribute filtering.
(In practice, OIDC probably also encourages giving less information to even trusted clients in general since you have to go through these extra steps, so you're less likely to do things like expose all LDAP information as OIDC claims in some new 'our-ldap' scope or the like.)
My guess is that OIDC was deliberately designed this way partly in order to make it better for use with third party clients. Within an organization, SAML's broad sharing of information may make sense, but it makes much less sense in a cross-organization context, where you may be using OIDC-based 'sign in with <large provider>' on some unrelated website. In that sort of case, you certainly don't want that website to get every scrap of information that the large provider has on you, but instead only ask for (and get) what it needs, and for it to not get much by default.
2025-03-07
The OpenID Connect (OIDC) 'sub' claim is surprisingly load-bearing
OIDC (OpenID Connect) is today's best regarded standard for (web-based) authentication. When a website (or something else) authenticates you through an OpenID (identity) Provider (OP), one of the things it gets back is a bunch of 'claims', which is to say information about the authenticated person. One of the core claims is 'sub', which is vaguely described as a string that is the 'subject - identifier for the end-user at the issuer'. As I discovered today, this claim is what I could call 'load bearing' in a surprising way or two.
In theory, 'sub' has no meaning beyond identifying the user in some opaque way. The first way it's load bearing is that some OIDC client software (a 'Relying Party (RP)') will assume that the 'sub' claim has a human-useful meaning. For example, the Apache OpenIDC module defaults to putting the 'sub' claim into Apache's REMOTE_USER environment variable. This is fine if your OIDC IdP software puts, say, a login name into it; it is less fine if your OIDC IdP software wants to create 'sub' claims that look like 'YXVzZXIxMi5zb21laWRw'. These claims mean something to your server software but not necessarily to you and the software you want to use on (or behind) OIDC RPs.
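If I'm reading its documentation correctly, the Apache OpenIDC module (mod_auth_openidc) lets you work around opaque 'sub' values by pointing REMOTE_USER at a different claim; which claim is actually useful depends on what your IdP returns, so 'preferred_username' here is just an example.

```apache
# mod_auth_openidc defaults to REMOTE_USER = the 'sub' claim. If your
# IdP's 'sub' values are opaque blobs, you can (assuming your IdP
# returns a human-meaningful claim such as 'preferred_username') use
# that claim for REMOTE_USER instead:
OIDCRemoteUserClaim preferred_username
```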
The second and more surprising way that the 'sub' claim is load bearing involves how external consumers of your OIDC IdP keep track of your people. In common situations your people will be identified and authorized by their email address (using some additional protocols), which they enter into the outside OIDC RP that's authenticating against your OIDC IdP, and this looks like the identifier the RP uses to keep track of them. However, at least one such OIDC RP assumes that the 'sub' claim for a given email address will never change, and I suspect that there are more out there that either quietly use the 'sub' claim as the master key for accounts or that require 'sub' and the email address to be locked together this way.
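The failure mode can be sketched from the RP's side. This is a hypothetical model of such an RP, not any real one: the email is what the person types in, but 'sub' is quietly treated as the permanent key, so a changed 'sub' for a known email looks like an account mismatch.

```python
# Hypothetical sketch of an outside RP that locks 'sub' to email.
accounts = {}  # email -> the 'sub' value first seen for that email

def check_login(email, sub):
    """Accept a login, but refuse if 'sub' differs from what we recorded."""
    recorded = accounts.setdefault(email, sub)
    if recorded != sub:
        raise ValueError(f"'sub' changed for {email}; refusing login")
    return True
```

An IdP migration that changes 'sub' values makes every existing account at such an RP start failing this check, even though the email addresses are unchanged.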
This second issue makes the details of how your OIDC IdP software generates its 'sub' claim values quite important. You want those values to be generated in a clear and documented way that other OIDC IdP software can readily duplicate, and that won't change if you change some aspect of your current software's OIDC IdP configuration. Otherwise you're at least stuck with your current OIDC IdP software, and perhaps with its exact current configuration (for authentication sources, internal names of things, and so on).
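One documented, reproducible recipe might look like the following sketch. The 'issuer label' and internal identifier are made up for illustration; the point is that any replacement IdP given the same inputs and the same recipe can regenerate identical 'sub' values.

```python
import base64
import hashlib

def make_sub(issuer_label: str, internal_id: str) -> str:
    """Derive a stable, opaque 'sub' value from a fixed issuer label and
    the account's permanent internal identifier. Deterministic: the same
    inputs always yield the same 'sub', regardless of IdP software."""
    digest = hashlib.sha256(f"{issuer_label}:{internal_id}".encode()).digest()
    # URL-safe base64 without padding, to keep the value claim-friendly.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
```

The crucial property is not the specific hash but that the derivation depends only on stable inputs, not on internal names or configuration details of whatever IdP software you happen to be running today.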
(If you have to change 'sub' values, for example because you have to migrate to different OIDC IdP software, this could go as far as the outside OIDC RP basically deleting all of their local account data for your people and requiring all of it to be entered back from scratch. But hopefully those outside parties have a better procedure than this.)