2025-06-17
A performance mystery with Linux WireGuard on 10G Ethernet
As a followup on discovering that WireGuard can saturate a 1G Ethernet (on Linux), I set up WireGuard on some slower servers here that have 10G networking. This isn't an ideal test but it's more representative of what we would see with our actual fileservers, since I used spare fileserver hardware. What I got out of it was a performance and CPU usage mystery.
What I expected to see was that WireGuard performance would top out at some level above 1G as the slower CPUs on both the sending and the receiving host ran into their limits, and I definitely wouldn't see them drive the network as fast as they could without WireGuard. What I actually saw was that WireGuard did hit a speed limit but the CPU usage didn't seem to saturate, either for kernel WireGuard processing or for the iperf3 process. These machines can manage to come relatively close to 10G bandwidth with bare TCP, while with WireGuard they were running around 400 MBytes/sec of on the wire bandwidth (which translates to somewhat less inside the WireGuard connection, due to overheads).
One possible explanation for this is increased packet handling latency, where the introduction of WireGuard adds delays that keep things from running at full speed. Another possible explanation is that I'm running into CPU limits that aren't obvious from simple tools like top and htop. One interesting thing is that if I do a test in both directions at once (either an iperf3 bidirectional test or two iperf3 sessions, one in each direction), the bandwidth in each direction is slightly over half the unidirectional bandwidth (while a bidirectional test without WireGuard runs at full speed in both directions at once). This certainly makes it look like there's a total WireGuard bandwidth limit in these servers somewhere; unidirectional traffic gets basically all of it, while bidirectional traffic splits it fairly between each direction.
I looked at 'perf top' on the receiving 10G machine and kernel spin lock stuff seems to come in surprisingly high. I tried having a 1G test machine also send WireGuard traffic to the receiving 10G test machine at the same time and the incoming bandwidth does go up by about 100 Mbytes/sec, so perhaps on these servers I'm running into a single-peer bandwidth limitation. I can probably arrange to test this tomorrow.
(I can't usefully try both of my 1G WireGuard test machines at once because they're both connected to the same 1G switch, with a 1G uplink into our 10G switch fabric.)
PS: The two 10G servers are running Ubuntu 24.04 and Ubuntu 22.04 respectively with standard kernels; the faster server with more CPUs was the 'receiving' server here, and is running 24.04. The two 1G test servers are running Ubuntu 24.04.
2025-06-16
Linux kernel WireGuard can go 'fast' on decent hardware
I'm used to thinking of encryption as a slow thing that can't deliver anywhere near to network saturation, even on basic gigabit Ethernet connections. This is broadly the experience we see with our current VPN servers, which struggle to turn in more than relatively anemic bandwidth with OpenVPN and L2TP, and so for a long time I assumed it would also be our experience with WireGuard if we tried to put anything serious behind it. I'd seen the 2023 Tailscale blog post about this but discounted it as something we were unlikely to see; as their kernel throughput on powerful sounding AWS nodes was anemic by 10G standards, so I assumed our likely less powerful servers wouldn't even get 1G rates.
Today, for reasons beyond the scope of this entry, I wound up wondering how fast we could make WireGuard go. So I grabbed a couple of spare servers we had with reasonably modern CPUs (by our limited standards), put our standard Ubuntu 24.04 on them, and took a quick look to see how fast I could make them go over 1G networking. To my surprise, the answer is that WireGuard can saturate that 1G network with no particularly special tuning, and the system CPU usage is relatively low (4.5% on the client iperf3 side, 8% on the server iperf3 side; each server has a single Xeon E-2226G). The low usage suggests that we could push well over 1G of WireGuard bandwidth through a 10G link, which means that I'm going to set one up for testing at some point.
While the Xeon E-2226G is not a particularly impressive CPU, it's better than the CPUs our NFS fileservers have (the current hardware has Xeon Silver 4410Ys). But I suspect that we could sustain over 1G of WireGuard bandwidth even on them, if we wanted to terminate WireGuard on the fileservers instead of on a 'gateway' machine with a fast CPU (and a 10G link).
More broadly, I probably need to reset my assumptions about the relative speed of encryption as compared to network speeds. These days I suspect a lot of encryption methods can saturate a 1G network link, at least in theory, since I don't think WireGuard is exceptionally good in this respect (as I understand it, encryption speed wasn't particularly a design goal; it was designed to be secure first). Actual implementations may vary for various reasons so perhaps our VPN servers need some tuneups.
(The actual bandwidth achieved inside WireGuard is less than the 1G data rate because simply being encrypted adds some overhead. This is also something I'm going to have to remember when doing future testing; if I want to see how fast WireGuard is driving the underlying networking, I should look at the underlying networking data rate, not necessarily WireGuard's rate.)
2025-06-15
My views on the choice of name for SMTP senders to use in TLS SNI
TLS SNI (Server Name Indication) is something that a significant minority of sending mail servers use when they do TLS with SMTP. One of the reasons that it's not used more generally is apparently that there's confusion about what TLS SNI name to use. Based on our logs, in practice essentially everyone using TLS SNI uses the MX target name as the SNI name; if something is MX'd to 'inbound.example.org', then sending mailers with send the SNI name 'inbound.example.org'.
One other option for the TLS SNI name is the domain name of the recipient. Using the TLS SNI of the recipient domain would let SMTP frontends route the connection to an appropriate backend in still encrypted form, although you'd have to use a custom protocol. If you then required a matching name in the server TLS certificate, you'd also have assurance that you were delivering the email to a mail server that should handle that domain's mail, rather than having someone intercept your DNS MX query and provide their own server as the place to send 'example.org' mail. However, this has some problems.
First, it means that a sending mailer couldn't aggregate email messages to multiple target domains all hosted by the same MX target into a single connection. I suspect that this isn't a big issue and most email isn't aggregated this way in the first place. More importantly, if the receiving server's TLS certificate had to match the SNI it received, you would need to equip your inbound mail servers with a collection of potentially high value TLS certificates for your bare domain names. The 'inbound.example.org' server would need a TLS server certificate for 'example.org' (and maybe 'example.net' if you had both and handled both in the same inbound server). In the current state of TLS, I don't believe this TLS certificate could be scoped down so that it couldn't be used for HTTPS.
This would also be troublesome for outsourcing your email handling to someone. If you outsourced your email to example.org, you would have to give them a TLS certificate (or a series of TLS certificates) for your bare domain, so they could handle email for it with the right TLS certificate.
Sending the target MX name as the TLS SNI name is less useful for some things and is more prone to DNS interception and other ways to tamper with DNS results (assuming there's no DNSSEC in action). But it has the virtue that it's simple to implement on both sides and you don't have to give your inbound mail server access to potentially sensitive TLS certificates. It can have a TLS certificate just for its own name, where the only thing this can really be used for is mail just because that's the only thing the server does. And of course it works well with outsourced mail handling, because the MX target server only needs a TLS certificate for its own name, which its organization should be able to easily provide.
Arguably part of the confusion here is an uncertainly over what using TLS SNI during SMTP STARTTLS is supposed to achieve and what security properties we want from the resulting TLS connection. In HTTPS TLS, using SNI has a clear purpose; it lets you handle multiple websites, each with their own TLS certificate, on a single IP, and you want the TLS certificate to be valid for the TLS SNI name. I don't think we have any such clarity for TLS SNI and general server TLS certificate validation in SMTP. For example, are we using TLS certificates to prove that the SMTP server you're talking to is who you think it is (the MX target name), or that it's authorized to handle email for a particular domain (the domain name of the destination)?
(This elaborates on a Fediverse post in a discussion of this area.)
2025-06-14
Revisiting ZFS's ZIL, separate log devices, and writes
Many years ago I wrote a couple of entries about ZFS's ZIL optimizations for writes and then an update for separate log devices. In completely unsurprising news, OpenZFS's behavior has changed since then and gotten simpler. The basic background for this entry is the flow of activity in the ZIL (ZFS Intent Log).
When you write data to a ZFS filesystem, your write will be classified as 'indirect', 'copied', or 'needcopy'. A 'copied' write is immediately put into the in-memory ZIL even before the ZIL is flushed to disk, a 'needcopy' write will be put into the in-memory ZIL if a (filesystem) sync() or fsync() happens and then written to disk as part of the ZIL flush, and an 'indirect' write will always be written to its final place in the filesystem even if the ZIL is flushed to disk, with the ZIL just containing a pointer to the regular location (although at that point the ZIL flush depends on those regular writes). ZFS keeps metrics on how much you have of all of these, and they're potentially relevant in various situations.
As of the current development version of OpenZFS (and I believe for some time in released versions), how writes are classified is like this, in order:
- If you have '
logbias=throughput
' set or the write is an O_DIRECT write, it is an indirect write. - If you don't have a separate log device and the write is equal to
or larger than zfs_immediate_write_sz
(32 KBytes by default), it is an indirect write.
- If this is a synchronous write, it is a 'copied' write, including if your
filesystem has 'sync=always'
set.
- Otherwise it's a 'needcopy' write.
If your system is doing normal IO (well, normal writes) and you don't have a separate log device, large writes are indirect writes and small writes are 'needcopy' writes. This keeps both of them out of the in-memory ZIL. However, on our systems I see a certain volume of 'copied' writes, suggesting that some programs or ZFS operations force synchronous writes. This seems to be especially common on our ZFS based NFS fileservers, but it happens to some degree even on the ZFS fileserver that mostly does local IO.
The corollary to this is that if you do have a separate log device and you don't do O_DIRECT writes (and don't set logbias=throughput), all of your writes will go to your log device during ZIL flushes, because they'll fall through the first two cases and into case three or four. If you have a sufficiently high write volume combined with ZIL flushes, this may increase the size of a separate log device that you want and also make you want one that has a high write bandwidth (and can commit things to durable storage rapidly).
(We don't use any separate log devices for various reasons and I don't have well informed views of when you should use them and what sort of device you should use.)
Once upon a time (when I wrote my old entry), there was a zil_slog_limit tunable that pushed some writes back to being indirect writes even if you had a separate log device, under somewhat complex circumstances. That was apparently removed in 2017 and was partly not working even before then (also).
2025-06-13
Will (more) powerful discrete GPUs become required in practice in PCs?
One of the slow discussions I'm involved in over on the Fediverse started with someone wondering what modern GPU to get to run Linux on Wayland (the current answer is said to be an Intel Arc B580, if you have a modern distribution version). I'm a bit interested in this question but not very much, because I've traditionally considered big discrete GPU cards to be vast overkill for my needs. I use an old, text-focused type of X environment and I don't play games, so apart from needing to drive big displays at 60Hz (or maybe someday better than that), it's been a long time since I needed to care about how powerful my graphics was. These days I use 'onboard' graphics whenever possible, which is to say the modest GPU that Intel and AMD now integrate on many CPU models.
(My office desktop has more or less the lowest end discrete AMD GPU with suitable dual outputs that we could find at the time because my CPU didn't have onboard graphics. My current home desktop uses what is now rather old onboard Intel graphics.)
However, graphics aren't the only thing you can do with GPUs these days (and they haven't been for some time). Increasingly, people do a lot of GPU computing (and not just for LLMs; darktable can use your GPU for image processing on digital photographs). In the old days, this GPU processing was basically not worth even trying on your typical onboard GPU (darktable basically laughed at my onboard Intel graphics), and my impression is that's still mostly the case if you want to do serious work. If you're serious, you want a lot of GPU memory, a lot of GPU processing units, and so on, and you only really get that on dedicated discrete GPUs.
You'll probably always be able to use a desktop for straightforward basic things with only onboard graphics (if only because of laptop systems that have price, power, and thermal limits that don't allow for powerful, power-hungry, and hot GPUs). But that doesn't necessarily mean that it will be practical to be a programmer or system administrator without a discrete GPU that can do serious computing, or at least that you'll enjoy it very much. I can imagine a future where your choices are to have a desktop with a good discrete GPU so that you can do necessary (GPU) computation, bulk analysis, and so on locally, or to remote off to some GPU-equipped machine to do the compute-intensive side of your work.
(An alternate version of this future is that CPU vendors stuff more and more GPU compute capacity into CPUs and the routine GPU computation keeps itself to within what the onboard GPU compute units can deliver. After all, we're already seeing CPU vendors include dedicated GPU computation capacity that's not intended for graphics.)
Even if discrete GPUs don't become outright required, it's possible that they'll become so useful and beneficial that I'll feel the need to get one; not having one would be workable but clearly limiting. I might feel that about a discrete GPU today if I did certain sorts of things, such as large scale photo or video processing.
I don't know if I believe in this general future, where a lot of important things require (good) GPU computation in order to work decently well. It seems a bit extreme. But I've been quite wrong about technology trends in the past that similarly felt extreme, so nowadays I'm not so sure of my judgment.
2025-06-12
What would a multi-user web server look like? (A thought experiment)
Every so often my thoughts turn to absurd ideas. Today's absurd idea is sparked by my silly systemd wish for moving processes between systemd units, which in turn was sparked by a local issue with Apache CGIs (and suexec). This got me thinking about what a modern 'multi-user' web server would look like, where by multi-user I mean a web server that's intended to serve content operated by many different people (such as many different people's CGIs). Today you can sort of do this for CGIs through Apache suexec, but as noted this has limits.
The obvious way to implement this would be to run a web server process for every different person's web area and then reverse proxy to the appropriate process. Since there might be a lot of people and not all of them are visited very often, you would want these web server processes to be started on demand and then shut down automatically after a period of inactivity, rather than running all of the time (on Linux you could sort of put this together with systemd socket units). These web server processes would run as appropriate Unix UIDs, not as the web server UID, and on Linux under appropriate systemd hierarchies with appropriate limits set.
(Starting web server units through systemd would also mean that your main web server process didn't have to be privileged or have a privileged helper, as Apache does with suexec. You could have the front end web server do the process starting and supervision itself, but then it would also need the privileges to change UIDs and the support for setting other per-user context information, some of which is system dependent.)
Although I'm not entirely fond of it, the simplest way to communicate between the main web server and the per-person web server would be through HTTP. Since HTTP reverse proxies are widely supported, this would also allow people to choose what program they'd use as their 'web server', rather than your default. However, you'd want to provide a default simple web server to handle static files, CGIs, and maybe PHP (which would be even simpler than my idea of a modern simple web server).
The main (or front-end) web server would still want to have a bunch of features like global rate limiting, since it's the only thing in a position to see aggregate requests across everyone's individual server. If you wanted to make life more complicated but also potentially more convenient, you could chose different protocols to handle different people's areas. One person could be handled via a HTTP reverse proxy, but another person might be handled through FastCGI because they purely use PHP and that's most convenient for them (provided that their FastCGI server could handle being started on demand and then stopping later).
While I started thinking of this in the context of personal home pages and personal CGIs, as we support on our main web server, you could also use this for having different people and groups manage different parts of your URL hierarchy, or even different virtual hosts (by making the URL hierarchy of the virtual host that was handed to someone be '(almost) everything').
With a certain amount of work you could probably build this today on Linux with systemd (Unix) socket activation, although I don't know what front-end or back-end web server you'd want to use. To me, it feels like there's a certain elegance to the 'everyone gets their own web server running under their own UID, go wild' aspect of this, rather than having to try to make one web server running as one UID do everything.
2025-06-11
Some thoughts on GNOME's systemd dependencies and non-Linux Unixes
One of the pieces of news of the time interval is (GNOME is) Introducing stronger dependencies on systemd (via). Back in the old days, GNOME was a reasonably cross-platform Unix desktop environment, one that you could run on, for example, FreeBSD. I believe that's been less and less true over time already (although the FreeBSD handbook has no disclaimers), but GNOME adding more relatively hard dependencies on systemd really puts a stake in it, since systemd is emphatically Linux-only.
It's possible that people in FreeBSD will do more and more work to create emulations of systemd things that GNOME uses, but personally I think this is quixotic and not sustainable. The more likely outcome is that FreeBSD and other Unixes will drop GNOME entirely, retaining only desktop environments that are more interested in being cross-Unix (although I'm not sure what those are these days; it's possible that GNOME is simply more visible in its Linux dependency).
An aspect of this shift that's of more interest to me is that GNOME is the desktop environment and (I believe) GUI toolkit that has been most vocal about wanting to drop support for X, and relatively soon. The current GDM has apparently already dropped support for starting non-Wayland sessions, for example (at least on Fedora, although it's possible that Fedora has been more aggressive than GNOME itself recommends). This loss of X support in GNOME has been a real force pushing for Wayland, probably including Wayland on FreeBSD. However, if FreeBSD no longer supports GNOME, the general importance of Wayland for FreeBSD may go down. Wayland's importance would especially go down if the general Unix desktop world splits into one camp that is increasingly Linux-dependent due to systemd and Wayland requirements, and another camp that is increasingly 'old school' non-systemd and X only. This second camp would become what you'd find on FreeBSD and other non-Linux Unixes.
(Despite what Wayland people may tell you, there are still a lot of desktop environments that have little or no Wayland support.)
However, this leaves the future of X GUI applications that use GTK somewhat up in the air. If GTK is going to seriously remain a cross-platform thing and the BSDs are effectively only doing X, then GTK needs to retain X support and GTK based applications will work on FreeBSD (at least as much as they ever do). But if the GNOME people decide that 'cross-platform' for GTK doesn't include X, the BSDs would be stuck in an awkward situation. One possibility is that there are enough people using FreeBSD (and X) with GTK applications that they would push the GNOME developers to keep GTK's X support.
(I care about this because I want to keep using X for as long as possible. One thing that would force me to Wayland is if important programs such as Firefox stop working on X because the GUI toolkits they use have dropped X support. The more pressure there is from FreeBSD people to keep the X support in toolkits, the better for me.)
2025-06-10
Python argparse has a limitation on argument groups that makes me sad
Argparse is the straightforward standard library module for handling command line arguments, with a number of nice features. One of those nice features is groups of mutually exclusive arguments. If people can only give one of '--quiet' and '--verbose' and both together make no sense, you can put them in a mutually exclusive group and argparse will check for you and generate an appropriate error. However, mutually exclusive groups have a little limitation that makes me sad.
Suppose, not hypothetically, that you have a Python program that has some timeouts. You'd like people using the program to be able to adjust the various sorts of timeouts away from their default values and also to be able to switch it to a mode where it never times out at all. Generally it makes no sense to adjust the timeouts and also to say not to have any timeouts, so you'd like to put these in a mutually exclusive group. If you have only a single timeout, this works fine; you can have a group with '--no-timeout' and '--timeout <TIME>' and it works. However, if you have multiple sorts of timeouts that people may want adjust all of, this doesn't work. If you put all of the options in a single mutually exclusive group, people can only adjust one timeout, not several of them. What you want is for the '--no-timeouts' switch to be mutually exclusive with a group of all of the timeout switches.
Unfortunately, if you read the current argparse documentation, you will find this note:
Changed in version 3.11: Calling
add_argument_group()
oradd_mutually_exclusive_group()
on a mutually exclusive group is deprecated. These features were never supported and do not always work correctly. The functions exist on the API by accident through inheritance and will be removed in the future.
You can nest a mutually exclusive group inside a regular group, and there are some uses for this. But you can't nest any sort of group inside a mutually exclusive group (or a regular group inside of a regular group). At least not officially, and there are apparently known issues with doing so that won't ever be fixed, so you probably shouldn't do it at all.
Oh well, it would have been nice.
(I suspect one reason that this isn't officially supported is that working out just what was conflicting with what in a pile of nested groups (and what error message to emit) might be a bit complex and require explicit code to handle this case.)
As an extended side note, checking this by hand isn't necessarily all that easy. If you have something, such as timeouts, that have a default value but can be changed by the user, the natural way to set them up in argparse is to make the argparse default value your real default value and then use the value argparse sets in your program. If the person running the program used the switch, you'll get their value, and if not you'll get your default value, and everything works out. Unfortunately this usage makes it difficult or impossible to see if the person running your program explicitly gave a particular switch. As far as I know, argparse doesn't expose this information, so at a minimum you have to know what your default value is and then check to see if the current value is different (and this doesn't catch the admittedly unlikely case of the person using the switch with the default value).
2025-06-09
Potential issues in running your own identity provider
Over on the Fediverse, Simon Tatham had a comment about (using) cloud identity providers that's sparked some discussion. Yesterday I wrote about the facets of identity providers. Today I'm sort of writing about why you might not want to run your own identity provider, despite the hazards of depending on the security of some outside third party. I'll do this by talking about what I see as being involved in the whole thing.
The hardcore option is to rely on no outside services at all, not even for multi-factor authentication. This pretty much reduces your choices for MFA down to TOTP and perhaps WebAuthn, either with devices or with hardware keys. And of course you're going to have to manage all aspects of your MFA yourself. I'm not sure if there's capable open source software here that will let people enroll multiple second factors, handle invalidating one, and so on.
One facet of being an identity provider is managing identities. There's a wide variety of ways to do this; there's Unix accounts, LDAP databases, and so on. But you need a central system for it, one that's flexible enough to cope with with real world, and that system is load bearing and security sensitive. You will need to keep it secure and you'll want to keep logs and audit records, and also backups so you can restore things if it explodes (or go all the way to redundant systems for this). If the identity service holds what's considered 'personal information' in various jurisdictions, you'll need to worry about an attacker being able to bulk-extract that information, and you'll need to build enough audit trails so you can tell to what extent that happened. Your identity system will need to be connected to other systems in your organization so it knows when people appear and disappear and can react appropriately; this can be complex and may require downstream integrations with other systems (either yours or third parties) to push updates to them.
Obviously you have to handle primary authentication yourself (usually through passwords). This requires you to build and operate a secure password store as well as a way of using it for authentication, either through existing technology like LDAP or something else (this may or may not be integrated with your identity service software, as passwords are often considered part of the identity). Like the identity service but more so, this system will need logs and audit trails so you can find out when and how people authenticated to it. The log and audit information emitted by open source software may not always meet your needs, in which case you may wind up doing some hacks. Depending on how exposed this primary authentication service is, it may need its own ratelimiting and alerting on signs of potential compromised accounts or (brute force) attacks. You will also definitely want to consider reacting in some way to accounts that pass primary authentication but then fail second-factor authentication.
Finally, you will need to operate the 'identity provider' portion of things, which will probably do either or both of OIDC and SAML (but maybe you (also) need Kerberos, or Active Directory, or other things). You will have to obtain the software for this, keep it up to date, worry about its security and the security of the system or systems it runs on, make sure it has logs and audit trails that you capture, and ideally make sure it has ratelimits and other things that monitor for and react to signs of attacks, because it's likely to be a fairly exposed system.
If you're a sufficiently big organization, some or all of these services probably need to be redundant, running on multiple servers (perhaps in multiple locations) so the failure of a single server doesn't lock you out of everything. In general, all of these expose you to all of the complexities of running your own servers and services, and each and all of them are load bearing and highly security sensitive, which probably means that you should be actively paying attention to them more or less all of the time.
If you're lucky you can find suitable all-in-one software that will handle all the facets you need (identity, primary authentication, OIDC/SAML/etc IdP, and perhaps MFA authentication) in a way that works for you and your organization. If not, you're going to have to integrate various different pieces of software, possibly leaving you with quite a custom tangle (this is our situation). The all in one software generally seems to have a reputation of being pretty complex to set up and operate, which is not surprising given how much ground it needs to cover (and how many protocols it may need to support to interoperate with other systems that want to either push data to it or pull data and authentication from it). As an all-consuming owner of identity and authentication, my impression is that such software is also something that's hard to add to an existing environment after the fact and hard to swap out for anything else.
(So when you pick an all in one open source software for this, you really have to hope that it stays good, reliable software for many years to come. This may mean you need to build up a lot of expertise before you commit so that you really understand your choices, and perhaps even do pilot projects to 'kick the tires' on candidate software. The modular DIY approach is more work but it's potentially easier to swap out the pieces as you learn more and your needs change.)
The obvious advantage of a good cloud identity provider is that they've already built all of these systems and they have the expertise and infrastructure to operate them well. Much like other cloud services, you can treat them as a (reliable) black box that just works. Because the cloud identity provider works at a much bigger scale than you do, they can also afford to invest a lot more into security and monitoring, and they have a lot more visibility into how attackers work and so on. In many organizations, especially smaller ones, looking after your own identity provider is a part time job for a small handful of technical people. In a cloud identity provider, it is the full time job of a bunch of developers, operations, and security specialists.
(This is much like the situation with email (also). The scale at which cloud providers operates dwarfs what you can manage. However, your identity provider is probably more security sensitive and the quality difference between doing it yourself and using a cloud identity provider may not be as large as it is with email.)
2025-06-08
Thinking about facets of (cloud) identity providers
Over on the Fediverse, Simon Tatham had a comment about cloud identity providers, and this sparked some thoughts of my own. One of my thoughts is that in today's world, a sufficiently large organization may have a number of facets to its identity provider situation (which is certainly the case for my institution). Breaking up identity provision into multiple facets can leave it not clear if and to what extend you could be said to be using a 'cloud identity provider'.
First off, you may outsource 'multi-factor authentication', which is to say your additional factor, to a specialist SaaS provider who can handle the complexities of modern MFA options, such as phone apps for push-based authentication approval. This SaaS provider can turn off your ability to authenticate, but they probably can't authenticate as a person all by themselves because you 'own' the first factor authentication. Well, unless you have situations where people only authenticate via their additional factor and so your password or other first factor authentication is bypassed.
Next is the potential distinction between an identity provider and an authentication source. The identity provider implements things like OIDC and SAML, and you may have to use a big one in order to get MFA support for things like IMAP. However, the identity provider can delegate authenticating people to something else you run using some technology (which might be OIDC or SAML but also could be something else). In some cases this delegation can be quite visible to people authenticating; they will show up to the cloud identity provider, enter their email address, and wind up on your web-based single sign on system. You can even have multiple identity providers all working from the same authentication source. The obvious exposure here is that a compromised identity provider can manufacture attested identities that never passed through your authentication source.
Along with authentication, someone needs to be (or at least should be) the 'system of record' as to what people actually exist within your organization, what relevant information you know about them, and so on. Your outsourced MFA SaaS and your (cloud) identity providers will probably have their own copies of this data where you push updates to them. Depending on how systems consume the IdP information and what other data sources they check (eg, if they check back in with your system of record), a compromised identity provider could invent new people in your organization out of thin air, or alter the attributes of existing people.
(Small IdP systems often delegate both password validation and knowing who exists and what attributes they have to other systems, like LDAP servers. One practical difference is whether the identity provider system asks you for the password or whether it sends you to something else for that.)
If you have no in-house authentication or 'who exists' identity system and you've offloaded all of these to some external provider (or several external providers that you keep in sync somehow), you're clearly at the mercy of that cloud identity provider. Otherwise, it's less clear and a lot more situational as to when you could be said to be using a cloud identity provider and thus how exposed you are. I think one useful line to look at is to ask whether a particular identity provider is used by third party services or if it's only used to for that provider's own services. Or to put it in concrete terms, as an example, do you use Github identities only as part of using Github, or do you authenticate other things through your Github identities?
(With that said, the blast radius of just a Github (identity) compromise might be substantial, or similarly for Google, Microsoft, or whatever large provider of lots of different services that you use.)