One core problem with DNSSEC
As a sysadmin, my view of DNSSEC is that life is too short for me to debug other people's configuration problems.
One fundamental problem of DNSSEC today is that it suffers from the false positive problem, the same one that security alerts suffer from. In practice today, for almost all people almost all of the time, a DNSSEC failure is not a genuine attack; it is a configuration mistake, and the configuration mistake is almost never on the side making the DNS query. This means that almost all of the time, DNSSEC acts by stopping you from doing something safe that you want to do and further, you can't fix the DNSSEC problem except by turning off DNSSEC, because it's someone else's mistake (in configuration, in operation, or in whatever).
This is not a recipe for a nice experience for actual people; this is a recipe for mathematical security. As such, it is a security failure. Any security system that is overwhelmed by false positives has absolutely failed to tackle the real problem, which is that your security system must be useful. Security systems that are not useful get turned off, and that is exactly what is happening with DNSSEC.
Another big problem with DNSSEC today, one that magnifies the core problem, is that it has terrible visibility and diagnostics (at least in common implementations). If there is a DNSSEC related failure, generally what happens is that you don't get DNS answers. You don't get told that what has failed is DNSSEC and you don't get a chance to bypass it and proceed anyway (however dangerous that choice might be in practice); instead you mysteriously fail. Mysterious failures are what you could politely call a terrible user experience. Mysterious failures that are not your fault and that you cannot fix (except by turning off DNSSEC) are worse.
(DNSSEC advocates may protest that this is not how it is supposed to work. I am afraid that security measures exist in the real world, where how it actually works is what actually matters. Once again, security is not mathematics.)
PS: To the extent that people are experiencing DNS attacks, the modern Internet world has chosen to deal with it in another way, through HTTPS and TLS in general.
(I have written before about my older experiences with DNSSEC and how I thought DNSSEC would have to be used in the real world. Needless to say, the DNSSEC people have continued with the program of 'everyone must get it right all the time, no errors allowed, hard failures for everyone' since back then in 2014. For my views on DNSSEC in general, well, see this.)
Non-uniform caches are harder to make work well
One way to view what can happen to your Unix system when you don't have swap space is that it's one more case of the Unix virtual memory system facing additional challenges because it is what I will call a non-uniform cache. In a uniform cache, all entries come from the same source at the same speed (more or less), can naturally be accessed as fast and as frequently as each other, and can be evicted or freed at the same speed and volume. In a non-uniform cache, some or many of those are not true. A Unix system without swap is an extreme case, since one sort of pages cannot be evicted from RAM at all, but Unix has experienced problems here before, for example when it introduced a unified buffer cache and discovered that certain sorts of pages could naturally be accessed a lot faster than others.
One source of problems is that a non-uniform cache mingles together two factors when you observe pressure on it. In a uniform cache, the observed pressure on elements in the cache is a true reflection of the real needs of the things using the cache. In a non-uniform cache, the pressure you observe is some combination of how much elements are really needed and how readily they can be fetched, accessed, and dropped. To work out the true pressure and balance the cache properly, the system needs some way to split these two factors apart again, generally by knowing or working out the impact of the various sorts of non-uniformity.
(Then, of course, it needs to be able to balance the cache at all. Having no swap space is an extreme case, but even with swap space you can usually only evict so many anonymous pages from RAM.)
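To make the pressure-mingling concrete, here is a toy sketch in Python (purely illustrative, not any real kernel's page replacement algorithm) of an LRU cache where some entries are pinned and can never be evicted, loosely standing in for anonymous pages on a swapless system. The pinned entries distort the observed pressure on everything else:

```python
from collections import OrderedDict

class PinnedLRUCache:
    """Toy LRU cache where some entries are pinned (unevictable),
    loosely standing in for anonymous pages on a system with no swap.
    This is an illustration, not a real kernel's page cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> is_pinned

    def access(self, key, pinned=False):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True  # hit
        if len(self.entries) >= self.capacity:
            # Evict the least recently used *evictable* entry.
            victim = next((k for k, p in self.entries.items() if not p), None)
            if victim is None:
                raise MemoryError("nothing evictable left")  # the no-swap OOM case
            del self.entries[victim]
        self.entries[key] = pinned
        return False  # miss

cache = PinnedLRUCache(capacity=4)
for i in range(3):
    cache.access(f"anon{i}", pinned=True)  # pinned entries fill most of the cache

# Two file pages that would trivially fit in a uniform 4-entry cache
# now fight over the single evictable slot and miss every time.
hits = sum(cache.access(f"file{i % 2}") for i in range(10))
print(hits)
```

From the outside, the file pages look like they are under enormous pressure (constant misses), but the real cause is the unevictable entries; a balancer that only looks at miss rates would draw the wrong conclusion.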
Automatically coping with or working out the impact of non-uniformity is a hard problem, which is one reason that tuning knobs proliferate in non-uniform caches (another is that punting the problem to the cache's users is a lot easier than even trying). Another surprisingly hard problem seems to be realizing that you even have a non-uniform cache at all, or at least that the non-uniformity is going to matter and how it will manifest (many caches have some degree of non-uniformity if you look at them closely enough).
(This probably shouldn't be surprising; in practice, it's impossible to fully understand what complex systems are doing in advance.)
One corollary of this for me is that if I'm creating or dealing with a cache, I should definitely think about whether it might be non-uniform and what effects that might have. It's tempting to think that your cache is sufficiently uniform that you don't have to dig deeper, but it's not always so, and ignoring that a cache is non-uniform is a great way to get various sorts of bad and frustrating performance under load.
(Of course if I really care I should profile the cache for the usual reasons.)
The practical difference between CPU TDP and observed power draw illustrated
Last year, in the wake of doing power measurements on my work machine and my home machine, I wrote about how TDP is misleading. Recently I was re-reading this Anandtech article on the subject (via), and realized that I actually have a good illustration of the difference between TDP and power draw, and on top of that it turns out that I can generate some interesting numbers on the official power draw of my home machine's i7-8700K under load.
I'll start with the power consumption numbers for my machines. I have a 95 W TDP Intel CPU, but when I go from idle to a full load of mprime -t, my home machine's power consumption goes from 40 watts to 174 watts, an increase of 134 watts. Some of the extra power consumption will come from the PSU not being 100% efficient, but based on this review, my PSU is still at least 90% efficient around the 175 watt level (and less efficient at the unloaded 40 watt level). Other places where the power might vanish on the way to the CPU are the various fans in the system and any inefficiencies in the motherboard's power regulation and supply.
(Since motherboard voltage regulation systems get hot under load, they're definitely not 100% efficient. That heat doesn't appear out of nowhere.)
However, there's another interesting test that I can do with my home machine. Since I have a modern Intel CPU, it supports Intel's RAPL (Running Average Power Limit) system, and Mozilla has a rapl program in the Firefox source tree that will provide a report that is more or less the CPU's power usage, as Intel thinks it is. Typical output from rapl for my home machine under light load, such as writing this entry over a SSH connection in an xterm, looks like this (over 5 seconds):

    total W = _pkg_ (cores + _gpu_ + other) + _ram_ W
    [...]
    #06  3.83 W =  2.29 ( 1.16 +  0.05 +  1.09) +  1.54 W
When I load my machine up with 'mprime -t', I get this (also over 5 seconds):

    #146 106.23 W = 100.15 (97.46 + 0.01 + 2.68) + 6.08 W
    #147 106.87 W = 100.78 (98.04 + 0.06 + 2.68) + 6.09 W
Intel's claimed total power consumption for all cores together is surprisingly close to their 95 W TDP figure, and Intel says that the whole CPU package has increased its power draw by about 100 watts. That's not all of the way to my observed 134 watt power increase, but it's a lot closer than I expected.
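The arithmetic behind "a lot closer than I expected" can be spelled out. This sketch uses the numbers from the text plus the assumed ~90% PSU efficiency from the review mentioned above:

```python
# Rough accounting of where the extra wall power goes, using the
# measurements from the text. The 90% PSU efficiency is an assumption
# taken from the PSU review mentioned above (at roughly this load).
idle_wall = 40.0       # watts at the wall, idle
load_wall = 174.0      # watts at the wall, under mprime -t
psu_efficiency = 0.90  # assumed efficiency around the 175 W level

wall_delta = load_wall - idle_wall      # extra draw measured at the wall
dc_delta = wall_delta * psu_efficiency  # upper bound on extra DC-side draw
rapl_delta = 100.0                      # CPU package increase per RAPL

print(f"extra wall draw:          {wall_delta:.0f} W")
print(f"extra DC-side draw (max): {dc_delta:.1f} W")
print(f"left for fans, VRM losses, RAM, etc: {dc_delta - rapl_delta:.1f} W")
```

On these assumptions, PSU losses alone eat over 13 of the 134 extra watts, leaving only about 20 watts to be explained by fans, motherboard voltage regulation losses, and everything else.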
(Various things I've read are inconsistent about whether or not I should be expecting my CPU to be exceeding its TDP in terms of power draw under a sustained full load. Also, who knows what the BIOS has set various parameters to, cf. I haven't turned on any overclocking features other than an XMP memory profile, but that doesn't necessarily mean much with PC motherboards.)
As far as I know AMD Ryzen has no equivalent to Intel's RAPL, so I can't do similar measurements on my work machine. But now that I do the math on my power usage measurements, both the Ryzen and the Intel increased their power draw by the same 134 watts as they went from idle to a mprime -t load. Their different power draw under full load is entirely accounted for by the Ryzen idling 26 watts higher than the Intel does.
Wireless networks have names and thus identify themselves
Recently something occurred to me that sounds obvious when I phrase it this way, which is that wireless networks have names. Wireless networks intrinsically identify themselves through their SSID. This is unlike wired networks, which mostly have no reliable identifier (one exception is wired networks using IEEE 802.1X authentication, since clients need to know what they're authenticating to).
This matters because there are a number of situations where programs might want to know what network they're on, so they can treat different networks differently. As a hypothetical example, browsers might want to apply different security policies to different networks. With wireless networking, the browser can at least theoretically know what network it's on; with wired networking, probably it can't (not reliably, at any rate).
(Another case where you might want to behave differently depending on what network you're connected to is DNS over HTTPS. On some networks, not only can you trust the DNS server you've gotten to be not malicious, but you know you need to use it to resolve names properly. On random others, you may definitely know you want to bypass their DNS server in favour of a more trusted DoH server.)
PS: I believe that Windows somewhat attempts to identify 'what network are we on' even on a wired connection, presumably based on various characteristics of the network it gets from DHCP information and other sources (this is apparently called 'network locations'). My experience with this is that it's annoying because it keeps thinking that my virtualized Windows system is moving from network to network even though it isn't. This makes a handy demonstration of the hazards of trying to do this for wired networks, namely that you're relying on heuristics and they can misfire in both directions.
Some brief views on iOS clients for Mastodon (as of mid 2019)
I'm on Mastodon and I have both an iPhone and an iPad, so of course I've poked at a number of iOS clients for Mastodon. (I'm restricting my views to Mastodon specifically instead of the Fediverse as a whole because I've never used any of these clients on a non-Mastodon instance.)
I'll put my UI biases up front; what I want is basically Tweetbot for Mastodon. I think that Twitter and Mastodon are pretty similar environments, and Tweetbot has a very well polished interface and UI that works quite well. Pointless departures from the Tweetbot experience irritate me, especially if they also waste some of the limited screen space. Also, I can't say that I've tried out absolutely every iOS Mastodon client.
- Amaroq is a perfectly good straightforward iPhone Mastodon client that delivers the basic timeline experience that you'd want, and it's free. Unfortunately it's iPhone only. It's not updated all that often, so it's not going to be up to date on the latest Mastodon features. As far as I know it only has one colour scheme, white on black (or dark blue, I'm not sure).
- Tootdon is also a perfectly good straightforward Mastodon client, and unlike Amaroq it works on iPads too. It's free, but it has the drawback that it sends a copy of toots it sees off to its server, where they are (or were) only kept for a month and only used for searches.

My memory is that I found Tootdon not as nice as Amaroq on my iPhone, when I was still using both clients.
- Toot! is the best iPad client that I've found and is pretty good on the iPhone too. It has all of the features you'd expect and a number of little conveniences (such as inlining partial content from a lot of links, which is handy when people I follow keep linking to Twitter; actually visiting Twitter links is a giant pain on a phone, entirely due to how Twitter acts). It's a paid client but, like Tweetbot, I don't regret spending the money.
Toot! is not perfect on an iPad because it insists on wasting a bit too much space on its sidebar; you can see this in its iPad screen shots. It has a public issue tracker, so perhaps I should raise this grump there.
- Mast is written by an enthusiastic and energetic programmer with many ideas, which very much shows in the end result. Some people like it a great deal and consider it the best designed iOS client. I think it's a good iPhone client but not particularly great on an iPad, where it wastes too much space all of the time and has UI elements that don't seem to work very well. It's a paid client too.
(Mast has had several iterations of its UI on the iPad. As I write this, the current UI squeezes the actual toots into a narrow column in order to display at least one other column that I care much less about.)
I find that Mast is a somewhat alarming client to use, because it has so many features that touching and moving my finger almost anywhere can start to do something. So far I haven't accidentally re-tooted something or the like, but it feels like it's only a matter of time. I really wish there was a way to get Mast to basically calm down.
I think that Mast and Toot! are very close to each other on the iPhone; there are some days where I prefer one and other days when I like the other better. On my iPad it is no contest; the only client I use there is Toot!, because I decided that I wasn't willing to put up with what Tootdon was doing (partly because I wasn't willing to be responsible for sending other people's toots off to some server somewhere under unclear policies).
Both Toot! and Mast have a black on white colour scheme, among others. Mast has many, many customizations and options; Toot! has a moderate amount that cover the important things.
(I have both Mast and Toot! because I bought Mast first based on some people's enthusiastic praise for it, then wound up feeling sufficiently dissatisfied with it on my iPad that I was willing to buy another client.)
PS: I have no opinion on Linux clients; so far I just use the website. This works well at the moment because my Mastodon timeline is low traffic and there's no point in checking it very often.
(The problem with visiting Twitter links in a phone browser is that Twitter keeps popping up interstitial dialogs that try to get me to log in and run a client. Roughly every other time I follow a Twitter link something gets shoved in the way and I have to dismiss it. Needless to say, I hate playing popup roulette when I follow links.)
SMART drive self-tests seem potentially useful, but not too much
I've historically ignored all aspects of hard drive SMART apart, perhaps, from how smartd would occasionally email us to complain about things, and sometimes those things would even be useful. There is good reason to be a SMART sceptic, seeing as many of the SMART attributes are underdocumented, SMART itself is peculiar and obscure, hard drive vendors have periodically had their drives outright lie about SMART things, and SMART attributes are not necessarily good predictors of drive failures (plenty of drives die abruptly with no SMART warnings, which can be unnerving). Certain sorts of SMART warnings are usually indicators of problems (but not always), but the absence of SMART warnings is no guarantee of safety (see eg, and also Backblaze from 2016). On top of all of this, the smartctl manpage is very long.
But, in the wake of our flaky SMART errors and some other events with Crucial SSDs here, I wound up digging deeper into the smartctl manpage and experimenting with SMART self-tests, where the hard drive tries to test itself, and SMART logs, where the hard drive may record various useful things like read errors or other problems, and may even include the sector number involved (which can be useful for various things). Like much of the rest of SMART, what SMART self-tests do is not precisely specified or documented by drive vendors, but generally it seems that the 'long' self-test will read or scan much of the drive.
By itself, this probably isn't much different from what you could do with dd or a software RAID scan. From my perspective, what's convenient about SMART self-tests is that you can kick them off in the background regardless of what the drive is being used for (if anything), they probably won't get too much in the way of your regular IO, and after they're done they automatically leave a record in the SMART log, which will probably persist for a fair while (depending on how frequently you run self-tests and so on).
On the flipside, SMART self-tests have the disadvantage that you don't really know what they're doing. If they report a problem, it's real, but if they don't report a problem you may or may not have one. A SMART self-test is better than nothing for things like testing your spare disks, but it's not the same as actually using them for real.
On the whole, my experimentation with SMART self-tests leaves me feeling that they're useful enough that I should run them more often. If I'm wondering about a disk and it's not being used in a way where all of it gets scanned routinely, I might as well throw a self-test at it to see what happens.
(They probably aren't useful and trustworthy enough to be worth scripting something so that we routinely run self-tests on drives that aren't already in software RAID arrays.)
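For concreteness, the basic smartctl workflow I'm describing looks like this (the device name is a placeholder; you'll generally need to run these as root):

```shell
# Kick off a long self-test in the background; the drive keeps
# servicing normal IO while it runs, and smartctl prints an
# estimated completion time.
smartctl -t long /dev/sdX

# Later, check the self-test log for progress and results.
smartctl -l selftest /dev/sdX

# The error log may record read errors, sometimes with the
# sector number involved.
smartctl -l error /dev/sdX
```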
PS: Much but not all of my experimentation so far has been on hard drives, not SSDs. I don't know if the 'long' SMART self-test on a SSD tests more thoroughly and reaches more bits of the drive internals than you can with just an external read test like dd, or conversely if it's less thorough than a full read scan.
Intel's approach to naming Xeon CPUs is pretty annoying
Suppose, hypothetically, that someone offers to pass on to you (by which I mean your group) an old server or two, and casually says that they have 'Xeon E7-4830' CPUs. Do you want the machines?
Well, that depends. As I found out in the process of writing yesterday's entry on the effects of MDS on our server fleet, a Xeon that is called an 'E7-4830' may be anything from a first generation Westmere CPU from 2011 up to (so far) a Broadwell CPU from 2016, the 'E7-4830 v4'. Intel has recycled the 'E7-4830' label for at least four different CPUs so far, the original, v2 (Ivy Bridge, from 2014), v3 (Haswell, 2015), and the most recent v4. These are not just minor iterations; they have had variations in core count, processor base and turbo frequencies, cache size, CPU socket used, and supported memory and memory speeds.
These days, they also have differences in microcode availability for various Intel CPU weaknesses, since Westmere generation CPUs will not be updated for MDS at all. I'd guess that Intel will provide microcode security updates for future issues for Ivy Bridge (v2) for a few years more, since it's from 2014 and that's fairly recent as these things go, but no one can be sure of anything except that there will probably be more vulnerabilities discovered and more microcode updates needed. This makes the difference between the various versions potentially much more acute and significant.
All of these different E7-4830's are very much not different versions of the same product, unless you re-define 'product' to be 'a marketing segment label'. An E7-4830 v4 is not a substitute for another version, except in the generic sense that any Xeon with ECC is a substitute for another one. Instead of being product names, Intel has made at least some of their Xeon CPU names into essentially brands or sub-brands. This is really rather confusing if you don't know what's going on (because you are not that immersed in the Xeon world) and so do not know how critical the presence or omission of the 'vN' bit of the CPU name is.
(It would be somewhat better if Intel labeled the first iteration as 'Xeon E7-4830 v1', especially in things like the names that the CPUs themselves return. As I discovered in the process of researching all of this, the CPU name information shown in places like Linux's /proc/cpuinfo doesn't come from a mapping table in software; instead, this 'model name', 'brand name' or 'brand string' is directly exposed by the processor through one use of the CPUID instruction.)
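As a small illustration of why the 'vN' suffix matters, here is a sketch of pulling the model name out of /proc/cpuinfo text in Python (the sample text is a made-up fragment in the standard /proc/cpuinfo format):

```python
def model_name(cpuinfo_text):
    """Extract the first 'model name' field from /proc/cpuinfo text.
    On x86 this string is the brand string reported by the CPU itself."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("model name"):
            return line.split(":", 1)[1].strip()
    return None

# Hypothetical /proc/cpuinfo fragment; the trailing 'v4' is the only
# clue that this is the 2016 Broadwell E7-4830 rather than the 2011
# Westmere original.
sample = (
    "processor\t: 0\n"
    "vendor_id\t: GenuineIntel\n"
    "model name\t: Intel(R) Xeon(R) CPU E7-4830 v4 @ 2.00GHz\n"
)
print(model_name(sample))
```

On a real machine you would feed this the contents of /proc/cpuinfo; anything that drops or overlooks the 'v4' when reporting inventory will silently conflate four quite different CPUs.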
This is of course not unique to the E7-4830, or even Xeon E7s in general. The Xeon E5 and E3 lines also have their duplicate CPU names. In a casual search to confirm this, I was particularly struck with the E3-1225, which exists in original, v2, v3, v5, and v6 versions, but apparently not in v4 (although the Intel MDS PDF does list some Xeon E3 v4 CPUs).
Hardware Security Modules are just boxes running opaque and probably flawed software
The news of the time interval is that researchers discovered a remote unauthenticated attack giving full, persistent control over a HSM (via). When I read the details, I was not at all surprised to read that one critical issue was that the internal HSM code implementing the PKCS#11 commands had exploitable buffer overflows, because sooner or later everyone seems to have code problems with PKCS#11 (I believe it's been a source of issues for Unix TLS/SSL libraries, especially OpenSSL).
(The flaws have apparently since been fixed in a firmware update by the HSM vendor, which sounds good until you remember that some people deliberately destroy the ability to apply firmware updates to their HSMs to avoid the possibility of being compelled to apply a firmware update that introduces a back door.)
There is perhaps a tendency to think that HSMs and hardware security keys are magic and invariably secure and flawless. As this example and the Infineon RSA key generation issue demonstrate quite vividly, HSMs are just things running opaque proprietary software that is almost certainly not as good or as well probed as open source code. Proprietary software development is not magic, any more than open source development is, but open source code has the advantage that it's much easier to inspect, fuzz, and so on, and if a project is popular, there probably are a number of people doing that. The number of people who will ever apply this level of scrutiny to your average HSM is much lower, just as it is much lower with most proprietary software.
This doesn't mean that HSMs are useless, especially as hardware security tokens for authenticating people (where under most circumstances they serve as proof of something that you have). But I have come to put much less trust in them and look much more critically at their use. For server side situations under many threat models, I increasingly think that you might be better off building a carefully secured and sealed Unix machine of your own, using well checked open source components.
(Real HSMs are hopefully better secured against hardware tampering than any build it yourself option, but how much you care about this depends on your threat model. An entirely encrypted system that is not on the network and must have a boot password supplied when it powers on goes a long way. Talk to it over a serial port using a limited protocol and write all of the software in a memory safe language using popular and reasonably audited cryptography libraries, or audited tools that work at as high a level as you can get away with.)
PS: The one flaw in the build your own approach in a commercial setting is that often security is not really what you care most about. Instead, what you may well care most about is that it's not your fault if something goes wrong. If you buy a well regarded HSM and then a year later some researchers go to a lot of work and find a security flaw in it, that is not your fault. If you build your own and it gets hacked, that is your fault. Buying the HSM is much safer from a blame perspective than rolling your own, even if the actual security may be worse.
(This is a potential motivation even in non-commercial settings, although the dynamics are a bit different. Sometimes what you really care most about is being able to clearly demonstrate due diligence.)
Some general things and views on DNS over HTTPS
Over on Twitter I said something and then Brad Beyenhof asked me a sensible question related to DNS over HTTPS. Before I elaborate my Twitter answers to that specific question, I want to do an overview of some general views on DNS over HTTPS on the whole.
DNS over HTTPS (hereafter DoH) is what it sounds like; it's a protocol for making DNS queries over a HTTPS connection. There is also the older DNS over TLS, but my impression is that DoH has become more popular, perhaps partly because it's more likely to make it through middleware firewalls and so on. DoH and DoT are increasingly popular ideas for the straightforward reason that on the modern Internet, ISPs are one of your threats. A significant number of ISPs snoop on your DNS traffic to harvest privacy-invasive things about your Internet activities, and some of them actively tamper with DNS results. DoH (and DoT) mostly puts a stop to both DNS snooping and DNS interference.
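To make the mechanics concrete: a DoH query is just an ordinary wire-format DNS message carried inside an HTTPS request (RFC 8484). Here is a sketch in Python that builds such a query; the actual HTTP POST (to a resolver such as https://cloudflare-dns.com/dns-query, with Content-Type application/dns-message) is left out:

```python
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Build a wire-format DNS query for 'name' (qtype 1 = A record).
    RFC 8484 suggests a query ID of 0 so responses cache well."""
    # Header: ID=0, flags=0x0100 (RD set), 1 question, no other records.
    header = struct.pack(">HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question

query = build_dns_query("example.com")
# A DoH client would POST these bytes over HTTPS to the resolver;
# from the network's point of view it is indistinguishable from
# other HTTPS traffic to that server.
print(len(query), "bytes")
```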
(It's likely that DoH doesn't completely stop these because of the possibility of traffic analysis of the encrypted traffic, which has already been shown to reveal information about HTTPS browsing, but we probably won't know how serious an issue this is until DoH is enough used in the field to attract research attention.)
As far as I can tell, DoH (and DoT) are currently used and intended to be used between clients and resolving DNS servers. If there is work to protect queries between a resolving DNS server and authoritative servers, I think it's using a different protocol (Wikipedia mentions DNSCurve, but it's not widely adopted). This doesn't matter for typical users, who use someone else's DNS server, but it matters for people who run their own local resolving server that wants to query directly to authoritative servers instead of delegating to another resolving server.
DoH protects you from your ISP monitoring or tampering with your DNS queries, but it doesn't protect you from your DoH server of choice doing either. To protect against tampering, you'd need some form of signed DNS (but then see Against DNSSEC). It also doesn't currently protect you against your ISP knowing what HTTPS websites you go on to visit, because of SNI; until we move to ESNI, we're always sending the website name (but not the URL) in the clear as part of the early TLS negotiation, so your ISP can just capture it there. In fact, if your ISP wants to know what HTTPS sites you visit, harvesting the information from SNI is easier than getting your DNS queries and then correlating things.
(There is very little protection possible against your DoH server monitoring your activity; you either need to have a trustworthy DNS server provider or run your server yourself in a situation where its queries to authoritative servers will probably not be snooped on.)
More or less hard-coded use of DNS over HTTPS servers in client programs like browsers poses the same problems for sysadmins as basically any hard-coded use of an upstream resolver does, which is that it will bypass any attempts to use split-horizon DNS or other ways of providing internal DNS names and name bindings. Since these are obvious issues, I can at least optimistically hope that organizations like Mozilla are thinking about how to make this work without too much pain (see the current Mozilla wiki entry for some discussion of this; it appears that Mozilla's current setup will work for unresolvable names but not names that get a different result between internal and external DNS).
PS: There are a number of recursive DNS servers that can be configured to use DoH to an upstream recursive server; see eg this brief guide for Unbound. Unbound can also be configured to be a DoH server itself for clients, but this doesn't really help the split horizon case for reasons beyond the scope of this entry.
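As a concrete sketch of forwarding to an encrypted upstream, here is the core of an Unbound configuration using DNS over TLS (the closely related older protocol; the resolver addresses are public ones and the certificate bundle path varies by distribution):

```
server:
    tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"

forward-zone:
    name: "."
    forward-tls-upstream: yes
    forward-addr: 1.1.1.1@853#cloudflare-dns.com
    forward-addr: 9.9.9.9@853#dns.quad9.net
```

The '#hostname' suffix tells Unbound what name to authenticate the upstream's TLS certificate against; without it you get encryption but not server authentication.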
TLS certificate rollover outside of the web is complex and tangled
On the web, renewing and rolling over TLS certificates is a well understood thing, with best practices that are exemplified by a Let's Encrypt based system. There is a chain of trust starting from the server's certificate and running up to a root certificate that browsers know, and everything except the root certificate is provided to the browser by the web site. Server certificates are rolled over regularly and automatically, and when this happens the website is also provided with the rest of the certificate chain, which it can and should simply serve to browsers. Intermediate certificates roll over periodically, but root certificates almost never do.
However, all of this relies on a crucial property, which is that web server certificates are ephemeral; they're used in the course of a single HTTPS connection and then they're forgotten. This means that clients visiting the website don't have to be updated when the web server certificate chain changes. Only the web server has to change, and the web PKI world has made a collective decision that we can force web servers to change on a regular basis.
(The one thing that requires browsers and other clients to change is changing Certificate Authority root certificates; how slow and hard it is to do that is part of the reason why CA root certificates are extremely long-lived.)
However, you don't always have ephemeral signatures and certificates, at least not naturally. One obvious case is the various forms of code signing, where you will get a signed blob once and then keep it for a long time, periodically re-verifying its signature and chain of trust. As Mozilla has demonstrated, rolling over any of the certificates involved in the signing chain is rather more challenging and being capable of doing it is going to have significant effects on the design of your system.
Very generally, if you want true certificate rollover (where the keys change), you need to have some way of re-signing existing artifacts and then (re-)distributing at least the new signatures and trust chain. If your signatures are detached signatures, stored separately from the blob being signed, you only need to propagate them; if the signatures are part of the blob, you need to re-distribute the whole blob. If you redistribute the whole blob, you need to take care that the only change is to the signature; people will get very unhappy if a new signature and chain causes other changes. You can also contrive more complex schemes where an integrated signature chain can be supplemented by later detached signatures, or signatures in some database maintained by the client program that re-verifies signatures.
(Since what you generally sign is actually a cryptographic hash of the blob, you don't have to keep full copies of every version of everything you've ever signed and still consider valid; it's sufficient to be able to re-sign the hashes. This does prevent you from changing the hash algorithm, though, and you may want to keep copies of the blobs anyway.)
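The 'keep the hash, re-sign the hash' idea can be sketched briefly. This toy uses HMAC as a stand-in for a real public-key signature scheme, purely so the example is self-contained; a real system would use something like RSA or Ed25519 with a certificate chain:

```python
import hashlib
import hmac

def sign(key: bytes, digest: bytes) -> bytes:
    # Stand-in for a real signature over the hash. HMAC is symmetric
    # and NOT a substitute for public-key signing in practice.
    return hmac.new(key, digest, hashlib.sha256).digest()

blob = b"the addon or binary being signed"

# At original signing time, record the blob's hash alongside the signature.
stored_digest = hashlib.sha256(blob).digest()
old_sig = sign(b"old signing key", stored_digest)

# Later the signing key rolls over. We can re-sign using only the stored
# hash; the (possibly huge) blob itself is not needed.
new_sig = sign(b"new signing key", stored_digest)

# A client holding the blob recomputes the hash and verifies against
# the new signature.
client_digest = hashlib.sha256(blob).digest()
assert hmac.compare_digest(sign(b"new signing key", client_digest), new_sig)
```

Note that the stored digest pins you to SHA-256 here; switching hash algorithms would require access to the original blobs again, which is exactly the limitation mentioned above.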
For rolling over intermediate certificates in specific, I think that you only need access to the original end certificate to produce a version re-signed with your new intermediate, and you should be keeping a copy of these certificates anyway. You could then contrive a scheme where if a signature chain fails to verify purely because one of the certificates is out of its validity period, the client attempts to contact your service to get a new version of the end certificate and trust chain (and then saves the result, if it validates). But this scheme still requires the client to contact you periodically to download signature updates that are relevant to it.
In Mozilla's case, they have the advantage that their program is more or less intrinsically used online (with small exceptions) and is generally already set to check for updates to various things. Fetching some sort of signature updates for addons would not be a huge change to Firefox's operation in general, although it probably would be a significant change to how addons are verified and perhaps how they're updated.
All of this is an area of TLS certificates and certificate handling that I don't normally think about. Until Mozilla's problem made this quite visible to me, I hadn't even considered how things like code signatures have fundamental operational differences from TLS website certificates.
PS: Short-lived code signing certificates and other non-ephemeral signatures have the obvious and not entirely pleasant side effect that the signed objects only keep working for as long as you maintain your (re-)signing infrastructure. If you decide to stop doing so, existing signed code stops being accepted as soon as its certificates run out. The friendly way to shut down is probably to switch over to extremely long lived certificates before you decommission things.
What usually identifies an intermediate or root TLS certificate
The usual way of describing TLS certificates for things like websites is that they are a chain of trust, where your website's TLS certificate is signed by a Certificate Authority's current intermediate certificate and then that intermediate certificate is signed by the CA's root certificate. For example, you can read about Let's Encrypt's chain of certificates. Some CAs use a chain of multiple intermediate certificates.
(Modern TLS certificates often include a URL to fetch their parent certificate, as covered here.)
But this raises a question, namely how a TLS certificate specifies what its parent certificate is (the certificate that signed it). Until recently, if you had asked me and I had answered off the cuff, I would have guessed that it was based on something like the cryptographic hash of the parent certificate. As it turns out, this is not how it works; instead, as you can find out by actually looking at a certificate, the parent's identity is given as its X.509 Subject Name, which includes both the organization and the Common Name of the intermediate or root certificate.
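You can see this by-name linkage directly with openssl: a certificate's Issuer field must match its parent's Subject. For a self-signed root (generated here with throwaway names), the two fields are identical, which makes the comparison easy to demonstrate:

```shell
# Generate a throwaway self-signed certificate and print the two
# name fields that chain building matches against each other.
set -e
openssl req -x509 -newkey rsa:2048 -nodes -keyout root.key \
  -subj '/O=Example/CN=Example Root' -days 1 -out root.crt 2>/dev/null
# For a self-signed certificate, subject and issuer are the same name.
openssl x509 -in root.crt -noout -subject -issuer
```

For a real website certificate, the `-issuer` output is what gets compared against the `-subject` output of the CA's intermediate.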
(Okay, there are also Key Identifiers, see here and here. It appears that Key Identifiers are normally exactly that, which is to say that they identify the public key used, not the certificate as such.)
There are some interesting consequences of this that I didn't fully realize or think about until recently. The first is what Let's Encrypt does, which is that it has two versions of its intermediate certificate, one signed by its own root certificate and one signed by IdenTrust. Since they have the same Subject (including Common Name), TLS certificate validation by browsers and other things will accept either. If you inspect the current two Let's Encrypt X3 certificates, you'll find that they also have slightly different validity periods, presumably because the one signed by the LE root was signed later (by about seven months, going from the not-before dates).
The second is that you can take an existing intermediate certificate and re-issue and re-sign it with new validity dates (and perhaps other modifications to certificate metadata) but the same keypair, the same Subject and Common Name, and so on. This new version of the intermediate certificate can be transparently substituted for the old version because everything that identifies it matches. However, it absolutely must re-use the same public key, because that's the only way to have existing signatures verify correctly.
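A sketch of such a re-issue with openssl, with made-up names and the certificates self-signed for brevity (a real intermediate would be re-signed by the root): the same keypair and Subject go into a new certificate with a new validity window, and the public key comes out bit-for-bit identical.

```shell
# Hypothetical sketch: re-issue a certificate with the same keypair and
# Subject but new validity dates (and a new serial number).
set -e
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
  -out int.key 2>/dev/null
# Original, short-lived issue.
openssl req -x509 -key int.key \
  -subj '/O=Example/CN=Example Intermediate' -days 1 -out int-old.crt
# Re-issued version: same key and Subject, much longer validity.
openssl req -x509 -key int.key \
  -subj '/O=Example/CN=Example Intermediate' -days 3650 -out int-new.crt
# The embedded public keys are identical, so signatures made by int.key
# verify against either version of the certificate.
openssl x509 -in int-old.crt -noout -pubkey > old.pub
openssl x509 -in int-new.crt -noout -pubkey > new.pub
cmp -s old.pub new.pub && echo same-key
```

Everything that chain building matches on (Subject, Common Name, public key) is unchanged, so the new version substitutes transparently for the old.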
(This is similar to how in the old days of manual TLS certificate renewal, you could re-use your existing keypair in your new CSR if you wanted to, although it wasn't the recommended practice.)
(See also how TLS certificates specify the hosts they're for which covers server certificates and talks more about X.509 Subject Names.)
PS: Re-signing a certificate this way doesn't require access to its private key, so I believe that if you wanted to, you could re-sign someone else's intermediate certificate with your own CA. I can imagine some potential uses of this under sufficiently weird circumstances, since this would create a setup where with the right certificate chain a single TLS certificate would validate against both a public CA and your internal CA.