Intel's approach to naming Xeon CPUs is pretty annoying
Suppose, hypothetically, that someone offers to pass on to you (by which I mean your group) an old server or two, and casually says that they have 'Xeon E7-4830' CPUs. Do you want the machines?
Well, that depends. As I found out in the process of writing yesterday's entry on the effects of MDS on our server fleet, a Xeon that is called an 'E7-4830' may be anything from a first generation Westmere CPU from 2011 up to (so far) a Broadwell CPU from 2016, the 'E7-4830 v4'. Intel has recycled the 'E7-4830' label for at least four different CPUs so far: the original, v2 (Ivy Bridge, from 2014), v3 (Haswell, 2015), and the most recent v4. These are not just minor iterations; they have had variations in core counts, processor base and turbo frequencies, cache size, CPU socket used, and supported memory and memory speeds.
These days, they also have differences in microcode availability for various Intel CPU weaknesses, since Westmere generation CPUs will not be updated for MDS at all. I'd guess that Intel will provide microcode security updates for future issues for Ivy Bridge (v2) for a few years more, since it's from 2014 and that's fairly recent as these things go, but no one can be sure of anything except that there will probably be more vulnerabilities discovered and more microcode updates needed. This makes the difference between the various versions potentially much more acute and significant.
All of these different E7-4830's are very much not different versions of the same product, unless you re-define 'product' to be 'a marketing segment label'. An E7-4830 v4 is not a substitute for another version, except in the generic sense that any Xeon with ECC is a substitute for another one. Instead of being product names, Intel has made at least some of their Xeon CPU names into essentially brands or sub-brands. This is really rather confusing if you don't know what's going on (because you are not that immersed in the Xeon world) and so do not know how critical the presence or omission of the 'vN' bit of the CPU name is.
(It would be somewhat better if Intel labeled the first iteration as 'Xeon E7-4830 v1', especially in things like the names that the CPUs themselves return. As I discovered in the process of researching all of this, the CPU name information shown in places like Linux's /proc/cpuinfo doesn't come from a mapping table in software; instead, this 'model name', 'brand name' or 'brand string' is directly exposed by the processor through one use of the CPUID instruction.)
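As a small illustration of working with these brand strings, here is a sketch of pulling the generation suffix out of one; the parsing is mine, not anything Intel specifies, and the example strings below are merely in the style of what /proc/cpuinfo reports:

```python
import re

def xeon_generation(model_name):
    """Return the 'vN' suffix of a Xeon brand string, treating a
    missing suffix as the first generation, 'v1'."""
    m = re.search(r'\bv(\d+)\b', model_name)
    return 'v%s' % m.group(1) if m else 'v1'

# Illustrative strings in the shape of real /proc/cpuinfo output:
print(xeon_generation('Intel(R) Xeon(R) CPU E7-4830 v4 @ 2.00GHz'))  # v4
print(xeon_generation('Intel(R) Xeon(R) CPU E7-4830 @ 2.13GHz'))     # v1
```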
This is of course not unique to the E7-4830, or even Xeon E7s in general. The Xeon E5 and E3 lines also have their duplicate CPU names. In a casual search to confirm this, I was particularly struck by the E3-1225, which exists in original, v2, v3, v5, and v6 versions, but apparently not in v4 (although the Intel MDS PDF does list some Xeon E3 v4 CPUs).
Hardware Security Modules are just boxes running opaque and probably flawed software
The news of the time interval is that researchers discovered a remote unauthenticated attack giving full, persistent control over an HSM (via). When I read the details, I was not at all surprised to read that one critical issue was that the internal HSM code implementing the PKCS#11 commands had exploitable buffer overflows, because sooner or later everyone seems to have code problems with PKCS#11 (I believe it's been a source of issues for Unix TLS/SSL libraries, especially OpenSSL).
(The flaws have apparently since been fixed in a firmware update by the HSM vendor, which sounds good until you remember that some people deliberately destroy the ability to apply firmware updates to their HSMs to avoid the possibility of being compelled to apply a firmware update that introduces a back door.)
There is perhaps a tendency to think that HSMs and hardware security keys are magic and invariably secure and flawless. As this example and the Infineon RSA key generation issue demonstrate quite vividly, HSMs are just things running opaque proprietary software that is almost certainly not as good or as well probed as open source code. Proprietary software development is not magic, any more than open source development is, but open source code has the advantage that it's much easier to inspect, fuzz, and so on, and if a project is popular, there probably are a number of people doing that. The number of people who will ever apply this level of scrutiny to your average HSM is much lower, just as it is much lower with most proprietary software.
This doesn't mean that HSMs are useless, especially as hardware security tokens for authenticating people (where under most circumstances they serve as proof of something that you have). But I have come to put much less trust in them and look much more critically at their use. For server side situations under many threat models, I increasingly think that you might be better off building a carefully secured and sealed Unix machine of your own, using well checked open source components.
(Real HSMs are hopefully better secured against hardware tampering than any build it yourself option, but how much you care about this depends on your threat model. An entirely encrypted system that is not on the network and must have a boot password supplied when it powers on goes a long way. Talk to it over a serial port using a limited protocol and write all of the software in a memory safe language using popular and reasonably audited cryptography libraries, or audited tools that work at as high a level as you can get away with.)
PS: The one flaw in the build your own approach in a commercial setting is that often security is not really what you care most about. Instead, what you may well care most about is that it's not your fault if something goes wrong. If you buy a well regarded HSM and then a year later some researchers go to a lot of work and find a security flaw in it, that is not your fault. If you build your own and it gets hacked, that is your fault. Buying the HSM is much safer from a blame perspective than rolling your own, even if the actual security may be worse.
(This is a potential motivation even in non-commercial settings, although the dynamics are a bit different. Sometimes what you really care most about is being able to clearly demonstrate due diligence.)
Some general things and views on DNS over HTTPS
Over on Twitter I said something and then Brad Beyenhof asked me a sensible question related to DNS over HTTPS. Before I elaborate on my Twitter answers to that specific question, I want to give an overview of my general views on DNS over HTTPS as a whole.
DNS over HTTPS (hereafter DoH) is what it sounds like; it's a protocol for making DNS queries over a HTTPS connection. There is also the older DNS over TLS, but my impression is that DoH has become more popular, perhaps partly because it's more likely to make it through middleware firewalls and so on. DoH and DoT are increasingly popular ideas for the straightforward reason that on the modern Internet, ISPs are one of your threats. A significant number of ISPs snoop on your DNS traffic to harvest privacy-invasive things about your Internet activities, and some of them actively tamper with DNS results. DoH (and DoT) mostly puts a stop to both DNS snooping and DNS interference.
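To make the mechanics concrete, here is a sketch of building the 'dns' query parameter for a DoH GET request as RFC 8484 describes: a minimal DNS wire-format query, base64url-encoded without padding. The domain name and server name here are placeholders of my own, not any particular provider's endpoint.

```python
import base64
import struct

def doh_query_param(name, qtype=1):
    # Minimal DNS wire-format query (RFC 1035): a header with ID 0,
    # the RD flag set, and a single question; qtype 1 is an A record.
    header = struct.pack('!HHHHHH', 0, 0x0100, 1, 0, 0, 0)
    qname = b''.join(bytes([len(label)]) + label.encode('ascii')
                     for label in name.split('.')) + b'\x00'
    question = qname + struct.pack('!HH', qtype, 1)  # class IN
    # RFC 8484 GET requests carry this base64url-encoded, unpadded.
    return base64.urlsafe_b64encode(header + question).rstrip(b'=').decode()

# A GET to https://doh.example/dns-query?dns=<this value> would then be
# a complete DoH query (the server name is a placeholder).
print(doh_query_param('example.com'))
```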
(It's likely that DoH doesn't completely stop these because of the possibility of traffic analysis of the encrypted traffic, which has already been shown to reveal information about HTTPS browsing, but we probably won't know how serious an issue this is until DoH is enough used in the field to attract research attention.)
As far as I can tell, DoH (and DoT) are currently used and intended to be used between clients and resolving DNS servers. If there is work to protect queries between a resolving DNS server and authoritative servers, I think it's using a different protocol (Wikipedia mentions DNSCurve, but it's not widely adopted). This doesn't matter for typical users, who use someone else's DNS server, but it matters for people who run their own local resolving server that queries authoritative servers directly instead of delegating to another resolving server.
DoH protects you from your ISP monitoring or tampering with your DNS queries, but it doesn't protect you from your DoH server of choice doing either. To protect against tampering, you'd need some form of signed DNS (but then see Against DNSSEC). It also doesn't currently protect you against your ISP knowing what HTTPS websites you go on to visit, because of SNI; until we move to ESNI, we're always sending the website name (but not the URL) in the clear as part of the early TLS negotiation, so your ISP can just capture it there. In fact, if your ISP wants to know what HTTPS sites you visit, harvesting the information from SNI is easier than getting your DNS queries and then correlating things.
(There is very little protection possible against your DoH server monitoring your activity; you either need to have a trustworthy DNS server provider or run your server yourself in a situation where its queries to authoritative servers will probably not be snooped on.)
More or less hard-coded use of DNS over HTTPS servers in client programs like browsers poses the same problems for sysadmins as basically any hard-coded use of an upstream resolver does, which is that it will bypass any attempts to use split-horizon DNS or other ways of providing internal DNS names and name bindings. Since these are obvious issues, I can at least optimistically hope that organizations like Mozilla are thinking about how to make this work without too much pain (see the current Mozilla wiki entry for some discussion of this; it appears that Mozilla's current setup will work for unresolvable names but not names that get a different result between internal and external DNS).
PS: There are a number of recursive DNS servers that can be configured to use DoH to an upstream recursive server; see eg this brief guide for Unbound. Unbound can also be configured to be a DoH server itself for clients, but this doesn't really help the split horizon case for reasons beyond the scope of this entry.
TLS certificate rollover outside of the web is complex and tangled
On the web, renewing and rolling over TLS certificates is a well understood thing, with best practices that are exemplified by a Let's Encrypt based system. There is a chain of trust starting from the server's certificate and running up to a root certificate that browsers know, and everything except the root certificate is provided to the browser by the web site. Server certificates are rolled over regularly and automatically, and when this happens the website is also provided with the rest of the certificate chain, which it can and should simply serve to browsers. Intermediate certificates roll over periodically, but root certificates almost never do.
However, all of this relies on a crucial property, which is that web server certificates are ephemeral; they're used in the course of a single HTTPS connection and then they're forgotten. This means that clients visiting the website don't have to be updated when the web server certificate chain changes. Only the web server has to change, and the web PKI world has made a collective decision that we can force web servers to change on a regular basis.
(The one thing that requires browsers and other clients to change is changing Certificate Authority root certificates; how slow and hard it is to do that is part of the reason why CA root certificates are extremely long-lived.)
However, you don't always have ephemeral signatures and certificates, at least not naturally. One obvious case is the various forms of code signing, where you will get a signed blob once and then keep it for a long time, periodically re-verifying its signature and chain of trust. As Mozilla has demonstrated, rolling over any of the certificates involved in the signing chain is rather more challenging and being capable of doing it is going to have significant effects on the design of your system.
Very generally, if you want true certificate rollover (where the keys change), you need to have some way of re-signing existing artifacts and then (re-)distributing at least the new signatures and trust chain. If your signatures are detached signatures, stored separately from the blob being signed, you only need to propagate them; if the signatures are part of the blob, you need to re-distribute the whole blob. If you redistribute the whole blob, you need to take care that the only change is to the signature; people will get very unhappy if a new signature and chain causes other changes. You can also contrive more complex schemes where an integrated signature chain can be supplemented by later detached signatures, or signatures in some database maintained by the client program that re-verifies signatures.
(Since what you generally sign is actually a cryptographic hash of the blob, you don't have to keep full copies of every version of everything you've ever signed and still consider valid; it's sufficient to be able to re-sign the hashes. This does prevent you from changing the hash algorithm, though, and you may want to keep copies of the blobs anyway.)
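Here is a toy sketch of that 'keep the hashes, re-sign them later' idea. HMAC stands in for a real public-key signature scheme purely for illustration; the keys and stored digest are entirely hypothetical, not any particular signing system's API.

```python
import hashlib
import hmac

def blob_hash(blob):
    # What you keep on file: a cryptographic hash of the signed artifact.
    return hashlib.sha256(blob).digest()

def sign_digest(key, digest):
    # Stand-in for a real signature over the stored hash.
    return hmac.new(key, digest, hashlib.sha256).digest()

stored_digest = blob_hash(b'some signed artifact')          # kept on file
sig_old = sign_digest(b'old signing key', stored_digest)
sig_new = sign_digest(b'new signing key', stored_digest)    # no blob needed
```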
For rolling over intermediate certificates in specific, I think that you only need access to the original end certificate to produce a version re-signed with your new intermediate, and you should be keeping a copy of these certificates anyway. You could then contrive a scheme where if a signature chain fails to verify purely because one of the certificates is out of its validity period, the client attempts to contact your service to get a new version of the end certificate and trust chain (and then saves the result, if it validates). But this scheme still requires the client to contact you periodically to download signature updates that are relevant to it.
In Mozilla's case, they have the advantage that their program is more or less intrinsically used online on the Internet (with small exceptions) and is generally already set to check for updates to various things. Fetching some sort of signature updates for addons would not be a huge change to Firefox's operation in general, although it probably would be a significant change to how addons are verified and perhaps how they're updated.
All of this is an area of TLS certificates and certificate handling that I don't normally think about. Until Mozilla's problem made this quite visible to me, I hadn't even considered how things like code signatures have fundamental operational differences from TLS website certificates.
PS: Short-lived code signing certificates and other non-ephemeral signatures have the obvious and not entirely pleasant side effect that the signed objects only keep working for as long as you maintain your (re-)signing infrastructure. If you decide to stop doing so, existing signed code stops being accepted as soon as its certificates run out. The friendly way to shut down is probably to switch over to extremely long lived certificates before you decommission things.
What usually identifies an intermediate or root TLS certificate
The usual way of describing TLS certificates for things like websites is that they are a chain of trust, where your website's TLS certificate is signed by a Certificate Authority's current intermediate certificate and then that intermediate certificate is signed by the CA's root certificate. For example, you can read about Let's Encrypt's chain of certificates. Some CAs use a chain of multiple intermediate certificates.
(Modern TLS certificates often include a URL to fetch their parent certificate, as covered here.)
But this raises a question, namely how a TLS certificate specifies what its parent certificate is (the certificate that signed it). Until recently, if you had asked me and I had answered off the cuff, I would have guessed that it was based on something like the cryptographic hash of the parent certificate. As it turns out, this is not how it works; instead, as you can find out from actually looking at a certificate, the parent certificate's identity is given by providing its X.509 Subject Name attribute, which identifies both the organization and the intermediate or root certificate's Common Name attribute.
(Okay, there are also Key Identifiers, see here and here. It appears that Key Identifiers are normally exactly that, which is to say that they identify the public key used, not the certificate as such.)
There are some interesting consequences of this that I didn't fully realize or think about before recently. The first is what Let's Encrypt does, which is that it has two versions of its intermediate certificate, one signed by its own root certificate and one signed by IdenTrust. Since they have the same Subject (including Common Name), TLS certificate validation by browsers and other things will accept either. If you inspect the current two Let's Encrypt X3 certificates, you'll find that they also have slightly different key validity periods, presumably because the one signed by the LE root was signed later (about seven months, if we go from the not-before date).
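To make the name matching concrete, here is a toy sketch of chain building where possible parents are found by comparing Issuer against Subject. The dicts are simplified stand-ins for parsed certificates, not a real X.509 API, and the names are just approximations of the real ones.

```python
def possible_parents(cert, pool):
    # A certificate's parent candidates are the certificates whose
    # Subject name matches its Issuer name.
    return [c for c in pool if c['subject'] == cert['issuer']]

le_x3_own   = {'subject': "Let's Encrypt Authority X3",
               'issuer': 'ISRG Root X1'}
le_x3_cross = {'subject': "Let's Encrypt Authority X3",
               'issuer': 'DST Root CA X3'}
site        = {'subject': 'www.example.org',
               'issuer': "Let's Encrypt Authority X3"}

# Both intermediates share the same Subject, so either completes
# the site's chain.
print(len(possible_parents(site, [le_x3_own, le_x3_cross])))  # 2
```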
The second is that you can take an existing intermediate certificate and re-issue and re-sign it with new validity dates (and perhaps other modifications to certificate metadata) but the same keypair, the same Subject and Common Name, and so on. This new version of the intermediate certificate can be transparently substituted for the old version because everything that identifies it matches. However, it absolutely must re-use the same public key, because that's the only way to have existing signatures verify correctly.
(This is similar to how in the old days of manual TLS certificate renewal, you could re-use your existing keypair in your new CSR if you wanted to, although it wasn't the recommended practice.)
(See also how TLS certificates specify the hosts they're for which covers server certificates and talks more about X.509 Subject Names.)
PS: Re-signing a certificate this way doesn't require access to its private key, so I believe that if you wanted to, you could re-sign someone else's intermediate certificate with your own CA. I can imagine some potential uses of this under sufficiently weird circumstances, since this would create a setup where with the right certificate chain a single TLS certificate would validate against both a public CA and your internal CA.
One of my problems with YAML is its sheer complexity
YAML is unarguably a language, with both syntax (how you write and format it) and semantics (what it means when you write things in particular ways). This is not unusual for configuration files; in fact, you could say that it's absolutely required in order to represent a configuration as text at all. However, most configuration file languages are very simple ones, with very little syntax and not much semantics.
YAML is not a simple language. YAML is a complex language, with both a lot of syntax to know and a lot of semantics attached to that syntax. For example, there are nine different ways to write multi-line strings (via), with subtle differences between each one. Like Markdown and even HTML, this complexity might be okay if everything used YAML, but of course this isn't the case. Even in our small Prometheus environment, Prometheus uses YAML but Grafana uses a .ini file (and sometimes JSON, if we were to use provisioning).
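For illustration, here are four of those block scalar forms (a sketch, not the full list); the indicator characters control both whether lines are folded together and what happens to trailing newlines:

```yaml
a: |    # literal: keeps newlines; parses to "two\nlines\n"
  two
  lines
b: >    # folded: joins lines with spaces; parses to "two lines\n"
  two
  lines
c: |-   # literal, strip: no trailing newline; "two\nlines"
  two
  lines
d: >+   # folded, keep: preserves all trailing newlines
  two
  lines
```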
The problem with a complex configuration file language is that it is not the only thing in the picture. Specifying the configuration itself has its own syntax and semantics, often complex ones; it is just that they are not at the lexical level of 'what is a string' and 'what is a block' and 'how do you define names and attach values to them'. This is actually something that using YAML makes quite clear; your YAML describes a structure, but it doesn't say anything about what the structure means or how you write it so that it means the things you want. That's all up to the system reading the YAML, and it can be as complicated as the program involved wants to make it. Often this is pretty complicated, because the program needs to let you express a bunch of complicated concepts.
Since specifying the configuration is often intrinsically complex, a complex configuration file language on top of that is adding an extra layer of pain. Now instead of one complex thing to learn and become decently capable at, you have two. Dealing with two complex things slows you down, increases the chances of problems, and probably means that you will lose familiarity with the overall system faster once you stop actively working on it (since there is more you need to remember to do so).
Complexity also invites writing your YAML based partly on superstitions. If you don't fully understand YAML because it's so complex, your best approach is to copy what works and modify it only if you have to. This will probably not give you the right or the best YAML, but it will give you YAML that works, which is your real goal. Your goal is not to write perfect YAML, your goal is to configure your system; YAML (and the entire configuration file) is an obstacle in the way that you need to climb over.
(Do I write superstitious YAML? Yes, absolutely.)
Ultimately that is my biggest objection to the complexity of YAML; it puts a substantial obstacle in the way of getting to what you want, an obstacle that is generally unnecessary. Very few programs need the complexity and generality that YAML provides in their configuration files, so almost all of them would be better served by a simpler configuration file language.
(This entry is informed by YAML: probably not so great after all, which covers a number of other problems and traps in YAML that don't concern me as much, partly because I don't think I'm likely to run into them. Via, itself via.)
My problem with YAML's use of whitespace
Over on Mastodon, there was a little exchange:
current status: yaml
please send your thoughts & prayers to be with me at this difficult time
It's impressive how yaml took Python's significant whitespace and somehow made it far worse.
I've been thinking about my remark since I said it, and I think I've finally put my finger on a large part of why I feel that way about the YAML I've written for Prometheus. As best as I can put it simply, what makes Prometheus YAML such a bad experience here is that it uses deeply nested blocks and it's written without explicit de-indents. Helping this along is that the standard indent is only two spaces, which blurs indent levels together.
Here, let me show you what I mean. This is the kind of structure I wind up working on all of the time in Prometheus's YAML configuration files:
groups:
  - name: fred
    rules:
      # A big comment that in practice
      # is several lines long
      - ath: bth
        # Maybe another comment
        cth: a long string [...]
        dth: [...]
        eth: host
        gth:
          # This may have a comment too
          # even a multi-line one
          hth: a long string
If you can't really tell these indentation levels apart in the first place, well, that is one of the drawbacks of two-space indents being the cultural standard in YAML.
All of this is almost always more than one screen long (unless I really stretch out my terminal window). Now, imagine that you're coming along and you want to add another rule after the first one. How do you figure out how much to indent it? You have to exactly match the '- ath: bth' indent level, but that's quite possibly off the top of your screen, so you're scrolling up and down trying to match the indent level. Alternately you have to remember that the last line of the previous thing is N indent levels in from the top (for a varying N depending on what you're writing) and de-indent that much relative to it.
Although Python uses significant whitespace too, you don't usually write Python code in this deeply nested way (and the Python standard is four-space indents, which makes things much more visibly distinct). Python also has very predictable indent levels for most things that you're going to be adding (normally the def for a new function isn't indented at all, the def for a new class method is one indent level in, and so on). And stylistically, sprawling and deeply nested functions and code would often be considered a code smell. People deliberately avoid them and work to flatten deeply indented structures.
In the kind of YAML that Prometheus uses, sprawling and deeply nested structures are everywhere. Everything is a thing inside another thing inside a third thing and so on, so everything gets indented and indented and indented. There are almost no explicit de-indentation markers that you write, either intrinsically as part of the objects or culturally as, say, a '# end of <thing>' comment at the outer indent level at the end.
(I have other issues with YAML, but I think I will defer those to another entry. Also, the indentation I'm using here may have one unnecessary level to it, and it's certainly got an inconsistency; part of this is inherited, and part of it is because I do not really understand the rules of when YAML indentation is required and when it's optional.)
Notifications and interruptions, and my view on them
On Twitter, I got irritated at Apple News:
Congratulations, Apple News: for allowing some random news service I've never indicated any interest in (or even looked at) to push a notification (complete with a sound), you have completely and permanently lost all notification privileges.
(For bonus points this was on the lock screen, which means that Apple News made a noise to attract my attention to an otherwise inactive device.)
I'm not the only person that Apple News is doing this to; see Matthew Cassinelli's Apple News has a notification spam problem (and my memory of that article was part of why I reacted strongly here). But a large part of this is my general view of notifications, at least on iOS.
On iOS, notifications are interruptions. That's what they're explicitly designed to do, when they pop up on your lock screen or shove themselves into the top of your screen; they're there to push something in front of you that can't wait. Like many people, I don't like getting interrupted by things that aren't actually important. If you keep pestering me with interruptions that aren't worth yanking me away from what I was doing, in real life or otherwise, I will stop allowing you to interrupt me. Well, putting it that way is underselling my view on iOS notifications, because I consider them suspect unless proven otherwise. If I allow notifications at all for an app, it's on more or less permanent probation; every notification had better be worth it, and if not the app is probably losing that permission.
(To its credit, iOS makes it relatively easy to remove these permissions from apps after the fact, even from vendor provided system apps like Apple News. Apple did feel free to silently and automatically grant Apple News all sorts of permissions when I first ran it, though, which is rather pushing things. Especially since I would have denied it those permissions if it had asked.)
Originally I was going to call this entry something like 'my view on notifications', but then I realized that this wasn't really accurate. This is my view on iOS-style notifications, but notifications in general don't have to be done so that they're obtrusive interruptions. My Linux desktops have notifications, but in the form of little bubbles that show up quietly in a corner of the screen, linger for a bit so I can read them if necessary, and then fold away again (leaving a little 'you have unread notifications' marker that I often don't even really notice). These notifications don't interrupt me or get in the way, and I don't think any Linux desktop environment would dream of having them show up on the lock screen. As a result, I mostly don't turn these off and I even periodically find them convenient.
(There are various problems with doing this sort of notifications on iOS; one obvious one is that phones have limited screen space, so it's much harder to put up a useful notification that's only in an unobtrusive corner of the screen.)
You might as well get an x86 CPU now, despite Meltdown and its friends
A year or so ago I wrote an entry about how Meltdown and Spectre had made it a bad time to get a new x86 CPU, because current CPUs would suffer from expensive mitigations for them and future ones wouldn't. Then I went and bought a new home CPU and machine anyway, and as time has passed I've become more and more convinced that I made the right decision. Now I don't think that people should delay getting new x86 CPUs (or any CPUs), at least not unless you're prepared to wait quite a long time.
Put simply, speculative execution attacks have turned out to be worse than at least I expected back in the days when Meltdown and Spectre were new. New attacks and attack variations keep getting published and it's not clear that people have any idea how to effectively re-design CPUs to close even the current issues, never mind new ones that researchers keep coming up with. That mythical future CPU that will mitigate most everything with significantly less performance penalty is probably years in the future at this point. I'd expect it to take at least one CPU design cycle after people seem to have stopped discovering new speculative execution attacks, and it might be longer than that (it may take CPU designers some time to work out good mitigations, for example).
So yes, any current x86 CPU you buy will pay a performance penalty to deal with speculative execution problems (assuming that you don't turn the mitigations partially or completely off). But so will future ones, although they'll probably pay a lower penalty. Effectively, new CPUs with improved hardware-based mitigations against speculative execution are now one more source of the modest but steady progress in CPU performance. Like a number of other sources of performance improvements (such as additional special SIMD instructions), the improvements will matter a lot to some people and not very much to others. For desktop and general use, they'll probably be useful but not critical.
(It's even possible that future CPUs will see effective decreases in some aspects of performance. For example, Intel dropped HyperThreading in recent generations of i7 CPUs at the same time as they increased the core count. I don't believe Intel has explicitly linked this to speculative execution issues, but certainly HT makes some of them worse, so dropping HT is an easy mitigation that can also be used to drive sales of higher end CPUs in Intel's usual fashion.)
PS: I'm not even going to guess at the benefits and risks of turning various mitigations off in various cases, especially for desktop use, because it depends on so many factors. Right now I'm going with the Linux and Fedora defaults, because that's the easiest way and I have fast enough CPUs and light enough usage that it hopefully doesn't matter a lot to me (but of course I haven't measured that).
A VPN for me but not you: a surprise when tethering to my phone
My phone supports VPNs, of course, and I have it set up to talk to our work VPN. This is convenient for reasons beyond mere privacy when I'm using it over networks I don't entirely trust; there are various systems at work that can only be reached from 'inside' machines (including the VPN server), or which are easier to use that way.
My phone also supports tethering other devices to it to give them Internet access through the phone's connection (whatever that is at the time). This is built in to iOS as a standard function, not supplied through a provider addition or feature (as far as I know Apple doesn't allow cellular providers any control over whether iOS allows tethering to be used), and is something that I wind up using periodically.
As I found out the first time I tried to do both at once, my phone has what I consider an oddity: only the phone's traffic uses the (phone) VPN, not the traffic from any tethered devices. The VPN is for the phone only, not for any attached devices; they're on their own, which is sometimes inconvenient for me. It would be a fair bit easier if any random machine I tethered to the phone could take advantage of the phone's VPN and didn't have to set up a VPN configuration itself.
(In fact we've had problems on our VPN servers in the past when there were multiple VPN connections from the same public IP, which is what I'd get if I had both the phone and a tethered machine using the VPN at the same time. I think those aren't there any more, although I'm not sure.)
As far as I know, there is no technical requirement that forces this; in general you certainly could route NAT'd tethered traffic through the VPN connection too. If anything, my phone may have to go out of its way to route locally originated traffic in one way and tethered traffic in another way (although this depends on how NAT and VPNs interact in the iOS kernel). Doing things this way seems likely to be mostly or entirely a policy decision, especially by now (after so many years of iOS development, and a succession of people asking about this on the Internet, and so on).
(I don't currently have a position on whether it's a good or a bad policy decision, although I think it is a bit surprising. I certainly expected tethered traffic to be handled just the same way as local traffic from the phone itself.)