Wandering Thoughts

2021-04-18

Using NVMe SSDs over SATA SSDs in basic servers is an awkward sales pitch

We have traditionally mostly bought basic, inexpensive 1U servers, almost all of which have had either two or four drive bays. Our transition of these servers from using HDs to using SATA SSDs was relatively straightforward. It was driven by dropping SATA SSD prices, the clear improvements in performance even for casual activities like installing the operating system and upgrading packages (and rebuilding software RAID mirrors), the probable increase in service lifetime, and how relatively easy it was to substitute a SATA SSD for a SATA HD (except that for a while that was a hassle, which slowed down our transition).

Putting NVMe SSDs in such basic servers with only a few drive bays is reasonably doable in terms of PCIe lanes and space. Even if you support four NVMe SSDs, that only needs 16 PCIe lanes, the M.2 form factor doesn't need much space, and people have accepted non-hotswap drives in 1U servers before (our Dell R210s have two non-hotswap bays). Or you could go with the U.2 NVMe form factor; as noted by Andy Smith in a comment on my entry on SSD versus NVMe for basic servers today, Tyan will already sell you a 1U server with 4x U.2 and 4x 3.5" drive bays (here's one model).

However, actually getting people to do this seems like an awkward sales pitch. Right now you'd either pay more for a U.2 SSD or probably live without hotswap for an M.2 NVMe SSD, and while you get much better IO performance, relatively few basic 1U servers are doing things that are IO constrained on SATA SSDs. If you're using SATA SSDs, you're getting the durability advantages of solid state storage and probably most of the power and heat savings, and you've already taken the first massive leap from HD performance.

What feels to me like the most likely path toward NVMe on basic servers is the spread of machines like Tyan's model, with 4x U.2 and 4x 3.5" bays (at competitive prices against basic 1U servers if the U.2 bays are empty). We and probably many other buyers would almost always use SATA SSDs in the 3.5" bays, but if U.2 SSDs became competitively priced we would probably start switching to them (if nothing else, we wouldn't need adapters). The existence of these servers with the option of U.2 might act to drive down the price of U.2 SSDs by increasing the market (we can hope).

(Having the option of either 3.5" bays or NVMe bays seems smart, because HDs remain your best option if you want a lot of disk space in not much physical space or cost. You can now get 4 TB SATA SSDs, but they're quite costly. Meanwhile, 8 TB HDs are a few hundred dollars.)

Another option is that makers of basic servers could make very short servers that use M.2 NVMe SSDs on the motherboard in order to save internal space. Dell already did something like this with their half-length R210s, which had non-hotswap drive bays because the drives were mounted sideways to save depth. However, I'm not sure how much of a priority rack depth is for people these days, and we certainly found that having a mix of rack depths could be awkward. If these servers were the inexpensive option, we would probably buy some (assuming M.2 NVMe SSD prices for small sizes stay basically the same as those of same-sized SATA SSDs).

All of this leaves me expecting that any transition of basic servers to NVMe for us will be even slower than our transition to SATA SSDs, which took way more time than I expected (we more or less started in 2013 but as late as 2017 we were still using HDs in some new servers).

ServerNVMEHardSalesPitch written at 23:25:34; Add Comment

2021-04-13

Getting NVMe and related terminology straight (for once)

In a comment on yesterday's entry on SSD versus NVMe for basic servers, Andrew noted:

I object to "SSD vs NVMe" — they're NVMe SSDs.

This set me off on a long overdue journey to understand the terminology here and get it right.

NVMe, also known as NVM Express, is the general standard for accessing non-volatile storage over PCIe (aka PCI Express). NVMe doesn't specify any particular drive form factor or way of physically connecting drives, but it does mean PCIe; a 'NVMe SATA drive' is a contradiction in terms. The term 'NVMe SSD' is arguably redundant but seems to be in common usage, is likely to be widely understood, and so is generally better than my usage so far of 'NVMe drive'.

In general, NVMe SSDs have either two or four PCIe lanes (commonly called 'x2' and 'x4'), with (very) cheap NVMe SSDs sticking to x2 and everyone else using x4. There have been several successive versions of PCIe, with NVMe SSDs using PCIe 3.0, PCIe 3.1, and now high end drives coming out with PCIe 4.0 (which is faster and better). PCIe 4.0 NVMe SSDs are backward compatible to systems with only PCIe 3.0 or 3.1. If you talk about a plain 'NVMe SSD', I suspect that most people today will assume it's a PCIe 3.0 NVMe SSD (and x4).

(PCIe 3.1 seems to have been a small enough improvement that a lot of consumer NVMe SSDs just label themselves as PCIe 3.0.)

The dominant consumer form factor and physical connector for NVMe SSDs is M.2, specifically what is called 'M.2 2280' (the 2280 tells you the physical size). If you say 'NVMe SSD' with no qualification, many people will assume you are talking about an M.2 2280 NVMe SSD, or at least an M.2 22xx NVMe SSD. A common physical form factor alternative to M.2 NVMe SSDs in higher end hardware is U.2; a U.2 NVMe SSD basically looks like a traditional 2.5" SATA SSD with a slightly different connector. If you say 'M.2 NVMe' or 'U.2 NVMe', without the 'SSD', people will understand what you're talking about (assuming they even know what U.2 is). If you say 'M.2 SSD' or 'U.2 SSD', I think people will assume the NVMe part, although it's slightly ambiguous for at least M.2.

(So if you wanted to fully name a NVMe SSD, you might say 'M.2 2280 x4 PCIe 4.0 NVMe SSD'. You can see why this can get chopped down in common usage.)

Traditional SATA SSDs are, well, SATA SSDs, in the usual 2.5" form factor and with the usual SATA edge connectors (which are the same for 2.5" and 3.5" drives). If you simply say 'SSD' today, most people will probably assume that you mean a SATA SSD, not a NVMe SSD. Certainly I will. If I want to be precise I should use 'SATA SSD', though. SATA comes in various speeds but today everyone will assume 6 Gbits/s SATA (SATA 3.x).
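
To put rough numbers on the interface speeds involved, here is a little back of the envelope sketch in Python. The line encoding overheads (8b/10b for SATA, 128b/130b for PCIe 3.0 and 4.0) are the standard ones; real drives deliver somewhat less than these theoretical ceilings.

  # Back of the envelope interface bandwidth, counting only line encoding
  # overhead. Real drives and real protocols come in under these numbers.

  def sata3_mb_s():
      # SATA 3.x: 6 Gbit/s line rate with 8b/10b encoding (8 data bits
      # per 10 line bits), so at most 600 MB/s of payload bandwidth.
      return 6e9 * (8 / 10) / 8 / 1e6

  def pcie_mb_s(gt_per_s, lanes):
      # PCIe 3.0 runs at 8 GT/s per lane and PCIe 4.0 at 16 GT/s,
      # both with 128b/130b encoding.
      return gt_per_s * 1e9 * (128 / 130) / 8 * lanes / 1e6

  print(f"SATA 3:      {sata3_mb_s():6.0f} MB/s")      # ~600 MB/s
  print(f"PCIe 3.0 x4: {pcie_mb_s(8, 4):6.0f} MB/s")   # ~3940 MB/s
  print(f"PCIe 4.0 x4: {pcie_mb_s(16, 4):6.0f} MB/s")  # ~7880 MB/s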

The M.2 connector can also be used for a SATA connection instead of a PCIe one. M.2 form factor drives that use SATA instead of PCIe are generally called 'M.2 SATA' (and I believe are usually physically M.2 2280). When NVMe was new and even basic NVMe SSDs cost extra, apparently M.2 SATA SSDs were popular in notebooks and similar machines because SSDs in the M.2 form factor were physically much smaller than 2.5" SSDs. Now this form factor seems to be going away, with models being discontinued without replacements (source).

In our environment, we use 2.5" SATA SSDs in all of our servers (so far) and I have M.2 NVMe SSDs in my office workstation (along with SATA SSDs). We might someday use either M.2 NVMe SSDs or U.2 NVMe SSDs in future servers, depending on how prices and hardware availability change. U.2 NVMe SSDs seem easier to manage but currently are harder and more expensive to buy.

NVMeGettingTermsStraight written at 19:38:27; Add Comment

2021-04-12

SSD versus NVMe for basic servers today (in early 2021)

I was recently reading Russell Coker's Storage Trends 2021 (via). As part of the entry, Coker wrote:

Last year NVMe prices were very comparable for SSD prices, I was hoping that trend would continue and SSDs would go away. [...]

Later Coker notes, about the current situation:

It seems that NVMe is only really suitable for workstation storage and for cache etc on a server. So SATA SSDs will be around for a while.

Locally, we have an assortment of servers, mostly basic 1U ones, which have either two 3.5" disk bays, four 3.5" disk bays, or rarely a bunch of disk bays (such as our fileservers). None of these machines can natively take NVMe drives as far as I know, not even our 'current' Dell server generation (which is not entirely current any more). They will all take SSDs, though, possibly with 3.5" to 2.5" adapters of various sorts. So for us, SSDs fading away in favour of NVMe would not be a good thing, not until we turn over all our server inventory to ones using NVMe drives. Which raises the question of where those NVMe servers are and why they aren't more common.

For servers that want more than four drive bays, such as our fileservers, my impression is that one limiting factor for considering an NVMe based server has generally been PCIe lanes. If you want eight or ten or sixteen NVMe drives (or more), the numbers add up fast if you want them all to run at x4 (our 16-bay fileservers would require 64 PCIe lanes). You can get a ton of PCIe lanes but it requires going out of your way, in CPU and perhaps in CPU maker (to AMD, which server vendors seem to have been slow to embrace). You can get such servers (Let's Encrypt got some), but I think they're currently relatively specialized and expensive. With such a high cost for large NVMe, most people who don't have to have NVMe's performance would rather buy SATA or SAS based systems like our fileservers.

(To really get NVMe speeds, these PCIe lanes must come directly from the CPU; otherwise they will get choked down to whatever the link speed is between the CPU and the chipset.)

Garden variety two drive and four drive NVMe systems would only need eight or sixteen PCIe lanes, which I believe is relatively widely available even if you're saving an x8 for the server's single PCIe expansion card slot. But then you have to physically get your NVMe drives in the system. People who operate servers really like drive carriers, especially hot-swappable ones. Unfortunately I don't think there's a common standard for this for NVMe drives (at one point there was U.2, but it's mostly vanished). In theory a server vendor could develop a carrier system that would let them mount M.2 drives, perhaps without being hot swappable, but so far I don't think any major vendor has done the work to develop one.

(The M.2 form factor is where the NVMe volume is due to consumer drives, so basic commodity 1U servers need to follow along. The Dell storage server model that Let's Encrypt got seems to use U.2 NVMe drives, which will presumably cost you 'enterprise' prices, along with the rest of the server.)

All of this seems to give us a situation where SATA remains the universal solvent of storage, especially for basic 1U servers. You can fit four 3.5" SATA drive bays into the front panel of a 1U server, which covers a lot of potential needs for people like us. We can go with two SSDs, four SSDs, two SSDs and two big HDs, and so on.

(NVMe drives over 2 TB seem relatively thin on the ground at the moment, although SSDs only go up one step to 4 TB if you want plenty of options. Over that, right now you're mostly looking at 3.5" spinning rust, which is another reason to keep 1U servers using 3.5" SATA drive bays.)

ServerSSDVsNVMeIn2021 written at 22:03:28; Add Comment

2021-03-27

Internet routing can now vary based on things you wouldn't expect

Today Toronto had a little issue with Cloudflare, which gave me a chance to learn a useful lesson about the modern Internet and how it routes traffic. The summary of the lesson is that the venerable Unix traceroute command may not be your friend any more.

I come from an era of a relatively simple Internet. Back then, the path that your packets took through the network was expected to depend only on the destination and the source IPs. Things in the middle might drop some traffic or filter parts of it out, but the path was the same whether you were using ICMP, UDP, or TCP, and regardless of what TCP or UDP port you were connecting to. In this environment, ping and traceroute were reliable diagnostics in general; if routes weren't flapping, traceroute would tell you the path that all of your traffic was using, while ping could tell you that the target host was there.

(If something pinged but didn't respond to the port you wanted, it was a firewall issue.)

The Cloudflare issue today did not behave like that. In particular, plain traceroute reported one path, a short five-hop one, while 'traceroute -T -p 443' reported a rather different ten-hop path that seemed to take a detour off to Chicago before coming back to Toronto (and not reaching the target Cloudflare IP). At one level, port-based routing makes a certain amount of sense; it's a lower level version of application load balancers, and why go to all the bother of doing complicated routing just to reject UDP packets that you don't handle anyway? At another level it makes troubleshooting and testing more complicated, especially for outside people. ICMP, random UDP traffic, and actual TCP traffic to specific ports (or emulations of it) may go to completely different places, so information gathered in one way for one of them doesn't necessarily apply to anything else.

Fortunately not everything is like this. Unfortunately the people who are most likely to be like this are the large cloud providers and CDNs, and those collectively host a lot of websites and places of interest (and their complexity provides more room for subtle problems).

For myself, my lesson learned from this is that if I'm trying to check out the network path to some outside place, I should use 'traceroute -T -p 443' (or the applicable port, but HTTPS is the most likely). Once HTTP/3 becomes common, I'll potentially also want to check with UDP port 443 (although that gets complicated fast). Plain ping and traceroute simply aren't as trustworthy as they used to be.
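
To make that concrete, here's a minimal sketch of doing a TCP based trace in Python with the third party scapy library (it needs root to send raw packets, and the target hostname here is just a stand-in, not anything specific):

  # Minimal TCP 'traceroute' sketch using scapy ('pip install scapy').
  # Needs root for raw sockets. The target hostname is purely illustrative.
  from scapy.all import IP, TCP, sr1

  TARGET = "www.example.org"   # stand-in for whatever you're checking
  DPORT = 443                  # HTTPS, as with 'traceroute -T -p 443'

  for ttl in range(1, 31):
      # Send a SYN to the target port with an increasing TTL. Routers along
      # the way answer with ICMP time-exceeded; the target itself answers
      # with a SYN/ACK (or a RST if the port is closed).
      reply = sr1(IP(dst=TARGET, ttl=ttl) / TCP(dport=DPORT, flags="S"),
                  timeout=2, verbose=0)
      if reply is None:
          print(f"{ttl:2d}  *")
      elif reply.haslayer(TCP):
          print(f"{ttl:2d}  {reply.src}  (reached TCP port {DPORT})")
          break
      else:
          print(f"{ttl:2d}  {reply.src}")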

InternetPathsMayVaryByPort written at 23:51:55; Add Comment

2021-03-20

Paging out memory can be 'global' or 'local' these days

The traditional way that operating systems (Unix and others) have implemented swapping (really paging) is on a global, system wide basis. All processes (from all users on a multi-user system) are in contention for the same limited pool of RAM, and the operating system can pick pages from any process to be evicted from RAM. Page eviction decisions are made using global page replacement algorithms that are generally blind to what process was using what RAM. There were (and are) good reasons for this, including that global paging is simpler and that pages of RAM can be shared widely, including between otherwise unrelated processes.

(This sharing of pages can complicate what we mean when we talk about something's memory usage. Also, the global page eviction can still wind up scanning process by process.)

However, modern versions of Unix are increasingly adding ways to limit memory usage on a more fine grained basis, where one process, a group of them, or a general container is only allowed to use so much RAM (and there are hierarchical systems). Often hitting such a RAM limit causes the operating system to start evicting pages from whatever is being limited without touching the memory of other processes on the system (assuming that they aren't hitting their own limits). We can call this 'local' swapping, in contrast to the traditional system-wide 'global' swapping.

(Naturally this gets complicated if a page of RAM is used both by a process that's hit its limit and a process that hasn't. It's not really useful to keep stealing the page table entry from the limited process if the process just accesses it again and the page never actually leaves RAM.)
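
On Linux, this sort of 'local' limit is typically expressed through cgroups. As a rough sketch of the mechanics (and only that), here's a Python fragment that puts the current process under a memory limit via cgroup v2; it assumes cgroup v2 is mounted at /sys/fs/cgroup with the memory controller enabled for child groups, that we're running as root, and the group name and limit are arbitrary.

  # Rough sketch: put the current process into a new cgroup v2 group with a
  # hard memory limit, so the kernel reclaims pages within this group before
  # bothering the rest of the system. Assumes cgroup v2 at /sys/fs/cgroup,
  # the memory controller enabled for child groups, and root privileges.
  import os
  from pathlib import Path

  CGROUP = Path("/sys/fs/cgroup/demo-limit")   # arbitrary group name
  LIMIT = 512 * 1024 * 1024                    # 512 MiB, purely illustrative

  CGROUP.mkdir(exist_ok=True)
  # memory.max is the hard cap; memory.high would instead give a softer
  # limit that throttles and reclaims without OOM-killing.
  (CGROUP / "memory.max").write_text(str(LIMIT))
  # Writing a PID into cgroup.procs moves that process (and its future
  # children) under this group's limits.
  (CGROUP / "cgroup.procs").write_text(str(os.getpid()))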

Applying local memory limits and creating local swapping can have performance impacts on the affected programs. But if you impose such limits, presumably you've decided that the performance impacts on the relevant processes are less important than the potential system-wide effects if you let everyone get into global swapping.

In general, local swapping isn't guaranteed to be sufficient by itself, especially if people are allowed to set limits by hand and overcommit the machine's memory. Operating systems still need global swapping and still need to be able to do global eviction in some way. How well the two sorts of swapping cooperate with each other may vary.

PS: I don't know if non-Unix systems are also adding fine grained memory limiting, but it wouldn't surprise me. And one way to view virtual machines is that they're in part an extremely brute force way of limiting RAM (and CPU) usage.

GlobalVsLocalSwapping written at 01:12:26; Add Comment

2021-03-03

What signal a RJ-45 serial connection is (probably) missing

We have a central (serial) console server, which means that we have a bunch of serial lines running from the serial consoles of servers to various Digi Etherlites. However, we do not use actual serial cables for this. Probably like most places that deal with a lot of serial lines, we actually use Ethernet cables. On one end they plug directly into the Etherlites; on the other end we use what are generally called 'DB9 to RJ45 modular adapters'. All of this is routine and I basically don't think about it, so until I recently read Taking This Serially it didn't occur to me to wonder what pin was left out in this process.

As Taking This Serially covers, DB-9 (really DE-9) has nine pins, all of them used for serial signals:

  1. Data carrier detect (DCD)
  2. Rx
  3. Tx
  4. Data terminal ready (DTR)
  5. Ground
  6. Data set ready (DSR)
  7. Request to send (RTS)
  8. Clear to send (CTS)
  9. Ring indicator

Ethernet cables and RJ-45 connectors have at most eight wires that you can use, so one of the nine DB-9 serial signals must be dropped (or otherwise altered). If you consult online descriptions of typical DB-9 to RJ-45 wiring, you will wind up at the obvious answer: RJ45 serial connections typically drop the 'Ring indicator'.

The actual situation is rather complicated. There are multiple different ways to connect the eight RJ45 wires to the nine DB-9 pins, and only some of the ones covered on Wikipedia drop Ring Indicator. One, EIA/TIA-561, combines it with DSR. Or at least that's how EIA/TIA-561 is described on Wikipedia and some sources; others don't mention DSR at all. And for the signals that are passed through, each different way has its own set of pin assignments for what signal is on what RJ45 pin. Or your hardware may be different from all of the options listed on Wikipedia, as Digi's online documentation suggests is the case for our Etherlites.

(Having looked this up, I now understand why we always buy the 'wire your own' DB-9 to RJ-45 modular connectors instead of the prewired ones. As always, the problem with serial connections is that there are so many different standards.)
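
For my own reference, here is the DB-9 side of this written out as data, in Python; the RJ-45 pin assignments are deliberately left out, since as covered above they depend on which wiring scheme your adapters and hardware use.

  # The DB-9 (DE-9) serial pinout from above, as data. Which eight of these
  # nine signals make it onto an RJ-45 connector, and on which RJ-45 pins,
  # depends entirely on the wiring scheme of your adapters and hardware.
  DB9_SIGNALS = {
      1: "DCD (Data carrier detect)",
      2: "Rx",
      3: "Tx",
      4: "DTR (Data terminal ready)",
      5: "Ground",
      6: "DSR (Data set ready)",
      7: "RTS (Request to send)",
      8: "CTS (Clear to send)",
      9: "RI (Ring indicator)",
  }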

RJ45SerialMissingSignal written at 19:03:32; Add Comment

2021-03-01

The balance of power between distributions and software authors

In a comment on my entry on how modern software controls dependencies because it helps software authors, Peter Donis raised a good and interesting point:

I understand the issue you are describing for software authors, but as a software user, if bundling static dependencies instead of using shared libraries means security vulnerabilities in software I'm using take longer to get fixed, then I'll just have to stop using that software altogether. And for the vast majority of software I use, I don't get it directly from the author; I get it from my distro, so what matters to me is how easily the distro can fix issues and push updates to me.

And the distro maintainers know that, and they know that users like me are depending on the distros to get those vulnerabilities fixed and push updates in a timely manner; so if particular software programs make it hard for them to do that, they have an incentive to stop packaging those programs. It seems to me that, if push comes to shove, the distros have the upper hand here. [...]

My view is that there is a somewhat complicated balance of power involved between software authors and distributions, all starting with the people who are actually using the software. This balance of power used to be strongly with distributions, but various aspects of modern software have shifted it toward software authors.

As a person using software, I want to, well, use the software, with as little hassle and as few security issues, bugs, and other problems as possible. Like most people I'm broadly indifferent to how this is done, because my goal is to use the software. However, how much aggravation people will put up with to use software depends on how important it is to them, and I'm no exception to this.

(This balance is individual, not universal; different people and organizations have different needs and priorities.)

If the software is not all that critical or important to me, my threshold for aggravation is low. In fact if the aggravation level is too high I may not even use the software. Here the distribution mostly holds the balance of power, as in Peter Donis' comment, because obtaining software through the distribution is often much easier (although that's changing). If the distribution refuses to package a particular piece of software and it's not that important, I may well do without; if it only packages an old version, that version can be good enough. Software authors may need to play along with what the distributions want or be left out in the cold.

But if the software is important or critical to us, our threshold of aggravation is very high. Doing without or living with bugs fixed upstream in more recent versions becomes less and less of an option or is outright off the table. If we can use a distribution version we generally will, because that's less work, but if we have to we will abandon the distribution and work directly with the upstream version in one way or another. Here the software author holds the balance of power. The distributions can either go along with how the software author is working or be bypassed entirely.

(As an illustration of this power, we install the upstream versions of Prometheus and rspamd instead of the Ubuntu versions. And we use Grafana despite it not being packaged in Ubuntu 18.04.)

This means that one of the factors in the balance of power is what sort of software it is, which goes a long way to determining how critical it is to people. The authors of a utility program usually have a lot less leverage than the authors of important services. Authors of important services are also much more exposed to people's anger if distributions make mistakes, as they do every so often, which gives these software authors some biases.

Modern software ecologies like Rust, Python, NPM, and Go have shifted this balance of power in an important way, especially for things like utility programs, because all of them provide a simple way for people to install a program straight from its authors, bypassing the distribution. I don't think that the ecologies are deliberately trying to make distributions obsolete, but they do have a strong motivation to remove the distributions as gatekeepers. No distribution will ever package all of the software that's written in any language, so the easier the language makes it to install software outside of a distribution, the more software written in it will be used and spread.

(These ecologies do somewhat less well at letting you easily keep the programs you've installed up to date, but I suspect that that will come with time. Also, pragmatically a lot of people don't really worry about that for a lot of programs, especially utility programs.)

Sidebar: The historical balance of power shift

In the old days, it was often difficult (and time consuming) to either compile your own version of something written in C (or C++) or obtain a working binary from upstream. For building, there was no standard build system, you often had to pick a bunch of install time options, and you might need to do this for a whole collection of dependencies (which you would need to install somewhere that building the main program could find). For binaries, you had to deal with shared library versions and ABI issues. Distributions did all of this work for you, and it was hard work.

Modern software environments like Rust, Go, NPM, and Python change both the binary and the build your own side of this. Rust and Go statically link everything by default and produce binaries that are often universal across, eg, Linux on 64-bit x86, making it basically trouble free to get the upstream one. For building your own, all of them have single commands that fetch and build most programs for you, including the right versions of all of their dependencies.

SoftwareAndDistroPower written at 23:01:21; Add Comment

2021-02-20

Modern software controls dependencies because it helps software authors

Over on Twitter I had a hot take:

Hot take: Every distribution packager who's saying "you shouldn't bundle dependencies" is implicitly telling software authors "you should do more work for us and limit the features (and reliability) of your software".

(This was sparked by reading The modern packager’s security nightmare, via. I'm not exactly on one side or the other, but I do think distributions should be honest about what they're asking for and I don't think they're going to get it.)

This hot take is a bit too narrow. What really matters is software authors and modern software systems restricting the versions of dependencies (for both maximum and minimum versions). Explicit or implicit bundling on top of that just makes the problem slightly worse for distributions.

For software authors, restricting the versions of dependencies that they work with reduces the amount of work that they have to do, both to test against a range of versions and to either forever chase after whatever changes those dependencies like to make or to forever limit what features of dependencies they use to ones available in old versions (and sometimes both at once). In theory, both testing and chasing after changes would be dealt with by Semantic Versioning (if everyone followed it), at least for a single program. In practice, not only are people fallible but also people have a different understanding of what semantic versioning means because semantic versioning is ultimately a social thing, not a technical one. Our field's history has shown (sometimes vividly) that if software authors allow versions of dependencies to move on them, sooner or later things break and the software author has to fix it.

(There's also the practical issue that not all dependencies even claim or agree to follow semantic versioning in the first place.)
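
To make 'restricting the versions of dependencies' concrete, here's a small Python sketch using the packaging library's version specifiers; the dependency versions and the range are made up, and the same idea shows up as Cargo version requirements, go.mod requirements, npm semver ranges, and so on.

  # Illustration of a dependency version constraint, using the third party
  # 'packaging' library ('pip install packaging'). The range and the
  # candidate versions are made up.
  from packaging.specifiers import SpecifierSet
  from packaging.version import Version

  # An author who has only tested against the 1.4+ series of a dependency
  # might declare something like this instead of accepting any version:
  constraint = SpecifierSet(">=1.4,<2.0")

  for candidate in ["1.3.9", "1.4.0", "1.7.2", "2.0.0"]:
      verdict = "accepted" if Version(candidate) in constraint else "rejected"
      print(f"dependency version {candidate}: {verdict}")

  # A distribution that only ships 2.0.0 of this dependency now has a
  # problem: carry 1.x as well, patch the program, or drop it.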

For distributors, once software authors start restricting versions the distributor has both an upgrade problem and a distribution problem. On the upgrade side, dealing with an issue in a program may now require persuading it to accept a new version of a dependency. On the distribution side, it's now likely that you'll have multiple programs that have different version requirements for the same dependency. At the very least this multiplies the packages involved.

(Many distributions also have package system design problems that restrict the range of versions of things that they can have installed at the same time. Even under Semantic Versioning, this is a problem for the distribution the moment that you have two programs with conflicting version requirements that can't both be packaged and installed at the same time.)

However, there's no free lunch here. What distributors want when they ask for unbundled dependencies without version restrictions is for software authors to do the work to accept any version of their dependencies, or at least any version that falls within Semantic Versioning, and for dependencies to faithfully follow semver and also make it possible to package and install different major versions (at least) at the same time. Accepting a broad version range of your dependencies is actual work, even apart from the limitations it may impose on what your code and your software can do. Software authors and the creators of software package ecosystems (like Go and Rust) are not refusing to do this because they don't like distributions; they are refusing to do this because they have found, over and over again, that this doesn't really work and does cause problems for software authors (and often users of programs) in the long run.

(The software community that's gone through this experience the most visibly is Go, which started out with intrinsically unversioned dependencies that were used universally across all your programs by default and wound up switching to strongly versioned dependencies after many people had many problems with that initial state. Go experienced so many problems that they adopted an unusually strict and software author friendly versioning scheme.)

It's popular for people to argue that software authors should be doing this work anyway even if the distributions weren't asking for it, so them actually doing it is no big deal. This is quite convenient for the people making the argument, but it doesn't make the argument valid. Software authors don't owe anyone any work whatsoever; they do whatever work serves their needs and is interesting to them. With limited time and interest, it's both rational and proper for software authors to optimize for their own development.

PS: Generally distributions also want some combination of all software to update to the latest version of their dependencies and for dependencies to explicitly support older versions. This is also extra work for software authors, especially when the distribution also wants it to happen for older versions of programs.

BundlingHelpsSoftwareAuthors written at 21:53:02; Add Comment

2021-02-17

TLS certificates specifying hosts via their CommonName field is more or less gone

TLS certificates for hosts and domains must somehow identify what hostname (or names) they're for. Historically there have been two ways to do this. The first way was a specific sub-field, the CN or CommonName, of the certificate's overall Subject Name. This had the problem that it could only have one name. When people started wanting to have TLS certificates that covered more than one name, they invented another mechanism, the Subject Alternative Name (SAN) extension.

As a practical matter, all vaguely modern software that wants to properly validate TLS certificates has supported (and often preferred) Subject Alternative Names for some time. A great many TLS certificates in the wild are for multiple hosts and it's generally unlikely that the host you're connecting to is the one name that the system chose to put in the CN field; software that only supports CN cannot validate those TLS certificates. As a matter of timing, SANs have been theoretically mandatory since 2002 and checking only SANs has been theoretically required since 2011 (which means that since 2011 or earlier, the CN was supposed to always be one of the SANs).
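
If you want to look at this yourself, here's a short Python sketch that fetches a server's certificate and prints its CommonName (if any) next to its Subject Alternative Names, using the standard ssl module plus the third party cryptography library; the hostname is only an example.

  # Sketch: fetch a server's TLS certificate and show its CommonName versus
  # its Subject Alternative Names. Uses the third party 'cryptography'
  # package; the hostname is only an example.
  import ssl
  from cryptography import x509
  from cryptography.x509.oid import NameOID

  HOST = "mozilla.org"

  pem = ssl.get_server_certificate((HOST, 443))
  cert = x509.load_pem_x509_certificate(pem.encode("ascii"))

  cns = cert.subject.get_attributes_for_oid(NameOID.COMMON_NAME)
  print("CN:  ", [cn.value for cn in cns])   # may be empty; validators ignore it

  # This raises ExtensionNotFound if there are no SANs, which a public
  # certificate issued today should never lack.
  sans = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)
  print("SANs:", sans.value.get_values_for_type(x509.DNSName))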

These days, any remaining support for looking at TLS certificate CommonName to validate TLS certificates is getting more and more extinct (and more so than I expected when I started writing this entry). In the browser realm, Chrome apparently turned it off in 58, released in 2017, and then threw out the option to check it again in Chrome 65 (from the comment on my old entry, which was ironically written shortly before Chrome did this). Firefox is said to have removed support in version 48, from August of 2016. Safari apparently stopped looking at CommonName in iOS 13 and macOS 10.15, which I believe date from late 2019. This Go change also talks about how browsers removed it in 2019 ('last year' for a mid 2020 change).

In non-browser TLS code, Go started ignoring CN by default in Go 1.15 (released in August of 2020) and this will be the only option starting in Go 1.17 (to be released in August of 2021), per here. Since Firefox doesn't support CN any more, I assume that NSS doesn't either, since NSS is basically Firefox's underlying TLS implementation. I have no idea what other TLS libraries are doing, but I would expect that many of them will support CommonName for some time to come; TLS libraries are historically behind browser practices. Hopefully they are all following the 2011 requirement to check only SANs when SANs are present (which they should always be in public certificates).

Probably TLS certificates will continue to contain CommonName fields for a long time to come. Having a Subject Name in general is common (although apparently not actually required) and the CN is a standard (although not required) part of the Subject Name, so you might as well throw it in. Even Mozilla and Let's Encrypt (still) have TLS certificates with CNs. However, since I checked this now, the current CA/Browser Forum baseline requirements (version 1.7.3) allow but don't require CommonName (section 7.1.4.2.2, which says that it's 'discouraged, but not prohibited'). Given how conservative most Certificate Authorities are, I expect them to be issuing TLS certificates with CommonName fields until they're required to stop.

(An interested party could scan Certificate Transparency logs to see if there were very many issued certificates without CNs. Probably there are some; someone must have tried it out at some point through an official CA.)

PS: no-common-name.badssl.com has a TLS certificate without a CN, or at least it's supposed to (via), but the TLS certificate is expired right now as I write this entry so it's hard to test how client software behaves. See also, which pointed me to no-subject.labs.vu.nl, which has a currently valid TLS certificate with no Subject Name at all.

TLSCertificateCNMostlyGone written at 23:28:25; Add Comment

2021-02-11

Getting high IOPS requires concurrency on modern SSDs and NVMe drives

My intuitions (or unthinking assumptions) about disk performance date back far enough that one of them is that a single program acting on its own can get the disk's 'normal' random read performance for plain ordinary reads (which are pretty much synchronous and issued one at a time). This was more or less true on hard drives (spinning rust), where your program and the operating system had more than enough time on their hands to saturate the drive's relatively small 100 to 150 IOPS rate. This is probably not true on modern SSDs, and is definitely not true on NVMe drives.

In order to deliver their full rated performance, modern NVMe drives and the operating system interfaces to them require you to saturate their command queues with constant activity (which means that IOPS ratings don't necessarily predict single request latency). Similarly, those impressive large random IO numbers for SSDs are usually measured at high queue depths. This presents some practical problems for real system configurations, because to get a high queue depth you must have a lot of concurrent IO. There are two levels of issues, the program level and then the system level.

On the program level, writes can generally achieve high concurrency if you have a high write volume because most writes are asynchronous; your program hands them to the operating system and then the operating system dispatches them while your program generates the next set of writes. The obvious exception is if you're performing synchronous writes or otherwise actually waiting for the data to be really written to disk. Reads are another matter. If you have a single program performing a single read at a time, you can't get high queue depths (especially if you're only reading a small amount of data). To get higher levels of concurrent read requests, either the program has to somehow issue a lot of separate read requests at once or you need multiple processes active, all reading independently. Often this isn't going to be all that simple.
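
Here's a rough Python sketch of the read side of this, comparing one-at-a-time random reads against the same reads issued from a pool of threads; the file path, block size, and counts are arbitrary, and for real numbers you'd want to defeat the page cache (O_DIRECT, a file much bigger than RAM, or just a real tool like fio).

  # Rough sketch: random 4K reads one at a time versus from a thread pool.
  # The path and sizes are arbitrary; for honest numbers, defeat the page
  # cache (O_DIRECT, a file much larger than RAM) or use a tool like fio.
  import os, random, time
  from concurrent.futures import ThreadPoolExecutor

  PATH = "/data/testfile"      # illustrative; any large existing file
  BLOCK = 4096
  READS = 20000

  def one_read(fd, size):
      # A single synchronous random read; os.pread releases the GIL, so
      # multiple threads really do keep multiple reads in flight.
      offset = random.randrange(0, size - BLOCK) & ~(BLOCK - 1)
      os.pread(fd, BLOCK, offset)

  def run(threads):
      fd = os.open(PATH, os.O_RDONLY)
      size = os.fstat(fd).st_size
      start = time.monotonic()
      with ThreadPoolExecutor(max_workers=threads) as pool:
          for _ in range(READS):
              pool.submit(one_read, fd, size)
      elapsed = time.monotonic() - start
      os.close(fd)
      print(f"{threads:2d} threads: {READS / elapsed:8.0f} reads/sec")

  run(1)    # queue depth of roughly 1: one read outstanding at a time
  run(16)   # more like the queue depths that IOPS ratings assume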

Once you have enough concurrency at the program level you need to be in an environment where there's nothing in the operating system that's forcing this concurrency to be serialized. Unfortunately there are all sorts of things inside filesystems that might partially serialize either writes or reads, especially at volume. For instance, random reads in large files generally require the filesystem to load indirect mapping blocks into memory (to go from a byte offset to a location on disk). If you have two concurrent reads for separate locations that both need the same indirect mapping block to be read into memory, they've both blocked on a single resource. Similarly, writing data out may require loading free space information into memory, or writing out updates to it back to disk.

SSDs and NVMe drives are still very fast for single random IOs at a time (although we don't generally know how fast, since people only rarely measure that and it's dependent on your operating system). But they aren't necessarily as fast as they look on the specification sheet unless you really load up the rest of your system, and that's a change from the past. Getting really top notch performance from our SSDs and NVMe drives likely needs a more concurrent, multi-process overall system than we needed in the past. Conversely, a conventional system with limited concurrency may not get quite the huge performance numbers we expect from the SSD and NVMe spec sheet numbers, although it should still do pretty well.

(It would be nice to have some performance numbers for 'typical IOPS or latency for minimal single read and write requests' for both SSDs and NVMe drives, just so we could get an idea of the magnitude involved. Do IOPS drop to a half? To a fifth? To a tenth? I don't know, and I only have moderately good ways of measuring it.)

PS: This may well have been obvious to many people for some time, but it hadn't really struck me until very recently.

IOPSNowNeedsConcurrency written at 23:43:28; Add Comment
