Wandering Thoughts

2022-05-22

Modern (public) TLS has only a limited number of intermediate certificates

TLS certificates for servers are not normally signed directly by the Certificate Authority's root certificate (for various good reasons); instead, they're signed by an intermediate certificate that chains up to the CA's root certificate (in modern usage, there generally is only one intermediate certificate between the server certificate and the CA's root certificate). In theory the server is supposed to send the intermediate certificate with its own certificate in a certificate chain, but in practice servers don't always send intermediate certificates and this can cause mysterious problems.
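
As a concrete illustration of this (a minimal Python sketch, with example.com standing in for whatever site you care about), you can see that a server's certificate was signed by an intermediate rather than a root just by looking at the issuer it reports:

    # Look at who signed a server's leaf certificate; the issuer you see
    # here is normally an intermediate CA, not the CA's root certificate.
    # "example.com" is only a placeholder host.
    import socket
    import ssl

    host = "example.com"
    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()

    subject = dict(rdn[0] for rdn in cert["subject"])
    issuer = dict(rdn[0] for rdn in cert["issuer"])
    print("subject:", subject.get("commonName"))
    print("issuer: ", issuer.get("commonName"))

(If the server fails to send its intermediate certificate, a connection like this will typically fail certificate verification outright, which is one of the forms those mysterious problems take.)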

In the old days, one part of this problem is that there were a lot of intermediate certificates and no one really knew what they all were. I won't say that Certificate Authorities created intermediate certificates freely, because I'm sure they charged a lot of money for them, but in practice CAs were relatively willing to give intermediate certificates to third parties who were sufficiently persuasive. Those days are over. In the modern TLS world, intermediate certificates are limited, controlled, and known (and thus very finite). Third party intermediate certificates, ones not under the control of the Certificate Authority that signed them, are especially out of favour and many of them are now dead. Encountering a genuinely unknown intermediate certificate is generally the sign that some Certificate Authority is about to have a bad time, as browser security people descend on it with very pointed questions.

The simple version of what triggered this transition is that browsers became increasingly aware that third party intermediate certificates were essentially root Certificate Authorities with less visibility and less adult (ie, browser) supervision. TLS intermediate certificates have never really had an effective limit on what TLS server certificates they could sign and have accepted, which meant that anyone with a TLS intermediate certificate could create a probably-valid TLS server certificate for any random website, like say google.com. Browsers didn't like this for obvious reasons and so they eventually cracked down.

These days things are much more restricted, even for intermediate certificates controlled by Certificate Authorities. Let me quote the Mozilla article on preloading intermediate certificates into Firefox:

[...] As a result of Mozilla’s leadership in the CA community, each CA in Mozilla’s Root Store Policy is required to disclose these intermediate CA certificates to the multi-browser Common CA Database (CCADB). [...]

(In the browser crackdown, Certificate Authorities were required to identify all of the intermediate certificates that they had issued and mostly either bring them back under their control or see the intermediate certificates destroyed. Browsers held the cards here because they had various measures to control what intermediates were accepted for various Certificate Authorities.)

In case anyone is tempted to cheat (or slips up by accident), Certificate Transparency makes it relatively easy to detect new intermediate certificates, at least for anything you want to be usable in major browsers. A TLS certificate logged into CT logs includes information on what certificate signed it, so interested parties can monitor the CT logs to detect mentions of new ones. In many cases they will be able to obtain a copy of the new intermediate certificate, letting them identify in turn who signed it.
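
(For the 'identify who signed it' part, here is a minimal sketch using the third-party Python cryptography package; the filename is made up. Given a certificate pulled out of a CT log, the Issuer name and, where present, the Authority Key Identifier extension are what tie it back to the intermediate or root that signed it.)

    # Inspect a PEM certificate's issuer and Authority Key Identifier.
    # Requires the "cryptography" package; "some-cert.pem" is a placeholder.
    from cryptography import x509
    from cryptography.x509.oid import ExtensionOID

    with open("some-cert.pem", "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())

    print("issuer:", cert.issuer.rfc4514_string())
    try:
        aki = cert.extensions.get_extension_for_oid(
            ExtensionOID.AUTHORITY_KEY_IDENTIFIER).value
        if aki.key_identifier is not None:
            print("authority key id:", aki.key_identifier.hex())
    except x509.ExtensionNotFound:
        print("no Authority Key Identifier extension")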

(I don't know if Chrome requires intermediate certificates to also be in the CT logs, but if it does you can directly look for new ones there. Intermediate certificates are recognizably different from TLS server certificates.)

TLSNowLimitedIntermediates written at 22:33:43

2022-05-15

The idea of hierarchical filesystems doesn't feel like an API to me

One of the things that people say sometimes (including about serving static files on the web) is that "the filesystem is an API". On a superficial level this sounds fine. The abstracted form of a filesystem hierarchy is a common interface between programs, one that you can write code to use; you can create files and directories, inspect the hierarchy, read data that has been put in there, and delete things again (if you have permissions).

However, I've come to feel that the filesystem as a thing (abstract or otherwise) doesn't particularly feel like other APIs. Filesystems are program-independent hierarchical namespaces with some operations and types of entities that we all agree on (and a number of both that we don't; consider Unix symbolic links and hard links). But they aren't hard, nailed down interfaces for programs to code against in the way that other APIs are. The abstract idea of "the filesystem" has no defined content or structure (that's up to any particular use you make of the idea), and as far as actual program code goes we merely have a general agreement on the names for common operations and common filesystem contents ('files' and 'directories'). Different programming environments and even operating systems implement somewhat different actual APIs for filesystem access and manipulation, especially once you get out of the very basic operations.

(Even the idea of 'the contents of a file' is somewhat fuzzy. Is that a binary file or a text file? In some environments, the difference matters.)

If you say that the filesystem is an API, I feel that you're saying about as much as if you said that the web (as an idea and a general thing) is an API (which is true but in a very broad, architecture astronaut way). The filesystem being an API is pretty much the idea that you can use a commonly agreed on hierarchical namespace to communicate between programs, and between people and programs.

One place where this matters is if people aspire for other, more concrete APIs to become as broadly adopted and available as the filesystem "API". I feel that that would require people transforming those more concrete APIs into something more or less as broad, encompassing, and general as the idea of filesystems. For many things that people want to be broad APIs, I tend to not see any obvious path for that to happen because there doesn't seem to be a level between a broad idea and a concrete API.

(In part, there's a bit of me that bristles at calling the filesystem or the idea of filesystems an API, and I want to figure out what my objection is. I'm not sure I fully understand my objections yet.)

PS: You can use a filesystem to make a concrete API, or as part of a concrete API, but that's a different thing. You have one or more programs that ascribe particular meaning to a hierarchy with specific contents, and specific manipulation of the contents may have specific effects.
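
(As a hypothetical illustration of that, with all the names made up: a couple of programs can agree to treat a plain directory as a little job-submission API, where creating a file means 'here is a job' and removing it means 'the job is done'. A minimal Python sketch:)

    # A toy "spool directory" API built on top of the filesystem. The
    # meaning (a file == a pending job) comes entirely from the programs
    # involved, not from the filesystem itself. The directory is made up.
    import os

    SPOOL = "/var/spool/toyjobs"

    def submit(job_name, payload):
        os.makedirs(SPOOL, exist_ok=True)
        tmp = os.path.join(SPOOL, "." + job_name + ".tmp")
        with open(tmp, "w") as f:
            f.write(payload)
        # rename() is atomic within a filesystem, so a consumer never
        # sees a half-written job file.
        os.rename(tmp, os.path.join(SPOOL, job_name))

    def pending_jobs():
        return [n for n in os.listdir(SPOOL) if not n.startswith(".")]

    def finish(job_name):
        os.remove(os.path.join(SPOOL, job_name))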

FilesystemVersusAPI written at 20:55:51

2022-05-03

NVMe disk drives and SMART attributes (and data)

NVMe, also known as NVM Express, is the general standard for accessing non-volatile storage over PCIe. People commonly talk about NVMe SSDs while I sometimes call them 'NVMe drives' (see my entry on the terminology and some of the technology). By contrast you have SATA and SAS SSDs, which are a subset of SATA and SAS drives in general, because people (us included) still do use spinning rust.

SMART is a standard, originally for ATA hard drives, for reporting various information about the drive. When we talk about 'SMART' or 'SMART data', we often specifically mean SMART 'attributes', which tell you various things about the drive's state, health, wear over time, and so on; sometimes this information is even useful. SMART attributes have an ID number and some values associated with them, but what the attributes mean (and thus how you should interpret their values) is mostly vendor specific.

Whether or not NVMe SSDs have SMART attributes depends on your perspective. From the perspective of the NVMe specifications, an NVMe SSD is required to provide you with what is called 'SMART / Health Information', so maybe you could say it has SMART attributes. In practice, this information is not in the format of SMART attributes and it doesn't have the fields that they do, not even so much as an ID number. Instead, the "page" of this health information has a set of fields at specific byte offsets that have specific interpretations; for instance, the two bytes at offsets 1 and 2 contain a "composite temperature" for the NVMe SSD in kelvins. In the current NVMe specification (version 2.0b), you can read all of the gory details in section 5.16.1.3.
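
As a small illustration of this byte-offsets style (a sketch only, not a complete decoder; it assumes you've already obtained the raw 512-byte SMART / Health Information log page by some other means, such as a management tool or an NVMe admin passthrough):

    # Decode a couple of fields from a raw NVMe "SMART / Health
    # Information" log page buffer. NVMe structures are little-endian;
    # byte 0 is the critical warning bitmap and bytes 1-2 are the
    # composite temperature in kelvins.
    import struct

    def decode_health(page: bytes):
        critical_warning = page[0]
        composite_temp_k = struct.unpack_from("<H", page, 1)[0]
        return {
            "critical_warning": critical_warning,
            "composite_temp_c": round(composite_temp_k - 273.15, 1),
        }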

The practical effect of this is to leave general tools that deal with SMART data in a somewhat awkward spot when it comes to NVMe SSDs. NVMe SSDs have information that corresponds to a variety of common SMART attributes, but you can't really present the information in the same format because the information lacks a lot of things that SMART attributes have. If you tried to pretend that NVMe health data was SMART attributes, you would have to invent SMART ID numbers for all of them along with some other stuff that people often care less about. However, presenting the information in a different form means that people need new practices and processes to deal with it (including in their monitoring and metrics systems).

(I don't know why this happened, but I like to think that everyone involved in the NVMe specification took one look at the anarchic mess that SMART attributes have become and vowed to not allow anything even vaguely like it. I don't know if the NVMe specification even allows additional vendor-specific health information.)

(NVMe SSDs also provide general information about themselves that is again somewhat different than what SATA/SAS/etc drives provide through 'SMART'. For example, NVMe drives provide an OUI for who made them (see eg this listing of OUIs and who they're assigned to), while I believe that SMART provides this as a text string. The OUI is more definitive, but the text string may be easier to deal with in things like metrics systems. And cross-mapping between the two for a mix of NVMe SSDs and SATA drives is a fun problem.)

NVMeAndSMART written at 22:43:57

2022-04-21

4K HiDPI monitors come in inconvenient sizes if you want two of them

It used to be that I used the same monitors at work and at home, although at home I had one of them instead of the two at work; most recently this was one of Dell's 24" 16:10 1920x1200 monitors. Several years ago I upgraded my home setup to a Dell 27" 16:9 4K HiDPI monitor. Recently the price of good 4K monitors dropped low enough that I got two of them at work; specifically, I got two 27" Dells, more or less like the home monitor I was already familiar with. The migration over to them at work has turned out a little differently than I expected, because it turns out that there is a substantial size difference in practice between two 24" 16:10 monitors and two 27" monitors. My work 'desk' (a table) has room for them, but it's become obvious that the result is awkwardly wide if I want to look at the far corners.

If I were doing this over again and had a free choice, I would get two 16:10 4K 24" monitors, or maybe two 25" 16:9 ones and live with slightly less vertical space. However, even if I got to redo my upgrade, there's a problem: there are almost no 4K monitors under 27", and those that still exist are unusually expensive. Dell used to have a well regarded 24" 4K 16:9 monitor, but they took it out of production. LG theoretically has one and theoretically will sell it to you if you're in the US (for not too much). That seems to be about it.

(There also don't seem to be many 24" monitors with somewhat lower resolutions, and 2K on a 24" display is already a lower pixel density than 4K on a 27".)

It's not hard to come up with some reasons why this could be the case. First off, bigger monitors are generally more attractive to most people, since most people only have one monitor. Now that it's possible to make large 4K monitors economically, most people buying 4K monitors are probably primarily interested in those. The market for 4K 24" monitors is probably somewhat niche. Second, I believe that the higher the real pixel density is, the more expensive a panel is to produce, regardless of the eventual size of the display it will go into. This would push 24" 4K panels to a higher cost per square cm than 27" 4K panels, possibly by enough that it isn't compensated for by their smaller absolute size.

(Based on Wikipedia display size figures, a 27" 16:9 panel has a display area of 2010 cm² and a 24" 16:9 is 1588 cm². A 24" 16:10, such as my old 24" monitors, is 1670 cm².)
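
(Those figures are easy to reproduce with a little arithmetic: split the diagonal into width and height using the aspect ratio, multiply, and convert square inches to square centimetres.)

    # Display area from the diagonal (in inches) and the aspect ratio;
    # one inch is 2.54 cm.
    import math

    def display_area_cm2(diagonal_in, aspect_w, aspect_h):
        d = math.hypot(aspect_w, aspect_h)
        width = diagonal_in * aspect_w / d
        height = diagonal_in * aspect_h / d
        return width * height * 2.54 ** 2

    print(round(display_area_cm2(27, 16, 9)))   # ~2010
    print(round(display_area_cm2(24, 16, 9)))   # ~1588
    print(round(display_area_cm2(24, 16, 10)))  # ~1670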

There are certainly worse problems to have, and I don't regret my two 27" monitors at work. Sooner or later I'll figure out some way to handle the far edges, or maybe I'll completely reorganize how I work with dual monitors (perhaps so there's one main monitor and a side monitor, instead of the current approach where I basically work in the middle where the two meet).

DualMonitorsAndHiDPI written at 23:48:20

2022-04-16

"Long term support" Unixes and versions of software in them

In a comment on my entry about how Firefox versions are relatively tightly tied to Rust versions and how this affects LTS Unixes, Tom said, in part:

[...] But, if Ubuntu LTS is going to have a version of rust that is relevant to users (rather than just as a build dependency), they should really package a up-to-date version of rust as well, since it has the same support period as does Firefox.

The problem that all Unix distributions face sooner or later is that the people using them generally want the platonic ideal of semantic versioning minor releases, namely updates that only fix bugs and improve things and never introduce backward compatibility problems or undesired changes. Apart from other problems with semantic versioning, the reality of life is that almost no modern open source project works this way for very long, including languages. Rust has stopped accepting cargo.toml files that it used to accept (cf), Go has significantly changed how its toolchain works (cf), and even C compilers have broken compilation of existing code by adding new warnings (cf).

(And languages are often the conservative things here. Many things with GUIs undergo much more change much faster. Firefox and Chrome are arguably two poster children for this.)

The state of modern software is that almost nothing holds to the ideal of semantic versioning minor releases over the span of a few years, much less the five years that is the common duration for modern "LTS" releases (although they may or may not make this clear in their version numbers). This is true for the latest releases of software, and it's also almost always true for the supported versions, because very few software projects support old releases for multiple years, especially three or four or five years. The consequence is that whether you keep up with the latest versions or just the latest supported version, sooner or later you'll have to install an update that isn't fully backward compatible. Some of the time, this will make people unhappy (although some of the time the change will be in an area that they don't care about).

(Note that not all projects follow semantic versioning in their version and release numbers. I think both Rust and Go would say that they don't, for example. And semantic versioning is ultimately a social understanding anyway.)

This leaves Unix distributions with three choices. They can decline to pretend to be stable over the long term, they can be stable over the long term with old software versions, or they can be "stable" over the long term with newer versions of eg Rust in the hope that this won't introduce too many changes that upset people. Most Linux distributions pick either the first or the second, with as little of the third as they can get away with. If nothing else, this leaves people with a relatively clear choice; you can accept churn with the benefit of being relatively current, or you can accept stale software in order to get long periods of low or no churn.

LTSVersusSoftwareVersions written at 22:13:24

2022-04-02

Sorting out IPMI and BMC terminology and technology

In the past I have tended to use the term 'IPMI' for several different things, in a somewhat confusing and imprecise way (much like how I've been imprecise about terms like 'SSD' and 'NVMe'). For various reasons I want to straighten that out in my head, so I'm going to write up my understanding of the whole area.

A BMC is a "Baseboard management controller", an additional server embedded in your real server (or server class motherboard). The BMC is alive and booted up any time the power is being provided to the physical server, regardless of whether the host system is powered on or not, and can manage and control various aspects of the host system's operation, such as whether the host system gets to be powered on. IPMI is the "Intelligent Platform Management Interface", a set of standards for some things that BMCs do and how to talk to BMCs. BMCs are not limited to only doing things covered in the IPMI standards, or being talked to only over IPMI interfaces. It's not uncommon for me and other people to say that a server has an 'IPMI' or is 'IPMI capable' or the like when we really mean that it has a BMC (and the BMC supports IPMI). An example of this is my entry on "the clock in your server's IPMI drifts over time".

BMCs generally have a network connection, for remote management (via IPMI and otherwise). This network connection may be on a dedicated port or on a port shared with the host system; I consider shared IPMI network interfaces to be a problem (although a port isolated network can help). BMCs also commonly have a variety of system health sensors for temperature, fan RPM, power usage, and so on; these sensors are often not available to the host system to read directly. More advanced BMCs can present virtual USB devices to the system and can snoop on the system's video output, which allows them to offer "KVM over IP" and virtual media, where you can interact with the host system remotely as if you were sitting in front of it with a physical monitor and keyboard.

(KVM over IP support in the BMC is not part of the IPMI standards and is sometimes an extra-cost item from server vendors.)

There are broadly standardized host system interfaces to the host's BMC for IPMI, such as SMIC or IPMB over I2C. The Linux kernel has some documentation on this, on general IPMI and IPMB, and also see OpenBSD's ipmi(4), which has nice short descriptions of several of them. If your operating system supports these interfaces, you can use general user level tools to talk to most BMCs from the host itself, which is handy because the BMC doesn't need to be connected to the network and configured. This will let you configure (or de-configure) the BMC's networking, and also read those BMC sensors so you can monitor them.

(Common open source packages for this are ipmitool and FreeIPMI. Apparently OpenIPMI is no longer being maintained, but at one time it was another option.)
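
(As a sketch of what this looks like in practice, assuming ipmitool is installed and the kernel IPMI drivers are loaded: 'ipmitool sensor' prints one pipe-separated line per sensor, which is easy enough to scrape from a small Python script for feeding into a metrics system.)

    # Read the BMC's sensor readings from the host itself via ipmitool's
    # in-band (local) interface. Assumes ipmitool is installed and the
    # kernel IPMI drivers are loaded.
    import subprocess

    def read_sensors():
        out = subprocess.run(["ipmitool", "sensor"],
                             capture_output=True, text=True,
                             check=True).stdout
        readings = {}
        for line in out.splitlines():
            fields = [f.strip() for f in line.split("|")]
            if len(fields) < 2 or fields[1] in ("na", ""):
                continue
            readings[fields[0]] = fields[1]
        return readings

    for name, value in read_sensors().items():
        print(name, value)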

Part of the IPMI standards is a set of network (remote) access protocols that can let you read sensors, control the host system power, check the IPMI "system event log", connect to virtualized host serial ports ("serial over LAN"), and often do other things. BMCs are not at all limited to only talking to people with these IPMI network protocols; instead, they commonly also have web servers, respond to SNMP, often have a SSH interface, and more. None of these additional connection methods are at all standardized, and as we've seen the BMC may expose additional information through them that isn't available via IPMI. Since KVM over IP isn't part of IPMI, all BMCs that do KVM over IP provide it through some of these additional connection methods (often through their web server). BMCs that provide SSH access generally don't give you a Unix shell but instead give you some sort of captive environment. There is no particular cross vendor standardization in how you use these captive SSH environments; expect to have to learn a new set of commands for each different vendor's BMC.

(Dell BMC SSH commands look nothing like Supermicro BMC SSH commands, for example. This can be irritating when you want to do the same thing, like connect to a virtualized "serial over LAN" serial port by SSH'ing to the BMC.)

Part of general, standardized IPMI communication is the ability to tunnel "raw" messages back and forth (both locally and I believe over the network). These raw messages can be used to implement vendor specific additional BMC functionality, both to expose additional information and to configure additional things (including semi-standard things that aren't covered by IPMI, like whether or not the BMC should run a web server or a SSH server). For some of what can be done, see the manual page for FreeIPMI's ipmi-oem. Vendors naturally don't generally document their commands, but some of them have been reverse engineered or otherwise derived. Vendor tools don't have to use IPMI raw commands to talk to their BMCs, especially over the network, but it's often the easiest way to implement vendor specific local commands.

The IPMI standards have the concept of users with passwords and access levels, although what access level is allowed to do what can be vendor specific. Where BMCs have additional network access methods, such as web servers or SSH servers, it's common to reuse the IPMI users and their passwords for these accesses. However, I don't believe it's required; in theory a BMC could have completely separate IPMI authentication and web server authentication. Even if the user names are shared, BMCs may allow you to set which users are allowed to access what, so you could create IPMI-only users and web only users.

Many BMCs can be configured through the host system's BIOS. This is a convenient way to assign the BMC its IP address during initial system setup, and perhaps to configure your standard BMC user name and password. Once upon a time BMCs came with pre-set and widely known default usernames and initial passwords (although they were vendor specific). Recent legislation in various places has forced system vendors to move to per-server initial passwords.

BMCs are most often little computers running some ancient version of (embedded) Linux with equally ancient and limited versions of SSH daemons, web servers (and TLS support), and so on. This means that they have security vulnerabilities, and also means that if you leave them up for too long they may have issues. The completely reliable way to force a BMC reboot is to physically unplug the power cables from the system and then let it sit for a minute or two. BMCs sometimes offer a way to reboot the BMC through a software command. Historically this has often also rebooted the host server, so we tend not to use it or trust it. If we want the BMC to reboot, we pull the power.

(It follows that you never, ever want to expose your BMC to the Internet, or in general to any general use network. Gaining full access to a BMC often allows total compromise of the host system.)

IPMIAndBMCTerminology written at 22:55:17

2022-03-11

Filesystems can experience at least three different sorts of errors

Yesterday I wrote about how it would be nice if Linux exposed a count of disk errors and mentioned that some Linux filesystems do expose such a count of errors, but it's not clear what sort of errors they mean. This sounds like a peculiar thing to say, but in fact filesystems can experience at least three different sorts of errors. I will call these I/O errors, integrity errors, and structural errors.

An I/O error happens when the underlying storage device returns an error from a read or a write operation. Some filesystems have some internal redundancy that can hide such errors when they occur in the right sort of places, but most of the time this surfaces directly to the user as an error that corresponds to an I/O error (hopefully) reported by the storage device. Because of this generally direct link between a lower level error and a filesystem error, a filesystem might opt not to track and report these errors, especially when they happen while reading user data instead of filesystem metadata.

An integrity error happens when the filesystem has some form of checksums over (some of) its on disk data, and the recorded checksum fails to match what it should be based on the data the filesystem got from the storage device. ZFS is famous for having checksums on both user data and filesystem metadata, although it's not the only filesystem to do this. There are other filesystems that have checksums that only apply to filesystem metadata. Almost all storage devices have some level of undetected bit corruption, and checksums can also detect various other sorts of disk damage (such as misdirected writes).

A structural error happens when the filesystem detects that some of its on-disk metadata is not correct, in any of the many specific ways for any particular sort of metadata to be incorrect. Sometimes this happens because on-disk data has been corrupted, but sometimes it happens because the filesystem code has bugs that caused something incorrect and invalid to be written out to disk (in which case the metadata may have perfectly valid checksums or other integrity checks). A filesystem that counts errors and can recognize integrity errors on metadata might not want to double count such errors as structural errors as well.
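
As a toy illustration of the first two of these (this is not any real filesystem's code, and the checksum scheme is made up), here is the difference between the device failing the read outright and the device returning data that doesn't match its recorded checksum:

    # Distinguish an I/O error (the read itself fails) from an integrity
    # error (the read succeeds but the data doesn't match its recorded
    # checksum). A structural error would be a third case: the data
    # checksums fine but decodes into metadata that makes no sense.
    import zlib

    def read_block(f, offset, size, expected_crc):
        try:
            f.seek(offset)
            data = f.read(size)
        except OSError as exc:
            return ("io-error", exc)
        if zlib.crc32(data) != expected_crc:
            return ("integrity-error", None)
        return ("ok", data)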

Given all of this, you can see that a filesystem that counts 'errors' without being more specific is rather unclear. Is this a count of all errors that the filesystem can detect, including I/O errors? Is this a count of all structural errors regardless of their cause, even if they come from detected (and logged) I/O errors or integrity errors? If a filesystem counts integrity errors somehow, does that count include failed integrity checks which at least implicitly happen when there are I/O errors?

(There are situations where you can experience I/O errors on I/O that's only necessary to verify integrity, not to return the requested data. You might reasonably count this as both an I/O error and an integrity error, as opposed to the situation where you have an I/O error on data that's directly necessary.)

Any given filesystem that reports or counts errors is going to have an answer to all of these questions, but there is no single set of obvious and clearly correct answers. It varies on a filesystem by filesystem basis, so if you only hear that a filesystem is reporting 'errors', you don't know as much about what it's reporting as you might think.

FilesystemsThreeErrorTypes written at 22:39:40

2022-03-08

Hardware can be weird, server and USB keyboard edition

Over on Twitter, I said some not entirely justified nasty things about the Ubuntu 20.04 server install ISO, because it wasn't letting me switch to an alternate Linux virtual console to get a shell so I could see what was going on with some things. The (current) Ubuntu documentation certainly implies that they do things differently, but that turned out not to be quite it. Instead, what was going on was an interesting and odd interaction between the keyboards I tried and the server. Since this is the modern age, it involves USB.

The server has two rear USB ports. On the first USB keyboard I started out using (the keyboard we normally use for server installs in our test area), none of the function keys worked (or seemed to work) when the keyboard was plugged into either port. They probably didn't work in the BIOS (at one point I repeatedly tried to use F11 to get into a boot menu and had it ignored), and they definitely didn't seem to work in Linux, even when the system was booted and there were multiple virtual consoles to switch between.

With the second keyboard I tried, nothing seemed to respond when I plugged it into one of the two USB ports (for example, nothing happened when I hit Return at the Linux login prompt). However, when I plugged it into the other USB port, everything worked, both the regular keyboard keys and the usual Alt plus Fn to switch Linux console virtual terminals.

Linux reports these keyboards slightly differently. The keyboard that is picky about its port but has fully working function keys is reported as 'USB HID v1.10 Keyboard', while the non-picky keyboard with no function keys is reported as 'USB HID v1.11 Device'. As far as I can see from kernel logs, Linux reports each keyboard the same regardless of which USB port it's plugged into.

This server has an IPMI that supports 'KVM over IP', which involves a virtual keyboard and mouse. This virtual keyboard and mouse show up as USB devices, a 'USB HID v1.00' Keyboard and Mouse, apparently on the same USB device. I wonder if this virtual device somehow interferes with a real keyboard on one but not the other USB port for some reason.

All of this is a useful reminder that sometimes the problem isn't that the BIOS or the OS installer is ignoring you. Sometimes you have a hardware problem, even if it's a weird one where only some of your keyboard's keys don't work.

(If I'm really energetic the next time I'm in the office I may try using usbmon to see the keyboard events for both keyboards in interesting situations.)

USBKeyboardServerWeirdIssue written at 22:50:31

2022-02-24

I've come to think that the Git index is a good thing

Over on Twitter I said something:

It's funny how "we don't have an equivalent of the git index" is now a DVCS anti-feature for me. I like what the index enables and it has a clear conceptual model, even if it can be sometimes annoying.

(I agree that it's a bad name.)

There are two sides of thinking that Git's index is a good thing. The first is the practical side, where I like what it enables me to do. On the level where all VCSes are user interfaces to mathematics, the operations and double checks that the Git index readily enables are both useful and reassuring. Being sure of what you're going to commit before you do it and having powerful selective commit capabilities are both useful, and they're even better together.

Of course you don't have to have Git's index in order to support these operations. Plenty of other VCSes support partial commits (even committing just parts of files and changes), checking what you're going to commit in advance, and so on. The other side of Git's index is that it provides a clear conceptual model for all of them. By creating a clear separation between 'what is in your working tree' and 'what has been prepared to be committed', Git makes it more straightforward to reason about how various things are going to behave. It also makes things more inspectable; Git's staging area is a real thing with a concrete existence and commands that manipulate and inspect it.
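
(As a small sketch of that inspectability, assuming the git command line is on your $PATH and you're inside a repository: what's staged in the index and what's only in the working tree can be looked at separately.)

    # Show what's staged for the next commit versus what's only changed
    # in the working tree. Assumes the git CLI is available.
    import subprocess

    def git(*args):
        return subprocess.run(["git", *args],
                              capture_output=True, text=True).stdout

    print("staged for the next commit:")
    print(git("diff", "--cached", "--stat"))
    print("changed in the working tree but not staged:")
    print(git("diff", "--stat"))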

Git's index isn't perfect, apart from the name, and a fully elaborated system could be built with a different conceptual model (for example, you could have a model that commits are built up over time, amended and modified until you freeze them). But having a straightforward conceptual model that cleanly enables the useful user interface features is a valuable thing.

GitIndexGoodThing written at 22:27:05

2022-02-07

What does it mean for a filesystem to perform well on modern hardware?

Once upon a time, back in the days of spinning rust, whether or not you were getting good filesystem performance was sort of a straightforward question. Disks were slow and couldn't do very many seeks per second, so you could assess the performance of a filesystem by how close it got you to the raw disk read and write speed for sequential IO, or the raw disk seek limits for random IO. These days 'SSDs' (which is to say SATA and SAS SSDs) and especially NVMe drives have in one sense complicated the question drastically, along two dimensions.

First, modern operating systems can't necessarily even reach the raw performance that modern NVMe drives are capable of, especially through ordinary interfaces. When they do, it's a relatively recent development, and the internal kernel interfaces are likely not in place for filesystems to drive this sort of performance even in the best case. And the best case may be difficult to get to (for example, requiring large queue depths and large amounts of requests in flight). Serial attached SSDs (SATA and SAS) have lower limits for both bandwidth and IOPS, but even then it may be hard to hit their maximum performance under realistic situations even with an ideal filesystem.

Second, there is the question of how much performance you actually can use (or need) and the resulting question of how much differences among filesystems matter. Ultimately this is partially a question of Amdahl's law as applied to IO. If the kernel IO time dropped to zero (so that every IO operation was satisfied the moment it was made), there are plenty of programs that would not necessarily get much faster than they already are on most filesystems on most NVMe drives. Sometimes this is because IO is a relatively small portion of the program's operation; sometimes this is because the program is written in such a way that, for example, it does random IO with a single request at a time.
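
(A quick back of the envelope version of that Amdahl's law point, where the IO fractions are just example numbers: even if IO took zero time, the overall speedup is capped by the fraction of time the program actually spends waiting on IO.)

    # Overall speedup when the IO portion of runtime is sped up by some
    # factor; io_speedup=float("inf") models "IO now takes zero time".
    def speedup(io_fraction, io_speedup=float("inf")):
        return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)

    print(speedup(0.20))   # IO is 20% of runtime -> at most 1.25x faster
    print(speedup(0.50))   # IO is half the runtime -> at most 2x faster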

(One answer to this is to measure performance with common programs, since what ultimately matters is how the programs you're going to actually use behave. But this raises the question of what good performance for them looks like.)

All else being equal, more performance is always useful, even if it's just potential performance with programs written in just the right way. But all else isn't necessarily equal, since modern filesystems (and operating systems) differ in potentially important ways other than just performance. If you can establish a point where filesystems are "performing well", you can perhaps stop being concerned about just how well. But that leaves the question of how to decide on that point.

(I would like to think that a modern operating system could get more or less the full SATA bandwidth from a single SSD through any decent filesystem for streaming reads. But I haven't actually tried to test that. And I have no idea how that would go with NVMe drives.)

FilesystemPerfQuestionToday written at 23:36:27
