My performance intuitions and the complexities of SSD performance

October 6, 2022

Back in the old days of mechanical hard drives (HDDs, aka 'spinning rust'), it was possible to feel that you had a reasonable general understanding of their performance because they were physical objects with relatively straightforward general operating principles. For example, they read your data by moving 'the' drive head to the track and then listening to the track as it spun past underneath the head to read either the individual sectors you wanted or (toward the end) the entire track (and then extracting what you wanted). You could almost always assume that these physical actions were the limiting factor on IO performance, and for a long time they didn't change very fast (especially the time it took to move the head to a track).

Flash based solid state disks are much more complex and opaque objects, without this reassuring mechanical nature. A SSD has your data in 'flash', which is often divided up into more than one bank, which can allow some reads (and maybe even writes) to happen in parallel. There is a Flash Translation Layer that turns 'disk block addresses' into locations in the flash; necessary portions of this FTL itself may need to be fetched from flash, or maybe the SSD loads it into RAM right away when it powers on. Writes are famously even more complicated, and because of that complexity there's often background processing happening in the SSD that may affect how your IO performs.
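To make the FTL idea concrete, here is a deliberately tiny sketch (all names invented; real SSD firmware is vastly more complex) of the core trick: logical block addresses are indirected through a mapping table, and because flash pages can't be rewritten in place, every write goes to a fresh page while the old copy is merely marked stale for later garbage collection:

```python
class ToyFTL:
    """Toy Flash Translation Layer: maps logical block addresses (LBAs)
    to physical flash pages. Illustrative only."""

    def __init__(self, num_pages):
        self.mapping = {}                  # LBA -> physical page number
        self.free_pages = list(range(num_pages))
        self.stale_pages = set()           # garbage, awaiting collection

    def write(self, lba, flash, data):
        page = self.free_pages.pop(0)      # always write to a fresh page
        flash[page] = data
        if lba in self.mapping:            # old copy becomes garbage
            self.stale_pages.add(self.mapping[lba])
        self.mapping[lba] = page           # remap the LBA

    def read(self, lba, flash):
        return flash[self.mapping[lba]]
```

Rewriting the same logical block twice consumes two physical pages and leaves one stale, which is exactly the garbage that the SSD's background processing has to clean up later.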

(The many pieces involved in a SSD's performance also provide plenty of room for differences between SSDs, much more so than there's been in HDDs for a long time. Manufacturers can even switch components and change designs within the same model over time, with visible performance effects (generally it goes down).)

Because there are no slow mechanical parts to SSDs (only somewhat slow flash parts), the speed of everything else in the system also increasingly matters. This is both the speed that the SSD's CPUs can do their work (and they have plenty of work) and the speed that the host system can send them work to do and respond when the work is complete. Because they have such fast communication between the host and the SSD, decent NVMe disks can make this very visible, with it requiring significant efforts on the host to achieve their theoretical performance.

I had (and still have) performance intuitions for HDDs, although I have to revise my HDD intuitions every so often. I don't really have performance intuitions for SSDs, except that I expect individual SATA SSDs to come close to saturating the theoretical SATA maximum read rate. Not only are SSDs pretty complex objects with hard to understand performance (and performance that can vary drastically from model to model), but the conditions around them are constantly changing since the host-side software keeps changing (and since it matters, you may need to think about reasonably specific configurations, not general intuitions).

My lack of performance intuitions (especially ones I trust) casts a shadow over my feelings about various pieces of software design. For the case that started me thinking about this, I have relatively little feel for whether my old entry on what problems the Maildir mail storage format does and doesn't solve is still reasonably applicable, and under what circumstances (SATA SSD versus NVMe SSD, for example).

(And even if it still hurts on an NVMe SSD to read a bunch of little files instead of reading sequentially through one big one, it probably hurts for different reasons, such as overheads in OS system calls and issuing all those separate IOs.)
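As a rough sketch of that host-side overhead (not a real benchmark; all names here are invented), reading N small files costs N separate open/read/close round trips into the kernel, where one big file needs only a single open followed by sequential reads:

```python
import os

def make_files(dirpath, n, size):
    """Create n small files of the given size, Maildir-style."""
    for i in range(n):
        with open(os.path.join(dirpath, f"msg{i}"), "wb") as f:
            f.write(b"x" * size)

def read_many(dirpath):
    """Read every file: one open + read + close per file."""
    total = 0
    for name in os.listdir(dirpath):
        with open(os.path.join(dirpath, name), "rb") as f:
            total += len(f.read())
    return total

def read_one(path):
    """Read one big file: a single open, then sequential reads."""
    with open(path, "rb") as f:
        return len(f.read())
```

Timing read_many() against read_one() on a directory of a few thousand files will show a per-file cost even when everything is already in the page cache, which is the system-call overhead at work rather than the drive itself.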

PS: Even when I can remember general SSD performance numbers, these 'raw' numbers don't necessarily translate into observed system performance the way that they did in the era of HDDs. A HDD that could do 150 seeks per second would probably deliver 150 random reads a second through your filesystem (because it was by far the limiting factor). A NVMe SSD that can do 10,000 read IOPS may or may not deliver 10,000 random reads through the filesystem, OS, kernel, and hardware that you're using (because the NVMe SSD may well no longer be the limiting factor).
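One way to see the gap between raw and delivered numbers is to measure random reads through the entire stack yourself. Here is a crude sketch (real tools like fio do this far more carefully; note that without O_DIRECT this also exercises the page cache, which is itself part of 'the whole stack' you actually observe):

```python
import os
import random
import time

def measure_random_read_iops(path, block_size=4096, duration=1.0):
    """Issue random single-block reads against path for `duration`
    seconds and return the delivered reads-per-second rate.

    This measures the filesystem, OS, kernel, and drive together,
    not the drive's quoted raw IOPS number."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        nblocks = size // block_size
        reads = 0
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            offset = random.randrange(nblocks) * block_size
            os.pread(fd, block_size, offset)   # one random read
            reads += 1
        return reads / duration
    finally:
        os.close(fd)
```

Run against a large file on a HDD this tends to come out near the drive's seek rate; on an NVMe SSD the result depends heavily on everything else in the path, which is the point.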

Comments on this page:

SSDs are indeed complex virtualized I/O devices and a lot depends on the controller. That said, the nature of NAND flash erase-rewrite means that a SSD controller is essentially implementing something that looks like a copy-on-write journaling filesystem, which is why actual journaling filesystems perform better on SSDs, even if they may seem redundant. The key insight is turning random I/O into sequential I/O, as long as you remember to TRIM the released old copies. For something like Maildir, your filesystem matters more than the format itself.

There are some database systems like LevelDB/RocksDB or Aerospike that were explicitly designed with SSDs in mind, but in my experience SQLite outperforms RocksDB despite using traditional B-tree index structures instead of LSM trees.
