A cynical view on needing SSDs in all your machines in the future

July 29, 2015

Let's start with my tweets:

@thatcks: Dear Firefox Nightly: doing ten+ minutes of high disk IO on startup before you even start showing me my restored session is absurd.
@thatcks: Clearly the day is coming when using a SSD is going to be not merely useful but essential to get modern programs to perform decently.

I didn't say this just because programs are going to want to do more and more disk IO over time. Instead, I said it because of a traditional developer behavior, namely that developers mostly assess how fast their work is based on how it runs on their own machines, and developer machines are generally very beefy ones. At this point it's extremely likely that most developer machines have decently fast SSDs (and for good reason), which means it's actually going to be hard for developers to notice that they've written code that basically assumes a SSD and only runs acceptably on one (either in general or when some moderate corner case is triggered).

SSDs exacerbate this problem because they are not just fast in general but hugely faster at random IO than traditional hard drives. If you accidentally write something that is random IO heavy (or becomes so under some circumstances, perhaps as you scale up the size of the database) but only run it on a SSD based system, you might not really notice. Run that same thing on a HD based system (with a large database) and it will grind to a halt for ten minutes.
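For instance, a crude way to see this gap for yourself is to time the same number of small reads done sequentially and then at random offsets. A minimal Python sketch (point it at a file much larger than RAM so the page cache doesn't hide the IO; the block size and read count are arbitrary):

    import os
    import random
    import sys
    import time

    BLOCK = 4096      # bytes per read
    COUNT = 10000     # number of reads to time

    def time_reads(path, offsets):
        start = time.time()
        with open(path, "rb") as f:
            for off in offsets:
                f.seek(off)
                f.read(BLOCK)
        return time.time() - start

    def main(path):
        size = os.path.getsize(path)
        sequential = [i * BLOCK for i in range(COUNT)]
        scattered = [random.randrange(0, size - BLOCK) for _ in range(COUNT)]
        print("sequential: %.2f seconds" % time_reads(path, sequential))
        print("random:     %.2f seconds" % time_reads(path, scattered))

    if __name__ == "__main__":
        main(sys.argv[1])

On a HD the random run will typically be drastically slower than the sequential one; on a SSD the two are much closer together.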

(Today I don't think we have profiling tools for disk IO the way we do for CPU usage by code, so even if a developer wanted to check for this, their only real option is to find a machine with a HD and try things out. Perhaps part of the solution will be an 'act like a HD' emulation layer for software testing that does things like slowing down random IO. Of course it's much more likely that people will just say 'buy SSDs and stop bugging us', especially in a few years.)
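As a very rough sketch of what such an emulation layer might look like (the seek penalty and the 'nearby' threshold here are made-up numbers, and real HD behavior is more complicated than this):

    import time

    class FakeHDFile(object):
        # Toy wrapper that charges a seek penalty whenever a read lands far
        # from where the previous read ended, so random IO patterns slow
        # down roughly the way they would on spinning rust.
        SEEK_PENALTY = 0.008      # ~8ms, a plausible HD seek time
        NEARBY = 1024 * 1024      # reads within 1MB count as 'sequential enough'

        def __init__(self, path):
            self._f = open(path, "rb")
            self._pos = 0

        def read_at(self, offset, size):
            if abs(offset - self._pos) > self.NEARBY:
                time.sleep(self.SEEK_PENALTY)
            self._f.seek(offset)
            data = self._f.read(size)
            self._pos = offset + len(data)
            return data

        def close(self):
            self._f.close()

Something like this would at least make random-IO-heavy code visibly slow even when the test machine itself has a SSD.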


Comments on this page:

By Ewen McNeill at 2015-07-29 20:31:57:

Somewhat related, it turns out that JVM statistics can cause non-trivial pauses in the JVM due to waiting on disk IO, where non-trivial means "barely noticeable" on a SSD but fairly obvious pauses on spinning disks. It took them four months to figure that particular one out.

There's some support for debugging disk IO related issues in the aggregate ("wow, that disk is busy"), but not much for actually pinning it back to specific processes or specific actions of processes. In part it's complicated by a caching layer that decouples needing a disk write (eg, a file write or memory dirtying) from the actual write back to disk. But in part it's the synchronous writes (ie, ones that wait for completion) that hurt the most.
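For what it's worth, on Linux you can at least get cumulative per-process totals by sampling /proc/<pid>/io, which is roughly what iotop does. A rough Python sketch:

    import os

    def io_by_process():
        # Snapshot per-process cumulative disk IO from /proc/<pid>/io.
        results = {}
        for pid in os.listdir("/proc"):
            if not pid.isdigit():
                continue
            try:
                with open("/proc/%s/io" % pid) as f:
                    fields = dict(line.strip().split(": ") for line in f)
                with open("/proc/%s/comm" % pid) as f:
                    name = f.read().strip()
                results[(pid, name)] = (int(fields["read_bytes"]),
                                        int(fields["write_bytes"]))
            except (IOError, OSError, KeyError, ValueError):
                continue    # process exited or we can't read its io file
        return results

    if __name__ == "__main__":
        snap = sorted(io_by_process().items(),
                      key=lambda kv: sum(kv[1]), reverse=True)
        for (pid, name), (rd, wr) in snap[:10]:
            print("%8s %-20s read %12d  written %12d" % (pid, name, rd, wr))

read_bytes and write_bytes count IO that actually goes to the storage layer, so this sidesteps the page cache for attribution purposes, but it still doesn't tell you which actions of a process caused the IO.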

About the only other thing which saves many of these situations is that modern machines now have so much RAM that, eg, things that used to need IO-intensive external sorts or data structures can now just be done entirely in RAM. But that does require developers to realise that, eg, using 1GB of RAM can be a better choice these days than doing 10GB of disk IO to read through the data ten times.
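As a trivial illustration of that tradeoff (assuming a file of whitespace-separated records where the first field is the key), compare one pass that keeps all the counts in RAM with rescanning the whole file once per key:

    from collections import Counter

    def count_keys_in_ram(path):
        # One pass over the file, holding every count in memory.
        counts = Counter()
        with open(path) as f:
            for line in f:
                fields = line.split()
                if fields:
                    counts[fields[0]] += 1
        return counts

    def count_keys_by_rescanning(path, keys):
        # The IO-heavy alternative: a full scan of the file per key of interest.
        results = {}
        for key in keys:
            with open(path) as f:
                results[key] = sum(1 for line in f
                                   if line.split() and line.split()[0] == key)
        return results

The second version is the kind of code that feels harmless on a developer's SSD and hurts badly on a HD.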

Ewen

By liam at unc edu at 2015-08-03 09:39:29:

MacOS 10 is nearly at that point now.

By Anon at 2015-08-18 10:31:41:

There is IOgrind (https://wiki.gnome.org/Apps/iogrind) but it's many years old and has apparently bitrotted. I guess if you have root and don't mind an unrealistic setup you could use device mapper's dm-delay target to slow down I/O...
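For what it's worth, a rough sketch of the dm-delay approach (needs root, and the device name and delay here are purely illustrative; tear the result down afterwards with 'dmsetup remove'):

    import subprocess

    def make_slow_device(backing_dev, name="slowdisk", delay_ms=8):
        # Size of the backing device in 512-byte sectors.
        sectors = subprocess.check_output(
            ["blockdev", "--getsz", backing_dev]).strip()
        # dm-delay table line: <start> <length> delay <device> <offset> <delay ms>
        table = "0 %s delay %s 0 %d" % (sectors.decode(), backing_dev, delay_ms)
        subprocess.check_call(["dmsetup", "create", name, "--table", table])
        return "/dev/mapper/" + name

Mount the resulting /dev/mapper device somewhere and point the program under test at it.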

