LiveJournal and the path to NoSQL
A while back on Mastodon, I said:
Having sort of distantly observed the birth of NoSQL from the outside, I have a particular jaundiced view of its origins that traces it back to LiveJournal's extensive wrangling with MySQL (although this is probably not entirely correct, since I was a distant outside observer who was not paying much attention).
Oddly, this view is more flattering to NoSQL than many takes on its origins that I've seen. LJ's problems & solutions in the mid 00s make a good case for NoSQL.
LiveJournal aren't the only people who had problems with scaling MySQL in the early and mid 2000s, but they do have the distinction of having made a bunch of public presentations about them, many of which I read with a fair degree of interest back in the days. So let me rewind time to the early 2000s and quickly retrace LiveJournal's MySQL journey. To set the stage, this was a time before clouds and before affordable SSDs; if you ran a database-backed website at scale, you had real machines somewhere and they used hard drives.
LiveJournal started out with a single MySQL database machine. When the load overwhelmed it, they added read replicas (and then later in memory caching, creating memcached). However, their read replicas were eventually overwhelmed by write volume; reads can be served from one of N replicas, but the master and all replicas see the same write load. To deal with this, LiveJournal wound up manually sharding parts of their overall MySQL database into smaller 'user clusters', having already accepted that they never wanted to do what would be the equivalent of cross-cluster database joins.
At the point where you've manually sharded your SQL database, from a global view you don't really have an SQL database any more. You can't do cross-shard joins, cross-shard foreign keys, or cross-shard transactions, and you don't even have a guaranteed schema for your database since every shard has its own copy and they might not be in synchronization (they should be, but stuff happens). You're almost certainly querying and modifying your database through a higher level application library that packages up the work of finding the right shard, handling cross-shard operations, and so on, not by opening up a connection and sending SQL down it. The SQL is an implementation detail.
These problems aren't LiveJournal's alone. You need all of this sharding to operate at scale in the mid 2000s, because hard drives imposed a hard limit on how many reads and writes a second you could do. You couldn't get away from the tyranny of hard drive limitations on (seek) IO operations a second the way you can today with SSDs and now NVMe drives (and large amounts of RAM). And if you have to shard and don't have an SQL database after you shard, you might as well use a database that's designed for this environment from the start, one that handles the sharding, replication, and so on for you instead of making you build your own version. If it's really fast, so much the better.
(This idea is especially attractive to small startups, who don't have the people to build LiveJournal level software and libraries themselves. Small startups are busy enough trying to build a viable product without trying to also build a sharded database environment.)
Today, in a world of SSDs, NVMe drives, large amounts of RAM, cloud providers, managed SQL compatible database offerings, and so on, it's tempting to laugh at the idea of NoSQL. But at the time I think it looked like a sensible response to the challenges that large websites were facing.
(Whether so many people should have adopted NoSQL databases is a separate issue. But my impression is that startups are nothing if not ambitious.)