How Usenet used to be a filesystem stress test
Once upon a time, there was Usenet.
Wait, that's not far enough back. Once upon a time, Usenet software
used the simplest, most straightforward way to store articles. Each
newsgroup was a separate directory in the obvious directory hierarchy
(so rec.arts.anime.misc was rec/arts/anime/misc under the news
spool root directory) and each article was a file in that directory.
Cross-posted articles were hardlinked between all of the newsgroup
directories.
(Given that hardlinks can't cross filesystem boundaries, you may
notice an assumption here. Yes, this caused problems in the not
too long run.)
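To make that layout concrete, here is a minimal sketch of it in Python. The function names are hypothetical, and naming each file after its per-group article number is an assumption of this sketch; the description above only says that each article was a file in the newsgroup's directory.

```python
import os
import tempfile


def group_dir(spool_root, newsgroup):
    # rec.arts.anime.misc -> <spool_root>/rec/arts/anime/misc
    return os.path.join(spool_root, *newsgroup.split("."))


def store_article(spool_root, groups_and_numbers, body):
    """Store one article under every newsgroup it was posted to.

    groups_and_numbers is a list of (newsgroup, article_number) pairs.
    Using the article number as the file name is an assumption here.
    """
    paths = []
    for newsgroup, number in groups_and_numbers:
        directory = group_dir(spool_root, newsgroup)
        os.makedirs(directory, exist_ok=True)
        paths.append(os.path.join(directory, str(number)))

    # Write the article once, under the first newsgroup.
    first = paths[0]
    with open(first, "wb") as f:
        f.write(body)

    # Cross-posts are hardlinks to that one file. os.link() raises
    # OSError if the two directories are on different filesystems,
    # which is the assumption mentioned above.
    for other in paths[1:]:
        os.link(first, other)
    return paths


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for p in store_article(
        root,
        [("rec.arts.anime.misc", 1), ("rec.arts.anime", 7)],
        b"Subject: test\n\nhello\n",
    ):
        print(p, os.stat(p).st_nlink)
```

Both paths wind up naming the same inode, so the cross-posted article takes up disk space only once.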
Once Usenet started to have real volume, this design turned Usenet
spool filesystems into a marvelous (or hideous) worst-case stress test
for filesystem code:
- active newsgroups might have tens of thousands of articles, which
meant tens of thousands of entries in a single directory. At the
time when this started happening, all filesystems used linear
searches through directory data when looking up names.
(I believe but am not completely sure that Usenet was a major
driving force behind the initial work on non-linear directory
lookups.)
- file creates were usually randomly distributed around these
directories, partly because servers generally made no attempt to
batch articles from one newsgroup together when they propagated
things around.
- file deletes were semi-random; articles might expire earlier or
later than other articles in the same newsgroup for various reasons.
(The first Usenet software did truly random file deletes; later
software at least ordered the article deletions based on what
directory they were in.)
- for a long time, the files were quite small (Usenet spool filesystems
often needed the inode-to-data ratio adjusted to create more inodes).
Once alt.binaries got active, the size distribution was extremely
lumpy; a bunch of small files, a lot of very large ones, and very
little in the middle.
- Usenet was effectively write-mostly random IO (at many sites, most
Usenet articles were never read except by the system). Even when
read IO was 'sequential' in some sense, as someone read through a
bunch of articles in a single newsgroup, it wasn't sequential at the
OS level because each article was a small separate file.
(Just to trip filesystems up, there were some large files that
were read sequentially.)
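To put a rough number on the first point in the list above, here is a toy model of a directory that is searched linearly. It is only a sketch, not how any particular filesystem was implemented: the directory is a plain Python list, the function names are hypothetical, and the assumption is that every create first scans the whole directory to check that the name is not already there.

```python
def linear_lookup(entries, name):
    """Scan entries in order; return (found, number of comparisons)."""
    comparisons = 0
    for entry in entries:
        comparisons += 1
        if entry == name:
            return True, comparisons
    return False, comparisons


def cost_of_creates(n):
    """Total name comparisons needed to create n distinct files in one
    directory, when each create scans for an existing entry first."""
    entries = []
    total = 0
    for i in range(n):
        name = str(i)
        found, comparisons = linear_lookup(entries, name)
        total += comparisons
        if not found:
            entries.append(name)
    return total


if __name__ == "__main__":
    for n in (100, 1000, 2000):
        print(n, cost_of_creates(n))
```

In this model, creating n files costs n(n-1)/2 comparisons in total, so doubling the number of articles in a newsgroup roughly quadruples the work. At 10,000 entries that is about 50 million comparisons just for the creates, before any reads or expiry.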
Really, Usenet spools had it all, especially once the alt hierarchy
got rolling. Now you may have a better understanding of why I said
earlier that an old-style Usenet filesystem would be a ZFS scrub worst
case.
(And it is not surprising that the traditional Usenet spool format
was eventually replaced by a more optimized storage format in INN.)