2009-03-12
The problem with /var today
When /var was created, people took everything in /usr that got written
to and just threw it all into one filesystem. After that, /var became
the place that you put anything (besides config files and the like) that
needed to change or be written to, regardless of why.
The problem is that /var has wound up with two very distinct sorts of
data in it: private program data and public data. Private program data
is the entire collection of caches, databases, and other tracking
information that various programs use to do their jobs. Public data is
everything that users and sysadmins create and look at, with things like
/var/mail, /var/log, user crontabs, and so on. (On some systems this may
include web pages, SQL databases, and more.)
This matters because the two differ greatly in importance and need very different handling for things like backups and operating system upgrades. Fundamentally, you don't care about private program data as long as the program works right, and you probably actively want to discard it when you do things like reinstall the system or roll back to a previous system snapshot. However, you absolutely must preserve public data when you do things like reinstall the system.
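As a rough illustration of the distinction, here is a minimal Python sketch of how a backup or reinstall script might decide what to keep. The prefix tables, the function names (classify, must_preserve) and the choice to keep anything unrecognized are all assumptions made for this example; real locations vary between Unix systems (user crontabs in particular live in different places), and directories such as /var/lib often hold both sorts of data, which is exactly the commingling problem.

```python
from pathlib import PurePosixPath

# Hypothetical classification tables. These prefixes are assumptions for
# illustration only; actual locations differ between Unix systems.
PUBLIC_PREFIXES = ("/var/mail", "/var/log", "/var/spool/cron")
PRIVATE_PREFIXES = ("/var/cache", "/var/run")


def _is_under(path, prefix):
    """True if path is the prefix itself or lives somewhere beneath it."""
    p = PurePosixPath(path)
    root = PurePosixPath(prefix)
    return p == root or root in p.parents


def classify(path):
    """Return 'public', 'private', or 'unknown' for a path under /var."""
    if any(_is_under(path, prefix) for prefix in PUBLIC_PREFIXES):
        return "public"
    if any(_is_under(path, prefix) for prefix in PRIVATE_PREFIXES):
        return "private"
    return "unknown"


def must_preserve(path):
    """Only data known to be private is discarded; unknown paths are kept."""
    return classify(path) != "private"
```

Under these assumptions, must_preserve("/var/mail/alice") is true and must_preserve("/var/cache/man") is false, while something like /var/lib/mysql comes out as 'unknown' and is kept to be safe. That such a table has to be maintained by hand at all is the point of the complaint.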
That the two sorts of data are aggressively commingled in /var causes
all sorts of practical problems for system management. Effectively, /var
has been turned into both a system filesystem and user filesystem, and
the two generally require very different and conflicting treatment.
Attempts to patch this up in software are awkward.
(For example, Sun's Live Upgrade stuff goes to all sorts of contortions
to try to copy some bits of your public data between various copies and
snapshots of your system's /var.)
The obvious solution is to split /var into two filesystems, one for each
sort of data. Unfortunately, changing Unix filesystem habits is a lot of
work (and work that really needs to be done by Unix vendors in order for
it to stick).
The not so secret history of /var
Originally, Unix had no /var; what is currently put there went into /usr
instead (with some of it going into /etc), so you had /usr/log,
/usr/spool, /usr/tmp, and so on. Remnants of this era still linger on in
/etc, where you still find a certain number of frequently updated data
files like /etc/passwd.
(One might sensibly ask why Unix had both /tmp and /usr/tmp. My guess is
that /tmp goes back to the days before /usr, so it had to be retained
when /usr was added, but at the same time people wanted a bigger scratch
space, so /usr/tmp was created.)
Then along came the idea of diskless workstations (I believe originally
from Sun). Even back then, /usr was the biggest system filesystem, so no
one was really enthused about the idea of each diskless system having
its own copy. Since at this point symlinks had been introduced, people
came up with the idea of moving everything writable from /usr into a new
filesystem, /var, and leaving symlinks behind so that people and
programs could continue to use old paths like /usr/tmp. This left /usr
read-only and shareable among all of your diskless clients, which saved
a lot of disk space.
(Indeed, a shared /usr and the accompanying disk space savings are
probably what made diskless clients viable in the first place.)
Over the years since then, the symlinks have been progressively removed on many systems. But today you can still find them on some systems that especially value backwards compatibility, for example Solaris 10.
In addition to moving things from /usr to /var, a certain number of
things were relocated from /etc to /var. Practically speaking this was
much less important, since you needed a separate / filesystem for each
diskless client anyways, but it did create a culture where system
daemons shouldn't normally write to /etc to store PID files and so on.