Beware of trying to compare the size of subtrees with du
One of the things I like to do to understand space usage is to use du
to look at both the aggregate usage of a directory tree and a
breakdown of where the space is going (often with the handy -h options
to GNU du and sort). This is also something
you may wind up doing if you want to compare the disk space usage of
two versions of a directory tree and its subtrees (for example, the
disk space usage in / for two systems). However, there is a somewhat
subtle trap hiding in a comparison of subtree sizes, and that trap is
hardlinks.
The trap is very clearly described in the du info documentation:
If two or more hard links point to the same file, only one of the hard links is counted. The FILE argument order affects which links are counted, and changing the argument order may change the numbers and entries that ‘du’ outputs.
(This can be turned off with the -l option, if you remember.)
This is a fair decision on du's part. It wants to give you an
honest view of how much space in total is consumed by the top level
argument, which means counting hard links pointing to the same file
only once. Once it does that, it's easier to report the space usage
of hard linked files only in the first subtree it finds them in, and
it would be odd if the sum of the sizes of subtrees didn't add up to
the top level size.
However, this has some surprising consequences. First, you can get
different answers if you do 'du -h fred/barney' and if you do 'du
-h fred' and look at the line for fred/barney. If there are some
hard links in fred/barney that are for files in other parts of the
fred/ tree, the first du will include them but the second du might
exclude them from the fred/barney total, because they've already been
counted in another subtree.
Second, two versions of the same directory tree may report a different space breakdown between subtrees even if the total space is the same. GNU du doesn't promise to traverse directory trees in any particular order, which means it may encounter hard links in a different order in two versions of a directory tree. This can result in the space of hard linked files that cross between subtrees being attributed to different subtrees in different versions of the directory tree.
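Both consequences can be reproduced with a small Python sketch that imitates du's accounting. This is not how du itself is implemented. It is a simplified model that sums apparent file sizes (st_size) rather than disk blocks and ignores the space used by directories. The subtree_usage and demo functions and the fred/wilma directory are hypothetical names made up for the illustration.

```python
import os
import tempfile


def subtree_usage(tops, count_links=False):
    """Return {top: bytes} using du-like hard link accounting.

    Assumption: this sums apparent sizes (st_size), not disk blocks, and
    skips directory overhead, so the numbers will not match real du output.
    With count_links=True every link is charged, in the spirit of 'du -l'.
    """
    seen = set()  # (device, inode) pairs already charged to an earlier file
    usage = {}
    for top in tops:
        total = 0
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if st.st_nlink > 1 and not count_links:
                    key = (st.st_dev, st.st_ino)
                    if key in seen:
                        continue  # already counted in another place
                    seen.add(key)
                total += st.st_size
        usage[top] = total
    return usage


def demo():
    """Build a tiny fred/ tree with one hard link that crosses subtrees."""
    with tempfile.TemporaryDirectory() as root:
        barney = os.path.join(root, "fred", "barney")
        wilma = os.path.join(root, "fred", "wilma")  # hypothetical sibling
        os.makedirs(barney)
        os.makedirs(wilma)
        with open(os.path.join(wilma, "data"), "wb") as f:
            f.write(b"x" * 1000)
        os.link(os.path.join(wilma, "data"), os.path.join(barney, "data"))

        alone = subtree_usage([barney])
        wilma_first = subtree_usage([wilma, barney])
        barney_first = subtree_usage([barney, wilma])
        every_link = subtree_usage([wilma, barney], count_links=True)
        return (alone[barney], wilma_first[barney],
                barney_first[barney], every_link[barney])


if __name__ == "__main__":
    print(demo())
```

In this model, the size reported for fred/barney depends on whether it is measured alone, after fred/wilma, or before it. Only the count_links=True variant gives the same per-subtree number regardless of order.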
If the two directory trees are already only mostly the same and you're
trying to compare them to pick out the differences, all of this can wind
up leading you astray. If you du each tree and then look for space
differences in the subtrees to identify where things differ, you can
wind up seeing a distorted picture of what's really going on.
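One way to judge how much of this distortion is possible in a given tree is to look for files whose hard links sit in more than one top-level subtree. The following is a rough Python sketch of that check, not a du feature. The cross_subtree_links and demo functions and the wilma, shared and local names are hypothetical.

```python
import os
import tempfile
from collections import defaultdict


def cross_subtree_links(top):
    """Map (device, inode) -> sorted names of the immediate subdirectories
    of 'top' that contain a link to that file, keeping only files that
    appear in more than one subdirectory."""
    owners = defaultdict(set)
    for entry in os.scandir(top):
        if not entry.is_dir(follow_symlinks=False):
            continue
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if st.st_nlink > 1:
                    owners[(st.st_dev, st.st_ino)].add(entry.name)
    return {key: sorted(names) for key, names in owners.items()
            if len(names) > 1}


def demo():
    """One link crossing barney/wilma, one link that stays inside barney."""
    with tempfile.TemporaryDirectory() as root:
        fred = os.path.join(root, "fred")
        barney = os.path.join(fred, "barney")
        wilma = os.path.join(fred, "wilma")
        os.makedirs(barney)
        os.makedirs(wilma)

        shared = os.path.join(wilma, "shared")
        with open(shared, "wb") as f:
            f.write(b"x")
        os.link(shared, os.path.join(barney, "shared"))

        local = os.path.join(barney, "local")
        with open(local, "wb") as f:
            f.write(b"y")
        os.link(local, os.path.join(barney, "local2"))

        return list(cross_subtree_links(fred).values())


if __name__ == "__main__":
    print(demo())
```

Hard links that stay inside a single subtree are left out on purpose, since they don't move space between the subtree totals that du reports.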
If you need to compare space usage all of the way down, I think your
best choice is to remember 'du -l'. It will give you a misleading
picture about the total, aggregate space usage, but at least you'll
have an honest picture of where things differ. If you want to check
total aggregate space usage accurately, you can only do 'du -hs'
on a single thing at once and you'll have to manually work through
the trees piece by piece.