Peculiarities about Unix's statfs() or statvfs() API
On modern Unixes, the official interface to get information about
a filesystem is statvfs(); it's sufficiently official to be in
the Single Unix Specification.
On Illumos it's an actual system call, statvfs(2). On many other
Unixes (at least Linux, FreeBSD, and OpenBSD), it's a library API
on top of a statfs(2) system call. However you call it and however
it's implemented, the underlying API of the information that gets
returned is a little bit, well, peculiar, as I mentioned yesterday.
(In reality the API is more showing its age than peculiar, because it dates from the days when filesystems were simpler things.)
The first annoyance is that statfs() doesn't return the number
of 'files' (inodes) in use on a filesystem. Instead it returns only
the total number of inodes in the filesystem and the number of
inodes that are free. On the surface this looks okay, and it probably
was back in the mists of time when this was introduced. Then we got
more advanced filesystems that didn't have a fixed number of inodes;
instead, they'd make as many inodes as you needed, provided that
you had the disk space. One example of such a filesystem is ZFS,
and since we have ZFS fileservers,
I've had a certain amount of experience with the results.
ZFS has to answer statfs()'s demands somehow (well, statvfs(),
since it originated on Solaris), so it basically makes up a number
for the total inodes. This number is based on the amount of (free)
space in your ZFS pool or filesystem, so it has some resemblance
to reality, but it is not very meaningful and it's almost always
very large. Then you can have ZFS filesystems that are completely
full and, well, let me show you what happens there:
cks@sanjuan-fs3:~$ df -i /w/220
Filesystem      Inodes IUsed IFree IUse% Mounted on
<...>/w/220        144   144     0  100% /w/220
I suggest that you not try to graph 'free inodes over time' on a ZFS filesystem that is getting full, because it's going to be an alarming looking graph that contains no useful additional information.
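Since the API only hands back total and free inodes, anything that wants 'inodes in use' has to do the subtraction itself. Here is a minimal sketch in Python, using os.statvfs() from the standard library. The inode_usage() helper name is made up for illustration, and returning None for a filesystem that reports zero total inodes is an assumption about how you'd want to handle that case.

```python
import os
from typing import Optional, Tuple


def inode_usage(f_files: int, f_ffree: int) -> Tuple[int, Optional[float]]:
    """Derive (inodes in use, percent in use) from the two reported counters.

    statvfs() only reports the total (f_files) and free (f_ffree) inode
    counts, so 'used' has to be computed by subtraction.
    """
    used = f_files - f_ffree
    # Assumption: a filesystem may report zero total inodes, so guard the
    # division and return None for the percentage in that case.
    percent = (100.0 * used / f_files) if f_files > 0 else None
    return used, percent


if __name__ == "__main__":
    st = os.statvfs("/")
    used, percent = inode_usage(st.f_files, st.f_ffree)
    print(f"total={st.f_files} free={st.f_ffree} used={used} percent={percent}")
```

Fed the numbers from the df output above, inode_usage(144, 0) comes out as (144, 100.0), which is arithmetically right and tells you nothing about the filesystem beyond 'it is full'.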
The next piece of fun in the statvfs() API is how free and used
disk space is reported. The 'struct statvfs' has, well, let me
quote the Single Unix Specification:
f_bsize    File system block size.
f_frsize   Fundamental file system block size.
f_blocks   Total number of blocks on file system in units of f_frsize.
f_bfree    Total number of free blocks.
f_bavail   Number of free blocks available to non-privileged process.
When I was an innocent person and first writing code that interacted
with statvfs(), I said 'surely f_frsize is always going to
be something sensible, like 1 KB or maybe 4 KB'. Silly me. As you
can find out using a program like GNU Coreutils stat(1), the actual
'fundamental filesystem block size' can vary significantly among
different sorts of filesystems. In particular, ZFS advertises a
'fundamental block size' of 1 MByte, which means that all space
usage information in statvfs() for ZFS filesystems has a 1 MByte
granularity.
(On our Linux systems, statvfs() reports regular extN filesystems as
having a 4 KB fundamental filesystem block size. On a FreeBSD machine
I have access to, statvfs() mostly reports 4 KB but also has some
filesystems that report 512 bytes. Don't even ask about the 'filesystem
block size', it's all over the map.)
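If you don't have GNU stat(1) handy, you can look at both block sizes on your own filesystems with a few lines of Python. This is only a sketch: block_sizes() is a hypothetical helper name, and the default paths in the example are placeholders that you'd replace with your own mount points.

```python
import os
import sys
from typing import Tuple


def block_sizes(path: str) -> Tuple[int, int]:
    """Return (f_bsize, f_frsize) as reported by statvfs() for path."""
    st = os.statvfs(path)
    return st.f_bsize, st.f_frsize


if __name__ == "__main__":
    # Placeholder default; pass your own mount points as arguments.
    for path in sys.argv[1:] or ["/"]:
        bsize, frsize = block_sizes(path)
        print(f"{path}: f_bsize={bsize} f_frsize={frsize}")
```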
Also, notice that once again we have the issue where the amount of space in use must be reported indirectly, since we only have 'total blocks' and 'free blocks'. This is probably less important for total disk space, because that's less subject to variations than the total number of inodes possible.
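Putting the pieces together, getting byte counts out of statvfs() means multiplying the block counts by f_frsize and deriving 'used' by subtraction. Here is a hedged sketch. The function names are invented for illustration, and it assumes that f_bfree and f_bavail are in the same f_frsize units as f_blocks. That is the usual practice, but the text quoted above only spells out the unit for f_blocks.

```python
import os
from typing import Dict


def used_bytes(f_blocks: int, f_bfree: int, f_frsize: int) -> int:
    """Space in use, derived as (total blocks - free blocks) * f_frsize."""
    return (f_blocks - f_bfree) * f_frsize


def space_report(path: str) -> Dict[str, int]:
    """Byte counts for path, all computed in units of f_frsize.

    Assumption: f_bfree and f_bavail use the same unit as f_blocks.
    """
    st = os.statvfs(path)
    return {
        "total": st.f_blocks * st.f_frsize,
        "used": used_bytes(st.f_blocks, st.f_bfree, st.f_frsize),
        "free": st.f_bfree * st.f_frsize,
        "avail": st.f_bavail * st.f_frsize,
    }


if __name__ == "__main__":
    for name, value in space_report("/").items():
        print(f"{name}: {value} bytes")
```

Note that every figure here is a multiple of f_frsize, so on a filesystem that advertises a 1 MByte fundamental block size, none of them can be more precise than 1 MByte.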