Wandering Thoughts archives


Peculiarities about Unix's statfs() or statvfs() API

On modern Unixes, the official interface to get information about a filesystem is statvfs(); it's sufficiently official to be in the Single Unix Specification as seen here. On Illumos it's an actual system call, statvfs(2). On many other Unixes (at least Linux, FreeBSD, and OpenBSD)), it's a library API on top of a statfs(2) system call (Linux, FreeBSD, OpenBSD). However you call it and however it's implemented, the underlying API of the information that gets returned is a little bit, well, peculiar, as I mentioned yesterday.

(In reality the API is more showing its age than peculiar, because it dates from the days when filesystems were simpler things.)

The first annoyance is that statfs() doesn't return the number of 'files' (inodes) in use on a filesystem. Instead it returns only the total number of inodes in the filesystem and the number of inodes that are free. On the surface this looks okay, and it probably was back in the mists of time when this was introduced. Then we got more advanced filesystems that didn't have a fixed number of inodes; instead, they'd make as many inodes as you needed, provided that you had the disk space. One example of such a filesystem is ZFS, and since we have ZFS fileservers, I've had a certain amount of experience with the results.

ZFS has to answer statfs()'s demands somehow (well, statvfs(), since it originated on Solaris), so it basically makes up a number for the total inodes. This number is based on the amount of (free) space in your ZFS pool or filesystem, so it has some resemblance to reality, but it is not very meaningful and it's almost always very large. Then you can have ZFS filesystems that are completely full and, well, let me show you what happens there:

cks@sanjuan-fs3:~$ df -i /w/220
Filesystem      Inodes IUsed IFree IUse% Mounted on
<...>/w/220        144   144     0  100% /w/220

I suggest that you not try to graph 'free inodes over time' on a ZFS filesystem that is getting full, because it's going to be an alarming looking graph that contains no useful additional information.

The next piece of fun in the statvfs() API is how free and used disk space is reported. The 'struct statvfs' has, well, let me quote the Single Unix Specification:

f_bsize    File system block size. 
f_frsize   Fundamental file system block size. 

f_blocks   Total number of blocks on file system
           in units of f_frsize. 

f_bfree    Total number of free blocks. 
f_bavail   Number of free blocks available to 
           non-privileged process. 

When I was an innocent person and first writing code that interacted with statvfs(), I said 'surely f_frsize is always going to be something sensible, like 1 Kb or maybe 4 Kb'. Silly me. As you can find out using a program like GNU Coreutils stat(1), the actual 'fundamental filesystem block size' can vary significantly among different sorts of filesystems. In particular, ZFS advertises a 'fundamental block size' of 1 MByte, which means that all space usage information in statvfs() for ZFS filesystems has a 1 MByte granularity.

(On our Linux systems, statvfs() reports regular extN filesystems as having a 4 KB fundamental filesystem block size. On a FreeBSD machine I have access to, statvfs() mostly reports 4 KB but also has some filesystems that report 512 bytes. Don't even ask about the 'filesystem block size', it's all over the map.)

Also, notice that once again we have the issue where the amount of space in use must be reported indirectly, since we only have 'total blocks' and 'available blocks'. This is probably less important for total disk space, because that's less subject to variations than the total amount of inodes possible.

unix/StatfsPeculiarities written at 23:46:13; Add Comment

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.