Wandering Thoughts archives

2023-09-07

(Unix) Directory traversal and symbolic links

If and when you set out to traverse through a Unix directory hierarchy, whether to inventory it or to find something, you have a decision to make. I can put this decision in technical terms, about whether you use stat() or lstat() when identifying subdirectories in your current directory, or put it non-technically, about whether or not you follow symbolic links that happen to point to directories. As you might guess, there are two possible answers here and neither is unambiguously wrong (or right). Which answer programs choose depends on their objectives and their assumptions about their environment.
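Here is a minimal Python sketch of the technical version of the decision. The helper names are hypothetical and this is my illustration, not code from any particular program. os.stat() follows a symbolic link to whatever it points at, while os.lstat() reports on the link itself, so only the stat() version thinks a symlink to a directory is a directory:

```python
import os
import stat
import tempfile


def is_dir_via_stat(path):
    # os.stat() follows symlinks, so a symlink to a directory looks like a directory.
    return stat.S_ISDIR(os.stat(path).st_mode)


def is_dir_via_lstat(path):
    # os.lstat() describes the symlink itself, which is never a directory.
    return stat.S_ISDIR(os.lstat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        real = os.path.join(top, "real")
        link = os.path.join(top, "link")
        os.mkdir(real)
        os.symlink(real, link)
        print("stat():", is_dir_via_stat(link))    # stat(): True
        print("lstat():", is_dir_via_lstat(link))  # lstat(): False
```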

The safer decision is to not follow symbolic links that point to directories, which is to say to use lstat() to find out what is and isn't a directory. In practice, a Unix directory hierarchy without symbolic links is a finite (although possibly large) tree without loops, so traversing it will eventually end instead of leaving you trying to do an infinite amount of work. Partly because of this safety property, most standard library functions for walking (traversing) filesystem trees default to this approach, and some may not even provide a way to follow symbolic links to directories. Examples are Python's os.walk(), which defaults to not following symbolic links, and Go's filepath.WalkDir(), which doesn't even provide an option to follow them.
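You can see os.walk()'s default in action with a small tree that contains a symbolic link to a directory somewhere else. This is an illustrative sketch; walked_dirs() is a hypothetical helper and the directory names are made up:

```python
import os
import tempfile


def walked_dirs(top, followlinks):
    # Return the directories os.walk() actually descends into, relative to top.
    return sorted(
        os.path.relpath(dirpath, top)
        for dirpath, _dirnames, _filenames in os.walk(top, followlinks=followlinks)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        tree = os.path.join(base, "tree")
        elsewhere = os.path.join(base, "elsewhere")
        os.makedirs(os.path.join(elsewhere, "sub"))
        os.mkdir(tree)
        # tree/link is a symlink to a directory outside tree.
        os.symlink(elsewhere, os.path.join(tree, "link"))
        print(walked_dirs(tree, followlinks=False))  # ['.']
        print(walked_dirs(tree, followlinks=True))   # ['.', 'link', 'link/sub']
```

Even with the default, os.walk() still lists the symlink among the dirnames it reports for tree; it just doesn't descend into it.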

(In theory you can construct both concrete and virtual filesystems that either have loops or, for virtual filesystems, are simply endless. In practice it is a social contract that filesystems don't do this, and if you break the social contract in your filesystem, it's considered your fault when people's programs and common Unix tools all start exploding.)

If a program follows symbolic links while walking directory trees, it can be for two reasons. One of them is that the program wrote its own directory traversal code and blindly used stat() instead of lstat(). The other is that it deliberately decided to follow symbolic links for flexibility. Following symbolic links is potentially dangerous, since they can create loops, but it also allows people to assemble a 'virtual' directory tree where the component parts of it are in different filesystems or different areas of the same filesystem. These days you can do some of this with various sorts of 'bind' or 'loopback' mounts, but they generally have more limitations than symbolic links do and often require unusual privileges to set up. Anyone can make symbolic links to anything, which is both their power and their danger.
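In hand-written traversal code, the whole decision often comes down to a single call. Here is a minimal sketch of a recursive walker (walk() is a hypothetical function, not a library API) where one flag picks between os.lstat() and os.stat(). With stat(), the walk goes through a symlink into a directory elsewhere, which is exactly how a 'virtual' directory tree gets assembled:

```python
import os
import stat
import tempfile


def walk(top, follow_symlinks=False):
    """Yield every path below top, recursing into subdirectories.

    The only thing follow_symlinks changes is which call decides whether an
    entry is a directory: os.lstat() (don't follow) or os.stat() (follow).
    """
    statfn = os.stat if follow_symlinks else os.lstat
    for name in sorted(os.listdir(top)):
        path = os.path.join(top, name)
        yield path
        try:
            mode = statfn(path).st_mode
        except OSError:
            continue  # e.g. a dangling symlink when following links
        if stat.S_ISDIR(mode):
            yield from walk(path, follow_symlinks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        tree = os.path.join(base, "tree")
        elsewhere = os.path.join(base, "elsewhere")
        os.mkdir(tree)
        os.mkdir(elsewhere)
        open(os.path.join(elsewhere, "data"), "w").close()
        os.symlink(elsewhere, os.path.join(tree, "link"))
        for follow in (False, True):
            print(follow, [os.path.relpath(p, tree) for p in walk(tree, follow)])
        # False ['link']
        # True ['link', 'link/data']
```

Note that this sketch has no protection at all against symlink loops when follow_symlinks is true.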

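(Except that sometimes Linux and other Unixes will refuse to follow your symlinks in some situations, for security reasons. These days the Linux sysctl for this is fs.protected_symlinks, and your Linux system probably has it turned on; it only lets a symlink in a sticky, world-writable directory such as /tmp be followed by the symlink's owner, or when the symlink's owner also owns the directory.)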
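On Linux, sysctls are also visible as files under /proc/sys, so a program can check this setting as well as you can with 'sysctl fs.protected_symlinks'. Here is a small sketch; the function name is my own, a value of 1 means the restriction is on and 0 means it's off, and on some systems the file may be readable only by root, in which case this simply returns None:

```python
from pathlib import Path

# Linux exposes sysctls as files under /proc/sys, with dots turned into slashes.
PROTECTED_SYMLINKS = Path("/proc/sys/fs/protected_symlinks")


def protected_symlinks_setting():
    """Return fs.protected_symlinks as an int (0 = off, 1 = on), or None if it
    can't be read: not Linux, no /proc, or no permission to read the file."""
    try:
        return int(PROTECTED_SYMLINKS.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("fs.protected_symlinks =", protected_symlinks_setting())
```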

Programs that follow symbolic links during directory traversal aren't wrong, but they are making a more dangerous choice and one hopes they did it deliberately. Ideally such a program would have some safeguards, even optional ones, such as aborting if the traversal gets too deep or appears to be generating too many results.
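Here is one sketch of what such safeguards might look like in Python, under my own assumptions about limits; the function, the exception, and the default values are all hypothetical. It deliberately follows symlinks, but it remembers the (st_dev, st_ino) identity of every directory on the current path so that a symlink pointing back up the tree isn't entered again, and it gives up if the walk gets too deep or produces too many results:

```python
import os
import stat
import tempfile


class TraversalLimitExceeded(Exception):
    """Raised when a guarded walk goes too deep or finds too many entries."""


def guarded_walk(top, max_depth=64, max_results=100_000):
    """Return paths below top, deliberately following symlinks to directories.

    Safeguards (all hypothetical choices for this sketch):
    - a directory whose (st_dev, st_ino) is already on the current path is a
      loop, so it is listed but not entered again;
    - going deeper than max_depth or collecting more than max_results entries
      raises TraversalLimitExceeded instead of carrying on.
    """
    results = []

    def visit(path, depth, ancestors):
        if depth > max_depth:
            raise TraversalLimitExceeded(f"deeper than {max_depth} at {path}")
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            results.append(child)
            if len(results) > max_results:
                raise TraversalLimitExceeded(f"more than {max_results} results")
            try:
                st = os.stat(child)  # follows symlinks, on purpose
            except OSError:
                continue  # dangling symlink, or the entry vanished
            if not stat.S_ISDIR(st.st_mode):
                continue
            ident = (st.st_dev, st.st_ino)
            if ident in ancestors:
                continue  # this symlink leads back up the tree: a loop
            visit(child, depth + 1, ancestors | {ident})

    top_st = os.stat(top)
    visit(top, 0, frozenset({(top_st.st_dev, top_st.st_ino)}))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tree:
        os.mkdir(os.path.join(tree, "a"))
        # tree/a/back points back at tree itself, creating a loop.
        os.symlink(tree, os.path.join(tree, "a", "back"))
        print([os.path.relpath(p, tree) for p in guarded_walk(tree)])
        # ['a', 'a/back']
```

Tracking only the directories on the current path, rather than every directory ever visited, means a directory reached through two different symlinks is still listed twice. That's a choice; a global 'visited' set is the stricter alternative.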

PS: You may find the OpenBSD symlink(7) manual page interesting reading on the general topic of following or not following symbolic links.

unix/DirectoryTraversalAndSymlinks written at 23:23:39
