(Unix) Directory traversal and symbolic links
If and when you set out to traverse through a Unix directory
hierarchy, whether to inventory it or to find something, you have
a decision to make. I can put this decision in technical terms,
about whether you use stat() or lstat() when identifying
subdirectories in your current directory, or put it non-technically,
about whether or not you follow symbolic links that happen to point
to directories. As you might guess, there are two possible answers
here and neither is unambiguously wrong (or right). Which answer
programs choose depends on their objectives and their assumptions
about their environment.
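As a small illustration of the technical version of this decision, here is a minimal Python sketch of how stat() and lstat() disagree about a symbolic link that points to a directory. The helper names classify() and demo() are made up for this example, and it assumes a Unix system where you can create symbolic links in a temporary directory.

```python
import os
import stat
import tempfile


def classify(path):
    """Report how stat() and lstat() each see the same path."""
    followed = os.stat(path)     # follows a symbolic link at the end of the path
    unfollowed = os.lstat(path)  # describes the symbolic link itself
    return {
        "stat_says_dir": stat.S_ISDIR(followed.st_mode),
        "lstat_says_dir": stat.S_ISDIR(unfollowed.st_mode),
        "lstat_says_link": stat.S_ISLNK(unfollowed.st_mode),
    }


def demo():
    """Build a real directory plus a symlink to it and classify both."""
    with tempfile.TemporaryDirectory() as top:
        real = os.path.join(top, "real")
        link = os.path.join(top, "link")
        os.mkdir(real)
        os.symlink(real, link)
        return classify(real), classify(link)
```

For the real directory both calls agree that it is a directory. For the symbolic link, stat() reports a directory while lstat() reports a symbolic link, so a traversal built on stat() will descend into it and one built on lstat() will not.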
The safer decision is to not follow symbolic links that point to
directories, which is to say to use lstat() to find out what is and
isn't a directory. In practice, a Unix directory hierarchy
without symbolic links is a finite (although possibly large) tree
without loops, so traversing it is going to eventually end and not
have you trying to do infinite amounts of work. Partly due to this
safety property, most standard language and library functions to
walk (traverse) filesystem trees default to this approach, and some
may not even provide for following symbolic links to directories.
Examples are Python's os.walk(), which defaults to not following
symbolic links, and Go's filepath.WalkDir(), which doesn't even
provide an option to follow symbolic links.
(In theory you can construct both concrete and virtual filesystems that either have loops or, for virtual filesystems, are simply endless. In practice it is a social contract that filesystems don't do this, and if you break the social contract in your filesystem, it's considered your fault when people's programs and common Unix tools all start exploding.)
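To make the os.walk() default concrete, here is a hedged sketch that walks the same small tree twice, once with the default and once with followlinks=True. The directory names and the walk_dirs() and demo() helpers are invented for this example. The tree deliberately contains no loop, so both walks finish.

```python
import os
import tempfile


def walk_dirs(top, followlinks=False):
    """Return the directories os.walk() descends into, relative to top."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(top, followlinks=followlinks):
        visited.append(os.path.relpath(dirpath, top))
    return sorted(visited)


def demo():
    """Build top/a/link -> top/target and walk it both ways."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "a"))
        os.mkdir(os.path.join(top, "target"))
        os.mkdir(os.path.join(top, "target", "inner"))
        os.symlink(os.path.join(top, "target"), os.path.join(top, "a", "link"))
        return walk_dirs(top), walk_dirs(top, followlinks=True)
```

With the default, os.walk() still lists a/link among the directory names of a, but it never descends into it. With followlinks=True the contents of target show up a second time under a/link, which is exactly the kind of 'virtual' tree assembly (and duplicate work) that following symbolic links gets you.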
If a program follows symbolic links while walking directory trees,
it can be for two reasons. One of them is that the program wrote
its own directory traversal code and blindly used stat() instead of
lstat(). The other is that it deliberately decided to follow
symbolic links for flexibility. Following symbolic links is
potentially dangerous, since they can create loops, but it also
allows people to assemble a 'virtual' directory tree where the
component parts of it are in different filesystems or different
areas of the same filesystem. These days you can do some of this
with various sorts of 'bind' or 'loopback' mounts, but they generally
have more limitations than symbolic links do and often require
unusual privileges to set up. Anyone can make symbolic links to
anything, which is both their power and their danger.
(Except that sometimes Linux and other Unixes restrict what you can do with symlinks in some situations, for security reasons. These days the Linux sysctl is fs.protected_symlinks, which limits when symlinks in sticky world-writable directories such as /tmp will be followed, and your Linux probably has it turned on.)
Programs that follow symbolic links during directory traversal aren't wrong, but they are making a more dangerous choice and one hopes they did it deliberately. Ideally such a program might have some safeguards, even optional ones, such as aborting if the traversal gets too deep or appears to be generating too many results.
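As one possible shape for such safeguards, here is a minimal sketch of a traversal that deliberately follows symbolic links but protects itself in three ways: a depth limit, a limit on the number of results, and a record of the (device, inode) pairs it has already entered. The function name, the TraversalLimit exception, and the default limits are all assumptions made up for this example, not anything a particular program does.

```python
import os
import tempfile


class TraversalLimit(Exception):
    """Raised when one of the (hypothetical) safety limits is exceeded."""


def walk_following_links(top, max_depth=32, max_results=100_000):
    """Yield directories under top, following symlinks to directories."""
    seen = set()        # (st_dev, st_ino) of directories already entered
    count = 0
    stack = [(top, 0)]  # (path, depth) pairs still to visit
    while stack:
        path, depth = stack.pop()
        if depth > max_depth:
            raise TraversalLimit(f"too deep at {path}")
        try:
            st = os.stat(path)  # stat(), not lstat(): we follow symlinks
        except OSError:
            continue            # dangling link or unreadable entry
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue            # already visited, possibly through a loop
        seen.add(key)
        count += 1
        if count > max_results:
            raise TraversalLimit("too many results")
        yield path
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=True):
                        stack.append((entry.path, depth + 1))
        except OSError:
            pass                # skip directories we cannot read


def demo_loop():
    """Walk a tree where top/a/loop is a symlink back to top."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "a"))
        os.symlink(top, os.path.join(top, "a", "loop"))
        return sorted(os.path.relpath(p, top) for p in walk_following_links(top))
```

The (device, inode) check is what stops the loop in demo_loop(). The depth and result limits are cruder backstops for filesystems where that identity check isn't trustworthy or the tree is simply much bigger than expected.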
PS: You may find the OpenBSD symlink(7) manual page interesting reading on the general topic of following or not following symbolic links.