Why Unix's lseek() has that name instead of 'seek()'
Over on the Fediverse Matthew Garrett said something which sparked a question from Nicolás Alvarez:
@mjg59: This has been bothering me for literally decades, but: why is the naming for fstat/lstat not consistent with fseek/lseek
@nicolas17: why is it even called lseek instead of seek?
The most comprehensive answer to both questions came from Zack
Weinberg's post, with a posting by наб and also some things from me
adding additional historical information about lseek(). So today
I'm going to summarize the situation with some additional information
that's not completely obvious.
The first version of Unix (V1) had a 'seek()' system call. Although
C did not yet exist, this system call took three of what would be
ints as arguments. Since Unix was being written on the Digital
PDP-11, a '16-bit' computer, these future ints were the natural
register size of the PDP-11, which is to say they were 16 bits. Even
at the time this was recognized as a problem; the OCR'd V1 seek()
manual page says (transformed from hard formatting):
BUGS: A file can conceptually be as large as 2**20 bytes. Clearly only 2**16 bytes can be addressed by seek. The problem is most acute on the tape files and RK and RF. Something is going to be done about this.
V1 also had a closely related tell() system call that gave you
information about the current file offset. The V1 seek() was system
call 19, and tell() was system call 20. The tell() system call seems
to disappear rapidly, but its system call number remained reserved
for some time. In the V4 sysent.c it's 'no system call', and then in
the V5 sysent.c system call 20 is getpid().
In V4 Unix, seek() still uses what are now C ints, but seek()'s
manual page documents a very special hack to extend its range. If
the third parameter is 3, 4, or 5 instead of 0, 1, or 2, the seek
offset is multiplied by 512. At this point, C apparently didn't yet
have a long type that could be used to get 32-bit integers on the
PDP-11, so the actual kernel implementation of seek() used an array
of two ints (in ken/sys2.c), an implementation that stays more or
less the same through V6's kernel seek() (still in ken/sys2.c).
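To make the multiply-by-512 hack concrete, here is a minimal sketch in modern C of how such an offset calculation could work. It is a model, not the V4 kernel code: the function name model_v4_seek and its parameters are hypothetical, it assumes modes 0, 1, and 2 mean 'from the start', 'from the current position', and 'from the end of the file', and it glosses over how the real kernel treated signed versus unsigned offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a V4-style seek() offset calculation.
 * Assumption: modes 0, 1, 2 are "from start", "from current position",
 * and "from end of file"; modes 3, 4, 5 are the same three cases with
 * the 16-bit offset scaled by 512. The wide result is something the
 * real kernel had to keep in an array of two ints. */
int32_t model_v4_seek(int32_t cur, int32_t size, int16_t offset, int mode)
{
    assert(mode >= 0 && mode <= 5);

    int32_t off = offset;          /* widen before scaling */
    if (mode >= 3) {
        off *= 512;                /* offset is counted in 512-byte units */
        mode -= 3;
    }
    switch (mode) {
    case 0:  return off;           /* absolute position */
    case 1:  return cur + off;     /* relative to the current position */
    default: return size + off;    /* relative to the end of the file */
    }
}
```

In this model, model_v4_seek(0, 0, 100, 0) yields 100 while model_v4_seek(0, 0, 100, 3) yields 51200, which is how a 16-bit argument could reach well past the 2**16 byte mark.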
(The V6 C compiler appears to have implemented support for a new
'long' C type modifier, but it doesn't seem to have been documented
in the C manual or used in, eg, the kernel's seek() implementation.
Interested parties can play around with it in places like this
online V6 emulator.)
Then finally in V7, we have C longs and along with them a (renamed)
version of the seek() system call that finally fixes the limited
range issue by using longs instead of ints for the relevant
arguments (the off_t type would be many years in the future).
However, the V7 lseek() system call thriftily reuses seek()'s system
call number 19 (cf libc/sys/lseek.s, and you can compare this against
the V5 seek.s).
It seems probable that this is why V7 renamed the system call from
seek() to lseek(), in order to force any old code using seek() to
fail to link. Since V7 C did not have function prototypes (they too
were years in the future), old code that called seek() with int
arguments would almost certainly have malfunctioned, passing random
things from the stack to the kernel as part of the system call
arguments.
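The mismatch can be illustrated with a small simulation in modern C. This is a sketch under stated assumptions rather than a reproduction of the PDP-11 calling convention: arguments are modelled as consecutive 16-bit words, a long is assumed to be stored high-order word first (as PDP-11 C did), and the names lseek_args and decode_as_lseek are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a callee expecting (int fd, long offset, int whence) would see
 * in this simplified model. */
struct lseek_args {
    int16_t fd;
    int32_t offset;
    int16_t whence;
};

/* Decode four 16-bit argument words as lseek()-style arguments.
 * Assumption: the long occupies two words, high-order word first. */
struct lseek_args decode_as_lseek(const uint16_t *words, size_t nwords)
{
    assert(nwords >= 4);

    struct lseek_args a;
    a.fd = (int16_t)words[0];
    a.offset = (int32_t)(((uint32_t)words[1] << 16) | words[2]);
    a.whence = (int16_t)words[3];   /* beyond what an old caller pushed */
    return a;
}
```

If old code pushed only the three words of seek(3, 100, 0), the decoder would combine the old offset and the old 'whence' into one long offset (100 * 65536 here) and take the new 'whence' from whatever word happened to come next.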
(Old V6 binaries were on their own, but presumably this wasn't seen as a problem in the early days of Unix.)
So the reason Unix uses 'lseek()' instead of 'seek()' is that it
once had a 'seek()' system call that took ints as arguments instead
of longs, and when this system call changed to take longs it was
renamed to have an l in front to mark this, becoming 'lseek()'. The
'l' here is for 'long'. However, as covered by Zack Weinberg, this
is an odd use of 'l' in Unix system call names. In the stat() versus
lstat() case, the 'l' is for special treatment of symbolic links,
and both versions of the system call still exist.