Why Unix's lseek() has that name instead of 'seek()'

January 2, 2024

Over on the Fediverse Matthew Garrett said something which sparked a question from Nicolás Alvarez:

@mjg59: This has been bothering me for literally decades, but: why is the naming for fstat/lstat not consistent with fseek/lseek

@nicolas17: why is it even called lseek instead of seek?

The most comprehensive answer to both questions came from Zack Weinberg's post, with a posting by наб and some things from me adding further historical information about lseek(). So today I'm going to summarize the situation, along with some additional information that's not completely obvious.

The first version of Unix (V1) had a 'seek()' system call. Although C did not yet exist, this system call took three of what would be ints as arguments. Since Unix was being written on the Digital PDP-11, a '16-bit' computer, these future ints were the natural register size of the PDP-11, which is to say they were 16 bits. Even at the time this was recognized as a problem; the OCR'd V1 seek() manual page says (transformed from hard formatting, and cf):

BUGS: A file can conceptually be as large as 2**20 bytes. Clearly only 2**16 bytes can be addressed by seek. The problem is most acute on the tape files and RK and RF. Something is going to be done about this.
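To make the arithmetic in that BUGS note concrete, here is a minimal C sketch of what happens to a byte offset that has to fit in a 16-bit word. The function name offset_in_16_bits is made up for illustration, and treating the word as unsigned is a simplifying assumption of mine, not something the V1 manual states.

```c
#include <stdint.h>

/* Hypothetical helper: what survives of a byte offset once it is
 * squeezed into a single 16-bit word (treated as unsigned here, which
 * is an assumption made for illustration). */
uint16_t offset_in_16_bits(uint32_t wanted)
{
    return (uint16_t)(wanted & 0xFFFFu);
}
```

With this model, asking for byte 70000 lands you at byte 4464 (70000 - 65536), and a conceptual 2**20 byte file has sixteen times as many bytes as a single 16-bit offset can name.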

V1 also had a closely related tell() system call, which gave you information about the current file offset. The V1 seek() was system call 19, and tell() was system call 20. The tell() system call seems to have disappeared rapidly, but its system call number remained reserved for some time. In the V4 sysent.c it's 'no system call', and then in the V5 sysent.c system call 20 is getpid().

In V4 Unix, seek() still uses what are now C ints, but seek()'s manual page documents a very special hack to extend its range. If the third parameter is 3, 4, or 5 instead of 0, 1, or 2, the seek offset is multiplied by 512. At this point, C apparently didn't yet have a long type that could be used to get 32-bit integers on the PDP-11, so the actual kernel implementation of seek() used an array of two ints (in ken/sys2.c), an implementation that stays more or less the same through V6's kernel seek() (still in ken/sys2.c).
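The effect of the 3, 4, and 5 modes can be sketched as a small model in C. This is only an illustration of the documented behaviour, not the kernel's two-int implementation; the function v4_seek_model and its parameters are names I made up, and it assumes the offset is a signed 16-bit value in every mode, which is a simplification.

```c
#include <stdint.h>

/* Illustrative model only; hypothetical names, not V4 kernel code.
 * cur and size are the current file offset and the file size in bytes.
 * Returns 0 and stores the new offset in *result, or returns -1 for a
 * mode outside 0..5. */
int v4_seek_model(long cur, long size, int16_t offset, int mode, long *result)
{
    long delta = offset;

    if (mode < 0 || mode > 5)
        return -1;
    if (mode >= 3) {            /* modes 3, 4, 5: offset is in 512-byte units */
        delta *= 512;
        mode -= 3;
    }
    switch (mode) {
    case 0:  *result = delta;        break;  /* absolute position */
    case 1:  *result = cur + delta;  break;  /* relative to current offset */
    default: *result = size + delta; break;  /* relative to end of file */
    }
    return 0;
}
```

In this model the largest positive 16-bit offset, 32767, reaches byte 16776704 in mode 3, at the cost of only being able to land on 512-byte boundaries in a single call.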

(The V6 C compiler appears to have implemented support for a new 'long' C type modifier, but it doesn't seem to have been documented in the C manual or used in, eg, the kernel's seek() implementation. Interested parties can play around with it in places like this online V6 emulator.)

Then in V7, we have C longs and along with them a (renamed) version of the seek() system call that finally fixes the limited range issue by using longs instead of ints for the relevant arguments (the off_t type would be many years in the future). However, the V7 lseek() system call thriftily reuses seek()'s system call number 19 (cf libc/sys/lseek.s, and you can compare this against the V5 seek.s). It seems probable that this is why V7 renamed the system call from seek() to lseek(), in order to force any old code using seek() to fail to link. Since V7 C did not have function prototypes (they too were years in the future), old code that called seek() with int arguments would almost certainly have malfunctioned, passing random things from the stack to the kernel as part of the system call arguments.
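For contrast with the 16-bit days, here is a small sketch of today's POSIX lseek() reaching an offset that would not fit in a 16-bit int. It uses the modern off_t interface rather than V7's long, and the helper name size_after_seek and the scratch file location under /tmp are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: seek to `target` in a scratch file, write one
 * byte there, and return the resulting file size (-1 on any failure). */
off_t size_after_seek(off_t target)
{
    char path[] = "/tmp/lseek-demo-XXXXXX";
    int fd = mkstemp(path);
    off_t size = -1;

    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away once fd is closed */

    /* Position absolutely, write a byte, then ask where the end is. */
    if (lseek(fd, target, SEEK_SET) == target && write(fd, "x", 1) == 1)
        size = lseek(fd, 0, SEEK_END);

    close(fd);
    return size;
}
```

Calling size_after_seek(70000) should report a size of 70001 bytes, since the single byte is written at offset 70000, which is past anything the original 16-bit seek() could address without the multiply-by-512 hack.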

(Old V6 binaries were on their own, but presumably this wasn't seen as a problem in the early days of Unix.)

So the reason Unix uses 'lseek()' instead of 'seek()' is that it once had a 'seek()' system call that took ints as arguments instead of longs, and when this system call changed to take longs it was renamed to have an l in front to mark this, becoming 'lseek()'. The 'l' here is for 'long'. However, as covered by Zack Weinberg, this is an odd use of 'l' in Unix system call names. In the stat() versus lstat() case, the 'l' is for special treatment of symbolic links, and both versions of the system call still exist.


Comments on this page:

From 188.69.99.159 at 2024-01-05 05:38:31:

BUGS: A file can conceptually be as large as 2**20 bytes. Clearly only 2**16 bytes can be addressed by seek. The problem is most acute on the tape files and RK and RF. Something is going to be done about this.

Apparently the "something" was a change to have seek() take block numbers instead of bytes when it's used on a block device, and apparently that's why block devices exist as distinct from character devices...

From 188.69.99.159 at 2024-01-05 05:41:03:

Apparently the "something" was a change to have seek() take block numbers

(Ah, and the PDF is also from 1st Edition manual, I think. So it predates the 3/4/5 hack.)

Written on 02 January 2024.
Last modified: Tue Jan 2 23:03:15 2024