Wandering Thoughts archives

2014-03-16

You don't have to reboot the system if init dies

One of the things that makes PID 1 special on many systems is that if it ever exits or dies for any reason, the system will reboot. This behavior was introduced by BSD Unix (V7 ignored the possibility) and makes a certain amount of sense; init is crucial both for reaping orphan processes and for restarting serial port logins. If it goes away, rebooting the system is an easy way to hopefully fix the situation.

However, this behavior is not set in stone. There are at least two alternatives. The first would be to simply have the kernel cope with no PID 1, handling and reaping orphan processes itself internally in some way (and possibly providing some special way for user level to start a new PID 1). The second is for the kernel to re-exec init as PID 1 if necessary. If PID 1 exits, the kernel would not tear down its process but instead act as if it had done an exec. Ideally this would be accompanied by some way for init to store and then reload important state. Done right, this actually provides a great way for init to transition itself into a new version: just record the current state, exit, and let the kernel re-exec the new init binary.
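
As a concrete sketch of what that upgrade dance might look like from init's point of view, expressed as shell for clarity (the state file and the dump helper are entirely hypothetical, and this assumes a kernel with the re-exec behavior):

dump-service-state > /run/init.state   # journal what's currently running
exit 0                                 # the kernel now re-execs /sbin/init as
                                       # PID 1; the new init sees the state file
                                       # and resumes instead of booting afresh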

Perhaps the second behavior sounds odd and crazy. Then I should probably tell you that this is current Solaris behavior and nothing seems to have exploded as a result. In other words we already have an existence proof that it's possible to change the semantics of PID 1 exiting, so we could adopt it elsewhere if desired.

Apart from the innate conservatism of Unixes, I think one reason that other Unixes haven't done this is that it's almost never necessary anyways. Since init not exiting is so crucial, people have devoted a lot of engineering effort to making sure that it doesn't happen and have been quite successful at it. Even radically different and complex systems like Upstart and systemd have been extremely stable this way in practice.

(Also, this 're-exec init on failure' behavior needs cooperation from your init, both so that init doesn't always start trying to boot the system when it's executed and so that it journals state periodically so that a new init can pick it up again. This makes it easier to add in certain sorts of Unixes, ie the ones where one team can control both kernel changes and init changes.)

InitDeathAndReboots written at 00:47:56; Add Comment

2014-02-14

The good and bad of the System V init system

The good of System V init is that it gave us several big improvements over what came before in V7 and BSD Unix. First and largest, it modularized the boot process; instead of a monolithic shell script (or two, if you counted /etc/rc.local) you had a collection of little ones, one for each separate service. This alone is a massive win and enabled all sorts of things that we take for granted today (for example, casually stopping or starting a service).

The other big change is that System V init turned the entire work of init from a collection of hacks into a systematic and generalized thing. It formally defined runlevels and runlevel transitions and created in /etc/inittab a general mechanism for specifying all of the work init did, from booting to running gettys on serial lines (or running anything) to how to reboot the system. System V init removed the magic and hardcoding in favour of transparency. Things like reboot stopped killing processes and making special system calls and turned into 'tell init to go into runlevel ...', and then /etc/inittab and runlevel transitions said what to do so that this actually rebooted the machine. In the process it added a way to specify how services shut down.

(Simply defining runlevels formally meant that other systems could now tell what state the system was in and behave differently between eg single user mode and multiuser mode.)
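
To illustrate the generality, classic /etc/inittab entries have the form 'id:runlevels:action:process'. These sample lines are modeled on a typical Linux System V setup, not taken from any particular system:

# default runlevel is 3
id:3:initdefault:
# run the runlevel 3 scripts when entering runlevel 3
l3:3:wait:/etc/rc.d/rc 3
# keep a getty running on the console, restarting it whenever it exits
co:2345:respawn:/sbin/agetty 38400 tty1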

The very general and high level view of the bad of the System V init system is that fundamentally all it does is blindly run shell scripts (and that only when the runlevel changes). This creates all sorts of lower-level consequences:

  • SysV init doesn't know what services are even theoretically running right now, much less which ones of them might have failed since they were started.

  • It doesn't know what processes are associated with what services. Even individual init scripts don't know this reliably, especially for modern multi-process services.

  • Even init scripts themselves can't be certain what the state of their service is. They must resort to ad hoc approaches like PID files, flag files for 'did someone run <script> start at some time this boot', checking process listings, and so on. These can misfire (see the sketch after this list).

  • Services are restarted in a different environment than the one they were started in on boot. Often contamination leaks into a restarted service (in the form of stray environment variables and other things).

  • Output from services being started is not logged or captured in any systematic way. Many init scripts simply throw it away and there's certainly no official proper place to put it.

  • The ordering of service starts is entirely linear, by explicit specification and guarantee. System V init explicitly says 'I start things in the following order'. There is no parallelism.

  • Services are only started and stopped when the runlevel changes. There is no support for starting services on demand, on events, or when their prerequisites become ready (or stopping them when a prerequisite is being shut down).

  • System V init has no idea of dependencies and thus no way for services to declare 'if X is restarted I need to be restarted too' or 'don't start me until X declares itself ready'.

  • There is no provision for restarting services on failure. Technically you can give your service a direct /etc/inittab entry (if it doesn't background itself) but then you move it outside of what people consider 'the init system' and lose everything associated with a regular init script.

  • Since init scripts are shell scripts, they're essentially impossible for programs to analyse to determine various things about them.

  • It's both hard and system-dependent to write a completely correct init script (and many init scripts are mostly boilerplate). As a result it's common for init scripts to not be completely correct.

  • Init scripts are not lightweight things in general, either in reading them to understand them or in executing them to do things.
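
Here is the sketch promised above: a minimal version of the ad hoc 'status' check a typical init script winds up doing (the daemon name and PID file path are hypothetical):

case "$1" in
status)
    # guess the service state from a PID file; both the file and the process
    # check can be stale or wrong, which is exactly the problem
    if [ -f /var/run/mydaemon.pid ] && kill -0 "$(cat /var/run/mydaemon.pid)" 2>/dev/null; then
        echo "mydaemon is running"
    else
        echo "mydaemon is not running"
        exit 3   # LSB exit code for 'not running'
    fi
    ;;
esac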

In theory you can try to fix many of these issues by adding workarounds in your standard init script functionality. Your 'standard' init script utilities would capture all daemon output in a documented place and way, start everything in cgroups (on Linux) or containers to track processes reliably, have support for restarting services on failure, carefully scrub every last bit of the environment on restarts, monitor things even after start, et cetera et cetera, and then you would insist that absolutely every init script use your utilities and only your utilities. In practice nothing like this has ever worked (people always show up with init scripts that have bugs, take shortcuts, or don't even try to use your complex 'standard' init utilities), and the result would not particularly be a 'System V init system' except in a fairly loose sense.

(It would also make each init script do even more work and run even more slowly than they do now.)

SystemVInitGoodBad written at 02:41:30; Add Comment

2014-02-13

Init's (historical) roles

Historically (by which I mean since at least V6 Unix), init aka PID 1 has had three and then four roles:

  1. It inherits orphan processes, ie processes that have had their regular parent exit. Doing this almost certainly simplified a bunch of V7 kernel code because it meant that every process has a parent process.

  2. Starting up the user level of Unix on boot. Originally this was done by running a monolithic shell script, as can still be sort of seen in OpenBSD. System V init modularized and generalized it into the multi-file form.

  3. Starting, managing, and restarting the getty processes for the console and (other) serial lines. System V init generalized this so that init started and restarted whatever you told it to via entries in /etc/inittab.

  4. Shutting down the user level of the system and rebooting. This role first appeared in System V init, using the modularity that it had introduced for booting. Modern BSDs also give init responsibility for rebooting (and it will run a shell script as part of this), but as late as 4.4 BSD reboot(8) did almost all of the work itself and there was no concept of running a shell script to bring services down in any orderly way; reboot(8) just killed everything in sight.

(Really. You can read the 4.4 BSD reboot(8) source if you want, it's not long. The violence starts at the 'kill(-1, SIGTERM)'.)

Putting the three (and then four) roles on the shoulders of a single process is likely due both to conservation of resources in early Unixes (given that they ran in very limited environments, they likely didn't want to take up memory with extra programs) and to a simple desire for the least complexity and effort. Once you had init as the inheritor of orphan processes you might as well make it do all the other roles, since it was already there. Why throw in additional programs without a good need? It probably helped that even in V7 the other two roles were pretty simple and minimal, per eg the V7 /etc/rc.

As a historical note, it was BSD Unix that decided that init was so crucial that the system should be rebooted if it ever exited. V7 Unix will probably get into an odd state if init ever exits but as far as I can tell from the kernel source PID 1 is not treated specially as far as exiting goes; as a practical matter V7 Unix just assumes it will never happen. Even what happens if /etc/init can't be executed on boot is not strictly a kernel thing in V7.

(In the initial environment of BSD, this decision was probably doubly correct. Even if you never have to deal with any orphaned processes or the kernel cleans them up itself (let's wave our hands aggressively here), losing init means that getty processes will not be restarted on serial lines when people log out, which over time makes it impossible for anyone to log in. Of course in the modern era of networked machines this is no longer such an issue and you probably care a lot more about sshd than about gettys.)

Some modern init systems have split some or most of these roles out from PID 1. Solaris, for example, moved everything except the first role to separate processes (the SMF stuff runs in svc.startd et al and getty processes are handled through ttymon and sac).

InitHistoricalRoles written at 00:21:04; Add Comment

2014-01-29

One cause of Linux's popularity among Unixes

Regardless of what you feel about it, I think that most people can agree that Linux is winning whatever is left of the Unix wars. It isn't the only Unix left but for a fairly long time now it's been the leading one, often the default choice. You can attribute this to good PR if you want to, but I happen to think that that's a mistake. Linux has attracted people partly because it has genuine attractions.

In light of my rant about the waste inherent in building packages yourself, it has struck me that one such advantage has been Linux's general wide availability of packages. As I mentioned, system administrators really appreciate not having to spend their limited time compiling ordinary things and Linux is very good at that; most major Linux distributions will give you a precompiled version of almost any standard Unix program you could want (or at least a precompiled program to do almost any standard job). I don't think it's an accident that one of the long term favorite distributions is Debian, which has one of the biggest package archives going.

(Prepackaged software is not good enough if you need a specific version of something compiled in a specific way. But for many Unix machines you just need a working and reasonably current version of whatever. And there are a lot of packages on many machines where the exact details are not crucial.)

At this point I have to mention FreeBSD's ports collection, which even comes in precompiled packages; logically one would expect this to be just as good a selling point for FreeBSD as a Linux distribution with a similar package selection. However, I'm not convinced that it is in practice, and for why I'll point at the name: 'ports'. Well, more what the name means or is perceived as meaning.

The packages in Debian's vast collection are all Debian together. Some of them are more important than others, but they are all part of the Debian whole. The dividing line between really important and less and less important is both relatively opaque to outsiders and somewhat subject to debate; it can get pushed back and forth if people want. By contrast, at least to an outsider, FreeBSD has a relatively sharp dividing line; you have FreeBSD core and then you have ports. Ports is clearly not the same thing, and to drive the point home ports install things into /usr/local instead of /usr. FreeBSD is probably at least as committed to ports as, say, Ubuntu is to packages in universe. But I'm not convinced that non-FreeBSD sysadmins who are looking at the situation really believe down in their guts that FreeBSD is as committed to ports as Ubuntu is to main (even if it is, and I don't believe it's that committed to all ports). And I think that that makes a difference.

(I am talking about non-FreeBSD sysadmins here because these are often the people who are making decisions about whether or not to use FreeBSD. Also that's the situation I'm in myself, so I don't know how it looks from the inside but I can talk with at least a little bit of authority how it feels from the outside.)

PS: I haven't mentioned commercial Unixes here because oh boy package availability on commercial Unixes, that's a funny joke. Provided by third parties at best. Red Hat Enterprise Linux is sort of in the same boat but at least they woke up and I think started doing something with EPEL.

LinuxPopularityOneCause written at 01:09:23; Add Comment

2014-01-25

The origin of RCS (the version control system)

Let's start with the tweets:

@johnregehr: quick! who remembers a revision control system before RCS?

@thatcks: SCCS. Bonus trivia: RCS exists because getting SCCS required paying AT&T extra money and universities don't have that.

I've mentioned this before in passing, but I might as well tell the full story (or at least folklore) here. The disclaimer is that this is the story as I heard it, not definitive history. A lot of Unix history goes around as folklore, or went around in the days when people were still passing Unix history around.

Version 7 Unix didn't ship with any version control system at all. SCCS, the first Unix version control system, first appeared in AT&T's PWB. PWB and things unique to it were not covered by AT&T's generous V7 university source code licenses; where you could get them at all (and I'm not sure outsiders could in any form before System III), AT&T apparently wanted extra money for them. Universities of course did not feel like paying extra money for niceties, and anyways they weren't using PWB; they were using V7 or later BSD and so would have had to port anything they wanted from PWB to BSD. All of this meant that if you were using Unix at a university in the late 1970s and early 1980s, you could look wistfully at SCCS from afar but you almost certainly could not get a copy.

Which is where RCS comes from. As the Wikipedia entry helpfully mentions, Walter Tichy wrote the initial version of RCS while at Purdue, where he had access to V7 and BSD but I assume not PWB. As the folklore goes, he wanted version control, could not get SCCS, and so wrote his own. As one did with Unix programs in an academic environment at the time, he released it for general use. Since quite a lot of universities were in a similar position of wanting some sort of version control on their Unix systems but not having SCCS, it got widely adopted.

As I mentioned before, RCS required (V7) Unix source code (specifically for diff), which might strike you as odd if Tichy wrote it from scratch. As I remember the story, RCS required some additional diff features, and in an early 1980s university environment with V7 and BSD source code the easiest way for Tichy to get them was to modify the BSD diff to support what RCS needed. The reason you can get RCS widely today is GNU diff, which is both free and directly supports the features that RCS needs without any patching.

(I think that one of the diff features RCS needed was a three-way diff. I believe it may have also wanted a somewhat different format of diff output, given GNU diff's -n argument.)

(The official RCS home page has some early RCS papers online.)

RCSOrigin written at 00:20:55; Add Comment

2013-12-31

Two uses of fmt

The venerable fmt program is not something that I normally think of as a command that I use in my scripts or in on-the-fly pipelines; I usually think of it more as, say, something that I use to reflow paragraphs in vi. But it has quietly been making its way into an increasing number of them, because it turns out that fmt is the easy and lazy way to do two symmetrical things: to word-split lines and to merge 'one word per line' output back into a single line.

Word splitting is straightforward:

somecmd | fmt -1

Every version of fmt that I've seen accepts this and does the right thing; you get one word per line after fmt.

Word joining is much more annoying and it's all because of GNU coreutils. Real versions of fmt accept very large format widths:

/bin/ls -1 | fmt -9999

Unfortunately the GNU coreutils version of fmt has a maximum width of 2500 characters. Worse, it has a 'goal' width that defaults to 93% of the actual width, so if you're worried about getting close to that limit you need to use:

/bin/ls -1 | fmt -2500 -g2500

In practice I usually use 'fmt -999' in command pipelines because my actual output line is going to be nowhere near 999 characters to start with.

(Usually when I'm doing word merging it's because I'm going to take the one line that results and paste it to something else, which imposes a relatively modest line length limit in practice.)

What this points out is that fmt is not really the ideal solution to this (and in fact the FreeBSD version of fmt also has oddities, such as the man page's description of behavior around the -n switch). The traditional Unix solution to these problems is tr, using it to either turn spaces into newlines or newlines into spaces. The problem for me in practice is that to use tr I need to remember or re-derive the octal value of newline (it's \012, by the way) and that is just a bit too much hassle. So I use fmt, vague warts and all.

(The other drawback of tr is that 'tr " " "\012"' will have a trailing space and no final newline. Usually this is not a big deal.)

Actually in writing this I've discovered that I'm behind the times. All of the versions of tr that I use today will accept \n instead of the octal literal. Either there was a day when this wasn't true or I just never read far enough in the tr manpage (and had it stick) to notice that particular feature. (I'm probably still going to keep on using fmt, though.)
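
For the record, the tr equivalents of both fmt tricks, given a tr that accepts \n:

somecmd | tr ' ' '\n'      # word splitting, like 'fmt -1'
/bin/ls -1 | tr '\n' ' '   # word joining; leaves a trailing space and no final newline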

FmtTwoUses written at 23:31:01; Add Comment

2013-12-30

My growing entanglement into vi

It started with vi becoming my sysadmin's editor, the editor that I used for quick edits because it was everywhere, worked in minimal environments, and started fast. But of course it didn't stop there. Any good tool has a virtuous circle where more use makes you more familiar with it and thus makes it easier to use, so you use it more; vi goes well beyond that in terms of rewarding extended use. Vi's march into my editing life has not been fast but it's feeling more and more relentless as time goes by, especially when I do things like specifically configure git to use vi instead of my normal default. I'm not using vi pervasively quite yet, but increasingly my major holdout (writing email in my full email environment) feels a little bit awkward.

(My normal default $EDITOR is a script that tries to intelligently pick the editor to use based on my current environment based on things like whether or not I have X available.)
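
A minimal sketch of that sort of picker might look like the following; the specific editors and the single test are illustrative assumptions, not the actual script's logic:

#!/bin/sh
# pick an editor based on the current environment; a sketch, not the real thing
if [ -n "$DISPLAY" ]; then
    exec emacs "$@"   # X is available: use a graphical editor
else
    exec vi "$@"      # minimal environment: fall back to vi
fi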

This has not fundamentally changed my view of vi as a whole (it remains not my favorite editor). I am simply being seduced by convenience and familiarity, and running into the limits and issues in my major other editor. Not that vi is bad (rather the contrary), but I still miss things from my other editors and often would sort of prefer to be using them.

(Possibly this attachment to my major other editor is just emotion speaking.)

While I've been learning additional vi (well, vim) features slowly over time, I still have not really attempted to become solidly familiar with Vim's advancements over the core vi editing commands (I'm going to wave my hands about the reasons why, but see above about vi still not being my favorite editor). If I get more seriously into vi, and it seems inevitable that I will, I should probably change that. My obvious weak areas are the areas where vi itself is weak: working fluidly with multiple files and also with split screens for editing two files simultaneously. Mastering this in Vim would remove one significant reason that I revert to other editors.

(I will probably always edit Python, C, and Go code in GNU Emacs when I have a choice. But there is a fair amount of other multi-file work that would at least be more convenient if I knew what I was really doing in Vim.)

I know that Vim has a universe of advanced text movement and text manipulation commands but I'm honestly not sure that I feel much interest in learning them. The mere fact that there is a universe of them is kind of daunting and I have no confidence that they'd speed up the sort of editing work that I do very much. Probably some of them would, so I suppose I should at least glance over the list to see if anything stands out.

(This has come out more rambling and thinking aloud than I thought it would. I do think that there's something interesting about how vi has wormed its way into my computing life as more and more the editor I reach for, but I don't have the words for it right now.)

ViEntanglement written at 02:57:39; Add Comment

2013-12-15

Making large selections in xterm (and urxvt and Gnome Terminal)

Suppose that you have a large chunk of output in a terminal window, specifically more than a full screen's worth, and you want to copy it into an email message, text file, or however else you may be logging it for the record. As I knew vaguely but had never really read up on or used until very recently, there is a convenient way to do this in xterm. Specifically, this is what the right mouse button is for; it extends the selection from where it is to the current point.

So in xterm what you do to make a huge selection is select a bit right at one end (the start or the end), scroll to the other end, and carefully hit the right mouse button where you want the selection to end. The selection is instantly extended. You can do this several times if you want, extending the selection each time. Odder and less easily controlled things happen if you hit the right mouse button somewhere inside the selection.

This doesn't work in Gnome Terminal. Instead what you have to do is start the selection with the left mouse button and while making it, drag the mouse cursor to the edge of the window (or outside the window). G-T will scroll things for you, extending the selection in the process. G-T's scrolling is sufficiently rapid that this is a reasonably convenient and intuitive process, arguably better than xterm's.

Urxvt gives you both options; you can extend the selection explicitly with the right mouse button or let urxvt scroll things for you in the same way as Gnome Terminal. The one drawback is that urxvt by default scrolls inconveniently slowly (and there doesn't seem to be any way to control this from what I can see in the manual). You can scroll with a mouse scrollwheel and it works reasonably well although a bit jumpily in my quick test.

(Xterm doesn't scroll at all if you drag the mouse out of the window while you make a selection.)

In a brief test, KDE's Konsole works the same way as Gnome Terminal. I suspect that this is going to be the common behavior of more or less all modern 'smart' terminal emulators because it makes the most sense and it's relatively discoverable (unlike the right mouse button in xterm).

XTermLargeSelections written at 01:55:04; Add Comment

2013-12-06

The three levels of read-only NFS mounts

It's sometimes useful to understand that there are three ways that an NFS mounted filesystem can be 'read-only'. Let's call them three levels:

  • You can mount the NFS filesystem read-only on the client. The client kernel will then enforce this, disallowing write actions and so on. These days this is generally mostly handled in high level VFS code, since it's common behavior across filesystems.

    As with all remote filesystems, this read-only status is purely local to your client machine. Your machine doesn't get to order the NFS server not to make any changes on the filesystem (that would be laughable) so the NFS server is perfectly entitled to allow the filesystem to change underneath you and to have other clients mount it read-write (and write to it). If NFS is working right, you will see those changes at some point.

  • The server can export the NFS filesystem read-only (either to you or just in general). The NFS server code will then disallow all write actions that clients send it, returning an appropriate 'read only filesystem' error to errant clients (if any). Even if the NFS mount is exported read-only to all clients, it's still valid for the exported filesystem to be changed locally on the NFS server.

    (As far as I know, whether or not the NFS export is read-only is invisible to the client. It's purely something internal to the server and can even change on the fly.)

  • On the server you can mount the exported filesystem read-only (or otherwise set it that way). On competent NFS servers this disallows all writes to the filesystem, regardless of whether they're NFS or local and regardless of whether the filesystem was exported read-only by the NFS server.

    (On competent NFS servers, all NFS server operations on the exported filesystem go through the VFS et al and so have the standard handling of read-only mounts applied to them automatically.)

These can certainly be stacked on top of each other (a read-only server filesystem, NFS exported as read-only and mounted as read-only on clients) but they don't have to be. For instance you can NFS export filesystems as read-only but mount them read-write on clients (we do this here for complex reasons).
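
As a concrete example of mixing the levels, here is what a read-only export with a read-write client mount might look like in Linux exports(5) syntax (the paths and server name are made up):

# on the server, in /etc/exports: export /data read-only to everyone
/data  *(ro,sync)

# on a client: the mount itself can be requested read-write...
mount -o rw server:/data /mnt/data
# ...but any actual write will come back from the server as 'read only filesystem'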

Now let's talk about atime and atime updates. In NFS, atime updates are the responsibility of the server, not the clients. More specifically they are generally the responsibility of the underlying server filesystem code or VFS, not specifically the NFS server code, and as such they can happen when you read data through a read-only NFS mount or even a read-only NFS export. The NFS client asks to read data, the NFS server code makes a general VFS 'get me data' call, and as a side effect of this the VFS or the filesystem updates the atime (if atime updates are enabled at all).

(This implies that not all client reads necessarily update the server atime, because a client may satisfy a read from its own file cache instead of going to the server.)

If you think about it this is actually a feature. If you have atime enabled on a read-write filesystem mount, you have told the (server) kernel that you want to know when people read data from the filesystem and lo, this is exactly what you are getting. The read-only NFS export is just to tell the NFS server that it should not allow people to do 'write' VFS operations.

(Since you can export the same filesystem read-write to some clients and read-only to others, suppressing atime updates on read-only NFS exports could also produce odd effects. Read a file from client A and the atime updates, read the file from client B and it doesn't. And all because you didn't trust client B enough to let it actually make (filesystem level) changes to your valuable filesystem.)

Sidebar: NFS exporting of read-only filesystems

You might think that the NFS export process should notice when it's exporting a read-only filesystem as theoretically read-write and silently change it to read-only for you. One of the problems with this is that on many systems it's possible to switch filesystems back and forth between read-only and read-write status through various mechanisms (not just mount). In practice you might as well let the NFS server accept the write operations and have the VFS then reject them; the outcome is the same while the system is simpler and behaves better in the face of various things happening.

NFSReadonlyLevels written at 03:02:09; Add Comment

2013-12-03

The three faces of sudo

For reasons beyond the scope of this entry I've recently been thinking about my attitudes towards sudo. Sudo is a complex program with a lot of options and several different ways of using it, and in the process of my thinking I've realized that for me it's effectively three things in one (and I feel differently about each facet). So here are my three faces of sudo:

  1. sudo as a replacement for having specific setuid programs. You're using it to give (passwordless) access to something for some group of people (or everyone); instead of writing a setuid program you use sudo to run a non-setuid program or script with the necessary privileges. Often you may want to wrap the sudo invocation up in a cover script so you can tell people 'just run /some/script'.

  2. sudo as a way of giving non-sysadmin staff limited and guarded access to especially privileged and dangerous operations. This is the traditional 'operators are allowed to run reboot' situation, which I'll summarize as 'restricted root powers'. Here the people using sudo are not full sysadmins and are not trusted to hold unrestricted root privileges.

  3. sudo as the way you access unrestricted root privileges, where use of sudo replaces su. You're encouraged to use sudo to run specific commands (even a bunch of commands) instead of using it to just get a root shell and then doing stuff from there.

    (In practice, use of sudo this way temporarily turns your current shell session into a peculiar privileged hybrid environment where you can use root powers casually by prefixing a command with sudo.)

I think that there are lots of uses for sudo as a replacement for setuid programs. Setuid programs are hard to write securely and can only be written in a few languages. Using sudo lets you more or less safely write 'setuid' programs in, say, shell scripts or Perl or the like. Invocation of them is a bit funny (you have to say 'sudo <program>') but that can be hidden by a cover script. We use this here for a number of things (eg) and it works great.
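
To make the pattern concrete, the sudoers entry and the cover script might look like this (the group name and script path are hypothetical; edit sudoers with visudo):

# /etc/sudoers fragment: the 'staff' group may run one specific script as
# root, with no password prompt
%staff ALL = (root) NOPASSWD: /usr/local/sbin/fixup-spool

#!/bin/sh
# cover script installed as 'fixup-spool' on people's $PATH, so users
# never have to remember the sudo invocation
exec sudo /usr/local/sbin/fixup-spool "$@"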

I'm less sanguine about sudo as a way to give out restricted root powers, especially if you let people run ordinary programs instead of only custom-designed scripts. Unless you're very careful it's easy to accidentally give people a way out of your restricted jail, since programs are generally not designed to enforce a restricted environment and contain all sorts of odd holes. For instance, if you allow people to run 'vi /some/file' as root you've just given them full root access if they want it. The whole area is a massive minefield if you're faced with an attacker.
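
The vi case is the classic illustration, because vi has always had shell escapes; a sudo-run vi is one command away from a root shell:

sudo vi /some/file
:!sh    # typed inside vi; the shell it spawns runs as root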

(This doesn't require your operators to be malicious. Unfortunately you've turned compromising an operator account into a great path towards root access.)

My feelings about sudo as a replacement for su are sufficiently complicated that they don't fit in this entry. The short version is that I think you're likely to be creating a different security model with different risks; how different they are depends on how you configure sudo. The more you make the risks of sudo match the risks of su, the more you turn sudo into su.

SudoThreeFaces written at 00:52:58; Add Comment

