Wandering Thoughts

2017-10-10

An interesting way to leak memory with Go slices

Today I was watching Prashant Varanasi's Go release party talk Analyzing production using Flamegraphs, and starting at around 28 minutes in the talk he covered an interesting and tricky memory leak involving slices. For my own edification, I'm going to write down a version of the memory leak here and describe why it happens.

To start with, the rule of memory leaks in garbage collected languages like Go is that you leak memory by retaining unexpected references to things. The garbage collector will find and free things for you, but only if they're actually unused. If you're retaining a reference to them, they stick around. Sometimes this is ultimately straightforward (perhaps you're deliberately holding on to a small structure but not realizing that it has a reference to a bigger one), but sometimes the retention is hiding in the runtime implementation of something. Which brings us around to slices.

Simplified, the code Prashant was dealing with was maintaining a collection of currently used items in a slice. When an item stopped being used, it was rotated to the end of the slice and then the slice was shrunk by truncating it (maintaining the invariant that the slice only had used items in it). However, shrinking a slice doesn't shrink its backing array; in Go terms, it reduces the length of a slice but not its capacity. With the underlying backing array untouched, that array retained a reference to the theoretically discarded item and all other objects that the item referenced. With a reference retained, even one invisible to the code, the Go garbage collector correctly saw the item as still used. This resulted in a memory leak, where items that the code thought had been discarded weren't actually freed up.

Now that I've looked at the Go runtime and compiler code and thought about the issue a bit, I've come to the obvious realization that this is a generic issue with any slice truncation. Go never attempts to shrink the backing array of a slice, and it's probably impossible to do so in general since a backing array can be shared by multiple slices or otherwise referenced. This obviously strongly affects slices that refer to things that contain pointers, but it may also matter for slices of plain old data things, especially if they're comparatively big (perhaps you have a slice of Points, with three floats per point).

For slices containing pointers or structures with pointers, the obvious fix (which is the fix adopted in Uber's code) is to nil out the trailing pointer(s) before you truncate the slice. This retains the backing array at full size but discards references to other memory, and it's the other memory where things really leak.
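
As a minimal sketch of both the leak and the fix (the Item type here is a hypothetical stand-in, not the actual code from the talk or from Uber):

type Item struct {
  buf []byte // memory that we would like to see freed
}

// Leaky: the slice shrinks, but the backing array still holds a
// pointer to the discarded *Item, keeping it (and its buf) alive.
func removeLast(items []*Item) []*Item {
  return items[:len(items)-1]
}

// Fixed: nil out the trailing pointer first, so the backing array no
// longer references the discarded item.
func removeLastFixed(items []*Item) []*Item {
  last := len(items) - 1
  items[last] = nil
  return items[:last]
}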

For slices where the actual backing array may consume substantial memory, I can think of two things to do here, one specific and one generic. The specific thing is to detect the case of 'truncation to zero size' in your code and specifically nil out the slice itself (set it to nil), instead of just truncating it with a standard slice truncation. The generic thing is to explicitly force a slice copy instead of a mere truncation (as covered in my entry on slice mutability). The drawback here is that you're forcing a copy, which might be much more expensive. You could optimize this by only forcing a copy in situations where the slice capacity is well beyond the new slice's length.
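
A sketch of that conditional copy approach (the shrink() function and its 2x threshold are my own arbitrary choices, not measured numbers):

// Truncate hl to newlen, copying to a new (smaller) backing array if
// the old one has much more capacity than we now need.
func shrink(hl []string, newlen int) []string {
  if newlen == 0 {
    return nil
  }
  if cap(hl) > 2*newlen {
    t := make([]string, newlen)
    copy(t, hl[:newlen])
    return t
  }
  return hl[:newlen]
}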

Sidebar: Three-index slice truncation considered dangerous (to garbage collection)

Go slice expressions allow a rarely used third index to set the capacity of the new slice in addition to the starting and ending points. You might thus be tempted to use this form to limit the slice as a way of avoiding this garbage collection issue:

slc = slc[:newlen:newlen]

Unfortunately this doesn't do what you want it to and is actively counterproductive. Setting the new slice's capacity doesn't change the underlying backing array in any way or cause Go to allocate a new one, but it does mean that you lose access to information about the array's size (which would otherwise be accessible through the slice's capacity). The one effect it does have is forcing a subsequent append() to reallocate a new backing array.
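
You can see both effects in a small demonstration; the original backing array is untouched (and still fully reachable through the original slice), while the append() is forced to allocate a new one:

s := []string{"a", "b", "c", "d"}
t := s[:2:2]        // len 2, cap 2, same backing array as s
t = append(t, "x")  // cap is exhausted, so append allocates a new array
fmt.Println(s)      // [a b c d]; s's backing array is unchanged
fmt.Println(t)      // [a b x]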

GoSlicesMemoryLeak written at 00:26:09; Add Comment

2017-10-09

JavaScript as the extension language of the future

One of the things I've been noticing as I vaguely and casually keep up with technology news is that JavaScript seems to be showing up more and more as an extension language, especially in heavy-usage environments. The most recent example is Cloudflare Workers, but there are plenty of other places that support it, such as AWS Lambda. One of the reasons for picking JavaScript here is adequately summarized by Cloudflare:

After looking at many possibilities, we settled on the most ubiquitous language on the web today: JavaScript.

JavaScript is ubiquitous not just in the browser but beyond it. Node.js is popular on the server, Google's V8 JavaScript engine is apparently reasonably easy to embed into other programs, and then there's at least Electron as an environment to build client-side applications on (and if you build your application in JavaScript, you might as well allow people to write plugins in JavaScript). But ubiquity isn't JavaScript's only virtue here; another is that it's generally pretty fast and a lot of people are putting a lot of money into keeping it that way and speeding it up.

(LuaJIT might be pretty fast as well, but Lua(JIT) lacks the ecology of JavaScript, such as NPM, and apparently there are community concerns.)

This momentum in JavaScript's favour seems pretty strong to me as an outside observer, especially since its use in browsers ensures an ongoing supply of people who know how to write JavaScript (and who probably would prefer not to have to learn another language). JavaScript likely isn't the simplest option as an extension language (either to embed or to use), but if you want a powerful, fast language and can deal with embedding V8, you get a lot from using it. There are also alternatives to V8, although I don't know if any of them are particularly small or simple.

(The Gnome people have Gjs, for example, which is now being used as an implementation language for various important Gnome components. As part of that, you write Gnome Shell plugins using JavaScript.)

Will JavaScript start becoming common in smaller scale situations, where today you'd use Lua or Python or the like? Certainly the people who have to write things in the extension language would probably prefer it; for many of them, it's one fewer language to learn. The people maintaining the programs might not want to embed V8 or a similar full-powered engine, but there are probably lighter weight alternatives (there's at least one for Go, for example). These may not support full modern JavaScript, though, which may irritate the users of them (who now have to keep track of who supports what theoretically modern feature).

PS: Another use of JavaScript as an 'extension' language is various NoSQL databases that are sometimes queried by sending them JavaScript code to run instead of SQL statements. That databases are picking JavaScript for this suggests that more and more it's being seen as a kind of universal default language. If you don't have a strong reason to pick another language, go with JavaScript as the default. This is at least convenient for users, and so we may yet get a standard by something between default and acclamation.

JavaScriptExtensionLanguage written at 01:46:27; Add Comment

2017-09-25

What I use printf for when hacking changes into programs

I tweeted:

Once again I've managed to hack a change into a program through brute force, guesswork, determined grepping, & printf. They'll get you far.

First off, when I say that I hacked a change in, I don't mean that I carefully analyzed the program and figured out the correct and elegant place to change the program's behavior to what I wanted. I mean that I found a spot where I could add a couple of lines of code that reset some variables and when the dust settled, it worked. I didn't carefully research my new values for those variables; instead I mostly experimented until things worked. That's why I described this as being done with brute force and guesswork.

The one thing in my list that may stand out is printf (hopefully the uses of grep are pretty obvious; you've got to find things in the source code somehow, since you're certainly not reading it systematically). When I'm hacking a program up like this, the concise way to describe what I'm using printf for is that I use printf to peer into the dynamic behavior of the program.

In theory how the program behaves at runtime is something that you can deduce from understanding the source code, the APIs that it uses, and so on. This sort of understanding is vital if you're actively working on the program and you want relatively clean changes, but it takes time (and you have to actually know the programming languages involved, and so on). When I'm hacking changes into a program, I may not even know the language and I certainly don't have the time and energy to carefully read the code, learn the APIs, and so on. Since I'm not going to deduce this from the source code, I take the brute force approach of modifying the program so that it just tells me things like whether some things are called and what values variables have. In other words, I shove printf calls into various places and see what they report.

I could do the same thing with a debugger, but generally I find printf-based debugging easier and often I'd have to figure out how to hook up a debugger to the program and then make everything work right. For that matter, I may not have handy a debugger that works well with whatever language the program happens to use. Installing and learning a new debugger just to avoid adding some printfs (or the equivalent) is rather a lot of yak shaving.

PrintfToSeeDynamicBehavior written at 00:14:53; Add Comment

2017-09-24

Reading code and seeing what you're biased to see, illustrated

Recently I was reading some C code in systemd, one of the Linux init systems. This code is run in late-stage system shutdown and is responsible for terminating any remaining processes. A simplified version of the code looks like this:

void broadcast_signal(int sig, [...]) {
   [...]
   kill(-1, SIGSTOP);

   killall(sig, pids, send_sighup);

   kill(-1, SIGCONT);
   [...]
}

At this point it's important to note that the killall() function manually scans through all remaining processes. Also, this code is called to send either SIGTERM (plus SIGHUP) or SIGKILL to all or almost all processes.

The use of SIGSTOP and SIGCONT here is a bit unusual, since you don't need to SIGSTOP processes before you kill them (or send them signals in general). When I read this code, what I saw in their use was an ingenious way of avoiding any 'thundering herd' problems when processes started being signalled and dying, so I wrote it up in yesterday's entry. I saw this, I think, partly because I've had experience with thundering herd wakeups in response to processes dying and partly because in our situation, the remaining processes are stalled.

Then in comments on that entry, Davin noted that SIGSTOPing everything first also did something else:

So, I would think it's more likely that the STOP/CONT pair are designed to create a stable process tree which can then be walked to build up a list of processes which actually need to be killed. By STOPping all other processes you prevent them from forking or worse, dieing and the process ID being re-used.

If you're manually scanning the process list in order to kill almost everything there, you definitely don't want to miss some processes because they appeared during your scan. Freezing all of the remaining processes so they can't do inconvenient things like fork() thus makes a lot of sense. In fact, it's quite possible that this is the actual reason for the SIGSTOP and SIGCONT code, and that the systemd people consider avoiding any thundering herd problems to be just a side bonus.

When I read the code, I completely missed this use. I knew all of the pieces necessary to see it, but it just didn't occur to me. It took Davin's comment to shift my viewpoint, and I find that sort of fascinating; it's one thing to know intellectually that you can have a too-narrow viewpoint and miss things when reading code, but another thing to experience it.

(I've had the experience where I read code incorrectly, but in this case I was reading the code correctly but missed some of the consequences and their relevance.)

CodeReadingNarrowness written at 00:53:22; Add Comment

2017-08-31

With git, it's useful to pick the right approach to your problem

One of the things about Git is that once you go past basic committing, it generally has any number of ways for you to do what you want done. As with programming languages, part of getting better at Git is learning to pick the right idiom to attack your problem with. I can't claim that I'm good at this, but I am getting more experience, and recently I had an interesting experience here.

I use Byron Rakitzis' version of rc as my shell, which these days can be found on Github. Well, as has come up before, I don't actually use (just) that official version. I have my own change (to add a built-in read command) and because it was there, I also use some completion improvements, which come from Bert Münnich's collection of interesting rc modifications.

(I originally tried out Bert Münnich's version to get an important improvement before it became an official change. This change also shows the awesome power of raising an issue even if you expect that it's hopeless, as well as exploring Github forks of a project you're interested in.)

Recently some improvements have landed in the main repo that Bert Münnich has not yet rebased his modifications on top of. The other day I decided that I should update my own version to pick up these changes, because it turned out that I wanted to rebuild it on Fedora 26 anyway (that's its own story). The obvious way to do this was a straight rebase on top of the main repo, so that's what I did first.

The end result worked, but getting there took a bunch of effort. Bert Münnich's modifications include things like changing the build system from GNU Autoconf to a simple Makefile based one, so there were a bunch of clashes and situations that I had to resolve by hand (and I wasn't entirely confident of my own changes, since I was modifying Münnich's modifications without fully understanding either them or the upstream changes they clashed with). It felt like I was both working too hard and creating a fragile edifice for myself, so at the end I took a step back and asked what I really wanted and if there was a simpler, better way to get it.

When I did this I realized that what I really wanted was the upstream with my addition plus only a few of Bert Münnich's modifications (I've become addicted to command completion). While I could have created this with more rebasing, there was a much simpler approach (partly enabled by a better understanding of Git remotes and so on):

  1. Create a new clone of the main repo.
  2. Add Münnich's repo and my previous everything-together local repo as additional remotes.
  3. git fetch from both new remotes in order to make all of their commits available locally.

  4. git cherry-pick my addition by its commit hash.

  5. git cherry-pick the modifications I wanted from Münnich's repo, which only amounted to a few of them. Again I did this commit by commit using the commit hash, rather than trying to do anything more sophisticated. One or two cherry-picks required minor adjustment; since I'd already had to deal with them during the rebase, it was easy to fix things up.

While having done the rebase helped me deal with the conflicts during cherry-picking, the cherry-picking still felt much easier. I could have arrived at the same place with an interactive rebase (which would have let me drop modifications I'd decided I either didn't want or didn't care about), but I think it would have felt more indirect and chancy. Cherry-picking more directly expressed my intentions; I wanted my change and then this, this, and this from another tree. Done.

(In both cases, the git repo I wind up with probably can't be used for further rebases against Münnich's repo, just for rebases against the main repo.)

Stepping back, thinking about what I wanted, and then finding the right mechanism to express this in Git worked out very well. When I switched from rebasing to cherry-picking, I went from feeling that I was fighting git to get what I wanted to feeling that I was directly and naturally expressing myself and git was doing just what I wanted. Of course the real trick here is having the Git knowledge and experience to realize what the good way is; had I not had some experience, I might not have been familiar enough with cherry-picking to reach for it here. And there are undoubtedly Git manipulations that I don't even know exist, so I'll never pick them as the right option.

(As a side note, this isn't really the copy versus move situation that I ran into before. Instead it's much more that I'm gluing together a completely new branch that happens to be made with bits and pieces from some other branches (and after I'm done the other branches aren't of direct interest to me).)

GitPickingRightApproach written at 22:25:52; Add Comment

2017-08-27

Is bootstrapping Go from source faster using Go 1.9 or Go 1.8?

A while back I wrote about building the Go compiler from source and then where the time goes when bootstrapping Go with various Go versions. In both of these, I came to the conclusion that bootstrapping with Go 1.8.x was the fastest option. Now that Go 1.9 is out, an interesting question is whether using Go 1.9 as your bootstrap Go compiler makes things appreciably faster. One particular reason to wonder about this is that the Go 1.9 announcement specifically mentions that Go 1.9 compiles functions in a package in parallel. Building Go from source builds the standard library, which involves many packages with a lot of functions.

I will cut to the chase: Go 1.9 is slightly faster for this, but not enough that you should rush out and change your build setup. It also appears that the speed increases are higher on heavily multi-CPU machines, such as multi-socket servers, and lower on basic machines like old four-core desktops. In the best case the improvements appear to be on the order of a couple of seconds out of a process that takes 30 seconds or so (even on that old basic machine).

(In the process of trying to test this, I discovered that the Go build process appears to only partially respect GOMAXPROCS. I believe that some things more or less directly look up how many CPU cores your system has in order to decide how many commands to run in parallel.)
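
(As an illustration of the difference, here are the two runtime calls involved; this is not the build code itself. Anything that sizes its parallelism from runtime.NumCPU() will ignore a lowered GOMAXPROCS setting.)

fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // the current setting; an argument of 0 just reports it
fmt.Println("NumCPU:", runtime.NumCPU())          // how many CPUs the OS exposes, regardless of GOMAXPROCS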

The overall result is a little bit disappointing, but we can't have everything and Go already builds very fast. It's also possible that the standard library is a bad codebase to show off this function-level parallelism. I could speculate about possible reasons why, but it would be just that, speculation, so for once I'm going to skip it.

(You get a small entry for today because I'm a bit tired right now. Plus, even negative results are results and thus are at least potentially worth reporting.)

GoBuildTime1.8Vs1.9 written at 00:48:09; Add Comment

2017-08-23

Why Go changes its minimum version requirements for OSes (and hardware)

Yesterday I wrote about the unexpected (to me) risk of Go ending support for older OS versions. This raises the obvious question of why Go does this and the related question of what is affected by the increased version requirements. I'm not going to try to give exhaustive answers here; instead, I have some examples.

There are several reasons why Go changes its requirements, depending on what is moving forward. To start with CPU architecture requirements, Go 1.8 said about its ARM support:

Go 1.8 will be the last release to support Linux on ARMv5E and ARMv6 processors: Go 1.9 will likely require the ARMv6K (as found in the Raspberry Pi 1) or later. [...]

Requiring more recent processors lets Go use instructions and other CPU features that first became available on those processors, which simplifies and improves code generation and so on. In a casual scan of the Go commit log, I can't spot a specific change that requires ARMv6K or later, but this issue comes close and is illustrative of the sort of things that come up.

Next we have the case of OpenBSD in Go 1.9's draft release notes:

Go 1.9 now enables PT_TLS generation for cgo binaries and thus requires OpenBSD 6.0 or newer. Go 1.9 no longer supports OpenBSD 5.9.

This is for thread local storage, per the OpenBSD ELF manpage, and PT_TLS is supported on other platforms. The code review, Github issue, and especially the commit show how removing the special case for OpenBSD not supporting PT_TLS simplified various bits of code.

Finally, moving a minimum OS version requirement forward lets the standard library use better system calls and so on that are only supported on more recent versions of the OS (or drop workarounds for things that are no longer needed). See, for example, this issue about (future) code cleanups when Go drops support for FreeBSD 9.x and below, or this commit fixing Go 1.9 on FreeBSD 9.3, where the problem is that FreeBSD 9.3 doesn't have a nicer, newer system call for pipe creation.

(While I'm accumulating links, there are Go issues for defining the OS version deprecation policy and defining it for old architectures. The agreed on policy appears to be that people get one release's advance warning.)

For CPU architecture changes and changes like the OpenBSD case, the impact is generally going to be pervasive or at least unpredictable (Go may not generate PT_TLS ELF segments all of the time, but I have no idea when it will or won't). You should probably assume that all code generated by an updated compiler is affected and will probably not run in unsupported environments. For changes in the standard library, you might get away with an unsupported environment if you don't do anything that touches the changed area. However, this is not as simple as not using os.Pipe() (for example), because other things in the standard library (and other packages that you use) may use the changed bit. For os.Pipe() you'd need to avoid a number of things in os/exec and possibly other things elsewhere.

GoWhyOSRequirementsChange written at 23:35:49; Add Comment

An unexpected risk of using Go is it ending support for older OS versions

A few years ago I wrote The question of language longevity for new languages, where I used a concern about Go's likely longevity as the starting point to talk about this issue in general. The time since then has firmed up Go's position in general and I still quite like working in it, but recently a surprising and unexpected issue has cropped up here that is giving me some cause for thought. Namely, the degree to which Go will or won't continue to support older versions of OSes.

(By 'Go' here I mean the main Go implementation. Alternate implementations such as gccgo have their own, different set of supported OS versions and environments.)

Like most compilers, Go has a set of minimum version requirements for different OSes. It's actually fairly hard to find out what all of these are; the requirements for major binary release platforms can be found here, but requirements for platforms may only show up in, for example, the Go 1.8 release notes. Probably unsurprisingly, Go moves these minimum requirements forward every so often, usually by dropping support for OS versions that aren't officially supported any more. A couple of topical examples, from the draft Go 1.9 release notes, are that Go 1.9 will require OpenBSD 6.0 and will be the last Go release that supports FreeBSD versions before 10.3 (theoretically Go 1.8 supports versions as far back as FreeBSD 8).

(The best platform requirements documentation appears to be in the Go wiki, here and here, which was a bit hard to find.)

I'm sure that for many people, Go's deprecation of older and now unsupported OS versions is not a problem because they only ever have to deal with machines running OS versions that are still supported, even for OSes (such as OpenBSD) that have relatively short support periods. Perhaps unfortunately, I don't operate in such an environment; not for OpenBSD, and not for other OSes either. The reality is that around here there are any number of systems that don't change much (if at all) and just quietly drift out of support for one reason or another, systems that I want or may want to use Go programs on. This makes the degree to which Go will continue to support old systems somewhat of a concern for me.

On the other hand, you can certainly argue that this concern is overblown. Building Go from source and keeping multiple versions around is easy enough, and old binaries of my programs built with old Go versions are going to keep working on these old, unchanging systems. The real problems would come in if I wanted to do ongoing cross-platform development of some program and have it use features that are only in recent versions of Go or the standard library. Life gets somewhat more exciting if I use third party packages, because those packages (or the current versions of them) may depend on modern standard library things even if my own code doesn't.

(If my program doesn't use the very latest shiny things from the standard library, I can build it with the latest Go on Linux but an older Go on OpenBSD or FreeBSD or whatever.)

GoOSVersionsRisks written at 00:56:31; Add Comment

2017-08-13

Sorting out slice mutability in Go

I've recently been writing some Go code that heavily uses and mutates slices, mostly through append(), often with slices passed as arguments to functions that then returned what were theoretically different, mutated slices. This left me uncertain whether both I and my code were doing the right thing or whether I was creating hidden bear traps for myself.

So let's start out with some completely artificial example code:

func test1(h string, hl []string) []string {
  return append(hl, h)
}

// Add h at position p, moving the current
// element there to the end.
func test2(h string, p int, hl []string) []string {
  t := append(hl, hl[p])
  t[p] = h
  return t
}

Suppose you call both of these functions with a []string slice that you want to use later in its current state, ie you want it to not be changed by either call. Is it the case that this will be true for either or both functions?

The answer turns out to be no. Both functions can mutate the original slice in visible ways. Yes, even test1() can have this effect, and here's a demonstration:

func main() {
  t := []string{"a", "b", "c",}
  t2 := test1("d", t)
  t3 := test1("fred", t2)
  _ = test1("barney", t2)
  fmt.Printf("t3[4] is %s\n", t3[4])
}

Although you would expect that t3[4] is "fred", because that's what we appended to t2 to create t3, it is actually "barney" (on the Go Playground, at least, and also on my 64-bit x86 Linux machine).

In Go, slices are a data structure built on top of arrays, as covered in Go Slice: usage and internals. When you generate a slice from an explicit array, the slice is backed by and uses that array. When you work purely with slices (including using append()), as we are here, the resulting slices are backed by anonymous arrays; these anonymous arrays are where the actual data involved in your slice is stored. These anonymous arrays may be shared between slices, and when you copy a slice (for example by calling a function that takes a slice as its argument), you do not create a new copy of the anonymous array that backs it.

Slices have a current length and a maximum capacity that they can grow to. If you call append() on a slice with no capacity left (where len(slice) is cap(slice)), append() has to create a new backing array for the slice and copy all the current elements over into it. However, if you call append() on a slice that has remaining capacity, append() simply uses a bit of the remaining capacity in the underlying backing array; you get a new slice from append(), but it is still using the same backing array. If this backing array is also used by another slice that you care about, problems can ensue.
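
You can watch the capacity behavior that sets up the earlier demonstration by printing len() and cap() at each step (the exact capacities after growth are an implementation detail; the numbers in the comments are simply what current versions happen to do):

t := []string{"a", "b", "c"}
fmt.Println(len(t), cap(t))    // 3 3
t2 := append(t, "d")
fmt.Println(len(t2), cap(t2))  // 4 6: the capacity grew past the length
t3 := append(t2, "e")          // fits in t2's spare capacity, so t3 shares t2's backing array
fmt.Println(len(t3), cap(t3))  // 5 6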

With test2(), we have a relatively straightforward and obvious case. If append() doesn't have to create a new backing array, we'll mutate the existing one by changing the string at position p. Writing to an existing element of a slice is a clear warning sign here, and it's not too hard to look out for this in your code (and in functions that you call, such as sort.Strings).

With test1() things are more subtle. What is going on here is that when append() doesn't have to increase a slice's capacity, it ends up writing the new element to the original backing array. Our program arranges for t2's anonymous backing array to have spare capacity, so both the second and the third calls to test1() wind up writing to <anonymous-array>[4] and "fred" turns into "barney". This is alarming (at least to me), because I normally think of pure append() calls as being safe; this demonstrates that they are not.

To guard against this, you must always force the creation of a new backing array. The straightforward way to do this is:

func test1(h string, hl []string) []string {
  t := append([]string{}, hl...)
  return append(t, h)
}

(You can reduce this to a one-liner if you want to.)

A version that might be slightly more efficient would explicitly make() a new slice with an extra element's worth of capacity, then copy() the old slice to it, then finally append() or add the new value.
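
A sketch of what that version looks like:

func test1(h string, hl []string) []string {
  t := make([]string, len(hl), len(hl)+1)
  copy(t, hl)
  return append(t, h)
}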

(Whether this version is actually more efficient depends on whether you're going to use plain append() to add even more elements to the new slice later on.)

All of this is a little bit counter-intuitive to me. I could work out what was going on and why it has to be this way, but until I started to really think about it, I thought that test1() was safe. And it sometimes is, which makes things tricky; if t2 had no extra capacity, t3 would have allocated a new backing array and everything would have been fine. When slices backed by anonymous arrays have extra capacity is an implementation detail and depends on both the exact numbers involved and the exact path of slice growth.

(The test1() case is also tricky because the mutation is not visible in the original t2 slice. In test2(), at least the original is clearly altered. In a test1() case, the two append()s to the slice might be quite separated in the code, and the damage is only visible if and when you look at the first new slice.)

PS: This implies that calling append() on the same slice in two different goroutines creates a potential data race, at least if you ever read the newly appended element.

GoSliceMutability written at 01:31:31; Add Comment

2017-07-07

Programming Bourne shell scripts is tricky, with dim corners

There have been a bunch of good comments on my entry about my views on Shellcheck, so I want to say just a bit too much to fit in a comment of my own. I'll mostly be talking about this script, where I thought the unquoted '$1' was harmless:

#!/bin/sh
echo Error: $1

Leah Neukirchen immediately pointed out something I had completely forgotten, which is that unquoted Bourne shell variable expansion also does globbing. The following two lines give you the same output:

echo *
G="*"; echo $G

This is surprising (to me), but after looking at things I'll admit that it's also useful. It gives you a straightforward way to dynamically construct glob patterns in your shell script and then expand them, without having to resort to hacks that may result in too much expansion or interpretation of special characters.

Then Vidar, the author of Shellcheck, left a comment with an interesting PS suggesting some things that my unquoted use of $1 was leaving me open to:

./testit '-e \x1b[33mTest' # Ascii input results in binary output (when sh is bash)
./testit '-n foo'          # -n disappears

This is a nice illustration of how tricky shell programming can be, because these probably don't happen but I can't say that they definitely don't happen in all Unix environments (and maybe no one can). As bugs, both of these rely on the shell splitting $1 into multiple arguments to the actual command and then echo interpreting the first word (now split into a separate argument) as a -n or -e option, changing its behavior. However, I deliberately wrote testit's use of echo so that this shouldn't happen, as $1 is only used after a non-option argument (the Error: portion).

With almost all commands in a traditional Unix, the first regular argument turns off all further option processing; everything after it will be considered an argument, no matter if it could be a valid option. Using an explicit '--' separator is only necessary if you want your first regular argument to be something that would otherwise be interpreted as an option. However, at least some modern commands on some Unixes have started accepting options anywhere on the command line, not just up to the first regular argument. If echo behaves this way, Vidar's examples do malfunction, with the -n and -e seen as actual options by echo. Having echo behave this way in your shell is probably not POSIX compatible, but am I totally sure that no Unix will ever do this? Of course not; Unixes have done some crazy shell-related things before.

Finally, Aristotle Pagaltzis mentioned, about his reflexive quoting of Bourne shell variables when he uses them:

I’m just too aware that uninvited control and meta characters happen and that word splitting is very complex semantically. [...]

This is very true, as I hope this entry helps illustrate. But for me at least there are three situations in my shell scripts. If I'm processing unrestricted input in a non-friendly environment, yes, absolutely, I had better put all variable usage in quotes for safety, because sooner or later something is going to go wrong. Generally I do and if I haven't, I'd actually like something to tell me about it (and Shellcheck would be producing a useful message here for such scripts).

(At the same time, truly safe defensive programming in the Bourne shell is surprisingly hard. Whitespace and glob characters are the easy case; newlines often cause much more heartburn, partly because of how other commands may react to them.)

If I'm writing a script for a friendly environment (for example, I'm the only person who'll probably run it) and it doesn't have to handle arbitrary input, well, I'm lazy. If the only proper way to run my script is with well-formed arguments that don't have whitespace in them, the only question is how the script is going to fail; is it going to give an explicit complaint, or is it just going to produce weird messages or errors? For instance, perhaps the only proper arguments to a script are the names of filesystems or login names, neither of which have whitespace or funny characters in them in our environment.

Finally, sometimes the code in my semi-casual script is running in a context where I know for sure that something doesn't have whitespace or other problem characters. The usual way for this to happen is for the value to come from a source that cannot (in our environment) contain such values. For a hypothetical example, consider shell code like this:

login=$(awk -F: '$3 == NNN {print $1}' /etc/passwd | sed 1q)
....
echo whatever $login whatever

This is never going to have problematic characters in $login (for a suitable value of 'never', since in theory our /etc/passwd could be terribly corrupted or there could be a RAM glitch, and yes, if I was going to (say) rm files as root based on this, $login would be quoted just in case).

This last issue points out one of the hard challenges of a Bourne shell linter that wants to only complain about probable or possible errors. To do a good job, you want to recognize as many of these 'cannot be an error' situations as possible, and that requires some fairly sophisticated understanding not just of shell scripting but of what output other commands can produce and how data flows through the script.

By the way, Shellcheck impressed me by doing some of this sort of analysis. For example, it doesn't complain about the following script:

#!/bin/sh
ADIR=/some/directory/path
#ADIR="$1"
if [ ! -d $ADIR ]; then
   echo does not exist: $ADIR
fi

(If you uncomment the line that sets ADIR from $1, Shellcheck does report problems.)

BourneShellTrickyAndDim written at 00:00:22; Add Comment
