Wandering Thoughts

2018-12-07

Modern Bourne shell arithmetic is pretty pleasant

I started writing shell scripts sufficiently long ago that the only way you had to do arithmetic was to use expr. So that settled in my mind as how you had to do arithmetic in shell scripts, and since using expr is kind of painful (and it has an annoying, obscure misfeature), mostly I didn't. If I actually had to do arithmetic in a shell script I might reach for relatively heroic measures to avoid expr, like running numbers through awk. On one relatively recent occasion, I had enough calculations to do that I resorted to dc (in retrospect this was probably a mistake). However, lately I've been using shellcheck, and it's been nagging at me about my occasional use of expr, which had the effect of raising my awareness of the modern alternatives.

Today I needed to write a script that involved a decent amount of math, so I decided to finally actually use modern built-in Bourne shell arithmetic expressions for all of my calculations. The whole experience was pretty pleasant, everything worked, and the $((...)) syntax is a lot nicer and more readable than any of the other alternatives (including expr). Since $((...)) is part of the POSIX shell standard and so supported by basically anything I want to use, I'm going to both switch my habits to it and try to remember that I can use arithmetic in shell scripts now if I want to.

Since some of what I was doing in my shell script was division and percentages, I found it a little bit irksome that Bourne shell arithmetic is entirely integer arithmetic; it got in the way of writing some expressions in the natural way. For example, if you want N% of a number (where both the number and the percent are variables), you'd better not write it as:

$(( avar * (npercent/100) ))

That only works with floating point arithmetic; with integer arithmetic, npercent/100 is 0 for any npercent under 100. Instead you need to restructure the expression to:

$(( (avar * npercent) / 100 ))

It's a little thing, but it was there. And since at least some Bourne shells truncate instead of round in this situation, I found myself carefully looking at every division I was doing to see how it was going to come out.
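As an illustration of the difference (a made-up sketch of mine, not from my actual script), here is the truncating version next to a round-to-nearest version that adds half the divisor before dividing:

#!/bin/sh
# Hypothetical numbers: 15% of 37.
avar=37
npercent=15

# Integer division truncates: (37 * 15) / 100 = 555 / 100 = 5
trunc=$(( (avar * npercent) / 100 ))

# Adding half the divisor first rounds to nearest: (555 + 50) / 100 = 6
rounded=$(( (avar * npercent + 50) / 100 ))

echo "truncated: $trunc  rounded: $rounded"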

One thing I found both pleasant and interesting is how you don't write '$avar' in arithmetic expansion context, just 'avar'. Unlike almost everywhere else in Bourne shell syntax, here the Bourne shell treats a bare word as a variable instead of straight text. This is a completely sensible decision, because arithmetic expansion is a context where you're not going to use bare words. This context-dependent shift is a pretty clever way to partially bridge the gulf between shells and scripting languages.

(For an example of how annoying it is to make the decision the other way, see my view of where TCL went wrong.)

PS: You might ask why it took me so long to switch. Partly it's habit, partly it's that I spent a long time writing shell scripts that sometimes had to run on Solaris machines (where /bin/sh is not a POSIX shell), and partly it's that I spent a long time thinking of arithmetic in the shell as a Bash-specific feature. It was only within the past few years that it sunk in that arithmetic expressions are actually a POSIX shell feature and so it's portable and widely supported, not a Bash-ism.

(It took me a similarly long amount of time to switch to $(...) instead of `...` for command substitution, even though the former is much superior to the latter.)
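As a small illustration of why (with a made-up example of mine), $(...) nests cleanly and lets you quote the inner command normally, which backquotes make awkward:

# The name of the parent directory of the current directory.
parent=$(basename "$(dirname "$PWD")")
echo "$parent"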

BournePleasantArithmetic written at 00:47:32; Add Comment

2018-11-29

Go 2 Generics: Interfaces are not the right model for type constraints

A significant and contentious part of the Go 2 draft generics design is its method of specifying constraints on the types that implementations of generic functions can be used with. There are excellent reasons for this; as a motivating example, I will quote the overview:

[...] In general an implementation may need to constrain the possible types that can be used. For example, we might want to define a Set(T), implemented as a list or map, in which case values of type T must be able to be compared for equality. [...]

The Go 2 generics proposal adopts a method called contracts, which are basically Go function bodies with somewhat restricted contents. From the beginning, one of the most commonly proposed changes to the design is that type constraints should instead be represented through something like Go interfaces. I believe that this would be a mistake and that interfaces are fundamentally the wrong starting point for a model of type constraints.

First, let's observe that we can't really use stock interfaces as they stand as type constraints. This is because stock interfaces are too limited; they only let us say things about a single type and those things have fixed types (whether interfaces or concrete types). Pretty much every interface-based proposal that I've seen immediately expands generic-constraint interfaces to allow for multiple type variables that are substituted into the interface. Axel Wagner's proposal is typical:

type Graph(type Node, Edge) interface {
  Nodes(Edge) []Node
  Edges(Node) []Edge
}

However, I believe that this is still the wrong model. The fundamental issue is that interfaces are about methods, but many type constraints are about types.

An interface famously does not say things about the type itself; instead, it says things about the methods provided by the type. This provides implementations of interfaces with great freedom; in a well-known example, it's possible and even simple for functions to fulfill an interface. Providing interfaces with type variables and so on does not fundamentally change this. Instead it merely expands the range and flexibility of what one can say about methods.
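To make the 'methods, not types' nature of interfaces concrete, here is a small runnable sketch of that well-known pattern (the standard library's http.HandlerFunc works the same way); the names here are ones I've made up for illustration:

package main

import "fmt"

// Greeter is an ordinary method-based interface.
type Greeter interface {
    Greet() string
}

// GreeterFunc is a function type with a method, so a plain function can
// be converted to it and thereby satisfy Greeter. The interface only
// cares about the Greet method, not about what the type actually is.
type GreeterFunc func() string

func (f GreeterFunc) Greet() string { return f() }

func main() {
    var g Greeter = GreeterFunc(func() string { return "hello" })
    fmt.Println(g.Greet())
}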

However, a significant amount of what people want to do with generic functions is about the types themselves, and thus wants constraints on the types. The starting example of Set(T) is an obvious one. The implementation does not care about any methods on T; it simply wants to be able to compare T for equality. This is completely at odds with what interfaces want to talk about. Fundamentally, a significant number of generic functions want to operate on types themselves. There is not just Set(T), there are also often-expressed desires for things like Max(T), Min(T), Contains(), Sort([]T), and so on.

A related issue is that interfaces are about a single type, while type constraints in generic functions are not infrequently going to be about the relationship between types. The Graph contract from the overview illustrates this; it talks about two separate types, each of which is required to have a single method with a specific type signature:

contract Graph(n Node, e Edge) {
  var edges []Edge = n.Edges()
  var nodes []Node = e.Nodes()
}

In Axel Wagner's proposal, this is modified (as it has to be) into a single type that implements both methods. These two are not the same thing.

An example that combines both problems is the convertible contract from the draft design:

contract convertible(_ To, f From) {
  To(f)
}

This is expressing a constraint about the relationship between two types; From must be convertible into To. There is no single type and no methods in sight, and so expressing this in the interface model would require inventing both.

All of this is a sign that using interfaces to express type constraints is forcing a square peg into a round hole. It is not something that naturally fits the problem; it is simply something that Go already has. Interfaces would be a fine fit in a world where generics were about methods, but that is not the world that people really want; they want generics that go well beyond that. If Go 2 is to have generics, it should deliver that world and do so in a natural way that fits it.

Given my view that contracts are too clever in their current form, I'm not sure that contracts are the right answer for type constraints for generics. But I'm convinced that starting from interfaces is definitely the wrong answer.

(This entry was sparked by a discussion with Axel Wagner on Reddit, where questions from him forced me to sit down and really think about this issue instead of relying on gut feelings and excessive hand waving.)

Sidebar: Interfaces and outside types

In their current form, interfaces are limited to applying to existing methods, for good reasons. But if type constraints must be expressed through methods, what do you do when you want to apply a type constraint to a type that does not already implement the method for that constraint? When it is your own package's type, you can add the method; however, when it is either a standard type or a type from another package, you can't.

I think it's clear that you should be able to apply generic functions to types you obtain from the outside world, for example from calling other packages' functions or methods. You might reasonably wish to find the unique items from a bunch of slices that you've been given, for example, where the element type of these slices is not one of your types but comes from another package or is a standard type such as int.

(It would be rather absurd to be unable to apply Max() to int values from elsewhere, and a significant departure from current Go for the language to magically add methods to int so that it could match type constraints in interfaces.)

Go2GenericsNotWithInterfaces written at 01:02:56; Add Comment

2018-11-28

Go 2 Generics: A way to make contracts more readable for people (if not programs)

In the Go 2 generics draft design, my only real issue with contracts is their readability; contracts today are too clever. Contracts have been carefully set up to cover the type constraints that people will want to express for generics, and I think that something like contracts is a better approach to type constraints for generics than something like interfaces (but that's another entry). Can we make contracts more readable without significantly changing the current Go generics proposal? I believe we can, through use of social convention and contract embedding.

For being read and written by people, the problem with contracts today is that they do not always clearly say what they mean. Instead they say it indirectly, through inferences and implicit type restrictions. To make contracts more readable, we need to provide a way to say tricky things directly. One obvious way to do this is with a standard set of embeddable contracts that would be provided in a package in the standard library. Let's call this package generic/require (because there probably will be a standard generic package of useful generic functions and so on).

The require package would contain contracts for standard and broadly useful properties of types, such as require.Orderable(), require.String(), and require.Unsigned(). When writing contract bodies it would be standard practice to use the require versions instead of writing things out by hand yourself. For example:

contract Something(t T) {
  require.Unsigned(t)
  // instead of: 1 << t
}

Similarly, you would use 'require.Integer(t)' instead of any of the various operations that implicitly restrict t to being an integer (signed or unsigned), and 'require.Numeric(t)' if you just need a numeric type, and so on. Of course, contracts from the require package could be used on more than direct type parameters of your contract; they could be used for return values from methods, for example.
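To sketch what I mean (this is not code that compiles anywhere; it is simply my guess at how such require contracts might be written in the draft design's notation), the require package could hide the implicit-restriction tricks behind names:

// Hypothetical contents of generic/require; the bodies use the same
// implicit type restrictions as before, but readers only see the names.
contract Unsigned(t T) {
    1 << t     // only an unsigned integer type can be used as a shift count here
}

contract Integer(t T) {
    t % t      // % is only defined on integer types
}

contract Numeric(t T) {
    t * t      // arithmetic restricts t to a numeric type
}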

The obvious advantage of this for the reader is that you have directly expressed what you require in a clear way. For cases like require.Integer, where there are multiple ways of implementing the same restriction, you've also made it so that people don't have to wonder if there's some reason that you picked one over the other.

It's likely that not everything in contracts would be written using things from the require package. Some restrictions on types would be considered obvious (comparability is one possible case), and we would probably find that require contracts don't make other cases any easier to read or write. One obvious set of candidates for require contracts is cases where there are multiple ways of creating the same type restriction.

(In the process the Go developers might find that there were gaps in the type restrictions that you could express easily, as exposed by it being difficult or impossible to write a require contract for the restriction. For example, I don't think there's currently any easy way to say that a type must be a signed integer type.)

Use of the require package would be as optional as the use of Go's standard formatting style, and it would be encouraged in much the same way, by the tacit social push of tools. For instance, 'gofmt -s' could rewrite your contracts to use require things instead of hand-rolled alternatives, and golint or even vet could complain about 'you should use require.<X> instead of ...'. Faced with this push towards a standard and easy way of expressing type restrictions, I believe that most people would follow it, much as gofmt has been accepted and become essentially universal.

An open question is whether this would add enough readability to significantly improve contracts as they stand. I'm not sure that a number of the trickier restrictions in the full draft design could be turned into this sort of reusable require contract in a way that is both usable and amenable to a good, clear name. People may also disagree over the relative readability of, say:

contract stringer(x T) {
   var s string = x.String()
   // equivalent:
   require.String(x.String())
   // or perhaps:
   require.Returns(x.String, string)
}

At a certain point it feels like one would basically be spelling out phrases and sentences in the form of require package contracts. I'm not sure that's a real improvement, or if a more fundamental change to how contracts specify requirements is needed for true readability improvements.

Sidebar: Augmenting the language through the back door

The require package would also provide a place to insert special handling of type restrictions that can't be expressed in normal Go, in much the way that the unsafe package is built in to the compiler (along with various other things). People might find this too awkward for some sorts of restrictions, though, since using require has to at least look like normal Go and even then, certain sorts of potentially implementable things might be too magical.

Go2ContractsMoreReadable written at 02:01:13; Add Comment

2018-11-16

Go 2 Generics: Contracts are too clever

The biggest and most contentious thing in the Go 2 Draft Designs is its proposal for generics. Part of the proposal is a way to specify things about the types that generic functions can be used on, in the form of contracts. Let me quote from the overview:

[...] In general an implementation may need to constrain the possible types that can be used. For example, we might want to define a Set(T), implemented as a list or map, in which case values of type T must be able to be compared for equality. To express that, the draft design introduces the idea of a named contract. A contract is like a function body illustrating the operations the type must support. For example, to declare that values of type T must be comparable:

contract Equal(t T) {
    t == t
}

This is a clever trick (among other things it basically avoids adding any new syntax to Go to define contracts), but my view is that it is in fact too clever. Using function bodies with operations in them to define the valid types is essentially prioritizing minimal additions to the language and the compiler over everything else. The result is easy for the compiler to use (it just runs types through contract bodies in a regular typechecking process) but provides generally bad to terrible ergonomics for everything else.

In practice, contracts will be consumed and used by far more than the compiler. People will read contracts to understand error messages about 'your type doesn't satisfy this contract', to understand what they need to do to create types that can be used with generic functions they're interested in, to see how to specify something for their own generic functions, to try to understand mistakes that they made in their own contracts, and to understand what a contract actually requires (as opposed to what the code comments claim it requires, because we all know about what happens to code comments eventually). IDE systems will read contracts so they can figure out what types can be suggested as type parameters for generic functions. Code analysis tools will read contracts for all sorts of reasons, including spotting unnecessary requirements that the implementation doesn't actually need.

All of these parties and all of these purposes are badly served by contracts which require you to understand the language and its subtle implications at a deep level. For instance, take this contract:

contract Example(t T) {
    t.Fetch()[:5]
    t.AThing().Len()
}

The implications of both lines are reasonably subtle; the first featured in a quiz by Dave Cheney, and the second places requirements both on what AThing() returns and on what the .Len() method is defined on, as covered as part of addressable values in Go. I would maintain that neither is obvious to a person (certainly they aren't to me without careful research), and they might or might not be easily understood by code (certainly they're the kind of corner cases that code could easily be wrong on).

Part of the bad ergonomics of contracts here is that they look simple and straightforward while hiding subtle traps in the fine details. I've illustrated one version of that here, and the detailed draft design actually shows several other examples. I could go on with more examples (consider the wildly assorted type requirements of various arithmetic operators, for example).

Go contracts do not seem designed to be read by anything but the compiler. This is a mistake; computer languages are in large part about communicating with other people, including your future self, not with the computer. If we're in doubt, we should bias language design toward being clearly and easily read by people, not toward the compiler. Go generally has this bias, preferring clean communication over excessive cleverness. The current version of contracts seem quite at odds with this.

(This is especially striking because other parts of the Go 2 Draft Designs are about making Go clearer for people. The error handling proposal, for example, is all about making Go code read better by hiding repetitive patterns.)

PS: The Go team may have the view that only a few people will ever create generics and thus contracts, and mostly everyone else will read the excellent documentation that these very few people have carefully written. I think that the Go team is quite wrong here; I expect to see generics sprout like mushrooms across Go codebases, at least for the first few years. In general I would argue that if generics are successful, we can expect to see them widely used, which means that many generics and contracts will not be written or read by people who are deep experts with Go. A necessary consequence of this wide use will be that some amount of it will be with contracts and code created partly through superstition in various forms.

Go2ContractsTooClever written at 01:04:00; Add Comment

2018-10-15

Garbage collection and the underappreciated power of good enough

Today I read The success of Go heralds that of Rust (via). In it the author argues that Rust is poised to take over from Go because Rust is a more powerful language environment. As one of their arguments against Go, the author says:

All the remaining issues with Go stem from three design choices:

  • It’s garbage collected, rather than having compile time defined lifetimes for all its resources. This harms performance, removes useful concepts (like move semantics and destructors) and makes compile-time error checking less powerful.

[...]

Fix those things, and Go essentially becomes the language of the future. [...]

I think the author is going to be disappointed about what happens next, because in my opinion they've missed something important about how people use programming languages and what matters to people.

To put it simply, good garbage collection is good enough in practice for almost everyone and it's a lot easier than the fully correct way of carefully implementing lifetimes for all resources. And Go has good garbage collection, because the Go people considered it a critical issue and spent a lot of resources on it. This means that Go's garbage collection is not an issue in practice for most people, regardless of what the author thinks, and thus that Rust's compile time defined lifetimes are not a compelling advantage for Rust that will draw many current (or future) Go users from Go to Rust.

There are people who need the guarantees and minimal memory usage that lifetimes for memory gives you, and there are programs that blow up under any particular garbage collection scheme. But most people do not have such needs or such programs. Most programs are perfectly fine with using some more memory than they strictly need and spending some more CPU time on garbage collection, and most people are fine with this because they have other priorities than making their programs run as fast and with as minimal resource use as possible.

(These people are not being 'lazy'. They are optimizing what matters to them in their particular environment, which may well be things like speed to deployment.)

The underappreciated power of 'good enough' is that good enough is sufficient for most people and most programs; really, it's almost always sufficient. People don't always stop at good enough, but they often do, especially when achieving better than good enough is significantly harder. This is not a new thing and we have seen this over and over again in programming languages; just look at how people have flocked to Perl, Python, PHP, Ruby, and JavaScript, despite there being more powerful alternatives.

(Good enough changes over time and over scale. What is good enough at small scale is not good enough at larger scale, but many times you never need to handle that larger scale.)

What matters to most people most of the time is not perfection, it is good enough. They may go beyond that, but you should not count on it, and especially you should not count on perfection pulling lots of people away from good enough. We have plenty of proof that it doesn't.

GarbageCollectionGoodEnough written at 00:06:10; Add Comment

2018-10-06

A deep dive into the OS memory use of a simple Go program

One of the enduring mysteries of actually using Go programs is understanding how much OS-level memory they use, as opposed to the various Go-level memory metrics exposed by runtime.MemStats. OS level memory use matters because it influences things like how much real memory your program needs and how likely it is to be killed by the OS in a low-memory situation, but there has always been a disconnect between OS level information and Go level information. After researching enough to write about how Go doesn't free heap memory back to the OS, I got sufficiently curious to really dig down into the details of a very simple program and now I'm going to go through them. All of this is for Go 1.11; other Go versions have had different behavior.

Our very simple program is going to do nothing except sit there so that we can examine its memory use:

package main
func main() {
    var i uint64
    for {
        i++
    }
}

(It turns out that we could use time.Sleep() to pause without dragging in extra complications, because it's actually handled directly in the runtime, despite it nominally being in the time package.)

This simple-looking program already has a complicated runtime environment, with several system goroutines operating behind the scenes. It also has more memory use than you probably expect. Here's what its memory map looks like on my 64-bit Linux machine:

0000000000400000    316K r-x-- memdemo
000000000044f000    432K r---- memdemo
00000000004bb000     12K rw--- memdemo
00000000004be000    124K rw---   [ bss ]
000000c000000000  65536K rw---   [ anon ]
00007efdfc10c000  35264K rw---   [ anon ]
00007ffc088f1000    136K rw---   [ stack ]
00007ffc08933000     12K r----   [ vvar ]
00007ffc08936000      8K r-x--   [ vdso ]
ffffffffff600000      4K r-x--   [ vsyscall ]
 total           101844K

The vvar, vdso, and vsyscall mappings come from the Linux kernel; the '[ stack ]' mapping is the standard process stack created by the Linux kernel, and the first four mappings are all from the program itself (the actual compiled machine code, the read-only data, plain data, and then the zeroed data respectively). Go itself has allocated the two '[ anon ]' mappings in the middle, which are most of the program's memory use; we have one 64 MB mapping at 0x00c000000000 and one 34.4 MB mapping at 0x7efdfc10c000.

(The addresses for some of these mappings will vary from run to run.)

As described in Allocator Wrestling (see also, and), Go allocates heap memory (including the memory for goroutine stacks) in chunks of memory called spans that come from arenas. Arenas are 64 MB in size and are allocated at fixed locations; on 64-bit Linux, they start at 0x00c000000000. So this is our 64 MB mapping; it is the program's first arena, the only one necessary, which handles all normal Go memory allocation.

If we run our program under strace -e trace=%memory, we'll discover that the remaining mysterious mapping actually comes from a number of separate mmap() calls that the Linux kernel has merged together into one memory area. Here is the trace for our program:

mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfe33c000
mmap(0xc000000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc000000000
mmap(0xc000000000, 67108864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc000000000
mmap(NULL, 33554432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc33c000
mmap(NULL, 2162688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc12c000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc11c000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc10c000

So we have, in order, a 256 KB allocation, the 64 MB arena allocated at its fixed address, a 32 MB allocation, a slightly over 2 MB allocation, and two 64 KB allocations. Everything except the arena allocation is allocated at successively lower addresses next to each other and gets merged together into the single mapping starting at 0x7efdfc10c000. All of these allocations are internal allocations from the Go runtime, and I'm going to run down them in order.

The initial 256 KB allocation is for the first chunk of the Go runtime's area for persistent allocations. These are runtime things that will never be freed up and which can be (and are) allocated outside of the regular heap arenas. Various things are allocated in persistent allocations, and the persistent allocator mostly works in 256 KB chunks that it gets from the OS. Our first mmap() is thus the runtime starting to allocate from this area, which causes the allocator to get its first chunk from the OS. The memory for these persistent allocator chunks is mostly recorded in runtime.MemStats.OtherSys, although it's not the only thing that falls into that category and some persistent allocations are in different categories.

The 32 MB allocation immediately after our first arena is for the heap allocator's "L2" arena map. As the comments in runtime/malloc.go note, most 64-bit architectures (including Linux) have only a single large L2 arena map, which has to be allocated when the first arena is allocated. The next allocation, which is 2112 KB or 2 MB plus 64 KB, turns out to be for the heapArena structure for our newly allocated arena. It has two fields; the .bitmap field is 2 MB in size, and the .spans field is 64 KB (in 8192 8-byte pointers). This explains the odd size requested.

(If I'm reading the code correctly, the L2 arena map isn't accounted for in any runtime.MemStats value; this may be a bug. The heapArena structure is accounted for in runtime.MemStats.GCSys.)

The final two 64 KB allocations are for the initial version of a data structure used to keep track of all spans (set up in recordspan()) and the allocation for a data structure (gcBits) that is used in garbage collection (set up in newArenaMayUnlock()). The span tracking structure is accounted for in runtime.MemStats.OtherSys, while the gcBits stuff is in runtime.MemStats.GCSys.

As your program uses more memory, I believe that in general you can expect more arenas to be allocated from the OS, and with each arena you'll also get another heapArena structure. I believe that the L2 arena map is only allocated once on 64-bit Unix. You will probably periodically have larger span data structures and more gcBits structures allocated, and you will definitely periodically have new 256 KB chunks allocated for persistent allocations.

(There are probably other sources of allocations from the OS in the Go runtime. Interested parties can search through the source code for calls to sysAlloc(), persistentalloc(), and so on. In the end everything apart from arenas comes from sysAlloc(), but there are often layers of indirection.)
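To see the Go-level accounting for yourself, a minimal sketch (my addition, not part of the original walkthrough) is to dump the runtime.MemStats fields mentioned above so they can be compared with what pmap reports for the same process:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    // All values are in bytes.
    fmt.Println("Sys:     ", m.Sys)      // total memory obtained from the OS
    fmt.Println("HeapSys: ", m.HeapSys)  // heap arenas
    fmt.Println("StackSys:", m.StackSys) // stack memory
    fmt.Println("GCSys:   ", m.GCSys)    // GC metadata such as heapArena structures
    fmt.Println("OtherSys:", m.OtherSys) // persistent allocations and other runtime bits
}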

PS: If you want to track down this sort of thing yourself, the easiest way to do it is to run your test program under gdb, set a breakpoint on runtime.sysAlloc, and then use where every time the breakpoint is hit. On most Unixes, this is the only low level runtime function that allocates floating anonymous memory with mmap(); you can see this in, for example, the Linux version of low level memory allocation.

GoProgramMemoryUse written at 21:42:04; Add Comment

Go basically never frees heap memory back to the operating system

Over on Reddit's r/golang, I ran into an interesting question about Go's memory use as part of this general memory question:

[...] However Go is not immediately freeing the memory, at least from htop's perspective.

What can I do to A) gain insight on when this memory will be made available to the OS, [...]

The usual question about memory usage in Go programs is when things will be garbage collected (which can be tricky). However, this person wants to know when Go will return free memory back to the operating system. This is a good question partly because programs often don't do very much of this (or really we should say the versions of malloc() that programs use don't do this), for various reasons. Somewhat to my surprise, it turns out that Go basically never returns memory address space to the OS, as of Go 1.11. In htop, you can expect normal Go programs to only ever be constant sized or grow, never to shrink.

(The qualification about Go 1.11 is important, because Go's memory handling changes over time. Back in 2014 with Go 1.5 or so, Go processes used a huge amount of virtual memory, but that's changed since then.)

The Go runtime itself initially allocates memory in relatively decent-sized chunks called 'spans', as discussed in the big comment at the start of runtime/malloc.go (and see also this and this (also)); spans are at least 8 KB, but may be larger. If a span has no objects allocated in it, it is an idle span; how many bytes are in idle spans is in runtime.MemStats.HeapIdle. If a span is idle for sufficiently long, the Go runtime 'releases' it back to the OS, although this doesn't mean what you think. Released spans are a subset of idle spans; when a span is released, it still counts as idle.

(In theory the number of bytes of idle spans released back to the operating system is runtime.MemStats.HeapReleased, but you probably want to read the comment about this in the source code of runtime/mstats.go.)

Counting released spans as idle sounds peculiar until you understand something important; Go doesn't actually give any memory address space back to the OS when a span is released. Instead, what Go does is to tell the OS that it doesn't need the contents of the span's memory pages any more and the OS can replace them with zero bytes at its whim. So 'released' here doesn't mean 'return the memory back to the OS', it means 'discard the contents of the memory'. The memory itself remains part of the process and counts as part of the process size (it may or may not count as part of the resident set size, depending on the OS), and Go can immediately use such a released idle span again if it wants to, just as it can a plain idle span.

(On Unix, releasing pages back to the OS consists of calling madvise() (Linux, FreeBSD) on them with either MADV_FREE or MADV_DONTNEED, depending on the specific Unix. On Windows, Go uses VirtualFree() with MEM_DECOMMIT. On versions of Linux with MADV_FREE, I'm not sure what happens to your RSS after doing it; some sources suggest that your RSS doesn't go down until the kernel starts actually reclaiming the pages from you, which may be some time later.)

As far as I can tell from inspecting the current runtime code, Go only very rarely returns memory that it has used back to the operating system by calling munmap() or the Windows equivalent. In particular, once Go has used memory for regular heap allocations it will never be returned to the OS even if Go has plenty of released idle memory that's been untouched for a very long time (as far as I can tell). As a result, the process virtual size that you see in tools like htop is basically a high water mark, and you can expect it to never go down. If you want to know how much memory your Go program is really using, you need to carefully look at the various bits and pieces in runtime.MemStats, perhaps exported through net/http/pprof.
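As a small runnable sketch of this (mine, not from the original entry), you can watch HeapIdle and HeapReleased move while the process size that htop reports stays put; runtime/debug.FreeOSMemory() forces a GC plus an immediate release (madvise) pass:

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

func report(label string) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("%-9s HeapSys=%d HeapIdle=%d HeapReleased=%d\n",
        label, m.HeapSys, m.HeapIdle, m.HeapReleased)
}

func main() {
    report("start")
    big := make([]byte, 64<<20) // allocate 64 MB of heap
    big[0] = 1                  // touch it so it is really used
    report("allocated")
    big = nil
    runtime.GC()
    debug.FreeOSMemory() // GC plus an immediate madvise()-based release
    report("released")
}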

GoNoMemoryFreeing written at 00:04:48; Add Comment

2018-09-28

Addressable values in Go (and unaddressable ones too)

One of the tricky concepts in Go is 'addressable values', which show up in various places in the Go specification. To understand them better, let's pull all of the scattered references to them together in one place. The best place to start is with the specification of what they are, which is covered in Address operators:

For an operand x of type T, the address operation &x generates a pointer of type *T to x. The operand must be addressable, that is, either a variable, pointer indirection, or slice indexing operation; or a field selector of an addressable struct operand; or an array indexing operation of an addressable array. As an exception to the addressability requirement, x may also be a (possibly parenthesized) composite literal. [...]

There are a number of important things that are not addressable. For example, values in a map and the return values from function and method calls are not addressable. The following are all errors:

&m["key"]
&afunc()
&t.method()

The return value of a function only becomes addressable when put into a variable:

v := afunc()
&v

(Really, it is the variable that is addressable and we have merely used a short variable declaration to initialize it with the return value.)

Since field selection and array indexing require that their structure or array also be addressable, you also can't take the address of a field or an array element from a return value. The following are errors:

&afunc().field
&afunc()[0]

In both cases, you must save the return value in a variable first, just as you need to do if you want to use & on the entire return value. However, the general rule about pointer dereferencing means that this works if the function returns a pointer to a structure or an array. To make your life more confusing, the syntax in the caller is exactly the same, so whether or not '&afunc().field' is valid depends on whether afunc() returns a pointer to a structure or the structure itself.

(If you have code that does '&afunc().field', this means that you can get fun and hard to follow errors if someone decides to change afunc()'s return type from 'pointer to structure' to 'structure'. Current Go reports only 'cannot take the address of afunc().field', which fails to explain why.)
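Here is a compact, runnable illustration of the difference (with made-up names):

package main

import "fmt"

type T struct{ field int }

func byValue() T { return T{field: 1} }

func byPointer() *T { return &T{field: 2} }

func main() {
    // p1 := &byValue().field // compile error: cannot take the address of byValue().field
    p2 := &byPointer().field // fine: selecting through a pointer is addressable
    fmt.Println(*p2)
}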

Functions themselves are not addressable, so this is an error:

func ifunc() int { return 1 }
&ifunc

Functions point out a difference between Go and C, which is that Go has the concept of pointer-like things that are not pointers as the language sees them. You cannot apply & to functions, but you can assign them to variables; however, the resulting type is not formally a pointer:

v := ifunc
fmt.Printf("%T\n", v)

The type printed here is 'func() int', not a pointer type. Of course you can now take the address of v, at which point you have the type '*func() int'.

By itself, taking the address of things just creates pointers that you can dereference either explicitly with * or implicitly with selectors. The larger importance of addressable things in Go shows up where else they get used in the language.

The most important place that addressability shows up in the specification is in Assignments:

Each left-hand side operand must be addressable, a map index expression, or (for = assignments only), the blank identifier. [...]

(Map index expressions must be specially added here because they aren't addressable.)

In other words, addressability is the primary definition of what can be assigned to. It covers all forms of assignment, including assignment in for statements with range clauses and ++ and -- statements. This implies that if you widen the definition of addressability in Go, by default you widen what can be assigned to; it will include your newly widened stuff.

Because structure fields and array indexes require their containing thing to be addressable, you cannot directly assign to fields or array elements in structures or arrays returned by functions. Both of these are errors:

sfunc().field = ....
afunc()[0] = ....

However, because pointer indirection is addressable, a function that returns a pointer to a structure (instead of a structure) can be used this way:

spfunc().field = ...

This is syntactically legal but may not actually make sense, unless you have another reference to the underlying structure somewhere.

Because slice index expressions are always addressable, if you have a function that returns a slice, you can directly assign to an element of the returned slice without capturing the slice in a variable:

slicefunc()[0] = ...

If the slice and its backing array are newly generated by slicefunc() and are otherwise unreferenced, the whole thing will afterward quietly disappear through garbage collection; the only visible effect of your assignment is that it might panic in some situations (for example, if the returned slice is empty).

Since map index expressions are a special exception to the addressability requirements, you can assign to an index in a map that's just been returned by a function:

mfunc()["key"] = ...

This has the same potential problem as with the slice-returning function above.

Next, slice expressions sometimes require addressability:

[...] If the sliced operand is an array, it must be addressable and the result of the slice operation is a slice with the same element type as the array. [...]

Taking slices of strings, existing slices, or pointers to arrays does not require that the value you are slicing be addressable; only actual arrays are special. This is the root cause of Dave Cheney's pop quiz that I recently wrote about, because it means that if a function returns an actual array, you cannot immediately slice its return value; you must assign the return value to a variable first in order to make it addressable.

Given that you can take slices of unaddressable slices, just not of unaddressable arrays, it seems pretty likely that this decision is a deliberate pragmatic one to avoid requiring Go to silently materialize heap storage for cases like arrays that are return values. If you do this through a variable, it is at least more explicit that you have a full return object sitting around and that the slice is not necessarily making most of it go away.

Finally, we have Method calls (at the bottom of Calls) and their twins Method values. Both will do one special thing with addressable values:

[...] If x is addressable and &x's method set contains m, x.m() is shorthand for (&x).m(): [...]

And:

As with method calls, a reference to a non-interface method with a pointer receiver using an addressable value will automatically take the address of that value: t.Mp is equivalent to (&t).Mp.

Because return values are not addressable by themselves, you cannot chain method calls from a non-pointer return value to a pointer receiver method. In other words, you can't do this:

// T.Mv() returns a T, not a *T
// and we have a *T.Mp() method
t.Mv().Mp()

This is equivalent to '(&t.Mv()).Mp()' and as we've seen, '&t.Mv()' isn't allowed in general. To make this work, you have to assign the return value to a variable x and then do x.Mp(); at that point, 'x' is addressable and so the implicit '&x' is valid.

The error message you currently get here for an unaddressable value is a little bit unusual, although it may change in the future to be clearer. If you have 'valfunc().Mp()', you get two error messages for this (reporting the same location):

[...]: cannot call pointer method on valfunc()
[...]: cannot take the address of valfunc()

(Gccgo 8.1.1 reports 'method requires a pointer receiver', which is basically the same thing but doesn't tell you why you aren't getting the automatic pointer receiver that you expected.)

Note that the method Mp() is not part of the method set of T; it's merely accessible and callable if you have an addressable value of T. This is different from the method sets of pointer receiver types, *T, which always include all the methods of their underlying value receiver type T.

Intuitively, Go's requirement of addressability here is reasonable. Pointer receiver methods are generally used to mutate their receiver, but if Go called a method here, the mutation would be immediately lost because the actual value is only a temporary thing since it's not being captured by any variable.

(As usual, doing the work to write this entry has given me a much clearer understanding of this corner of Go, which is one reason I wrote it. Addressability uncovers some corners that I may actively run into someday, especially around how return values are not immediately usable for everything.)

PS: Given its role in determining what can be assigned to, addressability in Go is somewhat similar to lvalues in C. However, for various reasons C's definition of lvalues has to be much more complicated than Go's addressability. I expect that Go's simplicity here is deliberate and by design, since Go's creators were what you could call 'thoroughly familiar with C'.

Sidebar: The reflect package and addressability

Addressability is also used in the reflect package, where a number of operations that mirror Go language features also have requirements for addressability. The obvious example is Value.Addr, which mirrors &, and has addressability requirements explained in Value.CanAddr. Note that Value.CanAddr's requirements are stricter than those of &'s:

A value is addressable [only] if it is an element of a slice, an element of an addressable array, a field of an addressable struct, or the result of dereferencing a pointer.

In order to get an addressable reflect.Value outside of a slice, you must pass in a pointer and then dereference the pointer with Value.Elem:

var s struct { i int }
v1 := reflect.ValueOf(s)
v2 := reflect.ValueOf(&s)
v3 := v2.Elem()
fmt.Println(v1.CanAddr(), v2.CanAddr(), v3.CanAddr())

This prints 'false false true'. Only the pointer that has been dereferenced through the reflect package is addressable, although s itself is addressable in the language and v1 has the same reflect.Type as v3.

In thinking about this, I can see the package's logic here. By requiring you to start with a pointer, reflect ensures that what you're manipulating through it has an outside existence. When you call reflect.ValueOf(s), what you really get is a copy of s. If reflect allowed you to address and change that copy, you might well wind up confused about why s itself never changed or various other aspects of the situation.

(The rule of converting a non-pointer value to any interface is that Go makes a copy of the value and then the resulting interface value refers to the copy. You can see this demonstrated here.)
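A short runnable sketch of the pointer-then-Elem() dance (my example, not from the original entry):

package main

import (
    "fmt"
    "reflect"
)

func main() {
    var s struct{ I int }
    v := reflect.ValueOf(&s).Elem() // addressable because it came from a pointer
    v.FieldByName("I").SetInt(42)   // changes the real s, not a copy
    fmt.Println(s.I)                // prints 42
}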

GoAddressableValues written at 23:07:59; Add Comment

2018-09-27

Learning about Go's unaddressable values and slicing

Dave Cheney recently posted another little Go pop quiz on Twitter, and as usual I learned something interesting from it. Let's start with his tweet:

#golang pop quiz: what does this program print?

package main
import (
    "crypto/sha1"
    "fmt"
)

func main() {
    input := []byte("Hello, playground")
    hash := sha1.Sum(input)[:5]
    fmt.Println(hash)
}

To my surprise, here is the answer:

./test.go:10:28: invalid operation sha1.Sum(input)[:5] (slice of unaddressable value)

There are three reasons why we are getting this error. To start with, the necessary enabling factor is that sha1.Sum() has an unusual return value. Most things that return some bytes return a slice, and this code would have worked with slices. But sha1.Sum() returns that odd beast, a fixed-size array ([20]byte, to be specific), and since Go returns values by value, it really does return a 20-byte array to main(), not, say, a pointer to it.

That leaves us with the concept of unaddressable values, which are the opposite of addressable values. The careful technical version is in the Go specification in Address operators, but the hand waving summary version is that most anonymous values are not addressable (one big exception is composite literals). Here the return value of sha1.Sum() is anonymous, because we're immediately slicing it. Had we stored it in a variable and thus made it non-anonymous, the code would have worked:

    tmp := sha1.Sum(input)
    hash := tmp[:5]

The final piece of the puzzle is why slicing was an error. That's because slicing an array specifically requires that the array be addressable (this is covered at the end of Slice expressions). The anonymous array that is the return value of sha1.Sum() is not addressable, so slicing it is rejected by the compiler.

(Storing the return value into our tmp variable makes it addressable. Well, it makes tmp and the value in it addressable; the return value from sha1.Sum() sort of evaporates after it's copied into tmp.)

I don't know why the designers of Go decided to put this restriction on what values are addressable, although I can imagine various reasons. For instance, allowing the slicing operation here would require Go to silently materialize heap storage to hold sha1.Sum()'s return value (and then copy the value to it), which would then live on for however long the slice did.

(Since Go returns all values on the stack, as described in "The Go low-level calling convention on x86-64", this would require a literal copy of the data. This is not a big deal for the 20-byte result of sha1.Sum(); I'm pretty sure that people routinely return and copy bigger structures.)

PS: A number of things through the Go language specification require or only operate on addressable things. For example, assignment mostly requires addressability.

Sidebar: Method calls and addressability

Suppose you have a type T and also some methods defined on *T, e.g. *T.Op(). Much like Go allows you to do field references without dereferencing pointers, it allows you to call the pointer methods on a non-pointer value:

var x T
x.Op()

Here Go makes this shorthand for the obvious '(&x).Op()' (this is covered in Calls, at the bottom). However, because this shortcut requires taking the address of something, it requires addressability. So, what you can't do is this:

// afunc() returns a T
afunc().Op()

// But this works:
var x T = afunc()
x.Op()

I think I've seen people discuss this Go quirk of method calls, but at the time I didn't fully understand what was going on and what exactly made a method call not work because of it.

(Note that this shorthand conversion is fundamentally different from how *T has all of the methods of T, which has come up before.)

GoUnaddressableSlice written at 00:36:04; Add Comment

2018-09-21

Your databases always have a schema

Over on Mastodon, I had a database opinion:

Obvious database hot take: your stored data always has a schema, unless your code neither reads nor writes it as anything except an opaque blob. What NoSQL changes is how many places that schema exists and how easy it is to have multiple schemas for your data.

(SQL allows multiple schemas too. You just change what fields mean in your code, reuse them, extend them, etc etc.)

One schema that your data has is implicit in what data fields your code reads and writes, what it puts in them (both at the mechanical level of data types and at a higher level), and its tacit knowledge of how those fields relate to each other. If you're actually doing anything with your data, this schema necessarily exists. Of course, it may be a partial schema; your stored data may include fields that are neither read nor written by your code, and even for fields that you read and write, your current code may not reflect the full requirements and restrictions that you have in mind for the data.
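As a tiny illustration of this implicit schema (a made-up sketch of mine), consider Go code that reads records out of some schemaless document store as JSON; the struct definition is a schema, whether or not the database has one:

package main

import (
    "encoding/json"
    "fmt"
)

// User is the schema this code imposes on whatever blob is actually stored:
// which fields exist, what types they hold, and (tacitly) what they mean.
type User struct {
    Name    string `json:"name"`
    Email   string `json:"email"`
    Credits int    `json:"credits"` // tacit rule elsewhere in the code: never negative
}

func main() {
    raw := []byte(`{"name": "cks", "email": "cks@example.org", "credits": 10}`)
    var u User
    if err := json.Unmarshal(raw, &u); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", u)
}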

SQL databases make part of your schema explicit and enforced in the database, but it is only the part that can be sensibly described and checked there. Generally there are plenty of constraints on your data that are not checked at the SQL level for various reasons (and they may not even be checked in your code). As a result, you can bypass the nominal (SQL) schema of your database by reusing and repurposing database fields in ways that SQL doesn't check for or enforce. This in-SQL schema is in addition to the implicit schema that's in your code.

(You can tell that your code's implicit schema exists even with an SQL database, even if your code auto-reads the SQL schemas, by asking what happens if the DBAs decide to rename a bunch of tables and a bunch of fields in those tables, maybe dropping some and adding others. It is extremely likely that the entire program will explode until someone fixes the code to match the new database arrangement. In other words, you have two copies of your schema, one in the database and one in your code, and those copies had better agree.)

Since your schema lives partly in your code, different pieces of code can have different schema for the same data. Given that you can bypass the SQL schema, this is true whether or not you're using a NoSQL database with no schema; NoSQL just makes it easier and perhaps more likely to happen. In some ways NoSQL is more honest than SQL, because it tells you straight up that it's entirely up to your code to have and maintain a schema. Certainly in NoSQL your code is the only place with a schema, and so you have a chance to definitely only have one schema for your data instead of two.

On the other hand, one advantage of SQL is that you have a central point that explicitly documents and enforces at least some of your schema. You don't have to try to reverse engineer even the basics of your schema out of your code, and you know that there is at least basic agreement about data facts on ground (for example, what tables and what fields there are, what basic types can go in those fields, and perhaps what relationships there definitely are between various fields via constraints and foreign keys).

(I've been thinking this thought for some time and was pushed over the edge today by reading yet another article about how SQL databases were better than NoSQL ones partly because they mean you have a schema for your data. As mentioned, I think that there are advantages to having your schema represented in your database, but it is absolutely not the case that NoSQL has no schema or that SQL means you only have one schema.)

DatabasesAlwaysSchemas written at 20:58:49; Add Comment
