Wandering Thoughts

2018-10-15

Garbage collection and the underappreciated power of good enough

Today I read The success of Go heralds that of Rust (via). In it the author argues that Rust is poised to take over from Go because Rust is a more powerful language environment. As one of their arguments against Go, the author says:

All the remaining issues with Go stem from three design choices:

  • It’s garbage collected, rather than having compile time defined lifetimes for all its resources. This harms performance, removes useful concepts (like move semantics and destructors) and makes compile-time error checking less powerful.

[...]

Fix those things, and Go essentially becomes the language of the future. [...]

I think the author is going to be disappointed about what happens next, because in my opinion they've missed something important about how people use programming languages and what matters to people.

To put it simply, good garbage collection is good enough in practice for almost everyone, and it's a lot easier to live with than the fully correct way of carefully implementing lifetimes for all resources. And Go has good garbage collection, because the Go people considered it a critical issue and spent a lot of resources on it. This means that Go's garbage collection is not an issue in practice for most people, regardless of what the author thinks, and thus that Rust's compile time defined lifetimes are not a compelling advantage that will draw many current (or future) Go users over to Rust.

There are people who need the guarantees and minimal memory usage that lifetimes for memory give you, and there are programs that blow up under any particular garbage collection scheme. But most people do not have such needs or such programs. Most programs are perfectly fine with using some more memory than they strictly need and spending some more CPU time on garbage collection, and most people are fine with this because they have other priorities than making their programs run as fast and with as minimal resource use as possible.

(These people are not being 'lazy'. They are optimizing what matters to them in their particular environment, which may well be things like speed to deployment.)

The underappreciated power of 'good enough' is that good enough is sufficient for most people and most programs; really, it's almost always sufficient. People don't always stop at good enough, but they often do, especially when achieving better than good enough is significantly harder. This is not a new thing and we have seen this over and over again in programming languages; just look at how people have flocked to Perl, Python, PHP, Ruby, and JavaScript, despite there being more powerful alternatives.

(Good enough changes over time and over scale. What is good enough at small scale is not good enough at larger scale, but many times you never need to handle that larger scale.)

What matters to most people most of the time is not perfection, it is good enough. They may go beyond that, but you should not count on it, and especially you should not count on perfection pulling lots of people away from good enough. We have plenty of proof that it doesn't.

GarbageCollectionGoodEnough written at 00:06:10

2018-10-06

A deep dive into the OS memory use of a simple Go program

One of the enduring mysteries of actually using Go programs is understanding how much OS-level memory they use, as opposed to the various Go-level memory metrics exposed by runtime.MemStats. OS level memory use matters because it influences things like how much real memory your program needs and how likely it is to be killed by the OS in a low-memory situation, but there has always been a disconnect between OS level information and Go level information. After researching enough to write about how Go doesn't free heap memory back to the OS, I got sufficiently curious to really dig down into the details of a very simple program and now I'm going to go through them. All of this is for Go 1.11; other Go versions have had different behavior.

Our very simple program is going to do nothing except sit there so that we can examine its memory use:

package main
func main() {
    var i uint64
    for {
        i++
    }
}

(It turns out that we could use time.Sleep() to pause without dragging in extra complications, because it's actually handled directly in the runtime, despite it nominally being in the time package.)

This simple looking program already has a complicated runtime environment, with several system goroutines operating behind the scenes. It also has more memory use than you probably expect. Here's what its memory map looks like on my 64-bit Linux machine:

0000000000400000    316K r-x-- memdemo
000000000044f000    432K r---- memdemo
00000000004bb000     12K rw--- memdemo
00000000004be000    124K rw---   [ bss ]
000000c000000000  65536K rw---   [ anon ]
00007efdfc10c000  35264K rw---   [ anon ]
00007ffc088f1000    136K rw---   [ stack ]
00007ffc08933000     12K r----   [ vvar ]
00007ffc08936000      8K r-x--   [ vdso ]
ffffffffff600000      4K r-x--   [ vsyscall ]
 total           101844K

The vvar, vdso, and vsyscall mappings come from the Linux kernel; the '[ stack ]' mapping is the standard process stack created by the Linux kernel, and the first four mappings are all from the program itself (the actual compiled machine code, the read-only data, plain data, and then the zero'd data respectively). Go itself has allocated the two '[ anon ]' mappings in the middle, which are most of the program's memory use; we have one 64 MB mapping at 0x00c000000000 and one 34.4 MB mapping at 0x7efdfc10c000.

(The addresses for some of these mappings will vary from run to run.)

As described in Allocator Wrestling (see also), Go allocates heap memory (including the memory for goroutine stacks) in chunks of memory called spans that come from arenas. Arenas are 64 MB in size and are allocated at fixed locations; on 64-bit Linux, they start at 0x00c000000000. So this is our 64 MB mapping; it is the program's first arena, the only one necessary, which handles all normal Go memory allocation.

If we run our program under strace -e trace=%memory, we'll discover that the remaining mysterious mapping actually comes from a number of separate mmap() calls that the Linux kernel has merged together into one memory area. Here is the trace for our program:

mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfe33c000
mmap(0xc000000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc000000000
mmap(0xc000000000, 67108864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc000000000
mmap(NULL, 33554432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc33c000
mmap(NULL, 2162688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc12c000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc11c000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc10c000

So we have, in order, a 256 KB allocation, the 64 MB arena allocated at its fixed address, a 32 MB allocation, a slightly over 2 MB allocation, and two 64 KB allocations. Everything except the arena allocation is allocated at successively lower addresses next to each other and gets merged together into the single mapping starting at 0x7efdfc10c000. All of these allocations are internal allocations from the Go runtime, and I'm going to run down them in order.

The initial 256 KB allocation is for the first chunk of the Go runtime's area for persistent allocations. These are runtime things that will never be freed up and which can be (and are) allocated outside of the regular heap arenas. Various things are allocated in persistent allocations, and the persistent allocator mostly works in 256 KB chunks that it gets from the OS. Our first mmap() is thus the runtime starting to allocate from this area, which causes the allocator to get its first chunk from the OS. The memory for these persistent allocator chunks is mostly recorded in runtime.MemStats.OtherSys, although it's not the only thing that falls into that category and some persistent allocations are in different categories.

The 32 MB allocation immediately after our first arena is for the heap allocator's "L2" arena map. As the comments in runtime/malloc.go note, most 64-bit systems (including 64-bit Linux) have only a single large L2 arena map, which has to be allocated when the first arena is allocated. The next allocation, which is 2112 KB (2 MB plus 64 KB), turns out to be for the heapArena structure for our newly allocated arena. It has two fields; the .bitmap field is 2 MB in size, and the .spans field is 64 KB (8192 8-byte pointers). This explains the odd size requested.

(If I'm reading the code correctly, the L2 arena map isn't accounted for in any runtime.MemStats value; this may be a bug. The heapArena structure is accounted for in runtime.MemStats.GCSys.)

The final two 64 KB allocations are for the initial version of a data structure used to keep track of all spans (set up in recordspan()) and the allocation for a data structure (gcBits) that is used in garbage collection (set up in newArenaMayUnlock()). The span tracking structure is accounted for in runtime.MemStats.OtherSys, while the gcBits stuff is in runtime.MemStats.GCSys.

As your program uses more memory, I believe that in general you can expect more arenas to be allocated from the OS, and with each arena you'll also get another heapArena structure. I believe that the L2 arena map is only allocated once on 64-bit Unix. You will probably periodically have larger span data structures and more gcBits structures allocated, and you will definitely periodically have new 256 KB chunks allocated for persistent allocations.

(There are probably other sources of allocations from the OS in the Go runtime. Interested parties can search through the source code for calls to sysAlloc(), persistentalloc(), and so on. In the end everything apart from arenas comes from sysAlloc(), but there are often layers of indirection.)
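
If you want to compare these Go-level numbers against the OS-level ones yourself, here's a minimal sketch that dumps the relevant runtime.MemStats fields. All of these are real fields, but bear in mind that they don't map one to one onto the mmap() regions above:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    // Sys is the total obtained from the OS; the rest are parts of it.
    fmt.Printf("Sys:      %6d KB\n", m.Sys/1024)
    fmt.Printf("HeapSys:  %6d KB\n", m.HeapSys/1024)
    fmt.Printf("GCSys:    %6d KB\n", m.GCSys/1024)
    fmt.Printf("OtherSys: %6d KB\n", m.OtherSys/1024)
}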

PS: If you want to track down this sort of thing yourself, the easiest way to do it is to run your test program under gdb, set a breakpoint on runtime.sysAlloc, and then use where every time the breakpoint is hit. On most Unixes, this is the only low level runtime function that allocates floating anonymous memory with mmap(); you can see this in, for example, the Linux version of low level memory allocation.

GoProgramMemoryUse written at 21:42:04

Go basically never frees heap memory back to the operating system

Over on Reddit's r/golang, I ran into an interesting question about Go's memory use as part of this general memory question:

[...] However Go is not immediately freeing the memory, at least from htop's perspective.

What can I do to A) gain insight on when this memory will be made available to the OS, [...]

The usual question about memory usage in Go programs is when things will be garbage collected (which can be tricky). However, this person wants to know when Go will return free memory back to the operating system. This is a good question partly because programs often don't do very much of this (or really we should say the versions of malloc() that programs use don't do this), for various reasons. Somewhat to my surprise, it turns out that Go basically never returns memory address space to the OS, as of Go 1.11. In htop, you can expect normal Go programs to only ever stay the same size or grow, never to shrink.

(The qualification about Go 1.11 is important, because Go's memory handling changes over time. Back in 2014 or so, Go processes used a huge amount of virtual memory, but that's changed since then.)

The Go runtime itself initially allocates memory in relatively decent sized chunks of memory called 'spans', as discussed in the big comment at the start of runtime/malloc.go (and see also this and this (also)); spans are at least 8 KB, but may be larger. If a span has no objects allocated in it, it is an idle span; how many bytes are in idle spans is in runtime.MemStats.HeapIdle. If a span is idle for sufficiently long, the Go runtime 'releases' it back to the OS, although this doesn't mean what you think. Released spans are a subset of idle spans; when a span is released, it still counts as idle.

(In theory the number of bytes of idle spans released back to the operating system is runtime.MemStats.HeapReleased, but you probably want to read the comment about this in the source code of runtime/mstats.go.)

Counting released spans as idle sounds peculiar until you understand something important: Go doesn't actually give any memory address space back to the OS when a span is released. Instead, what Go does is to tell the OS that it doesn't need the contents of the span's memory pages any more and the OS can replace them with zero bytes at its whim. So 'released' here doesn't mean 'return the memory back to the OS', it means 'discard the contents of the memory'. The memory itself remains part of the process and counts as part of the process size (it may or may not count as part of the resident set size, depending on the OS), and Go can immediately use such a released idle span again if it wants to, just as it can a plain idle span.

(On Unix, releasing pages back to the OS consists of calling madvise() (Linux, FreeBSD) on them with either MADV_FREE or MADV_DONTNEED, depending on the specific Unix. On Windows, Go uses VirtualFree() with MEM_DECOMMIT. On versions of Linux with MADV_FREE, I'm not sure what happens to your RSS after doing it; some sources suggest that your RSS doesn't go down until the kernel starts actually reclaiming the pages from you, which may be some time later.)

As far as I can tell from inspecting the current runtime code, Go only very rarely returns memory that it has used back to the operating system by calling munmap() or the Windows equivalent. In particular, once Go has used memory for regular heap allocations it will never be returned to the OS even if Go has plenty of released idle memory that's been untouched for a very long time (as far as I can tell). As a result, the process virtual size that you see in tools like htop is basically a high water mark, and you can expect it to never go down. If you want to know how much memory your Go program is really using, you need to carefully look at the various bits and pieces in runtime.MemStats, perhaps exported through net/http/pprof.
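
If you want to watch this happen, here's a minimal sketch that allocates a chunk of heap memory, drops it, and then forces an immediate release with runtime/debug.FreeOSMemory(). HeapReleased goes up, but the process's virtual size as seen in htop stays where it was (the exact numbers will vary):

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

func report(label string) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("%s: HeapIdle %d KB, HeapReleased %d KB\n",
        label, m.HeapIdle/1024, m.HeapReleased/1024)
}

func main() {
    big := make([]byte, 100<<20) // allocate 100 MB of heap
    big[0] = 1                   // touch it so it's really allocated
    report("allocated")
    big = nil            // drop our only reference
    debug.FreeOSMemory() // force a GC plus an immediate release
    report("released")
}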

GoNoMemoryFreeing written at 00:04:48

2018-09-28

Addressable values in Go (and unaddressable ones too)

One of the tricky concepts in Go is 'addressable values', which show up in various places in the Go specification. To understand them better, let's pull all of the scattered references to them together in one place. The best place to start is with the specification of what they are, which is covered in Address operators:

For an operand x of type T, the address operation &x generates a pointer of type *T to x. The operand must be addressable, that is, either a variable, pointer indirection, or slice indexing operation; or a field selector of an addressable struct operand; or an array indexing operation of an addressable array. As an exception to the addressability requirement, x may also be a (possibly parenthesized) composite literal. [...]

There are a number of important things that are not addressable. For example, values in a map and the return values from function and method calls are not addressable. The following are all errors:

&m["key"]
&afunc()
&t.method()

The return value of a function only becomes addressable when put into a variable:

v := afunc()
&v

(Really, it is the variable that is addressable and we have merely used a short variable declaration to initialize it with the return value.)

Since field selection and array indexing require that their structure or array also be addressable, you also can't take the address of a field or an array element from a return value. The following are errors:

&afunc().field
&afunc()[0]

In both cases, you must save the return value in a variable first, just as you need to do if you want to use & on the entire return value. However, the general rule about pointer dereferencing means that this works if the function returns a pointer to a structure or an array. To make your life more confusing, the syntax in the caller is exactly the same, so whether or not '&afunc().field' is valid depends on whether afunc() returns a pointer to a structure or the structure itself.

(If you have code that does '&afunc().field', this means that you can get fun and hard to follow errors if someone decides to change afunc()'s return type from 'pointer to structure' to 'structure'. Current Go reports only 'cannot take the address of afunc().field', which fails to explain why.)
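
To see the difference concretely, here is a sketch (T, vfunc(), and pfunc() are hypothetical names):

type T struct{ field int }

func vfunc() T  { return T{} }
func pfunc() *T { return &T{} }

&pfunc().field   // legal: the returned pointer is dereferenced,
                 // and pointer indirection is addressable
&vfunc().field   // error: cannot take the address of vfunc().field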

Functions themselves are not addressable, so this is an error:

func ifunc() int { return 1 }
&ifunc

Functions point out a difference between Go and C, which is that Go has the concept of pointer-like things that are not pointers as the language sees them. You cannot apply & to functions, but you can assign them to variables; however, the resulting type is not formally a pointer:

v := ifunc
fmt.Printf("%T\n", v)

The type printed here is 'func() int', not a pointer type. Of course you can now take the address of v, at which point you have the type '*func() int'.

By itself, taking the address of things just creates pointers that you can dereference either explicitly with * or implicitly with selectors. The larger importance of addressable things in Go shows up where else they get used in the language.

The most important place that addressability shows up in the specification is in Assignments:

Each left-hand side operand must be addressable, a map index expression, or (for = assignments only), the blank identifier. [...]

(Map index expressions must be specially added here because they aren't addressable.)

In other words, addressability is the primary definition of what can be assigned to. It covers all forms of assignment, including assignment in for statements with range clauses and ++ and -- statements. This implies that if you widen the definition of addressability in Go, by default you widen what can be assigned to; it will include your newly widened stuff.

Because structure fields and array indexes require their containing thing to be addressable, you cannot directly assign to fields or array elements in structures or arrays returned by functions. Both of these are errors:

sfunc().field = ....
afunc()[0] = ....

However, because pointer indirection is addressable, a function that returns a pointer to a structure (instead of a structure) can be used this way:

spfunc().field = ...

This is syntactically legal but may not actually make sense, unless you have another reference to the underlying structure somewhere.

Because all slice index expressions are addressable, if you have a function that returns a slice, you can directly assign to a portion of the returned slice without capturing the slice in a variable:

slicefunc()[:5] = ...

If the slice and its backing array are newly generated by slicefunc() and are otherwise unreferenced, this will quietly discard everything afterward through garbage collection; the only point of your assignment is that it might panic in some situations.

Since map index expressions are a special exception to the addressability requirements, you can assign to an index in a map that's just been returned by a function:

mfunc()["key"] = ...

This has the same potential problem as with the slice-returning function above.

Next, slice expressions sometimes require addressability:

[...] If the sliced operand is an array, it must be addressable and the result of the slice operation is a slice with the same element type as the array. [...]

Taking slices of strings, existing slices, or pointers to arrays does not require that the value you are slicing be addressable; only actual arrays are special. This is the root cause of Dave Cheney's pop quiz that I recently wrote about, because it means that if a function returns an actual array, you cannot immediately slice its return value; you must assign the return value to a variable first in order to make it addressable.

Given that you can take slices of unaddressable slices, just not of unaddressable arrays, it seems pretty likely that this decision is a deliberate pragmatic one to avoid requiring Go to silently materialize heap storage for cases like arrays that are return values. If you do this through a variable, it is at least more explicit that you have a full return object sitting around and that the slice is not necessarily making most of it go away.

Finally, we have Method calls (at the bottom of Calls) and their twins Method values. Both will do one special thing with addressable values:

[...] If x is addressable and &x's method set contains m, x.m() is shorthand for (&x).m(): [...]

And:

As with method calls, a reference to a non-interface method with a pointer receiver using an addressable value will automatically take the address of that value: t.Mp is equivalent to (&t).Mp.

Because return values are not addressable by themselves, you cannot chain method calls from a non-pointer return value to a pointer receiver method. In other words, you can't do this:

// T.Mv() returns a T, not a *T
// and we have a *T.Mp() method
t.Mv().Mp()

This is equivalent to '(&t.Mv()).Mp()' and as we've seen, '&t.Mv()' isn't allowed in general. To make this work, you have to assign the return value to a variable x and then do x.Mp(); at that point, 'x' is addressable and so the implicit '&x' is valid.

The error message you currently get here for an unaddressable value is a little bit unusual, although it may change in the future to be clearer. If you have 'valfunc().Mp()', you get two error messages for this (reporting the same location):

[...]: cannot call pointer method on valfunc()
[...]: cannot take the address of valfunc()

(Gccgo 8.1.1 reports 'method requires a pointer receiver', which is basically the same thing but doesn't tell you why you aren't getting the automatic pointer receiver that you expected.)

Note that the method Mp() is not part of the method set of T; it's merely accessible and callable if you have an addressable value of T. This is different from the method sets of pointer receiver types, *T, which always include all the methods of their underlying value receiver type T.
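
A sketch of this in action (T and its method are hypothetical):

type T struct{ n int }

func (t *T) Mp() { t.n++ }

type Mper interface{ Mp() }

var x T
x.Mp()           // legal: x is addressable, so this is (&x).Mp()
var i Mper = &x  // legal: Mp is in the method set of *T
var j Mper = x   // compile error: Mp is not in the method set of T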

Intuitively, Go's requirement of addressability here is reasonable. Pointer receiver methods are generally used to mutate their receiver, but if Go called a method here, the mutation would be immediately lost because the actual value is only a temporary thing since it's not being captured by any variable.

(As usual, doing the work to write this entry has given me a much clearer understanding of this corner of Go, which is one reason I wrote it. Addressability uncovers some corners that I may actively run into someday, especially around how return values are not immediately usable for everything.)

PS: Given its role in determining what can be assigned to, addressability in Go is somewhat similar to lvalues in C. However, for various reasons C's definition of lvalues has to be much more complicated than Go's addressability. I expect that Go's simplicity here is deliberate and by design, since Go's creators were what you could call 'thoroughly familiar with C'.

Sidebar: The reflect package and addressability

Addressability is also used in the reflect package, where a number of operations that mirror Go language features also have requirements for addressability. The obvious example is Value.Addr, which mirrors &, and has addressability requirements explained in Value.CanAddr. Note that Value.CanAddr's requirements are stricter than those of &'s:

A value is addressable [only] if it is an element of a slice, an element of an addressable array, a field of an addressable struct, or the result of dereferencing a pointer.

In order to get an addressable reflect.Value outside of a slice, you must pass in a pointer and then dereference the pointer with Value.Elem:

var s struct { i int }
v1 := reflect.ValueOf(s)
v2 := reflect.ValueOf(&s)
v3 := v2.Elem()
fmt.Println(v1.CanAddr(), v2.CanAddr(), v3.CanAddr())

This prints 'false false true'. Only the pointer that has been dereferenced through the reflect package is addressable, although s itself is addressable in the language and v1 has the same reflect.Type as v3.

In thinking about this, I can see the package's logic here. By requiring you to start with a pointer, reflect ensures that what you're manipulating through it has an outside existence. When you call reflect.ValueOf(s), what you really get is a copy of s. If reflect allowed you to address and change that copy, you might well wind up confused about why s itself never changed or various other aspects of the situation.
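
As a sketch of the payoff when you do start from a pointer, the following modifies s through reflect (note the exported field I; reflect will refuse to Set() unexported fields like the i above):

var s struct{ I int }
v := reflect.ValueOf(&s).Elem()
v.FieldByName("I").SetInt(42)
fmt.Println(s.I)   // prints 42; the change is visible outside reflect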

(The rule of converting a non-pointer value to any interface is that Go makes a copy of the value and then the resulting interface value refers to the copy. You can see this demonstrated here.)

GoAddressableValues written at 23:07:59

2018-09-27

Learning about Go's unaddressable values and slicing

Dave Cheney recently posted another little Go pop quiz on Twitter, and as usual I learned something interesting from it. Let's start with his tweet:

#golang pop quiz: what does this program print?

package main
import (
    "crypto/sha1"
    "fmt"
)

func main() {
    input := []byte("Hello, playground")
    hash := sha1.Sum(input)[:5]
    fmt.Println(hash)
}

To my surprise, here is the answer:

./test.go:10:28: invalid operation sha1.Sum(input)[:5] (slice of unaddressable value)

There are three reasons why we are getting this error. To start with, the necessary enabling factor is that sha1.Sum() has an unusual return value. Most things that return some bytes return a slice, and this code would have worked with slices. But sha1.Sum() returns that odd beast, a fixed-size array ([20]byte, to be specific), and since Go is return-by-value, that means it really does return a 20 byte array to main(), not, say, a pointer to it.

That leaves us with the concept of unaddressable values, which are the opposite of addressable values. The careful technical version is in the Go specification in Address operators, but the hand waving summary version is that most anonymous values are not addressable (one big exception is composite literals). Here the return value of sha1.Sum() is anonymous, because we're immediately slicing it. Had we stored it in a variable and thus made it non-anonymous, the code would have worked:

    tmp := sha1.Sum(input)
    hash := tmp[:5]

The final piece of the puzzle is why slicing was an error. That's because slicing an array specifically requires that the array be addressable (this is covered at the end of Slice expressions). The anonymous array that is the return value of sha1.Sum() is not addressable, so slicing it is rejected by the compiler.

(Storing the return value into our tmp variable makes it addressable. Well, it makes tmp and the value in it addressable; the return value from sha1.Sum() sort of evaporates after it's copied into tmp.)

I don't know why the designers of Go decided to put this restriction on what values are addressable, although I can imagine various reasons. For instance, allowing the slicing operation here would require Go to silently materialize heap storage to hold sha1.Sum()'s return value (and then copy the value to it), which would then live on for however long the slice did.

(Since Go returns all values on the stack, as described in "The Go low-level calling convention on x86-64", this would require a literal copy of the data. This is not a big deal for the 20-byte result of sha1.Sum(); I'm pretty sure that people routinely return and copy bigger structures.)

PS: A number of things through the Go language specification require or only operate on addressable things. For example, assignment mostly requires addressability.

Sidebar: Method calls and addressability

Suppose you have a type T and also some methods defined on *T, eg *T.Op(). Much like Go allows you to do field references without dereferencing pointers, it allows you to call the pointer methods on a non-pointer value:

var x T
x.Op()

Here Go makes this shorthand for the obvious '(&x).Op()' (this is covered in Calls, at the bottom). However, because this shortcut requires taking the address of something, it requires addressability. So, what you can't do is this:

// afunc() returns a T
afunc().Op()

// But this works:
var x T = afunc()
x.Op()

I think I've seen people discuss this Go quirk of method calls, but at the time I didn't fully understand what was going on and what exactly made a method call not work because of it.

(Note that this shorthand conversion is fundamentally different from how *T has all of the methods of T, which has come up before.)

GoUnaddressableSlice written at 00:36:04

2018-09-21

Your databases always have a schema

Over on Mastodon, I had a database opinion:

Obvious database hot take: your stored data always has a schema, unless your code neither reads nor writes it as anything except an opaque blob. What NoSQL changes is how many places that schema exists and how easy it is to have multiple schemas for your data.

(SQL allows multiple schemas too. You just change what fields mean in your code, reuse them, extend them, etc etc.)

One schema that your data has is implicit in what data fields your code reads and writes, what it puts in them (both at the mechanical level of data types and at a higher level), and its tacit knowledge of how those fields relate to each other. If you're actually doing anything with your data, this schema necessarily exists. Of course, it may be a partial schema; your stored data may include fields that are neither read nor written by your code, and even for fields that you read and write, your current code may not reflect the full requirements and restrictions that you have in mind for the data.
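
To make this concrete, here's a minimal Go sketch of such an implicit schema (the struct, its fields, and its tags are entirely hypothetical). Any code that decodes stored documents into this struct is imposing a schema on them, whether or not the datastore knows or cares:

type User struct {
    Name    string   `json:"name"`
    Email   string   `json:"email"`
    Created int64    `json:"created"` // a Unix timestamp, purely by convention
    Tags    []string `json:"tags,omitempty"`
}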

SQL databases make part of your schema explicit and enforced in the database, but it is only the part that can be sensibly described and checked there. Generally there are plenty of constraints on your data that are not checked at the SQL level for various reasons (and they may not even be checked in your code). As a result, you can bypass the nominal (SQL) schema of your database by reusing and repurposing database fields in ways that SQL doesn't check for or enforce. This in-SQL schema is in addition to the implicit schema that's in your code.

(You can tell that your code's implicit schema exists even with an SQL database, even if your code auto-reads the SQL schemas, by asking what happens if the DBAs decide to rename a bunch of tables and a bunch of fields in those tables, maybe dropping some and adding others. It is extremely likely that the entire program will explode until someone fixes the code to match the new database arrangement. In other words, you have two copies of your schema, one in the database and one in your code, and those copies had better agree.)

Since your schema lives partly in your code, different pieces of code can have different schemas for the same data. Given that you can bypass the SQL schema, this is true whether or not you're using a NoSQL database with no schema; NoSQL just makes it easier and perhaps more likely to happen. In some ways NoSQL is more honest than SQL, because it tells you straight up that it's entirely up to your code to have and maintain a schema. Certainly in NoSQL your code is the only place with a schema, and so you have a chance to definitely only have one schema for your data instead of two.

On the other hand, one advantage of SQL is that you have a central point that explicitly documents and enforces at least some of your schema. You don't have to try to reverse engineer even the basics of your schema out of your code, and you know that there is at least basic agreement about data facts on ground (for example, what tables and what fields there are, what basic types can go in those fields, and perhaps what relationships there definitely are between various fields via constraints and foreign keys).

(I've been thinking this thought for some time and was pushed over the edge today by reading yet another article about how SQL databases were better than NoSQL ones partly because they mean you have a schema for your data. As mentioned, I think that there are advantages to having your schema represented in your database, but it is absolutely not the case that NoSQL has no schema or that SQL means you only have one schema.)

DatabasesAlwaysSchemas written at 20:58:49

2018-09-13

I don't like getters and setters and prefer direct field access

One of the great language divides and system design debates is between direct access to fields in objects and doing things through getter and setter functions. In this debate, so far I come down firmly on the side of direct field access. I have at least two reasons for this.

The obvious reason to dislike getters and setters is that they're bureaucracy and litter. We've all seen codebases that have a whole collection of tiny methods whose only purpose is to get or set a field, and they only exist because someone said they had to (sometimes this is the language, sometimes this is someone's coding standard). Using these methods is annoying, writing these methods is annoying (and I consider that IDEs can automate this to be a danger sign), and just having them around cluttering up the code is annoying. Direct field access wipes all of this extra clutter away.

The more subtle reason I don't like getters and setters is that they obscure what's actually going on. A getter could be a simple field access that will be inlined into your code in many situations (in some languages), or it could be a whole series of expensive operations that talk to a database and so on. The difference between these two extremes matters quite a bit when you're actually writing code that uses the getter, so one way or another you have to know. With direct field access, generally what you see is what you get; direct field access is just that, and method calls are for things that clearly involve some computation (and perhaps a lot of it, you'll want to consult the documentation).

This straightforward honesty matters, because part of the purpose of an 'API' boundary is to communicate to other people (perhaps you could argue that this is its entire purpose). Getters and setters are mumbling; direct field access is speaking clearly (at least in theory).

At this point it's traditional to raise the possibility that you'll need to change the internal implementation of your objects (or structures) without changing the API that other people use. Using getters and setters theoretically allows you this freedom, while direct field access doesn't. One part of my views here is that performance is part of your API, at least as far as slowing things down goes. Because performance is part of your API, in practice how much you can change your getters and setters is already fairly constrained. If you turn a fast getter or setter into a slow one, people will be unhappy with you and they will probably have to change their code.

All of this gives me certain biases in larger language design issues. For one relevant example, I rather wish that Go interfaces included fields, not just methods (although I sort of see why they don't).

Sidebar: The downside of fields in Go interfaces

If you allow interfaces to have fields, not just methods, anything implementing such an interface must have those fields. In other words, such interfaces can only be implemented by types that are structs (or pointers to structs, which is the more likely case). Method-only interfaces have the advantage that they can be implemented by any type, including weird types such as functions or the empty struct.
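
As a sketch of that flexibility, here's how both a function type and the empty struct can satisfy a method-only interface (all names here are made up):

type Sizer interface {
    Size() int
}

// A function type can carry methods, so it can satisfy Sizer.
type SizeFunc func() int

func (f SizeFunc) Size() int { return f() }

// So can the empty struct, which has no fields at all.
type Nothing struct{}

func (Nothing) Size() int { return 0 }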

GettersSettersDislike written at 01:03:44

2018-09-07

My view of the current state of Go's dependency management (as of Go 1.11)

Over on Mastodon, someone I follow was curious about the state of Go's dependency management and I had some opinions. Rather than just leave them on Mastodon, I'm going to also put them here with a bunch of additional commentary and clarifications of my toot-forced terseness.

I started with a rundown of what I see as the current positives (+) and negatives (-):

I would say Go is at about the 50% level, depending on what you want from dependency management:

+ you can lock & build with specific versions of dependencies (whether or not they've bought into versioning)
+ some packages are publishing a go.mod now (the Go version of Cargo.toml)

- lots of packages with no go.mod yet
- no crates.io equivalent
- can't just 'go get' a program with locked versions
- some other rough edges still
- the whole thing is officially still 'preliminary'.

(I mentioned Cargo.toml and crates.io because the person I was responding to is familiar with Rust.)

If what you want is 'I can make reproducible, stable versions of my own software', then I think dependency management is basically there, but you're going to be seeing a lot of git hashes as 'version numbers'. The tooling is mostly there, but the strong ecology around it has not yet formed and probably won't be for a while (Go 1.11 itself will take time to propagate around the ecology, and that's the starting point).

You can use Go 1.11's module versioning today to get stable and reproducible builds for your own packages, with controllable and easily reversible upgrades of your dependencies. However your dependencies are generally not yet versioned themselves, so what you're really doing is locking yourself to specific VCS commit identifiers (usually git hashes, because most Go packages seem to use git). With no version numbers (semantic or otherwise), upgrading dependencies is sort of a shot in the dark and you never really know what you're getting unless you actively look at the dependency in question.
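
For concreteness, a go.mod in this world currently tends to look something like the following sketch. The module paths here are made up; the v0.0.0-<timestamp>-<commit> forms are the 'pseudo-versions' that Go generates for dependencies with no semantic version tags:

module example.com/you/yourtool

require (
    github.com/someone/somepkg v0.0.0-20180910183358-96c817f6f29f
    github.com/another/pkg v1.2.3
)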

Some Go packages have started to turn themselves into modules by publishing a go.mod file. However, this is not yet very common. At the moment I have 170 separate repos under $HOME/go/src ('go list' expands this to 900 packages, but many of those are internal), and only 14 of them (from 8 different authors) have go.mod files. With so little use of go.mod so far, module versioning is mostly about freezing your build and not very much about automatically giving you a 'known to work' set of dependent package versions. If you want to use package A and package A uses package B, you're mostly going to be hoping that the current git version of package A works with the current git version of package B (and then freezing those git commits). This is in contrast to a fully developed dependency management ecology (such as you have in Rust), where package A would tell you what version of package B it needed.

What I meant about go-get'ing programs is that currently there's no way to run 'go get <program>' and have this respect the program's go.mod, if it has one (as I sort of mentioned in this entry). Many people won't care about this, but if you want to distribute a program and have people build it with the specific dependent package versions you've set in your go.mod, well, it's possible right now but you have to tell people to clone your repo and then run 'go build' inside it, instead of the more convenient 'go get <myprogram>'. This is a known issue but it's not likely to be fixed before Go 1.12.

Crates.io is the central registry for the Rust package ecosystem. Go has no equivalent; the closest for finding packages is probably godoc.org or Go-Search, but then you get to pull the packages from wherever they live right now. If you want convenient one-stop shopping for packages that do <X> to use in your own code, Go doesn't have a good story for that right now. Instead you're left to do your own research and to try to sort through the various markers of quality and usage that you can find on godoc.org and Go-Search. Often this means that there is no clear, obvious, easily found community consensus choice for doing a particular job; instead information on 'you want to use <X> to do <Y>' spreads through casual conversation, blog posts, and superstition.

(An extreme example of this is getopt replacements; everyone has their own favorite.)

All of this is why I say that Go dependency management is at about the 50% level. The raw mechanics are mostly present, with some limitations, but the ecology is not, both at the low level of having versioned packages and packages with go.mods, and at the high level of having a coherent, organized package ecosystem instead of the current ad-hoc clutter. Go is certainly not at the level of Rust, where you can go to crates.io to search for a readline library and wind up on rustyline in short order, with a whole bunch of useful information in one spot.

None of this is particularly surprising. Go 1.11 itself is only a few weeks old, so it would be a little bit startling if there was already a wholesale adoption of go.mod, package versioning, and so on across the Go package world. Many people probably haven't even upgraded to Go 1.11 itself yet. Give Go's module versioning a year or two and then we can see where we are.

(The issues around a central registry are complicated, and I'm tempted to say that you probably need to build this into your cultural environment almost from the start. I'm thus not sure Go will ever be able to grow one, at least not without a fair amount of trauma and community drama.)

Go1.11VersioningViews written at 23:39:19

2018-09-04

Some views on the Go 2 Error Inspection early draft proposal

The big news in Go lately has been the announcement of Go 2 Draft Designs for improved error handling, error values, and generics. The details of the proposed improvements to error values have been split into two parts, error printing and error inspection. Today I have some views on the error inspection draft (and you should also read the error values problem overview). Broadly, I like the proposal but I feel that it doesn't go far enough and the result of not doing so is going to be inconvenience and people reinventing the wheel repeatedly. So here are some rambling thoughts on the whole thing.

The error inspection draft proposes a standard interface for unwrapping nested errors and then some standard functions to conveniently do things with (wrapped) errors, errors.Is(), which checks to see if a specific error is somewhere in your wrapped error, and errors.As(), which tries to recover a specific type of error from your wrapped error. A standard way of wrapping errors and then checking through the error chain deals with my issue where Go's net package has undocumented wrapped errors that are a pain to deal with. In the new world, if I want to see if an error is ultimately because of an EHOSTUNREACH from some system call, I would write something like:

if errors.Is(err, syscall.EHOSTUNREACH) {
   ....
}

I like this code. It's short and it directly expresses what I'm checking. Also, this automatically works with any level of error nesting; I can be dealing with a *net.OpError that has been wrapped by my own code for better reporting of the details, and it all just works.

As a side note, in a world with automatic error unwrapping via errors.Is() and errors.As(), it becomes even more important to actively and explicitly document what errors can be wrapped inside your errors. People will actively want to know what error types there's any point to look for, and making them dump your errors or read your source code to find out is just annoying them. This should lead to significantly more documentation on this in the Go standard library.

In a world with errors.Is(), it will be extremely annoying if people needlessly erase errors by turning them into text strings by running them through the current fmt.Errorf(). As a result, I believe strongly that there should be a standard interface for doing the equivalent of fmt.Errorf() for wrapping errors in a way that is transparent to errors.Is(). I am neutral on the overview's open question of whether this should be a clever version of fmt.Errorf(), but certainly going that way would get fast adoption.

Given that errors.Is() makes checking sentinel values one of the easiest things you can do with errors (especially wrapped errors), I expect to see a significant push towards more error sentinel values. Today, I see a reasonable amount of code that doesn't have constant sentinel values and always makes errors with fmt.Errorf(); in a future world with errors.Is(), I expect there to be a lot more sentinel error values, even if they're immediately wrapped up in another layer of errors that provide those custom messages.

Now let's talk about where I feel this proposal doesn't go far enough and as a result leaves people to reinvent wheels. The first annoyance comes in if you have multiple sentinel errors that you want to check against. For example, my check of EHOSTUNREACH is actually incomplete; I should really check ENETUNREACH and ETIMEDOUT as well, and perhaps some others. The current proposal has no API for this, and as a result I expect that we'll see people repeatedly writing their own version of a multi-error check. I believe that there should be a standard API for this; the natural interface is a varargs version of errors.Is(), looking like this:

if errors.IsOneOf(err, syscall.EHOSTUNREACH, syscall.ENETUNREACH, syscall.ETIMEDOUT) {
   ....
}
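
The wheel that people will otherwise reinvent is small but ubiquitous. A minimal sketch of such a helper on top of the draft's errors.Is() (IsOneOf is my hypothetical name, not anything from the proposal):

func IsOneOf(err error, targets ...error) bool {
    for _, target := range targets {
        if errors.Is(err, target) {
            return true
        }
    }
    return false
}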

All of that is all very well and good for checking sentinel values, but what if you want to check that you have a network timeout error? In the current proposal, you wind up writing something like this:

if ne, ok := errors.As(*net.OpError)(err); ok && ne.Temporary() {
   ....
}

We're going through all of this work simply to see if this is a temporary network error, which is a simple 'is this a ...' query. In a world where errors themselves could answer 'Is()' queries, it would make sense for the net package to have a net.TemporaryError sentinel value and then to simply write this check as the much more natural:

if errors.Is(err, net.TemporaryError) {
   ....
}

I expect that pretty much every boolean 'is this a ....' method call on errors today would likely be more ergonomic if it was expressed as a sentinel value this way.

Under the hood, no net error would ever actually be the net.TemporaryError sentinel, because they have to wrap other errors. Instead they would have an Is() check method that would say that they matched net.TemporaryError if .Temporary() was true.
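
The draft leaves the exact mechanism open, but a sketch of how such an error type might answer Is() queries could look like this. All of the names here are hypothetical, and errors.Is() and the Unwrap() method are from the draft design, not from Go 1.11:

var ErrTemporary = errors.New("temporary error") // hypothetical sentinel

type opError struct {
    err  error
    temp bool
}

func (e *opError) Error() string   { return "op: " + e.err.Error() }
func (e *opError) Unwrap() error   { return e.err }
func (e *opError) Temporary() bool { return e.temp }

// The speculative hook: match the sentinel whenever Temporary() is true.
func (e *opError) Is(target error) bool {
    return target == ErrTemporary && e.Temporary()
}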

This doesn't make errors.As() unneeded, because there are things you can want to recover from specific types of errors other than boolean answers to questions. One case, drawn from my prior experience, is knowing what system call failed:

oe, ok := errors.As(*os.SyscallError)(err)
if ok && oe.Syscall == "connect" {
   ....
}

You would also want to use errors.As() or something like it if you had an interface of your own that you wanted to check against. For example:

type temperror interface {
    Temporary() bool
}

if te, ok := errors.As(temperror)(err); ok && te.Temporary() {
   ....
}

Here we're asking 'is it a temporary error in general' instead of our earlier question of 'is it a temporary network error'. There are a number of error classes even in the standard library that have this interface (including syscall.Errno itself), so you might well want to match a wrapped error this way.

Now, this example pushes somewhat against my errors.Is(err, net.TemporaryError) example, because you can't generalize a sentinel value the way you can a method call (which you can turn into an interface, as here). If Go people specifically want to preserve this general style of error method interface, then it probably makes sense to avoid generalizing sentinels. Once general sentinels are available, people may be tempted to implement 'is this a temporary error' and the like purely through sentinels, which would make the error method interface less and less useful.

Perhaps the Go standard library should document some common error method patterns, such as .Temporary() and .Timeout(), partly to encourage people to implement them more often. In a world with errors.As(), where it is much easier to check if something in a wrapped stack of errors thinks the error is temporary or whatever, we might see people doing this much more often than I think they do today.

PS: Whatever exactly happens with Go error inspection, I think it's going to cause a real change in how people both create and deal with Go errors in future code. People are highly motivated to do whatever is easiest, so we can expect them to steadily adopt whatever approach is easiest to use with error inspection. This holds true regardless of what you want them to do with the new way. I also suspect that we aren't going to like some of the hacks that people come up with to bend errors.Is() and so on to what they want to do.

Go2ErrorInspectionViews written at 23:45:00

2018-08-29

How I recently used vendoring in Go

Go 1.11 comes with experimental support for modules, which are more or less Russ Cox's 'vgo' proposal. Initial versions of this proposal were strongly against Go's current feature of vendor directories and wanted to completely replace them. Later versions seem to have toned that down, but my impression is that the Go people still don't like vendoring very much. I've written before about my sysadmin's perspective on vendoring and vgo, where I wanted something that encapsulated all of the build dependencies in a single directory tree that I could treat as a self-contained artifact and that didn't require additional daemons or complicated configuration to use. However, I recently used vendoring for another case, one where I don't think Go's current module support would have worked as well.

For reasons beyond the scope of this entry, I wanted a program that counted up how many files (well, inodes) were used in a directory hierarchy, broken down by the sub-hierarchy; this is essentially the file count version of my standard du based space usage breakdown (where I now use this more convenient version). Since this basically has to manipulate strings in a big map, writing it in Go was a natural decision. Go has filepath.Walk in the standard library, but for extra speed and efficiency I turned to godirwalk. Everything worked great right up until I tried to cross-compile my new program for Solaris (okay, Illumos, but it's the same thing as far as Go is concerned) so I could run it on our fileservers. That's when I found out that godirwalk doesn't support Solaris, ultimately for the reason that Solaris doesn't support file type information in directory entries.

I was able to hack around this with some effort, but the result is a private, modified version of godirwalk that's only ever going to be used by my dircount program, and then only for as long as I care about running dircount on OmniOS (when I stop caring about that, dircount can use the official version). I definitely don't want this to be the apparent official version of godirwalk in my $GOPATH/src hierarchy, and this is not really something that Go modules can solve easily. Traditional Go vendoring solves it neatly and directly; I just put my hacked up version of godirwalk in vendor/, where it will be automatically used by dircount and not touched by anything else (well, provided that I build dircount in my $GOPATH). When or if I don't want to build with my hacked godirwalk, I can rename the vendor directory temporarily and run 'go build'.
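
Concretely, the resulting tree looks something like this sketch (assuming godirwalk's usual import path of github.com/karrick/godirwalk):

$GOPATH/src/.../dircount/
    dircount.go
    vendor/
        github.com/karrick/godirwalk/
            ... my locally modified copy ...

When dircount is built inside $GOPATH, everything under vendor/ is used in preference to whatever version of godirwalk is in the main $GOPATH/src hierarchy.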

(According to the current documentation, the closest I could come with modules is to replace the official godirwalk with my own version that I would have to set up in some tree somewhere. This replacement would be permanent until I edited go.mod; I couldn't switch back and forth easily.)

This isn't a use I'd initially thought of for vendoring, but in retrospect it's an obvious one. Vendoring makes convenient private copies of packages; normally you use this to freeze package versions, but you can just as well use this to apply and freeze your own modifications. Probably I'll run into other cases of this in the future.

(I will elide a discussion of whether this sort of local change to upstream packages is a good idea or whether you should really rename them into a new package name (and thus Go modules are forcing you to do the right thing).)

GoVendoringUsage written at 00:16:31

