2015-09-07
Getting gocode based autocompletion working for Go in GNU Emacs
The existing guides and documentation for this are terrible and
incomplete for someone who is not already experienced with GNU Emacs
Lisp packages (which describes me), so here is what worked for me. I'm
going to assume that your $GOPATH is $HOME/go and that $GOPATH/bin
is on your $PATH.
- get gocode itself:
go get github.com/nsf/gocode
To work in GNU Emacs, gocode needs an auto-completion package; it
recommends auto-complete, so that's what I decided to use. If you
have that already you're done, but I didn't. At this point you might
be tempted to go to the auto-complete website and try to follow
directions from there, but you actually don't want to do this because
there's an easier way to install it.
- The easiest way to install auto-complete and its prerequisites
is through MELPA, which is an additional package
repo for Emacs Lisp packages on top of the default ELPA. To enable
MELPA, you need to add a stanza to your .emacs following its getting
started guide, generally:

(require 'package)
(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/"))
(package-initialize)

- make sure you have a $HOME/.emacs.d directory. You probably do.
- (re)start Emacs and run M-x list-packages. Navigate to
auto-complete and get it installed. If you're running GNU Emacs in X,
you can just click on its name and then on the [Install] button; if
you're running in a terminal window, navigating to each thing and
then hitting Return on it does the same. This will install
auto-complete and its prerequisite package popup, the latter of which
is not mentioned on the auto-complete site.

It's possible to install auto-complete manually, directly from the
site or more accurately from the github repo's release page. Do it
from MELPA instead; it's easier and less annoying. If you install
manually you'll have to use MELPA to install popup itself.

- Set up the .emacs stanza for gocode:

(add-to-list 'load-path "~/go/src/github.com/nsf/gocode/emacs")
(require 'go-autocomplete)
(require 'auto-complete-config)
(ac-config-default)
This deliberately uses the go-autocomplete.el from
gocode's Go package (and uses it in place), instead of one you might get through eg MELPA. I like this because it means that if (and when) I update gocode, I automatically get the correct and latest version of its Emacs Lisp as well.
Restarting GNU Emacs should then get you autocompletion when writing Go code. You may or may not like how it works and want to keep it; I haven't made up my mind yet. Its usage in X appears to be pretty intuitive but I haven't fully sorted out how it works in text mode (the major way seems to be hitting TAB to cycle through possible auto-completions it offers you).
(Plenty of people seem to like it, though, and I decided I wanted to play with the feature since I've never used a smart IDE-like environment before.)
See also Package management in Emacs: The Good, the Bad, and the Ugly and Emacs: How to Install Packages Using ELPA, MELPA, Marmalade. There are probably other resources too; my Emacs inexperience is showing here.
(As usual, I've written this because if I ever need it again I'll hate myself for not having written it down, especially since the directions here are the result of a whole bunch of missteps and earlier inferior attempts. The whole messy situation led to a Twitter rant.)
2015-09-03
How I've decided to coordinate multiple git repos for a single project
I'm increasingly using git for my own projects (partly because I
keep putting them on Github),
and this has brought up a problem. On the one hand, I like linear
VCS histories (even if they're lies); I don't plan on having branches
be visible in the history of my own repos unless it's clearly
necessary. On the other hand, I routinely have multiple copies of
my repos spread across multiple machines. In theory I always keep
all repos synchronized with each other before I start working in
one and make commits. In practice, well, not necessarily, and the
moment I screw that up a straightforward git pull/push workflow
to propagate changes around creates merges.
My current solution goes like this. First, I elect one repo as the
primary repo; this is the repo which I use to push changes to Github,
for example. To avoid merge commits ever appearing in it, I set it
to only allow fast-forward merges when I do 'git pull', with:
git config pull.ff only
This ensures that if the primary repo and a secondary repo wind up with different changes, a pull from the secondary into the primary will fail instead of throwing me into creating a merge commit that I don't want. To avoid creating merge commits when I pull the primary into secondaries, all other repos are set to rebase on pulls following my standard recipe. This is exactly what I want; if I pull new changes from the primary into a secondary, any changes in the secondary are rebased on top of the primary's stuff and linear history is preserved. I can then turn around and pull the secondary's additional changes back into the primary as a fast-forward.
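(For the record, the 'standard recipe' on the secondary repos boils down to the pull rebasing settings I cover in the 2015-07-03 entry later on this page:

git config pull.rebase true
git config rebase.stat true

)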
If I use 'git push' to move commits from one repo to another I'm
already safe by default, because git push normally refuses to do
anything except fast-forward updates of the remote. If it complains,
the secondary repo involved needs a rebase. I can either do the
rebase with 'git pull' in the secondary repo, or in the primary
repo I can push to the remote tracking branch in the secondary with
'git push <machine>:<directory> master:origin/master' and then
do a 'git rebase' on the secondary.
(Using a push from the primary usually means that my ssh activity flows the right way. And if I'm pushing frequently I should configure a remote for the secondary or something. I'm not quite hep on git repo remotes and remote tracking branches just yet, though, so that's going to take a bit of fumbling around when I get to it.)
2015-08-31
CGo's Go string functions explained
As plenty of its documentation will tell you, cgo provides four functions to convert between Go and C types by making copies of the data. They are tersely explained in the CGo documentation; too tersely, in my opinion, because the documentation only covers certain things by implication and omits two very important glaring cautions. Because I made some mistakes here I'm going to write out a longer explanation.
The four functions are:
func C.CString(string) *C.char
func C.GoString(*C.char) string
func C.GoStringN(*C.char, C.int) string
func C.GoBytes(unsafe.Pointer, C.int) []byte
C.CString() is the equivalent of C's strdup() and copies your
Go string to a C char * that you can pass to C functions, just as
documented. The one annoying thing is that because of how Go and CGo
types are defined, calling C.free will require a cast:
cs := C.CString("a string")
C.free(unsafe.Pointer(cs))
Note that Go strings may contain embedded 0 bytes and C strings may not. If your Go string contains one and you call C.CString(), C code will see your string truncated at that 0 byte. This is often not a concern, but sometimes text isn't guaranteed to be free of null bytes.
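As a concrete illustration of both the free() cast and the embedded 0 byte issue, here is a minimal standalone sketch (the program is my own example, not code from anything discussed here):

package main

/*
#include <stdlib.h>
#include <string.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

func main() {
	s := "go\x00hidden" // a Go string with an embedded 0 byte
	cs := C.CString(s)  // copies s and appends a terminating 0 byte
	defer C.free(unsafe.Pointer(cs))

	// C only sees the part of the string before the embedded 0 byte.
	fmt.Println(C.strlen(cs))   // 2, not len(s)
	fmt.Println(C.GoString(cs)) // "go"
}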
C.GoString() is also the equivalent of strdup(), but for going
the other way, from C strings to Go strings. You use it on struct
fields and other things that are declared as C char *'s, aka
*C.char in Go, and (as we'll see) pretty much nothing else.
C.GoStringN() is the equivalent of C's memmove(), not of any
normal C string function. It copies the entire length of the C
buffer into a Go string, and it pays no attention to null bytes.
More exactly, it copies them too. If you have a struct field that
is declared as, say, 'char field[64]' and you call C.GoStringN(&field,
64), the Go string you get will always be 64 characters long and
will probably have a bunch of 0 bytes at the end.
(In my opinion this is a bug in cgo's documentation. It claims that GoStringN takes a C string as the argument, but it manifestly does not, as C strings are null-terminated and GoStringN does not stop at null bytes.)
C.GoBytes() is a version of C.GoStringN() that returns a []byte
instead of a string. Since it doesn't claim to be taking a C string
as the argument, it's clearer that it is simply a memory copy of
the entire buffer.
If you are copying something that is not actually a null terminated
C string but is instead a memory buffer with a size, C.GoStringN()
is exactly what you want; it avoids the traditional C problem of
dealing with 'strings' that aren't actually C strings. However, none of these functions are what
you want if you are dealing with size-limited C strings in the
form of struct fields declared as 'char field[N]'.
The traditional semantics of a fixed size string field in structs,
fields that are declared as 'char field[N]' and described as
holding a string, is that the string is null terminated if and only
if there is room, ie if the string is at most N-1 characters long.
If the string is exactly N characters long, it is not null terminated.
This is a fruitful source of bugs even in C code
and is not a good API, but it is an API that we are generally stuck
with. Any time you see such a field and the documentation does not
expressly tell you that the field contents are always null terminated,
you have to assume that you have this sort of API.
Neither C.GoString() nor C.GoStringN() deal correctly with these
fields. Using GoStringN() is the less wrong option; it will merely
leave you with N-byte Go strings with plenty of trailing 0 bytes
(which you may not notice for some time if you usually just print
those fields out; yes, I've done this). Using the tempting GoString()
is actively dangerous, because it internally does a strlen() on
the argument; if the field lacks a terminating null byte, the
strlen() will run away into memory beyond it. If you're lucky you
will just wind up with some amount of trailing garbage in your Go
string. If you're unlucky, your Go program will take a segmentation
fault as strlen() hits unmapped memory.
(In general, trailing garbage in strings is the traditional sign that you have an unterminated C string somewhere.)
What you actually want is the Go equivalent of C's strndup(),
which guarantees to copy no more than N bytes of memory but will
stop before then if it finds a null byte. Here is my version of it,
with no guarantees:
func strndup(cs *C.char, len int) string {
	// Copy the full len bytes, then look for an embedded 0 byte.
	s := C.GoStringN(cs, C.int(len))
	i := strings.IndexByte(s, 0)
	if i == -1 {
		return s
	}
	// There is a 0 byte within the first len bytes, so GoString()'s
	// internal strlen() is safe here and re-copies only the bytes
	// before that 0 byte.
	return C.GoString(cs)
}
This code does some extra work in order to minimize extra memory
usage due to how Go strings can hold memory.
You may want to take the alternate approach of returning a slice
of the GoStringN() string. Really sophisticated code might decide
which of the two options to use based on the difference between i
and len.
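For contrast, here is a sketch of the slice-returning variant mentioned above (the function name is mine, and like strndup() it assumes the strings package is imported):

func strndupSlice(cs *C.char, len int) string {
	s := C.GoStringN(cs, C.int(len))
	if i := strings.IndexByte(s, 0); i != -1 {
		// Simpler, but the result shares the full len-byte backing
		// array from GoStringN(); the trailing bytes are only hidden.
		return s[:i]
	}
	return s
}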
Update: Ian Lance Taylor showed me the better version:
func strndup(cs *C.char, len int) string {
return C.GoStringN(cs, C.int(C.strnlen(cs, C.size_t(len))))
}
Yes, that's a lot of casts. That's the combination of Go and CGo typing for you.
Turning (well, copying) blobs of memory into Go structures
As before, suppose (not entirely hypothetically) that you're writing a
package to connect Go up to something that will provide it with
blobs of memory that are actually C structs; these might be mmap()'d
files, information from a library, or whatever. Once you have a
compatible Go struct, you still have to
get the data from a C struct (or raw memory) to the Go struct.
One way to do this is to manually write your own struct copy function
that does it field by field (eg 'io.Field = ks_io.field' for
each field). As with defining the Go structs by hand, this is tedious
and potentially error prone. You can do it and you'll probably have
to if the C struct contains unions or other hard to deal with things,
but we'd like an easier approach. Fortunately there are two good
ones for two different cases. In both cases we will wind up copying
the C struct or the raw memory to a Go struct variable that is an
exact equivalent of the C struct (or at
least we hope it is).
The easy case is when we're dealing with a fixed struct that we
have a known Go type for. Assuming that we have a C void * pointer
to the original memory area called ks.ks_data, we can adopt the
C programmer approach and write:
var io IO
io = *((*IO)(ks.ks_data))
return &io
This casts ks.ks_data to a pointer to an IO struct and then
dereferences it to copy the struct itself into the Go variable we
made for this. Depending on the C type of ks_data, you may need
to use the hammer of unsafe.Pointer() here:
io = *((*IO)(unsafe.Pointer(ks.ks_data)))
At this point, some people will be tempted to skip the copying and
just return the 'casted-to-*IO' ks.ks_data pointer. You don't
want to do this, because if you return a Go pointer to C data,
you're coupling Go and C memory management lifetimes. The C
memory must not be freed or reused for something else for as long
as Go retains at least one pointer to it, and there is no way for
you to find out when the last Go reference goes away so that you
can free the C memory. It's much simpler to treat 'C memory' as
completely disjoint from 'Go memory'; any time you want to move
some information across the boundary, you must copy it. With copying
we know we can free ks.ks_data safely the moment the copy is
done and the Go runtime will handle the lifetime of the io variable
for us.
The more difficult case is when we don't know what structs we're
dealing with; we're providing the access package, but it's the
callers who actually know what the structs are. This situation might
come up in a package for accessing kernel stats, where drivers or
other kernel systems can export custom stats structs. Our access
package can provide specific support for known structs, but we
need an escape hatch for when the caller knows that some specific
kernel system is providing a 'struct whatever' and it wants to
retrieve that (probably into an identical Go struct created through
cgo).
The C programmer approach to this problem is memmove(). You can
write memmove() in Go with sufficiently perverse use of the
unsafe package, but you don't want to. Instead we can use the
reflect package to create a generic version of the specific 'cast
and copy' code we used above. How to do this wasn't obvious to me
until I did a significant amount of flailing around with the package,
so I'm going to go through the logic of what we're doing in detail.
We'll start with our call signature:
func (k *KStat) CopyTo(ptri interface{}) error { ... }
CopyTo takes a pointer to a Go struct and copies our C memory in
ks.ks_data into the struct. I'm going to omit the reflect-based
code to check ptri to make sure it's actually a pointer to a
suitable struct in the interests of space, but you shouldn't in
real code. Also, there are a whole raft of qualifications you're
going to want to impose on what types of fields that struct can
contain if you want to at least pretend that your package is somewhat
memory safe.
To actually do the copy, we first need to turn this ptri interface
value into a reflect.Value that is the destination struct itself:
ptr := reflect.ValueOf(ptri)
dst := ptr.Elem()
We now need to cast ks.ks_data to a Value with the type 'pointer to
dst's type'. This is most easily done by creating a new pointer of the
right type with the address taken from ks.ks_data:
src := reflect.NewAt(dst.Type(), unsafe.Pointer(ks.ks_data))
This is the equivalent of 'src := ((*IO)(ks.ks_data))' in the
type-specific version. Reflect.NewAt is there for doing just
this; its purpose is to create pointers for 'type X at address Y',
which is exactly the operation we need.
Having created this pointer, we then dereference it to copy the
data into dst:
dst.Set(reflect.Indirect(src))
This is the equivalent of 'io = *src' in the type-specific
version. We're done.
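Putting the fragments together, a minimal sketch of the whole method might look like this (it assumes the reflect, errors, and unsafe imports; k.ks_data stands in for however your package actually reaches the C memory, and the type check shown is only the rudimentary version of what real code should do):

func (k *KStat) CopyTo(ptri interface{}) error {
	ptr := reflect.ValueOf(ptri)
	if ptr.Kind() != reflect.Ptr || ptr.Elem().Kind() != reflect.Struct {
		return errors.New("CopyTo: argument must be a pointer to a struct")
	}
	dst := ptr.Elem()
	// Treat the C memory as a pointer to a struct of dst's type and
	// then dereference that pointer to copy the C data into dst.
	src := reflect.NewAt(dst.Type(), unsafe.Pointer(k.ks_data))
	dst.Set(reflect.Indirect(src))
	return nil
}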
In my testing, this approach is surprisingly robust; it will deal
with even structs that I didn't expect it to (such as ones with
unexported fields). But you probably don't want to count on that;
it's safest to give CopyTo() straightforward structs with only
exported fields.
On the whole I'm both happy and pleasantly surprised by how easy
it turned out to be to use the reflect package here; I expected it
to require a much more involved and bureaucratic process. Getting
to this final form involved a lot of missteps and unnecessarily
complicated approaches, but the final form itself is about as minimal
as I could expect. A lot of this is due to the existence of
reflect.NewAt(), but there's also that Value.Set() works fine
even on complex and nested types.
(Note that while you could use the reflect-based version even for the first, fixed struct type case, my understanding is that the reflect package has not insignificant overheads. By contrast the hard coded fixed struct type code is about as minimal and low overhead as you can get; it should normally compile down to basically a memory copy.)
Sidebar: preserving Go memory safety here
I'm not fully confident that I have this right, but I think that to
preserve memory safety in the face of this memory copying you must
ensure that the target struct type does not contain any embedded
pointers, either explicit ones or ones implicitly embedded into types
like maps, chans, interfaces, strings, slices, and so on. Fixed-size
arrays are safe because in Go those are just fixed size blocks of
memory.
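If you want to actually enforce this in something like CopyTo(), a sketch of one way to reject such struct types (my own addition, written against the restrictions above) is:

func hasPointers(t reflect.Type) bool {
	switch t.Kind() {
	case reflect.Ptr, reflect.UnsafePointer, reflect.Map, reflect.Chan,
		reflect.Interface, reflect.String, reflect.Slice, reflect.Func:
		return true
	case reflect.Struct:
		for i := 0; i < t.NumField(); i++ {
			if hasPointers(t.Field(i).Type) {
				return true
			}
		}
	case reflect.Array:
		// Fixed-size arrays are only a problem if their elements are.
		return hasPointers(t.Elem())
	}
	return false
}

CopyTo() would then refuse to proceed if hasPointers(dst.Type()) is true.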
If you copy a C struct containing pointers into a Go struct containing
pointers, what you're doing is the equivalent of directly returning
the 'casted-to-*IO' ks.ks_data pointer. You've allowed the
creation of a Go object that points to C memory and you now have
the same C and Go memory lifetime issues. And if some of the pointers
are invalid or point to garbage memory, not only is normal Go code
at risk of bad things but it's possible that the Go garbage collector
will wind up trying to dereference them and take a fault.
(This makes it impossible to easily copy certain sorts of C structures into Go structures. Fortunately such structures rarely appear in this sort of C API because they often raise awkward memory lifetime issues even in C.)
2015-08-30
Getting C-compatible structs in Go with and for cgo
Suppose, not entirely hypothetically, that you're writing a
package to connect Go up to something that will provide it blobs
of memory that are C structs. These structs might be the results
of making system calls or they might be just informational things
that a library provides you. In either case you'd like to pass these
structs on to users of your package so they can do things with them.
Within your package you can use the cgo provided C.<whatever>
types directly. But this is a bit annoying (they don't have native
Go types for things like integers, which makes interacting with
regular Go code a mess of casts) and it doesn't help other code
that imports your package. So you need native Go structs, somehow.
One way is to manually define your own Go version of the C struct. This
has two drawbacks; it's tedious (and potentially error-prone),
and it doesn't guarantee that you'll wind up with exactly the
same memory layout that C has (the latter is often but not always
important). Fortunately there is a better approach, and that is to use
cgo's -godefs functionality to more or less automatically generate
struct declarations for you. The result isn't always perfect but it
will probably get you most of the way.
The starting point for -godefs is a cgo Go source file that
declares some Go types as being some C types. For example:
// +build ignore

package kstat

// #include <kstat.h>
import "C"

type IO C.kstat_io_t
type Sysinfo C.sysinfo_t

const Sizeof_IO = C.sizeof_kstat_io_t
const Sizeof_SI = C.sizeof_sysinfo_t
(The consts are useful for paranoid people so you can later
cross-check the unsafe.Sizeof() of your Go types against the size
of the C types.)
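What that cross-check might look like, as a sketch that would live next to the generated types in the real package (not in this -godefs input file), assuming unsafe is imported:

func init() {
	// Paranoia: verify the generated Go structs are the same size as
	// their C originals.
	if unsafe.Sizeof(IO{}) != Sizeof_IO || unsafe.Sizeof(Sysinfo{}) != Sizeof_SI {
		panic("generated Go struct sizes do not match the C struct sizes")
	}
}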
If you run 'go tool cgo -godefs <file>.go', it will print out to
standard output a bunch of standard Go type definitions with exported
fields and everything. You can then save this into a file and use
it. If you think the C types may change, you should leave the
generated file alone so you won't have a bunch of pain if you have
to regenerate it; if the C types are basically fixed, you can
annotate the generated output with eg godoc comments. Cgo worries
about matching types and it will also insert padding where it existed
in the original C struct.
(I don't know what it does if the original C struct is impossible to reconstruct in Go, for instance if Go requires padding where C doesn't. Hopefully it complains. This hope is one reason you may want to check those sizeofs afterwards.)
The big -godefs limitation is the same limitation as cgo has in
general: it has no real support for C unions, since Go doesn't have
them. If your C struct has unions, you're on your own to figure out
how to deal with them; I believe cgo translates them as appropriate
sized uint8 arrays, which is not too useful to actually access
the contents.
There are two wrinkles here. Suppose you have one struct type that embeds another struct type:
struct cpu_stat {
	struct cpu_sysinfo cpu_sysinfo;
	struct cpu_syswait cpu_syswait;
	struct vminfo cpu_vminfo;
};
Here you have to give cgo some help, by creating Go level versions of the embedded struct types before the main struct type:
type Sysinfo C.struct_cpu_sysinfo
type Syswait C.struct_cpu_syswait
type Vminfo C.struct_cpu_vminfo
type CpuStat C.struct_cpu_stat
Cgo will then be able to generate a proper Go struct with embedded Go
structs in CpuStat. If you don't do this, you get a CpuStat struct type
that has incomplete type information; the 'Sysinfo' et al fields in it
will refer to types called _Ctype_... that aren't defined anywhere.
(By the way, I do mean 'Sysinfo' here, not 'Cpu_sysinfo'. Cgo is smart enough to take that sort of commonly seen prefix off of struct field names. I don't know what its algorithm is for doing this, but it's at least useful.)
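The generated CpuStat then comes out along these lines (this is just an illustration of the shape; the exact field names and any padding are up to cgo):

type CpuStat struct {
	Sysinfo Sysinfo
	Syswait Syswait
	Vminfo  Vminfo
}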
The second wrinkle is embedded anonymous structs:
struct mntinfo_kstat {
	....
	struct {
		uint32_t srtt;
		uint32_t deviate;
	} m_timers[4];
	....
};
Unfortunately cgo can't deal with these at all. This is issue
5253, and you have two
options. The first is that at the moment, the proposed CL fix still applies to
src/cmd/cgo/gcc.go and works (for me). If you don't want to build
your own Go toolchain (or if the CL no longer applies and works),
the other solution is to create a new C header file that has a
variant of the overall struct that de-anonymizes the embedded struct
by creating a named type for it:
struct m_timer {
	uint32_t srtt;
	uint32_t deviate;
};
struct mntinfo_kstat_cgo {
	....
	struct m_timer m_timers[4];
	....
};
Then in your Go file:
...
// #include "myhacked.h"
...
type MTimer C.struct_m_timer
type Mntinfo C.struct_mntinfo_kstat_cgo
Unless you made a mistake, the two C structs should have the same
sizes and layouts and thus be totally compatible with each other.
Now you can use -godefs on your version, remembering to make an
explicit Go type for m_timer due to the first wrinkle. If you
feel bold (and you don't think you'll need to regenerate things),
you can then reverse this process in the generated Go file,
re-anonymizing the MTimer type into the overall struct (since
Go supports that perfectly well). Since you're not changing the
actual contents, just where types are declared, the result should
be layout-identical to the original.
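For illustration, the hand re-anonymized version of the generated type might wind up looking something like this (the field names are made up for the example; cgo decides the real ones):

type Mntinfo struct {
	// ... other fields as generated by -godefs ...
	Timers [4]struct {
		Srtt    uint32
		Deviate uint32
	}
	// ... more generated fields ...
}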
PS: the file that's input to -godefs is set to not be built by
the normal 'go build' process because it is only used for this
godefs generation. If it gets included in the build, you'll get
complaints about multiple definitions of your (Go) types. The
corollary to this is that you don't need to have this file and any
supporting .h files in the same directory as your regular .go
files for the package. You can put them in a subdirectory, or keep
them somewhere entirely separate.
(I think the only thing the package line does in the godefs
.go file is set the package name that cgo will print in the
output.)
2015-08-17
Why languages like 'declare before use' for variables and functions
I've been reading my way through Lisp as the Maxwell's equations of software and ran into this 'problems for the author' note:
As a general point about programming language design it seems like it would often be helpful to be able to define procedures in terms of other procedures which have not yet been defined. Which languages make this possible, and which do not? What advantages does it bring for a programming language to be able to do this? Are there any disadvantages?
(I'm going to take 'defined' here as actually meaning 'declared'.)
To people with certain backgrounds (myself included), this question has a fairly straightforward set of answers. So here's my version of why many languages require you to declare things before you use them. We'll come at it from the other side, by asking what your language can't do if it allows you to use things before declaring them.
(As a digression, we're going to assume that we have what I'll call an unambiguous language, one where you don't need to know what things are declared as in order to know what a bit of code actually means. Not all languages are unambiguous; for example C is not (also). If you have an ambiguous language, it absolutely requires 'declare before use' because you can't understand things otherwise.)
To start off, you lose the ability to report a bunch of errors at the time you're looking at a piece of code. Consider:
lvar = ....
res = thang(a, b, lver, 0)
In basically all languages, we can't report the lver for lvar
typo (we have to assume that lver is an unknown global variable),
we don't know if thang is being called with the right number of
arguments, and we don't even know if thang is a function instead
of, say, a global variable. Or if it even exists; maybe it's a typo
for thing. We can only find these things out when all valid
identifiers must have been declared; in fully dynamic languages
like Lisp and Python, that's 'at the moment where we reach this
line of code during execution'. In other languages we might be able
to emit error messages only at the end of compiling the source file,
or even when we try to build the final program and find missing or
wrong-typed symbols.
In languages with typed variables and arguments, we don't know if
the arguments to thang() are the right types and if thang()
returns a type that is compatible with res. Again we'll only be
able to tell when we have all identifiers available. If we want to
do this checking before runtime, the compiler (or linker) will have
to keep track of the information involved for all of these pending
checks so that it can check things and report errors once thang()
is defined.
Some typed languages have features for what is called 'implicit
typing', where you don't have to explicitly declare the types of
some things if the language can deduce them from context. We've
been assuming that res is pre-declared as some type, but in an
implicit typing language you could write something like:
res := thang(a, b, lver, 0)
res = res + 20
At this point, if thang() is undeclared, the type of res is also
unknown. This will ripple through to any code that uses res, for
example the following line here; is that line valid, or is res perhaps
a complex structure that can in no way have 20 added to it? We can't
tell until later, perhaps much later.
In a language with typed variables and implicit conversions between
some types, we don't know what type conversions we might need in
either the call (to convert some of the arguments) or the return
(to convert thang()'s result into res's type). Note that in
particular we may not know what type the constant 0 is. Even
languages without implicit type conversions often treat constants
as being implicitly converted into whatever concrete numeric type
they need to be in any particular context. In other words, thang()'s
last argument might be a float, a double, a 64-bit unsigned integer,
a 32-bit signed integer, or whatever, and the language will convert
the 0 to it. But it can only know what conversion to do once
thang() is declared and the types of its arguments are known.
This means that a language with any implicit conversions at all
(even for constants like 0) can't actually generate machine code
for this section until thang() is declared even under the best
of circumstances.
However, life is usually much worse for code generation than this.
For a start, most modern architectures pass and return floating
point values in different ways than integer values, and they may
pass and return more complex values in a third way. Since we don't
know what type thang() returns (and we may not know what types
the arguments are either, cf lver), we basically can't generate
any concrete machine code for this function call at the time we
parse it even without implicit conversions. The best we can do is
generate something extremely abstract with lots of blanks to be
filled in later and then sit on it until we know more about
thang(), lver, and so on.
(And implicit typing for res will probably force a ripple effect
of abstraction on code generation for the rest of the function, if
it doesn't prevent it entirely.)
This 'extremely abstract' code generation is in fact what things like Python bytecode are. Unless the bytecode generator can prove certain things about the source code it's processing, what you get is quite generic and thus slow (because it must defer a lot of these decisions to runtime, along with checks like 'do we have the right number of arguments').
So far we've been talking about thang() as a simple function call.
But there are a bunch of more complicated cases, like:
res = obj.method(a, b, lver, 0)
res2 = obj1 + obj2
Here we have method calls and operator overloading. If obj, obj1,
and/or obj2 are undeclared or untyped at this point, we don't
know if these operations are valid (the actual obj might not have
a method() method) or what concrete code to generate. We need to
generate either abstract code with blanks to be filled in later or
code that will do all of the work at runtime via some sort of
introspection (or both, cf Python bytecode).
All of this prepares us to answer the question about what sort of languages require 'declare before use': languages that want to do good error reporting or (immediately) compile to machine code or both without large amounts of heartburn. As a pragmatic matter, most statically typed languages require declare before use because it's simpler; such languages either want to generate high quality machine code or at least have up-front assurances about type correctness, so they basically fall into one or both of those categories.
(You can technically have a statically typed language with up-front
assurances about type correctness but without declare before use;
the compiler just has to do a lot more work and it may well wind
up emitting a pile of errors at the end of compilation when it can
say for sure that lver isn't defined and you're calling thang()
with the wrong number and type of arguments and so on. In practice
language designers basically don't do that to compiler writers.)
Conversely, dynamic languages without static typing generally don't
require declare before use. Often the language is so dynamic that
there is no point. Carefully checking the call to thang() at the
time we encounter it in the source code is not entirely useful if
the thang function can be completely redefined (or deleted) by
the time that code gets run, which is the case in languages like
Lisp and Python.
(In fact, given that thang can be redefined by the time the code
is executed we can't even really error out if the arguments are
wrong at the time when we first see the code. Such a thing would
be perfectly legal Python, for example, although you really shouldn't
do that.)
2015-08-04
A lesson to myself: commit my local changes in little bits
For quixotic reasons, I recently updated my own local version of dmenu to the upstream version, which had moved on since I last did this (most importantly, it gained support for Xft fonts). Well, the upstream version plus my collection of bugfixes and improvements. In the process of doing this I have (re)learned a valuable lesson about how I want to organize my local changes to upstream software.
My modifications to dmenu predate my recent decision to commit local changes instead of just carrying them uncommitted on top of the repo. So the first thing I did was to just commit them all in a single all in one changeset, then fetch upstream and rebase. This had rebase conflicts, of course, so I merged them and built the result. This didn't entirely work; some of my modifications clearly hadn't taken. Rather than try to patch the current state of my modifications, I decided to punt and do it the right way; starting with a clean copy of the current upstream, I carefully separated out each of my modifications and added them as separate changes and commits. This worked and wasn't particularly much effort (although there was a certain amount of tedium).
Now, a certain amount of the improvement here is simply that I was porting all of my changes into the current codebase instead of trying to do a rebase merge. This is always going to give you a better chance to evaluate and test things. But that actually kind of points to a problem; because I had my changes in a single giant commit, everything was tangled together and I couldn't see quite clearly enough to do the rebase merge right. Making each change independently made things much clearer and easier to follow, and I suspect that that would have been true even in a merge. The result is also easier for me to read in the future, since each change is now something I can inspect separately.
All of this is obvious to people who've been dealing with VCSes and local modifications, of course. And in theory I knew it too, because I've read all of those homilies to good organization of your changes. I just hadn't stubbed my toe on doing it the 'wrong' way until now (partly because I hadn't been committing changes at all until recently).
(Of course, this is another excellent reason to commit local changes instead of carrying them uncommitted. Uncommitted local changes are basically intrinsically all mingled together.)
Having come to my senses here, I have a few more programs with local hacks that I need to do some change surgery on.
(I've put my version of dmenu up on github, where you can see and cherry pick separate changes if desired. I expect to rebase this periodically, when upstream updates and I notice and care. As before, I have no plans to try to push even my bugfixes to the official release, but interested parties are welcome to try to push them upstream.)
Sidebar: 'git add -p' and this situation
In theory I could have initially committed my big ball of local
changes as separate commits with 'git add -p'. In practice this
would have required disentangling all of the changes from each
other, which would have required understanding code I hadn't touched
for two years or so. I was too impatient at the
start to do that; I hoped that 'commit and rebase' would be good
enough. When it wasn't, restarting from scratch was easier because
it let me test each modification separately as I made it.
Based on this, my personal view is that I'm only going to use 'git
add -p' when I've recently worked on the code and I'm confident
that I can accurately split changes up without needing to test the
split commits to make sure each is correct on its own.
2015-07-29
My workflow for testing Github pull requests
Every so often a Github-based project I'm following has a pending pull
request that might solve a bug or otherwise deal with something I care
about, and it needs some testing by people like me. The simple case is
when I am not carrying any local changes; it is adequately covered by
part of Github's Checking out pull requests locally
(skip to the bit where they talk about 'git fetch'). A more elaborate
version is:
git fetch origin pull/<ID>/head:origin/pr/<ID>
git checkout pr/<ID>
That creates a proper remote branch and then a local branch that tracks it, so I can add any local changes to the PR that I turn out to need and then keep track of them relative to the upstream pull request. If the upstream PR is rebased, well, I assume I get to delete my remote and then re-fetch it and probably do other magic. I'll cross that bridge when I reach it.
The not so simple case is when I am carrying local changes on top of the upstream master. In the fully elaborate case I actually have two repos, the first being a pure upstream tracker and the second being a 'build' repo that pulls from the first repo and carries my local changes. I need to apply some of my local changes on top of the pull request while skipping others (in this case, because some of them are workarounds for the problem the pull request is supposed to solve), and I want to do all of this work on a branch so that I can cleanly revert back to 'all of my changes on top of the real upstream master'.
The workflow I've cobbled together for this is:
- Add the Github master repo if I haven't already done so:
git remote add github https://github.com/zfsonlinux/zfs.git

- Edit .git/config to add a new 'fetch =' line so that we can also fetch pull requests from the github remote, where they will get mapped to the remote branches github/pr/NNN. This will look like:

[remote "github"]
	fetch = +refs/pull/*/head:refs/remotes/github/pr/*
	[...]

(This comes from here.)

- Pull down all of the pull requests with 'git fetch github'.

I think an alternate to configuring and fetching all pull requests is the limited version I did in the simple case (changing origin to github in both occurrences), but I haven't tested this. At the point that I have to do this complicated dance I'm in a 'swatting things with a hammer' mode, so pulling down all PRs seems perfectly fine. I may regret this later.

- Create a branch from master that will be where I build and test the pull request (plus my local changes):

git checkout -b pr-NNN

It's vitally important that this branch start from master and thus already contain my local changes.

- Do an interactive rebase relative to the upstream pull request:

git rebase -i github/pr/NNN

This incorporates the pull request's changes 'below' my local changes to master, and with -i I can drop conflicting or unneeded local changes. Effectively it is much like what happens when you do a regular 'git pull --rebase' on master; the changes in github/pr/NNN are being treated as upstream changes and we're rebasing my local changes on top of them.

- Set the upstream of the pr-NNN branch to the actual Github pull request branch:

git branch -u github/pr/NNN

This makes 'git status' report things like 'Your branch is ahead of ... by X commits', where X is the number of local commits I've added.
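Compressed down to just its commands, my own recap of the whole dance is (NNN is the pull request number, the 'fetch =' line is assumed to be in place, and the branch is created while on master so that it starts from my local changes):

git remote add github https://github.com/zfsonlinux/zfs.git
git fetch github
git checkout -b pr-NNN
git rebase -i github/pr/NNN
git branch -u github/pr/NNN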
If the pull request is refreshed, my current guess is that I will
have to fully discard my local pr-NNN branch and restart from
fetching the new PR and branching off master. I'll undoubtedly
find out at some point.
Initially I thought I should be able to use a sufficiently clever
invocation of 'git rebase' to copy some of my local commits from
master on to a new branch that was based on the Github pull
request. With work I could get the rebasing to work right; however,
it always wound up with me on (and changing) the master branch,
which is not what I wanted. Based on this very helpful page on
what 'git rebase' is really doing, what
I want is apparently impossible without explicitly making a new
branch first (and that new branch must already include my local
changes so they're what gets rebased, which is why we have to branch
from master).
This is probably not the optimal way to do this, but having hacked my way through today's git adventure game I'm going to stop now. Feel free to tell me how to improve this in comments.
(This is the kind of thing I write down partly to understand it and partly because I would hate to have to derive it again, and I'm sure I'll need it in the future.)
Sidebar: Why I use two repos in the elaborate case
In the complex case I want to both monitor changes in the Github
master repo and have strong control over what I incorporate into
my builds. My approach is to routinely do 'git pull' in the pure
tracking repo and read 'git log' for new changes. When it's time
to actually build, I 'git pull' (with rebasing) from the tracking repo into the build
repo and then proceed. Since I'm pulling from the tracking repo,
not the upstream, I know exactly what changes I'm going to get in
my build repo and I'll never be surprised by a just-added upstream
change.
In theory I'm sure I could do this in a single repo with various tricks, but doing it in two repos is much easier for me to keep straight and reliable.
2015-07-07
The Git 'commit local changes and rebase' experience is a winning one
I mentioned recently that I'd been persuaded to change my ways from leaving local changes uncommitted in my working repos to committing them and rebasing on pulls. When I started this, I didn't expect it to be any real change from the experience of pulling with uncommitted changes and maybe stashing them every so often and so on; I'd just be doing things the proper and 'right' Git way (as everyone told me) instead of the sloppy way.
I was wrong. Oh, certainly the usual experience is the same; I do a
'git pull', I get my normal pull messages and stats output, and
Git adds a couple of lines at the end about automatically rebasing
things. But with local commits and rebasing, dealing with conflicts
after a pull is much better. This isn't because I have fewer or
simpler changes to merge, it's simply because the actual user interface
and process is significantly nicer. There's very little fuss and muss;
I fire up my editor on a file or two, I look for the '<<<<' markers, I
sort things out, I can get relatively readable diffs, and then I can
move on smoothly.
(And the messages from git during rebasing are actually quite helpful.)
Re-applying git stashes that had conflicts with the newly pulled
code was not as easy or as smooth, at least for the cases that I
dealt with. My memory is that it was harder to see my changes and
harder to integrate them, and also sometimes I had to un-add things
from the index that git stash had apparently automatically added
for me. I felt far less in control of the whole process than I do
now with rebasing.
(And with rebasing, the git reflog means that if I need to I can revert my repo to the pre-pull state and see exactly how things were organized in the old code and what the code did with my changes integrated. Sometimes this is vital if there's been a significant restructuring of upstream code. In the past with git stash, I've been lucky because I had an intact pre-pull copy of the repo (with my changes) on a second machine.)
I went into this expecting to be neutral on the change to 'commit and rebase on pulls'. I've now wound up quite positive on it; I actively like and prefer to be fixing up a rebase to fixing up a git stash. Rebasing really is better, even if I just have a single small and isolated change.
(And thank you to the people who patiently pushed me towards this.)
2015-07-03
Some notes on my 'commit local changes and rebase' Git workflow
A month or so ago I wrote about how I don't commit changes in my working repos and in reaction to it several people argued that I ought to change my way. Well, never let it be said that I can't eventually be persuaded to change my ways, so since then I've been cautiously moving to committing my changes and rebasing on pulls in a couple of Git repos. I think I like it, so I'm probably going to make it my standard way of working with Git in the future.
The Git configuration settings I'm using are:
git config pull.rebase true
git config rebase.stat true
The first just makes 'git pull' be 'git pull --rebase'. If I
wind up working with multiple branches in repos, I may need to set
this on a per-branch basis or something; so far I just track
origin/master so it works for me. The second preserves the normal
'git pull' behavior of showing a summary of updates, which I find
useful for keeping an eye on things.
One drawback of doing things this way is that 'git pull' will now
abort if there are also uncommitted changes in the repo, such as I
might have for a very temporary hack or test. I need to remember
to either commit such changes or do 'git stash' before I pull.
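The stash version of that is the usual dance, something like:

git stash
git pull
git stash pop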
(The other lesson here is that I need to learn how to manipulate rebase commits so I can alter, amend, or drop some of them.)
Since I've already done this once: if I have committed changes in
a repo without this set, and use 'git pull' instead of 'git pull
--rebase', one way to abort the resulting unwanted merge is 'git
reset --hard HEAD'. Some sources suggest 'git reset --merge' or
'git merge --abort' instead. But really I should set pull rebasing
to on the moment I commit my own changes to a repo.
(There are a few repos around here that now need this change.)
I haven't had to do a bisection on a
commit-and-rebase repo yet, but I suspect that bisection won't go
well if I actually need my changes in all versions of the repo that
I build and test. If I wind up in this situation I will probably
temporarily switch to uncommitted changes and use of 'git stash',
probably in a scratch clone of the upstream master repo.
(In general I like cloning repos to keep various bits of fiddling around in them completely separate. Sure, I probably could mix various activities in one repo without having things get messed up, but a different directory hierarchy that I delete afterwards is the ultimate isolation and it's generally cheap.)