2014-11-26
Using go get alone is a bad way to keep track of interesting packages
When I was just starting with Go, I kept running into interesting Go
packages that I wanted to keep track of and maybe use someday. 'No
problem', I thought, 'I'll just go get them so I have them sitting
around and maybe I'll look at them too'.
Please allow yourself to learn from my painful experience here and don't
do this. Specifically, don't rely on 'go get' as your only way to keep
track of packages you want to keep an eye on, because in practice doing
so is a great way to forget what those packages are. There's no harm in
go get'ing packages you want to have handy to look through, but do
something in addition to keep track of what packages you're interested
in and why.
At first, there was nothing wrong with what I was doing. I could
easily look through the packages and even if I didn't, they sat
there in $GOPATH/src so I could keep track of them. Okay, they
were about three levels down from $GOPATH/src itself, but no big
deal. Then I started getting interested in Go programs like vegeta,
Go Package Store, and delve, plus I was installing and using more
mundane programs like goimports and golint.
The problem with all of these is that they have dependencies of
their own, and all of these dependencies wind up in $GOPATH/src too.
Pretty soon my Go source area was a dense thicket of source trees
that intermingled programs, packages I was interested in in their
own right, and dependencies of these first two.
After using Go seriously for not very long I've wound up with far too
many packages and repos in $GOPATH/src to keep any sort of track of,
and especially to remember off the top of my head which packages I was
interested in. Since I was relying purely on go get to keep track of
interesting Go packages, I have now essentially lost track of most of
them. The interesting packages I wanted to keep around because I might
use them have become lost in the noise of the dependencies, because I
can't tell one from the other without going through all 50+ of the repos
to read their READMEs.
As you might guess, I'd be much better off if I'd kept an explicit list of the packages I found interesting in some form. A text file of URLs would be fine; adding notes about what they did and why I thought they were interesting would be better. That would make it trivial to sort out the wheat from the chaff that's just there because of dependencies.
(These days I've switched to doing this for new interesting packages I
run across, but there's some number of packages from older times that
are lost somewhere in the depths of $GOPATH/src.)
PS: This can happen with programs too, but at least there tends to
be less in $GOPATH/bin than in $GOPATH/src so it's easier to
keep track of them. But if you have an ever-growing $GOPATH/bin
with an increasing number of programs you don't actually care about,
there's the same problem again.
2014-11-21
Lisp and data structures: one reason it hasn't attracted me
I've written before about some small scale issues with reading languages that use Lisp style syntax, but I don't think I've said what I did the other day on Twitter, which is that the syntax of how Lisp languages are written is probably the primary reason that I slide right off any real interest in them. I like the ideas and concepts of Lisp style languages, the features certainly sound neat, and I often use all of these in other languages when I can, but actual Lisp syntax languages have been a big 'nope' for a long time.
(I once wrote some moderately complex Emacs Lisp modules, so I'm not coming from a position of complete ignorance on Lisp. Although my ELisp code didn't exactly make use of advanced Lisp features.)
I don't know exactly why I really don't like Lisp syntax and find it such a turn-off, but I had an insight on Twitter. One of the things about the syntax of S-expressions is that they very clearly are a data structure. Specifically, they are a list. In effect this gives lists (yes, I know, they're really cons cells) a privileged position in the language. Lisp is lists; you cannot have S-expressions without them. Other languages are more neutral on what they consider to be fundamental data structures; there is very little in the syntax of, say, C that privileges any particular data structure over another.
(Languages like Python privilege a few data structures by giving them explicit syntax for initializers, but that's about it. The rest is in the language environment, which is subject to change.)
Lisp is very clearly in love with lists. Because it's so in love with lists, it doesn't feel as if it can be fully in love with other data structures; whether or not it's actually true, it feels like other data structures are going to be second class citizens. And this matters to how I feel about the language, because lists are often not the data structure I want to use. Even being second class in just syntax matters, because syntactic sugar matters.
(In case it's not clear, I do somewhat regret that Lisp and I have never clicked. Many very smart people love Lisp a lot and so it's clear that there are very good things there.)
2014-11-16
States in a state machine aren't your only representation of state
I think in terms of state machines a lot; they're one of my standard approaches to problems and I wind up using them quite a bit. I've recently been planning out a multi-threaded program that has to coordinate back and forth between threads as they manipulate the state of host authentication. At first I had a simple set of states, then I realized that these simple states only covered the main flow of events and needed to be more and more complicated, and then I had a blinding realization:
Not all state needs to be represented as state machine states.
When you have a state machine it is not so much tempting as obvious to represent every variation in the state of your entities as another state machine state. But if you do this, then like me you may wind up with an explosion of states, many of which are extremely similar to each other and more or less handled the same way. This isn't necessary. Instead, it's perfectly sensible to represent certain things as flags, additional or detailed status fields, or the like. If you want to mark something as going to be deleted once it's unused, there's no need to add new states to represent this if you can just add a flag. If you have three or four different ways for something to fail and they all lead to basically the same processing, well, you don't need three or four different states for 'failed X way'; you can have just one 'failed' state and then another field with the details of why.
Off the top of my head now, I think that states are best for things
that have a different flow of processing (ideally a significantly
different flow). The more both the origin state and the processing
of two 'states' resemble each other, the less they need to be
separate states and the more the difference can be captured in a
different variable or field (and then handled in the code with only
some ifs).
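To make this concrete, here is a minimal Go sketch of the idea. All of the names and fields are invented for illustration (loosely inspired by the host authentication example above), not taken from any real program: a small set of states, plus a flag and a failure-reason field where extra states would otherwise multiply.

```go
package main

import "fmt"

// States for a hypothetical per-host authentication entry.
type State int

const (
	StateIdle State = iota
	StateAuthenticating
	StateActive
	StateFailed // a single failure state, not one per failure cause
)

type FailReason int

const (
	FailNone FailReason = iota
	FailTimeout
	FailBadKey
	FailRejected
)

type Entry struct {
	state         State
	failReason    FailReason // detail field instead of 'failed X way' states
	deleteOnUnuse bool       // flag instead of a 'pending deletion' state
}

// fail moves an entry to the single failed state, recording why.
func (e *Entry) fail(r FailReason) {
	e.state = StateFailed
	e.failReason = r
}

// describe shows the shared failure processing: one state, with the
// per-reason differences handled by an ordinary switch.
func (e *Entry) describe() string {
	if e.state != StateFailed {
		return "not failed"
	}
	switch e.failReason {
	case FailTimeout:
		return "failed: timed out"
	case FailBadKey:
		return "failed: bad key"
	default:
		return "failed: rejected"
	}
}

func main() {
	e := &Entry{state: StateAuthenticating}
	e.deleteOnUnuse = true // no new state needed to mark this
	e.fail(FailTimeout)
	fmt.Println(e.describe())
}
```

The state diagram here stays at four states no matter how many failure causes or pending-deletion style markers get added later.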
(On the other hand, if two different states were handled the same way but came from different origin states and transitioned to different destination states, I think I'd definitely keep them as separate states and just share the common code somehow. This would preserve the clarity of state flow in the system. Although if two separate states needed exactly the same handling in the code, I might think I was overlooking something about the states in general.)
2014-11-02
A drawback in how DWiki parses its wikitext
In my initial installment on how DWiki parses its wikitext I said that one important thing DWiki does is that it has two separate parsers:
[...] One parser handles embedded formatting in running text (things like fonts, links, and so on) and the other one handles all of the line oriented block level structures like paragraphs, headers, blockquotes, lists, etc. What makes it work is that the block level parser doesn't parse running text immediately for multi-line things like paragraphs; [...]
This sounds great and in general it is perfectly fine, but it does turn out to impose one restriction on your wiki dialect: it doesn't support block-level constructs that require looking ahead into running text. To work right, this requires that all block level constructs can be recognized before you have to start parsing running text, which means that they have to all come at the start of the line.
This doesn't sound like a particularly onerous restriction on your wikitext dialect, but it actually causes DWiki heartburn in one spot. In my wikitext dialect, definition lists are written as:
- first the <dt> text: And post-colon is the <dd> text.
This looks like a perfectly natural way to write a definition list
entry, but phrased this way it requires block level parsing to look
ahead into the line to recognize and find the ':' that separates the
<dt> text from the <dd> text. Now suppose that you want to have a link
to an outside website in the <dt> text, which of course is going to
contain a ':' in the URL. Oops. Similar issues come up if you just
want a : in the <dt> text for some reason. As a result DWiki's parsing
of definition lists basically disallows a lot of stuff in the <dt> text,
which has led me to not use them very much.
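The colon problem is easy to see in a few lines of code. This is not DWiki's actual parser (DWiki is written in Python); it's an illustrative Go sketch of the naive 'split at the first colon' approach that a block-level parser with no running-text knowledge is forced into:

```go
package main

import (
	"fmt"
	"strings"
)

// splitDefn handles a hypothetical '- <dt text>: <dd text>' line.
// The block parser must find the separating ':' before running text
// is ever parsed, so it can only take the first ':' it sees.
func splitDefn(line string) (dt, dd string, ok bool) {
	body, ok := strings.CutPrefix(line, "- ")
	if !ok {
		return "", "", false
	}
	dt, dd, ok = strings.Cut(body, ":")
	return strings.TrimSpace(dt), strings.TrimSpace(dd), ok
}

func main() {
	// The intended case works fine.
	dt, dd, _ := splitDefn("- a term: its definition")
	fmt.Printf("%q -> %q\n", dt, dd)

	// A URL in the <dt> text contains ':' itself, so the split
	// lands inside the URL instead of at the real separator.
	dt, dd, _ = splitDefn("- see https://example.org/doc: notes")
	fmt.Printf("%q -> %q\n", dt, dd)
}
```

The second call splits at the ':' in 'https:', mangling both halves, which is exactly why the real parser has to disallow so much in the <dt> text.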
(The other problem with this definition is that it restricts the <dt> text to a single line.)
I think that this may also cause problems for natural looking tables. Most of the ways of writing natural tables are going to rely on interior whitespace to create visible columns and thus demonstrate that this is a table. Looking ahead in what would otherwise be running text to spot runs of whitespace is less dangerous than trying to find a character in a line, but it still breaks this pure separation.
(I didn't think of this issue when I wrote my first entry with its enthusiastic praise; sadly it's a corner case that's easy to forget about most of the time.)
Sidebar: DWiki's table parsing also sort of breaks the rule
DWiki doesn't need to look ahead in running text to know that it's processing a table, but it does have to peek into the running text to find column dividers. This is at least impure, but so far I think it's less annoying than the definition list case; in practice the column dividers don't seem to naturally occur in my table text so far. Still, it's not an easy problem and I'd like a better solution.
(One approach is to be able to tell the running text parser to stop if it runs into a certain character sequence in unquoted text. I think that this works best if you have an incremental parser for running text that can be fed input, parse it as much as possible, and then suspend itself to wait for more.)
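A rough Go sketch of that stop-sequence idea, with invented names and a deliberately toy notion of 'unquoted text' (here, just 'not inside a [...] construct'):

```go
package main

import (
	"fmt"
	"strings"
)

// parseUntil consumes running text until it sees the stop sequence
// outside any [...] construct, returning what it consumed and the
// remainder after the stop sequence. Real quoting rules would be
// more involved; this only illustrates the interface.
func parseUntil(text, stop string) (consumed, rest string) {
	inBracket := false
	for i := 0; i < len(text); i++ {
		switch {
		case text[i] == '[':
			inBracket = true
		case text[i] == ']':
			inBracket = false
		case !inBracket && strings.HasPrefix(text[i:], stop):
			return text[:i], text[i+len(stop):]
		}
	}
	return text, ""
}

func main() {
	// Splitting table cells on " | " while leaving a '|' that sits
	// inside a bracketed construct alone.
	cell, rest := parseUntil("[a|b] | second cell", " | ")
	fmt.Printf("%q then %q\n", cell, rest)
}
```

With an interface like this, the block parser can hand the running-text parser a column divider (or a definition-list ':') as a stop sequence and let it decide which occurrences are quoted, instead of peeking into the text itself.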