2014-01-26
Why writing sysadmin tools in Go is getting attractive
I don't find Go as nice a language as Python, but it's not terrible; in the past I've called it 'Python with performance'. What makes it especially attractive for sysadmin tools, at least for me, is how it simplifies deployment. Python deployment has three problems: the actual Python versions on the various different systems, the availability of additional modules on those systems (and their versions), and actually packaging up and shipping around a modular Python program (loading code from zipfiles is not quite a full solution). As a compiled language without any real use of shared libraries, Go makes all of these go away.
Getting away from modularity concerns is frankly a big deal. Python makes modular programs just awkward enough that it pushes me away from them for small things, even when the structure of the problem would make modularity sensible. Because it's a compiled language, Go makes this a non-issue; regardless of how many pieces I split the source into, Go compiles it down to a single self-contained binary that is all I have to ship around and run. Closely related to modularity is the use of third party modules and packages. Again, Go needs these only at compile time and they can live in my own build area; I don't have to worry about what's installed on target systems or available to be installed through their local package manager. If using some Go package makes my program better, I can use it.
I also no longer have to worry about Python versions, or in fact about the development environment in general, because under most circumstances Go will cross-compile from my preferred environment. Deployment targets can be as bare-bones as they come and it doesn't matter to me, because I can sit in a full environment with a current Go version, git, a fully set up Emacs, and so on. I do need to do some things on the deploy targets when testing, debugging, or tuning performance, but nowhere near as much as I otherwise would.
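(As a concrete sketch, and assuming your Go toolchain has been built with support for the target platform, cross-compiling is just a matter of setting a couple of environment variables when you invoke the Go toolchain:

    # hypothetical example: build a Linux/386 binary from whatever
    # machine you actually develop on
    GOOS=linux GOARCH=386 go build -o myprog

Here 'myprog' is a made-up name; the point is that the result is a single binary you copy to the target and run.)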
As I sort of discussed in another context, all of these issues more or less go away if you're working on heavyweight things. Heavyweight things already have complex deployments in general and they often want to drag along their own custom environments even (or especially) in interpreted languages like Python; you may well run a big important Python app in its own virtualenv with its own build of Python and whatever collection of third party modules you want. But this is not the world that sysadmin tools generally operate in. Sysadmin tools are almost always small things that don't justify a lot of heavyweight work (and don't have complex deployment needs).
2014-01-20
A thought about the popularity of server-side JavaScript
One of the things going on these days is that an increasing amount of server-side web-related programming is being done in JavaScript, specifically in node.js. Various sorts of people have various reactions to this, some of them negative. I've got some half-formed thoughts about this from an outsider's perspective, but for today I want to stick to one observation that stands out:
Node.js JavaScript is likely the fastest dynamically typed language that you can use on Unix today.
If you want dynamic typing plus speed, your options on Unix right now are not great. Straight Ruby, Python, and Perl don't really cut it. You could try JRuby or Jython to see if the JVM's speed for Java has rubbed off on them, but that requires diving into the complexity swamp of the JVM, and those implementations are somewhat second-class citizens of their languages. Otherwise, for real speed you are looking at a statically typed compiled language: Java, C/C++, maybe Go.
(And the environment you get with node.js is attractive beyond simple speed. Node.js gets you a dynamically typed language on Unix that uses a world-class JIT engine for speed, is quite popular and thus well supported, and has what I understand to be an excellent ecology of packages for doing various things. It doesn't require compilation and it has a REPL for on-the-fly exploration. It just requires you to write JavaScript.)
PS: If you look hard enough there is probably some other dynamically typed language on Unix that runs as fast as node.js (I suspect that some Lisp implementations are in the same ballpark). But I don't think anything there is anywhere near as well known or as popular as node.js, and frankly that makes node.js a much safer choice.
2014-01-19
Some thoughts on structured logging, especially in and for databases
In a comment on my entry on having an audit log, dozzie recommended structured logging. Roughly speaking, instead of just logging a formatted text message, you log all of the various pieces of the same information in a way that's explicitly labeled, encoded, and serialized somehow. These days, for example, you might encode things as JSON.
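As a minimal sketch of the difference (in Go, with field names I've invented purely for illustration):

    package main

    import (
        "encoding/json"
        "log"
        "os"
        "time"
    )

    // event is a hypothetical structured log record; every piece of
    // information is an explicitly labeled field.
    type event struct {
        Time   time.Time `json:"time"`
        User   string    `json:"user"`
        IP     string    `json:"ip"`
        Action string    `json:"action"`
    }

    func main() {
        ev := event{Time: time.Now(), User: "cks", IP: "127.0.0.1", Action: "approve"}

        // Textual logging: easy to write, easy to read by eye.
        log.Printf("%s from %s did %s", ev.User, ev.IP, ev.Action)

        // Structured logging: the same information, labeled and
        // serialized (here as JSON).
        json.NewEncoder(os.Stdout).Encode(ev)
    }

The text line is immediately readable by a person; the JSON line is what a later analysis tool would much rather see.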
This is part of a great ongoing debate between the two sides of logging. The advantage of purely textual logs (whether split into database columns or smashed together into one line) is that they're easy to create and (much) easier to read without additional tools. The advantage of structured logs is that they're easier to analyse reliably, especially over time (ie, over a period in which a text message might change format repeatedly). Their drawback is that they need tools to make them comprehensible, and the more information you load into structured log messages, the more you need those tools.
(Encoding to JSON and then pretty-printing it doesn't work once you're putting enough information into log messages that the JSON dump of a single one is twenty or thirty lines. Take it from a sysadmin; I'm just not going to be able to do very much by staring at that much data.)
All of this makes textual logging the simple, low-effort approach. Yes, it might be best to have comprehensive structured logging plus a suite of tools and code to work with those logs, but in real life that may not be the choice you actually get, because of the perfection trap. Text logging is better than no logging.
Databases complicate this in two ways: one for using a database for log storage, and one for recording DB-related things. If you're recording your audit logs in a database (as I am), you have to worry about the audit table schema. One big issue is which information is common enough to be broken out into separate structured fields versus slammed into what are basically blob fields (for text or serialized structured data). Separate fields allow in-database searching and filtering (at least in ordinary DB engines that don't, eg, look inside JSON blobs for you), but they exact a cost in flexibility and possibly in schema complexity. At the extreme, you probably don't want your audit table schema to have fifty fields, most of them unused in most records, just so that you can copy specific fields from other DB tables as you make audit entries.
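To make the tradeoff concrete, here is one hypothetical shape for such a schema, sketched as a Go struct (all of the field names are my invention):

    package audit

    import "time"

    // Record sketches a hypothetical audit table row: fields common
    // to every entry are broken out so the database can search and
    // filter on them, while everything action-specific is serialized
    // into a single blob field.
    type Record struct {
        ID      int64     // primary key
        When    time.Time // searchable in-database
        Who     string    // searchable in-database
        IP      string
        Action  string
        Details string // serialized (eg JSON) blob of action-specific data
    }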
The other side is actually recording database entries. At one level this looks simple with structured logging; if you're logging changes to a database, you can just JSON-encode the whole DB record you're affecting and then record it (perhaps in before and after versions). But this is not sufficient, because of foreign key relationships. Your straightforward JSON encoding may capture, say, 'sponsor 385', but the sponsor column is a foreign key, and when you're looking at the logs (much) later you don't necessarily know who sponsor 385 was. At best, you'd have to painstakingly replay the audit logs to reconstruct the state of the database at the time so you can figure out who it was. In the worst case, the information is simply unrecoverable. So for real logging you need to peer through things like foreign key relations to capture the important extra data in your log records. In some schemas you may need to peer through multiple levels of relationships.
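A sketch of what 'peering through' a foreign key might look like, using invented table and field names:

    package audit

    import "database/sql"

    // Request is a hypothetical record whose sponsor column is a
    // foreign key into a sponsors table.
    type Request struct {
        Login     string
        SponsorID int64
    }

    // auditable expands the sponsor foreign key into the sponsor's
    // current name, so that the audit record stays comprehensible
    // long after sponsor 385 is gone from the sponsors table.
    func auditable(db *sql.DB, r Request) (map[string]interface{}, error) {
        var sponsor string
        err := db.QueryRow("SELECT name FROM sponsors WHERE id = ?", r.SponsorID).Scan(&sponsor)
        if err != nil {
            return nil, err
        }
        return map[string]interface{}{
            "login":        r.Login,
            "sponsor_id":   r.SponsorID,
            "sponsor_name": sponsor, // denormalized at logging time
        }, nil
    }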
Text-based logging makes this explicit because you see the message right in front of you. If you write a log message that says in part 'sponsor 385', everyone who reads the message is going to immediately ask who that is. You get your nose rubbed in the issue, and you have to explicitly consider how to make the information you're logging comprehensible by itself.
2014-01-18
Your web application should have an audit log
One of the smartest things I did when writing our web-based account request management system was giving it an audit log. Pretty much every time the database is changed, the web app writes an audit record that captures all of the high-level details (which user or automated process, from what IP if applicable, doing what, and so on). Having this audit log has had two advantages.
The first, obviously, is that it tells you what happened (and why). There have been at least three general situations where this is useful. The obvious one is when you're trying to determine what happened, ie who did what when, so that you can tell people about it. The less obvious one (for me) is checking to make sure that certain things actually did happen, generally things done by automated processes. Finally, the audit log is a great place to get an overview of what's been happening on the system since it's a single spot that sees all activity.
(There are other uses of audit logs, for example generating certain sorts of usage information based on people's logged activities.)
The more subtle advantage of having an audit log in this application is that it's simply reassuring, even if we never actually need it. If we come in some day and the entire thing is a complete mess, I know that we have a reasonable chance of sorting out what happened and maybe why. If a professor has questions or concerns about something, we can see what happened and at least make reasonable guesses about why. We are not going to be left looking at web server logs of POST requests from hither and yon and trying to figure out what might have happened.
To make all of this work, the audit log needs to capture not just the database-level changes that were made but also the surrounding context: in my case, which authenticated user or automated process performed the action, what IP address they came from, what part of the application they were using (you can often do a particular low-level DB operation in more than one place), and so on. The important thing is to be able to reconstruct not just what happened but who was doing what at the time.
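As a minimal sketch of what writing such a record might look like (in Go, with invented names; the actual application doesn't have to be written in Go):

    package audit

    import (
        "database/sql"
        "time"
    )

    // Context is the surrounding information a hypothetical audit
    // entry carries along with the change itself.
    type Context struct {
        Who   string // authenticated user or automated process
        IP    string // client IP address, if applicable
        Where string // which part of the application did this
    }

    // Log writes one audit record; the audit log is just another
    // database table.
    func Log(db *sql.DB, ctx Context, what string) error {
        _, err := db.Exec(
            "INSERT INTO audit (at, who, ip, place, what) VALUES (?, ?, ?, ?, ?)",
            time.Now(), ctx.Who, ctx.IP, ctx.Where, what)
        return err
    }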
(I don't actually log the full DB-level changes, just what I consider to be the important information from them. Possibly I should and maybe next time I will; there has been a time or two when I wanted somewhat more information than the audit log provided. On the other hand it's a lot easier to not have to figure out how to encode or embed everything into the audit log.)
PS: There are much more elaborate and complete ways to audit database changes and capture information. My perception is that they generally take more work than simply writing audit log records every time you change things. In our web app the audit log is just another DB table that is basically plain text and the records are generated by hand.
PPS: Don't put foreign key fields into your audit records. Really. Learn from my mistakes.