2013-05-31
Understanding the MongoDB code that people are laughing at
Recently my corner of the Twitter-verse has been moderately aflame
with a number of people pointing and laughing at a chunk of MongoDB
source code; this example is
typical. At first I automatically accepted that the code was laughable
(and you have to admit that seeing a call to random() in database
client code is odd) but when the link kept cropping up I got curious
enough to start reading the surrounding code. Unfortunately the
more I read the more sane the code looked. Since the
details don't exactly fit in a tweet, it's time to write a more
substantial explanation of the code and why it's sane.
Let's start with the code that people are directly laughing about (or so I assume since this is the line that people have been linking to):
if (!((_ok) ? true : (Math.random() > 0.1))) {
return res;
}
In the interests of cleverness, someone has made this if condition
much too hard to read. Rewritten to be clear, it is '!_ok &&
Math.random() <= 0.1'. If _ok is false, this if and its
immediate return will fire about 10% of the time. If _ok is true,
it will never fire.
Now let's look at the surrounding code, or at least bits of it. The context goes like this:
abstract class ConnectionStatus {
....
static abstract class UpdatableNode {
....
public CommandResult update() {
CommandResult res = null;
try {
... try to talk to a server ...
if (!_ok) {
getLogger().log(Level.INFO, "Server seen up: " + _addr);
}
_ok = true
...
} catch(Exception e) {
if (!((_ok) ? true : (Math.random() > 0.1))) {
return res;
}
... log a message about the
the server being down ...
_ok = false;
}
return res;
}
...
Let me translate this: the effect of the if with the Math.random()
is that if there's an error and the server has already been marked
as down, about 10% of the time the code doesn't bother to log a
message that it's still down (and doesn't record the actual exception
caught as part of that log message). If the server is marked up and
there's an error, a message is always logged. In all cases res
is returned, possibly as null, and I believe no exception is
propagated; I assume that this is an intended part of the API.
(There may actually be a bug here; it may be intended that the message is skipped 90% of the time when the server is already marked as down, not 10%.)
This code does not strike me as laughable, crazy, or worthy of mockery. It may have some defects and you may disagree with what it's doing, but it sure seems to be sane code in general. You probably have worse things in some of your projects (I know that I do in mine). My personal view is that what the code is doing is perfectly sensible to avoid flooding logs, but since I don't know MongoDB I don't know how much log volume this would produce (or avoid) and how important that log information would be during problem diagnosis.
(Note that the code will always log 'server seen down' and 'server seen up' when it first notices the server down and when it first notices the server back up. The only question is how many 'server still down' messages it logs or should log between those two, in a range somewhere between 'none' and 'one for every attempt to talk to the server'.)
If there is a fault in this code it is not the use of Math.random()
in database client code, it is that the if condition has been written
in such an overly clever way. And if people want to point and laugh at
overly clever code, well, there's a lot of fish in a lot of barrels (our
own generally included).
Update: C J Silverio pointed me to the initial commit for this code. The initial commit is much simpler and makes it clear that the current 'log 90%' state is a bug.
2013-05-27
Understanding SQL placeholders
As I got wrong once before, the correct way to secure your SQL against SQL injection attacks is not to try to quote all your arguments (even if you do it in a library) but to use SQL placeholders instead. This raises the obvious question of what are SQL placeholders and how do they differ from correctly quoting your SQL arguments.
The simple answer is that SQL placeholders are transmitted
to the SQL server separately from your SQL statements. To
make a terrible analogy, SQL placeholders are much like using
exec*() and separate arguments to run Unix programs securely. In both cases the underlying
protocol itself insures that arguments can never be confused
with the SQL command or the program to be run.
(Now, a caution: in order for this to work the client libraries, the SQL server, and the communication protocol between the two all have to support SQL placeholders. I believe that everything you want to use as a SQL database these days properly supports placeholders, but I haven't exhaustively checked.)
SQL placeholders are also part of prepared SQL statements. Prepared statements are basically pre-parsed, pre-optimized SQL statements; you send the generic version of the statement to the SQL server, it sets it up once, and then you can repeatedly run it with different specific placeholder values filled in. This avoids the overhead of both the parsing (which I suspect is generally small) and the query optimization (which might be significant for relatively complex queries), but of course it may only pay off if you repeat the same prepared query enough times.
(At the same time any extra overhead may be insignificant for a small application that isn't stressing database performance very much in the first place.)
Single use prepared statements can apparently be slower than non-prepared ones under some circumstances; I suspect that this because you have to make two round trips to the SQL server instead of one before getting your answer. I don't know if any common SQL server can do SQL placeholders outside of prepared statements. Since you really want to use SQL placeholders for security under most circumstances, there isn't really very much you can do about this.
(This is partly the kind of entry I write to get things straight in my own head.)
Sidebar: client library support is tricky
Many languages and environments these days have common, database independent 'database access' APIs (Python has its DB-API, for example). These APIs are usually defined with placeholders for all of the obvious reasons. However this doesn't guarantee that a particular implementation of the DB API for a particular database actually implements SQL placeholders (even if the underlying protocol and database support them). Hopefully the documentation will tell you if the implementation actually supports SQL placeholders or just seems to and is faking them with quoting behind the scenes.
(If not, your SQL database may report statistics that will let you determine this.)
You might ask yourself why on earth anyone would implement a non-placeholder client library if all of the other pieces support SQL placeholders and prepared statements and so on. My impression is that people do it because the non-placeholder communication protocol is easier to implement than supporting prepared statements and placeholders. If they're trying to get some interface, any interface off the ground, it's easier to send straight SQL and hope they get all of the quoting correct.
(Some web searches suggest that there are client libraries like this for young platforms.)
2013-05-15
Why I've so far been neglecting functional programming languages
Functional programming languages are in many ways the latest hotness and so for years I've been making off and on runs at things like yet another explanation of monads (which I think I sort of understand by now) and similar topics. Despite this, so far I've been almost completely uninterested in actually trying to write a functional program or exploring a FP language.
The big problem for me is that as far as I can tell, the kind of programs I usually work with are exactly the kind of programs that functional programming is stereotypically a bad fit with. The stereotype I've absorbed is that functional programming is quite a good fit for computation but not a good fit for IO, because IO intrinsically has side effects. Unfortunately most of what I write is all about IO and has little or no computation. Bashing a squarish peg into a roundish hole is unlikely to tell me anything particularly meaningful about nice the language is to work in; what I really need is a roundish peg, a computational problem, and those are relatively scarce around here.
(It's possible that I'm not looking hard enough. For example, I do periodically want to do things like log analysis or event reassembly, where the original data could just as well be a predefined data structure in the program instead of processed from logfiles on disk. I suspect that a functional language would handle these fine, maybe better than ad-hoc hackery in awk, Python, or whatever. If I was really crazy I would try rewriting the logic in our ZFS spares handling system in an FP language to see if it got clearer; it's fundamentally a series of transformations of a tree and then some analysis of the result. The result might even be more testable.)
2013-05-13
My language irritations with Go (so far) and why I'm wrong about them
The great thing about an evolving language is that if you're slow enough about writing up your irritations with it, some of them can wind up fixed (or part fixed). So this list is somewhat shorter than it was when I originally wrote my first Go program, and none of the irritations are major. Also, I will reluctantly concede that Go has good engineering reasons for all of them.
My largest single irritation is that break acts on switch and
select; I expected it to act only on any enclosing control structure,
so that you could write something like:
for {
select {
case <-mchan:
// message silently swallowed
case <-schan:
break
}
Instead you have to invent a boolean loop condition. I understand why Go
does this; it enables you to exit early out of a switch or select
case instead of having to wrap everything in ever increasing levels of
nesting. This is likely especially important because Go uses explicit
error checking (which would otherwise force those nested if blocks).
The issue that got partially fixed is Go's return requirements. When I wrote the original version
of my program the natural form of one function was a big switch with
a number of specific cases and then a default: to catch the rest;
however, the original rules required a surplus return at the end of
the function, which irritated me by forcing me to move the default case
to the end of the function, obscuring the logic. The Go 1.1 changes make
my particular case okay but I believe there remain cases where you need
an unreachable ending return (or panic) to make the compiler happy.
You can make an argument that the original and current state of affairs
are good software engineering. If the compiler did true reachability
analysis it'd increase the number of cases where an innocent looking
change to some part of the code would suddenly make the return
coverage not be complete and thus produce potentially odd messages about
missing returns. The current brute force rules protect against this
and lead Go programmers to write in a certain sort of consistent style.
My final issue is my perennial one of being unable to cleanly cancel IO being done by goroutines, breaking them out of things so that they can see a death signal from outside. You can argue that this is a bug in the runtime, but the problem with this is that everything that calls an IO operation then needs to be aware of this particular error case (and catch it, and propagate it up the call stack in whatever way is appropriate). A good start to making it a bug in the runtime would be for the runtime to define a specific error for 'IO attempted on closed connection' and for absolutely everything to use it.
(As it stands, the net package doesn't even define a publicly
visible error instance for this case, although it does define one
internally. It's my personal view that this beautifully illustrates why
this is a general language problem; while you can 'solve' it in code,
it requires absolutely everyone to get it right and, well, they clearly
don't.)
Again this is a software engineering tradeoff. Both the semantics and the runtime implementation of goroutines are undoubtedly vastly simplified because you don't have to worry about being able to signal or cancel a goroutine from outside itself. Outside of the program exiting, all of the interaction that a goroutine has with the outside world are initiated by itself, on its own terms. This makes it much easier to reason about the effects of a goroutine, especially if it's careful not to use global state.