2014-02-28
Arguments for explicit block delimiters in programming languages
Suppose that you have a language where indentation levels are significant; either they explicitly delimit blocks (as in Python) or mismatched indentation is an error. I maintain that this is generally a good thing because it makes human semantics match computer semantics. One of the big questions about such a language is whether it should have explicit block delimiters or rely purely on indentation levels.
A lot of people will likely argue for the Python approach of implicit delimiters, ie that blocks are created and exited by indentation alone. Although I like Python, I actually tilt towards at least allowing and perhaps requiring explicit delimiters. I see two advantages to explicit delimiters.
First, explicit delimiters are in fact explicit. They provide
confirmation that you did indeed intend to create a new block or
to exit one and that you didn't just accidentally get the indentation
wrong on a line. This is going to be especially important in a
language where you can create new block levels at any time (not
merely after a language construct such as if or while) and such
blocks can have semantic meaning such as containing block-local
variables.
(Note that Python is not such a language. Unlike C, where you can create
a block anywhere just by dropping in { and }, Python blocks can only
be started after specific statements.)
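(As a concrete sketch of the difference, with made-up helper functions: Python's nearest equivalent to a bare C block is an 'if True:', and even that doesn't give you block-local variables:

    def setup_handler():   # made-up stand-in
        return "handler"

    def register(h):       # made-up stand-in
        pass

    def process():
        # In C you could open a bare '{ ... }' block here to scope a
        # temporary variable. Python's closest approximation is an
        # 'if True:' block, which doesn't actually create a new scope:
        if True:
            tmp = setup_handler()
            register(tmp)
        # tmp is still visible here; the block didn't contain it
        print(tmp)

An explicit block delimiter would let the language both scope tmp and confirm that you really meant to open and close a block there.)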
Without explicit delimiters there are a number of ways for block errors to creep in, even in languages like Python. I feel that explicit block delimiters are particularly well suited to languages designed with an eye towards software engineering pragmatics, in the vein of Go's design; in such languages you'd rather make programmers do a bit of extra work than have an error slip by.
Second, explicit delimiters enable you to have blocks that are smaller than a single line and to have several such blocks on one line. Consider:
if (acheck()) { b = 10; fred(); } else { b = 30; barney(); }
You can't do this without explicit block delimiters. Of course, some people will probably say that you shouldn't do this even with block delimiters; if you agree, this is not a very compelling point and may in fact be an argument against explicit block delimiters.
I personally like such compact blocks, in part because of issues like anonymous functions (or real lambdas) where you may well want to put several short statements on a single line and in fact explicitly package them up as a block for the lambda.
(I believe that lacking a good syntax for such inline code blocks is one
reason that Python has such a limited lambda operator.)
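To make that concrete: a Python lambda's body is limited to a single expression, so anything involving multiple statements has to be hoisted out into a named function:

    # A single expression is fine as a lambda:
    double = lambda x: x * 2

    # But there is no way to bundle several statements into an inline
    # block; you have to fall back to a def:
    def double_and_log(x):
        y = x * 2
        print(y)
        return y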
2014-02-23
The problem with indentation in programming languages
You may have heard that Apple has just released a really important security update to fix a bug in SSL (aka TLS) certificate verification. The best description of the bug I've seen comes from Adam Langley in Apple's SSL/TLS bug. The core of the bug, extracted from the function in question, is:
...
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;
...
This points to the core problem with indentation. As we should all know, computer languages must communicate with two separate audiences: they must be meaningful both to computers (which must execute them) and to programmers (who must read and modify them). The issue here is simple:
Indentation is semantically meaningful to people but is semantically meaningless to (most) compilers et al.
When I say that indentation is semantically meaningful to people I mean that people read things into indentation. When we're reading code we generally assume that the indentation matches the program's block structure and thus that we can infer block structure from indentation level. This is a shorthand and, like all shorthands, it can either be clearly wrong (and thus discarded, slowing us down) or be overridden if we read closely. But I think that history has shown that people can and will miss mismatches between indentation and actual program code flow, and will miss bugs as a result.
(See for example Dave Jones' replies in this twitter conversation.)
I can think of two reasonably workable solutions to this; call them
the Python and the Go solutions. The Python solution is to make
indentation semantically meaningful to the language as well, thereby
making the programmer view and the computer view of the code match
up. The problem is that this seems to be widely unpopular with
programmers (for no good reason). The Go solution is to
create a canonical formatting style for the language and then strongly
encourage everyone to routinely convert their code to it. If you run
code like this through gofmt it will correct the mis-indentation and
thus give you a higher chance of spotting the problem. Make it a rule
that all code must be canonicalized through gofmt before a change is
committed and there you go.
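As a sketch of what the Python solution buys you, here is Python-shaped code with the same mistake; update, final, fail, and their arguments are all made-up stand-ins for Apple's C. The duplicated line cannot silently escape its block, because its indentation is its block:

    def update(ctx, data): return 0      # stand-in that always succeeds
    def final(ctx, out): return 0        # ditto
    def fail(): raise RuntimeError("handshake failed")

    hash_ctx, server_random, signed_params, hash_out = {}, b"", b"", b""

    if update(hash_ctx, server_random) != 0:
        fail()
    if update(hash_ctx, signed_params) != 0:
        fail()
        fail()   # still conditional: to make this unconditional you
                 # must dedent it, which both you and the language see
    if final(hash_ctx, hash_out) != 0:
        fail()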
(I don't think that 'produce warnings about mismatching indentation' is a winning solution, although I might be wrong. At the least I don't think any of the major C compilers have added such a warning. My gut's view is that warnings are going to be less effective than other measures.)
My personal opinion is that some variant of the Python solution is
the right one. I don't object if you want to still have explicit
block delimiters for various reasons, but I think it should be a
fatal error if the indentation does not match the block structure.
Add a tool like gofmt if you want to allow people to write sloppy
code and then have it fixed automatically before they feed it to
the real language environment.
PS: Yes, mismatching indentation is not the only problem with Apple's
code here; two gotos in a row ought to look at least a little bit
odd regardless of their indentation (and even if they were in the
same block context, eg if they were both inside an 'if (..) { ... }').
2014-02-19
The reasoning problem with describing things with a programming language
Every so often systems need to have things described to them. Configuration files are one obvious example but there are plenty of others, such as what to do to build software, what to do when packages are installed or removed, and what to do when you're starting and stopping services. When you're creating such a system it's often tempting to make the description take the form of a program written in some programming language, partly because this is often the easy way to create domain specific languages and partly because it's often easy to think of what you are doing less as describing things and more as making things happen.
Unfortunately there is a problem with this. The more you describe things through a programming language (and the more general the programming language is), the less your system can determine things about the description from the outside. Actually, let's call the description what it is: a script. Often you can't determine very much about what the script does without in some way executing the script itself. If you're unlucky, you can't determine much without the script actually running for real and doing whatever it does. And even if you have a good 'dry run' mode you generally can't capture why the script did what it did except at a very low level of 'the script made this series of conditional decisions before it acted'.
What this leaves you with is a relatively black box and the problem with black boxes is that there's not much you can do to reason about them. For example, let's take package management. Suppose you have a bunch of similar packages to install or update and they have a bunch of postinstall scripts between them. Are there common operations across these postinstall scripts that you can perform only once, such as an expensive regeneration of a system-wide MIME database? You don't know. You don't even necessarily know this if the postinstall scripts are all identical (depending in part on package manager semantics).
While there are cases where you need the full arbitrary power of a programming language, you should try very hard to avoid resorting to a programming language for everything. White-box as much of your descriptions as possible and reserve the general programming language as a last-ditch escape hatch. If people are using the programming language very much, it's a sign that your descriptions aren't powerful enough and you need to do something about that.
(There is also a usability problem with using programming languages for configurations and descriptions.)
Sidebar: gradations of describing things
The best way of describing things is declaratively: you directly declare X, Y, and Z.
The second best way is a procedural language that winds up making declarative statements. You run a chunk of code and it ends up declaring X, Y, and Z. Configuration files written in programming languages often wind up doing this (where the 'declaring' may be in the form of, say, setting specific variables). Among other issues, this suffers from the problem that you know the end declarations but you don't know very much about why they wound up that way.
The worst way is a procedural language that takes actions; the code just does X, Y, and Z. Here you can only discover what is being described by running the code and watching what it does (if you even bother watching, as opposed to just standing back and letting it act).
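A compact Python sketch of the three gradations (all of the names and the specific command here are made up for illustration):

    # Declarative: the description is plain data.
    config = {"services": ["web", "db"], "mime_refresh": True}

    # Procedural but declaring: code runs, yet its only real effect is
    # to leave variables set for the system to read afterwards.
    services = []
    for name in ("web", "db"):
        services.append(name)
    mime_refresh = True

    # Procedural and acting: the code just does things; the only way to
    # find out what it 'describes' is to run it and watch.
    def postinstall():
        import subprocess
        # e.g. an expensive system-wide action, as in the MIME database
        # example above:
        subprocess.call(["update-mime-database", "/usr/share/mime"])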
2014-02-03
The big attraction of SQLite
Recently I came to an obvious-in-retrospect realization about what is, to me, the big and game-changing feature of SQLite. I'll put it this way:
SQLite gives you a powerful relational database that doesn't require you to run a server.
Running a separate database server is all sorts of hassle. You have to configure it, set up access to it, arrange for it to be started when the system boots up and then kept running, monitor it to make sure that it's still running, and so on and so forth. Often this will need either the involvement of your system administrator or even more hacking on scripts and configurations to run your own copy of the entire database server infrastructure. Then multiply this work for development and test database servers (because you're not developing or testing against the production database, are you), and so on and so forth.
(One of the many effects that this has is that it's simply not worth using a SQL relational database for small projects even when that might be the easiest approach if you had one handy. It's just too much bureaucracy.)
SQLite throws all of that hassle out of the window. There is no server and basically no configuration. You 'configure' SQLite by telling it the filename to store things in and access control is done by ordinary Unix permissions. Need a development or test database? Change the filename of the database in your configuration and you're done. SQLite can even run entirely in memory for the ultimate disposable quick database.
(And if you're developing as a different Unix user than the one the production application runs as, permissions will probably ensure that you can't accidentally touch the production database even if you screw up. Not to mention the ease of setting up a copy of the production database; most of the time you can just copy the SQLite database file.)
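In Python, for example, all of the 'setup' is a single connect call. This is a minimal sketch; the filename and table are made up:

    import sqlite3

    # The entire 'configuration' is a filename; use ':memory:' instead
    # for a throwaway in-memory database.
    conn = sqlite3.connect("notes.db")
    conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
    conn.commit()
    for row in conn.execute("SELECT id, body FROM notes"):
        print(row)
    conn.close()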
At the macro level what this has done is make it practical to give even small low-hassle things access to a powerful relational database (and as a bonus it speaks SQL and you can do ad-hoc queries against it). If you think you need relational features there is now basically no reason not to use them. And it will probably be relatively easy to scale your program up to using a full sized server based SQL database if you turn out to need one of the big guns after all.
Or the real summary version: SQLite has made SQL databases into a lightweight thing.
(Well, of course. It's right there in the name. I'm slow so things take some time to sink in.)
I've been conditioned to think of SQL databases as very heavyweight things, but SQLite has changed this. With this in mind I think I'm going to be much more willing to assume and use a relational database in future projects, rather than try to glue together some sort of ad-hoc equivalent based on eg plain files. I'm not entirely convinced that I'll carry through on this because I like plain text files, but there's an inflection point where the filesystem makes an increasingly bad database engine (also).
PS: Yes of course if you need a fully featured SQL engine, SQLite is not really the thing for you. The same is true if you need something that stands up to high concurrency and high load and so on. I tend to think of SQLite as a relational database engine that happens to speak basic SQL for convenience.