2013-05-27
Understanding SQL placeholders
As I got wrong once before, the correct way to secure your SQL against SQL injection attacks is not to try to quote all your arguments (even if you do it in a library) but to use SQL placeholders instead. This raises the obvious question of what are SQL placeholders and how do they differ from correctly quoting your SQL arguments.
The simple answer is that SQL placeholders are transmitted
to the SQL server separately from your SQL statements. To
make a terrible analogy, SQL placeholders are much like using
exec*()
and separate arguments to run Unix programs securely. In both cases the underlying
protocol itself insures that arguments can never be confused
with the SQL command or the program to be run.
(Now, a caution: in order for this to work the client libraries, the SQL server, and the communication protocol between the two all have to support SQL placeholders. I believe that everything you want to use as a SQL database these days properly supports placeholders, but I haven't exhaustively checked.)
SQL placeholders are also part of prepared SQL statements. Prepared statements are basically pre-parsed, pre-optimized SQL statements; you send the generic version of the statement to the SQL server, it sets it up once, and then you can repeatedly run it with different specific placeholder values filled in. This avoids the overhead of both the parsing (which I suspect is generally small) and the query optimization (which might be significant for relatively complex queries), but of course it may only pay off if you repeat the same prepared query enough times.
(At the same time any extra overhead may be insignificant for a small application that isn't stressing database performance very much in the first place.)
Single use prepared statements can apparently be slower than non-prepared ones under some circumstances; I suspect that this because you have to make two round trips to the SQL server instead of one before getting your answer. I don't know if any common SQL server can do SQL placeholders outside of prepared statements. Since you really want to use SQL placeholders for security under most circumstances, there isn't really very much you can do about this.
(This is partly the kind of entry I write to get things straight in my own head.)
Sidebar: client library support is tricky
Many languages and environments these days have common, database independent 'database access' APIs (Python has its DB-API, for example). These APIs are usually defined with placeholders for all of the obvious reasons. However this doesn't guarantee that a particular implementation of the DB API for a particular database actually implements SQL placeholders (even if the underlying protocol and database support them). Hopefully the documentation will tell you if the implementation actually supports SQL placeholders or just seems to and is faking them with quoting behind the scenes.
(If not, your SQL database may report statistics that will let you determine this.)
You might ask yourself why on earth anyone would implement a non-placeholder client library if all of the other pieces support SQL placeholders and prepared statements and so on. My impression is that people do it because the non-placeholder communication protocol is easier to implement than supporting prepared statements and placeholders. If they're trying to get some interface, any interface off the ground, it's easier to send straight SQL and hope they get all of the quoting correct.
(Some web searches suggest that there are client libraries like this for young platforms.)
Our situation with ZFS and 4 Kb physical sector disks
While I wrote up the general state of affairs with ZFS and 'advanced format' disks I've never described how this affects us in specific. The short version is that we are not in as much trouble as we might otherwise be because we're running ancient and somewhat under-functional software. You are in the maximal amount of trouble if your version of ZFS will refuse to add 4K sector disks to old pools and you have no way to lie to ZFS (or the kernel in general) about what the physical sector size of your disks is. Our situation is mostly the reverse of this.
First, our version of Solaris (Solaris 10 update 8 plus some patches)
turns out to be so old that it doesn't even know about physical sector
size as distinct from logical sector size. This is good in that it will
not even notice that it's mixing dissimilar disks but bad in that we now
have no way of creating new pools or vdevs with ashift=12
. Second,
our ISCSI target software
doesn't export information about the physical sector size of disks that
it's making visible so even if our version of Solaris was aware of 4K
disks, it wouldn't see any.
The upshot of this is that we can freely add 4K disks to our existing
pools. The performance impact of this is not currently clear to me,
partly because
our environment is somewhat peculiar in ways that
make me think we'll experience less impact than normal people in this
situation. The bad news is that my initial testing on streaming IO shows
a visible difference in write performance, although not a huge one (I
need to put together a good random write test before I can have opinions
on that).
In the short term, we'll survive if we have to replace 512b disks with
4K disks; some things may run slower but so far it doesn't look like
they will be catastrophically slow. In the long term we need to replace
the entire fileserver infrastructure and migrate all of the data to new
pools created with ashift=12
. We'd like to do it before we have to buy
too many 4K disks as replacement disks for existing pools.
(We always knew we had to replace the existing hardware and update the software someday, but it used to be less urgent and we expected that we could keep the pools intact and thus do the upgrade with minimal user impact. Our current vague timeline is now to do this sometime in 2014, depending on when we can get money for hardware and so on.)
PS: ZFS continues to be our best option for a replacement fileserver infrastructure, although we don't know what OS it'll be running on. Linux btrfs is the only other possible competitor and it's nowhere near ready yet. Our budget is unlikely to allow us to purchase any canned appliance-like solution.