Getting dd's skip and seek straight once and for all
Earlier today I wanted to lightly damage a disk in a test ZFS pool in order to make sure that some of our status monitoring code was working right when ZFS was recovering from checksum failures. The reason I wanted to do light damage is that under normal circumstances, if you do too much damage to a disk, ZFS declares the disk bad and ejects it from your pool entirely; I didn't want this to happen.
So I did something like this:
for i in $(seq 128 256 10240); do dd if=/dev/urandom of=<disk> bs=128k count=4 skip=$i; done
The intent was to poke 512 KB of random data into the disk at a number of different places, with the goal of both hopefully overwriting space that was actually in use and not overwriting too much of it. This turned out to actually not do very much and I spent some time scratching my head before the penny dropped.
I've used skip before and honestly, I wasn't thinking clearly here. What I actually wanted to use was seek. The difference is this: skip skips over initial data in the input, while seek skips over initial data in the output.
(Technically I think skip usually silently consumes the initial input data you asked it to skip over, although dd may try to lseek() on inputs that seem to support it. seek definitely must lseek(), and dd will error out if you ask it to seek on something that doesn't support lseek(), like a pipe.)
What I was really doing with my
dd command was throwing away
increasing amounts of data from
/dev/urandom and then repeatedly
writing 512 KB (of random data) over the start of the disk. This was
nowhere near what I intended and certainly didn't have the effects
on ZFS that I wanted.
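To make the difference concrete, here's a small sketch of the two behaviours using ordinary Python file offsets. This is purely an illustration of the semantics with a toy 4-byte 'block size' and made-up file names; it is not how dd itself works internally.

```python
# Illustration of dd's skip vs seek semantics using plain file I/O.
# skip=N seeks on the INPUT before reading; seek=N seeks on the
# OUTPUT before writing. (Toy example; not dd's implementation.)
import os
import tempfile

blocksize = 4
tmpdir = tempfile.mkdtemp()
inpath = os.path.join(tmpdir, "input")
outpath = os.path.join(tmpdir, "output")

with open(inpath, "wb") as f:
    f.write(b"AAAABBBBCCCCDDDD")  # four 4-byte "blocks"

# skip=2: skip two blocks of the input, then copy one block.
with open(inpath, "rb") as src, open(outpath, "wb") as dst:
    src.seek(2 * blocksize)
    dst.write(src.read(blocksize))
with open(outpath, "rb") as f:
    print(f.read())  # -> b'CCCC': data came from the third input block

# seek=2: skip two blocks of the output, then copy one block there.
with open(inpath, "rb") as src, open(outpath, "wb") as dst:
    dst.seek(2 * blocksize)
    dst.write(src.read(blocksize))
with open(outpath, "rb") as f:
    print(f.read())  # eight NUL bytes, then b'AAAA': the write
                     # landed at byte offset 8 of the output
```

With skip I was positioning within /dev/urandom (pointlessly); what I wanted was to position within the disk, which is seek.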
I guess the way for me to remember this is 'skip initial data from the input, seek over space in the output'. Hopefully it will stick after this experience in toe stubbing.
Sidebar: the other thing I initially did wrong
The test pool was full of test files, which I had created by copying /dev/zero into files. My initial dd was also using /dev/zero to overwrite disk blocks. It struck me that I was likely to be mostly overwriting file data blocks full of zeroes with more zeroes, which probably wasn't going to cause checksum failures.
Why languages like 'declare before use' for variables and functions
I've been reading my way through Lisp as the Maxwell's equations of software and ran into this 'problems for the author' note:
As a general point about programming language design it seems like it would often be helpful to be able to define procedures in terms of other procedures which have not yet been defined. Which languages make this possible, and which do not? What advantages does it bring for a programming language to be able to do this? Are there any disadvantages?
(I'm going to take 'defined' here as actually meaning 'declared'.)
To people with certain backgrounds (myself included), this question has a fairly straightforward set of answers. So here's my version of why many languages require you to declare things before you use them. We'll come at it from the other side, by asking what your language can't do if it allows you to use things before declaring them.
(As a digression, we're going to assume that we have what I'll call an unambiguous language, one where you don't need to know what things are declared as in order to know what a bit of code actually means. Not all languages are unambiguous; for example C is not (also). If you have an ambiguous language, it absolutely requires 'declare before use' because you can't understand things otherwise.)
To start off, you lose the ability to report a bunch of errors at the time you're looking at a piece of code. Consider:
lvar = ....
res = thang(a, b, lver, 0)
In basically all languages, we can't report the lver typo (we have to assume that lver is an unknown global variable), we don't know if thang is being called with the right number of arguments, and we don't even know if thang is a function instead of, say, a global variable. Or if it even exists; maybe it's a typo for thing. We can only find these things out when all valid identifiers must have been declared; in fully dynamic languages like Lisp and Python, that's 'at the moment where we reach this line of code during execution'. In other languages we might be able to emit error messages only at the end of compiling the source file, or even when we try to build the final program and find missing or conflicting symbols.
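The fully dynamic end of this spectrum is easy to demonstrate in Python (a sketch with a made-up demo function; thang deliberately doesn't exist):

```python
# Python happily compiles a function that calls an undefined name;
# the error only surfaces when that line actually executes.
def demo():
    return thang(1, 2)   # 'thang' is undefined, but this compiles fine

try:
    demo()
except NameError as e:
    print("only caught at runtime:", e)
```

No complaint is possible earlier, because thang could perfectly well be defined (or redefined) at any point before demo() is called.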
In languages with typed variables and arguments, we don't know if the arguments to thang() are the right types and if thang() returns a type that is compatible with res. Again we'll only be able to tell when we have all identifiers available. If we want to do this checking before runtime, the compiler (or linker) will have to keep track of the information involved for all of these pending checks so that it can check things and report errors once everything has been declared.
Some typed languages have features for what is called 'implicit
typing', where you don't have to explicitly declare the types of
some things if the language can deduce them from context. We've
been assuming that
res is pre-declared as some type, but in an
implicit typing language you could write something like:
res := thang(a, b, lver, 0)
res = res + 20
At this point, if thang() is undeclared, the type of res is also unknown. This will ripple through to any code that uses res, for example the following line here; is that line valid, or is res a complex structure that can in no way have 20 added to it? We can't tell until later, perhaps much later.
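Python shows the runtime version of this ripple effect directly. In this sketch (with two made-up stand-ins for thang), whether 'res + 10' is valid depends entirely on which thang actually ran:

```python
# The validity of 'res + 10' depends on what thang() returned,
# which a compiler can't know until thang is declared (or, in
# Python, until the code runs).
def thang_a(a, b, c, d):
    return a + b          # res will be a number

def thang_b(a, b, c, d):
    return {"value": a}   # res will be a dict; res + 10 is invalid

for thang in (thang_a, thang_b):
    res = thang(1, 2, None, 0)
    try:
        print(res + 10)
    except TypeError as e:
        print("invalid for this thang:", e)
```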
In a language with typed variables and implicit conversions between some types, we don't know what type conversions we might need in either the call (to convert some of the arguments) or the return (to convert thang()'s result into res's type). Note that in particular we may not know what type the constant 0 is. Even languages without implicit type conversions often treat constants as being implicitly converted into whatever concrete numeric type they need to be in any particular context. In other words, thang()'s last argument might be a float, a double, a 64-bit unsigned integer, a 32-bit signed integer, or whatever, and the language will convert the 0 to it. But it can only know what conversion to do once thang() is declared and the types of its arguments are known.
This means that a language with any implicit conversions at all (even for constants like 0) can't actually generate machine code for this section until thang() is declared, even under the best of circumstances.
However, life is usually much worse for code generation than this.
For a start, most modern architectures pass and return floating
point values in different ways than integer values, and they may
pass and return more complex values in a third way. Since we don't
know what type
thang() returns (and we may not know what types
the arguments are either, cf
lver), we basically can't generate
any concrete machine code for this function call at the time we
parse it even without implicit conversions. The best we can do is
generate something extremely abstract with lots of blanks to be filled in later and then sit on it until we know more about thang(), lver, and so on.
(And implicit typing for
res will probably force a ripple effect
of abstraction on code generation for the rest of the function, if
it doesn't prevent it entirely.)
This 'extremely abstract' code generation is in fact what things like Python bytecode are. Unless the bytecode generator can prove certain things about the source code it's processing, what you get is quite generic and thus slow (because it must defer a lot of these decisions to runtime, along with checks like 'do we have the right number of arguments').
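You can see this genericness with the standard dis module. In this sketch (caller is a made-up function, and thang again deliberately doesn't exist), the bytecode just loads the name and does a generic call, deferring every check to runtime:

```python
# Disassemble a call to an undefined name: the bytecode is fully
# generic, with no trace of argument or type checking.
import dis

def caller():
    return thang(1, 2, 3)   # 'thang' need not exist at compile time

ops = [ins.opname for ins in dis.get_instructions(caller)]
print(ops)  # includes LOAD_GLOBAL and a generic CALL-style opcode
            # (exact opcode names vary between Python versions)
```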
So far we've been talking about
thang() as a simple function call.
But there are a bunch of more complicated cases, like:
res = obj.method(a, b, lver, 0)
res2 = obj1 + obj2
Here we have method calls and operator overloading. If obj, obj1, and obj2 are undeclared or untyped at this point, we don't know if these operations are valid (the actual obj might not have a method() method) or what concrete code to generate. We need to generate either abstract code with blanks to be filled in later or code that will do all of the work at runtime via some sort of introspection (or both, cf Python bytecode).
All of this prepares us to answer the question about what sort of languages require 'declare before use': languages that want to do good error reporting or (immediately) compile to machine code or both without large amounts of heartburn. As a pragmatic matter, most statically typed languages require declare before use because it's simpler; such languages either want to generate high quality machine code or at least have up-front assurances about type correctness, so they basically fall into one or both of those categories.
(You can technically have a statically typed language with up-front
assurances about type correctness but without declare before use;
the compiler just has to do a lot more work and it may well wind
up emitting a pile of errors at the end of compilation when it can
say for sure that
lver isn't defined and you're calling thang() with the wrong number and type of arguments and so on. In practice
language designers basically don't do that to compiler writers.)
Conversely, dynamic languages without static typing generally don't
require declare before use. Often the language is so dynamic that
there is no point. Carefully checking the call to
thang() at the
time we encounter it in the source code is not entirely useful if the thang function can be completely redefined (or deleted) by
the time that code gets run, which is the case in languages like
Lisp and Python.
(In fact, given that
thang can be redefined by the time the code
is executed we can't even really error out if the arguments are
wrong at the time when we first see the code. Such a thing would
be perfectly legal Python, for example, although you really shouldn't do this.)
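Here's a sketch of that perfectly legal (if inadvisable) redefinition, with made-up functions:

```python
# Any check made when the call to thang() was first seen would be
# stale by the time the call runs, because thang can be redefined.
def thang(a, b):
    return a + b

def run():
    return thang(1, 2)

print(run())        # -> 3

def thang(a, b):    # deliberate redefinition before the next call
    return a * b

print(run())        # -> 2: run() now calls the new thang
```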