There are two facets to dd usage

March 18, 2023

Recently I shared a modern Unix superstition on the Fediverse:

Is it superstition that I do 'dd if=... bs=<whatever> | cat >/dev/null' instead of just having 'of=/dev/null' because I'm cautious about some version of dd optimizing that command a little too much? Probably.

There are various things you could say about this, but thinking about it has made me realize that in practice there are two facets to dd usage, two distinct use cases, and they're somewhat in conflict with each other.

The first facet is dd as a way to copy data around. If you view dd this way, it's fine if some combination of dd, your C library, and the kernel optimize how this data copying is done. For example, if dd is copying a file to or from a network socket, in many cases it would be desirable to connect the file and the socket directly inside the kernel so that the data doesn't have to flow through user level. If you're using dd to copy data, you generally don't care exactly how it happens; you just want the result.
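A minimal sketch of this first facet, using temporary files under /tmp as stand-ins for whatever you're really copying (the paths and sizes here are made up for illustration):

```shell
# Create a 1 MiB test file, then copy it with dd. In this usage,
# bs= is only a performance knob; any equivalent copy would do, and
# we wouldn't care if the stack secretly did it some faster way.
dd if=/dev/zero of=/tmp/dd-src.img bs=1024 count=1024 2>/dev/null
dd if=/tmp/dd-src.img of=/tmp/dd-dst.img bs=65536 2>/dev/null
cmp /tmp/dd-src.img /tmp/dd-dst.img && echo "copies match"
```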

(Dd traditionally has some odd behavior around block sizes, but many people using dd to copy data don't actually want this behavior or care about it.)
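One example of that odd block-size behavior: dd's count= counts read() calls, not bytes, and a short read from a pipe still counts as one (partial) record. GNU dd's iflag=fullblock exists specifically to paper over this:

```shell
# On a pipe, a read can come up short, so 'bs=1M count=1' may copy
# far less than 1 MiB: here dd's one read() gets only 5 bytes.
printf 'hello' | dd bs=1M count=1 2>/dev/null | wc -c   # prints 5, not 1048576
```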

The second facet is dd as a way to cause specific IO to happen. If you view dd this way, it is absolutely not safe for the collective stack to optimize how the data is copied. You want dd to do exactly the IO that you asked for, and not change that. If you read from a file and write to /dev/null you don't want dd to connect the file and /dev/null in the kernel and then the kernel to optimize this to do no IO. Reading the file (or the disk) was the entire point.
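A sketch of this second facet, where the IO pattern itself is the point (here /tmp/test.img is a stand-in for a real disk device such as /dev/sda):

```shell
# Set up a 2 MiB stand-in 'disk', then read exactly 4 KiB from a
# 1 MiB offset as a single 4 KiB read. If the stack "optimized" this
# into doing no actual IO, the command would be pointless.
dd if=/dev/zero of=/tmp/test.img bs=1M count=2 2>/dev/null
dd if=/tmp/test.img of=/dev/null bs=4096 skip=256 count=1
```

The second dd reports "1+0 records in", confirming the one full-block read you asked for actually happened.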

My impression is that historically, dd originated in the first usage case; it was created around the time of V5 Unix in order to "convert and copy a file", in the words of the V6 dd manual page. System administrators later pressed it into service for the second facet, because it allowed relatively precise control and it seemed like a safe command, one unlikely to choke on odd sources of input or output or to do anything unpredictable with the data it read and wrote.

You can criticize this, but Unix didn't and still doesn't have a standard tool that's explicitly for performing specific IO. Maybe it should have one, since dd can be awkward to use for highly specific IO. Also, at the time that system administrators started assuming dd would perform their IO as written, I don't think anyone expected the degree of cleverness that modern Unix utilities and kernels exhibit (cf this note about GNU coreutils cat, and GNU grep apparently having optimized the case of its output being /dev/null for a long time).
