Wandering Thoughts archives

2023-03-18

There are two facets to dd usage

Recently I shared a modern Unix superstition on the Fediverse:

Is it superstition that I do 'dd if=... bs=<whatever> | cat >/dev/null' instead of just having 'of=/dev/null' because I'm cautious about some version of dd optimizing that command a little too much? Probably.

There are various things you could say about this, but thinking about it has made me realize that in practice there are two facets to dd, what you could call two use cases, and they're somewhat in conflict with each other.

The first facet is dd as a way to copy data around. If you view dd this way, it's fine if some combination of dd, your C library, and the kernel optimizes how the copying is done. For example, if dd is copying a file to or from a network socket, in many cases it would be desirable to connect the file and the socket directly inside the kernel, so that the data doesn't have to flow through user level at all. If you're using dd to copy data, you generally don't care exactly how it happens; you just want the result.
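As an illustration of the first facet, here is a minimal Python sketch of copying between two file descriptors while letting the kernel move the data. It assumes Linux, where `os.sendfile()` (a wrapper for sendfile(2)) can copy from a regular file to a regular file or a socket without the bytes passing through the process, and it falls back to an ordinary `read()`/`write()` loop when the kernel refuses that pair of descriptors. The name `copy_fd` is hypothetical, and this is not a claim about what any real dd implementation does internally.

```python
import errno
import os

# errno values that mean "sendfile can't handle this pair of descriptors",
# as opposed to a genuine I/O error.
_FALLBACK_ERRNOS = {errno.EINVAL, errno.ENOSYS, errno.ENOTSOCK,
                    errno.EOPNOTSUPP, errno.ESPIPE}


def copy_fd(src_fd: int, dst_fd: int, bs: int = 1 << 20) -> int:
    """Copy src_fd (a freshly opened regular file) to dst_fd.

    Prefers in-kernel copying with os.sendfile(); returns bytes copied.
    """
    total = 0
    if hasattr(os, "sendfile"):
        try:
            while True:
                # The kernel moves up to bs bytes starting at offset `total`
                # straight from src_fd to dst_fd; they never enter this process.
                n = os.sendfile(dst_fd, src_fd, total, bs)
                if n == 0:  # end of file
                    return total
                total += n
        except OSError as e:
            # Only fall back if nothing was copied yet; otherwise re-raise.
            if total or e.errno not in _FALLBACK_ERRNOS:
                raise
    # Fallback: an ordinary user-level read()/write() loop.
    while True:
        buf = os.read(src_fd, bs)
        if not buf:
            return total
        view = memoryview(buf)
        while view:
            view = view[os.write(dst_fd, view):]
        total += len(buf)
```

From the caller's side the two paths look identical, which is the point: someone who just wants the data copied doesn't need to know which one ran.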

(Dd traditionally has some odd behavior around block sizes, but many people using dd to copy data don't actually want this behavior or care about it.)
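To make the block-size oddity concrete: as I understand it, dd issues one `read()` per input block and counts whatever comes back as a block, even a short read, so `count=N` limits the number of reads rather than the number of bytes. Short reads are routine from pipes, terminals, and network sockets. GNU dd's `iflag=fullblock` changes this by accumulating full blocks. The hypothetical `read_blocks` function below sketches both behaviors in Python; it takes a `read` callable so that the difference can be shown deterministically with a source that trickles out data.

```python
from typing import Callable, List


def read_blocks(read: Callable[[int], bytes], bs: int, count: int,
                fullblock: bool = False) -> List[bytes]:
    """Read up to `count` input blocks of nominal size `bs` using `read`.

    With fullblock=False each read() result is one block, even if short,
    like classic dd. With fullblock=True short reads are topped up until
    the block is full or input ends, like GNU dd's iflag=fullblock.
    """
    blocks: List[bytes] = []
    while len(blocks) < count:
        buf = read(bs)
        if not buf:  # end of input
            break
        if fullblock:
            # Keep reading until this block holds bs bytes or input runs out.
            while len(buf) < bs:
                more = read(bs - len(buf))
                if not more:
                    break
                buf += more
        blocks.append(buf)
    return blocks
```

For a 1000-byte source that returns at most 100 bytes per call, `read_blocks(src, 512, 2)` gives two 100-byte blocks, while `fullblock=True` gives blocks of 512 and 488 bytes.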

The second facet is dd as a way to cause specific IO to happen. If you view dd this way, it is absolutely not safe for the collective stack to optimize how the data is copied. You want dd to do exactly the IO that you asked for, and not change it. If you read from a file and write to /dev/null, you don't want dd to connect the file and /dev/null inside the kernel and then have the kernel optimize this into doing no IO at all. Reading the file (or the disk) was the entire point.
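A minimal sketch of the second facet: the goal is to issue explicit reads of a chosen size and throw the data away, the way `dd if=... bs=... of=/dev/null` is usually meant to. It is written in Python with the hypothetical name `read_and_discard`. Note that even explicit reads can be satisfied from the kernel's page cache, so this forces the reads to happen at the system call level but does not by itself guarantee that the disk is touched.

```python
import os
from typing import Optional, Tuple


def read_and_discard(path: str, bs: int = 1 << 20,
                     count: Optional[int] = None) -> Tuple[int, int]:
    """Read path with explicit read() calls of up to bs bytes, discarding the data.

    Stops after `count` reads that returned data (if given) or at end of file.
    Returns (reads that returned data, total bytes read).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        reads = total = 0
        while count is None or reads < count:
            buf = os.read(fd, bs)  # a single read(2) system call
            if not buf:  # end of file
                break
            reads += 1
            total += len(buf)  # the data itself is simply dropped
        return reads, total
    finally:
        os.close(fd)
```

Because the program asks for the data and receives it, the kernel has no way to decide that the read was unnecessary.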

My impression is that historically, dd originated in the first use case; it was created around the time of V5 Unix in order to "convert and copy a file", in the words of the V6 dd manual page. System administrators later pressed it into use for the second facet, because it allowed relatively precise control and seemed like a safe command, one that was unlikely to choke on odd sources of input or output or to do anything unpredictable with the data it read and wrote.

You can criticize this, but Unix didn't and still doesn't have a standard tool that's explicitly about performing specific IO. Maybe it should have one, since dd can be awkward to use for highly specific IO. Also, at the time that system administrators started assuming that dd would perform their IO as 'written', I don't think anyone expected the degree of cleverness that modern Unix utilities and kernels exhibit (cf this note about GNU coreutils cat and GNU grep apparently having optimized the case of their output being /dev/null for a long time).
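For illustration, one plausible way a program could notice that its output is /dev/null is to `fstat()` the output descriptor and compare it with `/dev/null` itself. The Python sketch below (hypothetical name `output_is_dev_null`) checks for a character device that is the same file as `/dev/null`; it is not taken from GNU grep, GNU cat, or any other tool. It also shows what the `| cat >/dev/null` habit changes: as far as dd itself is concerned, its output is then a pipe, so a check like this does not match.

```python
import os
import stat


def output_is_dev_null(fd: int) -> bool:
    """Return True if fd refers to the same file as /dev/null."""
    try:
        st = os.fstat(fd)
        null = os.stat(os.devnull)  # os.devnull is "/dev/null" on POSIX
    except OSError:
        return False
    # /dev/null is a character device; match it by device and inode number.
    return stat.S_ISCHR(st.st_mode) and (st.st_dev, st.st_ino) == (null.st_dev, null.st_ino)
```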

unix/DdTwoFacets written at 22:01:15
