diff --git a/src/uu/dd/BENCHMARKING.md b/src/uu/dd/BENCHMARKING.md
new file mode 100644
index 000000000..57a19bf5c
--- /dev/null
+++ b/src/uu/dd/BENCHMARKING.md
@@ -0,0 +1,62 @@
+# Benchmarking dd
+
+`dd` is a utility used for copying and converting files. It is often used for
+writing directly to devices, such as when writing an `.iso` file directly to a
+drive.
+
+## Understanding dd
+
+At the core, `dd` has a simple loop of operation. It reads in `blocksize` bytes
+from an input, optionally performs a conversion on the bytes, and then writes
+`blocksize` bytes to an output file.
+
+In typical usage, the performance of `dd` is dominated by the speed at which it
+can read or write to the filesystem. For those scenarios it is best to optimize
+the blocksize for the performance of the devices being read/written to. Devices
+typically have an optimal block size that they work best at, so for maximum
+performance `dd` should be using a block size, or multiple of the block size,
+that the underlying devices prefer.
+
+For benchmarking `dd` itself we will use fast special files provided by the
+operating system that work out of RAM, `/dev/zero` and `/dev/null`. This reduces
+the time taken reading/writing files to a minimum and maximises the percentage
+of time we spend in the `dd` tool itself, but care still needs to be taken to
+understand where we are benchmarking the `dd` tool and where we are just
+benchmarking memory performance.
+
+The main parameter to vary for a `dd` benchmark is the blocksize, but benchmarks
+testing the conversions that are supported by `dd` could also be interesting.
+
+`dd` has a convenient `count` argument that copies `count` blocks of data
+from the input to the output, which is useful for benchmarking.
+
+## Blocksize Benchmarks
+
+When measuring the impact of blocksize on the throughput, we want to avoid
+testing the startup time of `dd`. `dd` itself will give a report on the
+throughput speed once complete, but it's probably better to use an external
+tool, such as `hyperfine`, to measure the performance.
+
+Benchmarks should be sized so that they run for a handful of seconds at a
+minimum to avoid measuring the startup time unnecessarily. The total time will
+be roughly proportional to the total bytes copied (`blocksize` x `count`).
+
+Some useful invocations for testing would be the following:
+
+```
+hyperfine "./target/release/dd bs=4k count=1000000 < /dev/zero > /dev/null"
+hyperfine "./target/release/dd bs=1M count=20000 < /dev/zero > /dev/null"
+hyperfine "./target/release/dd bs=1G count=10 < /dev/zero > /dev/null"
+```
+
+Choosing what to benchmark depends greatly on what you want to measure.
+Typically you would choose a small blocksize for measuring the performance of
+`dd`, as that would maximize the overhead introduced by the `dd` tool. `dd`
+typically does some set amount of work per block which only depends on the size
+of the block if conversions are used.
+
+As an example, https://github.com/uutils/coreutils/pull/3600 made a change to
+reuse the same buffer between block copies, avoiding the need to reallocate a
+new block of memory for each copy. That change mostly affected large block size
+copies because those are the circumstances where memory performance dominates
+the total performance.
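
The sizing rule from the "Blocksize Benchmarks" section (total bytes copied is `blocksize` x `count`) can be turned into a small helper that derives `count` from a target total. This is only a sketch under stated assumptions: `count_for_total` is a hypothetical function that is not part of uutils, the 4 GiB target is an arbitrary choice, and copying the same total at every block size is a design decision of this example rather than something the document prescribes. The script prints the `hyperfine` commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of uutils): given a block size in bytes and a
# target total in bytes, print a `count` that copies roughly that total.
count_for_total() {
    block_bytes=$1
    total_bytes=$2
    # Integer division rounds down; always copy at least one block.
    blocks=$(( total_bytes / block_bytes ))
    if [ "$blocks" -lt 1 ]; then
        blocks=1
    fi
    echo "$blocks"
}

# Assumed target of 4 GiB per run; adjust it so each run lasts a few seconds.
# This relies on the shell doing 64-bit arithmetic, as common shells do.
total=$(( 4 * 1024 * 1024 * 1024 ))

# Block sizes are given in plain bytes (4 KiB, 1 MiB, 1 GiB) to avoid any
# ambiguity in size suffixes. The commands are printed, not executed.
for bs in 4096 1048576 1073741824; do
    count=$(count_for_total "$bs" "$total")
    printf 'hyperfine "./target/release/dd bs=%s count=%s < /dev/zero > /dev/null"\n' "$bs" "$count"
done
```

Holding the total fixed means differences between runs come from per-block overhead rather than from the amount of data copied, which makes results at different block sizes easier to compare.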