mirror of
https://github.com/RGBCube/uutils-coreutils
synced 2025-08-01 05:27:45 +00:00
Merge pull request #3600 from patricksjackson/dd-reuse-buffer
dd: reuse buffer for the most common cases
This commit is contained in:
commit
e62ff8cd0b
2 changed files with 82 additions and 10 deletions
62
src/uu/dd/BENCHMARKING.md
Normal file
62
src/uu/dd/BENCHMARKING.md
Normal file
|
@ -0,0 +1,62 @@
|
||||||
|
# Benchmarking dd
|
||||||
|
|
||||||
|
`dd` is a utility used for copying and converting files. It is often used for
|
||||||
|
writing directly to devices, such as when writing an `.iso` file directly to a
|
||||||
|
drive.
|
||||||
|
|
||||||
|
## Understanding dd
|
||||||
|
|
||||||
|
At the core, `dd` has a simple loop of operation. It reads in `blocksize` bytes
|
||||||
|
from an input, optionally performs a conversion on the bytes, and then writes
|
||||||
|
`blocksize` bytes to an output file.
|
||||||
|
|
||||||
|
In typical usage, the performance of `dd` is dominated by the speed at which it
|
||||||
|
can read or write to the filesystem. For those scenarios it is best to optimize
|
||||||
|
the blocksize for the performance of the devices being read/written to. Devices
|
||||||
|
typically have an optimal block size that they work best at, so for maximum
|
||||||
|
performance `dd` should be using a block size, or multiple of the block size,
|
||||||
|
that the underlying devices prefer.
|
||||||
|
|
||||||
|
For benchmarking `dd` itself we will use fast special files provided by the
|
||||||
|
operating system that work out of RAM, `/dev/zero` and `/dev/null`. This reduces
|
||||||
|
the time taken reading/writing files to a minimum and maximises the percentage
|
||||||
|
time we spend in the `dd` tool itself, but care still needs to be taken to
|
||||||
|
understand where we are benchmarking the `dd` tool and where we are just
|
||||||
|
benchmarking memory performance.
|
||||||
|
|
||||||
|
The main parameter to vary for a `dd` benchmark is the blocksize, but benchmarks
|
||||||
|
testing the conversions that are supported by `dd` could also be interesting.
|
||||||
|
|
||||||
|
`dd` has a convenient `count` argument, that will copy `count` blocks of data
|
||||||
|
from the input to the output, which is useful for benchmarking.
|
||||||
|
|
||||||
|
## Blocksize Benchmarks
|
||||||
|
|
||||||
|
When measuring the impact of blocksize on the throughput, we want to avoid
|
||||||
|
testing the startup time of `dd`. `dd` itself will give a report on the
|
||||||
|
throughput speed once complete, but it's probably better to use an external
|
||||||
|
tool, such as `hyperfine` to measure the performance.
|
||||||
|
|
||||||
|
Benchmarks should be sized so that they run for a handful of seconds at a
|
||||||
|
minimum to avoid measuring the startup time unnecessarily. The total time will
|
||||||
|
be roughly equivalent to the total bytes copied (`blocksize` x `count`).
|
||||||
|
|
||||||
|
Some useful invocations for testing would be the following:
|
||||||
|
|
||||||
|
```
|
||||||
|
hyperfine "./target/release/dd bs=4k count=1000000 < /dev/zero > /dev/null"
|
||||||
|
hyperfine "./target/release/dd bs=1M count=20000 < /dev/zero > /dev/null"
|
||||||
|
hyperfine "./target/release/dd bs=1G count=10 < /dev/zero > /dev/null"
|
||||||
|
```
|
||||||
|
|
||||||
|
Choosing what to benchmark depends greatly on what you want to measure.
|
||||||
|
Typically you would choose a small blocksize for measuring the performance of
|
||||||
|
`dd`, as that would maximize the overhead introduced by the `dd` tool. `dd`
|
||||||
|
typically does some set amount of work per block which only depends on the size
|
||||||
|
of the block if conversions are used.
|
||||||
|
|
||||||
|
As an example, https://github.com/uutils/coreutils/pull/3600 made a change to
|
||||||
|
reuse the same buffer between block copies, avoiding the need to reallocate a
|
||||||
|
new block of memory for each copy. The impact of that change mostly had an
|
||||||
|
impact on large block size copies because those are the circumstances where the
|
||||||
|
memory performance dominated the total performance.
|
|
@ -414,6 +414,10 @@ where
|
||||||
let (prog_tx, rx) = mpsc::channel();
|
let (prog_tx, rx) = mpsc::channel();
|
||||||
thread::spawn(gen_prog_updater(rx, i.print_level));
|
thread::spawn(gen_prog_updater(rx, i.print_level));
|
||||||
|
|
||||||
|
// Create a common buffer with a capacity of the block size.
|
||||||
|
// This is the max size needed.
|
||||||
|
let mut buf = vec![BUF_INIT_BYTE; bsize];
|
||||||
|
|
||||||
// The main read/write loop.
|
// The main read/write loop.
|
||||||
//
|
//
|
||||||
// Each iteration reads blocks from the input and writes
|
// Each iteration reads blocks from the input and writes
|
||||||
|
@ -427,7 +431,7 @@ where
|
||||||
// best buffer size for reading based on the number of
|
// best buffer size for reading based on the number of
|
||||||
// blocks already read and the number of blocks remaining.
|
// blocks already read and the number of blocks remaining.
|
||||||
let loop_bsize = calc_loop_bsize(&i.count, &rstat, &wstat, i.ibs, bsize);
|
let loop_bsize = calc_loop_bsize(&i.count, &rstat, &wstat, i.ibs, bsize);
|
||||||
let (rstat_update, buf) = read_helper(&mut i, loop_bsize)?;
|
let rstat_update = read_helper(&mut i, &mut buf, loop_bsize)?;
|
||||||
if rstat_update.is_empty() {
|
if rstat_update.is_empty() {
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
@ -600,7 +604,11 @@ impl Write for Output<io::Stdout> {
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Read helper performs read operations common to all dd reads, and dispatches the buffer to relevant helper functions as dictated by the operations requested by the user.
|
/// Read helper performs read operations common to all dd reads, and dispatches the buffer to relevant helper functions as dictated by the operations requested by the user.
|
||||||
fn read_helper<R: Read>(i: &mut Input<R>, bsize: usize) -> std::io::Result<(ReadStat, Vec<u8>)> {
|
fn read_helper<R: Read>(
|
||||||
|
i: &mut Input<R>,
|
||||||
|
buf: &mut Vec<u8>,
|
||||||
|
bsize: usize,
|
||||||
|
) -> std::io::Result<ReadStat> {
|
||||||
// Local Helper Fns -------------------------------------------------
|
// Local Helper Fns -------------------------------------------------
|
||||||
fn perform_swab(buf: &mut [u8]) {
|
fn perform_swab(buf: &mut [u8]) {
|
||||||
for base in (1..buf.len()).step_by(2) {
|
for base in (1..buf.len()).step_by(2) {
|
||||||
|
@ -609,27 +617,29 @@ fn read_helper<R: Read>(i: &mut Input<R>, bsize: usize) -> std::io::Result<(Read
|
||||||
}
|
}
|
||||||
// ------------------------------------------------------------------
|
// ------------------------------------------------------------------
|
||||||
// Read
|
// Read
|
||||||
let mut buf = vec![BUF_INIT_BYTE; bsize];
|
// Resize the buffer to the bsize. Any garbage data in the buffer is overwritten or truncated, so there is no need to fill with BUF_INIT_BYTE first.
|
||||||
|
buf.resize(bsize, BUF_INIT_BYTE);
|
||||||
|
|
||||||
let mut rstat = match i.cflags.sync {
|
let mut rstat = match i.cflags.sync {
|
||||||
Some(ch) => i.fill_blocks(&mut buf, ch)?,
|
Some(ch) => i.fill_blocks(buf, ch)?,
|
||||||
_ => i.fill_consecutive(&mut buf)?,
|
_ => i.fill_consecutive(buf)?,
|
||||||
};
|
};
|
||||||
// Return early if no data
|
// Return early if no data
|
||||||
if rstat.reads_complete == 0 && rstat.reads_partial == 0 {
|
if rstat.reads_complete == 0 && rstat.reads_partial == 0 {
|
||||||
return Ok((rstat, buf));
|
return Ok(rstat);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Perform any conv=x[,x...] options
|
// Perform any conv=x[,x...] options
|
||||||
if i.cflags.swab {
|
if i.cflags.swab {
|
||||||
perform_swab(&mut buf);
|
perform_swab(buf);
|
||||||
}
|
}
|
||||||
|
|
||||||
match i.cflags.mode {
|
match i.cflags.mode {
|
||||||
Some(ref mode) => {
|
Some(ref mode) => {
|
||||||
let buf = conv_block_unblock_helper(buf, mode, &mut rstat);
|
*buf = conv_block_unblock_helper(buf.clone(), mode, &mut rstat);
|
||||||
Ok((rstat, buf))
|
Ok(rstat)
|
||||||
}
|
}
|
||||||
None => Ok((rstat, buf)),
|
None => Ok(rstat),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
Loading…
Add table
Add a link
Reference in a new issue