split: add BENCHMARKING.md documentation file
parent
70ca1f45ea
commit
b37718de10
1 changed files with 47 additions and 0 deletions
47
src/uu/split/BENCHMARKING.md
Normal file
@@ -0,0 +1,47 @@
<!-- spell-checker:ignore testfile -->

# Benchmarking to measure performance

To compare the performance of the `uutils` version of `split` with the
GNU version of `split`, you can use a benchmarking tool like
[hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
running:

    sudo apt-get install hyperfine

Next, build the `split` binary under the release profile:

    cargo build --release -p uu_split

Now, get a file to test `split` on. The `split` program has three
main modes of operation: chunk by lines (`-l`), chunk by bytes (`-b`),
and chunk by lines with a byte limit (`-C`). You may want to test the
performance of `split` with various shapes and sizes of input files and
under each mode of operation. For example, to test chunking by bytes on
a large input file, you can create a file named `testfile.txt`
containing one million null bytes like this (the brace expansion
requires a shell such as `bash`):

    printf "%0.s\0" {1..1000000} > testfile.txt
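
The other modes need inputs with a different shape. The following is a minimal sketch, assuming `head -c`, `seq`, and `mktemp` are available; the scratch directory and the name `lines.txt` are hypothetical and chosen only for illustration. It creates the same null-byte file without brace expansion, plus a file of one million short lines for the line-based modes.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and file names, chosen for illustration only.
workdir=$(mktemp -d)

# One million null bytes: same content as the printf command,
# but without relying on brace expansion.
head -c 1000000 /dev/zero > "$workdir/testfile.txt"

# One million short lines, suitable for chunking by lines (-l)
# or by lines with a byte limit (-C).
seq 1 1000000 > "$workdir/lines.txt"

# Report the sizes so the inputs can be checked before benchmarking.
zero_bytes=$(wc -c < "$workdir/testfile.txt")
line_count=$(wc -l < "$workdir/lines.txt")
echo "testfile.txt: $zero_bytes bytes; lines.txt: $line_count lines"
```

Checking the sizes before benchmarking guards against timing a truncated or empty input by mistake.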

For another example, to test chunking by bytes on a large real-world
input file, you could download a [database dump of Wikidata][1] or some
related files that the Wikimedia project provides. For example, [this
gzip-compressed file][2] contains about 130 million lines.
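
Whatever input you choose, it can help to confirm on a tiny file how each mode chunks it before timing anything. The sketch below assumes a `split` on the `PATH` that supports `-l`, `-b`, and `-C` (GNU `split` does); the scratch directory, `input.txt`, and the output prefixes are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory, file name, and output prefixes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Ten short lines, "1" through "10": 21 bytes in total.
seq 1 10 > input.txt

split -l 3 input.txt by_lines_       # at most 3 lines per chunk
split -b 8 input.txt by_bytes_       # 8 bytes per chunk, except possibly the last
split -C 8 input.txt by_line_bytes_  # whole lines, at most 8 bytes per chunk

# Count the chunks each mode produced.
lines_chunks=$(ls by_lines_* | wc -l)
bytes_chunks=$(ls by_bytes_* | wc -l)
line_bytes_chunks=$(ls by_line_bytes_* | wc -l)
echo "-l: $lines_chunks chunks, -b: $bytes_chunks chunks, -C: $line_bytes_chunks chunks"
```

Giving each mode its own prefix keeps the three sets of output files from overwriting one another.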

Finally, you can compare the performance of the two versions of `split`
by running, for example,

    cd /tmp && hyperfine \
        --prepare 'rm x* || true' \
        "split -b 1000 testfile.txt" \
        "target/release/split -b 1000 testfile.txt"
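
Since `split` creates a lot of files on the filesystem, we recommend
changing to the `/tmp` directory before running the benchmark. Because
the command above changes directory first, `testfile.txt` must be in
`/tmp`, and `target/release/split` must be replaced with a path that
resolves from there, such as the absolute path to the binary you built.
The `--prepare` argument to `hyperfine` runs a specified command before
each timing run. We specify `rm x* || true` so that the output files from
the previous run of `split` are removed before each run begins. By
default `split` names its output files `xaa`, `xab`, and so on, and
`|| true` keeps the preparation command from failing when no such files
exist yet.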
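
The effect of that preparation command can be checked by hand. This is a small sketch, assuming a `split` on the `PATH` and a hypothetical scratch directory in place of `/tmp`; the 5000-byte input is likewise only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory standing in for /tmp.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Small illustrative input: 5000 null bytes.
head -c 5000 /dev/zero > testfile.txt

# One benchmark run leaves xaa, xab, ... behind.
split -b 1000 testfile.txt
chunks_after_split=$(ls x* | wc -l)

# The preparation command removes them before the next run.
rm x* || true
chunks_after_cleanup=$(ls x* 2>/dev/null | wc -l)

# With nothing left to delete, rm fails, but `|| true`
# still yields a zero exit status.
rm x* 2>/dev/null || true
cleanup_status=$?
echo "after split: $chunks_after_split, after cleanup: $chunks_after_cleanup, status: $cleanup_status"
```

Note that the glob `x*` matches every file starting with `x` in the working directory, which is another reason to benchmark in a directory that holds nothing else of value.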

[0]: https://github.com/sharkdp/hyperfine
[1]: https://www.wikidata.org/wiki/Wikidata:Database_download
[2]: https://dumps.wikimedia.org/wikidatawiki/20211001/wikidatawiki-20211001-pages-logging.xml.gz