diff --git a/src/uu/factor/BENCHMARKING.md b/src/uu/factor/BENCHMARKING.md
index 7b611db4b..6bf9cbf90 100644
--- a/src/uu/factor/BENCHMARKING.md
+++ b/src/uu/factor/BENCHMARKING.md
@@ -9,7 +9,6 @@
 They are located outside the `uu_factor` crate, as they do not comply
 with the project's minimum supported Rust version, *i.e.* may require a
 newer version of `rustc`.
-
 ## Microbenchmarking deterministic functions
 
 We currently use [`criterion`] to benchmark deterministic functions,
@@ -20,8 +19,9 @@
 the hardware, operating system version, etc., but they are noisy and affected
 by other tasks on the system (browser, compile jobs, etc.), which can cause
 `criterion` to report spurious performance improvements and regressions.
-This can be mitigated by getting as close to [idealised conditions][lemire]
+This can be mitigated by getting as close to [idealized conditions][lemire]
 as possible:
+
 - minimize the amount of computation and I/O running concurrently to the
   benchmark, *i.e.* close your browser and IM clients, don't compile at the
   same time, etc. ;
@@ -29,15 +29,13 @@ as possible:
 - [isolate a **physical** core], set it to `nohz_full`, and pin the benchmark
   to it, so it won't be preempted in the middle of a measurement ;
 - disable ASLR by running `setarch -R cargo bench`, so we can compare results
-  across multiple executions.
-
+  across multiple executions.
 
 [`criterion`]: https://bheisler.github.io/criterion.rs/book/index.html
 [lemire]: https://lemire.me/blog/2018/01/16/microbenchmarking-calls-for-idealized-conditions/
 [isolate a **physical** core]: https://pyperf.readthedocs.io/en/latest/system.html#isolate-cpus-on-linux
 [frequency stays constant]: ...
 
-
 ### Guidance for designing microbenchmarks
 
 *Note:* this guidance is specific to `factor` and takes its application domain
@@ -45,30 +43,29 @@ into account; do not expect it to generalize to other projects.
 It is based on Daniel Lemire's [*Microbenchmarking calls for idealized conditions*][lemire],
 which I recommend reading if you want to add benchmarks to `factor`.
 
-1. Select a small, self-contained, deterministic component
+1. Select a small, self-contained, deterministic component
    `gcd` and `table::factor` are good example of such:
    - no I/O or access to external data structures ;
   - no call into other components ;
-   - behaviour is deterministic: no RNG, no concurrency, ... ;
+   - behavior is deterministic: no RNG, no concurrency, ... ;
    - the test's body is *fast* (~100ns for `gcd`, ~10µs for `factor::table`),
      so each sample takes a very short time, minimizing variability and
      maximizing the numbers of samples we can take in a given time.
 
-2. Benchmarks are immutable (once merged in `uutils`)
+2. Benchmarks are immutable (once merged in `uutils`)
    Modifying a benchmark means previously-collected values cannot meaningfully
   be compared, silently giving nonsensical results. If you must modify an
   existing benchmark, rename it.
 
-3. Test common cases
+3. Test common cases
   We are interested in overall performance, rather than specific edge-cases;
-   use **reproducibly-randomised inputs**, sampling from either all possible
+   use **reproducibly-randomized inputs**, sampling from either all possible
   input values or some subset of interest.
 
-4. Use [`criterion`], `criterion::black_box`, ...
+4. Use [`criterion`], `criterion::black_box`, ...
   `criterion` isn't perfect, but it is also much better than ad-hoc
   solutions in each benchmark.
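Putting points 1-4 together, a minimal sketch of such a microbenchmark might look like the following. It is illustrative only: it assumes `criterion` and `rand` are available as dev-dependencies, and the inline `gcd`, the benchmark name, and the seed are placeholders rather than code taken from this repository.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Placeholder for the function under test; a real benchmark would import
/// it from the crate instead of redefining it here.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

fn bench_gcd(c: &mut Criterion) {
    // Fixed seed: reproducibly-randomized inputs, sampled from all of u64
    // (point 3 above). The seed value itself is arbitrary.
    let mut rng = StdRng::seed_from_u64(42);
    let inputs: Vec<(u64, u64)> = (0..1_000).map(|_| (rng.gen(), rng.gen())).collect();

    c.bench_function("gcd", |b| {
        let mut pairs = inputs.iter().cycle();
        b.iter(|| {
            let &(x, y) = pairs.next().unwrap();
            // `black_box` keeps the compiler from folding away the call.
            black_box(gcd(black_box(x), black_box(y)))
        })
    });
}

criterion_group!(benches, bench_gcd);
criterion_main!(benches);
```

Running it under `setarch -R cargo bench` on an isolated core, as described above, keeps results comparable across executions.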
 
-
 ## Wishlist
 
 ### Configurable statistical estimators
 
@@ -77,7 +74,6 @@
 which the code under test is fully deterministic and the measurements are
 subject to additive, positive noise, [the minimum is more appropriate][lemire].
 
-
 ### CI & reproducible performance testing
 
 Measuring performance on real hardware is important, as it relates directly
@@ -95,19 +91,17 @@ performance improvements and regressions.
 
 [`cachegrind`]: https://www.valgrind.org/docs/manual/cg-manual.html
 [`iai`]: https://bheisler.github.io/criterion.rs/book/iai/iai.html
 
-
-### Comparing randomised implementations across multiple inputs
+### Comparing randomized implementations across multiple inputs
 
 `factor` is a challenging target for system benchmarks as it combines two
 characteristics:
 
-1. integer factoring algorithms are randomised, with large variance in
+1. integer factoring algorithms are randomized, with large variance in
    execution time ;
 2. various inputs also have large differences in factoring time, that
    corresponds to no natural, linear ordering of the inputs.
 
-
 If (1) was untrue (i.e. if execution time wasn't random), we could faithfully
 compare 2 implementations (2 successive versions, or `uutils` and GNU) using
 a scatter plot, where each axis corresponds to the perf. of one implementation.
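As a rough illustration of the "minimum as estimator" idea from the Wishlist above: when the workload is deterministic and the noise is additive and positive, keeping the fastest of many runs discards most of the interference from other tasks. The sketch below is hypothetical (the `min_time` helper and the stand-in workload are not `uutils` code); collecting such per-input minima for two implementations would give the paired data for the scatter plot just described.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` `samples` times and keep the fastest run.
/// Under a model of additive, positive noise, the minimum is the estimate
/// least affected by other tasks running on the system.
fn min_time(samples: u32, mut f: impl FnMut()) -> Duration {
    (0..samples)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .min()
        .expect("samples must be non-zero")
}

/// Stand-in deterministic workload; a real comparison would time the code
/// under test (e.g. two versions of the factoring routines) instead.
fn workload(n: u64) -> u64 {
    (1..=1_000).fold(n, |acc, k| acc.rotate_left(7) ^ k)
}

fn main() {
    // One minimum per input; doing this for two implementations yields the
    // (x, y) pairs one would scatter-plot against each other.
    for n in [3, 1_234_567, 600_851_475_143_u64] {
        let t = min_time(1_000, || {
            black_box(workload(black_box(n)));
        });
        println!("{n}: {t:?}");
    }
}
```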