From ddfcd2eb14d8046c6246c753b00e2c0466e43c17 Mon Sep 17 00:00:00 2001
From: nicoo
Date: Mon, 3 May 2021 15:04:06 +0200
Subject: [PATCH] factor::benchmarking: Add wishlist / planned work

---
 src/uu/factor/BENCHMARKING.md | 49 +++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/src/uu/factor/BENCHMARKING.md b/src/uu/factor/BENCHMARKING.md
index c629252b8..3ad038c15 100644
--- a/src/uu/factor/BENCHMARKING.md
+++ b/src/uu/factor/BENCHMARKING.md
@@ -63,3 +63,52 @@ which I recommend reading if you want to add benchmarks to `factor`.
 4. Use [`criterion`], `criterion::black_box`, ...
    `criterion` isn't perfect, but it is also much better than ad-hoc
    solutions in each benchmark.
+
+
+## Wishlist
+
+### Configurable statistical estimators
+
+`criterion` always uses the arithmetic mean as its estimator; in µbenchmarks,
+where the code under test is fully deterministic and the measurements are
+subject to additive, positive noise, [the minimum is more appropriate][lemire].
+
+
+### CI & reproducible performance testing
+
+Measuring performance on real hardware is important, as it relates directly
+to what users of `factor` experience; however, such measurements are subject
+to real-world constraints, and aren't perfectly reproducible.
+Moreover, the mitigations for this (described above) aren't achievable in
+virtualized, multi-tenant environments such as CI.
+
+Instead, we could run the µbenchmarks in a simulated CPU with [`cachegrind`],
+measure execution “time” in that model (in CI), and use it to detect and report
+performance improvements and regressions.
+
+[`iai`] is an implementation of this idea for Rust.
+
+[`cachegrind`]: https://www.valgrind.org/docs/manual/cg-manual.html
+[`iai`]: https://bheisler.github.io/criterion.rs/book/iai/iai.html
+
+
+### Comparing randomised implementations across multiple inputs
+
+`factor` is a challenging target for system benchmarks as it combines two
+characteristics:
+
+1. integer factoring algorithms are randomised, with large variance in
+   execution time;
+
+2. various inputs also have large differences in factoring time, which
+   correspond to no natural, linear ordering of the inputs.
+
+
+If (1) were untrue (i.e. if execution time weren't random), we could faithfully
+compare 2 implementations (2 successive versions, or `uutils` and GNU) using
+a scatter plot, where each axis corresponds to the performance of one implementation.
+
+Similarly, without (2) we could plot numbers on the X axis and their factoring
+time on the Y axis, using multiple lines for various quantiles; however, the large
+differences in factoring times for successive numbers mean that such a plot
+would be unreadable.
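
To illustrate the point made under *Configurable statistical estimators*, here is a minimal Rust sketch comparing the minimum and the arithmetic mean as estimators. It assumes the noise model stated there: each measurement is the true cost plus additive, non-negative noise. `min_estimate` and `mean_estimate` are hypothetical helpers written for this illustration; they are not part of `criterion` or `factor`.

```rust
use std::time::Duration;

/// Hypothetical helper: the "minimum" estimator.
/// Assuming every sample is `true_cost + noise` with `noise >= 0`,
/// the smallest sample is the one closest to the true cost.
fn min_estimate(samples: &[Duration]) -> Option<Duration> {
    samples.iter().copied().min()
}

/// Hypothetical helper: the arithmetic-mean estimator.
/// Each positive noise spike (interrupt, cache miss, ...) pulls it upwards.
fn mean_estimate(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let total: Duration = samples.iter().sum();
    // Integer division: sub-nanosecond remainders are truncated.
    Some(total / samples.len() as u32)
}
```

For example, with a true cost of 1000 ns and synthetic noise of `[0, 5, 3, 400, 12, 0, 950]` ns, `min_estimate` returns exactly 1000 ns, while `mean_estimate` returns 1195 ns because the two large spikes dominate. The minimum is only this well-behaved when the noise is one-sided; it is not a substitute for the mean when the code under test is itself randomised, as `factor`'s algorithms are.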