# Performance Profiling Tutorial

## Effective Benchmarking with Hyperfine

[Hyperfine](https://github.com/sharkdp/hyperfine) is a powerful command-line benchmarking tool that allows you to measure and compare the execution times of commands with statistical rigor.

### Benchmarking Best Practices

When evaluating a performance improvement, always set up your benchmarks to compare:

1. The GNU implementation, as a reference
2. The implementation without your change
3. The implementation with your change

This three-way comparison provides clear insights into:

- How your implementation compares to the standard (GNU)
- The actual performance impact of your specific change

### Example Benchmark

First, you will need to build the binary in release mode, since debug builds are significantly slower:

```bash
cargo build --features unix --profile profiling
```

```bash
# Three-way comparison benchmark
hyperfine \
    --warmup 3 \
    "/usr/bin/ls -R ." \
    "./target/profiling/coreutils.prev ls -R ." \
    "./target/profiling/coreutils ls -R ."

# The same comparison, simplified with a parameter list:
hyperfine \
    --warmup 3 \
    -L ls /usr/bin/ls,"./target/profiling/coreutils.prev ls","./target/profiling/coreutils ls" \
    "{ls} -R ."
```

To improve the reproducibility of the results, pin the benchmark to a single CPU core with `taskset`:

```bash
taskset -c 0 hyperfine \
    --warmup 3 \
    "/usr/bin/ls -R ." \
    "./target/profiling/coreutils ls -R ."
```

### Interpreting Results

Hyperfine provides summary statistics including:

- Mean execution time
- Standard deviation
- Min/max times
- Relative performance comparison

Look for consistent patterns rather than focusing on individual runs, and be aware of system noise that might affect results.

## Using Samply for Profiling

[Samply](https://github.com/mstange/samply) is a sampling profiler that helps you identify performance bottlenecks in your code.
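On Linux, samply samples via the kernel's perf events interface, so you may need to relax the `perf_event_paranoid` setting before profiling as a non-root user. A minimal sketch (the value `1` is a common choice; check your distribution's default and security policy):

```bash
# Allow non-root access to CPU performance events (resets on reboot)
echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid
```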
### Basic Profiling

```bash
# Record a profile of your application
samply record ./target/debug/coreutils ls -R

# Profile with a higher sampling frequency
samply record --rate 1000 ./target/debug/coreutils seq 1 1000
```

Output from a `debug` build can be easier to understand, but its performance characteristics may differ somewhat from the `release` profile that we _actually_ care about. Consider using the `profiling` profile, which compiles in `release` mode but keeps debug symbols. For example:

```bash
cargo build --profile profiling -p uu_ls
samply record -r 10000 target/profiling/ls -lR /var .git .git .git > /dev/null
```

## Workflow: Measuring Performance Improvements

1. **Establish baselines**:

   ```bash
   hyperfine --warmup 3 \
       "/usr/bin/sort large_file.txt" \
       "our-sort-v1 large_file.txt"
   ```

2. **Identify bottlenecks**:

   ```bash
   samply record ./our-sort-v1 large_file.txt
   ```

3. **Make targeted improvements** based on the profiling data

4. **Verify the improvements**:

   ```bash
   hyperfine --warmup 3 \
       "/usr/bin/sort large_file.txt" \
       "our-sort-v1 large_file.txt" \
       "our-sort-v2 large_file.txt"
   ```

5. **Document performance changes** with concrete numbers:

   ```bash
   hyperfine --export-markdown file.md [...]
   ```
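The workflow above benchmarks against a `large_file.txt` input. A sketch for generating one reproducibly (the file name and line count are placeholders), so that baseline and improved runs sort identical data:

```bash
# Generate a deterministic, shuffled input for the sort benchmarks
# (100000 lines is an arbitrary placeholder size)
seq 1 100000 | shuf --random-source=/dev/zero > large_file.txt
```

Pinning `--random-source` to a fixed byte stream makes the shuffle deterministic, so the input stays identical across machines and revisions.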