1
Fork 0
mirror of https://github.com/RGBCube/uutils-coreutils synced 2025-07-28 03:27:44 +00:00

wc: specialize scanning loop on settings. (#3708)

* wc: specialize scanning loop on settings.

The primary computational loop in wc (iterating over all the
characters and computing word lengths, etc) is configured by a
number of boolean options that control the text-scanning behavior.
If we monomorphize the code loop for each possible combination of
scanning configurations, the rustc is able to generate better code
for each instantiation, at the least by removing the conditional
checks on each iteration, and possibly by allowing things like
vectorization.

On my computer (aarch64/macos), I am seeing at least a 5% performance
improvement in release builds on all wc flag configurations
(other than those that were already specialized) against
odyssey1024.txt, with wc -l showing the greatest improvement at 15%.

* Reduce the size of the wc dispatch table by half.

By extracting the handling of hand-written fast-paths to the
same dispatch as the automatic specializations, we can avoid
needing to pass `show_bytes` as a const generic to
`word_count_from_reader_specialized`. Eliminating this parameter
halves the number of arms in the dispatch.
This commit is contained in:
Owen Anderson 2022-07-18 03:16:52 -07:00 committed by GitHub
parent 189a8af450
commit 735db78b3d
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 180 additions and 26 deletions

View file

@ -60,7 +60,106 @@ fn test_utf8() {
}
#[test]
fn test_utf8_extra() {
fn test_utf8_words() {
new_ucmd!()
.arg("-w")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is("87\n");
}
#[test]
fn test_utf8_line_length_words() {
new_ucmd!()
.arg("-Lw")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is(" 87 48\n");
}
#[test]
fn test_utf8_line_length_chars() {
new_ucmd!()
.arg("-Lm")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is(" 442 48\n");
}
#[test]
fn test_utf8_line_length_chars_words() {
new_ucmd!()
.arg("-Lmw")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is(" 87 442 48\n");
}
#[test]
fn test_utf8_chars() {
new_ucmd!()
.arg("-m")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is("442\n");
}
#[test]
fn test_utf8_chars_words() {
new_ucmd!()
.arg("-mw")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is(" 87 442\n");
}
#[test]
fn test_utf8_line_length_lines() {
new_ucmd!()
.arg("-Ll")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is(" 25 48\n");
}
#[test]
fn test_utf8_line_length_lines_words() {
new_ucmd!()
.arg("-Llw")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is(" 25 87 48\n");
}
#[test]
fn test_utf8_lines_chars() {
new_ucmd!()
.arg("-ml")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is(" 25 442\n");
}
#[test]
fn test_utf8_lines_words_chars() {
new_ucmd!()
.arg("-mlw")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is(" 25 87 442\n");
}
#[test]
fn test_utf8_line_length_lines_chars() {
new_ucmd!()
.arg("-Llm")
.pipe_in_fixture("UTF_8_weirdchars.txt")
.run()
.stdout_is(" 25 442 48\n");
}
#[test]
fn test_utf8_all() {
new_ucmd!()
.arg("-lwmcL")
.pipe_in_fixture("UTF_8_weirdchars.txt")