cut: optimizations

mirror of https://github.com/RGBCube/uutils-coreutils synced 2025-08-01 05:27:45 +00:00

* Use buffered stdout to reduce write sys calls.

This simple change yielded the biggest performance gain.
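Below is a minimal sketch of the buffered-stdout idea, assuming output is produced line by line. The helper `write_lines` is hypothetical and is not taken from `uu_cut`. It only shows a locked `Stdout` wrapped in `std::io::BufWriter`, so that many small writes are coalesced into a few larger `write` system calls.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: writes each line plus '\n' to `out`.
/// When `out` is a `BufWriter`, these many small `write_all` calls are
/// collected in memory and handed to the OS in large chunks.
fn write_lines<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<()> {
    for line in lines {
        out.write_all(line.as_bytes())?;
        out.write_all(b"\n")?;
    }
    // Flush explicitly so write errors are reported instead of being
    // silently ignored when the BufWriter is dropped.
    out.flush()
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Lock stdout once and buffer it (std's default capacity, currently 8 KiB).
    let mut out = BufWriter::new(stdout.lock());
    write_lines(&mut out, &["a:b", "c:d"])
}
```

Locking stdout once also avoids re-acquiring the lock on every write. Without the `BufWriter`, each `write_all` on `Stdout` may turn into its own system call, because `Stdout` is line-buffered.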

* Use `for_byte_record_with_terminator` from the `bstr` crate.

This minimizes the per-line copying needed by
`BufReader::read_until`. The `cut_fields` and `cut_fields_delimiter`
functions used `read_until` to iterate over lines, which required copying
each input line into the line buffer. With
`for_byte_record_with_terminator`, copying is minimized because it calls our
closure with a reference into BufReader's buffer most of the time. It
needs to copy (internally) only to handle an incomplete line at the
end of the buffer.
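The following stdlib-only sketch illustrates the idea behind `for_byte_record_with_terminator`. It is not the `bstr` implementation. The function name `for_each_record` is hypothetical, and its shape is modelled on the `bstr` method as assumed here: the closure receives each record including its terminator and returns `Ok(false)` to stop early. Complete records are borrowed directly from the reader's buffer through `BufRead::fill_buf`. Only a record that straddles a buffer refill is copied into a side buffer.

```rust
use std::io::{self, BufRead};

/// Hypothetical sketch: call `f` once per record (terminator included, if
/// present). Returning `Ok(false)` from `f` stops the iteration.
fn for_each_record<R, F>(reader: &mut R, terminator: u8, mut f: F) -> io::Result<()>
where
    R: BufRead,
    F: FnMut(&[u8]) -> io::Result<bool>,
{
    // Holds the start of a record that was cut off at the end of a buffer.
    let mut pending: Vec<u8> = Vec::new();
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            // EOF: emit a final record that had no terminator.
            if !pending.is_empty() {
                f(pending.as_slice())?;
            }
            return Ok(());
        }
        let mut start = 0;
        // `position` stands in for the memchr search a real implementation uses.
        while let Some(i) = buf[start..].iter().position(|&b| b == terminator) {
            let end = start + i + 1;
            let keep_going = if pending.is_empty() {
                f(&buf[start..end])? // zero-copy: slice of the reader's buffer
            } else {
                // Complete the straddling record, then hand it over.
                pending.extend_from_slice(&buf[start..end]);
                let r = f(pending.as_slice())?;
                pending.clear();
                r
            };
            if !keep_going {
                reader.consume(end);
                return Ok(());
            }
            start = end;
        }
        // Copy only the incomplete tail, then ask the reader for more data.
        pending.extend_from_slice(&buf[start..]);
        let len = buf.len();
        reader.consume(len);
    }
}

fn main() -> io::Result<()> {
    // A tiny capacity forces some records to straddle buffer refills.
    let mut reader = io::BufReader::with_capacity(4, &b"a:b\nc:d\ne"[..]);
    for_each_record(&mut reader, b'\n', |record| {
        print!("[{}]", String::from_utf8_lossy(record));
        Ok(true)
    })
}
```

With a realistic buffer size (std's default is currently 8 KiB), most lines fall entirely inside one buffer and take the zero-copy path.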

* Re-write `Searcher` to use `memchr`.

Switch from the naive implementation to one that uses `memchr`.
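Here is a hypothetical sketch of a `memchr`-based delimiter searcher. It is not the actual `Searcher` from `uu_cut`. To keep the example dependency-free, `find_byte` stands in for `memchr::memchr`, which takes its arguments in the same order (needle byte, then haystack) but scans many bytes at a time. The searcher jumps straight to each occurrence of the delimiter's first byte and only then compares the remaining bytes, rather than testing every position.

```rust
/// Stand-in for `memchr::memchr(needle, haystack)`; the real crate is much
/// faster, but the contract is the same: index of the first `needle`.
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Hypothetical searcher yielding `(start, end)` byte offsets of each
/// non-overlapping occurrence of `needle` in `haystack`.
struct Searcher<'a> {
    haystack: &'a [u8],
    needle: &'a [u8],
    pos: usize,
}

impl<'a> Searcher<'a> {
    fn new(haystack: &'a [u8], needle: &'a [u8]) -> Self {
        assert!(!needle.is_empty(), "delimiter must be non-empty");
        Searcher { haystack, needle, pos: 0 }
    }
}

impl<'a> Iterator for Searcher<'a> {
    type Item = (usize, usize);

    fn next(&mut self) -> Option<(usize, usize)> {
        let first = self.needle[0];
        while self.pos + self.needle.len() <= self.haystack.len() {
            // Skip directly to the next candidate position.
            let i = find_byte(first, &self.haystack[self.pos..])?;
            let start = self.pos + i;
            let end = start + self.needle.len();
            if end > self.haystack.len() {
                return None; // candidate too close to the end to match
            }
            if &self.haystack[start..end] == self.needle {
                self.pos = end; // continue after the match (non-overlapping)
                return Some((start, end));
            }
            self.pos = start + 1; // first byte matched, rest did not
        }
        None
    }
}

fn main() {
    for (start, end) in Searcher::new(b"a::b::c", b"::") {
        println!("{}..{}", start, end);
    }
}
```

For the common single-byte delimiter, each match costs one `memchr` call and a one-byte comparison.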

* Rewrite `cut_bytes` almost entirely.

This was already well optimized. The performance gain in this case does
not come from avoiding copying. In fact, the old implementation needed no
copying, whereas the new implementation introduces some copying similar
to `cut_fields` described above. But the occasional copying cost is more
than offset by the use of the very fast `memchr` inside
`for_byte_record_with_terminator`. This change also simplifies the code
significantly and removes the `buffer` module.
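The following is a minimal sketch of the per-line work in `cut -b`. It assumes the byte ranges have already been parsed into sorted, non-overlapping, 1-based inclusive `(low, high)` pairs, with `usize::MAX` meaning "to the end of the line". `cut_bytes_record` is a hypothetical name. In the design described above, it would be called once per record from the closure given to `for_byte_record_with_terminator`, writing to a buffered stdout.

```rust
use std::io::{self, Write};

/// Hypothetical per-record step of `cut -b`. `record` may end with
/// `terminator`; `ranges` are 1-based, inclusive, sorted, non-overlapping.
fn cut_bytes_record<W: Write>(
    out: &mut W,
    record: &[u8],
    ranges: &[(usize, usize)],
    terminator: u8,
) -> io::Result<()> {
    // Drop the terminator so no range can select it.
    let line = match record.last() {
        Some(&b) if b == terminator => &record[..record.len() - 1],
        _ => record,
    };
    for &(low, high) in ranges {
        if low > line.len() {
            break; // ranges are sorted, so no later range can match either
        }
        let end = high.min(line.len());
        out.write_all(&line[low - 1..end])?;
    }
    // This sketch always ends the output line with the terminator.
    out.write_all(&[terminator])
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    // Roughly what `cut -b 1-2,4-` does for these two lines.
    for rec in [&b"abcdef\n"[..], &b"xy\n"[..]] {
        cut_bytes_record(&mut out, rec, &[(1, 2), (4, usize::MAX)], b'\n')?;
    }
    Ok(())
}
```

Because each record arrives as a borrowed slice, selecting bytes is plain slicing, with no per-line buffer to manage.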

This commit is contained in:

Chirag Jadwani

2021-04-24 21:34:42 +05:30

parent 2f17bfc14c

commit 2c1459cbfc

5 changed files with 157 additions and 302 deletions

Cargo.lock (generated): 2 changed lines

@@ -1777,7 +1777,9 @@ dependencies = [
 name = "uu_cut"
 version = "0.0.6"
 dependencies = [
  "bstr",
  "clap",
  "memchr 2.3.4",
  "uucore",
  "uucore_procs",
 ]
