1
Fork 0
mirror of https://github.com/RGBCube/uutils-coreutils synced 2025-09-15 11:36:16 +00:00

sort: automatically fall back to extsort

To make this work we make default sort a special case of external sort.

External sorting uses auxiliary files for intermediate chunks. However,
when we can keep our intermediate chunks in memory, we don't write them
to the file system at all. Only when we notice that we can't keep them
in memory they are written to the disk.

Additionally, we don't allocate buffers with the capacity of their
maximum size anymore. Instead, they start with a capacity of 8kb and are
grown only when needed.

This makes sorting smaller files about as fast as it was before
(I'm seeing a regression of ~3%), and allows us to seamlessly continue
with auxiliary files when needed.
This commit is contained in:
Michael Debertol 2021-05-21 23:00:13 +02:00
parent df45b20dc1
commit e7da8058dc
7 changed files with 138 additions and 112 deletions

View file

@ -15,29 +15,18 @@ fn test_helper(file_name: &str, args: &str) {
.stdout_is_fixture(format!("{}.expected.debug", file_name));
}
// FYI, the initialization size of our Line struct is 96 bytes.
//
// At very small buffer sizes, with that overhead we are certainly going
// to overrun our buffer way, way, way too quickly because of these excess
// bytes for the struct.
//
// For instance, seq 0..20000 > ...text = 108894 bytes
// But overhead is 1920000 + 108894 = 2028894 bytes
//
// Or kjvbible-random.txt = 4332506 bytes, but minimum size of its
// 99817 lines in memory * 96 bytes = 9582432 bytes
//
// Here, we test 108894 bytes with a 50K buffer
//
#[test]
fn test_larger_than_specified_segment() {
new_ucmd!()
.arg("-n")
.arg("-S")
.arg("50K")
.arg("ext_sort.txt")
.succeeds()
.stdout_is_fixture("ext_sort.expected");
fn test_buffer_sizes() {
let buffer_sizes = ["0", "50K", "1M", "1000G"];
for buffer_size in &buffer_sizes {
new_ucmd!()
.arg("-n")
.arg("-S")
.arg(buffer_size)
.arg("ext_sort.txt")
.succeeds()
.stdout_is_fixture("ext_sort.expected");
}
}
#[test]