mirror of https://github.com/RGBCube/uutils-coreutils synced 2025-07-27 19:17:43 +00:00
32 commits

Author SHA1 Message Date
Daniel Hofstetter
2a406d8cbb sort: adapt fixtures to change in unicode-width 2024-12-09 09:20:19 +01:00
Michael Debertol
418f5b7692 sort: handle empty merge inputs 2021-07-31 21:02:20 +02:00
Michael Debertol
233a778963 sort/ls: implement version cmp matching GNU spec
This reimplements version_cmp, which is used in sort and ls to sort
according to versions.
However, it is not bug-for-bug identical with GNU's implementation.
I reported a bug with GNU here:
https://lists.gnu.org/archive/html/bug-coreutils/2021-06/msg00045.html
This implementation does not contain the bugs regarding the handling of
file extensions and null bytes.
2021-06-27 15:29:17 +02:00
Michael Debertol
548a895cd6 sort: compatibility of human-numeric sort
Closes #1985.
This makes human-numeric sort follow the same algorithm as GNU's/FreeBSD's sort.
As documented by GNU in https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html,
we first compare by sign, then by si unit and finally by the numeric value.
2021-06-25 18:19:00 +02:00
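The comparison order described in this commit (sign, then SI unit, then numeric value) can be sketched as below. This is only an illustration under stated assumptions: `HumanNumber`, `parse_human`, and `human_cmp` are hypothetical names, only upper-case suffixes are recognized, and the magnitude is parsed as `f64` for brevity, which is not necessarily how the actual implementation handles it.

```rust
use std::cmp::Ordering;

/// Hypothetical parsed form of a human-numeric field such as "-1.5K".
#[derive(Debug, Clone, Copy)]
struct HumanNumber {
    negative: bool,
    unit_rank: u8,  // 0 = no suffix, 1 = K, 2 = M, ...
    magnitude: f64, // absolute value of the numeric part (simplification)
}

/// Parses an optional '-', a decimal number, and an optional SI suffix.
/// Anything unparsable is treated as 0 in this sketch.
fn parse_human(s: &str) -> HumanNumber {
    let s = s.trim();
    let (negative, rest) = match s.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, s),
    };
    let (digits, unit_rank) = match rest.chars().last() {
        Some(c) => match "KMGTPEZY".find(c) {
            Some(i) => (&rest[..rest.len() - c.len_utf8()], i as u8 + 1),
            None => (rest, 0),
        },
        None => (rest, 0),
    };
    let magnitude = digits.parse::<f64>().unwrap_or(0.0);
    HumanNumber {
        // A zero value carries no sign, so "-0" and "0" compare equal.
        negative: negative && magnitude != 0.0,
        unit_rank,
        magnitude,
    }
}

/// Compares by sign first, then by SI unit, then by numeric value.
fn human_cmp(a: &str, b: &str) -> Ordering {
    let (a, b) = (parse_human(a), parse_human(b));
    match (a.negative, b.negative) {
        (true, false) => return Ordering::Less,
        (false, true) => return Ordering::Greater,
        _ => {}
    }
    let ord = a
        .unit_rank
        .cmp(&b.unit_rank)
        .then(a.magnitude.partial_cmp(&b.magnitude).unwrap_or(Ordering::Equal));
    // For two negative numbers the larger magnitude is the smaller value.
    if a.negative { ord.reverse() } else { ord }
}
```

Because the unit is compared before the value, `human_cmp("1000", "1K")` yields `Less` in this sketch even though both denote the same quantity.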
Michael Debertol
66359a0f56 sort: insert line separators after non-empty files
If files don't end with a line separator we have to insert one,
otherwise the last line will be combined with the first line of the next
file.
2021-06-06 18:01:08 +02:00
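A minimal sketch of the idea in this commit, assuming all inputs are read into one buffer before sorting. The function name `read_all_with_separators` is hypothetical and not taken from the repository.

```rust
use std::io::{self, Read};

/// Concatenates all inputs into one buffer. If a non-empty input does not
/// end with `separator`, one is appended so that its last line is not
/// joined with the first line of the next input.
fn read_all_with_separators<R: Read>(inputs: Vec<R>, separator: u8) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    for mut input in inputs {
        let before = buf.len();
        input.read_to_end(&mut buf)?;
        let read_something = buf.len() > before;
        // Empty inputs contribute nothing, not even a separator.
        if read_something && buf.last() != Some(&separator) {
            buf.push(separator);
        }
    }
    Ok(buf)
}
```

Passing the separator as a parameter keeps the same logic usable for newline-terminated and zero-terminated input.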
Michael Debertol
06b3092f5f sort: fix debug output for zeros / invalid numbers
We were reporting "no match" when sorting something like "0 ". This is
because we don't distinguish between 0 and invalid lines when sorting.
For debug output we have to get this information back.
2021-06-01 18:18:51 +02:00
Michael Debertol
dc63133f14
sort: correctly inherit global flags for keys (#2302)
Closes #2254. We should only inherit global settings for keys when there
are absolutely no options attached to the key.

The default key (matching the whole line) is implicitly added only if no
keys are supplied.

Improved some error messages by including more context.
2021-05-29 23:25:56 +02:00
Michael Debertol
e9656a6c32
sort: make GNU test sort-debug-keys pass (#2269)
* sort: disable support for thousand separators

In order to be compatible with GNU, we have to disable thousands
separators. GNU does not enable them for the C locale, either.

Once we add support for locales we can add this feature back.

* sort: delete unused fixtures

* sort: compare -0 and 0 equal

I must have misunderstood this when implementing, but GNU considers
-0, 0, and invalid numbers to be equal.

* sort: strip blanks before applying the char index

* sort: don't crash when key start is after key end

* sort: add "no match" for months at the first non-whitespace char

We should put the "^ no match for key" indicator at the first
non-whitespace character of a field.

* sort: improve support for e notation

* sort: use matches! macros
2021-05-28 22:38:29 +02:00
Michael Debertol
e0ebf907a4 sort: make merging stable
When merging files we need to prioritize files that occur earlier in the
command line arguments with -m.

This also makes the extsort merge step (and thus extsort itself) stable again.
2021-05-09 11:43:38 +02:00
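What "stable" means for a merge can be illustrated with a small sketch: when two inputs offer lines that compare equal, the line from the input given earlier on the command line is emitted first. The code below uses a simple linear scan over the current head lines rather than the data structure the repository uses, and `merge_stable` is a hypothetical name.

```rust
use std::cmp::Ordering;

/// Merges pre-sorted inputs. On ties, the input with the lower index
/// (earlier on the command line) wins, which makes the merge stable.
fn merge_stable<F>(files: Vec<Vec<String>>, mut compare: F) -> Vec<String>
where
    F: FnMut(&str, &str) -> Ordering,
{
    let mut iters: Vec<_> = files.into_iter().map(|f| f.into_iter()).collect();
    // The current (not yet emitted) line of every input, if any.
    let mut heads: Vec<Option<String>> = iters.iter_mut().map(|it| it.next()).collect();
    let mut out = Vec::new();
    loop {
        let mut best: Option<usize> = None;
        for (i, head) in heads.iter().enumerate() {
            let Some(line) = head else { continue };
            match best {
                // Replace the candidate only if strictly smaller, so an
                // earlier input keeps priority when lines compare equal.
                Some(b) if compare(line.as_str(), heads[b].as_deref().unwrap())
                    != Ordering::Less => {}
                _ => best = Some(i),
            }
        }
        let Some(i) = best else { break };
        out.push(heads[i].take().unwrap());
        heads[i] = iters[i].next();
    }
    out
}
```

With a comparator that looks only at the first character, merging `["a1", "b1"]` and `["a2", "b2"]` keeps `a1` ahead of `a2` and `b1` ahead of `b2`.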
electricboogie
4c395146dd Merge branch 'master' of https://github.com/uutils/coreutils 2021-04-25 10:11:27 -05:00
Michael Debertol
e6f6b109a5 sort: implement --debug
This adds a --debug flag, which, when activated, will draw lines below
the characters that are actually used for comparisons.

This is not a complete implementation of --debug. It should, quoting the man page
for GNU sort: "annotate the part of the line used to sort, and warn
about questionable usage to stderr". Warning about "questionable usage"
is not part of this patch.

This change required some adjustments to be able to get the range that
is actually used for comparisons. Most notably, general numeric comparisons
were rewritten, fixing some bugs along the way.

Testing is mostly done by adding fixtures for the expected debug output of
existing tests.
2021-04-23 22:36:15 +02:00
electricboogie
25021f31eb Incorporate overhead of Line struct 2021-04-19 21:24:52 -05:00
Michael Debertol
4bbbe3a3f2
sort: implement numeric string comparison (#2070)
* sort: implement numeric string comparison

This implements -n and -h using a string comparison algorithm instead
of parsing each number to a f64 and comparing those.

This should result in a moderate performance increase and eliminate loss
of precision.

* cache parsed f64 numbers

For general numeric comparisons we have to parse numbers as f64,
as this behavior is explicitly documented by GNU coreutils.
We can however cache the parsed value to speed up comparisons.

* fix leading zeroes for negative numbers

* use more appropriate name for exponent

* improvements to the parse function

* move checks into main loop and fix thousands separator condition

* remove unneeded checks

* rustfmt
2021-04-17 13:49:35 +02:00
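Comparing numbers as strings, as this commit describes for -n and -h, might look roughly like the sketch below. It assumes inputs consist only of an optional '-', ASCII digits, and at most one '.'; it ignores whitespace, unit suffixes, and thousands separators. `split_number` and `numeric_str_cmp` are hypothetical names, not the functions from the repository.

```rust
use std::cmp::Ordering;

/// Splits a plain decimal string into (negative, integer digits, fraction
/// digits). Leading zeros of the integer part and trailing zeros of the
/// fraction are removed because they do not affect the value.
fn split_number(s: &str) -> (bool, &str, &str) {
    let (negative, rest) = match s.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, s),
    };
    let (int, frac) = rest.split_once('.').unwrap_or((rest, ""));
    (negative, int.trim_start_matches('0'), frac.trim_end_matches('0'))
}

/// Compares two decimal strings without converting them to floats.
fn numeric_str_cmp(a: &str, b: &str) -> Ordering {
    let (a_neg, a_int, a_frac) = split_number(a);
    let (b_neg, b_int, b_frac) = split_number(b);
    // Zero has no sign: "-0" and "0" must compare equal.
    let a_neg = a_neg && !(a_int.is_empty() && a_frac.is_empty());
    let b_neg = b_neg && !(b_int.is_empty() && b_frac.is_empty());
    match (a_neg, b_neg) {
        (true, false) => return Ordering::Less,
        (false, true) => return Ordering::Greater,
        _ => {}
    }
    // A longer integer part is a larger number; equal lengths compare
    // digit by digit, and fractions compare lexicographically.
    let magnitude = a_int
        .len()
        .cmp(&b_int.len())
        .then_with(|| a_int.cmp(b_int))
        .then_with(|| a_frac.cmp(b_frac));
    if a_neg { magnitude.reverse() } else { magnitude }
}
```

Since no value is ever converted to `f64`, two 27-digit numbers that differ only in the last digit still compare correctly, which is the loss of precision the commit message refers to.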
electricboogie
a76d452f75
Sort: More small fixes (#2065)
* Various fixes and performance improvements

* fix a typo

Co-authored-by: Michael Debertol <michael.debertol@gmail.com>

* Fix month parse for months with leading whitespace

* Implement test for months whitespace fix

* Confirm human numeric works as expected with whitespace with a test

* Correct arg help value name for --parallel

* Fix SemVer non version lines/empty line sorting with a test

Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
2021-04-17 10:06:19 +02:00
Michael Debertol
69f4410a8a
sort: dedup using compare_by (#2064)
compare_by is the function used for sorting; we should use it for dedup
as well.
2021-04-10 19:49:10 +02:00
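The point of this commit, using one comparator for both sorting and deduplication, can be sketched with the standard `Vec::dedup_by`. Here `sort_unique` and `ignore_case` are hypothetical helpers, and the `compare_by` parameter merely stands in for the repository's function of the same name, whose real signature is not shown in this log.

```rust
use std::cmp::Ordering;

/// Sorts and then removes lines the comparator considers equal, so that
/// "equal for sorting" and "duplicate" always mean the same thing.
fn sort_unique(lines: &mut Vec<String>, compare_by: fn(&str, &str) -> Ordering) {
    // slice::sort_by is stable, so equal lines keep their input order.
    lines.sort_by(|a, b| compare_by(a.as_str(), b.as_str()));
    // dedup_by keeps the first line of each run of equal lines.
    lines.dedup_by(|a, b| compare_by(a.as_str(), b.as_str()) == Ordering::Equal);
}

/// Example comparator (an assumption for illustration): ignore case.
fn ignore_case(a: &str, b: &str) -> Ordering {
    a.to_lowercase().cmp(&b.to_lowercase())
}
```

Deduplicating with plain string equality instead would keep both "A" and "a" under a case-insensitive sort, which is the kind of mismatch a shared comparator avoids.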
electricboogie
e5113ad00e
Sort: Various fixes and performance improvements (#2057)
* Various fixes and performance improvements

* fix a typo

Co-authored-by: Michael Debertol <michael.debertol@gmail.com>

Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
2021-04-10 11:56:20 +02:00
electricboogie
8474249e5f
Sort: Implement stable sort, ignore non-printing, month sort dedup, auto parallel sort through rayon, zero terminated sort, check silent (#2008) 2021-04-08 22:07:09 +02:00
elgris
71ba8b3fd6 sort: add "dictionary-order" flag.
The flag makes 'sort' command ignore non-dictionary symbols
(non-alphanumeric and non-spaces). The only difference with GNU sort is
that it takes ALL alphanumeric symbols, not only ASCII ones.
2020-05-07 23:08:24 +02:00
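The behaviour described here, comparing only alphanumeric characters and whitespace while accepting all Unicode alphanumerics, could be sketched as follows. `dictionary_key` and `dictionary_cmp` are hypothetical names, and building a filtered `String` for every comparison is a simplification chosen for readability.

```rust
use std::cmp::Ordering;

/// Keeps only alphanumeric characters (Unicode-aware) and whitespace.
fn dictionary_key(line: &str) -> String {
    line.chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect()
}

/// Compares two lines while ignoring all non-dictionary symbols.
fn dictionary_cmp(a: &str, b: &str) -> Ordering {
    dictionary_key(a).cmp(&dictionary_key(b))
}
```

`char::is_alphanumeric` follows Unicode character properties, so a letter such as `é` is kept, matching the difference from GNU sort noted in the commit message.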
Julio Rincon
29c6ad5f6a tests: untrimmed stdout assertion (fix #1235) 2019-02-08 07:54:48 +11:00
xplorld
47f5f12759 sort: treat "NaN" as string in numeric sort 2018-09-03 22:28:18 -07:00
Anthony Deschamps
6dc1eb54c0 sort: Implement ignore-case
Test included.
2017-01-21 13:30:22 -05:00
David Laban
e1af1520e7 sort: make compare_by honour settings.reverse
This allows sort --merge --reverse to work as well.
2016-08-13 00:42:43 +01:00
David Laban
8a8319a337 sort --merge works, but ignores --unique and --reverse
FileMerger receives Lines Iterables of the pre-sorted input files
via push_file(). It implements Iterator, which yields lines from the
input files in (merged) sorted order. If the input files are not sorted,
then the behavior is undefined.

Internally, FileMerger uses a
std::collections::BinaryHeap<MergeableFile>.

MergeableFile is an internal helper that implements Ord in a way that
BinaryHeap can use (note that we want smallest-first, but BinaryHeap
returns largest first, so MergeableFile::cmp() calls reverse() on
whatever compare_by() returns).
2016-08-13 00:42:43 +01:00
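The structure described in this commit can be sketched as below. The names `FileMerger`, `MergeableFile`, and `push_file` come from the commit message; the field names are assumptions, inputs are in-memory `Vec<String>` values instead of file readers, and plain string comparison stands in for `compare_by()`.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// One pre-sorted input together with its current (smallest unread) line.
struct MergeableFile {
    current_line: String,
    lines: std::vec::IntoIter<String>,
}

impl Ord for MergeableFile {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap is a max-heap; reversing the comparison makes it
        // return the smallest line first.
        self.current_line.cmp(&other.current_line).reverse()
    }
}
impl PartialOrd for MergeableFile {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for MergeableFile {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for MergeableFile {}

struct FileMerger {
    heap: BinaryHeap<MergeableFile>,
}

impl FileMerger {
    fn new() -> Self {
        FileMerger { heap: BinaryHeap::new() }
    }

    /// Registers one pre-sorted input; empty inputs are ignored.
    fn push_file(&mut self, lines: Vec<String>) {
        let mut lines = lines.into_iter();
        if let Some(current_line) = lines.next() {
            self.heap.push(MergeableFile { current_line, lines });
        }
    }
}

impl Iterator for FileMerger {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Take the input whose current line is smallest.
        let mut file = self.heap.pop()?;
        let line = match file.lines.next() {
            // Advance this input and put it back into the heap.
            Some(next_line) => std::mem::replace(&mut file.current_line, next_line),
            // Input exhausted: emit its last line and drop it.
            None => return Some(file.current_line),
        };
        self.heap.push(file);
        Some(line)
    }
}
```

Each call to `next()` costs O(log k) for k inputs. This sketch does not define which input wins when two current lines compare equal.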
David Laban
6751d2c708 implement sort --stable
Made a new function sort_by(lines, compare_fns), which accepts a
list of compare_fns and calls lines.sort_by() with a closure that
calls each compare_fn in turn until one returns something other
than equal.

Default behavior ensures that String::cmp is the last element in the
compare_fns list (referred to as 'last resort' sorting by man sort).
Passing --stable (-s) turns this behaviour off.

Test cases provided for `sort --month` and `sort --month --stable`.
2016-08-03 07:56:40 +01:00
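A rough sketch of the chained-comparator approach described above. `sort_by` and `compare_fns` are named in the commit message; the `CompareFn` alias, the example comparator `by_first_char`, and the wrapper `sort_lines` are assumptions made for illustration.

```rust
use std::cmp::Ordering;

type CompareFn = fn(&str, &str) -> Ordering;

/// Sorts `lines`, trying each comparator in turn until one of them
/// returns something other than Equal.
fn sort_by(lines: &mut [String], compare_fns: &[CompareFn]) {
    lines.sort_by(|a, b| {
        for &compare in compare_fns {
            let ord = compare(a.as_str(), b.as_str());
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    });
}

/// Example key comparator (hypothetical): compare the first character only.
fn by_first_char(a: &str, b: &str) -> Ordering {
    a.chars().next().cmp(&b.chars().next())
}

/// "Last resort" comparison of the whole line.
fn last_resort(a: &str, b: &str) -> Ordering {
    a.cmp(b)
}

/// Appends the last-resort comparison unless `stable` is requested.
fn sort_lines(lines: &mut [String], stable: bool) {
    let mut compare_fns: Vec<CompareFn> = vec![by_first_char as CompareFn];
    if !stable {
        compare_fns.push(last_resort as CompareFn);
    }
    sort_by(lines, &compare_fns);
}
```

Because `slice::sort_by` is itself stable, leaving out the last-resort comparator is enough to keep lines with equal keys in their input order.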
palaviv
3fd8136423 sort: Support check 2016-06-14 22:21:30 +03:00
palaviv
3bc5a5f769 sort: support multiple input files 2016-06-14 21:25:29 +03:00
palaviv
87455f998a sort: Version sort support 2016-06-14 20:33:09 +03:00
palaviv
d4ffbe0526 sort: unique option support 2016-06-11 15:46:41 +03:00
Joseph Crail
6b129887d6 tests/sort: add test for default mode 2016-03-29 00:58:24 -04:00
Joseph Crail
b290c10845 tests/sort: refactor to match other tests
Instead of using numerals to denote individual cases, I used descriptive
case names. I also changed the extension for the expected output fixture
to match other tests.

I removed one redundant test and another unnecessary helper function.
2016-03-29 00:58:24 -04:00
Joseph Crail
55c0b1786f tests/sort: add tests for month sort 2016-03-25 16:55:58 -04:00
Nathan Ross
a21c54e2cd rewrite tests for cargo compat, decoupled directory, output handling 2015-11-23 02:04:15 -05:00