doc: extensions: Explain how printf/seq handle precision

There are some differences in behaviour vs GNU coreutils; explain what those are.
2025-09-13 18:47:58 +00:00 · 2025-04-01 19:38:09 +02:00 · 2025-04-01 19:38:09 +02:00 · 3f12ed993a
commit 3f12ed993a
parent 705c2f1cf4
1 changed files with 78 additions and 0 deletions
--- a/docs/src/extensions.md
+++ b/docs/src/extensions.md
@@ -1,3 +1,5 @@
 <!-- spell-checker:ignore hhhhp armv7 cccccccccccccccccccccccccccccccdp ccccccccccccccd ccccccccccccccdp fffffffp -->
 # Extensions over GNU
 Though the main goal of the project is compatibility, uutils supports a few
@@ -71,8 +73,84 @@ feature is adopted from [FreeBSD](https://www.freebsd.org/cgi/man.cgi?cut).
 mail headers in the input. `-q`/`--quick` breaks lines more quickly. And `-T`/`--tab-width` defines the
 number of spaces representing a tab when determining the line length.
 ## `printf`
 `printf` uses arbitrary precision decimal numbers to parse and format floating point
 numbers. GNU coreutils uses `long double`, whose actual size may be [double precision
 64-bit float](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)
 (e.g. 32-bit arm), [extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision)
 (x86(-64)), or
 [quadruple precision 128-bit float](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format) (e.g. arm64).
 Practically, this means that printing a number with a large precision will stay exact:
 ```
 printf "%.48f\n" 0.1
 0.100000000000000000000000000000000000000000000000 << uutils on all platforms
 0.100000000000000000001355252715606880542509316001 << GNU coreutils on x86(-64)
 0.100000000000000000000000000000000004814824860968 << GNU coreutils on arm64
 0.100000000000000005551115123125782702118158340454 << GNU coreutils on armv7 (32-bit)
 ```
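The gap between the first and last lines above can be reproduced outside of either implementation. The following is a minimal Python sketch, not uutils code: it assumes Python's `decimal.Decimal` as a stand-in for an arbitrary precision decimal type and Python's `float` as a stand-in for a 64-bit `long double`.

```python
from decimal import Decimal

# Parse the text "0.1" as an exact decimal number (arbitrary precision).
exact = format(Decimal("0.1"), ".48f")

# Parse the same text into a 64-bit binary double, then print 48 digits.
# The double closest to 0.1 is not exactly 0.1, and the extra digits show it.
as_double = "%.48f" % float("0.1")

print(exact)      # 0.100000000000000000000000000000000000000000000000
print(as_double)  # 0.100000000000000005551115123125782702118158340454
```

The second line matches the armv7 (32-bit) output above, since both go through a 64-bit double.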
 ### Hexadecimal floats
 For hexadecimal float format (`%a`), POSIX only states that one hexadecimal digit
 should be present left of the decimal point (`0xh.hhhhp±d` [1]), but does not say how
 many _bits_ should be included (between 1 and 4). On x86(-64), the first digit always
 includes 4 bits, so its value is always between `0x8` and `0xf`, while on other
 architectures, only 1 bit is included, so the value is always `0x1`.
 However, the first digit will of course be `0x0` if the number is zero. Also,
 rounding may carry into the first digit, making it `0x1` on x86(-64) (e.g.
 `0xf.fffffffp-5` rounds to `0x1.00p-1`), or `0x2` on other architectures.
 We chose to replicate x86-64 behavior on all platforms.
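The x86-64 style normalization and the rounding carry can be sketched as follows. This is an illustration in Python, not the uutils implementation: `hex_float` is a hypothetical helper, it only handles positive values, and round-half-to-even is an assumption about the rounding mode.

```python
from fractions import Fraction


def hex_float(x: Fraction, digits: int) -> str:
    """Format a positive rational as 0xh.hhhp±d with a 4-bit leading digit."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    # Pick exp so that 8 <= mant < 16, i.e. the first hex digit is 0x8..0xf.
    exp = x.numerator.bit_length() - x.denominator.bit_length() - 3
    mant = x / Fraction(2) ** exp
    while mant >= 16:
        mant, exp = mant / 2, exp + 1
    while mant < 8:
        mant, exp = mant * 2, exp - 1
    # Round to the requested number of fractional hex digits
    # (Python rounds a Fraction half to even; assumed here).
    scaled = round(mant * 16**digits)
    if scaled == 16 ** (digits + 1):
        # Rounding carried out of the leading digit: 0x10.0 is written
        # as 0x1.0 with the exponent raised by 4.
        scaled //= 16
        exp += 4
    text = format(scaled, "x")
    frac = "." + text[1:] if digits else ""
    return f"0x{text[0]}{frac}p{exp:+d}"


print(hex_float(Fraction(1, 10), 15))              # 0xc.ccccccccccccccdp-7
print(hex_float(Fraction(0xFFFFFFFF, 2**33), 2))   # 0x1.00p-1
```

The second call formats `0xf.fffffffp-5` with two fractional digits and shows the carry case described above.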
 Additionally, the default precision of the hexadecimal float format (`%a` without
 a precision specifier) is expected to be "sufficient for exact representation of the value" [1].
 This is not possible in uutils as we store arbitrary precision numbers that may be
 periodic in hexadecimal form (`0.1 = 0xc.ccc...p-7`), so we fall back
 to the number of digits that would be required to exactly print an
 [extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision),
 emulating GNU coreutils behavior on x86(-64). An 80-bit float has a 64-bit
 significand (integer and fractional parts combined), so 16 hexadecimal digits are
 printed in total (1 digit before the decimal point, 15 after).
 Practically, this means that the default hexadecimal floating point output is
 identical to x86(-64) GNU coreutils:
 ```
 printf "%a\n" 0.1
 0xc.ccccccccccccccdp-7 << uutils on all platforms
 0xc.ccccccccccccccdp-7 << GNU coreutils on x86-64
 0x1.999999999999999999999999999ap-4 << GNU coreutils on arm64
 0x1.999999999999ap-4   << GNU coreutils on armv7 (32-bit)
 ```
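The digit counts in these outputs follow from the significand width and the number of bits placed in the leading digit. The short Python sketch below works through that arithmetic; the `fractional_hex_digits` helper and the table of formats are illustrative assumptions, not part of uutils.

```python
def fractional_hex_digits(significand_bits: int, leading_bits: int) -> int:
    """Hex digits needed after the point to print every significand bit."""
    remaining = significand_bits - leading_bits
    return -(-remaining // 4)  # ceiling division by 4 bits per hex digit


# name -> (significand bits including the leading bit, bits in the first digit)
formats = {
    "80-bit extended (x86-64)": (64, 4),
    "64-bit double (armv7)": (53, 1),
    "128-bit quadruple (arm64)": (113, 1),
}
for name, (bits, lead) in formats.items():
    print(name, fractional_hex_digits(bits, lead))  # 15, 13 and 28 respectively

# Python floats are 64-bit doubles printed with a 1-bit leading digit,
# so this has 13 fractional digits, like the armv7 line above.
print((0.1).hex())  # 0x1.999999999999ap-4
```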
 We _can_ print an arbitrary number of digits if a larger precision is requested,
 and the leading digit will still be in the `0x8`-`0xf` range:
 ```
 printf "%.32a\n" 0.1
 0xc.cccccccccccccccccccccccccccccccdp-7 << uutils on all platforms
 0xc.ccccccccccccccd00000000000000000p-7 << GNU coreutils on x86-64
 0x1.999999999999999999999999999a0000p-4 << GNU coreutils on arm64
 0x1.999999999999a0000000000000000000p-4 << GNU coreutils on armv7 (32-bit)
 ```
 ***Note: The architecture-specific behavior on non-x86(-64) platforms may change in
 the future.***
 ## `seq`
 Unlike GNU coreutils, `seq` always uses arbitrary precision decimal numbers, no
 matter the parameters (integers, decimal numbers, positive or negative increments,
 format specified, etc.), so its output will be more correct than GNU coreutils for
 some inputs (e.g. small fractional increments where GNU coreutils uses `long double`).
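As an illustration of why the number representation matters, here is a small Python sketch. It does not model the actual algorithm of either GNU coreutils or uutils: the `seq` function below is a hypothetical stand-in that computes `first + i * step`, run once with 64-bit binary floats and once with `decimal.Decimal`.

```python
from decimal import Decimal


def seq(first, step, last):
    """Yield first, first + step, ... while the value is <= last (step > 0)."""
    i = 0
    while True:
        value = first + i * step
        if value > last:
            return
        yield value
        i += 1


# Binary doubles: 3 * 0.1 is slightly above 0.3, so the last value is dropped.
print(list(seq(0.0, 0.1, 0.3)))  # [0.0, 0.1, 0.2]

# Exact decimals: all four values are produced.
print([str(v) for v in seq(Decimal("0"), Decimal("0.1"), Decimal("0.3"))])
```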
 The only limitation is that the position of the decimal point is stored in an `i64`,
 so values smaller than 10**(-2**63) will underflow to 0, and some values larger
 than 10**(2**63) may overflow to infinity.
 See also comments under `printf` for formatting precision and differences.
 `seq` provides `-t`/`--terminator` to set the terminator character.
 ## `ls`