From 3f12ed993a02590033b53384388ab9677e4ef153 Mon Sep 17 00:00:00 2001
From: Nicolas Boichat
Date: Tue, 1 Apr 2025 19:38:09 +0200
Subject: [PATCH] doc: extensions: Explain how printf/seq handle precision

There are some differences in behaviour vs GNU coreutils, explain what those are.
---
 docs/src/extensions.md | 78 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/docs/src/extensions.md b/docs/src/extensions.md
index 1e715f729..af6119da5 100644
--- a/docs/src/extensions.md
+++ b/docs/src/extensions.md
@@ -1,3 +1,5 @@
+
+
 # Extensions over GNU
 
 Though the main goal of the project is compatibility, uutils supports a few
@@ -71,8 +73,84 @@ feature is adopted from [FreeBSD](https://www.freebsd.org/cgi/man.cgi?cut).
 mail headers in the input. `-q`/`--quick` breaks lines more quickly. And `-T`/`--tab-width` defines the
 number of spaces representing a tab when determining the line length.
 
+## `printf`
+
+`printf` uses arbitrary precision decimal numbers to parse and format floating point
+numbers. GNU coreutils uses `long double`, whose actual size may be [double precision
+64-bit float](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)
+(e.g. 32-bit arm), [extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision)
+(x86(-64)), or
+[quadruple precision 128-bit float](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format) (e.g. arm64).
+
+Practically, this means that printing a number with a large precision will stay exact:
+```
+printf "%.48f\n" 0.1
+0.100000000000000000000000000000000000000000000000 << uutils on all platforms
+0.100000000000000000001355252715606880542509316001 << GNU coreutils on x86(-64)
+0.100000000000000000000000000000000004814824860968 << GNU coreutils on arm64
+0.100000000000000005551115123125782702118158340454 << GNU coreutils on armv7 (32-bit)
+```
+
+### Hexadecimal floats
+
+For hexadecimal float format (`%a`), POSIX only states that one hexadecimal digit
+should be present left of the decimal point (`0xh.hhhhp±d` [1]), but does not say how
+many _bits_ should be included (between 1 and 4). On x86(-64), the first digit always
+includes 4 bits, so its value is always between `0x8` and `0xf`, while on other
+architectures, only 1 bit is included, so the value is always `0x1`.
+
+However, the first digit will of course be `0x0` if the number is zero. Also,
+rounding numbers may cause the first digit to be `0x1` on x86(-64) (e.g.
+`0xf.fffffffp-5` rounds to `0x1.00p-1`), or `0x2` on other architectures.
+
+We chose to replicate x86-64 behavior on all platforms.
+
+Additionally, the default precision of the hexadecimal float format (`%a` without
+an explicit precision) is expected to be "sufficient for exact representation of the value" [1].
+This is not possible in uutils as we store arbitrary precision numbers that may be
+periodic in hexadecimal form (`0.1 = 0xc.ccc...p-7`), so we fall back
+to the number of digits that would be required to exactly print an
+[extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision),
+emulating GNU coreutils behavior on x86(-64). An 80-bit float has 64 bits in its
+integer and fractional part, so 16 hexadecimal digits are printed in total (1 digit
+before the decimal point, 15 after).
+
+Practically, this means that the default hexadecimal floating point output is
+identical to x86(-64) GNU coreutils:
+```
+printf "%a\n" 0.1
+0xc.ccccccccccccccdp-7 << uutils on all platforms
+0xc.ccccccccccccccdp-7 << GNU coreutils on x86-64
+0x1.999999999999999999999999999ap-4 << GNU coreutils on arm64
+0x1.999999999999ap-4 << GNU coreutils on armv7 (32-bit)
+```
+
+We _can_ print an arbitrary number of digits if a larger precision is requested,
+and the leading digit will still be in the `0x8`-`0xf` range:
+```
+printf "%.32a\n" 0.1
+0xc.cccccccccccccccccccccccccccccccdp-7 << uutils on all platforms
+0xc.ccccccccccccccd00000000000000000p-7 << GNU coreutils on x86-64
+0x1.999999999999999999999999999a0000p-4 << GNU coreutils on arm64
+0x1.999999999999a0000000000000000000p-4 << GNU coreutils on armv7 (32-bit)
+```
+
+***Note: The architecture-specific behavior on non-x86(-64) platforms may change in
+the future.***
+
 ## `seq`
 
+Unlike GNU coreutils, `seq` always uses arbitrary precision decimal numbers, no
+matter the parameters (integers, decimal numbers, positive or negative increments,
+format specified, etc.), so its output will be more accurate than GNU coreutils for
+some inputs (e.g. small fractional increments where GNU coreutils uses `long double`).
+
+The only limitation is that the position of the decimal point is stored in an `i64`,
+so values smaller than 10**(-2**63) will underflow to 0, and some values larger
+than 10**(2**63) may overflow to infinity.
+
+See also the comments under `printf` for formatting precision and differences.
+
 `seq` provides `-t`/`--terminator` to set the terminator character.
 
 ## `ls`
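
The exact-versus-rounded behaviour that the `printf "%.48f\n" 0.1` example in this patch documents can be reproduced outside coreutils. The following is a minimal Python sketch, not uutils code: `format_fixed` is a hypothetical helper name, `decimal.Decimal` stands in for an arbitrary precision decimal type, and Python's `float` is assumed to be an IEEE 754 64-bit double (as it is on common platforms), which corresponds to the armv7 (32-bit) line of that example.

```python
from decimal import Decimal


def format_fixed(value, digits: int) -> str:
    """Format `value` with `digits` places after the decimal point.

    Hypothetical helper for illustration only; it just wraps Python's
    built-in format() with an "f" presentation type.
    """
    return format(value, f".{digits}f")


# "0.1" parsed as an exact decimal: every digit after the first stays zero.
exact = format_fixed(Decimal("0.1"), 48)

# 0.1 parsed as a binary 64-bit double: the nearest representable value
# is printed, so non-zero digits appear after the 17th decimal place.
binary64 = format_fixed(0.1, 48)

print(exact)     # 0.100000000000000000000000000000000000000000000000
print(binary64)  # 0.100000000000000005551115123125782702118158340454
```

The first line corresponds to the "uutils on all platforms" output, and the second to the output listed for GNU coreutils where `long double` is a 64-bit double.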