From 61d69a18d7b8fb7fdedde14ef9b76b2fa18651f2 Mon Sep 17 00:00:00 2001
From: Sylvestre Ledru
Date: Wed, 11 Jun 2025 09:30:48 +0200
Subject: [PATCH] l10n: document a bit how it works (#8102)

* l10n: document a bit how it works

* add a link to fluent

Co-authored-by: Daniel Hofstetter

* fix typo

Co-authored-by: Daniel Hofstetter

---------

Co-authored-by: Daniel Hofstetter
---
 docs/src/l10n.md | 187 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 187 insertions(+)
 create mode 100644 docs/src/l10n.md

diff --git a/docs/src/l10n.md b/docs/src/l10n.md
new file mode 100644
index 000000000..7f606a0b1
--- /dev/null
+++ b/docs/src/l10n.md
@@ -0,0 +1,187 @@
+# 🌍 Localization (L10n) in uutils coreutils
+
+This guide explains how localization (L10n) is implemented in the **Rust-based coreutils project**, detailing the use of [Fluent](https://projectfluent.org/) files, runtime behavior, and developer integration.
+
+---
+
+## 📁 Fluent File Layout
+
+Each utility has its own set of translation files under:
+
+```
+  src/uu//locales/.ftl
+```
+
+Examples:
+
+```
+  src/uu/ls/locales/en-US.ftl
+  src/uu/ls/locales/fr-FR.ftl
+```
+
+These files follow Fluent syntax and contain localized message patterns.
+
+---
+
+## ⚙️ Initialization
+
+Localization must be explicitly initialized at runtime using:
+
+```
+  setup_localization(path)
+```
+
+
+This is typically done:
+- In `src/bin/coreutils.rs` for **multi-call binaries**
+- In `src/uucore/src/lib.rs` for **single-call utilities**
+
+The string parameter determines the lookup path for Fluent files.
+
+---
+
+## 🌐 Locale Detection
+
+Locale selection is automatic and performed via:
+
+```
+  fn detect_system_locale() -> Result
+```
+
+It reads the `LANG` environment variable (e.g., `fr-FR.UTF-8`), strips encoding, and parses the identifier.
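
The detection step described above can be pictured with a small standard-library-only sketch. It is not the project's implementation: `locale_from_lang` is a hypothetical helper, and its shape check (`language` or `language-REGION`) is a deliberately simplified stand-in for real language-identifier parsing. The `en-US` fallback value is the one this guide names.

```rust
/// Fallback locale named in this guide.
const DEFAULT_LOCALE: &str = "en-US";

/// Hypothetical helper (not part of uucore): turn a `LANG`-style value
/// into a locale string, falling back to `DEFAULT_LOCALE`.
fn locale_from_lang(lang: Option<&str>) -> String {
    // `LANG` unset: use the default.
    let raw = match lang {
        Some(value) => value,
        None => return DEFAULT_LOCALE.to_string(),
    };

    // Strip the encoding suffix: "fr-FR.UTF-8" -> "fr-FR".
    let tag = raw.split('.').next().unwrap_or("");

    // Simplified shape check (assumption): 2-3 lowercase letters,
    // optionally followed by '-' and 2 uppercase letters.
    let mut parts = tag.split('-');
    let language = parts.next().unwrap_or("");
    let region = parts.next();
    let language_ok = (2..=3).contains(&language.len())
        && language.chars().all(|c| c.is_ascii_lowercase());
    let region_ok = region
        .map_or(true, |r| r.len() == 2 && r.chars().all(|c| c.is_ascii_uppercase()));

    if language_ok && region_ok && parts.next().is_none() {
        tag.to_string()
    } else {
        // Anything that does not parse falls back to the default.
        DEFAULT_LOCALE.to_string()
    }
}
```

A caller could pass `std::env::var("LANG").ok().as_deref()` as the argument. Taking the value as a parameter rather than reading the environment inside the helper keeps the logic easy to test.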
+
+If parsing fails or `LANG` is not set, it falls back to:
+
+```
+  const DEFAULT_LOCALE: &str = "en-US";
+```
+
+You can override the locale at runtime by running:
+
+```
+  LANG=ja-JP ./target/debug/ls
+```
+
+---
+
+## 📥 Retrieving Messages
+
+Two APIs are available:
+
+### `get_message(id: &str) -> String`
+
+Returns the message from the current locale bundle.
+
+```
+  let msg = get_message("id-greeting");
+```
+
+If the message is not found, the lookup falls back to `en-US`. If it is still missing, the ID itself is returned.
+
+---
+
+### `get_message_with_args(id: &str, args: HashMap) -> String`
+
+Supports variable interpolation and pluralization.
+
+```
+  let msg = get_message_with_args(
+      "error-io",
+      HashMap::from([
+          ("error".to_string(), std::io::Error::last_os_error().to_string())
+      ])
+  );
+```
+
+Fluent message example:
+
+```
+  error-io = I/O error occurred: { $error }
+```
+
+Variables must match the Fluent placeholder keys (`$error`, `$name`, `$count`, etc.).
+
+---
+
+## 📦 Fluent Syntax Example
+
+```
+  id-greeting = Hello, world!
+  welcome = Welcome, { $name }!
+  count-files = You have { $count ->
+      [one] { $count } file
+     *[other] { $count } files
+  }
+```
+
+Use plural rules and inline variables to adapt messages dynamically.
+
+---
+
+## 🧪 Testing Localization
+
+Run all localization-related unit tests with:
+
+```
+  cargo test --lib -p uucore
+```
+
+Tests include:
+- Loading bundles
+- Plural logic
+- Locale fallback
+- Fluent parse errors
+- Thread-local behavior
+- ...
+
+---
+
+## 🧵 Thread-local Storage
+
+Localization is stored per thread using a `OnceLock`.
+Each thread must call `setup_localization()` individually.
+Initialization is **one-time-only** per thread; re-initialization results in an error.
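
The per-thread, one-time behavior described above can be illustrated with a minimal sketch using only the standard library. The names `LOCALIZER`, `setup_once`, and `current_locale` are hypothetical, and a plain `String` stands in for whatever state the real localizer holds; this only shows how a `OnceLock` inside `thread_local!` produces the semantics stated here.

```rust
use std::sync::OnceLock;

thread_local! {
    // One independent slot per thread. `String` is a placeholder
    // for the real localizer state (assumption).
    static LOCALIZER: OnceLock<String> = OnceLock::new();
}

/// Hypothetical stand-in for `setup_localization`: stores a locale
/// name for the calling thread, failing if one is already stored.
fn setup_once(locale: &str) -> Result<(), String> {
    LOCALIZER.with(|slot| {
        slot.set(locale.to_string())
            .map_err(|_| "localization already initialized on this thread".to_string())
    })
}

/// Returns the calling thread's locale, or `None` if that thread
/// has not been initialized yet.
fn current_locale() -> Option<String> {
    LOCALIZER.with(|slot| slot.get().cloned())
}
```

In this sketch a second `setup_once` call on the same thread returns `Err`, while a freshly spawned thread starts with an empty slot and has to initialize it itself.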
+
+---
+
+## 🧪 Development vs Release Mode
+
+During development (`cfg(debug_assertions)`), paths are resolved relative to the crate source:
+
+```
+  $CARGO_MANIFEST_DIR/../uu//locales/
+```
+
+In release mode, **paths are resolved relative to the executable**:
+
+```
+  /locales//
+```
+
+If both fallback paths fail, an error is returned during `setup_localization()`.
+
+---
+
+## 🔤 Unicode Isolation Handling
+
+By default, the Fluent system wraps variables with Unicode directional isolate characters (`U+2068`, `U+2069`) to protect against visual reordering issues in bidirectional text (e.g., mixing Arabic and English).
+
+In this implementation, isolation is **disabled** via:
+
+```
+  bundle.set_use_isolating(false);
+```
+
+This improves readability in CLI environments by preventing extraneous characters around interpolated values:
+
+Correct (as rendered):
+
+```
+  "Welcome, Alice!"
+```
+
+Fluent default (disabled here):
+
+```
+  "\u{2068}Alice\u{2069}"
+```