LibUnicode: Generate Unicode locale likely subtag data

CLDR contains likely subtag data that, given a locale, resolves the most likely language, script, or territory for that locale. This data is needed to resolve territory aliases: an alias may list multiple territories, and we need to pick whichever of them is most likely correct for the locale. Note that the likely subtag data is quite large (a few thousand entries). As an optimization encouraged by the spec, we only generate the smallest subset of this data that we actually need (about 150 entries).
2025-10-29 03:52:35 +00:00 · 2021-08-31 09:40:24 -04:00
commit 1fbc5dba08
parent 72f49e42b4
3 changed files with 149 additions and 2 deletions
--- a/Userland/Libraries/LibUnicode/Locale.cpp
+++ b/Userland/Libraries/LibUnicode/Locale.cpp
@@ -860,4 +860,19 @@ Optional<StringView> resolve_subdivision_alias(StringView subdivision)
 #endif
 }

+String resolve_most_likely_territory([[maybe_unused]] LanguageID const& language_id, StringView territory_alias)
+{
+    auto aliases = territory_alias.split_view(' ');
+
+#if ENABLE_UNICODE_DATA
+    if (aliases.size() > 1) {
+        auto territory = Detail::resolve_most_likely_territory(language_id);
+        if (territory.has_value() && aliases.contains_slow(*territory))
+            return territory.release_value();
+    }
+#endif
+
+    return aliases[0].to_string();
+}
+
 }
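
The selection logic in resolve_most_likely_territory can be illustrated outside of LibUnicode. The following is only a minimal sketch using the C++ standard library rather than AK types. The names `lookup_likely_territory` and `pick_territory` are hypothetical, the two-entry table stands in for the generated likely subtag data, and the lookup is keyed on a bare language string instead of a LanguageID. The empty-input guard is an addition of this sketch and is not part of the commit.

```cpp
#include <algorithm>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated lookup
// (Detail::resolve_most_likely_territory in the commit). The entries
// here are illustrative only, not taken from CLDR.
std::optional<std::string> lookup_likely_territory(std::string const& language)
{
    if (language == "ru")
        return "RU";
    if (language == "uk")
        return "UA";
    return std::nullopt;
}

// Given a space-separated territory alias, prefer the territory that is
// most likely for the language, but only if the alias actually lists it.
// Otherwise fall back to the first territory in the alias.
std::string pick_territory(std::string const& language, std::string const& territory_alias)
{
    std::vector<std::string> aliases;
    std::istringstream stream(territory_alias);
    for (std::string part; stream >> part;)
        aliases.push_back(part);

    // Guard added for this sketch: nothing to choose from.
    if (aliases.empty())
        return {};

    if (aliases.size() > 1) {
        auto territory = lookup_likely_territory(language);
        if (territory && std::find(aliases.begin(), aliases.end(), *territory) != aliases.end())
            return *territory;
    }

    return aliases.front();
}
```

With this sketch, an alias listing several territories resolves to the likely territory when the lookup yields one that appears in the list, and to the first listed territory otherwise. A single-territory alias is returned unchanged without consulting the lookup.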