This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
    $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
        Meta Ports Ladybird Tests Kernel)
    $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
        s/deprecated_string/byte_string/g' $xs
    $ clang-format --style=file -i \
        $(git diff --name-only | grep \.cpp\|\.h)
    $ gn format $(git ls-files '*.gn' '*.gni')
These APIs only perform small allocations, and are only used by LibJS
and the time zone settings widget. Callers which could only have failed
from these APIs are also made to be infallible here.
These APIs only perform small allocations, and are only used by LibJS.
Callers which could only have failed from these APIs are also made to
be infallible here.
The LocaleData generator currently stores vectors of unique instances of
CLDR data (e.g. languages, currencies, etc.). For each CLDR file that we
parse, we linearly search through those vectors to decide if the current
field being parsed is unique. Given the size of the CLDR, this adds up
to quite a bit of time.
Augment these vectors with a hash map to store the index of each unique
instance in those vectors. This allows for quickly checking if a field
is unique, and to later look up those indices.
We do not apply this technique to every bit of CLDR data here. For
example, CLDR::character_orders contains only 2 entries. In that case,
it is quicker to search the vector than it is to hash a string key.
This reduces the runtime of GenerateLocaleData from 2.03s to 1.09s.
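The vector-plus-hash-map idea described above can be sketched roughly as follows. This is a minimal illustration using the standard library rather than AK containers; the class name UniqueStringStorage and its member names are hypothetical and are not taken from the generator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the generator's storage of unique CLDR strings.
class UniqueStringStorage {
public:
    // Returns the index of `value`, appending it only if it was not seen before.
    std::size_t ensure(std::string const& value)
    {
        // Average O(1) lookup instead of a linear scan over m_values.
        auto it = m_index_of.find(value);
        if (it != m_index_of.end())
            return it->second;

        std::size_t index = m_values.size();
        m_values.push_back(value);
        m_index_of.emplace(value, index);
        return index;
    }

    std::string const& get(std::size_t index) const
    {
        assert(index < m_values.size());
        return m_values[index];
    }

    std::size_t size() const { return m_values.size(); }

private:
    std::vector<std::string> m_values;                       // unique values, in insertion order
    std::unordered_map<std::string, std::size_t> m_index_of; // value -> index into m_values
};
```

Calling ensure() twice with the same string yields the same index and stores the string once. For very small sets (such as the 2-entry case mentioned above), a plain vector scan may still be the cheaper choice.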
Similar to languages and currencies, extract the loop to collect the
unique set of date fields to a preprocessing function. This alone does
not yield any performance improvement, but combined with an upcoming
patch will make parse_locale_date_fields() a bit faster.
We currently parse each CLDR calendar, then decide based on its primary
key whether we want to skip it. Instead, we can decide to skip it based
on its file name.
This reduces the runtime of GenerateLocaleData from 2.03s to 1.97s.
The LocaleData generator has to read a few of the CLDR files more than
once, to e.g. prepare some data up front (for reasons why, see commits
c86f7a6 and 0b69e9f). This takes non-negligible time, especially for large
JSON files such as currencies.json. So in these cases, cache the parsed
JSON in a map.
This reduces the runtime of GenerateLocaleData from 2.32s to 2.03s.
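A parse-once cache of this kind might look like the sketch below. It is an assumption-laden illustration: ParsedFile, ParsedFileCache and the injected loader are hypothetical names, and the standard library is used in place of AK types and the real JSON parser.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a parsed JSON document.
struct ParsedFile {
    std::string contents;
};

class ParsedFileCache {
public:
    using Loader = std::function<ParsedFile(std::string const&)>;

    // The loader does the expensive read + parse; it is injected so the
    // sketch stays independent of any real JSON library.
    explicit ParsedFileCache(Loader loader)
        : m_loader(std::move(loader))
    {
        assert(m_loader != nullptr);
    }

    // Parses `path` on first use and returns the cached result afterwards.
    ParsedFile const& get(std::string const& path)
    {
        auto it = m_cache.find(path);
        if (it == m_cache.end())
            it = m_cache.emplace(path, m_loader(path)).first;
        return it->second;
    }

private:
    Loader m_loader;
    std::unordered_map<std::string, ParsedFile> m_cache; // path -> parsed document
};
```

Requesting the same path twice invokes the loader only once. The trade-off is that the parsed documents stay in memory for the lifetime of the cache.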
In CLDR 42 and earlier, we were able to assume all cldr-localename files
existed for every locale. They now do not exist for locales that don't
provide any localized data. Namely, this is the "und" locale (which is
an alias for the root locale, i.e. the locale we fall back to when a
user provides an unknown locale).
Further, we were previously able to assume that each currencies.json in
cldr-numbers contained all currencies. This file now excludes currencies
whose localized names are the same as the currency key. Therefore, we
now preprocess currencies.json to discover all currencies ahead of time,
much like we already do for languages.json.
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").
Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure the
entire buffer is used (which should be the go-to functions for most
usages).
No functional changes, just a lot of new FIXMEs.
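The difference between a "read some" call and a call that fills the whole buffer can be illustrated with a small sketch. ChunkedStream is a hypothetical in-memory stream, not AK::Stream, and the name read_entire_buffer is only an assumed name for the buffer-filling variant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical in-memory stream that hands out at most `chunk` bytes per call,
// mimicking the short reads that POSIX read() is allowed to return.
class ChunkedStream {
public:
    ChunkedStream(std::string data, std::size_t chunk)
        : m_data(std::move(data))
        , m_chunk(chunk)
    {
        assert(chunk > 0);
    }

    // May return fewer bytes than requested; 0 means end of stream.
    std::size_t read_some(char* buffer, std::size_t size)
    {
        std::size_t n = std::min({ size, m_chunk, m_data.size() - m_offset });
        if (n > 0)
            std::memcpy(buffer, m_data.data() + m_offset, n);
        m_offset += n;
        return n;
    }

    // Loops over read_some() until the buffer is full; returns false if the
    // stream ends before that happens.
    bool read_entire_buffer(char* buffer, std::size_t size)
    {
        std::size_t filled = 0;
        while (filled < size) {
            std::size_t n = read_some(buffer + filled, size - filled);
            if (n == 0)
                return false;
            filled += n;
        }
        return true;
    }

private:
    std::string m_data;
    std::size_t m_chunk;
    std::size_t m_offset { 0 };
};
```

A caller that uses read_some() directly must be prepared for a partially filled buffer, which is why the looping variant is usually the safer default.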
For example, consider cases where we want to propagate errors only in
specific instances:
    auto result = read_data(); // something like ErrorOr<ByteBuffer>
    if (result.is_error() && result.error().code() != EINTR)
        continue;
    auto bytes = TRY(result);
The TRY invocation will currently copy the byte buffer when the
expression (in this case, just a local variable) is stored into
_temporary_result.
This patch binds the expression to a reference to prevent such copies.
In less trivial invocations (such as TRY(some_function())), this will
incur only temporary lifetime extensions, i.e. no functional change.
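The effect of binding by reference instead of by value can be shown with a copy-counting sketch. Everything here is hypothetical and heavily simplified: Result stands in for ErrorOr, Buffer for ByteBuffer, and the two functions stand in for the old and new shape of the macro body rather than reproducing the real TRY macro.

```cpp
#include <cassert>
#include <utility>

// Counts how often Buffer is copy-constructed.
static int buffer_copies = 0;

// Hypothetical payload type standing in for a byte buffer.
struct Buffer {
    Buffer() = default;
    Buffer(Buffer const&) { ++buffer_copies; }
    Buffer(Buffer&&) noexcept = default;
};

// Hypothetical, heavily simplified stand-in for ErrorOr<T>.
template<typename T>
struct Result {
    T value;
    bool failed { false };

    bool is_error() const { return failed; }
    T release_value() { return std::move(value); }
};

// Old shape: `auto` copy-initializes from an lvalue, copying the payload.
template<typename T>
T unwrap_by_value(Result<T>& expression)
{
    auto temporary_result = expression;
    assert(!temporary_result.is_error());
    return temporary_result.release_value();
}

// New shape: `auto&&` binds to the existing object, so nothing is copied.
// For a prvalue argument it would extend the temporary's lifetime instead.
template<typename T>
T unwrap_by_reference(Result<T>& expression)
{
    auto&& temporary_result = expression;
    assert(!temporary_result.is_error());
    return temporary_result.release_value();
}
```

With a local Result<Buffer> as the argument, the by-value variant performs one copy of the payload while the by-reference variant performs none; both move the value out at the end.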
Having an alias function that only wraps another one is silly, and
keeping the more obvious name should flush out more uses of deprecated
strings.
No behavior change.
For now, this is limited to strings that are 3 bytes or less. We can use
7 bytes on 64-bit platforms, but we do not yet assume 64-bit for Lagom
hosts (e.g. wasm).
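One way to store a short string inline in pointer-sized storage is sketched below. The layout is purely an assumption for illustration (one tag/length byte followed by the characters) and is not claimed to match the actual String implementation; ShortString, try_pack_short_string and unpack_short_string are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <string_view>

// Capped at 3 bytes so the scheme also fits a 32-bit pointer; 7 bytes would
// fit when pointers are 64 bits wide.
constexpr std::size_t max_short_string_bytes = 3;
static_assert(max_short_string_bytes + 1 <= sizeof(std::uintptr_t),
    "characters plus one tag byte must fit in pointer-sized storage");

// Hypothetical layout: storage[0] holds a tag bit and the length, the
// remaining bytes hold the characters.
struct ShortString {
    unsigned char storage[sizeof(std::uintptr_t)] {};
};

// Returns a packed short string, or nothing if `text` is too long and would
// need a heap allocation instead.
inline std::optional<ShortString> try_pack_short_string(std::string_view text)
{
    if (text.size() > max_short_string_bytes)
        return std::nullopt;

    ShortString result;
    // Bit 0 marks "inline data"; the length is stored in the bits above it.
    result.storage[0] = static_cast<unsigned char>((text.size() << 1) | 1u);
    if (!text.empty())
        std::memcpy(result.storage + 1, text.data(), text.size());
    return result;
}

inline std::string unpack_short_string(ShortString const& packed)
{
    assert((packed.storage[0] & 1u) != 0);
    std::size_t length = packed.storage[0] >> 1;
    return std::string(reinterpret_cast<char const*>(packed.storage + 1), length);
}
```

A string such as "abc" round-trips through the packed form without any allocation, while a 4-byte string is rejected and would take the heap-allocated path.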
In order to prevent this commit from having to refactor almost all of
Intl, the goal here is to update the internal parsing/canonicalization
of locales within LibLocale only. Call sites which are already equipped
to handle String and OOM errors do so, however.
If USING_AK_GLOBALLY is not defined, the name IsLvalueReference might
not be available in the global namespace. Follow the pattern established
in LibTest to fully qualify AK types in macros to avoid this problem.
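The qualification pattern can be sketched as follows. The AK namespace here is a tiny mock containing only a simplified IsLvalueReference, and the macro IS_LVALUE_EXPRESSION is hypothetical; it only illustrates why a macro body should spell out ::AK:: instead of relying on a using-declaration at the expansion site.

```cpp
#include <cassert>

// Minimal mock of the trait; the real one lives in AK and may differ.
namespace AK {

template<typename T>
inline constexpr bool IsLvalueReference = false;

template<typename T>
inline constexpr bool IsLvalueReference<T&> = true;

}

// Hypothetical macro. Because the name is fully qualified, it expands
// correctly even where no `using AK::IsLvalueReference;` is in effect.
#define IS_LVALUE_EXPRESSION(expression) \
    (::AK::IsLvalueReference<decltype((expression))>)
```

Since decltype((x)) yields a reference type for an lvalue x, the macro evaluates to true for a named variable and false for a literal, regardless of which names the including file has pulled into the global namespace.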
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
Hand-picking the smallest index type that fits a particular generated
array started with commit 3ad159537e. This was to reduce the size of the
generated library.
Since then, the number of types using UniqueStorage has grown a ton,
creating a long list of types for which index types are manually picked.
When a new UCD/CLDR/TZDB is released, and the current index type no
longer fits the generated data, we fail to generate. Tracking down which
index caused the failure is a pretty annoying process.
Instead, we can just use size_t while in the generators themselves, then
automatically pick the size needed for the generated code.
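Picking the index type automatically could be done along these lines. This is a sketch only: the function name smallest_index_type_for is hypothetical, and the thresholds are chosen conservatively rather than copied from the generators.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>

// Returns the name of the narrowest unsigned type able to index `size`
// entries. The u8/u16/u32/u64 spellings follow AK's integer aliases. The
// comparison is conservative: `size` itself must fit in the chosen type.
constexpr std::string_view smallest_index_type_for(std::uint64_t size)
{
    if (size <= std::numeric_limits<std::uint8_t>::max())
        return "u8";
    if (size <= std::numeric_limits<std::uint16_t>::max())
        return "u16";
    if (size <= std::numeric_limits<std::uint32_t>::max())
        return "u32";
    return "u64";
}
```

The generator can then keep size_t indices internally and emit the returned type name into the generated source, so a data update that outgrows u8 or u16 widens the emitted type instead of failing the build.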
There were some notable changes to the CLDR JSON format and data in this
release.
The patterns for a date at a specific time, i.e. "{date} at {time}", now
appear under the "atTime" attribute of the "dateTimeFormats" object.
Locale specific changes that affected test-js:
All locales:
* In many patterns, the code points U+00A0 (NO-BREAK SPACE) and U+202F
(NARROW NO-BREAK SPACE) are now used in place of an ASCII space. For
example, before the "dayPeriod" fields AM and PM.
* Separators such as U+2013 (EN DASH) are now surrounded by U+2009 (THIN
SPACE) in place of an ASCII space character.
Locale "en":
* Narrow localizations of time formats are even more narrow. For
example, the abbreviation "wk." for "week" is now just "wk".
Locale "ar":
* The code point U+060C (ARABIC COMMA) is now used in place of an ASCII
comma.
* The code point U+200F (RIGHT-TO-LEFT MARK) now appears at the
beginning of many localizations.
* When the "latn" numbering system is used for currency formatting, the
currency symbol more consistently is placed at the end of the pattern.
Locale "he":
* The "many" plural rules category has been removed.
Locales "zh" and "es-419":
* Several display-name localizations were changed.
When LibLocale is placed in the Locale namespace, this will conflict
with the Locale structure in each CLDR generator. Rename this to
"LocaleData", and rename its parent UnicodeLocaleData to just "CLDR"
to avoid confusion between LocaleData and UnicodeLocaleData.
To prepare for placing all CLDR generated data in a new library,
LibLocale, this moves the code generators for the CLDR data to the
LibLocale subfolder.
2022-09-05 14:37:16 -04:00
Renamed from Meta/Lagom/Tools/CodeGenerators/LibUnicode/GenerateUnicodeLocale.cpp