1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-10-24 15:32:33 +00:00
Commit graph

37 commits

Author SHA1 Message Date
Timothy Flynn
43a3471298 LibLocale: Move locale source files to the LibLocale folder
These are still included in LibUnicode, but this updates their location
and the include paths of other files which include them.
2022-09-05 14:37:16 -04:00
Timothy Flynn
ff48220dca Userland: Move files destined for LibLocale to the Locale namespace 2022-09-05 14:37:16 -04:00
Timothy Flynn
2c2ede8581 LibUnicode: Mark UniqueStringStorage::generate as constant
This is just to allow it to be invoked from callers who hold a constant
UniqueStringStorage instance.
2022-08-17 15:42:12 +01:00
Timothy Flynn
becec3578f LibTimeZone+LibUnicode: Generate string data with run-length encoding
Currently, the unique string lists are stored in the initialized data
sections of their shared libraries. In order to move the data to the
read-only section, generate the strings using RLE arrays.

We generate two arrays: the first is the RLE data itself, the second is
a list of indices into the RLE array for each string. We then generate a
decoding method to convert an RLE string to a StringView.
2022-08-16 16:56:17 +02:00
Timothy Flynn
c2e5b20eb6 LibUnicode: Generate available values for the keywords co, kf, kn, hc
This also ensures we only include values we actually support in the
generated list of available values.
2022-07-15 12:31:43 +02:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
e5f09ea170 Everywhere: Split Error::from_string_literal and Error::from_string_view
Error::from_string_literal now takes direct char const*s, while
Error::from_string_view does what Error::from_string_literal used to do:
taking StringViews. This change will remove the need to insert `sv`
after error strings when returning string literal errors once
StringView(char const*) is removed.

No functional changes.
2022-07-12 23:11:35 +02:00
DexesTTP
7ceeb74535 AK: Use an enum instead of a bool for String::replace(all_occurences)
This commit has no behavior changes.

In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
2022-07-06 11:12:45 +02:00
Sam Atkins
3b1e063d30 LibCore+Everywhere: Make Core::Stream::read() return Bytes
A mistake I've repeatedly made is along these lines:
```c++
auto nread = TRY(source_file->read(buffer));
TRY(destination_file->write(buffer));
```

It's a little clunky to have to create a Bytes or StringView from the
buffer's data pointer and the nread, and easy to forget and just use
the buffer. So, this patch changes the read() function to return a
Bytes of the data that were just read.

The other read_foo() methods will be modified in the same way in
subsequent commits.

Fixes #13687
2022-04-16 13:27:51 -04:00
Timothy Flynn
63c3437274 LibUnicode: Use BCP 47 data to generate available calendars and numbers
BCP 47 will be the single source of truth for known calendar and number
system keywords, and their aliases (e.g. "gregory" is an alias for
"gregorian"). Move the generation of available keywords to where we
parse the BCP 47 data, so that hard-coded aliases may be removed from
other generators.
2022-02-16 07:23:07 -05:00
Timothy Flynn
a338e9403b LibUnicode: Port the CLDR locale generator to the stream API
This adds a generator utility to read an entire file and parse it as a
JSON value. This is heavily used by the CLDR generators. The idea here
is to put the file reading details in the utility so that when we have a
good story for generically reading an entire stream in LibCore, we can
update the generators to use that by only touching this helper.
2022-02-14 11:39:46 -05:00
Timothy Flynn
9327c2233f LibTimeZone: Port the TZDB generator to the stream API
This also moves the open_file helper to the utility file. It's currently
a lambda redefined in each TZDB/Unicode generator. It used to display
the missing command line flag and other info local to each generator.
After switching to LibMain, it just returns a generic error message, and
is duplicated several times.
2022-02-14 11:39:46 -05:00
Timothy Flynn
6efbafa6e0 Everywhere: Update copyrights with my new serenityos.org e-mail :^) 2022-01-31 18:23:22 +00:00
Timothy Flynn
ebd33e580b LibUnicode: Generate a list of available calendars 2022-01-31 00:32:41 +00:00
Timothy Flynn
1c2c98ac5d LibTimeZone: Add method to convert a time zone to a string 2022-01-11 00:36:45 +01:00
Timothy Flynn
b543c3e490 Meta: Don't assume how each generator wants to generate keyed map names
The generate_mapping helper generates a series of structs like:

    Array<SomeType, 1> s_mapping_key_0 {};
    Array<SomeType, 2> s_mapping_key_1 {};
    Array<SomeType, 3> s_mapping_key_2 {};
    Array<Span<SomeType const>> s_mapping { {
        s_mapping_key_0.span(),
        s_mapping_key_1.span(),
        s_mapping_key_2.span(),
    } };

Where the names of the struct were generated by the format_mapping_name
lambda inside the helper. Rather than this lambda making assumptions on
how each generator wants to name its structs, add a parameter for the
caller to provide a naming formatter.

This is because the TimeZoneData generator will want pretty specific
identifier formatting rules.
2022-01-11 00:36:45 +01:00
Timothy Flynn
6da1bfeeea Meta: Support generating case-insensitive value-from-string methods
This also extracts the default parameters for generate_value_from_string
to a structure. This is just to make it cleaner to add new options.
2022-01-11 00:36:45 +01:00
mjz19910
10ec98dd38 Everywhere: Fix spelling mistakes 2022-01-07 15:44:42 +01:00
Timothy Flynn
dd88ff70ac LibUnicode: Remove now unused value-from-string generator overload 2022-01-04 22:49:43 +00:00
Timothy Flynn
cf8e11a562 LibUnicode: Add temporary overload of value-from-string generator
This is a temporary mechanism while LibUnicode is in an in-between state
where some symbols are weakly linked and others are dynamically loaded.
The latter require an asm() label to be loaded.
2022-01-04 22:49:43 +00:00
Timothy Flynn
52394deece LibUnicode: Remove now unused value-from-string generator overload
The generate_value_from_string_for_dynamic_loading() overload was just
temporary until all generates were switched over to dynamic loading.
2021-12-21 13:09:49 -08:00
Timothy Flynn
3fd53baa25 LibUnicode: Dynamically load the generated UnicodeData symbols
The generated data for libunicodedata.so is quite large, and loading it
is a price paid by nearly every application by way of depending on
LibRegex. In order to defer this cost until an application actually uses
one of the surrounding APIs, dynamically load the generated symbols.

To be able to load the symbols dynamically, the generated methods must
have demangled names. Typically, this is accomplished with `extern "C"`
blocks. The clang toolchain complains about this here because the types
returned from the generators are strictly C++ types. So to demangle the
names, we use the asm() compiler directive to manually define a symbol
name; the caveat is that we *must* be sure the symbols are unique. As an
extra precaution, we prefix each symbol name with "unicode_". For more
details, see: https://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html

This symbol loader used in this implementation provides the additional
benefit of removing many [[maybe_unused]] attributes from the LibUnicode
methods. Internally, if ENABLE_UNICODE_DATABASE_DOWNLOAD is OFF, the
loader is able to stub out the function pointers it returns.

Note that as of this commit, LibUnicode is still directly linked against
LibUnicodeData. This commit is just a first step towards removing that.
2021-12-21 13:09:49 -08:00
Timothy Flynn
415763b1b3 LibUnicode: Define traits for a vector of integral/enum types
Any generator which defines a unique storage instance for a list of
numbers will need this.
2021-12-13 21:28:56 -08:00
Timothy Flynn
76af9fae63 LibUnicode: Support storing lists in UniqueStorage for code generators
The evolution of UniqueStorage has been as follows:

1. It was created as UniqueStringStorage to ensure only one copy of each
   unique string is generated. Interested parties stored an index into
   a unique string list, rather than the string itself.
   Commits: f9e605397c and 04e6b43f05

2. It became apparent that non-string structures could also be de-
   duplicated to reduce the size of libunicode.so. UniqueStringStorage
   was generalized to UniqueStorage for this purpose.
   Commit: d8e6beb14f

It's now also apparent that there's heavy duplication of lists of
structures. For example, the NumberFormat generator stores 4 lists of
NumberFormat objects. In total, we currently generate nearly 2,000 lists
of these objects, of which 275 are unique.

This change updates UniqueStorage to support storing lists. The only
change is how the storage is generated - we generate each stored list
individually, then an array storing spans of those lists.
2021-12-11 14:17:47 +00:00
Timothy Flynn
b76e44f66f LibUnicode: Parse and generate time zone names in long and short form 2021-12-08 11:29:36 +00:00
Timothy Flynn
d8e6beb14f LibUnicode: Generalize the generators' unique string storage
UniqueStringStorage is used to ensure only one copy of a string will be
generated, and interested parties store just an index into the generated
storage. Generalize this class to allow any* type to be stored uniquely.

* To actually be storable, the type must have both an AK::Format and an
AK::Traits overload available.
2021-12-06 15:46:34 +01:00
Timothy Flynn
71903ea7e1 LibUnicode: Parse and generate calendar (ca) Unicode keywords
Also removes a few fly-by "StringView x = nullptr;" unnecessary
initializers.
2021-11-29 22:48:46 +00:00
Timothy Flynn
15fc03ef34 LibUnicode: Sort generated enums case-insensitively
This hasn't mattered yet by chance, because the source for all enums
contains names of the same case. But the enum generated for hour cycle
regions will have mixed case. Sort them case-insensitively in order to
traverse these names in the same order in both generate_enum and
generate_mapping.
2021-11-29 22:48:46 +00:00
Timothy Flynn
f471ecdbe9 LibUnicode: Parse and generate date, time, and date-time format patterns 2021-11-29 22:48:46 +00:00
Timothy Flynn
0aa3e5c2ea LibUnicode: Port generator utility methods to ErrorOr
Most of these were VERIFY-ing for success, but propagating an error
message up to serenity_main() is much nicer than just a SIGABRT.
2021-11-23 22:58:05 +01:00
Timothy Flynn
a13fa15a30 LibUnicode: Generate default-content locales as aliases
Previously, we were just copying the locale data into default-content
locales (for example, copying the "en" data into "en-US"). Instead, we
can just define the default-content locales as aliases to their main
locales.
2021-11-19 11:45:35 +01:00
Timothy Flynn
9d1519e21c LibUnicode: Move GenerateUnicodeData's Alias struct to generator header
This will be used for locale aliases as well. Also rename the "property"
field in this struct to "name", as it no longer is only used for
property aliases.
2021-11-19 11:45:35 +01:00
Andreas Kling
587f9af960 AK: Make JSON parser return ErrorOr<JsonValue> (instead of Optional)
Also add slightly richer parse errors now that we can include a string
literal with returned errors.

This will allow us to use TRY() when working with JSON data.
2021-11-17 00:21:10 +01:00
Timothy Flynn
e9493a2cd5 LibUnicode: Ensure UnicodeNumberFormat is aware of default content
For example, there isn't a unique set of data for the en-US locale;
rather, it defaults to the data for the en locale. See this commit for
much more detail: 357c97dfa8
2021-11-13 11:52:45 +00:00
Timothy Flynn
04e6b43f05 LibUnicode: Move (soon-to-be) common code out of GenerateUnicodeLocale
The data used for number formatting is going to grow quite a bit when
the cldr-units package is parsed. To prevent the generated UnicodeLocale
file from growing outrageously large, the number formatting data can go
into its own file. To prepare for this, move code that will be common
between the generators for UnicodeLocale and UnicodeNumberFormat to the
utility header.
2021-11-12 20:46:38 +00:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Timothy Flynn
f91d63af83 LibUnicode: Generate enum/alias from-string methods without a HashMap
The *_from_string() and resolve_*_alias() generated methods are the last
remaining users of HashMap in the LibUnicode generated files (read: the
last methods not using compile-time structures). This converts these
methods to use an array containing pairs of hash values to the desired
lookup value.

Because this code generation is the same between GenerateUnicodeData.cpp
and GenerateUnicodeLocale.cpp, this adds a GeneratorUtil.h header to the
LibUnicode generators to contain the method that generates the methods.
2021-10-13 16:38:51 +02:00