mirror of https://github.com/RGBCube/serenity synced 2025-10-23 18:12:30 +00:00
286 commits

Author SHA1 Message Date
Timothy Flynn
dd88ff70ac LibUnicode: Remove now unused value-from-string generator overload 2022-01-04 22:49:43 +00:00
Timothy Flynn
437b9fe204 LibUnicode: Convert UnicodeData to link with weak symbols 2022-01-04 22:49:43 +00:00
Timothy Flynn
f576142fe8 LibJS+LibUnicode: Convert UnicodeLocale to link with weak symbols 2022-01-04 22:49:43 +00:00
Timothy Flynn
cf8e11a562 LibUnicode: Add temporary overload of value-from-string generator
This is a temporary mechanism while LibUnicode is in an in-between state
where some symbols are weakly linked and others are dynamically loaded.
The latter require an asm() label to be loaded.
2022-01-04 22:49:43 +00:00
Timothy Flynn
ba4cdf34f8 LibUnicode: Convert UnicodeDateTimeFormat to link with weak symbols 2022-01-04 22:49:43 +00:00
Timothy Flynn
98709d9be1 LibUnicode: Convert UnicodeNumberFormat to link with weak symbols
Currently, we load the generated Unicode symbols with dlopen at runtime.
This is unnecessary as of 565a880ce5.

Applications that want Unicode data now link directly against the shared
library holding that data. So the same functionality can be achieved
with weak symbols.
2022-01-04 22:49:43 +00:00
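As a rough illustration of the weak-symbol approach described in 98709d9be1, the sketch below shows one common way to use weak linkage: a stub definition marked weak is used unless a strong definition of the same symbol is linked in. This is not the code from the commit. The `Locale` enum and the `locale_from_string()` and `is_known_locale()` functions are hypothetical names, and `__attribute__((weak))` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical enum standing in for a generated Unicode data type.
enum class Locale { En, Fr };

// Weak default definition: the linker uses it only when no strong
// definition of the same symbol is present. A separate data library
// (assumed, not shown) could provide a strong locale_from_string()
// that replaces this stub at link time.
__attribute__((weak)) std::optional<Locale> locale_from_string(std::string_view)
{
    return std::nullopt;
}

// Callers do not need to know which definition was linked in.
inline bool is_known_locale(std::string_view name)
{
    assert(!name.empty());
    return locale_from_string(name).has_value();
}
```

With only the stub linked, every lookup reports an unknown locale. Linking a library that defines the strong symbol changes the result without any dlopen or dlsym call at runtime.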
Timothy Flynn
126a3fe180 LibUnicode: Add minimal support for generic & offset-based time zones
ECMA-402 now supports short-offset, long-offset, short-generic, and
long-generic time zone name formatting. For example, in the en-US locale
the America/Eastern time zone would be formatted as:

    short-offset: GMT-5
    long-offset: GMT-05:00
    short-generic: ET
    long-generic: Eastern Time

We currently only support the UTC time zone, however. Therefore, this
very minimal implementation does not consider GMT offset or generic
display names. Instead, the CLDR defines specific strings for UTC.
2022-01-03 15:11:59 +01:00
Timothy Flynn
52394deece LibUnicode: Remove now unused value-from-string generator overload
The generate_value_from_string_for_dynamic_loading() overload was just
temporary until all generators were switched over to dynamic loading.
2021-12-21 13:09:49 -08:00
Timothy Flynn
15e1498419 LibUnicode: Dynamically load the generated UnicodeDateTimeFormat symbols 2021-12-21 13:09:49 -08:00
Timothy Flynn
a1f0ca59ae LibUnicode: Dynamically load the generated UnicodeNumberFormat symbols 2021-12-21 13:09:49 -08:00
Timothy Flynn
09be26b5d2 LibUnicode: Dynamically load the generated UnicodeLocale symbols 2021-12-21 13:09:49 -08:00
Timothy Flynn
3fd53baa25 LibUnicode: Dynamically load the generated UnicodeData symbols
The generated data for libunicodedata.so is quite large, and loading it
is a price paid by nearly every application by way of depending on
LibRegex. In order to defer this cost until an application actually uses
one of the surrounding APIs, dynamically load the generated symbols.

To be able to load the symbols dynamically, the generated methods must
have unmangled names. Typically, this is accomplished with `extern "C"`
blocks. The clang toolchain complains about this here because the types
returned from the generators are strictly C++ types. So to avoid name
mangling, we use the asm() compiler directive to manually define a symbol
name; the caveat is that we *must* be sure the symbols are unique. As an
extra precaution, we prefix each symbol name with "unicode_". For more
details, see: https://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html

The symbol loader used in this implementation provides the additional
benefit of removing many [[maybe_unused]] attributes from the LibUnicode
methods. Internally, if ENABLE_UNICODE_DATABASE_DOWNLOAD is OFF, the
loader is able to stub out the function pointers it returns.

Note that as of this commit, LibUnicode is still directly linked against
LibUnicodeData. This commit is just a first step towards removing that.
2021-12-21 13:09:49 -08:00
Michel Hermier
060e5ccbbc Lagom: Bind time_zone_list_index_type in the generator
The variable `s_time_zone_list_index_type` seems to be unused (detected
when compiling with clang), and it seems logical to bind it even if it
is not used for now.
2021-12-18 21:01:10 -08:00
Timothy Flynn
ce6c515873 LibUnicode: Generate unique list patterns and lists of list patterns 2021-12-13 21:28:56 -08:00
Timothy Flynn
0ad2decd04 LibUnicode: Generate unique list of keyword values 2021-12-13 21:28:56 -08:00
Timothy Flynn
0c6cc4ad96 LibUnicode: Generate unique lists of localized currencies 2021-12-13 21:28:56 -08:00
Timothy Flynn
a45f2ccc25 LibUnicode: Generate unique lists of languages, territories, and scripts 2021-12-13 21:28:56 -08:00
Timothy Flynn
6e5f0b139b LibUnicode: Remove unused fields from generated structures
A couple of structures held a string index that is unused. Removing them
also removes the string values from the unique string list.
2021-12-13 21:28:56 -08:00
Timothy Flynn
77fc877c04 LibUnicode: Generate unique lists of hour cycles 2021-12-13 21:28:56 -08:00
Timothy Flynn
6f17696176 LibUnicode: Generate unique lists of time zone structures 2021-12-13 21:28:56 -08:00
Timothy Flynn
df33156462 LibUnicode: Generate unique lists of day period structures 2021-12-13 21:28:56 -08:00
Timothy Flynn
265785e847 LibUnicode: Generate unique day period structures 2021-12-13 21:28:56 -08:00
Timothy Flynn
7af1818e76 LibUnicode: Generate unique time zone structures
Each of the 374 locales contains 156 time zone structures. Of these
58,344 structures, 13,578 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
b14b37f386 LibUnicode: Generate unique calendar structures
Of the 374 generated calendars, 173 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
4b721597d7 LibUnicode: Generate unique lists of calendar range patterns
Of the 374 range pattern lists and 374 range12 pattern lists, 230 are
unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
9fc2442e7d LibUnicode: Generate unique lists of calendar patterns
Of the 374 generated lists, 152 are unique. These lists have upwards of
1000 entries as well, so the de-duplication is particularly nice.
2021-12-13 21:28:56 -08:00
Timothy Flynn
09547f4084 LibUnicode: Generate unique lists of calendar symbols structures
Of the 374 generated lists, 120 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
f681ec9d98 LibUnicode: Generate unique calendar symbols structures
Each of the 374 generated calendars includes 4 symbols structures. Of
these 1496 structures, only 386 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
62ff029890 LibUnicode: Generate CalendarSymbols in a predetermined order
Similar to commit 2a7f36b392, this change moves the generated
CalendarSymbol enumeration to the public LibUnicode/NumberFormat.h
header with a pre-defined set of symbols that we need. This is to
prepare for uniquely generating the CalendarSymbols structure.
2021-12-13 21:28:56 -08:00
Timothy Flynn
cf8ef954e5 LibUnicode: Generate unique lists of calendar symbols
Each of the 374 generated calendars includes 4 sets of symbols, each of
which has 3 lists of symbols (narrow, short, long). Of these 4488
lists, only 819 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
af7caa97c8 LibUnicode: Generate unique calendar format structures
There are currently 374 calendars generated, each of which includes 3
CalendarFormat structures. Of these 1122 instances, only 167 are unique.
2021-12-13 21:28:56 -08:00
Timothy Flynn
415763b1b3 LibUnicode: Define traits for a vector of integral/enum types
Any generator which defines a unique storage instance for a list of
numbers will need this.
2021-12-13 21:28:56 -08:00
Timothy Flynn
1e95e7716b LibUnicode: Generate unique units 2021-12-11 14:17:47 +00:00
Timothy Flynn
4c2c8b8e33 LibUnicode: Generate unique number systems 2021-12-11 14:17:47 +00:00
Timothy Flynn
2a7f36b392 LibJS+LibUnicode: Generate unique numeric symbol lists
There are 443 number system objects generated, each of which held an
array of number system symbols. Of those 443 arrays, only 39 are unique.

To uniquely store these, this change moves the generated NumericSymbol
enumeration to the public LibUnicode/NumberFormat.h header with a pre-
defined set of symbols that we need. This is to ensure the generated,
unique arrays are created in a known order with known symbols. While it
is unfortunate to no longer discover these symbols at generation time,
it does allow us to ignore unwanted symbols and perform fewer
string-to-enumeration conversions at lookup time.
2021-12-11 14:17:47 +00:00
Timothy Flynn
9cc323b0b0 LibUnicode: Generate unique NumberFormat lists for each Unit 2021-12-11 14:17:47 +00:00
Timothy Flynn
cdbfe01827 LibUnicode: Generate unique NumberFormat lists for each NumberSystem 2021-12-11 14:17:47 +00:00
Timothy Flynn
76af9fae63 LibUnicode: Support storing lists in UniqueStorage for code generators
The evolution of UniqueStorage has been as follows:

1. It was created as UniqueStringStorage to ensure only one copy of each
   unique string is generated. Interested parties stored an index into
   a unique string list, rather than the string itself.
   Commits: f9e605397c and 04e6b43f05

2. It became apparent that non-string structures could also be de-
   duplicated to reduce the size of libunicode.so. UniqueStringStorage
   was generalized to UniqueStorage for this purpose.
   Commit: d8e6beb14f

It's now also apparent that there's heavy duplication of lists of
structures. For example, the NumberFormat generator stores 4 lists of
NumberFormat objects. In total, we currently generate nearly 2,000 lists
of these objects, of which 275 are unique.

This change updates UniqueStorage to support storing lists. The only
change is how the storage is generated - we generate each stored list
individually, then an array storing spans of those lists.
2021-12-11 14:17:47 +00:00
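The de-duplication idea behind 76af9fae63 can be sketched as a container that stores each distinct value once and hands out indices. This is a simplified stand-in, not the generators' actual UniqueStorage: the class layout, the `ensure()`, `get()`, and `size()` names, and the zero-based indexing are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, simplified unique storage. Each distinct value is kept
// once; interested parties store the returned index instead of the value.
template<typename T>
class UniqueStorage {
public:
    // Returns the index of `value`, storing it first if it is new.
    std::size_t ensure(T const& value)
    {
        auto [it, inserted] = m_indices.try_emplace(value, m_values.size());
        if (inserted)
            m_values.push_back(value);
        return it->second;
    }

    T const& get(std::size_t index) const
    {
        assert(index < m_values.size());
        return m_values[index];
    }

    // Number of distinct values stored so far.
    std::size_t size() const { return m_values.size(); }

private:
    std::map<T, std::size_t> m_indices;
    std::vector<T> m_values;
};
```

Because `std::vector` is ordered by `operator<`, the same template can hold lists: with `UniqueStorage<std::vector<int>>`, two calls to `ensure()` with equal lists return the same index and `size()` grows only once.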
Timothy Flynn
a417c23de0 LibUnicode: Parse and generate per-locale day period ranges 2021-12-10 21:27:24 +00:00
Timothy Flynn
fa8e881cfa LibUnicode: Parse and generate secondary day period symbols
Generate morning2, afternoon2, evening2, and night2 symbols.
2021-12-10 21:27:24 +00:00
Timothy Flynn
76aab821f4 LibJS+LibUnicode: Rename some Unicode::DayPeriod values
In the CLDR, there aren't "night" values, there are "night1" & "night2"
values. This is for locales which use a different name for nighttime
depending on the hour. For example, the ja locale uses "夜" between the
hours of 19:00 and 23:00, and "夜中" between the hours of 23:00 and
04:00. Our CLDR parser is currently ignoring "night2", so this rename
is to prepare for that.

We could probably come up with better names, but in the end, the API in
LibUnicode will be such that outside callers won't even see Night1, etc.
2021-12-10 21:27:24 +00:00
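To make the night1/night2 distinction from 76aab821f4 concrete, here is a small sketch that picks a day period for the ja hours quoted in that message. It is not the LibUnicode API. The `HourRange` struct, the `contains()` and `night_period_for_ja()` helpers, and the treatment of each range as half-open (start included, end excluded) are assumptions.

```cpp
#include <cassert>

// Hypothetical half-open hour range [begin, end). A range whose begin is
// greater than its end wraps past midnight.
struct HourRange {
    int begin;
    int end;
};

inline bool contains(HourRange range, int hour)
{
    assert(hour >= 0 && hour < 24);
    if (range.begin <= range.end)
        return hour >= range.begin && hour < range.end;
    // Wrapping range, e.g. 23:00 to 04:00.
    return hour >= range.begin || hour < range.end;
}

enum class DayPeriod { Night1, Night2, Other };

// Hours taken from the commit message: night1 covers 19:00 to 23:00 and
// night2 covers 23:00 to 04:00 in the ja locale.
inline DayPeriod night_period_for_ja(int hour)
{
    HourRange const night1 { 19, 23 };
    HourRange const night2 { 23, 4 };
    if (contains(night1, hour))
        return DayPeriod::Night1;
    if (contains(night2, hour))
        return DayPeriod::Night2;
    return DayPeriod::Other;
}
```

The wrapping branch is the part that matters: a range that crosses midnight cannot be tested with a single pair of comparisons joined by "and".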
Timothy Flynn
9d4c4303fd LibUnicode: Parse and generate date time range format patterns 2021-12-09 23:43:04 +00:00
Timothy Flynn
fe84a365c2 LibUnicode: Parse and generate format pattern skeletons
Pattern skeletons are more or less the "key" of format patterns. Every
format pattern is assigned a skeleton. Interval patterns (which are not
yet parsed) are also assigned a skeleton - this is used to match them to
an "owning" format pattern. So we will use the skeleton generated here
to match format patterns at runtime with their available interval
patterns.

An alternative approach would be to append interval patterns directly to
their owning format pattern, but this has some drawbacks:

    1. Skeletons aren't totally unique. A skeleton may appear in both
       the "dateFormats" and "availableFormats" objects, in which case
       the same interval formats would be generated more than once.

    2. Otherwise unique format patterns may only differ by the interval
       patterns assigned to them. This would cause the UniqueStorage for
       the format patterns to increase in size, impacting both compile
       times and libunicode.so size.
2021-12-09 23:43:04 +00:00
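The skeleton-as-key design argued for in fe84a365c2 might look roughly like the following sketch, where interval patterns live in their own table and are looked up by skeleton at runtime. The `FormatPattern` struct, the `IntervalPatterns` alias, and `interval_patterns_for()` are hypothetical, and any skeleton or pattern strings passed in are illustrative rather than taken from the CLDR.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical format pattern: it carries only its skeleton, not the
// interval patterns themselves.
struct FormatPattern {
    std::string skeleton;
    std::string pattern;
};

// Interval patterns are stored once per skeleton, separately from the
// format patterns that share that skeleton.
using IntervalPatterns = std::map<std::string, std::vector<std::string>>;

// Finds the interval patterns owned by a format pattern via its skeleton.
// Returns an empty list when the skeleton has none.
inline std::vector<std::string> const& interval_patterns_for(
    IntervalPatterns const& table, FormatPattern const& format)
{
    static std::vector<std::string> const empty;
    assert(!format.skeleton.empty());
    auto it = table.find(format.skeleton);
    return it == table.end() ? empty : it->second;
}
```

Two format patterns with the same skeleton resolve to the same stored list, so the interval patterns are generated once, and the format patterns themselves stay identical whenever they differ only in their intervals.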
Timothy Flynn
b17c6ab661 LibUnicode: Fix typo in format pattern parser
See: https://unicode.org/reports/tr35/tr35-dates.html#dfst-day
2021-12-09 23:43:04 +00:00
Timothy Flynn
b76e44f66f LibUnicode: Parse and generate time zone names in long and short form 2021-12-08 11:29:36 +00:00
Timothy Flynn
2bbf8aa24c LibUnicode: Generate era, month, weekday and day period calendar symbols
The parsing in parse_calendar_symbols() might be a bit more verbose than
it really needs to be, but it is to ensure the symbols are generated in
a known order that we can control with enumerations.
2021-12-08 11:29:36 +00:00
Timothy Flynn
9f7c727720 LibJS+LibUnicode: Generate missing patterns with fractionalSecondDigits
TR-35's Matching Skeleton algorithm dictates how user requests including
fractional second digits should be handled when the CLDR format pattern
does not include that field. When the format pattern contains {second},
but does not contain {fractionalSecondDigits}, generate a second pattern
which appends "{decimal}{fractionalSecondDigits}" to the {second} field.
2021-12-08 11:29:36 +00:00
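The rule described in 9f7c727720 can be sketched as a small string transformation. This is a minimal illustration under the assumption that patterns are plain strings with `{field}` placeholders; the `with_fractional_seconds()` function is a hypothetical name, not the generator's code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// If `pattern` contains {second} but not {fractionalSecondDigits}, returns
// a second pattern with "{decimal}{fractionalSecondDigits}" appended to
// the {second} field. Otherwise returns nothing.
inline std::optional<std::string> with_fractional_seconds(std::string const& pattern)
{
    static std::string const second = "{second}";

    if (pattern.find("{fractionalSecondDigits}") != std::string::npos)
        return std::nullopt;

    auto position = pattern.find(second);
    if (position == std::string::npos)
        return std::nullopt;

    std::string result = pattern;
    result.insert(position + second.size(), "{decimal}{fractionalSecondDigits}");
    assert(result.size() > pattern.size());
    return result;
}
```

For example, `"{hour}:{minute}:{second} {ampm}"` yields `"{hour}:{minute}:{second}{decimal}{fractionalSecondDigits} {ampm}"`, while `"{hour}:{minute}"` yields no extra pattern.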
Timothy Flynn
6ace4000bf LibJS+LibUnicode: Supply field type in CalendarPattern's for-each method
Some callers will want different behavior depending on what field is
being provided to the callback.
2021-12-08 11:29:36 +00:00
Timothy Flynn
80ea6e664d LibUnicode: Do not set day period format length for {ampm} segments
TR-35 does define lengths for {ampm}, but they are unused by ECMA-402.
To the contrary, defining the day_period length for this segment will
prevent BasicFormatMatcher from ever selecting a pattern that contains
this segment. Instead, ECMA-402 will only use the short length for
{ampm} segments.
2021-12-08 11:29:36 +00:00
Timothy Flynn
dfe8d02482 LibUnicode: Generate missing format patterns
TR-35 describes how to combine date, time, and available formats with
date-time format patterns to generate more available format patterns:
https://unicode.org/reports/tr35/tr35-dates.html#Missing_Skeleton_Fields

Use these steps to generate ~400 new patterns for each calendar. These
are required for ECMA-402's BasicFormatMatcher to produce reasonable
results.
2021-12-06 15:46:34 +01:00