serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-23 18:12:30 +00:00

Author	SHA1	Message	Date
Timothy Flynn	dd88ff70ac	LibUnicode: Remove now unused value-from-string generator overload	2022-01-04 22:49:43 +00:00
Timothy Flynn	437b9fe204	LibUnicode: Convert UnicodeData to link with weak symbols	2022-01-04 22:49:43 +00:00
Timothy Flynn	f576142fe8	LibJS+LibUnicode: Convert UnicodeLocale to link with weak symbols	2022-01-04 22:49:43 +00:00
Timothy Flynn	cf8e11a562	LibUnicode: Add temporary overload of value-from-string generator This is a temporary mechanism while LibUnicode is in an in-between state where some symbols are weakly linked and others are dynamically loaded. The latter require an asm() label to be loaded.	2022-01-04 22:49:43 +00:00
Timothy Flynn	ba4cdf34f8	LibUnicode: Convert UnicodeDateTimeFormat to link with weak symbols	2022-01-04 22:49:43 +00:00
Timothy Flynn	98709d9be1	LibUnicode: Convert UnicodeNumberFormat to link with weak symbols Currently, we load the generated Unicode symbols with dlopen at runtime. This is unnecessary as of `565a880ce5`. Applications that want Unicode data now link directly against the shared library holding that data. So the same functionality can be achieved with weak symbols.	2022-01-04 22:49:43 +00:00
Timothy Flynn	126a3fe180	LibUnicode: Add minimal support for generic & offset-based time zones ECMA-402 now supports short-offset, long-offset, short-generic, and long-generic time zone name formatting. For example, in the en-US locale the America/Eastern time zone would be formatted as: short-offset: GMT-5; long-offset: GMT-05:00; short-generic: ET; long-generic: Eastern Time. We currently only support the UTC time zone, however. Therefore, this very minimal implementation does not consider GMT offset or generic display names. Instead, the CLDR defines specific strings for UTC.	2022-01-03 15:11:59 +01:00
Timothy Flynn	52394deece	LibUnicode: Remove now unused value-from-string generator overload The generate_value_from_string_for_dynamic_loading() overload was just temporary until all generators were switched over to dynamic loading.	2021-12-21 13:09:49 -08:00
Timothy Flynn	15e1498419	LibUnicode: Dynamically load the generated UnicodeDateTimeFormat symbols	2021-12-21 13:09:49 -08:00
Timothy Flynn	a1f0ca59ae	LibUnicode: Dynamically load the generated UnicodeNumberFormat symbols	2021-12-21 13:09:49 -08:00
Timothy Flynn	09be26b5d2	LibUnicode: Dynamically load the generated UnicodeLocale symbols	2021-12-21 13:09:49 -08:00
Timothy Flynn	3fd53baa25	LibUnicode: Dynamically load the generated UnicodeData symbols The generated data for libunicodedata.so is quite large, and loading it is a price paid by nearly every application by way of depending on LibRegex. In order to defer this cost until an application actually uses one of the surrounding APIs, dynamically load the generated symbols. To be able to load the symbols dynamically, the generated methods must have demangled names. Typically, this is accomplished with `extern "C"` blocks. The clang toolchain complains about this here because the types returned from the generators are strictly C++ types. So to demangle the names, we use the asm() compiler directive to manually define a symbol name; the caveat is that we must be sure the symbols are unique. As an extra precaution, we prefix each symbol name with "unicode_". For more details, see: https://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html The symbol loader used in this implementation provides the additional benefit of removing many [[maybe_unused]] attributes from the LibUnicode methods. Internally, if ENABLE_UNICODE_DATABASE_DOWNLOAD is OFF, the loader is able to stub out the function pointers it returns. Note that as of this commit, LibUnicode is still directly linked against LibUnicodeData. This commit is just a first step towards removing that.	2021-12-21 13:09:49 -08:00
Michel Hermier	060e5ccbbc	Lagom: Bind `time_zone_list_index_type` in the generator The variable `s_time_zone_list_index_type` seems to be unused (detected when compiling with clang), and it seems logical to bind it even if it is not used for now.	2021-12-18 21:01:10 -08:00
Timothy Flynn	ce6c515873	LibUnicode: Generate unique list patterns and lists of list patterns	2021-12-13 21:28:56 -08:00
Timothy Flynn	0ad2decd04	LibUnicode: Generate unique list of keyword values	2021-12-13 21:28:56 -08:00
Timothy Flynn	0c6cc4ad96	LibUnicode: Generate unique lists of localized currencies	2021-12-13 21:28:56 -08:00
Timothy Flynn	a45f2ccc25	LibUnicode: Generate unique lists of languages, territories, and scripts	2021-12-13 21:28:56 -08:00
Timothy Flynn	6e5f0b139b	LibUnicode: Remove unused fields from generated structures A couple of structures held a string index that is unused. Removing them also removes the string values from the unique string list.	2021-12-13 21:28:56 -08:00
Timothy Flynn	77fc877c04	LibUnicode: Generate unique lists of hour cycles	2021-12-13 21:28:56 -08:00
Timothy Flynn	6f17696176	LibUnicode: Generate unique lists of time zone structures	2021-12-13 21:28:56 -08:00
Timothy Flynn	df33156462	LibUnicode: Generate unique lists of day period structures	2021-12-13 21:28:56 -08:00
Timothy Flynn	265785e847	LibUnicode: Generate unique day period structures	2021-12-13 21:28:56 -08:00
Timothy Flynn	7af1818e76	LibUnicode: Generate unique time zone structures Each of the 374 locales contains 156 time zone structures. Of these 58,344 structures, 13,578 are unique.	2021-12-13 21:28:56 -08:00
Timothy Flynn	b14b37f386	LibUnicode: Generate unique calendar structures Of the 374 generated calendars, 173 are unique.	2021-12-13 21:28:56 -08:00
Timothy Flynn	4b721597d7	LibUnicode: Generate unique lists of calendar range patterns Of the 374 range pattern lists and 374 range12 pattern lists, 230 are unique.	2021-12-13 21:28:56 -08:00
Timothy Flynn	9fc2442e7d	LibUnicode: Generate unique lists of calendar patterns Of the 374 generated lists, 152 are unique. These lists have upwards of 1000 entries as well, so the de-duplication is particularly nice.	2021-12-13 21:28:56 -08:00
Timothy Flynn	09547f4084	LibUnicode: Generate unique lists of calendar symbols structures Of the 374 generated lists, 120 are unique.	2021-12-13 21:28:56 -08:00
Timothy Flynn	f681ec9d98	LibUnicode: Generate unique calendar symbols structures Each of the 374 generated calendars includes 4 symbols structures. Of these 1496 structures, only 386 are unique.	2021-12-13 21:28:56 -08:00
Timothy Flynn	62ff029890	LibUnicode: Generate CalendarSymbols in a predetermined order Similar to commit `2a7f36b392`, this change moves the generated CalendarSymbol enumeration to the public LibUnicode/NumberFormat.h header with a pre-defined set of symbols that we need. This is to prepare for uniquely generating the CalendarSymbols structure.	2021-12-13 21:28:56 -08:00
Timothy Flynn	cf8ef954e5	LibUnicode: Generate unique lists of calendar symbols Each of the 374 generated calendars includes 4 sets of symbols, each of which has 3 lists of symbols (narrow, short, long). Of these 4488 lists, only 819 are unique.	2021-12-13 21:28:56 -08:00
Timothy Flynn	af7caa97c8	LibUnicode: Generate unique calendar format structures There are currently 374 calendars generated, each of which includes 3 CalendarFormat structures. Of these 1122 instances, only 167 are unique.	2021-12-13 21:28:56 -08:00
Timothy Flynn	415763b1b3	LibUnicode: Define traits for a vector of integral/enum types Any generator which defines a unique storage instance for a list of numbers will need this.	2021-12-13 21:28:56 -08:00
Timothy Flynn	1e95e7716b	LibUnicode: Generate unique units	2021-12-11 14:17:47 +00:00
Timothy Flynn	4c2c8b8e33	LibUnicode: Generate unique number systems	2021-12-11 14:17:47 +00:00
Timothy Flynn	2a7f36b392	LibJS+LibUnicode: Generate unique numeric symbol lists There are 443 number system objects generated, each of which held an array of number system symbols. Of those 443 arrays, only 39 are unique. To uniquely store these, this change moves the generated NumericSymbol enumeration to the public LibUnicode/NumberFormat.h header with a pre-defined set of symbols that we need. This is to ensure the generated, unique arrays are created in a known order with known symbols. While it is unfortunate to no longer discover these symbols at generation time, it does allow us to ignore unwanted symbols and perform less string-to-enumeration conversions at lookup time.	2021-12-11 14:17:47 +00:00
Timothy Flynn	9cc323b0b0	LibUnicode: Generate unique NumberFormat lists for each Unit	2021-12-11 14:17:47 +00:00
Timothy Flynn	cdbfe01827	LibUnicode: Generate unique NumberFormat lists for each NumberSystem	2021-12-11 14:17:47 +00:00
Timothy Flynn	76af9fae63	LibUnicode: Support storing lists in UniqueStorage for code generators The evolution of UniqueStorage has been as follows: 1. It was created as UniqueStringStorage to ensure only one copy of each unique string is generated. Interested parties stored an index into a unique string list, rather than the string itself. Commits: `f9e605397c` and `04e6b43f05` 2. It became apparent that non-string structures could also be de-duplicated to reduce the size of libunicode.so. UniqueStringStorage was generalized to UniqueStorage for this purpose. Commit: `d8e6beb14f` It's now also apparent that there's heavy duplication of lists of structures. For example, the NumberFormat generator stores 4 lists of NumberFormat objects. In total, we currently generate nearly 2,000 lists of these objects, of which 275 are unique. This change updates UniqueStorage to support storing lists. The only change is how the storage is generated - we generate each stored list individually, then an array storing spans of those lists.	2021-12-11 14:17:47 +00:00
Timothy Flynn	a417c23de0	LibUnicode: Parse and generate per-locale day period ranges	2021-12-10 21:27:24 +00:00
Timothy Flynn	fa8e881cfa	LibUnicode: Parse and generate secondary day period symbols Generate morning2, afternoon2, evening2, and night2 symbols.	2021-12-10 21:27:24 +00:00
Timothy Flynn	76aab821f4	LibJS+LibUnicode: Rename some Unicode::DayPeriod values In the CLDR, there aren't "night" values, there are "night1" & "night2" values. This is for locales which use a different name for nighttime depending on the hour. For example, the ja locale uses "夜" between the hours of 19:00 and 23:00, and "夜中" between the hours of 23:00 and 04:00. Our CLDR parser is currently ignoring "night2", so this rename is to prepare for that. We could probably come up with better names, but in the end, the API in LibUnicode will be such that outside callers won't even see Night1, etc.	2021-12-10 21:27:24 +00:00
Timothy Flynn	9d4c4303fd	LibUnicode: Parse and generate date time range format patterns	2021-12-09 23:43:04 +00:00
Timothy Flynn	fe84a365c2	LibUnicode: Parse and generate format pattern skeletons Pattern skeletons are more or less the "key" of format patterns. Every format pattern is assigned a skeleton. Interval patterns (which are not yet parsed) are also assigned a skeleton - this is used to match them to an "owning" format pattern. So we will use the skeleton generated here to match format patterns at runtime with their available interval patterns. An alternative approach would be to append interval patterns directly to their owning format pattern, but this has some drawbacks: 1. Skeletons aren't totally unique. A skeleton may appear in both the "dateFormats" and "availableFormats" objects, in which case the same interval formats would be generated more than once. 2. Otherwise unique format patterns may only differ by the interval patterns assigned to them. This would cause the UniqueStorage for the format patterns to increase in size, impacting both compile times and libunicode.so size.	2021-12-09 23:43:04 +00:00
Timothy Flynn	b17c6ab661	LibUnicode: Fix typo in format pattern parser See: https://unicode.org/reports/tr35/tr35-dates.html#dfst-day	2021-12-09 23:43:04 +00:00
Timothy Flynn	b76e44f66f	LibUnicode: Parse and generate time zone names in long and short form	2021-12-08 11:29:36 +00:00
Timothy Flynn	2bbf8aa24c	LibUnicode: Generate era, month, weekday and day period calendar symbols The parsing in parse_calendar_symbols() might be a bit more verbose than it really needs to be, but it is to ensure the symbols are generated in a known order that we can control with enumerations.	2021-12-08 11:29:36 +00:00
Timothy Flynn	9f7c727720	LibJS+LibUnicode: Generate missing patterns with fractionalSecondDigits TR-35's Matching Skeleton algorithm dictates how user requests including fractional second digits should be handled when the CLDR format pattern does not include that field. When the format pattern contains {second}, but does not contain {fractionalSecondDigits}, generate a second pattern which appends "{decimal}{fractionalSecondDigits}" to the {second} field.	2021-12-08 11:29:36 +00:00
Timothy Flynn	6ace4000bf	LibJS+LibUnicode: Supply field type in CalendarPattern's for-each method Some callers will want different behavior depending on what field is being provided to the callback.	2021-12-08 11:29:36 +00:00
Timothy Flynn	80ea6e664d	LibUnicode: Do not set day period format length for {ampm} segments TR-35 does define lengths for {ampm}, but they are unused by ECMA-402. To the contrary, defining the day_period length for this segment will prevent BasicFormatMatcher from ever selecting a pattern that contains this segment. Instead, ECMA-402 will only use the short length for {ampm} segments.	2021-12-08 11:29:36 +00:00
Timothy Flynn	dfe8d02482	LibUnicode: Generate missing format patterns TR-35 describes how to combine date, time, and available formats with date-time format patterns to generate more available format patterns: https://unicode.org/reports/tr35/tr35-dates.html#Missing_Skeleton_Fields Use these steps to generate ~400 new patterns for each calendar. These are required for ECMA-402's BasicFormatMatcher to produce reasonable results.	2021-12-06 15:46:34 +01:00
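
Many of the commits above describe de-duplicating generated data by storing each unique value (a string, a structure, or a list) once and handing out an index to it. The C++ sketch below illustrates that general idea only. `UniqueStorageSketch`, `SymbolList`, and `example_unique_lists` are hypothetical names invented for this illustration, the sketch uses the standard library rather than AK, and the 1-based indexing is an assumption; it is not the code from the LibUnicode generators.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the idea behind UniqueStorage:
// store each distinct value once and refer to it by index.
template<typename T>
class UniqueStorageSketch {
public:
    // Returns a 1-based index for the value, storing it only the first
    // time it is seen. Reserving 0 for "no value" is an assumption here.
    std::size_t ensure(T const& value)
    {
        auto it = m_indices.find(value);
        if (it != m_indices.end())
            return it->second;

        m_values.push_back(value);
        std::size_t index = m_values.size();
        m_indices.emplace(value, index);
        return index;
    }

    // Looks up a stored value by its 1-based index.
    T const& get(std::size_t index) const { return m_values.at(index - 1); }

    std::size_t unique_count() const { return m_values.size(); }

private:
    std::vector<T> m_values;            // unique values, in first-seen order
    std::map<T, std::size_t> m_indices; // value -> 1-based index
};

// Lists can be de-duplicated the same way, as long as they are comparable.
using SymbolList = std::vector<std::string>;

inline std::size_t example_unique_lists()
{
    UniqueStorageSketch<SymbolList> storage;

    SymbolList wide { "AM", "PM" };
    SymbolList narrow { "a", "p" };

    auto first = storage.ensure(wide);
    auto second = storage.ensure(narrow);
    auto repeat = storage.ensure(wide); // duplicate list, no new storage

    assert(first == repeat);
    assert(first != second);
    return storage.unique_count();
}
```

In this sketch, a generator would call `ensure()` for every locale's list and emit only the stored unique values plus the per-locale indices, which is where size reductions of the kind quoted in the commit messages would come from.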


286 commits