serenity

mirror of https://github.com/RGBCube/serenity synced 2025-09-19 03:46:17 +00:00

Author	SHA1	Message	Date
Timothy Flynn	32c07bc6c3	LibUnicode: Generate per-locale data for the "noon" fixed day period Note that not all locales have this day period.	2022-07-21 20:36:03 +01:00
Timothy Flynn	0a6363d3e9	LibUnicode: Implement the range pattern processing algorithm This algorithm injects spacing around the range separator under certain conditions. For example, in en-US, the range [3, 5] should be formatted as "3–5" if unitless, but as "$3 – $5" for currency.	2022-07-20 22:30:16 +01:00
Timothy Flynn	b2709f161e	LibUnicode: Generate per-locale approximately & range separator symbols	2022-07-20 22:30:16 +01:00
Timothy Flynn	998f62936b	LibUnicode: Remove obsolete Unicode::get_default_number_system This has been superseded by get_preferred_keyword_value_for_locale, which doesn't require allocating a Vector just to return its first element.	2022-07-15 12:31:43 +02:00
Timothy Flynn	f8f7015419	LibUnicode: Generate a method to lookup locale-preferred keyword values	2022-07-15 12:31:43 +02:00
Timothy Flynn	80568d5776	LibUnicode: Generate a method to lookup available keyword values	2022-07-15 12:31:43 +02:00
Timothy Flynn	c2e5b20eb6	LibUnicode: Generate available values for the keywords co, kf, kn, hc This also ensures we only include values we actually support in the generated list of available values.	2022-07-15 12:31:43 +02:00
Timothy Flynn	a337b059dd	LibUnicode: Parse and generate per-locale plural ranges	2022-07-12 00:43:34 +01:00
Timothy Flynn	f672b4c151	LibUnicode: Remove now-unused Unicode::select_pattern_with_plurality	2022-07-08 20:33:52 +02:00
Timothy Flynn	232df4196b	LibUnicode: Replace NumberFormat::Plurality with Unicode::PluralCategory To prepare for using plural rules within number & duration format, this removes the NumberFormat::Plurality enumeration. This also adds PluralCategory::ExactlyZero & PluralCategory::ExactlyOne. These are used in locales like French, where PluralCategory::One really means any value from 0.00 to 1.99. PluralCategory::ExactlyOne means only the value 1, as the name implies. These exact rules are not known by the general plural rules; they are explicitly for number / currency format.	2022-07-08 20:33:52 +02:00
Timothy Flynn	cc5c707649	LibJS+LibUnicode: Do not generate the PluralCategory enum The PluralCategory enum is currently generated for plural rules. Instead of generating it, this moves the enum to the public LibUnicode header. While it was nice to auto-discover these values, they are well defined by TR-35, and we will need their values from within the number format code generator (which can't rely on the plural rules generator having run yet). Further, number format will require additional values in the enum that plural rules doesn't know about.	2022-07-08 20:33:52 +02:00
Timothy Flynn	bf85bf2a9e	LibJS: Use Intl.PluralRules within Intl.RelativeFormat The Polish test cases added here cover previous failures from test262, due to the way that 0 is specified to be "many" in Polish.	2022-07-08 11:51:54 +02:00
Timothy Flynn	8aeacccd82	LibUnicode: Generate a list of available plural categories per locale Separate lists are generated for cardinal and ordinal form.	2022-07-08 11:51:54 +02:00
Timothy Flynn	ea78bac36d	LibUnicode: Parse and generate per-locale plural rules from the CLDR Plural rules in the CLDR are of the form: "cs": { "pluralRule-count-one": "i = 1 and v = 0 @integer 1", "pluralRule-count-few": "i = 2..4 and v = 0 @integer 2~4", "pluralRule-count-many": "v != 0 @decimal 0.0~1.5, 10.0, 100.0 ...", "pluralRule-count-other": "@integer 0, 5~19, 100, 1000, 10000 ..." } The syntax is described here: https://unicode.org/reports/tr35/tr35-numbers.html#Plural_rules_syntax There are up to 2 sets of rules for each locale, a cardinal set and an ordinal set. The approach here is to generate a C++ function for each set of rules. Each condition in the rules (e.g. "i = 1 and v = 0") is transpiled to a C++ if-statement within its function. Then lookup tables are generated to match locales to their generated functions. NOTE: -Wno-parentheses-equality is added to the LibUnicodeData compile flags because the generated plural rules have lots of extra parentheses (because e.g. we need to selectively negate and combine rules). The code to generate only exactly the right number of parentheses is quite hairy, so this just tells the compiler to ignore the extras.	2022-07-08 11:51:54 +02:00
Timothy Flynn	12e7c0808a	LibUnicode: Generate per-region week data This includes: * The minimum number of days in a week for that week to count as the first week of a new year. * The day to be shown as the first day of the week in a calendar. * The start/end days of the weekend. Like the existing hour cycle data, week data is presented per-region in the CLDR, rather than per-locale. The method to add likely subtags to a locale to perform region lookups is the same. The lists of regions in the CLDR for hour cycle, minimum days, first day, and weekend days are quite different. So rather than changing the existing HourCycleRegion enum to a generic Region enum, we generate separate enums for each of the week data fields. This allows each lookup into these fields to remain simple array-based index access, without any "jumps" for regions that don't have CLDR data for a field.	2022-07-06 16:56:42 +02:00
Timothy Flynn	4868b888be	LibUnicode: Generate per-locale text layout information Currently contains just each locale's character order, but is set up to easily add other text layout fields from the CLDR if ECMA-402 eventually requires them.	2022-07-06 16:56:42 +02:00
DexesTTP	7ceeb74535	AK: Use an enum instead of a bool for String::replace(all_occurences) This commit has no behavior changes. In particular, this does not fix any of the wrong uses of the previous default parameter (which used to be 'false', meaning "only replace the first occurrence in the string"). It simply replaces the default uses with String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.	2022-07-06 11:12:45 +02:00
Idan Horowitz	573061e76c	LibUnicode: Extract the timeSeparator numeric symbol from CLDR This will be used by Intl.DurationFormat	2022-07-01 01:00:05 +03:00
Timothy Flynn	1f2542247f	LibUnicode: Upgrade to CLDR version 41.0.0 Release notes: https://cldr.unicode.org/index/downloads/cldr-41 Note that the HourCycleRegion enum now contains 272 entries, more than a u8 can represent, and thus needs to be bumped from u8 to u16.	2022-04-07 08:29:10 -04:00
Timothy Flynn	70ede2825e	LibUnicode: Use BCP 47 data to filter valid calendar names	2022-02-16 07:23:07 -05:00
Timothy Flynn	71d86261c3	LibUnicode: Use BCP 47 data to filter valid numbering system names There isn't too much of an effective difference here other than that the BCP 47 data contains some aliases we would otherwise not handle.	2022-02-16 07:23:07 -05:00
Timothy Flynn	63c3437274	LibUnicode: Use BCP 47 data to generate available calendars and numbers BCP 47 will be the single source of truth for known calendar and number system keywords, and their aliases (e.g. "gregory" is an alias for "gregorian"). Move the generation of available keywords to where we parse the BCP 47 data, so that hard-coded aliases may be removed from other generators.	2022-02-16 07:23:07 -05:00
Timothy Flynn	89ead8c00a	LibJS+LibUnicode: Parse Unicode keywords from the BCP 47 CLDR package We have a fair amount of hard-coded keywords / aliases that can now be replaced with real data from BCP 47. As a result, this also changes the awkward way we were previously generating keys. Before, we were more or less generating keywords as a CSV list of keys, e.g. for the "nu" key, we'd generate "latn,arab,grek" (ordered by locale preference). Then at runtime, we'd split on the comma. We now just generate spans of keywords directly.	2022-02-16 07:23:07 -05:00
thankyouverycool	0505e031f1	Meta+LibUnicode: Download and parse Unicode block properties This parses Blocks.txt for CharacterType properties and creates a global display array for use in apps.	2022-02-15 10:13:19 -05:00
Idan Horowitz	4967bcd4ce	LibUnicode: Implement sentence segmentation	2022-01-31 21:05:04 +02:00
Idan Horowitz	a593a5c8ab	LibUnicode: Implement word segmentation	2022-01-31 21:05:04 +02:00
Idan Horowitz	58b0eed6a7	LibUnicode: Implement grapheme segmentation	2022-01-31 21:05:04 +02:00
Idan Horowitz	2d50c08f34	LibUnicode: Download and parse {Grapheme,Word,Sentence} break props	2022-01-31 21:05:04 +02:00
Timothy Flynn	6efbafa6e0	Everywhere: Update copyrights with my new serenityos.org e-mail :^)	2022-01-31 18:23:22 +00:00
Timothy Flynn	bb0f548614	LibUnicode: Generate a list of available currencies	2022-01-31 00:32:41 +00:00
Timothy Flynn	481ced53d8	LibUnicode: Generate a list of available numbering systems	2022-01-31 00:32:41 +00:00
Timothy Flynn	ebd33e580b	LibUnicode: Generate a list of available calendars	2022-01-31 00:32:41 +00:00
Timothy Flynn	f8892fdea2	LibUnicode: Templatize our naive implementation of plurality selection As we didn't (and still don't) have Intl.PluralRules when we implemented Intl.NumberFormat, we use a locale-unaware basic implementation to pick a pattern based on a number's value. Templatize this method for now to work with other format-like structures (will be used for relative-time formatting).	2022-01-27 21:16:44 +00:00
Timothy Flynn	789f093b2e	LibUnicode: Parse and generate relative-time format patterns Relative-time format patterns are of one of two forms: * Tensed - refer to the past or the future, e.g. "N years ago" or "in N years". * Numbered - refer to a specific numeric value, e.g. "in 1 year" becomes "next year" and "in 0 years" becomes "this year". In ECMA-402, tensed and numbered refer to the numeric formatting options of "always" and "auto", respectively.	2022-01-27 21:16:44 +00:00
Timothy Flynn	2d2f713426	LibUnicode: Generate per-locale minimum grouping digit values Previously, we were breaking up digits into groups without regard for the locale's minimumGroupingDigits value in the CLDR. This value is 1 in most locales, but is 2 in locales such as pl-PL. What this means is that in those locales, the group separator should only be inserted if the thousands group has at least 2 digits. So 1000 is formatted as "1,000" in en-US, but "1000" in pl-PL. And 10000 is "10,000" in en-US and "10 000" in pl-PL.	2022-01-27 20:30:52 +00:00
Timothy Flynn	bced4e9324	LibJS+LibUnicode: Convert Intl.ListFormat to use Unicode::Style Remove ListFormat's own definition of the Style enum, which was further duplicated by a generated ListPatternStyle enum with the same values.	2022-01-25 19:02:59 +00:00
Timothy Flynn	e261132e8b	LibUnicode: Add helper methods to convert a Style to and from a string This conversion is duplicated a few times in our Intl implementation, so let's just define these once and be done with it.	2022-01-25 19:02:59 +00:00
Timothy Flynn	7f6edb7976	LibUnicode: Remove the Unicode::Style::Numeric value It is unused.	2022-01-25 19:02:59 +00:00
Timothy Flynn	0a4430fc41	LibJS+LibTimeZone+LibUnicode: Remove direct linkage to LibTimeZone This is no longer needed now that LibTimeZone is included within LibC. Remove the direct linkage so that others do not mistakenly copy-paste the CMakeLists text elsewhere.	2022-01-23 12:48:26 +00:00
Timothy Flynn	4400150cd2	LibJS+LibUnicode: Return the appropriate time zone name depending on DST	2022-01-19 21:20:41 +00:00
Timothy Flynn	70f49d0696	LibJS+LibTimeZone+LibUnicode: Indicate whether a time zone is in DST Return whether the time zone is in DST during the provided time from TimeZone::get_time_zone_offset.	2022-01-19 21:20:41 +00:00
Timothy Flynn	701b7810ba	LibUnicode: Generate code point abbreviations	2022-01-18 15:13:25 +00:00
Timothy Flynn	c86f7a675d	LibUnicode: Do not limit language display names to known locales Currently, the UnicodeLocale generator collects a list of known locales from the CLDR before processing language display names. For each locale, the identifier is broken into language, script, and region subtags, and we create a list of seen languages. When processing display names, we skip languages we hadn't seen in that first step. This is insufficient for language display names like "en-GB", which do not have a locale entry in the CLDR, and thus are skipped. So instead, create the list of known languages by actually reading through the list of languages which have a display name.	2022-01-13 23:05:31 +01:00
Timothy Flynn	b0671ceb74	LibUnicode: Add a method to combine locale subtags into a display string This is just a convenience wrapper around the underlying generated APIs.	2022-01-13 23:05:31 +01:00
Timothy Flynn	91acc2e9c5	LibUnicode: Parse and generate locale display patterns These patterns indicate how to display locale strings when that locale contains multiple subtags. For example, "en-US" would be displayed as "English (United States)".	2022-01-13 23:05:31 +01:00
Timothy Flynn	8126cb2545	LibJS+LibUnicode: Remove unnecessary locale currency mapping wrapper Before LibUnicode generated methods were weakly linked, we had a public method (get_locale_currency_mapping) for retrieving currency mappings. That method invoked one of several style-specific methods that only existed in the generated UnicodeLocale. One caveat of weakly linked functions is that every such function must have a public declaration. The result is that each of those style-specific methods is declared publicly, which makes the wrapper redundant because it is just as easy to invoke the method for the desired style.	2022-01-13 13:43:57 +01:00
Timothy Flynn	0d75949827	LibUnicode: Parse and generate locale display names for date fields	2022-01-13 13:43:57 +01:00
Timothy Flynn	7f162c471d	LibUnicode: Parse and generate locale display names for calendars Note there's a bit of an unfortunate duplication in the calendar enum generated by UnicodeLocale and the existing enum generated by UnicodeDateTimeFormat. The former contains every calendar known to the CLDR, whereas the latter contains the calendars we've actually parsed for DateTimeFormat (currently only Gregorian). The new enum generated here can be removed once DateTimeFormat knows about all calendars.	2022-01-13 13:43:57 +01:00
Timothy Flynn	c5138f0f2b	LibUnicode: Parse number system digits from the CLDR We had a hard-coded table of number system digits copied from ECMA-402. Turns out these digits are in the CLDR, so let's parse the digits from there instead of hard-coding them.	2022-01-12 10:49:07 +01:00
Timothy Flynn	d50f5e14f8	LibUnicode: Fall back to GMT offset when a time zone name is unavailable The following table in TR-35 includes a web of fallback rules when the requested time zone style is unavailable: https://unicode.org/reports/tr35/tr35-dates.html#dfst-zone Conveniently, the subset of styles supported by ECMA-402 (and therefore LibUnicode) all either fall back to GMT offset or to a style that is unsupported but itself falls back to GMT offset.	2022-01-11 23:56:35 +01:00
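Commit 0a6363d3e9 describes injecting spaces around the range separator under certain conditions. The sketch below is a simplified heuristic, not the TR-35 algorithm or the LibUnicode code: it assumes spacing is needed whenever either formatted endpoint contains anything other than ASCII digits, `.` or `,` (such as a currency symbol). `has_non_numeric_affix` and `format_range` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// True if the formatted value contains anything besides ASCII digits and
// '.'/',' (e.g. a currency symbol or unit). Simplified; TR-35 is more nuanced.
bool has_non_numeric_affix(std::string_view formatted)
{
    for (char c : formatted) {
        bool numeric = (c >= '0' && c <= '9') || c == '.' || c == ',';
        if (!numeric)
            return true;
    }
    return false;
}

// Joins two already-formatted endpoints with the locale's range separator,
// padding the separator with spaces when either endpoint has a non-numeric affix.
std::string format_range(std::string_view lower, std::string_view upper, std::string_view separator)
{
    bool needs_spacing = has_non_numeric_affix(lower) || has_non_numeric_affix(upper);
    std::string result(lower);
    if (needs_spacing)
        result += ' ';
    result += separator;
    if (needs_spacing)
        result += ' ';
    result += upper;
    return result;
}
```

With an en dash separator, `format_range("3", "5", "\u2013")` yields "3–5" while `format_range("$3", "$5", "\u2013")` yields "$3 – $5", matching the en-US examples in the commit message.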
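Commit 232df4196b distinguishes a locale's general `One` category from the exact `ExactlyOne` / `ExactlyZero` categories. A minimal sketch of how a formatter might prefer an exact match, then the general category, then `Other`. The French rule is simplified here to 0 <= n < 2, and the helper names and pattern strings are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class PluralCategory { Other, Zero, One, Two, Few, Many, ExactlyZero, ExactlyOne };

// Simplified French cardinal rule: "one" covers every value in [0, 2).
PluralCategory french_cardinal(double n)
{
    return (n >= 0 && n < 2) ? PluralCategory::One : PluralCategory::Other;
}

using PatternMap = std::map<PluralCategory, std::string>;

// Returns a pointer to the pattern for `category`, or nullptr if absent.
std::string const* find_pattern(PatternMap const& patterns, PluralCategory category)
{
    auto it = patterns.find(category);
    return it == patterns.end() ? nullptr : &it->second;
}

// Prefers an exact-value category, then the general category, then Other.
std::string select_pattern(PatternMap const& patterns, double value, PluralCategory general)
{
    if (value == 0) {
        if (auto const* pattern = find_pattern(patterns, PluralCategory::ExactlyZero))
            return *pattern;
    }
    if (value == 1) {
        if (auto const* pattern = find_pattern(patterns, PluralCategory::ExactlyOne))
            return *pattern;
    }
    if (auto const* pattern = find_pattern(patterns, general))
        return *pattern;
    if (auto const* pattern = find_pattern(patterns, PluralCategory::Other))
        return *pattern;
    return {};
}
```

For French, 1 and 1.5 both fall in the general `One` category, but only 1 picks up an `ExactlyOne` pattern when the data provides one.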
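Commit ea78bac36d transpiles each CLDR plural condition into a C++ if-statement and then maps locales to the generated functions. The following is a hand-written sketch of what such output could look like for the "cs" cardinal rules quoted in that commit, plus the English cardinal "one" rule (`i = 1 and v = 0`); it is not the actual generator output. Per TR-35, `i` is the integer part and `v` the number of visible fraction digits. The struct, function, and table names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

enum class PluralCategory { Other, Zero, One, Two, Few, Many };

// The two TR-35 plural operands used by the rules below.
struct PluralOperands {
    std::uint64_t i; // integer digits of the value
    std::uint32_t v; // number of visible fraction digits (with trailing zeros)
};

// "cs" cardinal rules, one if-statement per condition.
PluralCategory cs_cardinal(PluralOperands ops)
{
    if ((ops.i == 1) && (ops.v == 0)) // i = 1 and v = 0
        return PluralCategory::One;
    if ((ops.i >= 2 && ops.i <= 4) && (ops.v == 0)) // i = 2..4 and v = 0
        return PluralCategory::Few;
    if (ops.v != 0) // v != 0
        return PluralCategory::Many;
    return PluralCategory::Other;
}

// "en" cardinal rules.
PluralCategory en_cardinal(PluralOperands ops)
{
    if ((ops.i == 1) && (ops.v == 0)) // i = 1 and v = 0
        return PluralCategory::One;
    return PluralCategory::Other;
}

using PluralFunction = PluralCategory (*)(PluralOperands);

struct LocaleRules {
    std::string_view locale;
    PluralFunction cardinal;
};

// Lookup table matching locales to their generated functions.
constexpr LocaleRules s_cardinal_rules[] {
    { "cs", cs_cardinal },
    { "en", en_cardinal },
};

// Falls back to "other" for locales without rules.
PluralCategory select_cardinal(std::string_view locale, PluralOperands ops)
{
    for (auto const& entry : s_cardinal_rules) {
        if (entry.locale == locale)
            return entry.cardinal(ops);
    }
    return PluralCategory::Other;
}
```

For example, "1.5" has i = 1 and v = 1, so Czech selects `Many`, consistent with the `@decimal 0.0~1.5` sample in the quoted rule. The redundant parentheses mirror the commit's note about generated code and `-Wno-parentheses-equality`.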
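Commit 12e7c0808a generates a separate region enum for each week-data field so that every lookup stays a plain array index. A trimmed sketch of that layout, assuming only two regions per field. The values (world default of Monday and 1 minimum day, Sunday in the US, 4 minimum days in Germany) follow CLDR week data; the enum and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Weekday : std::uint8_t { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };

// One enum per field: each lists only the regions that field has data for,
// plus the world default ("001" in the CLDR).
enum class MinimumDaysRegion : std::uint8_t { World, DE };
enum class FirstDayRegion : std::uint8_t { World, US };

// Dense arrays indexed directly by the enums above.
constexpr std::array<std::uint8_t, 2> s_minimum_days { 1, 4 };
constexpr std::array<Weekday, 2> s_first_day { Weekday::Monday, Weekday::Sunday };

constexpr std::uint8_t minimum_days(MinimumDaysRegion region)
{
    return s_minimum_days[static_cast<std::size_t>(region)];
}

constexpr Weekday first_day(FirstDayRegion region)
{
    return s_first_day[static_cast<std::size_t>(region)];
}
```

Germany appears only in `MinimumDaysRegion` because its first day matches the world default, which is why a single shared region enum would leave gaps in some arrays.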
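A minimal `std::string` sketch of replacing a boolean flag with an enum, in the spirit of commit 7ceeb74535. This is not AK's `String::replace`; `replace_text` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class ReplaceMode { FirstOnly, All };

// Replaces the first or every occurrence of `needle` in `haystack`.
std::string replace_text(std::string haystack, std::string const& needle, std::string const& replacement, ReplaceMode mode)
{
    if (needle.empty())
        return haystack;
    std::size_t position = 0;
    while ((position = haystack.find(needle, position)) != std::string::npos) {
        haystack.replace(position, needle.size(), replacement);
        // Skip past the inserted text so it is never matched again.
        position += replacement.size();
        if (mode == ReplaceMode::FirstOnly)
            break;
    }
    return haystack;
}
```

At a call site, `ReplaceMode::FirstOnly` states the intent that a bare `false` used to hide, which makes the incorrect uses mentioned in the commit easier to spot later.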
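Commit 89ead8c00a replaces generated comma-separated keyword strings, which were split at runtime, with generated spans. A C++17 sketch contrasting the two approaches; `split_keywords`, `KeywordSpan`, and `nu_keywords` are hypothetical, and "latn,arab,grek" is the commit's own example rather than real per-locale data.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>
#include <vector>

// Old approach: one CSV string per key, split on every lookup (allocates).
std::vector<std::string_view> split_keywords(std::string_view csv)
{
    std::vector<std::string_view> keywords;
    std::size_t start = 0;
    while (true) {
        std::size_t comma = csv.find(',', start);
        if (comma == std::string_view::npos) {
            keywords.push_back(csv.substr(start));
            break;
        }
        keywords.push_back(csv.substr(start, comma - start));
        start = comma + 1;
    }
    return keywords;
}

// New approach: keywords are generated as a constant array and exposed
// through a non-owning view, so lookups never allocate.
struct KeywordSpan {
    std::string_view const* data;
    std::size_t size;

    std::string_view operator[](std::size_t index) const { return data[index]; }
};

constexpr std::string_view s_nu_keywords[] { "latn", "arab", "grek" };

constexpr KeywordSpan nu_keywords()
{
    return { s_nu_keywords, std::size(s_nu_keywords) };
}
```

With the span form, the locale-preferred value is simply element 0, which is the allocation-free lookup that commit 998f62936b refers to.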
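An English-only, year-only sketch of the two forms described in commit 789f093b2e. Here `Numeric::Auto` selects the named phrases for -1, 0, and 1, and `Numeric::Always` uses the tensed pattern. The strings are hard-coded for illustration (real patterns are per-locale CLDR data), and the names are hypothetical.

```cpp
#include <cassert>
#include <string>

enum class Numeric { Always, Auto };

// Formats a relative number of years in English (hypothetical helper).
std::string format_relative_years(int value, Numeric numeric)
{
    // Numbered form: only with numeric "auto" and when a named phrase exists.
    if (numeric == Numeric::Auto) {
        if (value == -1)
            return "last year";
        if (value == 0)
            return "this year";
        if (value == 1)
            return "next year";
    }
    // Tensed form: "in N years" for the future, "N years ago" for the past.
    int magnitude = value < 0 ? -value : value;
    std::string count = std::to_string(magnitude) + (magnitude == 1 ? " year" : " years");
    if (value < 0)
        return count + " ago";
    return "in " + count;
}
```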
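The rule in commit 2d2f713426 can be restated as: with a primary group size of 3, insert separators only when the integer part has at least 3 + minimumGroupingDigits digits. A sketch of that check, assuming plain 3-digit grouping (no secondary group size) and an already-formatted string of integer digits. `group_digits` is hypothetical, and the plain space in the test stands in for pl-PL's actual group separator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Groups integer digits by 3, honoring the locale's minimum grouping digits.
std::string group_digits(std::string const& digits, std::size_t minimum_grouping_digits, std::string const& separator)
{
    constexpr std::size_t group_size = 3;
    // The leading part (left of the first separator) would be too short.
    if (digits.size() < group_size + minimum_grouping_digits)
        return digits;

    std::size_t first_group = digits.size() % group_size;
    if (first_group == 0)
        first_group = group_size;

    std::string result = digits.substr(0, first_group);
    for (std::size_t i = first_group; i < digits.size(); i += group_size) {
        result += separator;
        result += digits.substr(i, group_size);
    }
    return result;
}
```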
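A sketch of the Style helpers added in commit e261132e8b, assuming the enum holds only Long, Short, and Narrow after commit 7f6edb7976 removed Numeric. The function names are hypothetical, not LibUnicode's actual API.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

enum class Style { Long, Short, Narrow };

std::string_view style_to_string(Style style)
{
    switch (style) {
    case Style::Long:
        return "long";
    case Style::Short:
        return "short";
    case Style::Narrow:
        return "narrow";
    }
    return {}; // not reached for valid enum values
}

// Returns std::nullopt for strings that do not name a style.
std::optional<Style> style_from_string(std::string_view name)
{
    if (name == "long")
        return Style::Long;
    if (name == "short")
        return Style::Short;
    if (name == "narrow")
        return Style::Narrow;
    return std::nullopt;
}
```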
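Commit 91acc2e9c5's "en-US" to "English (United States)" example can be sketched with two patterns in the style of CLDR's English locale display data: "{0} ({1})" to attach the other subtags and "{0}, {1}" to join them. The subtag display names are assumed to be looked up already, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Substitutes {0} and {1} in a pattern.
std::string apply_pattern(std::string_view pattern, std::string const& arg0, std::string const& arg1)
{
    std::string result;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern.substr(i, 3) == "{0}") {
            result += arg0;
            i += 2;
        } else if (pattern.substr(i, 3) == "{1}") {
            result += arg1;
            i += 2;
        } else {
            result += pattern[i];
        }
    }
    return result;
}

constexpr std::string_view s_locale_pattern = "{0} ({1})";
constexpr std::string_view s_locale_separator = "{0}, {1}";

// Combines a language name with the display names of its other subtags.
std::string format_locale_display_name(std::string const& language, std::vector<std::string> const& other_subtags)
{
    if (other_subtags.empty())
        return language;
    std::string joined = other_subtags.front();
    for (std::size_t i = 1; i < other_subtags.size(); ++i)
        joined = apply_pattern(s_locale_separator, joined, other_subtags[i]);
    return apply_pattern(s_locale_pattern, language, joined);
}
```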
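A sketch of substituting a numbering system's digits for ASCII digits, as enabled by parsing digits from the CLDR in commit c5138f0f2b. The ten "arab" digits (U+0660 to U+0669) stand in for a parsed table. Other characters, including the decimal separator, pass through unchanged here even though a real formatter would also localize symbols. The names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Digits of the "arab" numbering system, U+0660 through U+0669.
constexpr char32_t s_arab_digits[10] {
    U'\u0660', U'\u0661', U'\u0662', U'\u0663', U'\u0664',
    U'\u0665', U'\u0666', U'\u0667', U'\u0668', U'\u0669',
};

// Maps each ASCII digit to the numbering system's digit; any other
// character is copied through unchanged.
std::u32string replace_digits(std::string_view ascii, char32_t const (&digits)[10])
{
    std::u32string result;
    for (char c : ascii) {
        if (c >= '0' && c <= '9')
            result += digits[c - '0'];
        else
            result += static_cast<char32_t>(static_cast<unsigned char>(c));
    }
    return result;
}
```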
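A sketch of the fallback in commit d50f5e14f8: use a localized time zone name when one exists, otherwise format the offset from GMT. The short form assumed here ("GMT", "GMT-8", "GMT+5:30") follows the English CLDR style; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Short localized GMT format: hours without a leading zero, minutes only if nonzero.
std::string format_gmt_offset(std::int64_t offset_seconds)
{
    if (offset_seconds == 0)
        return "GMT";
    char sign = offset_seconds < 0 ? '-' : '+';
    std::int64_t magnitude = offset_seconds < 0 ? -offset_seconds : offset_seconds;
    std::int64_t hours = magnitude / 3600;
    std::int64_t minutes = (magnitude % 3600) / 60;

    std::string result = "GMT";
    result += sign;
    result += std::to_string(hours);
    if (minutes != 0) {
        result += ':';
        if (minutes < 10)
            result += '0';
        result += std::to_string(minutes);
    }
    return result;
}

// Prefers the localized name and falls back to the GMT offset.
std::string time_zone_display_name(std::optional<std::string> const& localized_name, std::int64_t offset_seconds)
{
    if (localized_name)
        return *localized_name;
    return format_gmt_offset(offset_seconds);
}
```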


209 commits