serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-30 13:12:44 +00:00

Author	SHA1	Message	Date
Timothy Flynn	63c3437274	LibUnicode: Use BCP 47 data to generate available calendars and numbers BCP 47 will be the single source of truth for known calendar and number system keywords, and their aliases (e.g. "gregory" is an alias for "gregorian"). Move the generation of available keywords to where we parse the BCP 47 data, so that hard-coded aliases may be removed from other generators.	2022-02-16 07:23:07 -05:00
Timothy Flynn	89ead8c00a	LibJS+LibUnicode: Parse Unicode keywords from the BCP 47 CLDR package We have a fair amount of hard-coded keywords / aliases that can now be replaced with real data from BCP 47. As a result, this also changes the awkward way we were previously generating keys. Before, we were more or less generating keywords as a CSV list of keys, e.g. for the "nu" key, we'd generate "latn,arab,grek" (ordered by locale preference). Then at runtime, we'd split on the comma. We now just generate spans of keywords directly.	2022-02-16 07:23:07 -05:00
Timothy Flynn	6efbafa6e0	Everywhere: Update copyrights with my new serenityos.org e-mail :^)	2022-01-31 18:23:22 +00:00
Timothy Flynn	bced4e9324	LibJS+LibUnicode: Convert Intl.ListFormat to use Unicode::Style Remove ListFormat's own definition of the Style enum, which was further duplicated by a generated ListPatternStyle enum with the same values.	2022-01-25 19:02:59 +00:00
Timothy Flynn	e261132e8b	LibUnicode: Add helper methods to convert a Style to and from a string This conversion is duplicated a few times in our Intl implementation, so let's just define these once and be done with it.	2022-01-25 19:02:59 +00:00
Timothy Flynn	7f6edb7976	LibUnicode: Remove the Unicode::Style::Numeric value It is unused.	2022-01-25 19:02:59 +00:00
Timothy Flynn	b0671ceb74	LibUnicode: Add a method to combine locale subtags into a display string This is just a convenience wrapper around the underlying generated APIs.	2022-01-13 23:05:31 +01:00
Timothy Flynn	91acc2e9c5	LibUnicode: Parse and generate locale display patterns These patterns indicate how to display locale strings when that locale contains multiple subtags. For example, "en-US" would be displayed as "English (United States)".	2022-01-13 23:05:31 +01:00
Timothy Flynn	8126cb2545	LibJS+LibUnicode: Remove unnecessary locale currency mapping wrapper Before LibUnicode generated methods were weakly linked, we had a public method (get_locale_currency_mapping) for retrieving currency mappings. That method invoked one of several style-specific methods that only existed in the generated UnicodeLocale. One caveat of weakly linked functions is that every such function must have a public declaration. The result is that each of those styled methods is declared publicly, which makes the wrapper redundant because it is just as easy to invoke the method for the desired style.	2022-01-13 13:43:57 +01:00
Timothy Flynn	0d75949827	LibUnicode: Parse and generate locale display names for date fields	2022-01-13 13:43:57 +01:00
Timothy Flynn	7f162c471d	LibUnicode: Parse and generate locale display names for calendars Note there's a bit of an unfortunate duplication in the calendar enum generated by UnicodeLocale and the existing enum generated by UnicodeDateTimeFormat. The former contains every calendar known to the CLDR, whereas the latter contains the calendars we've actually parsed for DateTimeFormat (currently only Gregorian). The new enum generated here can be removed once DateTimeFormat knows about all calendars.	2022-01-13 13:43:57 +01:00
Timothy Flynn	f576142fe8	LibJS+LibUnicode: Convert UnicodeLocale to link with weak symbols	2022-01-04 22:49:43 +00:00
Timothy Flynn	97508b74eb	LibUnicode: Remove declaration of function which moved to another header Unicode::get_number_system_symbol is declared in UnicodeNumberFormat and defined in UnicodeNumberFormat.cpp.	2021-12-21 13:09:49 -08:00
Timothy Flynn	914675e826	LibJS+LibUnicode: Separate number formatting methods from Locale.h Currently, we generate separate data files for locale and number format related tables/methods, but provide public accessors for all of the data in one Locale.h file. Rather than continuing this trend for date-time, relative time, etc. formatting, it's a bit easier to reason about if the public accessors are also in separate files.	2021-11-29 22:48:46 +00:00
Timothy Flynn	cafb717486	LibUnicode: Parse and generate CLDR unit data for Intl.NumberFormat The units data is in another CLDR package, cldr-units.	2021-11-16 23:14:09 +00:00
Timothy Flynn	04b8b87c17	LibJS+LibUnicode: Support multiple identifiers within format pattern This wasn't the case for compact patterns, but unit patterns can contain multiple (up to 2, really) identifiers that must each be recognized by LibJS. Each generated NumberFormat object now stores an array of identifiers parsed. The format pattern itself is encoded with the index into this array for that identifier, e.g. the compact format string "0K" will become "{number}{compactIdentifier:0}".	2021-11-16 23:14:09 +00:00
Timothy Flynn	3b68370212	LibJS+LibUnicode: Rename the generated compact_identifier to identifier This field is currently used to store the StringView into the compact name/symbol in the format string. Units will need to store a similar field, so rename the field to be more generic, and extract the parser for it.	2021-11-16 23:14:09 +00:00
Timothy Flynn	6d34a0b4e8	LibJS+LibUnicode: Rename method to select a NumberFormat plurality Instead of currency pattern lookups within select_currency_unit_pattern, rename the method to select_pattern_with_plurality and accept any list of patterns. This method will be needed for units.	2021-11-16 23:14:09 +00:00
Timothy Flynn	1f546476d5	LibJS+LibUnicode: Fix computation of compact pattern exponents The compact scale of each formatting rule was precomputed in commit: `be69eae651` Using the formula: compact scale = magnitude - pattern scale This computation was off-by-one. For example, consider the format key "10000-count-one", which maps to "00 thousand" in en-US. What we are really after is the exponent that best represents the string "thousand" for values greater than 10000 and less than 100000 (the next format key). We were previously doing: log10(10000) - "00 thousand".count("0") = 2 Which clearly isn't what we want. Instead, if we do: log10(10000) + 1 - "00 thousand".count("0") = 3 We get the correct exponent for each format key for each locale. This commit also renames the generated variable from "compact_scale" to "exponent" to match the terminology used in ECMA-402.	2021-11-16 00:56:55 +00:00
Timothy Flynn	48d5684780	LibUnicode: Parse compact identifiers and replace them with a format key For example, in en-US, the decimal, long compact pattern for numbers between 10,000 and 100,000 is "00 thousand". In that pattern, "thousand" is the compact identifier, and the generated format pattern is now "{number} {compactIdentifier}". This also generates that identifier as its own field in the NumberFormat structure.	2021-11-16 00:56:55 +00:00
Timothy Flynn	30fbb7d9cd	LibUnicode: Parse and generate scientific formatting rules	2021-11-14 17:00:35 +00:00
Timothy Flynn	3b7f5af042	LibUnicode: Generate primary and secondary number grouping sizes Most locales have a single grouping size (the number of integer digits to be written before inserting a grouping separator). However some have a primary and secondary size. We parse the primary size as the size used for the least significant integer digits, and the secondary size for the most significant.	2021-11-14 10:35:19 +00:00
Timothy Flynn	c65dea64bd	LibJS+LibUnicode: Don't remove {currency} keys in GetNumberFormatPattern In order to implement Intl.NumberFormat.prototype.formatToParts, do not replace {currency} keys in the format pattern before ECMA-402 tells us to. Otherwise, the array returned by formatToParts will not contain the expected currency key. Early replacement was done to avoid resolving the currency display more than once, as it involves a couple of round trips to search through LibUnicode data. So this adds a non-standard method to NumberFormat to do this resolution and cache the result. Another side effect of this change is that LibUnicode must replace unit format patterns of the form "{0} {1}" during code generation. These were previously skipped during code generation because LibJS would just replace the keys with the currency display at runtime. But now that the currency display injection is delayed, any {0} or {1} keys in the format pattern will cause PartitionNumberPattern to abort.	2021-11-13 19:01:25 +00:00
Timothy Flynn	a701ed52fc	LibJS+LibUnicode: Fully implement currency number formatting Currencies are a bit strange; the layout of currency data in the CLDR is not particularly compatible with what ECMA-402 expects. For example, the currency format in the "en" and "ar" locales for the Latin script are: en: "¤#,##0.00" ar: "¤\u00A0#,##0.00" Note how the "ar" locale has a non-breaking space after the currency symbol (¤), but "en" does not. This does not mean that this space will appear in the "ar"-formatted string, nor does it mean that a space won't appear in the "en"-formatted string. This is a runtime decision based on the currency display chosen by the user ("$" vs. "USD" vs. "US dollar") and other rules in the Unicode TR-35 spec. ECMA-402 shies away from the nuances here with "implementation-defined" steps. LibUnicode will store the data parsed from the CLDR however it is presented; making decisions about spacing, etc. will occur at runtime based on user input.	2021-11-13 11:52:45 +00:00
Timothy Flynn	9421d5c0cf	LibUnicode: Generate currency unit-pattern number formats These are used when formatting a number as currency with a display option of "name" (e.g. for USD, the name is "US Dollars" in en-US). These patterns appear in the CLDR in a different manner than other number formats that are pluralized. They are of the form "{0} {1}", therefore do not undergo subpattern replacements.	2021-11-13 11:52:45 +00:00
Timothy Flynn	39e031c4dd	LibJS+LibUnicode: Generate all styles of currency localizations Currently, LibUnicode is only parsing and generating the "long" style of currency display names. However, the CLDR contains "short" and "narrow" forms as well that need to be handled. Parse these, and update LibJS to actually respect the "style" option provided by the user for displaying currencies with Intl.DisplayNames. Note: There are some discrepancies between the engines on how style is handled. In particular, running: new Intl.DisplayNames('en', {type:'currency', style:'narrow'}).of('usd') Gives: SpiderMonkey: "USD" V8: "US Dollar" LibJS: "$" And running: new Intl.DisplayNames('en', {type:'currency', style:'short'}).of('usd') Gives: SpiderMonkey: "$" V8: "US Dollar" LibJS: "$" My best guess is V8 isn't handling style, and just returning the long form (which is what LibJS did before this commit). And SpiderMonkey can handle some styles, but if they don't have a value for the requested style, they fall back to the canonicalized code passed into of().	2021-11-13 11:52:45 +00:00
Timothy Flynn	be69eae651	LibUnicode: Precompute the compact scale of each number formatting rule This will be needed for the ComputeExponentForMagnitude AO for compact formatting, namely step 5b: Let exponent be an implementation- and locale-dependent (ILD) integer by which to scale a number of the given magnitude in compact notation for the current locale.	2021-11-12 09:17:08 +00:00
Timothy Flynn	230b133ee3	LibUnicode: Parse number formats into zero/positive/negative patterns A number formatting pattern in the CLDR contains one or two entries, delimited by a semi-colon. Previously, LibUnicode was just storing the entire pattern as one string. This changes the generator to split the pattern on that delimiter and generate the 3 unique patterns expected by ECMA-402. The rules for generating the 3 patterns are as follows: * If the pattern contains 1 entry, it is the zero pattern. The positive pattern is the zero pattern prepended with {plusSign}. The negative pattern is the zero pattern prepended with {minusSign}. * If the pattern contains 2 entries, the first is the zero pattern, and the second is the negative pattern. The positive pattern is the zero pattern prepended with {plusSign}.	2021-11-12 09:17:08 +00:00
Timothy Flynn	1244ebcd4f	LibUnicode: Parse and generate standard accounting formatting rules Also known as "currency-accounting" in some CLDR documentation.	2021-11-12 09:17:08 +00:00
Timothy Flynn	967afc1b84	LibUnicode: Parse and generate standard currency formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	bffd73e0d4	LibUnicode: Parse and generate standard decimal formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	feb8c22a62	LibUnicode: Parse and generate standard percentage formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	4317a1b552	LibUnicode: Parse and generate compact currency formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	604a596c90	LibUnicode: Parse and generate compact decimal formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	12b468a588	LibUnicode: Begin parsing and generating locale number systems The number system data in the CLDR contains information on how to format numbers in a locale-dependent manner. Start parsing this data, beginning with numeric symbol strings. For example the symbol NaN maps to "NaN" in the en-US locale, and "非數值" in the zh-Hant locale.	2021-11-12 09:17:08 +00:00
Timothy Flynn	3ae4ff109f	LibUnicode: Extract canonicalization of Unicode extension values LibJS will need to canonicalize Unicode extension values, so extract the lambda that was doing this work to its own function. This also changes the helpers it invokes to take the provided key as a StringView because we don't need (and won't always have) full String objects here.	2021-09-11 11:05:50 +01:00
Timothy Flynn	b1d4bcf364	LibUnicode: Generate numeric keyword values for each locale This is needed for Intl.NumberFormat's usage of the ResolveLocale AO, where the [[RelevantExtensionKeys]] internal slot will be "nu".	2021-09-11 11:05:50 +01:00
Timothy Flynn	4f2bcebe74	LibUnicode+LibJS: Store locale keyword values as a single string Previously, LibUnicode would store the values of a keyword as a Vector. For example, the locale "en-u-ca-abc-def" would have its keyword "ca" stored as {"abc", "def"}. Then, canonicalization would occur on each of the elements in that Vector. This is incorrect because, for example, the keyword value "true" should only be dropped if that is the entire value. That is, the canonical form of "en-u-kb-true" is "en-u-kb", but "en-u-kb-abc-true" does not change for canonicalization. However, we would canonicalize that locale as "en-u-kb-abc".	2021-09-08 21:08:48 +01:00
Timothy Flynn	3f64a14e06	LibUnicode: Parse and generate the Unicode locale list patterns dataset This data informs consumers how to join lists of values. For example, in en-US, the list ["a", "b", "c"] formatted to a string should become "a, b, and c".	2021-09-06 23:49:56 +01:00
Timothy Flynn	40ea659282	LibUnicode+LibJS: Return removed extensions from remove_extension_type Some callers will need to hold onto the removed extensions.	2021-09-06 15:24:27 +01:00
Timothy Flynn	12ae0a44d7	LibUnicode: Add public wrapper for the generated locale_from_string	2021-09-06 15:24:27 +01:00
Timothy Flynn	a77f323dfb	LibUnicode: Implement the Remove Likely Subtags method Unlike Add Likely Subtags, this method doesn't require generated data. Instead, it is defined in terms of Add Likely Subtags.	2021-09-04 13:51:40 +01:00
Timothy Flynn	e6a2ab1202	LibUnicode: Generate an implementation of the Add Likely Subtags method	2021-09-04 13:51:40 +01:00
Timothy Flynn	ca90231794	LibUnicode: Define is_unicode_*_subtag helpers inline in their header The UnicodeLocale generator will need to parse canonicalized locale strings, and will require using these methods. However, the generator cannot depend on LibUnicode because Locale.cpp within LibUnicode already depends on the generated file. Instead, defining the methods that the generator needs inline allows the generator to use them without linking against LibUnicode.	2021-09-04 13:51:40 +01:00
Timothy Flynn	21c4922ac0	LibUnicode: Add helper methods to LocaleID and LanguageID for LibJS Add a method to remove an extension type from the locale's extension set and methods to convert a locale and language to a string without canonicalization. Each of these will be used by LibJS.	2021-09-02 17:56:42 +01:00
Timothy Flynn	a05419db55	LibUnicode: Add lexer to test if a string matches the "type" production	2021-09-02 17:56:42 +01:00
Timothy Flynn	1fbc5dba08	LibUnicode: Generate Unicode locale likely subtag data CLDR contains a set of likely subtag data where, given a locale, you can resolve what is the most likely language, script, or territory of that locale. This data is needed for resolving territory aliases. These aliases might contain multiple territories, and we need to resolve which of those territories is most likely correct for a locale. Note that the likely subtag data is quite huge (a few thousand entries). As an optimization encouraged by the spec, we only generate the smallest subset of this data that we actually need (about 150 entries).	2021-09-01 14:14:47 +01:00
Timothy Flynn	9b118f1f06	LibUnicode: Generate Unicode locale alias data CLDR contains a set of aliases for languages, territories, etc. that no longer are meant to be used (e.g. due to deprecation). For example, the language "aam" is deprecated and should be canonicalized as "aas".	2021-09-01 14:14:47 +01:00
Timothy Flynn	d13142f015	LibJS+LibUnicode: Store parsed Unicode locale data as full strings Originally, it was convenient to store the parsed Unicode locale data as views into the original string being parsed. But to implement locale aliases will require mutating the data that was parsed. To prepare for that, store the parsed data as proper strings.	2021-09-01 14:14:47 +01:00
Timothy Flynn	30855e6663	LibUnicode: Parse locale private use extensions	2021-08-30 19:42:40 +01:00
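
Two of the rules described in the log above are easy to check by hand: the corrected compact exponent formula from 1f546476d5 and the zero/positive/negative pattern split from 230b133ee3. The following is a minimal Python sketch of both, not the LibUnicode generator code (which is C++). The function names `compact_exponent` and `split_number_pattern` are hypothetical, and the sketch assumes the format key magnitude is a power of ten and leaves each pattern entry otherwise unmodified.

```python
def compact_exponent(magnitude: int, pattern: str) -> int:
    """Hypothetical helper: exponent for a compact format key.

    Follows the corrected formula quoted in commit 1f546476d5:
        log10(magnitude) + 1 - pattern.count("0")
    Assumes `magnitude` is a power of ten (e.g. 10000 from "10000-count-one").
    """
    # For a power of ten, the number of decimal digits equals log10(magnitude) + 1.
    digits = len(str(magnitude))
    return digits - pattern.count("0")


def split_number_pattern(pattern: str) -> dict:
    """Hypothetical helper: derive the three patterns described in 230b133ee3.

    A CLDR pattern holds one or two entries delimited by a semicolon.
    """
    entries = pattern.split(";")
    zero = entries[0]
    # The positive pattern is always the zero pattern prepended with {plusSign}.
    positive = "{plusSign}" + zero
    # With two entries the second is the negative pattern; otherwise it is
    # the zero pattern prepended with {minusSign}.
    negative = entries[1] if len(entries) == 2 else "{minusSign}" + zero
    return {"zero": zero, "positive": positive, "negative": negative}
```

For the en-US example in the commit message, `compact_exponent(10000, "00 thousand")` evaluates to 3, whereas the earlier formula (without the `+ 1`) gave 2.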

59 commits