1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-10-24 00:22:30 +00:00
Commit graph

145 commits

Author SHA1 Message Date
Timothy Flynn
2a7f36b392 LibJS+LibUnicode: Generate unique numeric symbol lists
There are 443 number system objects generated, each of which held an
array of number system symbols. Of those 443 arrays, only 39 are unique.

To uniquely store these, this change moves the generated NumericSymbol
enumeration to the public LibUnicode/NumberFormat.h header with a pre-
defined set of symbols that we need. This is to ensure the generated,
unique arrays are created in a known order with known symbols. While it
is unfortunate to no longer discover these symbols at generation time,
it does allow us to ignore unwanted symbols and perform less string-to-
enumeration conversions at lookup time.
2021-12-11 14:17:47 +00:00
Timothy Flynn
5bdee9e38a LibJS: Use locale-aware day period time ranges to format period symbols
For the test cases changed here, we now recognize "morning2" and
"afternoon2" from the CLDR, so the expected results now match the specs
and other engines.
2021-12-10 21:27:24 +00:00
Timothy Flynn
76aab821f4 LibJS+LibUnicode: Rename some Unicode::DayPeriod values
In the CLDR, there aren't "night" values, there are "night1" & "night2"
values. This is for locales which use a different name for nighttime
depending on the hour. For example, the ja locale uses "夜" between the
hours of 19:00 and 23:00, and "夜中" between the hours of 23:00 and
04:00. Our CLDR parser is currently ignoring "night2", so this rename
is to prepare for that.

We could probably come up with better names, but in the end, the API in
LibUnicode will be such that outside callers won't even see Night1, etc.
2021-12-10 21:27:24 +00:00
Timothy Flynn
53df13fed7 LibJS: Implement Intl.DateTimeFormat.prototype.formatRangeToParts 2021-12-09 23:43:04 +00:00
Timothy Flynn
04f8fb07e1 LibJS: Implement Intl.DateTimeFormat.prototype.formatRange 2021-12-09 23:43:04 +00:00
Timothy Flynn
1f35eda37b LibJS: Store range format patterns in the Intl.DateTimeFormat object
Now that LibUnicode generates these patterns, the AO steps to retrieve
them may be implemented.
2021-12-09 23:43:04 +00:00
Timothy Flynn
7a0d5e3f7a LibJS: Do not return views into potentially freed memory
In a future commit, the "part" view returned from FormatDateTimePattern
may be a view into a string that goes out of scope. Ensure the AO only
returns valid views. A similar approach is used in Intl.NumberFormat.
2021-12-09 23:43:04 +00:00
Timothy Flynn
1e68e7f129 LibJS: Implement Intl.DateTimeFormat.prototype.formatToParts 2021-12-08 11:29:36 +00:00
Timothy Flynn
adaf5985a4 LibJS: Implement (most of) Intl.DateTimeFormat.prototype.format
There are a few FIXMEs that will need to be addressed, but this
implements most of the prototype method. The FIXMEs are mostly related
to range formatting, which has been entirely ignored so far. But other
than that, the following will need to be addressed:

* Determining flexible day periods must be made locale-aware.
* DST will need to be determined and acted upon.
* Time zones other than UTC and calendars other than Gregorian are
  ignored.
* Some of our results differ from other engines as they have some
  format patterns we do not. For example, they seem to have a lonely
  {dayPeriod} pattern, whereas our closest pattern is
  "{hour} {dayPeriod}".
2021-12-08 11:29:36 +00:00
Timothy Flynn
d010ba10c3 LibJS: Cache the data locale used by Intl.DateTimeFormat
Unlike the locale, the data locale has Unicode locale extensions removed
(e.g. the data locale for "en-US-u-ca-gregory" is just "en-US"). Cache
the data locale for LibUnicode lookups during formatting.
2021-12-08 11:29:36 +00:00
Timothy Flynn
26f9666191 LibJS: Do not override hour, minute, and second format field lengths
This was an oversight in e42d954743.

These fields should always follow the locale preference in the CLDR.
Overriding these fields would permit formats like "h:mm:ss" to result in
strings like "1:2:3" instead of "1:02:03".
2021-12-08 11:29:36 +00:00
Timothy Flynn
9f7c727720 LibJS+LibUnicode: Generate missing patterns with fractionalSecondDigits
TR-35's Matching Skeleton algorithm dictates how user requests including
fractional second digits should be handled when the CLDR format pattern
does not include that field. When the format pattern contains {second},
but does not contain {fractionalSecondDigits}, generate a second pattern
which appends "{decimal}{fractionalSecondDigits}" to the {second} field.
2021-12-08 11:29:36 +00:00
Timothy Flynn
6ace4000bf LibJS+LibUnicode: Supply field type in CalendarPattern's for-each method
Some callers will want different behavior depending on what field is
being provided to the callback.
2021-12-08 11:29:36 +00:00
Timothy Flynn
e42d954743 LibJS: Always respect user-provided format field lengths
ECMA-402 doesn't explicitly handle a note in the TR-35 spec related to
expanding field lengths based on user-provided options. Instead, it
assumes the "implementation defined" locale data includes the possible
values.

LibUnicode does not generate every possible combination of field lengths
in its implementation of TR-35's "Missing Skeleton Fields", because the
number of generated patterns would grow out of control. Instead, it's
much simpler to handle this difference at runtime.
2021-12-06 15:46:34 +01:00
Timothy Flynn
9dc9700e3b LibJS: Fallback to [[pattern]] when [[pattern12]] is unavailable
Other implementations unconditionally initialize [[pattern12]] from
[[pattern]] regardless of whether [[pattern]] has an hour pattern of h11
or h12. LibUnicode does not do this. So when InitializeDateTimeFormat
defaults the hour cycle to the locale's preferred hour cycle, if the
best format didn't have an equivalent hour pattern, [[pattern12]] will
be empty.
2021-12-06 15:46:34 +01:00
Timothy Flynn
d2588d852b LibJS: Change all [[RelevantExtensionKeys]] to return constexpr arrays
There's no need to allocate a vector for this internal slot. Similar to
commit: bb11437792
2021-12-01 16:36:26 +00:00
Timothy Flynn
4a08fd2be2 LibJS: Implement Intl.DateTimeFormat.prototype.resolvedOptions 2021-11-29 22:48:46 +00:00
Timothy Flynn
d0e1997e07 LibJS: Implement Intl.DateTimeFormat.supportedLocalesOf 2021-11-29 22:48:46 +00:00
Timothy Flynn
16151aa7d5 LibJS+LibUnicode: Implement the Intl.DateTimeFormat constructor 2021-11-29 22:48:46 +00:00
Timothy Flynn
75b2a09a2f LibJS: Implement a nearly empty Intl.DateTimeFormat object
This adds plumbing for the Intl.DateTimeFormat object, constructor, and
prototype.

Note that unlike other Intl objects, the Intl.DateTimeFormat object has
a LibUnicode structure as a base. This is to prevent wild amounts of
code duplication between LibUnicode, Intl.DateTimeFormat, and other
not-yet-defined Intl structures, because there's 12 fields shared
between them.
2021-11-29 22:48:46 +00:00
Timothy Flynn
914675e826 LibJS+LibUnicode: Separate number formatting methods from Locale.h
Currently, we generate separate data files for locale and number format
related tables/methods, but provide public accessors for all of the data
in one Locale.h file. Rather than continuing this trend for date-time,
relative time, etc. formatting, it's a bit easier to reason about if the
public accessors are also in separate files.
2021-11-29 22:48:46 +00:00
Timothy Flynn
bb11437792 LibJS: Change Intl's GetOption AO to accept a Span rather than a Vector
Allocating a Vector for each of these invocations is a bit silly when
the values are basically all compile-time arrays. This AO is used even
more heavily by Intl.DateTimeFormat, so change it to accept a Span to
reduce its cost.

This also adds an overload to accept a fixed-size C-array so callers do
not have to be prefixed with AK::Array, i.e. this:

    get_option(..., AK::Array { "a"sv, "b"sv }, ...);

Reduces to:

    get_option(..., { "a"sv, "b"sv }, ...);

(Which is how all call sites were already written to construct a Vector
in place).
2021-11-29 22:48:46 +00:00
Linus Groh
a20b189eab LibJS: Fix incorrectly formatted section comments
A couple of missing URLs, spaces, and a stray comma.
2021-11-24 17:37:27 +00:00
Timothy Flynn
251f692440 LibJS: Re-implement SetNumberFormatDigitOptions AO
This is an editorial change in the Intl spec.

See: d89c84f
2021-11-24 14:17:15 +00:00
Timothy Flynn
a2f629f38a LibJS: Update spec comments in GetOption and DefaultNumberOption AOs
This is an editorial change in the Intl spec.

See: 913ca6d
2021-11-24 14:17:15 +00:00
Timothy Flynn
a1d5849e67 LibJS: Implement unit number formatting 2021-11-16 23:14:09 +00:00
Timothy Flynn
04b8b87c17 LibJS+LibUnicode: Support multiple identifiers within format pattern
This wasn't the case for compact patterns, but unit patterns can contain
multiple (up to 2, really) identifiers that must each be recognized by
LibJS.

Each generated NumberFormat object now stores an array of identifiers
parsed. The format pattern itself is encoded with the index into this
array for that identifier, e.g. the compact format string "0K" will
become "{number}{compactIdentifier:0}".
2021-11-16 23:14:09 +00:00
Timothy Flynn
3b68370212 LibJS+LibUnicode: Rename the generated compact_identifier to identifier
This field is currently used to store the StringView into the compact
name/symbol in the format string. Units will need to store a similar
field, so rename the field to be more generic, and extract the parser
for it.
2021-11-16 23:14:09 +00:00
Timothy Flynn
6d34a0b4e8 LibJS+LibUnicode: Rename method to select a NumberFormat plurality
Instead of currency pattern lookups within select_currency_unit_pattern,
rename the method to select_pattern_with_plurality and accept any list
of patterns. This method will be needed for units.
2021-11-16 23:14:09 +00:00
Timothy Flynn
99c15741ba LibJS: Conditionally ignore [[UseGrouping]] in compact notation 2021-11-16 00:56:55 +00:00
Timothy Flynn
14aca03161 LibJS: Remove FIXME comment from PartitionNotationSubPattern AO
All possible patterns generated by LibUnicode are now handled. We have a
similar VERIFY_NOT_REACHED in PartitionNumberPattern.
2021-11-16 00:56:55 +00:00
Timothy Flynn
fdae323401 LibJS: Implement compact formatting for Intl.NumberFormat 2021-11-16 00:56:55 +00:00
Timothy Flynn
80b86d20dc LibJS: Cache the number format used for compact notation
Finding the best number format to use for compact notation involves
creating a Vector of all compact formats for the locale and looking for
the one that best matches the number's magnitude. ECMA-402 wants this
number format to be found multiple times, so cache the result for future
use.
2021-11-16 00:56:55 +00:00
Timothy Flynn
1f546476d5 LibJS+LibUnicode: Fix computation of compact pattern exponents
The compact scale of each formatting rule was precomputed in commit:
be69eae651

Using the formula: compact scale = magnitude - pattern scale

This computation was off-by-one.

For example, consider the format key "10000-count-one", which maps to
"00 thousand" in en-US. What we are really after is the exponent that
best represents the string "thousand" for values greater than 10000
and less than 100000 (the next format key). We were previously doing:

    log10(10000) - "00 thousand".count("0") = 2

Which clearly isn't what we want. Instead, if we do:

    log10(10000) + 1 - "00 thousand".count("0") = 3

We get the correct exponent for each format key for each locale.

This commit also renames the generated variable from "compact_scale" to
"exponent" to match the terminology used in ECMA-402.
2021-11-16 00:56:55 +00:00
Timothy Flynn
4d79ab6866 LibJS: Implement engineering and scientific number formatting 2021-11-14 17:00:35 +00:00
Timothy Flynn
b019a7fe64 LibJS: Define the "name" property on the number format function
Also add a missing spec link.
2021-11-14 17:00:35 +00:00
Timothy Flynn
15c5fbd9e9 LibJS: Implement number grouping for Intl.NumberFormat
For example, in en-US, the number 123456 should be formatted as the
string "123,456". In en-IN, it should be formatted as "1,23,456".
2021-11-14 10:35:19 +00:00
Timothy Flynn
3450def494 LibJS: Implement Intl.NumberFormat.prototype.formatToParts 2021-11-13 19:01:25 +00:00
Timothy Flynn
40973814e9 LibJS: Throw an exception in NumberFormat.prototype.format for BigInts
Rather than crashing in the call to as_double(), throw an exception for
now.
2021-11-13 19:01:25 +00:00
Timothy Flynn
c65dea64bd LibJS+LibUnicode: Don't remove {currency} keys in GetNumberFormatPattern
In order to implement Intl.NumberFormat.prototype.formatToParts, do not
replace {currency} keys in the format pattern before ECMA-402 tells us
to. Otherwise, the array return by formatToParts will not contain the
expected currency key.

Early replacement was done to avoid resolving the currency display more
than once, as it involves a couple of round trips to search through
LibUnicode data. So this adds a non-standard method to NumberFormat to
do this resolution and cache the result.

Another side effect of this change is that LibUnicode must replace unit
format patterns of the form "{0} {1}" during code generation. These were
previously skipped during code generation because LibJS would just
replace the keys with the currency display at runtime. But now that the
currency display injection is delayed, any {0} or {1} keys in the format
pattern will cause PartitionNumberPattern to abort.
2021-11-13 19:01:25 +00:00
Timothy Flynn
d872d030f1 LibJS: Avoid potential for dangling string views in partition patterns
There aren't any dangling views in as of yet, but a subsequent commit
will cause the "part" variable to be a view into an internally generated
string. Therefore, after returning from PartitionNumberPattern, that
view will be pointed at freed memory.

This commit is to set the precendence of not returning a view to "part".
2021-11-13 19:01:25 +00:00
Timothy Flynn
a701ed52fc LibJS+LibUnicode: Fully implement currency number formatting
Currencies are a bit strange; the layout of currency data in the CLDR is
not particularly compatible with what ECMA-402 expects. For example, the
currency format in the "en" and "ar" locales for the Latin script are:

    en: "¤#,##0.00"
    ar: "¤\u00A0#,##0.00"

Note how the "ar" locale has a non-breaking space after the currency
symbol (¤), but "en" does not. This does not mean that this space will
appear in the "ar"-formatted string, nor does it mean that a space won't
appear in the "en"-formatted string. This is a runtime decision based on
the currency display chosen by the user ("$" vs. "USD" vs. "US dollar")
and other rules in the Unicode TR-35 spec.

ECMA-402 shies away from the nuances here with "implementation-defined"
steps. LibUnicode will store the data parsed from the CLDR however it is
presented; making decisions about spacing, etc. will occur at runtime
based on user input.
2021-11-13 11:52:45 +00:00
Timothy Flynn
39e031c4dd LibJS+LibUnicode: Generate all styles of currency localizations
Currently, LibUnicode is only parsing and generating the "long" style of
currency display names. However, the CLDR contains "short" and "narrow"
forms as well that need to be handled. Parse these, and update LibJS to
actually respect the "style" option provided by the user for displaying
currencies with Intl.DisplayNames.

Note: There are some discrepencies between the engines on how style is
handled. In particular, running:

new Intl.DisplayNames('en', {type:'currency', style:'narrow'}).of('usd')

Gives:

  SpiderMoney: "USD"
  V8: "US Dollar"
  LibJS: "$"

And running:

new Intl.DisplayNames('en', {type:'currency', style:'short'}).of('usd')

Gives:

  SpiderMonkey: "$"
  V8: "US Dollar"
  LibJS: "$"

My best guess is V8 isn't handling style, and just returning the long
form (which is what LibJS did before this commit). And SpiderMoney can
handle some styles, but if they don't have a value for the requested
style, they fall back to the canonicalized code passed into of().
2021-11-13 11:52:45 +00:00
Timothy Flynn
89523f70cf LibJS: Begin implementing Intl.NumberFormat.prototype.format
There is quite a lot to be done here so this is just a first pass at
number formatting. Decimal and percent formatting are mostly working,
but only for standard and compact notation (engineering and scientific
notation are not implemented here). Currency formatting is parsed, but
there is more work to be done to handle e.g. using symbols instead of
currency codes ("$" instead of "USD"), and putting spaces around the
currency symbol ("USD 2.00" instead of "USD2.00").
2021-11-12 09:17:08 +00:00
Timothy Flynn
0469006263 LibJS: Change Intl's PatternPartition record to hold a String value
It was previously holding a StringView, which was either a view into a
LibUnicode-generated string or a string passed from the user.

Intl.NumberFormat will need this record to hold internally-created
strings, so a StringView will not suffice (the way the steps are laid
out, that view will ultimately end up dangling).

This shouldn't be too wasteful since the StringView it was holding was
converted to a String eventually anyways.
2021-11-12 09:17:08 +00:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Andreas Kling
398c181c79 LibJS: Rename PropertyName to PropertyKey
Let's use the same name as the spec. :^)
2021-10-24 17:18:07 +02:00
Linus Groh
66770de264 LibJS: Convert Intl.NumberFormat functions to ThrowCompletionOr 2021-10-22 23:20:18 +01:00
Linus Groh
351f0b70bd LibJS: Convert Intl.Locale functions to ThrowCompletionOr 2021-10-22 23:20:18 +01:00
Linus Groh
9fb1d7e18b LibJS: Convert Intl.ListFormat functions to ThrowCompletionOr 2021-10-22 23:20:18 +01:00