When searching for the locale-specific flexible day period for a given
hour, we were neglecting to handle cases where the period crosses 00:00.
For example, the en locale defines a day period range of [21:00, 06:00).
When given the hour of 05:00, we were checking if (21 <= 5 && 5 < 6),
thus not recognizing that the hour falls in that period.
Some locales do not define morning, night, etc. day period ranges.
TR-35 states they should fall back to the fixed day periods AM and PM.
Add a test case for the "as" locale, which is one such locale, to ensure
its AM/PM symbols are used.
For the test cases changed here, we now recognize "morning2" and
"afternoon2" from the CLDR, so the expected results now match the specs
and other engines.
There are a few FIXMEs that will need to be addressed, but this
implements most of the prototype method. The FIXMEs are mostly related
to range formatting, which has been entirely ignored so far. But other
than that, the following will need to be addressed:
* Determining flexible day periods must be made locale-aware.
* DST will need to be determined and acted upon.
* Time zones other than UTC and calendars other than Gregorian are
ignored.
* Some of our results differ from other engines as they have some
format patterns we do not. For example, they seem to have a lonely
{dayPeriod} pattern, whereas our closest pattern is
"{hour} {dayPeriod}".
This was an oversight in e42d954743.
These fields should always follow the locale preference in the CLDR.
Overriding these fields would permit formats like "h:mm:ss" to result in
strings like "1:2:3" instead of "1:02:03".
TR-35's Matching Skeleton algorithm dictates how user requests including
fractional second digits should be handled when the CLDR format pattern
does not include that field. When the format pattern contains {second},
but does not contain {fractionalSecondDigits}, generate a second pattern
which appends "{decimal}{fractionalSecondDigits}" to the {second} field.
This is not a calendar supported by ECMA-402, so let's not waste space
with its data.
Further, don't generate "gregorian" as a valid Unicode locale extension
keyword. It's an invalid type identifier, thus cannot be used in locales
such as "en-u-ca-gregorian".
This adds plumbing for the Intl.DateTimeFormat object, constructor, and
prototype.
Note that unlike other Intl objects, the Intl.DateTimeFormat object has
a LibUnicode structure as a base. This is to prevent wild amounts of
code duplication between LibUnicode, Intl.DateTimeFormat, and other
not-yet-defined Intl structures, because there's 12 fields shared
between them.
Currencies are a bit strange; the layout of currency data in the CLDR is
not particularly compatible with what ECMA-402 expects. For example, the
currency format in the "en" and "ar" locales for the Latin script are:
en: "¤#,##0.00"
ar: "¤\u00A0#,##0.00"
Note how the "ar" locale has a non-breaking space after the currency
symbol (¤), but "en" does not. This does not mean that this space will
appear in the "ar"-formatted string, nor does it mean that a space won't
appear in the "en"-formatted string. This is a runtime decision based on
the currency display chosen by the user ("$" vs. "USD" vs. "US dollar")
and other rules in the Unicode TR-35 spec.
ECMA-402 shies away from the nuances here with "implementation-defined"
steps. LibUnicode will store the data parsed from the CLDR however it is
presented; making decisions about spacing, etc. will occur at runtime
based on user input.
Currently, LibUnicode is only parsing and generating the "long" style of
currency display names. However, the CLDR contains "short" and "narrow"
forms as well that need to be handled. Parse these, and update LibJS to
actually respect the "style" option provided by the user for displaying
currencies with Intl.DisplayNames.
Note: There are some discrepencies between the engines on how style is
handled. In particular, running:
new Intl.DisplayNames('en', {type:'currency', style:'narrow'}).of('usd')
Gives:
SpiderMoney: "USD"
V8: "US Dollar"
LibJS: "$"
And running:
new Intl.DisplayNames('en', {type:'currency', style:'short'}).of('usd')
Gives:
SpiderMonkey: "$"
V8: "US Dollar"
LibJS: "$"
My best guess is V8 isn't handling style, and just returning the long
form (which is what LibJS did before this commit). And SpiderMoney can
handle some styles, but if they don't have a value for the requested
style, they fall back to the canonicalized code passed into of().
There is quite a lot to be done here so this is just a first pass at
number formatting. Decimal and percent formatting are mostly working,
but only for standard and compact notation (engineering and scientific
notation are not implemented here). Currency formatting is parsed, but
there is more work to be done to handle e.g. using symbols instead of
currency codes ("$" instead of "USD"), and putting spaces around the
currency symbol ("USD 2.00" instead of "USD2.00").
Currently, we have NotA and NotAn, to be used dependent on whether the
following word begins with a vowel or not. To avoid this, change the
wording on NotA to be independent of this context.
The keyword accessors all have the same function body in the spec,
except for the Intl.Locale method they invoke. This generates those
properties in the same manner as RegExp.prototype.
Intl.Locale.prototype.calendar
Intl.Locale.prototype.caseFirst
Intl.Locale.prototype.collation
Intl.Locale.prototype.hourCycle
Intl.Locale.prototype.numberingSystem
The exception is Intl.Locale.prototype.numeric, which will be defined
separately because it is a boolean value.
This isn't particularly testable yet without the Intl.Locale constructor
but having this defined will make testing the constructor possible. So
more specific tests for this prototype will come later.