The number system data in the CLDR contains information on how to format
numbers in a locale-dependent manner. Start parsing this data, beginning
with numeric symbol strings. For example the symbol NaN maps to "NaN" in
the en-US locale, and "非數值" in the zh-Hant locale.
Some locales in the CLDR have alternate default numbering systems listed
under "defaultNumberingSystem-alt-*", e.g.:
"defaultNumberingSystem": "arab",
"defaultNumberingSystem-alt-latn": "latn",
"otherNumberingSystems": {
"native": "arab"
},
We were previously only parsing "defaultNumberingSystem" and
"otherNumberingSystems". This odd format appears to be an artifact of
converting from XML.
This isn't particularly important because this generates code that is
quite hidden from outside callers. But when viewing the generated code,
it's a bit nicer to read e.g. enum identifiers such as "MinusSign"
rather than "Minussign".
First off, this verifies that an initial value is always provided in
Properties.json for each property.
Second, it verifies that parsing that initial value succeeds.
This means that a call to `property_initial_value()` will always return
a valid StyleValue. :^)
This file contains the list of locales which default to their parent
locale's values. In the core CLDR dataset, these locales have their own
files, but they are empty (except for identity data). For example:
https://github.com/unicode-org/cldr/blob/main/common/main/en_US.xml
In the JSON export, these files are excluded, so we currently are not
recognizing these locales just by iterating the locale files.
This is a prerequisite for upgrading to CLDR version 40. One of these
default-content locales is the popular "en-US" locale, which defaults to
"en" values. We were previously inferring the existence of this locale
from the "en-US-POSIX" locale (many implementations, including ours,
strip variants such as POSIX). However, v40 removes the "en-US-POSIX"
locale entirely, meaning that without this change, we wouldn't know that
"en-US" exists (we would default to "en").
For more detail on this and other v40 changes, see:
https://cldr.unicode.org/index/downloads/cldr-40#h.nssoo2lq3cba
This changes Web::Bindings::throw_dom_exception_if_needed() to return a
JS::ThrowCompletionOr instead of an Optional. This allows callers to
wrap the invocation with a TRY() macro instead of making a follow-up
call to should_return_empty(). Further, this removes all invocations to
vm.exception() in the generated bindings.
This also required converting URLSearchParams::for_each and the callback
function it invokes to ThrowCompletionOr. With this, the ReturnType enum
used by WrapperGenerator is removed as all callers would be using
ReturnType::Completion.
Both at the same time because many of them call construct() in call()
and I'm not keen on adding a bunch of temporary plumbing to turn
exceptions into throw completions.
Also changes the return value of construct() to Object* instead of Value
as it always needs to return an object; allowing an arbitrary Value is a
massive foot gun.
The old versions were renamed to JS_DECLARE_OLD_NATIVE_FUNCTION and
JS_DEFINE_OLD_NATIVE_FUNCTION, and will be eventually removed once all
native functions were converted to the new format.
Adds support for methods whose last parameter is a variadic DOMString.
We constructor a Vector<String> of the remaining arguments to pass to
the C++ implementation.
Note our Attribute class is what the spec refers to as just "Attr". The
main differences between the existing implementation and the spec are
just that the spec defines more fields.
Attributes can contain namespace URIs and prefixes. However, note that
these are not parsed in HTML documents unless the document content-type
is XML. So for now, these are initialized to null. Web pages are able to
set the namespace via JavaScript (setAttributeNS), so these fields may
be filled in when the corresponding APIs are implemented.
The main change to be aware of is that an attribute is a node. This has
implications on how attributes are stored in the Element class. Nodes
are non-copyable and non-movable because these constructors are deleted
by the EventTarget base class. This means attributes cannot be stored in
a Vector or HashMap as these containers assume copyability / movability.
So for now, the Vector holding attributes is changed to hold RefPtrs to
attributes instead. This might change when attribute storage is
implemented according to the spec (by way of NamedNodeMap).
This adds the ParamatizedType, as `Vector<String>` doesn't encode the
full type information. It is a separate struct as you can't have
`Vector<Type>` inside of `Type`. This also makes Type RefCounted
because I had to make parse_type return a pointer to make dynamic
casting work correctly.
The reason I made it RefCounted instead of using a NonnullOwnPtr is
because it causes compiler errors that I don't want to figure out right
now.
Typically size_t is used for indices, but we can take advantage of the
knowledge that there is approximately only 46K unique strings in the
generated UnicodeLocale.cpp file. Therefore, we can get away with using
u16 to store indices. There is a VERIFY that will fail if we ever exceed
the limits of u16.
On x86_64 builds, this reduces libunicode.so from 9.2 MiB to 7.3 MiB.
On i686 builds, this reduces libunicode.so from 3.9 MiB to 3.3 MiB.
These savings are entirely in the .rodata section of the shared library.
Note there are a couple of type differences between the spec and the IDL
file added in this commit. For example, we will need to support a type
of Variant to handle spec types such as "(double or sequence<double>)".
But for now, this allows web pages to construct an IntersectionObserver
with any valid type.
The *_from_string() and resolve_*_alias() generated methods are the last
remaining users of HashMap in the LibUnicode generated files (read: the
last methods not using compile-time structures). This converts these
methods to use an array containing pairs of hash values to the desired
lookup value.
Because this code generation is the same between GenerateUnicodeData.cpp
and GenerateUnicodeLocale.cpp, this adds a GeneratorUtil.h header to the
LibUnicode generators to contain the method that generates the methods.
This concept is not present in ECMAScript, and it bothers me every time
I see it.
It's only used by WrapperGenerator, and even there only relevant in two
places, so let's fully remove it from LibJS and use a simple ternary
expression instead:
cpp_name = js_name.is_null() && legacy_null_to_empty_string
? String::empty()
: js_name.to_string(global_object);
Previously this would generate the following code:
JS::Value foo_value;
if (!foo.is_undefined())
foo_value = foo;
Which is dangerous as we're passing an empty value around, which could
be exposed to user code again. This is fine with "= null", for which it
also generates:
else
foo_value = JS::js_null();
So, in summary: a value of type `any`, not `required`, with no default
value and no initializer from user code will now default to undefined
instead of an empty value.
The list-format strings used for Intl.ListFormat are small, but quite
heavily duplicated. For example, the string "{0}, {1}" appears 6,519
times. Generate unique strings for this data to avoid duplication.
In the generated UnicodeLocale.cpp file, there are 296,408 strings for
localizations of languages, territories, scripts, currencies & keywords.
Of these, only 43,848 (14.8%) are actually unique, so there are quite a
large number of duplicated strings.
This generates a single compile-time array to store these strings. The
arrays for the localizations now store an index into this single array
rather than duplicating any strings.
Some CLDR languages.json / territories.json files contain localizations
for some lanuages/territories that are otherwise not present in the CLDR
database. We already don't generate anything in UnicodeLocale.cpp for
these anomalies, but this will stop us from even storing that data in
the generator's memory.
This doesn't affect the output of the generator, but will have an effect
after an upcoming commit to unique-ify all of the strings in the CLDR.
There are only 112 code points with special casing rules, so this array
is quite small (compared to the size 34,626 UnicodeData hash map that is
also storing this data). Removing all casing rules from UnicodeData will
happen in a subsequent commit.