With this change, we now have ~1200 CellAllocators across both LibJS and
LibWeb in a normal WebContent instance.
This gives us a minimum heap size of 4.7 MiB in the scenario where we
only have one cell allocated per type. Of course, in practice there will
be many more of each type, so the effective overhead is quite a bit
smaller than that in practice.
I left a few types unconverted to this mechanism because I got tired of
doing this. :^)
As far as I can tell all of these steps are just equivalent to using the
qualified name. Add some tests which cover some of these cases, and
remove the FIXME's.
I have been going down into a bit of a rabbit hole trying to figure out
why the namespace is not getting set up properly on certain attributes.
At one stage, I thought the issue might have been around here where
attributes were being adjusted (it is not). I started adding spec
comments to understand what was happening, and by the time I realised it
wasn't in this place, I was already in too deep!
Add a whole bunch of spec comments, and leave one or two minor FIXME's
where the spec seems to have changed since this was originally
implemented.
These were DeprecatedFlyStrings, but had no reason to be. We were not
making use of the O(1) lookup, so instead of porting it over to a
FlyString, just make it a StringView.
Previously these were DeprecatedStrings that contained a null state.
After the null state was removed, the nullability of these members was
broken. This doesn't seem to cause any problems currently as the HTML
parser is not inserting attributes with their full qualified name, but
after we fix that problem, this bug surfaces.
This required dealing with a *lot* of fallout, but it's all basically
just switching from DeprecatedFlyString to either FlyString or
Optional<FlyString> in a hundred places to accommodate the change.
Renaming the DeprecatedString version of this function to
deprecated_tag_name. A FlyString is used here as we often need to
perform equality checks here, and the HTMLParser already has tag_name as
a FlyString.
Remove a FIXME while we're at it - we were already following the spec
there, and we still are :^)
Which pretty much needs to be done together due to the amount of places
where they are compared together.
This also involves porting over StackOfOpenElements over to FlyString
from DeprecatedFly string to prevent a gazillion calls to
`.to_deprecated_fly_string` calls in HTMLParser.
There are an unfortunate number of DeprecatedString conversions required
here, but these should all fall away and look much more pretty again
when other places are also ported away from DeprecatedString.
Leaves only the Element IDL interface left :^)
The existing implementation has some pre-existing issues where it is
incorrectly assumes that byte offsets are given through the IDL instead
of UTF-16 code units. While making these changes, leave some FIXMEs for
that.
We currently track the [line, column] position of every HTMLToken, as
this is what is needed for LibGUI's syntax highlighting. Some non-LibGUI
purposes (e.g. highlighting HTML with HTML) require a byte offset. Track
both during tokenization.
The FIXME here describes an old constraint on JS Interpreters which no
longer holds. It hails from a time when we had the global object and
JS realm attached to the document.
This is intended to annotate conversions from unknown floating-point
values to CSSPixels, and make it more obvious the fp value will be
rounded to the nearest fixed-point value.
In general it is not safe to convert any arbitrary floating-point value
to CSSPixels. CSSPixels has a resolution of 0.015625, which for small
values (e.g. scale factors between 0 and 1), can produce bad results
if converted to CSSPixels then scaled back up. In the worst case values
can underflow to zero and produce incorrect results.
We were previously setting the end position of attribute names in self-
closing HTML tags to the end of the attribute value. To illustrate the
previous behavior, consider this tag and its attribute's start and end
positions (shown inclusively below):
<meta charset="UTF-8" />
^ name start
^ value start
^ value end
^ name end
Rather than setting the end position of the attribute name when we parse
the closing slash, ensure the end position is already set while we are
in the AttributeName state. We now have:
<meta charset="UTF-8" />
^ name start
^ name end
^ value start
^ value end
The tokenizer unit test has been extended to test these positions.
To illustrate the previous behavior, consider these tags and their start
and end positions (shown inclusively below):
Start tag: End tag:
<span> </span>
^ start ^ start
^end ^end
The start position of a tag is the first ASCII-alpha code point after
the opening brace. The start position of a close tag is the slash just
before the first ASCII-alpha code point. And the end position of both
is the closing brace. So the opening brace is not included in the
emitted tag, but the closing brace is. And the end tag including the
slash is an oddity that had to be worked around in its only use case
(syntax highlighting).
We now consistently exclude the braces from the emitted tag, and also
exclude the slash from the end tag, so that it does not need to be
accounted for in syntax highlighting. That is, we now have:
Start tag: End tag:
<span> </span>
^ start ^ start
^end ^end
The tokenizer unit test has been extended to test these positions.
If we run an inline script from the HTML parser, it may append a text
node to the current insertion point.
If there was text content immediately following the script element,
we would previously overwrite the script-inserted text content, due to
an oversight in the way we select an appropriate insertion point
This patch fixes the issue by only inserting parser content into
existing text nodes if they are empty.
Stop worrying about tiny OOMs. Work towards #20449.
While going through these, I also changed the function signature in many
places where returning ThrowCompletionOr<T> is no longer necessary.
We now apply MathML's default user agent style sheet along with other
default styles. This sheet is not mixed in with the other styles in
CSS/Default.css because it is a namespaced stylesheet and so has to
be its own sheet.
Some websites (like Reddit) like to instantiate "components" by setting
innerHTML to a huge chunk of stuff. Sometimes those huge chunks of stuff
contain inline style sheets (i.e `<style>` elements).
Before this change, we would end up parsing the CSS in those elements
multiple times, because we had no way of knowing that we were within
a fragment parser's temporary document.
This patch avoids the extra CSS parsing work by adding adding a flag to
Document that tells us it's being used by the fragment parser. Then, we
simply avoid parsing CSS for style elements in such documents. The CSS
then gets parsed immediately upon insertion into the proper DOM.
There were multiple bugs in the parsing algorithm for handling text
occurring inside a `table` element:
- When there was pending non-whitespace text inside a table, we only
flushed one token instead of all pending tokens.
- Also, we didn't even flush one of the right tokens, but instead the
token that caused the flush to happen.
- Once we started flushing the right tokens, it turned out we had not
yet implemented character insertion points expressed as "before X".
- Finally, we were not exiting the "in table text" mode after flushing
pending tokens, effectively getting us stuck in that mode until EOF.