This implements a lot of cases, but not all of them. The following
need more infrastructure first:
- Flex
- FlexFlow
- Background
- BackgroundImage
- BackgroundRepeat
- ListStyle and parts
- Font
Also, colors are not parsed correctly. This will be handled next.
Now that StyleResolver is going to deal with StyleComponentValueRules,
it will need to be able to parse those into StyleValues, using
`parse_css_value()`.
Also added StyleValue::is_builtin_or_dynamic(), which returns true for
values that are valid anywhere - things like `initial` and `inherit`,
along with `var()`, `attr()` and `calc()` - which we want to test for
easily.
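A minimal sketch of the idea, assuming the surrounding StyleValue API exposes predicates along these lines (the helper names here are illustrative, not necessarily the real members):
```cpp
// Sketch only; is_inherit()/is_initial()/is_custom_property()/is_attribute()/
// is_calculated() are assumed helpers, not necessarily the actual names.
bool StyleValue::is_builtin_or_dynamic() const
{
    // Keywords that are valid for any property.
    if (is_inherit() || is_initial())
        return true;
    // Dynamic values that get resolved later: var(), attr() and calc().
    return is_custom_property() || is_attribute() || is_calculated();
}
```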
We skip whitespace tokens while doing this. As far as I can tell,
whitespace is not useful once we get to this point, and it legally
may or may not appear between any two tokens. By not including it
in the ValueListStyleValue, we make the "if it has 3 parts"-type
checks a lot more straightforward.
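Roughly, the filtering amounts to something like this (a sketch; the exact types and call sites are assumptions):
```cpp
// Sketch: gather the non-whitespace component values of one declaration
// before wrapping them in a ValueListStyleValue.
Vector<StyleComponentValueRule> values;
for (auto& component_value : declaration.values()) {
    // Whitespace may legally appear between any two tokens and carries no
    // meaning at this point, so it is simply dropped.
    if (component_value.is(Token::Type::Whitespace))
        continue;
    values.append(component_value);
}
auto value_list = ValueListStyleValue::create(move(values));
```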
As the new CSS parser tokenizes its input, we can no longer easily
rely on a StringStyleValue for multi-value properties. (eg, border)
ValueListStyleValue lets us wrap all of the ComponentValues that
the Parser produced for one declaration, as a single StyleValue, to
then be parsed into StyleValues by the StyleResolver.
Originally, I wanted it to be a list of StyleValues, but several
properties use syntax that makes use of non-StyleValue tokens, eg:
```css
/* Syntax using a / */
font: 12px/14px sans-serif;
/* Multiple values separated by commas */
background: url(catdog.png), url(another-image.jpg), blue;
```
Passing the ComponentValue tokens themselves means that all that
information is carried over. The alternative might be to create a
StyleValue subclass for each property and parse them fully inside
the Parser. (eg, `FontStyleValue`)
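So the wrapper is little more than a StyleValue that owns the component values of one declaration. A rough sketch of its shape (constructor, getter and enum spellings are illustrative):
```cpp
// Sketch of the wrapper type; names are illustrative, not the exact code.
class ValueListStyleValue final : public StyleValue {
public:
    static NonnullRefPtr<ValueListStyleValue> create(Vector<StyleComponentValueRule>&& values)
    {
        return adopt_ref(*new ValueListStyleValue(move(values)));
    }

    Vector<StyleComponentValueRule> const& values() const { return m_values; }

private:
    explicit ValueListStyleValue(Vector<StyleComponentValueRule>&& values)
        : StyleValue(Type::ValueList)
        , m_values(move(values))
    {
    }

    Vector<StyleComponentValueRule> m_values;
};
```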
I decided against `ListStyleValue` as a name, to avoid confusion
with list styles. It's not ideal, but names are hard.
This completely changes how HTMLTokens store their data. Previously,
space was allocated for all token types separately. Now, the HTMLToken's
data is stored in just a String, two booleans and a Variant.
This change reduces sizeof(HTMLToken) from 68 bytes to 32 bytes. Also, this reduces
raw tokenization time by around 20 to 50 percent, depending on the page.
Full document parsing time (with HTMLDocumentParser, on a local HTML
page without any dependency files) is reduced by between 4 and 20
percent, depending on the page.
Since tokenizing HTML pages can easily generate 50'000 tokens or more,
the storage has been designed in a way that avoids heap allocations
where possible, while trying to reduce the size of the tokens. The only
tokens which need to allocate on the heap are thus DOCTYPE tokens (max.
1 per document), and tag tokens (but only if they have attributes). This
way, only around 5 percent of all tokens generated need to allocate on
the heap (except for StringImpl allocations).
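Schematically, the layout is something like the following (a sketch; the member names and exact Variant alternatives are assumptions based on the description above):
```cpp
// Sketch: common data lives inline; only DOCTYPE tokens and tag tokens
// with attributes ever put heap-allocated data behind the Variant.
class HTMLToken {
    // ... getters and setters ...
private:
    Type m_type { Type::Invalid };
    bool m_self_closing { false };               // illustrative flag names
    bool m_self_closing_acknowledged { false };
    String m_string_data;                        // tag name, comment, or character data
    Variant<Empty, OwnPtr<DoctypeData>, Vector<Attribute>> m_data;
};
```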
Since all interaction with the HTMLToken class now happens over getters
and setters, there is no more need for HTMLTokenizer and
HTMLDocumentParser to have direct access to the members.
This is in preparation for an upcoming storage change of HTMLToken. In
contrast to the other token types, the accessor can hand out a mutable
reference to allow users to change parts of the DoctypeData easily.
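For instance, something along these lines (a sketch; the accessor name and the lazy-allocation detail are assumptions):
```cpp
// Sketch: DOCTYPE tokens hand out a mutable reference so callers can fill
// in the individual fields as the tokenizer discovers them.
HTMLToken::DoctypeData& HTMLToken::ensure_doctype_data()
{
    VERIFY(type() == Type::DOCTYPE);
    if (!m_doctype_data)
        m_doctype_data = make<DoctypeData>();
    return *m_doctype_data;
}
```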
Previously, HTMLToken would expose the Vector<Attribute> directly to
its users. In preparation for a future change, all users now use
implementation-agnostic APIs which do not expose the Vector directly.
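On the caller side this looks roughly like the following (a sketch; the iteration helper and field names are assumptions rather than the exact API):
```cpp
// Sketch: iterate attributes through the token's API instead of reaching
// into a Vector<Attribute> member directly.
token.for_each_attribute([&](auto const& attribute) {
    // e.g. copy attribute.local_name / attribute.value onto the DOM element
    return IterationDecision::Continue;
});
```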
* wasm: Don't try to print the function results if it traps
* LibWasm: Inline some very hot functions
These are mostly pretty small functions too, and they accounted for roughly
10% of runtime.
* LibWasm+Everywhere: Make the instruction count limit configurable
...and enable it for LibWeb and test-wasm.
Note that `wasm` will not be limited by this.
* LibWasm: Remove a useless use of ScopeGuard
That function has no early-exit paths, so we can just put the ending logic
right at the end of the function instead.
This fixes parsing the following regular expression: /</g;
It also adds a simple script element containing that specific regex to the
HTMLTokenizer regression test.
If the text-for-rendering of the last selected node is empty, the select-all
implementation would end up setting the index to -1. This value is used
directly as a substring length in the copy-text implementation, and would
thus cause a failed assertion.
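One way to avoid that, sketched under the assumption that the index is derived from the rendered text's length (variable names are illustrative):
```cpp
// Sketch: clamp instead of letting an empty node produce -1, which the
// copy-text path would then use as a substring length.
auto text = last_layout_node.text_for_rendering();
int end_index = text.is_empty() ? 0 : static_cast<int>(text.length()) - 1;
```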
Rather than parsing the selector every time we want to check it, we
now parse it once at the beginning.
A bonus effect of this is that we now support a selector list in
:not(), instead of just a single selector, though only when using
the new parser.
The end goal is to make the PseudoClass::not_selector be a Selector
instead of a String that is repeatedly re-parsed. But since Selector
contains a Vector of ComplexSelectors, which each have a Vector of
SimpleSelectors, it's probably a good idea to not be passing them
around by value anyway. :^)
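A sketch of the shape this is heading toward (field and type spellings are illustrative; it also assumes Selector becomes ref-counted so it is never copied by value):
```cpp
// Sketch of the end state for :not() storage.
struct PseudoClass {
    Type type { Type::None };
    // Previously: String not_selector, re-parsed on every check.
    NonnullRefPtrVector<Selector> not_selector_list; // parsed once, allows a list
};
```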
Same reasoning again! This is the last one.
While I was at it, I added the two remaining CSS2.2 pseudo-elements,
::first-line and ::first-letter. All 4 are handled in the new CSS
parser, including with the compatibility single-colon syntax. I have
not added support to the old parser.
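For reference, the full set involved (a sketch of the enum; the exact spelling in the selector code may differ):
```cpp
// The four CSS2.2 pseudo-elements handled by the new parser, including the
// single-colon compatibility syntax (e.g. :before as well as ::before).
enum class PseudoElement {
    None,
    Before,
    After,
    FirstLine,
    FirstLetter,
};
```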