Get rid of the weird old signature:
- int StringType::to_int(bool& ok) const
And replace it with sensible new signature:
- Optional<int> StringType::to_int() const
This makes us at least parse selectors like [foo=bar\ baz] correctly.
The current solution here is quite hackish but the real fix will come
when we implement a spec-compliant CSS parser.
There was a logic mistake in the entity parser that chose the shorter
matching entity instead of the longer. Fix this and make the entity
lists constexpr while we're here.
This patch introduces support for more than just "absolute px" units in
our Length class. It now also supports "em" and "rem", which are units
relative to the font-size of the current layout node and the <html>
element's layout node respectively.
When handling a "textarea" start tag, we have to ignore the next token
if it's an LF ('\n'). However, we were not switching the tokenizer
state before fetching the lookahead token, and this caused us to force
the tokenizer into the RCDATA state too late, effectively getting it
stuck in that state for way longer than it should be.
Fixes#2508.
Oops, these were still using the byte-offset cursor. My goodness is it
unergonomic to index into UTF-8 strings, but Dr. Bugaev says it's good.
There is lots of room for improvement here. Just like the rest of the
tokenizer and parser. We'll have to do a few optimization passes over
them once they mature.
We already convert the input to UTF-8 before starting the tokenizer,
so all this patch had to do was switch the tokenizer to use an Utf8View
for its input (and to emit 32-bit codepoints.)
Now that we flush characters in a single place, we can call the Text's
children_changed() from there instead of having a goofy targeted hack
for <style> elements. :^)
Instead of appending character-at-a-time, we now buffer character
insertions in a StringBuilder, and flush them to the relevant node
whenever we start inserting into a new node (and when parsing ends.)
Last night I tried making a little test page that had a bunch of <img>
elements and nothing else. It didn't work.
Fix this by correctly adding a synthesized <html> element to the
document if we get something else in the "before html insertion mode.
You can still run the old parser with "br -O", but the new one is good
enough to be the default parser now. We'll fix issues as we go and
eventually remove the old one completely. :^)