In order to actually view the web as it is, we're gonna need a proper
HTML parser. So let's build one!
This patch introduces the Web::HTMLTokenizer class, which currently
operates on a StringView input stream where it fetches (ASCII only atm)
codepoints and tokenizes acccording to the HTML spec tokenization algo.
The tokenizer state machine looks a bit weird but is written in a way
that tries to mimic the spec as closely as possible, in order to make
development easier and bugs less likely.
This initial version is far from finished, but it can parse a trivial
document with a DOCTYPE and open/close tags. :^)
This patch is unfortunately rather large and might make some things feel
bloated, but it is necessary to fix a few flaws in LibJS, primarily
blindly coercing values to numbers without exception checks - i.e.
interpreter.argument(0).to_i32(); // can fail!!!
Some examples where the interpreter would actually crash:
var o = { toString: () => { throw Error() } };
+o;
o - 1;
"foo".charAt(o);
"bar".repeat(o);
To fix this, we now have the following...
to_double(Interpreter&)
to_i32()
to_i32(Interpreter&)
to_size_t()
to_size_t(Interpreter&)
...and a whole lot of exception checking.
There's intentionally no to_double(), use as_double() directly instead.
This way we still can use these convenient utility functions but don't
need to check for exceptions if we are sure the value already is a
number.
Fixes#2267.
Passing a Heap& to it only to then call interpreter() on that is weird.
Let's just give it the Interpreter& directly, like some of the other
to_something() functions.
There are now two API's on Value:
- Value::to_string(Interpreter&) -- may throw.
- Value::to_string_without_side_effects() -- will never throw.
These are some pretty big sweeping changes, so it's possible that I did
some part the wrong way. We'll work it out as we go. :^)
Fixes#2123.
Rather than printing them to stderr directly the parser now keeps a
Vector<Error>, which allows the "owner" of the parser to consume them
individually after parsing.
The Error struct has a message, line number, column number and a
to_string() helper function to format this information into a meaningful
error message.
The Function() constructor will now include an error message when
throwing a SyntaxError.
As suggested by @awesomekling in a code review and (initially) ignored
by me :^)
Implementation is roughly based on LibJS's trim_string(), but with a fix
for trimming all-whitespace strings.
Many properties can now have percentage values that get resolved in
layout. The reference value (what is this a percentage *of*?) differs
per property, so I've added a helper where you provide a reference
value as an added parameter to the existing length_or_fallback().
Currently we don't deal with them, so they shouldn't return a
SimpleSelector - that'd be a false positive.
Also don't produce a ComplexSelector if no SimpleSelector was parsed.
This fixes a couple of rendering issues on awesomekling.github.io:
link colours, footer size, content max-width (and possibly more!)
Previously we would only check if the border width property is empty and
skip drawing in that case, and enforcing a minimum width of 1px
otherwise - but "border: 0;" should not paint a border :^)
We now look at the Content-Type HTTP header when deciding how to render
some loaded content. If there is no Content-Type header (which will
always be the case when loading local files, for example), we make a
guess based on the URL filename.