We can now parse a little DOM like this:
<!DOCTYPE html>
<html>
  <head></head>
  <body>
    <div></div>
  </body>
</html>
This is pretty slow work, but the incremental progress is satisfying!
This patch adds a new HTMLDocumentParser class. It keeps a tokenizer
object internally and feeds itself one token at a time from it.
The names and idioms in this class are kept as close to the actual
HTML parsing spec as possible, to make development as easy and
bug-free as possible. :^)
This is going to become pretty large, but it's pretty cool!
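Roughly, the main loop looks something like this (a simplified sketch;
the helper and member names are illustrative, and it assumes that
next_token() returns an empty Optional once the input runs out):

class HTMLDocumentParser {
public:
    void run()
    {
        for (;;) {
            // Pull one token from the tokenizer and hand it to the
            // handler for the current insertion mode, just like the
            // spec describes it.
            auto token = m_tokenizer.next_token();
            if (!token.has_value())
                break;
            process_token_in_current_insertion_mode(token.value());
        }
    }

private:
    void process_token_in_current_insertion_mode(HTMLToken&);
    HTMLTokenizer m_tokenizer;
};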
Instead of emitting data-bearing tokens immediately, do it lazily at
the next state change. This allows us to accumulate full bursts of
text in between tags instead of having one token per character. :^)
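The gist of it, as a rough sketch (the helper names here are made up
for illustration): character data is appended to a pending token, and
the pending token is only emitted when the tokenizer switches state.

void HTMLTokenizer::emit_character(u32 code_point)
{
    // Don't emit anything yet; just keep growing the current character
    // token so consecutive text ends up in a single token.
    if (!m_current_token.is_character())
        begin_new_character_token();
    m_current_token.append_to_data(code_point);
}

void HTMLTokenizer::switch_to(State new_state)
{
    // A state change means the burst of text is over, so flush the
    // accumulated character token (if any) now.
    flush_current_character_token();
    m_state = new_state;
}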
This makes it a compile error to omit the END_STATE. Also add some more
missing END_STATEs exposed by this (nice!)
Thanks to @predmond for suggesting the multi-pair trick! :^)
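For the curious, one way such macros can enforce this (an illustrative
sketch, not necessarily the exact macros in the tree): BEGIN_STATE opens
several brace pairs that only END_STATE closes, so leaving an END_STATE
out produces unbalanced braces and the file simply won't compile.

#define BEGIN_STATE(state) \
    case State::state: {   \
        {                  \
            {

#define END_STATE \
            }     \
        }         \
        break;    \
    }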
In order to actually view the web as it is, we're gonna need a proper
HTML parser. So let's build one!
This patch introduces the Web::HTMLTokenizer class, which currently
operates on a StringView input stream from which it fetches (ASCII-only,
for now) code points and tokenizes them according to the HTML spec's
tokenization algorithm.
The tokenizer state machine looks a bit weird but is written in a way
that tries to mimic the spec as closely as possible, in order to make
development easier and bugs less likely.
This initial version is far from finished, but it can parse a trivial
document with a DOCTYPE and open/close tags. :^)
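For orientation, the rough shape of the class (simplified; member names
are illustrative):

namespace Web {

class HTMLTokenizer {
public:
    explicit HTMLTokenizer(const StringView& input)
        : m_input(input)
    {
    }

    // Runs the spec's tokenization state machine until the next token
    // (DOCTYPE, start/end tag, character, comment, ...) is ready, or
    // returns nothing once the input is exhausted.
    Optional<HTMLToken> next_token();

private:
    enum class State {
        Data,
        TagOpen,
        TagName,
        // ...one enumerator per state in the spec...
    };

    Optional<u32> next_code_point(); // ASCII-only at the moment
    StringView m_input;
    size_t m_cursor { 0 };
    State m_state { State::Data };
};

}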
Many properties can now have percentage values that get resolved in
layout. The reference value (what is this a percentage *of*?) differs
per property, so I've added a helper where you provide a reference
value as an added parameter to the existing length_or_fallback().
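As a sketch of what the helper boils down to (the exact signature in the
tree may differ): if the value is a percentage, it gets resolved against
the caller-supplied reference value, otherwise the old
length_or_fallback() behavior applies.

Length length_or_fallback(const StyleValue& value, const Length& fallback, float reference_value)
{
    if (value.is_percentage()) {
        // e.g. 50% with a reference value of 600 resolves to 300px here.
        return Length(value.percentage() / 100.0f * reference_value, Length::Type::Px);
    }
    if (value.is_length())
        return value.to_length();
    return fallback;
}

Which reference value to pass depends on the property: width percentages
resolve against the containing block's width, for example, while
line-height percentages resolve against the font size.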
Currently we don't deal with them, so they shouldn't return a
SimpleSelector - that'd be a false positive.
Also don't produce a ComplexSelector if no SimpleSelector was parsed.
This fixes a couple of rendering issues on awesomekling.github.io:
link colours, footer size, content max-width (and possibly more!)
It's still only a dummy as LibWeb doesn't have focused elements yet, but
at least now we don't treat "selector:focus" as just "selector".
This fixes an issue on google.com, which was rendering mostly grey
because of some menu item focus styles :^)
We're starting with a very basic decoding API and only ISO-8859-1 and
UTF-8 decoding (and UTF-8 decoding is really a no-op, since String is
already expected to be UTF-8).
We now look at the HTTP response headers for a Content-Type header and,
if present, try to parse it to find the text encoding.
If the text encoding is iso-8859-1, we turn all non-ASCII characters
into question marks. This makes Swedish Google load on my machine! :^)
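The decoding API itself is tiny. A sketch of the idea (class and method
names are illustrative):

class Decoder {
public:
    virtual ~Decoder() {}
    virtual String to_utf8(const StringView& input) = 0;
};

class Latin1Decoder : public Decoder {
public:
    virtual String to_utf8(const StringView& input) override
    {
        StringBuilder builder;
        for (char ch : input) {
            // Anything outside ASCII becomes '?' for now, instead of
            // being transcoded into the proper UTF-8 sequence.
            if (static_cast<u8>(ch) < 0x80)
                builder.append(ch);
            else
                builder.append('?');
        }
        return builder.to_string();
    }
};

class UTF8Decoder : public Decoder {
public:
    virtual String to_utf8(const StringView& input) override
    {
        // String is already expected to hold UTF-8, so this is a no-op copy.
        return input;
    }
};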
We were parsing "<br/>" as an open tag with the name "br/". This fixes
that specific scenario.
We also rename is_self_closing_tag() to is_void_element() to better
match the spec's terminology.
Every Document now has an Origin, found via Document::origin().
It's based on the URL of the document.
This will be used to implement things like the same-origin policy.
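An Origin is essentially just the (protocol, host, port) triple pulled
out of the document's URL. Roughly (an illustrative sketch):

class Origin {
public:
    Origin() {}
    Origin(const String& protocol, const String& host, u16 port)
        : m_protocol(protocol)
        , m_host(host)
        , m_port(port)
    {
    }

    // Two origins are the "same origin" if all three components match.
    bool is_same(const Origin& other) const
    {
        return m_protocol == other.m_protocol
            && m_host == other.m_host
            && m_port == other.m_port;
    }

private:
    String m_protocol;
    String m_host;
    u16 m_port { 0 };
};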
We were allowing this dangerous kind of thing:
RefPtr<Base> base;
RefPtr<Derived> derived = base;
This patch changes the {Nonnull,}RefPtr constructors so this is no
longer possible.
To downcast one of these pointers, there is now static_ptr_cast<T>:
RefPtr<Derived> derived = static_ptr_cast<Derived>(base);
Fixing this exposed a ton of cowboy-downcasts in various places,
which we're now forced to fix. :^)
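The cast helper itself is small; roughly (a simplified sketch):

template<typename T, typename U>
RefPtr<T> static_ptr_cast(const RefPtr<U>& ptr)
{
    // The downcast still happens, but it's now spelled out explicitly
    // at the call site instead of sneaking in via a RefPtr constructor.
    return RefPtr<T>(static_cast<const T*>(ptr.ptr()));
}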
When we encounter a '<' during HTML parsing, we now look ahead to see
if there is a full </script> coming; otherwise we treat it as text.
This makes it possible to use '<' in inline scripts. :^)
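Conceptually, the lookahead is just a case-insensitive peek at the next
few characters (the helper name and exact details here are illustrative):

#include <ctype.h>

static bool peek_for_closing_script_tag(const StringView& html, size_t position)
{
    StringView close_tag = "</script";
    if (position + close_tag.length() > html.length())
        return false;
    for (size_t i = 0; i < close_tag.length(); ++i) {
        if (tolower(static_cast<unsigned char>(html[position + i])) != close_tag[i])
            return false;
    }
    // Only when this returns true do we leave "script text" mode;
    // otherwise the '<' is treated as ordinary text.
    return true;
}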
Scripts loaded in this way will block the parser until they finish
executing. This means that they see the DOM before the whole document
has been fully parsed. This is all normal, of course.
To make this work, I changed the way we notify DOM nodes about tree
insertion. The inserted_into() callbacks are now incrementally invoked
during parse, as each node is appended to its parent.
To accommodate inline scripts and inline style sheets, we now also have
a children_changed() callback which is invoked on any parent when it
has children added/removed.
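In sketch form (the real append_child() has a bit more bookkeeping),
the notifications now happen right at append time:

void Node::append_child(NonnullRefPtr<Node> child)
{
    link_child_into_tree(child);  // stand-in for the existing tree wiring
    child->inserted_into(*this);  // fired incrementally, during parsing
    children_changed();           // lets parents like <script>/<style> react
}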
I made some mistakes in the selector parsing code. It's now able to
parse selectors composed of multiple complex selectors, instead of just
one complex selector.
This currently returns a JS::Array of elements matching a selector.
The more correct behavior would be to return a static NodeList, but as
we don't have NodeLists right now, that'll be a task for the future.
This patch adds the Event base class, along with a MouseEvent subclass.
We now dispatch MouseEvent objects for mousedown, mouseup and mousemove,
and these objects have the .offsetX and .offsetY properties.
Both of those properties are hard-coded at the moment. This will be
fixed in the next patch. :^)
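On the C++ side, the shape is roughly a small class pair like this
(a sketch; member names are illustrative):

class Event : public RefCounted<Event> {
public:
    explicit Event(String type)
        : m_type(move(type))
    {
    }
    virtual ~Event() {}

    const String& type() const { return m_type; }

private:
    String m_type;
};

class MouseEvent final : public Event {
public:
    MouseEvent(String type, int offset_x, int offset_y)
        : Event(move(type))
        , m_offset_x(offset_x)
        , m_offset_y(offset_y)
    {
    }

    int offset_x() const { return m_offset_x; } // exposed to JS as .offsetX
    int offset_y() const { return m_offset_y; } // exposed to JS as .offsetY

private:
    int m_offset_x { 0 };
    int m_offset_y { 0 };
};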
This makes it possible to write shorter CSS. Instead of writing
.foo {
    border-width: 3px;
    border-style: solid;
    border-color: blue;
}
it is now possible to write
.foo {
    border: 3px solid blue;
}
The order of the values is irrelevant.
Currently only the basic values are supported. More values should be
added in the future.
Three more value-specific parse functions were added:
parse_line_width, parse_color, and parse_line_style.
Additionally, a few test cases were added to borders.html.
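To give an idea of how the shorthand expands (a simplified sketch; the
property IDs and setter are illustrative): each space-separated value is
classified on its own, which is what makes the order irrelevant.

static void expand_border_shorthand(StyleProperties& style, const Vector<StringView>& parts)
{
    for (auto& part : parts) { // e.g. "3px", "solid", "blue", in any order
        if (auto width = parse_line_width(part))
            style.set_property(CSS::PropertyID::BorderWidth, width.release_nonnull());
        else if (auto color = parse_color(part))
            style.set_property(CSS::PropertyID::BorderColor, color.release_nonnull());
        else if (auto line_style = parse_line_style(part))
            style.set_property(CSS::PropertyID::BorderStyle, line_style.release_nonnull());
    }
}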
This patch adds the EventTarget class and makes Node inherit from it.
You can register event listeners on an EventTarget, and when you call
dispatch_event() on it, the event listeners will get invoked.
An event listener is basically a wrapper around a JS::Function*.
This is pretty far from how DOM events should eventually work, but it's
a place to start and we'll build more on top of this. :^)
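In rough outline (simplified; member and helper names are illustrative):
listeners are remembered per event type, and dispatch_event() invokes
every matching listener's wrapped JS::Function through the interpreter.

class EventTarget {
public:
    void add_event_listener(String event_type, NonnullRefPtr<EventListener> listener)
    {
        m_listeners.append({ move(event_type), move(listener) });
    }

    void dispatch_event(NonnullRefPtr<Event> event)
    {
        for (auto& registration : m_listeners) {
            if (registration.event_type == event->type()) {
                // The EventListener is a thin wrapper around a JS::Function*,
                // so this ends up calling back into the JS interpreter.
                registration.listener->call(event);
            }
        }
    }

private:
    struct Registration {
        String event_type;
        NonnullRefPtr<EventListener> listener;
    };
    Vector<Registration> m_listeners;
};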