mirror of https://github.com/RGBCube/serenity synced 2025-05-17 22:15:07 +00:00
Commit graph

36 commits

Author SHA1 Message Date
Andreas Kling
e44c87cfff LibWeb: Implement enough HTML parsing to handle a small simple DOM :^)
We can now parse a little DOM like this:

<!DOCTYPE html>
<html>
    <head></head>
    <body>
        <div></div>
    </body>
</html>

This is pretty slow work, but the incremental progress is satisfying!
2020-05-24 00:49:22 +02:00
Andreas Kling
fd1b31d0ff LibWeb: Start building the tree building part of the new HTML parser
This patch adds a new HTMLDocumentParser class. It keeps a tokenizer
object internally and feeds itself with one token at a time from it.

The names and idioms in this class are expressed as closely to the
actual HTML parsing spec as possible, to make development as easy
and bug free as possible. :^)

This is going to become pretty large, but it's pretty cool!
2020-05-24 00:14:23 +02:00
Andreas Kling
e45c8b842c LibWeb: Implement a bit more of DOCTYPE tokenization 2020-05-23 21:08:25 +02:00
Andreas Kling
7be36366be LibWeb: Emit character/comment tokens lazily to accumulate more data
Instead of emitting data-bearing tokens immediately, do it lazily at
the next state change. This allows us to accumulate full bursts of
text in between tags instead of having one token per character. :^)
2020-05-23 18:44:32 +02:00
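A minimal sketch of the lazy-emission idea, assuming a much simpler tokenizer than the real one. The `Token` type and `tokenize()` function are hypothetical and only illustrate buffering text until the state changes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for illustration; not the real HTMLToken.
struct Token {
    enum class Type { Character, Tag };
    Type type;
    std::string data;
};

// Buffers text and emits it as ONE Character token when the tokenizer
// leaves the data state (here: when it sees '<' or reaches end of input).
std::vector<Token> tokenize(const std::string& input)
{
    std::vector<Token> tokens;
    std::string pending_text;

    auto flush_pending_text = [&] {
        if (!pending_text.empty()) {
            tokens.push_back({ Token::Type::Character, pending_text });
            pending_text.clear();
        }
    };

    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] == '<') {
            flush_pending_text(); // state change: emit the accumulated text
            std::size_t end = input.find('>', i);
            if (end == std::string::npos)
                end = input.size(); // unterminated tag: take the rest
            tokens.push_back({ Token::Type::Tag, input.substr(i + 1, end - i - 1) });
            i = end < input.size() ? end + 1 : end;
        } else {
            pending_text += input[i++]; // accumulate instead of emitting
        }
    }
    flush_pending_text();
    return tokens;
}
```

With this shape, `<b>hello</b>` yields three tokens (tag, text, tag) rather than one token per character.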
Andreas Kling
45450c7edc LibWeb: Make BEGIN_STATE and END_STATE include some {{{ and }}}
This makes it a compile error to omit the END_STATE. Also add some more
missing END_STATE's exposed by this (nice!)

Thanks to @predmond for suggesting the multi-pair trick! :^)
2020-05-23 15:25:43 +02:00
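A rough sketch of the brace trick, assuming macro bodies that are not copied from the actual source. `BEGIN_STATE` opens several braces and `END_STATE` closes them, so leaving out an `END_STATE` unbalances the braces and the compiler rejects the file. The `State` enum and `count_tag_opens()` are hypothetical:

```cpp
#include <cassert>

enum class State { Data, TagOpen };

#define BEGIN_STATE(state) \
    case State::state: {   \
        {                  \
            {

#define END_STATE \
            }     \
        }         \
    }             \
    break;

// Counts '<' characters seen while in the Data state.
int count_tag_opens(const char* input)
{
    State state = State::Data;
    int count = 0;
    for (const char* p = input; *p; ++p) {
        switch (state) {
            BEGIN_STATE(Data)
            if (*p == '<') {
                ++count;
                state = State::TagOpen;
            }
            END_STATE

            BEGIN_STATE(TagOpen)
            if (*p == '>')
                state = State::Data;
            END_STATE
        }
    }
    return count;
}
```

Deleting either `END_STATE` line above should make the translation unit fail to compile, which is the property the commit describes.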
Andreas Kling
2e4147d0fc LibWeb: Add missing END_STATE for TagName
Fixes #2339.
2020-05-23 10:33:23 +02:00
Andreas Kling
a58500fdc5 LibWeb: Teach HTMLTokenizer how to tokenize comments
We can now correctly tokenize the welcome.html test page. :^)
2020-05-23 01:54:26 +02:00
Andreas Kling
6caa5661f3 LibWeb: Teach HTMLTokenizer how to tokenize attributes
Properly tokenize single-quoted, double-quoted and unquoted attributes!
2020-05-23 01:22:15 +02:00
Andreas Kling
004ef9a86b LibWeb: Minor tweaks to HTMLToken declaration 2020-05-22 23:45:02 +02:00
Andreas Kling
272b35d2e1 LibWeb: Begin work on a spec-compliant HTML parser
In order to actually view the web as it is, we're gonna need a proper
HTML parser. So let's build one!

This patch introduces the Web::HTMLTokenizer class, which currently
operates on a StringView input stream where it fetches (ASCII only atm)
codepoints and tokenizes according to the HTML spec tokenization algo.

The tokenizer state machine looks a bit weird but is written in a way
that tries to mimic the spec as closely as possible, in order to make
development easier and bugs less likely.

This initial version is far from finished, but it can parse a trivial
document with a DOCTYPE and open/close tags. :^)
2020-05-22 21:46:13 +02:00
Sergey Bugaev
c00076de82 LibWeb: Update the CSS prefix to -libweb 2020-05-21 14:15:49 +02:00
Andreas Kling
25cfdf3f67 LibWeb: Parse &quot; into '"' 2020-05-21 12:27:08 +02:00
Hüseyin ASLITÜRK
241df7206e LibWeb: HTML Parser, handle html escaped characters
Convert HTML escaped (&#XXX;) characters to string.
2020-05-21 01:19:42 +02:00
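One way to decode `&#XXX;` sequences, as a hedged sketch. It assumes decimal references only and writes UTF-8. The function names are hypothetical, and the commit's actual code may work differently:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of a code point (surrogates are not rejected here).
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replaces decimal "&#NNN;" references; malformed ones are copied verbatim.
std::string decode_numeric_references(const std::string& input)
{
    std::string out;
    std::size_t i = 0;
    while (i < input.size()) {
        if (input.compare(i, 2, "&#") == 0) {
            std::size_t j = i + 2;
            std::uint32_t value = 0;
            while (j < input.size() && input[j] >= '0' && input[j] <= '9' && value <= 0x10FFFF) {
                value = value * 10 + static_cast<std::uint32_t>(input[j] - '0');
                ++j;
            }
            bool has_digits = j > i + 2;
            if (has_digits && j < input.size() && input[j] == ';' && value <= 0x10FFFF) {
                append_utf8(out, value);
                i = j + 1;
                continue;
            }
        }
        out += input[i++];
    }
    return out;
}
```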
Linus Groh
7bfd24ca76 LibWeb: Support the :root pseudo class 2020-05-14 08:49:51 +02:00
Linus Groh
2f29e61203 LibWeb: Make CSS pseudo classes case-insensitive 2020-05-14 08:49:51 +02:00
Linus Groh
cbd746e3ec LibWeb: Support "transparent" CSS color value 2020-05-13 19:25:49 +02:00
Linus Groh
57857cd8f6 LibWeb: Make parsing of most CSS values case-insensitive
These are all valid:

width: AUTO;
height: 10PX;
color: LiMeGrEeN;
2020-05-13 19:25:49 +02:00
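Case-insensitive matching of CSS keywords could look roughly like this. It is a sketch assuming ASCII-only folding; `to_ascii_lowercase()` and `matches_keyword()` are hypothetical names, not LibWeb functions:

```cpp
#include <cassert>
#include <string>

// Lowercases ASCII letters only; all other bytes pass through untouched.
std::string to_ascii_lowercase(std::string text)
{
    for (char& c : text) {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    return text;
}

// True if `value` spells `keyword`, ignoring ASCII case.
bool matches_keyword(const std::string& value, const std::string& keyword)
{
    return to_ascii_lowercase(value) == to_ascii_lowercase(keyword);
}
```

Under this sketch, `matches_keyword("AUTO", "auto")` and `matches_keyword("LiMeGrEeN", "limegreen")` both hold.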
Andreas Kling
5f9d80d8bc LibWeb: Add basic support for CSS percentages
Many properties can now have percentage values that get resolved in
layout. The reference value (what is this a percentage *of*?) differs
per property, so I've added a helper where you provide a reference
value as an added parameter to the existing length_or_fallback().
2020-05-11 23:07:30 +02:00
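The reference-value idea can be sketched as below. The `Length` struct and `to_px_or_fallback()` are hypothetical stand-ins. The commit only states that a reference value is passed alongside the existing `length_or_fallback()`, so the exact signature is an assumption:

```cpp
#include <cassert>

// Hypothetical length representation for illustration.
struct Length {
    enum class Type { Auto, Px, Percentage };
    Type type;
    float value;
};

// Resolves a length to pixels. Percentages are taken relative to
// `reference_px`, which the caller chooses per property.
float to_px_or_fallback(const Length& length, float reference_px, float fallback_px)
{
    switch (length.type) {
    case Length::Type::Px:
        return length.value;
    case Length::Type::Percentage:
        return reference_px * length.value / 100.0f;
    case Length::Type::Auto:
        break;
    }
    return fallback_px;
}
```

For example, a value of 50% resolves to 100 pixels against a 200 pixel reference.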
Linus Groh
673527d314 LibWeb: Ignore parsed pseudo-element selectors & empty complex selectors
Currently we don't deal with them, so they shouldn't return a
SimpleSelector - that'd be a false positive.

Also don't produce a ComplexSelector if no SimpleSelector was parsed.

This fixes a couple of rendering issues on awesomekling.github.io:
link colours, footer size, content max-width (and possibly more!)
2020-05-11 10:48:54 +02:00
Andreas Kling
8a40294f42 LibWeb: Turn some HTML entities into nicer text in the parser 2020-05-05 15:50:28 +02:00
Andreas Kling
6676f2c259 LibWeb: Don't emit a simple selector if nothing was consumed 2020-05-05 15:50:28 +02:00
Linus Groh
055e955a1c LibWeb: Recognise :focus pseudo-class
It's still only a dummy as LibWeb doesn't have focused elements yet, but
at least now we don't treat "selector:focus" as just "selector".

This fixes an issue on google.com which was mostly grey - coming from
some menu item focus styles :^)
2020-05-05 13:16:33 +02:00
Andreas Kling
e09b83c60c LibTextCodec: Start fleshing out a simple text codec library
We're starting with a very basic decoding API and only ISO-8859-1 and
UTF-8 decoding (and UTF-8 decoding is really a no-op since String is
expected to be UTF-8.)
2020-05-03 23:01:58 +02:00
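For reference, ISO-8859-1 to UTF-8 decoding is simple because every ISO-8859-1 byte maps to the Unicode code point with the same number. A minimal sketch, using `std::string` and a hypothetical function name instead of the library's real API:

```cpp
#include <cassert>
#include <string>

// Converts ISO-8859-1 bytes to UTF-8. Bytes below 0x80 are ASCII and are
// copied as-is; the rest become two-byte UTF-8 sequences.
std::string latin1_to_utf8(const std::string& input)
{
    std::string out;
    for (unsigned char byte : input) {
        if (byte < 0x80) {
            out += static_cast<char>(byte);
        } else {
            out += static_cast<char>(0xC0 | (byte >> 6));
            out += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return out;
}
```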
Andreas Kling
f3676ebef5 LibWeb: Handle iso-8859-1 web content a little bit better
We now look at the HTTP response headers for a Content-Type header and
try to parse it if present to find the text encoding.

If the text encoding is iso-8859-1, we turn all non-ASCII characters
into question marks. This makes Swedish Google load on my machine! :^)
2020-05-03 23:01:58 +02:00
Andreas Kling
88908be350 LibWeb: Parse <br/> into a self-closed br element
We were parsing "<br/>" as an open tag with the name "br/". This fixes
that specific scenario.

We also rename is_self_closing_tag() to is_void_element() to better fit
the specs.
2020-04-18 20:35:18 +02:00
Andreas Kling
7382b27c22 LibWeb: Add Origin concept (protocol, host, port tuple)
Every Document now has an Origin, found via Document::origin().
It's based on the URL of the document.

This will be used to implement things like the same-origin policy.
2020-04-07 23:01:45 +02:00
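An origin as a (protocol, host, port) tuple might be modelled like this. The struct layout and `is_same_origin()` are assumptions for illustration, not the actual class:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical origin tuple.
struct Origin {
    std::string protocol;
    std::string host;
    std::uint16_t port;
};

// Two origins match only if all three components are equal.
bool is_same_origin(const Origin& a, const Origin& b)
{
    return a.protocol == b.protocol && a.host == b.host && a.port == b.port;
}
```

A same-origin check would then compare the origins of two documents component by component.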
Andreas Kling
1d468ed6d3 AK: Stop allowing implicit downcast with RefPtr and NonnullRefPtr
We were allowing this dangerous kind of thing:

RefPtr<Base> base;
RefPtr<Derived> derived = base;

This patch changes the {Nonnull,}RefPtr constructors so this is no
longer possible.

To downcast one of these pointers, there is now static_ptr_cast<T>:

RefPtr<Derived> derived = static_ptr_cast<Derived>(base);

Fixing this exposed a ton of cowboy-downcasts in various places,
which we're now forced to fix. :^)
2020-04-05 11:19:00 +02:00
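The standard library follows the same rule, so it makes a convenient analogy. This sketch uses `std::shared_ptr` and `std::static_pointer_cast` rather than AK's `RefPtr` and `static_ptr_cast`; the `Base` and `Derived` types are placeholders:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct Base {
    virtual ~Base() = default;
};

struct Derived : Base {
    int value = 42;
};

// Upcasts remain implicit; downcasts do not compile without an explicit cast.
static_assert(std::is_convertible<std::shared_ptr<Derived>, std::shared_ptr<Base>>::value,
    "upcast is implicit");
static_assert(!std::is_convertible<std::shared_ptr<Base>, std::shared_ptr<Derived>>::value,
    "downcast must be explicit");

// The caller must guarantee that `base` really points at a Derived;
// a static cast performs no runtime check.
int read_derived_value(const std::shared_ptr<Base>& base)
{
    std::shared_ptr<Derived> derived = std::static_pointer_cast<Derived>(base);
    return derived->value;
}
```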
Andreas Kling
42f47da75d LibWeb: Treat '<' characters as part of the text inside <script>
When we encounter a '<' during HTML parsing, we now look ahead to see
if there is a full </script> coming, otherwise we treat it as text.

This makes it possible to use '<' in inline scripts. :^)
2020-04-04 21:01:58 +02:00
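The lookahead can be sketched as follows, assuming a lowercase closing tag with no whitespace inside it. `find_script_end()` is a hypothetical helper, not the parser's real code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index where the script text ends: the position of the first
// "</script>" at or after `start`, or input.size() if there is none.
// Any other '<' is treated as ordinary script text.
std::size_t find_script_end(const std::string& input, std::size_t start)
{
    static const std::string closing_tag = "</script>";
    for (std::size_t i = start; i < input.size(); ++i) {
        if (input[i] != '<')
            continue;
        // Look ahead: only a full closing tag ends the script text.
        if (input.compare(i, closing_tag.size(), closing_tag) == 0)
            return i;
    }
    return input.size();
}
```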
Andreas Kling
56ca91b9f8 LibWeb: Implement <script src> support for synchronous scripts
Scripts loaded in this way will block the parser until they finish
executing. This means that they see the DOM before the whole document
has been fully parsed. This is all normal, of course.

To make this work, I changed the way we notify DOM nodes about tree
insertion. The inserted_into() callbacks are now incrementally invoked
during parse, as each node is appended to its parent.

To accommodate inline scripts and inline style sheets, we now also have
a children_changed() callback which is invoked on any parent when it
has children added/removed.
2020-04-03 23:06:09 +02:00
Andreas Kling
06aec9667e LibWeb: Support more advanced selectors in document.querySelectorAll()
I made some mistakes in the selector parsing code. It's now able to
parse selectors composed of multiple complex selectors, instead of just
one complex selector.
2020-03-30 11:35:39 +02:00
Andreas Kling
0f7bcd4111 LibWeb: Add naive support for document.querySelectorAll()
This currently returns a JS::Array of elements matching a selector.
The more correct behavior would be to return a static NodeList, but as
we don't have NodeLists right now, that'll be a task for the future.
2020-03-30 11:35:39 +02:00
Andreas Kling
90a53b3520 LibWeb: Commit uncommitted text at the end of HTML parse
If there's any text left in the parse buffer at the end of HTML parsing
we now commit it as a Text node.
2020-03-25 18:50:10 +01:00
Andreas Kling
4dde36844b LibWeb: Add a DOM Event class (instead of events being simple strings)
This patch adds the Event base class, along with a MouseEvent subclass.
We now dispatch MouseEvent objects for mousedown, mouseup and mousemove
and these objects have the .offsetX and .offsetY properties.

Both of those properties are hard-coded at the moment. This will be
fixed in the next patch. :^)
2020-03-21 18:17:18 +01:00
myphs
0891f860f7 LibWeb: Add CSS property 'border'
This makes it possible to write shorter CSS. Instead of writing
.foo {
        border-width: 3px;
        border-style: solid;
        border-color: blue;
}
it is now possible to write
.foo {
        border: 3px solid blue;
}
while the order of values is irrelevant.
Currently only the basic values are supported. More values should be
added in the future.

Three more value specific parse functions were added:
parse_line_width, parse_color, and parse_line_style

Additionally a few test cases were added to borders.html.
2020-03-20 21:40:55 +01:00
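Order-independent shorthand parsing can be sketched by classifying each word on its own. The helper names echo the commit message, but the bodies below are guesses: widths are assumed to be `<number>px`, only a few line styles are listed, and any other word is assumed to be a color:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical result type.
struct Border {
    float width_px = 0;
    std::string style;
    std::string color;
};

// Accepts "<number>px" only (assumption).
bool parse_line_width(const std::string& word, float& width_px)
{
    if (word.size() < 3 || word.compare(word.size() - 2, 2, "px") != 0)
        return false;
    std::size_t used = 0;
    try {
        width_px = std::stof(word.substr(0, word.size() - 2), &used);
    } catch (...) {
        return false; // not a number
    }
    return used == word.size() - 2;
}

// Recognizes a small, illustrative subset of line styles.
bool parse_line_style(const std::string& word)
{
    return word == "solid" || word == "dotted" || word == "dashed" || word == "none";
}

// Classifies each whitespace-separated word, so the order does not matter.
Border parse_border(const std::string& value)
{
    Border border;
    std::istringstream words(value);
    std::string word;
    while (words >> word) {
        float width_px = 0;
        if (parse_line_width(word, width_px))
            border.width_px = width_px;
        else if (parse_line_style(word))
            border.style = word;
        else
            border.color = word; // assumption: anything else is a color
    }
    return border;
}
```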
Andreas Kling
f39e5352f0 LibWeb: Start working on DOM event support
This patch adds the EventTarget class and makes Node inherit from it.

You can register event listeners on an EventTarget, and when you call
dispatch_event() on it, the event listeners will get invoked.

An event listener is basically a wrapper around a JS::Function*.

This is pretty far from how DOM events should eventually work, but it's
a place to start and we'll build more on top of this. :^)
2020-03-18 17:13:22 +01:00
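A stripped-down sketch of the listener mechanism, using `std::function` in place of a wrapper around `JS::Function*` and plain strings as events. The method names follow the commit message, but the signatures are assumptions:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class EventTarget {
public:
    using Listener = std::function<void(const std::string& event_name)>;

    void add_event_listener(std::string event_name, Listener listener)
    {
        m_listeners.emplace_back(std::move(event_name), std::move(listener));
    }

    // Invokes every listener registered for `event_name`.
    void dispatch_event(const std::string& event_name)
    {
        // Iterate over a copy so listeners may register more listeners.
        auto listeners = m_listeners;
        for (auto& entry : listeners) {
            if (entry.first == event_name)
                entry.second(event_name);
        }
    }

private:
    std::vector<std::pair<std::string, Listener>> m_listeners;
};

// Every node is an event target.
class Node : public EventTarget {
};
```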
Andreas Kling
830a57c6b2 LibWeb: Rename directory LibHTML => LibWeb
Let's rename this to LibWeb since it aims to provide more parts of the
web platform than just HTML. :^)
2020-03-07 10:32:51 +01:00