Andreas Kling
03da686aa2
LibWeb: Ignore backslashes (\) in attribute selectors
...
This makes us at least parse selectors like [foo=bar\ baz] correctly.
The current solution here is quite hackish but the real fix will come
when we implement a spec-compliant CSS parser.
2020-06-10 15:50:07 +02:00
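As a rough illustration of the stopgap described in the commit above, the sketch below simply drops backslashes from an attribute selector value. The function name strip_backslashes is hypothetical and not taken from LibWeb, and a spec-compliant CSS parser would decode escapes rather than discard them.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not LibWeb's API): copy the value, skipping every
// backslash. This is the "hackish" approach, not real CSS escape handling.
std::string strip_backslashes(std::string_view value)
{
    std::string result;
    result.reserve(value.size());
    for (char ch : value) {
        if (ch != '\\')
            result.push_back(ch);
    }
    return result;
}
```

With this, the value in [foo=bar\ baz] would be compared as "bar baz".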
Andreas Kling
65c4e5cacf
LibWeb: Parse and match basic "contains" attribute selectors (~=)
2020-06-10 15:43:41 +02:00
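For context on the "contains" selector above: [attr~=value] matches when value is exactly one of the whitespace-separated words in the attribute. A minimal sketch follows, assuming a hypothetical helper named attribute_contains_word; it is not LibWeb's actual matching code.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not LibWeb's API): true if `value` is exactly one of
// the whitespace-separated words in `attribute`.
bool attribute_contains_word(std::string_view attribute, std::string_view value)
{
    constexpr std::string_view whitespace = " \t\n\f\r";
    // An empty value, or one containing whitespace, never matches.
    if (value.empty() || value.find_first_of(whitespace) != std::string_view::npos)
        return false;

    std::size_t pos = 0;
    while (pos < attribute.size()) {
        // Skip leading whitespace, then find the end of the current word.
        std::size_t start = attribute.find_first_not_of(whitespace, pos);
        if (start == std::string_view::npos)
            break;
        std::size_t end = attribute.find_first_of(whitespace, start);
        if (end == std::string_view::npos)
            end = attribute.size();
        if (attribute.substr(start, end - start) == value)
            return true;
        pos = end;
    }
    return false;
}
```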
Andreas Kling
e836f09094
LibWeb: Fix parser interpreting "&quot;" as "&quot"
...
There was a logic mistake in the entity parser that chose the shorter
matching entity instead of the longer. Fix this and make the entity
lists constexpr while we're here.
2020-06-10 10:34:28 +02:00
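The longest-match rule from the commit above can be sketched as follows. The table contents, the Entity struct, and the match_entity name are assumptions made for illustration; LibWeb's real entity lists are much larger and its function signatures may differ.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical, tiny entity table. Both the legacy form without a semicolon
// and the full form are listed, which is what makes longest-match necessary.
struct Entity {
    std::string_view name;
    std::uint32_t codepoint;
};

constexpr std::array<Entity, 4> entities { {
    { "&quot", 0x22 },
    { "&quot;", 0x22 },
    { "&amp", 0x26 },
    { "&amp;", 0x26 },
} };

// Returns the longest entity that is a prefix of `input`,
// or an empty optional if no entity matches.
std::optional<Entity> match_entity(std::string_view input)
{
    std::optional<Entity> best;
    for (auto const& entity : entities) {
        if (input.substr(0, entity.name.size()) != entity.name)
            continue;
        // Prefer the longer name so "&quot;" wins over "&quot".
        if (!best || entity.name.size() > best->name.size())
            best = entity;
    }
    return best;
}
```

Picking the shorter match instead would consume only "&quot" and leave the ";" behind as literal text.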
Andreas Kling
9b17bf3dcd
LibWeb: Use HTML::TagNames globals in the new HTML parser
2020-06-07 23:53:16 +02:00
Andreas Kling
1d94ca7cfc
LibWeb: Fix codepoint_from_entity() never returning an error
...
If we don't find a matching entity, return an empty Optional.
2020-06-07 19:13:56 +02:00
Andreas Kling
ab4c03ce2d
LibWeb: Fix tokenizer swallowing an extra token after a named entity
2020-06-07 19:09:03 +02:00
Andreas Kling
731685468a
LibWeb: Start fleshing out support for relative CSS units
...
This patch introduces support for more than just "absolute px" units in
our Length class. It now also supports "em" and "rem", which are units
relative to the font-size of the current layout node and the <html>
element's layout node respectively.
2020-06-07 17:55:46 +02:00
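A minimal sketch of how such relative units could resolve to pixels, assuming a simplified stand-in for the Length class. The member names and the to_px signature here are hypothetical and do not reflect LibWeb's actual interface.

```cpp
#include <cassert>

// Hypothetical, simplified length type (not LibWeb's real Length class).
struct Length {
    enum class Unit { Px, Em, Rem };

    double value { 0 };
    Unit unit { Unit::Px };

    // element_font_size_px: font-size of the current layout node.
    // root_font_size_px: font-size of the <html> element's layout node.
    double to_px(double element_font_size_px, double root_font_size_px) const
    {
        switch (unit) {
        case Unit::Px:
            return value;
        case Unit::Em:
            return value * element_font_size_px;
        case Unit::Rem:
            return value * root_font_size_px;
        }
        assert(false && "unhandled unit");
        return 0;
    }
};
```

For example, 2em on a node with a 10px font resolves to 20px, while 1.5rem with a 16px root font resolves to 24px regardless of the node's own font size.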
Andreas Kling
be6abce44f
LibWeb: Handle EOF tokens during "text" insertion
2020-06-06 16:36:18 +02:00
Luke
61d5bec739
LibWeb: Fully implement all script tokenizer states
...
Also fixes RAWTEXTLessThanSign having a separate emit and reconsume.
2020-06-06 09:55:15 +02:00
Andreas Kling
3337365000
LibWeb: Parse param/source/track start tags during "in body" insertion
2020-06-05 21:59:46 +02:00
Andreas Kling
b4591f0037
LibWeb: Fix parsing of "<textarea></textarea>"
...
When handling a "textarea" start tag, we have to ignore the next token
if it's an LF ('\n'). However, we were not switching the tokenizer
state before fetching the lookahead token, and this caused us to force
the tokenizer into the RCDATA state too late, effectively getting it
stuck in that state for way longer than it should be.
Fixes #2508.
2020-06-05 12:05:42 +02:00
Andreas Kling
4e71684a3a
LibWeb: Fix missing tokenizer state change in RCDATALessThanSign
...
We can't RECONSUME_IN after we've used EMIT_CHARACTER since we'll have
returned from the function.
2020-06-05 12:02:30 +02:00
Andreas Kling
b59f4632d5
LibWeb: Unbreak character reference and DOCTYPE parsing post-UTF-8
...
Oops, these were still using the byte-offset cursor. My goodness is it
unergonomic to index into UTF-8 strings, but Dr. Bugaev says it's good.
There is lots of room for improvement here. Just like the rest of the
tokenizer and parser. We'll have to do a few optimization passes over
them once they mature.
2020-06-04 22:09:36 +02:00
Andreas Kling
b6288163f1
LibWeb: Make the new HTML parser parse input as UTF-8
...
We already convert the input to UTF-8 before starting the tokenizer,
so all this patch had to do was switch the tokenizer to use a Utf8View
for its input (and to emit 32-bit codepoints.)
2020-06-04 21:12:17 +02:00
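To illustrate what emitting 32-bit codepoints from UTF-8 input involves, here is a small standalone decoder. It is only a sketch: decode_utf8 is a hypothetical name, the real tokenizer iterates an AK Utf8View rather than building a vector, and this version does not reject overlong encodings or surrogate codepoints.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into 32-bit codepoints.
// Malformed or truncated sequences produce U+FFFD and skip one byte.
std::vector<std::uint32_t> decode_utf8(std::string_view bytes)
{
    std::vector<std::uint32_t> codepoints;
    std::size_t i = 0;
    while (i < bytes.size()) {
        auto lead = static_cast<unsigned char>(bytes[i]);
        std::size_t length = 0;
        std::uint32_t codepoint = 0;
        if (lead < 0x80) {
            length = 1;
            codepoint = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            length = 2;
            codepoint = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3;
            codepoint = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4;
            codepoint = lead & 0x07;
        }

        // Validate that all continuation bytes exist and look like 10xxxxxx.
        bool valid = length != 0 && i + length <= bytes.size();
        for (std::size_t j = 1; valid && j < length; ++j) {
            auto next = static_cast<unsigned char>(bytes[i + j]);
            if ((next & 0xC0) != 0x80)
                valid = false;
            else
                codepoint = (codepoint << 6) | (next & 0x3F);
        }

        if (!valid) {
            codepoints.push_back(0xFFFD);
            ++i;
            continue;
        }
        codepoints.push_back(codepoint);
        i += length;
    }
    return codepoints;
}
```

The point relevant to the commit is that the cursor advances by a variable number of bytes per codepoint, which is why a plain byte-offset cursor no longer works.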
Andreas Kling
19190267a6
LibWeb: Fix incorrectly consumed characters after reference tokens
...
The NumericCharacterReferenceEnd tokenizer state should not advance
the input stream.
2020-06-04 16:49:21 +02:00
Andreas Kling
ca33bc7895
LibWeb: Fix tokenization of tags with empty attributes
...
We were neglecting to emit start tags for tags where the last attribute
had no value.
Also fix a parse error TODO that I hit while looking at this.
2020-06-04 12:00:09 +02:00
Kyle McLean
b9549078cc
LibWeb: Handle "html" end tag during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
a3bf3a5d68
LibWeb: Handle "xmp" start tag during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
c70bd0ba58
LibWeb: Handle "nobr" start tag during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
22521e57fd
LibWeb: Handle "form" end tag during "in body" if stack of open elements does not contain "template"
2020-06-04 09:09:33 +02:00
Kyle McLean
4edd0643a6
LibWeb: Handle NULL character during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
5e3972a946
LibWeb: Parse "body" end tags during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
1ad81e4833
LibWeb: Parse "br" end tags during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
9fca4b56d3
LibWeb: Parse end tags for "applet", "marquee", and "object" during "in body"
2020-06-04 09:09:33 +02:00
Andreas Kling
3c2fbc825c
LibWeb: Call children_changed() on text nodes when flushing characters
...
Now that we flush characters in a single place, we can call the Text's
children_changed() from there instead of having a goofy targeted hack
for <style> elements. :^)
2020-06-03 22:13:29 +02:00
Andreas Kling
c40de9275a
LibWeb: Buffer text node character insertions in the new parser
...
Instead of appending character-at-a-time, we now buffer character
insertions in a StringBuilder, and flush them to the relevant node
whenever we start inserting into a new node (and when parsing ends.)
2020-06-03 21:53:08 +02:00
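The buffering scheme described above might look roughly like this. TextNode and CharacterInsertionBuffer are hypothetical names, and std::string stands in for AK's StringBuilder; the real parser's bookkeeping is likely more involved.

```cpp
#include <string>

// Hypothetical stand-in for a DOM text node.
struct TextNode {
    std::string data;
    int flush_count = 0; // counts how many times buffered text was appended
};

// Hypothetical buffer: collects characters for one target node and appends
// them in a single operation when the target changes or parsing ends.
class CharacterInsertionBuffer {
public:
    void insert_character(TextNode& target, char ch)
    {
        // Switching to a different node flushes what was pending for the old one.
        if (m_target != &target) {
            flush();
            m_target = &target;
        }
        m_pending.push_back(ch);
    }

    // Also called once when parsing ends.
    void flush()
    {
        if (m_target && !m_pending.empty()) {
            m_target->data += m_pending;
            ++m_target->flush_count;
        }
        m_pending.clear();
    }

private:
    TextNode* m_target = nullptr;
    std::string m_pending;
};
```

Having a single flush point is also what lets later work hook per-node notifications there instead of reacting to every character.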
Andreas Kling
a3936f10eb
LibWeb: Fix tokenizing scripts with '<' in them
...
The EMIT_CHARACTER_AND_RECONSUME_IN was emitting the current token
instead of the specified codepoint.
2020-06-02 14:27:53 +02:00
Andreas Kling
410fa5abe0
LibWeb: Parse barebones document without doctype, <html>, etc.
...
Last night I tried making a little test page that had a bunch of <img>
elements and nothing else. It didn't work.
Fix this by correctly adding a synthesized <html> element to the
document if we get something else in the "before html" insertion mode.
2020-06-02 08:50:33 +02:00
Andreas Kling
e5ddb76a67
LibWeb: Support "td" and "th" start tags during "in table body"
...
This makes it possible to load Google Image Search results. You can't
see the images yet, but it's still something. :^)
2020-06-01 22:09:09 +02:00
Andreas Kling
77a3710e9d
LibWeb: Tokenize "anything else" in CommentLessThanSignBangDashDash
2020-06-01 20:14:23 +02:00
Andreas Kling
8766e49a7c
LibWeb+Browser: Use the new HTML parser by default
...
You can still run the old parser with "br -O", but the new one is good
enough to be the default parser now. We'll fix issues as we go and
eventually remove the old one completely. :^)
2020-06-01 19:08:31 +02:00
Andreas Kling
db93db8100
LibWeb: Put whining about tokenizer errors behind an #ifdef
...
Real web content has *tons* of tokenizer errors and we don't need to
complain every time as that makes the debug log unbearable.
2020-06-01 18:46:11 +02:00
Andreas Kling
5944abf31c
LibWeb: More parser cases in the "in body" and "after after body" modes
2020-06-01 18:46:11 +02:00
Andreas Kling
a775c2c717
LibWeb: Handle more cases in the SelfClosingStartTag tokenizer state
2020-06-01 18:46:11 +02:00
Andreas Kling
8429551368
LibWeb: Implement more of the "after head" insertion mode
2020-06-01 18:46:11 +02:00
Andreas Kling
f3b09ddd8e
LibWeb: Implement more of the ScriptDataEndTagName tokenizer state
...
Some of this is extremely repetitive. We'll need to rethink how we
do queue/emit to improve this.
2020-05-30 23:00:35 +02:00
Andreas Kling
d058addd74
LibWeb: Handle "dd" and "dt" end tags during "in body"
2020-05-30 23:00:35 +02:00
Andreas Kling
ca6fbefbc9
LibWeb: Support parsing "select" elements (outside of tables)
2020-05-30 19:58:52 +02:00
Andreas Kling
60352c7b9b
LibWeb: Hack the parser to dodge <template> elements in <head> for now
2020-05-30 19:23:04 +02:00
Andreas Kling
1212485348
LibWeb: Fix typo in StackOfOpenElements::topmost_special_node_below()
...
Backwards iteration works better if we actually go backwards! :^)
2020-05-30 18:49:48 +02:00
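A sketch of a backwards walk over a stack of open elements, assuming index 0 is the first element pushed and "below" means later in the vector. OpenElement and topmost_special_index_below are hypothetical; the actual StackOfOpenElements method operates on DOM elements and may be structured differently.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an entry in the stack of open elements.
struct OpenElement {
    std::string tag_name;
    bool is_special;
};

// Walks from the most recently pushed element back toward `formatting_index`
// and returns the index of the special element closest to it, if any.
std::optional<std::size_t> topmost_special_index_below(
    std::vector<OpenElement> const& stack, std::size_t formatting_index)
{
    std::optional<std::size_t> found;
    // `i-- > ...` tests the old value and then decrements, so the body sees
    // indices size() - 1 down to formatting_index + 1. The index must go down.
    for (std::size_t i = stack.size(); i-- > formatting_index + 1;) {
        if (stack[i].is_special)
            found = i;
    }
    return found;
}
```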
Andreas Kling
ca23db10ef
LibWeb: Don't crash when encountering <svg> or <math> elements
...
Just treat them like unknown elements for now. :^)
2020-05-30 18:46:39 +02:00
Andreas Kling
756829555a
LibWeb: Parse "textarea" tags during the "in body" insertion mode
...
Had to handle some more cases in the tokenizer to support this.
2020-05-30 18:40:23 +02:00
Andreas Kling
f4778d1ba0
LibWeb: Add missing special tag case in the "in body" insertion mode
2020-05-30 18:26:44 +02:00
Andreas Kling
5818ef2c80
LibWeb: Implement more table-related insertion modes
2020-05-30 18:26:44 +02:00
Andreas Kling
8c96b8174b
LibWeb: Handle AAA situation where there's no formatting element found
...
In this case, we're supposed to return from the AAA and then jump to a
different behavior in the "in body" insertion mode. So now we do that.
2020-05-30 17:47:50 +02:00
Andreas Kling
c9dd459822
LibWeb: Implement some more RAWTEXT stuff in the tokenizer
2020-05-30 17:47:50 +02:00
TheDumpap
d92c9d3772
LibWeb: Implement more of the tokenizer states
...
Slowly adding more unimplemented options for tokenizer states.
2020-05-30 17:47:50 +02:00
Andreas Kling
f662b1ea37
LibWeb: Implement enough parsing to parse the HTML spec front page :^)
...
We can now actually open http://html.spec.whatwg.org/ in Browser.
2020-05-30 13:07:47 +02:00
Andreas Kling
770372ad02
LibWeb: Handle end-of-file token during "in body" insertion mode
2020-05-30 12:40:12 +02:00
Andreas Kling
368044eabd
LibWeb: Flesh out the "in head" insertion mode and add missing cases
2020-05-30 12:28:12 +02:00