Andreas Kling
d058addd74
LibWeb: Handle "dd" and "dt" end tags during "in body"
2020-05-30 23:00:35 +02:00
Andreas Kling
ca6fbefbc9
LibWeb: Support parsing "select" elements (outside of tables)
2020-05-30 19:58:52 +02:00
Andreas Kling
60352c7b9b
LibWeb: Hack the parser to dodge <template> elements in <head> for now
2020-05-30 19:23:04 +02:00
Andreas Kling
1212485348
LibWeb: Fix typo in StackOfOpenElements::topmost_special_node_below()
...
Backwards iteration works better if we actually go backwards! :^)
2020-05-30 18:49:48 +02:00
Andreas Kling
ca23db10ef
LibWeb: Don't crash when encountering <svg> or <math> elements
...
Just treat them like unknown elements for now. :^)
2020-05-30 18:46:39 +02:00
Andreas Kling
756829555a
LibWeb: Parse "textarea" tags during the "in body" insertion mode
...
Had to handle some more cases in the tokenizer to support this.
2020-05-30 18:40:23 +02:00
Andreas Kling
f4778d1ba0
LibWeb: Add missing special tag case in the "in body" insertion mode
2020-05-30 18:26:44 +02:00
Andreas Kling
5818ef2c80
LibWeb: Implement more table-related insertion modes
2020-05-30 18:26:44 +02:00
Andreas Kling
8c96b8174b
LibWeb: Handle AAA situation where there's no formatting element found
...
In this case, we're supposed to return from the AAA and then jump to a
different behavior in the "in body" insertion mode. So now we do that.
2020-05-30 17:47:50 +02:00
Andreas Kling
c9dd459822
LibWeb: Implement some more RAWTEXT stuff in the tokenizer
2020-05-30 17:47:50 +02:00
TheDumpap
d92c9d3772
LibWeb: Implement more of the tokenizer states
...
Slowly adding more unimplemented options for tokenizer states.
2020-05-30 17:47:50 +02:00
Andreas Kling
f662b1ea37
LibWeb: Implement enough parsing to parse the HTML spec front page :^)
...
We can now actually open http://html.spec.whatwg.org/ in Browser.
2020-05-30 13:07:47 +02:00
Andreas Kling
770372ad02
LibWeb: Handle end-of-file token during "in body" insertion mode
2020-05-30 12:40:12 +02:00
Andreas Kling
368044eabd
LibWeb: Flesh out the "in head" insertion mode and add missing cases
2020-05-30 12:28:12 +02:00
Andreas Kling
e82226f3fb
LibWeb: Handle two kinds of deferred script executions
...
This patch adds two script lists to Document:
- Scripts to execute when parsing has finished
- Scripts to execute as soon as possible
Since we don't actually load scripts asynchronously yet (we just do a
synchronous load when parsing the <script> element for simplicity),
these are already loaded by the time we get to "The end" of parsing.
2020-05-30 12:26:15 +02:00
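A minimal sketch of the two script lists described in the commit above. DocumentSketch, PendingScript and run_the_end() are hypothetical names, not LibWeb's actual API, and the order used here (finished-parsing scripts first, then as-soon-as-possible scripts) is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a <script> element whose source is already loaded.
struct PendingScript {
    std::string source_name;
};

// Hypothetical, simplified document that owns the two lists.
class DocumentSketch {
public:
    void add_script_to_execute_when_parsing_has_finished(PendingScript script)
    {
        m_when_parsing_has_finished.push_back(std::move(script));
    }

    void add_script_to_execute_as_soon_as_possible(PendingScript script)
    {
        m_as_soon_as_possible.push_back(std::move(script));
    }

    // Each "take" hands the list to the caller and leaves it empty.
    std::vector<PendingScript> take_scripts_to_execute_when_parsing_has_finished()
    {
        return std::exchange(m_when_parsing_has_finished, {});
    }

    std::vector<PendingScript> take_scripts_to_execute_as_soon_as_possible()
    {
        return std::exchange(m_as_soon_as_possible, {});
    }

private:
    std::vector<PendingScript> m_when_parsing_has_finished;
    std::vector<PendingScript> m_as_soon_as_possible;
};

// Called once the parser reaches "The end". Because loading is synchronous in
// this sketch, every script is ready to run; we only record the order.
std::vector<std::string> run_the_end(DocumentSketch& document)
{
    std::vector<std::string> executed;
    for (auto& script : document.take_scripts_to_execute_when_parsing_has_finished())
        executed.push_back(script.source_name);
    for (auto& script : document.take_scripts_to_execute_as_soon_as_possible())
        executed.push_back(script.source_name);
    return executed;
}
```

Draining each list through a "take" helper means a second call to run_the_end() finds nothing left to execute.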
Andreas Kling
62885b5646
LibWeb: Fix accidental swallow of self-closing tag tokens
...
Instead of dropping self-closing tags on the floor, we now emit them
into the token stream. :^)
2020-05-30 11:31:49 +02:00
Andreas Kling
fbd52047bb
LibWeb: Parse "form" tags during the "in body" insertion mode
2020-05-30 11:31:49 +02:00
Andreas Kling
851a0f983a
LibWeb: Tokenizing a semicolon-less HTML entity is (just a) parse error
...
No need to blow chunks over this.
2020-05-30 11:31:49 +02:00
Andreas Kling
b9d5d45eff
LibWeb: Handle an error condition for "a" start tag during "in body"
...
If we have an <a> element on the list of active formatting elements
when hitting another "a" start tag, that's a parse error. Recover by
using the AAA.
2020-05-30 11:31:49 +02:00
Andreas Kling
c8e0426ab9
LibWeb: Parser should prefer the longest matchable HTML entity
...
If we can match both "&copy" and "&copy;" we should prefer the latter.
Also remove invalid FIXMEs about case-insensitive entities.
2020-05-30 11:31:49 +02:00
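The longest-match rule from the commit above can be sketched as follows. The function name longest_matching_entity() and the tiny sample table are assumptions made for illustration; the tokenizer's real entity table and lookup code are not shown in this log.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the longest entity name from a small sample table that `input`
// starts with, or an empty string if no entry matches.
std::string longest_matching_entity(const std::string& input)
{
    // Illustrative subset only; the full HTML entity table is much larger.
    static const std::vector<std::string> sample_entities = {
        "&amp", "&amp;", "&copy", "&copy;", "&not", "&not;", "&notin;",
    };

    std::string best;
    for (const auto& name : sample_entities) {
        // compare() clamps the range, so a short input simply fails to match.
        bool matches = input.compare(0, name.size(), name) == 0;
        if (matches && name.size() > best.size())
            best = name;
    }
    return best;
}
```

With this table, input beginning with "&notin;" matches both "&not" and "&notin;", and the longer name is the one returned.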
Andreas Kling
1ef5d609d9
AK+LibC: Add TODO() as an alternative to ASSERT_NOT_REACHED()
...
I've been using this in the new HTML parser and it makes it much easier
to understand the state of unfinished code branches.
TODO() is for places where it's okay to end up but we need to implement
something there.
ASSERT_NOT_REACHED() is for places where it's not okay to end up, and
something has gone wrong.
2020-05-30 11:31:49 +02:00
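A rough sketch of the distinction drawn in the commit above. SKETCH_TODO(), SKETCH_ASSERT_NOT_REACHED(), report_and_abort() and tokens_consumed_by() are hypothetical stand-ins written for this example; the actual AK and LibC macros may be defined quite differently.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Builds the diagnostic line that is printed before aborting.
std::string failure_message(const char* kind, const char* file, int line)
{
    return std::string(kind) + " at " + file + ":" + std::to_string(line);
}

[[noreturn]] void report_and_abort(const char* kind, const char* file, int line)
{
    std::fprintf(stderr, "%s\n", failure_message(kind, file, line).c_str());
    std::abort();
}

// Both macros abort; they differ only in what they tell the reader.
#define SKETCH_TODO() report_and_abort("TODO", __FILE__, __LINE__)
#define SKETCH_ASSERT_NOT_REACHED() report_and_abort("ASSERT_NOT_REACHED", __FILE__, __LINE__)

enum class TokenType { Character, Comment, EndOfFile };

int tokens_consumed_by(TokenType type)
{
    switch (type) {
    case TokenType::Character:
        return 1;
    case TokenType::Comment:
        // Legitimate input, but this branch has not been implemented yet.
        SKETCH_TODO();
    case TokenType::EndOfFile:
        return 0;
    }
    // Every enumerator is handled above, so getting here means a bug.
    SKETCH_ASSERT_NOT_REACHED();
}
```

A crash in the Comment branch reads as "unfinished work", while a crash after the switch reads as "something has gone wrong".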
Andreas Kling
cfbd95f42a
LibWeb: Turn a bunch of ASSERT_NOT_REACHED() in the parser into TODO()
2020-05-30 11:31:49 +02:00
Andreas Kling
6854f726ce
LibWeb: Improve support for "a" and "li" during "in body" insertion
...
We can now parse welcome.html once again, without resorting to hacks
or fallbacks during "in body" :^)
2020-05-30 11:31:49 +02:00
Andreas Kling
30d64fccde
LibWeb: Parse "li" start tags in the "in body" insertion mode
2020-05-30 11:31:49 +02:00
Andreas Kling
2b1517f215
LibWeb: Add all branches from the parsing spec to "in body"
...
This makes us crash in TODO() more often, but it's better that we know
what's missing instead of incorrectly ending up on the fallback path.
2020-05-30 11:31:49 +02:00
Andreas Kling
68b1bdc234
LibWeb: Add a way to stop the new HTML parser
...
Some things are specced to "stop parsing", which basically just means
to stop fetching tokens and jump to "The end".
2020-05-28 18:55:18 +02:00
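One way to picture "stop parsing" as described in the commit above is a flag checked by the token loop. ParserSketch and its members are hypothetical, and treating an "EOF" token as the trigger is an assumption of this sketch rather than how the new parser is known to work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser that pulls tokens until it is told to stop.
class ParserSketch {
public:
    explicit ParserSketch(std::vector<std::string> tokens)
        : m_tokens(std::move(tokens))
    {
    }

    void run()
    {
        // Stop fetching tokens as soon as the flag is set.
        while (!m_stop_parsing && m_next < m_tokens.size())
            process(m_tokens[m_next++]);
        the_end();
    }

    std::size_t processed_count() const { return m_processed; }
    bool reached_the_end() const { return m_reached_the_end; }

private:
    void process(const std::string& token)
    {
        ++m_processed;
        // Assumption: an "EOF" token is what requests "stop parsing" here.
        if (token == "EOF")
            m_stop_parsing = true;
    }

    // Placeholder for the steps that run once parsing is over.
    void the_end() { m_reached_the_end = true; }

    std::vector<std::string> m_tokens;
    std::size_t m_next { 0 };
    std::size_t m_processed { 0 };
    bool m_stop_parsing { false };
    bool m_reached_the_end { false };
};
```

Tokens that follow the stopping token are never fetched, yet the_end() still runs exactly once.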
Andreas Kling
00b44ab148
LibWeb: Implement more of the "after body" insertion mode
2020-05-28 18:52:32 +02:00
Andreas Kling
cba5d59adc
LibWeb: Parse comments in the "in body" insertion mode
2020-05-28 18:46:39 +02:00
Andreas Kling
bb2f22577b
LibWeb: Implement a bunch more script-related tokenization states
2020-05-28 18:44:17 +02:00
Andreas Kling
4788bcd6f8
LibWeb: Add HTMLToken::make_character()
...
It's tedious to make character tokens manually all the time.
2020-05-28 18:43:52 +02:00
Andreas Kling
5f8cbe6a1b
LibWeb: Fix HTMLDocumentParser build
2020-05-28 18:20:55 +02:00
Andreas Kling
308cb69329
LibWeb: Remove a misplaced call to close_a_p_element() in "in body"
...
This should only be done for the corresponding start tags.
2020-05-28 18:18:20 +02:00
Andreas Kling
c84212aaba
LibWeb: Add a StackOfOpenElements helper for "popping until a tag name"
2020-05-28 18:18:20 +02:00
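A helper for "popping until a tag name" could look roughly like the sketch below. StackOfOpenElementsSketch stores plain tag-name strings instead of elements, and the method name is made up for this example; the real StackOfOpenElements interface is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stack that tracks only tag names.
class StackOfOpenElementsSketch {
public:
    void push(std::string tag_name) { m_elements.push_back(std::move(tag_name)); }

    std::size_t size() const { return m_elements.size(); }

    // Only valid while the stack is non-empty.
    const std::string& current_node() const { return m_elements.back(); }

    // Pops elements until one with the given tag name has itself been popped.
    void pop_until_an_element_with_tag_name_has_been_popped(const std::string& tag_name)
    {
        while (!m_elements.empty()) {
            std::string popped = std::move(m_elements.back());
            m_elements.pop_back();
            if (popped == tag_name)
                break;
        }
    }

private:
    std::vector<std::string> m_elements;
};
```

Note that the matching element is removed as well, so the current node afterwards is whatever sat directly beneath it.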
Andreas Kling
5e53c45113
LibWeb: Plumb content encoding into the new HTML parser
...
We still don't handle non-ASCII input correctly, but at least now we'll
convert e.g. ISO-8859-1 to UTF-8 before starting to tokenize.
This patch also makes "view source" work with the new parser. :^)
2020-05-28 12:35:19 +02:00
Andreas Kling
772b51038e
LibWeb: Parse "input" tags during the "in body" insertion mode
2020-05-28 12:19:18 +02:00
Andreas Kling
7aa7a2078f
LibWeb: Parse "td" start tags during "in cell" insertion mode
2020-05-28 11:46:08 +02:00
Andreas Kling
5c35f3c9ba
LibWeb: Support named character references (e.g. "&amp;")
2020-05-28 11:44:19 +02:00
Andreas Kling
ebb1649a52
LibWeb: Implement more table support in the new HTML parser
...
This is enough to parse the Google front page! (Note: I did have to
hack the tokenizer while parsing Google, in order to avoid named
character references screwing everything up. We'll fix that too soon
enough!)
2020-05-28 00:27:46 +02:00
Andreas Kling
7f18c51f4c
LibWeb: Flesh out "reset the insertion mode appropriately" algorithm
2020-05-28 00:27:00 +02:00
Andreas Kling
2a97127faa
LibWeb: Handle various self-closing tags during "in body" insertion
...
We can now parse self-closing "<img>" tags correctly! :^)
2020-05-28 00:25:56 +02:00
Andreas Kling
f69001339f
LibWeb: Handle inline stylesheets a bit better in the new parser
...
While we're still supporting both the old and the new parser, we have
to deal with the way they load inline stylesheets (and scripts) a bit
differently.
The old parser loads all the text content up front, and then notifies
the containing element. The new parser creates the containing element
up front and appends text inside it afterwards.
For now, we simply do an empty "children_changed" notification when
first inserting a text node inside an element. This at least prevents
the CSS parser from choking on a single-character stylesheet.
2020-05-28 00:23:34 +02:00
Andreas Kling
3ce1af27dc
LibWeb: Parse documents without DOCTYPE gracefully
...
Seems like SOMEONE forgot to put a <!DOCTYPE html> on serenityos.org..
No matter, now we can handle it in the new parser! :^)
2020-05-28 00:22:08 +02:00
Andreas Kling
d25ffd3ed8
LibWeb: Fire a DOMContentLoaded event when the new parser is finished
...
With this change, we can finally load and render welcome.html :^)
2020-05-27 23:32:50 +02:00
Andreas Kling
db6cf9b37d
LibWeb: Implement the first half of the Adoption Agency Algorithm
...
The AAA is a somewhat daunting algorithm you have to run for certain
tags when they are inserted inside the <body> element. Its purpose is to
resolve issues with mismatched tags.
This patch implements the first half of the AAA. We also move the
"list of active formatting elements" to its own class, since it kept
accumulating little behaviors. "Marker" entries are now signified by
null Element pointers in the list.
2020-05-27 23:22:42 +02:00
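The idea of signifying "marker" entries with null Element pointers, as described in the commit above, might be sketched like this. ElementSketch, ActiveFormattingElementsSketch and the method names are hypothetical and only illustrate the idea; they are not LibWeb's actual class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM element.
struct ElementSketch {
    std::string tag_name;
};

class ActiveFormattingElementsSketch {
public:
    void add(ElementSketch& element) { m_entries.push_back(&element); }

    // A marker is stored as a null pointer.
    void add_marker() { m_entries.push_back(nullptr); }

    std::size_t size() const { return m_entries.size(); }

    // Searches from the end of the list and gives up at the nearest marker.
    ElementSketch* last_element_with_tag_name_before_marker(const std::string& tag_name) const
    {
        for (auto it = m_entries.rbegin(); it != m_entries.rend(); ++it) {
            if (*it == nullptr)
                return nullptr;
            if ((*it)->tag_name == tag_name)
                return *it;
        }
        return nullptr;
    }

    // Removes entries from the end, up to and including the last marker.
    void clear_up_to_the_last_marker()
    {
        while (!m_entries.empty()) {
            ElementSketch* entry = m_entries.back();
            m_entries.pop_back();
            if (entry == nullptr)
                break;
        }
    }

private:
    std::vector<ElementSketch*> m_entries;
};
```

Using a null pointer avoids a separate entry type, at the cost of every traversal having to check for null before touching the element.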
Andreas Kling
4c9c6b3a7b
LibWeb: Bring up basic external script execution in the new parser
...
This only works in some narrow cases, but should be enough for our own
welcome.html at least. :^)
2020-05-27 23:02:03 +02:00
Andreas Kling
39b5494aeb
LibWeb: Implement the "after attribute name" tokenizer state
...
One little step at a time towards parsing the monster blob of HTML we
get from twitter.com :^)
2020-05-27 18:30:29 +02:00
Andreas Kling
1b0c39ca60
LibWeb: Handle more benign parse errors in the "in body" insertion mode
2020-05-27 18:30:29 +02:00
Andreas Kling
1de29e3f59
LibWeb: Implement the "self closing start tag" tokenizer state
2020-05-27 18:30:29 +02:00
Andreas Kling
a5ce09f8e3
LibWeb: Implement partial support for numeric character references
2020-05-27 18:30:27 +02:00
TheDumpap
c700a30ce8
LibWeb: Handle additional parser inputs in "initial" and "before html".
2020-05-27 11:10:54 +02:00