mirror of https://github.com/RGBCube/serenity synced 2025-05-28 11:05:09 +00:00

185 commits

Author SHA1 Message Date
Luke
201cc1bfcc LibWeb: Assert we're parsing a fragment on fragment cases
The specification says that parts labelled as a "fragment case" will
only occur when parsing a fragment. It says that if it occurs when
not parsing a fragment, then it is a specification error.

We should probably assume at this point that it's an implementation
error. This fixes a few little mistakes that were caught out by this.

Also moves the context element outside insertion mode reset,
as other (unimplemented) parts refer to it, such as
"adjusted current node".

Also cleans up insertion mode reset.
2020-07-22 00:02:40 +02:00
Andreas Kling
685e006e27 LibWeb: Use "namespace Web::Foo {" since C++20 allows it :^)
Thanks @nico for teaching me about this!
2020-07-21 16:23:08 +02:00
Luke
19d6884529 LibWeb: Implement quirks mode detection
This allows us to determine which mode to render the page in.

Exposes "doctype" and "compatMode" on Document.
Exposes "name", "publicId" and "systemId" on DocumentType.
2020-07-21 01:08:32 +02:00
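Note on 19d6884529: a minimal sketch of how the exposed "compatMode" value could follow from the detected mode. The `QuirksMode` enum and the `compat_mode()` helper are hypothetical names for illustration, not LibWeb's actual API; the two return strings are the ones the DOM standard defines for Document.compatMode.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical enum: the three rendering modes a document can end up in.
enum class QuirksMode {
    No,
    Limited,
    Yes,
};

// Document.compatMode reports "BackCompat" only for full quirks mode;
// both no-quirks and limited-quirks documents report "CSS1Compat".
std::string_view compat_mode(QuirksMode mode)
{
    switch (mode) {
    case QuirksMode::Yes:
        return "BackCompat";
    case QuirksMode::No:
    case QuirksMode::Limited:
        return "CSS1Compat";
    }
    assert(false && "unknown QuirksMode value");
    return "CSS1Compat";
}
```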
Nico Weber
e9d18e35d6 LibWeb: Move "Stop parsing!" behind PARSER_DEBUG
This makes SerenityOS's IRC client a lot less chatty.
2020-07-06 17:03:26 +02:00
Luke
2df69317f1 LibWeb: Implement almost all missing tokenizer cases 2020-06-28 16:56:26 +02:00
Andreas Kling
7d3c8d066f LibWeb: Support "pt" length units :^) 2020-06-28 15:25:32 +02:00
Andreas Kling
38d6cc8598 LibWeb: Convert uppercase selector tag names to lowercase internally
This is necessary for some older content to work correctly. There's
probably a nicer (and correct-er) way to do this. Deferring to the
new CSS parser.
2020-06-28 12:58:04 +02:00
Andreas Kling
9e642827fc LibWeb: Don't tolerate unit-less lengths (except 0) in standards mode
"width: 500" is not a valid CSS property in standards mode and should
be ignored.

To plumb the quirks-mode flag into CSS parsing, this patch adds a new
CSS::ParsingContext object that must be passed to the CSS parser.
Currently it only allows you to check the quirks-mode flag. In the
future it will be a good place to put additional information needed
for things like relative URL resolution, etc.

This narrows <div class=parser> on ACID2 to the correct width. :^)
2020-06-28 12:46:40 +02:00
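Note on 9e642827fc: the rule described above can be sketched roughly as follows. `ParsingContext` here is a hypothetical stand-in with a single flag, and `parse_length()` is an illustrative helper that only understands "px" and bare numbers; neither is the real LibWeb code.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical stand-in for the CSS::ParsingContext mentioned in the commit.
struct ParsingContext {
    bool in_quirks_mode { false };
};

// Returns the length in pixels, or an empty optional if the declaration
// should be ignored. Only "<number>px" and bare numbers are handled here.
std::optional<double> parse_length(const ParsingContext& context, const std::string& text)
{
    char* end = nullptr;
    double value = std::strtod(text.c_str(), &end);
    if (end == text.c_str())
        return std::nullopt; // no leading number at all

    std::string unit(end);
    if (unit == "px")
        return value;

    // A unit-less value is only accepted if it is 0, or if we are in quirks mode.
    if (unit.empty() && (value == 0.0 || context.in_quirks_mode))
        return value;

    return std::nullopt;
}
```

With this sketch, "width: 500" yields no value in standards mode but 500 in quirks mode, while "0" and "500px" are accepted in both.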
Kevin Meyer
22b20c381f LibWeb: Implement remaining missing tokenizer EOF cases 2020-06-27 13:27:10 +02:00
Andreas Kling
8e6522d034 LibWeb: Implement some missing tokenizer cases for EOF handling 2020-06-26 22:47:07 +02:00
theazgra
6a401a9bde LibWeb: Remove duplicate if branch in fragment parsing.
I noticed in the video the duplicate `if` check. This commit removes
the duplicated branch.
2020-06-26 11:58:53 +02:00
Andreas Kling
6293d1a13c LibWeb+Browser: Remove old HTML parser :^)
The new parser is now used everywhere and it's working pretty well!
2020-06-26 00:53:25 +02:00
Andreas Kling
92d831c25b LibWeb: Implement fragment parsing and use it for Element.innerHTML
This patch implements most of the HTML fragment parsing algorithm and
ports Element::set_inner_html() to it. This was the last remaining user
of the old HTML parser. :^)
2020-06-26 00:53:25 +02:00
Andreas Kling
3fefc7f3e9 LibWeb: Tweak CSS parser to swallow backslash-escaped characters
This isn't the correct way of doing this, but at least it allows the
parsing to progress a bit further in some cases.
2020-06-25 16:52:38 +02:00
Andreas Kling
4b2ac34725 LibWeb: Move the offset, margin and padding boxes into LayoutStyle 2020-06-24 18:06:21 +02:00
Andreas Kling
5744dd43c5 LibWeb: Remove default Length constructor and add make_auto()/make_px()
To prepare for adding an undefined/empty state for Length, let's first
move away from Length() creating an auto value.
2020-06-24 11:08:46 +02:00
Andreas Kling
d0312f6208 LibWeb: Handle empty inputs to the CSS parser
Empty inputs -> empty outputs.
2020-06-23 20:06:45 +02:00
Andreas Kling
3a5af6ef61 LibWeb: Remove hacky old ways of running <script> element contents
Now that we're using the new HTML parser, we don't have to do the weird
"run the script when inserted into the document, uhh, or when the text
content of the script element changes" dance.

Instead, we just follow the spec, and scripts run the way they should.
2020-06-23 16:45:01 +02:00
Andreas Kling
c33d17d363 LibWeb: Fix tokenization of attributes with URL query strings in them
<a href="/foo&amp=bar"> was being tokenized into <a href="/foo&=bar">.
The spec mentions this but I had overlooked it. The bug happens because
we interpreted the "&amp" as a named character reference.
2020-06-23 16:45:01 +02:00
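Note on c33d17d363: the spec rule in question says that, inside an attribute value, a named character reference that lacks the trailing ";" and is followed by "=" or an ASCII alphanumeric is left as plain text. A minimal sketch of that check, using a hypothetical helper name and parameters (the real tokenizer is structured differently):

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

// Returns true when the text matched as a named character reference
// (for example "&amp") must be emitted verbatim instead of being decoded.
// `next` is the input character following the match, if any.
bool should_leave_reference_undecoded(bool in_attribute_value, std::string_view matched, std::optional<char> next)
{
    assert(!matched.empty());
    if (!in_attribute_value)
        return false;
    if (matched.back() == ';')
        return false;
    if (!next.has_value())
        return false;
    return *next == '=' || std::isalnum(static_cast<unsigned char>(*next)) != 0;
}
```

For href="/foo&amp=bar" the match is "&amp" followed by "=", so the helper returns true and the attribute value keeps "&amp=bar" unchanged.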
Andreas Kling
07d976716f LibWeb: Remove most uses of the old HTML parser
The only remaining client of the old parser is the fragment parser used
by the Element.innerHTML setter. We'll need to implement a bit more
stuff in the new parser before we can switch that over.
2020-06-21 22:29:05 +02:00
Andreas Kling
dd7cd92de4 LibWeb: Fix two typo bugs in table parsing
These were flushed out by the earlier fix to "table scope". Without the
bad implementation of table scopes, ACID2 stopped parsing correctly.
2020-06-21 17:49:02 +02:00
Andreas Kling
15b5dfc794 LibWeb: A </table> inside <tbody> is not a parse error
This condition was backwards. Fixes parsing of google.com.
2020-06-21 17:42:00 +02:00
Andreas Kling
1c2b6b074e LibWeb: Fix misunderstood implementation of "table" and "select" scopes
These "stack of open elements" scopes are not supposed to include the
base list of element types.
2020-06-21 17:42:00 +02:00
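Note on 1c2b6b074e: a rough sketch of the "has an element in a specific scope" walk, showing that "table scope" uses its own short boundary list rather than extending the base list. Elements are represented by tag name only, the function names are illustrative, and the base list below omits the MathML and SVG entries for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>
#include <vector>

// Stack of open elements, represented by tag name; the last entry is the current node.
using ElementStack = std::vector<std::string_view>;

// Walk from the current node upwards: finding the target first is a match,
// hitting one of the boundary types first is a failure.
bool has_element_in_specific_scope(const ElementStack& open_elements, std::string_view target, const std::vector<std::string_view>& boundary_types)
{
    for (auto it = open_elements.rbegin(); it != open_elements.rend(); ++it) {
        if (*it == target)
            return true;
        if (std::find(boundary_types.begin(), boundary_types.end(), *it) != boundary_types.end())
            return false;
    }
    return false;
}

// Default scope: the base list (HTML entries only in this sketch).
bool has_element_in_scope(const ElementStack& open_elements, std::string_view target)
{
    static const std::vector<std::string_view> base_list {
        "applet", "caption", "html", "table", "td", "th", "marquee", "object", "template"
    };
    return has_element_in_specific_scope(open_elements, target, base_list);
}

// Table scope: only these three types, NOT the base list plus these.
bool has_element_in_table_scope(const ElementStack& open_elements, std::string_view target)
{
    static const std::vector<std::string_view> table_list { "html", "table", "template" };
    return has_element_in_specific_scope(open_elements, target, table_list);
}
```

With the stack html, body, table, tbody, tr, td, span, a search for "tr" succeeds in table scope but fails in the default scope, because "td" is a boundary only in the base list.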
Andreas Kling
966bc05fef LibWeb: Implement more of the foster parenting algorithm in the parser 2020-06-21 17:42:00 +02:00
stelar7
5eb39a5f61 LibWeb: Update parser with more insertion modes :^)
Implements handling of InHeadNoScript, InSelectInTable, InTemplate,
InFrameset, AfterFrameset, and AfterAfterFrameset.
2020-06-21 10:13:31 +02:00
Andreas Kling
6242e029ed LibWeb: Make Element::tag_name() return a const FlyString&
The more generic virtual variant is renamed to node_name() and now only
Element has tag_name(). This removes a huge amount of String ctor/dtor
churn in selector matching.
2020-06-16 19:09:14 +02:00
Andreas Kling
49cd03be95 LibWeb: Fix broken parsing of </form> during "in body" insertion 2020-06-15 20:31:19 +02:00
Andreas Kling
2f26d4c6a1 LibWeb: Fix broken parsing of </select> during "in select" insertion 2020-06-15 19:57:20 +02:00
Andreas Kling
17d26b92f8 LibWeb: Just ignore <script> elements that failed to load the script
We're never gonna be able to run them if we can't load them so just
let it go.
2020-06-15 18:37:48 +02:00
Luke
a01478c858 LibWeb: Fully implement HTML parser "in table" insertion mode
Also fixes some little mistakes in the "in body" insertion mode
that I found whilst cross-referencing.
2020-06-14 14:07:07 +02:00
Luke
6532c1e2fa LibWeb: Implement HTML parser "in column group" insertion mode 2020-06-14 14:07:07 +02:00
Luke
2241b09cd0 LibWeb: Implement HTML parser "in caption" insertion mode 2020-06-14 14:07:07 +02:00
Luke
a1838f676e LibWeb: Implement all CDATA tokenizer states
Even though we haven't implemented any switches to these states yet,
we may as well have them ready for when we do implement the switches.
2020-06-14 13:47:19 +02:00
Luke
821312729a LibWeb: Fully implement all DOCTYPE tokenizer states
Also fixes TagOpen having a separate emit and reconsume in
ANYTHING_ELSE.
2020-06-14 13:47:19 +02:00
Luke
ab1df177d8 LibWeb: Fully implement all comment tokenizer states 2020-06-14 13:47:19 +02:00
Andreas Kling
47df0cbbc8 LibWeb: Fix broken tokenization of hexadecimal character references
We were interpreting 'A'-'F' as decimal digits which didn't work right.
2020-06-13 13:46:12 +02:00
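Note on 47df0cbbc8: a small sketch of the digit handling a hexadecimal character reference needs, where 'A'-'F' and 'a'-'f' map to 10-15 rather than being treated like '0'-'9'. The helper names are hypothetical, and rejecting out-of-range values is a simplification of this sketch (the spec substitutes a replacement character instead).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Maps a single hexadecimal digit to its value, or nothing if it is not one.
std::optional<unsigned> hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return static_cast<unsigned>(c - '0');
    if (c >= 'a' && c <= 'f')
        return static_cast<unsigned>(10 + (c - 'a'));
    if (c >= 'A' && c <= 'F')
        return static_cast<unsigned>(10 + (c - 'A'));
    return std::nullopt;
}

// `digits` is the text between "&#x" and ";", for example "1F600".
std::optional<std::uint32_t> parse_hex_reference_digits(std::string_view digits)
{
    if (digits.empty())
        return std::nullopt;
    std::uint32_t code_point = 0;
    for (char c : digits) {
        auto digit = hex_digit_value(c);
        if (!digit.has_value())
            return std::nullopt;
        assert(*digit < 16);
        code_point = code_point * 16 + *digit;
        if (code_point > 0x10FFFF)
            return std::nullopt; // simplification: outside the Unicode range
    }
    return code_point;
}
```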
Andreas Kling
483b371a7b LibWeb: Parse and match the :visited pseudo-class (always fails)
If we don't do this, something like "a:visited" is parsed as "a" which
may then take precedence over a previous "a:link" etc.
2020-06-13 00:23:30 +02:00
Andreas Kling
fdfda6dec2 AK: Make string-to-number conversion helpers return Optional
Get rid of the weird old signature:

- int StringType::to_int(bool& ok) const

And replace it with a sensible new signature:

- Optional<int> StringType::to_int() const
2020-06-12 21:28:55 +02:00
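Note on fdfda6dec2: the shape of the new signature can be illustrated with standard C++, using `std::optional` as a stand-in for AK's Optional and a free function instead of a member. This is only an analogy under those assumptions, not the AK implementation.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Returns the parsed value, or an empty optional if `text` is not entirely
// a valid int. There is no separate "bool& ok" out-parameter to forget.
std::optional<int> to_int(std::string_view text)
{
    int value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    auto result = std::from_chars(first, last, value);
    if (result.ec != std::errc() || result.ptr != last)
        return std::nullopt;
    return value;
}
```

A caller can then write `if (auto number = to_int(input)) { use(*number); }`, so failure handling and the value travel together.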
Andreas Kling
bd33bfd120 LibWeb: Whine about unrecognized CSS properties in debug log 2020-06-12 14:15:55 +02:00
Andreas Kling
03da686aa2 LibWeb: Ignore backslashes (\) in attribute selectors
This makes us at least parse selectors like [foo=bar\ baz] correctly.
The current solution here is quite hackish but the real fix will come
when we implement a spec-compliant CSS parser.
2020-06-10 15:50:07 +02:00
Andreas Kling
65c4e5cacf LibWeb: Parse and match basic "contains" attribute selectors (~=) 2020-06-10 15:43:41 +02:00
Andreas Kling
e836f09094 LibWeb: Fix parser interpreting "&quot;" as "&quot"
There was a logic mistake in the entity parser that chose the shorter
matching entity instead of the longer. Fix this and make the entity
lists constexpr while we're here.
2020-06-10 10:34:28 +02:00
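Note on e836f09094: a minimal sketch of longest-match entity lookup over a constexpr table. The table holds only a handful of real entity names for illustration, and the `Entity` struct and function name are hypothetical rather than LibWeb's own.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

struct Entity {
    std::string_view name; // text after the '&'
    char32_t code_point;
};

// Tiny excerpt; legacy names without ';' sit next to their ';' forms.
constexpr Entity entities[] = {
    { "amp", 0x26 },
    { "amp;", 0x26 },
    { "not", 0xAC },
    { "notin;", 0x2209 },
    { "quot", 0x22 },
    { "quot;", 0x22 },
};

// `input` is the text following '&'. Of all entities that prefix the input,
// the longest one wins, so "quot;" is preferred over "quot".
std::optional<Entity> longest_matching_entity(std::string_view input)
{
    std::optional<Entity> best;
    for (const auto& entity : entities) {
        if (input.substr(0, entity.name.size()) != entity.name)
            continue;
        if (!best.has_value() || entity.name.size() > best->name.size())
            best = entity;
    }
    return best;
}
```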
Andreas Kling
9b17bf3dcd LibWeb: Use HTML::TagNames globals in the new HTML parser 2020-06-07 23:53:16 +02:00
Andreas Kling
1d94ca7cfc LibWeb: Fix codepoint_from_entity() never returning an error
If we don't find a matching entity, return an empty Optional.
2020-06-07 19:13:56 +02:00
Andreas Kling
ab4c03ce2d LibWeb: Fix tokenizer swallowing an extra token after a named entity 2020-06-07 19:09:03 +02:00
Andreas Kling
731685468a LibWeb: Start fleshing out support for relative CSS units
This patch introduces support for more than just "absolute px" units in
our Length class. It now also supports "em" and "rem", which are units
relative to the font-size of the current layout node and the <html>
element's layout node respectively.
2020-06-07 17:55:46 +02:00
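Note on 731685468a: a simplified sketch of a length type that resolves "em" against the current node's font size and "rem" against the root element's font size. The class layout is an assumption for illustration; `make_em()` and `make_rem()` are hypothetical names, and the font sizes are passed in directly instead of being looked up from layout nodes.

```cpp
#include <cassert>

class Length {
public:
    enum class Type {
        Auto,
        Px,
        Em,
        Rem,
    };

    static Length make_auto() { return Length(Type::Auto, 0); }
    static Length make_px(float value) { return Length(Type::Px, value); }
    static Length make_em(float value) { return Length(Type::Em, value); }
    static Length make_rem(float value) { return Length(Type::Rem, value); }

    bool is_auto() const { return m_type == Type::Auto; }

    // font_size: font size of the current layout node, in px.
    // root_font_size: font size of the <html> element's layout node, in px.
    float to_px(float font_size, float root_font_size) const
    {
        switch (m_type) {
        case Type::Auto:
            return 0;
        case Type::Px:
            return m_value;
        case Type::Em:
            return m_value * font_size;
        case Type::Rem:
            return m_value * root_font_size;
        }
        assert(false && "unknown Length type");
        return 0;
    }

private:
    Length(Type type, float value)
        : m_type(type)
        , m_value(value)
    {
    }

    Type m_type;
    float m_value;
};
```

For a node with a 16px font inside a document whose root font size is 10px, "2em" resolves to 32px and "2rem" to 20px.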
Andreas Kling
be6abce44f LibWeb: Handle EOF tokens during "text" insertion 2020-06-06 16:36:18 +02:00
Luke
61d5bec739 LibWeb: Fully implement all script tokenizer states
Also fixes RAWTEXTLessThanSign having a separate emit and reconsume.
2020-06-06 09:55:15 +02:00
Andreas Kling
3337365000 LibWeb: Parse param/source/track start tags during "in body" insertion 2020-06-05 21:59:46 +02:00
Andreas Kling
b4591f0037 LibWeb: Fix parsing of "<textarea></textarea>"
When handling a "textarea" start tag, we have to ignore the next token
if it's an LF ('\n'). However, we were not switching the tokenizer
state before fetching the lookahead token, and this caused us to force
the tokenizer into the RCDATA state too late, effectively getting it
stuck in that state for way longer than it should be.

Fixes #2508.
2020-06-05 12:05:42 +02:00