serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-31 16:12:44 +00:00

Author	SHA1	Message	Date
Andreas Kling	483b371a7b	LibWeb: Parse and match the :visited pseudo-class (always fails) If we don't do this, something like "a:visited" is parsed as "a" which may then take precedence over a previous "a:link" etc.	2020-06-13 00:23:30 +02:00
Andreas Kling	fdfda6dec2	AK: Make string-to-number conversion helpers return Optional Get rid of the weird old signature: - int StringType::to_int(bool& ok) const And replace it with sensible new signature: - Optional<int> StringType::to_int() const	2020-06-12 21:28:55 +02:00
Andreas Kling	bd33bfd120	LibWeb: Whine about unrecognized CSS properties in debug log	2020-06-12 14:15:55 +02:00
Andreas Kling	03da686aa2	LibWeb: Ignore backslashes (\) in attribute selectors This makes us at least parse selectors like [foo=bar\ baz] correctly. The current solution here is quite hackish but the real fix will come when we implement a spec-compliant CSS parser.	2020-06-10 15:50:07 +02:00
Andreas Kling	65c4e5cacf	LibWeb: Parse and match basic "contains" attribute selectors (~=)	2020-06-10 15:43:41 +02:00
Andreas Kling	e836f09094	LibWeb: Fix parser interpreting "&quot;" as "&quot" There was a logic mistake in the entity parser that chose the shorter matching entity instead of the longer. Fix this and make the entity lists constexpr while we're here.	2020-06-10 10:34:28 +02:00
Andreas Kling	9b17bf3dcd	LibWeb: Use HTML::TagNames globals in the new HTML parser	2020-06-07 23:53:16 +02:00
Andreas Kling	1d94ca7cfc	LibWeb: Fix codepoint_from_entity() never returning an error If we don't find a matching entity, return an empty Optional.	2020-06-07 19:13:56 +02:00
Andreas Kling	ab4c03ce2d	LibWeb: Fix tokenizer swallowing an extra token after a named entity	2020-06-07 19:09:03 +02:00
Andreas Kling	731685468a	LibWeb: Start fleshing out support for relative CSS units This patch introduces support for more than just "absolute px" units in our Length class. It now also supports "em" and "rem", which are units relative to the font-size of the current layout node and the <html> element's layout node respectively.	2020-06-07 17:55:46 +02:00
Andreas Kling	be6abce44f	LibWeb: Handle EOF tokens during "text" insertion	2020-06-06 16:36:18 +02:00
Luke	61d5bec739	LibWeb: Fully implement all script tokenizer states Also fixes RAWTEXTLessThanSign having a separate emit and reconsume.	2020-06-06 09:55:15 +02:00
Andreas Kling	3337365000	LibWeb: Parse param/source/track start tags during "in body" insertion	2020-06-05 21:59:46 +02:00
Andreas Kling	b4591f0037	LibWeb: Fix parsing of "<textarea></textarea>" When handling a "textarea" start tag, we have to ignore the next token if it's an LF ('\n'). However, we were not switching the tokenizer state before fetching the lookahead token, and this caused us to force the tokenizer into the RCDATA state too late, effectively getting it stuck in that state for way longer than it should be. Fixes #2508.	2020-06-05 12:05:42 +02:00
Andreas Kling	4e71684a3a	LibWeb: Fix missing tokenizer state change in RCDATALessThanSign We can't RECONSUME_IN after we've used EMIT_CHARACTER since we'll have returned from the function.	2020-06-05 12:02:30 +02:00
Andreas Kling	b59f4632d5	LibWeb: Unbreak character reference and DOCTYPE parsing post-UTF-8 Oops, these were still using the byte-offset cursor. My goodness is it unergonomic to index into UTF-8 strings, but Dr. Bugaev says it's good. There is lots of room for improvement here. Just like the rest of the tokenizer and parser. We'll have to do a few optimization passes over them once they mature.	2020-06-04 22:09:36 +02:00
Andreas Kling	b6288163f1	LibWeb: Make the new HTML parser parse input as UTF-8 We already convert the input to UTF-8 before starting the tokenizer, so all this patch had to do was switch the tokenizer to use an Utf8View for its input (and to emit 32-bit codepoints.)	2020-06-04 21:12:17 +02:00
Andreas Kling	19190267a6	LibWeb: Fix incorrectly consumed characters after reference tokens The NumericCharacterReferenceEnd tokenizer state should not advance the input stream.	2020-06-04 16:49:21 +02:00
Andreas Kling	ca33bc7895	LibWeb: Fix tokenization of attributes with empty attributes We were neglecting to emit start tags for tags where the last attribute had no value. Also fix a parse error TODO that I hit while looking at this.	2020-06-04 12:00:09 +02:00
Kyle McLean	b9549078cc	LibWeb: Handle "html" end tag during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	a3bf3a5d68	LibWeb: Handle "xmp" start tag during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	c70bd0ba58	LibWeb: Handle "nobr" start tag during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	22521e57fd	LibWeb: Handle "form" end tag during "in body" if stack of open elements does not contain "template"	2020-06-04 09:09:33 +02:00
Kyle McLean	4edd0643a6	LibWeb: Handle NULL character during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	5e3972a946	LibWeb: Parse "body" end tags during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	1ad81e4833	LibWeb: Parse "br" end tags during "in body"	2020-06-04 09:09:33 +02:00
Kyle McLean	9fca4b56d3	LibWeb: Parse end tags for "applet", "marquee", and "object" during "in body"	2020-06-04 09:09:33 +02:00
Andreas Kling	3c2fbc825c	LibWeb: Call children_changed() on text nodes when flushing characters Now that we flush characters in a single place, we can call the Text's children_changed() from there instead of having a goofy targeted hack for <style> elements. :^)	2020-06-03 22:13:29 +02:00
Andreas Kling	c40de9275a	LibWeb: Buffer text node character insertions in the new parser Instead of appending character-at-a-time, we now buffer character insertions in a StringBuilder, and flush them to the relevant node whenever we start inserting into a new node (and when parsing ends.)	2020-06-03 21:53:08 +02:00
Andreas Kling	a3936f10eb	LibWeb: Fix tokenizing scripts with '<' in them The EMIT_CHARACTER_AND_RECONSUME_IN was emitting the current token instead of the specified codepoint.	2020-06-02 14:27:53 +02:00
Andreas Kling	410fa5abe0	LibWeb: Parse barebones document without doctype, <html>, etc. Last night I tried making a little test page that had a bunch of <img> elements and nothing else. It didn't work. Fix this by correctly adding a synthesized <html> element to the document if we get something else in the "before html" insertion mode.	2020-06-02 08:50:33 +02:00
Andreas Kling	e5ddb76a67	LibWeb: Support "td" and "th" start tags during "in table body" This makes it possible to load Google Image Search results. You can't see the images yet, but it's still something. :^)	2020-06-01 22:09:09 +02:00
Andreas Kling	77a3710e9d	LibWeb: Tokenize "anything else" in CommentLessThanSignBangDashDash	2020-06-01 20:14:23 +02:00
Andreas Kling	8766e49a7c	LibWeb+Browser: Use the new HTML parser by default You can still run the old parser with "br -O", but the new one is good enough to be the default parser now. We'll fix issues as we go and eventually remove the old one completely. :^)	2020-06-01 19:08:31 +02:00
Andreas Kling	db93db8100	LibWeb: Put whining about tokenizer errors behind an #ifdef Real web content has tons of tokenizer errors and we don't need to complain every time as that makes the debug log unbearable.	2020-06-01 18:46:11 +02:00
Andreas Kling	5944abf31c	LibWeb: More parser cases in the "in body" and "after after body" modes	2020-06-01 18:46:11 +02:00
Andreas Kling	a775c2c717	LibWeb: Handle more cases in the SelfClosingStartTag tokenizer state	2020-06-01 18:46:11 +02:00
Andreas Kling	8429551368	LibWeb: Implement more of the "after head" insertion mode	2020-06-01 18:46:11 +02:00
Andreas Kling	f3b09ddd8e	LibWeb: Implement more of the ScriptDataEndTagName tokenizer state Some of this is extremely repetitive. We'll need to rethink how we do queue/emit to improve this.	2020-05-30 23:00:35 +02:00
Andreas Kling	d058addd74	LibWeb: Handle "dd" and "dt" end tags during "in body"	2020-05-30 23:00:35 +02:00
Andreas Kling	ca6fbefbc9	LibWeb: Support parsing "select" elements (outside of tables)	2020-05-30 19:58:52 +02:00
Andreas Kling	60352c7b9b	LibWeb: Hack the parser to dodge <template> elements in <head> for now	2020-05-30 19:23:04 +02:00
Andreas Kling	1212485348	LibWeb: Fix typo in StackOfOpenElements::topmost_special_node_below() Backwards iteration works better if we actually go backwards! :^)	2020-05-30 18:49:48 +02:00
Andreas Kling	ca23db10ef	LibWeb: Don't crash when encountering <svg> or <math> elements Just treat them like unknown elements for now. :^)	2020-05-30 18:46:39 +02:00
Andreas Kling	756829555a	LibWeb: Parse "textarea" tags during the "in body" insertion mode Had to handle some more cases in the tokenizer to support this.	2020-05-30 18:40:23 +02:00
Andreas Kling	f4778d1ba0	LibWeb: Add missing special tag case in the "in body" insertion mode	2020-05-30 18:26:44 +02:00
Andreas Kling	5818ef2c80	LibWeb: Implement more table-related insertion modes	2020-05-30 18:26:44 +02:00
Andreas Kling	8c96b8174b	LibWeb: Handle AAA situation where there's no formatting element found In this case, we're supposed to return from the AAA and then jump to a different behavior in the "in body" insertion mode. So now we do that.	2020-05-30 17:47:50 +02:00
Andreas Kling	c9dd459822	LibWeb: Implement some more RAWTEXT stuff in the tokenizer	2020-05-30 17:47:50 +02:00
TheDumpap	d92c9d3772	LibWeb: Implement more of the tokenizer states Slowly adding more unimplemented options for tokenizer states.	2020-05-30 17:47:50 +02:00

149 commits