Luke 
								
							 
						 
						
							
							
							
							
								
							
							
								19d6884529 
								
							 
						 
						
							
							
								
								LibWeb: Implement quirks mode detection  
							
							
							
							
							This allows us to determine which mode to render the page in.
Exposes "doctype" and "compatMode" on Document.
Exposes "name", "publicId" and "systemId" on DocumentType. 
							
						 
						
							2020-07-21 01:08:32 +02:00 
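As a rough illustration of what "compatMode" reports once the mode is known, here is a minimal sketch. The enum and function names are hypothetical and are not taken from LibWeb. The mapping itself follows the DOM Standard, where only full quirks mode yields "BackCompat".

```cpp
#include <string>

// Hypothetical enum for illustration; LibWeb's actual type may differ.
enum class QuirksMode { No, Limited, Yes };

// Per the DOM Standard, compatMode is "BackCompat" only in full quirks mode
// and "CSS1Compat" in both no-quirks and limited-quirks mode.
inline std::string compat_mode(QuirksMode mode)
{
    return mode == QuirksMode::Yes ? "BackCompat" : "CSS1Compat";
}
```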
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Luke 
								
							 
						 
						
							
							
							
							
								
							
							
								2df69317f1 
								
							 
						 
						
							
							
								
								LibWeb: Implement almost all missing tokenizer cases  
							
							
							
						 
						
							2020-06-28 16:56:26 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Kevin Meyer 
								
							 
						 
						
							
							
							
							
								
							
							
								22b20c381f 
								
							 
						 
						
							
							
								
								LibWeb: Implement remaining missing tokenizer EOF cases  
							
							
							
						 
						
							2020-06-27 13:27:10 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								8e6522d034 
								
							 
						 
						
							
							
								
								LibWeb: Implement some missing tokenizer cases for EOF handling  
							
							
							
						 
						
							2020-06-26 22:47:07 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								c33d17d363 
								
							 
						 
						
							
							
								
								LibWeb: Fix tokenization of attributes with URL query strings in them  
							
							
							
							
							<a href="/foo&amp=bar"> was being tokenized into <a href="/foo&=bar">.
The spec mentions this but I had overlooked it. The bug happens because
we interpreted the "&" as a named character reference. 
							
						 
						
							2020-06-23 16:45:01 +02:00 
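The spec rule behind commit c33d17d363 can be sketched as a small predicate. This is a hypothetical helper, not the code from the commit. It encodes the HTML Standard's condition for leaving a matched named character reference as literal text: the reference is inside an attribute value, the match does not end in ";", and the next input character is "=" or an ASCII alphanumeric. The input "/foo&amp=bar" is an assumed example.

```cpp
#include <cctype>

// Hypothetical helper, not LibWeb's actual code. Returns true when a matched
// named character reference must be flushed as literal text instead of being
// decoded, following the attribute-value exception in the HTML Standard.
inline bool keep_reference_as_literal_text(bool in_attribute,
                                           bool match_ends_with_semicolon,
                                           char next_input_character)
{
    if (!in_attribute || match_ends_with_semicolon)
        return false;
    return next_input_character == '='
        || std::isalnum(static_cast<unsigned char>(next_input_character)) != 0;
}
```

With this check, "&amp" followed by "=bar" inside an href stays untouched, while "&amp;" is still decoded.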
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									stelar7 
								
							 
						 
						
							
							
							
							
								
							
							
								5eb39a5f61 
								
							 
						 
						
							
							
								
								LibWeb: Update parser with more insertion modes :^)  
							
							
							
							
							Implements handling of InHeadNoScript, InSelectInTable, InTemplate,
InFrameset, AfterFrameset, and AfterAfterFrameset. 
							
						 
						
							2020-06-21 10:13:31 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Luke 
								
							 
						 
						
							
							
							
							
								
							
							
								a1838f676e 
								
							 
						 
						
							
							
								
								LibWeb: Implement all CDATA tokenizer states  
							
							
							
							
							Even though we haven't implemented any switches to these states yet,
we may as well have them ready for when we do implement the switches. 
							
						 
						
							2020-06-14 13:47:19 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Luke 
								
							 
						 
						
							
							
							
							
								
							
							
								821312729a 
								
							 
						 
						
							
							
								
								LibWeb: Fully implement all DOCTYPE tokenizer states  
							
							
							
							
							Also fixes TagOpen having a separate emit and reconsume in
ANYTHING_ELSE. 
							
						 
						
							2020-06-14 13:47:19 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Luke 
								
							 
						 
						
							
							
							
							
								
							
							
								ab1df177d8 
								
							 
						 
						
							
							
								
								LibWeb: Fully implement all comment tokenizer states  
							
							
							
						 
						
							2020-06-14 13:47:19 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								47df0cbbc8 
								
							 
						 
						
							
							
								
								LibWeb: Fix broken tokenization of hexadecimal character references  
							
							
							
							
							We were interpreting 'A'-'F' as decimal digits which didn't work right. 
							
						 
						
							2020-06-13 13:46:12 +02:00 
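To make the hexadecimal fix in 47df0cbbc8 concrete, here is a minimal sketch of accumulating a hexadecimal character reference. The function names are hypothetical and this is not the tokenizer's actual code. The point is that 'A'-'F' and 'a'-'f' contribute 10 to 15, and each step multiplies the running value by 16.

```cpp
#include <cstdint>

// Returns the value of a hexadecimal digit, or -1 if c is not one.
inline int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

// Accumulates the leading hex digits of `digits` into a code point and stops
// at the first non-hex character. Overflow handling and the spec's range
// checks are deliberately left out of this sketch.
inline uint32_t parse_hex_character_reference(const char* digits)
{
    uint32_t code_point = 0;
    for (; *digits != '\0'; ++digits) {
        int value = hex_digit_value(*digits);
        if (value < 0)
            break;
        code_point = code_point * 16 + static_cast<uint32_t>(value);
    }
    return code_point;
}
```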
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								ab4c03ce2d 
								
							 
						 
						
							
							
								
								LibWeb: Fix tokenizer swallowing an extra token after a named entity  
							
							
							
						 
						
							2020-06-07 19:09:03 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Luke 
								
							 
						 
						
							
							
							
							
								
							
							
								61d5bec739 
								
							 
						 
						
							
							
								
								LibWeb: Fully implement all script tokenizer states  
							
							
							
							
							Also fixes RAWTEXTLessThanSign having a separate emit and reconsume. 
							
						 
						
							2020-06-06 09:55:15 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								4e71684a3a 
								
							 
						 
						
							
							
								
								LibWeb: Fix missing tokenizer state change in RCDATALessThanSign  
							
							
							
							
							We can't RECONSUME_IN after we've used EMIT_CHARACTER since we'll have
returned from the function. 
							
						 
						
							2020-06-05 12:02:30 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								b59f4632d5 
								
							 
						 
						
							
							
								
								LibWeb: Unbreak character reference and DOCTYPE parsing post-UTF-8  
							
							
							
							
							Oops, these were still using the byte-offset cursor. My goodness is it
unergonomic to index into UTF-8 strings, but Dr. Bugaev says it's good.
There is lots of room for improvement here. Just like the rest of the
tokenizer and parser. We'll have to do a few optimization passes over
them once they mature. 
							
						 
						
							2020-06-04 22:09:36 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								b6288163f1 
								
							 
						 
						
							
							
								
								LibWeb: Make the new HTML parser parse input as UTF-8  
							
							
							
							
							We already convert the input to UTF-8 before starting the tokenizer,
so all this patch had to do was switch the tokenizer to use a Utf8View
for its input (and to emit 32-bit code points). 
							
						 
						
							2020-06-04 21:12:17 +02:00 
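What "parse input as UTF-8 and emit 32-bit code points" means in practice can be sketched with a standalone decoder. This is not AK's Utf8View API. The function below is a hypothetical stand-in that assumes well-formed input and does not validate continuation bytes or reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes a UTF-8 byte string into 32-bit code points. Unknown lead bytes and
// truncated sequences become U+FFFD. Continuation bytes are not validated.
inline std::vector<uint32_t> decode_utf8(const std::string& bytes)
{
    std::vector<uint32_t> code_points;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        uint32_t code_point = 0;
        std::size_t length = 0;
        if (lead < 0x80) {
            code_point = lead;
            length = 1;
        } else if ((lead & 0xE0) == 0xC0) {
            code_point = lead & 0x1F;
            length = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            code_point = lead & 0x0F;
            length = 3;
        } else if ((lead & 0xF8) == 0xF0) {
            code_point = lead & 0x07;
            length = 4;
        } else {
            code_points.push_back(0xFFFD);
            ++i;
            continue;
        }
        if (i + length > bytes.size()) {
            code_points.push_back(0xFFFD);
            break;
        }
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < length; ++k)
            code_point = (code_point << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        code_points.push_back(code_point);
        i += length;
    }
    return code_points;
}
```

A tokenizer built on such a view advances by whole code points rather than bytes, which is why byte-offset cursors stop being valid after a change like this.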
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								19190267a6 
								
							 
						 
						
							
							
								
								LibWeb: Fix incorrectly consumed characters after reference tokens  
							
							
							
							
							The NumericCharacterReferenceEnd tokenizer state should not advance
the input stream. 
							
						 
						
							2020-06-04 16:49:21 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								ca33bc7895 
								
							 
						 
						
							
							
								
								LibWeb: Fix tokenization of attributes with empty attributes  
							
							
							
							
							We were neglecting to emit start tags for tags where the last attribute
had no value.
Also fix a parse error TODO that I hit while looking at this. 
							
						 
						
							2020-06-04 12:00:09 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								a3936f10eb 
								
							 
						 
						
							
							
								
								LibWeb: Fix tokenizing scripts with '<' in them  
							
							
							
							
							The EMIT_CHARACTER_AND_RECONSUME_IN was emitting the current token
instead of the specified codepoint. 
							
						 
						
							2020-06-02 14:27:53 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								77a3710e9d 
								
							 
						 
						
							
							
								
								LibWeb: Tokenize "anything else" in CommentLessThanSignBangDashDash  
							
							
							
						 
						
							2020-06-01 20:14:23 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								db93db8100 
								
							 
						 
						
							
							
								
								LibWeb: Put whining about tokenizer errors behind an #ifdef  
							
							
							
							
							Real web content has *tons* of tokenizer errors and we don't need to
complain every time as that makes the debug log unbearable. 
							
						 
						
							2020-06-01 18:46:11 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								a775c2c717 
								
							 
						 
						
							
							
								
								LibWeb: Handle more cases in the SelfClosingStartTag tokenizer state  
							
							
							
						 
						
							2020-06-01 18:46:11 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								f3b09ddd8e 
								
							 
						 
						
							
							
								
								LibWeb: Implement more of the ScriptDataEndTagName tokenizer state  
							
							
							
							
							Some of this is extremely repetitive. We'll need to rethink how we
do queue/emit to improve this. 
							
						 
						
							2020-05-30 23:00:35 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								756829555a 
								
							 
						 
						
							
							
								
								LibWeb: Parse "textarea" tags during the "in body" insertion mode  
							
							
							
							
							Had to handle some more cases in the tokenizer to support this. 
							
						 
						
							2020-05-30 18:40:23 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								c9dd459822 
								
							 
						 
						
							
							
								
								LibWeb: Implement some more RAWTEXT stuff in the tokenizer  
							
							
							
						 
						
							2020-05-30 17:47:50 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									TheDumpap 
								
							 
						 
						
							
							
							
							
								
							
							
								d92c9d3772 
								
							 
						 
						
							
							
								
								LibWeb: Implement more of the tokenizer states  
							
							
							
							
							Slowly adding more unimplemented options for tokenizer states. 
							
						 
						
							2020-05-30 17:47:50 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								62885b5646 
								
							 
						 
						
							
							
								
								LibWeb: Fix accidental swallow of self-closing tag tokens  
							
							
							
							
							Instead of dropping self-closing tags on the floor, we now emit them
into the token stream. :^) 
							
						 
						
							2020-05-30 11:31:49 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								851a0f983a 
								
							 
						 
						
							
							
								
								LibWeb: Tokenizing a semicolon-less HTML entity is (just a) parse error  
							
							
							
							
							No need to blow chunks over this. 
							
						 
						
							2020-05-30 11:31:49 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								1ef5d609d9 
								
							 
						 
						
							
							
								
								AK+LibC: Add TODO() as an alternative to ASSERT_NOT_REACHED()  
							
							
							
							
							I've been using this in the new HTML parser and it makes it much easier
to understand the state of unfinished code branches.
TODO() is for places where it's okay to end up but we need to implement
something there.
ASSERT_NOT_REACHED() is for places where it's not okay to end up, and
something has gone wrong. 
							
						 
						
							2020-05-30 11:31:49 +02:00 
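The distinction drawn in 1ef5d609d9 can be illustrated with two stand-in macros. The SKETCH_ names, the State enum, and the describe function are all hypothetical, and the real AK macros may be defined differently. Both macros abort here, but they document different things: one marks a branch that is legitimate but unwritten, the other a branch that indicates a bug.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Stand-ins for illustration only; not the AK definitions.
#define SKETCH_TODO()                                                    \
    do {                                                                 \
        std::fprintf(stderr, "TODO at %s:%d\n", __FILE__, __LINE__);     \
        std::abort();                                                    \
    } while (0)

#define SKETCH_ASSERT_NOT_REACHED()                                      \
    do {                                                                 \
        std::fprintf(stderr, "Unreachable at %s:%d\n", __FILE__, __LINE__); \
        std::abort();                                                    \
    } while (0)

enum class State { Data, TagOpen, CDATASection };

inline std::string describe(State state)
{
    switch (state) {
    case State::Data:
        return "data";
    case State::TagOpen:
        return "tag open";
    case State::CDATASection:
        // A valid state that this sketch has not implemented yet.
        SKETCH_TODO();
    }
    // Every enumerator is handled above, so landing here means a bug.
    SKETCH_ASSERT_NOT_REACHED();
}
```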
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								bb2f22577b 
								
							 
						 
						
							
							
								
								LibWeb: Implement a bunch more script-related tokenization states  
							
							
							
						 
						
							2020-05-28 18:44:17 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								5e53c45113 
								
							 
						 
						
							
							
								
								LibWeb: Plumb content encoding into the new HTML parser  
							
							
							
							
							We still don't handle non-ASCII input correctly, but at least now we'll
convert e.g. ISO-8859-1 to UTF-8 before starting to tokenize.
This patch also makes "view source" work with the new parser. :^) 
							
						 
						
							2020-05-28 12:35:19 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								5c35f3c9ba 
								
							 
						 
						
							
							
								
								LibWeb: Support named character references (e.g. "&amp;")
							
							
							
						 
						
							2020-05-28 11:44:19 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								39b5494aeb 
								
							 
						 
						
							
							
								
								LibWeb: Implement the "after attribute name" tokenizer state  
							
							
							
							
							One little step at a time towards parsing the monster blob of HTML we
get from twitter.com :^) 
							
						 
						
							2020-05-27 18:30:29 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								1de29e3f59 
								
							 
						 
						
							
							
								
								LibWeb: Implement the "self closing start tag" tokenizer state  
							
							
							
						 
						
							2020-05-27 18:30:29 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								a5ce09f8e3 
								
							 
						 
						
							
							
								
								LibWeb: Implement partial support for numeric character references  
							
							
							
						 
						
							2020-05-27 18:30:27 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								ecd25ce6c7 
								
							 
						 
						
							
							
								
								LibWeb: Allow HTML tokenizer to emit more than one token  
							
							
							
							
							Tokens are now put on a queue when emitted, and we always pop from that
queue when returning from next_token(). 
							
						 
						
							2020-05-26 15:50:05 +02:00 
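The queueing approach described in ecd25ce6c7 can be sketched as follows. The class, its toy tokenization rule, and the bool-returning next_token signature are assumptions made for illustration and do not match the real HTMLTokenizer interface. The rule "a '<' followed by '/' emits two character tokens in one step" only stands in for the spec cases where a single state emits several tokens.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <utility>

struct SketchToken {
    std::string text;
};

class QueueingTokenizer {
public:
    explicit QueueingTokenizer(std::string input)
        : m_input(std::move(input))
    {
    }

    // Stores the next token in `out` and returns true, or returns false once
    // the input is exhausted and the queue is empty.
    bool next_token(SketchToken& out)
    {
        while (m_queue.empty() && m_cursor < m_input.size())
            run_one_step();
        if (m_queue.empty())
            return false;
        out = m_queue.front();
        m_queue.pop();
        return true;
    }

private:
    // One step may enqueue more than one token.
    void run_one_step()
    {
        char c = m_input[m_cursor++];
        if (c == '<' && m_cursor < m_input.size() && m_input[m_cursor] == '/') {
            ++m_cursor;
            m_queue.push({ "<" });
            m_queue.push({ "/" });
            return;
        }
        m_queue.push({ std::string(1, c) });
    }

    std::string m_input;
    std::size_t m_cursor { 0 };
    std::queue<SketchToken> m_queue;
};
```

Because the caller only ever pops from the queue, a state handler is free to emit as many tokens as the spec asks for without changing the caller's loop.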
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								406fd95f32 
								
							 
						 
						
							
							
								
								LibWeb: Flesh out the remaining DOCTYPE related tokenizer states  
							
							
							
							
							We can now parse public and system identifiers! Not super useful, but
at least we can do it :^) 
							
						 
						
							2020-05-25 19:51:23 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								556a6eea61 
								
							 
						 
						
							
							
								
								LibWeb: Checking for "DOCTYPE" should be case insensitive in tokenizer  
							
							
							
						 
						
							2020-05-25 19:51:23 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								45da08a1e6 
								
							 
						 
						
							
							
								
								LibWeb: A whole bunch of work towards spec-compliant <script> elements  
							
							
							
							
							This is still very unfinished, but there's at least a skeleton of code. 
							
						 
						
							2020-05-24 23:54:22 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								5d332c1f11 
								
							 
						 
						
							
							
								
								LibWeb: Parse enough to handle a <style> inside a <head> :^)  
							
							
							
						 
						
							2020-05-24 23:54:22 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								20911efd4d 
								
							 
						 
						
							
							
								
								LibWeb: More work on the HTML parser and tokenizer  
							
							
							
							
							The parser can now switch the state of the tokenizer! Very webby. :^) 
							
						 
						
							2020-05-24 23:54:22 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								96cc1138c0 
								
							 
						 
						
							
							
								
								LibWeb: Remove tokenizer's premature character buffering optimization  
							
							
							
						 
						
							2020-05-24 23:54:22 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Emanuele Torre 
								
							 
						 
						
							
							
							
							
								
							
							
								3f2158bbfe 
								
							 
						 
						
							
							
								
								LibWeb: HtmlTokenizer.cpp: fix ON_WHITESPACE macro  
							
							
							
							
							The "audible bell" character ('\a' U+0007) was treated as whitespace
while the "line feed" character ('\n' U+000a) was not.
'\a' is no longer considered whitespace.
'\n' is now considered whitespace. 
							
						 
						
							2020-05-24 09:47:28 +02:00 
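For reference, the set of characters the HTML Standard's tokenizer states treat as whitespace is tab, line feed, form feed, and space. The check below is a sketch of that set as a plain function. The name is hypothetical, and the real code expresses this as the ON_WHITESPACE macro.

```cpp
#include <cstdint>

// Whitespace as matched by the tokenizer states in the HTML Standard:
// U+0009 TAB, U+000A LF, U+000C FF, U+0020 SPACE. Note that '\a' (U+0007)
// is not in the set.
inline bool is_tokenizer_whitespace(uint32_t code_point)
{
    return code_point == '\t' || code_point == '\n'
        || code_point == '\f' || code_point == ' ';
}
```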
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								e44c87cfff 
								
							 
						 
						
							
							
								
								LibWeb: Implement enough HTML parsing to handle a small simple DOM :^)  
							
							
							
							
							We can now parse a little DOM like this:
<!DOCTYPE html>
<html>
    <head></head>
    <body>
        <div></div>
    </body>
</html>
This is pretty slow work, but the incremental progress is satisfying! 
							
						 
						
							2020-05-24 00:49:22 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								fd1b31d0ff 
								
							 
						 
						
							
							
								
								LibWeb: Start building the tree building part of the new HTML parser  
							
							
							
							
							This patch adds a new HTMLDocumentParser class. It keeps a tokenizer
object internally and feeds itself with one token at a time from it.
The names and idioms in this class are expressed as closely to the
actual HTML parsing spec as possible, to make development as easy
and bug free as possible. :^)
This is going to become pretty large, but it's pretty cool! 
							
						 
						
							2020-05-24 00:14:23 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								e45c8b842c 
								
							 
						 
						
							
							
								
								LibWeb: Implement a bit more of DOCTYPE tokenization  
							
							
							
						 
						
							2020-05-23 21:08:25 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								7be36366be 
								
							 
						 
						
							
							
								
								LibWeb: Emit character/comment tokens lazily to accumulate more data  
							
							
							
							
							Instead of emitting data-bearing tokens immediately, do it lazily at
the next state change. This allows us to accumulate full bursts of
text in between tags instead of having one token per character. :^) 
							
						 
						
							2020-05-23 18:44:32 +02:00 
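The lazy emission idea in 7be36366be can be sketched with a toy tokenizer that buffers characters and only flushes them when the state changes. All names here are hypothetical, and the two-state machine (text versus tag) is a drastic simplification of the real one.

```cpp
#include <string>
#include <vector>

enum class TokenType { Text, Tag };

struct SketchToken {
    TokenType type;
    std::string data;
};

// Buffers characters in `pending` and emits one token per run, at the moment
// the toy state machine switches between "text" and "tag".
inline std::vector<SketchToken> tokenize_lazily(const std::string& input)
{
    std::vector<SketchToken> tokens;
    std::string pending;
    bool in_tag = false;

    auto flush = [&](TokenType type) {
        if (pending.empty())
            return;
        tokens.push_back({ type, pending });
        pending.clear();
    };

    for (char c : input) {
        if (!in_tag && c == '<') {
            flush(TokenType::Text); // State change: emit the accumulated text.
            in_tag = true;
        } else if (in_tag && c == '>') {
            flush(TokenType::Tag);
            in_tag = false;
        } else {
            pending += c;
        }
    }
    // Emit trailing text; an unterminated tag is dropped in this sketch.
    if (!in_tag)
        flush(TokenType::Text);
    return tokens;
}
```

For the input "hi<b>there" this produces three tokens instead of one token per character.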
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								45450c7edc 
								
							 
						 
						
							
							
								
								LibWeb: Make BEGIN_STATE and END_STATE include some {{{ and }}}  
							
							
							
							
							This makes it a compile error to omit the END_STATE. Also add some more
missing END_STATEs exposed by this (nice!)
Thanks to @predmond for suggesting the multi-pair trick! :^) 
							
						 
						
							2020-05-23 15:25:43 +02:00 
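The brace trick from 45450c7edc can be sketched like this. The macro bodies below are an assumption and are likely simpler than the real ones. BEGIN_STATE opens three braces and END_STATE closes them, so a state body without its END_STATE leaves the braces unbalanced and the file fails to compile.

```cpp
enum class State { Data, TagOpen };

// Hypothetical definitions for illustration; the real macros may differ.
#define BEGIN_STATE(state) \
    case State::state: {   \
        {                  \
            {

#define END_STATE \
    }             \
    }             \
    }             \
    break;

// Toy state machine: counts the characters consumed in the TagOpen state,
// that is, the characters that directly follow a '<'.
inline int count_chars_after_less_than(const char* input)
{
    State state = State::Data;
    int count = 0;
    for (; *input != '\0'; ++input) {
        switch (state) {
            BEGIN_STATE(Data)
            if (*input == '<')
                state = State::TagOpen;
            END_STATE

            BEGIN_STATE(TagOpen)
            ++count;
            state = State::Data;
            END_STATE
        }
    }
    return count;
}
```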
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								2e4147d0fc 
								
							 
						 
						
							
							
								
								LibWeb: Add missing END_STATE for TagName  
							
							
							
							
							Fixes #2339.
						
							2020-05-23 10:33:23 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								a58500fdc5 
								
							 
						 
						
							
							
								
								LibWeb: Teach HTMLTokenizer how to tokenize comments  
							
							
							
							
							We can now correctly tokenize the welcome.html test page. :^) 
							
						 
						
							2020-05-23 01:54:26 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Andreas Kling 
								
							 
						 
						
							
							
							
							
								
							
							
								6caa5661f3 
								
							 
						 
						
							
							
								
								LibWeb: Teach HTMLTokenizer how to tokenize attributes  
							
							
							
							
							Properly tokenize single-quoted, double-quoted and unquoted attributes! 
							
						 
						
							2020-05-23 01:22:15 +02:00