mirror of https://github.com/RGBCube/serenity synced 2025-05-20 02:35:07 +00:00
Commit graph

898 commits

Author SHA1 Message Date
Andreas Kling
68b1bdc234 LibWeb: Add a way to stop the new HTML parser
Some things are specced to "stop parsing", which basically just means
to stop fetching tokens and jump to "The end"
2020-05-28 18:55:18 +02:00
Andreas Kling
00b44ab148 LibWeb: Implement more of the "after body" insertion mode 2020-05-28 18:52:32 +02:00
Andreas Kling
cba5d59adc LibWeb: Parse comments in the "in body" insertion mode 2020-05-28 18:46:39 +02:00
Andreas Kling
bb2f22577b LibWeb: Implement a bunch more script-related tokenization states 2020-05-28 18:44:17 +02:00
Andreas Kling
4788bcd6f8 LibWeb: Add HTMLToken::make_character()
It's tedious to make character tokens manually all the time.
2020-05-28 18:43:52 +02:00
Andreas Kling
42243d2e06 LibWeb: Rename Web::HtmlView => Web::PageView
This widget doesn't just view HTML, it views a web page. :^)
2020-05-28 18:22:54 +02:00
Andreas Kling
5f8cbe6a1b LibWeb: Fix HTMLDocumentParser build 2020-05-28 18:20:55 +02:00
Andreas Kling
308cb69329 LibWeb: Remove a misplaced call to close_a_p_element() in "in body"
This should only be done for the corresponding start tags.
2020-05-28 18:18:20 +02:00
Andreas Kling
c84212aaba LibWeb: Add a StackOfOpenElements helper for "popping until a tag name" 2020-05-28 18:18:20 +02:00
Matthew Olsson
5ae9419a06 LibJS: Object index properties have descriptors; Handle sparse indices
This patch adds an IndexedProperties object for storing indexed
properties within an Object. This accomplishes two goals: indexed
properties now have an associated descriptor, and objects now gracefully
handle sparse properties.

The IndexedProperties class is a wrapper around two other classes, one
for simple indexed properties storage, and one for general indexed
property storage. Simple indexed property storage is the common-case,
and is simply a vector of properties which all have attributes of
default_attributes (writable, enumerable, and configurable).

General indexed property storage is for a collection of indexed
properties where EITHER one or more properties have attributes other
than default_attributes OR there is a property with a large index (in
particular, large is '200' or higher).

Indexed properties are now treated relatively the same as storage within
the various Object methods. Additionally, there is a custom iterator
class for IndexedProperties which makes iteration easy. The iterator
skips empty values by default, but can be configured otherwise.
Likewise, it evaluates getters by default, but can be set not to.
2020-05-28 17:17:13 +02:00
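The two-tier storage described in the IndexedProperties commit above can be illustrated roughly as follows. This is a minimal sketch, not LibJS code: `SketchIndexedProperties`, its members, the `int` value type, and the bit layout of `default_attributes` are all hypothetical. Only the threshold of 200 and the two switch conditions come from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for the storage scheme; values are plain ints.
class SketchIndexedProperties {
public:
    static constexpr std::uint32_t sparse_threshold = 200;    // "large" index per the commit message
    static constexpr std::uint8_t default_attributes = 0b111; // assumed encoding: writable | enumerable | configurable

    void put(std::uint32_t index, int value, std::uint8_t attributes = default_attributes)
    {
        // Leave simple storage on a large index or on non-default attributes.
        if (!m_generic && (index >= sparse_threshold || attributes != default_attributes))
            switch_to_generic();
        if (m_generic) {
            (*m_generic)[index] = { value, attributes };
            return;
        }
        if (index >= m_simple.size())
            m_simple.resize(index + 1);
        m_simple[index] = value;
    }

    std::optional<int> get(std::uint32_t index) const
    {
        if (m_generic) {
            auto it = m_generic->find(index);
            if (it == m_generic->end())
                return std::nullopt;
            return it->second.value;
        }
        if (index < m_simple.size())
            return m_simple[index];
        return std::nullopt;
    }

    bool is_simple() const { return !m_generic.has_value(); }

private:
    struct Entry {
        int value;
        std::uint8_t attributes;
    };

    void switch_to_generic()
    {
        // Copy the non-empty slots over, keeping their default attributes.
        m_generic.emplace();
        for (std::size_t i = 0; i < m_simple.size(); ++i) {
            if (m_simple[i])
                (*m_generic)[i] = { *m_simple[i], default_attributes };
        }
        m_simple.clear();
    }

    std::vector<std::optional<int>> m_simple;                 // common case: dense vector
    std::optional<std::map<std::uint32_t, Entry>> m_generic;  // sparse or custom attributes
};
```

The sketch never switches back to simple storage; whether the real class does is not stated in the commit message.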
Emanuele Torre
0b0036f430 LibWeb: replace some tab characters with spaces
also add missing "#pragma once" in StylePropertiesModel.h
2020-05-28 17:01:31 +02:00
Andreas Kling
0e777c0ac6 LibWeb: Fall back to block layout for unimplemented CSS display values
This seems to have a higher chance of generating somewhat recognizable
content compared to inline layout. This problem will gradually go away
as we implement more display values.
2020-05-28 12:44:34 +02:00
Andreas Kling
3d09bac888 LibWeb: Add default UA style for some table-related elements 2020-05-28 12:43:29 +02:00
Andreas Kling
5e53c45113 LibWeb: Plumb content encoding into the new HTML parser
We still don't handle non-ASCII input correctly, but at least now we'll
convert e.g ISO-8859-1 to UTF-8 before starting to tokenize.
This patch also makes "view source" work with the new parser. :^)
2020-05-28 12:35:19 +02:00
Andreas Kling
772b51038e LibWeb: Parse "input" tags during the "in body" insertion mode 2020-05-28 12:19:18 +02:00
Andreas Kling
7aa7a2078f LibWeb: Parse "td" start tags during "in cell" insertion mode 2020-05-28 11:46:08 +02:00
Andreas Kling
5c35f3c9ba LibWeb: Support named character references (e.g "&amp;") 2020-05-28 11:44:19 +02:00
Andreas Kling
ebb1649a52 LibWeb: Implement more table support in the new HTML parser
This is enough to parse the Google front page! (Note: I did have to
hack the tokenizer while parsing Google, in order to avoid named
character references screwing everything up. We'll fix that too soon
enough!)
2020-05-28 00:27:46 +02:00
Andreas Kling
7f18c51f4c LibWeb: Flesh out "reset the insertion mode appropriately" algorithm 2020-05-28 00:27:00 +02:00
Andreas Kling
2a97127faa LibWeb: Handle various self-closing tags during "in body" insertion
We can now parse self-closing "<img>" tags correctly! :^)
2020-05-28 00:25:56 +02:00
Andreas Kling
f69001339f LibWeb: Handle inline stylesheets a bit better in the new parser
While we're still supporting both the old and the new parser, we have
to deal with the way they load inline stylesheets (and scripts) a bit
differently.

The old parser loads all the text content up front, and then notifies
the containing element. The new parser creates the containing element
up front and appends text inside it afterwards.

For now, we simply do an empty "children_changed" notification when
first inserting a text node inside an element. This at least prevents
the CSS parser from choking on a single-character stylesheet.
2020-05-28 00:23:34 +02:00
Andreas Kling
3ce1af27dc LibWeb: Parse documents without DOCTYPE gracefully
Seems like SOMEONE forgot to put a <!DOCTYPE html> on serenityos.org..
No matter, now we can handle it in the new parser! :^)
2020-05-28 00:22:08 +02:00
Andreas Kling
422e00c806 LibWeb: Add a "quirks mode" flag to Document
This doesn't do anything yet, but it will sooner or later. :^)
2020-05-28 00:20:36 +02:00
Andreas Kling
d25ffd3ed8 LibWeb: Fire a DOMContentLoaded event when the new parser is finished
With this change, we can finally load and render welcome.html :^)
2020-05-27 23:32:50 +02:00
Andreas Kling
db6cf9b37d LibWeb: Implement the first half of the Adoption Agency Algorithm
The AAA is a somewhat daunting algorithm you have to run for certain
tags when they are inserted inside the <body> element. The purpose of
it is to resolve issues with mismatched tags.

This patch implements the first half of the AAA. We also move the
"list of active formatting elements" to its own class, since it kept
accumulating little behaviors. "Marker" entries are now signified by
null Element pointers in the list.
2020-05-27 23:22:42 +02:00
Andreas Kling
4c9c6b3a7b LibWeb: Bring up basic external script execution in the new parser
This only works in some narrow cases, but should be enough for our own
welcome.html at least. :^)
2020-05-27 23:02:03 +02:00
Andreas Kling
2cb50f6750 LibWeb+Browser: Add ability to run Browser with the new HTML parser
You can now pass "-n" to the browser to use the new HTML parser.
It's not turned on by default since it's still very immature, but this
is a huge step towards bringing it into maturity. :^)
2020-05-27 21:57:30 +02:00
Andreas Kling
35040dd2c4 LibWeb: LayoutMode line_break_policy => LayoutMode layout_mode 2020-05-27 19:52:18 +02:00
Andreas Kling
39b5494aeb LibWeb: Implement the "after attribute name" tokenizer state
One little step at a time towards parsing the monster blob of HTML we
get from twitter.com :^)
2020-05-27 18:30:29 +02:00
Andreas Kling
1b0c39ca60 LibWeb: Handle more benign parse errors in the "in body" insertion mode 2020-05-27 18:30:29 +02:00
Andreas Kling
1de29e3f59 LibWeb: Implement the "self closing start tag" tokenizer state 2020-05-27 18:30:29 +02:00
Andreas Kling
a5ce09f8e3 LibWeb: Implement partial support for numeric character references 2020-05-27 18:30:27 +02:00
Matthew Olsson
dd08c992e8 LibJS: Simplify and normalize publicly-exposed Object functions
Previously, the Object class had many different types of functions for
each action. For example: get_by_index, get(PropertyName),
get(FlyString). This is a bit verbose, so these methods have been
shortened to simply use the PropertyName structure. The methods then
internally call _by_index if necessary. Note that the _by_index
methods have been made private to enforce this change.

Secondly, a clear distinction has been made between "putting" and
"defining" an object property. "Putting" should mean modifying a
(potentially) already existing property. This is akin to doing "a.b =
'foo'".

This implies two things about put operations:
    - They will search the prototype chain for setters and call them, if
      necessary.
    - If no property exists with a particular key, the put operation
      should create a new property with the default attributes
      (configurable, writable, and enumerable).

In contrast, "defining" a property should completely overwrite any
existing value without calling setters (if that property is
configurable, of course).

Thus, all of the many JS objects have had any "put" calls changed to
"define_property" calls. Additionally, "put_native_function" and
"put_native_property" have had their "put" replaced with "define".

Finally, "put_own_property" has been made private, as all necessary
functionality should be exposed with the put and define_property
methods.
2020-05-27 13:17:35 +02:00
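The put/define distinction drawn in the commit above can be sketched like this. The code is a toy model under stated assumptions, not the LibJS Object API: `SketchObject`, its `Property` struct, and the `int` values are hypothetical, and attribute checks such as "configurable" are left out entirely.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical toy object with a prototype link and optional setters.
struct SketchObject {
    struct Property {
        int value { 0 };
        std::function<void(SketchObject&, int)> setter; // empty for plain data properties
    };

    std::map<std::string, Property> properties;
    SketchObject* prototype { nullptr };

    // "Defining": overwrite the own property and never call a setter.
    void define_property(const std::string& key, int value)
    {
        properties[key] = Property { value, nullptr };
    }

    // "Putting": like `a.b = value`. Walk the prototype chain looking for a
    // setter; otherwise create or update an own data property.
    void put(const std::string& key, int value)
    {
        for (SketchObject* object = this; object; object = object->prototype) {
            auto it = object->properties.find(key);
            if (it == object->properties.end())
                continue;
            if (it->second.setter) {
                it->second.setter(*this, value);
                return;
            }
            break; // a data property shadows anything further up the chain
        }
        properties[key].value = value;
    }
};
```

With a setter for "x" on the prototype, `put("x", 5)` invokes that setter and leaves the receiver without an own "x", while `define_property("x", 7)` creates the own property and skips the setter.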
Sergey Bugaev
fce49b3e32 LibGUI: Change GUI::KeyEvent::key() type to KeyCode
...instead of a plain int. Yay for some type safety.
2020-05-27 11:19:38 +02:00
AnotherTest
790915da54 LibWeb: Provide some properties to inspectors of ResourceLoader 2020-05-27 11:13:02 +02:00
TheDumpap
c700a30ce8 LibWeb: Handle additional parser inputs in "initial" and "before html". 2020-05-27 11:10:54 +02:00
Emanuele Torre
8d8c33833f LibWeb: s_initialized should be static in the AttributeNames initialiser 2020-05-27 09:57:38 +02:00
Andreas Kling
4ec8b9f6ee LibWeb: Use FlyString in FontCache keys 2020-05-26 23:45:48 +02:00
Andreas Kling
82444048de LibWeb: Add cached global attribute name FlyStrings
Instead of creating extremely common FlyStrings like "id" and "class"
on demand every time they are needed, we now have AttributeNames.h,
which provides Web::HTML::AttributeNames::{id,class_}

This avoids a bunch of string allocations during selector matching.
2020-05-26 23:45:43 +02:00
Andreas Kling
5069d380a8 LibWeb: Let Element cache its list of classes
Instead of string splitting every time you call Element::has_class(),
we now split the "class" attribute value when it changes, and cache
the individual classes as FlyStrings in Element::m_classes.

This makes has_class() significantly faster and moves the pain point
of selector matching somewhere else.
2020-05-26 23:07:19 +02:00
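A rough illustration of the caching idea in the commit above: split the "class" attribute once when it changes, then answer has_class() from the cached list. `SketchElement` is hypothetical, and `std::string` stands in for FlyString, so the cheap FlyString comparison that the real code benefits from is not modelled here.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical element that caches its class list on attribute change.
class SketchElement {
public:
    void set_attribute(const std::string& name, const std::string& value)
    {
        if (name == "class") {
            // Split on whitespace once, at the moment the attribute changes.
            m_classes.clear();
            std::istringstream stream(value);
            std::string class_name;
            while (stream >> class_name)
                m_classes.push_back(class_name);
        }
        // Storing the attribute itself is omitted from this sketch.
    }

    bool has_class(const std::string& class_name) const
    {
        // No string splitting here, only a scan of the cached list.
        return std::find(m_classes.begin(), m_classes.end(), class_name) != m_classes.end();
    }

private:
    std::vector<std::string> m_classes;
};
```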
Andreas Kling
7ed80ae96c LibWeb: Make the CSS parser a little more tolerant to invalid CSS
Sometimes people put a '}' where it doesn't belong, or various other
things go wrong. 99% of the time, it's our fault, but either way,
this patch makes us not crash or infinite-loop in some common cases.

The real solution here is to write a proper CSS lexer-parser according
to the language spec; this is just a hack fix to make more sites load
at all.
2020-05-26 22:31:22 +02:00
Linus Groh
72c52466e0 LibWeb: Add more HTML entities
®, ß and all the lowercase and uppercase umlaut characters.
2020-05-26 22:23:09 +02:00
Andreas Kling
f01af62313 LibWeb: Basic support for display:inline-block with width:auto
We now implement the somewhat fuzzy shrink-to-fit algorithm when laying
out inline-block elements with both block and inline children.

Shrink-to-fit works by doing two speculative layouts of the entire
subtree inside the current block, to compute two things:

1. Preferred minimum width: If we made a line break at every chance we
   had, how wide would the widest line be?
2. Preferred width: If we broke only when explicitly told to (e.g
   "<br>"), how wide would the widest line be?

We then shrink the width of the inline-block element to an appropriate
value based on the above, taking the available width in the containing
block into consideration (sans all the box model fluff.)

To make the speculative layouts possible, plumb a LayoutMode enum
throughout the layout system since it needs to be respected in various
places.

Note that this is quite hackish and I'm sure there are smarter ways to
do a lot of this. But it does kinda work! :^)
2020-05-26 22:02:27 +02:00
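The final step of the shrink-to-fit commit above, combining the two speculative widths with the available width, can be sketched with the formula from CSS 2.1: min(max(preferred minimum width, available width), preferred width). That the patch uses exactly this formula is an assumption; the commit message only says the width is shrunk "to an appropriate value". The function name is hypothetical.

```cpp
#include <algorithm>

// Hypothetical helper applying the CSS 2.1 shrink-to-fit formula.
// The three inputs would come from the two speculative layouts and
// from the containing block.
float shrink_to_fit_width(float preferred_minimum_width, float preferred_width, float available_width)
{
    return std::min(std::max(preferred_minimum_width, available_width), preferred_width);
}
```

With 200px available, content that prefers 300px but can shrink to 50px gets 200px; content that only prefers 120px gets 120px; content that cannot go below 250px gets 250px and overflows.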
FalseHonesty
4e8bcda4d1 LibWeb: Add HTML copyright escape 2020-05-26 22:02:17 +02:00
Kevin Meyer
b85ab86c84 LibWeb: Fix step within reconstruct the active elements
In step 4 of the "reconstruct the active formatting elements" algorithm it
says:
  Rewind: If there are no entries before entry in the list of active
  formatting elements, then jump to the step labeled create.

Prior to this patch, the implementation followed the spec only for
the first loop iteration.
2020-05-26 21:52:46 +02:00
Andreas Kling
4a9deddb4a LibWeb: The line-height should not be multiplied by the glyph height
This was causing very tall lines on many websites. We can now see the
section header thingy on google.com (although it's broken into lines
where it should not be..) :^)
2020-05-26 21:09:32 +02:00
Andreas Kling
7bb69bb9bf LibWeb: Implement immediate execution in HTMLScriptElement preparation
In some cases, Dr. HTML says we should execute the script right away
even if other scripts are running.
2020-05-26 15:55:18 +02:00
Andreas Kling
ecd25ce6c7 LibWeb: Allow HTML tokenizer to emit more than one token
Tokens are now put on a queue when emitted, and we always pop from that
queue when returning from next_token().
2020-05-26 15:50:05 +02:00
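The queueing scheme in the commit above might look roughly like the following. Everything here is a hypothetical toy: `SketchTokenizer`, its `Token` type, and the rule that "</" emits two character tokens in one step exist only to show a state emitting more than one token while next_token() still returns one at a time.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

struct Token {
    std::string data;
};

// Hypothetical tokenizer: emitted tokens go on a queue, and next_token()
// always pops from that queue.
class SketchTokenizer {
public:
    explicit SketchTokenizer(std::string input)
        : m_input(std::move(input))
    {
    }

    std::optional<Token> next_token()
    {
        // Only run the state machine while nothing is waiting in the queue.
        while (m_queue.empty() && m_cursor < m_input.size())
            step();
        if (m_queue.empty())
            return std::nullopt;
        Token token = std::move(m_queue.front());
        m_queue.pop_front();
        return token;
    }

private:
    void emit(Token token) { m_queue.push_back(std::move(token)); }

    void step()
    {
        char c = m_input[m_cursor++];
        if (c == '<' && m_cursor < m_input.size() && m_input[m_cursor] == '/') {
            // Toy rule: one step emits two character tokens.
            ++m_cursor;
            emit(Token { "<" });
            emit(Token { "/" });
            return;
        }
        emit(Token { std::string(1, c) });
    }

    std::string m_input;
    std::size_t m_cursor { 0 };
    std::deque<Token> m_queue;
};
```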
Sergey Bugaev
602c3fdb3a AK: Rename FileSystemPath -> LexicalPath
And move canonicalized_path() to a static method on LexicalPath.

This is to make it clear that FileSystemPath/canonicalized_path() only
perform *lexical* canonicalization.
2020-05-26 14:35:10 +02:00
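To illustrate what "lexical" canonicalization means in the commit above: the path is cleaned up purely as text, without consulting the file system, so symlinks are never resolved. The function below is a hypothetical sketch, not AK's implementation, and its handling of edge cases (leading ".." in relative paths, empty input) is an assumption.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical text-only canonicalization: drops "." and empty parts and
// folds ".." into the preceding part. Never touches the file system.
std::string sketch_canonicalized_path(const std::string& path)
{
    bool is_absolute = !path.empty() && path[0] == '/';
    std::vector<std::string> parts;
    std::istringstream stream(path);
    std::string part;
    while (std::getline(stream, part, '/')) {
        if (part.empty() || part == ".")
            continue;
        if (part == "..") {
            if (!parts.empty() && parts.back() != "..")
                parts.pop_back();
            else if (!is_absolute)
                parts.push_back(".."); // relative paths may climb above their start
            continue;
        }
        parts.push_back(part);
    }

    std::string result = is_absolute ? "/" : "";
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i)
            result += '/';
        result += parts[i];
    }
    return result.empty() ? "." : result;
}
```

Because no symlinks are followed, "/a/link/.." becomes "/a" even if "link" points somewhere else on disk, which is why the rename spells out that the operation is only lexical.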
Andreas Kling
8ff4ebb589 LibWeb: Add Element.getAttribute() and Element.setAttribute() :^) 2020-05-26 12:27:10 +02:00