mirror of https://github.com/RGBCube/serenity synced 2025-05-15 08:34:59 +00:00
Commit graph

147 commits

Author SHA1 Message Date
Sam Atkins
3bd14941c7 LibWeb: Switch to new CSS Parser :^)
Change all the places that were including the deprecated parser, to
include the new one instead, and then delete the old parser code.

`ParentNode::query_selector[_all]()` now treat their input as a
comma-separated list of selectors, instead of just one, and return
elements that match any of the selectors in that list. This is according
to these specs:

- querySelector/querySelectorAll:
https://dom.spec.whatwg.org/#ref-for-dom-parentnode-queryselector%E2%91%A0
- selector matching algorithm:
https://www.w3.org/TR/selectors-4/#match-against-tree
2021-08-02 19:01:25 +04:30
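The "match any selector in a comma-separated list" behavior described above can be sketched in standalone C++. This is only an illustration under loud assumptions: `Element`, `split_selector_list` and `query_selector_all` are hypothetical names, only bare type selectors are modelled, and empty list entries are skipped instead of being rejected. It is not LibWeb's actual parser or matching code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a DOM element; only the tag name is modelled.
struct Element {
    std::string local_name;
};

// Split "a, b ,c" into {"a", "b", "c"}, trimming ASCII spaces around each entry.
// Simplification: empty entries are skipped instead of being reported as errors.
inline std::vector<std::string> split_selector_list(std::string_view input)
{
    std::vector<std::string> selectors;
    std::size_t start = 0;
    while (start <= input.size()) {
        std::size_t comma = input.find(',', start);
        if (comma == std::string_view::npos)
            comma = input.size();
        std::string_view part = input.substr(start, comma - start);
        while (!part.empty() && part.front() == ' ')
            part.remove_prefix(1);
        while (!part.empty() && part.back() == ' ')
            part.remove_suffix(1);
        if (!part.empty())
            selectors.emplace_back(part);
        start = comma + 1;
    }
    return selectors;
}

// Return, in tree order, every element that matches ANY selector in the list.
inline std::vector<Element const*> query_selector_all(
    std::vector<Element> const& tree_order, std::string_view selector_list)
{
    auto selectors = split_selector_list(selector_list);
    std::vector<Element const*> matches;
    for (auto const& element : tree_order) {
        for (auto const& selector : selectors) {
            if (element.local_name == selector) {
                matches.push_back(&element);
                break; // one matching selector is enough
            }
        }
    }
    return matches;
}
```

Results follow tree order, not the order of the selectors in the list, so `"p, div"` and `"div, p"` yield the same elements in the same order.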
Brian Gianforcaro
217179a39f LibWeb: Remove unused header includes 2021-08-01 08:10:16 +02:00
TheFightingCatfish
08359ba578 LibWeb: Fix regression of "contenteditable" attribute 2021-07-31 17:39:28 +02:00
SeekingBlues
a13a5315a5 LibWeb: Fix incompatibility of attribute "contenteditable"
The previous behavior of mapping a missing value to the "inherit"
state is incompatible. Now, a missing value maps to the "true" state,
which is the expected behavior.
2021-07-28 23:47:58 +02:00
ovf
898b8ffcb6 LibWeb: Avoid assertion failure on parsing numeric character references 2021-07-28 18:32:22 +02:00
K-Adam
95f393ebcd LibWeb: Return null if an unknown canvas context type is requested 2021-07-27 23:48:23 +02:00
ovf
13c7d55320 LibWeb: Fix parsing of character references in attribute values 2021-07-27 00:03:43 +02:00
Andreas Kling
c7d891765c LibGfx: Use "try_" prefix for static factory functions
Also mark them as [[nodiscard]].
2021-07-21 18:02:15 +02:00
Max Wipfli
ccae0cae45 LibWeb: Rename HTMLToken::doctype_data() => ensure_doctype_data()
This renames the accessor to better reflect what it does, as this will
allocate a DoctypeData struct if there is none.
2021-07-17 16:24:57 +04:30
Max Wipfli
519a1cdc22 LibWeb: Change HTMLToken storage architecture
This completely changes how HTMLTokens store their data. Previously,
space was allocated for all token types separately. Now, the HTMLToken's
data is stored in just a String, two booleans and a Variant.

This change reduces sizeof(HTMLToken) from 68 to 32. Also, this reduces
raw tokenization time by around 20 to 50 percent, depending on the page.
Full document parsing time (with HTMLDocumentParser, on a local HTML
page without any dependency files) is reduced by between 4 and 20
percent, depending on the page.

Since tokenizing HTML pages can easily generate 50'000 tokens and more,
the storage has been designed in a way that avoids heap allocations
where possible, while trying to reduce the size of the tokens. The only
tokens which need to allocate on the heap are thus DOCTYPE tokens (max.
1 per document), and tag tokens (but only if they have attributes). This
way, only around 5 percent of all tokens generated need to allocate on
the heap (except for StringImpl allocations).
2021-07-17 16:24:57 +04:30
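The storage idea in 519a1cdc22 (one string, a couple of flags, and a variant that only allocates for DOCTYPE data or attributes) might look roughly like the sketch below. It uses `std::variant` and `std::unique_ptr` from the standard library instead of AK's `Variant` and `OwnPtr`, and the class name `Token`, its members and its accessors are all assumptions made for illustration; the real HTMLToken layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

struct Attribute {
    std::string name;
    std::string value;
};

struct DoctypeData {
    std::string name;
    bool force_quirks { false };
};

// Hypothetical token type; not the actual HTMLToken.
class Token {
    using AttributeList = std::unique_ptr<std::vector<Attribute>>;
    using DoctypePtr = std::unique_ptr<DoctypeData>;

public:
    enum class Type { Doctype, StartTag, EndTag, Comment, Character, EndOfFile };

    explicit Token(Type type) : m_type(type) { }

    // Non-copyable but movable, mirroring 8a4c44db8c and 7eb294df0d.
    Token(Token const&) = delete;
    Token& operator=(Token const&) = delete;
    Token(Token&&) = default;
    Token& operator=(Token&&) = default;

    Type type() const { return m_type; }
    void set_tag_name(std::string name) { m_string_data = std::move(name); }
    std::string const& tag_name() const { return m_string_data; }
    void set_self_closing(bool value) { m_self_closing = value; }
    bool is_self_closing() const { return m_self_closing; }

    // True once the variant holds a heap-allocated payload.
    bool owns_heap_data() const { return !std::holds_alternative<std::monostate>(m_data); }

    // The attribute list is allocated lazily, on the first attribute.
    void add_attribute(Attribute attribute)
    {
        if (!std::holds_alternative<AttributeList>(m_data))
            m_data = std::make_unique<std::vector<Attribute>>();
        std::get<AttributeList>(m_data)->push_back(std::move(attribute));
    }

    std::size_t attribute_count() const
    {
        auto const* list = std::get_if<AttributeList>(&m_data);
        return list ? (*list)->size() : 0;
    }

    // Allocates the DoctypeData if there is none, hence the "ensure_" prefix.
    DoctypeData& ensure_doctype_data()
    {
        if (!std::holds_alternative<DoctypePtr>(m_data))
            m_data = std::make_unique<DoctypeData>();
        return *std::get<DoctypePtr>(m_data);
    }

private:
    Type m_type;
    bool m_self_closing { false };
    std::string m_string_data;
    std::variant<std::monostate, DoctypePtr, AttributeList> m_data;
};
```

A tag token without attributes never touches the variant's heap alternatives in this sketch, which is the property the commit message relies on to keep most tokens allocation-free.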
Max Wipfli
8a4c44db8c LibWeb: Make HTMLTokens non-copyable 2021-07-17 16:24:57 +04:30
Max Wipfli
7eb294df0d LibWeb: Move HTMLToken in HTMLDocumentParser
This replaces a copy construction of an HTMLToken with a move(). This
allows HTMLToken to be made non-copyable in a further commit.
2021-07-17 16:24:57 +04:30
Max Wipfli
2532bdfabf LibWeb: Remove friend class declarations from HTMLToken
Since all interaction with the HTMLToken class now happens over getters
and setters, there is no more need for HTMLTokenizer and
HTMLDocumentParser to have direct access to the members.
2021-07-17 16:24:57 +04:30
Max Wipfli
25cba4387b LibWeb: Add HTMLToken(Type) constructor and use it 2021-07-17 16:24:57 +04:30
Max Wipfli
f2e3c770f9 LibWeb: Use setter for HTMLToken::m_{start,end}_position 2021-07-17 16:24:57 +04:30
Max Wipfli
8b31e41692 LibWeb: Change HTMLToken::m_doctype into named DoctypeData struct
This is in preparation for an upcoming storage change of HTMLToken. In
contrast to the other token types, the accessor can hand out a mutable
reference to allow users to change parts of the DoctypeData easily.
2021-07-17 16:24:57 +04:30
Max Wipfli
918bde98b1 LibWeb: Hide implementation details of HTMLToken attribute list
Previously, HTMLToken would expose the Vector<Attribute> directly to
its users. In preparation for a future change, all users now use
implementation-agnostic APIs which do not expose the Vector directly.
2021-07-17 16:24:57 +04:30
Max Wipfli
15d8635afc LibWeb: Use getter+setter for HTMLToken tag name and self-closing flag 2021-07-17 16:24:57 +04:30
Max Wipfli
1aeafcc58b LibWeb: Use getter and setter for Character type HTMLTokens
While storing the code point in a UTF-8 encoded String is horrendously
inefficient, this problem will be addressed at a later stage.
2021-07-17 16:24:57 +04:30
Max Wipfli
e8e9426b4f LibWeb: Use getter and setter for Comment type HTMLTokens 2021-07-17 16:24:57 +04:30
Max Wipfli
f886aa15b8 LibWeb: Rename HTMLToken::AttributeBuilder struct to Attribute
This does not contain StringBuilders anymore, so it can do with a
simpler name: Attribute.
2021-07-17 16:24:57 +04:30
Max Wipfli
d82f3eb085 LibWeb: Make HTMLToken::{Position,AttributeBuilder} structs public
There was and is no reason for those to be private. Making them public
also allows us to explicitly specify the return type of some getters.
2021-07-17 16:24:57 +04:30
Max Wipfli
e22a34badb LibWeb: Fix assertion failures in HTMLTokenizer
The *TagName states are all very similar, so it seems to be correct to
apply the fix from #8761 to all of those states.

This fixes #8788.
2021-07-16 11:55:55 +02:00
Max Wipfli
2404ad6897 LibWeb: Fix assertion failure when tokenizing JS regex literals
This fixes parsing the following regular expression: /</g;

It also adds a simple script element to the HTMLTokenizer regression
test, which also contains that specific regex.
2021-07-15 01:47:22 +02:00
Max Wipfli
bb2aed7d76 LibWeb: Correct behavior of Comment* states in HTMLTokenizer
Previously, this would lead to assertion failures when parsing HTML
comments. This fixes #8757.
2021-07-15 00:48:45 +02:00
Max Wipfli
af0b483123 LibWeb: VERIFY an empty builder when emitting tokens in HTMLTokenizer 2021-07-15 00:48:45 +02:00
Max Wipfli
045a6a566b LibWeb: Remove unused HTMLTokenizer::m_input member variable 2021-07-14 23:03:36 +02:00
Max Wipfli
35f32ac170 LibWeb: Change HTMLToken.h to east const style 2021-07-14 23:03:36 +02:00
Max Wipfli
125982943a LibWeb: Change HTMLTokenizer.{cpp,h} to east const style 2021-07-14 23:03:36 +02:00
Gunnar Beutner
300823c314 LibWeb: Use move() when enqueuing tokens in HTMLTokenizer
We're not using the current token anymore once it's enqueued so let's
use move() when enqueuing the tokens.
2021-07-14 23:03:36 +02:00
Gunnar Beutner
c3ad8e9a52 LibWeb: Remove StringBuilder from HTMLToken::m_comment_or_character 2021-07-14 23:03:36 +02:00
Gunnar Beutner
3aa202c432 LibWeb: Remove StringBuilder from HTMLToken::m_tag 2021-07-14 23:03:36 +02:00
Gunnar Beutner
901d71148b LibWeb: Remove StringBuilders from HTMLToken::AttributeBuilder 2021-07-14 23:03:36 +02:00
Gunnar Beutner
992964aa7d LibWeb: Remove StringBuilders from HTMLToken::m_doctype 2021-07-14 23:03:36 +02:00
Gunnar Beutner
2150609590 LibWeb: Remove more unused StringBuilders in HTMLToken
These fields aren't read anywhere but I didn't feel like removing
them outright.
2021-07-14 23:03:36 +02:00
Gunnar Beutner
d9e52997e2 LibWeb: Use an Optional<String> to track the last HTML start tag
Using an HTMLToken object here is unnecessary because the only
attribute we're interested in is the tag_name.
2021-07-14 23:03:36 +02:00
Luke
e9eae9d880 LibWeb: Add extracting character encoding from a meta content attribute
Some Gmail emails contain this.
2021-07-13 20:23:44 +02:00
Adam Hodgen
3e46e8fea8 LibWeb: Fix HTMLTable Element attributes
`Element::tag_name` returns an uppercase version of the tag name. However,
the `Web::HTML::TagNames` values are all lowercase.

This change fixes that using `Element::local_name`, which returns a
lowercase value.
2021-07-11 14:14:01 +02:00
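The case mismatch behind 3e46e8fea8 can be reproduced with a small standalone sketch. The `Element` struct and the `TagNames::caption` constant here are simplified assumptions and not the LibWeb types; the point is only that an uppercased tag name never compares equal to a lowercase constant.

```cpp
#include <cassert>
#include <cctype>
#include <string>

namespace TagNames {
// Hypothetical lowercase constant, standing in for Web::HTML::TagNames.
const std::string caption = "caption";
}

// Hypothetical element: local_name is stored lowercase, tag_name() is derived.
struct Element {
    std::string local_name;

    // Uppercased form of the local name, as the commit describes for tag_name.
    std::string tag_name() const
    {
        std::string upper = local_name;
        for (char& c : upper)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return upper;
    }
};

// Buggy comparison: "CAPTION" never equals "caption".
inline bool is_caption_by_tag_name(Element const& element)
{
    return element.tag_name() == TagNames::caption;
}

// Fixed comparison: both sides are lowercase.
inline bool is_caption_by_local_name(Element const& element)
{
    return element.local_name == TagNames::caption;
}
```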
Luke
a826df773e LibWeb: Make WrapperGenerator generate nullable wrapper types
Previously it was not doing so, and some code relied on this not being
the case.

In particular, set_caption, set_t_head and set_t_foot in
HTMLTableElement relied on this. This commit is not here to fix this,
so I added an assertion to make it equivalent to a reference for now.
2021-07-05 12:39:46 +02:00
Luke
62c015dc96 LibWeb: Implement the adoption steps for <template> elements
While I'm here with the cloning steps, let's implement this too.
2021-07-05 12:39:46 +02:00
Luke
a7fa757dd1 LibWeb: Implement the cloning steps for <template> elements 2021-07-05 12:39:46 +02:00
Luke
f7ad8c0f94 LibWeb: Add DOMParser
This allows you to invoke the HTML document parser and retrieve a
document as though it was loaded as a web page, minus any scripting
ability.

This does not currently support XML parsing.

This is used by YouTube (or more accurately, Web Components Polyfills)
to polyfill templates.
2021-07-05 12:39:46 +02:00
Luke
0ea50d44bf LibWeb: Check if scripting is disabled before running script
This is not a full check, it's just enough to prevent script execution
in DOMParser.
2021-07-05 12:39:46 +02:00
Andreas Kling
c8270dbe2e LibJS: Rename ScriptFunction => OrdinaryFunctionObject
These are basically what the spec calls "ordinary function objects",
so let's have the name reflect that. :^)
2021-06-27 22:36:04 +02:00
Andreas Kling
ba9d5c4d54 LibJS: Rename Function => FunctionObject 2021-06-27 22:36:04 +02:00
Andreas Kling
ee3a73ddbb AK: Rename downcast<T> => verify_cast<T>
This makes it much clearer what this cast actually does: it will
VERIFY that the thing we're casting is a T (using is<T>()).
2021-06-24 19:57:01 +02:00
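The "check, then cast" behavior that the new name advertises can be approximated as follows. This is a sketch under assumptions: the type check uses `dynamic_cast` and a failed check calls `std::abort()`, whereas AK has its own `is<T>()` machinery and `VERIFY` macro; the `Node`, `ElementNode` and `TextNode` types are invented for the example.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical class hierarchy used only for this example.
struct Node {
    virtual ~Node() = default;
};
struct ElementNode : Node { };
struct TextNode : Node { };

// Stand-in for is<T>(): true if `value` really is a T.
template<typename T, typename U>
bool is(U& value)
{
    return dynamic_cast<T*>(&value) != nullptr;
}

// Stand-in for verify_cast<T>(): abort instead of returning a bad reference.
template<typename T, typename U>
T& verify_cast(U& value)
{
    if (!is<T>(value))
        std::abort();
    return static_cast<T&>(value);
}
```

Given an `ElementNode` referenced through a `Node&`, `verify_cast<ElementNode>(node)` returns the same object with its derived type, while `verify_cast<TextNode>(node)` would terminate the program in this sketch.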
Andreas Kling
dc65f54c06 AK: Rename Vector::append(Vector) => Vector::extend(Vector)
Let's make it a bit more clear when we're appending the elements from
one vector to the end of another vector.
2021-06-12 13:24:45 +02:00
Ali Mohammad Pur
8b3f8879c1 LibJS: Use an enum class instead of 'bool is_generator'
This avoids confusion in the order of the multiple boolean parameters
that exist.
2021-06-11 19:42:58 +04:30
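Why an enum class helps next to other boolean parameters can be shown with a short sketch. `FunctionKind`, `FunctionDescription` and `make_function` are hypothetical names chosen for illustration and are not the LibJS declarations.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical replacement for a 'bool is_generator' parameter.
enum class FunctionKind {
    Regular,
    Generator,
};

struct FunctionDescription {
    std::string name;
    FunctionKind kind;
    bool is_strict;
};

// With (FunctionKind, bool) the two arguments cannot be swapped silently:
// make_function("f", true, FunctionKind::Generator) does not compile,
// whereas two adjacent bool parameters would accept either order.
inline FunctionDescription make_function(std::string name, FunctionKind kind, bool is_strict)
{
    return FunctionDescription { std::move(name), kind, is_strict };
}
```

The call site also documents itself: `make_function("gen", FunctionKind::Generator, false)` reads more clearly than a pair of bare `true`/`false` arguments.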
Ali Mohammad Pur
3234697eca LibJS: Implement generator functions (only in bytecode mode) 2021-06-11 00:30:09 +02:00
Ali Mohammad Pur
71b4433b0d LibWeb+LibSyntax: Implement nested syntax highlighters
And use them to highlight javascript in HTML source.
This commit also changes how TextDocumentSpan::data is interpreted,
as it used to be an opaque pointer, but everyone stuffed an enum value
inside it, which made the values not unique to each highlighter;
that field is now a u64 serial id.
The syntax highlighters don't need to change their ways of stuffing
token types into that field, but a highlighter that calls another
nested highlighter needs to register the nested types for use with
token pairs.
2021-06-07 14:45:49 +04:30