mirror of https://github.com/RGBCube/serenity synced 2025-10-26 20:42:06 +00:00

123 commits

Author SHA1 Message Date
Sam Atkins
6c5450f9ce LibWeb: Report if anything is delaying load event, not the count
Some elements that delay the load event are more complicated than a
simple count will allow for. We'll implement those in a bit!
2023-12-01 10:28:02 +01:00
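A rough sketch of why a yes/no query can cover more cases than a count. Every name below is hypothetical and this is not LibWeb's actual implementation. Simple delayers bump a counter, while more complicated ones are asked on demand whether they are still delaying:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a document tracking what delays its load event.
class LoadEventDocumentSketch {
public:
    // Simple delayers (e.g. a pending image fetch) only bump a counter.
    void increment_load_event_delayer() { ++m_simple_delayer_count; }
    void decrement_load_event_delayer()
    {
        assert(m_simple_delayer_count > 0);
        --m_simple_delayer_count;
    }

    // More complicated delayers are asked "are you still delaying?" on demand.
    void add_complex_delayer(std::function<bool()> is_delaying)
    {
        m_complex_delayers.push_back(std::move(is_delaying));
    }

    // Callers only get a yes/no answer, never a count.
    bool anything_is_delaying_the_load_event() const
    {
        if (m_simple_delayer_count > 0)
            return true;
        for (auto const& is_delaying : m_complex_delayers) {
            if (is_delaying())
                return true;
        }
        return false;
    }

private:
    std::size_t m_simple_delayer_count { 0 };
    std::vector<std::function<bool()>> m_complex_delayers;
};
```

A complex delayer cannot be folded into a plain integer, because its answer can change without anyone calling increment or decrement.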
Andreas Kling
bfd354492e LibWeb: Put most LibWeb GC objects in type-specific heap blocks
With this change, we now have ~1200 CellAllocators across both LibJS and
LibWeb in a normal WebContent instance.

This gives us a minimum heap size of 4.7 MiB in the scenario where we
only have one cell allocated per type. Of course, in practice there will
be many more of each type, so the effective overhead is quite a bit
smaller than that in practice.

I left a few types unconverted to this mechanism because I got tired of
doing this. :^)
2023-11-19 22:00:48 +01:00
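A back-of-the-envelope check of the 4.7 MiB figure. It assumes each CellAllocator keeps at least one 4 KiB heap block; the block size is an assumption of this sketch and is not stated in the commit. With that assumption, 1200 × 4096 bytes = 4,915,200 bytes ≈ 4.69 MiB. The template is a hypothetical illustration of "one allocator per type", not LibJS's actual CellAllocator:

```cpp
#include <cassert>
#include <cstddef>

// Assumption: every type-specific allocator owns at least one block of this size.
constexpr std::size_t heap_block_size = 4 * 1024;

// Minimum footprint when each allocator holds exactly one (mostly empty) block.
constexpr std::size_t minimum_heap_bytes(std::size_t allocator_count)
{
    return allocator_count * heap_block_size;
}

static_assert(minimum_heap_bytes(1200) == 4'915'200, "1200 blocks of 4 KiB");

// Hypothetical: one allocator instance per cell type, so cells of different
// types never share a heap block.
template<typename T>
struct TypeSpecificAllocator {
    static TypeSpecificAllocator& the()
    {
        static TypeSpecificAllocator instance;
        return instance;
    }

    static constexpr std::size_t cell_size = sizeof(T);
    static constexpr std::size_t cells_per_block = heap_block_size / sizeof(T);
};
```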
Shannon Booth
87a4a5b302 LibWeb: Remove FIXMe's for HTML attribute serialization steps
As far as I can tell, all of these steps are just equivalent to using the
qualified name. Add some tests which cover some of these cases, and
remove the FIXMEs.
2023-11-11 08:50:25 +01:00
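The attribute serialization steps build names such as `xml:lang`, `xmlns:foo` or `xlink:href`. Under the commit's reasoning, assuming the attribute's stored prefix already matches the one those steps would pick, this reduces to the DOM qualified name. A minimal sketch with hypothetical types (value escaping omitted):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical attribute model: optional namespace prefix plus a local name.
struct AttributeSketch {
    std::optional<std::string> prefix;
    std::string local_name;
    std::string value;
};

// DOM "qualified name": "prefix:localName" when there is a prefix, else localName.
std::string qualified_name(AttributeSketch const& attribute)
{
    if (attribute.prefix.has_value())
        return *attribute.prefix + ":" + attribute.local_name;
    return attribute.local_name;
}

// Serialize one attribute as name="value" (escaping is omitted in this sketch).
std::string serialize_attribute(AttributeSketch const& attribute)
{
    return qualified_name(attribute) + "=\"" + attribute.value + "\"";
}
```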
Shannon Booth
96fc1741b5 LibWeb: Return an Optional<String> from HTMLToken::attribute
Move away from using a nullable StringView.
2023-11-11 08:50:25 +01:00
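The benefit of an optional return over a nullable view is that "attribute absent" and "attribute present with an empty value" stay distinct. A standard-library sketch of that distinction, using hypothetical names rather than the HTMLToken API:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token attribute.
struct TokenAttributeSketch {
    std::string local_name;
    std::string value;
};

// std::nullopt means "no such attribute", which differs from an empty value.
std::optional<std::string> find_attribute(std::vector<TokenAttributeSketch> const& attributes, std::string_view name)
{
    for (auto const& entry : attributes) {
        if (entry.local_name == name)
            return entry.value;
    }
    return std::nullopt;
}
```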
Shannon Booth
72bb928dd8 LibWeb: Add spec comments to HTMLParser::handle_in_body
I have been going down into a bit of a rabbit hole trying to figure out
why the namespace is not getting set up properly on certain attributes.
At one stage, I thought the issue might have been around here where
attributes were being adjusted (it is not). I started adding spec
comments to understand what was happening, and by the time I realised it
wasn't in this place, I was already in too deep!

Add a whole bunch of spec comments, and leave one or two minor FIXMEs
where the spec seems to have changed since this was originally
implemented.
2023-11-11 08:50:25 +01:00
Shannon Booth
a8fd4fab00 LibWeb: Port HTMLParser::serialize_html_fragment from DeprecatedString 2023-11-11 08:50:25 +01:00
Shannon Booth
326b34c7c7 LibWeb: Port all callers of Element::namespace to Element::namespace_uri
Removing some more use of DeprecatedFlyString
2023-11-06 11:37:08 +01:00
Shannon Booth
c8a4fc6c1a LibWeb: Port HTML parser quirk public IDs to StringView
These were DeprecatedFlyStrings, but had no reason to be. We were not
making use of the O(1) lookup, so instead of porting it over to a
FlyString, just make it a StringView.
2023-11-06 11:37:08 +01:00
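The quirks-mode check only asks whether the DOCTYPE public identifier starts with one of a fixed list of strings, compared ASCII case-insensitively. A linear scan over `std::string_view` literals is enough for that. The three prefixes below are an illustrative subset only; the HTML spec's list is much longer:

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Illustrative subset only; see the HTML spec for the complete list.
constexpr std::array<std::string_view, 3> quirks_public_id_prefixes {
    "-//IETF//DTD HTML 2.0//",
    "-//Microsoft//DTD Internet Explorer 3.0 HTML//",
    "-//W3C//DTD HTML 3.2//",
};

bool starts_with_ascii_case_insensitive(std::string_view haystack, std::string_view prefix)
{
    if (prefix.size() > haystack.size())
        return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        auto a = static_cast<unsigned char>(haystack[i]);
        auto b = static_cast<unsigned char>(prefix[i]);
        if (std::tolower(a) != std::tolower(b))
            return false;
    }
    return true;
}

// No interning or O(1) lookup is involved: just a scan over string literals.
bool public_id_triggers_quirks(std::string_view public_id)
{
    for (auto prefix : quirks_public_id_prefixes) {
        if (starts_with_ascii_case_insensitive(public_id, prefix))
            return true;
    }
    return false;
}
```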
Shannon Booth
1f8d72da8e LibWeb: Port HTMLToken::to_deprecated_string to new AK String 2023-11-06 11:37:08 +01:00
Shannon Booth
4821d284c6 LibWeb: Add support for inline SVG element scripts 2023-11-05 11:16:16 +00:00
Shannon Booth
e5d45eeeb1 LibWeb: Properly append attributes to element when creating an Element
The main behavioural difference here is that the full qualified name is
appended to the element, rather than just the local name and value.
2023-11-05 11:16:16 +00:00
Shannon Booth
8fbf72b5bf LibWeb: Port HTMLToken prefix and namespace to Optional<FlyString>
Previously these were DeprecatedStrings that contained a null state.
After the null state was removed, the nullability of these members was
broken. This doesn't seem to cause any problems currently as the HTML
parser is not inserting attributes with their full qualified name, but
after we fix that problem, this bug surfaces.
2023-11-05 11:16:16 +00:00
Shannon Booth
fcde808308 LibWeb: Avoid copy of local_name in HTMLParser::create_element_for 2023-11-05 11:16:16 +00:00
Shannon Booth
907be5a96e LibWeb: Add spec comment for HTMLParser::adjusted_current_node
I've found myself looking at this function a bunch while debugging.
2023-11-05 11:16:16 +00:00
Andreas Kling
3ff81dcb65 LibWeb: Make Web::Namespace::Foo strings be FlyString
This required dealing with a *lot* of fallout, but it's all basically
just switching from DeprecatedFlyString to either FlyString or
Optional<FlyString> in a hundred places to accommodate the change.
2023-11-04 21:28:30 +01:00
Andreas Kling
6b20a109c6 LibWeb: Pass DOM namespace strings as FlyString in more places 2023-11-04 21:28:30 +01:00
Andreas Kling
b341aeb5c1 LibWeb: Switch HTMLToken and HTMLTokenizer to String & FlyString 2023-11-04 21:28:30 +01:00
Andreas Kling
f052823f5f LibWeb: Use FlyString for create_element() namespace strings 2023-11-04 21:28:30 +01:00
Shannon Booth
79ed72adb4 LibWeb: Port HTMLToken::make_start_tag from DeprecatedFlyString 2023-10-08 08:11:48 -04:00
Shannon Booth
7aac7002d1 LibWeb: Port SVG::TagNames from DeprecatedFlyString 2023-10-08 08:11:48 -04:00
Shannon Booth
d8635fe541 LibWeb: Port HTMLParser local name and value from DeprecatedString 2023-10-08 08:11:48 -04:00
Shannon Booth
4321606bba LibWeb: Port Element interface from DeprecatedString
This is the last IDL interface which was using DeprecatedString! :^)
2023-10-06 08:25:40 +02:00
Shannon Booth
ff72436448 LibWeb: Add a FlyString version of Element::tag_name
Rename the DeprecatedString version of this function to
deprecated_tag_name. A FlyString is used here because we often need to
perform equality checks on it, and the HTMLParser already has tag_name
as a FlyString.

Remove a FIXME while we're at it - we were already following the spec
there, and we still are :^)
2023-10-03 14:47:53 +01:00
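Equality checks are cheap with FlyString because equal strings share one canonical copy. A minimal interning sketch illustrates the idea. `InternedStringSketch` is hypothetical, and AK's real FlyString differs (reference counting, small-string storage, and so on):

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical interned string: every distinct content is stored exactly once.
class InternedStringSketch {
public:
    explicit InternedStringSketch(std::string_view text)
        : m_data(&*table().emplace(text).first)
    {
    }

    // Equal contents share canonical storage, so equality is a pointer compare.
    bool operator==(InternedStringSketch const& other) const { return m_data == other.m_data; }
    bool operator!=(InternedStringSketch const& other) const { return !(*this == other); }
    std::string const& view() const { return *m_data; }

private:
    // Node-based container: element addresses stay valid across rehashing.
    static std::unordered_set<std::string>& table()
    {
        static std::unordered_set<std::string> strings;
        return strings;
    }

    std::string const* m_data;
};
```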
Shannon Booth
9303e9e76f LibWeb: Port Element::local_name and TagNames from Deprecated String
These pretty much need to be done together due to the number of places
where they are compared with each other.

This also involves porting StackOfOpenElements over to FlyString from
DeprecatedFlyString to prevent a gazillion `.to_deprecated_fly_string`
calls in HTMLParser.
2023-10-03 14:47:53 +01:00
Shannon Booth
60c32f39a1 LibWeb: Do not crash when parsing a SVG script element
Just leave a FIXME dbgln message instead. This works around a crash seen
in html5test.com.
2023-09-23 11:41:57 +02:00
Shannon Booth
6de9d2820f LibWeb: Add spec comments to 'process the rules for foreign content' 2023-09-23 11:41:57 +02:00
Shannon Booth
b603e860af LibWeb: Port CharacterData from DeprecatedString to String
The existing implementation has some pre-existing issues where it
incorrectly assumes that byte offsets are given through the IDL instead
of UTF-16 code units. While making these changes, leave some FIXMEs for
that.
2023-09-19 10:54:07 +02:00
Shannon Booth
e74031a396 LibWeb: Port Document interface from DeprecatedString to String 2023-09-16 11:17:19 +02:00
Shannon Booth
bcb6851c07 LibWeb: Port Text interface from DeprecatedString to String 2023-09-06 11:44:45 -04:00
Shannon Booth
cc1e4c5cb3 LibWeb: Port Comment interface from DeprecatedString to String 2023-09-06 11:44:45 -04:00
Andrew Kaster
6e64bf5464 LibWeb: Remove outdated old_queue_global_event_with_document
The FIXME here describes an old constraint on JS Interpreters which no
longer holds. It hails from a time when we had the global object and
JS realm attached to the document.
2023-08-28 12:57:05 +02:00
Shannon Booth
ebdfe2e863 LibWeb: Port DocumentType from DeprecatedString to String 2023-08-27 05:34:54 +02:00
MacDue
71baa8c31a LibWeb: Add CSSPixels::nearest_value_for(FloatingPoint)
This is intended to annotate conversions from unknown floating-point
values to CSSPixels, and make it more obvious the fp value will be
rounded to the nearest fixed-point value.
2023-08-26 23:53:45 +02:00
MacDue
360c0eb509 LibWeb: Remove implicit conversion from float and double to CSSPixels
In general it is not safe to convert any arbitrary floating-point value
to CSSPixels. CSSPixels has a resolution of 0.015625, which, for small
values (e.g. scale factors between 0 and 1), can produce bad results
if converted to CSSPixels then scaled back up. In the worst case, values
can underflow to zero and produce incorrect results.
2023-08-26 23:53:45 +02:00
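A worked example of the precision loss, using a hypothetical fixed-point type with 6 fractional bits (1/64 = 0.015625, the resolution quoted above). This is not LibWeb's CSSPixels. A scale factor of 0.3 becomes 19/64 = 0.296875, so scaling 1000 back up gives 296.875 instead of 300. A value of 0.005 becomes 0.32/64 and rounds to zero:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for a fixed-point type with 1/64 resolution.
class FixedPixelsSketch {
public:
    static constexpr int fractional_bits = 6;
    static constexpr int scale = 1 << fractional_bits; // 64

    // Explicit, named conversion: round to the nearest representable value.
    static FixedPixelsSketch nearest_value_for(double value)
    {
        return FixedPixelsSketch(static_cast<std::int32_t>(std::lround(value * scale)));
    }

    std::int32_t raw_value() const { return m_raw; }
    double to_double() const { return static_cast<double>(m_raw) / scale; }

private:
    explicit FixedPixelsSketch(std::int32_t raw)
        : m_raw(raw)
    {
    }

    std::int32_t m_raw;
};
```

Requiring a named function like this makes each lossy conversion visible at the call site, which is the motivation given for `CSSPixels::nearest_value_for`.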
Sam Atkins
8a8cc18cf4 LibWeb: Make StyleValue constructors infallible 2023-08-19 17:34:22 +02:00
Andreas Kling
e2740bd19d LibWeb: Don't overwrite existing text content when flushing HTML parser
If we run an inline script from the HTML parser, it may append a text
node to the current insertion point.

If there was text content immediately following the script element,
we would previously overwrite the script-inserted text content, due to
an oversight in the way we select an appropriate insertion point.

This patch fixes the issue by only inserting parser content into
existing text nodes if they are empty.
2023-08-16 12:16:05 +02:00
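The rule described above can be sketched on a heavily simplified, hypothetical DOM model (not LibWeb's code). Buffered parser text is flushed into the last child only if that child is an empty text node. Otherwise a fresh node is appended, so that text a script already inserted is left alone:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified DOM: a parent holding only text children.
struct TextNodeSketch {
    std::string data;
};

struct ParentSketch {
    std::vector<std::shared_ptr<TextNodeSketch>> children;
};

// Reuse the last child only if it is an empty text node; otherwise it may
// hold script-inserted content, so append a new text node instead.
std::shared_ptr<TextNodeSketch> text_node_for_flush(ParentSketch& parent)
{
    if (!parent.children.empty() && parent.children.back()->data.empty())
        return parent.children.back();
    auto node = std::make_shared<TextNodeSketch>();
    parent.children.push_back(node);
    return node;
}

void flush_character_buffer(ParentSketch& parent, std::string const& buffered_text)
{
    text_node_for_flush(parent)->data = buffered_text;
}
```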
Andreas Kling
72c9f56c66 LibJS: Make Heap::allocate<T>() infallible
Stop worrying about tiny OOMs. Work towards #20449.

While going through these, I also changed the function signature in many
places where returning ThrowCompletionOr<T> is no longer necessary.
2023-08-13 15:38:42 +02:00
Jonah
0b2da4f8c6 LibWeb: Add the default user agent MathML stylesheet
We now apply MathML's default user agent style sheet along with other
default styles. This sheet is not mixed in with the other styles in
CSS/Default.css because it is a namespaced stylesheet and so has to
be its own sheet.
2023-08-12 07:59:23 +01:00
Jonah
442602bec8 LibWeb: Generate MathML Elements
We will now generate MathML elements when parsing HTML.
2023-08-12 07:59:23 +01:00
Andreas Kling
22a858a0cb LibWeb: Don't parse inline style sheets during HTML fragment parsing
Some websites (like Reddit) like to instantiate "components" by setting
innerHTML to a huge chunk of stuff. Sometimes those huge chunks of stuff
contain inline style sheets (i.e. `<style>` elements).

Before this change, we would end up parsing the CSS in those elements
multiple times, because we had no way of knowing that we were within
a fragment parser's temporary document.

This patch avoids the extra CSS parsing work by adding a flag to
Document that tells us it's being used by the fragment parser. Then, we
simply avoid parsing CSS for style elements in such documents. The CSS
then gets parsed immediately upon insertion into the proper DOM.
2023-08-09 17:09:28 +02:00
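A minimal sketch of the flag described above, with hypothetical names. A counter stands in for actual CSS parsing. The style block is skipped while the element lives in the fragment parser's temporary document, and it is parsed once the element lands in a real document:

```cpp
#include <cassert>
#include <string>

// Hypothetical document with the flag the commit describes.
struct FragmentAwareDocument {
    // Set on the temporary document created by the HTML fragment parser.
    bool created_for_fragment_parsing { false };
};

struct StyleElementSketch {
    std::string text_content;
    int times_css_parsed { 0 };

    // Called whenever the element's style block should be (re)processed.
    void update_a_style_block(FragmentAwareDocument const& document)
    {
        // The temporary document is thrown away, so parsing CSS there is wasted work.
        if (document.created_for_fragment_parsing)
            return;
        ++times_css_parsed; // Stand-in for parsing text_content as a style sheet.
    }
};
```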
Shannon Booth
2b46e6f664 Everywhere: Update copyrights with my new serenityos.org e-mail :^) 2023-07-15 16:21:29 +02:00
Andreas Kling
5cdb394400 LibWeb: Make HTML parser flush all pending tokens in "in table text"
There were multiple bugs in the parsing algorithm for handling text
occurring inside a `table` element:

- When there was pending non-whitespace text inside a table, we only
  flushed one token instead of all pending tokens.

- Also, we didn't even flush one of the right tokens, but instead the
  token that caused the flush to happen.

- Once we started flushing the right tokens, it turned out we had not
  yet implemented character insertion points expressed as "before X".

- Finally, we were not exiting the "in table text" mode after flushing
  pending tokens, effectively getting us stuck in that mode until EOF.
2023-07-03 11:50:58 +02:00
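The corrected behaviour follows the spec's "anything else" branch of the "in table text" insertion mode. If any pending character is not ASCII whitespace, every pending token is reprocessed with the "in table" anything-else rules (foster parenting). Otherwise all of them are inserted. Either way, the parser then returns to the original insertion mode. A simplified sketch that records decisions instead of mutating a DOM (all names are hypothetical):

```cpp
#include <cassert>
#include <vector>

enum class InsertionModeSketch { InTable, InTableText };

struct TableTextFlushResult {
    std::vector<char> foster_parented; // Reprocessed via "in table" anything-else rules.
    std::vector<char> inserted;        // Inserted at the current insertion point.
    InsertionModeSketch next_mode;     // Caller reprocesses the triggering token in this mode.
};

bool is_ascii_whitespace(char c)
{
    return c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Flush *all* pending tokens (not just one), then leave "in table text".
TableTextFlushResult flush_pending_table_characters(std::vector<char>& pending, InsertionModeSketch original_mode)
{
    TableTextFlushResult result { {}, {}, original_mode };
    bool any_non_whitespace = false;
    for (char c : pending) {
        if (!is_ascii_whitespace(c))
            any_non_whitespace = true;
    }
    if (any_non_whitespace)
        result.foster_parented = pending; // Parse error: every pending token is reprocessed.
    else
        result.inserted = pending;
    pending.clear();
    return result;
}
```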
Andreas Kling
8c3e5137f7 LibWeb: Add spec comments to HTML parser "in table text" insertion mode 2023-07-03 11:50:58 +02:00
Andreas Kling
87f0c1c353 LibWeb: Add spec comments to HTML parser "in table" insertion mode
Also remove some overly anxious FIXMEs about slight variance in spec
language. :^)
2023-07-03 11:50:58 +02:00
Andreas Kling
bac500b9ad LibWeb: Add spec comments to HTML parser "in row" insertion mode 2023-07-03 11:50:58 +02:00
Zhiyuan Guo
83345ba698 LibWeb: Don't crash when document.write a script with src attr
To abort the processing of any nested invocations of the tokenizer,
simply returning is enough in this case.
When processing the pending parsing-blocking script, the
is_ready_to_be_parser_executed() check should be applied to the
blocking script, not the original script.
2023-06-03 12:22:01 +02:00
Shannon Booth
bc54560e59 LibWeb: Add Web::HTML::parse_legacy_color_value
This function follows the "rules for parsing a legacy color value"
which is used in some legacy attributes, such as 'bgcolor' in the body
element.
2023-05-28 13:24:37 +02:00
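A simplified sketch of the "rules for parsing a legacy color value". It is not LibWeb's implementation: named colors (step 4) and replacing non-BMP code points (step 6) are omitted, and input is treated as bytes. The numbered comments follow the spec's steps. One well-known consequence is that `bgcolor="chucknorris"` yields rgb(192, 0, 0):

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct LegacyRGB {
    int red;
    int green;
    int blue;
};

// Simplified: named colors (step 4) and non-BMP code points (step 6) are not handled.
std::optional<LegacyRGB> parse_legacy_color_value(std::string input)
{
    auto is_hex = [](char c) { return std::isxdigit(static_cast<unsigned char>(c)) != 0; };
    auto hex_value = [](char c) {
        int lower = std::tolower(static_cast<unsigned char>(c));
        return std::isdigit(lower) ? lower - '0' : lower - 'a' + 10;
    };

    // 1. The empty string is a failure.
    if (input.empty())
        return std::nullopt;

    // 2. Strip leading and trailing ASCII whitespace.
    char const* whitespace = " \t\n\f\r";
    auto first = input.find_first_not_of(whitespace);
    if (first == std::string::npos)
        input.clear();
    else
        input = input.substr(first, input.find_last_not_of(whitespace) - first + 1);

    // 3. "transparent" (ASCII case-insensitive) is a failure.
    std::string lowered;
    for (char c : input)
        lowered += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (lowered == "transparent")
        return std::nullopt;

    // 5. "#rgb" shorthand: each hex digit is multiplied by 17, so "f" becomes 255.
    if (input.size() == 4 && input[0] == '#' && is_hex(input[1]) && is_hex(input[2]) && is_hex(input[3]))
        return LegacyRGB { hex_value(input[1]) * 17, hex_value(input[2]) * 17, hex_value(input[3]) * 17 };

    // 7-9. Truncate to 128 characters, drop a leading '#', turn non-hex characters into '0'.
    if (input.size() > 128)
        input.resize(128);
    if (!input.empty() && input[0] == '#')
        input.erase(0, 1);
    for (char& c : input) {
        if (!is_hex(c))
            c = '0';
    }

    // 10. Pad with '0' until the length is a non-zero multiple of three.
    while (input.empty() || input.size() % 3 != 0)
        input += '0';

    // 11. Split into three components of equal length.
    std::size_t length = input.size() / 3;
    std::string components[3] = { input.substr(0, length), input.substr(length, length), input.substr(2 * length, length) };

    // 12. If components are longer than 8, keep only their last 8 characters.
    if (length > 8) {
        for (auto& component : components)
            component.erase(0, length - 8);
        length = 8;
    }

    // 13. While longer than 2 and every component starts with '0', drop that '0'.
    while (length > 2 && components[0][0] == '0' && components[1][0] == '0' && components[2][0] == '0') {
        for (auto& component : components)
            component.erase(0, 1);
        --length;
    }

    // 14. Keep at most two characters per component.
    if (length > 2) {
        for (auto& component : components)
            component.resize(2);
    }

    // 15. Interpret each component as a hexadecimal number.
    return LegacyRGB { std::stoi(components[0], nullptr, 16), std::stoi(components[1], nullptr, 16), std::stoi(components[2], nullptr, 16) };
}
```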
Sam Atkins
d16600a48b LibWeb: Propagate errors from StyleValue construction
Turns out we create a lot of these, mostly from places that don't return
ErrorOr. The yak stack grows.
2023-05-06 08:07:28 +02:00
Luke Wilde
f52ede23aa LibWeb: Return from "the end" during HTML fragment parsing
This will examine the algorithm known as "the end" from the HTML
specification, which executes when parsing HTML markup has completed,
and its potential to observably run script or change certain
attributes.

This currently executes in our engine when parsing HTML received from
the internet during navigation, using document.{open,write,close},
setting the innerHTML attribute or using DOMParser. The latter two are
only possible by executing script.

This has been causing some issues in our engine, which will be shown
later, so we are considering removing the call to "the end" for these
two cases.

Spoiler: the implications of running "the end" for DOMParser will be
considered in the future. It is the only script-created HTML/XML parser
remaining after this commit that uses "the end", including its XML
variant implemented as XMLDocumentBuilder::document_end().

This will only focus on setting the innerHTML attribute, which falls
under "HTML fragment parsing", which starts here in the specification:
https://html.spec.whatwg.org/multipage/parsing.html#parsing-html-fragments
44dd824764/Userland/Libraries/LibWeb/HTML/Parser/HTMLParser.cpp (L3491)

While you may notice our HTMLParser::parse_html_fragment returns `void`
and assume this means no scripts are executed because of our use of
`WebIDL::ExceptionOr<T>` and `JS::ThrowCompletionOr<T>`, note that
dispatched events will execute arbitrary script via a callback, catch
any exceptions, report them and not propagate them. This means that
while a function does not return an exception type, it can still
potentially execute script.

A breakdown of the steps of "the end" in the context of HTML fragment
parsing and its observability follows:
https://html.spec.whatwg.org/multipage/parsing.html#the-end
44dd824764/Userland/Libraries/LibWeb/HTML/Parser/HTMLParser.cpp (L221)

1. No-op, as we don't currently have speculative HTML parsing. Even if
   we did, we would instantly return after stopping the speculative
   HTML parser anyway.

2. No-op, document.{open,write,close} are not accessible from the
   temporary document.

3. No-op, document.readyState, window.navigation.timing and the
   readystatechange event are not accessible from the created temporary
   document.

4. This is presumably done so that reentrant invocation of the HTML
   parser from document.{write,close} during the firing of the events
   after step 4 ends up parsing from a clean state. This is a no-op, as
   the events after step 4 do not fire and are not accessible.

5. No-op, we set HTMLScriptElement::m_already_started to true when
   creating it whilst parsing an HTML fragment, which causes
   HTMLScriptElement::prepare_script to instantly bail, meaning
   `scripts_to_execute_when_parsing_has_finished` is always empty.

6. No-op, tasks are considered not runnable when the document does not
   have a browsing context, which is always the case in fragment
   parsing. Additionally, window.navigation.timing and the
   DOMContentLoaded event aren't reachable from the temporary document.

7. Almost a no-op, `scripts_to_execute_as_soon_as_possible` is always
   empty for the same reason as step 5. However, this step uses an
   unconditional `spin_until` call, which _is_ observable and causes
   one of the alluded to issues, which will be talked about later.

8. No-op, as delaying the load event has no purpose in this case, as
   the task in step 9 will set the current document readiness to
   "complete" and then return immediately after, as the temporary
   document has no browsing context, skipping the Window load event.
   However, this step causes another alluded to issue, which will be
   talked about later.

9. No-op, for the same reason as step 6. Additionally,
   document.readyState is not accessible from the temporary document
   and the temporary document has no browsing context, so navigation
   timing, the Window load event, the pageshow event, the Document load
   event and the `<iframe>` load steps are not executed at all.

10. No-op, as this flag is only set from window.print(), which is not
    accessible for this document.

11. No-op, as the temporary document is not accessible from anything
    else and will be immediately destroyed after HTML fragment parsing.
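Step 5 relies on an early bail-out in script preparation: the fragment parser marks each script it creates as "already started", so preparing it does nothing. A reduced sketch with hypothetical types (for brevity, every prepared script is treated as if it were deferred):

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of just the bits relevant to step 5.
struct ScriptElementSketch {
    bool already_started { false };
};

struct ParserDocumentSketch {
    std::vector<ScriptElementSketch*> scripts_to_execute_when_parsing_has_finished;
};

// Mirrors the early return at the start of "prepare the script element".
void prepare_script(ScriptElementSketch& script, ParserDocumentSketch& document)
{
    if (script.already_started)
        return;
    script.already_started = true;
    // Simplified: treat every prepared script as a deferred script.
    document.scripts_to_execute_when_parsing_has_finished.push_back(&script);
}

// The fragment parser sets "already started" on every script it creates.
ScriptElementSketch create_script_for_fragment_parsing()
{
    ScriptElementSketch script;
    script.already_started = true;
    return script;
}
```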

Additionally, browsing context containers (`<iframe>`, `<frame>` and
`<object>`) cannot run in documents with no browsing context:

- `<iframe>` and `<frame>` use "create a new child navigable":
https://html.spec.whatwg.org/multipage/document-sequences.html#create-a-new-child-navigable
44dd824764/Userland/Libraries/LibWeb/HTML/BrowsingContextContainer.cpp (L43-L45)

> 2. Let group be element's node document's browsing context's
     top-level browsing context's group.

This requires the element's node document's browsing context to be
non-null, but it is always null with the temporary document created for
HTML fragment parsing.

This is protected against here for `<iframe>`:
https://html.spec.whatwg.org/multipage/iframe-embed-object.html#the-iframe-element:the-iframe-element-6
44dd824764/Userland/Libraries/LibWeb/HTML/HTMLIFrameElement.cpp (L45)

> When an iframe element element is inserted into a document whose
  browsing context is non-null, the user agent must run these steps:
  1. Create a new child navigable for element.

This is currently not protected against for `<frame>` in the
specification:
https://html.spec.whatwg.org/multipage/obsolete.html#active-frame-element

> A frame element is said to be an active frame element when it is in a
  document.

> When a frame element element is created as an active frame element,
  or becomes an active frame element after not having been one, the
  user agent must run these steps:
>     1. Create a new child navigable for element.

However, since this would cause a null dereference, this is actually a
specification issue. See: https://github.com/whatwg/html/issues/9136

- `<object>` uses "queue an element task" and has a browsing context
  null check.
https://html.spec.whatwg.org/multipage/iframe-embed-object.html#the-object-element:queue-an-element-task
44dd824764/Userland/Libraries/LibWeb/HTML/HTMLObjectElement.cpp (L58)
44dd824764/Userland/Libraries/LibWeb/HTML/HTMLObjectElement.cpp (L105)

> ...the user agent must queue an element task on the DOM manipulation
  task source given the object element to run the following steps to
  (re)determine what the object element represents.

As established above, tasks are not runnable in documents with null
browsing contexts. However, for avoidance of doubt, it checks if the
document's browsing context is null, and if so, it falls back to
representing the element's children and gets rid of any child navigable
the `<object>` element may have.

> 2. If the element has an ancestor media element, or has an ancestor
     object element that is not showing its fallback content, or if the
     element is not in a document whose browsing context is non-null,
     or if the element's node document is not fully active, or if the
     element is still in the stack of open elements of an HTML parser
     or XML parser, or if the element is not being rendered, then jump
     to the step below labeled fallback.

> 4. Fallback: The object element represents the element's children.
     This is the element's fallback content. Destroy a child navigable
     given the element.

This check also protects against an `<object>` element being adopted
from a document which has a browsing context to one that doesn't during
the time between the element task being queued and then executed.

This means a browsing context container cannot be run, meaning browsing
context containers cannot access their parent document and access the
properties and events mentioned in steps 1-11 above, or use
document.{open,write,close} on the parent document.

Another potential avenue of running script via HTML fragment parsing
is via custom elements being in the markup, which need to be
synchronously upgraded. For example:
```
<custom-element></custom-element>
```

However, this is already protected against in the spec:
https://html.spec.whatwg.org/multipage/parsing.html#create-an-element-for-the-token
44dd824764/Userland/Libraries/LibWeb/HTML/Parser/HTMLParser.cpp (L643)

> 7. If definition is non-null and the parser was not created as part
     of the HTML fragment parsing algorithm, then let will execute
     script be true. Otherwise, let it be false.

It is protected against overall by disabling custom elements via
returning `null` for all custom element definition lookups if the
document has no browsing context, which is the case for the temporary
document:
https://html.spec.whatwg.org/multipage/custom-elements.html#look-up-a-custom-element-definition
44dd824764/Userland/Libraries/LibWeb/DOM/Document.cpp (L2106-L2108)

> 2. If document's browsing context is null, return null.

This is because the document doesn't have an associated Window, meaning
there will be no associated CustomElementRegistry object.
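The two guards quoted above can be sketched together. The types are hypothetical, and the lookup is reduced to the browsing-context check plus a name match (the spec's namespace check is omitted). Without a browsing context the lookup returns null. Even with a definition, "will execute script" stays false for the fragment parser:

```cpp
#include <cassert>
#include <map>
#include <string>

struct CustomElementDefinitionSketch {
    std::string name;
};

struct RegistrySketch {
    std::map<std::string, CustomElementDefinitionSketch> definitions;
};

struct LookupDocumentSketch {
    bool has_browsing_context { false };
    RegistrySketch const* registry { nullptr }; // Reached via the relevant Window in a real engine.
};

// "Look up a custom element definition", reduced to the steps discussed above.
CustomElementDefinitionSketch const* look_up_custom_element_definition(LookupDocumentSketch const& document, std::string const& local_name)
{
    // 2. If document's browsing context is null, return null.
    if (!document.has_browsing_context || document.registry == nullptr)
        return nullptr;
    auto it = document.registry->definitions.find(local_name);
    return it == document.registry->definitions.end() ? nullptr : &it->second;
}

// Step 7 of "create an element for the token".
bool will_execute_script(CustomElementDefinitionSketch const* definition, bool created_for_fragment_parsing)
{
    return definition != nullptr && !created_for_fragment_parsing;
}
```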

After running the HTML fragment parser, all of the child nodes are
removed from the temporary document and then adopted into the context
element's node document. Skipping the `pre_remove` steps as they are
not relevant in this case, let's first examine Node::remove()'s
potential to execute script, then examine Document::adopt_node() after.
https://dom.spec.whatwg.org/#concept-node-remove
44dd824764/Userland/Libraries/LibWeb/DOM/Node.cpp (L534)

1-7. Does not run any script, it just keeps a copy of some data that
     will be needed later in the algorithm and directly modifies live
     range attributes. However, since this relies on Range objects
     containing the temporary document, the Range steps are no-ops.

8. Though this uses the temporary document, it does not contain any
   NodeIterator objects as no script should have run, thus this
   callback will not be entered. Even if the document _did_ have
   associated NodeIterators, NodeIterator::run_pre_removing_steps does
   not execute any script.

9-11. Does not run any script, it just keeps a copy of some data that
      will be needed later in the algorithm and performs direct tree
      mutation to remove the node from the node tree.

12-14. "assign slottables" and step 13 queue mutation observer
       microtasks via "signal a slot change". However, since this is
       done _after_ running "the end", the "spin the event loop" steps
       in that algorithm do not affect this. Remember that queued
       microtasks do not execute during this algorithm for the next
       few steps.

Sidenote:
Microtasks are supposed to be executed when the JavaScript execution
context stack is empty. Since HTMLParser::parse_html_fragment is only
called from script, the stack will never be empty whilst it is running,
so microtasks will not run until some time after we exit this function.

15. This could potentially run script, let's have a look at the
    removal steps we currently have implemented in our engine:

- HTMLIFrameElement::removed_from()
  https://html.spec.whatwg.org/multipage/iframe-embed-object.html#the-iframe-element:the-iframe-element-7
  44cf92616e/Userland/Libraries/LibWeb/HTML/HTMLIFrameElement.cpp (L102)

  Since browsing context containers cannot create child browsing
  contexts (as shown above), this code will do nothing. This will also
  hold true when we implement HTMLFrameElement::removed_from() in the
  future.

- FormAssociatedElement::removed_from()
  44cf92616e/Userland/Libraries/LibWeb/HTML/FormAssociatedElement.h (L36)
  
  This calls `form_node_was_removed` which can then potentially call
  `reset_form_owner`. However, `reset_form_owner` only does tree
  traversal to find the appropriate form owner and does not execute
  any script. After calling `form_node_was_removed` it then calls
  `form_associated_element_was_removed`, which is a virtual function
  that no one currently overrides, meaning no script is executed.

- HTMLBaseElement::removed_from()
  44dd824764/Userland/Libraries/LibWeb/HTML/HTMLBaseElement.cpp (L45)
  
  This will call `Document::update_base_element` to do tree traversal
  to find out the new first `<base>` element with an href attribute and
  thus does not execute any script.

- HTMLStyleElement::removed_from()
  https://html.spec.whatwg.org/multipage/semantics.html#update-a-style-block
  44dd824764/Userland/Libraries/LibWeb/HTML/HTMLStyleElement.cpp (L49)
  
  This will call `update_a_style_block`, which will parse the `<style>`
  element's text content as CSS and create a style sheet from it. This
  does not execute any script.
  
In summary, step 15 does not currently execute any script and ideally
shouldn't in the future when we implement more `removed_from` steps.

16. Does not run any script, just saves a copy of a variable.

17. Queues a "disconnectedCallback" custom elements callback. This will
    execute script in the future, but not here.
    
18. Performs step 15 and 17 in combination for each of the node's
    descendants. This will not execute any script.
    
19. Does not run any script, it performs a requirement of mutation
    observers by adding certain things to a list.

20. Does not execute any script, as mutation observer callbacks are
    done via microtasks.

21. This will not execute script, as the parent is always the temporary
    document in HTML fragment parsing. There is no Document children
    changed steps, so this step is a no-op.
    
We then do layout invalidation which is our own addition, but this also
does not execute any script.

In short, removing a node does not execute any script. It could execute
script in the future, but since this is done by tasks, it will not
execute until we are outside of HTMLParser::parse_html_fragment.

Let's look at adopting a node:
https://dom.spec.whatwg.org/#concept-node-adopt
44dd824764/Userland/Libraries/LibWeb/DOM/Document.cpp (L1414)

1. Does not run script, it just keeps a reference to the temporary
   document.

2. No-op, we removed the node above.

3.1. Does not execute script, it simply updates all descendants of
     the removed node to be in the context element's node document.

3.2. Does not execute script, see node removal step 17.

3.3. This could potentially execute script, let's have a look at the
     adopting steps we have implemented in our engine:

- HTMLTemplateElement::adopted_from()
  https://html.spec.whatwg.org/multipage/scripting.html#the-template-element:concept-node-adopt-ext
  44dd824764/Userland/Libraries/LibWeb/HTML/HTMLTemplateElement.cpp (L38)

  This simply adopts the `<template>` element's DocumentFragment node
  into its inert document. This does not execute any script.
  
We then have our own addition of adopting NodeIterators over to the
context element's document, but this does not execute any script.

In short, adopting a node does not execute any script.

After adopting the nodes to the context element's document, HTML
fragment parsing is complete and the temporary document is no longer
accessible at all.

Document and element event handlers are also not accessible, even if
the event bubbles. This is simply because the temporary document is not
accessible, so tree traversal, IDL event handler attributes and
EventTarget#addEventListener are not accessible, on the document or any
descendants. Document is also not an Element, so element event handler
attributes do not apply.

In summary, this establishes that HTML fragment parsers should not run
any user script or internal C++ code that relies on things set up by
"the end". This means that the attributes set up and events fired by
"the end" are not observable in this case. This may not have explored
every single possible avenue, but the general assertion should still
hold. However, this assertion is violated by "the end" containing two
unconditional "spin the event loop" invocations and causes issues with
live web content, so we seek to avoid them.

As WebKit, Blink and Gecko have been able to get away with doing fast
path optimizations for HTML fragment parsing which don't set up
navigation timing, run events, etc., it is presumed we are able to get
away with not running "the end" for HTML fragment parsing as well.
WebKit: c69be377e1/Source/WebCore/dom/DocumentFragment.cpp (L90-L98)
Blink: 15444426f9/third_party/blink/renderer/core/editing/serializers/serialization.cc (L681-L702)
Gecko: 6fc2f6d533/dom/base/FragmentOrElement.cpp (L1991-L2002)

Removing the call to "the end" fixes at least a couple of issues:
- Inserting `<img>` elements via innerHTML causes us to spin forever.

  This regressed in 2413de7e10.
  
  This is because `m_load_event_delayer.clear()` is performed inside an
  element task callback. Because of the reasons stated above, this will
  never execute. This caused us to spin forever on step 8 of "the end",
  which is delaying the load event.
  
  This affected Google Docs and Google Maps, never allowing them to
  progress after performing this action. I have also seen it cause a
  Scorecard Research `<img>` beacon in a `<noscript>` element inserted
  via innerHTML to spin forever. This presumably affects many more
  sites as well.
  
  Given that the Window load event is not fired for HTML fragment
  parsers, spinning the event loop to delay the load event does not
  change anything, meaning this step can be skipped entirely.
  
- Microtask timing is messed up by the unconditional `spin_until`s on
  steps 7 and 8.
  
  "Spin the event loop" causes an unconditional microtask checkpoint:
  https://html.spec.whatwg.org/multipage/webappapis.html#spin-the-event-loop
  44dd824764/Userland/Libraries/LibWeb/HTML/EventLoop/EventLoop.cpp (L54)
  
  > 3. Let old stack be a copy of the JavaScript execution context
       stack.
  > 4. Empty the JavaScript execution context stack.
  > 5. Perform a microtask checkpoint.
  > 6.2.1. Replace the JavaScript execution context stack with old
           stack.
           
  This broke YouTube with the introduction of custom elements, as
  custom elements use microtasks to upgrade elements and call
  callbacks. See https://github.com/whatwg/html/issues/8646 for a full
  example reduced from YouTube's JavaScript.
  
  Another potential fix for this issue is to remove the above steps
  from "spin the event loop". However, since we have another issue with
  the use of "spin the event loop", it would be best to just avoid
  both calls to it.
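A toy model of the timing problem, with hypothetical names and the actual waiting omitted. Normally, microtasks only run once the JavaScript execution context stack is empty, as in "clean up after running script". "Spin the event loop" saves and empties the stack, checkpoints unconditionally, then restores the stack. That lets a custom element upgrade run while the innerHTML setter is still on the stack:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy event loop: only the execution context stack and a microtask queue.
struct EventLoopSketch {
    std::vector<std::string> execution_context_stack;
    std::queue<std::function<void()>> microtasks;

    void perform_a_microtask_checkpoint()
    {
        while (!microtasks.empty()) {
            auto task = std::move(microtasks.front());
            microtasks.pop();
            task();
        }
    }

    // Normal rule: only checkpoint once the stack has become empty.
    void clean_up_after_running_script()
    {
        if (execution_context_stack.empty())
            perform_a_microtask_checkpoint();
    }

    // Steps 3-5 and 6.2.1 of "spin the event loop" (the waiting itself is omitted).
    void spin_the_event_loop()
    {
        auto old_stack = execution_context_stack;
        execution_context_stack.clear();
        perform_a_microtask_checkpoint();
        execution_context_stack = old_stack;
    }
};
```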

Considering all of the above, removing the call to "the end" is the way
forward for HTML fragment parsing, as all of it should be a no-op.

This is done by simply returning from "the end" if the HTML parser
was created for HTML fragment parsing.
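A minimal sketch of the early return, with hypothetical names. `SpinUntil` stands in for "spin the event loop until condition becomes true" so that a test can count how often it would be entered. For fragment parsing the function returns before any spin. That is what prevents the endless spin on step 8 when the load event delayer is only cleared by a task that never runs:

```cpp
#include <cassert>
#include <functional>

// Stand-in for "spin the event loop until condition becomes true".
using SpinUntil = std::function<void(std::function<bool()> const&)>;

class HTMLParserSketch {
public:
    explicit HTMLParserSketch(bool created_for_fragment_parsing)
        : m_created_for_fragment_parsing(created_for_fragment_parsing)
    {
    }

    // "The end", reduced to the early return plus step 8.
    void the_end(SpinUntil const& spin_until)
    {
        // Every step is unobservable for fragment parsing, so skip them all,
        // including both unconditional "spin the event loop" calls.
        if (m_created_for_fragment_parsing)
            return;

        // 8. Spin until nothing is delaying the load event. If the delayer is
        //    only cleared by a task that never runs, a real spin never returns.
        spin_until([this] { return !something_is_delaying_the_load_event; });
    }

    bool something_is_delaying_the_load_event { false };

private:
    bool m_created_for_fragment_parsing;
};
```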

The end.
2023-04-11 21:32:30 +02:00
Kenneth Myhra
cbefab21be LibWeb: Port fire_a_page_transition_event() to new FlyString 2023-04-09 17:27:27 +02:00