1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-10-24 00:42:07 +00:00
Commit graph

104 commits

Author SHA1 Message Date
Linus Groh
1422bd45eb LibWeb: Move Window from DOM directory & namespace to HTML
The Window object is part of the HTML spec. :^)
https://html.spec.whatwg.org/multipage/window-object.html
2022-03-08 00:30:30 +01:00
Andreas Kling
1061c863f8 LibWeb: Fix issue where double-quoted doctype system ID was not captured
We were storing double-quoted system ID's in the public ID field.

1% progression on ACID3. :^)
2022-03-02 12:30:15 +01:00
Luke Wilde
46c0d0f7ae LibWeb: Associate form elements with a form in parsing and dynamically
This makes it available for all form associated elements and not just
select and input elements. It also makes it more spec compliant,
especially around the form attribute.

The main thing missing is re-associating form elements with a form
attribute when the form attribute changes or an element with an ID
is inserted/removed or has its ID changed.
2022-03-01 23:19:41 +01:00
Andreas Kling
8b2499b112 LibWeb: Make document.write() work while document is parsing
This necessitated making HTMLParser ref-counted, and having it register
itself with Document when created. That makes it possible for scripts to
add new input at the current parser insertion point.

There is now a reference cycle between Document and HTMLParser. This
cycle is explicitly broken by calling Document::detach_parser() at the
end of HTMLParser::run().

This is a huge progression on ACID3, from 31% to 49%! :^)
2022-02-21 22:00:28 +01:00
Lorenz Steinert
db789813c9 LibWeb: Add basic support for dynamic markup insertion
This implements basic support for dynamic markup insertion, adding
 * Document::open()
 * Document::write(Vector<String> const&)
 * Document::writeln(Vector<String> const&)
 * Document::close()

The HTMLParser is modified to make it possible to create a
script-created parser which initially only contains a HTMLTokenizer
without any data. Aditionally the HTMLParser::run method gains an
overload which does not modify the Document and does not run
HTMLParser::the_end() so that we can reenter the parser at a later time.
Furthermore all FIXMEs that consern the insertion point are implemented
wich is defined in the HTMLTokenizer. Additionally the following
member-variables of the HTMLParser are now exposed by getter funcions:
 * m_tokenizer
 * m_aborted
 * m_script_nesting_level

The HTMLTokenizer is modified so that it contains an insertion
point which keeps track of where the next input from the Document::write
functions will be inserted. The insertion point is implemented as the
charakter offset into m_decoded_input and a boolean describing if the
insertion point is defined. Functions to update, check and {re}store the
insertion point are also added.
The function HTMLTokenizer::insert_eof is added to tell a script-created
parser that document::close was called and HTMLParser::the_end() should
be called.
Lastly an explicit default constructor is added to HTMLTokenizer to
create a empty HTMLTokenizer into which data can be inserted.
2022-02-21 18:26:43 +01:00
Adam Hodgen
b6eaefa87d LibWeb: Fix 'Comment end state' in HTML Tokenizer
Also, update the expected hash in the LibWeb TestHTMLTokenizer
regression test.

This is due to the "This comment has a few too many dashes." comment
token being updated.
2022-02-21 16:31:45 +01:00
Adam Hodgen
d73bb2633c LibWeb: Implement tokenization newline preprocessing
Newline normalization will replace \r and \r\n with \n.

The spec specifically states
> Before the tokenization stage, the input stream must be preprocessed
> by normalizing newlines.
wheras this is implemented the processing during the tokenization
itself.

This should still exhibit the same behaviour, while keeping the
tokenization logic in the same place.
2022-02-21 16:31:45 +01:00
Adam Hodgen
c6fcdd0f93 LibWeb: Fix off by one error in HTML Tokenizer
In 'NamedCharacterReference' we attempt to lookup the code point by a
identifier, eg apos; becomes '

This is done by passing the entire rest of the document to the
`HTML::code_points_from_entity` function.

However, before this change we didn't sent the final character which
meant if the document ended in a named character reference the lookup
would fail.
2022-02-21 16:31:45 +01:00
Luke Wilde
9845164f6a LibWeb: Handle markers when reconstructing active formatting elements
The entry we get from the active formatting elements list during the
Rewind step of "reconstruct the active formatting elements" can be a
marker. Previously we assumed it was not a marker, which can trigger
an assertion failure with certain malformed HTML.

If the entry in this step is a marker, the spec simply ignores it.
This is step 6 of the algorithm.

This also makes the index unsigned, as this algorithm is a no-op if
the list is empty.

Additionally, this also adds spec comments to this algorithm.

Fixes #12668.
2022-02-20 10:59:42 +01:00
Andreas Kling
25504f6a1b LibWeb: Use Vector::clear_with_capacity() in HTMLTokenizer
This avoids constantly reallocating the Vector<HTMLToken>.
2022-02-19 14:45:59 +01:00
Linus Groh
06948df393 LibWeb: Fail gracefully when reaching the unimplemented part of the AAA
Pages such as https://html5test.com are testing all sorts of weird,
incomplete, and wrong HTML but can be useful or at least interesting for
development - let's try to avoid crashing the process.
2022-02-15 23:24:34 +01:00
Linus Groh
892f6394b8 LibWeb: Implement state switch for "[CDATA[" in HTML parser 2022-02-15 23:24:34 +01:00
Linus Groh
3f7086f91a LibWeb: Add an optional pointer to an HTMLParser to the HTMLTokenizer
This is needed to access the 'adjusted current node' in the 'Markup
declaration open state'. We don't want to create a full parser for
something like syntax highlighting, so it's optional (null) by default.
2022-02-15 23:24:34 +01:00
Linus Groh
9130ecfd5e LibWeb: Remove unused HTMLParser function declaration
There is no implementation of this function:
HTMLParser::stack_of_open_elements_has_element_with_tag_name_in_scope
2022-02-15 23:24:34 +01:00
Linus Groh
f61fb08492 LibWeb: Add spec links to each HTML tokenizer state section
I didn't add full spec comments this time, but this is better than
nothing :^)
2022-02-15 23:24:34 +01:00
Andreas Kling
1347c5032b LibWeb: Add spec comments to the StackOfOpenElements class 2022-02-15 02:05:53 +01:00
Andreas Kling
5cdbea4ae0 LibWeb: Rename element_before() => element_immediately_above()
This matches the spec terminology around the "stack of open elements".
2022-02-15 02:05:53 +01:00
Andreas Kling
6fe333607d LibWeb: Add spec comments to find_appropriate_place_for_inserting_node() 2022-02-15 02:05:53 +01:00
Karol Kosek
c157c2148f LibWeb: Don't emit current token on EOF in HTML Tokenizer
Emitting tokens on EOF caused an infinite loop, freezing the app, which
could be a bit annoying when writing an HTML comment at the end of
the file in Text Editor. :^)
2022-02-14 12:50:44 +03:30
Karol Kosek
fb5e2670d6 LibWeb: Fix highlighting HTML comments
Commit b193351a99 caused the HTML comments to flash when changing
the text cursor. Also, when double-clicking on a comment, the selection
started from the beginning of the file instead.

The following message was displaying when `TOKENIZER_TRACE_DEBUG`
was enabled:

    (Tokenizer::nth_last_position) Invalid position requested: 4th-last
    of 4. Returning (0-0).

Changing the `nth_last_position` to 3 fixes this. I'm guessing that's
because the parser is at that moment on the second hyphen of the `<!--`
string, so it has to go back only by three characters.
2022-02-14 12:50:44 +03:30
MacDue
b193351a99 LibWeb: Fix off-by-one in HTMLTokenizer::restore_to()
The difference should be between m_utf8_iterator and the
the new position, if m_prev_utf8_iterator is used one fewer
source position is popped than required.

This issue was not apparent on most pages since restore_to
used for tokens such  <!doctype> that are normally
followed by a newline that resets the column to zero,
but it can be seen on pages with minified HTML.
2022-02-13 14:51:09 +00:00
Luke Wilde
f71f404e0c LibWeb: Introduce the Environment Settings Object
The environment settings object is effectively the context a piece of
script is running under, for example, it contains the origin,
responsible document, realm, global object and event loop for the
current context. This effectively replaces ScriptExecutionContext, but
it cannot be removed in this commit as EventTarget still depends on it.

https://html.spec.whatwg.org/multipage/webappapis.html#environment-settings-object
2022-02-08 17:47:44 +00:00
Sam Atkins
197759e30f LibWeb: Fix off-by-one error when highlighting unquoted HTML attributes
This fixes #11166
2021-12-10 21:27:13 +01:00
Sam Atkins
7196570f9b LibWeb: Cast unused smart-pointer return values to void 2021-12-05 15:31:03 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Timothy Flynn
e01dfaac9a LibWeb: Implement Attribute closer to the spec and with an IDL file
Note our Attribute class is what the spec refers to as just "Attr". The
main differences between the existing implementation and the spec are
just that the spec defines more fields.

Attributes can contain namespace URIs and prefixes. However, note that
these are not parsed in HTML documents unless the document content-type
is XML. So for now, these are initialized to null. Web pages are able to
set the namespace via JavaScript (setAttributeNS), so these fields may
be filled in when the corresponding APIs are implemented.

The main change to be aware of is that an attribute is a node. This has
implications on how attributes are stored in the Element class. Nodes
are non-copyable and non-movable because these constructors are deleted
by the EventTarget base class. This means attributes cannot be stored in
a Vector or HashMap as these containers assume copyability / movability.
So for now, the Vector holding attributes is changed to hold RefPtrs to
attributes instead. This might change when attribute storage is
implemented according to the spec (by way of NamedNodeMap).
2021-10-17 13:51:10 +01:00
Brian Gianforcaro
7defb893a9 LibWeb: Remove dead "outer loop" code in adoption agency algorithm 2021-10-10 13:48:04 +02:00
Luke Wilde
c0a64f7317 LibWeb: Check for HTML integration points in the tree constructor
This particularly implements these two points:
- "If the adjusted current node is an HTML integration point and the
   token is a start tag"
- "If the adjusted current node is an HTML integration point and the
   token is a character token"

This also adds spec comments to the tree constructor.
2021-10-01 12:26:41 +02:00
Andreas Kling
831fdcaabc LibWeb: Add the PageTransitionEvent interface and fire "pageshow" events
We now fire "pageshow" events at the appropriate time during document
loading (done by the parser.)

Note that there are no corresponding "pagehide" events yet.
2021-09-26 12:47:51 +02:00
Andreas Kling
508edcd217 LibWeb: Add a "page showing" flag to documents
This will be used to determine whether "pageshow" and "pagehide" events
are appropriate. We won't actually make use of it until we implement
more of history traversal and document unloading.
2021-09-26 12:47:51 +02:00
Andreas Kling
a2f77a2e39 LibWeb: Implement "update the current document readiness" from spec
The only difference from what we were already doing is that setting the
same ready state twice no longer fires a "readystatechange" event.
I don't think that could happen in practice though.
2021-09-26 12:47:51 +02:00
Andreas Kling
8496024756 LibWeb: Store HTML document ready state as an enum 2021-09-26 12:47:51 +02:00
Andreas Kling
dbba0a520f LibWeb: Allow HTML parser to delay delivery of the document "load" event
We will now spin in "the end" until there are no more "things delaying
the load event". Of course, nothing actually uses this yet, and there
are a lot of things that need to.
2021-09-26 02:00:00 +02:00
Andreas Kling
e7af6af626 LibWeb: Implement more of HTMLParser::the_end() and bring closer to spec 2021-09-26 00:52:19 +02:00
Andreas Kling
e452550fda LibWeb: Split out "The end" from the HTML parsing spec to a function
Also add a spec link and some comments.
2021-09-26 00:04:33 +02:00
Andreas Kling
f67648f872 LibWeb: Rename HTMLDocumentParser => HTMLParser 2021-09-25 23:36:43 +02:00
Ben Wiederhake
32e98d0924 Libraries: Use AK::Variant default initialization where appropriate 2021-09-21 04:22:52 +04:30
Andreas Kling
c34da16089 LibWeb: Make <script src> loads partially async (by following the spec)
Instead of firing up a network request and synchronously blocking for it
to finish via a nested event loop, we now start an asynchronous request
when encountering <script src>.

Once the script load finishes (or fails), it gets executed at one of the
synchronization points in the HTML parser.

This solves some long-standing issues with random unexpected events
getting dispatched in the middle of parsing.
2021-09-20 17:22:25 +02:00
Andreas Kling
e11ae33c66 LibWeb: Pop entire stack of open elements at the end of parsing 2021-09-20 17:22:25 +02:00
Andreas Kling
cb895edad4 LibWeb: Move Attribute into the DOM namespace 2021-09-16 01:39:47 +02:00
Andreas Kling
70398645f3 LibWeb: Improvements to error handling in HTML foreign content parsing
Follow the spec more closely when encountering an invalid start or end
tag during foreign content parsing.
2021-09-14 23:49:45 +02:00
Luke Wilde
f62477c093 LibWeb: Implement HTML fragment serialisation and use it in innerHTML
The previous implementation was about a half implementation and was
tied to Element::innerHTML. This separates it and puts it into
HTMLDocumentParser, as this is in the parsing section of the spec.

This provides a near finished HTML fragment serialisation algorithm,
bar namespaces in attributes and the `is` value.
2021-09-14 02:09:18 +02:00
Idan Horowitz
4629f2e4ad LibWeb: Add the Web::URL namespace and move URLEncoder to it
This namespace will be used for all interfaces defined in the URL
specification, like URL and URLSearchParams.

This has the unfortunate side-effect of requiring us to use the fully
qualified AK::URL name whenever we want to refer to the AK class, so
this commit also fixes all such references.
2021-09-13 01:43:10 +02:00
Andreas Kling
882c7b1295 LibWeb: Spin the event loop in HTML parser until scripts can run
Call HTML::EventLoop::spin_until() from the HTML parser when deciding
whether we can run a script yet.

Note that spin_until() actually doesn't do any work yet.
2021-09-09 02:30:54 +02:00
TheFightingCatfish
08359ba578 LibWeb: Fix regression of "contenteditable" attribute 2021-07-31 17:39:28 +02:00
ovf
898b8ffcb6 LibWeb: Avoid assertion failure on parsing numeric character references 2021-07-28 18:32:22 +02:00
ovf
13c7d55320 LibWeb: Fix parsing of character references in attribute values 2021-07-27 00:03:43 +02:00
Max Wipfli
ccae0cae45 LibWeb: Rename HTMLToken::doctype_data() => ensure_doctype_data()
This renames the accessor to better reflect what it does, as this will
allocate a DoctypeData struct if there is none.
2021-07-17 16:24:57 +04:30
Max Wipfli
519a1cdc22 LibWeb: Change HTMLToken storage architecture
This completely changes how HTMLTokens store their data. Previously,
space was allocated for all token types separately. Now, the HTMLToken's
data is stored in just a String, two booleans and a Variant.

This change reduces sizeof(HTMLToken) from 68 to 32. Also, this reduces
raw tokenization time by around 20 to 50 percent, depending on the page.
Full document parsing time (with HTMLDocumentParser, on a local HTML
page without any dependency files) is reduced by between 4 and 20
percent, depending on the page.

Since tokenizing HTML pages can easily generated 50'000 tokens and more,
the storage has been designed in a way that avoids heap allocations
where possible, while trying to reduce the size of the tokens. The only
tokens which need to allocate on the heap are thus DOCTYPE tokens (max.
1 per document), and tag tokens (but only if they have attributes). This
way, only around 5 percent of all tokens generated need to allocate on
the heap (except for StringImpl allocations).
2021-07-17 16:24:57 +04:30
Max Wipfli
8a4c44db8c LibWeb: Make HTMLTokens non-copyable 2021-07-17 16:24:57 +04:30