mirror of https://github.com/RGBCube/serenity (synced 2025-05-31 15:48:12 +00:00)
LibWeb: Change HTMLToken storage architecture
This completely changes how HTMLTokens store their data. Previously, space was allocated for all token types separately. Now, an HTMLToken's data is stored in just a String, two booleans and a Variant.

This change reduces sizeof(HTMLToken) from 68 to 32. It also reduces raw tokenization time by around 20 to 50 percent, depending on the page. Full document parsing time (with HTMLDocumentParser, on a local HTML page without any dependency files) is reduced by between 4 and 20 percent, depending on the page.

Since tokenizing an HTML page can easily generate 50'000 tokens or more, the storage has been designed to avoid heap allocations where possible while also reducing the size of the tokens. The only tokens that need to allocate on the heap are thus DOCTYPE tokens (max. 1 per document) and tag tokens (but only if they have attributes). This way, only around 5 percent of all generated tokens need to allocate on the heap (except for StringImpl allocations).
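The layout described above can be illustrated with a minimal sketch in standard C++. This is not the LibWeb code: it assumes std::string and std::variant as stand-ins for AK's String and Variant, and every name in it (Token, Attribute, DoctypeData, the two flags) is hypothetical. The point it tries to show is that a character token keeps its code point inside the variant, while only DOCTYPE tokens and tags with attributes hold a heap pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins; none of these names come from LibWeb.
struct Attribute {
    std::string name;
    std::string value;
};

struct DoctypeData {
    std::string name;
    bool force_quirks { false };
};

class Token {
    using AttributeList = std::unique_ptr<std::vector<Attribute>>;
    using Doctype = std::unique_ptr<DoctypeData>;

public:
    enum class Type : std::uint8_t { Invalid, Doctype, StartTag, EndTag, Comment, Character };

    static Token make_character(char32_t code_point)
    {
        Token token;
        token.m_type = Type::Character;
        token.m_data = code_point; // lives inside the variant, no heap allocation
        return token;
    }

    static Token make_start_tag(std::string name)
    {
        Token token;
        token.m_type = Type::StartTag;
        token.m_string = std::move(name); // the single string holds the tag name
        return token;
    }

    static Token make_doctype(std::string name)
    {
        Token token;
        token.m_type = Type::Doctype;
        auto doctype = std::make_unique<DoctypeData>(); // heap allocation, at most once per document
        doctype->name = std::move(name);
        token.m_data = std::move(doctype);
        return token;
    }

    void add_attribute(Attribute attribute)
    {
        assert(m_type == Type::StartTag || m_type == Type::EndTag);
        // Allocate the attribute list lazily, so tags without attributes stay heap-free.
        if (!std::holds_alternative<AttributeList>(m_data))
            m_data = std::make_unique<std::vector<Attribute>>();
        std::get<AttributeList>(m_data)->push_back(std::move(attribute));
    }

    bool is_character() const { return m_type == Type::Character; }

    char32_t code_point() const
    {
        assert(is_character());
        return std::get<char32_t>(m_data);
    }

    std::size_t attribute_count() const
    {
        auto const* list = std::get_if<AttributeList>(&m_data);
        return list ? (*list)->size() : 0;
    }

    // True if this token owns a separate heap object through the variant.
    bool has_heap_data() const
    {
        return std::holds_alternative<Doctype>(m_data) || std::holds_alternative<AttributeList>(m_data);
    }

    // The two flags are assumptions; the commit message only says "two booleans".
    void set_self_closing(bool value) { m_self_closing = value; }
    bool is_self_closing() const { return m_self_closing; }
    void acknowledge_self_closing() { m_self_closing_acknowledged = true; }
    bool is_self_closing_acknowledged() const { return m_self_closing_acknowledged; }

private:
    Type m_type { Type::Invalid };
    bool m_self_closing { false };
    bool m_self_closing_acknowledged { false };
    std::string m_string;
    std::variant<std::monostate, char32_t, Doctype, AttributeList> m_data;
};
```

In this sketch, Token::make_character(U'a') and Token::make_start_tag("div") never touch the variant's pointer alternatives; only add_attribute and make_doctype do. The sizes quoted in the commit message apply to the real HTMLToken, not to this stand-in.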
parent 8a4c44db8c
commit 519a1cdc22
2 changed files with 93 additions and 48 deletions
@@ -52,9 +52,15 @@ String HTMLToken::to_string() const
         builder.append("} }");
     }
 
-    if (type() == HTMLToken::Type::Comment || type() == HTMLToken::Type::Character) {
+    if (is_comment()) {
         builder.append(" { data: '");
-        builder.append(m_comment_or_character.data);
+        builder.append(comment());
         builder.append("' }");
     }
 
+    if (is_character()) {
+        builder.append(" { data: '");
+        builder.append_code_point(code_point());
+        builder.append("' }");
+    }
+
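The hunk above splits one branch into two because comment text and character data no longer share a member: a comment is read through comment(), while a character is a single code point that has to be encoded when appended. The following is a rough standalone sketch of that pattern in standard C++, not the LibWeb implementation. MiniToken, describe and the free function append_code_point are hypothetical names, and std::string stands in for AK's StringBuilder.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical helper: appends one Unicode code point to `out` as UTF-8.
inline void append_code_point(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical token with type-checked accessors, loosely mirroring the diff.
class MiniToken {
public:
    enum class Type { Comment, Character };

    static MiniToken make_comment(std::string text)
    {
        MiniToken token(Type::Comment);
        token.m_string = std::move(text);
        return token;
    }

    static MiniToken make_character(char32_t code_point)
    {
        MiniToken token(Type::Character);
        token.m_data = code_point;
        return token;
    }

    bool is_comment() const { return m_type == Type::Comment; }
    bool is_character() const { return m_type == Type::Character; }

    // Accessors assert the token type, so a wrong access fails loudly in debug builds.
    std::string const& comment() const
    {
        assert(is_comment());
        return m_string;
    }

    char32_t code_point() const
    {
        assert(is_character());
        return std::get<char32_t>(m_data);
    }

private:
    explicit MiniToken(Type type) : m_type(type) { }

    Type m_type;
    std::string m_string;
    std::variant<std::monostate, char32_t> m_data;
};

// Same shape as the two branches in the hunk: one per token kind.
inline std::string describe(MiniToken const& token)
{
    std::string out;
    if (token.is_comment()) {
        out += " { data: '";
        out += token.comment();
        out += "' }";
    }
    if (token.is_character()) {
        out += " { data: '";
        append_code_point(out, token.code_point());
        out += "' }";
    }
    return out;
}
```

Keeping the branches separate means each one only calls the accessor that is valid for its token type, which is what lets the accessors assert instead of silently returning data of the wrong kind.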