LibJS: Share "parsed identifiers" between copied JS::Lexer instances

When we save/load state in the parser, we preserve the lexer state by simply making a copy of it. This was made extremely heavy by the lexer keeping a cache of all parsed identifiers.

It keeps the cache to ensure that StringViews into parsed Unicode escape sequences don't become dangling views when the Token goes out of scope.

This patch solves the problem by replacing the Vector<FlyString> which was used to cache the identifiers with a ref-counted HashTable<FlyString> instead.

Since the purpose of the cache is just to keep FlyStrings alive, it's fine for all Lexer instances to share the cache. And as a bonus, using a HashTable instead of a Vector replaces the O(n) accesses with O(1) ones.

This makes a 1.9 MiB JavaScript file parse in 0.6s instead of 24s. :^)
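
For illustration only, the sharing idea can be sketched in standard C++. This is not the LibJS code: `SketchLexer`, `remember()`, `cached_count()`, `cache_owner_count()` and `demo()` are hypothetical names, and `std::shared_ptr` with `std::unordered_set<std::string>` stands in for AK's `RefPtr` and `HashTable<FlyString>`. The point is that copying the lexer copies one pointer rather than every cached identifier, and that both copies keep the same strings alive.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>

// Assumption: a standard-library analogue of the ref-counted cache.
// In LibJS this role is played by RefCounted<ParsedIdentifiers>.
struct ParsedIdentifiers {
    std::unordered_set<std::string> identifiers;
};

// Hypothetical stand-in for JS::Lexer, reduced to the identifier cache.
class SketchLexer {
public:
    SketchLexer()
        : m_parsed_identifiers(std::make_shared<ParsedIdentifiers>())
    {
    }

    // Store an identifier so views into it stay valid for the whole parse.
    // Insertion and lookup are O(1) on average, and duplicates are ignored.
    const std::string& remember(const std::string& identifier)
    {
        return *m_parsed_identifiers->identifiers.insert(identifier).first;
    }

    std::size_t cached_count() const { return m_parsed_identifiers->identifiers.size(); }
    long cache_owner_count() const { return m_parsed_identifiers.use_count(); }

private:
    // The implicit copy constructor copies only this pointer, so every
    // copy of the lexer shares a single cache.
    std::shared_ptr<ParsedIdentifiers> m_parsed_identifiers;
};

// Usage: saving parser state copies the lexer, but not the cache.
inline void demo()
{
    SketchLexer lexer;
    lexer.remember("foo");

    SketchLexer saved = lexer; // cheap: one pointer copy, one refcount bump
    lexer.remember("bar");     // added through the original lexer...

    assert(saved.cached_count() == 2);      // ...and visible through the copy
    assert(saved.cache_owner_count() == 2); // two lexers, one shared cache
}
```

Because the cache only exists to keep strings alive, it does not matter that a restored lexer still sees identifiers added after the state was saved. With the previous per-lexer Vector, the same copy would have duplicated every cached entry.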
2025-07-27 23:07:35 +00:00 · 2021-09-10 23:18:00 +02:00 · d7578ddebb
commit d7578ddebb
parent 7684e4f726
2 changed files with 9 additions and 5 deletions
--- a/Userland/Libraries/LibJS/Lexer.h
+++ b/Userland/Libraries/LibJS/Lexer.h
@@ -82,9 +82,13 @@ private:
    static HashMap<String, TokenType> s_two_char_tokens;
    static HashMap<char, TokenType> s_single_char_tokens;

-    // Resolved identifiers must be kept alive for the duration of the parsing stage, otherwise
-    // the only references to these strings are deleted by the Token destructor.
-    Vector<FlyString> m_parsed_identifiers;
+    struct ParsedIdentifiers : public RefCounted<ParsedIdentifiers> {
+        // Resolved identifiers must be kept alive for the duration of the parsing stage, otherwise
+        // the only references to these strings are deleted by the Token destructor.
+        HashTable<FlyString> identifiers;
+    };
+
+    RefPtr<ParsedIdentifiers> m_parsed_identifiers;
 };

 }