1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-07-25 00:27:35 +00:00

LibRegex: Generate a search tree when patterns would benefit from it

This takes the previous alternation optimisation and applies it to all
the alternation blocks instead of just the few instructions at the
start.
By generating a trie of instructions, all logically equivalent
instructions will be consolidated into a single node, allowing the
engine to avoid checking the same thing multiple times.
For instance, given the pattern /abc|ac|ab/, this optimisation would
generate the following tree:
    - a
    | - b
    | | - c
    | | | - <accept>
    | | - <accept>
    | - c
    | | - <accept>
which will attempt to match 'a' or 'b' only once, and would also limit
the number of backtrackings performed in case alternatives fails to
match.

This optimisation is currently gated behind a simple cost model that
estimates the number of instructions generated, which is pessimistic for
small patterns, though the change in performance in such patterns is not
particularly large.
This commit is contained in:
Ali Mohammad Pur 2023-07-28 21:02:34 +03:30 committed by Andreas Kling
parent 18f4b6c670
commit 4e69eb89e8
4 changed files with 347 additions and 152 deletions

View file

@ -1047,6 +1047,8 @@ TEST_CASE(optimizer_alternation)
Array tests {
// Pattern, Subject, Expected length
Tuple { "a|"sv, "a"sv, 1u },
Tuple { "a|a|a|a|a|a|a|a|a|b"sv, "a"sv, 1u },
Tuple { "ab|ac|ad|bc"sv, "bc"sv, 2u },
};
for (auto& test : tests) {