1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-15 10:04:59 +00:00
Commit graph

10 commits

Author SHA1 Message Date
Hendiadyoin1
b674de6957 LibRegex: Add some implied auto qualifiers 2021-12-21 18:17:28 -08:00
Ali Mohammad Pur
b8f03bb072 LibRegex: Make append_alternation() significantly faster
...by flattening the underlying bytecode chunks first.
Also avoid calling DisjointChunks::size() inside a loop.
This is a very significant improvement in performance, making the
compilation of a large regex with lots of alternatives take only ~100ms
instead of many minutes (I ran out of patience waiting for it) :^)
2021-12-21 22:10:07 +01:00
Ali Mohammad Pur
d2e51fafa9 LibRegex: Merge alternations based on blocks and not instructions
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes #11247.

Also adds the minimal test case(s) from that issue.
2021-12-15 19:36:45 +03:30
Ali Mohammad Pur
387df06385 LibRegex: Avoid rewriting a+ as a* as part of atomic rewriting
The initial `ForkStay` is only needed if the looping block has a
following block, if there's no following block or the following block
does not attempt to match anything, we should not insert the ForkStay,
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes #10952.
2021-11-18 09:09:22 +01:00
Ali Mohammad Pur
ac856cb965 LibRegex: Don't ignore empty alternatives in append_alternation()
Doing so would cause patterns like `(a|)` to not match the empty string.
2021-10-29 15:57:59 +02:00
Ali Mohammad Pur
8f722302d9 LibRegex: Use a match table for character classes
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
2021-10-03 19:16:36 +02:00
Andreas Kling
2758d99bbc LibRegex: Flatten bytecode before performing optimizations
This avoids doing DisjointChunks traversal for every bytecode access,
significantly reducing startup time for large regular expressions.
2021-09-29 18:45:26 +02:00
Ali Mohammad Pur
741886a4c4 LibRegex: Make the optimizer understand references and capture groups
Otherwise the fork in patterns like `(1+)\1` would be (incorrectly)
optimized away.
2021-09-15 15:52:28 +04:30
Ali Mohammad Pur
bf0315ff8f LibRegex: Avoid excessive Vector copy when compiling regexps
Previously we would've copied the bytecode instead of moving the chunks
around, use the fancy new DisjointChunks<T> abstraction to make that
happen automagically.
This decreases vector copies and uses of memmove() by nearly 10x :^)
2021-09-14 21:33:15 +04:30
Ali Mohammad Pur
246ab432ff LibRegex: Add a basic optimization pass
This currently tries to convert forking loops to atomic groups, and
unify the left side of alternations.
2021-09-13 14:38:53 +04:30