1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-10-24 17:22:31 +00:00
Commit graph

23 commits

Author SHA1 Message Date
Ali Mohammad Pur
7f530c0753 LibRegex: Bail out of atomic rewrite if a block doesn't contain compares
If a block jumps before performing a compare, we'd need to recursively
find the first of the jumped-to block. While this is doable, it's not
really worth spending the time as most such cases won't actually qualify
for atomic loop rewrite anyway.
Fixes an invalid rewrite when `.+` is followed by an alternation, e.g.
/.+(a|b|c)/.
2023-02-15 10:14:26 +01:00
Ali Mohammad Pur
af441bb939 LibRegex: Consider the inverse=true case when finding pattern overlap
Previously we were only checking for overlap when the range wasn't in
inverse mode, which made us miss things like /[^x]x/; this patch makes
it so we don't miss that.
2023-02-15 10:14:26 +01:00
Ben Wiederhake
8a331d4fa0 Everywhere: Move AK/Debug.h include to using files or remove 2023-01-02 20:27:20 -05:00
Ali Mohammad Pur
598dc74a76 LibRegex: Partially implement the ECMAScript unicodeSets proposal
This skips the new string unicode properties additions, along with \q{}.
2022-07-20 21:25:59 +01:00
Ali Mohammad Pur
fe46b2c141 LibRegex: Correctly track current inversion state in the optimizer
This is currently not important as we do not nest TemporaryInverse.
2022-07-10 14:26:03 +02:00
Ali Mohammad Pur
9c5febe800 LibRegex: Flush compare tables before entering a permanent inverse state 2022-07-10 14:26:03 +02:00
Ali Mohammad Pur
7d01ee63d6 LibRegex: Use proper CharRange constructor instead of bit_casting
Otherwise the range order would be inverted.
2022-07-05 07:19:13 +02:00
Ali Mohammad Pur
6e655b7f89 LibRegex: Fully interpret the Compare Op when looking for overlaps
We had a really naive and simplistic implementation, which lead to
various issues where the optimiser incorrectly rewrote the regex to use
atomic groups; this commit fixes that.
2022-07-04 23:09:53 +02:00
Ali Mohammad Pur
97a333608e LibRegex: Make codegen+optimisation for alternatives much faster
Just a little thinking outside the box, and we can now parse and
optimise a million copies of "a|" chained together in just a second :^)
2022-02-20 11:53:59 +01:00
Ali Mohammad Pur
3b0943d24c LibRegex: Correct the alternative matching order when one is empty
Previously we were compiling `/a|/` into what effectively would be
`/|a`, which is clearly incorrect.
2022-02-14 11:30:50 +01:00
Ali Mohammad Pur
6a4c8a66ae LibRegex: Only skip full instructions when optimizing alternations
It makes no sense to skip half of an instruction, so make sure to skip
only full instructions!
2022-02-09 21:02:24 +00:00
Ali Mohammad Pur
cd83325c7c LibRegex: Preserve capture groups and matches across ForkReplace
This makes the (flawed) ForkStay inserted as a loop header unnecessary,
and finally fixes LibRegex rewriting weird loops in weird ways.
2022-01-22 00:35:49 +00:00
Ali Mohammad Pur
bfe8f312f3 LibRegex: Correct jump offset to the start of the loop block
Previously we were jumping to the new end of the previous block (created
by the newly inserted ForkStay), correct the offset to jump to the
correct block as shown in the comments.
Fixes #12033.
2022-01-21 18:14:08 +03:30
Hendiadyoin1
b674de6957 LibRegex: Add some implied auto qualifiers 2021-12-21 18:17:28 -08:00
Ali Mohammad Pur
b8f03bb072 LibRegex: Make append_alternation() significantly faster
...by flattening the underlying bytecode chunks first.
Also avoid calling DisjointChunks::size() inside a loop.
This is a very significant improvement in performance, making the
compilation of a large regex with lots of alternatives take only ~100ms
instead of many minutes (I ran out of patience waiting for it) :^)
2021-12-21 22:10:07 +01:00
Ali Mohammad Pur
d2e51fafa9 LibRegex: Merge alternations based on blocks and not instructions
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes #11247.

Also adds the minimal test case(s) from that issue.
2021-12-15 19:36:45 +03:30
Ali Mohammad Pur
387df06385 LibRegex: Avoid rewriting a+ as a* as part of atomic rewriting
The initial `ForkStay` is only needed if the looping block has a
following block, if there's no following block or the following block
does not attempt to match anything, we should not insert the ForkStay,
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes #10952.
2021-11-18 09:09:22 +01:00
Ali Mohammad Pur
ac856cb965 LibRegex: Don't ignore empty alternatives in append_alternation()
Doing so would cause patterns like `(a|)` to not match the empty string.
2021-10-29 15:57:59 +02:00
Ali Mohammad Pur
8f722302d9 LibRegex: Use a match table for character classes
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
2021-10-03 19:16:36 +02:00
Andreas Kling
2758d99bbc LibRegex: Flatten bytecode before performing optimizations
This avoids doing DisjointChunks traversal for every bytecode access,
significantly reducing startup time for large regular expressions.
2021-09-29 18:45:26 +02:00
Ali Mohammad Pur
741886a4c4 LibRegex: Make the optimizer understand references and capture groups
Otherwise the fork in patterns like `(1+)\1` would be (incorrectly)
optimized away.
2021-09-15 15:52:28 +04:30
Ali Mohammad Pur
bf0315ff8f LibRegex: Avoid excessive Vector copy when compiling regexps
Previously we would've copied the bytecode instead of moving the chunks
around, use the fancy new DisjointChunks<T> abstraction to make that
happen automagically.
This decreases vector copies and uses of memmove() by nearly 10x :^)
2021-09-14 21:33:15 +04:30
Ali Mohammad Pur
246ab432ff LibRegex: Add a basic optimization pass
This currently tries to convert forking loops to atomic groups, and
unify the left side of alternations.
2021-09-13 14:38:53 +04:30