serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-20 05:22:27 +00:00

Author	SHA1	Message	Date
Ali Mohammad Pur	b8f03bb072	LibRegex: Make append_alternation() significantly faster ...by flattening the underlying bytecode chunks first. Also avoid calling DisjointChunks::size() inside a loop. This is a very significant improvement in performance, making the compilation of a large regex with lots of alternatives take only ~100ms instead of many minutes (I ran out of patience waiting for it) :^)	2021-12-21 22:10:07 +01:00
Ali Mohammad Pur	d2e51fafa9	LibRegex: Merge alternations based on blocks and not instructions The instructions can have dependencies (e.g. Repeat), so only unify equal blocks instead of consecutive instructions. Fixes #11247. Also adds the minimal test case(s) from that issue.	2021-12-15 19:36:45 +03:30
Ali Mohammad Pur	387df06385	LibRegex: Avoid rewriting `a+` as `a` as part of atomic rewriting The initial `ForkStay` is only needed if the looping block has a following block. If there's no following block, or the following block does not attempt to match anything, we should not insert the `ForkStay`; otherwise we would be rewriting `a+` as `a` by allowing the 'end' to be executed. Fixes #10952.	2021-11-18 09:09:22 +01:00
Ali Mohammad Pur	ac856cb965	LibRegex: Don't ignore empty alternatives in append_alternation() Doing so would cause patterns like `(a|)` to not match the empty string.	2021-10-29 15:57:59 +02:00
Ali Mohammad Pur	8f722302d9	LibRegex: Use a match table for character classes Generate a sorted, compressed series of ranges in a match table for character classes, and use a binary search to find the matches. This is about a 3-4x speedup for character class match performance. :^)	2021-10-03 19:16:36 +02:00
Andreas Kling	2758d99bbc	LibRegex: Flatten bytecode before performing optimizations This avoids doing DisjointChunks traversal for every bytecode access, significantly reducing startup time for large regular expressions.	2021-09-29 18:45:26 +02:00
Ali Mohammad Pur	741886a4c4	LibRegex: Make the optimizer understand references and capture groups Otherwise the fork in patterns like `(1+)\1` would be (incorrectly) optimized away.	2021-09-15 15:52:28 +04:30
Ali Mohammad Pur	bf0315ff8f	LibRegex: Avoid excessive Vector copy when compiling regexps Previously we would've copied the bytecode instead of moving the chunks around, use the fancy new DisjointChunks<T> abstraction to make that happen automagically. This decreases vector copies and uses of memmove() by nearly 10x :^)	2021-09-14 21:33:15 +04:30
Ali Mohammad Pur	246ab432ff	LibRegex: Add a basic optimization pass This currently tries to convert forking loops to atomic groups, and unify the left side of alternations.	2021-09-13 14:38:53 +04:30
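
Several of the fixes above describe matching behaviour that the optimizer must preserve: `(a|)` has to match the empty string, `a+` has to keep matching more than one `a`, and the repetition in `(1+)\1` has to be able to give characters back so the backreference can match. The sketch below shows those expected results using the standard library's `std::regex` (ECMAScript syntax). It is an assumption-laden illustration only: it does not use LibRegex, and `full_match` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when `pattern` (ECMAScript syntax)
// matches the whole of `input`. This uses std::regex, not LibRegex, and only
// demonstrates the semantics the commit messages refer to.
bool full_match(const std::string& pattern, const std::string& input)
{
    return std::regex_match(input, std::regex(pattern));
}
```

With this helper, `full_match("(a|)", "")` and `full_match("a+", "aaa")` are both true. `full_match("(1+)\\1", "11")` is true only because `1+` backtracks from two characters to one, which is why a pass that turns the loop into an atomic group would break this pattern.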

9 commits
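
The match-table commit (8f722302d9) describes generating a sorted, compressed series of ranges for a character class and binary-searching it. A minimal sketch of that general idea follows. The names `CharRange`, `build_match_table`, and `matches` are hypothetical and are not taken from LibRegex, whose actual bytecode layout is not shown in this log.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration; not LibRegex's actual data structure.
// Both bounds are inclusive, and callers are assumed to pass from <= to.
struct CharRange {
    uint32_t from;
    uint32_t to;
};

// Sort the ranges by start, then merge overlapping or directly adjacent ones
// so the resulting table is sorted, disjoint, and as short as possible.
std::vector<CharRange> build_match_table(std::vector<CharRange> ranges)
{
    std::sort(ranges.begin(), ranges.end(),
        [](const CharRange& a, const CharRange& b) { return a.from < b.from; });

    std::vector<CharRange> table;
    for (const CharRange& range : ranges) {
        // The second test only runs when range.from > table.back().to,
        // so range.from - 1 cannot underflow.
        if (!table.empty()
            && (range.from <= table.back().to || range.from - 1 == table.back().to)) {
            table.back().to = std::max(table.back().to, range.to);
        } else {
            table.push_back(range);
        }
    }
    return table;
}

// Binary search: find the first range whose end is >= code_point,
// then check that the range also starts at or before code_point.
bool matches(const std::vector<CharRange>& table, uint32_t code_point)
{
    auto it = std::lower_bound(table.begin(), table.end(), code_point,
        [](const CharRange& range, uint32_t value) { return range.to < value; });
    return it != table.end() && it->from <= code_point;
}
```

For a class such as `[a-fd-z0-9:]`, `build_match_table({{'a','f'},{'d','z'},{'0','9'},{':',':'}})` collapses the four ranges into two, and each lookup is then a logarithmic search over the table instead of a linear scan over every range.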