LibRegex: Implement and use a REPEAT operation for bytecode repetition

Currently, when we need to repeat an instruction N times, we simply add that instruction N times in a for-loop. This doesn't scale well with extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1. Instead, add a new REPEAT bytecode operation to defer this loop from the parser to the runtime executor. This allows the parser to complete sans any loops (for this instruction), and allows the executor to bail early if the repeated bytecode fails. Note: The templated ByteCode methods are to allow the Posix parsers to continue using u32 because they are limited to N = 2^20.
2025-10-15 16:32:06 +00:00 · 2021-08-12 11:02:46 -04:00 · 2021-08-12 11:02:46 -04:00 · 9509433e25
commit 9509433e25
parent a0b72f5ad3
7 changed files with 103 additions and 16 deletions
--- a/Userland/Libraries/LibRegex/RegexByteCode.cpp
+++ b/Userland/Libraries/LibRegex/RegexByteCode.cpp
@ -175,6 +175,9 @@ void ByteCode::ensure_opcodes_initialized()
        case OpCodeId::SaveRightNamedCaptureGroup:
            s_opcodes[i] = make<OpCode_SaveRightNamedCaptureGroup>();
            break;
+        case OpCodeId::Repeat:
+            s_opcodes[i] = make<OpCode_Repeat>();
+            break;
        }
    }
    s_opcodes_initialized = true;
@ -850,4 +853,23 @@ Vector<String> const OpCode_Compare::variable_arguments_to_string(Optional<Match
    }
    return result;
 }
+
+ALWAYS_INLINE ExecutionResult OpCode_Repeat::execute(MatchInput const&, MatchState& state) const
+{
+    VERIFY(count() > 0);
+
+    if (id() >= state.repetition_marks.size())
+        state.repetition_marks.resize(id() + 1);
+    auto& repetition_mark = state.repetition_marks.at(id());
+
+    if (repetition_mark == count() - 1) {
+        repetition_mark = 0;
+    } else {
+        state.instruction_position -= offset() + size();
+        ++repetition_mark;
+    }
+
+    return ExecutionResult::Continue;
+}
+
 }