1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-18 10:05:07 +00:00
Commit graph

19 commits

Author SHA1 Message Date
Timothy Flynn
9509433e25 LibRegex: Implement and use a REPEAT operation for bytecode repetition
Currently, when we need to repeat an instruction N times, we simply add
that instruction N times in a for-loop. This doesn't scale well with
extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1.

Instead, add a new REPEAT bytecode operation to defer this loop from the
parser to the runtime executor. This allows the parser to complete sans
any loops (for this instruction), and allows the executor to bail early
if the repeated bytecode fails.

Note: The templated ByteCode methods are to allow the Posix parsers to
continue using u32 because they are limited to N = 2^20.
2021-08-15 11:43:45 +01:00
Timothy Flynn
f1ce998d73 LibRegex+LibJS: Combine named and unnamed capture groups in MatchState
Combining these into one list helps reduce the size of MatchState, and
as a result, reduces the amount of memory consumed during execution of
very large regex matches.

Doing this also allows us to remove a few regex byte code instructions:
ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference.
Named groups now behave the same as unnamed groups for these operations.
Note that SaveRightNamedCaptureGroup still exists to cache the matched
group name.

This also removes the recursion level from the MatchState, as it can
exist as a local variable in Matcher::execute instead.
2021-08-15 11:43:45 +01:00
Timothy Flynn
2e4b6fd1ac LibRegex: Ensure escaped code points are exactly 4 digits in length 2021-08-15 11:43:45 +01:00
Ali Mohammad Pur
15f95220ae AK+Everywhere: Delete Variant's default constructor
This was exposed to the user by mistake, and even accumulated a bunch of
users that didn't blow up out of sheer luck.
2021-08-13 17:31:39 +04:30
Timothy Flynn
df14d11a11 LibRegex: Disallow invalid interval qualifiers in Unicode mode
Fixes all remaining 'built-ins/RegExp/property-escapes' test262 tests.
2021-08-11 13:11:01 +02:00
Timothy Flynn
484ccfadc3 LibRegex: Support property escapes of Unicode script extensions 2021-08-04 13:50:32 +01:00
Timothy Flynn
06088df729 LibRegex: Support property escapes of the Unicode script property
Note that unlike binary properties and general categories, scripts must
be specified in the non-binary (Script=Value) form.
2021-08-04 13:50:32 +01:00
Timothy Flynn
1e10d6d7ce LibRegex: Support property escapes of Unicode General Categories
This changes LibRegex to parse the property escape as a Variant of
Unicode Property & General Category values. A byte code instruction is
added to perform matching based on General Category values.
2021-08-02 21:02:09 +04:30
Timothy Flynn
d485cf29d7 LibRegex+LibUnicode: Begin implementing Unicode property escapes
This supports some binary property matching. It does not support any
properties not yet parsed by LibUnicode, nor does it support value
matching (such as Script_Extensions=Latin).
2021-07-30 21:26:31 +01:00
Ali Mohammad Pur
36bfc912fc LibRegex: Switch to east-const style 2021-07-23 21:19:21 +04:30
Ali Mohammad Pur
c8b2199251 LibRegex: Clear previous capture group contents in ECMA262 mode
ECMA262 requires that the capture groups only contain the values from
the last iteration, e.g. `((c)(a)?(b))` should _not_ contain 'a' in the
second capture group when matching "cabcb".
2021-07-23 21:19:21 +04:30
Ali Mohammad Pur
11a8476cf4 LibRegex: Use the parser state capture group count in BRE
Otherwise the users won't know how many capture groups are in the
parsed regular expression.
2021-07-10 23:14:08 +04:30
Ali Mohammad Pur
54d89609de LibRegex: Add support for the Basic POSIX regular expressions
This implements the internal regex stuff for #8506.
2021-07-10 13:33:08 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
AnotherTest
c128b3fd91 LibRegex: Remove 'ReadDigitFollowPolicy' as it's no longer needed
Thanks to @GMTA: 1b071455b1 (r49343474)
2021-04-10 12:10:45 +02:00
Jelle Raaijmakers
db321db5f4 LibRegex: Parse \0 as a zero-byte instead of 0x30 ("0")
This was causing some regexes to trip up. Fixes #6202.
2021-04-09 21:53:14 +02:00
AnotherTest
6bbb26fdaf LibRegex: Allow references to capture groups that aren't parsed yet
This only applies to the ECMA262 parser.
This behaviour is an ECMA262-specific quirk, such references always
generate zero-length matches (even on subsequent passes).
Also adds a test in LibJS's test suite.

Fixes #6039.
2021-04-01 21:55:47 +02:00
AnotherTest
f05e518cbc LibRegex: Implement section B.1.4. of the ECMA262 spec
This allows the parser to deal with crazy patterns like the one
in #5517.
2021-02-27 07:31:01 +01:00
Andreas Kling
13d7c09125 Libraries: Move to Userland/Libraries/ 2021-01-12 12:17:46 +01:00
Renamed from Libraries/LibRegex/RegexParser.h (Browse further)