mirror of https://github.com/RGBCube/serenity synced 2025-05-16 19:35:08 +00:00

67 commits

Author SHA1 Message Date
Ali Mohammad Pur
11a8476cf4 LibRegex: Use the parser state capture group count in BRE
Otherwise the users won't know how many capture groups are in the
parsed regular expression.
2021-07-10 23:14:08 +04:30
Ali Mohammad Pur
1c584e9d80 LibRegex: Correctly parse BRE bracket expressions
Commonly, bracket expressions are, in fact, enclosed in brackets.
2021-07-10 22:58:24 +04:30
Ali Mohammad Pur
daa6d99e6e LibRegex: Add support for non-extended regular expressions in regcomp()
Fixes part of #8506.
2021-07-10 13:33:08 +02:00
Ali Mohammad Pur
54d89609de LibRegex: Add support for the Basic POSIX regular expressions
This implements the internal regex stuff for #8506.
2021-07-10 13:33:08 +02:00
Ali Mohammad Pur
addfa1e82e LibRegex: Make the bytecode transformation functions static
They were pretty confusing when compared with other non-transforming
functions.
2021-07-10 13:33:08 +02:00
Timothy Flynn
0f0ac37b56 LibRegex: Break from execution loop when the sticky flag is set
If the sticky flag is set, the regex execution loop should break
immediately even if the execution was a failure. The specification for
several RegExp.prototype methods (e.g. exec and @@split) relies on this
behavior.
2021-07-09 19:45:55 +01:00
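To illustrate the behavior described in the commit above, here is a minimal standalone sketch of a search loop that gives up after the first failed attempt when a sticky flag is set. It is not LibRegex code: `find_from` and its parameters are hypothetical, and a literal substring comparison stands in for real regex execution.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: looks for `needle` in `haystack`, starting at `start`.
// With `sticky` set, only the starting position is tried; the loop breaks
// even though that single attempt failed.
std::optional<std::size_t> find_from(std::string_view haystack, std::string_view needle,
                                     std::size_t start, bool sticky)
{
    for (std::size_t i = start; i <= haystack.size(); ++i) {
        if (haystack.substr(i, needle.size()) == needle)
            return i; // match found at position i
        if (sticky)
            break; // sticky: do not advance to later positions
    }
    return std::nullopt;
}
```

Called with `sticky` set, a search for "b" in "ab" from position 0 reports no match, while the non-sticky search advances and finds it at position 1.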
Timothy Flynn
65003241e4 LibRegex: Allow dollar signs in ECMA262 named capture groups
Fixes 1 test262 test.
2021-07-06 22:33:17 +01:00
Andrew Kaster
5e8a0c014e LibRegex: Make regex::Regex move-constructible and move-assignable
For some reason the default move constructor and default move-assign
operator were deleted, so we explicitly default them instead.
2021-06-30 08:18:28 +04:30
Andreas Kling
e59bf87374 Userland: Replace VERIFY(is<T>) with verify_cast<T>
Instead of doing a VERIFY(is<T>(x)) and *then* casting it to T, we can
just do the cast right away with verify_cast<T>. :^)
2021-06-24 21:13:09 +02:00
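The pattern this commit describes can be sketched outside of AK roughly as follows. This is an analog only: `is_a` and `checked_cast` are hypothetical names built on `dynamic_cast`, not the actual `is<T>` and `verify_cast<T>` implementations, and `Node` and `TextNode` are made-up types.

```cpp
#include <cassert>
#include <cstdlib>

// Made-up class hierarchy for the example.
struct Node {
    virtual ~Node() = default;
};
struct TextNode : Node {
    int length = 5;
};

// Hypothetical type test, standing in for a check like is<T>(x).
template<typename T, typename U>
bool is_a(U& value)
{
    return dynamic_cast<T*>(&value) != nullptr;
}

// Hypothetical checked cast: verifies and casts in one step, aborting if the
// dynamic type does not match.
template<typename T, typename U>
T& checked_cast(U& value)
{
    auto* casted = dynamic_cast<T*>(&value);
    if (!casted)
        std::abort();
    return *casted;
}
```

With a helper like this, a two-step "check, then cast" at a call site collapses into a single `checked_cast<TextNode>(node)` expression.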
sin-ack
74d76528d6 LibRegex: Display correct position for Compare in REGEX_DEBUG
When REGEX_DEBUG is enabled, LibRegex dumps a table of information
regarding the state of the regex bytecode execution. The Compare opcode
manipulates state.string_position directly, so the string_position value
cannot be used to display where the comparison started; therefore, this
patch introduces a new variable to keep track of where we were before
the comparison happened.
2021-06-16 16:30:12 +04:30
sin-ack
6b2e264093 LibRegex: Fix incorrect case-sensitive comparisons
A tiny typo was introduced in bc8d16ad which caused all case-insensitive
comparisons to fail.
2021-06-16 16:30:12 +04:30
Gunnar Beutner
5bfe601152 LibRegex: Remove unused code 2021-06-14 16:09:58 +04:30
Gunnar Beutner
a167941852 LibRegex: Use a plain pointer for OpCode::m_state 2021-06-14 16:09:58 +04:30
Gunnar Beutner
d3c2a3caea LibRegex: Avoid initialization checks in get_opcode_by_id() 2021-06-14 16:09:58 +04:30
Gunnar Beutner
794dc368f1 LibRegex: Avoid prepending items to vectors 2021-06-14 16:09:58 +04:30
Gunnar Beutner
214410b397 LibRegex: Avoid making unnecessary string copies 2021-06-14 16:09:58 +04:30
Gunnar Beutner
281f39073d LibRegex: Make get_opcode() return a reference
Previously this would return a pointer which could be null if the
requested opcode was invalid. This should never be the case though
so let's VERIFY() that instead.
2021-06-14 16:09:58 +04:30
Gunnar Beutner
cd49fb0229 LibRegex: Remove return value for setters 2021-06-14 16:09:58 +04:30
Gunnar Beutner
1fb4471506 LibRegex: Use a plain array to store opcodes
Using a hash map is unnecessary because the number of opcodes and their
IDs never change.
2021-06-14 16:09:58 +04:30
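A rough sketch of the idea behind the last few commits: when opcode IDs are fixed and contiguous, a plain array indexed by ID can replace a hash map, and the lookup can return a reference after asserting the ID is valid. The enum values, `OpCode` struct and `opcode_table` below are hypothetical and much simpler than the LibRegex types; `assert` stands in for `VERIFY()`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical opcode IDs; Count is a sentinel giving the table size.
enum class OpCodeId : std::size_t {
    Compare,
    Jump,
    Exit,
    Count,
};

struct OpCode {
    std::string_view name;
};

// One slot per ID, built at compile time; no hashing and no runtime insertion.
constexpr std::array<OpCode, static_cast<std::size_t>(OpCodeId::Count)> opcode_table { {
    { "Compare" },
    { "Jump" },
    { "Exit" },
} };

// Returns a reference rather than a nullable pointer; an invalid ID is
// treated as a programming error.
inline const OpCode& get_opcode(OpCodeId id)
{
    auto index = static_cast<std::size_t>(id);
    assert(index < opcode_table.size());
    return opcode_table[index];
}
```

Callers can then write `get_opcode(OpCodeId::Jump).name` without a null check.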
Gunnar Beutner
d476144565 Userland: Allow building SerenityOS with -funsigned-char
Some of the code assumed that chars were always signed while that is
not the case on ARM hosts.

Also, some of the code tried to use EOF (-1) in a way similar to what
fgetc() does, however instead of storing the characters in an int
variable a char was used.

While this seemed to work it also meant that character 0xFF would be
incorrectly seen as an end-of-file.

Careful reading of fgetc() reveals that fgetc() stores character
data in an int where valid characters are in the range of 0-255 and
the EOF value is explicitly outside of that range (usually -1).
2021-06-13 18:52:58 +02:00
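The EOF problem described in this commit can be reproduced with a small sketch. `ByteReader` is a hypothetical stand-in for a `FILE*` whose `next()` follows the `fgetc()` contract, and the example assumes EOF is -1 (enforced with a `static_assert`). The narrowing version uses `signed char` explicitly so the behavior does not depend on `-funsigned-char`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

static_assert(EOF == -1, "this sketch assumes EOF is -1");

// Hypothetical reader: returns each byte as 0-255, or EOF when exhausted.
struct ByteReader {
    std::string data;
    std::size_t pos = 0;

    int next()
    {
        if (pos >= data.size())
            return EOF;
        return static_cast<unsigned char>(data[pos++]);
    }
};

// Problematic pattern: the result is narrowed to a signed char, so the byte
// 0xFF becomes -1 and is mistaken for EOF.
std::size_t count_bytes_narrowed(ByteReader reader)
{
    std::size_t count = 0;
    for (;;) {
        signed char c = static_cast<signed char>(reader.next());
        if (c == EOF)
            break;
        ++count;
    }
    return count;
}

// Pattern matching the fgetc() contract: keep the result in an int, where
// 0-255 are valid bytes and EOF lies outside that range.
std::size_t count_bytes_int(ByteReader reader)
{
    std::size_t count = 0;
    for (;;) {
        int c = reader.next();
        if (c == EOF)
            break;
        ++count;
    }
    return count;
}
```

For the three bytes 'a', 0xFF, 'b', the narrowed loop stops at the 0xFF byte while the int-based loop counts all three.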
Andreas Kling
dc65f54c06 AK: Rename Vector::append(Vector) => Vector::extend(Vector)
Let's make it a bit more clear when we're appending the elements from
one vector to the end of another vector.
2021-06-12 13:24:45 +02:00
Linus Groh
939da41fa1 LibRegex: Fix compilation errors on my host machine
I have no idea *why*, but this stopped working suddenly:

    return { { .code_point = '-', .is_character_class = false } };

Fails with:

    error: could not convert ‘{{'-', false}}’ from
    ‘<brace-enclosed initializer list>’ to
    ‘AK::Optional<regex::CharClassRangeElement>’

Might be related to 66f15c2 somehow, going one past that commit makes
the build work again, however reverting the commit doesn't. Not sure
what's up with that.

Consider this patch a band-aid until we can find the reason and an
actual fix...

Compiler version:
gcc (GCC) 11.1.1 20210531 (Red Hat 11.1.1-3)
2021-06-06 09:26:07 +01:00
Max Wipfli
bc8d16ad28 Everywhere: Replace ctype.h to avoid narrowing conversions
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
2021-06-03 13:31:46 +02:00
Linus Groh
a5903ac4b6 LibRegex: Hide stray dbgln() behind REGEX_DEBUG 2021-06-02 18:31:43 +01:00
Andreas Kling
12a42edd13 Everywhere: codepoint => code point 2021-06-01 10:01:11 +02:00
Linus Groh
dac0554fa0 LibRegex: Replace fprintf()/printf() with warnln()/outln()/dbgln() 2021-05-31 17:43:54 +01:00
Linus Groh
d60ebbbba6 Revert "Userland: static vs non-static constexpr variables"
This reverts commit 800ea8ea96.

Booting the system no longer worked after these changes.
2021-05-21 10:30:52 +01:00
Lenny Maiorani
800ea8ea96 Userland: static vs non-static constexpr variables
Problem:
- `static` variables consume memory and sometimes are less
  optimizable.
- `static const` variables can be `constexpr`, usually.
- `static` function-local variables require an initialization check
  every time the function is run.

Solution:
- If a global `static` variable is only used in a single function then
  move it into the function and make it non-`static` and `constexpr`.
- Make all global `static` variables `constexpr` instead of `const`.
- Change function-local `static const[expr]` variables to be just
  `constexpr`.
2021-05-21 10:07:06 +01:00
Andreas Kling
79ff1902aa LibRegex: Convert StringBuilder::appendf() => AK::Format 2021-05-07 21:12:09 +02:00
Brian Gianforcaro
6e918e4e02 Tests: Move LibRegex tests to Tests/LibRegex 2021-05-06 17:54:28 +02:00
Gunnar Beutner
6cf59b6ae9 Everywhere: Turn #if *_DEBUG into dbgln_if/if constexpr 2021-05-01 21:25:06 +02:00
Brian Gianforcaro
cf0640c870 Build: Remove unused ${REGEX_SOURCES} from the tests CMakeLists.txt 2021-04-29 10:37:26 +02:00
Linus Groh
dbe72fd962 Everywhere: Remove empty line after function body opening curly brace 2021-04-25 20:20:00 +02:00
Andrew Kaster
35c0a6c54d AK+Userland: Move AK/TestSuite.h into LibTest and rework Tests' CMake
As many macros as possible are moved to Macros.h, while the
macros to create a test case are moved to TestCase.h. TestCase is now
the only user-facing header for creating a test case. TestSuite and its
helpers have moved into a .cpp file. Instead of requiring a TEST_MAIN
macro to be instantiated into the test file, a TestMain.cpp file is
provided instead that will be linked against each test. This has the
side effect that, if we wanted to have test cases split across multiple
files, it's as simple as adding them all to the same executable.

The test main should be portable to kernel mode as well, so if
there's a set of tests that should be run in self-test mode in kernel
space, we can accommodate that.

A new serenity_test CMake function streamlines adding a new test with
arguments for the test source file, subdirectory under /usr/Tests to
install the test application and an optional list of libraries to link
against the test application. To accommodate future tests where the
provided TestMain.cpp is not suitable (e.g. test-js), a CUSTOM_MAIN
parameter can be passed to the function to not link against the
boilerplate main function.
2021-04-25 09:36:49 +02:00
Linus Groh
a4c1860bfc LibRegex: Put two dbgln()s behind REGEX_DEBUG 2021-04-23 20:52:12 +02:00
Ali Mohammad Pur
bf9c04a3da LibRegex: Implement multiline stateful matches 2021-04-23 10:05:04 +02:00
Ali Mohammad Pur
bb40d4d5ff LibRegex: Do not attempt to find more matches when one match is needed 2021-04-23 10:05:04 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling
de7062af9c LibRegex: Unbreak the ALL_DEBUG build 2021-04-22 09:23:28 +02:00
Andreas Kling
c68dcf45b6 LibRegex: Convert String::format() => String::formatted() 2021-04-21 23:49:02 +02:00
AnotherTest
5a14f7ea2f LibRegex: Generate a 'Compare' op for empty character classes
Otherwise it would match zero-length strings.
Fixes #6256.
2021-04-12 08:54:58 +02:00
AnotherTest
c128b3fd91 LibRegex: Remove 'ReadDigitFollowPolicy' as it's no longer needed
Thanks to @GMTA: 1b071455b1 (r49343474)
2021-04-10 12:10:45 +02:00
AnotherTest
1b071455b1 LibRegex: Treat brace quantifiers with invalid contents as literals
Fixes #6208.
2021-04-10 09:16:03 +02:00
AnotherTest
25d336bc27 LibRegex: Take the regex as a const reference in print_bytecode() 2021-04-10 09:16:03 +02:00
AnotherTest
e9279d1790 LibRegex: Allow a '?' suffix for brace quantifiers
This fixes another compat point in #6042.
2021-04-10 09:16:03 +02:00
AnotherTest
8d7bcc2476 LibRegex: Give ByteCode a copy ctor and a move assignment operator
Previously all move assignments were actually copies. oops.
2021-04-10 09:16:03 +02:00
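One general C++ mechanism by which "move assignments were actually copies" can happen is sketched below; whether this is exactly what ByteCode ran into is an assumption. When a class declares a copy assignment operator but no move assignment operator, `a = std::move(b)` silently binds to the copy version. `CopyOnly` and `Movable` are hypothetical types that count which operator ran.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Declares only copy assignment: assigning from an rvalue falls back to it.
struct CopyOnly {
    std::vector<int> data;
    int copy_assignments = 0;

    CopyOnly() = default;
    CopyOnly(const CopyOnly&) = default;
    CopyOnly& operator=(const CopyOnly& other)
    {
        data = other.data;
        ++copy_assignments;
        return *this;
    }
};

// Also declares move assignment, so rvalues steal the buffer instead.
struct Movable {
    std::vector<int> data;
    int copy_assignments = 0;
    int move_assignments = 0;

    Movable() = default;
    Movable(const Movable&) = default;
    Movable& operator=(const Movable& other)
    {
        data = other.data;
        ++copy_assignments;
        return *this;
    }
    Movable& operator=(Movable&& other) noexcept
    {
        data = std::move(other.data);
        ++move_assignments;
        return *this;
    }
};
```

Assigning `std::move(source)` to a `CopyOnly` leaves the source's data intact because a copy was made, whereas the same expression on a `Movable` runs the move assignment operator.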
Jelle Raaijmakers
db321db5f4 LibRegex: Parse \0 as a zero-byte instead of 0x30 ("0")
This was causing some regexes to trip up. Fixes #6202.
2021-04-09 21:53:14 +02:00
AnotherTest
ade97d4094 LibRegex: Make sure there are as many group matches as actual matches
Fixes #6131.
2021-04-05 09:02:06 +02:00
AnotherTest
1bdc1cf77e LibRegex: Consider named capture groups as normal capture groups too 2021-04-05 09:02:06 +02:00
AnotherTest
be0182d049 LibRegex: Reset capture group indices when resetting parser state 2021-04-05 09:02:06 +02:00