serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-14 11:32:19 +00:00

Author	SHA1	Message	Date
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Linus Groh	babfc13c84	Everywhere: Remove 'clang-format off' comments that are no longer needed https://github.com/SerenityOS/serenity/pull/15654#issuecomment-1322554496	2022-12-03 23:52:23 +00:00
Linus Groh	d26aabff04	Everywhere: Run clang-format	2022-12-03 23:52:23 +00:00
Ali Mohammad Pur	253f4de302	LibRegex: Use spans<4> to avoid allocating small vectors This path is hit a lot, and alloc/free of this vector was showing up on profiles, so get rid of it.	2022-11-17 20:13:04 +03:30
Ali Mohammad Pur	f1851346d3	LibRegex: Use a copy-on-write vector for fork state	2022-11-17 20:13:04 +03:30
Ali Mohammad Pur	cfcd6e770c	LibRegex: Don't copy forked results twice	2022-11-17 20:13:04 +03:30
Ali Mohammad Pur	464ac85a1b	LibRegex: Avoid copying MatchInput when getting argument descriptions	2022-11-17 20:13:04 +03:30
Ali Mohammad Pur	00326a63ed	LibRegex: Don't treat ForkReplace* as new forks	2022-11-09 21:28:54 +01:00
Daniel Bertalan	4296425bd8	Everywhere: Remove redundant inequality comparison operators C++20 can automatically synthesize `operator!=` from `operator==`, so there is no point in writing such functions by hand if all they do is call through to `operator==`. This fixes a compile error with compilers that implement P2468 (Clang 16 currently). This paper restores the C++17 behavior that if both `T::operator==(U)` and `T::operator!=(U)` exist, `U == T` won't be rewritten in reverse to call `T::operator==(U)`. Removing `!=` operators makes the rewriting possible again. See https://reviews.llvm.org/D134529#3853062	2022-11-06 10:25:08 -07:00
Tim Schumacher	ce2f1b845f	Everywhere: Mark dependencies of most targets as PRIVATE Otherwise, we end up propagating those dependencies into targets that link against that library, which creates unnecessary link-time dependencies. Also included are changes to re-add now-missing dependencies to tools that actually need them.	2022-11-01 14:49:09 +00:00
Tim Schumacher	7834e26ddb	Everywhere: Explicitly link all binaries against the LibC target Even though the toolchain implicitly links against -lc, it does not know where it should get LibC from except for the sysroot. In the case of Clang this causes it to pick up the LibC stub instead, which might be slightly outdated and feature missing symbols. This is currently not an issue that manifests because we pass through the dependency on LibC and other libraries by accident, which causes CMake to link against the LibC target (instead of just the library), and thus points the linker at the build output directory. Since we are looking to fix that in the upcoming commits, let's make sure that everything will still be able to find the proper LibC first.	2022-11-01 14:49:09 +00:00
Andrew Kaster	1ae0cfd08b	CMake+Userland: Use CMakeLists from Userland to build Lagom Libraries Also do this for Shell. This greatly simplifies the CMakeLists in Lagom, replacing many glob patterns with a big list of libraries. There are still a few special libraries that need some help to conform to the pattern, like LibELF and LibWebView. It also lets us remove essentially all of the Serenity or Lagom binary directory detection logic from code generators, as now both projects directories enter the generator logic from the same place.	2022-10-16 16:36:39 +02:00
Gunnar Beutner	a650c74b27	AK+Toolchain: Make char and wchar_t behave on AARCH64 By default char and wchar_t are unsigned on AARCH64. This fixes a bunch of related compiler errors.	2022-10-14 13:01:13 +02:00
Andrew Kaster	828441852f	Everywhere: Replace uses of __serenity__ with AK_OS_SERENITY Now that we have OS macros for essentially every supported OS, let's try to use them everywhere.	2022-10-10 12:23:12 +02:00
Andrew Kaster	896d4e8dc1	LibRegex: Don't build LibRegex/C/Regex.cpp on Lagom This file implements the POSIX APIs from <regex.h>, and is not suitable for inclusion in a Lagom build. If we do include it, it will override the host's regex functions and wreak havoc if it's resolved before the host's implementation.	2022-10-10 12:23:12 +02:00
Ali Mohammad Pur	578d73943a	LibC+LibRegex: Move central regex definitions into LibC/bits This decouples LibRegex from the serenity LibC. Fixes #15251.	2022-09-20 12:57:21 +01:00
Ben Wiederhake	7c5e30daaa	Everywhere: Fix badly-formatted includes	2022-09-17 04:00:54 +00:00
Tim Schumacher	8763dbcccc	Everywhere: Remove a bunch of dead write-only variables LLVM 15 now warns (and thus errors) about this, and there is really no point in keeping them.	2022-09-16 05:39:28 +00:00
Ali Mohammad Pur	660d2b53b1	LibRegex: Account for eof after \<x> when 'x' leads to legacy behaviour	2022-09-12 16:03:57 +04:30
Ali Mohammad Pur	48442059fc	LibRegex: Consume exactly two chars for escaped characters We were previously consuming an extra char afterwards, which could be the charclass terminator, leading to possible OOB accesses.	2022-09-12 16:03:57 +04:30
Timothy Flynn	48cb15283a	LibRegex: Explicitly check if a character falls into a table-based range Previously, for a regex such as /[a-sy-z]/i, we would incorrectly think the character "u" fell into the range "a-s" because neither of the conditions "u > s && U > s" or "u < a && U < a" would be true, resulting in the lookup falling back to assuming the character is in the range. Instead, first explicitly check if the character falls into the range, rather than checking if it falls outside the range. If the explicit checks fail, then we know the character is outside the range.	2022-08-29 16:34:47 -04:00
Ali Mohammad Pur	e43b478920	LibRegex: Check code unit count range when accessing by code unit count	2022-07-20 21:25:59 +01:00
Ali Mohammad Pur	598dc74a76	LibRegex: Partially implement the ECMAScript unicodeSets proposal This skips the new string unicode properties additions, along with \q{}.	2022-07-20 21:25:59 +01:00
Ali Mohammad Pur	7734914909	LibRegex: Refactor parsing 'CharacterEscape' out of 'AtomEscape' The ECMA262 spec has this as a separate production, and we need it to be split up for a future commit.	2022-07-20 21:25:59 +01:00
Ali Mohammad Pur	b908f9f6ef	LibRegex: Pass parse flags as a struct instead of multiple arguments	2022-07-20 21:25:59 +01:00
sin-ack	5422691f07	LibRegex: Remove RegexStringView(char const*) constructor This allowed passing in a nullptr for the StringView which will not be possible once StringView(char const*) is removed.	2022-07-12 23:11:35 +02:00
sin-ack	fbc771efe9	Everywhere: Use default StringView constructor over nullptr While null StringViews are just as bad, these prevent the removal of StringView(char const*) as that constructor accepts a nullptr. No functional changes.	2022-07-12 23:11:35 +02:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
sin-ack	c70f45ff44	Everywhere: Explicitly specify the size in StringView constructors This commit moves the length calculations out to be directly on the StringView users. This is an important step towards the goal of removing StringView(char const*), as it moves the responsibility of calculating the size of the string to the user of the StringView (which will prevent naive uses causing OOB access).	2022-07-12 23:11:35 +02:00
Ali Mohammad Pur	d348eaf305	LibRegex: Treat inverted Compare entries as disjunctions [^XYZ] is not(X | Y | Z), we used to translate this to not(X) | not(Y) | not(Z), this commit makes LibRegex interpret this pattern as not(X) & not(Y) & not(Z).	2022-07-10 14:26:03 +02:00
Ali Mohammad Pur	fe46b2c141	LibRegex: Correctly track current inversion state in the optimizer This is currently not important as we do not nest TemporaryInverse.	2022-07-10 14:26:03 +02:00
Ali Mohammad Pur	9c5febe800	LibRegex: Flush compare tables before entering a permanent inverse state	2022-07-10 14:26:03 +02:00
Ali Mohammad Pur	b85666b3d2	LibRegex: Fix lookup table-based range checks in Compare The lowercase version of a range is not required to be a valid range, instead of casefolding the range and making it invalid, check twice with both cases of the input character (which are the same as the input if not insensitive). This time includes an actual test :^)	2022-07-09 01:00:44 +00:00
Ali Mohammad Pur	5f012778b8	LibRegex: Use the correct values for comparing LUT entries Previously we were ignoring the insensitive flag for LUT lookups.	2022-07-05 07:19:13 +02:00
Ali Mohammad Pur	7d01ee63d6	LibRegex: Use proper CharRange constructor instead of bit_casting Otherwise the range order would be inverted.	2022-07-05 07:19:13 +02:00
Ali Mohammad Pur	6e655b7f89	LibRegex: Fully interpret the Compare Op when looking for overlaps We had a really naive and simplistic implementation, which led to various issues where the optimiser incorrectly rewrote the regex to use atomic groups; this commit fixes that.	2022-07-04 23:09:53 +02:00
Ali Mohammad Pur	1409a48da6	LibRegex: Check inverse_matched after every op, not just at the end Fixes #13755. Co-Authored-By: Damien Firmenich <fir.damien@gmail.com>	2022-04-22 10:02:39 +02:00
Idan Horowitz	086969277e	Everywhere: Run clang-format	2022-04-01 21:24:45 +01:00
Ali Mohammad Pur	97a333608e	LibRegex: Make codegen+optimisation for alternatives much faster Just a little thinking outside the box, and we can now parse and optimise a million copies of "a|" chained together in just a second :^)	2022-02-20 11:53:59 +01:00
Ali Mohammad Pur	4be7239626	LibRegex: Make parse_disjunction() consume all disjunctions in one frame This helps us not blow up when too many disjunctions are chained together in the regex we're parsing. Fixes #12615.	2022-02-20 11:53:59 +01:00
Ali Mohammad Pur	627bbee055	LibRegex: Allow quantifiers after quantifiable assertions While quantifying assertions is very much meaningless, the specification allows them with annex B's extended grammar for browsers, so read and apply the quantifiers. Fixes #12373.	2022-02-20 11:53:59 +01:00
Ali Mohammad Pur	3b0943d24c	LibRegex: Correct the alternative matching order when one is empty Previously we were compiling `/a|/` into what effectively would be `/|a`, which is clearly incorrect.	2022-02-14 11:30:50 +01:00
Ali Mohammad Pur	6a4c8a66ae	LibRegex: Only skip full instructions when optimizing alternations It makes no sense to skip half of an instruction, so make sure to skip only full instructions!	2022-02-09 21:02:24 +00:00
Timothy Flynn	2212aa2388	LibRegex: Support non-ASCII whitespace characters when matching \s or \S ECMA-262 defines \s as: Return the CharSet containing all characters corresponding to a code point on the right-hand side of the WhiteSpace or LineTerminator productions. The LineTerminator production is simply: U+000A, U+000D, U+2028, or U+2029. Unfortunately there isn't a Unicode property that covers just those code points. The WhiteSpace production is: U+0009, U+000B, U+000C, U+FEFF, or any code point with the Space_Separator general category. If the Unicode generators are disabled, this will fall back to ASCII space code points.	2022-02-05 22:30:10 +03:30
Timothy Flynn	3729fd06fa	LibRegex: Do not return an Optional from Regex::Matcher::execute The code path that could return an optional no longer exists as of commit: `a962ee020a`	2022-02-05 19:06:50 +03:30
Timothy Flynn	27d3de1f17	LibRegex: Do not continue searching input when the sticky bit is set This partially reverts commit `a962ee020a`. When the sticky bit is set, the global bit should basically be ignored except by external callers who want their own special behavior. For example, RegExp.prototype [ @@match ] will use the global flag to accumulate consecutive matches. But on the first failure, the regex loop should break.	2022-02-05 19:06:50 +03:30
Ali Mohammad Pur	a962ee020a	LibJS+LibRegex: Don't repeat regex match in regexp_exec() LibRegex already implements this loop in a more performant way, so all LibJS has to do here is to return things in the right shape, and not loop over the input string. Previously this was a quadratic operation on string length, which led to crazy execution times on failing regexps - now it's nice and fast :^) Note that a Regex test has to be updated to remove the stateful flag as it repeats matching on multiple strings.	2022-02-05 00:09:32 +01:00
Ali Mohammad Pur	2b028f6faa	LibRegex+LibJS: Avoid searching for more than one match in JS RegExps All of JS's regular expression APIs only want a single match, so avoid trying to produce more (which will be discarded anyway).	2022-02-05 00:09:32 +01:00
Ali Mohammad Pur	5fac41f733	LibRegex: Implement ECMA262 multiline matching without splitting lines As ECMA262 regex allows `[^]` and literal newlines to match newlines in the input string, we shouldn't split the input string into lines, rather simply make boundaries and catchall patterns capable of checking for these conditions specifically.	2022-01-26 00:53:09 +03:30
Ali Mohammad Pur	aa20210119	LibRegex: Don't return empty vectors from RegexStringView::lines() Instead, return a vector of one empty string.	2022-01-26 00:53:09 +03:30
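
The range-check fix described in commit 48cb15283a can be sketched as below. This is a minimal illustration of the logic in the commit message, not LibRegex's actual code: `CharRange`, `in_range_by_exclusion`, and `in_range_explicit` are hypothetical names, and the sketch assumes the caller passes both case variants of the input character. It contrasts the old check, which only rejects a character when both variants fall on the same side of the range, with the explicit check, using the `/[a-sy-z]/i` case from the message.

```cpp
// Hypothetical names throughout; this is an illustration, not LibRegex's code.
struct CharRange {
    char from;
    char to;
};

// Old approach as the commit message describes it: reject only when both
// case variants lie on the same side of the range, otherwise assume a match.
constexpr bool in_range_by_exclusion(CharRange range, char lower, char upper)
{
    return !((lower > range.to && upper > range.to)
        || (lower < range.from && upper < range.from));
}

// Fixed approach: accept only when one of the case variants is inside the range.
constexpr bool in_range_explicit(CharRange range, char lower, char upper)
{
    return (lower >= range.from && lower <= range.to)
        || (upper >= range.from && upper <= range.to);
}

constexpr CharRange a_to_s { 'a', 's' };

// 'u' is above 's' while 'U' is below 'a', so neither exclusion test fires
// and the old check wrongly reports a match.
static_assert(in_range_by_exclusion(a_to_s, 'u', 'U'), "old check accepts 'u'");

// The explicit check rejects 'u' and still accepts characters inside the range.
static_assert(!in_range_explicit(a_to_s, 'u', 'U'), "explicit check rejects 'u'");
static_assert(in_range_explicit(a_to_s, 'k', 'K'), "explicit check accepts 'k'");
```

The checks are written as `static_assert`s so that the sketch verifies itself at compile time; no runtime harness is needed.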


273 commits