We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
C++20 can automatically synthesize `operator!=` from `operator==`, so
there is no point in writing such functions by hand if all they do is
call through to `operator==`.
This fixes a compile error with compilers that implement P2468 (Clang
16 currently). This paper restores the C++17 behavior that if both
`T::operator==(U)` and `T::operator!=(U)` exist, `U == T` won't be
rewritten in reverse to call `T::operator==(U)`. Removing `!=` operators
makes the rewriting possible again.
See https://reviews.llvm.org/D134529#3853062
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.
Also included are changes to readd now missing dependencies to tools
that actually need them.
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and feature missing symbols.
This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.
Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
Also do this for Shell.
This greatly simplifies the CMakeLists in Lagom, replacing many glob
patterns with a big list of libraries. There are still a few special
libraries that need some help to conform to the pattern, like LibELF and
LibWebView.
It also lets us remove essentially all of the Serenity or Lagom binary
directory detection logic from code generators, as now both projects
directories enter the generator logic from the same place.
This file implements the POSIX APIs from <regex.h>, and is not suitable
for inclusion in a Lagom build. If we do include it, it will override
the host's regex functions and wreak havoc if it's resolved before the
host's implementation.
Previously, for a regex such as /[a-sy-z]/i, we would incorrectly think
the character "u" fell into the range "a-s" because neither of the
conditions "u > s && U > s" or "u < a && U < a" would be true, resulting
in the lookup falling back to assuming the character is in the range.
Instead, first explicitly check if the character falls into the range,
rather than checking if it falls outside the range. If the explicit
checks fail, then we know the character is outside the range.
While null StringViews are just as bad, these prevent the removal of
StringView(char const*) as that constructor accepts a nullptr.
No functional changes.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
[^XYZ] is not(X | Y | Z), we used to translate this to
not(X) | not(Y) | not(Z), this commit makes LibRegex interpret this
pattern as not(X) & not(Y) & not(Z).
The lowercase version of a range is not required to be a valid range,
instead of casefolding the range and making it invalid, check twice with
both cases of the input character (which are the same as the input if
not insensitive).
This time includes an actual test :^)
We had a really naive and simplistic implementation, which lead to
various issues where the optimiser incorrectly rewrote the regex to use
atomic groups; this commit fixes that.
While quantifying assertions is very much meaningless, the specification
allows them with annex B's extended grammar for browsers, so read and
apply the quantifiers.
Fixes#12373.
ECMA-262 defines \s as:
Return the CharSet containing all characters corresponding to a code
point on the right-hand side of the WhiteSpace or LineTerminator
productions.
The LineTerminator production is simply: U+000A, U+000D, U+2028, or
U+2029. Unfortunately there isn't a Unicode property that covers just
those code points.
The WhiteSpace production is: U+0009, U+000B, U+000C, U+FEFF, or any
code point with the Space_Separator general category.
If the Unicode generators are disabled, this will fall back to ASCII
space code points.
This partially reverts commit a962ee020a.
When the sticky bit is set, the global bit should basically be ignored
except by external callers who want their own special behavior. For
example, RegExp.prototype [ @@match ] will use the global flag to
accumulate consecutive matches. But on the first failure, the regex
loop should break.
LibRegex already implements this loop in a more performant way, so all
LibJS has to do here is to return things in the right shape, and not
loop over the input string.
Previously this was a quadratic operation on string length, which lead
to crazy execution times on failing regexps - now it's nice and fast :^)
Note that a Regex test has to be updated to remove the stateful flag as
it repeats matching on multiple strings.
As ECMA262 regex allows `[^]` and literal newlines to match newlines in
the input string, we shouldn't split the input string into lines, rather
simply make boundaries and catchall patterns capable of checking for
these conditions specifically.