Grep supports only extended regular expressions, and is able to handle one pattern
handed over via -e or directly after the options. Also, multiple files can be
handed over. Recursive mode is outstanding, but no real magic :^)
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
We already have a wrap() in EventWrapperFactory.cpp. It would be nice
to generate that at some point but it will require a lot more work on
the wrapper generator.
Inline layout nodes cannot have block children (except inline-block,
of course.)
When encountering a block box child of an inline, we now hoist the
block up to the inline's containing block, and also wrap any preceding
inline siblings in an anonymous wrapper block.
This improves the ACID2 situation quite a bit (although we still need
floats to really bring it home.)
I also took this opportunity to move all tree building logic into
Layout::TreeBuilder, to continue the theme of absolving our LayoutNode
objects of responsibilities. :^)
Rather than waiting until we get the first mouse packet, enable the
absolute mode immediately. This avoids having to click first to be
able to move the mouse.
CodeQL is a static analysis technology that was purchased by GitHub
and has been tightly integrated into the platform. It's different
from most other static analysis solutions because it's based on a
database built from your codebase, and then language specific rules
can be executed against that database. The rules are fully user
extensible, and are written in a datalog/query language.
The default cpp language rules coming from CodeQL will probably find
some issues, the ability to easily write custom rules/queries will
lend it self nicely to allowing us to validate Serenity specific
semantics are followed throughout the code.
References:
- https://www.youtube.com/watch?v=AMzGorD28Ks
- https://securitylab.github.com/tools/codeql
The Lexer constructor calls consume() once, which initializes m_position
to be > 0 and sets m_character. consume() calls is_line_terminator(),
which wasn't accounting for this state.
When creating a StringImpl for a C string that starts with a null-byte,
we would ignore the explicitly given length and return the empty
StringImpl - presumably to check for "\0", but this leads to false
positives ("\0foo") so let's only care about the length.
This was broken with the JS::Parser::Error position changes, but I don't
actually see a reason to do anything with the parser errors here, so
let's remove it and consider simply not crashing a success. :^)
This patch adds a 128-byte inline buffer that we use before switching
to using a dynamically growing ByteBuffer.
This allows us to avoid heap allocations in many cases, and totally
incidentally also speeds up @nico's favorite test, "disasm /bin/id"
more than 2x. :^)
If the percentage is 100, we were trying to get the heat gradient pixel
at (100, 0), which was one pixel past the end. Fix this by making the
heat gradient 101 pixels wide :^)
The implementation uses atomics and futexes (yay!) and is heavily based on the
implementation I did for my learning project named "Let's write synchronization
primitives" [0].
That project, in fact, started when I tried to implement pthread_once() for
Serenity (because it was needed for another project of mine, stay tuned ;) ) and
was not very sure I got every case right. So now, after learning some more about
code patterns around atomics and futexes, I am reasonably sure, and it's time to
contribute the implementation of pthread_once() to Serenity :^)
[0] To be published at https://github.com/bugaevc/lets-write-sync-primitives
If a receiver is given, e.g. via Reflect.get/set(), forward it to the
target object's get()/put() or use it as last argument of the trap
function. The default value is the Proxy object itself.
We need to not only add a record for a reference, but we need
to copy the reference count on fork as well, because the code
in the fork assumes that it has the same amount of references,
still.
Also, once all references are dropped when a process is disowned,
delete the shared buffer.
Fixes#4076
This makes misses in the BlockBasedFS's LRU block cache faster by
storing the cache entries in one of two doubly-linked list.
Dirty and clean cache entries are kept in two separate lists, and
move between them when their state changes. This can probably be
improved upon further.
If the inode's block list cache is empty, we forgot to assign the
result of computing the block list. The fact that this worked anyway
makes me wonder when we actually don't have a cache..
Thanks to szyszkienty for spotting this! :^)
Instead of doing a linear scan of the entire cache when doing a lookup,
we now have a nice O(1) HashMap in front of the cache.
The cache miss case can still be improved, this patch really only helps
the cache hit case.
This dramatically improves cached filesystem I/O. :^)
Since we're using UserOrKernelBuffers, SMAP will be automatically
disabled when we actually access the buffer later on. There's no need
to disable it wholesale across the entire read/write operations.
The CE button on the windows calculator was used to clear the current
entry rather than the current error. This commit changes the
calculator's CE button such that it now clears the current value being
entered into the keypad and errors are now cleared at the end of every
successful operation.
Previously GUI::Actions which were constructed without initializing
m_shortcut could be activated via an invalid GUI::Shortcut.
Steps to reproduce:
It was possible to enable TextEditor's markdown preview by pressing
Ctrl or Alt (Cmd/Ctrl) keys, which should not happen, as this Action
did not specify a shortcut.
This fix should apply to all other cases where actions where declared
without specifying a shortcut.
Language servers will now receive an open file instead of just its path. This
means the language servers no longer need to access the filesystem to open the
file themselves.
The C++ language server now has no filesystem access whatsoever (although we
might need to relax this in the future if it learns to complete #include paths),
while the Shell language server can read /etc/passwd (it wants that in order to
get the user's home directory) and browse (but not read!) the whole file system
tree for completing paths.
This is a new "browse" permission that lets you open (and subsequently list
contents of) directories underneath the path, but not regular files or any other
types of files.
It is now possible to use the special IPC::File type in message arguments. In
C++, the type is nothing more than a wrapper over a file descriptor. But when
serializing/deserializing IPC::File arguments, LibIPC will use the sendfd/recvfd
kernel APIs instead of sending the integer inline.
This makes it quite convenient to pass files over IPC, and will allow us to
significantly tighten sandboxes in the future :^)
Closes https://github.com/SerenityOS/serenity/issues/3643
Much like with Vector::append(), you may want to append multiple items in one
go. It's actually more important to do this for prepending, because you don't
want to copy the rest of items further each time.
We should never resume a thread by directly setting it to Running state.
Instead, if a thread was in Running state when stopped, record the state
as Runnable.
Fixes#4150