In a previous commit I moved everything into the new subdirectories in
FileSystem/SysFS directory without trying to actually make changes in
the code itself too much. Now it's time to split the code to make it
more readable and understandable, hence this change occurs now.
As with the previous commit, we put a distinction between filesystems
that require a file description and those which don't, but now in a much
more readable mechanism - all initialization properties as well as the
create static method are grouped to create the FileSystemInitializer
structure. Then when we need to initialize an instance, we iterate over
a table of these structures, checking for matching structure and then
validating the given arguments from userspace against the requirements
to ensure we can create a valid instance of the requested filesystem.
This exposes the child processes for a process as a directory
of symlinks to the respective /proc entries for each child.
This makes for an easier and possibly more efficient way
to find and count a process's children. Previously the only
method was to parse the entire /proc/all JSON file.
Coverage tools like LLVM's source-based coverage or GNU's --coverage
need to be able to write out coverage files from any binary, regardless
of its security posture. Not ignoring these pledges and veils means we
can't get our coverage data out without playing some serious tricks.
However this is pretty terrible for normal exeuction, so only skip these
checks when we explicitly configured userspace for coverage.
AnonymousFile always allocates in multiples of a page size when created
with anon_create. This is especially an issue if we use AnonymousFile
shared memory to store a shared data structure that isn't exactly a
multiple of a page in size. Therefore, we can just allow mmaps of
AnonymousFile to map only an initial part of the shared memory.
This makes SharedSingleProducerCircularQueue work when it's introduced
later.
This is important for dmidecode because it does an fstat on the DMI
blobs, trying to figure out their size. Because we already know the size
of the blobs when creating the SysFS components, there's no performance
penalty whatsoever, and this allows dmidecode to not use the /dev/mem
device as a fallback.
The obsolete ttyname and ptsname syscalls are removed.
LibC doesn't rely on these anymore, and it helps simplifying the Kernel
in many places, so it's an overall an improvement.
In addition to that, /proc/PID/tty node is removed too as it is not
needed anymore by userspace to get the attached TTY of a process, as
/dev/tty (which is already a character device) represents that as well.
Error codes can leak information about veiled paths, if the path
resolution fails with e.g. EACCESS.
This is non-trivial to fix, as there is a group of error codes we want
to propagate to the caller, such as ENOMEM.
VirtualFileSystem::mkdir() relies on resolve_path() returning an error,
since it is only interested in the out_parent passed as a pointer. Since
resolve_path_without_veil returns an error, no process veil validation
is done by resolve_path() in that case. Due to this problem, mkdir()
should use resolve_path_without_veil() and then manually validate if the
parent directory of the to-be-created directory is unveiled with 'c'
permissions.
This fixes a bug where the mkdir syscall would not respect the process
veil at all.
Previously, VirtualFileSystem::resolve_path() could return a non-null
RefPtr<Custody>* out_parent even if the function errored because the
path has been veiled.
If code relies on recieving the parent custody even if the path is
veiled, it should just call resolve_path_without_veil and do the veil
validation manually. This is because it could be that the parent is
unveiled but the child isn't or the other way round.
Rename the bound socket accessor from socket() to bound_socket().
Also return RefPtr<LocalSocket> instead of a raw pointer, to make it
harder for callers to mess up.
A mutex is useful when we need to be able to block the current thread
until it's available. This is overkill for OpenFileDescriptor.
First off, this patch wraps the main state member variables inside a
SpinlockProtected<State> to enforce synchronized access. This also
avoids "free locking" where figuring out which variables are guarded
by which lock is left as an unamusing exercise for the reader.
Then we remove mutex locking from the functions that simply call through
to the underlying File or Inode, since those fields never change anyway,
and the target objects perform their own synchronization.
InodeWatcher::register_inode was already partially fallible, but the
insertion of the inodes and watch descriptions into their respective
hash maps was not. Note that we cannot simply TRY the insertion into
both, as that could result in an inconsistent state, instead we must
remove the inode from the inode hash map if the insertion into the
watch description hash map failed.
Before this commit all consume_until overloads aside from the Predicate
one would consume (and ignore) the stop char/string, while the
Predicate overload would not, in order to keep behaviour consistent,
the other overloads no longer consume the stop char/string as well.
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
These functions used to return booleans which withheld useful
error information for callers. Internally they would suppress
and convert Error objects. We now log or propagate these errors
up the stack.