1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-31 12:38:12 +00:00
Commit graph

1143 commits

Author SHA1 Message Date
Andrew Kaster
100fb38c3e Kernel+Userland: Move LibC/sys/ioctl_numbers to Kernel/API/Ioctl.h
This header has always been fundamentally a Kernel API file. Move it
where it belongs. Include it directly in Kernel files, and make
Userland applications include it via sys/ioctl.h rather than directly.
2023-01-21 10:43:59 -07:00
Andrew Kaster
ddea37b521 Kernel+LibC: Move name length constants to Kernel/API from limits.h
Reduce inclusion of limits.h as much as possible at the same time.

This does mean that kmalloc.h is now including Kernel/API/POSIX/limits.h
instead of LibC/limits.h, but the scope could be limited a lot more.
Basically every file in the kernel includes kmalloc.h, and needs the
limits.h include for PAGE_SIZE.
2023-01-21 10:43:59 -07:00
Liav A
7aebbe52b9 Meta: Fix copyright header in Kernel/Syscalls/jail.cpp file
I wrote that file in 2022, not Andreas in 2018.
2023-01-14 09:57:04 +01:00
Liav A
16b6e644d7 Kernel: Require "stdio" pledge promise when calling get_root_session_id 2023-01-13 13:41:30 +01:00
Andreas Kling
5dcc58d54a Kernel+LibCore: Make %sid path parsing not take ages
Before this patch, Core::SessionManagement::parse_path_with_sid() would
figure out the root session ID by sifting through /sys/kernel/processes.

That file can take quite a while to generate (sometimes up to 40ms on my
machine, which is a problem on its own!) and with no caching, many of
our programs were effectively doing this multiple times on startup when
unveiling something in /tmp/session/%sid/

While we should find ways to make generating /sys/kernel/processes fast
again, this patch addresses the specific problem by introducing a new
syscall: sys$get_root_session_id(). This extracts the root session ID
by looking directly at the process table and takes <1ms instead of 40ms.

This cuts WebContent process startup time by ~100ms on my machine. :^)
2023-01-10 19:32:31 +01:00
Liav A
04221a7533 Kernel: Mark Process::jail() method as const
We really don't want callers of this function to accidentally change
the jail, or even worse - remove the Process from an attached jail.
To ensure this never happens, we can just declare this method as const
so nobody can mutate it this way.
2023-01-07 03:44:59 +03:30
yyny
fb2be937ac Kernel: Allow sending SIGCONT to processes in the same group
Allow sending `SIGCONT` to processes that share the same `pgid`.
This is allowed in Linux aswell.

Also fixes a FIXME :^)
2023-01-03 18:13:11 +01:00
yyny
9ca979846c Kernel: Add sid and pgid to Credentials
There are places in the kernel that would like to have access
to `pgid` credentials in certain circumstances.

I haven't found any use cases for `sid` yet, but `sid` and `pgid` are
both changed with `sys$setpgid`, so it seemed sensical to add it.

In Linux, `man 7 credentials` also mentions both the session id and
process group id, so this isn't unprecedented.
2023-01-03 18:13:11 +01:00
kleines Filmröllchen
a6a439243f Kernel: Turn lock ranks into template parameters
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
  obtain any lock rank just via the template parameter. It was not
  previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
  of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
  readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
  wanted in the first place (or made use of). Locks randomly changing
  their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
  taken in the right order (with the right lock rank checking
  implementation) as rank information is fully statically known.

This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
2023-01-02 18:15:27 -05:00
Liav A
e598f22768 Kernel: Disallow executing SUID binaries if process is jailed
Check if the process we are currently running is in a jail, and if that
is the case, fail early with the EPERM error code.

Also, as Brian noted, we should also disallow attaching to a jail in
case of already running within a setid executable, as this leaves the
user with false thinking of being secure (because you can't exec new
setid binaries), but the current program is still marked setid, which
means that at the very least we gained permissions while we didn't
expect it, so let's block it.
2022-12-30 15:49:37 -05:00
Timon Kruiper
a3cbaa3449 Kernel: Move ThreadRegisters into arch-specific directory
These are architecture-specific anyway, so they belong in the Arch
directory. This commit also adds ThreadRegisters::set_initial_state to
factor out the logic in Thread.cpp.
2022-12-29 19:32:20 -07:00
Liav A
91db482ad3 Kernel: Reorganize Arch/x86 directory to Arch/x86_64 after i686 removal
No functional change.
2022-12-28 11:53:41 +01:00
Liav A
5ff318cf3a Kernel: Remove i686 support 2022-12-28 11:53:41 +01:00
Liav A
8585b2dc23 Kernel/Memory: Add option to annotate region mapping as immutable
We add this basic functionality to the Kernel so Userspace can request a
particular virtual memory mapping to be immutable. This will be useful
later on in the DynamicLoader code.

The annotation of a particular Kernel Region as immutable implies that
the following restrictions apply, so these features are prohibited:
- Changing the region's protection bits
- Unmapping the region
- Annotating the region with other virtual memory flags
- Applying further memory advises on the region
- Changing the region name
- Re-mapping the region
2022-12-16 01:02:00 -07:00
Liav A
6c0486277e Kernel: Reintroduce the msyscall syscall as the annotate_mapping syscall
This syscall will be used later on to ensure we can declare virtual
memory mappings as immutable (which means that the underlying Region is
basically immutable for both future annotations or changing the
protection bits of it).
2022-12-16 01:02:00 -07:00
Agustin Gianni
ac40090583 Kernel: Add the auxiliary vector to the stack size validation
This patch validates that the size of the auxiliary vector does not
exceed `Process::max_auxiliary_size`. The auxiliary vector is a range
of memory in userspace stack where the kernel can pass information to
the process that will be created via `Process:do_exec`.

The reason the kernel needs to validate its size is that the about to
be created process needs to have remaining space on the stack.
Previously only `argv` and `envp` were taken into account for the
size validation, with this patch, the size of `auxv` is also
checked. All three elements contain values that a user (or an
attacker) can specify.

This patch adds the constant `Process::max_auxiliary_size` which is
defined to be one eight of the user-space stack size. This is the
approach taken by `Process:max_arguments_size` and
`Process::max_environment_size` which are used to check the sizes
of `argv` and `envp`.
2022-12-14 15:09:28 +00:00
sin-ack
ef6921d7c7 Kernel+LibC+LibELF: Set stack size based on PT_GNU_STACK during execve
Some programs explicitly ask for a different initial stack size than
what the OS provides. This is implemented in ELF by having a
PT_GNU_STACK header which has its p_memsz set to the amount that the
program requires. This commit implements this policy by reading the
p_memsz of the header and setting the main thread stack size to that.
ELF::Image::validate_program_headers ensures that the size attribute is
a reasonable value.
2022-12-11 19:55:37 -07:00
sin-ack
9b425b860c Kernel+LibC+Tests: Implement pwritev(2)
While this isn't really POSIX, it's needed by the Zig port and was
simple enough to implement.
2022-12-11 19:55:37 -07:00
sin-ack
70337f3a4b Kernel+LibC: Implement setregid(2)
This copies and adapts the setresgid syscall, following in the footsteps
of setreuid and setresuid.
2022-12-11 19:55:37 -07:00
sin-ack
2a502fe232 Kernel+LibC+LibCore+UserspaceEmulator: Implement faccessat(2)
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-12-11 19:55:37 -07:00
sin-ack
d5fbdf1866 Kernel+LibC+LibCore: Implement renameat(2)
Now with the ability to specify different bases for the old and new
paths.
2022-12-11 19:55:37 -07:00
sin-ack
eb5389e933 Kernel+LibC+LibCore: Implement mkdirat(2) 2022-12-11 19:55:37 -07:00
sin-ack
6445a706cf Kernel+LibC: Implement readlinkat(2)
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-12-11 19:55:37 -07:00
sin-ack
9850a69cd1 Kernel+LibC+LibCore: Implement symlinkat(2)
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-12-11 19:55:37 -07:00
Andreas Kling
4277e2d58f Kernel: Add some spec links and comments to sys$posix_fallocate() 2022-11-29 11:09:19 +01:00
Andreas Kling
961e1e590b Kernel: Make sys$posix_fallocate() fail with ENODEV on non-regular files
Previously we tried to determine if `fd` refers to a non-regular file by
doing a stat() operation on the file.

This didn't work out very well since many File subclasses don't
actually implement stat() but instead fall back to failing with EBADF.

This patch fixes the issue by checking for regular files with
File::is_regular_file() instead.
2022-11-29 11:09:19 +01:00
Andreas Kling
9249bcb5aa Kernel: Remove unnecessary FIXME in sys$posix_fallocate()
This syscall doesn't need to do anything for ENOSPC, as that is already
handled by its callees.
2022-11-29 11:09:19 +01:00
Liav A
718ae68621 Kernel+LibCore+LibC: Implement support for forcing unveil on exec
To accomplish this, we add another VeilState which is called
LockedInherited. The idea is to apply exec unveil data, similar to
execpromises of the pledge syscall, on the current exec'ed program
during the execve sequence. When applying the forced unveil data, the
veil state is set to be locked but the special state of LockedInherited
ensures that if the new program tries to unveil paths, the request will
silently be ignored, so the program will continue running without
receiving an error, but is still can only use the paths that were
unveiled before the exec syscall. This in turn, allows us to use the
unveil syscall with a special utility to sandbox other userland programs
in terms of what is visible to them on the filesystem, and is usable on
both programs that use or don't use the unveil syscall in their code.
2022-11-26 12:42:15 -07:00
Andreas Kling
5556b27e38 Kernel: Update tv_nsec field when using utimensat() with UTIME_NOW
We were only updating the tv_sec field and leaving UTIME_NOW in tv_nsec.
2022-11-24 16:56:27 +01:00
Liav A
9559682f5c Kernel: Disallow jail creation from a process within a jail
We now disallow jail creation from a process within a jail because there
is simply no valid use case to allow it, and we will probably not enable
this behavior (which is considered a bug) again.

Although there was no "real" security issue with this bug, as a process
would still be denied to join that jail, there's an information reveal
about the amount of jails that are or were present in the system.
2022-11-13 16:58:54 -07:00
Liav A
3cc0d60141 Kernel: Split the Ext2FileSystem.{cpp,h} files into smaller components 2022-11-08 02:54:48 -07:00
Liav A
1c91881a1d Kernel: Split the ISO9660FileSystem.{cpp,h} files to smaller components 2022-11-08 02:54:48 -07:00
Liav A
fca3b7f1f9 Kernel: Split the DevPtsFS files into smaller components 2022-11-08 02:54:48 -07:00
Liav A
3fc52a6d1c Kernel: Split the Plan9FileSystem.{cpp,h} file into smaller components 2022-11-08 02:54:48 -07:00
Liav A
3906dd3aa3 Kernel: Split the ProcFS core file into smaller components 2022-11-08 02:54:48 -07:00
Liav A
e882b2ed05 Kernel: Split the FATFileSystem.{cpp,h} files into smaller components 2022-11-08 02:54:48 -07:00
Liav A
5e6101dd3e Kernel: Split the TmpFS core files into smaller components 2022-11-08 02:54:48 -07:00
Liav A
f53149d5f6 Kernel: Split the SysFS core files into smaller components 2022-11-08 02:54:48 -07:00
Liav A
5e062414c1 Kernel: Add support for jails
Our implementation for Jails resembles much of how FreeBSD jails are
working - it's essentially only a matter of using a RefPtr in the
Process class to a Jail object. Then, when we iterate over all processes
in various cases, we could ensure if either the current process is in
jail and therefore should be restricted what is visible in terms of
PID isolation, and also to be able to expose metadata about Jails in
/sys/kernel/jails node (which does not reveal anything to a process
which is in jail).

A lifetime model for the Jail object is currently plain simple - there's
simpy no way to manually delete a Jail object once it was created. Such
feature should be carefully designed to allow safe destruction of a Jail
without the possibility of releasing a process which is in Jail from the
actual jail. Each process which is attached into a Jail cannot leave it
until the end of a Process (i.e. when finalizing a Process). All jails
are kept being referenced in the JailManagement. When a last attached
process is finalized, the Jail is automatically destroyed.
2022-11-05 18:00:58 -06:00
Andreas Kling
9c46fb7337 Kernel: Make sys$msyscall() not take the big lock
This function is already serialized by the address space lock.
2022-11-05 18:54:39 +01:00
kleines Filmröllchen
259bfe05b1 Kernel: Set priority of all threads within a process if requested
This is intended to reflect the POSIX sched_setparam API, which has some
cryptic language
(https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_08_04_01
) that as far as I can tell implies we should prioritize process
scheduling policies over thread scheduling policies. Technically this
means that a process must have its own sets of policies that are
considered first by the scheduler, but it seems unlikely anyone relies
on this behavior in practice. So we just override all thread's policies,
making them (at least before calls to pthread_setschedparam) behave
exactly like specified on the surface.
2022-10-27 11:30:19 +01:00
kleines Filmröllchen
bbe40ae632 Kernel: Prevent regular users from accessing other processes' threads 2022-10-27 11:30:19 +01:00
kleines Filmröllchen
b8567d7a9d Kernel: Make scheduler control syscalls more generic
The syscalls are renamed as they no longer reflect the exact POSIX
functionality. They can now handle setting/getting scheduler parameters
for both threads and processes.
2022-10-27 11:30:19 +01:00
demostanis
3e8b5ac920 AK+Everywhere: Turn bool keep_empty to an enum in split* functions 2022-10-24 23:29:18 +01:00
Gunnar Beutner
ce4b66e908 Kernel: Add support for MSG_NOSIGNAL and properly send SIGPIPE
Previously we didn't send the SIGPIPE signal to processes when
sendto()/sendmsg()/etc. returned EPIPE. And now we do.

This also adds support for MSG_NOSIGNAL to suppress the signal.
2022-10-24 15:49:39 +02:00
Liav A
0fd7b688af Kernel: Introduce support for using FileSystem object in multiple mounts
The idea is to enable mounting FileSystem objects across multiple mounts
in contrast to what happened until now - each mount has its own unique
FileSystem object being attached to it.

Considering a situation of mounting a block device at 2 different mount
points at in system, there were a couple of critical flaws due to how
the previous "design" worked:
1. BlockBasedFileSystem(s) that pointed to the same actual device had a
separate DiskCache object being attached to them. Because both instances
were not synchronized by any means, corruption of the filesystem is most
likely achieveable by a simple cache flush of either of the instances.
2. For superblock-oriented filesystems (such as the ext2 filesystem),
lack of synchronization between both instances can lead to severe
corruption in the superblock, which could render the entire filesystem
unusable.
3. Flags of a specific filesystem implementation (for example, with xfs
on Linux, one can instruct to mount it with the discard option) must be
honored across multiple mounts, to ensure expected behavior against a
particular filesystem.

This patch put the foundations to start fix the issues mentioned above.
However, there are still major issues to solve, so this is only a start.
2022-10-22 16:57:52 -04:00
Liav A
965afba320 Kernel/FileSystem: Add a few missing includes
In preparation to future commits, we need to ensure that
OpenFileDescription.h doesn't include the VirtualFileSystem.h file to
avoid include loops.
2022-10-22 16:57:52 -04:00
Liav A
97f8927da6 Kernel: Remove the DevTmpFS class
Although this code worked quite well, it is considered to be a code
duplication with the TmpFS code which is more tested and works quite
well for a variety of cases. The only valid reason to keep this
filesystem was that it enforces that no regular files will be created at
all in the filesystem. Later on, we will re-introduce this feature in a
sane manner. Therefore, this can be safely removed after SystemServer no
longer uses this filesystem type anymore.
2022-10-22 19:18:15 +02:00
Timon Kruiper
9827c11d8b Kernel: Move InterruptDisabler out of Arch directory
The code in this file is not architecture specific, so it can be moved
to the base Kernel directory.
2022-10-17 20:11:31 +02:00
Undefine
135ca3fa1b Kernel: Add support for the FAT32 filesystem
This commit adds read-only support for the FAT32 filesystem. It also
includes support for long file names.
2022-10-14 18:36:40 -06:00