Alot of code is shared between i386/i686/x86 and x86_64
and a lot probably will be used for compatability modes.
So we start by moving the headers into one Directory.
We will probalby be able to move some cpp files aswell.
We previously ignored these values in the return value of
load_elf_object, which causes us to not allocate a TLS region for
statically-linked programs.
This implements a fallback to munmap that unmaps multiple regions at a
time, with splitting some when needed.
The way it is implemented is possibly not optimal, due to it searching
without looking into the cache
The previous architecture had a huge flaw: the pointer to the protected
data was itself unprotected, allowing you to overwrite it at any time.
This patch reorganizes the protected data so it's part of the Process
class itself. (Actually, it's a new ProcessBase helper class.)
We use the first 4 KB of Process objects themselves as the new storage
location for protected data. Then we make Process objects page-aligned
using MAKE_ALIGNED_ALLOCATED.
This allows us to easily turn on/off write-protection for everything in
the ProcessBase portion of Process. :^)
Thanks to @bugaevc for pointing out the flaw! This is still not perfect
but it's an improvement.
Process member variable like m_euid are very valuable targets for
kernel exploits and until now they have been writable at all times.
This patch moves m_euid along with a whole bunch of other members
into a new Process::ProtectedData struct. This struct is remapped
as read-only memory whenever we don't need to write to it.
This means that a kernel write primitive is no longer enough to
overwrite a process's effective UID, you must first unprotect the
protected data where the UID is stored. :^)
This returns ENOSYS if you are running in the real kernel, and some
other result if you are running in UserspaceEmulator.
There are other ways we could check if we're inside an emulator, but
it seemed easier to just ask. :^)
Switch to using type-safe bitwise operators for the BlockFlags class,
this cleans up a lot of boilerplate casts which are necessary when the
enum is declared as `enum class`.
The expression
(u8*)params.m_stack_location + stack_size
… causes UBSan to spit out the warning
KUBSAN: addition of unsigned offset to 0x00000002 overflowed to 0xb0000003
… even though there is no actual overflow happening here.
This can be reproduced by running:
$ syscall create_thread 0 [ 0 0 0 0 0xb0000001 2 ]
Technically, this is a true-positive: The C++-reference is incredibly strict
about pointer-arithmetic:
> A pointer to non-array object is treated as a pointer to the first element
> of an array with size 1. […] [A]ttempts to generate a pointer that isn't
> pointing at an element of the same array or one past the end invoke
> undefined behavior.
https://en.cppreference.com/w/cpp/language/operator_arithmetic
Frankly, this feels silly. So let's just use FlatPtr instead.
Found by fuzz-syscalls. Undocumented bug.
Note that FlatPtr is an unsigned type, so
user_esp.value() - 4
is defined even if we end up with a user_esp of 0 (this can happen for example
when params.m_stack_size = 0 and params.m_stack_location = 0). The result would
be a Kernelspace-pointer, which would then be immediately flagged by
'MM.validate_user_stack' as invalid, as intended.
Since we know for sure that the virtual memory regions in the new
process being created are not being used on any CPU, there's no need
to do TLB flushes for every mapped page.
Dynamic Vector allocations in sys$select() were showing up in the
full-system profile and since there will never be more than FD_SETSIZE
file descriptors to worry about, we can confidently add enough inline
capacity to this Vector that it never has to kmalloc.
To compensate for the increased stack usage, reduce the size of the
FDInfo struct while we're here. :^)
The perfcore file format was previously limited to a single process
since the pid/executable/regions data was top-level in the JSON.
This patch moves the process-specific data into a top-level array
named "processes" and we now add entries for each process that has
been sampled during the profile run.
This makes it possible to see samples from multiple threads when
viewing a perfcore file with Profiler. This is extremely cool! :^)
The superuser can now call sys$profiling_enable() with PID -1 to enable
profiling of all running threads in the system. The perf events are
collected in a global PerformanceEventBuffer (currently 32 MiB in size.)
The events can be accessed via /proc/profile
If we can't allocate a PerformanceEventBuffer to store the profiling
events, we now fail sys$profiling_enable() and sys$perf_event()
with ENOMEM instead of carrying on with a broken buffer.
I don't dare touch the multi-threading logic and locking mechanism, so it stays
timespec for now. However, this could and should be changed to AK::Time, and I
bet it will simplify the "increment_time_since_boot()" code.
This commit is very invasive, because Thread likes to take a pointer and write
to it. This means that translating between timespec/timeval/Time would have been
more difficult than just changing everything that hands a raw pointer to Thread,
in bulk.
fuzz-syscalls found a bunch of unaligned accesses into struct sigaction
via this syscall. This patch fixes that issue by porting the syscall
to Userspace<T> which we should have done anyway. :^)
Fixes#5500.
This was another vestige from a long time ago, when exiting a thread
would mutate global data structures that were only protected by the
interrupt flag.
This was necessary in the past when crash handling would modify
various global things, but all that stuff is long gone so we can
simplify crashes by leaving the interrupt flag alone.
Make more of the kernel compile in 64-bit mode, and make some things
pointer-size-agnostic (by using FlatPtr.)
There's a lot of work to do here before the kernel will even compile.