This returns ENOSYS if you are running in the real kernel, and some
other result if you are running in UserspaceEmulator.
There are other ways we could check if we're inside an emulator, but
it seemed easier to just ask. :^)
Switch to using type-safe bitwise operators for the BlockFlags class.
This cleans up a lot of boilerplate casts which are necessary when the
enum is declared as `enum class`.
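A minimal sketch of what such operators look like, assuming a made-up set of flag names and a hypothetical has_flag() helper (neither is taken from the kernel sources):

```cpp
#include <type_traits>

// Hypothetical flags, for illustration only.
enum class BlockFlags : unsigned {
    None = 0,
    Unblock = 1u << 0,
    SetTimeout = 1u << 1,
};

// The casts live in one place instead of at every call site.
constexpr BlockFlags operator|(BlockFlags a, BlockFlags b)
{
    using T = std::underlying_type_t<BlockFlags>;
    return static_cast<BlockFlags>(static_cast<T>(a) | static_cast<T>(b));
}

constexpr BlockFlags operator&(BlockFlags a, BlockFlags b)
{
    using T = std::underlying_type_t<BlockFlags>;
    return static_cast<BlockFlags>(static_cast<T>(a) & static_cast<T>(b));
}

// Hypothetical helper: true if every bit of `flag` is set in `value`.
constexpr bool has_flag(BlockFlags value, BlockFlags flag)
{
    return (value & flag) == flag;
}
```

Callers can then write `BlockFlags::Unblock | BlockFlags::SetTimeout` without any explicit casts.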
Increase type-safety by moving the MemoryManager APIs which take a
Region::Access to actually use that type instead of a `u8`.
Eventually the actual m_access can be moved there as well, but
I hit some weird bug where it wasn't using the correct operators
in `set_access_bit(..)` even though it's declared (and tested).
Something to fix-up later.
According to the Intel manual: "After reset, all bits (except bit 0) in
XCR0 are cleared to zero; XCR0[0] is set to 1."
Sadly we can't trust this, for example VirtualBox starts with
bits 0-4 set, so let's do it ourselves.
Fixes #5653
The expression
    (u8*)params.m_stack_location + stack_size
… causes UBSan to spit out the warning
    KUBSAN: addition of unsigned offset to 0x00000002 overflowed to 0xb0000003
… even though there is no actual overflow happening here.
This can be reproduced by running:
    $ syscall create_thread 0 [ 0 0 0 0 0xb0000001 2 ]
Technically, this is a true positive: the C++ reference is incredibly strict
about pointer arithmetic:
> A pointer to non-array object is treated as a pointer to the first element
> of an array with size 1. […] [A]ttempts to generate a pointer that isn't
> pointing at an element of the same array or one past the end invoke
> undefined behavior.
https://en.cppreference.com/w/cpp/language/operator_arithmetic
Frankly, this feels silly. So let's just use FlatPtr instead.
Found by fuzz-syscalls. Undocumented bug.
Note that FlatPtr is an unsigned type, so
    user_esp.value() - 4
is defined even if we end up with a user_esp of 0 (this can happen for example
when params.m_stack_size = 0 and params.m_stack_location = 0). The result would
be a Kernelspace-pointer, which would then be immediately flagged by
'MM.validate_user_stack' as invalid, as intended.
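A rough model of the arithmetic, assuming a 32-bit FlatPtr. The FlatPtr32 alias, the kernel_base value, and the helper names are assumptions made for this sketch and are not the kernel's actual definitions:

```cpp
#include <cstdint>

// Stand-in for a 32-bit FlatPtr (assumed name).
using FlatPtr32 = std::uint32_t;

// Assumed boundary: addresses at or above this belong to the kernel.
constexpr FlatPtr32 kernel_base = 0xc0000000;

// Unsigned addition is well defined; it wraps modulo 2^32.
constexpr FlatPtr32 stack_top(FlatPtr32 stack_location, FlatPtr32 stack_size)
{
    return static_cast<FlatPtr32>(stack_location + stack_size);
}

// Models `user_esp.value() - 4`; a user_esp of 0 wraps to 0xfffffffc.
constexpr FlatPtr32 push_slot(FlatPtr32 user_esp)
{
    return static_cast<FlatPtr32>(user_esp - 4u);
}

// Hypothetical stand-in for the user stack validation step.
constexpr bool is_user_address(FlatPtr32 address)
{
    return address < kernel_base;
}
```

With these definitions, a user_esp of 0 yields a pointer in the kernel range, so the validation step rejects it.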
Instead of declaring a reserved area from byte 0x160 to 0x400, we
change the declaration of TimerStructure array to be 32 units.
Also, a static_assert was added, to ensure that the calculation is
right.
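A sketch of how the layout can be checked at compile time, assuming the register map from the HPET specification (timer 0 starts at offset 0x100). The field names, the opaque header array, and timer_offset() are illustrative and not copied from the kernel:

```cpp
#include <cstddef>
#include <cstdint>

// One comparator block; field names are illustrative.
struct TimerStructure {
    std::uint64_t capabilities_and_configuration;
    std::uint64_t comparator_value;
    std::uint64_t fsb_interrupt_route;
    std::uint64_t reserved;
};
static_assert(sizeof(TimerStructure) == 32);

struct HPETRegisters {
    std::uint8_t header[0x100]; // general registers, kept opaque here
    TimerStructure timers[32];  // replaces a hand-computed reserved area
};
static_assert(offsetof(HPETRegisters, timers) == 0x100);
static_assert(sizeof(HPETRegisters) == 0x500);

// Byte offset of the given comparator inside the MMIO block.
constexpr std::size_t timer_offset(std::size_t index)
{
    return offsetof(HPETRegisters, timers) + index * sizeof(TimerStructure);
}
```

The fourth comparator (index 3) then lands at 0x160, which is where the old reserved area began.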
This reverts commit af22204488.
According to the HPET specification, each theoretical comparator takes
32 bytes in the MMIO space.
Although I doubt that any system will implement all 32 comparators,
if a machine happens to have more than 3 comparators, we need to
address them correctly if we want to use them.
This class is used in the AHCI code to handle a big read/write
request to the disk. If we happen to encounter such a request,
we will get the needed amount of physical pages from the
already-allocated physical pages in AHCIPort, and with that we
will create a ScatterList that will create a Region that maps
all of these pages in a contiguous virtual memory range.
Then, we can easily copy to/from this range before and after
calling the read or write operation on the StorageDevice, as
needed.
The hierarchy is AHCIController, AHCIPortHandler, AHCIPort and
SATADiskDevice. Each AHCIController has at least one AHCIPortHandler.
An AHCIPortHandler is an interrupt handler that takes care of
enumeration of handled AHCI ports when an interrupt occurs. Each
AHCIPort takes care of one SATADiskDevice, and later on we can add
support for port multipliers.
When we implement support for message signalled interrupts, we can spawn
many AHCIPortHandlers, and allow each one of them to be responsible for
a set of AHCIPorts.
As it turns out, Dr. POSIX doesn't require that post-mmap() changes
to a file are reflected in the memory mappings. So we don't actually
have to care about the file size changing (or the contents.)
IIUC, as long as all the MAP_SHARED mappings that refer to the same
inode are in sync, we're good.
This means that VMObjects don't need resizing capabilities. I'm sure
there are ways we can take advantage of this fact.
Add Bitmap::view() and forward most of the calls to BitmapView since
the code was identical.
Bitmap is now primarily concerned with its dynamically allocated
backing store and BitmapView deals with the rest.
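Roughly, the split could look like the sketch below. This is a simplified model using the standard library, with assumed method names, and not the actual AK implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Non-owning view: all of the bit-twiddling logic lives here.
class BitmapView {
public:
    BitmapView(std::uint8_t* data, std::size_t size)
        : m_data(data)
        , m_size(size)
    {
    }

    std::size_t size() const { return m_size; }

    bool get(std::size_t index) const
    {
        return (m_data[index / 8] & (1u << (index % 8))) != 0;
    }

    void set(std::size_t index, bool value)
    {
        auto mask = static_cast<std::uint8_t>(1u << (index % 8));
        if (value)
            m_data[index / 8] |= mask;
        else
            m_data[index / 8] &= static_cast<std::uint8_t>(~mask);
    }

private:
    std::uint8_t* m_data;
    std::size_t m_size;
};

// Owning wrapper: only manages the allocation and forwards the rest.
class Bitmap {
public:
    explicit Bitmap(std::size_t size)
        : m_storage(std::make_unique<std::uint8_t[]>((size + 7) / 8))
        , m_size(size)
    {
    }

    BitmapView view() const { return BitmapView(m_storage.get(), m_size); }

    bool get(std::size_t index) const { return view().get(index); }
    void set(std::size_t index, bool value) { view().set(index, value); }

private:
    std::unique_ptr<std::uint8_t[]> m_storage; // zero-initialized
    std::size_t m_size;
};
```

Any operation added to BitmapView is then available to Bitmap through a one-line forwarder.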
Mostly due to the fact that clang-format allows aligned comments via
AlignTrailingComments.
We could also use raw string literals in inline asm, which clang-format
deals with properly (and would be nicer in a lot of places).
Instead of keeping AnonymousVMObject::m_cow_map in an OwnPtr<Bitmap>,
just make the Bitmap a regular value member. This increases the size
of the VMObject by 8 bytes, but removes some of the kmalloc/kfree spam
incurred by sys$fork().
Since we know for sure that the virtual memory regions in the new
process being created are not being used on any CPU, there's no need
to do TLB flushes for every mapped page.
Dynamic Vector allocations in sys$select() were showing up in the
full-system profile and since there will never be more than FD_SETSIZE
file descriptors to worry about, we can confidently add enough inline
capacity to this Vector that it never has to kmalloc.
To compensate for the increased stack usage, reduce the size of the
FDInfo struct while we're here. :^)
The full system profiling functionality is useful for profiling the
boot performance of the system. Add a new kernel boot option to start
the system with profiling enabled. This lets you disable and view a
profile once the system is booted.
You can use it by running:
```
$ run.sh qcmd boot_prof
```
Previously all of the CommandLine parsing was spread out around the
Kernel. Instead move it all into the Kernel CommandLine class, and
expose a strongly typed API for querying the state of options.
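As an illustration of the shape of such an API, here is a small sketch. Only the boot_prof option name comes from these messages; the BootMode enum, the boot_mode key, and all method names are assumptions for the example:

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical typed option value.
enum class BootMode {
    Graphical,
    Text,
};

class CommandLine {
public:
    // Parses space-separated "key" and "key=value" tokens once, up front.
    explicit CommandLine(const std::string& raw)
    {
        std::istringstream stream(raw);
        std::string token;
        while (stream >> token) {
            auto equals = token.find('=');
            if (equals == std::string::npos)
                m_options[token] = "";
            else
                m_options[token.substr(0, equals)] = token.substr(equals + 1);
        }
    }

    // Callers ask typed questions instead of comparing strings themselves.
    bool is_boot_profiling_enabled() const { return contains("boot_prof"); }

    BootMode boot_mode() const
    {
        return lookup("boot_mode") == "text" ? BootMode::Text : BootMode::Graphical;
    }

private:
    bool contains(const std::string& key) const { return m_options.count(key) != 0; }

    std::string lookup(const std::string& key) const
    {
        auto it = m_options.find(key);
        return it == m_options.end() ? std::string() : it->second;
    }

    std::unordered_map<std::string, std::string> m_options;
};
```

Keeping the string comparisons inside one class means a misspelled option name can only be wrong in one place.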
Previously, the instruction fetch flag of the page fault handler
did not have the correct binary representation, and would always
return false. This aligns these flags.
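For reference, a sketch of flags that match the x86 page fault error code bits, where bit 4 marks an instruction fetch. The enumerator and function names are illustrative and may differ from the kernel's:

```cpp
#include <cstdint>

// Bit positions follow the x86 page fault error code.
enum PageFaultFlags : std::uint32_t {
    ProtectionViolation = 0x1,  // bit 0: page was present
    Write = 0x2,                // bit 1: access was a write
    UserMode = 0x4,             // bit 2: fault happened in user mode
    ReservedBitViolation = 0x8, // bit 3: reserved bit set in a paging entry
    InstructionFetch = 0x10,    // bit 4: fault was an instruction fetch
};

// Hypothetical helper mirroring an is_instruction_fetch() style query.
constexpr bool is_instruction_fetch(std::uint32_t error_code)
{
    return (error_code & InstructionFetch) != 0;
}
```

A flag value that does not correspond to bit 4 would never match the hardware-provided code, which is consistent with the query always returning false.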
The perfcore file format was previously limited to a single process
since the pid/executable/regions data was top-level in the JSON.
This patch moves the process-specific data into a top-level array
named "processes" and we now add entries for each process that has
been sampled during the profile run.
This makes it possible to see samples from multiple threads when
viewing a perfcore file with Profiler. This is extremely cool! :^)
The superuser can now call sys$profiling_enable() with PID -1 to enable
profiling of all running threads in the system. The perf events are
collected in a global PerformanceEventBuffer (currently 32 MiB in size.)
The events can be accessed via /proc/profile
If we can't allocate a PerformanceEventBuffer to store the profiling
events, we now fail sys$profiling_enable() and sys$perf_event()
with ENOMEM instead of carrying on with a broken buffer.
This was the original approach before we switched to get_fast_random()
which wasn't fast enough, so we added a buffer.
Unfortunately that buffer is racy and we can actually skid past the end
of it and continue fetching "random" offsets from the adjacent memory
for a while, until we run out of kernel data segment and trip a fault.
Instead of making this even more convoluted, let's just go back to the
pleasantly simple (RDTSC & 0xff) approach. :^)
Fixes #4912.