I am not sure why 096cecb95e disabled the stack protector and sanitizers
for all files, but this is not necessary.
Only the pre_init code needs to run without them, as that code runs
identity mapped.
This is for some reason needed for riscv64 clang, as otherwise the
kernel.map file would grow too big to fit in its section inside the
kernel image.
None of our other architectures have temporary locals in their
kernel.map.
We initialize the MMU by first setting up the page tables for the
kernel image and the initial kernel stack.
Then we jump to a identity mapped page which makes the newly created
kernel root page table active by setting `satp` and then jumps to
`init`.
This device will be used by userspace to read mouse packets from all
mouse devices that are attached to the machine.
This change is a preparation before we can enable seamless hotplug
capabilities in WindowServer for mouse devices, without any major change
on the userspace side.
This commit adds minimal support for compiler-instrumentation based
memory access sanitization.
Currently we only support detection of kmalloc redzone accesses, and
kmalloc use-after-free accesses.
Support for inline checks (for improved performance), and for stack
use-after-return and use-after-return detection is left for future PRs.
This adds a simple EHCI driver that currently only interrogates the
device and checks if all ports are addressable via associated legacy
controllers (companion controllers), and warns if this is not the case.
This also adds a lot of the other data structures needed for actually
driving the controller, but these are currently not hooked up to
anything.
To test this run with `SERENITY_EXTRA_QEMU_ARGS="--device usb-ehci"`
or the q35 machine type
This commit adds all necessary includes, so all functions are properly
declared.
PCI.cpp is moved to PCI/Initializer.cpp, as that matches the header
path.
When writing to /sys/kernel/request_panic it will do a kernel panic.
Trying to truncate the node will result in kernel panic with a slightly
different message.
This is necessary for being able to use the qemu `-kernel` option.
The QEMU virt machine uses OpenSBI's FW_DYNAMIC feature to pass
the kernel entry address, which is the virtual entry point address
specified in the kernel ELF. If we instead `objcopy` the kernel into a
raw binary, OpenSBI will jump to the physical kernel load address, which
is what we want it to do.
There's no need to have separate syscall for this kind of functionality,
as we can just have a device node in /dev, called "beep", that allows
writing tone generation packets to emulate the same behavior.
In addition to that, we remove LibC sysbeep function, as this function
was never being used by any C program nor it was standardized in any
way.
Instead, we move the userspace implementation to LibCore.
A bit old but a relatively uncomplicated device capable of outputting
1920x1080 video with 32-bit color. Tested with a Voodoo 3 3000 16MB
PCI card. Resolution switching from DisplaySettings also works.
If the requested mode contains timing information, it is used directly.
Otherwise, display timing values are selected from the EDID. First the
detailed timings are checked, and then standard and established
timings for which there is a matching DMT mode. The driver does not
(yet) read the actual EDID, so the generic EDID in DisplayConnector now
includes a set of common display modes to make this work.
The driver should also be compatible with the Voodoo Banshee, 4 and 5
but I don't have these cards to test this with. The PCI IDs of these
cards are included as a commented line in case someone wants to give it
a try.
About half of the Processor code is common across architectures, so
let's share it with a templated base class. Also, other code that can be
shared in some ways, like FPUState and TrapFrame functions, is adjusted
here. Functions which cannot be shared trivially (without internal
refactoring) are left alone for now.
SipHash is highly HashDoS-resistent, initialized with a random seed at
startup (i.e. non-deterministic) and usable for security-critical use
cases with large enough parameters. We just use it because it's
reasonably secure with parameters 1-3 while having excellent properties
and not being significantly slower than before.
The VirtIO specification defines many types of devices with different
purposes, and it also defines 3 possible transport mediums where devices
could be connected to the host machine.
We only care about the PCIe transport, but this commit puts the actual
foundations for supporting the lean MMIO transport too in the future.
To ensure things are kept abstracted but still functional, the VirtIO
transport code is responsible for what is deemed as related to an actual
transport type - allocation of interrupt handlers and tinkering with low
level transport-related registers, etc.
The name for this directory is a bit awkward. Also, the distinction of
constant information is not really valuable as I thought it would be, so
let's bring that information back into the /sys/kernel directory.
The name "variables" is a bit awkward and what the directory entries are
really about is kernel configuration so let's make it clear with the new
name.
This is a minimal set of changes to allow `serenity.sh build riscv64` to
successfully generate the build environment and start building. This
includes some, but not all, assembly stubs that will be needed later on;
they are currently empty.
While LLD and mold support RELR "packed" relocations on all
architectures, the BFD linker currently only implements them on x86-64
and POWER.
This fixes two issues:
- The Kernel had it enabled even for AArch64 + GCC, which led to the
following being printed: `warning: -z pack-relative-relocs ignored`.
- The userland always had it disabled, even in the supported AArch64 +
Clang/mold scenarios.
Two non-functional changes:
- Remove pointless `-latomic` flag. It was specified via
`add_compile_options`, which only affects compilation and not linking,
so the library was never actually linked into the kernel. In fact, we
do not even build `libatomic` for our toolchain.
- Do not disable `-Wnonnull`. The warning-causing code was fixed at some
point.
This commit also removes `-mstrict-align` from the userland. Our target
AArch64 hardware natively supports unaligned accesses without a
significant performance penalty. Allowing the compiler to insert
unaligned accesses into aligned-as-written code allows for some
performance optimizations in fact. We keep this option turned on in the
kernel to preserve correctness for MMIO, as that might be sensitive to
alignment.
For a long time, our shutdown procedure has basically been:
- Acquire big process lock.
- Switch framebuffer to Kernel debug console.
- Sync and lock all file systems so that disk caches are flushed and
files are in a good state.
- Use firmware and architecture-specific functionality to perform
hardware shutdown.
This naive and simple shutdown procedure has multiple issues:
- No processes are terminated properly, meaning they cannot perform more
complex cleanup work. If they were in the middle of I/O, for instance,
only the data that already reached the Kernel is written to disk, and
data corruption due to unfinished writes can therefore still occur.
- No file systems are unmounted, meaning that any important unmount work
will never happen. This is important for e.g. Ext2, which has
facilites for detecting improper unmounts (see superblock's s_state
variable) and therefore requires a proper unmount to be performed.
This was also the starting point for this PR, since I wanted to
introduce basic Ext2 file system checking and unmounting.
- No hardware is properly shut down beyond what the system firmware does
on its own.
- Shutdown is performed within the write() call that asked the Kernel to
change its power state. If the shutdown procedure takes longer (i.e.
when it's done properly), this blocks the process causing the shutdown
and prevents any potentially-useful interactions between Kernel and
userland during shutdown.
In essence, current shutdown is a glorified system crash with minimal
file system cleanliness guarantees.
Therefore, this commit is the first step in improving our shutdown
procedure. The new shutdown flow is now as follows:
- From the write() call to the power state SysFS node, a new task is
started, the Power State Switch Task. Its only purpose is to change
the operating system's power state. This task takes over shutdown and
reboot duties, although reboot is not modified in this commit.
- The Power State Switch Task assumes that userland has performed all
shutdown duties it can perform on its own. In particular, it assumes
that all kinds of clean process shutdown have been done, and remaining
processes can be hard-killed without consequence. This is an important
separation of concerns: While this commit does not modify userland, in
the future SystemServer will be responsible for performing proper
shutdown of user processes, including timeouts for stubborn processes
etc.
- As mentioned above, the task hard-kills remaining user processes.
- The task hard-kills all Kernel processes except itself and the
Finalizer Task. Since Kernel processes can delay their own shutdown
indefinitely if they want to, they have plenty opportunity to perform
proper shutdown if necessary. This may become a problem with
non-cooperative Kernel tasks, but as seen two commits earlier, for now
all tasks will cooperate within a few seconds.
- The task waits for the Finalizer Task to clean up all processes.
- The task hard-kills and finalizes the Finalizer Task itself, meaning
that it now is the only remaining process in the system.
- The task syncs and locks all file systems, and then unmounts them. Due
to an unknown refcount bug we currently cannot unmount the root file
system; therefore the task is able to abort the clean unmount if
necessary.
- The task performs platform-dependent hardware shutdown as before.
This commit has multiple remaining issues (or exposed existing ones)
which will need to be addressed in the future but are out of scope for
now:
- Unmounting the root filesystem is impossible due to remaining
references to the inodes /home and /home/anon. I investigated this
very heavily and could not find whoever is holding the last two
references.
- Userland cannot perform proper cleanup, since the Kernel's power state
variable is accessed directly by tools instead of a proper userland
shutdown procedure directed by SystemServer.
The recently introduced Firmware/PowerState procedures are removed
again, since all of the architecture-independent code can live in the
power state switch task. The architecture-specific code is kept,
however.
Once we move to a more proper shutdown procedure, processes other than
the finalizer task must be able to perform cleanup and finalization
duties, not only because the finalizer task itself needs to be cleaned
up by someone. This global variable, mirroring the early boot flags,
allows a future shutdown process to perform cleanup on its own.
Note that while this *could* be considered a weakening in security, the
attack surface is minimal and the results are not dramatic. To exploit
this, an attacker would have to gain a Kernel write primitive to this
global variable (bypassing KASLR among other things) and then gain some
way of calling the relevant functions, all of this only to destroy some
other running process. The same effect can be achieved with LPE which
can often be gained with significantly simpler userspace exploits (e.g.
of setuid binaries).
The driver would crash if it was unable to find an output route, and
subsequently the destruction of controller did not invoke
`GenericInterruptHandler::will_be_destroyed()` because on the level of
`AudioController`, that method is unavailable.
By decoupling the interrupt handling from the controller, we get a new
refcounted class that correctly cleans up after itself :^)
This is a preparation before we can create a usable mechanism to use
filesystem-specific mount flags.
To keep some compatibility with userland code, LibC and LibCore mount
functions are kept being usable, but now instead of doing an "atomic"
syscall, they do multiple syscalls to perform the complete procedure of
mounting a filesystem.
The FileBackedFileSystem IntrusiveList in the VFS code is now changed to
be protected by a Mutex, because when we mount a new filesystem, we need
to check if a filesystem is already created for a given source_fd so we
do a scan for that OpenFileDescription in that list. If we fail to find
an already-created filesystem we create a new one and register it in the
list if we successfully mounted it. We use a Mutex because we might need
to initiate disk access during the filesystem creation, which will take
other mutexes in other parts of the kernel, therefore making it not
possible to take a spinlock while doing this.
Instead of using ifdefs to use the correct platform-specific methods, we
can just use the same pattern we use for the microseconds_delay function
which has specific implementations for each Arch CPU subdirectory.
When linking a kernel image, the actual correct and platform-specific
power-state changing methods will be called in Firmware/PowerState.cpp
file.
Since https://reviews.llvm.org/D131441, libc++ must be included before
LibC. As clang includes libc++ as one of the system includes, LibC
must be included after those, and the only correct way to do that is
to install LibC's headers into the sysroot.
Targets that don't link with LibC yet require its headers for one
reason or another must add install_libc_headers as a dependency to
ensure that the correct headers have been (re)installed into the
sysroot.
LibC/stddef.h has been dropped since the built-in stddef.h receives
a higher include priority.
In addition, string.h and wchar.h must
define __CORRECT_ISO_CPP_STRING_H_PROTO and
_LIBCPP_WCHAR_H_HAS_CONST_OVERLOADS respectively in order to tell
libc++ to not try to define methods implemented by LibC.
To ensure actual PS2 code is not tied to the i8042 code, we make them
separated in the following ways:
- PS2KeyboardDevice and PS2MouseDevice classes are no longer inheriting
from the IRQHandler class. Instead we have specific IRQHandler derived
class for the i8042 controller implementation, which is used to ensure
that we don't end up mixing PS2 code with low-level interrupt handling
functionality. In the future this means that we could add a driver for
other PS2 controllers that might have only one interrupt handler but
multiple PS2 devices are attached, therefore, making it easier to put
the right propagation flow from the controller driver all the way to
the HID core code.
- A simple abstraction layer is added between the PS2 command set which
devices could use and the actual implementation low-level commands.
This means that the code in PS2MouseDevice and PS2KeyboardDevice
classes is no longer tied to i8042 implementation-specific commands,
so now these objects could send PS2 commands to their PS2 controller
and get a PS2Response which abstracts the given response too.
All code that is related to PC BIOS should not be in the Kernel/Firmware
directory as this directory is for abstracted and platform-agnostic code
like ACPI (and device tree parsing in the future).
This fixes a problem with the aarch64 architecure, as these machines
don't have any PC-BIOS in them so actually trying to access these memory
locations (EBDA, BIOS ROM) does not make any sense, as they're specific
to x86 machines only.
This code is very x86-specific, because Intel introduced the actual
MultiProcessor specification back in 1993, qouted here as a proof:
"The MP specification covers PC/AT-compatible MP platform designs based
on Intel processor architectures and Advanced Programmable Interrupt
Controller (APIC) architectures"
Most of the ACPI static parsing methods (methods that can be called
without initializing a full AML parser) are not tied to any specific
platform or CPU architecture.
The only method that is platform-specific is the one that finds the RSDP
structure. Thus, each CPU architecture/platform needs to implement it.
This means that now aarch64 can implement its own method to find the
ACPI RSDP structure, which would be hooked into the rest of the ACPI
code elegantly, but for now I just added a FIXME and that method returns
empty value of Optional<PhysicalAddress>.
Like the HID, Audio and Storage subsystem, the Graphics subsystem (which
handles GPUs technically) exposes unix device files (typically in /dev).
To ensure consistency across the repository, move all related files to a
new directory under Kernel/Devices called "GPU".
Also remove the redundant "GPU" word from the VirtIO driver directory,
and the word "Graphics" from GraphicsManagement.{h,cpp} filenames.
The implemented cloning mechanism should be sound:
- If a PartitionTable is passed a File with
ShouldCloseFileDescriptor::Yes, then it will keep it alive until the
PartitionTable is destroyed.
- If a PartitionTable is passed a File with
ShouldCloseFileDescriptor::No, then the caller has to ensure that the
file descriptor remains alive.
If the caller is EBRPartitionTable, the same consideration holds.
If the caller is PartitionEditor::PartitionModel, this is satisfied by
keeping an OwnPtr<Core::File> around which is the originally opened
file.
Therefore, we never leak any fds, and never access a Core::File or fd
after destroying it.