This has been a FIXME for a long time. We now apply the provided
read/write permissions to the constructed FileDescription when opening
a File object via File::open().
We were running without the sticky bit and mode 777, which meant that
the /tmp directory was world-writable *without* protection.
With this fixed, it's no longer possible for everyone to steal root's
files in /tmp.
In order to ensure a specific owner and mode when the local socket
filesystem endpoint is instantiated, we need to be able to call
fchmod() and fchown() on a socket fd between socket() and bind().
This is because until we call bind(), there is no filesystem inode
for the socket yet.
If we're creating something that should have a different owner than the
current process's UID/GID, we need to plumb that all the way through
VFS down to the FS functions.
Inode::size() may try to take a lock, so we can't be calling it with
interrupts disabled.
This fixes a kernel hang when trying to execute a binary in a TmpFS.
We now have these API's in <Kernel/Random.h>:
- get_fast_random_bytes(u8* buffer, size_t buffer_size)
- get_good_random_bytes(u8* buffer, size_t buffer_size)
- get_fast_random<T>()
- get_good_random<T>()
Internally they both use x86 RDRAND if available, otherwise they fall
back to the same LCG we had in RandomDevice all along.
The main purpose of this patch is to give kernel code a way to better
express its needs for random data.
Randomness is something that will require a lot more work, but this is
hopefully a step in the right direction.
To accomodate file creation, path resolution optionally returns the
last valid parent directory seen while traversing the path.
Clients will then interpret "ENOENT, but I have a parent for you" as
meaning that the file doesn't exist, but its immediate parent directory
does. The client then goes ahead and creates a new file.
In the case of "/foo/bar/baz" where there is no "/foo", it would fail
with ENOENT and "/" as the last seen parent directory, causing e.g the
open() syscall to create "/baz".
Covered by test_io.
It was previously possible to write to read-only file descriptors,
and read from write-only file descriptors.
All FileDescription objects now start out non-readable + non-writable,
and whoever is creating them has to "manually" enable reading/writing
by calling set_readable() and/or set_writable() on them.
During initialization of PCI MMIO access mechanism we ensure that we
have an allocation from the kernel virtual address space that cannot be
taken by other components in the OS.
Also, now we ensure that interrupts are disabled so mapping the region
doesn't fail.
In order to reduce overhead, map_device() will map the requested PCI
address only if it's not mapped already.
The run script has been changed so now we can boot a Q35 machine, that
supports PCI ECAM.
To ensure we will be able to load the machine, a PIIX3 IDE controller
was added to the Q35 machine configuration in the run script.
An AHCI controller was added to the i440fx machine configuration.
Let's lock down access to the kernel symbol table, since it trivializes
learning where the kernel functions are.
Of course, you can just build the same revision yourself locally and
learn the information, but we're taking one step at a time here. :^)
This code never worked, as was never used for anything. We can build
a much better SHM implementation on top of TmpFS or similar when we
get to the point when we need one.
Split a region into two/three if the desired mprotect range is a strict
subset of an existing region. We can then set the access bits on a new
region that is just our desired range and add both the new
desired subregion and the leftovers back to our page tables.
We now validate the full range of userspace memory passed into syscalls
instead of just checking that the first and last byte of the memory are
in process-owned regions.
This fixes an issue where it was possible to avoid rejection of invalid
addresses that sat between two valid ones, simply by passing a valid
address and a size large enough to put the end of the range at another
valid address.
I added a little test utility that tries to provoke EFAULT in various
ways to help verify this. I'm sure we can think of more ways to test
this but it's at least a start. :^)
Thanks to mozjag for pointing out that this code was still lacking!
Incidentally this also makes backtraces work again.
Fixes#989.
The new PCI subsystem is initialized during runtime.
PCI::Initializer is supposed to be called during early boot, to
perform a few tests, and initialize the proper configuration space
access mechanism. Kernel boot parameters can be specified by a user to
determine what tests will occur, to aid debugging on problematic
machines.
After that, PCI::Initializer should be dismissed.
PCI::IOAccess is a class that is derived from PCI::Access
class and implements PCI configuration space access mechanism via x86
IO ports.
PCI::MMIOAccess is a class that is derived from PCI::Access
and implements PCI configurtaion space access mechanism via memory
access.
The new PCI subsystem also supports determination of IO/MMIO space
needed by a device by checking a given BAR.
In addition, Every device or component that use the PCI subsystem has
changed to match the last changes.
We use DMI decoding now just to determine if PCI is available.
The DMIDecoder is initialized during early boot, thus making it possible
to probe useful data about the machine.
Other purposes are not supported yet.
ACPI subsystem includes 3 types of parsers that are created during
runtime, each one capable of parsing ACPI tables at different level.
ACPIParser is the most basic parser which is essentialy a parser that
can't parse anything useful, due to a user request to disable ACPI
support in a kernel boot parameter.
ACPIStaticParser is a derived class from ACPIParser, which is able to
parse only static data (e.g. FADT, HPET, MCFG and other tables), thus
making it not able to parse AML (ACPI Machine Language) nor to support
handling of hardware events and power management. This type of parser
can be created with a kernel boot parameter.
ACPIDynamicParser is a derived class from ACPIStaticParser, which
includes all the capabilities of the latter, but *should* implement an
AML interpretation, (by building the ACPI AML namespace) and handling
power & hardware events. Currently the methods to support AML
interpretation are not implemented.
This type of parser is created automatically during runtime if the user
didn't specify a boot parameter related to ACPI initialization.
Also, adding strncmp function definition in StdLib.h, to be able to use
it in ACPIStaticParser class.
When entering the kernel from a syscall, we now insert a small bit of
stack padding after the RegisterDump. This makes kernel stacks less
deterministic across syscalls and may make some bugs harder to exploit.
Inspired by Elena Reshetova's talk on kernel stack exploitation.