Either we mount from a loop device or other source, the user might want
to obfuscate the given source for security reasons, so this option will
ensure this will happen.
If passed during a mount, the source will be hidden when reading from
the /sys/kernel/df node.
This device is a block device that allows a user to effectively treat an
Inode as a block device.
The static construction method is given an OpenFileDescription reference
but validates that:
- The description has a valid custody (so it's not some arbitrary file).
Failing this requirement will yield EINVAL.
- The description custody points to an Inode which is a regular file, as
we only support (seekable) regular files. Failing this requirement
will yield ENOTSUP.
LoopDevice can be used to mount a regular file on the filesystem like
other supported types of (physical) block devices.
Such operation is almost equivalent to writing on an Inode, so lock the
Inode m_inode_lock exclusively.
All FileSystem Inode implementations then override a new method called
truncate_locked which should implement the actual truncating.
SysFS, ProcFS and DevPtsFS were all sending filetype 0 when traversing
their directories, but it is actually very easy to send proper filetypes
in these filesystems.
This patch binds all RAM backed filesystems to use only one enum for
their internal filetype, to simplify the implementation and allow
sharing of code.
Please note that the Plan9FS case is currently not solved as I am not
familiar with this filesystem and its constructs.
The ProcFS mostly keeps track of the filetype, and a fix was needed for
the /proc root directory - all processes exhibit a directory inside it
which makes it very easy to hardcode the directory filetype for them.
There's also the `self` symlink inode which is now exposed as DT_LNK.
As for SysFS, we could leverage the fact everything inherits from the
SysFSComponent class, so we could have a virtual const method to return
the proper filetype.
Most of the files in SysFS are "regular" files though, so the base class
has a non-pure virtual method.
Lastly, the DevPtsFS simply hardcodes '.' and '..' as directory file
type, and everything else is hardcoded to send the character device file
type, as this filesystem is only exposing character pts device files.
This commit adds minimal support for compiler-instrumentation based
memory access sanitization.
Currently we only support detection of kmalloc redzone accesses, and
kmalloc use-after-free accesses.
Support for inline checks (for improved performance), and for stack
use-after-return and use-after-return detection is left for future PRs.
When writing to /sys/kernel/request_panic it will do a kernel panic.
Trying to truncate the node will result in kernel panic with a slightly
different message.
The name for this directory is a bit awkward. Also, the distinction of
constant information is not really valuable as I thought it would be, so
let's bring that information back into the /sys/kernel directory.
The name "variables" is a bit awkward and what the directory entries are
really about is kernel configuration so let's make it clear with the new
name.
Instead, use the FixedCharBuffer class to ensure we always use a static
buffer storage for these names. This ensures that if a Process or a
Thread were created, there's a guarantee that setting a new name will
never fail, as only copying of strings should be done to that static
storage.
The limits which are set are 32 characters for processes' names and 64
characters for thread names - this is because threads' names could be
more verbose than processes' names.
Since this is the block size that file system drivers *should* set,
let's name it the logical block size, just like most file systems such
as ext2 already do anyways.
For a long time, our shutdown procedure has basically been:
- Acquire big process lock.
- Switch framebuffer to Kernel debug console.
- Sync and lock all file systems so that disk caches are flushed and
files are in a good state.
- Use firmware and architecture-specific functionality to perform
hardware shutdown.
This naive and simple shutdown procedure has multiple issues:
- No processes are terminated properly, meaning they cannot perform more
complex cleanup work. If they were in the middle of I/O, for instance,
only the data that already reached the Kernel is written to disk, and
data corruption due to unfinished writes can therefore still occur.
- No file systems are unmounted, meaning that any important unmount work
will never happen. This is important for e.g. Ext2, which has
facilites for detecting improper unmounts (see superblock's s_state
variable) and therefore requires a proper unmount to be performed.
This was also the starting point for this PR, since I wanted to
introduce basic Ext2 file system checking and unmounting.
- No hardware is properly shut down beyond what the system firmware does
on its own.
- Shutdown is performed within the write() call that asked the Kernel to
change its power state. If the shutdown procedure takes longer (i.e.
when it's done properly), this blocks the process causing the shutdown
and prevents any potentially-useful interactions between Kernel and
userland during shutdown.
In essence, current shutdown is a glorified system crash with minimal
file system cleanliness guarantees.
Therefore, this commit is the first step in improving our shutdown
procedure. The new shutdown flow is now as follows:
- From the write() call to the power state SysFS node, a new task is
started, the Power State Switch Task. Its only purpose is to change
the operating system's power state. This task takes over shutdown and
reboot duties, although reboot is not modified in this commit.
- The Power State Switch Task assumes that userland has performed all
shutdown duties it can perform on its own. In particular, it assumes
that all kinds of clean process shutdown have been done, and remaining
processes can be hard-killed without consequence. This is an important
separation of concerns: While this commit does not modify userland, in
the future SystemServer will be responsible for performing proper
shutdown of user processes, including timeouts for stubborn processes
etc.
- As mentioned above, the task hard-kills remaining user processes.
- The task hard-kills all Kernel processes except itself and the
Finalizer Task. Since Kernel processes can delay their own shutdown
indefinitely if they want to, they have plenty opportunity to perform
proper shutdown if necessary. This may become a problem with
non-cooperative Kernel tasks, but as seen two commits earlier, for now
all tasks will cooperate within a few seconds.
- The task waits for the Finalizer Task to clean up all processes.
- The task hard-kills and finalizes the Finalizer Task itself, meaning
that it now is the only remaining process in the system.
- The task syncs and locks all file systems, and then unmounts them. Due
to an unknown refcount bug we currently cannot unmount the root file
system; therefore the task is able to abort the clean unmount if
necessary.
- The task performs platform-dependent hardware shutdown as before.
This commit has multiple remaining issues (or exposed existing ones)
which will need to be addressed in the future but are out of scope for
now:
- Unmounting the root filesystem is impossible due to remaining
references to the inodes /home and /home/anon. I investigated this
very heavily and could not find whoever is holding the last two
references.
- Userland cannot perform proper cleanup, since the Kernel's power state
variable is accessed directly by tools instead of a proper userland
shutdown procedure directed by SystemServer.
The recently introduced Firmware/PowerState procedures are removed
again, since all of the architecture-independent code can live in the
power state switch task. The architecture-specific code is kept,
however.
This is a preparation before we can create a usable mechanism to use
filesystem-specific mount flags.
To keep some compatibility with userland code, LibC and LibCore mount
functions are kept being usable, but now instead of doing an "atomic"
syscall, they do multiple syscalls to perform the complete procedure of
mounting a filesystem.
The FileBackedFileSystem IntrusiveList in the VFS code is now changed to
be protected by a Mutex, because when we mount a new filesystem, we need
to check if a filesystem is already created for a given source_fd so we
do a scan for that OpenFileDescription in that list. If we fail to find
an already-created filesystem we create a new one and register it in the
list if we successfully mounted it. We use a Mutex because we might need
to initiate disk access during the filesystem creation, which will take
other mutexes in other parts of the kernel, therefore making it not
possible to take a spinlock while doing this.
Instead of using ifdefs to use the correct platform-specific methods, we
can just use the same pattern we use for the microseconds_delay function
which has specific implementations for each Arch CPU subdirectory.
When linking a kernel image, the actual correct and platform-specific
power-state changing methods will be called in Firmware/PowerState.cpp
file.
All code that is related to PC BIOS should not be in the Kernel/Firmware
directory as this directory is for abstracted and platform-agnostic code
like ACPI (and device tree parsing in the future).
This fixes a problem with the aarch64 architecure, as these machines
don't have any PC-BIOS in them so actually trying to access these memory
locations (EBDA, BIOS ROM) does not make any sense, as they're specific
to x86 machines only.
Previously, reads would only be successful for offset 0. For this
reason, the maximum size that could be correctly read from the PCI
expansion ROM SysFS node was limited to the block size, and
subsequent blocks would fail. This commit fixes the computation of
the number of bytes to read.
Like the HID, Audio and Storage subsystem, the Graphics subsystem (which
handles GPUs technically) exposes unix device files (typically in /dev).
To ensure consistency across the repository, move all related files to a
new directory under Kernel/Devices called "GPU".
Also remove the redundant "GPU" word from the VirtIO driver directory,
and the word "Graphics" from GraphicsManagement.{h,cpp} filenames.
This has KString, KBuffer, DoubleBuffer, KBufferBuilder, IOWindow,
UserOrKernelBuffer and ScopedCritical classes being moved to the
Kernel/Library subdirectory.
Also, move the panic and assertions handling code to that directory.
The Storage subsystem, like the Audio and HID subsystems, exposes Unix
device files (for example, in the /dev directory). To ensure consistency
across the repository, we should make the Storage subsystem to reside in
the Kernel/Devices directory like the two other mentioned subsystems.
"Wherever applicable" = most places, actually :^), especially for
networking and filesystem timestamps.
This includes changes to unzip, which uses DOSPackedTime, since that is
changed for the FAT file systems.
That's what this class really is; in fact that's what the first line of
the comment says it is.
This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
The Raspberry Pi hardware doesn't support a proper software-initiated
shutdown, so this instead uses the watchdog to reboot to a special
partition which the firmware interprets as an immediate halt on
shutdown. When running under Qemu, this causes the emulator to exit.
`process.fds()` is protected by a Mutex, which causes issues when we try
to acquire it while holding a Spinlock. Since nothing seems to use this
value, let's just remove it entirely for now.
To do this we also need to get rid of LockRefPtrs in the USB code as
well.
Most of the SysFS nodes are statically generated during boot and are not
mutated afterwards.
The same goes for general device code - once we generate the appropriate
SysFS nodes, we almost never mutate the node pointers afterwards, making
locking unnecessary.
These were easy to pick-up as these pointers are assigned during the
construction point and are never changed afterwards.
This small change to these pointers will ensure that our code will not
accidentally assign these pointers with a new object which is always a
kind of bug we will want to prevent.
There was only one permanent storage location for these: as a member
in the Mount class.
That member is never modified after Mount initialization, so we don't
need to worry about races there.
This is done with 2 major steps:
1. Remove JailManagement singleton and use a structure that resembles
what we have with the Process object. This is required later for the
second step in this commit, but on its own, is a major change that
removes this clunky singleton that had no real usage by itself.
2. Use IntrusiveLists to keep references to Process objects in the same
Jail so it will be much more straightforward to iterate on this kind
of objects when needed. Previously we locked the entire Process list
and we did a simple pointer comparison to check if the checked
Process we iterate on is in the same Jail or not, which required
taking multiple Spinlocks in a very clumsy and heavyweight way.
This subdirectory is meant to hold all constant data related to the
kernel. This means that this data is never meant to updated and is
relevant from system boot to system shutdown.
Move the inodes of "load_base", "cmdline" and "system_mode" to that
directory. All nodes under this new subdirectory are generated during
boot, and therefore don't require calling kmalloc each time we need to
read them. Locking is also not necessary, because these nodes and their
data are completely static once being generated.
This replaces manually grabbing the thread's main lock.
This lets us remove the `get_thread_name` and `set_thread_name` syscalls
from the big lock. :^)