Introduce the 'debug-kernel' script to allow developers to
quickly attach a debugger to the QEMU debug remote. The
gdbserver option (-s) is already enabled by ./run today when
using QEMU for virtualisation.
If the system is running under QEMU, the debugger
will break in when the script is run. If you add
the -S option to QEMU, it will wait for the debugger
to connect before booting the kernel. This allows
you to debug the init/boot process.
Personally I use cgdb instead of gdb, so I opted
to make the debugger used by the script customizable
via an environment variable.
This change also adds -g3 to the kernel build so that
rich debug symbols are available in the kernel binary.
Since a chroot is in many ways similar to a separate root mount, we can also
apply mount flags to it as if it were an actual mount. These flags will apply
whenever the chrooted process accesses its root directory, but not when other
processes access this same directory from the outside. Since it's common to
chdir("/") immediately after chrooting (so that files accessed through the
current directory inherit the same mount flags), this effectively allows one to
apply additional limitations to a process confined inside a chroot.
To this effect, sys$chroot() gains a mount_flags argument (exposed as
chroot_with_mount_flags() in userspace) which can be set to all the same values
as the flags argument for sys$mount(), and additionally to -1 to keep the flags
set for that file system. Note that passing 0 as mount_flags will unset any
flags that may have been set for the file system, not keep them.
Instead of looking up device metadata and then looking up a device by that
metadata explicitly, just use VFS::open(). This also means that attempting to
mount a device residing on an MS_NODEV file system will properly fail.
There was a time window between releasing Lock::m_lock and calling into
the lock's WaitQueue where someone else could take m_lock and bring two
threads into a deadlock situation.
Fix this issue by holding Lock::m_lock until interrupts are disabled by
either Thread::wait_on() or WaitQueue::wake_one().
It was quite easy to put the system into a heavy churn state by doing
e.g. "cat /dev/zero".
It was then basically impossible to kill the "cat" process, even with
"kill -9", since signals are only delivered in two conditions:
a) The target thread is blocked in the kernel
b) The target thread is running in userspace
However, since the "cat /dev/zero" command spends most of its time actively
running in the kernel, not blocked, the signal dispatch code just kept
postponing actually handling the signal indefinitely.
To fix this, we now check before returning from a syscall if there are
any pending unmasked signals, and if so, we take a dramatic pause by
blocking the current thread, knowing it will immediately be unblocked
by signal dispatch anyway. :^)
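The check on the syscall return path amounts to a bit test on the thread's signal state. A minimal sketch, assuming one bit per signal number; the ThreadSignals struct and both helper names are hypothetical and not taken from the kernel:

```cpp
#include <cstdint>

// Hypothetical per-thread signal state: bit (n - 1) represents signal n.
struct ThreadSignals {
    std::uint32_t pending;
    std::uint32_t mask;
};

constexpr std::uint32_t signal_bit(int signal_number)
{
    return 1u << (signal_number - 1);
}

// True if at least one pending signal is not blocked by the mask. In this
// sketch, that is the condition under which the thread would block itself
// just before returning from a syscall.
constexpr bool has_unmasked_pending_signals(ThreadSignals signals)
{
    return (signals.pending & ~signals.mask) != 0;
}

// Conventional signal numbers, used here only as example inputs.
constexpr int EXAMPLE_SIGKILL = 9;
constexpr int EXAMPLE_SIGTERM = 15;

static_assert(has_unmasked_pending_signals({ signal_bit(EXAMPLE_SIGKILL), 0 }),
    "a pending, unmasked signal triggers the pause");
static_assert(!has_unmasked_pending_signals({ signal_bit(EXAMPLE_SIGTERM), signal_bit(EXAMPLE_SIGTERM) }),
    "a masked signal does not");
```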
When the current thread is backtracing itself, we now start walking the
stack from the current EBP register value, instead of the TSS one.
Now SystemMonitor always appears to be running Thread::backtrace() when
sampled, which makes perfect sense. :^)
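To illustrate why the starting point matters, here is a small simulation of a frame-pointer walk. It does not read real registers: StackFrame, backtrace_start() and count_frames() are hypothetical stand-ins, and the "saved" frame plays the role of a stale value recorded at the last context switch.

```cpp
#include <cstddef>
#include <cstdint>

// Simulated stack frame: a link to the caller's frame plus a return address.
struct StackFrame {
    const StackFrame* previous;
    std::uintptr_t return_address;
};

// For the current thread, start from the live frame pointer; for any other
// thread, the saved frame pointer is the best information available.
constexpr const StackFrame* backtrace_start(bool thread_is_current,
    const StackFrame* live_frame, const StackFrame* saved_frame)
{
    return thread_is_current ? live_frame : saved_frame;
}

// Follows the chain of frames until it ends and returns its length.
constexpr std::size_t count_frames(const StackFrame* frame)
{
    std::size_t depth = 0;
    while (frame != nullptr) {
        ++depth;
        frame = frame->previous;
    }
    return depth;
}

// Example chain: innermost -> middle -> outermost.
constexpr StackFrame outermost { nullptr, 0x1000 };
constexpr StackFrame middle { &outermost, 0x2000 };
constexpr StackFrame innermost { &middle, 0x3000 };

// Starting from the live frame sees all three frames; starting from a stale
// saved frame (here: middle) would miss the innermost one.
static_assert(count_frames(backtrace_start(true, &innermost, &middle)) == 3, "live");
static_assert(count_frames(backtrace_start(false, &innermost, &middle)) == 2, "saved");
```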
You can now bind-mount files and directories. This essentially exposes an
existing part of the file system in another place, and can be used as an
alternative to symlinks or hardlinks.
Here's an example of doing this:
    # mkdir /tmp/foo
    # mount /home/anon/myfile.txt /tmp/foo -o bind
    # cat /tmp/foo
    This is anon's file.
We now support these mount flags:
* MS_NODEV: disallow opening any devices from this file system
* MS_NOEXEC: disallow executing any executables from this file system
* MS_NOSUID: ignore set-user-id bits on executables from this file system
The fourth flag, MS_BIND, is defined, but currently ignored.
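The three enforced flags each gate one kind of access. A rough sketch of the checks, assuming made-up flag values and hypothetical helper names (the real checks live in the VFS and are not shown here):

```cpp
// Hypothetical flag values, chosen for illustration only.
constexpr int MS_NODEV = 1 << 0;
constexpr int MS_NOEXEC = 1 << 1;
constexpr int MS_NOSUID = 1 << 2;
constexpr int MS_BIND = 1 << 3;

// MS_NODEV: devices on this file system may not be opened.
constexpr bool may_open_device(int mount_flags)
{
    return (mount_flags & MS_NODEV) == 0;
}

// MS_NOEXEC: executables on this file system may not be executed.
constexpr bool may_execute(int mount_flags)
{
    return (mount_flags & MS_NOEXEC) == 0;
}

// MS_NOSUID: set-user-id bits on this file system are ignored.
constexpr bool honors_set_user_id(int mount_flags)
{
    return (mount_flags & MS_NOSUID) == 0;
}

static_assert(!may_open_device(MS_NODEV | MS_NOSUID), "devices are blocked");
static_assert(may_execute(MS_NODEV | MS_NOSUID), "execution is still allowed");
// MS_BIND is defined but does not affect any of the checks in this sketch.
static_assert(may_execute(MS_BIND) && may_open_device(MS_BIND), "MS_BIND is ignored");
```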
O_EXEC is mentioned by POSIX, so let's have it. Currently, it is only used
inside the kernel to ensure the process has the right permissions when opening
an executable.
At the moment, the actual flags are ignored, but we correctly propagate them all
the way from the original mount() syscall to each custody that resides on the
mounted FS.
No need to pass around RefPtr<>s and NonnullRefPtr<>s and no need to
heap-allocate them.
Also remove VFS::mount(NonnullRefPtr<FS>&&, StringView path) - it has been
unused for a long time.
While I was updating syscalls to stop passing null-terminated strings,
I added some helpful struct types:
- StringArgument { const char*; size_t; }
- ImmutableBuffer<Data, Size> { const Data*; Size; }
- MutableBuffer<Data, Size> { Data*; Size; }
The Process class has some convenience functions for validating and
optionally extracting the contents from these structs:
- get_syscall_path_argument(StringArgument)
- validate_and_copy_string_from_user(StringArgument)
- validate(ImmutableBuffer)
- validate(MutableBuffer)
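For orientation, the struct shapes and a validation helper might look roughly like this. The member names, the address range constants and is_user_range() are assumptions for the sake of the example; only the type names and the general pointer-plus-size layout come from the list above.

```cpp
#include <cstddef>
#include <cstdint>

// Member names below are guesses; the list above only gives the member types.
struct StringArgument {
    const char* characters;
    std::size_t length;
};

template<typename Data, typename Size>
struct ImmutableBuffer {
    const Data* data;
    Size size;
};

template<typename Data, typename Size>
struct MutableBuffer {
    Data* data;
    Size size;
};

// Hypothetical bounds of the userspace address range.
constexpr std::uintptr_t user_range_base = 0x00800000;
constexpr std::uintptr_t user_range_end = 0xbe000000;

// Checks that [base, base + size) lies inside the user range, written so
// that base + size is never computed and therefore cannot overflow.
constexpr bool is_user_range(std::uintptr_t base, std::size_t size)
{
    return base >= user_range_base
        && base <= user_range_end
        && size <= user_range_end - base;
}

// Validates a buffer of `size` elements, rejecting element counts whose
// byte size would overflow.
template<typename Data, typename Size>
bool validate(const ImmutableBuffer<Data, Size>& buffer)
{
    auto count = static_cast<std::size_t>(buffer.size);
    if (count > SIZE_MAX / sizeof(Data))
        return false;
    return is_user_range(reinterpret_cast<std::uintptr_t>(buffer.data), count * sizeof(Data));
}
```

Passing the pointer and the length together lets the kernel validate the whole range up front instead of scanning user memory for a terminating null byte.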
There's still so much code around this and I'm wondering if we should
generate most of it instead. Possible nice little project.
In order to preserve the absolute path of the process root, we save the
custody used by chroot() before stripping it to become the new "/".
There's probably a better way to do this.
The chroot() syscall now allows the superuser to isolate a process into
a specific subtree of the filesystem. This is not strictly permanent,
as it is also possible for a superuser to break *out* of a chroot, but
it is a useful mechanism for isolating unprivileged processes.
The VFS now uses the current process's root_directory() as the root for
path resolution purposes. The root directory is stored as an uncached
Custody in the Process object.
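One consequence of resolving paths against the process root is that ".." cannot climb above it. A simplified sketch, using a hypothetical stand-in for Custody (the real class carries more state) and a made-up resolve_dotdot() helper:

```cpp
// Hypothetical, simplified stand-in for a custody: a node that knows its
// parent and its name.
struct Custody {
    const Custody* parent;
    const char* name;
};

// Resolves ".." relative to a process root: at the process root (or at the
// global root, which has no parent) the lookup stays where it is.
constexpr const Custody* resolve_dotdot(const Custody* current, const Custody* process_root)
{
    if (current == process_root || current->parent == nullptr)
        return current;
    return current->parent;
}

// Example tree: / -> jail -> etc
constexpr Custody global_root { nullptr, "/" };
constexpr Custody jail { &global_root, "jail" };
constexpr Custody jail_etc { &jail, "etc" };

// A process chrooted into /jail cannot step above it with "..".
static_assert(resolve_dotdot(&jail_etc, &jail) == &jail, "etc/.. is the jail root");
static_assert(resolve_dotdot(&jail, &jail) == &jail, "/.. stays at the jail root");
// A process whose root is the global root sees the real parent.
static_assert(resolve_dotdot(&jail, &global_root) == &global_root, "unconfined");
```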