1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-31 14:28:12 +00:00
Commit graph

2139 commits

Author SHA1 Message Date
Andreas Kling
8cc5fa5598 Kernel: Unbreak module loading (broke with NX bit changes)
Modules are now mapped fully RWX. This can definitely be improved,
but at least it unbreaks the feature for now.
2020-01-03 03:44:55 +01:00
Andreas Kling
0a1865ebc6 Kernel: read() and write() should fail with EBADF for wrong mode fd's
It was previously possible to write to read-only file descriptors,
and read from write-only file descriptors.

All FileDescription objects now start out non-readable + non-writable,
and whoever is creating them has to "manually" enable reading/writing
by calling set_readable() and/or set_writable() on them.
2020-01-03 03:29:59 +01:00
Andreas Kling
064e46e581 Kernel: Don't allow open() with (O_CREAT | O_DIRECTORY) 2020-01-03 03:16:29 +01:00
Andreas Kling
15f3abc849 Kernel: Handle O_DIRECTORY in VFS::open() instead of in each syscall
Just taking care of some FIXMEs.
2020-01-03 03:16:29 +01:00
Andreas Kling
05653a9189 Kernel: killpg() with pgrp=0 should signal every process in the group
In the same group as the calling process, that is.
2020-01-03 03:16:29 +01:00
Andreas Kling
005313df82 Kernel: kill() with signal 0 should not actually send anything
Also kill() with pid 0 should send to everyone in the same process
group as the calling process.
2020-01-03 03:16:29 +01:00
Andreas Kling
8345f51a24 Kernel: Remove unnecessary wraparound check in Process::validate_read()
This will be checked moments later by MM.validate_user_read().
2020-01-03 03:16:29 +01:00
Andreas Kling
fdde5cdf26 Kernel: Don't include the process GID in the "extra GIDs" table
Process::m_extra_gids is for supplementary GIDs only.
2020-01-02 23:45:52 +01:00
Andreas Kling
0f9800ca57 Kernel: Make the loop that marks the bottom 1MB NX a little less busy 2020-01-02 22:02:29 +01:00
Liav A
2a4a19aed8 Kernel: Fixing PCI MMIO access mechanism
During initialization of PCI MMIO access mechanism we ensure that we
have an allocation from the kernel virtual address space that cannot be
taken by other components in the OS.
Also, now we ensure that interrupts are disabled so mapping the region
doesn't fail.
In order to reduce overhead, map_device() will map the requested PCI
address only if it's not mapped already.

The run script has been changed so now we can boot a Q35 machine, that
supports PCI ECAM.
To ensure we will be able to load the machine, a PIIX3 IDE controller
was added to the Q35 machine configuration in the run script.
An AHCI controller was added to the i440fx machine configuration.
2020-01-02 21:45:04 +01:00
joshua stein
5e430e4eb4 Build: add support for building on OpenBSD
This requires gcc8 from ports to build the Toolchain.
2020-01-02 21:03:53 +01:00
Andreas Kling
32ec1e5aed Kernel: Mask kernel addresses in backtraces and profiles
Addresses outside the userspace virtual range will now show up as
0xdeadc0de in backtraces and profiles generated by unprivileged users.
2020-01-02 20:51:31 +01:00
Andreas Kling
8eb20bdfa2 Kernel: Move kernel symbols to /res/kernel.map and make it root-only
Let's lock down access to the kernel symbol table, since it trivializes
learning where the kernel functions are.

Of course, you can just build the same revision yourself locally and
learn the information, but we're taking one step at a time here. :^)
2020-01-02 20:51:31 +01:00
Andreas Kling
9fe316c2d8 Kernel: Add some missing error checks to the setpgid() syscall 2020-01-02 19:40:04 +01:00
Andreas Kling
285130cc55 Kernel: Remove debug spam about marking threads for death 2020-01-02 13:45:22 +01:00
Andreas Kling
7f843ef3b2 Kernel: Make the purge() syscall superuser-only
I don't think we need to give unprivileged users access to what is
essentially a kernel testing mechanism.
2020-01-02 13:39:49 +01:00
Andreas Kling
c01f766fb2 Kernel: writev() should fail with EINVAL if total length > INT32_MAX 2020-01-02 13:01:41 +01:00
Andreas Kling
7f04334664 Kernel: Remove broken implementation of Unix SHM
This code never worked, as was never used for anything. We can build
a much better SHM implementation on top of TmpFS or similar when we
get to the point when we need one.
2020-01-02 12:44:21 +01:00
Andrew Kaster
bc50a10cc9 Kernel: sys$mprotect protects sub-regions as well as whole ones
Split a region into two/three if the desired mprotect range is a strict
subset of an existing region. We can then set the access bits on a new
region that is just our desired range and add both the new
desired subregion and the leftovers back to our page tables.
2020-01-02 12:27:13 +01:00
Andreas Kling
3f7de2713e Kernel: Make mknod() respect the process umask
Otherwise the /bin/mknod command would create world-writable inodes
by default (when run by superuser) which you probably don't want.
2020-01-02 02:40:43 +01:00
Andreas Kling
c7eb3ff1b3 Kernel: mknod() should not allow unprivileged users to create devices
In fact, unless you are superuser, you may only create a regular file,
a named pipe, or a local domain socket. Anything else should EPERM.
2020-01-02 02:36:12 +01:00
Andreas Kling
3dcec260ed Kernel: Validate the full range of user memory passed to syscalls
We now validate the full range of userspace memory passed into syscalls
instead of just checking that the first and last byte of the memory are
in process-owned regions.

This fixes an issue where it was possible to avoid rejection of invalid
addresses that sat between two valid ones, simply by passing a valid
address and a size large enough to put the end of the range at another
valid address.

I added a little test utility that tries to provoke EFAULT in various
ways to help verify this. I'm sure we can think of more ways to test
this but it's at least a start. :^)

Thanks to mozjag for pointing out that this code was still lacking!

Incidentally this also makes backtraces work again.

Fixes #989.
2020-01-02 02:17:12 +01:00
Liav A
e5ffa960d7 Kernel: Create support for PCI ECAM
The new PCI subsystem is initialized during runtime.
PCI::Initializer is supposed to be called during early boot, to
perform a few tests, and initialize the proper configuration space
access mechanism. Kernel boot parameters can be specified by a user to
determine what tests will occur, to aid debugging on problematic
machines.
After that, PCI::Initializer should be dismissed.

PCI::IOAccess is a class that is derived from PCI::Access
class and implements PCI configuration space access mechanism via x86
IO ports.
PCI::MMIOAccess is a class that is derived from PCI::Access
and implements PCI configurtaion space access mechanism via memory
access.

The new PCI subsystem also supports determination of IO/MMIO space
needed by a device by checking a given BAR.
In addition, Every device or component that use the PCI subsystem has
changed to match the last changes.
2020-01-02 00:50:09 +01:00
Liav A
d85874be4b Kernel: Create a basic SMBIOS Decoder
We use DMI decoding now just to determine if PCI is available.
The DMIDecoder is initialized during early boot, thus making it possible
to probe useful data about the machine.

Other purposes are not supported yet.
2020-01-02 00:50:09 +01:00
Liav A
1e1a6a57ed Kernel: Introduce the ACPI subsystem
ACPI subsystem includes 3 types of parsers that are created during
runtime, each one capable of parsing ACPI tables at different level.

ACPIParser is the most basic parser which is essentialy a parser that
can't parse anything useful, due to a user request to disable ACPI
support in a kernel boot parameter.

ACPIStaticParser is a derived class from ACPIParser, which is able to
parse only static data (e.g. FADT, HPET, MCFG and other tables), thus
making it not able to parse AML (ACPI Machine Language) nor to support
handling of hardware events and power management. This type of parser
can be created with a kernel boot parameter.

ACPIDynamicParser is a derived class from ACPIStaticParser, which
includes all the capabilities of the latter, but *should* implement an
AML interpretation, (by building the ACPI AML namespace) and handling
power & hardware events. Currently the methods to support AML
interpretation are not implemented.
This type of parser is created automatically during runtime if the user
didn't specify a boot parameter related to ACPI initialization.

Also, adding strncmp function definition in StdLib.h, to be able to use
it in ACPIStaticParser class.
2020-01-02 00:50:09 +01:00
joshua stein
5fa0291a05 Build: fix building Kernel/TestModule object 2020-01-01 23:33:03 +01:00
Andreas Kling
1d94b5eb04 Kernel: Add a random offset to kernel stacks upon syscall entry
When entering the kernel from a syscall, we now insert a small bit of
stack padding after the RegisterDump. This makes kernel stacks less
deterministic across syscalls and may make some bugs harder to exploit.

Inspired by Elena Reshetova's talk on kernel stack exploitation.
2020-01-01 23:21:24 +01:00
Andreas Kling
ea1911b561 Kernel: Share code between Region::map() and Region::remap_page()
These were doing mostly the same things, so let's just share the code.
2020-01-01 19:32:55 +01:00
Andreas Kling
38f93ef13b Kernel: Disable x86 RDTSC instruction in userspace
It's still possible to read the TSC via the read_tsc() syscall, but we
will now clear some of the bottom bits for unprivileged users.
2020-01-01 18:22:20 +01:00
Andrew Kaster
b6590b7f83 Demos: Add a dynamic linking demo to show off dlfcn methods
The LinkDemo program calls dlopen/dlsym/dlclose to try and load
a dyanmic library from /usr/lib. It read a global variable and
calls a global function (extern "C" of course :) ).

There a few hacks left in the LinkLib dynamic library, however.
In order to get the linker to stop complaining, we have to use
-nostartfiles -ffreestanding otherwise it will link crt0.o to our
shared object, which is definitely not right as the _init function
for a main program (that calls main) is not suitable for our lib
2020-01-01 17:48:41 +01:00
Andreas Kling
f598bbbb1d Kernel: Prevent executing I/O instructions in userspace
All threads were running with iomapbase=0 in their TSS, which the CPU
interprets as "there's an I/O permission bitmap starting at offset 0
into my TSS".

Because of that, any bits that were 1 inside the TSS would allow the
thread to execute I/O instructions on the port with that bit index.

Fix this by always setting the iomapbase to sizeof(TSS32), and also
setting the TSS descriptor's limit to sizeof(TSS32), effectively making
the I/O permissions bitmap zero-length.

This should make it no longer possible to do I/O from userspace. :^)
2020-01-01 17:31:41 +01:00
Andreas Kling
37329c2009 Kernel: Fix typo in Descriptor::set_limit()
x86 descriptor limits are 20 bytes, not 24 bytes. This was already
a 4-bit wide bitfield, so no damage done, but let's be correct.
2020-01-01 17:21:43 +01:00
Andreas Kling
fd740829d1 Kernel: Switch to eagerly restoring x86 FPU state on context switch
Lazy FPU restore is well known to be vulnerable to timing attacks,
and eager restore is a lot simpler anyway, so let's just do it eagerly.
2020-01-01 16:54:21 +01:00
Andreas Kling
9c0836ce97 Kernel: Enable x86 UMIP (User Mode Instruction Prevention) if supported
This prevents code running outside of kernel mode from using the
following instructions:

* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register

There's no need for userspace to be able to use these instructions so
let's just disable them to prevent information leakage.
2020-01-01 13:21:15 +01:00
Andreas Kling
5aeaab601e Kernel: Move CPU feature detection to Arch/x86/CPU.{cpp.h}
We now refuse to boot on machines that don't support PAE since all
of our paging code depends on it.

Also let's only enable SSE and PGE support if the CPU advertises it.
2020-01-01 12:57:00 +01:00
Andreas Kling
8602fa5b49 Kernel: Enable x86 SMEP (Supervisor Mode Execution Protection)
This prevents the kernel from jumping to code in userspace memory.
2020-01-01 01:59:52 +01:00
Andreas Kling
14cdd3fdc1 Kernel: Make module_load() and module_unload() be superuser-only
These should just fail with EPERM if you're not the superuser.
2020-01-01 00:46:08 +01:00
Tibor Nagy
624116a8b1 Kernel: Implement AltGr key support 2019-12-31 19:31:42 +01:00
Andreas Kling
36f1de3c89 Kernel: Pointer range validation should fail on wraparound
Let's reject address ranges that wrap around the 2^32 mark.
2019-12-31 18:23:17 +01:00
Andreas Kling
903b159856 Kernel: Write address validation was only checking end of write range
Thanks to yyyyyyy for finding the bug! :^)
2019-12-31 18:18:54 +01:00
Andreas Kling
d8ef13a426 ProcFS: Supervisor-only inodes should be owned by UID 0, GID 0 2019-12-31 13:22:43 +01:00
Andreas Kling
c9ec415e2f Kernel: Always reject never-userspace addresses before checking regions
At the moment, addresses below 8MB and above 3GB are never accessible
to userspace, so just reject them without even looking at the current
process's memory regions.
2019-12-31 03:45:54 +01:00
Andreas Kling
3f254bfbc8 Kernel+ping: Only allow superuser to create SOCK_RAW sockets
/bin/ping is now setuid-root, and will drop privileges immediately
after opening a raw socket.
2019-12-31 01:42:34 +01:00
Andreas Kling
9af054af9e ProcFS: Reduce the amount of info accessible to non-superusers
This patch hardens /proc a bit by making many things only accessible
to UID 0, and also disallowing access to /proc/PID/ for anyone other
than the UID of that process (and superuser, obviously.)
2019-12-31 01:32:27 +01:00
Andreas Kling
54d182f553 Kernel: Remove some unnecessary leaking of kernel pointers into dmesg
There's a lot more of this and we need to stop printing kernel pointers
anywhere but the debug console.
2019-12-31 01:22:00 +01:00
Andreas Kling
66d5ebafa6 Kernel: Let's also not consider kernel regions to be valid user stacks
This one is less obviously exploitable than the previous one, but still
a bug nonetheless.
2019-12-31 00:28:14 +01:00
Andreas Kling
0fc24fe256 Kernel: User pointer validation should reject kernel-only addresses
We were happily allowing syscalls with pointers into kernel-only
regions (virtual address >= 0xc0000000).

This patch fixes that by only considering user regions in the current
process, and also double-checking the Region::is_user_accessible() flag
before approving an access.

Thanks to Fire30 for finding the bug! :^)
2019-12-31 00:24:35 +01:00
Andreas Kling
a69734bf2e Kernel: Also add a process boosting mechanism
Let's also have set_process_boost() for giving all threads in a process
the same boost.
2019-12-30 20:10:00 +01:00
Andreas Kling
610f3ad12f Kernel: Add a basic thread boosting mechanism
This patch introduces a syscall:

    int set_thread_boost(int tid, int amount)

You can use this to add a permanent boost value to the effective thread
priority of any thread with your UID (or any thread in the system if
you are the superuser.)

This is quite crude, but opens up some interesting opportunities. :^)
2019-12-30 19:23:13 +01:00
Andreas Kling
50677bf806 Kernel: Refactor scheduler to use dynamic thread priorities
Threads now have numeric priorities with a base priority in the 1-99
range.

Whenever a runnable thread is *not* scheduled, its effective priority
is incremented by 1. This is tracked in Thread::m_extra_priority.
The effective priority of a thread is m_priority + m_extra_priority.

When a runnable thread *is* scheduled, its m_extra_priority is reset to
zero and the effective priority returns to base.

This means that lower-priority threads will always eventually get
scheduled to run, once its effective priority becomes high enough to
exceed the base priority of threads "above" it.

The previous values for ThreadPriority (Low, Normal and High) are now
replaced as follows:

    Low -> 10
    Normal -> 30
    High -> 50

In other words, it will take 20 ticks for a "Low" priority thread to
get to "Normal" effective priority, and another 20 to reach "High".

This is not perfect, and I've used some quite naive data structures,
but I think the mechanism will allow us to build various new and
interesting optimizations, and we can figure out better data structures
later on. :^)
2019-12-30 18:46:17 +01:00