1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-19 23:45:10 +00:00
Commit graph

74 commits

Author SHA1 Message Date
Tom
1d621ab172 Kernel: Some futex improvements
This adds support for FUTEX_WAKE_OP, FUTEX_WAIT_BITSET, FUTEX_WAKE_BITSET,
FUTEX_REQUEUE, and FUTEX_CMP_REQUEUE, as well well as global and private
futex and absolute/relative timeouts against the appropriate clock. This
also changes the implementation so that kernel resources are only used when
a thread is blocked on a futex.

Global futexes are implemented as offsets in VMObjects, so that different
processes can share a futex against the same VMObject despite potentially
being mapped at different virtual addresses.
2021-01-17 20:30:31 +01:00
Andreas Kling
43109f9614 Kernel: Remove unused syscall sys$minherit()
This is no longer used. We can bring it back the day we need it.
2021-01-16 14:52:04 +01:00
Tom
c630669304 Kernel: If a VMObject is shared, broadcast page remappings
If we remap pages (e.g. lazy allocation) inside a VMObject that is
shared among more than one region, broadcast it to any other region
that may be mapping the same page.
2021-01-02 20:56:35 +01:00
Andreas Kling
5dae85afe7 Kernel: Pass "shared" flag to Region constructor
Before this change, we would sometimes map a region into the address
space with !is_shared(), and then moments later call set_shared(true).

I found this very confusing while debugging, so this patch makes us pass
the initial shared flag to the Region constructor, ensuring that it's in
the correct state by the time we first map the region.
2021-01-02 16:57:31 +01:00
Tom
2f429bd2d5 Kernel: Pass new region owner to Region::clone 2021-01-01 23:43:44 +01:00
Tom
476f17b3f1 Kernel: Merge PurgeableVMObject into AnonymousVMObject
This implements memory commitments and lazy-allocation of committed
memory.
2021-01-01 23:43:44 +01:00
Tom
b2a52f6208 Kernel: Implement lazy committed page allocation
By designating a committed page pool we can guarantee to have physical
pages available for lazy allocation in mappings. However, when forking
we will overcommit. The assumption is that worst-case it's better for
the fork to die due to insufficient physical memory on COW access than
the parent that created the region. If a fork wants to ensure that all
memory is available (trigger a commit) then it can use madvise.

This also means that fork now can gracefully fail if we don't have
enough physical pages available.
2021-01-01 23:43:44 +01:00
Tom
c3451899bc Kernel: Add MAP_NORESERVE support to mmap
Rather than lazily committing regions by default, we now commit
the entire region unless MAP_NORESERVE is specified.

This solves random crashes in low-memory situations where e.g. the
malloc heap allocated memory, but using pages that haven't been
used before triggers a crash when no more physical memory is available.

Use this flag to create large regions without actually committing
the backing memory. madvise() can be used to commit arbitrary areas
of such regions after creating them.
2021-01-01 23:43:44 +01:00
Tom
bc5d6992a4 Kernel: Memory purging improvements
This adds the ability for a Region to define volatile/nonvolatile
areas within mapped memory using madvise(). This also means that
memory purging takes into account all views of the PurgeableVMObject
and only purges memory that is not needed by all of them. When calling
madvise() to change an area to nonvolatile memory, return whether
memory from that area was purged. At that time also try to remap
all memory that is requested to be nonvolatile, and if insufficient
pages are available notify the caller of that fact.
2021-01-01 23:43:44 +01:00
Andreas Kling
30dbe9c78a Kernel+LibC: Add a very limited sys$mremap() implementation
This syscall can currently only remap a shared file-backed mapping into
a private file-backed mapping.
2020-12-29 02:20:43 +01:00
Ben Wiederhake
64cc3f51d0 Meta+Kernel: Make clang-format-10 clean 2020-09-25 21:18:17 +02:00
Tom
c8d9f1b9c9 Kernel: Make copy_to/from_user safe and remove unnecessary checks
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.

So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.

To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.

Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
2020-09-13 21:19:15 +02:00
Tom
bf268a0185 Kernel: Handle committing pages in regions more gracefully
Sometimes a physical underlying page may be there, but we may be
unable to allocate a page table that may be needed to map it. Bubble
up such mapping errors so that they can be handled more appropriately.
2020-09-02 00:35:56 +02:00
Andreas Kling
949aef4aef Kernel: Move syscall implementations out of Process.cpp
This is something I've been meaning to do for a long time, and here we
finally go. This patch moves all sys$foo functions out of Process.cpp
and into files in Kernel/Syscalls/.

It's not exactly one syscall per file (although it could be, but I got
a bit tired of the repetitive work here..)

This makes hacking on individual syscalls a lot less painful since you
don't have to rebuild nearly as much code every time. I'm also hopeful
that this makes it easier to understand individual syscalls. :^)
2020-07-30 23:40:57 +02:00
Tom
06d50f64b0 Kernel: Aggregate TLB flush requests for Regions for SMP
Rather than sending one TLB flush request for each page,
aggregate them so that we're not spamming the other
processors with FlushTLB IPIs.
2020-07-06 22:39:06 +02:00
Tom
d98edb3171 Kernel: List all CPUs in /proc/cpuinfo 2020-07-01 12:07:01 +02:00
Tom
841364b609 Kernel: Add mechanism to identity map the lowest 2MB 2020-06-04 18:15:23 +02:00
Andreas Kling
6fe83b0ac4 Kernel: Crash the current process on OOM (instead of panicking kernel)
This patch adds PageFaultResponse::OutOfMemory which informs the fault
handler that we were unable to allocate a necessary physical page and
cannot continue.

In response to this, the kernel will crash the current process. Because
we are OOM, we can't symbolicate the crash like we normally would
(since the ELF symbolication code needs to allocate), so we also
communicate to Process::crash() that we're out of memory.

Now we can survive "allocate 300 MB" (only the allocate process dies.)
This is definitely not perfect and can easily end up killing a random
innocent other process who happened to allocate one page at the wrong
time, but it's a *lot* better than panicking on OOM. :^)
2020-05-06 22:28:23 +02:00
Andreas Kling
9c856811b2 Kernel: Add Region helpers for accessing underlying physical pages
Since a Region is basically a view into a potentially larger VMObject,
it was always necessary to include the Region starting offset when
accessing its underlying physical pages.

Until now, you had to do that manually, but this patch adds a simple
Region::physical_page() for read-only access and a physical_page_slot()
when you want a mutable reference to the RefPtr<PhysicalPage> itself.

A lot of code is simplified by making use of this.
2020-04-28 17:05:14 +02:00
Itamar
b306ac9b2b ptrace: Add PT_POKE
PT_POKE writes a single word to the tracee's address space.

Some caveats:
- If the user requests to write to an address in a read-only region, we
temporarily change the page's protections to allow it.

- If the user requests to write to a region that's backed by a
SharedInodeVMObject, we replace the vmobject with a PrivateIndoeVMObject.
2020-04-13 00:53:22 +02:00
Andreas Kling
c19b56dc99 Kernel+LibC: Add minherit() and MAP_INHERIT_ZERO
This patch adds the minherit() syscall originally invented by OpenBSD.
Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap
region, that region will be zeroed out on fork().
2020-04-12 20:22:26 +02:00
Andreas Kling
88b334135b Kernel: Remove some Region construction helpers
It's now up to the caller to provide a VMObject when constructing a new
Region object. This will make it easier to handle things going wrong,
like allocation failures, etc.
2020-03-01 11:23:10 +01:00
Andreas Kling
aa1e209845 Kernel: Remove some unnecessary indirection in InodeFile::mmap()
InodeFile now directly calls Process::allocate_region_with_vmobject()
instead of taking an awkward detour via a special Region constructor.
2020-02-28 20:29:14 +01:00
Andreas Kling
30a8991dbf Kernel: Make Region weakable and use WeakPtr<Region> instead of Region*
This turns use-after-free bugs into null pointer dereferences instead.
2020-02-24 13:32:45 +01:00
Andreas Kling
f17c377a0c Kernel: Use bitfields in Region
This makes Region 4 bytes smaller and we can use bitfield initializers
since they are allowed in C++20. :^)
2020-02-19 12:03:11 +01:00
Andreas Kling
1d611e4a11 Kernel: Reduce header dependencies of MemoryManager and Region 2020-02-16 01:33:41 +01:00
Andreas Kling
a356e48150 Kernel: Move all code into the Kernel namespace 2020-02-16 01:27:42 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Liav A
d2b41010c5 Kernel: Change Region allocation helpers
We now can create a cacheable Region, so when map() is called, if a
Region is cacheable then all the virtual memory space being allocated
to it will be marked as not cache disabled.

In addition to that, OS components can create a Region that will be
mapped to a specific physical address by using the appropriate helper
method.
2020-01-14 15:38:58 +01:00
Andreas Kling
197e73ee31 Kernel+LibELF: Enable SMAP protection during non-syscall exec()
When loading a new executable, we now map the ELF image in kernel-only
memory and parse it there. Then we use copy_to_user() when initializing
writable regions with data from the executable.

Note that the exec() syscall still disables SMAP protection and will
require additional work. This patch only affects kernel-originated
process spawns.
2020-01-10 10:57:06 +01:00
Andreas Kling
ea1911b561 Kernel: Share code between Region::map() and Region::remap_page()
These were doing mostly the same things, so let's just share the code.
2020-01-01 19:32:55 +01:00
Andreas Kling
0d5e0e4cad Kernel+SystemMonitor: Expose amount of per-process dirty private memory
Dirty private memory is all memory in non-inode-backed mappings that's
process-private, meaning it's not shared with any other process.

This patch exposes that number via SystemMonitor, giving us an idea of
how much memory each process is responsible for all on its own.
2019-12-29 12:28:32 +01:00
Andreas Kling
7a0088c4d2 Kernel: Clean up Region access bit setters a little 2019-12-25 02:58:03 +01:00
Andreas Kling
b6ee8a2c8d Kernel: Rename vmo => vmobject everywhere 2019-12-19 19:15:27 +01:00
Andreas Kling
1d4d6f16b2 Kernel: Add a specific-page variant of Region::commit() 2019-12-18 22:43:32 +01:00
Andreas Kling
931e4b7f5e Kernel+SystemMonitor: Prevent userspace access to process ELF image
Every process keeps its own ELF executable mapped in memory in case we
need to do symbol lookup (for backtraces, etc.)

Until now, it was mapped in a way that made it accessible to the
program, despite the program not having mapped it itself.
I don't really see a need for userspace to have access to this right
now, so let's lock things down a little bit.

This patch makes it inaccessible to userspace and exposes that fact
through /proc/PID/vm (per-region "user_accessible" flag.)
2019-12-15 20:11:57 +01:00
Andreas Kling
3fbc50a350 Kernel+SystemMonitor: Expose the number of set CoW bits in each Region
This number tells us how many more pages in a given region will trigger
a CoW fault if written to.
2019-12-15 16:53:00 +01:00
Andreas Kling
f41ae755ec Kernel: Crash on memory access in non-readable regions
This patch makes it possible to make memory regions non-readable.
This is enforced using the "present" bit in the page tables.
A process that hits an not-present page fault in a non-readable
region will be crashed.
2019-12-02 19:18:52 +01:00
Andreas Kling
7dc9c90f83 Kernel: Fix bug where mprotect() would ignore setting PROT_WRITE
A typo in Region::set_writable() caused us to update the readable flag
rather than the writable flag.
2019-12-02 18:15:36 +01:00
Andreas Kling
3dc87be891 Kernel: Mark mmap()-created regions with a special bit
Then only allow regions with that bit to be manipulated via munmap()
and mprotect(). This prevents messing with non-mmap()ed regions in
a process's address space (stacks, shared buffers, ...)
2019-11-24 12:26:21 +01:00
Andreas Kling
794758df3a Kernel: Implement some basic stack pointer validation
VM regions can now be marked as stack regions, which is then validated
on syscall, and on page fault.

If a thread is caught with its stack pointer pointing into anything
that's *not* a Region with its stack bit set, we'll crash the whole
process with SIGSTKFLT.

Userspace must now allocate custom stacks by using mmap() with the new
MAP_STACK flag. This mechanism was first introduced in OpenBSD, and now
we have it too, yay! :^)
2019-11-17 12:15:43 +01:00
Andreas Kling
d67c6a92db Kernel: Move page fault handling from MemoryManager to Region
After the page fault handler has found the region in which the fault
occurred, do the rest of the work in the region itself.

This patch also makes all fault types consistently crash the process
if a new page is needed but we're all out of pages.
2019-11-04 00:47:03 +01:00
Andreas Kling
0e8f1d7cb6 Kernel: Don't expose a region's page directory to the outside world
Now that region manages its own mapping/unmapping, there's no need for
the outside world to be able to grab at its page directory.
2019-11-04 00:26:00 +01:00
Andreas Kling
6ed9cc4717 Kernel: Remove Region API's for setting/unsetting the page directory
This is done implicitly by mapping or unmapping the region.
2019-11-04 00:24:20 +01:00
Andreas Kling
e3dda4e87b Kernel: Fix weird Region constructor that took nullable RefPtr<Inode>
It's never valid to construct a Region with a null Inode pointer using
this constructor, so just take a NonnullRefPtr<Inode> instead.
2019-11-04 00:21:08 +01:00
Andreas Kling
4bf1a72d21 Kernel: Teach Region how to remap itself
Now remapping (i.e flushing kernel metadata to the CPU page tables)
is done by simply calling Region::remap().
2019-11-03 21:11:08 +01:00
Andreas Kling
3dce0f23f4 Kernel: Regions should be mapped into a PageDirectory, not a Process
This patch changes the parameter to Region::map() to be a PageDirectory
since that matches how we think about the memory model:

Regions are views onto VMObjects, and are mapped into PageDirectories.
Each Process has a PageDirectory. The kernel also has a PageDirectory.
2019-11-03 21:11:08 +01:00
Andreas Kling
2cfc43c982 Kernel: Move region map/unmap operations into the Region class
The more Region can take care of itself, the better.
2019-11-03 21:11:08 +01:00
Andreas Kling
fe455c5ac4 Kernel: Move page remapping into Region::remap_page(index)
Let Region deal with this, instead of everyone calling MemoryManager.
2019-11-03 15:32:11 +01:00
Andreas Kling
d481ae95b5 Kernel: Defer creation of Region CoW bitmaps until they're needed
Instead of allocating and populating a Copy-on-Write bitmap for each
Region up front, wait until we actually clone the Region for sharing
with another process.

In most cases, we never need any CoW bits and we save ourselves a lot
of kmalloc() memory and time.
2019-10-01 19:58:41 +02:00