Lazily committed shared memory was not working in situations where one
process would write to the memory and another would only read from it.
Since the reading process would never cause a write fault in the shared
region, we'd never notice that the writing process had added real
physical pages to the VMObject. This happened because the lazily
committed pages were marked "present" in the page table.
This patch solves the issue by always allocating shared memory up front
and not trying to be clever about it.
By designating a committed page pool we can guarantee to have physical
pages available for lazy allocation in mappings. However, when forking
we will overcommit. The assumption is that worst-case it's better for
the fork to die due to insufficient physical memory on COW access than
the parent that created the region. If a fork wants to ensure that all
memory is available (trigger a commit) then it can use madvise.
This also means that fork now can gracefully fail if we don't have
enough physical pages available.
This adds the ability for a Region to define volatile/nonvolatile
areas within mapped memory using madvise(). This also means that
memory purging takes into account all views of the PurgeableVMObject
and only purges memory that is not needed by all of them. When calling
madvise() to change an area to nonvolatile memory, return whether
memory from that area was purged. At that time also try to remap
all memory that is requested to be nonvolatile, and if insufficient
pages are available notify the caller of that fact.
We need to not only add a record for a reference, but we need
to copy the reference count on fork as well, because the code
in the fork assumes that it has the same amount of references,
still.
Also, once all references are dropped when a process is disowned,
delete the shared buffer.
Fixes#4076
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
This compiles, and contains exactly the same bugs as before.
The regex 'FIXME: PID/' should reveal all markers that I left behind, including:
- Incomplete conversion
- Issues or things that look fishy
- Actual bugs that will go wrong during runtime
By making the Process class RefCounted we don't really need
ProcessInspectionHandle anymore. This also fixes some race
conditions where a Process may be deleted while still being
used by ProcFS.
Also make sure to acquire the Process' lock when accessing
regions.
Last but not least, there's no reason why a thread can't be
scheduled while being inspected, though in practice it won't
happen anyway because the scheduler lock is held at the same
time.
This is something I've been meaning to do for a long time, and here we
finally go. This patch moves all sys$foo functions out of Process.cpp
and into files in Kernel/Syscalls/.
It's not exactly one syscall per file (although it could be, but I got
a bit tired of the repetitive work here..)
This makes hacking on individual syscalls a lot less painful since you
don't have to rebuild nearly as much code every time. I'm also hopeful
that this makes it easier to understand individual syscalls. :^)