When we hit a COW fault and discover that no other process is sharing
the physical page, we simply remap it r/w and save ourselves the
trouble. When this happens, we can also give back (uncommit) one of our
shared committed COW pages, since we won't be needing it.
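A rough sketch of that fast path (names here are illustrative, not the exact kernel signatures):

    // Illustrative sketch of the COW fault fast path:
    PageFaultResponse Region::handle_cow_fault(size_t page_index)
    {
        auto& page = physical_page(page_index);
        if (page.ref_count() == 1) {
            // Nobody else holds this physical page, so no copy is
            // needed: remap it read/write and return one committed
            // page to the pool, since the copy it was reserved for
            // will never happen.
            remap_vmobject_page(page_index, /* writable: */ true);
            vmobject().return_one_committed_cow_page();
            return PageFaultResponse::Continue;
        }
        // Otherwise, fall through to the actual copy path.
        return copy_page_and_remap(page_index);
    }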
We had this optimization before, but I mistakenly removed it in
50472fd69f since I had misunderstood
it to be the reason for a panic.
We currently overcommit for COW when forking a process and cloning its
memory regions. Both the parent and child process share a set of
committed COW pages.
If there's COW sharing across more than two processes within a lineage
(e.g. parent, child & grandchild), it's possible to exhaust these pages.
When the shared set is emptied, the next COW fault in each process must
detach from the shared set and fall back to on-demand allocation.
This patch makes sure that we detach from the shared set once we
discover it to be empty (during COW fault handling). This fixes an issue
where we'd try to allocate from an exhausted shared set while building
GNU binutils inside SerenityOS.
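Sketched in code, the detach looks roughly like this (illustrative names):

    RefPtr<PhysicalPage> AnonymousVMObject::allocate_cow_page()
    {
        if (m_shared_committed_cow_pages && m_shared_committed_cow_pages->is_empty()) {
            // Another process in the lineage took the last committed
            // page; detach so we never draw from an empty set again.
            m_shared_committed_cow_pages = nullptr;
        }
        if (m_shared_committed_cow_pages) {
            // Cannot fail: these pages were committed up front.
            return m_shared_committed_cow_pages->take_one();
        }
        // On-demand fallback; this can fail under memory pressure.
        return MM.allocate_user_physical_page();
    }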
It was doing a bunch of things it didn't need to do. I think we had
misunderstood the base class as copying m_lock in its copy
constructor, but m_lock is actually default-initialized.
We had issues with committed physical pages getting miscounted in some
situations, and instead of figuring out what was going wrong and making
sure all the commits had matching uncommits, this patch makes the
problem go away by adding an RAII class to manage this instead. :^)
MemoryManager::commit_user_physical_pages() now returns an (optional)
CommittedPhysicalPageSet. You can then allocate pages from the page set
by calling take_one() on it. Any unallocated pages are uncommitted upon
destruction of the page set.
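The rough shape of the class (a simplified sketch; details in the actual implementation differ):

    class CommittedPhysicalPageSet {
    public:
        explicit CommittedPhysicalPageSet(size_t page_count)
            : m_page_count(page_count)
        {
        }

        ~CommittedPhysicalPageSet()
        {
            // RAII: pages we committed but never allocated are given
            // back automatically, so commits can no longer leak.
            if (m_page_count)
                MM.uncommit_user_physical_pages(m_page_count);
        }

        NonnullRefPtr<PhysicalPage> take_one()
        {
            VERIFY(m_page_count > 0);
            --m_page_count;
            // Drawing from already-committed pages cannot fail.
            return MM.allocate_committed_user_physical_page();
        }

    private:
        size_t m_page_count { 0 };
    };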
When cloning an AnonymousVMObject, committed COW pages are shared
between the parent and child object. Whichever object COW's first will
take the shared committed page, and if the other object ends up doing
a COW as well, it will notice that the page is no longer shared by
two objects and simply remap it as read/write.
When a child is COW'ed again, while still having shared committed
pages with its own parent, the grandchild object will now join in the
sharing pool with its parent and grandparent. This means that the first
2 of 3 objects that COW will draw from the shared committed pages, and
the 3rd will remap read/write.
Previously, we would "fork" the shared committed pages when cloning,
which could lead to a situation where the grandparent held on to 1 of
the 3 needed shared committed pages. If both the child and grandchild
COW'ed, they wouldn't have enough pages, and since the grandparent
maintained an extra +1 ref count on the page, it wasn't possible
to remap read/write.
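As a sketch (illustrative names), the clone path now shares a single pool across the lineage instead of splitting it:

    RefPtr<AnonymousVMObject> AnonymousVMObject::try_clone()
    {
        // Lazily create the shared set on the first clone...
        if (!m_shared_committed_cow_pages)
            m_shared_committed_cow_pages = SharedCommittedCowPages::try_create(page_count());
        if (!m_shared_committed_cow_pages)
            return nullptr;
        auto clone = adopt_ref_if_nonnull(new (nothrow) AnonymousVMObject(*this));
        if (!clone)
            return nullptr;
        // ...and let every generation point at the *same* set, so no
        // ancestor can strand pages that descendants still need.
        clone->m_shared_committed_cow_pages = m_shared_committed_cow_pages;
        return clone;
    }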
When we get a COW fault and discover that whoever we were COW'ing
together with has either COW'ed that page on their end or has
unmapped/exited, we simplify life for ourselves by clearing the COW
bit and keeping the page we already have. (No need to COW if the page
is not shared!)
The act of doing this does not return a committed page to the pool.
In fact, that committed page we had reserved for this purpose was used
up (allocated) by our COW buddy when they COW'ed the page.
This fixes a kernel panic when running TestLibCMkTemp. :^)
We don't need an entirely separate VMObject subclass to influence the
location of the physical pages.
Instead, we simply allocate enough physically contiguous memory first,
and then pass it to the AnonymousVMObject constructor that takes a span
of physical pages.
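A sketch of the replacement approach (illustrative names):

    // Instead of a dedicated subclass: grab physically contiguous
    // pages first, then build a plain AnonymousVMObject over them.
    auto physical_pages = MM.allocate_contiguous_user_physical_pages(size);
    if (physical_pages.is_empty())
        return ENOMEM;
    auto vmobject = AnonymousVMObject::try_create_with_physical_pages(physical_pages.span());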
If a purgeable VM object is in the "volatile" state when we're asked
to make a COW clone of it, make life simpler by simply "purging"
the cloned object right away.
This effectively means that a fork()'ed child process will discover
its purgeable+volatile regions to be empty if/when it tries making
them non-volatile.
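The early-out at the top of the clone path, sketched (illustrative names):

    if (is_purgeable() && is_volatile()) {
        // The contents are expendable by definition, so clone straight
        // to an empty object that reports itself as already purged.
        auto clone = try_create_purgeable_with_size(size());
        if (clone)
            clone->set_was_purged();
        return clone;
    }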
This patch changes the semantics of purgeable memory.
- AnonymousVMObject now has a "purgeable" flag. It can only be set when
constructing the object. (Previously, all anonymous memory was
effectively purgeable.)
- AnonymousVMObject now has a "volatile" flag. It covers the entire
range of physical pages. (Previously, we tracked ranges of volatile
pages, effectively making it a page-level concept.)
- Non-volatile objects maintain a physical page reservation via the
committed pages mechanism, to ensure full coverage for page faults.
- When an object is made volatile, it relinquishes any unused committed
pages immediately. If later made non-volatile again, we then attempt
to make a new committed pages reservation. If this fails, we return
ENOMEM to userspace.
mmap() now creates purgeable objects if passed the MAP_PURGEABLE option
together with MAP_ANONYMOUS. anon_create() memory is always purgeable.
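Hypothetical userspace usage under these semantics (the exact madvise return-value conventions here are an assumption):

    #include <errno.h>
    #include <sys/mman.h>

    void* ptr = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_PRIVATE | MAP_PURGEABLE, -1, 0);

    // Mark the contents expendable; the kernel may purge them and
    // immediately gives back the committed-page reservation:
    madvise(ptr, size, MADV_SET_VOLATILE);

    // Reclaim the data. This re-reserves committed pages, so it can
    // fail with ENOMEM:
    int rc = madvise(ptr, size, MADV_SET_NONVOLATILE);
    if (rc < 0 && errno == ENOMEM) {
        // Couldn't re-commit; the region stays volatile.
    } else if (rc == 1) {
        // Assumed convention: pages were purged while volatile,
        // so the old contents are gone.
    }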
This patch greatly simplifies VMObject locking by doing two things:
1. Giving VMObject an IntrusiveList of all its mapping Region objects.
2. Removing VMObject::m_paging_lock in favor of VMObject::m_lock
Before (1), VMObject::for_each_region() was forced to acquire the
global MM lock (since it worked by walking MemoryManager's list of
all regions and checking which ones pointed back to the VMObject.)
With each VMObject having its own list of Regions, VMObject's own
m_lock is all we need.
Before (2), page fault handlers used a separate mutex for preventing
overlapping work. This design required multiple temporary unlocks
and was generally extremely hard to reason about.
Instead, page fault handlers now use VMObject's own m_lock as well.
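With the intrusive list in place, iteration becomes roughly this (sketch; exact lock type names vary by kernel era):

    template<typename Callback>
    void VMObject::for_each_region(Callback callback)
    {
        ScopedSpinLock lock(m_lock); // no global MM lock required
        for (auto& region : m_regions)
            callback(region);
    }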
Since we're taking from the committed set of pages, there should never
be a reason for this call to fail.
Also add a Badge to disallow taking committed pages from anywhere but
the Region class.
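The Badge pattern in a nutshell: a Badge<Region> can only be constructed by Region, so adding one as a parameter restricts callers to Region code (the method name below is illustrative):

    // In AnonymousVMObject:
    NonnullRefPtr<PhysicalPage> allocate_committed_page(Badge<Region>, size_t page_index);

    // In Region -- `{}` compiles only because Region may mint Badge<Region>:
    auto page = vmobject().allocate_committed_page({}, page_index);

Note the NonnullRefPtr return type: since the pages were committed ahead of time, the allocation is infallible.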
We don't need to have a dedicated API for creating a VMObject with a
single page; the multi-page API works in all cases.
Also make the API take a Span<NonnullRefPtr<PhysicalPage>> instead of
a NonnullRefPtrVector<PhysicalPage>.
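In other words (sketch):

    // Before: two entry points, one of them owning a vector.
    static NonnullRefPtr<AnonymousVMObject> create_with_physical_page(PhysicalPage&);
    static NonnullRefPtr<AnonymousVMObject> create_with_physical_pages(NonnullRefPtrVector<PhysicalPage>);

    // After: one entry point taking a non-owning span; a single page
    // is simply a one-element span.
    static NonnullRefPtr<AnonymousVMObject> create_with_physical_pages(Span<NonnullRefPtr<PhysicalPage>>);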
Instead of each PhysicalPage knowing whether it comes from the
supervisor pages or from the user pages, we can just check in both
sets when freeing a page.
It's just a handful of pointer range checks, nothing expensive.
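Sketched (illustrative names):

    void MemoryManager::deallocate_physical_page(PhysicalAddress paddr)
    {
        for (auto& region : m_user_physical_regions) {
            if (region.contains(paddr))
                return region.return_page(paddr);
        }
        for (auto& region : m_super_physical_regions) {
            if (region.contains(paddr))
                return region.return_page(paddr);
        }
        VERIFY_NOT_REACHED();
    }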
This commit converts naked `new`s to `AK::try_make` and `AK::try_create`
wherever possible. If the called constructor is private, this cannot be
done, so we instead now use the standard-defined and compiler-agnostic
`new (nothrow)`.
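For example (sketch; `Thing` is a stand-in type, and `try_make` here returns a null OwnPtr on allocation failure):

    // Before: a naked `new` that cannot report failure gracefully.
    auto thing = adopt_own(*new Thing(arg));

    // After, with a public constructor:
    auto thing = AK::try_make<Thing>(arg); // OwnPtr, null on OOM

    // After, with a private constructor:
    auto thing = adopt_own_if_nonnull(new (nothrow) Thing(arg));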
Propagate allocation failure of m_shared_committed_cow_pages,
and uncommit previously committed COW pages on failure.
This method needs a closer look in terms of error handling, as we
will eventually need to roll back all changes on allocation failure.
Alternatively we could allocate the anonymous object much earlier
and only initialize it once the other steps have succeeded.
By constraining two implementations, the compiler will select the best
fitting one. All this requires is duplicating the implementation and
simplifying it for the `void` case.
This constraining also informs both the caller and compiler by passing
the callback parameter types as part of the constraint
(e.g.: `IterationFunction<int>`).
Some `for_each` functions in LibELF only take functions which return
`void`. This is a minimal correctness check, as it removes one way for a
callback to do its work incompletely.
There also seems to be an emerging idiom where, inside such a lambda,
`return;` acts the same as `continue;` in a for-loop.
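A sketch of the pattern, with AK-style concepts (the `Symbol` iterator shown here is a stand-in):

    template<IterationFunction<Symbol&> F>
    void for_each_symbol(F callback)
    {
        for (auto& symbol : symbols())
            if (callback(symbol) == IterationDecision::Break)
                return;
    }

    template<VoidFunction<Symbol&> F>
    void for_each_symbol(F callback)
    {
        // Duplicated and simplified for the `void` case. Inside the
        // callback, a bare `return;` behaves like `continue;` would
        // in a hand-written for-loop.
        for (auto& symbol : symbols())
            callback(symbol);
    }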
AnonymousVMObject::create_with_physical_page(s) can't return NonnullRefPtr
as it allocates internally. Fixing the API then surfaced an issue in
ScatterGatherList, where the code was attempting to create an
AnonymousVMObject in the constructor, where an allocation failure
would not be observable under OOM.
Fix all of these issues and start propagating errors at the callers
of the AnonymousVMObject and ScatterGatherList APIs.
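The calling pattern after the fix, roughly (illustrative):

    auto vmobject = AnonymousVMObject::try_create_with_physical_pages(pages);
    if (!vmobject)
        return ENOMEM; // propagate OOM instead of failing invisibly
    auto list = ScatterGatherList::try_create(request, move(pages), device_block_size);
    if (!list)
        return ENOMEM;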
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
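After the change, a typical file header looks like this (SerenityOS is BSD-2-Clause licensed):

    /*
     * Copyright (c) 2021, the SerenityOS developers.
     *
     * SPDX-License-Identifier: BSD-2-Clause
     */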
Instead of keeping AnonymousVMObject::m_cow_map in an OwnPtr<Bitmap>,
just make the Bitmap a regular value member. This increases the size
of the VMObject by 8 bytes, but removes some of the kmalloc/kfree spam
incurred by sys$fork().
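The change, in essence:

    // Before: an extra heap allocation (and pointer chase) per VMObject.
    OwnPtr<Bitmap> m_cow_map;

    // After: the Bitmap header lives inline (+8 bytes in the object),
    // sparing sys$fork() a kmalloc/kfree pair per cloned VMObject.
    Bitmap m_cow_map;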
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
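The intended split, illustrated (the debug-only ASSERT shown here is the hypothetical future macro, not something this change adds):

    VERIFY(index < m_size);   // checked in debug AND release builds
    VERIFY_NOT_REACHED();     // likewise; always panics if reached

    // Hypothetical future debug-only check, compiled out in release:
    // ASSERT(expensive_invariant_holds());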
This was done with the help of several scripts, I dump them here to
easily find them later:
awk '/#ifdef/ { print "#cmakedefine01 "$2 }' AK/Debug.h.in
for debug_macro in $(awk '/#ifdef/ { print $2 }' AK/Debug.h.in)
do
find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/#ifdef '$debug_macro'/#if '$debug_macro'/' {} \;
done
# Remember to remove WRAPPER_GENERATOR_DEBUG from the list.
awk '/#cmake/ { print "set("$2" ON)" }' AK/Debug.h.in
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.
Everything: The modifications in this commit were automatically made
using the following command:
find . -name '*.cpp' -exec sed -i -E 's/dbg\(\) << ("[^"{]*");/dbgln\(\1\);/' {} \;