We were incorrectly using sizeof(PhysicalPageEntry) for some address
calculations instead of sizeof(PageTableEntry).
It still worked correctly because they happen to be the same size.
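As a minimal illustration (stand-in types, not the kernel's actual definitions), the stride in this kind of calculation has to come from the type the table actually stores:

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in types for illustration only; both happen to be 8 bytes here,
// which is why the mix-up was harmless in practice.
struct PageTableEntry    { uint64_t raw; };
struct PhysicalPageEntry { uint64_t raw; };

// Address of the i-th entry in a page table mapped at `table_base`.
uintptr_t pte_address(uintptr_t table_base, size_t index)
{
    // Correct: stride by the size of the element type stored in the table.
    return table_base + index * sizeof(PageTableEntry);
    // The old code effectively strode by sizeof(PhysicalPageEntry), which
    // only worked because the two structs are the same size.
}
```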
We now keep all the PhysicalZones on one of two intrusive lists within
the PhysicalRegion.
The "usable" list contains all zones that can be allocated from,
and the "full" list contains all zones with no free pages.
Instead of creating a PhysicalRegion and then expanding it over and
over as we traverse the memory map on boot, we now compute the final
size of the contiguous physical range up front, and *then* create a
PhysicalRegion object.
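A sketch of the idea (hypothetical entry types, not the actual memory-map structures): one pass computes the final extent of each contiguous usable range, and only then are the region objects constructed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory-map entry; the real code walks the bootloader's map.
struct MemoryMapEntry {
    uintptr_t base;
    size_t length;
    bool usable;
};

struct ContiguousRange {
    uintptr_t base;
    size_t length;
};

// First pass: coalesce adjacent usable entries into their final extents,
// so each PhysicalRegion can be created once, at its final size.
std::vector<ContiguousRange> compute_contiguous_ranges(std::vector<MemoryMapEntry> const& map)
{
    std::vector<ContiguousRange> ranges;
    for (auto const& entry : map) {
        if (!entry.usable)
            continue;
        if (!ranges.empty() && ranges.back().base + ranges.back().length == entry.base)
            ranges.back().length += entry.length; // extends the previous range
        else
            ranges.push_back({ entry.base, entry.length });
    }
    return ranges;
}
```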
Nobody was using this API to request anything above `PAGE_SIZE`
alignment, so let's get rid of it for now. We can reimplement it if
we end up needing it.
Also note that it wasn't actually used anywhere.
The previous allocator was very naive and kept the state of all pages
in one big bitmap. When allocating, we had to scan through the bitmap
until we found an unset bit.
This patch introduces a new binary buddy allocator that manages the
physical memory pages.
Each PhysicalRegion is divided into zones (PhysicalZone) of 16MB each.
Any extra pages at the end of physical RAM that don't fit into a 16MB
zone are turned into 15 or fewer 1MB zones.
Each zone starts out with one full-sized block, which is then
recursively subdivided into halves upon allocation, until a block of
the requested size can be returned.
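A toy version of one zone's splitting logic (free lists of page indices stand in for the kernel's buddy block state bitmaps; order k means a block of 2^k pages, and a 16 MB zone of 4 KB pages is 4096 pages, i.e. one order-12 block):

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <vector>

constexpr size_t max_order = 12;

class BuddyZone {
public:
    BuddyZone() { m_free_lists[max_order].push_back(0); } // one full-sized block

    std::optional<size_t> allocate(size_t order)
    {
        if (order > max_order)
            return {};
        // Find the smallest free block at least as large as requested.
        size_t k = order;
        while (k <= max_order && m_free_lists[k].empty())
            ++k;
        if (k > max_order)
            return {};
        size_t block = m_free_lists[k].back();
        m_free_lists[k].pop_back();
        // Recursively split off the upper half (the "buddy") until the
        // block is exactly the requested order.
        while (k > order) {
            --k;
            m_free_lists[k].push_back(block + (size_t(1) << k));
        }
        return block; // index of the first page in the allocated block
    }

private:
    std::array<std::vector<size_t>, max_order + 1> m_free_lists {};
};
```

Freeing does the inverse: a freed block is coalesced with its buddy whenever both halves are free, merging back up toward the full-sized block.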
There are more opportunities for improvement here: the way zone objects
are allocated and stored is non-optimal. Same goes for the allocation
of buddy block state bitmaps.
Threads that don't make syscalls still need to be killed, and we can
do that at any time we want, so long as the thread is in user mode and
not somehow blocked (e.g. page fault).
This reverts commit 3c3a1726df.
We cannot blindly kill threads just because they're not executing in a
system call. Being blocked (including in a page fault) needs proper
unblocking and potentially kernel stack cleanup before we can mark a
thread as Dying.
Fixes #8691
This enables the Lock class to block a thread even while the thread is
working on a BlockCondition. A thread can still only be either blocked
by a Lock or a BlockCondition.
This also establishes a linked list of threads that are blocked by a
Lock, and unblocking now unlocks and wakes those threads directly.
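A conceptual sketch of the direct hand-off on unlock (Thread::wake_from_lock() and the std::list are invented stand-ins for the kernel's Thread hooks and intrusive list):

```cpp
#include <list>

struct Thread {
    void wake_from_lock() { /* the scheduler would make the thread runnable */ }
};

class Lock {
public:
    void unlock_and_wake_next()
    {
        if (m_blocked_threads.empty()) {
            m_holder = nullptr;
            return;
        }
        // Hand the lock directly to the first blocked thread and wake only
        // that thread, instead of waking everyone to race for the lock.
        auto* next = m_blocked_threads.front();
        m_blocked_threads.pop_front();
        m_holder = next;
        next->wake_from_lock();
    }

private:
    Thread* m_holder { nullptr };
    std::list<Thread*> m_blocked_threads;
};
```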
It's possible that a timer may have been queued to be executed by
the timer IRQ handler, but if we're in a critical section on the
same processor and are trying to cancel that timer, we would spin
forever waiting for it to be executed.
This re-arranges the order of how things are initialized so that we
try to initialize process and thread management earlier. This is
necessary because a lot of the code uses the Lock class, which really
needs to have a running scheduler in place so that we can properly
preempt.
This also enables us to potentially initialize some things in parallel.
Instead of each PhysicalPage knowing whether it comes from the
supervisor pages or from the user pages, we can just check in both
sets when freeing a page.
It's just a handful of pointer range checks, nothing expensive.
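Roughly, the check on free looks like this (placeholder types and names, not the actual MemoryManager API; the point is two cheap range checks instead of a per-page flag):

```cpp
#include <cstddef>
#include <cstdint>

struct PhysicalRange {
    uintptr_t base { 0 };
    size_t length { 0 };
    bool contains(uintptr_t paddr) const { return paddr >= base && paddr - base < length; }
};

void free_physical_page(PhysicalRange const& supervisor, PhysicalRange const& user, uintptr_t paddr)
{
    if (supervisor.contains(paddr)) {
        // return the page to the supervisor page allocator
    } else if (user.contains(paddr)) {
        // return the page to the user page allocator
    } else {
        // neither set owns this page; the kernel would VERIFY/panic here
    }
}
```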
There appears to be no reason why the process registration needs
to happen under the space spin lock. As the first thread has not started
yet, the lock should be completely uncontested, but it's still bad practice.
If no other thread is ready to be run we don't need to switch to the
idle thread and wait for the next timer interrupt. We can just give
the thread another timeslice and keep it running.
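In scheduler terms, the decision looks roughly like this (stand-in types, not the actual Scheduler code):

```cpp
#include <deque>

struct Thread {
    int ticks_left { 0 };
    void refill_timeslice() { ticks_left = 10; }
};

Thread* pick_next(Thread* current, std::deque<Thread*>& ready_queue)
{
    if (!ready_queue.empty()) {
        auto* next = ready_queue.front();
        ready_queue.pop_front();
        return next;
    }
    // Nothing else is runnable: give the current thread a fresh timeslice
    // and keep running it instead of switching to the idle thread.
    current->refill_timeslice();
    return current;
}
```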
We need some overflow checks due to the implementation of TmpFS.
When size_t is 32 bits and off_t is 64 bits, we might overflow our
KBuffer max size and confuse the KBuffer set_size code, causing a VERIFY
failure. Make sure that the resulting offset + size will fit in a size_t.
As another constraint, we make sure that the resulting offset + size will
be less than half of the maximum value of a size_t, because we double
the KBuffer size each time we resize it.
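A sketch of the combined check, assuming a 64-bit off_t and a possibly 32-bit size_t (the helper name is made up):

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Returns the end offset as a size_t, or nullopt if it would not fit or
// would grow past the point where doubling the buffer could overflow.
std::optional<size_t> checked_write_end(uint64_t offset, size_t size)
{
    uint64_t end = offset + uint64_t(size);
    if (end < offset)
        return {}; // wrapped around in 64 bits
    // Must fit in size_t, and stay below half its maximum value because
    // the backing KBuffer is grown by doubling its size.
    if (end >= std::numeric_limits<size_t>::max() / 2)
        return {};
    return static_cast<size_t>(end);
}
```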
We had an inconsistency in valid user addresses. is_user_range() was
checking against the kernel base address, but previous changes caused
the maximum valid user addressable range to be 32 MiB below that.
This patch stops mmap(MAP_FIXED) of a range between these two bounds
from panicking the kernel in RangeAllocator::allocate_specific.
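For illustration, the consistency being restored looks roughly like this (the kernel base value below is a placeholder; the 32 MiB gap comes from the description above):

```cpp
#include <cstddef>
#include <cstdint>

constexpr uintptr_t MiB = 1024 * 1024;
constexpr uintptr_t kernel_base = 0xc0000000;                    // placeholder value
constexpr uintptr_t user_range_ceiling = kernel_base - 32 * MiB; // what the allocator actually hands out

bool is_user_range(uintptr_t base, size_t size)
{
    if (base + size < base)
        return false; // overflow
    // Compare against the same ceiling the user range allocator uses,
    // not against kernel_base, so the two checks can never disagree.
    return base + size <= user_range_ceiling;
}
```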