This patch merges the profiling functionality in the kernel with the
performance events mechanism. A profiler sample is now just another
perf event, rather than a dedicated thing.
Since perf events were already per-process, this now makes profiling
per-process as well.
Processes with perf events would already write out a perfcore.PID file
to the current directory on death, but since we may want to profile
a process and then let it continue running, recorded perf events can
now be accessed at any time via /proc/PID/perf_events.
This patch also adds information about process memory regions to the
perfcore JSON format. This removes the need to supply a core dump to
the Profiler app for symbolication, and so the "profiler coredump"
mechanism is removed entirely.
There's still a hard limit of 4MB worth of perf events per process,
so this is by no means a perfect final design, but it's a nice step
forward for both simplicity and stability.
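As a small illustration (plain POSIX file I/O, nothing SerenityOS-specific;
the helper name is invented), a tool could snapshot a live process's
recorded events like so:

    #include <stdio.h>
    #include <sys/types.h>

    // Hypothetical helper: stream /proc/PID/perf_events to stdout.
    static int dump_perf_events(pid_t pid)
    {
        char path[64];
        snprintf(path, sizeof(path), "/proc/%d/perf_events", (int)pid);
        FILE* file = fopen(path, "r");
        if (!file)
            return -1;
        char buffer[4096];
        size_t nread;
        while ((nread = fread(buffer, 1, sizeof(buffer), file)) > 0)
            fwrite(buffer, 1, nread, stdout);
        fclose(file);
        return 0;
    }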
Fixes #4848
Fixes #4849
When loading non position-independent programs, we now take care not to
load the dynamic loader at an address that collides with the location
the main program wants to load at.
Fixes #4847.
This will enable us to take the desired load address of non-position
independent programs into account when randomizing the load address
of the dynamic loader.
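As a sketch of the idea (all names here are invented; the real logic
lives in the kernel's ELF loading code):

    typedef unsigned long FlatPtr;
    FlatPtr random_page_aligned_address(); // assumed helper, declared only

    // Pick a randomized base for the dynamic loader that cannot collide
    // with the fixed range a non-PIE main program must occupy.
    FlatPtr pick_loader_base(FlatPtr main_start, unsigned long main_size,
                             unsigned long loader_size)
    {
        for (;;) {
            FlatPtr candidate = random_page_aligned_address();
            bool overlaps = candidate < main_start + main_size
                && main_start < candidate + loader_size;
            if (!overlaps)
                return candidate;
        }
    }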
Trying to pass these on to the Terminal while handling an IRQ is a recipe
for disaster. Use Processor::deferred_call_queue to create an ad-hoc
"second half" of the interrupt handler.
Fixes #4889
SystemServer now creates the /tmp/coredump and /tmp/profiler_coredumps
directories at startup, ensuring that they are owned by root, and with
basic 0755 permissions.
The kernel will also now refuse to put core dumps in a directory that
doesn't fulfill the following criteria:
- Owned by 0:0
- A directory with the sticky bit not set
- 0755 permissions
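A sketch of that check, using plain POSIX stat() (the in-kernel version
inspects inode metadata directly; this standalone version is only
illustrative):

    #include <sys/stat.h>

    static bool is_acceptable_coredump_directory(const char* path)
    {
        struct stat st;
        if (stat(path, &st) < 0)
            return false;
        if (st.st_uid != 0 || st.st_gid != 0)
            return false; // must be owned by 0:0
        if (!S_ISDIR(st.st_mode))
            return false; // must be a directory
        if (st.st_mode & S_ISVTX)
            return false; // sticky bit must not be set
        if ((st.st_mode & 07777) != 0755)
            return false; // must have exactly 0755 permissions
        return true;
    }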
Fixes #4435
Fixes #4850
We were not handling sticky parents properly in sys$rmdir(). Child
directories of a sticky parent should not be rmdir'able by just
anyone; only the owner and root may remove them.
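The check is roughly this shape (a POSIX-flavored sketch for
illustration; the kernel works on its own inode structures):

    #include <sys/stat.h>
    #include <sys/types.h>

    // With a sticky parent, only the child's owner (or root) may
    // rmdir the child.
    static bool may_rmdir(uid_t current_uid, const struct stat& parent,
                          const struct stat& child)
    {
        if (!(parent.st_mode & S_ISVTX))
            return true; // no sticky bit: usual permission checks apply
        if (current_uid == 0)
            return true; // root can always remove
        return current_uid == child.st_uid;
    }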
Fixes #4875.
Before this change, truncating an Ext2FS inode to a larger size than it
was before would give you uninitialized on-disk data.
Fix this by zeroing out all the new space when doing an inode resize.
This is pretty naively implemented via Inode::write_bytes() and there's
lots of room for cleverness here in the future.
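The naive version is roughly this (a sketch; the write_bytes()
signature here is assumed, and error handling is omitted):

    // Zero the newly exposed range [old_size, new_size) in 4KiB chunks.
    static void zero_grown_range(Inode& inode, u64 old_size, u64 new_size)
    {
        u8 zeroes[4096] = {};
        u64 offset = old_size;
        while (offset < new_size) {
            u64 chunk = new_size - offset;
            if (chunk > sizeof(zeroes))
                chunk = sizeof(zeroes);
            inode.write_bytes(offset, chunk, zeroes, nullptr); // assumed
            offset += chunk;
        }
    }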
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.

Everything:
The modifications in this commit were automatically made using the
following command:
find . -name '*.cpp' -exec sed -i -E 's/dbg\(\) << ("[^"{]*");/dbgln\(\1\);/' {} \;
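For example, it turns the first line below into the second (the pattern
deliberately matches only messages that are a single string literal
containing no '{', which could be mistaken for a format field):

    dbg() << "Failed to open device";   // before
    dbgln("Failed to open device");     // after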
We can now test a _very_ basic transaction via `do_debug_transfer()`.
This function merely attaches some TDs to the LSCTRL queue head
and points them at some input and output buffers. We then receive an interrupt
with USBSTS value of 1, meaning Interrupt On Completion
(of the transaction). At this point, the input buffer is filled with
some data.
According to the USB spec/UHCI datasheet (as well as the Linux and
BSD source code), if we receive an IRQ and USBSTS is 0, then
the IRQ does not belong to us and we should immediately jump
out of the handler.
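As a sketch (accessor names are assumed; USBSTS's write-one-to-clear
behavior is per the datasheet):

    bool UHCIController::handle_irq()
    {
        u16 status = read_usbsts(); // assumed register accessor
        if (status == 0)
            return false; // shared IRQ line: not ours, bail immediately
        write_usbsts(status); // acknowledge: bits are write-one-to-clear
        // ... handle Interrupt On Completion and friends ...
        return true;
    }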
We can now read/write the two root ports exposed by the
UHCI controller, and detect when a device is attached or
detached via a kernel process that constantly scans the ports
for any changes. This is very basic, but is a bit of fun to see
the kernel detecting hardware on the fly :^)
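Conceptually the scanner is a loop like this (PORTSC bit meanings per
the UHCI spec; the helpers are invented):

    void UHCIController::scan_ports_forever()
    {
        for (;;) {
            for (int port = 0; port < 2; ++port) {
                u16 portsc = read_portsc(port); // assumed
                if (portsc & (1 << 1)) { // Connect Status Change
                    bool attached = portsc & (1 << 0); // Current Connect Status
                    dbgln("UHCI: port {} {}", port,
                        attached ? "attached" : "detached");
                    write_portsc(port, portsc); // clear the change bit
                }
            }
            sleep_for_a_while(); // assumed: yield until the next scan
        }
    }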
Implemented both Queue Heads and Transfer Descriptors. These
are required to actually perform USB transactions. The UHCI
driver sets up a pool of these that can be allocated when we
need them. It seems some drivers have these statically
allocated, so it might be worth looking into that, but
for now, the simple way seems to be to allocate them on
the fly as we need them, and then release them.
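The pool itself can be as simple as this (a sketch, not the driver's
actual layout):

    #include <cstddef>

    // Fixed pool of descriptors with a trivial in-use bitmap.
    template<typename T, size_t Count>
    struct DescriptorPool {
        T entries[Count];
        bool in_use[Count] = {};

        T* try_allocate()
        {
            for (size_t i = 0; i < Count; ++i) {
                if (!in_use[i]) {
                    in_use[i] = true;
                    return &entries[i];
                }
            }
            return nullptr; // pool exhausted
        }

        void release(T* descriptor)
        {
            in_use[descriptor - entries] = false;
        }
    };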
It seems that not setting the framelist address register
was causing the entire system to lock up as it generated an insane
interrupt storm in the IRQ handler for the UHCI controller.
We now allocate a 4KiB aligned page via
`MemoryManager::allocate_supervisor_physical_page()` and set every
value to 1. In effect, this creates a framelist with each entry
being a "TERMINATE" entry in which the controller stalls until its'
1mS time slice is up.
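In code the setup amounts to roughly this (the MM call is the one named
above; the mapping and register helpers are assumed):

    // One 4KiB page holds 1024 32-bit frame list entries. Bit 0 set
    // ("TERMINATE") tells the controller there's no work in that frame.
    auto frame_list_page = MM.allocate_supervisor_physical_page();
    auto* entries = reinterpret_cast<u32*>(frame_list_vaddr); // assumed
    for (size_t i = 0; i < 1024; ++i)
        entries[i] = 1; // TERMINATE
    write_flbaseadd(frame_list_paddr); // assumed FLBASEADD register write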
Some more registers have also been set for consistency, though it
seems like these don't need to be set explicitly in software.
This patch adds sys$abort() which immediately crashes the process with
SIGABRT. This makes assertion backtraces a lot nicer by removing all
the gunk that otherwise happens between __assertion_failed() and
actually crashing from the SIGABRT.
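Userspace then only needs a thin wrapper (a sketch; the SC_abort
syscall name is an assumption):

    // abort() never returns: the kernel raises SIGABRT before the
    // syscall would come back.
    [[noreturn]] void abort()
    {
        syscall(SC_abort); // assumed syscall name
        __builtin_unreachable();
    }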
When ProcFS could no longer allocate KBuffer objects to serve calls to
read, it would just return 0, indicating EOF. This then triggered
parsing errors because code assumed it had read the file.
Because read isn't supposed to return ENOMEM, change ProcFS to populate
the file data upon file open or seek to the beginning. This also means
that calls to open can now return ENOMEM if needed. This allows the
caller to either be able to successfully open the file and read it, or
fail to open it in the first place.
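The shape of the change, as a sketch with invented names:

    // Build the file contents up front; failure surfaces from open()
    // or seek() as ENOMEM instead of a bogus EOF from read().
    int ProcFSInode::refresh_data()
    {
        auto buffer = try_build_contents(); // assumed generator; may fail
        if (!buffer)
            return -ENOMEM;
        m_cached_data = buffer;
        return 0;
    }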
Commit a3a9016701 removed the PT_INTERP header from Loader.so, which
cleaned up some kernel code in execve. Unfortunately, it prevents
Loader.so from being run as an executable.
There is a window between dropping the last reference and removing
a ProcFSInode from the lookup map. So, when looking up, we need to
check whether that inode is currently being destructed.
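Sketch of the guarded lookup (the names and the try_ref() helper are
assumptions):

    // The inode can still sit in the map after its last reference is
    // dropped; only hand it out if we can safely revive a reference.
    RefPtr<ProcFSInode> ProcFS::get_inode(InodeIndex index)
    {
        LOCKER(m_inodes_lock);
        auto it = m_inodes.find(index);
        if (it == m_inodes.end())
            return nullptr;
        if (!it->value->try_ref())
            return nullptr; // caught mid-destruction; treat as gone
        return adopt_ref(*it->value);
    }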
If a TLB flush request is broadcast to other processors and the addresses
to flush are user mode addresses, we can ignore such a request on the
target processor if the page directory currently in use doesn't match
the addresses to be flushed. We still need to broadcast to all processors
in that case because the other processors may switch to that same page
directory at any time.
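On the receiving side, the filter looks roughly like this (a sketch;
the names and the kernel/user split are assumptions):

    // Skip flushes for user addresses in a page directory we're not
    // currently running with; switching to it later reloads CR3 anyway.
    void handle_flush_request(PageDirectory const& directory,
                              FlatPtr vaddr, size_t page_count)
    {
        bool user_address = vaddr < g_kernel_base; // assumed split
        if (user_address && current_page_directory() != &directory)
            return;
        flush_local_tlb(vaddr, page_count); // assumed
    }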
If we remap pages (e.g. lazy allocation) inside a VMObject that is
shared among more than one region, broadcast it to any other region
that may be mapping the same page.
Lazily committed shared memory was not working in situations where one
process would write to the memory and another would only read from it.
Since the reading process would never cause a write fault in the shared
region, we'd never notice that the writing process had added real
physical pages to the VMObject. This happened because the lazily
committed pages were marked "present" in the page table.
This patch solves the issue by always allocating shared memory up front
and not trying to be clever about it.
Before this change, we would sometimes map a region into the address
space with !is_shared(), and then moments later call set_shared(true).
I found this very confusing while debugging, so this patch makes us pass
the initial shared flag to the Region constructor, ensuring that it's in
the correct state by the time we first map the region.
Insert stack canaries to find stack corruptions in the kernel.
It looks like this was enabled in the past (842716a) but was lost
during the CMake conversion.
The `-fstack-protector-strong` variant was chosen because it catches more issues
than `-fstack-protector`, but doesn't have substantial performance impact like
`-fstack-protector-all`.
This fixes a kernel crash that occurred when calling ptrace with
PT_PEEK on memory that was not paged in.
The crash occurred because we were holding the scheduler lock while
trying to read from the disk's block device, which we do not allow.
Fixes#4740
We need to allocate all pages for the profiler right away so that
we don't trigger page faults in the timer interrupt handler to
allocate them.
Fixes#4734
We need to free the regions before reverting the paging scope to the
original one when rolling back changes due to an error. This fixes
silent memory corruption.
By designating a committed page pool we can guarantee to have physical
pages available for lazy allocation in mappings. However, when forking
we will overcommit. The assumption is that worst-case it's better for
the fork to die due to insufficient physical memory on COW access than
the parent that created the region. If a fork wants to ensure that all
memory is available (trigger a commit) then it can use madvise.
This also means that fork can now gracefully fail if we don't have
enough physical pages available.
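The pool itself is simple bookkeeping, something like this (a sketch,
modeled on the description above):

    // Reserve physical pages ahead of time. Lazy faults later draw from
    // the committed pool and cannot fail; callers like fork() use this
    // failable path so they can return ENOMEM instead of dying later.
    bool MemoryManager::commit_user_physical_pages(size_t count)
    {
        if (m_uncommitted_pages < count)
            return false; // graceful failure
        m_uncommitted_pages -= count;
        m_committed_pages += count;
        return true;
    }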