Instead of keeping a separate Vector<Event> for signposts, let them live
in the main event stream. For fast iteration, we instead keep a cache of
the signpost event indices.
Also check for the most common event type (sample) first instead of
leaving it as the fallback. This avoids a lot of string comparisons
while parsing profiles.
Making userspace provide a global string ID was silly, and made the API
extremely difficult to use correctly in a global profiling context.
Instead, simply make the kernel do the string ID allocation for us.
This also allows us to convert the string storage to a Vector in the
kernel (and an array in the JSON profile data.)
The first perf_event argument to a PERF_EVENT_SIGNPOST is now
interpreted as a string ID (in the profile strings set.)
This allows us to generate signposts with custom strings. :^)
Signposts generated by perf_event(PERF_EVENT_SIGNPOST) now show up in
profile timelines, and if you hover them you get a tooltip with the two
arguments passed with the event.
Most of the models were just calling did_update anyway, which is
pointless since it can be unified to the base Model class. Instead, code
calling update() will now call invalidate(), which functions identically
and is more obvious in what it does.
Additionally, a default implementation is provided, which removes the
need to add empty implementations of update() for each model subclass.
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
This enables further work on implementing KASLR by adding relocation
support to the pre-kernel and updating the kernel to be less dependent
on specific virtual memory layouts.
This removes all the hard-coded kernel base addresses from userspace
tools.
One downside for this is that e.g. Profiler no longer uses a different
color for kernel symbols when run as a non-root user.
The LexicalPath instance methods dirname(), basename(), title() and
extension() will be changed to return StringView const& in a further
commit. Due to this, users creating temporary LexicalPath objects just
to call one of those getters will recieve a StringView const& pointing
to a possible freed buffer.
To avoid this, static methods for those APIs have been added, which will
return a String by value to avoid those problems. All cases where
temporary LexicalPath objects have been used as described above haven
been changed to use the static APIs.
Previously Profiler was using timestamps to distinguish processes.
However it is possible that separate processes with the same PID exist
at the exact same timestamp (e.g. for execve). This changes Profiler
to use unique serial numbers for each event instead.
The profiler tried to be clever when handling process_exit events by
subtracting one from the timestamp. This was supposed to ensure that
events after a process' death would be attributed to the new process
in case the old process used execve(). However, if there was another
event (e.g. a CPU sample) at the exact same time the process_exit
event was recorded the profile would fail to load because we
didn't find the process anymore.
This changes introduces a new problem where samples would be attributed
to the incorrect process if a CPU sample for the old process, a
process_exit as well as a process_create event plus another CPU sample
event for the new process happened at the exact same time. I think
it's a reasonable compromise though.
I've had a couple of instances where a profile was missing process
creation events for a PID. I don't know how to reproduce it yet,
so this patch merely adds a helpful debug message so you know why
Profiler is failing to load the file.
This patch adds an additional level of hierarchy to the call tree:
Every process gets its own top-level node. :^)
Before this, selecting multiple processes would get quite confusing
as all the call stacks from different processes were combined together
into one big tree.
Problem:
- Default destructors (and constructors) are in `.cpp` files. This
prevents the compiler's optimizer from inlining them when it thinks
inlining is appropriate (unless LTO is used).
- Forward declarations can prevent some optimizations, such as
inlining of constructors and destructors.
Solution:
- Remove them or set them to `= default` and let the compiler handle
the generation of them.
- Remove unneeded forward declarations.
Hiding those frames doesn't really make sense. They're a major
contributor to a process' spent CPU time and show up in a lot of
profiles. That however is because those processes really do spend
quite a bit of time in the scheduler by doing lots of context
switches, like WindowServer when responding to IPC calls.
Instead of hiding these for aesthetic reasons we should instead
improve the scheduler.
We can lose profiling timer events for a few reasons, for example
disabled interrupts or system slowness. This accounts for lost
time between CPU samples by adding a field lost_samples to each
profiling event which tracks how many samples were lost immediately
preceding the event.
Now that the profiling timer is independent from the scheduler the
user will get quite a few CPU samples from "within" the scheduler.
These events are less useful when just profiling a user-mode process
rather than the whole system. This patch adds an option to Profiler to
hide these events.
This turns the perfcore format into more a log than it was before,
which lets us properly log process, thread and region
creation/destruction. This also makes it unnecessary to dump the
process' regions every time it is scheduled like we did before.
Incidentally this also fixes 'profile -c' because we previously ended
up incorrectly dumping the parent's region map into the profile data.
Log-based mmap support enables profiling shared libraries which
are loaded at runtime, e.g. via dlopen().
This enables profiling both the parent and child process for
programs which use execve(). Previously we'd discard the profiling
data for the old process.
The Profiler tool has been updated to not treat thread IDs as
process IDs anymore. This enables support for processes with more
than one thread. Also, there's a new widget to filter which
process should be displayed.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
This is a little bit messy, but basically if an ELF object is non-PIE,
we have to account for the executable mapping being at a hard-coded
offset and subtract that when doing symbolication.
There's probably a nicer way to solve this, I just hacked this together
so we can see "cc1plus" and friends in profiles. :^)
In multi-process profiles, the same ELF objects tend to occur many
times (everyone has libc.so for example) so we will quickly run out
of VM if we map each object once per process that uses it.
Fix this by adding a "mapped object cache" that maps the path of
an ELF object to a cached memory mapping and wrapping ELF::Image.
The perfcore file format was previously limited to a single process
since the pid/executable/regions data was top-level in the JSON.
This patch moves the process-specific data into a top-level array
named "processes" and we now add entries for each process that has
been sampled during the profile run.
This makes it possible to see samples from multiple threads when
viewing a perfcore file with Profiler. This is extremely cool! :^)
You can now view the individual samples in a profile one by one with
the new "Samples" view. The "old" main view moves into a "Call Tree"
tab (but it remains the default view.)
When you select a sample in the samples view, we show you the full
symbolicated backtrace in a separate view on the right hand side. :^)
This way you don't have to look at all the library names if you don't
want to. Since we're pretty good about namespacing our things, the
library names are slightly redundant information.