Previously Profiler was using timestamps to distinguish processes.
However it is possible that separate processes with the same PID exist
at the exact same timestamp (e.g. for execve). This changes Profiler
to use unique serial numbers for each event instead.
The profiler tried to be clever when handling process_exit events by
subtracting one from the timestamp. This was supposed to ensure that
events after a process' death would be attributed to the new process
in case the old process used execve(). However, if there was another
event (e.g. a CPU sample) at the exact same time the process_exit
event was recorded the profile would fail to load because we
didn't find the process anymore.
This changes introduces a new problem where samples would be attributed
to the incorrect process if a CPU sample for the old process, a
process_exit as well as a process_create event plus another CPU sample
event for the new process happened at the exact same time. I think
it's a reasonable compromise though.
I've had a couple of instances where a profile was missing process
creation events for a PID. I don't know how to reproduce it yet,
so this patch merely adds a helpful debug message so you know why
Profiler is failing to load the file.
This patch adds an additional level of hierarchy to the call tree:
Every process gets its own top-level node. :^)
Before this, selecting multiple processes would get quite confusing
as all the call stacks from different processes were combined together
into one big tree.
Problem:
- Default destructors (and constructors) are in `.cpp` files. This
prevents the compiler's optimizer from inlining them when it thinks
inlining is appropriate (unless LTO is used).
- Forward declarations can prevent some optimizations, such as
inlining of constructors and destructors.
Solution:
- Remove them or set them to `= default` and let the compiler handle
the generation of them.
- Remove unneeded forward declarations.
Hiding those frames doesn't really make sense. They're a major
contributor to a process' spent CPU time and show up in a lot of
profiles. That however is because those processes really do spend
quite a bit of time in the scheduler by doing lots of context
switches, like WindowServer when responding to IPC calls.
Instead of hiding these for aesthetic reasons we should instead
improve the scheduler.
We can lose profiling timer events for a few reasons, for example
disabled interrupts or system slowness. This accounts for lost
time between CPU samples by adding a field lost_samples to each
profiling event which tracks how many samples were lost immediately
preceding the event.
Now that the profiling timer is independent from the scheduler the
user will get quite a few CPU samples from "within" the scheduler.
These events are less useful when just profiling a user-mode process
rather than the whole system. This patch adds an option to Profiler to
hide these events.
This turns the perfcore format into more a log than it was before,
which lets us properly log process, thread and region
creation/destruction. This also makes it unnecessary to dump the
process' regions every time it is scheduled like we did before.
Incidentally this also fixes 'profile -c' because we previously ended
up incorrectly dumping the parent's region map into the profile data.
Log-based mmap support enables profiling shared libraries which
are loaded at runtime, e.g. via dlopen().
This enables profiling both the parent and child process for
programs which use execve(). Previously we'd discard the profiling
data for the old process.
The Profiler tool has been updated to not treat thread IDs as
process IDs anymore. This enables support for processes with more
than one thread. Also, there's a new widget to filter which
process should be displayed.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
This is a little bit messy, but basically if an ELF object is non-PIE,
we have to account for the executable mapping being at a hard-coded
offset and subtract that when doing symbolication.
There's probably a nicer way to solve this, I just hacked this together
so we can see "cc1plus" and friends in profiles. :^)
In multi-process profiles, the same ELF objects tend to occur many
times (everyone has libc.so for example) so we will quickly run out
of VM if we map each object once per process that uses it.
Fix this by adding a "mapped object cache" that maps the path of
an ELF object to a cached memory mapping and wrapping ELF::Image.
The perfcore file format was previously limited to a single process
since the pid/executable/regions data was top-level in the JSON.
This patch moves the process-specific data into a top-level array
named "processes" and we now add entries for each process that has
been sampled during the profile run.
This makes it possible to see samples from multiple threads when
viewing a perfcore file with Profiler. This is extremely cool! :^)
You can now view the individual samples in a profile one by one with
the new "Samples" view. The "old" main view moves into a "Call Tree"
tab (but it remains the default view.)
When you select a sample in the samples view, we show you the full
symbolicated backtrace in a separate view on the right hand side. :^)
This way you don't have to look at all the library names if you don't
want to. Since we're pretty good about namespacing our things, the
library names are slightly redundant information.