Instead of caching a raw pointer to the next instruction, cache the
region we're fetching instructions from, and a pointer to its base.
This way we don't need to keep invalidating and reloading the cache
whenever the CPU jumps.
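
A minimal sketch of the idea, not the actual UE code: Region, FetchCache
and fetch8() are hypothetical names, and the linear search in reload()
only stands in for whatever region lookup the emulator really uses.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an emulated memory region.
struct Region {
    uint32_t base { 0 };
    std::vector<uint8_t> data;

    bool contains(uint32_t address) const
    {
        return address >= base && address - base < data.size();
    }
};

// Hypothetical fetch cache: remembers the region and a pointer to its base
// instead of a raw pointer to the next instruction byte.
struct FetchCache {
    Region const* region { nullptr };
    uint8_t const* base_ptr { nullptr };
    unsigned reloads { 0 };

    uint8_t fetch8(std::vector<Region> const& regions, uint32_t eip)
    {
        // A jump inside the cached region only changes the offset,
        // so the cache stays valid.
        if (!region || !region->contains(eip))
            reload(regions, eip);
        return base_ptr[eip - region->base];
    }

    void reload(std::vector<Region> const& regions, uint32_t eip)
    {
        ++reloads;
        for (auto const& candidate : regions) {
            if (candidate.contains(eip)) {
                region = &candidate;
                base_ptr = candidate.data.data();
                return;
            }
        }
        throw std::out_of_range("instruction fetch from unmapped address");
    }
};
```

In this sketch, jumping backwards and forwards within one region never
touches reload(); only leaving the region does.
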
By passing the Region& to the auditing functions, we know exactly which
block we are hitting. This allows us to track big mallocations the same
way we already do chunked ones.
This gets rid of the O(n) scan in find_mallocation() for allocations
larger than the maximum malloc chunk size. :^)
These are getting quite hot (~4% of general emulation profile combined)
so let's just devirtualize them and turn the function calls into simple
boolean checks.
Instead of tracking known malloc blocks in a separate hash table,
add an optional malloc metadata pointer to MmapRegion.
This makes finding the malloc metadata for a given pointer extremely
fast since it can piggyback on the page table array. :^)
Not motivated by anything in particular, they just looked easy to fill
in. With this, all arithmetic FI* FPU instructions are implemented.
Switch to the mXXint style in a few more functions; this part is a no-op.
This is used by memset() so we get a lot of mileage out of optimizing
this instruction.
Note that we currently audit every individual byte accessed separately.
This could be greatly improved by adding a range auditing mechanism to
MallocTracer.
To make SoftMMU::find_region() O(1), this patch invests 3MiB into a
lookup table where we track each possible page base address and map
them to the SoftMMU::Region corresponding to that address.
This is another large improvement to general emulation performance. :^)
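
A rough sketch of such a table, under stated assumptions: 4 KiB pages,
a 3 GiB emulated address space (786432 entries, which is 3 MiB with
4-byte pointers and twice that on a 64-bit host), and hypothetical
Region/PageMap types that are not the real SoftMMU classes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical region type; only the fields the lookup needs.
struct Region {
    uint32_t base { 0 };
    uint32_t size { 0 };
};

constexpr uint32_t page_size = 4096;
// Assumption: 3 GiB of emulated address space, one entry per page.
constexpr uint32_t address_space_end = 0xC0000000u;
constexpr size_t page_count = address_space_end / page_size;

class PageMap {
public:
    PageMap()
        : m_pages(std::make_unique<std::array<Region*, page_count>>())
    {
        // make_unique value-initializes the array, so every entry is nullptr.
    }

    // Points every page covered by the region at the region itself.
    bool add_region(Region& region)
    {
        assert(region.base % page_size == 0 && region.size % page_size == 0);
        if (uint64_t(region.base) + region.size > address_space_end)
            return false;
        for (uint32_t offset = 0; offset < region.size; offset += page_size)
            (*m_pages)[(region.base + offset) / page_size] = &region;
        return true;
    }

    // O(1): one division and one array load, no scan over regions.
    Region* find_region(uint32_t address) const
    {
        size_t index = address / page_size;
        return index < page_count ? (*m_pages)[index] : nullptr;
    }

private:
    std::unique_ptr<std::array<Region*, page_count>> m_pages;
};
```

The table trades a fixed chunk of memory for a lookup whose cost does
not depend on how many regions exist.
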
We don't want the next_address pointer losing its alignment somehow.
This whole thing should be replaced at some point, since UE-hosted
programs won't be able to run forever with this allocation strategy.
m32int is a 32-bit integer stored in memory, and should not be mistaken
for a floating point number. :^)
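
To illustrate the difference (a sketch only; load_m32int() and
load_m32fp() are hypothetical helpers, and a 32-bit float is assumed):

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Correct for an m32int operand: read the four bytes as a signed integer
// and convert the *value* to floating point.
long double load_m32int(uint8_t const* memory)
{
    int32_t value;
    std::memcpy(&value, memory, sizeof(value));
    return static_cast<long double>(value);
}

// The mistake to avoid: reinterpreting the same four bytes as a
// single-precision float, which is what an m32fp operand would want.
long double load_m32fp(uint8_t const* memory)
{
    float value;
    std::memcpy(&value, memory, sizeof(value));
    return static_cast<long double>(value);
}
```

For the integer 3 stored in memory, load_m32int() yields 3.0, while
load_m32fp() yields a tiny denormal, because the bit pattern is being
read as an IEEE 754 float.
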
Also add missing handling of 64-bit FPU register operands to some of
the RM64 instructions.
There are some destruction order races that can cause hangs while
shutting down UE. Since there's no particular value right now in
destroying the Emulator object properly, just avoid destruction and
add a FIXME about looking into it later.
Instead of doing an O(n) scan over all the mallocations whenever we're
doing a read/write audit, UE now keeps track of ChunkedBlocks and their
chunks. Both the block lookup and the chunk lookup are O(1).
We know what ChunkedBlocks look like via mallocdefs.h from LibC.
Note that the old linear scan is still in use for big mallocations,
but the vast majority of mallocations are chunked, so this helps a lot.
This makes malloc auditing significantly faster! :^)
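
The two lookups could look roughly like this. Everything here is an
assumption for illustration: the 64 KiB block size, the 32-byte header,
and the ChunkTracker/TrackedBlock/Chunk names do not come from
mallocdefs.h or the real MallocTracer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumed layout constants for a chunked allocator block.
constexpr uint32_t block_size = 64 * 1024;
constexpr uint32_t block_mask = ~(block_size - 1);
constexpr uint32_t header_size = 32; // hypothetical header size

struct Chunk {
    bool used { false };
    uint32_t requested_size { 0 };
};

struct TrackedBlock {
    uint32_t chunk_size { 0 };
    std::vector<Chunk> chunks;
};

class ChunkTracker {
public:
    void add_block(uint32_t block_address, uint32_t chunk_size)
    {
        assert((block_address & ~block_mask) == 0); // blocks are block_size-aligned
        assert(chunk_size > 0);
        TrackedBlock block;
        block.chunk_size = chunk_size;
        block.chunks.resize((block_size - header_size) / chunk_size);
        m_blocks[block_address] = std::move(block);
    }

    Chunk* find_chunk(uint32_t address)
    {
        // Block lookup: mask off the low bits, then one hash lookup
        // (expected constant time).
        auto it = m_blocks.find(address & block_mask);
        if (it == m_blocks.end())
            return nullptr;
        uint32_t offset = address & ~block_mask;
        if (offset < header_size)
            return nullptr; // inside the block header, not a chunk
        // Chunk lookup: a single division, no scan.
        size_t index = (offset - header_size) / it->second.chunk_size;
        if (index >= it->second.chunks.size())
            return nullptr; // slack space after the last whole chunk
        return &it->second.chunks[index];
    }

private:
    std::unordered_map<uint32_t, TrackedBlock> m_blocks;
};
```

Neither step depends on how many mallocations are live, which is the
point of replacing the linear scan.
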
These instructions now operate on the specified FPU stack entry instead
of always using ST(0) and ST(1).
FUCOMI and FUCOMIP also handle NaN values slightly better.
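
For reference, the architectural flag results of an FUCOMI-style
comparison can be sketched as below. CompareFlags and fucomi_flags()
are hypothetical names, and this shows the documented x86 encoding
rather than what this patch necessarily implements.

```cpp
#include <cmath>

struct CompareFlags {
    bool zf;
    bool pf;
    bool cf;
};

// Compares ST(0) against ST(i) and returns the ZF/PF/CF values.
CompareFlags fucomi_flags(long double st0, long double sti)
{
    // Unordered: at least one operand is NaN, so all three flags are set.
    if (std::isnan(st0) || std::isnan(sti))
        return { true, true, true };
    if (st0 > sti)
        return { false, false, false };
    if (st0 < sti)
        return { false, false, true };
    return { true, false, false }; // equal
}
```

The NaN case has to be checked first, since every ordered comparison
against a NaN is false and would otherwise fall through to "equal".
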
Instead of always showing the preceding mallocation, prefer showing the
following one *if* it's closer to the audited address.
This makes it easier to find bugs where the access is just before an
allocation instead of just after it.
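
One way to express that choice, as a sketch: Mallocation and
closest_mallocation() are hypothetical, mallocations are assumed to be
non-empty and non-overlapping, and ties go to the preceding one.

```cpp
#include <cstdint>
#include <vector>

struct Mallocation {
    uint32_t address;
    uint32_t size;
};

// Returns the mallocation nearest to an address that lies outside all of
// them, or nullptr if there are none.
Mallocation const* closest_mallocation(std::vector<Mallocation> const& mallocations, uint32_t address)
{
    Mallocation const* preceding = nullptr;
    Mallocation const* following = nullptr;
    for (auto const& m : mallocations) {
        if (m.address + m.size <= address) {
            // Ends at or before the address: keep the highest such one.
            if (!preceding || m.address > preceding->address)
                preceding = &m;
        } else if (m.address > address) {
            // Starts after the address: keep the lowest such one.
            if (!following || m.address < following->address)
                following = &m;
        }
    }
    if (!preceding)
        return following;
    if (!following)
        return preceding;
    // Distance is measured from the last byte of the preceding mallocation
    // and to the first byte of the following one.
    uint32_t distance_before = address - (preceding->address + preceding->size - 1);
    uint32_t distance_after = following->address - address;
    return distance_after < distance_before ? following : preceding;
}
```

With mallocations at 0x1000 (16 bytes) and 0x2000 (32 bytes), an access
at 0x1ffe is reported against the second one, since it is 2 bytes
before it but far past the end of the first.
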
Start fleshing out basic support for floating-point instructions in the
UserspaceEmulator CPU.
This is all work done by @nico for #3576. I'm just merging it all in
this patch since it's a decent foundation to continue working on. :^)
When a mallocation is shrunk/grown without moving, UE needs to update
its precise metadata about the mallocation, since it tracks *exactly*
how many bytes were allocated, not just the malloc chunk size.
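
A small sketch of why the update matters. MallocTracker and its member
functions are hypothetical names, not the actual MallocTracer interface.

```cpp
#include <cstdint>
#include <unordered_map>

struct Mallocation {
    uint32_t address { 0 };
    uint32_t size { 0 }; // exact byte count requested, not the chunk size
};

class MallocTracker {
public:
    void did_malloc(uint32_t address, uint32_t size)
    {
        m_mallocations[address] = { address, size };
    }

    // Called when a mallocation is resized in place: same address, new size.
    bool did_change_size(uint32_t address, uint32_t new_size)
    {
        auto it = m_mallocations.find(address);
        if (it == m_mallocations.end())
            return false;
        it->second.size = new_size;
        return true;
    }

    // An access is valid only if it stays within the exact requested size.
    bool is_valid_access(uint32_t base, uint32_t offset, uint32_t length) const
    {
        auto it = m_mallocations.find(base);
        if (it == m_mallocations.end())
            return false;
        uint32_t size = it->second.size;
        return offset <= size && length <= size - offset;
    }

private:
    std::unordered_map<uint32_t, Mallocation> m_mallocations;
};
```

Without did_change_size(), a 10-byte mallocation grown in place to 16
bytes would still be audited against the old 10-byte bound, and one
shrunk in place would hide real overruns.
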