When catching a use-after-free, we now also print out the backtraces
for where the memory was allocated, and for where it was freed.
This will be extremely helpful for debugging.
Upon exit, the emulator will now print a leak report of any malloc
allocations that are still live and don't have pointers to their base
address anywhere in either another live mallocation, or in one of the
non-malloc-block memory regions.
Note that the malloc-block memory region check is not fully functional
and this will work even better once we get that fixed.
This is pretty cool. :^)
This patch introduces a "MallocTracer" to the UserspaceEmulator.
If this object is present on the Emulator, it can be notified whenever
the emulated program does a malloc() or free().
The notifications come in via a magic instruction sequence that we
embed in the LibC malloc() and free() functions. The sequence is:
"salc x2, push reg32 x2, pop reg32 x3"
The data about the malloc/free operation is in the three pushes.
We make sure the sequence is harmless when running natively.
Memory accesses on MmapRegion are then audited to see if they fall
inside a known-to-be-freed malloc chunk. If so, we complain loud
and red in the debugger output. :^)
This is very, very cool! :^)
It's also a whole lot slower than before, since now we're auditing
memory accesses against a new set of metadata. This will need to be
optimized (and running in this mode should be opt-in, perhaps even
a separate program, etc.)
Here goes mkdir(), unlink(), socket(), getsockopt(), fchmod()
bind(), connect(), listen(), select() and recvfrom().
They're not perfect but they seem to work. :^)
The a32 bit tells us whether a memory address is 32-bit or not.
We already have this information in Instruction, so just plumb that
around instead of double-caching the bit.
Use some template hacks to force GCC to inline more of the instruction
decoding stuff into the UserspaceEmulator main execution loop.
This is my last optimization for today, and we've gone from ~60 seconds
when running "UserspaceEmulator UserspaceEmulator id" to ~8 seconds :^)
Since this code is performance-sensitive, let's have the compiler do
whatever it can to help us with the most important files.
This yields a ~8% speedup.
To avoid MMU region lookup on every single instruction fetch, we now
cache a raw pointer to the current instruction. This gets automatically
invalidated when we jump somewhere, but as long as we're executing
sequentially, instruction fetches will hit the cache and bypass all
the region lookup stuff.
This is about a ~2x speedup. :^)