All the MOVs in the B8-BF range can use the REX.W prefix, not just B8.
Previously instructions like `48 B9... mov rcx, imm64` were interpreted
as `mov rcx, imm32` because the REX.W prefix was only applied to
`48 B8... mov rax, imm64`.
We previously passed both OperandSize and AddressSize to the
constructor.
Both values were only ever 32-bit at construction.
We used AddressSize::Size64 to signify Long mode which was needlessly
complicated.
As the existing near-by comment says, the default size of displacements
& immediates is 32 bits even in Long mode.
This makes `disasm` work on our binaries in x86-64 builds.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
Most of the 64-bit instructions default to 32-bit operands and select
64-bit using REX.W prefixes. Because of that instead of defining new
instruction formats, this reuses the 32-bit formats and changes them
to take the REX prefixes into account when necessary.
Additionally this removes, adds or modifies the instruction
descriptors in the 64-bit table, where they are different from 32-bit.
Using 'disasm' these changes seem to cover pretty much all of our
64-bit binaries (except for AVX) :^)
Note that UserspaceEmulator will need to account for these prefixed
versions in its 32-bit instruction handlers before being usable on
x86-64.
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and feature missing symbols.
This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.
Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
Doesn't use them in libc headers so that those don't have to pull in
AK/Platform.h.
AK_COMPILER_GCC is set _only_ for gcc, not for clang too. (__GNUC__ is
defined in clang builds as well.) Using AK_COMPILER_GCC simplifies
things some.
AK_COMPILER_CLANG isn't as much of a win, other than that it's
consistent with AK_COMPILER_GCC.
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).
No functional changes.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
This mainly does two things,
1. Removes spaces after commas
2. Elides "0x" and leading zeros in most contexts
Remaining differences are:
1. objdump always has memory size annotations
We lack these and probably have some annotations wrong
2. Boolean check names
We use jump-zero, while objdump uses jump-equal for example
3. We sometimes add "00 00" symbols, which objdump elides
4. We always demangle (This is a good thing)
5. We always resolve relocations (This is a good thing)
6. We seem to detect some symbols differently/incorrectly
This allows disassembly of binaries with SSE2 instructions in them.
SSE2 also extends all MMX instructions without affecting the mnemonic,
therefore these are just directed to the same function for now.
The UserspaceEmulator does not know this as of
this commit.
`static const` variables can be computed and initialized at run-time
during initialization or the first time a function is called. Change
them to `static constexpr` to ensure they are computed at
compile-time.
This allows some removal of `strlen` because the length of the
`StringView` can be used which is pre-computed at compile-time.
Since our executables are position-independent, the address values
extraced from processes don't correspond to their values within the ELF
file. We have to offset the absolute addresses by the load base address
to get the relative symbol that we need for disassembly.
Doing these as custom classes might be faster, especially when writing
them in SSE, but this would cause a lot of Code duplication and due to
the nature of constexprs and the intelligence of the compiler they might
be using SSE/MMX either way
Problem:
- `static` variables consume memory and sometimes are less
optimizable.
- `static const` variables can be `constexpr`, usually.
- `static` function-local variables require an initialization check
every time the function is run.
Solution:
- If a global `static` variable is only used in a single function then
move it into the function and make it non-`static` and `constexpr`.
- Make all global `static` variables `constexpr` instead of `const`.
- Change function-local `static const[expr]` variables to be just
`constexpr`.