Doesn't use them in libc headers so that those don't have to pull in
AK/Platform.h.
AK_COMPILER_GCC is set _only_ for gcc, not for clang too. (__GNUC__ is
defined in clang builds as well.) Using AK_COMPILER_GCC simplifies
things some.
AK_COMPILER_CLANG isn't as much of a win, other than that it's
consistent with AK_COMPILER_GCC.
This commit addresses the following shortcomings of our current, simple
and elegant memset function:
- REP STOSB/STOSQ has considerable startup overhead, it's impractical to
use for smaller sizes.
- Up until very recently, AMD CPUs didn't have support for "Enhanced REP
MOVSB/STOSB", so it performed pretty poorly on them.
With this commit applied, I could measure a ~5% decrease in `test-js`'s
runtime when I used qemu's TCG backend. The implementation is based on
the following article from Microsoft:
https://msrc-blog.microsoft.com/2021/01/11/building-faster-amd64-memset-routines
Two versions of the routine are implemented: one that uses the ERMS
extension mentioned above, and one that performs plain SSE stores. The
version appropriate for the CPU is selected at load time using an IFUNC.