tl;dr: This fixes deadlocks that would occur in most applications in
Serenity when SMP was enabled.
As an optimization to `pthread_mutex_unlock()`, if only one thread
holds a mutex lock, it will avoid calling `futex()` to wake, since no
threads are actually waiting in `futex()`.
If a thread manages to synchronously unlock and relock the same mutex,
the state will be set to indicate that a wake is not needed, regardless
of whether any other threads are waiting for the lock. This should be
fine, as a thread that is waiting will set the mutex to needing a wake
and check if it is unlocked to unblock itself in the same atomic
operation.
However, when `pthread_mutex_lock()` was called with only one thread
holding the mutex lock, instead of telling `futex()` to atomically
sleep the thread if the state is set to indicate that a wake is
needed, the first wait would check if the state was set to indicate a
wake was _not_ needed. This means it is possible for the call to
`futex()` to wait when any subsequent call to `pthread_mutex_unlock()`
would not call `futex()` to wake, causing it to wait forever.
After that, any other thread that tries to take the lock will see that
the lock is taken and also `futex()` wait. Despite the fact that these
other threads would set the state to needing a wake, there will be no
unblocked thread holding the lock to actually wake them.
By making it wait only if the state indicates to other threads that a
wake is needed, heavily contended mutexes should no longer cause
deadlocks. Most applications would encounter these deadlocks due to the
mutex used by `malloc()`, some sooner than others. The worst offenders
(other than Ladybird) were most likely VideoPlayer and SoundPlayer.