From 32de3ddd33a887c91e845223bd50679f43ed6666 Mon Sep 17 00:00:00 2001
From: Sergey Bugaev
Date: Sun, 13 Aug 2023 17:51:08 +0300
Subject: [PATCH] Base: Document futex(2) :^)

---
 Base/usr/share/man/man2/futex.md | 193 +++++++++++++++++++++++++++++++
 1 file changed, 193 insertions(+)
 create mode 100644 Base/usr/share/man/man2/futex.md

diff --git a/Base/usr/share/man/man2/futex.md b/Base/usr/share/man/man2/futex.md
new file mode 100644
index 0000000000..b53f832a37
--- /dev/null
+++ b/Base/usr/share/man/man2/futex.md
@@ -0,0 +1,193 @@
+## Name
+
+futex - low-level synchronization primitive
+
+## Synopsis
+
+```c++
+#include <serenity.h>
+
+// Raw syscall.
+int futex(uint32_t* userspace_address, int futex_op, uint32_t value, const struct timespec* timeout, uint32_t* userspace_address2, uint32_t value3);
+
+// More convenient wrappers.
+int futex_wait(uint32_t* userspace_address, uint32_t value, const struct timespec* abstime, int clockid, int process_shared);
+int futex_wake(uint32_t* userspace_address, uint32_t count, int process_shared);
+```
+
+## Description
+
+The `futex()` system call provides a low-level synchronization primitive,
+essentially exposing the kernel's internal thread synchronization primitives
+to userspace.
+
+While the `futex()` API is powerful and generic, it is complex and cumbersome
+to use, and notoriously tricky to use *correctly*. For this reason, it is not
+intended to be used by application code directly, but rather to serve as
+a building block for more specialized and easier to use synchronization
+primitives implemented in userspace, such as mutexes and semaphores.
+Specifically, the `futex()` API is designed to enable userspace synchronization
+primitives to have a *fast path* that does not involve calling into the kernel
+at all in the common uncontended case, avoiding the cost of making a syscall
+completely.
+
+*A futex* is a single 32-bit integer cell located anywhere in the address space
+of a process (identified by its address), as well as an associated kernel-side
+queue of waiting threads. The kernel-side resources associated with a futex are
+created and destroyed implicitly when a futex is used; in other words, any
+32-bit integer can be used as a futex without any specific setup, and a futex
+on which no threads are waiting is no different to any other integer. The
+kernel does not assign any meaning to the value of the futex integer; it is up
+to userspace to make use of the value for its own logic.
+
+The `futex()` API provides a number of *operations*, the most basic ones being
+_waiting_ and _waking_:
+
+* `FUTEX_WAKE` / `futex_wake()`: wake up to `count` threads waiting on the
+  futex (in the raw `futex()` syscall, `count` is passed as the `value`
+  argument). The two most common values for `count` are 1 (wake a single
+  thread) and `UINT32_MAX` (wake all threads).
+* `FUTEX_WAIT` / `futex_wait()`: wait on the futex, but only if the current
+  value of the futex integer matches the specified `value`. The value
+  comparison and blocking are done atomically: if another thread changes the
+  value before the calling thread starts waiting, the calling thread will not
+  begin waiting at all, and the `futex_wait()` call will return `EAGAIN`
+  immediately. A waiting thread may wake up spuriously, without a matching call
+  to `futex_wake()`.
+* `FUTEX_WAKE_BITSET`: like `FUTEX_WAKE`, but only consider waiting threads
+  that have specified a matching bitset (passed in `value3`). Two bitsets match
+  if their *bitwise and* is non-zero. A thread that has not specified a bitset
+  is treated as having a bitset with all bits set (`FUTEX_BITSET_MATCH_ANY`,
+  equal to `0xffffffff`).
+* `FUTEX_WAIT_BITSET`: like `FUTEX_WAIT`, but the thread will only get woken by
+  wake operations specifying a matching bitset.
+* `FUTEX_REQUEUE`: wake up to `value` threads waiting on the futex, and requeue
+  up to `value2` (passed instead of the `timeout` argument) of the remaining
+  waiting threads to wait on another futex specified by `userspace_address2`,
+  without waking them up. Waking and requeueing threads is done atomically.
+
+  Requeueing threads without waking them up is useful to avoid "thundering
+  herd" issues with synchronization primitives like condition variables, where
+  multiple threads may wait for an event, but an event can only be handled by a
+  single thread at a time.
+* `FUTEX_CMP_REQUEUE`: like `FUTEX_REQUEUE`, but only if the current value of
+  the futex integer matches the specified `value3`. The value comparison,
+  waking and requeueing threads are all done atomically.
+* `FUTEX_WAKE_OP`: modify the value of the futex specified by
+  `userspace_address2`, wake up to `value` threads waiting on the futex, and
+  optionally up to `value2` (passed instead of the `timeout` argument) threads
+  waiting on the futex specified by `userspace_address2`.
+
+  The details of this operation are not currently documented here; see the
+  implementation for details.
+
+Additionally, the `FUTEX_PRIVATE_FLAG` flag can be *or*'ed in with one of the
+*operation* values listed above. This flag restricts the call to only work on
+other threads of the same process (as opposed to any threads in the system that
+may have the same memory page mapped into their address space, possibly at a
+different address), which enables additional optimizations in the syscall
+implementation. The inverse of this flag is exposed as the `process_shared`
+argument in `futex_wait()` and `futex_wake()` wrapper functions.
+
+## Return value
+
+* `FUTEX_WAKE`, `FUTEX_WAKE_BITSET`, `FUTEX_WAKE_OP`: the number of the waiting
+  threads that have been woken up, which may be 0 or a positive number.
+* `FUTEX_WAIT`, `FUTEX_WAIT_BITSET`: 0 if blocked and got woken up by an
+  explicit wake call or woke up spuriously, an error otherwise.
+* `FUTEX_REQUEUE`, `FUTEX_CMP_REQUEUE`: the total number of threads woken up
+  and requeued.
+
+## Errors
+
+* `EAGAIN`: for wait operations, did not begin waiting, because the futex value
+  has already been changed.
+* `ETIMEDOUT`: for wait operations with a timeout, timed out.
+* `EFAULT`: the specified futex address is invalid.
+* `ENOSYS`: `FUTEX_CLOCK_REALTIME` was specified, but the operation is not
+  `FUTEX_WAIT` or `FUTEX_WAIT_BITSET`.
+* `EINVAL`: The arithmetic-logical operation for `FUTEX_WAKE_OP` is invalid.
+
+## Examples
+
+The following program demonstrates how futexes can be used to implement a
+simple "event" synchronization primitive. An event has a boolean state: it can
+be *set* or *unset*; the initial state being unset. The two operations on an
+event are *waiting* until it is set, and *setting* it (which wakes up any
+threads that were waiting for the event to get set).
+
+Such a synchronization primitive could be used, for example, to notify threads
+that are waiting for another thread to perform some sort of complex
+initialization.
+
+The implementation features two fast paths: both setting an event that no
+thread is waiting on, and trying to wait on an event that has already been set,
+are performed entirely in userspace without calling into the kernel. For this
+to work, the value of the futex integer is used to track both the state of the
+event (whether it has been set) and whether any threads are waiting on it.
+
+```c++
+#include <AK/Atomic.h>
+#include <serenity.h>
+
+class Event {
+private:
+    enum State : u32 {
+        UnsetNoWaiters,
+        UnsetWithWaiters,
+        Set,
+    };
+
+    AK::Atomic<State> m_state { UnsetNoWaiters };
+
+    u32* state_futex_ptr() { return reinterpret_cast<u32*>(const_cast<State*>(m_state.ptr())); }
+
+public:
+    void set()
+    {
+        State previous_state = m_state.exchange(Set, AK::memory_order_release);
+        // If there was anyone waiting, wake them all up.
+        // Fast path: no one was waiting, so we're done.
+        if (previous_state == UnsetWithWaiters)
+            futex_wake(state_futex_ptr(), UINT32_MAX, false);
+    }
+
+    void wait()
+    {
+        // If the state is UnsetNoWaiters, set it to UnsetWithWaiters.
+        State expected_state = UnsetNoWaiters;
+        bool have_exchanged = m_state.compare_exchange_strong(
+            expected_state, UnsetWithWaiters,
+            AK::memory_order_acquire);
+        if (have_exchanged)
+            expected_state = UnsetWithWaiters;
+
+        // We need to check the state in a loop and not just once
+        // because of the possibility of spurious wakeups.
+        // Fast path: if the state was already Set, we're done.
+        while (expected_state != Set) {
+            futex_wait(state_futex_ptr(), expected_state, nullptr, 0, false);
+            expected_state = m_state.load(AK::memory_order_acquire);
+        }
+    }
+};
+```
+
+## History
+
+The name "futex" stands for "fast userspace mutex".
+
+The `futex()` system call originally appeared in Linux. Since then, many other
+kernels implemented support for futex-like operations, under various names, in
+particular:
+* Darwin (XNU) has private `ulock_wait()` and `ulock_wake()` API;
+* Windows (NT) apparently has `WaitOnAddress()`, `WakeByAddressSingle()` and
+  `WakeByAddressAll()`;
+* FreeBSD and DragonFly BSD have `umtx`;
+* OpenBSD has Linux-like `futex()`;
+* GNU Hurd has `gsync_wait()`, `gsync_wake()`, and `gsync_requeue()`.
+
+## Further reading
+
+* [Futexes Are Tricky](https://akkadia.org/drepper/futex.pdf) by Ulrich Drepper
+* [Locking in WebKit](https://webkit.org/blog/6161/locking-in-webkit/) by Filip Pizlo