Mirror of https://github.com/RGBCube/serenity, synced 2025-10-31 16:52:43 +00:00
			
		
		
		
	
		
			
				
	
	
		
## Name

futex - low-level synchronization primitive

## Synopsis

```c++
#include <serenity.h>

// Raw syscall.
int futex(uint32_t* userspace_address, int futex_op, uint32_t value, const struct timespec* timeout, uint32_t* userspace_address2, uint32_t value3);

// More convenient wrappers.
int futex_wait(uint32_t* userspace_address, uint32_t value, const struct timespec* abstime, int clockid, int process_shared);
int futex_wake(uint32_t* userspace_address, uint32_t count, int process_shared);
```

## Description

The `futex()` system call provides a low-level synchronization primitive,
essentially exposing the kernel's internal thread synchronization primitives
to userspace.

While the `futex()` API is powerful and generic, it is complex and cumbersome
to use, and notoriously tricky to use *correctly*. For this reason, it is not
intended to be used by application code directly, but rather to serve as
a building block for more specialized and easier-to-use synchronization
primitives implemented in userspace, such as mutexes and semaphores.
Specifically, the `futex()` API is designed to enable userspace synchronization
primitives to have a *fast path* that does not involve calling into the kernel
at all in the common uncontended case, avoiding the cost of a syscall
entirely.

*A futex* is a single 32-bit integer cell located anywhere in the address space
of a process (identified by its address), as well as an associated kernel-side
queue of waiting threads. The kernel-side resources associated with a futex are
created and destroyed implicitly when a futex is used; in other words, any
32-bit integer can be used as a futex without any specific setup, and a futex
on which no threads are waiting is no different from any other integer. The
kernel does not assign any meaning to the value of the futex integer; it is up
to userspace to make use of the value for its own logic.

The `futex()` API provides a number of *operations*, the most basic ones being
*waiting* and *waking*:

* `FUTEX_WAKE` / `futex_wake()`: wake up to `count` threads waiting on the
  futex (in the raw `futex()` syscall, `count` is passed as the `value`
  argument). The two most common values for `count` are 1 (wake a single
  thread) and `UINT32_MAX` (wake all threads).
* `FUTEX_WAIT` / `futex_wait()`: wait on the futex, but only if the current
  value of the futex integer matches the specified `value`. The value
  comparison and blocking are done atomically: if another thread changes the
  value before the calling thread starts waiting, the calling thread will not
  begin waiting at all, and the `futex_wait()` call will return `EAGAIN`
  immediately. A waiting thread may wake up spuriously, without a matching call
  to `futex_wake()`.
* `FUTEX_WAKE_BITSET`: like `FUTEX_WAKE`, but only consider waiting threads
  that have specified a matching bitset (passed in `value3`). Two bitsets match
  if their *bitwise and* is non-zero. A thread that has not specified a bitset
  is treated as having a bitset with all bits set (`FUTEX_BITSET_MATCH_ANY`,
  equal to `0xffffffff`).
* `FUTEX_WAIT_BITSET`: like `FUTEX_WAIT`, but the thread will only get woken by
  wake operations specifying a matching bitset.
* `FUTEX_REQUEUE`: wake up to `value` threads waiting on the futex, and requeue
  up to `value2` (passed instead of the `timeout` argument) of the remaining
  waiting threads to wait on another futex specified by `userspace_address2`,
  without waking them up. Waking and requeueing threads is done atomically.

  Requeueing threads without waking them up is useful to avoid "thundering
  herd" issues with synchronization primitives like condition variables, where
  multiple threads may wait for an event, but an event can only be handled by a
  single thread at a time.
* `FUTEX_CMP_REQUEUE`: like `FUTEX_REQUEUE`, but only if the current value of
  the futex integer matches the specified `value3`. The value comparison,
  waking and requeueing threads are all done atomically.
* `FUTEX_WAKE_OP`: modify the value of the futex specified by
  `userspace_address2`, wake up to `value` threads waiting on the futex, and
  optionally up to `value2` (passed instead of the `timeout` argument) threads
  waiting on the futex specified by `userspace_address2`.

  The details of this operation are not currently documented here; see the
  implementation.

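The wait and wake semantics listed above can be modelled entirely in userspace.
The following is a minimal sketch under stated assumptions, not the kernel
implementation: `ModelFutex`, `MODEL_BITSET_MATCH_ANY` and `bitsets_match()`
are hypothetical names, a standard mutex and condition variable stand in for
the kernel-side queue, the model never wakes spuriously, and it returns
`EAGAIN` directly as a value for simplicity. It shows that the value comparison
and the enqueueing happen under a single lock, and that a wake only considers
waiters whose bitset matches.

```c++
#include <cerrno>
#include <condition_variable>
#include <cstdint>
#include <list>
#include <mutex>

// Same value as documented for FUTEX_BITSET_MATCH_ANY; the name is hypothetical.
constexpr uint32_t MODEL_BITSET_MATCH_ANY = 0xffffffff;

// Two bitsets match if their bitwise AND is non-zero.
constexpr bool bitsets_match(uint32_t a, uint32_t b) { return (a & b) != 0; }

// Hypothetical userspace model of one futex: an integer cell plus a wait queue.
class ModelFutex {
private:
    struct Waiter {
        uint32_t bitset;
        bool woken;
    };

    std::mutex m_lock;
    std::condition_variable m_wakeup;
    std::list<Waiter> m_waiters;
    uint32_t m_value;

public:
    explicit ModelFutex(uint32_t initial) : m_value(initial) {}

    void store(uint32_t value)
    {
        std::lock_guard<std::mutex> lock(m_lock);
        m_value = value;
    }

    // Models FUTEX_WAIT_BITSET: returns 0 once woken, or EAGAIN if the
    // current value does not match `expected`.
    int wait(uint32_t expected, uint32_t bitset = MODEL_BITSET_MATCH_ANY)
    {
        std::unique_lock<std::mutex> lock(m_lock);
        // The comparison and the enqueueing happen under the same lock,
        // so no wake can slip in between them.
        if (m_value != expected)
            return EAGAIN;
        auto waiter = m_waiters.insert(m_waiters.end(), Waiter { bitset, false });
        m_wakeup.wait(lock, [&] { return waiter->woken; });
        m_waiters.erase(waiter);
        return 0;
    }

    // Models FUTEX_WAKE_BITSET: wakes up to `count` waiters with a matching
    // bitset and returns how many were woken.
    uint32_t wake(uint32_t count, uint32_t bitset = MODEL_BITSET_MATCH_ANY)
    {
        std::lock_guard<std::mutex> lock(m_lock);
        uint32_t woken = 0;
        for (auto& waiter : m_waiters) {
            if (woken == count)
                break;
            if (waiter.woken || !bitsets_match(waiter.bitset, bitset))
                continue;
            waiter.woken = true;
            ++woken;
        }
        if (woken > 0)
            m_wakeup.notify_all();
        return woken;
    }
};
```

With a `ModelFutex` constructed with the value 1, `wait(0)` returns `EAGAIN`
without blocking because the stored value does not match, and
`wake(UINT32_MAX)` returns 0 because no thread is waiting.
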
Additionally, the `FUTEX_PRIVATE_FLAG` flag can be bitwise OR'ed with one of
the *operation* values listed above. This flag restricts the call to only work
on other threads of the same process (as opposed to any threads in the system
that may have the same memory page mapped into their address space, possibly at
a different address), which enables additional optimizations in the syscall
implementation. The inverse of this flag is exposed as the `process_shared`
argument of the `futex_wait()` and `futex_wake()` wrapper functions.

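As an illustration of how the `process_shared` argument relates to the flag,
the sketch below builds a `futex_op` value from a base operation. All names are
hypothetical and the numeric values are placeholders chosen for the example;
real code should take the operation numbers and the flag from `<serenity.h>`.

```c++
// Placeholder values for illustration only (assumptions, not taken from
// <serenity.h>).
constexpr int MODEL_FUTEX_WAIT = 1;
constexpr int MODEL_FUTEX_WAKE = 2;
constexpr int MODEL_FUTEX_PRIVATE_FLAG = 1 << 7;

// Hypothetical helper: builds the futex_op argument for the raw syscall.
// process_shared is the inverse of the private flag.
constexpr int model_futex_op(int base_op, bool process_shared)
{
    return process_shared ? base_op : (base_op | MODEL_FUTEX_PRIVATE_FLAG);
}

// Hypothetical helper: strips the flag again to recover the base operation.
constexpr int model_base_op(int futex_op)
{
    return futex_op & ~MODEL_FUTEX_PRIVATE_FLAG;
}
```
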
## Return value

* `FUTEX_WAKE`, `FUTEX_WAKE_BITSET`, `FUTEX_WAKE_OP`: the number of waiting
  threads that were woken up, which may be 0.
* `FUTEX_WAIT`, `FUTEX_WAIT_BITSET`: 0 if the thread blocked and was then woken
  up, either by an explicit wake call or spuriously; an error otherwise.
* `FUTEX_REQUEUE`, `FUTEX_CMP_REQUEUE`: the total number of threads woken up
  and requeued.

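The combined count returned by the requeue operations can be illustrated with
a small model. This is a sketch only: `RequeueResult` and `model_requeue()` are
hypothetical names, the wait queues are plain containers of thread IDs, and
first-in-first-out ordering is an assumption of the model rather than a
documented guarantee.

```c++
#include <cstdint>
#include <deque>

// Hypothetical result type; the real call returns only the combined total.
struct RequeueResult {
    uint32_t woken;
    uint32_t requeued;
    uint32_t total() const { return woken + requeued; }
};

// Models FUTEX_REQUEUE on two wait queues: wakes up to `wake_count` waiters
// from `from`, then moves up to `requeue_count` of the remaining waiters to
// `to` without waking them.
inline RequeueResult model_requeue(std::deque<int>& from, std::deque<int>& to,
    uint32_t wake_count, uint32_t requeue_count)
{
    RequeueResult result { 0, 0 };
    while (result.woken < wake_count && !from.empty()) {
        from.pop_front(); // This waiter is woken and leaves the queue.
        ++result.woken;
    }
    while (result.requeued < requeue_count && !from.empty()) {
        to.push_back(from.front()); // This waiter keeps sleeping on `to`.
        from.pop_front();
        ++result.requeued;
    }
    return result;
}
```

For example, with five waiters on the first queue, waking 1 and requeueing up
to 2 yields a total of 3 and leaves two waiters on the first queue.
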
## Errors

* `EAGAIN`: for wait operations, did not begin waiting, because the futex value
  has already been changed.
* `ETIMEDOUT`: for wait operations with a timeout, timed out.
* `EFAULT`: the specified futex address is invalid.
* `ENOSYS`: `FUTEX_CLOCK_REALTIME` was specified, but the operation is not
  `FUTEX_WAIT` or `FUTEX_WAIT_BITSET`.
* `EINVAL`: the arithmetic-logical operation for `FUTEX_WAKE_OP` is invalid.

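A caller of a wait operation typically needs to distinguish only three cases.
The sketch below shows one possible classification; `WaitOutcome` and
`classify_wait_error()` are hypothetical names, and how the error code is
obtained from the call is deliberately not modelled here.

```c++
#include <cerrno>

enum class WaitOutcome {
    RecheckState, // Re-read the futex value and decide whether to wait again.
    TimedOut,     // The timeout expired.
    Failed,       // Any other error, such as EFAULT.
};

// Hypothetical helper. `error` is 0 on success, or one of the error codes
// listed above.
inline WaitOutcome classify_wait_error(int error)
{
    switch (error) {
    case 0:      // Woken up, possibly spuriously.
    case EAGAIN: // The value had already changed, so no waiting took place.
        return WaitOutcome::RecheckState;
    case ETIMEDOUT:
        return WaitOutcome::TimedOut;
    default:
        return WaitOutcome::Failed;
    }
}
```

Treating 0 and `EAGAIN` alike reflects that in both cases the caller has to
re-read the futex value before deciding what to do next.
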
## Examples

The following program demonstrates how futexes can be used to implement a
simple "event" synchronization primitive. An event has a boolean state: it is
either *set* or *unset*, and is initially unset. The two operations on an
event are *waiting* until it is set, and *setting* it (which wakes up any
threads that were waiting for the event to get set).

Such a synchronization primitive could be used, for example, to notify threads
that are waiting for another thread to perform some sort of complex
initialization.

The implementation features two fast paths: both setting an event that no
thread is waiting on, and trying to wait on an event that has already been set,
are performed entirely in userspace without calling into the kernel. For this
to work, the value of the futex integer is used to track both the state of the
event (whether it has been set) and whether any threads are waiting on it.

```c++
#include <AK/Atomic.h>
#include <serenity.h>

class Event {
private:
    enum State : u32 {
        UnsetNoWaiters,
        UnsetWithWaiters,
        Set,
    };

    AK::Atomic<State> m_state { UnsetNoWaiters };

    u32* state_futex_ptr() { return reinterpret_cast<u32*>(const_cast<State*>(m_state.ptr())); }

public:
    void set()
    {
        State previous_state = m_state.exchange(Set, AK::memory_order_release);
        // If there was anyone waiting, wake them all up.
        // Fast path: no one was waiting, so we're done.
        if (previous_state == UnsetWithWaiters)
            futex_wake(state_futex_ptr(), UINT32_MAX, false);
    }

    void wait()
    {
        // If the state is UnsetNoWaiters, set it to UnsetWithWaiters.
        State expected_state = UnsetNoWaiters;
        bool have_exchanged = m_state.compare_exchange_strong(
            expected_state, UnsetWithWaiters,
            AK::memory_order_acquire);
        if (have_exchanged)
            expected_state = UnsetWithWaiters;

        // We need to check the state in a loop and not just once
        // because of the possibility of spurious wakeups.
        // Fast path: if the state was already Set, we're done.
        while (expected_state != Set) {
            futex_wait(state_futex_ptr(), expected_state, nullptr, 0, false);
            expected_state = m_state.load(AK::memory_order_acquire);
        }
    }
};
```

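To see when the two fast paths apply, the state machine of the event can be
traced without blocking. The sketch below is a hypothetical, portable variant
(`TracedEvent` is not part of any library): it uses `std::atomic` in place of
`AK::Atomic`, and instead of calling `futex_wait()` or `futex_wake()` it only
counts the points at which those calls would be made.

```c++
#include <atomic>
#include <cstdint>

// Hypothetical, non-blocking trace of the Event state machine.
class TracedEvent {
public:
    enum State : uint32_t {
        UnsetNoWaiters,
        UnsetWithWaiters,
        Set,
    };

    int wait_calls = 0; // Times wait_step() would have entered the kernel.
    int wake_calls = 0; // Times set() would have entered the kernel.

    void set()
    {
        State previous = m_state.exchange(Set, std::memory_order_release);
        if (previous == UnsetWithWaiters)
            ++wake_calls; // Slow path: futex_wake() would be called here.
    }

    // One iteration of wait(): returns true if the event is already set,
    // false if the caller would have to block in futex_wait().
    bool wait_step()
    {
        State expected = UnsetNoWaiters;
        if (m_state.compare_exchange_strong(expected, UnsetWithWaiters, std::memory_order_acquire))
            expected = UnsetWithWaiters;
        if (expected == Set)
            return true; // Fast path: nothing to wait for.
        ++wait_calls; // Slow path: futex_wait() would be called here.
        return false;
    }

    State state() const { return m_state.load(std::memory_order_acquire); }

private:
    std::atomic<State> m_state { UnsetNoWaiters };
};
```

Calling `set()` followed by `wait_step()` on a fresh `TracedEvent` leaves both
counters at zero, which corresponds to the two fast paths. Calling
`wait_step()` first moves the state to `UnsetWithWaiters`, so the following
`set()` takes the slow path and increments `wake_calls`.
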
## History

The name "futex" stands for "fast userspace mutex".

The `futex()` system call originally appeared in Linux. Since then, many other
kernels have implemented support for futex-like operations, under various
names, in particular:
* Darwin (XNU) has the private `ulock_wait()` and `ulock_wake()` APIs;
* Windows (NT) has `WaitOnAddress()`, `WakeByAddressSingle()` and
  `WakeByAddressAll()`;
* FreeBSD and DragonFly BSD have `umtx`;
* OpenBSD has a Linux-like `futex()`;
* GNU Hurd has `gsync_wait()`, `gsync_wake()`, and `gsync_requeue()`.

## Further reading

* [Futexes Are Tricky](https://akkadia.org/drepper/futex.pdf) by Ulrich Drepper
* [Locking in WebKit](https://webkit.org/blog/6161/locking-in-webkit/) by Filip Pizlo
