This should allow us to eventually properly saturate high-bandwidth
network links when using TCP, once other nonoptimal parts of our
network stack are improved.
Previously we would incorrectly handle the (somewhat uncommon) case of
binding and then separately connecting a tcp socket to a server, as we
would register the socket during the manual bind(2) in the sockets by
tuple table, but our effective tuple would then change as the result of
the connect updating our target peer address. This would result in the
the entry not being removed from the table on destruction, which could
lead to a UAF.
We now make sure to update the table entry if needed during connects.
Currently, ephemeral port allocation is handled by the
allocate_local_port_if_needed() and protocol_allocate_local_port()
methods. Actually binding the socket to an address (which means
inserting the socket/address pair into a global map) is performed either
in protocol_allocate_local_port() (for ephemeral ports) or in
protocol_listen() (for non-ephemeral ports); the latter will fail with
EADDRINUSE if the address is already used by an existing pair present in
the map.
There used to be a bug where for listen() without an explicit bind(),
the port allocation would conflict with itself: first an ephemeral port
would get allocated and inserted into the map, and then
protocol_listen() would check again for the port being free, find the
just-created map entry, and error out. This was fixed in commit
01e5af487f by passing an additional flag
did_allocate_port into protocol_listen() which specifies whether the
port was just allocated, and skipping the check in protocol_listen() if
the flag is set.
However, this only helps if the socket is bound to an ephemeral port
inside of this very listen() call. But calling bind(sin_port = 0) from
userspace should succeed and bind to an allocated ephemeral port, in the
same was as using an unbound socket for connect() does. The port number
can then be retrieved from userspace by calling getsockname (), and it
should be possible to either connect() or listen() on this socket,
keeping the allocated port number. Also, calling bind() when already
bound (either explicitly or implicitly) should always result in EINVAL.
To untangle this, introduce an explicit m_bound state in IPv4Socket,
just like LocalSocket has already. Once a socket is bound, further
attempt to bind it fail. Some operations cause the socket to implicitly
get bound to an (ephemeral) address; this is implemented by the new
ensure_bound() method. The protocol_allocate_local_port() method is
gone; it is now up to a protocol to assign a port to the socket inside
protocol_bind() if it finds that the socket has local_port() == 0.
protocol_bind() is now called in more cases, such as inside listen() if
the socket wasn't bound before that.
During receive_tcp_packet(), we now set m_send_window_size for the
socket if it is different from the default.
This removes one FIXME from TCPSocket.h.
"Wherever applicable" = most places, actually :^), especially for
networking and filesystem timestamps.
This includes changes to unzip, which uses DOSPackedTime, since that is
changed for the FAT file systems.
That's what this class really is; in fact that's what the first line of
the comment says it is.
This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
There is a big mix of LockRefPtrs all over the Networking subsystem, as
well as lots of room for improvements with our locking patterns, which
this commit will not pursue, but will give a good start for such work.
To deal with this situation, we change the following things:
- Creating instances of NetworkAdapter should always yield a non-locking
NonnullRefPtr. Acquiring an instance from the NetworkingManagement
should give a simple RefPtr,as giving LockRefPtr does not really
protect from concurrency problems in such case.
- Since NetworkingManagement works with normal RefPtrs we should
protect all instances of RefPtr<NetworkAdapter> with SpinlockProtected
to ensure references are gone unexpectedly.
- Protect the so_error class member with a proper spinlock. This happens
to be important because the clear_so_error() method lacked any proper
locking measures. It also helps preventing a possible TOCTOU when we
might do a more fine-grained locking in the Socket code, so this could
be definitely a start for this.
- Change unnecessary LockRefPtr<PacketWithTimestamp> in the structure
of OutgoingPacket to a simple RefPtr<PacketWithTimestamp> as the whole
list should be MutexProtected.
This was mostly straightforward, as all the storage locations are
guarded by some related mutex.
The use of old-school associated mutexes instead of MutexProtected
is unfortunate, but the process to modernize such code is ongoing.
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:
- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable
This patch renames the Kernel classes so that they can coexist with
the original AK classes:
- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable
The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
This argument is always set to description.is_blocking(), but
description is also given as a separate argument, so there's no point
to piping it through separately.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
Use the same trick as SlavePTY and override unref() to provide safe
removal from the sockets_by_tuple table when destroying a TCPSocket.
This should fix the TCPSocket::from_tuple() flake seen on CI.
Before this commit, we only checked the receive buffer on the socket,
which is unused on datagram streams. Now we return the actual size of
the datagram without the protocol headers, which required the protocol
to tell us what the size of the payload is.
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.
Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
Previously there was a mix of returning plain strings and returning
explicit string views using `operator ""sv`. This change switches them
all to standardized on `operator ""sv` as it avoids a call to strlen.
We don't really have anywhere to propagate the error in NetworkTask at
the moment, since it runs in its own kernel thread and has no direct
userspace caller.
This commit moves the KResult and KResultOr objects to Kernel/API to
signify that they may now be freely used by userspace code at points
where a syscall-related error result is to be expected. It also exposes
KResult and KResultOr to the global namespace to make it nicer to use
for userspace code.
Note: TCPSocket::create_client() has a dubious locking process where
the sockets by tuple table is first shared lock to check if the socket
exists and bail out if it does, then unlocks, then exclusively locks to
add the tuple. There could be a race condition where two client
creation requests for the same tuple happen at the same time and both
cleared the shared lock check. When in doubt, lock exclusively the
whole time.
The IPv4Socket requires a DoubleBuffer for storage of any data it
received on the socket. However it was previously using the default
constructor which can not observe allocation failure. Address this by
plumbing the receive buffer through the various derived classes.
Previously there was no way for Serenity to send a packet without an
established socket connection, and there was no way to appropriately
respond to a SYN packet on a non-listening port. This patch will respond
to any non-established socket attempts with the appropraite RST/ACK,
letting the client know to close the connection.
Previously we'd just dump those packets into the network adapter's
send queue and hope for the best. Instead we should wait until the peer
has sent TCP ACK packets.
Ideally this would parse the TCP window size option from the SYN or
SYN|ACK packet, but for now we just assume the window size is 64 kB.
Previously we'd allocate buffers when sending packets. This patch
avoids these allocations by using the NetworkAdapter's packet queue.
At the same time this also avoids copying partially constructed
packets in order to prepend Ethernet and/or IPv4 headers. It also
properly truncates UDP and raw IP packets.
Previously TCPSocket::send_tcp_packet() would try to send TCP packets
which matched whatever size the userspace program specified. We'd try to
break those packets up into smaller fragments, however a much better
approach is to limit TCP packets to the maximum segment size and
avoid fragmentation altogether.
Previously we didn't retransmit lost TCP packets which would cause
connections to hang if packets were lost. Also we now time out
TCP connections after a number of retransmission attempts.
Note that the changes to IPv4Socket::create are unfortunately needed as
the return type of TCPSocket::create and IPv4Socket::create don't match.
- KResultOr<NonnullRefPtr<TcpSocket>>>
vs
- KResultOr<NonnullRefPtr<Socket>>>
To handle this we are forced to manually decompose the KResultOr<T> and
return the value() and error() separately.
When we receive a TCP packet with a sequence number that is not what
we expected we have lost one or more packets. We can signal this to
the sender by sending a TCP ACK with the previous ack number so that
they can resend the missing TCP fragments.
When the MSS option header is missing the default maximum segment
size is 536 which results in lots of very small TCP packets that
NetworkTask has to handle.
This adds the MSS option header to outbound TCP SYN packets and
sets it to an appropriate value depending on the interface's MTU.
Note that we do not currently do path MTU discovery so this could
cause problems when hops don't fragment packets properly.