When obeying FD_CLOEXEC, we don't need to explicitly call close() on
all the FileDescriptions. We can just clear them out from the process
fd table. ~FileDescription() will call close() anyway.
This fixes an issue where TelnetServer would shut down accepted sockets
when exec'ing a shell for them. Since the parent process still has the
socket open, we should not force-close it. Just let go.
This finally takes care of the kind-of excessive boilerplate code that were the
ctype adapters. On the other hand, I had to link `LibC/ctype.cpp` to the Kernel
(for `AK/JsonParser.cpp` and `AK/Format.cpp`). The previous commit actually makes
sense now: the `string.h` includes in `ctype.{h,cpp}` would require to link more LibC
stuff to the Kernel when it only needs the `_ctype_` array of `ctype.cpp`, and there
wasn't any string stuff used in ctype.
Instead of all this I could have put static derivatives of `is_any_of()` in the
concerned AK files, however that would have meant more boilerplate and workarounds;
so I went for the Kernel approach.
Similar to Process, we need to make Thread refcounted. This will solve
problems that will appear once we schedule threads on more than one
processor. This allows us to hold onto threads without necessarily
holding the scheduler lock for the entire duration.
The thread joining logic hadn't been updated to account for the subtle
differences introduced by software context switching. This fixes several
race conditions related to thread destruction and joining, as well as
finalization which did not properly account for detached state and the
fact that threads can be joined after termination as long as they're not
detached.
Fixes#3596
This function is not avaliable in the kernel.
In the future it would be nice to have some sort of <charconv> header
that does this for all integer types and then call it in strtoull and et
cetera.
The difference would be that this function say 'from_chars' would return
an Optional and not just interpret anything invalid as zero.
We need to assert if interrupts are not disabled when changing the
interrupt number of an interrupt handler.
Before this fix, any change like this would lead to a crash,
because we are using InterruptDisabler in IRQHandler::change_irq_number.
First of all, this fixes a dumb info leak where we'd write kernel heap
addresses (StringImpl*) into userspace memory when reading a watcher.
Instead of trying to pass names to userspace, we now simply pass the
child inode index. Nothing in userspace makes use of this yet anyway,
so it's not like we're breaking anything. We'll see how this evolves.
When SO_TIMESTAMP is set as an option on a SOCK_DGRAM socket, then
recvmsg() will return a SCM_TIMESTAMP control message that
contains a struct timeval with the system time that was current
when the socket was received.
Since the receiving socket isn't yet known at packet receive time,
keep timestamps for all packets.
This is useful for keeping statistics about in-kernel queue latencies
in the future, and it can be used to implement SO_TIMESTAMP.
The implementation only supports a single iovec for now.
Some might say having more than one iovec is the main point of
recvmsg() and sendmsg(), but I'm interested in the control message
bits.
There are plenty of places in the kernel that aren't
checking if they actually got their allocation.
This fixes some of them, but definitely not all.
Fixes#3390Fixes#3391
Also, let's make find_one_free_page() return nullptr
if it doesn't get a free index. This stops the kernel
crashing when out of memory and allows memory purging
to take place again.
Fixes#3487
Before e06362de9487806df92cf2360a42d3eed905b6bf this was a sneaky buffer
overflow. BufferStream did not do range checking and continued to write
past the allocated buffer (the size of which was controlled by the
user.)
The issue surfaced after my changes because OutputMemoryStream does
range checking.
Not sure how exploitable that bug was, directory entries are somewhat
controllable by the user but the buffer was on the heap, so exploiting
that should be tough.
Fixes two flaws in the thread donation logic: Scheduler::donate_to
would never really donate, but just trigger a deferred yield. And
that deferred yield never actually donated to the beneficiary.
So, when we can't immediately donate, we need to save the beneficiary
and use this information as soon as we can perform the deferred
context switch.
Fixes#3495
Fix gracefully failing these calls if used within IRQ handlers. If we're
handling IRQs, we need to handle these failures first, because we can't
really resolve page faults in a meaningful way. But if we know that it
was one of these functions that failed, then we can gracefully handle
the situation.
This solves a crash where the Scheduler attempts to produce backtraces
in the timer irq, some of which cause faults.
Fixes#3492
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
These special functions can be used to safely copy/set memory or
determine the length of a string, e.g. provided by user mode.
In the event of a page fault, safe_memcpy/safe_memset will return
false and safe_strnlen will return -1.
I decided to modify MappedROM.h because all other entried in Forward.h
are also classes, and this is visually more pleasing.
Other than that, it just doesn't make any difference which way we resolve
the conflicts.
Since "rings" typically refer to code execution and user processes
can also execute in ring 0, rename these functions to more accurately
describe what they mean: kernel processes and user processes.
The ring is determined based on the CS register. This fixes crashes
being handled as ring 3 crashes even though EIP/CS clearly showed
that the crash happened in the kernel.
In addition to being the proper POSIX etiquette, it seems like a bad idea
for issues like the one seen in #3428 to result in a kernel crash. This patch
replaces the current behavior of failing on insufficient buffer size to truncating
SOCK_RAW messages to the buffer size. This will have to change if/when MSG_PEEK
is implemented, but for now this behavior is more compliant and logical than
just bailing.
By being a bit too greedy and only allocating how much we need for
the failing allocation, we can end up in an infinite loop trying
to expand the heap further. That's because there are other allocations
(e.g. logging, vmobjects, regions, ...) that happen before we finally
retry the failed allocation request.
Also fix allocating in page size increments, which lead to an assertion
when the heap had to grow more than the 1 MiB backup.
Rather than trying to find a contiguous set of bits of size 1, just
find one single available bit using a hint.
Also, try to randomize returned physical pages a bit by placing them
into a 256 entry queue rather than making them available immediately.
Then, once the queue is filled, pick a random one, make it available
again and use that slot for the latest page to be returned.
In c3d231616c we added the atomic variable
m_have_any_unmasked_pending_signals tracking the state of pending signals.
Add helper functions that automatically update this variable as needed.
We need to wait until a thread is fully set up and ready for running
before attempting to deliver a signal. Otherwise we may not have a
user stack yet.
Also, remove the Skip0SchedulerPasses and Skip1SchedulerPass thread
states that we don't really need anymore with software context switching.
Fixes the kernel crash reported in #3419