In e7fb70b05, regular kmalloc was changed to return nullptr on
allocation failure instead of crashing. The `kmalloc_aligned_cxx`
wrapper used by the aligned operator new should do the same.
Enable the LOCK_DEBUG functionality for these new APIs, as it looks
like we want to move the whole system to use this in the not so distant
future. :^)
The LOCK_DEBUG conditional code is pretty ugly for a feature that we
only use rarely. We can remove a significant amount of this code by
utilizing a zero sized fake type when not building in LOCK_DEBUG mode.
This lets us keep the same API, but just let the compiler optimize it
away when don't actually care about the location the caller came from.
Calling error() on KResult is a mistake I made in 7ba991dc37, so
instead of doing that, which triggers an assertion if an error occured,
in Inode::read_entire method with VERIFY(nread <= sizeof(buffer)), we
really should just return the KResult and not to call error() on it.
When TCP sockets successfully establish a connection, any SO_ERROR
should be cleared back to success. For example, SO_ERROR gets set to
EINPROGRESS on asynchronous connect calls and should be cleared when
the socket moves to the Established state.
This allows for commands like netstat to reference /proc/net and
identify a connection's owning process. Process information is limited
to superusers and user owned processes.
Now that we have a significant amount of code paths handling OOM, lets
enable kmalloc and friends to actually return nullptr. This way we can
start stressing these paths and validating all of they work as expected.
The implementation uses try_copy_kstring_from_user to allocate a kernel
string using, but does not use the length of the resulting string.
The size parameter to the syscall is untrusted, as try copy kstring will
attempt to perform a `safe_strlen(..)` on the user mode string and use
that value for the allocated length of the KString instead. The bug is
that we are printing the kstring, but with the usermode size argument.
During fuzzing this resulted in us walking off the end of the allocated
KString buffer printing garbage (or any kernel data!), until we stumbled
in to the KSym region and hit a fatal page fault.
This is technically a kernel information disclosure, but (un)fortunately
the disclosure only happens to the Bochs debug port, and or the serial
port if serial debugging is enabled. As far as I can tell it's not
actually possible for an untrusted attacker to use this to do something
nefarious, as they would need access to the host. If they have host
access then they can already do much worse things :^).
The only two paths for copying strings in the kernel should be going
through the existing Userspace<char const*>, or StringArgument methods.
Lets enforce this by removing the option for using the raw cstring APIs
that were previously available.
Since the InodeIndex encapsulates a 64 bit value, it is correct to
ensure that the Kernel is exposing the entire value and the LibC is
aware of it.
This commit requires an entire re-compile because it's essentially a
change in the Kernel ABI, together with a corresponding change in LibC.
The compiler can re-order the structure (class) members if that's
necessary, so if we make Process to inherit from ProcFSExposedComponent,
even if the declaration is to inherit first from ProcessBase, then from
ProcFSExposedComponent and last from Weakable<Process>, the members of
class ProcFSExposedComponent (including the Ref-counted parts) are the
first members of the Process class.
This problem made it impossible to safely use the current toggling
method with the write-protection bit on the ProcessBase members, so
instead of inheriting from it, we make its members the last ones in the
Process class so we can safely locate and modify the corresponding page
write protection bit of these values.
We make sure that the Process class doesn't expand beyond 8192 bytes and
the protected values are always aligned on a page boundary.
If you want to record perf events, just enable profiling. This allows us
to add random perf events to programs without littering the file system
with perfcore files.
Making userspace provide a global string ID was silly, and made the API
extremely difficult to use correctly in a global profiling context.
Instead, simply make the kernel do the string ID allocation for us.
This also allows us to convert the string storage to a Vector in the
kernel (and an array in the JSON profile data.)
This syscall allows userspace to register a keyed string that appears in
a new "strings" JSON object in profile output.
This will be used to add custom strings to profile signposts. :^)
We have to disable interrupts before capturing the current Processor*,
or we risk storing the wrong one if we get preempted and resume on a
different CPU.
Caught by the VERIFY in RecursiveSpinLock::unlock()
Currently, to append a UTF-16 view to a StringBuilder, callers must
first convert the view to UTF-8 and then append the copy. Add a UTF-16
overload so callers do not need to hold an entire copy in memory.
This allows tracing the syscalls made by a thread through the kernel's
performance event framework, which is similar in principle to strace.
Currently, this merely logs a stack backtrace to the current thread's
performance event buffer whenever a syscall is made, if profiling is
enabled. Future improvements could include tracing the arguments and
the return value, for example.
Previously, non-blocking read operations on pipes returned EOF instead
of EAGAIN and non-blocking write operations blocked. This fixes
pkgsrc's bmake "eof on job pipe!" error message when running in
non-compatibility mode.
As pointed out by 8infy, this mechanism is racy:
WRITER:
1. ++update1;
2. write_data();
3. ++update2;
READER:
1. do { auto saved = update1;
2. read_data();
3. } while (saved != update2);
The following sequence can lead to a bogus/partial read:
R1 R2 R3
W1 W2 W3
We close this race by incrementing the second update counter first:
WRITER:
1. ++update2;
2. write_data();
3. ++update1;
This fixes the placeholder stub for the SO_ERROR via getsockopt. It
leverages the m_so_error value that each socket maintains. The SO_ERROR
option obtains and then clears this field, which is useful when checking
for errors that occur between socket calls. This uses an integer value
to return the SO_ERROR status.
Resolves#146
This patch adds a vDSO-like mechanism for exposing the current time as
an array of per-clock-source timestamps.
LibC's clock_gettime() calls sys$map_time_page() to map the kernel's
"time page" into the process address space (at a random address, ofc.)
This is only done on first call, and from then on the timestamps are
fetched from the time page.
This first patch only adds support for CLOCK_REALTIME, but eventually
we should be able to support all clock sources this way and get rid of
sys$clock_gettime() in the kernel entirely. :^)
Accesses are synchronized using two atomic integers that are incremented
at the start and finish of the kernel's time page update cycle.
Leave interrupts enabled so that we can still process IRQs. Critical
sections should only prevent preemption by another thread.
Co-authored-by: Tom <tomut@yahoo.com>
By making these functions static we close a window where we could get
preempted after calling Processor::current() and move to another
processor.
Co-authored-by: Tom <tomut@yahoo.com>
This removes Pipes dependency on the UHCIController by introducing a
controller base class. This will be used to implement other controllers
such as OHCI.
Additionally, there can be multiple instances of a UHCI controller.
For example, multiple UHCI instances can be required for systems with
EHCI controllers. EHCI relies on using multiple of either UHCI or OHCI
controllers to drive USB 1.x devices.
This means UHCIController can no longer be a singleton. Multiple
instances of it can now be created and passed to the device and then to
the pipe.
To handle finding and creating these instances, USBManagement has been
introduced. It has the same pattern as the other management classes
such as NetworkManagement.
Port2 logic was errantly using `portsc1` registers, meaning that
the port wouldn't be reset properly. In effect, this puts devices
connected to Port2 in an undefined state.
Processing SMP messages outside of non-SMP mode is a waste of time,
and now that we don't rely on the side effects of calling the message
processing function, let's stop calling it entirely. :^)
We were previously relying on a side effect of the critical section in
smp_process_pending_messages(): when exiting that section, it would
process any pending deferred calls.
Instead of relying on that, make the deferred invocations explicit by
calling deferred_call_execute_pending() in exit_trap().
This ensures that deferred calls get processed before entering the
scheduler at the end of exit_trap(). Since thread unblocking happens
via deferred calls, the threads don't have to wait until the next
scheduling opportunity when they could be ready *now*. :^)
This was the main reason Tom's SMP branch ran slowly in non-SMP mode.