We were leaking a ref on the executed inode in successful calls to
sys$execve(). This meant that once a binary had ever been executed,
it was impossible to remove it from the file system.
The execve system call is particularly finicky since the function
does not return normally on success, so extra care must be taken to
ensure nothing is kept alive by stack variables.
There is a big NOTE comment about this, and yet the bug still got in.
It would be nice to enforce this, but I'm unsure how.
e2fsck was complaining about blocks being allocated in an inode's list
of direct blocks while at the same time being free in the block bitmap.
It was easy to reproduce by creating a file with non-zero length and
then truncating it. This fixes the issue by clearing out the direct
block list when resizing a file to 0.
We need to create a reference for the new PID for each shared buffer
that the process had a reference to. If the process subsequently
get replaced through exec, those references will be dropped again.
But if exec for some reason fails then other code, such as global
destructors could still expect having access to them.
Fixes#4076
The time returned by sys$clock_gettime() was not aligned with the delay
calculations in sys$clock_nanosleep(). This patch fixes that by taking
the system's ticks_per_second value into account in both functions.
This patch also removes the need for Thread::sleep_until() and uses
Thread::sleep() for both absolute and relative sleeps.
This was causing the nesalizer emulator port to sleep for a negative
amount of time at the end of each frame, making it run way too fast.
Problem:
- C-style arrays do not automatically provide bounds checking and are
less type safe overall.
- `__builtin_memcmp` is not a constant expression in the current gcc.
Solution:
- Change private m_data to be AK::Array.
- Eliminate constructor from C-style array.
- Change users of the C-style array constructor to use the default
constructor.
- Change `operator==()` to be a hand-written comparison loop and let
the optimizer figure out to use `memcmp`.
We won't be receiving full PS/2 mouse packets when the VMWareBackdoor
absolute mouse mode is enabled. So, read just one byte every time
and retrieve the latest mouse packet from VMWareBackdoor immediately.
Fixes#4086
This allows issuing asynchronous requests for devices and waiting
on the completion of the request. The requests can cascade into
multiple sub-requests.
Since IRQs may complete at any time, if the current process is no
longer the same that started the process, we need to swich the
paging context before accessing user buffers.
Change the PATA driver to use this model.
Rework the PS/2 keyboard and mouse drivers to use a common 8042
controller driver. Also, reset and reconfigure the 8042 controller
as they are not guaranteed to be in the state that we expect.
When two processors send each others a SMP message at the same time
they need to process messages while waiting for delivery of the
message they just sent, or they will deadlock.
This makes most operations thread safe, especially so that they
can safely be used in the Kernel. This includes obtaining a strong
reference from a weak reference, which now requires an explicit
call to WeakPtr::strong_ref(). Another major change is that
Weakable::make_weak_ref() may require the explicit target type.
Previously we used reinterpret_cast in WeakPtr, assuming that it
can be properly converted. But WeakPtr does not necessarily have
the knowledge to be able to do this. Instead, we now ask the class
itself to deliver a WeakPtr to the type that we want.
Also, WeakLink is no longer specific to a target type. The reason
for this is that we want to be able to safely convert e.g. WeakPtr<T>
to WeakPtr<U>, and before this we just reinterpret_cast the internal
WeakLink<T> to WeakLink<U>, which is a bold assumption that it would
actually produce the correct code. Instead, WeakLink now operates
on just a raw pointer and we only make those constructors/operators
available if we can verify that it can be safely cast.
In order to guarantee thread safety, we now use the least significant
bit in the pointer for locking purposes. This also means that only
properly aligned pointers can be used.
Most systems (Linux, OpenBSD) adjust 0.5 ms per second, or 0.5 us per
1 ms tick. That is, the clock is sped up or slowed down by at most
0.05%. This means adjusting the clock by 1 s takes 2000 s, and the
clock an be adjusted by at most 1.8 s per hour.
FreeBSD adjusts 5 ms per second if the remaining time adjustment is
>= 1 s (0.5%) , else it adjusts by 0.5 ms as well. This allows adjusting
by (almost) 18 s per hour.
Since Serenity OS can lose more than 22 s per hour (#3429), this
picks an adjustment rate up to 1% for now. This allows us to
adjust up to 36s per hour, which should be sufficient to adjust
the clock fast enough to keep up with how much time the clock
currently loses. Once we have a fancier NTP implementation that can
adjust tick rate in addition to offset, we can think about reducing
this.
adjtime is a bit old-school and most current POSIX-y OSs instead
implement adjtimex/ntp_adjtime, but a) we have to start somewhere
b) ntp_adjtime() is a fairly gnarly API. OpenBSD's adjfreq looks
like it might provide similar functionality with a nicer API. But
before worrying about all this, it's probably a good idea to get
to a place where the kernel APIs are (barely) good enough so that
we can write an ntp service, and once we have that we should write
a way to automatically evaluate how well it keeps the time adjusted,
and only then should we add improvements ot the adjustment mechanism.
This addresses the issue first enountered in #3644. If a path is
first unveiled with "c" permissions, we should NOT return ENOENT
if the node does not exist on the disk, as the program will most
likely be creating it at a later time.
When computing the list of blocks to deallocate when freeing an inode,
we would stop collecting blocks after reaching the inode's block count.
Since we're getting rid of the inode, we need to also include the meta
blocks used by the on-disk block list itself.
* Change the register structures to use the volatile keyword explicitly
on the register values. This avoids accidentally omitting it as any
access will be guaranteed volatile.
* Don't assume we can read/write 64 bit value to the main counter and
the comparator. Not all HPET implementations may support this. So,
just use 32 bit words to access the registers. This ultimately works
around a bug in Bochs 2.6.11 that loses 32 bits of a 64 bit write to
a timer's comparator register (it internally writes one half and
clears the Tn_VAL_SET_CNF bit, and then because it's cleared it
fails to write the second half).
* Properly calculate the tick duration in calculate_ticks_in_nanoseconds
* As per specification, changing the frequency of one periodic timer
requires a restart of all periodic timers as it requires the main
counter to be reset.
This allows issuing asynchronous requests for devices and waiting
on the completion of the request. The requests can cascade into
multiple sub-requests.
Since IRQs may complete at any time, if the current process is no
longer the same that started the process, we need to swich the
paging context before accessing user buffers.
Change the PATA driver to use this model.
Because allocating/freeing regions may require locks that need to
wait on other processors for completion, this needs to be delayed
until it's safer. Otherwise it is possible to deadlock because we're
holding the global heap lock.
Function calls that are deferred will be executed before a thread
enters a pre-emptable state (meaning it is not in a critical section
and it is not in an irq handler). If it is not already in such a
state, it will be called immediately.
This is meant to be used from e.g. IRQ handlers where we might want
to block a thread until an interrupt happens.
If you try to do this (e.g "mv directory directory"), sys$rename() will
now fail with EDIRINTOSELF.
Dr. POSIX says we should return EINVAL for this, but a custom error
code allows us to print a much more helpful error message when this
problem occurs. :^)
Remapping these registers every time we try to read from or write to
them causes a lot of SMP broadcasts and a lot of other overhead.
This improves boot time noticeably.
Instead of mapping a 4KB region to access device configuration space
each time we call one of the PCI helpers, just map them once during
the boot process.
Then, if we request to access one of those devices, we can ask the
PCI subsystem to give us the virtual address where the device's
configuration space is mapped.
We were stripping the L3 headers from packets received on raw sockets.
This didn't match what other systems do, so let's adjust our behavior.
Thanks to @SpencerCDixon for noticing this! :^)
g_scheduler_lock cannot safely be acquired after Thread::m_lock
because another processor may already hold g_scheduler_lock and wait
for the same Thread::m_lock.
It's possible that we broadcast an IPI message right at the same time
another processor requests a halt. Rather than spinning forever waiting
for that message to be handled, check if we should halt while waiting.