thread_context_first_enter reuses the context restoring code in the
trap handler, just like other arches already do.
The `ld x2, 1*8(sp)` is unnecessary in the trap handler, as the stack
pointer should be equal to the stack pointer slot in the RegisterState
if the trap is from supervisor mode (and we currently don't support
user traps).
This load will however make us unable to reuse that code for
thread_context_first_enter.
This commit adds two functions which save/restore the entire FPU state.
On RISC-V, you only need to save the floating pointer registers
themselves and the fcsr CSR, which contains the entire state of the F/D
extensions.