1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-17 14:45:06 +00:00
Commit graph

86 commits

Author SHA1 Message Date
Zaggy1024
2ec043c4db LibVideo/VP9: Make inter-prediction fast path accumulators 32-bit
Some occasional cases could cause the accumulator to overflow and have
an incorrect result. It would be nice to use a smaller accumulator, but
it seems not to be correct. :^(

We now cast to i16 to allow 128-bit vectorization to make use of one
whole register instead of having to split the loop into multiple.

This results in about a 5% reduction in performance in my testing.
2023-04-30 05:58:27 +02:00
Zaggy1024
d6b867ba89 LibVideo/VP9: Force inlining of inverse_transform_2d() and the IDCT
Clang was reluctant to inline these for some reason. However, inlining
them seems to be quite beneficial, reducing decoding time in an intra-
heavy video by about 21% (~12.7s -> ~10.0s).
2023-04-25 17:44:36 -04:00
Zaggy1024
90c0e1ad8f LibVideo/VP9: Pre-calculate the quantizers at the start of each frame
Quantizers are a constant for the whole frame, except when segment
features override them, in which case they are a constant per segment
ID. We take advantage of this by pre-calculating those after reading
the quantization parameters and segmentation features for a frame.
This results in a small 1.5% improvement (~12.9s -> ~12.7s).
2023-04-25 17:44:36 -04:00
Zaggy1024
094b0d8a78 LibVideo/VP9: Use an enum to select segment features
This throws out some ugly `#define`s we had that were taking the role
of an enum anyway. We now have some nice getters in the contexts that
take the place of the combo of `seg_feature_active()` and then doing a
lookup in `FrameContext::m_segmentation_features` directly.
2023-04-25 17:44:36 -04:00
Zaggy1024
6e6cc1ddb2 LibVideo/VP9: Make a lookup table for bit reversals
Bit reversals are used very often in intra-predicted frames. Turning
these into a constexpr lookup table reduces the branching needed for
block transforms significantly. This reduces the times spent decoding
an intra-heavy 1080p video by about 9% (~14.3s -> ~12.9s).
2023-04-25 17:44:36 -04:00
Zaggy1024
f6764beead LibVideo/VP9: Specialize transforms on their block size
Previously, the block sizes would be checked at runtime to
determine the transform size to apply for residuals. Making the block
sizes into constant expressions allows all the loops to be unrolled
and reduces branching significantly.

This results in about a 26% improvement (~18s -> ~13.2s) in speed in an
intra-heavy test video.
2023-04-25 17:44:36 -04:00
Zaggy1024
8ad0dff5c2 LibVideo/VP9: Implement unscaled fast paths in inter prediction
Inter-prediction convolution filters are selected based on the
subpixel position determined for the motion vector relative to the
block being predicted. The subpixel position 0 only uses one single
sample in the center of the convolution, not averaging any other
samples. Let's call this a copy.

Reference frames can also be a different size relative to the frame
being predicted, but in almost every case, that scale will be 1:1
for every single frame in a video.

Taking into account these facts, we can create multiple fast paths for
inter prediction. These fast paths are only active when scaling is 1:1.

If we are doing a copy in both dimensions, then we can do a straight
memcpy from the reference frame to the output block buffer. In videos
where there is no motion, this is a dramatic speedup.

If we are doing a copy in one dimension, we can just do one convolution
and average directly into the output block buffer.

If we aren't doing a copy in either dimension, we can still cut out a
few operations from the convolution loops, since we only need to
advance our samples by whole pixels instead of subpixels.

These fast paths result in about a 34% improvement (~31.2s -> ~20.6s)
in a video which relies heavily on intra-predicted blocks due to high
motion. In videos with less motion, the improvement will be even
greater.

Also, note that the accumulators in these faster loops are only 16-bit.
High bit-depth videos will overflow those, so for now the fast path is
only used for 8-bit videos.
2023-04-25 17:44:36 -04:00
Zaggy1024
8cd72ad1ed LibVideo/VP9: Use the Y scale value in predict_inter_block()
A typo caused the Y scale value to never be used, so if a reference
frame's aspect ratio didn't match up with the current frame's, it would
decode incorrectly.

Some comments have been added to clarify the frame-constants used in
the function as well.
2023-04-25 17:44:36 -04:00
Zaggy1024
f2c0cee522 LibVideo/VP9: Consolidate frame size calculations
This moves all the frame size calculation to `FrameContext`, where the
subsampling is easily accessible to determine the size for each plane.
The internal framebuffer size has also been reduced to the exact frame
size that is output.
2023-04-25 17:44:36 -04:00
Zaggy1024
57c7389200 LibVideo/VP9: Fix rounding of components in the motion vector selection
The division in the `round_mv_...()` functions contained in the motion
vector selection process was done by bit shifting right. However, since
bit shifting negative values will truncate towards the negative end, it
was flooring instead of rounding.

This changes it to match the spec and rely on the compiler to simplify
down to a bit shift.
2023-04-25 17:44:36 -04:00
Zaggy1024
1fcac52e77 LibVideo/VP9: Count syntax elements in TileContext, and sum at the end
Syntax element counters were previously accessed across tiles, which
would cause a race condition updating the counts in a tile-threaded
mode.
2023-04-23 23:14:30 +02:00
Zaggy1024
5e3192c8d9 LibVideo/VP9: Extend the borders on reference frames to avoid branching
Extending the borders on reference frames so that motion vectors that
point outside the reference frame allows `predict_inter_block()` to
avoid some branches to clamp the sample coordinates in its loops.

This results in about a 25% improvement in decode time of a motion-
heavy YouTube video (~20.8s -> ~15.6s).
2023-04-14 07:11:45 -04:00
Zaggy1024
08b90bb2d0 LibVideo/VP9: Clamp reference frame prediction coords outside loops
Moving the clamping of the coordinates of the reference frame samples
as well as some bounds checks outside of the loop reduces the branches
needed in the `predict_inter_block()` significantly.

This results in a whopping ~41% improvement in decode performance
of an inter-prediction-heavy YouTube video (~35.4s -> ~20.8s).
2023-04-14 07:11:45 -04:00
Zaggy1024
bc49af08b4 LibVideo/VP9: Pre-calculate inter-frames' reference frame scale factors
Changing the calculation of reference frame scale factors to be done on
a per-frame basis reduces the amount of work done in
`predict_inter_block()`, which is a big hotspot in most videos.

This reduces decode times in a test video from YouTube by about 5%
(~37.2s -> ~35.4s).
2023-04-14 07:11:45 -04:00
Zaggy1024
5cd5edc3bd LibVideo/VP9: Copy data to reference frames row by row
This changes the order of the loop copying data to a reference frame
store so that it copies each row in a contiguous line rather than
copying a column at a time, which caused unnecessary branches.

This reduces the decode time on a fairly long 720p YouTube video by
about 14.5% (~43.5s to ~37.2s).
2023-04-14 07:11:45 -04:00
Zaggy1024
e6c3b0e495 LibVideo/VP9: Rename round_2() to rounded_right_shift() for clarity 2023-02-10 23:34:37 +01:00
Zaggy1024
33ff3427eb LibVideo/VP9: Drop the decoder intermediate bounds checks
Checking the bounds of the intermediate values was only implemented to
help debug the decoder. However, it is non-fatal to have the values
exceed the spec-defined bounds, and causes a measurable performance
reduction.

Additionally, the checks were implemented as an assertion, which is
easily broken by bad input files.

I see about a 4-5% decrease in decoding times in the `webm_in_vp9` test
in TestVP9Decode.
2023-02-10 23:34:37 +01:00
Zaggy1024
7b92eff4a6 LibVideo/VP9: Use u32 to store the parsed value counts
There were rare cases in which u8 was not large enough for the total
count of values read, and increasing this to u32 should have no real
effect on performance (hopefully).
2023-02-03 09:10:14 +01:00
Zaggy1024
f58c5ff569 LibVideo/VP9: Correct the mode/partition probability adaption counts 2023-02-03 09:10:14 +01:00
Zaggy1024
4224f253af LibVideo/VP9: Increase the size of summed boolean counts in merge_probs
This fixes an issue where probabilities that sum to greater than 255
would wrap and cause the maximum probability adaption to take effect.
2023-02-03 09:10:14 +01:00
Linus Groh
9c08bb9555 AK: Remove try_ prefix from FixedArray creation functions 2023-01-28 22:41:36 +01:00
Zaggy1024
f5ea6c89df LibVideo/VP9: Put reference frames into a struct 2022-11-30 08:28:30 +01:00
Zaggy1024
316dad7bf7 LibVideo/VP9: Remove m_tokens and m_token_cache from Parser
Only the residual tokens array needs to be kept for the transforms to
use after all the tokens have been parsed. The token cache is able to
be kept in the stack only for the duration of the token parsing loop.
2022-11-30 08:28:30 +01:00
Zaggy1024
a4f14f220d LibVideo/VP9: Fully qualify all reference frame type enum values
Since the enum is used as an index to arrays, it unfortunately can't
be converted to an enum class, but at least we can make sure to use it
with the qualified enum name to make things a bit clearer.
2022-11-30 08:28:30 +01:00
Zaggy1024
db9f1a18f8 LibVideo/VP9: Convert TransformMode to an enum class
TXModeSelect was also renamed to plain Select, since the qualified name
will be TransformMode::Select.
2022-11-30 08:28:30 +01:00
Zaggy1024
1a2d8ac40c LibVideo/VP9: Prefix TransformSize with Transform_ instead of TX_ 2022-11-30 08:28:30 +01:00
Zaggy1024
f6e645a153 LibVideo/VP9: Rename TX(Mode|Size) to Transform(Mode|Size) 2022-11-30 08:28:30 +01:00
Zaggy1024
f4af6714d2 LibVideo/VP9: Move persistent context storage to a different header
Moving these to another header allows Parser.h to include less context
structs/classes that were previously in Context.h.

This change will also allow consolidating some common calculations into
Context.h, since we won't be polluting the VP9 namespace as much. There
are quite a few duplicate calculations for block size, transform size,
number of horizontal and vertical sub-blocks per block, all of which
could be moved to Context.h to allow for code deduplication and more
semantic code where those calculations are needed.
2022-11-30 08:28:30 +01:00
Zaggy1024
facb779b99 LibVideo/VP9: Replace (DCT|ADST)_(DCT_ADST) with struct TransformSet
Those previous constants were only set and used to select the first and
second transforms done by the Decoder class. By turning it into a
struct, we can make the code a bit more legible while keeping those
transform modes the same size as before or smaller.
2022-11-30 08:28:30 +01:00
Zaggy1024
b6f41fe7d9 LibVideo/VP9: Pass the sub-block transform type around as a parameter
The sub-block transform types set and then used in a very small scope,
so now it is just stored in a variable and passed to the two functions
that need it, Parser::tokens() and Decoder::reconstruct().
2022-11-30 08:28:30 +01:00
Zaggy1024
fedbc12c4d LibVideo/VP9: Move segmentation parameters to FrameContext
Note that some of the previous segmentation feature settings must be
preserved when a frame is decoded that doesn't use segmentation.

This change also allowed a few functions in Decoder to be made static.
2022-11-30 08:28:30 +01:00
Zaggy1024
f4761dab09 LibVideo/VP9: Index inter-frame references with named fields or an enum
Previously, we were using size_t, often coerced from bool or u8, to
index reference pairs. Now, they must either be taken directly from
named fields or indexed using the `ReferenceIndex` enum with options
`primary` and `secondary`. With a more explicit method of indexing
these, the compiler can aid in using reference pairs correctly, and
fuzzers may be able to detect undefined behavior more easily.
2022-11-30 08:28:30 +01:00
Zaggy1024
b966f9d811 LibVideo/VP9: Move the transform mode field from Parser to FrameContext 2022-11-30 08:28:30 +01:00
Zaggy1024
6533c5f6a8 LibVideo/VP9: Move more block fields into the BlockContext struct
This includes the segment IDs, transform block sizes, prediction modes,
sub-block counts, interpolation filters and sub-block motion vectors.
2022-11-30 08:28:30 +01:00
Zaggy1024
f4e835635f LibVideo/VP9: Move quantizer indices into FrameContext
This also renames (most?) of the related quantizer functions and
variables to make more sense. I haven't determined what AC/DC stands
for here, but it may be just an arbitrary naming scheme for the first
and subsequent coefficients used to quantize the residuals for a block.
2022-11-30 08:28:30 +01:00
Zaggy1024
90f16c78fa LibVideo/VP9: Move fields set in uncompressed_header() to FrameContext 2022-11-30 08:28:30 +01:00
Zaggy1024
40bc987fe3 LibVideo/VP9: Store color config in the frame context
The color config is reused for most inter predicted frames, so we use a
struct ColorConfig to store the config from intra frames, and put it in
a field in Parser to copy from when an inter frame without color config
is encountered.
2022-11-30 08:28:30 +01:00
Zaggy1024
3259c99cab LibVideo/VP9: Choose whether/how to show new frames using an enum
There are three mutually exclusive frame-showing states:
- Show no new frame, only store the frame as a reference.
- Show a newly decoded frame.
- Show frame from the reference frame store.
Since they are mutually exclusive, using an enum rather than two bools
makes more sense.
2022-11-30 08:28:30 +01:00
Zaggy1024
befcd479ae LibVideo/VP9: Add Frame, Tile and Block context structs
These are used to pass context needed for decoding, with mutability
scoped only to the sections that the function receiving the contexts
needs to modify. This allows lifetimes of data to be more explicit
rather than being stored in fields, as well as preventing tile threads
from modifying outside their allowed bounds.
2022-11-30 08:28:30 +01:00
Zaggy1024
10d207959d LibVideo/VP9: Remove m_mi_row and col fields from the parser
These are now passed as parameters to each function that uses them.
These will later be moved to a struct to further reduce the amount of
parameters that get passed around.

Above and left per-frame block contexts are now also parameters passed
to the functions that use them instead of being retrieved when needed
from a field. This will allow them to be more easily moved to a tile-
specific context later.
2022-11-30 08:28:30 +01:00
Zaggy1024
4a4aa697d9 LibVideo/VP9: Use a struct for block context to keep between frames
There are three fields that we need to store from FrameBlockContext to
keep between frames, which are used to parse for those same fields for
the next frame.
2022-11-30 08:28:30 +01:00
Zaggy1024
5275a1101e LibVideo/VP9: Remove dump_frame_info() function from Decoder
The function serves no purpose now, any debug information we want to
pull from the decoder should be instead accessed by some other yet to
be created interface.
2022-11-30 08:28:30 +01:00
Zaggy1024
0638c5d2b8 LibVideo/VP9: Use a class to store 2D context information 2022-11-30 08:28:30 +01:00
Zaggy1024
44413c31a9 LibVideo/VP9: Store data used between decode_block calls in a struct
All state that needed to persist between calls to decode_block was
previously stored in plain Vector fields. This moves them into a struct
which sets a more explicit lifetime on that data. It may be possible to
store this data on the stack of a function with the appropriate
lifetime now that it is split into its own struct.
2022-11-30 08:28:30 +01:00
Zaggy1024
eafc048101 LibVideo/VP9: Remove a FIXME that is impossible to fix
We can't memset an array with 32-bit integers to non-zero values,
silly past me :^)
2022-11-30 08:28:30 +01:00
Zaggy1024
5f7099cff6 LibVideo/VP9: Apply higher optimization levels to Decoder and Parser
With this change, decode times on GCC as measured by TestVP9Decode are
reduced by about 15%. Not a bad improvement for a few added lines :^)
2022-11-30 07:55:29 +01:00
Zaggy1024
7514e49c17 LibVideo: Make all VP9 block intermediates stack-allocated arrays
This has two benefits:
- I observed a ~34% decrease in decoding time running TestVP9Decode.
- Removing all of these silly Vector fields helps simplify the code
  relationships between all the functions in Decoder.cpp. It'll also be
  much easier to make these static with template specializations, if
  that turns out to be worthy performance improvement.
2022-11-25 02:44:18 +03:30
Zaggy1024
981997c039 LibVideo: Combine VP9's Intra- and InterMode enums into PredictionMode
The two different mode sets are stored in single fields, and the
underlying values didn't overlap, so there was no reason to keep them
separate.

The enum is now an enum class as well, to enforce that almost all uses
of the enum are named. The only case where underlying values are used
is in lookup tables, but it may be worth abstracting that as well to
make array bounds more clear.
2022-11-12 10:17:27 -07:00
Zaggy1024
1c6d0a9777 LibVideo: Use Gfx::Size for VP9 frame sizes
Frame sizes will now be represented by Gfx::Size instead of storing
width and height separately.
2022-11-12 10:17:27 -07:00
Zaggy1024
40b0bb0914 LibVideo: Change all Span<u8 const> to ReadonlyBytes 2022-11-12 10:17:27 -07:00