serenity

mirror of https://github.com/RGBCube/serenity synced 2025-07-01 05:22:08 +00:00

Author	SHA1	Message	Date
Zaggy1024	2ec043c4db	LibVideo/VP9: Make inter-prediction fast path accumulators 32-bit Some occasional cases could cause the accumulator to overflow and have an incorrect result. It would be nice to use a smaller accumulator, but it seems not to be correct. :^( We now cast to i16 to allow 128-bit vectorization to make use of one whole register instead of having to split the loop into multiple. This results in about a 5% reduction in performance in my testing.	2023-04-30 05:58:27 +02:00
Zaggy1024	d6b867ba89	LibVideo/VP9: Force inlining of `inverse_transform_2d()` and the IDCT Clang was reluctant to inline these for some reason. However, inlining them seems to be quite beneficial, reducing decoding time in an intra-heavy video by about 21% (~12.7s -> ~10.0s).	2023-04-25 17:44:36 -04:00
Zaggy1024	90c0e1ad8f	LibVideo/VP9: Pre-calculate the quantizers at the start of each frame Quantizers are a constant for the whole frame, except when segment features override them, in which case they are a constant per segment ID. We take advantage of this by pre-calculating those after reading the quantization parameters and segmentation features for a frame. This results in a small 1.5% improvement (~12.9s -> ~12.7s).	2023-04-25 17:44:36 -04:00
Zaggy1024	094b0d8a78	LibVideo/VP9: Use an enum to select segment features This throws out some ugly `#define`s we had that were taking the role of an enum anyway. We now have some nice getters in the contexts that take the place of the combo of `seg_feature_active()` and then doing a lookup in `FrameContext::m_segmentation_features` directly.	2023-04-25 17:44:36 -04:00
Zaggy1024	6e6cc1ddb2	LibVideo/VP9: Make a lookup table for bit reversals Bit reversals are used very often in intra-predicted frames. Turning these into a constexpr lookup table reduces the branching needed for block transforms significantly. This reduces the times spent decoding an intra-heavy 1080p video by about 9% (~14.3s -> ~12.9s).	2023-04-25 17:44:36 -04:00
Zaggy1024	f6764beead	LibVideo/VP9: Specialize transforms on their block size Previously, the block sizes would be checked at runtime to determine the transform size to apply for residuals. Making the block sizes into constant expressions allows all the loops to be unrolled and reduces branching significantly. This results in about a 26% improvement (~18s -> ~13.2s) in speed in an intra-heavy test video.	2023-04-25 17:44:36 -04:00
Zaggy1024	8ad0dff5c2	LibVideo/VP9: Implement unscaled fast paths in inter prediction Inter-prediction convolution filters are selected based on the subpixel position determined for the motion vector relative to the block being predicted. The subpixel position 0 only uses one single sample in the center of the convolution, not averaging any other samples. Let's call this a copy. Reference frames can also be a different size relative to the frame being predicted, but in almost every case, that scale will be 1:1 for every single frame in a video. Taking into account these facts, we can create multiple fast paths for inter prediction. These fast paths are only active when scaling is 1:1. If we are doing a copy in both dimensions, then we can do a straight memcpy from the reference frame to the output block buffer. In videos where there is no motion, this is a dramatic speedup. If we are doing a copy in one dimension, we can just do one convolution and average directly into the output block buffer. If we aren't doing a copy in either dimension, we can still cut out a few operations from the convolution loops, since we only need to advance our samples by whole pixels instead of subpixels. These fast paths result in about a 34% improvement (~31.2s -> ~20.6s) in a video which relies heavily on intra-predicted blocks due to high motion. In videos with less motion, the improvement will be even greater. Also, note that the accumulators in these faster loops are only 16-bit. High bit-depth videos will overflow those, so for now the fast path is only used for 8-bit videos.	2023-04-25 17:44:36 -04:00
Zaggy1024	8cd72ad1ed	LibVideo/VP9: Use the Y scale value in `predict_inter_block()` A typo caused the Y scale value to never be used, so if a reference frame's aspect ratio didn't match up with the current frame's, it would decode incorrectly. Some comments have been added to clarify the frame-constants used in the function as well.	2023-04-25 17:44:36 -04:00
Zaggy1024	f2c0cee522	LibVideo/VP9: Consolidate frame size calculations This moves all the frame size calculation to `FrameContext`, where the subsampling is easily accessible to determine the size for each plane. The internal framebuffer size has also been reduced to the exact frame size that is output.	2023-04-25 17:44:36 -04:00
Zaggy1024	57c7389200	LibVideo/VP9: Fix rounding of components in the motion vector selection The division in the `round_mv_...()` functions contained in the motion vector selection process was done by bit shifting right. However, since bit shifting negative values will truncate towards the negative end, it was flooring instead of rounding. This changes it to match the spec and rely on the compiler to simplify down to a bit shift.	2023-04-25 17:44:36 -04:00
Zaggy1024	1fcac52e77	LibVideo/VP9: Count syntax elements in TileContext, and sum at the end Syntax element counters were previously accessed across tiles, which would cause a race condition updating the counts in a tile-threaded mode.	2023-04-23 23:14:30 +02:00
Zaggy1024	5e3192c8d9	LibVideo/VP9: Extend the borders on reference frames to avoid branching Extending the borders on reference frames to cover motion vectors that point outside the reference frame allows `predict_inter_block()` to avoid some branches that clamp the sample coordinates in its loops. This results in about a 25% improvement in decode time of a motion-heavy YouTube video (~20.8s -> ~15.6s).	2023-04-14 07:11:45 -04:00
Zaggy1024	08b90bb2d0	LibVideo/VP9: Clamp reference frame prediction coords outside loops Moving the clamping of the coordinates of the reference frame samples, as well as some bounds checks, outside of the loop significantly reduces the branches needed in `predict_inter_block()`. This results in a whopping ~41% improvement in decode performance of an inter-prediction-heavy YouTube video (~35.4s -> ~20.8s).	2023-04-14 07:11:45 -04:00
Zaggy1024	bc49af08b4	LibVideo/VP9: Pre-calculate inter-frames' reference frame scale factors Changing the calculation of reference frame scale factors to be done on a per-frame basis reduces the amount of work done in `predict_inter_block()`, which is a big hotspot in most videos. This reduces decode times in a test video from YouTube by about 5% (~37.2s -> ~35.4s).	2023-04-14 07:11:45 -04:00
Zaggy1024	5cd5edc3bd	LibVideo/VP9: Copy data to reference frames row by row This changes the order of the loop copying data to a reference frame store so that it copies each row in a contiguous line rather than copying a column at a time, which caused unnecessary branches. This reduces the decode time on a fairly long 720p YouTube video by about 14.5% (~43.5s to ~37.2s).	2023-04-14 07:11:45 -04:00
Zaggy1024	e6c3b0e495	LibVideo/VP9: Rename `round_2()` to `rounded_right_shift()` for clarity	2023-02-10 23:34:37 +01:00
Zaggy1024	33ff3427eb	LibVideo/VP9: Drop the decoder intermediate bounds checks Checking the bounds of the intermediate values was only implemented to help debug the decoder. However, it is non-fatal to have the values exceed the spec-defined bounds, and causes a measurable performance reduction. Additionally, the checks were implemented as an assertion, which is easily broken by bad input files. I see about a 4-5% decrease in decoding times in the `webm_in_vp9` test in TestVP9Decode.	2023-02-10 23:34:37 +01:00
Zaggy1024	7b92eff4a6	LibVideo/VP9: Use u32 to store the parsed value counts There were rare cases in which u8 was not large enough for the total count of values read, and increasing this to u32 should have no real effect on performance (hopefully).	2023-02-03 09:10:14 +01:00
Zaggy1024	f58c5ff569	LibVideo/VP9: Correct the mode/partition probability adaption counts	2023-02-03 09:10:14 +01:00
Zaggy1024	4224f253af	LibVideo/VP9: Increase the size of summed boolean counts in merge_probs This fixes an issue where probabilities that sum to greater than 255 would wrap and cause the maximum probability adaption to take effect.	2023-02-03 09:10:14 +01:00
Linus Groh	9c08bb9555	AK: Remove `try_` prefix from FixedArray creation functions	2023-01-28 22:41:36 +01:00
Zaggy1024	f5ea6c89df	LibVideo/VP9: Put reference frames into a struct	2022-11-30 08:28:30 +01:00
Zaggy1024	316dad7bf7	LibVideo/VP9: Remove m_tokens and m_token_cache from Parser Only the residual tokens array needs to be kept for the transforms to use after all the tokens have been parsed. The token cache is able to be kept in the stack only for the duration of the token parsing loop.	2022-11-30 08:28:30 +01:00
Zaggy1024	a4f14f220d	LibVideo/VP9: Fully qualify all reference frame type enum values Since the enum is used as an index to arrays, it unfortunately can't be converted to an enum class, but at least we can make sure to use it with the qualified enum name to make things a bit clearer.	2022-11-30 08:28:30 +01:00
Zaggy1024	db9f1a18f8	LibVideo/VP9: Convert TransformMode to an enum class TXModeSelect was also renamed to plain Select, since the qualified name will be TransformMode::Select.	2022-11-30 08:28:30 +01:00
Zaggy1024	1a2d8ac40c	LibVideo/VP9: Prefix TransformSize with Transform_ instead of TX_	2022-11-30 08:28:30 +01:00
Zaggy1024	f6e645a153	LibVideo/VP9: Rename TX(Mode|Size) to Transform(Mode|Size)	2022-11-30 08:28:30 +01:00
Zaggy1024	f4af6714d2	LibVideo/VP9: Move persistent context storage to a different header Moving these to another header allows Parser.h to include less context structs/classes that were previously in Context.h. This change will also allow consolidating some common calculations into Context.h, since we won't be polluting the VP9 namespace as much. There are quite a few duplicate calculations for block size, transform size, number of horizontal and vertical sub-blocks per block, all of which could be moved to Context.h to allow for code deduplication and more semantic code where those calculations are needed.	2022-11-30 08:28:30 +01:00
Zaggy1024	facb779b99	LibVideo/VP9: Replace (DCT|ADST)_(DCT|ADST) with struct TransformSet Those previous constants were only set and used to select the first and second transforms done by the Decoder class. By turning it into a struct, we can make the code a bit more legible while keeping those transform modes the same size as before or smaller.	2022-11-30 08:28:30 +01:00
Zaggy1024	b6f41fe7d9	LibVideo/VP9: Pass the sub-block transform type around as a parameter The sub-block transform types are set and then used in a very small scope, so now it is just stored in a variable and passed to the two functions that need it, Parser::tokens() and Decoder::reconstruct().	2022-11-30 08:28:30 +01:00
Zaggy1024	fedbc12c4d	LibVideo/VP9: Move segmentation parameters to FrameContext Note that some of the previous segmentation feature settings must be preserved when a frame is decoded that doesn't use segmentation. This change also allowed a few functions in Decoder to be made static.	2022-11-30 08:28:30 +01:00
Zaggy1024	f4761dab09	LibVideo/VP9: Index inter-frame references with named fields or an enum Previously, we were using size_t, often coerced from bool or u8, to index reference pairs. Now, they must either be taken directly from named fields or indexed using the `ReferenceIndex` enum with options `primary` and `secondary`. With a more explicit method of indexing these, the compiler can aid in using reference pairs correctly, and fuzzers may be able to detect undefined behavior more easily.	2022-11-30 08:28:30 +01:00
Zaggy1024	b966f9d811	LibVideo/VP9: Move the transform mode field from Parser to FrameContext	2022-11-30 08:28:30 +01:00
Zaggy1024	6533c5f6a8	LibVideo/VP9: Move more block fields into the BlockContext struct This includes the segment IDs, transform block sizes, prediction modes, sub-block counts, interpolation filters and sub-block motion vectors.	2022-11-30 08:28:30 +01:00
Zaggy1024	f4e835635f	LibVideo/VP9: Move quantizer indices into FrameContext This also renames (most?) of the related quantizer functions and variables to make more sense. I haven't determined what AC/DC stands for here, but it may be just an arbitrary naming scheme for the first and subsequent coefficients used to quantize the residuals for a block.	2022-11-30 08:28:30 +01:00
Zaggy1024	90f16c78fa	LibVideo/VP9: Move fields set in uncompressed_header() to FrameContext	2022-11-30 08:28:30 +01:00
Zaggy1024	40bc987fe3	LibVideo/VP9: Store color config in the frame context The color config is reused for most inter predicted frames, so we use a struct ColorConfig to store the config from intra frames, and put it in a field in Parser to copy from when an inter frame without color config is encountered.	2022-11-30 08:28:30 +01:00
Zaggy1024	3259c99cab	LibVideo/VP9: Choose whether/how to show new frames using an enum There are three mutually exclusive frame-showing states: - Show no new frame, only store the frame as a reference. - Show a newly decoded frame. - Show frame from the reference frame store. Since they are mutually exclusive, using an enum rather than two bools makes more sense.	2022-11-30 08:28:30 +01:00
Zaggy1024	befcd479ae	LibVideo/VP9: Add Frame, Tile and Block context structs These are used to pass context needed for decoding, with mutability scoped only to the sections that the function receiving the contexts needs to modify. This allows lifetimes of data to be more explicit rather than being stored in fields, as well as preventing tile threads from modifying outside their allowed bounds.	2022-11-30 08:28:30 +01:00
Zaggy1024	10d207959d	LibVideo/VP9: Remove m_mi_row and col fields from the parser These are now passed as parameters to each function that uses them. These will later be moved to a struct to further reduce the amount of parameters that get passed around. Above and left per-frame block contexts are now also parameters passed to the functions that use them instead of being retrieved when needed from a field. This will allow them to be more easily moved to a tile-specific context later.	2022-11-30 08:28:30 +01:00
Zaggy1024	4a4aa697d9	LibVideo/VP9: Use a struct for block context to keep between frames There are three fields that we need to store from FrameBlockContext to keep between frames, which are used to parse for those same fields for the next frame.	2022-11-30 08:28:30 +01:00
Zaggy1024	5275a1101e	LibVideo/VP9: Remove dump_frame_info() function from Decoder The function serves no purpose now, any debug information we want to pull from the decoder should be instead accessed by some other yet to be created interface.	2022-11-30 08:28:30 +01:00
Zaggy1024	0638c5d2b8	LibVideo/VP9: Use a class to store 2D context information	2022-11-30 08:28:30 +01:00
Zaggy1024	44413c31a9	LibVideo/VP9: Store data used between decode_block calls in a struct All state that needed to persist between calls to decode_block was previously stored in plain Vector fields. This moves them into a struct which sets a more explicit lifetime on that data. It may be possible to store this data on the stack of a function with the appropriate lifetime now that it is split into its own struct.	2022-11-30 08:28:30 +01:00
Zaggy1024	eafc048101	LibVideo/VP9: Remove a FIXME that is impossible to fix We can't memset an array with 32-bit integers to non-zero values, silly past me :^)	2022-11-30 08:28:30 +01:00
Zaggy1024	5f7099cff6	LibVideo/VP9: Apply higher optimization levels to Decoder and Parser With this change, decode times on GCC as measured by TestVP9Decode are reduced by about 15%. Not a bad improvement for a few added lines :^)	2022-11-30 07:55:29 +01:00
Zaggy1024	7514e49c17	LibVideo: Make all VP9 block intermediates stack-allocated arrays This has two benefits: - I observed a ~34% decrease in decoding time running TestVP9Decode. - Removing all of these silly Vector fields helps simplify the code relationships between all the functions in Decoder.cpp. It'll also be much easier to make these static with template specializations, if that turns out to be a worthy performance improvement.	2022-11-25 02:44:18 +03:30
Zaggy1024	981997c039	LibVideo: Combine VP9's Intra- and InterMode enums into PredictionMode The two different mode sets are stored in single fields, and the underlying values didn't overlap, so there was no reason to keep them separate. The enum is now an enum class as well, to enforce that almost all uses of the enum are named. The only case where underlying values are used is in lookup tables, but it may be worth abstracting that as well to make array bounds more clear.	2022-11-12 10:17:27 -07:00
Zaggy1024	1c6d0a9777	LibVideo: Use Gfx::Size for VP9 frame sizes Frame sizes will now be represented by Gfx::Size instead of storing width and height separately.	2022-11-12 10:17:27 -07:00
Zaggy1024	40b0bb0914	LibVideo: Change all Span<u8 const> to ReadonlyBytes	2022-11-12 10:17:27 -07:00
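
Commit 6e6cc1ddb2 above describes turning bit reversals into a constexpr lookup table. The following is a minimal sketch of that general technique, assuming C++17; the names `reverse_bits_slow`, `make_bit_reversal_table` and `bit_reversal_table_5` are hypothetical and are not taken from LibVideo, and the 5-bit table width is only an illustrative choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the LibVideo name): reverse the low `bits` bits of `value`
// one bit at a time. This is the loop that the table replaces at runtime.
constexpr std::uint8_t reverse_bits_slow(std::uint8_t value, unsigned bits)
{
    std::uint8_t result = 0;
    for (unsigned i = 0; i < bits; i++) {
        result = static_cast<std::uint8_t>((result << 1) | (value & 1));
        value >>= 1;
    }
    return result;
}

// Build a table holding the reversal of every index in [0, 2^Bits) at compile time.
template<unsigned Bits>
constexpr std::array<std::uint8_t, (std::size_t { 1 } << Bits)> make_bit_reversal_table()
{
    std::array<std::uint8_t, (std::size_t { 1 } << Bits)> table {};
    for (std::size_t i = 0; i < table.size(); i++)
        table[i] = reverse_bits_slow(static_cast<std::uint8_t>(i), Bits);
    return table;
}

// Assumed width for illustration: 5 bits covers the indices of a 32-point transform.
constexpr auto bit_reversal_table_5 = make_bit_reversal_table<5>();

static_assert(bit_reversal_table_5[1] == 16); // 00001 -> 10000
static_assert(bit_reversal_table_5[6] == 12); // 00110 -> 01100
```

A transform loop would then index `bit_reversal_table_5[i]` instead of calling the loop-based helper, trading a few branches per coefficient for a single load.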
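
Commit 57c7389200 above notes that dividing a negative value by shifting right floors the result instead of rounding it. The sketch below illustrates the difference; the function names are hypothetical, and the symmetric round-half-away-from-zero expression is an assumed example of "rounding", not necessarily the exact expression used in the spec or in LibVideo. Right-shifting a negative signed integer is implementation-defined before C++20, but it is an arithmetic shift on mainstream compilers.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.

// Divide by 4 with an arithmetic right shift: the result moves toward negative infinity.
constexpr int quarter_by_shift(int value) { return value >> 2; }

// Divide by 4 with `/`: the result truncates toward zero.
constexpr int quarter_by_division(int value) { return value / 4; }

// Assumed rounding form: add half the divisor away from zero, then truncate.
// Positive and negative inputs are treated symmetrically.
constexpr int rounded_quarter(int value)
{
    return (value < 0 ? value - 2 : value + 2) / 4;
}

static_assert(quarter_by_shift(-5) == -2);    // floor(-1.25)
static_assert(quarter_by_division(-5) == -1); // trunc(-1.25)
static_assert(rounded_quarter(-6) == -2);     // -1.5 rounds away from zero
static_assert(rounded_quarter(6) == 2);       // 1.5 rounds away from zero
static_assert(((-6 + 2) >> 2) == -1);         // the shift-based form is asymmetric
```

For non-negative inputs the compiler can typically still lower the division to a shift, which is the simplification the commit message relies on.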
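
Commits 4224f253af and 7b92eff4a6 above both widen counters that were too small for the values they held. As a small illustration of the failure mode, assuming two 8-bit counts are summed into an 8-bit result, the sketch below shows the wraparound and the widened alternative; `sum_narrow` and `sum_wide` are hypothetical names, not LibVideo functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only.

// Summing into 8 bits: anything above 255 wraps modulo 256.
constexpr std::uint8_t sum_narrow(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a + b);
}

// Summing into 32 bits keeps the true total.
constexpr std::uint32_t sum_wide(std::uint8_t a, std::uint8_t b)
{
    return std::uint32_t { a } + b;
}

static_assert(sum_narrow(200, 100) == 44); // 300 mod 256
static_assert(sum_wide(200, 100) == 300);
```

A wrapped total looks like a small, plausible count, which is why this kind of bug tends to surface as subtly wrong decoding rather than as a crash.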

86 commits