serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-27 11:32:34 +00:00

Author	SHA1	Message	Date
Zaggy1024	1422f7f904	LibVideo/VP9: Revert framebuffer size reduction to allow OOB blocks The framebuffer size was reduced in `f2c0cee`, but this caused some niche block layouts to write outside of the frame. This could be fixed by adding checks to see if a block being predicted/ reconstructed is within the frame, but the branches introduced by that reduce performance slightly. Therefore, it's better to keep the framebuffer sized according to the decoded frame size in 8x8 blocks so that any block can be decoded without bounds checking. A test was added to ensure that this continues to work.	2023-05-02 07:00:46 -04:00
Zaggy1024	2ec043c4db	LibVideo/VP9: Make inter-prediction fast path accumulators 32-bit Some occasional cases could cause the accumulator to overflow and have an incorrect result. It would be nice to use a smaller accumulator, but it seems not to be correct. :^( We now cast to i16 to allow 128-bit vectorization to make use of one whole register instead of having to split the loop into multiple. This results in about a 5% reduction in performance in my testing.	2023-04-30 05:58:27 +02:00
Zaggy1024	d6b867ba89	LibVideo/VP9: Force inlining of `inverse_transform_2d()` and the IDCT Clang was reluctant to inline these for some reason. However, inlining them seems to be quite beneficial, reducing decoding time in an intra- heavy video by about 21% (~12.7s -> ~10.0s).	2023-04-25 17:44:36 -04:00
Zaggy1024	90c0e1ad8f	LibVideo/VP9: Pre-calculate the quantizers at the start of each frame Quantizers are a constant for the whole frame, except when segment features override them, in which case they are a constant per segment ID. We take advantage of this by pre-calculating those after reading the quantization parameters and segmentation features for a frame. This results in a small 1.5% improvement (~12.9s -> ~12.7s).	2023-04-25 17:44:36 -04:00
Zaggy1024	094b0d8a78	LibVideo/VP9: Use an enum to select segment features This throws out some ugly `#define`s we had that were taking the role of an enum anyway. We now have some nice getters in the contexts that take the place of the combo of `seg_feature_active()` and then doing a lookup in `FrameContext::m_segmentation_features` directly.	2023-04-25 17:44:36 -04:00
Zaggy1024	6e6cc1ddb2	LibVideo/VP9: Make a lookup table for bit reversals Bit reversals are used very often in intra-predicted frames. Turning these into a constexpr lookup table reduces the branching needed for block transforms significantly. This reduces the times spent decoding an intra-heavy 1080p video by about 9% (~14.3s -> ~12.9s).	2023-04-25 17:44:36 -04:00
Zaggy1024	f6764beead	LibVideo/VP9: Specialize transforms on their block size Previously, the block sizes would be checked at runtime to determine the transform size to apply for residuals. Making the block sizes into constant expressions allows all the loops to be unrolled and reduces branching significantly. This results in about a 26% improvement (~18s -> ~13.2s) in speed in an intra-heavy test video.	2023-04-25 17:44:36 -04:00
Zaggy1024	8ad0dff5c2	LibVideo/VP9: Implement unscaled fast paths in inter prediction Inter-prediction convolution filters are selected based on the subpixel position determined for the motion vector relative to the block being predicted. The subpixel position 0 only uses one single sample in the center of the convolution, not averaging any other samples. Let's call this a copy. Reference frames can also be a different size relative to the frame being predicted, but in almost every case, that scale will be 1:1 for every single frame in a video. Taking into account these facts, we can create multiple fast paths for inter prediction. These fast paths are only active when scaling is 1:1. If we are doing a copy in both dimensions, then we can do a straight memcpy from the reference frame to the output block buffer. In videos where there is no motion, this is a dramatic speedup. If we are doing a copy in one dimension, we can just do one convolution and average directly into the output block buffer. If we aren't doing a copy in either dimension, we can still cut out a few operations from the convolution loops, since we only need to advance our samples by whole pixels instead of subpixels. These fast paths result in about a 34% improvement (~31.2s -> ~20.6s) in a video which relies heavily on intra-predicted blocks due to high motion. In videos with less motion, the improvement will be even greater. Also, note that the accumulators in these faster loops are only 16-bit. High bit-depth videos will overflow those, so for now the fast path is only used for 8-bit videos.	2023-04-25 17:44:36 -04:00
Zaggy1024	8cd72ad1ed	LibVideo/VP9: Use the Y scale value in `predict_inter_block()` A typo caused the Y scale value to never be used, so if a reference frame's aspect ratio didn't match up with the current frame's, it would decode incorrectly. Some comments have been added to clarify the frame-constants used in the function as well.	2023-04-25 17:44:36 -04:00
Zaggy1024	f2c0cee522	LibVideo/VP9: Consolidate frame size calculations This moves all the frame size calculation to `FrameContext`, where the subsampling is easily accessible to determine the size for each plane. The internal framebuffer size has also been reduced to the exact frame size that is output.	2023-04-25 17:44:36 -04:00
Zaggy1024	57c7389200	LibVideo/VP9: Fix rounding of components in the motion vector selection The division in the `round_mv_...()` functions contained in the motion vector selection process was done by bit shifting right. However, since bit shifting negative values will truncate towards the negative end, it was flooring instead of rounding. This changes it to match the spec and rely on the compiler to simplify down to a bit shift.	2023-04-25 17:44:36 -04:00
Zaggy1024	eba72fa3a7	LibVideo/VP9: Wait for workers to finish when there are decoding errors Previously, the `Parser::decode_tiles()` function wouldn't wait for the tile-decoding workers to finish before exiting the function, which could mean that the data the threads are working with could become invalid if the decoder is deleted after an error is encountered.	2023-04-25 06:35:13 -04:00
Timothy Flynn	f9d8e42636	LibVideo: Allocate Vector2D underlying storage with new, not malloc Using malloc does not invoke T's constructor, nor were were invoking T's constructor ourselves. Accessing T without invoking its constructor is undefined behavior.	2023-04-25 02:11:11 -06:00
Timothy Flynn	8b7c5db186	LibVideo: Do not invoke Vector2D's destructor in order to resize it This leads to all kinds of undefined behavior, especially when callers have a view into the Vector2D.	2023-04-24 18:21:01 +02:00
Zaggy1024	036eb82aca	LibVideo/VP9: Implement threaded tile column decoding This adds a new WorkerThread class to run one task asynchronously, and allow waiting for that thread to finish its work. TileContexts are placed into multiple tile column vectors with their streams to read from pre-created. Once those are ready, the threads can start their work on each vector separately. The main thread waits for those tasks to finish, then sums up the syntax element counts for each tile that was decoded.	2023-04-23 23:14:30 +02:00
Zaggy1024	1fcac52e77	LibVideo/VP9: Count syntax elements in TileContext, and sum at the end Syntax element counters were previously accessed across tiles, which would cause a race condition updating the counts in a tile-threaded mode.	2023-04-23 23:14:30 +02:00
Zaggy1024	a8604d9356	LibVideo/VP9: Fallibly allocate the probability tables	2023-04-23 23:14:30 +02:00
Zaggy1024	8ce4245214	LibVideo/VP9: Return Corrupted error when tile range decoder init fails Previously, we were incorrectly wrapping an error from `BooleanDecoder` initialization in a `DecoderErrorCategory::Memory` error. This caused an incorrect error message in VideoPlayer. Now it will instead return `DecoderErrorCategory::Corrupted`.	2023-04-23 23:14:30 +02:00
Zaggy1024	5e3192c8d9	LibVideo/VP9: Extend the borders on reference frames to avoid branching Extending the borders on reference frames so that motion vectors that point outside the reference frame allows `predict_inter_block()` to avoid some branches to clamp the sample coordinates in its loops. This results in about a 25% improvement in decode time of a motion- heavy YouTube video (~20.8s -> ~15.6s).	2023-04-14 07:11:45 -04:00
Zaggy1024	08b90bb2d0	LibVideo/VP9: Clamp reference frame prediction coords outside loops Moving the clamping of the coordinates of the reference frame samples as well as some bounds checks outside of the loop reduces the branches needed in the `predict_inter_block()` significantly. This results in a whopping ~41% improvement in decode performance of an inter-prediction-heavy YouTube video (~35.4s -> ~20.8s).	2023-04-14 07:11:45 -04:00
Zaggy1024	bc49af08b4	LibVideo/VP9: Pre-calculate inter-frames' reference frame scale factors Changing the calculation of reference frame scale factors to be done on a per-frame basis reduces the amount of work done in `predict_inter_block()`, which is a big hotspot in most videos. This reduces decode times in a test video from YouTube by about 5% (~37.2s -> ~35.4s).	2023-04-14 07:11:45 -04:00
Zaggy1024	5cd5edc3bd	LibVideo/VP9: Copy data to reference frames row by row This changes the order of the loop copying data to a reference frame store so that it copies each row in a contiguous line rather than copying a column at a time, which caused unnecessary branches. This reduces the decode time on a fairly long 720p YouTube video by about 14.5% (~43.5s to ~37.2s).	2023-04-14 07:11:45 -04:00
Ben Wiederhake	560133a0c6	Everywhere: Remove unused DeprecatedString includes	2023-04-09 22:00:54 +02:00
Zaggy1024	fb0c226da3	LibVideo/VP9: Convert the Parser to use AK/BitStream.h This doesn't appear to have had a measurable impact on performance, and behavior is the same. With the tiles using independent BooleanDecoders with their own backing BitStreams, we're even one step closer to threaded tiles!	2023-02-13 00:22:23 +00:00
Zaggy1024	e6c3b0e495	LibVideo/VP9: Rename `round_2()` to `rounded_right_shift()` for clarity	2023-02-10 23:34:37 +01:00
Zaggy1024	33ff3427eb	LibVideo/VP9: Drop the decoder intermediate bounds checks Checking the bounds of the intermediate values was only implemented to help debug the decoder. However, it is non-fatal to have the values exceed the spec-defined bounds, and causes a measurable performance reduction. Additionally, the checks were implemented as an assertion, which is easily broken by bad input files. I see about a 4-5% decrease in decoding times in the `webm_in_vp9` test in TestVP9Decode.	2023-02-10 23:34:37 +01:00
Nico Weber	89b98830f6	LibVideo: Rename "ColorRange" to "VideoFullRangeFlag" That matches the terminology used in ITU-T Rec. H.273, PNG's cICP chunk, and the ICC cicpTag. Also change the enum values to match the values in the spec -- 0 means "not full range" and 1 means "full range". (For now, keep the "Unspecified" entry around, and give it value 2. This value is not in the spec.) No intended behavior change.	2023-02-09 16:35:08 +00:00
Zaggy1024	df313c3dc5	LibVideo/VP9: Clamp motion vectors again in find_mv_refs function The clamping was previously removed apparently, which was unintended and caused some files to fail to decode properly.	2023-02-08 18:56:42 +00:00
Zaggy1024	c18728989e	LibVideo/VP9: Remove magic numbers for the uncompressed ref frames This also adds a fixme to Symbols.h to group and rename the definitions in the file.	2023-02-08 18:56:42 +00:00
Zaggy1024	24f3069129	LibVideo/VP9: Remove debug output from TreeParser	2023-02-08 18:56:42 +00:00
Zaggy1024	0f45153bbb	LibVideo/VP9: Use proper indices for updating inter_mode probabilities I previously changed it to use the absolute inter-prediction mode values instead of the ones relative to NearestMv. That caused the probability adaption to take invalid indices from the counts and broke certain videos. Now it will just convert to the PredictionMode enum when returning from parse_inter_mode, which allows us to still use it the same as before.	2023-02-03 09:10:14 +01:00
Zaggy1024	7b92eff4a6	LibVideo/VP9: Use u32 to store the parsed value counts There were rare cases in which u8 was not large enough for the total count of values read, and increasing this to u32 should have no real effect on performance (hopefully).	2023-02-03 09:10:14 +01:00
Zaggy1024	69e9f9ff63	LibVideo/VP9: Prevent negation overflow in BitStream::read_s	2023-02-03 09:10:14 +01:00
Zaggy1024	f58c5ff569	LibVideo/VP9: Correct the mode/partition probability adaption counts	2023-02-03 09:10:14 +01:00
Zaggy1024	4224f253af	LibVideo/VP9: Increase the size of summed boolean counts in merge_probs This fixes an issue where probabilities that sum to greater than 255 would wrap and cause the maximum probability adaption to take effect.	2023-02-03 09:10:14 +01:00
Linus Groh	9c08bb9555	AK: Remove `try_` prefix from FixedArray creation functions	2023-01-28 22:41:36 +01:00
Sam Atkins	5cc4b37bf3	LibVideo: Remove declarations for non-existent methods	2023-01-27 20:33:18 +00:00
Zaggy1024	42606c87e3	LibVideo/VP9: Move TreeSelection class to TreeParser.cpp The class no longer needs to be defined in the header, as it is only used in static functions.	2023-01-24 14:55:51 +00:00
Nico Weber	19d3821354	LibVideo: Fix two comment typos	2022-12-27 07:44:37 -07:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Zaggy1024	b1c7bbc4ba	LibVideo/VP9: Make get_tile_offset static and remove magic numbers This can use the new utility functions for converting units now.	2022-11-30 08:28:30 +01:00
Zaggy1024	f5ea6c89df	LibVideo/VP9: Put reference frames into a struct	2022-11-30 08:28:30 +01:00
Zaggy1024	e6b696fe24	LibVideo/VP9: Remove now-unused clear_context function from Parser	2022-11-30 08:28:30 +01:00
Zaggy1024	71aac25635	LibVideo/VP9: Move partitioning contexts to TileContext Like the non-zero tokens and segmentation IDs, these can be moved into the tile decoding loop for above context and allocated by TileContext for left context.	2022-11-30 08:28:30 +01:00
Zaggy1024	720fc5a853	LibVideo/VP9: Use unit conversion functions in BlockContext This should make things in there seem a little less magical :^)	2022-11-30 08:28:30 +01:00
Zaggy1024	1fe22f2141	LibVideo/VP9: Move segmentation id prediction context to TileContext These can also be stored in the same places as the non-zero tokens contexts.	2022-11-30 08:28:30 +01:00
Zaggy1024	9df72080a1	LibVideo/VP9: Add FIXME about implementation of tiled decoding	2022-11-30 08:28:30 +01:00
Zaggy1024	2f043a0bd4	LibVideo/VP9: Move the above non-zero tokens context into decode_tiles We can store this context in the stack of Parser::decode_tiles and use spans to give access to the sections of the context for each tile and subsequently each block.	2022-11-30 08:28:30 +01:00
Zaggy1024	4e7e9d8479	LibVideo/VP9: Move the left non-zero tokens context to TileContext The array containing the vertical line of bools indicating whether non- zero tokens were decoded in each sub-block is moved to TileContext, and a span of the valid range for a block to read and write to is created when we construct a BlockContext.	2022-11-30 08:28:30 +01:00
Zaggy1024	06082d310f	LibVideo/VP9: Split/clean up the token tree-parsing context function Since the context information for parsing residual tokens changes based on whether we're parsing the first coefficient or subsequent ones, the TreeParser::get_tokens_context function was split into two new ones to allow them to read more cleanly. All variables now have meaningful names to aid in readability as well. The math used in the function for the first token was changed to be more friendly to tile- or block-specific coordinates to facilitate range-restricted Spans of the above and left context arrays.	2022-11-30 08:28:30 +01:00

1 2 3 4

178 commits