Quantizers are a constant for the whole frame, except when segment
features override them, in which case they are a constant per segment
ID. We take advantage of this by pre-calculating those after reading
the quantization parameters and segmentation features for a frame.
This results in a small 1.5% improvement (~12.9s -> ~12.7s).
This throws out some ugly `#define`s we had that were taking the role
of an enum anyway. We now have some nice getters in the contexts that
take the place of the combo of `seg_feature_active()` and then doing a
lookup in `FrameContext::m_segmentation_features` directly.
This moves all the frame size calculation to `FrameContext`, where the
subsampling is easily accessible to determine the size for each plane.
The internal framebuffer size has also been reduced to the exact frame
size that is output.
Previously, the `Parser::decode_tiles()` function wouldn't wait for the
tile-decoding workers to finish before exiting the function, which
could mean that the data the threads are working with could become
invalid if the decoder is deleted after an error is encountered.
This adds a new WorkerThread class to run one task asynchronously,
and allow waiting for that thread to finish its work.
TileContexts are placed into multiple tile column vectors with their
streams to read from pre-created. Once those are ready, the threads can
start their work on each vector separately. The main thread waits for
those tasks to finish, then sums up the syntax element counts for each
tile that was decoded.
Previously, we were incorrectly wrapping an error from `BooleanDecoder`
initialization in a `DecoderErrorCategory::Memory` error. This caused
an incorrect error message in VideoPlayer. Now it will instead return
`DecoderErrorCategory::Corrupted`.
Changing the calculation of reference frame scale factors to be done on
a per-frame basis reduces the amount of work done in
`predict_inter_block()`, which is a big hotspot in most videos.
This reduces decode times in a test video from YouTube by about 5%
(~37.2s -> ~35.4s).
This doesn't appear to have had a measurable impact on performance,
and behavior is the same.
With the tiles using independent BooleanDecoders with their own
backing BitStreams, we're even one step closer to threaded tiles!
That matches the terminology used in ITU-T Rec. H.273,
PNG's cICP chunk, and the ICC cicpTag.
Also change the enum values to match the values in the spec --
0 means "not full range" and 1 means "full range".
(For now, keep the "Unspecified" entry around, and give it value 2.
This value is not in the spec.)
No intended behavior change.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
Like the non-zero tokens and segmentation IDs, these can be moved into
the tile decoding loop for above context and allocated by TileContext
for left context.
We can store this context in the stack of Parser::decode_tiles and use
spans to give access to the sections of the context for each tile and
subsequently each block.
The array containing the vertical line of bools indicating whether non-
zero tokens were decoded in each sub-block is moved to TileContext, and
a span of the valid range for a block to read and write to is created
when we construct a BlockContext.
Since the context information for parsing residual tokens changes based
on whether we're parsing the first coefficient or subsequent ones, the
TreeParser::get_tokens_context function was split into two new ones to
allow them to read more cleanly. All variables now have meaningful
names to aid in readability as well.
The math used in the function for the first token was changed to
be more friendly to tile- or block-specific coordinates to facilitate
range-restricted Spans of the above and left context arrays.
Only the residual tokens array needs to be kept for the transforms to
use after all the tokens have been parsed. The token cache is able to
be kept in the stack only for the duration of the token parsing loop.
Since the enum is used as an index to arrays, it unfortunately can't
be converted to an enum class, but at least we can make sure to use it
with the qualified enum name to make things a bit clearer.
Previously, the variables were named similarly to the names in spec
which aren't very human-readable. This adds some utility functions for
dimensional unit conversions and names the variables in residual()
based on their units.
References to 4x4 blocks were also renamed to call them sub-blocks
instead, since unit conversion functions would not be able to begin
with "4x4_blocks".
Moving these to another header allows Parser.h to include less context
structs/classes that were previously in Context.h.
This change will also allow consolidating some common calculations into
Context.h, since we won't be polluting the VP9 namespace as much. There
are quite a few duplicate calculations for block size, transform size,
number of horizontal and vertical sub-blocks per block, all of which
could be moved to Context.h to allow for code deduplication and more
semantic code where those calculations are needed.
Those previous constants were only set and used to select the first and
second transforms done by the Decoder class. By turning it into a
struct, we can make the code a bit more legible while keeping those
transform modes the same size as before or smaller.
The sub-block transform types set and then used in a very small scope,
so now it is just stored in a variable and passed to the two functions
that need it, Parser::tokens() and Decoder::reconstruct().
Note that some of the previous segmentation feature settings must be
preserved when a frame is decoded that doesn't use segmentation.
This change also allowed a few functions in Decoder to be made static.
The motion vector joints enum is set up so that the first bit indicates
that a vector should have a non-zero value in the column, and the
second bit indicates a non-zero value for the row. Taking advantage of
this makes the code a bit more legible.
Previously, we were using size_t, often coerced from bool or u8, to
index reference pairs. Now, they must either be taken directly from
named fields or indexed using the `ReferenceIndex` enum with options
`primary` and `secondary`. With a more explicit method of indexing
these, the compiler can aid in using reference pairs correctly, and
fuzzers may be able to detect undefined behavior more easily.
Candidate vector selections are only used to calculate the new vectors
for the current block, so we only need to keep those for the duration
of the inter_block_mode_info() call.
Candidate vectors are now stored in BlockMotionVectorCandidates, which
contains the fields necessary to choose the vector to use to sample
from the selected reference frame.
Most functions related to motion vectors were renamed to more verbose
but meaningful names.
The log2 of tile counts in the horizontal and vertical dimensions are
now stored in the FrameContext struct to be kept only as long as they
are needed.
This also renames (most?) of the related quantizer functions and
variables to make more sense. I haven't determined what AC/DC stands
for here, but it may be just an arbitrary naming scheme for the first
and subsequent coefficients used to quantize the residuals for a block.