Candidate vector selections are only used to calculate the new vectors
for the current block, so we only need to keep those for the duration
of the inter_block_mode_info() call.
Candidate vectors are now stored in BlockMotionVectorCandidates, which
contains the fields necessary to choose the vector to use to sample
from the selected reference frame.
Most functions related to motion vectors were renamed to more verbose
but meaningful names.
The log2 of tile counts in the horizontal and vertical dimensions are
now stored in the FrameContext struct to be kept only as long as they
are needed.
This also renames (most?) of the related quantizer functions and
variables to make more sense. I haven't determined what AC/DC stands
for here, but it may be just an arbitrary naming scheme for the first
and subsequent coefficients used to quantize the residuals for a block.
The color config is reused for most inter predicted frames, so we use a
struct ColorConfig to store the config from intra frames, and put it in
a field in Parser to copy from when an inter frame without color config
is encountered.
There are three mutually exclusive frame-showing states:
- Show no new frame, only store the frame as a reference.
- Show a newly decoded frame.
- Show frame from the reference frame store.
Since they are mutually exclusive, using an enum rather than two bools
makes more sense.
These are used to pass context needed for decoding, with mutability
scoped only to the sections that the function receiving the contexts
needs to modify. This allows lifetimes of data to be more explicit
rather than being stored in fields, as well as preventing tile threads
from modifying outside their allowed bounds.
The field was only used once to track whether residual tokens were
present in the block. Parser::tokens() now returns a bool indicating
whether they were present.
These are now passed as parameters to each function that uses them.
These will later be moved to a struct to further reduce the amount of
parameters that get passed around.
Above and left per-frame block contexts are now also parameters passed
to the functions that use them instead of being retrieved when needed
from a field. This will allow them to be more easily moved to a tile-
specific context later.
There are three fields that we need to store from FrameBlockContext to
keep between frames, which are used to parse for those same fields for
the next frame.
The function serves no purpose now, any debug information we want to
pull from the decoder should be instead accessed by some other yet to
be created interface.
All state that needed to persist between calls to decode_block was
previously stored in plain Vector fields. This moves them into a struct
which sets a more explicit lifetime on that data. It may be possible to
store this data on the stack of a function with the appropriate
lifetime now that it is split into its own struct.
The default intra prediction mode was only used to set the sub-block
modes and the y prediction mode. Instead of storing it in a field, with
the sub modes are stored in an Array, we can just pull the last element
to set the y mode.
When errors are encountered by PlaybackManager, it attempts to switch
states to either Stopped or Corrupted. However, that causes it to set
the last presentation media time to the current playback time while the
last presentation time is unexpectedly negative because the seek never
ended.
Ending the seek before the state changes to Stopped or Corrupted
prevents this situation from happening.
This implements the fastest seeking mode available for tracks with cues
using an array of cue points for each track. It approximates the index
based on the seeking timestamp and then finds the earliest cue point
before the timestamp. The approximation assumes that cues will be on
a regular interval, which I don't believe is always the case, but it
should at least be faster than iterating the whole set of cue points
each time.
Cues are stored per track, but most videos will only have cue points
for the video track(s) that are present. For now, this assumes that it
should only seek based on the cue points for the selected track. To
seek audio in a video file, we should copy the seeked iterator over to
the audio track's iterator after seeking is complete. The iterator will
then skip to the next audio block.
Tracks have a timestamp scale value that should be present which scales
each block's timestamp offset to allow video to be synced with audio.
They should also contain a CodecDelay element and may also contain a
TrackOffset that offsets the block timestamps.
Now that we're able to find the nearest keyframe, we can have a fast
seeking mode that only seeks to keyframes, so that it doesn't have to
also decode inter frames until it reaches the timestamp.
The default is still accurate seeking, so that the entire seeking
implementation can be tested.
This just searches sequentially through each block in a SampleIterator
until it finds a block after the specified seek timestamp. Once it
finds one, it will try to set the input/output iterator to the most
recent keyframe. If the iterator's original position is closer to the
target, however, it leaves it at that original position, allowing
callers to continue decoding from that position until they reach the
target timestamp.
This implements the PlaybackManager portion of seeking, so that it can
seek to any frame in the video by dropping all preceding frames until
it reaches the seek point.
MatroskaDemuxer currently will only seek to the start of the video.
That means that every seek has to drop all the frames until it comes
across the target timestamp.
The PlaybackManager::update_presented_frame function was getting out of
hand and adding seeking was making it illegible. This rewrites it to be
(hopefully) quite a bit more readable, and adds a few comments to help
future readers of the code.
In addition, some helpful debugging prints were added that should help
debug any future issues with the player.
With these changes, the seek bar can be used, but only to seek to the
start of the file. Seeking to anywhere else in the file will cause an
error in the demuxer.
The timestamp label that was previously invisible now has its text set
according to either the playback or seek slider's position.
The Demuxer class was changed to return errors for more functions so
that all of the underlying reading can be done lazily. Other than that,
the demuxer interface is unchanged, and only the underlying reader was
modified.
The MatroskaDocument class is no more, and MatroskaReader's getter
functions replace it. Every MatroskaReader getter beyond the Segment
element's position is parsed lazily from the file as needed. This means
that all getter functions can return DecoderErrors which must be
handled by callers.
Making these functions static makes it easier to implement lazy-loading
since the parsing functions can now be called at any time.
The functions were reorganized because they were not defined in the
order they are called. However, instead of moving every function to
that order, I've declared some but defined them further into the file,
which allows the next commit's diff to be more readable.
Matroska::Reader functions now return DecoderErrorOr instead of values
being declared Optional. Useful errors can be handled by the users of
the parser, similarly to the VP9 decoder. A lot of the error checking
in the reader is a lot cleaner thanks to this change, since all reads
can be range checked in Streamer::read_octet() now.
Most functions for the Streamer class are now also out-of-line in
Reader.cpp now instead of residing in the header.
As new demuxers are added, this will get quite full of files, so it'll
be good to have a separate folder for these.
To avoid too many chained namespaces, the Containers subdirectory is
not also a namespace, but the Matroska folder is for the sake of
separating the multiple classes for parsed information entering the
Video namespace.
Timers keep their previously set interval even for single-shot mode.
We want all timers to fire immediately if they don't have a delay set
in the start() call.
This has two benefits:
- I observed a ~34% decrease in decoding time running TestVP9Decode.
- Removing all of these silly Vector fields helps simplify the code
relationships between all the functions in Decoder.cpp. It'll also be
much easier to make these static with template specializations, if
that turns out to be worthy performance improvement.
VideoPlayerWidget was keeping a reference to PlaybackManager when
changing files, so the old and new managers would both send frames to
be presented at the same time, causing it to flicker back and forth
between the two videos. However, PlaybackManager no longer relies on
event bubbling to pass events to its parent. By changing it to send
events directly to an Object, it can avoid being ref counted, so that
it will get destroyed with its containing object and stop sending
events.
With the addition of this struct, both the bool to determine if coefs
should be parsed and the token parse itself can take specific
parameters.
This is the last step in parameterizing all the tree parsing, so the
old functions in TreeParser are now unused. This patch is very
satisfying :^)
There's still more work to be done to clean up how the parameters are
passed from Parser, but that's work for another day.