Previously we would calculate the index of the first parent node as
heap.size() (which starts out as non_zero_freqs), so in the edge case
where all symbols had a non-zero frequency, we would use the entry at
index Size in the array for both the first symbol's leaf node and the
first parent node.
The result would be either a non-optimal Huffman code (bad) or an
illegal Huffman code that would then fail an error check in
CanonicalCode::from_bytes and crash (worse).
We now store parent nodes starting at heap.size() - 1, which eliminates
the potential overlap and resolves the issue.
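For illustration, here is a toy Huffman-length builder (not the
LibCompress implementation; the names and structure are invented). It
sidesteps in-place index reuse entirely by appending parent nodes after
all leaf slots, so the overlap is impossible by construction:

    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    struct Node {
        uint64_t weight { 0 };
        int parent { -1 };
    };

    // Leaves occupy indices [0, n); parents are appended afterwards, so
    // a parent index can never collide with a leaf index.
    std::vector<int> huffman_code_lengths(std::vector<uint64_t> const& freqs)
    {
        size_t n = freqs.size();
        std::vector<Node> nodes(n);
        using Entry = std::pair<uint64_t, size_t>; // (weight, node index)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<>> heap;

        for (size_t i = 0; i < n; ++i) {
            nodes[i].weight = freqs[i];
            if (freqs[i] > 0)
                heap.push({ freqs[i], i });
        }

        while (heap.size() > 1) {
            auto [left_weight, left] = heap.top();
            heap.pop();
            auto [right_weight, right] = heap.top();
            heap.pop();
            size_t parent = nodes.size(); // always a fresh slot
            nodes.push_back({ left_weight + right_weight, -1 });
            nodes[left].parent = static_cast<int>(parent);
            nodes[right].parent = static_cast<int>(parent);
            heap.push({ left_weight + right_weight, parent });
        }

        // A symbol's code length is its depth in the tree.
        std::vector<int> lengths(n, 0);
        for (size_t i = 0; i < n; ++i)
            for (int p = nodes[i].parent; p != -1; p = nodes[p].parent)
                ++lengths[i];
        return lengths;
    }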
This method takes bytes as input and decompresses everything into a
ByteBuffer. It uses two control codes (clear and end-of-data), as
described in the GIF, TIFF and PDF specifications.
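For reference, the GIF-style layout of those control codes looks like
this (a sketch; the names are not the LibCompress API):

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        uint8_t code_size = 8;                      // e.g. 8-bit literals in GIF
        uint16_t clear_code = 1u << code_size;      // resets the code table
        uint16_t end_of_data_code = clear_code + 1; // terminates the stream
        uint16_t first_free_code = clear_code + 2;  // first dynamic table entry

        std::printf("clear=%u end-of-data=%u first free=%u\n",
            (unsigned)clear_code, (unsigned)end_of_data_code, (unsigned)first_free_code);
    }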
Some users of the LZW algorithm use a different threshold to determine
when the code size changes: GIF increments the size when the number of
elements in the table is equal to 2^code_size, while TIFF does it at a
count of 2^code_size - 1.
This patch adds the parameter m_offset_for_size_change with a default
value of 0; the decoder now increments the code size when the table
length reaches 2^code_size + m_offset_for_size_change. This allows us
to support both behaviors.
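The resulting check boils down to something like this (a sketch; the
free-function form and names are mine, mirroring the description):

    #include <cstddef>
    #include <cstdint>

    // GIF uses an offset of 0, TIFF an offset of -1.
    bool should_grow_code_size(std::size_t table_size, int code_size, int offset_for_size_change)
    {
        return static_cast<std::int64_t>(table_size)
            == (std::int64_t(1) << code_size) + offset_for_size_change;
    }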
XZ writes filters in the order that they are used during compression,
so we need to process them in reverse order during decompression.
This wasn't noticed earlier because we only supported the LZMA2 filter.
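In generic terms, the decoder has to walk the stored filter chain back
to front; this sketch uses invented names, not the actual XZ decoder
structure:

    #include <functional>
    #include <vector>

    using Buffer = std::vector<unsigned char>;
    using FilterStep = std::function<Buffer(Buffer)>;

    // `filters` is stored in compression order, so undo the last-applied
    // filter first.
    Buffer decode(Buffer data, std::vector<FilterStep> const& filters)
    {
        for (auto it = filters.rbegin(); it != filters.rend(); ++it)
            data = (*it)(std::move(data));
        return data;
    }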
Since it will become a stream in a little bit, it should behave like
all non-trivial stream classes, which are not primarily intended for
shared ownership; this makes closing behavior more predictable. Across
all uses of MappedFile, there is only one use case of shared mapped
files, in LibVideo, which now uses the thin SharedMappedFile wrapper.
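A toy example of the rationale, using std smart pointers instead of the
AK types:

    #include <memory>

    struct Mapping {
        ~Mapping() { /* the underlying mapping is released here */ }
    };

    void shared_example()
    {
        // With shared ownership, the mapping closes only when the *last*
        // holder goes away, so the close point depends on every copy.
        auto a = std::make_shared<Mapping>();
        auto b = a;
        a.reset(); // not closed yet...
        b.reset(); // ...closed only here
        // With unique ownership (std::unique_ptr), there is exactly one
        // owner, and the close point is predictable.
    }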
The class was an inner class of `BrotliDecompressionStream`; let's move
it outside the `Stream` object to ease access for users who are only
interested in this part.
These routines:
- read_prefix_code
- read_simple_prefix_code
- read_complex_prefix_code
were methods of `BrotliDecompressionStream` that took a `CanonicalCode`
as an out parameter. This patch turns them into static methods of
`CanonicalCode`.
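The shape of the change is roughly this (the signatures are assumptions
based on the description, not copies of the actual headers):

    #include <cstddef>

    // Stand-ins so the sketch is self-contained; the real code uses AK's
    // error and bit stream types.
    template<typename T>
    struct ErrorOr;
    struct InputBitStream;

    class CanonicalCode {
    public:
        // Previously out-parameter methods on BrotliDecompressionStream;
        // now static factories on CanonicalCode itself.
        static ErrorOr<CanonicalCode> read_prefix_code(InputBitStream&, std::size_t alphabet_size);
        static ErrorOr<CanonicalCode> read_simple_prefix_code(InputBitStream&, std::size_t alphabet_size);
        static ErrorOr<CanonicalCode> read_complex_prefix_code(InputBitStream&, std::size_t alphabet_size);
    };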
This now searches the memory in blocks, which should be slightly more
efficient. However, it doesn't make much of a difference in most
real-world applications (e.g. ~1% in LZMA compression), as the non-hint
function is more expensive by orders of magnitude.
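As a generic illustration of block-wise scanning (not the actual
CircularBuffer code), memchr can skip through a contiguous block
instead of a hand-written per-byte loop:

    #include <cstddef>
    #include <cstring>
    #include <vector>

    // Collect every offset in `block` where `needle` occurs, letting the
    // libc memchr (which typically scans word-sized chunks) do the work.
    std::vector<size_t> candidate_offsets(unsigned char const* block, size_t size, unsigned char needle)
    {
        std::vector<size_t> offsets;
        unsigned char const* cursor = block;
        size_t remaining = size;
        while (auto* hit = static_cast<unsigned char const*>(
                   std::memchr(cursor, needle, remaining))) {
            offsets.push_back(static_cast<size_t>(hit - block));
            remaining -= static_cast<size_t>(hit - cursor) + 1;
            cursor = hit + 1;
        }
        return offsets;
    }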
The "operation modes" of this function have very different focuses, and
trying to combine both in a way where we share the most amount of code
probably results in the worst performance.
Instead, split up the function into "existing distances" and "no
existing distances" so that we can optimize either case separately.
We will be adding extra logic to the CircularBuffer to optimize
searching, but this would negatively impact the performance of
CircularBuffer users that don't need that functionality.
This should keep the `read_some` function a bit flatter and shorter,
and make it easier to compare the match-type decoding process against
the specification.
WebP lossless can have up to 2328 symbols. This code assumed the
Deflate maximum of 288, leading to crashes for WebP lossless files
using more than 288 symbols (such as
Tests/LibGfx/test-inputs/simple-vp8l.webp).
Nothing writes WebP files at this point, so the m_bit_codes and
m_bit_code_lengths arrays are never used in practice with more than
288 entries.
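For reference, the 2328 figure matches the WebP lossless green/length
alphabet: 256 literals, 24 length prefix codes, and up to 2048 color
cache entries:

    #include <cstddef>

    constexpr size_t deflate_max_symbols = 288;
    constexpr size_t webp_lossless_max_symbols = 256 + 24 + 2048;
    static_assert(webp_lossless_max_symbols == 2328);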