Mirror of https://github.com/RGBCube/serenity (synced 2025-07-27 19:17:44 +00:00)
LibCompress: Use prefix tables to decode Huffman codes up to 8 bits long
Huffman codes have a useful property: they are prefix codes. That is, the bit sequence representing one Huffman-coded symbol is never a prefix of the sequence for another symbol. This allows us to create a lookup table in which every index whose leading bits match a given Huffman code maps to that code's symbol.

With Deflate, codes can be up to 16 bits long, so a table covering every possible code would need 2^16 entries. Instead of creating a table large enough to fit all possible codes, we use a cutoff of 8 bits. Codes longer than 8 bits fall back to the binary search method.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression time decreases from 3.527s to 2.585s on Linux.
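The table lookup described in the commit message can be illustrated with a minimal, standalone sketch. It is not the LibCompress code: every name here (`Entry`, `Code`, `PrefixTable`, `build_prefix_table`, `decode`) is hypothetical, the input is a string of '0'/'1' characters read most-significant bit first rather than a real Deflate bit stream, and codes longer than the cutoff are simply skipped instead of being handed to a fallback.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical cutoff, mirroring the 8-bit limit named in the commit message.
constexpr std::size_t max_prefix_bits = 8;

struct Entry {
    std::uint16_t symbol { 0 };
    std::uint16_t code_length { 0 }; // 0 means "no short code starts with these bits"
};

struct Code {
    std::uint16_t symbol;
    std::uint16_t bits;   // code value, most-significant bit first
    std::uint16_t length; // number of bits in the code
};

using PrefixTable = std::array<Entry, 1 << max_prefix_bits>;

// Fill every slot whose leading `length` bits equal the code.
// Because no code is a prefix of another, the filled ranges never overlap.
PrefixTable build_prefix_table(std::vector<Code> const& codes)
{
    PrefixTable table {};
    for (auto const& code : codes) {
        if (code.length == 0 || code.length > max_prefix_bits)
            continue; // assumption: longer codes are decoded by some slower path
        std::size_t shift = max_prefix_bits - code.length;
        std::size_t first = static_cast<std::size_t>(code.bits) << shift;
        for (std::size_t i = 0; i < (std::size_t { 1 } << shift); ++i) {
            table[first + i].symbol = code.symbol;
            table[first + i].code_length = code.length;
        }
    }
    return table;
}

// Decode a string of '0'/'1' characters with one table lookup per symbol.
std::vector<std::uint16_t> decode(PrefixTable const& table, std::string const& bits)
{
    std::vector<std::uint16_t> symbols;
    std::size_t position = 0;
    while (position < bits.size()) {
        // Peek the next max_prefix_bits bits, padding with zeros past the end.
        std::size_t index = 0;
        for (std::size_t i = 0; i < max_prefix_bits; ++i) {
            index <<= 1;
            if (position + i < bits.size() && bits[position + i] == '1')
                index |= 1;
        }
        Entry const& entry = table[index];
        if (entry.code_length == 0 || position + entry.code_length > bits.size())
            break; // unknown prefix or truncated input
        symbols.push_back(entry.symbol);
        position += entry.code_length; // consume only the bits the code used
    }
    return symbols;
}
```

With the four codes 0, 10, 110 and 111 assigned to symbols 0 through 3, decoding the bit string 0101100111 yields the symbols 0, 1, 2, 0, 3. Each symbol costs one array access regardless of its code length, which is the source of the speedup over searching bit by bit.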
parent 8e834d4bb2
commit 5aaefe4e62
2 changed files with 63 additions and 11 deletions
@@ -30,10 +30,20 @@ public:
    static Optional<CanonicalCode> from_bytes(ReadonlyBytes);

private:
    static constexpr size_t max_allowed_prefixed_code_length = 8;

    struct PrefixTableEntry {
        u16 symbol_value { 0 };
        u16 code_length { 0 };
    };

    // Decompression - indexed by code
    Vector<u16> m_symbol_codes;
    Vector<u16> m_symbol_values;

    Array<PrefixTableEntry, 1 << max_allowed_prefixed_code_length> m_prefix_table {};
    size_t m_max_prefixed_code_length { 0 };

    // Compression - indexed by symbol
    Array<u16, 288> m_bit_codes {}; // deflate uses a maximum of 288 symbols (maximum of 32 for distances)
    Array<u16, 288> m_bit_code_lengths {};
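
The hunk above keeps the two parallel vectors, which suggests they serve the binary search fallback for codes longer than 8 bits. The following is a rough sketch of how such a fallback could work, not the code from this commit. It assumes that each stored code carries a leading 1 bit as a length marker so that codes of different lengths never compare equal, and that the codes are sorted in ascending order. `SortedCodes` and `read_symbol_slow` are hypothetical names, and the input is again a string of '0'/'1' characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two parallel vectors in the diff.
struct SortedCodes {
    std::vector<std::uint16_t> codes;  // ascending; each has a leading 1 bit (assumption)
    std::vector<std::uint16_t> values; // values[i] is the symbol for codes[i]
};

// Read one symbol by extending the candidate code one bit at a time
// and binary-searching for it after every bit.
std::optional<std::uint16_t> read_symbol_slow(SortedCodes const& table, std::string const& bits, std::size_t& position)
{
    std::uint32_t code = 1; // marker bit keeps "01" distinct from "1"
    while (position < bits.size() && code < (1u << 16)) {
        code = (code << 1) | (bits[position++] == '1' ? 1u : 0u);
        auto it = std::lower_bound(table.codes.begin(), table.codes.end(), code);
        if (it != table.codes.end() && *it == code)
            return table.values[static_cast<std::size_t>(it - table.codes.begin())];
    }
    return std::nullopt; // ran out of input or no code matched
}
```

Each symbol costs one binary search per bit read, which is why routing the short, frequent codes through a direct table lookup first is attractive.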