1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-22 03:15:08 +00:00
Commit graph

118 commits

Author SHA1 Message Date
Timothy Flynn
eed956b473 AK: Increase LittleEndianOutputBitStream's buffer size and remove loops
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2.

We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.

This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:

    13.62s to 10.9s on Serenity (cold)
    13.62s to 9.22s on Serenity (warm)
    2.93s to 2.32s on Linux

One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
2023-04-02 10:54:37 +02:00
Nico Weber
85d0637058 LibCompress: Make CanonicalCode::from_bytes() return ErrorOr<>
No intended behavior change.
2023-04-02 06:19:46 +02:00
Tim Schumacher
ad31265e60 LibCompress: Implement block size validation for XZ streams 2023-04-01 13:57:54 +02:00
Tim Schumacher
20f1a29202 LibCompress: Factor out the list of XZ check sizes 2023-04-01 13:57:54 +02:00
Nico Weber
bc70d7bb77 LibCompress: Reduce indentation in CompressedBlock::try_read_more()
...by removing `else` after `return`.

No behavior change.
2023-04-01 13:57:39 +02:00
Timothy Flynn
7ec91dfde7 LibCompress: Add a utility to GZIP compress an entire file
This is copy-pasted from the gzip utility, along with its existing TODO.
This is currently only needed by that utility, but this gives us API
symmetry with GzipDecompressor, and helps ensure we won't end up in a
situation where only one utility receives optimizations that should be
received by all interested parties.
2023-04-01 08:15:49 +02:00
Timothy Flynn
857f559a06 gunzip+LibCompress: Move utility to decompress files to GzipDecompressor
This is to allow re-using this method (and any optimization it receives)
by other utilities, like gzip.
2023-04-01 08:15:49 +02:00
Nico Weber
c3b8b3124c LibCompress: Remove two needless heap allocations 2023-03-31 08:44:30 -06:00
Timothy Flynn
8b56d82865 AK+LibCompress: Remove the Deflate back-reference intermediate buffer
Instead of reading bytes from the output stream into a buffer, just to
immediately write them back out, we can skip the middle-man and copy the
bytes directly into the output buffer.
2023-03-31 06:56:11 +02:00
Timothy Flynn
9f238793e0 gunzip+LibCompress: Increase buffer sizes used by Deflate and gunzip
Co-authored-by: Andreas Kling <kling@serenityos.org>
2023-03-31 06:56:11 +02:00
Tim Schumacher
fe761a4e9b LibCompress: Use LZMA context from preexisting dictionary 2023-03-30 14:39:31 +02:00
Tim Schumacher
c020ee8bfa LibCompress: Avoid overflowing the size of uncompressed LZMA2 chunks 2023-03-30 14:39:31 +02:00
Tim Schumacher
023c64011c LibCompress: Use the correct LZMA repetition offset in all cases 2023-03-30 14:39:31 +02:00
Tim Schumacher
9ccb0fc1d8 LibCompress: Only require new LZMA2 properties after dictionary reset 2023-03-30 14:39:31 +02:00
Tim Schumacher
d9627503a9 LibCompress: Reduce repeated code in the LZMA decompressor 2023-03-30 14:39:31 +02:00
Tim Schumacher
726963edc7 LibCompress: Implement support for multiple concatenated XZ streams 2023-03-30 14:38:47 +02:00
Tim Schumacher
00332c9b7d LibCompress: Move XZ header validation into the read function
The constructor is now only concerned with creating the required
streams, which means that it no longer fails for XZ streams with
invalid headers. Instead, everything is parsed and validated during the
first read, preparing us for files with multiple streams.
2023-03-30 14:38:47 +02:00
Tim Schumacher
8ff36e5910 LibCompress: Implement proper handling of LZMA end-of-stream markers 2023-03-30 08:45:35 +02:00
Tim Schumacher
b6f3b2f116 LibCompress: Move common LZMA end-of-file checks into helper functions 2023-03-30 08:45:35 +02:00
Timothy Flynn
7447a91d7e LibCompress: Decode non-self-referencing back-references in one shot
We currently decode back-references one byte at a time, while writing
that byte back out to the output buffer. This is only necessary when the
back-reference refers to itself, i.e. when the back-reference distance
is less than its length. In other cases, we can read the entire back-
reference block in one shot.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from:

    5.8s to 4.89s on Serenity (cold)
    2.3s to 1.72s on Serenity (warm)
    1.6s to 1.06s on Linux
2023-03-29 13:22:11 +01:00
Timothy Flynn
5aaefe4e62 LibCompress: Use prefix tables to decode Huffman codes up to 8 bits long
Huffman codes have a useful property in that they are prefix codes. That
is, a set of bits representing a Huffman-coded symbol is never a prefix
of another symbol. This allows us to create a table, where each index in
the table are integers whose prefix is the entry's corresponding Huffman
code.

With Deflate, we can have codes up to 16 bits in length, thus creating a
prefix table with 2^16 entries. So instead of creating a table fit all
possible codes, we use a cutoff of 8-bit codes. Codes larger than 8 bits
fall back to the binary search method.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from 3.527s to 2.585s on Linux.
2023-03-29 07:19:14 +02:00
Timothy Flynn
20aaab47f9 LibCompress: Use a bit stream for the entire GZIP decompression process
We currently mix normal and bit streams during GZIP decompression, where
the latter is a wrapper around the former. This isn't causing issues now
as the underlying bit stream buffer is a byte, so the normal stream can
pick up where the bit stream left off.

In order to increase the size of that buffer though, the normal stream
will not be able to assume it can resume reading after the bit stream.
The buffer can easily contain more bits than it was meant to read, so
when the normal stream resumes, there may be N bits leftover in the bit
stream that the normal stream was meant to read.

To avoid weird behavior when mixing streams, this changes the GZIP
decompressor to always read from a bit stream.
2023-03-29 07:19:14 +02:00
Andreas Kling
aeb8224ec8 LibCompress: Speed up deflate decompression by ~11%
...simply by using LittleEndianInputBitStream::read_bit() instead of
read_bits(1). This puts us on the fast path for single-bit reads.

There's still lots of money on the table for bigger optimizations to
claim here, just picking an embarrassingly low-hanging fruit. :^)
2023-03-24 17:08:35 +01:00
Tim Schumacher
9e990f7329 LibCompress: Add support for XZ 2023-03-21 10:25:13 +01:00
Tim Schumacher
aae1404016 LibCompress: Add support for LZMA2 2023-03-21 10:25:13 +01:00
Tim Schumacher
1b8318ab67 LibCompress: Allow providing an external dictionary for LZMA
While at it, rename the former "output buffer" to "dictionary", since
that's its primary function.
2023-03-21 10:25:13 +01:00
Tim Schumacher
f4506a3a72 LibCompress: Allow appending input streams to an existing LZMA decoder 2023-03-21 10:25:13 +01:00
Tim Schumacher
04f69de7f1 LibCompress: Refactor LZMA model property decoding into a static helper 2023-03-21 10:25:13 +01:00
Tim Schumacher
b3a9729e23 LibCompress: Add support for LZMA streams 2023-03-20 12:15:38 +02:00
Tim Schumacher
ae51c1821c Everywhere: Remove unintentional partial stream reads and writes 2023-03-13 15:16:20 +00:00
Tim Schumacher
ecd1862859 AK: Rename Stream::write_entire_buffer to Stream::write_until_depleted
No functional changes.
2023-03-13 15:16:20 +00:00
Tim Schumacher
a3f73e7d85 AK: Rename Stream::read_entire_buffer to Stream::read_until_filled
No functional changes.
2023-03-13 15:16:20 +00:00
Tim Schumacher
d5871f5717 AK: Rename Stream::{read,write} to Stream::{read_some,write_some}
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").

Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).

No functional changes, just a lot of new FIXMEs.
2023-03-13 15:16:20 +00:00
Nico Weber
dfb45705e6 LibCompress: Make DeflateCompressor::write() use loop, not recursion
This is performance-neutral, but Instruments.app had a hard time
visualizing the very deeply nested stack frames here.

No behavior change.
2023-03-13 06:32:56 +00:00
Tim Schumacher
43f98ac6e1 Everywhere: Remove the AK:: qualifier from Stream usages 2023-02-13 00:50:07 +00:00
Tim Schumacher
874c7bba28 LibCore: Remove Stream.h 2023-02-13 00:50:07 +00:00
Tim Schumacher
220fbcaa7e AK: Remove the fallible constructor from FixedMemoryStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
8b2f23d016 AK: Remove the fallible constructor from LittleEndianOutputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
093cf428a3 AK: Move memory streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
2470dd3bb5 AK: Move bit streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
8464da1439 AK: Move Stream and SeekableStream from LibCore
`Stream` will be qualified as `AK::Stream` until we remove the
`Core::Stream` namespace. `IODevice` now reuses the `SeekMode` that is
defined by `SeekableStream`, since defining its own would require us to
qualify it with `AK::SeekMode` everywhere.
2023-01-29 19:16:44 -07:00
Tim Schumacher
5f2ea31816 AK: Move Handle from LibCore and name it MaybeOwned
The new name should make it abundantly clear what it does.
2023-01-29 19:16:44 -07:00
Linus Groh
9c08bb9555 AK: Remove try_ prefix from FixedArray creation functions 2023-01-28 22:41:36 +01:00
Timothy Flynn
027aee2c66 Userland: Add missing Math.h and IntegralMath.h header includes
These are currently being implicitly including by FixedPoint.h by way of
Format.h. The former will soon be removed from the latter, which would
otherwise cause a compile error in these files.
2023-01-19 11:29:48 +00:00
Tim Schumacher
1ca62de558 LibCore: Return EBADF on unsupported stream operations 2023-01-19 11:41:56 +01:00
Tim Schumacher
c4da1be32c LibCompress: Remove all leftover AK::Stream headers 2023-01-13 17:34:45 -07:00
Tim Schumacher
46a53dc6e0 LibCompress: Switch the deflate seekback buffer to CircularBuffer 2023-01-13 17:34:45 -07:00
Tim Schumacher
35548cab75 LibCompress: Remove DuplexMemoryStream from GzipDecompressor 2023-01-13 17:34:45 -07:00
Tim Schumacher
0d69fbd19f LibCompress: Remove DuplexMemoryStream from DeflateDecompressor 2023-01-13 17:34:45 -07:00
Tim Schumacher
d23f0a7405 LibCompress: Switch DeflateDecompressor to a fallible constructor
We don't have anything fallible in there yet, but we will soon switch
the seekback buffer to the new `CircularBuffer`, which has a fallible
constructor.

We have to do the same for the internal `GzipDecompressor::Member`
class, as it needs to construct a `DeflateCompressor` from its received
stream.
2023-01-13 17:34:45 -07:00