This is for validating that a decoder with a weak or nonexistent
sniff() method thinks it can decode an image. This should not be
treated as an error.
No behavior change.
...from try_create_for_raw_bytes().
If a plugin returns `true` from sniff but then fails when calling
its `create()` method, we now no longer swallow that error.
Allows `image` (and other places in the system) to print a more
actionable error if early image headers are invalid.
(We now no longer try to find another plugin that can also handle
the image.)
Fixes a regression from #20063 / #19893 -- before then, we didn't
do fallible work this early.
The semantics of BGRx8888 aren't super clear and it means different
things for different parts of the codebase. In particular, the PNG
writer still writes the x channel to the alpha channel of its output.
In BMPs, the 4th palette byte is usually 0, which means after #21412 we
started writing all .bmp files with <= 8bpp as completely transparent
to PNGs.
This works around that.
(See also #19464 for previous similar workarounds.)
The added `bitmap.bmp` is a 1bpp file I drew in Photoshop and saved
using its "Save as..." saving path.
This patch brings few small QoL improvements:
- We don't need to read the Huffman stream before returning an error
due to a missing quantization table.
- We check the table presence only once per scan instead of once per
MCU.
- `dequantize()` is now infallible.
The old implementation of PassMode has only been tested with a single
image, and let's say that it didn't survive long in the wild. A few
cases were not considered:
- We only supported VerticalMode right after PassMode.
- It can happen that token need to be used but not consumed from the
reference line.
With that fix, we are able to decode every single PDF file from the
1000-file zip "0000" (except 0000871.pdf, which uses byte alignment).
This is massive progress compared to the hundred of errors that we were
previously receiving.
The former automatically adapts the prefix to binary and octal
output, and is what we already use in the majority of cases.
Patch generated by:
rg -l '0x\{' | xargs sed -i '' -e 's/0x{:/{:#/'
I ran it 4 times (until it stopped changing things) since each
invocation only converted one instance per line.
No behavior change.
This reverts commit a4b2e5b27b.
This was just plain wrong, I remember it making sense and fixing
something but that was probably due to local changes. It should never
have landed on master, my bad.
This function was called over and over in `manage_extra_channels()`,
even if the result depends only on the metadata. Instead, we now call it
once and store the result.
A tile is basically a strip with a user-defined width. With that in
mind, adding support for them is quite straightforward. As a lot the
common code was named after 'strips', to avoid future confusion I
renamed everything that interact with either strips or tiles to a
global term: 'segment'.
Note that tiled images are supposed to always have a 'TileOffsets' tag
instead of 'StripOffset'. However, this doesn't seem to be enforced by
encoders, so we support having either of them indifferently.
The test case was generated with the following Python script:
import pyvips
img = pyvips.Image.new_from_file('deflate.tiff')
img.write_to_file('tiled.tiff',
compression=pyvips.ForeignTiffCompression.DEFLATE,
tile=True, tile_width=64, tile_height=64)
This variable stores the number of rows from the beginning of the image,
contrary to `row` that stores the number of rows relative to the start
of the current segment.
The first marker is always white in CCITT streams, so lines starting
with a black pixel encodes a symbol meaning 0 white pixels. Then, the
decoding would proceed with a black symbol. We used to set the symbol's
color based on `column == 0`, which is wrong in this situation.
The `read_tag()` function is not mandated to keep the reading head at a
meaningful position, so we also need to align the pointer after the last
tag. This solves a bug where reading the last field of an IFD, which is
placed after the tags, was incorrect.
Every TIFF containers is composed of a main IFD. Some entries of this
one can be a pointer to a sub-IFD. We are now capable of exploring these
underlying structures. Note that we don't do anything with them yet.
JPEGStream::byte_offset() now returns an offset relative to the start
of the stream, instead of relative to the buffered part.
No behavior change except if JPEG_DEBUG is set.
bab2113ec1 made read_whitespace() return ErrorOr, which makes this
easy to do.
(7cafd7d177, which added the fixmes, landed slightly after bab2113ec1,
so not quite sure why it wasn't like this immediately. Maybe commit
order got changed during review; both commits were in #17831.)
No behavior change.
We always store CMYK data as YCCK, for two reasons:
1. If we ever want to do subsampling, then doing 2111 or
2112 makes sense with YCCK, while it doesn't make sense
if we store CMYK directly.
2. It forces us to write a color transform header. With a color
transform header, everyone agrees that the CMYK channels should
be stored inverted, while without it behavior between decoders
is inconsistent. (We could write an explicit color transform header
for CMYK too though, but with YCCK it's harder to forget since the
output will look wrong everywhere without it.)
initialize_mcu() grows a full CMYKBitmap override. Some of the
macroblock traversal could probably shared with some kind of
for_all_macroblocks() type function in the future, but the color
conversion math is different enough that this should be a separate
function.
Other than that, we pass around a mode parameter and make a few fuctions
write 4 instead of 3 channels, and that's it.
We use the luminance quantization and huffman tables for the K
channel.
JPEGs can store a `restart_interval`, which controls how many
minimum coded units (MCUs) apart the stream state resets.
This can be used for error correction, decoding parts of a jpeg
in parallel, etc.
We tried to use
u32 i = vcursor * context.mblock_meta.hpadded_count + hcursor;
i % (context.dc_restart_interval *
context.sampling_factors.vertical *
context.sampling_factors.horizontal) == 0
to check if we hit a multiple of an MCU.
`hcursor` is the horizontal offset into 8x8 blocks, vcursor the
vertical offset, and hpadded_count stores how many 8x8 blocks
we have per row, padded to a multiple of the sampling factor.
This isn't quite right if hcursor isn't divisible by both
the vertical and horizontal sampling factor. Tweak things so
that they work.
Also rename `i` to `number_of_mcus_decoded_so_far` since that
what it is, at least now.
For the test case, I converted an existing image to a ppm:
Build/lagom/bin/image -o out.ppm \
Tests/LibGfx/test-inputs/jpg/12-bit.jpg
Then I resized it to 102x77px in Photoshop and saved it again.
Then I turned it into a jpeg like so:
path/to/cjpeg \
-outfile Tests/LibGfx/test-inputs/jpg/odd-restart.jpg \
-sample 2x2,1x1,1x1 -quality 5 -restart 3B out.ppm
The trick here is to:
a) Pick a size that's not divisible by the data size width (8),
and that when rounded to a block size (13) still isn't divisible
by the subsample factor -- done by picking a width of 102.
b) Pick a huffman table that doesn't happen to contain the bit
pattern for a restart marker, so that reading a restart marker
from the bitstream as data causes a failure (-quality 5 happens
to do this)
c) Pick a restart interval where we fail to skip it if our calculation
is off (-restart 3B)
Together with #22987, fixes#22780.