There was a TODO questioning whether breaking on >4bpp images
was the correct behaviour when RLE4 was detected. There is no
indication in the spec that RLE4 can be used with anything >4bpp,
so I believe this doesn't require additional follow-up.
MS Spec:
https://learn.microsoft.com/en-us/windows/win32/gdi/bitmap-compression
simple-vp8l-alpha-used-false.webp is a copy of simple-vp8l.webp,
with the byte at offset 0x18 changed from 0x10 to 0x00 -- that
is, the bit in the VP8L header that stores `is_alpha_used` is cleared.
We would already allocate a BGRx8888 instead of a BGRA8888 bitmap,
but kept the actual alpha data in the `x` channel.
That led to at least `image` still writing a PNG with an alpha channel.
So explicitly set the alpha channel to 0xff when is_alpha_used is false,
to make sure all consumers of decoded lossless webp data have behavior
consistent with other webp readers.
In practice, webp encoders usually don't write files that have
`is_alpha_used` set to false and then write actual alpha data to their
output. So this is rarely observable. However, for example for
lossy+ALPH webp files, the lossless webp used to store the ALPH channel
has `is_alpha_used` set to false and all channels but green are 0
(since the lossless green channel stores the alpha channel of a
lossy+ALPH webp). So if we dump such a bitmap to a standalone webp
file (e.g. with the temporary debugging code in fc3249a1ca),
then without this commit here, `image` would convert that webp to
a fully transparent webp, while other webp software would correctly
display the green image with opaque alpha.
Methods defined in header files should generally be `inline`,
not `static`.
`static` means that each translation unit will have its own local copy
of the function when the function isn't inlined and it's up to the
linker's identical code folding to hopefully merge the potentially many
copies in the program. `inline` means that the linker can put the
identical copies in a comdat and merge them by name, without having to
compare contents.
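As a minimal illustration (a hypothetical header, not from the
codebase):

    // utilities.h (hypothetical)

    // Each including translation unit gets its own local copy when
    // this isn't inlined; deduplication relies on the linker's
    // identical code folding.
    static int squared_static(int x) { return x * x; }

    // All copies share one name; the linker can put them in a comdat
    // and merge them by name, without comparing contents.
    inline int squared_inline(int x) { return x * x; }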
No behavior change.
It seems that for very narrow, tall paths it is possible for the dxdy
value to round to a value that (after many repeated additions)
overshoots the desired end x. This caused a (rather rare) crash on
this 3D canvas demo: https://www.kevs3d.co.uk/dev/html5logo/. For
now, let's just avoid a crash here. This does not make any noticeable
visual difference on the demos I tried.
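A hedged sketch of the kind of guard meant here (illustrative
structure, assuming dxdy >= 0; the actual rasterizer code differs):

    #include <algorithm>

    void walk_edge(float start_x, float end_x, float dxdy,
                   int top_y, int bottom_y)
    {
        float x = start_x;
        for (int y = top_y; y < bottom_y; ++y) {
            // ... sample/plot at (x, y) ...
            // Repeated additions of a rounded dxdy can drift past
            // end_x for very narrow, tall edges; clamp to avoid the
            // overshoot (and the crash) at the cost of a tiny
            // inaccuracy on the last scanlines.
            x = std::min(x + dxdy, end_x);
        }
    }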
Before this change we always returned the font's point size as the
x-height which was basically never correct.
We now get it from the OS/2 table (if one with version >= 2 is available
in the file). Otherwise we fall back to using the ascent of the 'x'
glyph. Most fonts appear to have a sufficiently modern OS/2 table.
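A sketch of the lookup order (hypothetical names; the real code reads
the font's TrueType tables):

    #include <cstdint>
    #include <optional>

    struct OS2Table {
        uint16_t version { 0 };
        int16_t sx_height { 0 }; // sxHeight field, added in version 2
    };

    float x_height(std::optional<OS2Table> const& os2,
                   float x_glyph_ascent)
    {
        // Prefer sxHeight from a sufficiently modern OS/2 table...
        if (os2.has_value() && os2->version >= 2)
            return os2->sx_height;
        // ...otherwise fall back to the ascent of the 'x' glyph.
        return x_glyph_ascent;
    }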
Some of these functions can be called millions of times even for images
of moderate size. Inlining these functions really helps the compiler
and gives performance improvements of up to 10%.
Inside each scan, raw data is read with the following rules:
- Each `0x00` that is preceded by `0xFF` should be discarded
- If multiple `0xFF` bytes follow each other, only consider one.
That, plus the fact that we don't know the size of the scan
beforehand, made us put a prepared stream in a vector for easy use
later on.
This patch removes this duplication by performing these operations in
a stream-friendly way.
To some extent, this class is similar to an implementation of
`Stream`, but specialized for JPEG reading.
Technically, it is composed of a `Vector` to buffer the input, and it
provides a simple interface to read bytes.
This patch aims to allow better, more specialized control over the
input data in the future. The encapsulation already allows us to get
rid of the last `seek` in the decoder, and thus enables using the
decoder with a raw `Stream`.
As it provides faster `read` routines, this patch already reduces the
time spent on reading.
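A condensed sketch of the two rules above in stream form
(illustrative, not the actual class):

    #include <cstddef>
    #include <cstdint>

    struct ScanReader {
        uint8_t const* data { nullptr };
        size_t size { 0 };
        size_t position { 0 };

        uint8_t read_byte()
        {
            uint8_t byte = data[position++];
            if (byte == 0xFF) {
                // Multiple consecutive 0xFF bytes only count once.
                while (position < size && data[position] == 0xFF)
                    ++position;
                // A 0x00 preceded by 0xFF is stuffing; discard it.
                if (position < size && data[position] == 0x00)
                    ++position;
            }
            return byte;
        }
    };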
Errors are now deferred until `finish_decode()` is called, meaning
branches to return errors only need to occur at the end of a ranged
decode. If VPX_DEBUG is enabled, a debug message will be printed
immediately when an overread occurs.
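The shape of this pattern, as a minimal sketch (illustrative, not the
actual decoder class; the VPX_DEBUG logging is elided):

    #include <cstddef>
    #include <cstdint>

    struct RangedDecoder {
        uint8_t const* data { nullptr };
        size_t size { 0 };
        size_t position { 0 };
        bool overread { false };

        uint8_t read_byte()
        {
            if (position >= size) {
                // Remember the overread, but keep decoding
                // harmlessly; no error branch on the hot path.
                overread = true;
                return 0;
            }
            return data[position++];
        }

        // The single error check, at the end of the ranged decode.
        bool finish_decode() const { return !overread; }
    };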
Average decoding times for `Tests/LibGfx/test-inputs/4.webp` improve
by about 4.7% with this change, absolute decode times changing from
27.4ms±1.1ms down to 26.1ms±1.0ms.
This does a few things:
- The decoder uses a 32- or 64-bit integer as a reservoir of the data
being decoded, rather than one single byte as it was previously.
- `read_bool()` only refills the reservoir (value) when the number of
valid bits drops below one byte. Previously, it would read out a
bit-sized range from the data to completely refill the 8-bit value,
doing much more work than necessary for each individual read. (See
the sketch after this list.)
- VP9-specific code for reading the marker bit was moved to its own
function in Context.h.
- A debug flag `VPX_DEBUG` was added to optionally enable checking of
the final bits in a VPX ranged arithmetic decode and ensure that it
contains all zeroes. These zeroes are a bitstream requirement for
VP9, and are also present for all our lossy WebP test inputs
currently. This can be useful to test whether all the data present in
the range has been consumed.
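Here is the reservoir idea from the first two points in rough form (a
simplified sketch, not the actual LibVideo code; initialization
details and overread handling are elided):

    #include <cstddef>
    #include <cstdint>

    struct BoolDecoder {
        uint8_t const* data { nullptr };
        size_t size { 0 };
        size_t position { 0 };
        uint64_t value { 0 }; // wide reservoir, input left-aligned
        int value_bits { 0 }; // number of valid bits in the reservoir
        uint32_t range { 255 };

        void refill()
        {
            // Pull in whole bytes until the reservoir is nearly full.
            while (value_bits <= 56 && position < size) {
                value |= uint64_t(data[position++]) << (56 - value_bits);
                value_bits += 8;
            }
        }

        bool read_bool(uint8_t probability)
        {
            // Only touch the input once we actually run low on bits,
            // instead of topping up after every single read.
            if (value_bits < 8)
                refill();
            uint32_t split = 1 + (((range - 1) * probability) >> 8);
            uint64_t scaled_split = uint64_t(split) << 56;
            bool result = value >= scaled_split;
            if (result) {
                range -= split;
                value -= scaled_split;
            } else {
                range = split;
            }
            // Renormalize: shift until the range fills a full byte.
            while (range < 128) {
                range <<= 1;
                value <<= 1;
                --value_bits;
            }
            return result;
        }
    };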
A lot of the size of this diff comes from the removal of error handling
from all the range decoder reads in LibVideo/VP9 and LibGfx/WebP (VP8),
since it is now checked only at the end of the range.
In a benchmark decoding `Tests/LibGfx/test-inputs/4.webp`, decode times
are improved by about 22.8%, reducing average runtime from 35.5ms±1.1ms
down to 27.4ms±1.1ms.
This should cause no behavioral changes.
The spec says "Decoders are not required to use this information in any
specified way" about this field, but that's probably a typo and belongs
in the previous section. At least, images in the wild look wrong
without this, for example
https://fjord.dropboxstatic.com/warp/conversion/dropbox/warp/en-us/dropbox/Integrations_4@2x.png?id=ce8269af-ef1a-460a-8199-65af3dd978a3&output_type=webp
Implementation-wise, this now copies both uncompressed and compressed
data to yet another buffer for processing the alpha, then does
filtering on that buffer, and then copies the filtered alpha data
into the final image. (The RGB data comes from a lossy webp.)
This is a bit wasteful and we could probably manage without that
local copy, but that'd make the code more convoluted, so this is
good enough for now, at least until we've added tests for this case.
No test, because I haven't yet found out how to create images in this
format.
Otherwise, WebP files with a broken header just return "Decoding
failed" without any further detail. This way, there's some debug
logging with more information.
Maybe we'll want to remove this again since it might lead to duplicate
error messages for files that have their error not in the header.
We'll see how this feels. (But most files don't have any errors, so
it's probably fine.)
This function generates a new path, which can be filled to rasterize
a stroke of the original path (at whatever thickness you like). It
does this by convolving a circular pen with the path, so right now
it only supports round line caps.
Since filled paths now have good antialiasing, doing this results in
good stroked paths for "free". It also (for free) fixes stroked lines
with an opacity < 1, gives nice line joins, and makes it possible to
fill strokes with a paint style (e.g. a gradient or an image).
Algorithm from: https://keithp.com/~keithp/talks/cairo2003.pdf
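As a taste of the geometry involved (a simplified fragment under the
linked approach, not the actual implementation): each segment
endpoint is offset by the pen radius along the segment normal, and
the round caps and joins fall out of the circular pen shape.

    #include <cmath>

    struct FloatPoint { float x { 0 }, y { 0 }; };

    void offset_points(FloatPoint from, FloatPoint to, float radius,
                       FloatPoint& left, FloatPoint& right)
    {
        float dx = to.x - from.x;
        float dy = to.y - from.y;
        float length = std::sqrt(dx * dx + dy * dy);
        float nx = -dy / length; // unit normal of the segment
        float ny = dx / length;
        left = { from.x + nx * radius, from.y + ny * radius };
        right = { from.x - nx * radius, from.y - ny * radius };
    }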
This was previously masked by sorting the edges on max_y, but if the
last added edge pointed to an edge that ended on the current scanline,
that edge (and what it points to) would also end up added to the active
edges. These edges would then never get removed, and break things very
badly!
Remove SplitLineSegment and replace it with a FloatLine; nobody was
interested in its extra fields anymore. Also, remove the sorting of
the split segments; this really should not have been done here
anyway, and is not required by the rasterizer anymore. Keeping the
segments in stroke order will also make it possible to generate
stroked path geometry (in future).
The spec doesn't talk about this happening in the text, but
`dequant_init()` in 20.4 processes segment adjustment and quantization
index adjustment in the same variable `q` before clamping.
Since we had to adjust the latter step in the previous commit, do
it for the former step too.
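In sketch form (illustrative, following the commit's description of
`dequant_init()`; not the actual decoder code):

    #include <algorithm>

    int quantizer_index(int base_index, int segment_adjustment,
                        int index_adjustment)
    {
        int q = base_index;
        q += segment_adjustment; // segment adjustment (former step)
        q += index_adjustment;   // quantization index adjustment
        return std::clamp(q, 0, 127); // one clamp, after both steps
    }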
I haven't seen this happen in the wild yet (and now I'll hopefully
never notice if it does).
It's up to callers of the ImageDecoderPlugin to honor loop_count().
The ImageDecoderPlugin doesn't have to look at it when decoding frames.
No behavior change.
The spec says that the AC dequantization factor for Y2 data should
be at least 8, so do that.
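As a one-line sketch (the 155/100 scale factor is the one mentioned
below; the actual table lookup is elided):

    #include <algorithm>

    // Sketch: the Y2 AC dequantization factor is scaled and then
    // floored at 8, per the spec.
    int y2_ac_dequant(int ac_table_value)
    {
        return std::max(8, ac_table_value * 155 / 100);
    }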
This only has a very small effect (only the first two AC table
entries are < 8 after multiplying by 155/100, so this would
have only a small effect on brightness), and this case is hit
exactly 0 times in all my test images. But it's still good to match
the spec.
For a 1024x1024 image, saves about a quarter MB of memory use while
decoding (compared to the decompressed image data itself needing
4 MiB). Not a huge win, but also very easy to do, so might as well.
No behavior change, no measurable performance impact.
I could not find evidence that this is actually faster than the non-SSE
version. In addition, for these relatively simple structures, the
compiler is often sufficiently smart to generate SSE code itself.
For a synthetic font benchmark I wrote, this results in a nice 11%
decrease in runtime.
For some reason, we were decoding the source color twice for every pixel
in the innermost loop of `blit_filtered`. This makes sure we only
decode the source color once, and rearranges the code to improve
readability.
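The shape of the fix, as a sketch (hypothetical names; `decode`
stands in for the real, possibly expensive source color decode):

    #include <cstddef>
    #include <cstdint>

    struct Color { uint8_t r, g, b, a; };

    static Color decode(uint32_t p) // stand-in for the real decode
    {
        return { uint8_t(p >> 16), uint8_t(p >> 8), uint8_t(p),
                 uint8_t(p >> 24) };
    }

    void blit_row(uint32_t const* src, Color* dst, size_t width)
    {
        for (size_t x = 0; x < width; ++x) {
            // Decode exactly once per pixel (previously this
            // happened twice in the innermost loop).
            Color const color = decode(src[x]);
            if (color.a != 0)
                dst[x] = color;
        }
    }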
For my synthetic font rendering benchmark, this improves glyph rendering
performance by ~9%.
for_each_line_segment_on_elliptical_arc() flips the start/end points
for negative theta deltas. When doing this, we have to make sure the
emitted line segments swap the start/end points back, so that the
(correct) winding order can be calculated from them.
This makes nonzero fills not totally broken for a lot of SVGs.
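A sketch of the swap (hypothetical names, not the actual
implementation):

    struct FloatPoint { float x { 0 }, y { 0 }; };
    struct FloatLine { FloatPoint from; FloatPoint to; };

    // When a negative theta delta made us walk the arc with start
    // and end flipped, swap each emitted segment back so its winding
    // order matches the original path direction.
    FloatLine emitted_segment(FloatPoint a, FloatPoint b, bool flipped)
    {
        if (flipped)
            return { b, a };
        return { a, b };
    }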
This is an implementation of the scanline edge-flag algorithm for
antialiased path filling described here:
https://mlab.taik.fi/~kkallio/antialiasing/EdgeFlagAA.pdf
The initial implementation does not try to implement every possible
optimization in favour of keeping things simple. However, it does
support:
- Both evenodd and nonzero fill rules (see the sketch after this list)
- Applying paint styles/gradients
- A range of samples per pixel (8, 16, 32)
- Very nice antialiasing :^)
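For the fill rules, the per-sample decision boils down to something
like this (illustrative, not the actual rasterizer code):

    enum class FillRule { EvenOdd, Nonzero };

    bool is_inside(int winding_count, FillRule rule)
    {
        if (rule == FillRule::EvenOdd)
            return (winding_count & 1) != 0; // odd counts are inside
        return winding_count != 0; // any nonzero winding is inside
    }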
This replaces the previous path filling code, which only really
applied antialiasing in the x-axis.
There are some very nice improvements around the web with this change,
especially for small icons. Strokes are still a bit wonky, as they don't
yet use this rasterizer, but I think it should be possible to convert
them to do so.