1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-25 19:15:06 +00:00
Commit graph

534 commits

Author SHA1 Message Date
Lucas CHOLLET
f389c1cdba LibGfx+LibPDF: Use LibCompress' implementation of the PackBits decoder
No need to have these three copies :^)
2023-12-27 17:40:11 +01:00
Shannon Booth
e2e7c4d574 Everywhere: Use to_number<T> instead of to_{int,uint,float,double}
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:

```
Optional<I> opt;
if constexpr (IsSigned<I>)
    opt = view.to_int<I>();
else
    opt = view.to_uint<I>();
```

For us.

The main goal here however is to have a single generic number conversion
API between all of the String classes.
2023-12-23 20:41:07 +01:00
Nico Weber
b63eb4a4dd LibPDF: Implement /Mask support with stream object argument 2023-12-23 20:39:11 +01:00
Nico Weber
a3507ef65b LibPDF: Move error for /ImageMask out of load_image()
...and tweak load_image() to support loading mask images
(which don't have a color space and are always 1 bit per pixel).
2023-12-23 20:39:11 +01:00
Nico Weber
3ad9782e25 LibPDF: Extract a apply_alpha_channel() function
No behavior change.
2023-12-23 20:39:11 +01:00
Nico Weber
4bd11c8eb4 LibPDF: Show a 'rendering unsupported' error for images with /Mask key 2023-12-23 20:39:11 +01:00
Nico Weber
387fecea7f LibPDF: Fix typo in a variable name
No behavior change.
2023-12-23 10:10:24 +01:00
Nico Weber
6723552e95 LibPDF: Add a spec comment and remove a FIXME
I think the ASCIIHexDecode / ASCII85Decode unfilter functions handle
what this FIXME was about already.
2023-12-22 10:58:54 +01:00
Nico Weber
3d07684891 LibPDF: Extract Parser::parse_inline_image()
Pure code move, no intended behavior change.

The motivation is just to make Parser::parse_operators() less nested
and more focused.
2023-12-22 10:58:54 +01:00
Nico Weber
6032c06f6b Revert "LibPDF: Add basic tiled, coloured pattern rendering"
This reverts commit 8ff87911a3.
2023-12-21 19:24:56 +01:00
Nico Weber
7cb216c95b Revert "LibPDF: Offset PaintStyle when painting so pattern overlaps..."
This reverts commit 8c7fc4fe6c.
2023-12-21 19:24:56 +01:00
Nico Weber
6de32e5359 LibPDF: Draw inline images
The idea is to massage the inline image data into something that
looks like a regular image, and then use the normal image drawing code:
We translate the inline image abbreviations to the expanded version at
rendering time, then unfilter (i.e. uncompress) the image data at
rendering time, and the go down the usual image drawing path.

Normal streams are unfiltered when they're first accessed, but
inline image streams live in a page's drawing operators, and this
fits the current approach of parsing a page's operators anew
every time the page is rendered.

(We also need to add some special-case handling for color spaces
of inline images: Inline images can use named color spaces, while
regular images always use direct color space objects.)
2023-12-20 12:45:16 -07:00
Nico Weber
d577d181e3 LibPDF: Clamp linear_srgb values in convert_to_srgb()
This is very crude gamut mapping, but it's better than producing
NaNs when passing negative values to powf(x, 1/2.2).
2023-12-20 12:45:07 +01:00
Nico Weber
022fce75a6 LibPDF: Get inline image data from parser to renderer
We create a inline_image_end operator that has all the relevant data
in a synthetic StreamObject.

inline_image_end is still a RENDERER_TODO(), so no real behavior
change. (Previously we'd call only inline_image_begin, so string the
todo message is about is now a bit different. But no interesting
behavior change.)
2023-12-20 12:19:08 +01:00
Nico Weber
3285502ec6 LibPDF: Extract a Parser::unfilter_stream() method
No behavior change.
2023-12-20 12:19:08 +01:00
Nico Weber
b21f867e88 LibPDF: Don't crash on images with empty filter arrays
0000967.pdf page 2 contains a bunch of inline images with empty
filter arrays.
2023-12-20 12:19:08 +01:00
Nico Weber
13641693cb LibPDF: Use make_object<>() to make objects
No behavior change.
2023-12-20 12:19:08 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Nico Weber
f2f07c3a80 LibPDF: Replace if (a) VERIFY(0) with VERIFY(!a)
No behavior change.
2023-12-16 12:39:56 +01:00
Nico Weber
ee74bc2538 LibPDF: Tolerate 0-sized Subrs in PS1 font subprograms
This regressed in 2b3a41be74 in #18031.

Fixes a crash rendering page 2 and onward of
https://pyx-project.org/presentation_dantemv35_en.pdf
2023-12-16 12:39:56 +01:00
Nico Weber
11354dbf9e LibPDF: Remember inline image stream bytes
We still don't process inline images, but now we have the pieces we need
for doing it (`map` and `stream_bytes`).
2023-12-11 10:50:39 +01:00
Nico Weber
cabc6a9d80 LibPDF: Add a comment that PDF 2.0 added a length key for inline images
In practice, basically no file has it, since it was only added in 2.0,
and 1.7 explicitly said "in particular, the Type, Subtype, and Length
entries normally found in a stream or image dictionary are unnecessary."
2023-12-11 10:50:39 +01:00
Nico Weber
071f890847 LibPDF: Require whitespace in front of inline image marker EI
Fixes a crash on page 3 of 0000450.pdf of 0000.zip, where we previously
started interpreting the middle of an inline image content stream as
operators, since it contained `EI` in its pixel data.
2023-12-11 10:50:39 +01:00
Nico Weber
27aae7e2b1 LibPDF: Parse inline image key-value pairs
Not used for anything yet.
2023-12-11 10:50:39 +01:00
Nico Weber
0912896ae0 LibPDF: Extract Parser::parse_dict_contents_until()
No behavior change.
2023-12-11 10:50:39 +01:00
Kyle Pereira
8c7fc4fe6c LibPDF: Offset PaintStyle when painting so pattern overlaps properly 2023-12-10 16:44:24 +01:00
Kyle Pereira
8ff87911a3 LibPDF: Add basic tiled, coloured pattern rendering 2023-12-10 16:44:24 +01:00
Kyle Pereira
8191f2b47a LibPDF: Add parameter for background color of render 2023-12-10 16:44:24 +01:00
Kyle Pereira
60c4803dd3 LibPDF: Pass Renderer to ColorSpace 2023-12-10 16:44:24 +01:00
Kyle Pereira
082a4197b6 LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces
This is in anticipation of Pattern color space support which does not
yield a simple color.
2023-12-10 16:44:24 +01:00
Kyle Pereira
e4b8d68039 LibPDF: Permit comments at the end of a stream 2023-12-10 16:44:24 +01:00
Nico Weber
8b50b689f9 LibPDF: Reject invalid "hival" values
Doesn't fire on any of the PDFs I have, and seems like a good thing
to check.
2023-12-07 08:10:40 +00:00
Nico Weber
43cd3d7dbd LibPDF: Tolerate palettes that are one byte too long
Fixes these errors from `Meta/test_pdf.py path/to/0000`, with
0000 being 0000.zip from the PDF/A corpus in unzipped:

    Malformed PDF file: Indexed color space lookup table doesn't
                        match size, in 4 files, on 8 pages, 73 times
      path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x)
      path/to/0000/0000364.pdf 5 6
      path/to/0000/0000918.pdf 5
      path/to/0000/0000683.pdf 8
2023-12-07 08:10:40 +00:00
Nico Weber
832a065687 LibPDF: For low-bpp images, start scanlines on byte boundaries
Required per spec, and we get slanted images without it. Fixes e.g.
page 1 of 0000749.pdf.
2023-12-07 08:10:40 +00:00
Nico Weber
06b9633da5 LibPDF: For indexed images with 1, 2 or 4 bpp, do not repeat bit pattern
When upsampling e.g. the 4-bit value 0b1101 to 8-bit, we used to repeat
the value to fill the full 8-bits, e.g. 0b11011101. This maps RGB colors
to 8-bit nicely, but is the wrong thing to do for palette indices.
Stop doing this for palette indices.

Fixes "Indexed color space index out of range" for 11 files in the
PDF/A 0000.zip test set now that we correctly handle palette indices
as of the previous commit:

    Malformed PDF file: Indexed color space lookup table doesn't match
                        size, in 4 files, on 8 pages, 73 times
      path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x)
      path/to/0000/0000364.pdf 5 6
      path/to/0000/0000918.pdf 5
      path/to/0000/0000683.pdf 8
2023-12-07 08:10:40 +00:00
Nico Weber
8733ba2734 LibPDF: Fix decoding of IndexedColorSpace for palette sizes != 255
Previously, we were scaling palette indices from 0..(palette_size - 1)
to 0..255 before using them as index into the palette. Instead, do not
scale palette indices before using them as indices.

(Renderer::load_image() uses `component_value_decoders.empend(
.0f, 255.0f, dmin, dmax)`, so to get an identity mapping, we have to
return `0, 255` from IndexedColorSpace::default_decode()).

Fixes rendering of the gradient on page 5 of 0000277.pdf.
2023-12-06 15:32:13 +01:00
Nico Weber
4cb0593daf LibPDF: Convert LAB values to bytes differently
Gfx::ICC::Profile's current API takes bytes, so we need to do some
contortions for LAB values to go through.

This will probably become nicer once we implement all the backward
transforms in Gfx::ICC::Profile, but for now let's hack it in
on the LibPDF side.

Makes colors in 0000651.pdf looks good, especially on pages 1 and 7-12.
2023-12-05 11:36:44 -05:00
Nico Weber
b2a1130556 LibGfx/ICC: Implement conversion between different connection spaces
If one profile uses PCSXYZ and the other PCSLAB as connection space,
we now do the necessary XYZ/LAB conversion.

With this and the previous commits, we can now convert from profiles
that use PCSLAB with mAB, such as stress.jpeg from
https://littlecms.com/blog/2020/09/09/browser-check/ :

    % Build/lagom/icc --name sRGB --reencode-to serenity-sRGB.icc
    % Build/lagom/bin/image -o out.png \
        --convert-to-color-profile serenity-sRGB.icc \
        ~/src/jpegfiles/stress.jpeg
2023-12-04 08:02:36 +00:00
Nico Weber
1c88b82dfc LibPDF: Do less work in SampledFunction::evaluate()'s inner loop
Instead of recomputing the left index and the float amount in that
interval for each coordinate all the time, do it once when we
preprocess the input coordinates.

One line less, faster, and arguably easier to read.

No behavior change.
2023-12-02 22:26:13 +01:00
Nico Weber
54883b7d41 LibPDF: Remove get_bounds lambda in SampledFunction::evaluate()
Using `min()` to guarantee the left index is never == `size() - 1`,
even for an interpolation value of 1.0, is less code, and arguably
easier to understand as well.

No behavior change.
2023-12-02 22:26:13 +01:00
Nico Weber
d9fd72007e LibPDF: Add a spec comment to SampledFunction::sample() 2023-12-02 22:26:13 +01:00
Idan Horowitz
aad5c58996 LibPDF: Eliminate reference cycle between OutlineItem parent/children
Since all parents held a reference pointer to their children, and all
children held reference pointers to their parents, both objects would
never get free'd once the document was no longer being used.

Fixes ossfuzz-63833.
2023-12-02 22:23:53 +01:00
Lucas CHOLLET
2a5cb5becb LibCompress: Add LZWDecoder::decode_all()
This method takes bytes as input and decompress everything to a
ByteBuffer. It uses two control codes (clear and end of data) as
described in the GIF, TIFF and PDF specifications.
2023-12-01 12:58:14 +01:00
Nico Weber
f34da6396f LibPDF: Update font size after getting font from cache
Page 1 of 0000277.pdf does:

    BT 22 0 0 22  59  28 Tm /TT2 1 Tf
        (Presented at Photonics West OPTO, February 17, 2016) Tj ET
    BT 32 0 0 32 269 426 Tm /TT1 1 Tf
        (Robert W. Boyd) Tj ET
    BT 22 0 0 22 253 357 Tm /TT2 1 Tf
        (Department of Physics and) Tj ET
    BT 22 0 0 22 105 326 Tm /TT2 1 Tf
        (Max-Planck Centre for Extreme and Quantum Photonics) Tj ET

Every line begins a text operation, then updates the font matrix,
selects a font (TT2, TT1, TT2, TT1), draws some text and ends the text
operation.

`Tm` (which sets the font matrix) contains a scale, and uses that
to update the font size of the currently-active font (cf #20084).
But in this file, we `Tm` first and `Tf` (font selection) second,
so this updates the size of the old font. So when we pull it out
of the cache again on line 3, it would still have the old size
from the `Tm` on line 2.

(The whole text scaling logic in LibPDF imho needs a rethink; the
current approach also causes issues with zero-width glyphs which
currently lead to divisions by zero. But that's for another PR.)

Fixes another regression from c8510b58a3 (which I've accidentally
referred to by 2340e834cd in another commit).
2023-11-26 19:05:13 -05:00
Nico Weber
eb1c99bd72 LibPDF+LibGfx: Make SMasks on jpeg images work
SMasks are greyscale images that get used as alpha channel for a
different image.

JPEGs in PDFs are stored as streams with /DCTDecode filters, and
we have a separate code path for loading those in the PDF renderer.
That code path just calls our JPEG decoder, which creates bitmaps
with format BGRx8888.

So when we process an SMask for such a bitmap, we have to change
the bitmap's format to BGRA8888 in addition to setting alpha values
on all pixels.
2023-11-23 12:13:03 +01:00
Nico Weber
57e2b5ef59 LibPDF+Tests: Correctly decode text strings without explicit encoding 2023-11-22 09:08:06 -07:00
Nico Weber
e39a790c82 LibPDF: Stop converting encodings in object parser
Per 1.7 spec 3.8.1, there are multiple logical text string types:
* text strings
* ASCII strings
* byte strings

Text strings can be in UTF-16BE, PDFDocEncoding, or (since PDF 2.0)
UTF-8.

But byte strings shouldn't be converted but treated as binary
data.

This makes us no longer convert strings used for drawing page text.
TABLE 5.6 "Text-showing operators" lists the operands for text-showing
operators as just "string", not "text string" (even though these strings
confusingly are called "text strings" in the body text), so not doing
this there is correct (and matches other viewers).

We also no longer incorrectly convert strings used for cypto data
(such as passwords), if they start with an UTF-16BE or UTF-8 marker.

No behavior change for outlines and info dict entries.

https://pdfa.org/understanding-utf-8-in-pdf-2-0/ has a good overview of
this.

(ASCII strings only contain ASCII characters and behave the same
anyways.)
2023-11-22 09:08:06 -07:00
Nico Weber
14bcb5219d LibPDF: Tolerate comments before drawing operators
Necessary to be able to render
https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf
2023-11-22 08:56:43 +00:00
Nico Weber
9e8cf4fc1a LibPDF: Tolerate comment after last dict item
Necessary to be able to open
https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf
2023-11-22 08:56:43 +00:00
Nico Weber
4440452f92 LibPDF: Support images with 1, 2, 4 bits per pixel
They just get upsampled to 8 bits per pixel images.
2023-11-18 07:33:15 +00:00