mirror of https://github.com/RGBCube/serenity synced 2025-05-16 19:45:07 +00:00

622 commits

Author SHA1 Message Date
Nico Weber
d577d181e3 LibPDF: Clamp linear_srgb values in convert_to_srgb()
This is very crude gamut mapping, but it's better than producing
NaNs when passing negative values to powf(x, 1/2.2).
2023-12-20 12:45:07 +01:00
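The clamping described in d577d181e3 can be sketched as follows. This is a minimal illustration, not LibPDF's actual `convert_to_srgb()`; the helper name `encode_channel` is hypothetical, and a plain 1/2.2 power curve is assumed as in the commit message.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not LibPDF's API): gamma-encode one linear channel
// with a plain 1/2.2 power curve.
float encode_channel(float linear)
{
    // pow() with a negative base and a non-integer exponent yields NaN,
    // so out-of-gamut values are clamped into [0, 1] first.
    float clamped = std::clamp(linear, 0.0f, 1.0f);
    return std::pow(clamped, 1.0f / 2.2f);
}
```

Clamping each channel independently is the crude gamut mapping the message refers to: it avoids NaNs but can shift hue for strongly out-of-gamut colors.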
Nico Weber
022fce75a6 LibPDF: Get inline image data from parser to renderer
We create an inline_image_end operator that has all the relevant data
in a synthetic StreamObject.

inline_image_end is still a RENDERER_TODO(), so no real behavior
change. (Previously we'd call only inline_image_begin, so the string the
todo message is about is now a bit different. But no interesting
behavior change.)
2023-12-20 12:19:08 +01:00
Nico Weber
3285502ec6 LibPDF: Extract a Parser::unfilter_stream() method
No behavior change.
2023-12-20 12:19:08 +01:00
Nico Weber
b21f867e88 LibPDF: Don't crash on images with empty filter arrays
0000967.pdf page 2 contains a bunch of inline images with empty
filter arrays.
2023-12-20 12:19:08 +01:00
Nico Weber
13641693cb LibPDF: Use make_object<>() to make objects
No behavior change.
2023-12-20 12:19:08 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Nico Weber
f2f07c3a80 LibPDF: Replace if (a) VERIFY(0) with VERIFY(!a)
No behavior change.
2023-12-16 12:39:56 +01:00
Nico Weber
ee74bc2538 LibPDF: Tolerate 0-sized Subrs in PS1 font subprograms
This regressed in 2b3a41be74 in #18031.

Fixes a crash rendering page 2 and onward of
https://pyx-project.org/presentation_dantemv35_en.pdf
2023-12-16 12:39:56 +01:00
Nico Weber
11354dbf9e LibPDF: Remember inline image stream bytes
We still don't process inline images, but now we have the pieces we need
for doing it (`map` and `stream_bytes`).
2023-12-11 10:50:39 +01:00
Nico Weber
cabc6a9d80 LibPDF: Add a comment that PDF 2.0 added a length key for inline images
In practice, basically no file has it, since it was only added in 2.0,
and 1.7 explicitly said "in particular, the Type, Subtype, and Length
entries normally found in a stream or image dictionary are unnecessary."
2023-12-11 10:50:39 +01:00
Nico Weber
071f890847 LibPDF: Require whitespace in front of inline image marker EI
Fixes a crash on page 3 of 0000450.pdf of 0000.zip, where we previously
started interpreting the middle of an inline image content stream as
operators, since it contained `EI` in its pixel data.
2023-12-11 10:50:39 +01:00
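The check from 071f890847 can be sketched like this. The function names are hypothetical and this is not LibPDF's parser; it only shows the idea of accepting `EI` as the end marker when a whitespace character precedes it.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Whitespace characters as listed in the PDF specification.
bool is_pdf_whitespace(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Hypothetical helper: returns the offset of the `EI` that ends the inline
// image data, skipping any `EI` that is not preceded by whitespace.
std::optional<size_t> find_inline_image_end(std::string_view data)
{
    for (size_t i = 1; i + 1 < data.size(); ++i) {
        if (data[i] == 'E' && data[i + 1] == 'I' && is_pdf_whitespace(data[i - 1]))
            return i;
    }
    return std::nullopt;
}
```

This remains a heuristic: binary pixel data can still contain a whitespace byte followed by `EI`, so it reduces false positives rather than eliminating them.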
Nico Weber
27aae7e2b1 LibPDF: Parse inline image key-value pairs
Not used for anything yet.
2023-12-11 10:50:39 +01:00
Nico Weber
0912896ae0 LibPDF: Extract Parser::parse_dict_contents_until()
No behavior change.
2023-12-11 10:50:39 +01:00
Kyle Pereira
8c7fc4fe6c LibPDF: Offset PaintStyle when painting so pattern overlaps properly 2023-12-10 16:44:24 +01:00
Kyle Pereira
8ff87911a3 LibPDF: Add basic tiled, coloured pattern rendering 2023-12-10 16:44:24 +01:00
Kyle Pereira
8191f2b47a LibPDF: Add parameter for background color of render 2023-12-10 16:44:24 +01:00
Kyle Pereira
60c4803dd3 LibPDF: Pass Renderer to ColorSpace 2023-12-10 16:44:24 +01:00
Kyle Pereira
082a4197b6 LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces
This is in anticipation of Pattern color space support which does not
yield a simple color.
2023-12-10 16:44:24 +01:00
Kyle Pereira
e4b8d68039 LibPDF: Permit comments at the end of a stream 2023-12-10 16:44:24 +01:00
Nico Weber
8b50b689f9 LibPDF: Reject invalid "hival" values
Doesn't fire on any of the PDFs I have, and seems like a good thing
to check.
2023-12-07 08:10:40 +00:00
Nico Weber
43cd3d7dbd LibPDF: Tolerate palettes that are one byte too long
Fixes these errors from `Meta/test_pdf.py path/to/0000`, with
0000 being the unzipped 0000.zip from the PDF/A corpus:

    Malformed PDF file: Indexed color space lookup table doesn't
                        match size, in 4 files, on 8 pages, 73 times
      path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x)
      path/to/0000/0000364.pdf 5 6
      path/to/0000/0000918.pdf 5
      path/to/0000/0000683.pdf 8
2023-12-07 08:10:40 +00:00
Nico Weber
832a065687 LibPDF: For low-bpp images, start scanlines on byte boundaries
Required per spec, and we get slanted images without it. Fixes e.g.
page 1 of 0000749.pdf.
2023-12-07 08:10:40 +00:00
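The byte-boundary rule from 832a065687 can be sketched as follows. This assumes MSB-first sample packing and uses a hypothetical helper name; it is not the code in `Renderer::load_image()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: unpacks 1-, 2-, or 4-bit samples (MSB first) into one
// byte per sample. Each scanline starts on a byte boundary.
std::vector<uint8_t> unpack_samples(std::vector<uint8_t> const& packed,
    size_t width, size_t height, unsigned bits_per_sample)
{
    // Rows are padded up to a whole number of bytes.
    size_t bytes_per_row = (width * bits_per_sample + 7) / 8;
    unsigned mask = (1u << bits_per_sample) - 1;

    std::vector<uint8_t> out;
    out.reserve(width * height);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            size_t bit = x * bits_per_sample;
            uint8_t byte = packed[y * bytes_per_row + bit / 8];
            unsigned shift = 8 - bits_per_sample - static_cast<unsigned>(bit % 8);
            out.push_back(static_cast<uint8_t>((byte >> shift) & mask));
        }
    }
    return out;
}
```

If the row offset were computed as `y * width * bits_per_sample` bits instead, every row would start a few bits early whenever the width is not a multiple of a byte, which produces the slanted image mentioned above.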
Nico Weber
06b9633da5 LibPDF: For indexed images with 1, 2 or 4 bpp, do not repeat bit pattern
When upsampling e.g. the 4-bit value 0b1101 to 8-bit, we used to repeat
the value to fill the full 8-bits, e.g. 0b11011101. This maps RGB colors
to 8-bit nicely, but is the wrong thing to do for palette indices.
Stop doing this for palette indices.

Fixes "Indexed color space index out of range" for 11 files in the
PDF/A 0000.zip test set now that we correctly handle palette indices
as of the previous commit:

    Malformed PDF file: Indexed color space lookup table doesn't match
                        size, in 4 files, on 8 pages, 73 times
      path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x)
      path/to/0000/0000364.pdf 5 6
      path/to/0000/0000918.pdf 5
      path/to/0000/0000683.pdf 8
2023-12-07 08:10:40 +00:00
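The bit repetition described in 06b9633da5 can be sketched as below. The function name is hypothetical; it shows the upsampling that suits color samples and that palette indices have to skip.

```cpp
#include <cstdint>

// Hypothetical helper: scales an n-bit color sample (n = 1, 2, or 4) to
// 8 bits by repeating its bit pattern, e.g. 0b1101 -> 0b11011101.
uint8_t replicate_to_8_bits(uint8_t value, unsigned bits)
{
    uint8_t result = 0;
    for (unsigned filled = 0; filled < 8; filled += bits)
        result = static_cast<uint8_t>((result << bits) | value);
    return result;
}
```

For a color sample this maps the maximum 4-bit value 15 to 255, which is what is wanted. For a palette index it would turn index 13 into 221, which is out of range for a 16-entry palette, so indices are kept as they are.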
Nico Weber
8733ba2734 LibPDF: Fix decoding of IndexedColorSpace for palette sizes != 255
Previously, we were scaling palette indices from 0..(palette_size - 1)
to 0..255 before using them as index into the palette. Instead, do not
scale palette indices before using them as indices.

(Renderer::load_image() uses `component_value_decoders.empend(
.0f, 255.0f, dmin, dmax)`, so to get an identity mapping, we have to
return `0, 255` from IndexedColorSpace::default_decode()).

Fixes rendering of the gradient on page 5 of 0000277.pdf.
2023-12-06 15:32:13 +01:00
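The identity mapping mentioned in 8733ba2734 follows from how a decode range is applied. The sketch below assumes the decoder is a plain linear interpolation; the function name `interpolate` is hypothetical and only illustrates the arithmetic.

```cpp
// Hypothetical helper: linear map from [x_min, x_max] to [y_min, y_max].
float interpolate(float x, float x_min, float x_max, float y_min, float y_max)
{
    return y_min + (x - x_min) * (y_max - y_min) / (x_max - x_min);
}
```

With an input range of 0..255, a decode range of `0, 255` leaves a palette index such as 13 unchanged, while a decode range of `0, 1` would squash every index toward 0.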
Nico Weber
4cb0593daf LibPDF: Convert LAB values to bytes differently
Gfx::ICC::Profile's current API takes bytes, so we need to do some
contortions for LAB values to go through.

This will probably become nicer once we implement all the backward
transforms in Gfx::ICC::Profile, but for now let's hack it in
on the LibPDF side.

Makes colors in 0000651.pdf look good, especially on pages 1 and 7-12.
2023-12-05 11:36:44 -05:00
Nico Weber
b2a1130556 LibGfx/ICC: Implement conversion between different connection spaces
If one profile uses PCSXYZ and the other PCSLAB as connection space,
we now do the necessary XYZ/LAB conversion.

With this and the previous commits, we can now convert from profiles
that use PCSLAB with mAB, such as stress.jpeg from
https://littlecms.com/blog/2020/09/09/browser-check/ :

    % Build/lagom/icc --name sRGB --reencode-to serenity-sRGB.icc
    % Build/lagom/bin/image -o out.png \
        --convert-to-color-profile serenity-sRGB.icc \
        ~/src/jpegfiles/stress.jpeg
2023-12-04 08:02:36 +00:00
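The XYZ to LAB direction of the conversion in b2a1130556 can be sketched with the standard CIE formulas. This is not the code in Gfx::ICC; the struct and function names are hypothetical, and the D50 white point used by the ICC profile connection space (X = 0.9642, Y = 1.0, Z = 0.8249) is assumed.

```cpp
#include <cmath>

struct XYZ { float x, y, z; };
struct Lab { float l, a, b; };

// Hypothetical helper: CIE XYZ -> CIE L*a*b*, relative to the D50 white point.
Lab xyz_to_lab(XYZ c)
{
    auto f = [](float t) -> float {
        constexpr float delta = 6.0f / 29.0f;
        if (t > delta * delta * delta)
            return std::cbrt(t);
        // Linear segment near black avoids the infinite slope of cbrt at 0.
        return t / (3.0f * delta * delta) + 4.0f / 29.0f;
    };
    float fx = f(c.x / 0.9642f);
    float fy = f(c.y / 1.0f);
    float fz = f(c.z / 0.8249f);
    return { 116.0f * fy - 16.0f, 500.0f * (fx - fy), 200.0f * (fy - fz) };
}
```

The white point itself maps to L = 100, a = 0, b = 0, and black maps to approximately L = 0.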
Nico Weber
1c88b82dfc LibPDF: Do less work in SampledFunction::evaluate()'s inner loop
Instead of recomputing the left index and the float amount in that
interval for each coordinate all the time, do it once when we
preprocess the input coordinates.

One line less, faster, and arguably easier to read.

No behavior change.
2023-12-02 22:26:13 +01:00
Nico Weber
54883b7d41 LibPDF: Remove get_bounds lambda in SampledFunction::evaluate()
Using `min()` to guarantee the left index is never == `size() - 1`,
even for an interpolation value of 1.0, is less code, and arguably
easier to understand as well.

No behavior change.
2023-12-02 22:26:13 +01:00
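The `min()` trick from 54883b7d41 can be sketched in one dimension as below. The function name is hypothetical, and the real `SampledFunction::evaluate()` works on multi-dimensional tables; this only shows how clamping the left index keeps `left + 1` valid.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear interpolation over a sample table.
// Expects samples.size() >= 2 and x in [0, samples.size() - 1].
float sample_linear(std::vector<float> const& samples, float x)
{
    // Clamp the left index so that left + 1 is always in bounds, even when
    // x lands exactly on the last sample.
    size_t left = std::min(static_cast<size_t>(x), samples.size() - 2);
    float t = x - static_cast<float>(left);
    return samples[left] * (1.0f - t) + samples[left + 1] * t;
}
```

When `x` equals the last index, `left` is the second-to-last index and `t` is 1.0, so the result is exactly the last sample and no special case is needed.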
Nico Weber
d9fd72007e LibPDF: Add a spec comment to SampledFunction::sample() 2023-12-02 22:26:13 +01:00
Idan Horowitz
aad5c58996 LibPDF: Eliminate reference cycle between OutlineItem parent/children
Since all parents held a reference pointer to their children, and all
children held reference pointers to their parents, both objects would
never get free'd once the document was no longer being used.

Fixes ossfuzz-63833.
2023-12-02 22:23:53 +01:00
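One common way to break such a parent/child cycle is to make the parent link non-owning. The sketch below uses `std::shared_ptr` and `std::weak_ptr` as stand-ins; SerenityOS has its own smart pointer types, the commit message does not say which approach aad5c58996 took, and `OutlineNode` and `add_child` are hypothetical names.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct OutlineNode {
    std::weak_ptr<OutlineNode> parent;                  // non-owning back link
    std::vector<std::shared_ptr<OutlineNode>> children; // owning links
};

// Hypothetical helper: attaches child to parent without creating a cycle
// of owning references.
void add_child(std::shared_ptr<OutlineNode> const& parent, std::shared_ptr<OutlineNode> child)
{
    child->parent = parent;
    parent->children.push_back(std::move(child));
}
```

With owning pointers in both directions, neither reference count would ever reach zero. With the weak back link, dropping the last outside reference to the root frees the whole tree.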
Lucas CHOLLET
2a5cb5becb LibCompress: Add LZWDecoder::decode_all()
This method takes bytes as input and decompresses everything to a
ByteBuffer. It uses two control codes (clear and end of data) as
described in the GIF, TIFF and PDF specifications.
2023-12-01 12:58:14 +01:00
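The role of the two control codes can be sketched with a simplified decoder. It is not `LZWDecoder::decode_all()`: it takes already-unpacked codes instead of a bit stream, assumes 8-bit symbols (so clear is 256 and end of data is 257), and uses a hypothetical function name.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes a sequence of LZW codes for 8-bit symbols.
std::vector<uint8_t> lzw_decode_codes(std::vector<uint16_t> const& codes)
{
    constexpr uint16_t clear_code = 256;
    constexpr uint16_t end_of_data = 257;

    std::vector<std::vector<uint8_t>> table;
    auto reset = [&table] {
        // Entries 256 and 257 are placeholders for the control codes.
        table.assign(258, std::vector<uint8_t> {});
        for (unsigned i = 0; i < 256; ++i)
            table[i] = { static_cast<uint8_t>(i) };
    };
    reset();

    std::vector<uint8_t> out;
    std::vector<uint8_t> previous;
    for (uint16_t code : codes) {
        if (code == clear_code) {
            reset();
            previous.clear();
            continue;
        }
        if (code == end_of_data)
            break;

        std::vector<uint8_t> entry;
        if (code < table.size()) {
            entry = table[code];
        } else if (code == table.size() && !previous.empty()) {
            // The code refers to the entry that is about to be created.
            entry = previous;
            entry.push_back(previous[0]);
        } else {
            break; // invalid code; stop decoding
        }

        out.insert(out.end(), entry.begin(), entry.end());
        if (!previous.empty()) {
            std::vector<uint8_t> added = previous;
            added.push_back(entry[0]);
            table.push_back(added);
        }
        previous = entry;
    }
    return out;
}
```

A real decoder additionally reads variable-width codes from the byte stream and grows the code width as the table fills, and those details differ between the GIF, TIFF and PDF variants.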
Nico Weber
f34da6396f LibPDF: Update font size after getting font from cache
Page 1 of 0000277.pdf does:

    BT 22 0 0 22  59  28 Tm /TT2 1 Tf
        (Presented at Photonics West OPTO, February 17, 2016) Tj ET
    BT 32 0 0 32 269 426 Tm /TT1 1 Tf
        (Robert W. Boyd) Tj ET
    BT 22 0 0 22 253 357 Tm /TT2 1 Tf
        (Department of Physics and) Tj ET
    BT 22 0 0 22 105 326 Tm /TT2 1 Tf
        (Max-Planck Centre for Extreme and Quantum Photonics) Tj ET

Every line begins a text operation, then updates the text matrix,
selects a font (TT2, TT1, TT2, TT2), draws some text and ends the text
operation.

`Tm` (which sets the text matrix) contains a scale, and uses that
to update the font size of the currently-active font (cf #20084).
But in this file, we `Tm` first and `Tf` (font selection) second,
so this updates the size of the old font. So when we pull it out
of the cache again on line 3, it would still have the old size
from the `Tm` on line 2.

(The whole text scaling logic in LibPDF imho needs a rethink; the
current approach also causes issues with zero-width glyphs which
currently lead to divisions by zero. But that's for another PR.)

Fixes another regression from c8510b58a3 (which I've accidentally
referred to by 2340e834cd in another commit).
2023-11-26 19:05:13 -05:00
Nico Weber
eb1c99bd72 LibPDF+LibGfx: Make SMasks on jpeg images work
SMasks are greyscale images that get used as alpha channel for a
different image.

JPEGs in PDFs are stored as streams with /DCTDecode filters, and
we have a separate code path for loading those in the PDF renderer.
That code path just calls our JPEG decoder, which creates bitmaps
with format BGRx8888.

So when we process an SMask for such a bitmap, we have to change
the bitmap's format to BGRA8888 in addition to setting alpha values
on all pixels.
2023-11-23 12:13:03 +01:00
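Applying a soft mask as described in eb1c99bd72 can be sketched as follows. This assumes pixels packed as `0xAARRGGBB` in a 32-bit value and a mask with the same dimensions as the image; the function name is hypothetical and this is not the Gfx::Bitmap API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: writes each greyscale mask value into the alpha byte
// of the corresponding pixel, leaving the color channels untouched.
void apply_soft_mask(std::vector<uint32_t>& pixels, std::vector<uint8_t> const& mask)
{
    for (size_t i = 0; i < pixels.size() && i < mask.size(); ++i)
        pixels[i] = (pixels[i] & 0x00FFFFFFu) | (static_cast<uint32_t>(mask[i]) << 24);
}
```

Writing the alpha byte alone is not enough if the bitmap is still tagged as BGRx8888, because the fourth byte is then ignored when blitting. That is why the format also has to change to BGRA8888.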
Nico Weber
57e2b5ef59 LibPDF+Tests: Correctly decode text strings without explicit encoding 2023-11-22 09:08:06 -07:00
Nico Weber
e39a790c82 LibPDF: Stop converting encodings in object parser
Per 1.7 spec 3.8.1, there are multiple logical text string types:
* text strings
* ASCII strings
* byte strings

Text strings can be in UTF-16BE, PDFDocEncoding, or (since PDF 2.0)
UTF-8.

But byte strings shouldn't be converted but treated as binary
data.

This makes us no longer convert strings used for drawing page text.
TABLE 5.6 "Text-showing operators" lists the operands for text-showing
operators as just "string", not "text string" (even though these strings
confusingly are called "text strings" in the body text), so not doing
this there is correct (and matches other viewers).

We also no longer incorrectly convert strings used for crypto data
(such as passwords), if they start with a UTF-16BE or UTF-8 marker.

No behavior change for outlines and info dict entries.

https://pdfa.org/understanding-utf-8-in-pdf-2-0/ has a good overview of
this.

(ASCII strings only contain ASCII characters and behave the same
anyways.)
2023-11-22 09:08:06 -07:00
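Detecting the encoding of a text string, as opposed to a byte string that must be left alone, can be sketched as below. The enum and function names are hypothetical; the byte order marks are `FE FF` for UTF-16BE and `EF BB BF` for UTF-8, with PDFDocEncoding as the fallback.

```cpp
#include <string_view>

enum class TextEncoding {
    UTF16BE,
    UTF8,
    PDFDocEncoding,
};

// Hypothetical helper: only call this for strings the spec defines as
// "text strings" (outlines, info dict entries), never for page text or
// crypto data.
TextEncoding detect_text_string_encoding(std::string_view bytes)
{
    if (bytes.size() >= 2 && bytes[0] == '\xFE' && bytes[1] == '\xFF')
        return TextEncoding::UTF16BE;
    if (bytes.size() >= 3 && bytes[0] == '\xEF' && bytes[1] == '\xBB' && bytes[2] == '\xBF')
        return TextEncoding::UTF8;
    return TextEncoding::PDFDocEncoding;
}
```

Keeping this out of the object parser means the caller decides, based on where the string is used, whether any conversion should happen at all.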
Nico Weber
14bcb5219d LibPDF: Tolerate comments before drawing operators
Necessary to be able to render
https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf
2023-11-22 08:56:43 +00:00
Nico Weber
9e8cf4fc1a LibPDF: Tolerate comment after last dict item
Necessary to be able to open
https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf
2023-11-22 08:56:43 +00:00
Nico Weber
4440452f92 LibPDF: Support images with 1, 2, 4 bits per pixel
They just get upsampled to 8 bits per pixel images.
2023-11-18 07:33:15 +00:00
Nico Weber
bfe27228a3 LibPDF+LibGfx: Don't invert CMYK channels in JPEG data in PDFs
This is a hack: Ideally we'd have a CMYK Bitmap pixel format,
and we'd convert to rgb at blit time. Then we could also apply color
profiles (which for CMYK images are CMYK-based).

Also, the colors for our CMYK->RGB conversion are off for PDFs,
and we have distinct codepaths for this in Gfx::Color (for paths)
and JPEGs. So when we fix that, we'll have to fix it in two places.

But this doesn't require a lot of code and it's a huge visual
progression, so let's go with it for now.
2023-11-17 22:32:40 +00:00
Nico Weber
bd7ae7f91e LibPDF: Consistently asciibetize CommonNames.h
The file wasn't quite decided if it wanted to sort by ascii value
or by case folding. Now it uses ascii value, thanks to vim's
`:'<,'>sort`.

No behavior change.
2023-11-17 20:27:42 +00:00
Nico Weber
29396415d5 LibPDF: Add an initial implementation of type 3 glyph rendering
This is a very inefficient implementation: Every time a type 3 font
glyph is drawn, we parse its operator stream and execute all the
operators therein.

We'll want to instead cache the glyphs in bitmaps (at least in most
cases), like we do for other fonts. But it's a good first step, and
all the coordinate math seems to work in the files I've tested.

Good test files from pdfa dataset 0000.zip:

- 0000559.pdf page 1 (and 2): Has a non-default font matrix;
  text appears mirrored if the font matrix isn't handled correctly

- 0000425.pdf, page 1: Draws several glyphs in a single run;
  glyphs overlap if Renderer::render_type3_glyph() ignores the
  passed-in point

- 0000211.pdf, any page: Uses type 3 glyphs for all text.
  Good perf test (already "reasonably fast")

- 0000521.pdf, page 5 (or 7 or 16): The little red flag in the
  purple box is a type 3 font glyph, and it's colored (which in part
  means the first operator is `d0`, while all the other documents above
  use `d1`)
2023-11-17 19:47:53 +00:00
Nico Weber
14ddab5519 LibPDF: Stub out type3_font_set_glyph_width*
Type 3 font glyphs begin with either `d0` or `d1`. If we bail out
with an "unsupported" error on the very first operator in a glyph,
we'll never paint the glyph.

Just stub these out for now. We probably want to do more in here in
the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).
2023-11-17 19:47:53 +00:00
Nico Weber
54c98a46d8 LibPDF: Correctly parse the d0 and d1 operators
They are the first operator in a type 3 charproc.
Operator.h already knew about them, but we didn't manage to parse
them, since they're the only two operators that contain a digit.
2023-11-17 19:47:53 +00:00
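Why digits matter for `d0` and `d1` can be sketched with a small operator-name reader. This is not LibPDF's parser; the function name and the exact character set accepted are assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: reads an operator name from the start of input.
// Digits are accepted after the first character so that `d0` and `d1` are
// read as one token, while a leading digit still means "this is a number".
std::string read_operator_name(std::string_view input)
{
    size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        bool allowed = std::isalpha(c) || c == '*' || c == '\'' || c == '"'
            || (i > 0 && std::isdigit(c));
        if (!allowed)
            break;
        ++i;
    }
    return std::string(input.substr(0, i));
}
```

A reader that accepts only letters would stop after `d` and then treat `0` or `1` as a stray number, so the first operator of a type 3 charproc would never be recognized.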
Nico Weber
5513f8bbe3 LibPDF: Move ScopedState from a function on Renderer into Renderer
No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
126a0be595 LibPDF: Pass Renderer to SimpleFont::draw_glyph()
This makes it available in Type3Font::draw_glyph().

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
bcc6439b5f LibPDF: Pass Renderer to PDFFont::draw_string()
It's a bit unfortunate that fonts need to know about the renderer,
but type 3 fonts contain PDF drawing operators, so it's necessary.

On the bright side, it makes it possible to pass fewer parameters
around and compute things locally as needed.

(As we implement more fonts, we'll probably want to create some
functions to do these computations in a central place, eventually.)

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
e0c0864ddf LibPDF: Load a few values off a type 3 font dictionary 2023-11-17 19:47:53 +00:00
Nico Weber
9632d8ee49 LibPDF: Make SimpleFont font matrix configurable
Type 3 fonts can set it to a custom value.
2023-11-17 19:47:53 +00:00
Nico Weber
4cd1a2d319 LibPDF: Add some scaffolding for type 3 fonts 2023-11-17 19:47:53 +00:00
Nico Weber
7f999b1ff5 LibPDF: Sink m_base_font_name from PDFFont into subclasses
/BaseFont is a required key for type 0, type 1, and truetype
font dictionaries, but not for type 3 font dictionaries.

This is mechanical; type 0 fonts don't even use this yet
(but probably should).

PDFFont::initialize() is now empty and could be removed,
but maybe we'll put stuff there again later, so I'm leaving
it around for a bit longer.
2023-11-17 19:47:53 +00:00