1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-22 10:25:07 +00:00
Commit graph

490 commits

Author SHA1 Message Date
Nico Weber
eb1c99bd72 LibPDF+LibGfx: Make SMasks on jpeg images work
SMasks are greyscale images that get used as alpha channel for a
different image.

JPEGs in PDFs are stored as streams with /DCTDecode filters, and
we have a separate code path for loading those in the PDF renderer.
That code path just calls our JPEG decoder, which creates bitmaps
with format BGRx8888.

So when we process an SMask for such a bitmap, we have to change
the bitmap's format to BGRA8888 in addition to setting alpha values
on all pixels.
2023-11-23 12:13:03 +01:00
Nico Weber
57e2b5ef59 LibPDF+Tests: Correctly decode text strings without explicit encoding 2023-11-22 09:08:06 -07:00
Nico Weber
e39a790c82 LibPDF: Stop converting encodings in object parser
Per 1.7 spec 3.8.1, there are multiple logical text string types:
* text strings
* ASCII strings
* byte strings

Text strings can be in UTF-16BE, PDFDocEncoding, or (since PDF 2.0)
UTF-8.

But byte strings shouldn't be converted but treated as binary
data.

This makes us no longer convert strings used for drawing page text.
TABLE 5.6 "Text-showing operators" lists the operands for text-showing
operators as just "string", not "text string" (even though these strings
confusingly are called "text strings" in the body text), so not doing
this there is correct (and matches other viewers).

We also no longer incorrectly convert strings used for cypto data
(such as passwords), if they start with an UTF-16BE or UTF-8 marker.

No behavior change for outlines and info dict entries.

https://pdfa.org/understanding-utf-8-in-pdf-2-0/ has a good overview of
this.

(ASCII strings only contain ASCII characters and behave the same
anyways.)
2023-11-22 09:08:06 -07:00
Nico Weber
14bcb5219d LibPDF: Tolerate comments before drawing operators
Necessary to be able to render
https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf
2023-11-22 08:56:43 +00:00
Nico Weber
9e8cf4fc1a LibPDF: Tolerate comment after last dict item
Necessary to be able to open
https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf
2023-11-22 08:56:43 +00:00
Nico Weber
4440452f92 LibPDF: Support images with 1, 2, 4 bits per pixel
They just get upsampled to 8 bits per pixel images.
2023-11-18 07:33:15 +00:00
Nico Weber
bfe27228a3 LibPDF+LibGfx: Don't invert CMYK channels in JPEG data in PDFs
This is a hack: Ideally we'd have a CMYK Bitmap pixel format,
and we'd convert to rgb at blit time. Then we could also apply color
profiles (which for CMYK images are CMYK-based).

Also, the colors for our CMYK->RGB conversion are off for PDFs,
and we have distinct codepaths for this in Gfx::Color (for paths)
and JPEGs. So when we fix that, we'll have to fix it in two places.

But this doesn't require a lot of code and it's a huge visual
progression, so let's go with it for now.
2023-11-17 22:32:40 +00:00
Nico Weber
bd7ae7f91e LibPDF: Consistently asciibetize CommonNames.h
The file wasn't quite decided if it wanted to sort by ascii value
or by case folding. Now it uses ascii value, thanks to vim's
`:'<,'>sort`.

No behavior change.
2023-11-17 20:27:42 +00:00
Nico Weber
29396415d5 LibPDF: Add an initial implementation of type 3 glyph rendering
This is a very inefficient implementation: Every time a type 3 font
glyph is drawn, we parse its operator stream and execute all the
operators therein.

We'll want to instead cache the glyphs in bitmaps (at least in most
cases), like we do for other fonts. But it's a good first step, and
all the coordinate math seems to work in the files I've tested.

Good test files from pdfa dataset 0000.zip:

- 0000559.pdf page 1 (and 2): Has a non-default font matrix;
  text appears mirrored if the font matrix isn't handled correctly

- 0000425.pdf, page 1: Draws several glyphs in a single run;
  glyphs overlap if Renderer::render_type3_glyph() ignores the
  passed-in point

- 0000211.pdf, any page: Uses type 3 glyphs for all text.
  Good perf test (already "reasonably fast")

- 0000521.pdf, page 5 (or 7 or or 16): The little red flag in the
  purple box is a type 3 font glyph, and it's colored (which in part
  means the first operator is `d0`, while all the other documents above
  use `d1`)
2023-11-17 19:47:53 +00:00
Nico Weber
14ddab5519 LibPDF: Stub out type3_font_set_glyph_width*
Type 3 font glyphs begin with either `d0` or `d1`. If we bail out
with an "unsupported" error on the very first operator in a glyph,
we'll never paint the glyph.

Just stub these out for now. We probably want to do more in here in
the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).
2023-11-17 19:47:53 +00:00
Nico Weber
54c98a46d8 LibPDF: Correctly parse the d0 and d1 operators
They are the first operator in a type 3 charproc.
Operator.h already knew about them, but we didn't manage to parse
them, since they're the only two operators that contain a digit.
2023-11-17 19:47:53 +00:00
Nico Weber
5513f8bbe3 LibPDF: Move ScopedState from a function on Renderer into Renderer
No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
126a0be595 LibPDF: Pass Renderer to SimpleFont::draw_glyph()
This makes it available in Type3Font::draw_glyph().

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
bcc6439b5f LibPDF: Pass Renderer to PDFFont::draw_string()
It's a bit unfortunate that fonts need to know about the renderer,
but type 3 fonts contain PDF drawing operators, so it's necessary.

On the bright side, it makes it possible to pass fewer parameters
around and compute things locally as needed.

(As we implement more fonts, we'll probably want to create some
functions to do these computations in a central place, eventually.)

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
e0c0864ddf LibPDF: Load a few values off a type 3 font dictionary 2023-11-17 19:47:53 +00:00
Nico Weber
9632d8ee49 LibPDF: Make SimpleFont font matrix configurable
Type 3 fonts can set it to a custom value.
2023-11-17 19:47:53 +00:00
Nico Weber
4cd1a2d319 LibPDF: Add some scaffolding for type 3 fonts 2023-11-17 19:47:53 +00:00
Nico Weber
7f999b1ff5 LibPDF: Sink m_base_font_name from PDFFont into subclasses
/BaseFont is a required key for type 0, type 1, and truetype
font dictionaries, but not for type 3 font dictionaries.

This is mechanical; type 0 fonts don't even use this yet
(but probably should).

PDFFont::initialize() is now empty and could be removed,
but maybe we'll put stuff there again later, so I'm leaving
it around for a bit longer.
2023-11-17 19:47:53 +00:00
Nico Weber
6c1da5db54 LibPDF: Make SimpleFont::draw_glyph() fallible 2023-11-17 19:47:53 +00:00
Nico Weber
843e9daa8c LibPDF: Remove unused PDFFont::type()
This got added in #15270, but its one use then got removed again
in #16150.

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
26fd29baf8 LibPDF: Give Type3 fonts a dedicated error message
They're described in "5.5.4 Type 3 Fonts" in the PDF 1.7 spec, so we
shouldn't `internal_error()` on them. They're just not implemented yet.
2023-11-17 19:47:53 +00:00
Nico Weber
5eaa403ddf LibPDF: Use font dictionary object as cache key, not resource name
In the main page contents, /T0 might refer to a different font than
it might refer to in an XObject. So don't use the `Tf` argument as
font cache key. Instead, use the address of the font dictionary object.

Fixes false cache sharing, and also allows us to share cache entries
if the same font dict is referred to by two different names.

Fixes a regression from 2340e834cd (but keeps the speed-up intact).
2023-11-17 19:14:39 +01:00
Nico Weber
443b3eac77 LibPDF: Let decode_png_prediction() call LibGfx's unfilter_scanline()
It's less code, but it also fixes a bug: The implementation in
Filter.cpp used to use the previous byte as reference value, while
we're supposed to use the value of the previous channel as reference
(at least when a pixel is larger than one byte).
2023-11-17 19:09:50 +01:00
Nico Weber
145ade3a86 LibPDF: Remove a needless AK:: qualification
No behavior change.
2023-11-17 19:09:50 +01:00
Nico Weber
0416a07d56 LibPDF: Make filter byte not part of row in decode_png_prediction()
No behavior change.
2023-11-17 19:09:50 +01:00
Nico Weber
b763960fc2 LibPDF: Convert decode_png_prediction to use spans
No behavior change.
2023-11-17 19:09:50 +01:00
Nico Weber
588d6fab22 LibGfx+LibPDF: Create filter_type() for converting u8 to FilterType
...and use it in LibPDF.

No behavior change.
2023-11-17 19:09:50 +01:00
Nico Weber
7e4fe8e610 LibPDF: Use PNG::paeth_predictor() in png decoding path
No behavior change.

Ideally, the PDF code would just call a function PNGLoader to do the
PNG unfiltering, but let's first try to make the implementations look
more similar.
2023-11-17 19:09:50 +01:00
Lucas CHOLLET
1e8004734f LibPDF: Don't consider the End of Data code as normal ASCII85 input
Data encoded with ASCII85 is terminated with the EOD code 0x7E3E. This
should not be considered as normal input but rather discarded.
2023-11-14 10:15:15 +01:00
Lucas CHOLLET
59a6d4b7bc LibPDF: Factorize duplicated code in Filter::decode_ascii85() 2023-11-14 10:15:15 +01:00
Lucas CHOLLET
2fe0647c68 LibPDF: Handle pdf-specific white spaces correctly in ASCII85
We were previously only looking the space character but PDF white
spaces is a superset of ascii spaces.
2023-11-14 10:15:15 +01:00
Lucas CHOLLET
db08fe12ec LibPDF: Implement Reader::is_[eol, whitespace](char)
These two static members are now used to implement respective `matches_`
methods but will also be useful to provide a global implementation of
the specified concept of whitespace.
2023-11-14 10:15:15 +01:00
Lucas CHOLLET
dac703a0b8 LibPDF: Avoid an unnecessary copy in Filter::decode_ascii85() 2023-11-14 10:15:15 +01:00
Nico Weber
9b022239c3 LibPDF: Apply all offsets of TJ operator
TJ acts on a list of either strings or numbers.
The strings are drawn, and the numbers are treated as offsets.

Previously, we'd only apply the last-seen number as offset when
we saw a string. That had the effect of us ignoring all but the
last number in front of a string, and ignoring numbers at the
end of the list.

Now, we apply all numbers as offsets.
Our rendering of Tests/LibPDF/text.pdf now matches other PDF viewers.
2023-11-14 10:11:09 +01:00
Nico Weber
1c2b0feb7b LibPDF: Change how CFF optional width prefix is stored
Per 5177.Type2.pdf 3.1 "Type 2 Charstring Organization",
a glyph's charstring looks like:

    w? {hs* vs* cm* hm* mt subpath}? {mt subpath}* endchar

The `w?` is the width of the glyph, but it's optional. So all
possible commands after it (hstem* vstem* cntrmask hintmask
moveto endchar) check if there's an extra number at the start
and interpret it as a width, for the very first command we read.

This was done by having an `is_first_command` local bool that
got set to false after the first command. That didn't work with
subrs: If the first command was a call to a subr that just pushed
a bunch of numbers, then the second command after it is the actual
first command.

Instead, move that bool into the state. Set it to false the
first time we try to read a width, since that means we just read
a command that could've been prefixed by a width.
2023-11-14 10:10:34 +01:00
Lucas CHOLLET
9e4d697d23 LibPDF: Detect DCT images correctly
Images can have multiple filters, each one of them is processed
sequentially. Only the last one will be relevant for the image format
(DCT or JPXDecode), so use the last filter instead of the first one to
detect that property.
2023-11-13 10:30:34 -05:00
Nico Weber
f882a3ae37 LibPDF: In ColorSpace creation code, use resolve_to() more
For valid PDFs, this makes no difference.

For invalid PDFs, we now assert during the cast in resolve_to() instead
of returning a PDFError. However, most PDFs are valid, and even for
invalid PDFs, we'd previously keep the old color space around when
getting the PDF error and then usually assert later when the old
color space got passed a color with an unexpected number of components
(since the components were for the new color space).

Doesn't affect any of the > 2000 PDFs I use for testing locally,
is less code, and should make for less surprising asserts when it
does happen.
2023-11-13 10:29:26 -05:00
Lucas CHOLLET
9bc25db9a3 LibPDF: Add support for the LZW filter
This allows us to decode the first page of ThinkingInPostScript.pdf :^)
2023-11-13 14:23:23 +01:00
Lucas CHOLLET
048ef11136 LibPDF: Factorize flate parameters handling to its own function
This part will be shared with the LZW filter, so let's factorize it.
2023-11-13 14:23:23 +01:00
Nico Weber
bbde3cbc90 LibPDF: Tolerate an indirect object as dict for CIE-based color spaces
Namely, for CalGrayColorSpace, CalRGBColorSpace, LabColorSpace.

Fixes a crash rendering any page of Adobe's 5014.CIDFont_Spec.pdf
(which uses CalRGBColorSpace with an indirect dict: The dict is
object `92 0`, and many color spaces are inline objects referring
to it).
2023-11-13 07:12:05 -05:00
Nico Weber
f4a847894f LibPDF: Make SampledFunction::evaluate() work for n-dimensional input
I didn't find example code for this and the AI assistant did very
poorly on this as well. So I had to write it all by myself!

It can be much more efficient I think, but I think the overall
shape is maybe roughly fine.
2023-11-12 07:55:04 +01:00
Nico Weber
a9ef65e64a LibPDF: For multi-output SampledFunctions, fix output colors
For N outputs, the outputs aren't stored in N independent planes.
Instead, N output values are stored right next to each other in
the stream data.
2023-11-11 08:55:37 +01:00
Nico Weber
ec739460e0 LibPDF: Add test for SampledFunction and fix bugs found by it
* SampledFunction now keeps the StreamObject it gets data from alive
  (doesn't matter too much in practice, but does matter in the test,
  where nothing else keeps the stream alive).

* If a sample is an integer, we would previously sample that value
  twice and then divide by zero when interpolating. Make sure to
  sample 1 unit apart.
2023-11-11 08:55:37 +01:00
Nico Weber
323ba7404c LibPDF: Implement SampledFunction::evaluate() for some sampled functions
Things now work for functions that are all of:
* linear
* 1-D input
* 8 bits per sample
2023-11-10 15:03:30 +00:00
Nico Weber
fd1876441a LibPDF: Implement SampledFunction::create() 2023-11-10 15:03:30 +00:00
Nico Weber
cd9f4655ec LibPDF: Tweak implementation of postscript roll op
Since positive offsets roll to the right, it makes more sense
to do the big reverse first. Gets rid of an awkward minus sign.

No behavior change.
2023-11-10 14:45:38 +01:00
Nico Weber
b23ed86889 LibPDF: Implement StitchingFunction::evaluate() 2023-11-10 14:45:16 +01:00
Nico Weber
ba34ddeb21 LibPDF: Implement StitchingFunction creation 2023-11-10 14:45:16 +01:00
Nico Weber
5af6e1c042 LibPDF: Implement DeviceNColorSpace 2023-11-09 23:33:49 +01:00
Nico Weber
0f07049935 LibPDF: Add ColorSpaceFamily::operator==
No behavior change.
2023-11-09 23:33:49 +01:00