mirror of https://github.com/RGBCube/serenity synced 2025-05-18 11:55:07 +00:00
Commit graph

622 commits

Author SHA1 Message Date
Nico Weber
ea6fed627a LibPDF: Get color rendering intent from image dict
Still not used for anything, so no behavior change.
2023-10-20 08:58:52 +02:00
Nico Weber
ebba24b848 LibPDF: Fix lookup of built-in Bold Italic strings
Liberation*-BoldItalic.ttf apparently self-identifies as "Bold Italic",
not "BoldItalic".
2023-10-19 16:52:49 -04:00
Nico Weber
708d5e2fe6 LibPDF: Implement color_rendering_intent operator
Implements the `ri` operator, and the `RI` key in a graphics state
dictionary.

We don't do anything yet with the color rendering intent except
store it.

No behavior change except removing a few "not yet implemented"
messages.
2023-10-19 16:51:16 -04:00
Nico Weber
609e640530 LibPDF: Try harder to use a RAII object to restore state
Follow-up to #21489. There, I made us use a RAII object.

That's great, but if the embedded instruction stream pushes
its own graphics state, then an early return would cause us to
not process graphics state pop instructions in the embedded stream.

To fix this, remember the graphics stack depth before entering
the nested instruction stream, and explicitly shrink the stack back
to that size upon exit.

Enables us to render all pages of
https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf
without crashing.
2023-10-19 16:49:00 -04:00
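The fix described in 609e640530 can be sketched roughly as follows. This is a minimal illustration, not LibPDF's actual code: `GraphicsState`, `StackDepthRestorer`, and `render_nested_stream()` are hypothetical names, and the stack is a plain `std::vector`. The guard remembers the stack depth on entry and shrinks the stack back to it on every exit path, including an early error return.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graphics state; not LibPDF's actual type.
struct GraphicsState {
    int color_components;
};

// Remembers the stack depth at construction and shrinks the stack back to
// that depth on destruction, no matter how the enclosing scope is left.
class StackDepthRestorer {
public:
    explicit StackDepthRestorer(std::vector<GraphicsState>& stack)
        : m_stack(stack)
        , m_saved_depth(stack.size())
    {
    }

    ~StackDepthRestorer()
    {
        if (m_stack.size() > m_saved_depth)
            m_stack.resize(m_saved_depth);
    }

    StackDepthRestorer(const StackDepthRestorer&) = delete;
    StackDepthRestorer& operator=(const StackDepthRestorer&) = delete;

private:
    std::vector<GraphicsState>& m_stack;
    std::size_t m_saved_depth;
};

// Simulates a nested instruction stream that pushes `pushes` states and,
// if `fail` is set, returns before its matching pops are processed.
bool render_nested_stream(std::vector<GraphicsState>& stack, int pushes, bool fail)
{
    StackDepthRestorer restorer(stack);
    for (int i = 0; i < pushes; ++i)
        stack.push_back(GraphicsState { 1 });
    if (fail)
        return false; // early return: the pops below never run
    for (int i = 0; i < pushes; ++i)
        stack.pop_back();
    return true;
}
```

With this shape, the caller sees the same stack depth after the call whether the nested stream succeeded or bailed out halfway.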
Nico Weber
b835d2bd66 LibPDF: Use a RAII object to restore state in recursive render
Previously, if one operator returned an error, the TRY() would cause
us to return without restoring the outer graphics state, leading to
problems such as handing a 3-tuple to a grayscale color space
(because the inner object set up a grayscale color space that we
failed to dispose of).

Makes us crash later on page 43 of
https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf
2023-10-18 19:43:31 -04:00
Nico Weber
3c2d820391 LibPDF: If softmask has different size than target bitmap, resize it
Size of smask and image aren't guaranteed to be equal by the spec
(...except for /Matte, see page 555 of the PDF 1.7 spec, but we
don't implement that), and in practice they sometimes aren't.

Fixes an assert on page 4 of
https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf
We now make it all the way to page 43 of 64 before crashing.
2023-10-18 20:03:35 +01:00
Nico Weber
3907374621 LibPDF: Implement support for callgsubr in CFF font programs
Font programs are bytecode programs defining glyphs. If several glyphs
share a piece of outline, that opcode sequence can be put in a
subroutine ("subr") table and the definition of those glyphs can then
call that subroutine by number, to reduce file size.

CFF fonts can in theory contain multiple fonts, and so there's a global
subr table shared by all the fonts in one CFF, and a local per-font
subr table.  We used to only implement the local subr table, now we
implement both.

(We only support one font per CFF, and at least in PDF files, that's
all that's ever used. So a global subr table isn't very useful.
But the spec explicitly allows it -- "Global subroutines may be used in
a FontSet even if it only contains one font." -- and it happens in
practice.)
2023-10-18 10:50:32 -04:00
Nico Weber
185573c03f LibPDF: Implement subr_number biasing for CFF font programs 2023-10-18 10:50:32 -04:00
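For context on subr_number biasing: in the Type 2 charstring format, the operand of `callsubr` / `callgsubr` is not the table index itself. A bias that depends on the number of subroutines in the table is added first. The sketch below shows that rule; the function names are hypothetical and this is not the code from the commit.

```cpp
#include <cassert>
#include <cstddef>

// Bias as defined by the Type 2 charstring format: it depends only on the
// number of subroutines in the (local or global) subr table.
int subr_bias(std::size_t subr_count)
{
    if (subr_count < 1240)
        return 107;
    if (subr_count < 33900)
        return 1131;
    return 32768;
}

// Maps a callsubr/callgsubr operand to a table index.
// Returns -1 if the biased number falls outside the table.
long resolve_subr_index(int operand, std::size_t subr_count)
{
    long index = static_cast<long>(operand) + subr_bias(subr_count);
    if (index < 0 || static_cast<std::size_t>(index) >= subr_count)
        return -1;
    return index;
}
```

The bias lets small tables address all of their entries with operands in the range -107..107, which are the values that encode in a single byte.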
Nico Weber
4dc4de052a LibPDF: Implement opcode 28 for CFF font programs 2023-10-18 10:50:32 -04:00
Nico Weber
44efff81b9 LibPDF: Remove a dbgln() call in CFF subrs decoding
This code is a lot more reliable now than it used to be, and this
dbgln() is quite noisy for some files. So let's remove it.
2023-10-18 10:43:51 -04:00
Nico Weber
02d2d12592 LibPDF: Allow moving Reader::move_to() to end of data stream
CFF::parse_index_data() calls move_to() to put the reader's
current position behind the index data.

In several PDFs, the PrivDictOperator::Subrs case in CFF::create()
sets up a span that contains exactly the Subrs data and nothing
after it, so that final move_to() call in parse_index_data()
would cause an assert.

This is similar to fe3612ebcb, where the caller was also in CFF.
So maybe CFF just has a different view of what valid values to pass
to Reader are, compared to the rest of the code? But having an iterator
point to one past the valid data in a container is common, so maybe
this is the Right Fix after all.

Fixes a crash opening 411_getting_started_with_instruments.pdf
(and a whole bunch of other WWDC slides). Rendering is pretty glitchy
and we still crash on page 14, but at least we can open the file now.

The file is currently available at:
411cbc60y12x68arcof/411/411_getting_started_with_instruments.pdf
2023-10-18 06:32:23 -04:00
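The "one past the end" argument from 02d2d12592 can be illustrated with a small reader. `ByteReader` here is a hypothetical stand-in, not LibPDF's `Reader`; the only point is that `move_to()` accepts an offset equal to the data size (nothing left to read) while still rejecting anything beyond it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal reader; LibPDF's Reader has a different interface.
class ByteReader {
public:
    explicit ByteReader(std::vector<std::uint8_t> bytes)
        : m_bytes(std::move(bytes))
    {
    }

    // offset == size() is allowed: like an end iterator, it names the
    // position just behind the last byte. Larger offsets are rejected.
    bool move_to(std::size_t offset)
    {
        if (offset > m_bytes.size())
            return false;
        m_offset = offset;
        return true;
    }

    bool done() const { return m_offset >= m_bytes.size(); }
    std::size_t offset() const { return m_offset; }

private:
    std::vector<std::uint8_t> m_bytes;
    std::size_t m_offset = 0;
};
```

A caller that hands the reader a span containing exactly the index data can then seek "behind the index" without tripping a bounds check.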
Nico Weber
182639217f LibPDF: Implement GoTo action for outline
Outline items can contain either a /Dest key or an /A key.

The /Dest key points to a "Destination" (various ways to reference a
page in the same document).

The /A key points to an "Action" which can have several types.
One type, the /GoTo type, just also points to a Destination.

Implement GoTo actions. This makes clicking "Contents" in the outline of
https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf
work. (Almost all other items in this file's outline use /Dest.
"Contents" could too, but it uses /A /GoTo for some reason.)

(Other action types are things like opening a hyperlink, opening a
different file, playing a sound, submitting a form, etc. Actions
are also used for in-page links, not just in outlines. Many of
these action types we'll likely never want to implement.)
2023-10-18 06:29:02 -04:00
Nico Weber
d9c9510d3c LibPDF: Rename x-macro argument name
I'd like to add a string called `A`, so the argument can't be called
`A` as well.

No behavior change.
2023-10-18 06:29:02 -04:00
Nico Weber
f646e47d46 LibPDF: Extract a create_destination_from_object() function
No big behavior change. The new function now produces an error
if a destination isn't in one of the supported formats.
2023-10-18 06:29:02 -04:00
Nico Weber
46fd6fdfa3 LibPDF: Read Global subr data in CFF reader
This was the last piece of data we didn't read yet.
(We also don't yet support multiple fonts per CFF, but I haven't
found a PDF using that yet.)

We still don't do anything with it, but now we at least print a
warning if this data is there and we ignore it.
2023-10-18 11:02:10 +02:00
Nico Weber
3be5719987 LibPDF: Rename subroutines to local_subroutines in CFF code 2023-10-18 11:02:10 +02:00
Nico Weber
9a0b559932 LibPDF: Tweak formatting of built-in CFF tables
This makes the code look more like the pages in the spec.

No behavior change, whitespace change only.
2023-10-18 11:00:17 +02:00
Nico Weber
f0e7fb7038 LibPDF: Make Subrs optional in PS1FontProgram
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf :

"Using charstring subroutines is not a requirement of a Type 1
font program."

And some versions of Computer Modern do in fact not contain a Subrs
array.

Together with #21473, makes Problemset.pdf from the pdffiles repo
render ok instead of crashing.
2023-10-18 11:00:02 +02:00
Nico Weber
cb961101c7 LibPDF: Implement CFF built-in Standard and Expert encodings
With this, all tables from the spec appendixes are in CFF.cpp.

This fixes a crash reading page 2 (and onward) of
2ThestructureoftheCIE1997ColourAppearanceModelCIECAM97s.pdf in
the pdffiles repo.
2023-10-17 10:21:38 +02:00
Nico Weber
eeada4678c LibPDF: Postpone CFF encoding processing after Top DICT has been read
The encoding offset defaults to 0, i.e. the Standard Encoding.
That means reading the encoding only if the tag is present causes
us to not read it if a font uses the Standard Encoding.

Now, we always read an encoding, even if it's the (implicit) default
one.
2023-10-17 10:21:38 +02:00
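A rough sketch of why the encoding has to be resolved after the whole Top DICT is read: the Encoding operator (16 in the CFF spec) may simply be absent, in which case the offset defaults to 0. Per the spec, 0 selects the Standard Encoding and 1 the Expert Encoding, and any other value is an offset to custom encoding data. The map-based Top DICT and the function names below are assumptions made for illustration only.

```cpp
#include <cassert>
#include <map>

enum class EncodingKind {
    Standard,
    Expert,
    Custom,
};

// Operator number of "Encoding" in a CFF Top DICT.
constexpr int top_dict_encoding_operator = 16;

// Hypothetical Top DICT representation: operator number -> integer operand.
// A missing Encoding entry means the default offset 0.
int encoding_offset(const std::map<int, int>& top_dict)
{
    auto it = top_dict.find(top_dict_encoding_operator);
    return it == top_dict.end() ? 0 : it->second;
}

// Offsets 0 and 1 name built-in encodings; everything else points at
// encoding data stored in the font.
EncodingKind classify_encoding(const std::map<int, int>& top_dict)
{
    switch (encoding_offset(top_dict)) {
    case 0:
        return EncodingKind::Standard;
    case 1:
        return EncodingKind::Expert;
    default:
        return EncodingKind::Custom;
    }
}
```

Reading the encoding only inside the handler for operator 16 would skip the first case entirely whenever the entry is omitted.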
Nico Weber
1cfe639b6c LibPDF: Implement CFF supplemental encoding
The main encoding data maps glyph ID ("GID") to its codepoint.
If a glyph has several codepoints, then a secondary table mapping
codepoint to string ID ("SID") of the glyph's name is present.

(A separate table associates each glyph with its name already.)

I haven't seen this used in the wild, but the structure of the
supplemental data is also going to be needed for built-in encodings.
2023-10-17 10:21:38 +02:00
Nico Weber
37daeae6fd LibPDF: Add spec comments, dbgln_if()s to CFF's parse_encoding() 2023-10-17 10:21:38 +02:00
Nico Weber
007d7cdd53 LibPDF: Fix sign (and fixed point) in glyph decoding opcode 24
Two bugs:

1. We decoded a u32, not an i32 as the spec wants
2. (minor) Our fixed-point divisor was off by one

Fixes text rendering in Bakke2010a.pdf in pdffiles, and rendering of
other fonts with negative width adjustments from opcode 255.
That PDF was produced by "Apple pstopdf" and uses font SFBX1200,
which is apparently a variant of Computer Modern. So maybe this
helps with lots of PDFs produced from TeX files, but I haven't
checked that.
2023-10-16 08:33:35 +02:00
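Both bugs can be shown in a few lines, assuming the operand in question is the 16.16 fixed-point number that the Type 2 charstring format introduces with byte 255. The value has to be read as a signed 32-bit integer and divided by 65536. `decode_fixed_16_16()` is a hypothetical helper, not the function touched by the commit.

```cpp
#include <cassert>
#include <cstdint>

// Decodes the four big-endian bytes that follow byte 255 as a signed
// 16.16 fixed-point number.
double decode_fixed_16_16(const std::uint8_t bytes[4])
{
    std::uint32_t raw = (static_cast<std::uint32_t>(bytes[0]) << 24)
        | (static_cast<std::uint32_t>(bytes[1]) << 16)
        | (static_cast<std::uint32_t>(bytes[2]) << 8)
        | static_cast<std::uint32_t>(bytes[3]);

    // Reinterpret as signed. Keeping the value unsigned would turn small
    // negative numbers into huge positive ones. The conversion is modular
    // (guaranteed since C++20, and what mainstream compilers did before).
    std::int32_t value = static_cast<std::int32_t>(raw);

    // 16 fractional bits, so the divisor is 2^16 = 65536, not 65535.
    return value / 65536.0;
}
```

Both 65536 and the inputs used here are exactly representable, so the results for values such as 1.0 or -0.5 are exact.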
Nico Weber
96a4936567 LibPDF: Check for built-in CFF encodings
Only prints a warning for them for now.

Also warn on the not-yet-implemented encoding supplement.
2023-10-16 08:32:18 +02:00
Nico Weber
414a164850 LibPDF: Be louder about unimplemented CFF dict entries 2023-10-16 08:32:18 +02:00
Nico Weber
c825194fb9 LibPDF: Reject CFFs with more than one font
The code assumes that there's just one Top DICT, so let's be loud
when that isn't the case.
2023-10-16 08:32:18 +02:00
Nico Weber
6f783929dd LibPDF: Implement support for CFF charset format 2
I haven't seen this being used in the wild (yet), but it's easy
to implement, and with this we support all charset formats.

So we can now mention if we see a format we don't know about.
2023-10-15 15:27:15 +02:00
Nico Weber
5b915fb15c LibPDF: Add more spec comments to parse_charset() 2023-10-15 15:27:15 +02:00
Nico Weber
49275c4b17 LibPDF: Don't overflow SIDs in type 1 charset parsing
first_sid has type SID (aka u16), so don't store it in a u8.

This fixes (among other things) page 24 of the PDF 1.7 spec.
2023-10-15 15:27:15 +02:00
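The overflow is easy to reproduce in isolation. In a format 1 charset, each range is a 16-bit first SID followed by an 8-bit count of the glyphs left in the range. The sketch below expands one such range while keeping the SID at full width; `expand_charset_range()` is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using SID = std::uint16_t;

// Expands one format 1 charset range: the first SID plus `n_left`
// consecutive SIDs after it, i.e. n_left + 1 glyphs in total.
std::vector<SID> expand_charset_range(SID first_sid, std::uint8_t n_left)
{
    std::vector<SID> sids;
    for (unsigned i = 0; i <= n_left; ++i)
        sids.push_back(static_cast<SID>(first_sid + i));
    return sids;
}
```

Storing the first SID in an 8-bit variable instead would silently wrap: SID 300 becomes 44, so every glyph in the range would get the wrong name.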
Nico Weber
23d6e9f577 LibPDF: Implement CFF built-in charsets ISOAdobe, Expert, Expert Subset 2023-10-15 09:33:34 +02:00
Nico Weber
8060957d8d LibPDF: Use Appendix A instead of Appendix C for standard names
From "10 String INDEX":

"Further space saving is obtained by allocating commonly occurring
strings to predefined SIDs. These strings, known as the standard
strings, describe all the names used in the ISOAdobe and Expert
character sets along with a few other strings common to Type 1 fonts. A
complete list of standard strings is given in Appendix A.  The client
program will contain an array of standard strings with nStdStrings
elements. Thus, the standard strings take SIDs in the range 0 to
(nStdStrings-1)."

And "13 Charsets" says that charsets store SIDs.

Fixes all

    "Couldn't find string for SID $n, going with space"

messages when going through the encoding pages (page 1010 and
thereabouts) in the PDF 1.7 spec.
2023-10-15 09:33:34 +02:00
Nico Weber
aba787a441 LibPDF: Implement reading of CFF String Index
Only really useful for reading SIDs in the Top DICT (copyright
text etc), which we currently don't do.

I haven't seen a difference from looking things up in the string
table. The only real effect from the commit that I need is that
it pulls a local resolve() lambda into a real function
resolve_sid(), which I want to call in a future commit.

But it makes things more spec-compliant, and if we ever want to
read SIDs in metadata in the future, now we can.
2023-10-15 09:33:34 +02:00
Nico Weber
3c49d0dad3 LibPDF: Add a CFF_DEBUG toggle
I'd like to put some debug prints behind this soon.

No behavior change.
2023-10-15 07:14:29 +02:00
Ali Mohammad Pur
aeee98b3a1 AK+Everywhere: Remove the null state of DeprecatedString
This commit removes DeprecatedString's "null" state, and replaces all
its users with one of the following:
- A normal, empty DeprecatedString
- Optional<DeprecatedString>

Note that null states of DeprecatedFlyString/StringView/etc are *not*
affected by this commit. However, DeprecatedString::empty() is now
considered equal to a null StringView.
2023-10-13 18:33:21 +03:30
Nico Weber
2249e79630 LibPDF: Add two FIXMEs 2023-10-13 07:53:27 +02:00
Nico Weber
d451197d3d LibPDF: Add spec comments to CFF 2023-10-13 07:53:27 +02:00
Nico Weber
349996f7f2 LibPDF: Don't crash on files with float CFF defaultWidthX
We'd unconditionally get the int from a Variant<int, float> here,
but PDFs often have a float for defaultWidthX and nominalWidthX.

Fixes crash opening Bakke2010a.pdf from pdffiles (but while the
file loads ok, it looks completely busted).
2023-10-12 19:43:57 +02:00
Nico Weber
c8510b58a3 LibPDF: Cache fonts per page
Previously, every time a page switched fonts, we'd completely
re-parse the font.

Now, we cache fonts in Renderer, effectively caching them per page.

It'd be nice to have an LRU cache across pages too, but that's a
bigger change, and this already helps a lot.

Font size is part of the cache key, which means we re-parse the same
font at different font sizes. That could be better too, but again,
it's a big help as-is already.

Takes rendering the 1310 pages of the PDF 1.7 reference with

    Build/lagom/bin/pdf --debugging-stats \
        ~/Downloads/pdf_reference_1-7.pdf

from 71 s to 11 s :^)

Going through pages especially in the index is noticeably snappier.

(On the PDF 2.0 spec, ISO_32000-2-2020_sponsored.pdf, it's less
dramatic: From 19s to 16s.)
2023-10-11 07:10:19 +02:00
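The caching scheme can be sketched as a map whose key combines the font with the font size. Everything below is hypothetical (the real cache lives in Renderer and its key type is not shown in the commit message); a font name stands in for whatever identifies the font resource. It does illustrate the trade-off mentioned above: repeated lookups at the same size hit the cache, while the same font at a new size is parsed again.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical result of parsing a font program.
struct ParsedFont {
    std::string name;
    float size;
};

class FontCache {
public:
    // Returns the cached font for (name, size), parsing it on first use.
    std::shared_ptr<ParsedFont> get_or_parse(const std::string& name, float size)
    {
        auto key = std::make_pair(name, size);
        auto it = m_fonts.find(key);
        if (it != m_fonts.end())
            return it->second;

        ++m_parse_count; // stands in for the expensive parsing step
        auto font = std::make_shared<ParsedFont>(ParsedFont { name, size });
        m_fonts.emplace(key, font);
        return font;
    }

    int parse_count() const { return m_parse_count; }

private:
    std::map<std::pair<std::string, float>, std::shared_ptr<ParsedFont>> m_fonts;
    int m_parse_count = 0;
};
```

Creating one such cache per rendered page gives per-page caching; sharing it across pages would be the LRU-style follow-up the message mentions.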
Andreas Kling
13db3c5ce0 LibGfx: Convert FontDatabase APIs to use FlyString 2023-09-06 11:29:03 -04:00
Nico Weber
934340d845 LibPDF: Add FIXME for CIDFontType2 creation
Move some code only needed for CIDFontType2 creation into a new
function and add a FIXME describing what needs to happen there.
2023-08-14 16:26:09 +02:00
Nico Weber
1c263eee61 LibPDF: Add spec comments and FIXMEs to Type0Font::draw_string() 2023-08-14 16:26:09 +02:00
MacDue
6088374ad2 LibPDF: Ensure all subpaths are closed before filling paths
This lets us correctly draw figure 3.4 in pdf_reference_1-7.pdf.
2023-07-25 13:42:40 +02:00
Nico Weber
715b6f868f LibPDF: Sketch out Type0 font support some more
Type0 fonts can be either CFF-based or TrueType-based.
Create a subclass for each, put in some spec text, and
give each case a dedicated error code, so that `--debugging-stats`
can tell me which branch is more common.
2023-07-25 12:10:36 +02:00
Nico Weber
5aab31dc40 LibPDF: Dedicated messages for Indexed and Pattern spaces
Makes them easier to interpret in `pdf --debugging-stats` output.
2023-07-24 11:01:25 -04:00
Nico Weber
fad834a21c LibPDF: Add smoke-and-mirror implementation of SeparationColorSpace
None of the methods actually do anything, but we now create an
actual SeparationColorSpace object for /Separation color spaces.

This fixes a crash on page 810 of pdf_reference_1-7.pdf.
Previously, we'd log a "separation color space not supported" error,
which would lead to Renderer not updating its current color space.
It'd stay a DeviceCMYK color space, which would then later assert
when it got a 1-argument array as color (which now the
SeparationColorSpace gets instead, which logs an "unimplemented"
error for that instead of asserting).
2023-07-24 09:52:01 -04:00
Nico Weber
af5a7b9a51 LibPDF: Don't crash on encrypted files with streams with filter arrays
Makes it possible to render more than 0 pages of CIPA_DC-003-2020_E.pdf
2023-07-24 09:50:45 -04:00
Nico Weber
532230c0e4 LibPDF: Extract a Document::read_filters() method
No behavior change.
2023-07-24 09:50:45 -04:00
Nico Weber
ca1a98ba9f LibPDF: Replace two more crashes with messages 2023-07-23 23:05:32 -04:00
Nico Weber
29c3a9c5f0 LibPDF: Don't crash on images without /Filter
Fixes a crash rendering page 819 of ISO_32000-2-2020_sponsored.pdf
which contains an uncompressed 2x2 1bpp grayscale bitmap.
2023-07-23 23:04:55 -04:00
Nico Weber
7dfa5fc1dc LibPDF: Make JPEG decoding errors not assert
Instead, they're now turned into a diagnostic like other rendering
problems, looking like so:

    Internal error while processing PDF file:
        Unsupported chroma subsampling factors

Makes us no longer crash rendering page 1141 of pdf_reference_1-7.pdf.
2023-07-23 23:04:25 -04:00