Implements the `ri` operator, and the `RI` key in a graphics state
dictionary.
We don't do anything yet with the color rendering intent except
store it.
No behavior change except removing a few "not yet implemented"
messages.
Follow-up to #21489. There, I made us use a RAII object.
That's great, but if the embedded instruction stream pushes
its own graphics state, then an early return would cause us to
not process graphics state pop instructions in the embedded stream.
To fix this, remember the graphics stack depth before entering
the nested instruction stream, and explicitly shrink the stack back
to that size upon exit.
Enables us to render all pages of
https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf
without crashing.
Previously, if one operator returned an error, the TRY() would cause
us to return without restoring the outer graphics state, leading to
problems such as handing a 3-tuple to a grayscale color space
(because the inner object set up a grayscale color space that we
failed to dispose of).
Makes us crash later on page 43 of
https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf
Font programs are bytecode programs defining glyphs. If several glyphs
share a piece of outline, that opcode sequence can be put in a
subroutine ("subr") table and the definition of those glyphs can then
call that subroutine by number, to reduce file size.
CFF fonts can in theory contain multiple fonts, and so there's a global
subr table shared by all the fonts in one CFF, and a local per-fornt
subr table. We used to only implement the local subr table, now we
implement both.
(We only support one font per CFF, and at least in PDF files, that's
all that's ever used. So a global subr table isn't very useful.
But the spec explicitly allows it -- "Global subroutines may be used in
a FontSet even if it only contains one font." -- and it happens in
practice.)
CFF::parse_index_data() calls move_to() to put the reader's
current position behind the index data.
In several PDFs, the PrivDictOperator::Subrs case in CFF::create()
sets up a span that contains exactly the Subrs data and nothing
after it, so that finale move_to() call in parse_index_data()
would cause an assert.
This is similar to fe3612ebcb, where the caller was also in CFF.
So maybe CFF just has a different view of what valid values to pass
to Reader are, compared to the rest of the code? But having an iterator
point to one past the valid data in a container is common, so maybe
this is the Right Fix after all.
Fixes a crash opening 411_getting_started_with_instruments.pdf
(and a whole bunch of other WWDC slides). Rendering is pretty glitchy
and we still crash on page 14, but at least we can open the file now.
The file is currently available at:
411cbc60y12x68arcof/411/411_getting_started_with_instruments.pdf
Outline items can contain either a /Dest key or an /A key.
The /Dest key points to a "Destination" (various ways to reference a
page in the same document).
The /A key points to an "Action" which can have several types.
One type, the /GoTo type, just also points to a Destination.
Implement GoTo actions. This makes clicking "Contents" in the outline of
https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf
work. (Almost all other items in this file's outline use /Dest.
"Contents" could too, but it uses /A /GoTo for some reason.)
(Other action types are things like opening a hyperlink, opening a
different file, playing a sound, submitting a form, etc. Actions
are also used for in-page links, not just in outlines. Many of
these action types we'll likely never want to implement.)
This was the last piece of data we didn't read yet.
(We also don't yet support multiple fonts per CFF, but I haven't
found a PDF using that yet.)
We still don't do anything with it, but now we at least print a
warning if this data is there and we ignore it.
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf :
"Using charstring subroutines is not a requirement of a Type 1
font program."
And some versions of Computer Modern do in fact not contain a Subrs
array.
Together with #21473, makes Problemset.pdf from the pdffiles repro
render ok instead of crashing.
With this, all tables from the spec appendixes are in CFF.cpp.
This fixes a crash reading page 2 (and onward) of
2ThestructureoftheCIE1997ColourAppearanceModelCIECAM97s.pdf in
the pdffiles repo.
The encoding offset defaults to 0, i.e. the Standard Encoding.
That means reading the encoding only if the tag is present causes
us to not read it if a font uses the Standard Encoding.
Now, we always read an encoding, even if it's the (implicit) default
one.
The main encoding data maps glyph ID ("GID") to its codepoint.
If a glyph has several codepoints, then a secondary table mapping
codepoint to string ID ("SID") of the glyph's name is present.
(A separate table associates each glyph with its name already.)
I haven't seen this used in the wild, but the structure of the
supplemental data is also going to be needed for built-in encodings.
Two bugs:
1. We decoded a u32, not an i32 as the spec wants
2. (minor) Our fixed-point divisor was off by one
Fixes text rendering in Bakke2010a.pdf in pdffiles, and rendering of
other fonts with negative width adjustments from optcode 255.
That PDF was produced by "Apple pstopdf" and uses font SFBX1200,
which is apparently a variant of Computer Modern. So maybe this
helps with lots of PDFs produced from TeX files, but I haven't
checked that.
I haven't seen this being used in the wild (yet), but it's easy
to implement, and with this we support all charset formats.
So we can now mention if we see a format we don't know about.
From "10 String INDEX":
"Further space saving is obtained by allocating commonly occurring
strings to predefined SIDs. These strings, known as the standard
strings, describe all the names used in the ISOAdobe and Expert
character sets along with a few other strings common to Type 1 fonts. A
complete list of standard strings is given in Appendix A. The client
program will contain an array of standard strings with nStoStrings
elements. Thus, the standard strings take SIDs in the range 0 to
(nStaStrings-1)."
And "13 Charsets" says that charsets store SIDs.
Fixes all
"Couldn't find string for SID $n, going with space"
messages when going through the encoding pages (page 1010 and
thereabouts) in the PDF 1.7 spec.
Only really useful for reading SIDs in the Top DICT (copyright
text etc), which we currently don't do.
I haven't seen a difference from looking things up in the string
table. The only real effect from the commit that I need is that
it pulls a local resolve() labmda into a real function
resolve_sid(), which I want to call in a future commit.
But it makes things more spec-compliant, and if we ever want to
read SIDs in metadata in the future, now we can.
This commit removes DeprecatedString's "null" state, and replaces all
its users with one of the following:
- A normal, empty DeprecatedString
- Optional<DeprecatedString>
Note that null states of DeprecatedFlyString/StringView/etc are *not*
affected by this commit. However, DeprecatedString::empty() is now
considered equal to a null StringView.
We'd unconditionally get the int from a Variant<int, float> here,
but PDFs often have a float for defaultWidthX and nominalWidthX.
Fixes crash opening Bakke2010a.pdf from pdffiles (but while the
file loads ok, it looks completely busted).
Previously, every time a page switched fonts, we'd completely
re-parse the font.
Now, we cache fonts in Renderer, effectively caching them per page.
It'd be nice to have an LRU cache across pages too, but that's a
bigger change, and this already helps a lot.
Font size is part of the cache key, which means we re-parse the same
font at different font sizes. That could be better too, but again,
it's a big help as-is already.
Takes rendering the 1310 pages of the PDF 1.7 reference with
Build/lagom/bin/pdf --debugging-stats \
~/Downloads/pdf_reference_1-7.pdf
from 71 s to 11s :^)
Going through pages especially in the index is noticeably snappier.
(On the PDF 2.0 spec, ISO_32000-2-2020_sponsored.pdf, it's less
dramatic: From 19s to 16s.)
Type0 fonts can be either CFF-based or TrueType-based.
Create a subclass for each, put in some spec text, and
give each case a dedicated error code, so that `--debugging-stats`
can tell me which branch is more common.
None of the methods actually do anything, but we now create an
actual SeparationColorSpace object for /Separation color spaces.
This fixes a crash on page 810 of pdf_reference_1-7.pdf.
Previously, we'd log a "separation color space not supported" error,
which would lead to Renderer not updating its current color space.
It'd stay a DeviceCYMK color space, which would then later assert
when it got a 1-argument array as color (which now the
SeparationColorSpace gets instead, which logs an "unimplemented"
error for that instead of asserting).
Instead, they're now turned into a diagnostic like other rendering
problems, looking like so:
Internal error while processing PDF file:
Unsupported chroma subsampling factors
Makes us no longer crash rendering page 1141 of pdf_reference_1.7-pdf.