Type0 fonts can be either CFF-based or TrueType-based.
Create a subclass for each, put in some spec text, and
give each case a dedicated error code, so that `--debugging-stats`
can tell me which branch is more common.
None of the methods actually do anything, but we now create an
actual SeparationColorSpace object for /Separation color spaces.
This fixes a crash on page 810 of pdf_reference_1-7.pdf.
Previously, we'd log a "separation color space not supported" error,
which would lead to Renderer not updating its current color space.
It'd stay a DeviceCMYK color space, which would then later assert
when it got a 1-argument array as color (which now the
SeparationColorSpace gets instead, which logs an "unimplemented"
error for that instead of asserting).
Instead, they're now turned into a diagnostic like other rendering
problems, looking like so:
Internal error while processing PDF file:
Unsupported chroma subsampling factors
Makes us no longer crash when rendering page 1141 of pdf_reference_1-7.pdf.
That way, we render an incomplete page and log a message instead of
crashing the viewer application.
Lets us survive e.g. page 489 of pdf_reference_1-7.pdf.
It's `"`, not `''`.
Now the `text_next_line_show_string_set_spacing` gets called and logs
a TODO at page render time if `"` is used in a PDF:
warning: Rendering of feature not supported:
draw operation: text_next_line_show_string_set_spacing
It caused a parse error (also at page render time) previously:
[parse_value @ .../LibPDF/Parser.cpp:104]
Parser error at offset 611: Unexpected char """
With this, `pdf` can print info for CIPA_DC-003-2020_E.pdf
(from https://cipa.jp/e/std/std-sec.html), as well as all other
files I've tried.
CIPA_DC-003-2020_E.pdf is special because it quits this loop after
exactly 64 iterations, at round_number 63.
While here, also update a comment to use the non-spec-comment style
I'm now using elsewhere in the file.
With this, AESV3 support is complete and CIPA_DC-007-2021_E.pdf
can be opened :^)
(CIPA_DC-003-2020_E.pdf incorrectly cannot be opened yet. This
is due to a minor bug in computing_a_hash_r6_and_later() that
I'll fix a bit later. But except for this minor bug, all AESV3
files I've found so far seem to work.)
try_provide_user_password() calls compute_encryption_key_r6_and_later()
now. This checks both owner and user passwords. (For pre-R6 files,
owner password checking isn't yet implemented, as far as I can tell.)
With this, CIPA_DC-007-2021_E.pdf (or other AESV3-encrypted files)
successfully compute a file encryption key (...and then hit the
TODO() in StandardSecurityHandler::crypt() for AESV3, but it's
still good progress.)
...for handlers of revision 6.
The spec for this algorithm has several quirks:
1. It describes how to authenticate a password as an owner password,
but it redundantly inlines the description of algorithm 12 instead
of referring to it. We just call that algorithm here.
2. It does _not_ describe how to authenticate a password as a user
password before using the password to compute the file encryption
key using an intermediate user key, even though the latter step
that computes the file encryption key refers to the password as
"user password". I added a call to algorithm 11, which the spec
doesn't ask for, to check if the password is the user password.
Maybe I'm misunderstanding the spec, but this looks like a spec
bug to me.
3. It says "using AES-256 in ECB mode with an initialization vector
of zero". ECB mode has no initialization vector. CBC mode with
initialization vector of zero for message length 16 is the same
as ECB mode though, so maybe that's meant? (In addition to the
spec being a bit wobbly, using ECB in new software isn't
recommended, but too late for that.)
SASLprep / stringprep still aren't implemented. For ASCII passwords
(including the important empty password), this is good enough.
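To illustrate quirk 3: for a single 16-byte block, CBC with an all-zero
initialization vector XORs the plaintext with zeros before encrypting, so it
produces the same output as ECB. The sketch below shows that with a toy block
function. `toy_encrypt_block` and the two wrapper functions are hypothetical
names made up for this example; the toy function is not AES and not secure,
it only stands in for "some deterministic block cipher".

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Block = std::array<std::uint8_t, 16>;

// Hypothetical stand-in for a real block cipher such as AES-256.
// NOT secure; it only needs to be a deterministic function of (block, key).
Block toy_encrypt_block(Block const& block, Block const& key)
{
    Block out {};
    for (std::size_t i = 0; i < block.size(); ++i)
        out[i] = static_cast<std::uint8_t>((block[(i + 1) % block.size()] ^ key[i]) + 0x5a);
    return out;
}

// ECB: the block is encrypted on its own; there is no IV parameter at all.
Block ecb_encrypt_single_block(Block const& plaintext, Block const& key)
{
    return toy_encrypt_block(plaintext, key);
}

// CBC: the first block is XORed with the IV before it is encrypted.
Block cbc_encrypt_single_block(Block const& plaintext, Block const& key, Block const& iv)
{
    Block mixed {};
    for (std::size_t i = 0; i < plaintext.size(); ++i)
        mixed[i] = static_cast<std::uint8_t>(plaintext[i] ^ iv[i]);
    return toy_encrypt_block(mixed, key);
}
```

With a zero IV, `mixed` equals `plaintext`, so both functions return the same
block. This only holds for the first block; later CBC blocks are XORed with
the previous ciphertext block.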
...for handlers of revision 6.
Since this adds U to the hash input, also trim the size of U and O to
48 bytes. The spec requires them to be 48 bytes, but all the newer PDFs
on https://cipa.jp/e/std/std-sec.html have 127 bytes -- 48 real bytes
and 79 nul padding bytes. These files were created by:
Creator: Word 用 Acrobat PDFMaker 17
Producer: Adobe PDF Library 15.0
and
Creator: Word 用 Acrobat PDFMaker 17
Producer: Adobe PDF Library 17.11.238
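A minimal sketch of the trimming described above, using plain standard C++
rather than the actual LibPDF types. `trim_entry` and `expected_entry_size`
are hypothetical names. Leaving shorter inputs untouched is an assumption of
this sketch, not something the commit states.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The spec requires U and O to be 48 bytes for revision 6 handlers.
constexpr std::size_t expected_entry_size = 48;

// Drops anything past the first 48 bytes (e.g. trailing nul padding).
// Inputs that are already 48 bytes or shorter are returned unchanged.
std::vector<std::uint8_t> trim_entry(std::vector<std::uint8_t> entry)
{
    if (entry.size() > expected_entry_size)
        entry.resize(expected_entry_size);
    return entry;
}
```

For a 127-byte entry, this keeps the 48 real bytes and drops the 79 padding
bytes, so the padding never reaches the hash input.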
LibGfx's ScaledFont doesn't do this, but in ScaledFont m_x_scale and
m_y_scale are immutable once the class is created, so it can get away
with not doing it.
In Type1Font, `width` changes in different calls to
Type1Font::draw_glyph(), so we need to make it part of the cache key.
Fixes rendering of the word "Version" on the first page of
pdf_reference_1-7.pdf.
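Roughly, the idea is that a cache keyed only on the glyph would hand back a
bitmap rendered for a different width. A sketch of a key that includes the
width, in standard C++ and with made-up types (`GlyphCacheKey`, `FakeBitmap`
and `GlyphCache` are hypothetical and not the LibPDF classes):

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical cache key: the glyph alone is not enough when the
// requested width differs between draw calls.
struct GlyphCacheKey {
    std::uint32_t glyph_id;
    float width;

    bool operator<(GlyphCacheKey const& other) const
    {
        return std::tie(glyph_id, width) < std::tie(other.glyph_id, other.width);
    }
};

// Stand-in for a rendered glyph bitmap; it only remembers its width.
struct FakeBitmap {
    float rendered_width;
};

class GlyphCache {
public:
    FakeBitmap const& get_or_render(std::uint32_t glyph_id, float width)
    {
        GlyphCacheKey key { glyph_id, width };
        auto it = m_entries.find(key);
        if (it == m_entries.end()) {
            // Cache miss: "render" the glyph at the requested width.
            ++m_render_count;
            it = m_entries.emplace(key, FakeBitmap { width }).first;
        }
        return it->second;
    }

    int render_count() const { return m_render_count; }

private:
    std::map<GlyphCacheKey, FakeBitmap> m_entries;
    int m_render_count { 0 };
};
```

Asking for the same glyph at two widths renders twice, and asking again at an
already-seen width is a cache hit. This sketch assumes the width is never NaN,
since NaN would break the ordering used by `std::map`.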
This is a step towards AESV3 support for PDF files.
The straightforward way of writing this with our APIs is pretty
allocation-heavy, but this code won't run all that often for the
regular "open PDF, check password" flow.
- `encrypt()` will always fill a multiple of block size,
`decrypt()` might produce less data. But other than that,
the middle span isn't modified even though it's a reference.
So pass the ByteBuffer to assign() (kind of like before 5998072f15,
but pass-by-move())
- In the encryption code path, assign a single buffer for IV and data
instead of awkwardly copying the data around later.
Thanks to CxByte for suggesting most of this!
No intentional behavior change.
Subsections are generally not contiguous. However, this logic assumed
that they were, and kept a persistent "entry_index" count while looping
through all subsections. This commit rewrites the logic to be more
straightforward; just loop through all of the subsections and handle
each one separately.
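A sketch of the per-subsection loop, under the assumption that each
subsection records its own first object number plus one entry per object.
`XRefSubsection` and `build_offset_table` are hypothetical names in plain
standard C++, not the actual parser code.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of one subsection: it covers the object numbers
// first_object_number .. first_object_number + byte_offsets.size() - 1.
struct XRefSubsection {
    std::uint32_t first_object_number;
    std::vector<std::uint64_t> byte_offsets; // one entry per object
};

// Handles each subsection on its own. The object number is derived from
// the subsection's own start, not from a counter shared across subsections.
std::map<std::uint32_t, std::uint64_t> build_offset_table(std::vector<XRefSubsection> const& subsections)
{
    std::map<std::uint32_t, std::uint64_t> table;
    for (auto const& subsection : subsections) {
        for (std::size_t i = 0; i < subsection.byte_offsets.size(); ++i) {
            auto object_number = subsection.first_object_number + static_cast<std::uint32_t>(i);
            table[object_number] = subsection.byte_offsets[i];
        }
    }
    return table;
}
```

With subsections starting at 0 and at 10, objects 2 through 9 stay absent
from the table, which a single running index would get wrong.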
DeprecatedString::substring() makes a copy of the substring.
Instead, use a StringView, which can make substring views in constant
time.
Reduces time for `pdf --dump-contents image-based-pdf-sample.pdf` to
2.2s (from not completing for 1+ minutes).
That file contains a 221 kB jpeg.
Find it on the internet here:
https://nlsblog.org/wp-content/uploads/2020/06/image-based-pdf-sample.pdf
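The same effect can be shown with the standard library, using
`std::string_view` as a stand-in for StringView (an analogy, not the AK
types). `count_words_with_views` is a hypothetical helper that repeatedly
slices off the rest of its input. With `std::string::substr()`, each of those
slices would copy the remaining bytes, which gets quadratic on large inputs.

```cpp
#include <cstddef>
#include <string_view>

// Counts space-separated words by repeatedly narrowing a view of the input.
// Each substr() on a string_view only adjusts a pointer and a length, so
// no bytes are copied no matter how long the remaining input is.
std::size_t count_words_with_views(std::string_view input)
{
    std::size_t count = 0;
    while (!input.empty()) {
        auto space = input.find(' ');
        auto word = input.substr(0, space); // npos means "up to the end"
        if (!word.empty())
            ++count;
        if (space == std::string_view::npos)
            break;
        input = input.substr(space + 1); // constant-time "rest of input"
    }
    return count;
}
```

The view must not outlive the string it points into, which is the usual
trade-off for skipping the copy.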
This detects AESV3, and copies over the spec comments explaining what
needs to be done, but doesn't actually do it yet.
AESV3 is technically PDF 2.0-only, but
https://cipa.jp/std/documents/download_e.html?CIPA_DC-007-2021_E has a
1.7 PDF that uses it.
Previously we'd claim that we need a password to decrypt it.
Now, we cleanly crash with a TODO() \o/
- No `,` between array or dict elements
- `stream` goes in front of stream data, _after_ the stream dict
Also, print string contents as ASCII if the string data is mostly ASCII.
We now track it in the graphics state. It isn't used for anything yet.
Fixes the one thing that rendering the first 100 pages of
pdf_reference_1-7.pdf complains about.
With this, looking at page 2 of pdf_reference_1-7.pdf no longer crashes.
Why did it crash in the first place? Because due to this bug, CFF.cpp
failed to parse the font program for the font used to render the `®`
character. `Renderer::render()` adds all errors that are encountered
to an `errors` object but continues rendering. That meant that the
previous font was still active, and that didn't have a width for that
symbol in its width table.
SimpleFont::draw_string() falls back to get_glyph_width() if there's
no width entry for a character. `Type1Font::get_glyph_width()`
always dereferences `m_font`, even if the font has
a font program (and m_font is hence nullptr).
With the off-by-one fixed, the second font is successfully installed
as current font, and the second font has a width entry for that symbol,
so the problem no longer occurs.
There were two problems:
1. parse_compressed_object_with_index() parses indirect objects
without going through Parser::parse_indirect_value(), so
push_reference() / pop_reference() weren't called.
Manually call them, both for the indirect object containing
the object stream and for the indirect object within the
object stream.
2. The indirect object within the object stream got decrypted
twice: Once when the object stream data itself got decrypted,
and then incorrectly a second time when the object data within
the stream was read. To fix, disable encryption while parsing
object stream data (since it's already decrypted).
The test is from http://opf-labs.org/format-corpus/pdfCabinetOfHorrors/
which according to readme.md at the same location is CC0.