mirror of https://github.com/RGBCube/serenity synced 2025-07-15 13:07:35 +00:00
Commit graph

598 commits

Author SHA1 Message Date
Nico Weber
414a164850 LibPDF: Be louder about unimplemented CFF dict entries 2023-10-16 08:32:18 +02:00
Nico Weber
c825194fb9 LibPDF: Reject CFFs with more than one font
The code assumes that there's just one Top DICT, so let's be loud
when that isn't the case.
2023-10-16 08:32:18 +02:00
Nico Weber
6f783929dd LibPDF: Implement support for CFF charset format 2
I haven't seen this being used in the wild (yet), but it's easy
to implement, and with this we support all charset formats.

So we can now mention if we see a format we don't know about.
2023-10-15 15:27:15 +02:00
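The range-based charset formats differ only in the width of the count field: in the CFF spec, format 1 stores `nLeft` as one byte and format 2 stores it as two bytes, both after a two-byte first SID. A minimal sketch of reading either format, assuming big-endian input that starts right after the format byte. The function name `parse_charset_ranges` and the use of `std::vector` are illustrative, not LibPDF's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reads CFF charset ranges for format 1 (8-bit nLeft)
// or format 2 (16-bit nLeft). `data` starts just past the format byte and
// `glyph_count` includes .notdef. Returns one SID per glyph except
// .notdef (glyph 0), which charsets omit.
std::vector<uint16_t> parse_charset_ranges(std::vector<uint8_t> const& data, uint8_t format, size_t glyph_count)
{
    if (format != 1 && format != 2)
        throw std::invalid_argument("only range-based charset formats are handled here");

    std::vector<uint16_t> sids;
    size_t offset = 0;
    auto read_u8 = [&]() -> uint16_t {
        if (offset >= data.size())
            throw std::out_of_range("charset data truncated");
        return data[offset++];
    };
    auto read_u16 = [&]() -> uint16_t {
        uint16_t high = read_u8();
        uint16_t low = read_u8();
        return static_cast<uint16_t>((high << 8) | low);
    };

    while (sids.size() + 1 < glyph_count) {
        // The first SID is 16 bits wide in both formats.
        uint16_t first_sid = read_u16();
        size_t left = (format == 1) ? read_u8() : read_u16();
        for (size_t i = 0; i <= left && sids.size() + 1 < glyph_count; ++i)
            sids.push_back(static_cast<uint16_t>(first_sid + i));
    }
    return sids;
}
```

Keeping the first SID in a 16-bit variable matters as soon as a range starts above 255.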
Nico Weber
5b915fb15c LibPDF: Add more spec comments to parse_charset() 2023-10-15 15:27:15 +02:00
Nico Weber
49275c4b17 LibPDF: Don't overflow SIDs in type 1 charset parsing
first_sid has type SID (aka u16), so don't store it in a u8.

This fixes (among other things) page 24 on the PDF 1.7 spec.
2023-10-15 15:27:15 +02:00
Nico Weber
23d6e9f577 LibPDF: Implement CFF built-in charsets ISOAdobe, Expert, Expert Subset 2023-10-15 09:33:34 +02:00
Nico Weber
8060957d8d LibPDF: Use Appendix A instead of Appendix C for standard names
From "10 String INDEX":

"Further space saving is obtained by allocating commonly occurring
strings to predefined SIDs. These strings, known as the standard
strings, describe all the names used in the ISOAdobe and Expert
character sets along with a few other strings common to Type 1 fonts. A
complete list of standard strings is given in Appendix A. The client
program will contain an array of standard strings with nStdStrings
elements. Thus, the standard strings take SIDs in the range 0 to
(nStdStrings-1)."

And "13 Charsets" says that charsets store SIDs.

Fixes all

    "Couldn't find string for SID $n, going with space"

messages when going through the encoding pages (page 1010 and
thereabouts) in the PDF 1.7 spec.
2023-10-15 09:33:34 +02:00
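The quoted rule implies a two-tier lookup: SIDs below nStdStrings index the built-in standard strings, and larger SIDs index the font's own String INDEX after subtracting nStdStrings. A rough sketch of that lookup follows. The name `lookup_sid` and the `std::` containers are assumptions for illustration, and the table passed in would be the full Appendix A list in practice:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical lookup: resolves a CFF string ID against the standard
// strings first, then against the font's String INDEX.
std::optional<std::string> lookup_sid(uint16_t sid,
    std::vector<std::string> const& standard_strings,
    std::vector<std::string> const& string_index)
{
    if (sid < standard_strings.size())
        return standard_strings[sid];

    // SIDs past the standard range are offset by the standard string count.
    size_t index = sid - standard_strings.size();
    if (index < string_index.size())
        return string_index[index];

    // Unknown SID; the caller decides on a fallback.
    return std::nullopt;
}
```

Returning an empty optional leaves the fallback policy (such as substituting a space) to the caller.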
Nico Weber
aba787a441 LibPDF: Implement reading of CFF String Index
Only really useful for reading SIDs in the Top DICT (copyright
text etc), which we currently don't do.

I haven't seen a difference from looking things up in the string
table. The only real effect from the commit that I need is that
it pulls a local resolve() lambda into a real function
resolve_sid(), which I want to call in a future commit.

But it makes things more spec-compliant, and if we ever want to
read SIDs in metadata in the future, now we can.
2023-10-15 09:33:34 +02:00
Nico Weber
3c49d0dad3 LibPDF: Add a CFF_DEBUG toggle
I'd like to put some debug prints behind this soon.

No behavior change.
2023-10-15 07:14:29 +02:00
Ali Mohammad Pur
aeee98b3a1 AK+Everywhere: Remove the null state of DeprecatedString
This commit removes DeprecatedString's "null" state, and replaces all
its users with one of the following:
- A normal, empty DeprecatedString
- Optional<DeprecatedString>

Note that null states of DeprecatedFlyString/StringView/etc are *not*
affected by this commit. However, DeprecatedString::empty() is now
considered equal to a null StringView.
2023-10-13 18:33:21 +03:30
Nico Weber
2249e79630 LibPDF: Add two FIXMEs 2023-10-13 07:53:27 +02:00
Nico Weber
d451197d3d LibPDF: Add spec comments to CFF 2023-10-13 07:53:27 +02:00
Nico Weber
349996f7f2 LibPDF: Don't crash on files with float CFF defaultWidthX
We'd unconditionally get the int from a Variant<int, float> here,
but PDFs often have a float for defaultWidthX and nominalWidthX.

Fixes crash opening Bakke2010a.pdf from pdffiles (but while the
file loads ok, it looks completely busted).
2023-10-12 19:43:57 +02:00
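One way to avoid assuming the int alternative is to visit the variant and convert whichever alternative is present. The sketch below uses `std::variant` as a stand-in for the `Variant<int, float>` mentioned above; `DictOperand` and `operand_as_float` are hypothetical names, not LibPDF's API:

```cpp
#include <variant>

// Stand-in for the Variant<int, float> holding a CFF DICT operand.
using DictOperand = std::variant<int, float>;

// Converts either alternative to float instead of assuming it is an int.
float operand_as_float(DictOperand const& operand)
{
    return std::visit([](auto value) { return static_cast<float>(value); }, operand);
}
```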
Nico Weber
c8510b58a3 LibPDF: Cache fonts per page
Previously, every time a page switched fonts, we'd completely
re-parse the font.

Now, we cache fonts in Renderer, effectively caching them per page.

It'd be nice to have an LRU cache across pages too, but that's a
bigger change, and this already helps a lot.

Font size is part of the cache key, which means we re-parse the same
font at different font sizes. That could be better too, but again,
it's a big help as-is already.

Takes rendering the 1310 pages of the PDF 1.7 reference with

    Build/lagom/bin/pdf --debugging-stats \
        ~/Downloads/pdf_reference_1-7.pdf

from 71s to 11s :^)

Going through pages especially in the index is noticeably snappier.

(On the PDF 2.0 spec, ISO_32000-2-2020_sponsored.pdf, it's less
dramatic: From 19s to 16s.)
2023-10-11 07:10:19 +02:00
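The caching idea can be sketched as a map keyed by font name and size that parses only on a miss. Everything here (`FontCache`, `ParsedFont`, the parse counter) is a hypothetical illustration under those assumptions, not the Renderer's real data structure:

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Placeholder for a fully parsed font.
struct ParsedFont {
    std::string name;
    float size;
};

class FontCache {
public:
    // Returns the cached font for (name, size), parsing only on a miss.
    std::shared_ptr<ParsedFont> get(std::string const& name, float size)
    {
        auto key = std::make_pair(name, size);
        auto it = m_fonts.find(key);
        if (it != m_fonts.end())
            return it->second;

        ++m_parse_count; // stands in for the expensive font parse
        auto font = std::make_shared<ParsedFont>(ParsedFont { name, size });
        m_fonts.emplace(key, font);
        return font;
    }

    int parse_count() const { return m_parse_count; }

private:
    std::map<std::pair<std::string, float>, std::shared_ptr<ParsedFont>> m_fonts;
    int m_parse_count { 0 };
};
```

Because the size is part of the key, requesting the same font at a second size counts as another parse, which matches the limitation described in the message.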
Andreas Kling
13db3c5ce0 LibGfx: Convert FontDatabase APIs to use FlyString 2023-09-06 11:29:03 -04:00
Nico Weber
934340d845 LibPDF: Add FIXME for CIDFontType2 creation
Move some code only needed for CIDFontType2 creation into a new
function and add a FIXME describing what needs to happen there.
2023-08-14 16:26:09 +02:00
Nico Weber
1c263eee61 LibPDF: Add spec comments and FIXMEs to Type0Font::draw_string() 2023-08-14 16:26:09 +02:00
MacDue
6088374ad2 LibPDF: Ensure all subpaths are closed before filling paths
This lets us correctly draw figure 3.4 in pdf_reference_1-7.pdf.
2023-07-25 13:42:40 +02:00
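Closing a subpath amounts to adding a segment back to its starting point when the last point differs from the first. A small sketch with hypothetical `Point` and `Subpath` types (the real code works on LibGfx paths, which this does not model):

```cpp
#include <vector>

struct Point {
    float x;
    float y;
};
using Subpath = std::vector<Point>;

// Closes every open subpath by appending its starting point.
void close_all_subpaths(std::vector<Subpath>& subpaths)
{
    for (auto& subpath : subpaths) {
        if (subpath.size() < 2)
            continue;
        Point first = subpath.front(); // copy, since push_back may reallocate
        Point const& last = subpath.back();
        if (first.x != last.x || first.y != last.y)
            subpath.push_back(first);
    }
}
```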
Nico Weber
715b6f868f LibPDF: Sketch out Type0 font support some more
Type0 fonts can be either CFF-based or TrueType-based.
Create a subclass for each, put in some spec text, and
give each case a dedicated error code, so that `--debugging-stats`
can tell me which branch is more common.
2023-07-25 12:10:36 +02:00
Nico Weber
5aab31dc40 LibPDF: Dedicated messages for Indexed and Pattern spaces
Makes them easier to interpret in `pdf --debugging-stats` output.
2023-07-24 11:01:25 -04:00
Nico Weber
fad834a21c LibPDF: Add smoke-and-mirror implementation of SeparationColorSpace
None of the methods actually do anything, but we now create an
actual SeparationColorSpace object for /Separation color spaces.

This fixes a crash on page 810 of pdf_reference_1-7.pdf.
Previously, we'd log a "separation color space not supported" error,
which would lead to Renderer not updating its current color space.
It'd stay a DeviceCMYK color space, which would then later assert
when it got a 1-argument array as color (which now the
SeparationColorSpace gets instead, which logs an "unimplemented"
error for that instead of asserting).
2023-07-24 09:52:01 -04:00
Nico Weber
af5a7b9a51 LibPDF: Don't crash on encrypted files with streams with filter arrays
Makes it possible to render more than 0 pages of CIPA_DC-003-2020_E.pdf
2023-07-24 09:50:45 -04:00
Nico Weber
532230c0e4 LibPDF: Extract a Document::read_filters() method
No behavior change.
2023-07-24 09:50:45 -04:00
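In PDF, a stream's /Filter entry may be a single name, an array of names, or absent. A helper like the extracted one can normalize all three cases to a list. This is a sketch with `std::` types and a hypothetical name `read_filter_names`; it does not reproduce the signature of `Document::read_filters()`:

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// A /Filter value: either one filter name or an array of names.
using FilterEntry = std::variant<std::string, std::vector<std::string>>;

// Normalizes a stream's /Filter entry to a list of filter names.
// A missing entry means the stream data is not filtered at all.
std::vector<std::string> read_filter_names(std::optional<FilterEntry> const& entry)
{
    if (!entry.has_value())
        return {};
    if (auto const* name = std::get_if<std::string>(&*entry))
        return { *name };
    return std::get<std::vector<std::string>>(*entry);
}
```

Callers can then loop over the result without caring which form the file used.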
Nico Weber
ca1a98ba9f LibPDF: Replace two more crashes with messages 2023-07-23 23:05:32 -04:00
Nico Weber
29c3a9c5f0 LibPDF: Don't crash on images without /Filter
Fixes a crash rendering page 819 of ISO_32000-2-2020_sponsored.pdf
which contains an uncompressed 2x2 1bpp grayscale bitmap.
2023-07-23 23:04:55 -04:00
Nico Weber
7dfa5fc1dc LibPDF: Make JPEG decoding errors not assert
Instead, they're now turned into a diagnostic like other rendering
problems, looking like so:

    Internal error while processing PDF file:
        Unsupported chroma subsampling factors

Makes us no longer crash rendering page 1141 of pdf_reference_1-7.pdf.
2023-07-23 23:04:25 -04:00
Nico Weber
7b825fb44b LibPDF: Replace two TODO()s with Error returns
That way, we render an incomplete page and log a message instead of
crashing the viewer application.

Lets us survive e.g. page 489 of pdf_reference_1-7.pdf.
2023-07-23 11:42:44 -04:00
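The pattern of returning an error instead of aborting can be sketched as follows. The types, the function names, and the choice of `sh` as the unsupported operator are all hypothetical; LibPDF uses its own error type rather than `std::optional<std::string>`:

```cpp
#include <optional>
#include <string>
#include <vector>

struct RenderResult {
    int operators_drawn { 0 };
    std::vector<std::string> diagnostics;
};

// Returns an error message for unsupported operators instead of aborting.
std::optional<std::string> draw_operator(std::string const& op)
{
    if (op == "sh") // hypothetical example of an unsupported operator
        return "Rendering of feature not supported: " + op;
    return std::nullopt;
}

// Keeps going after an error, so the page is rendered incompletely
// and each problem is recorded for logging.
RenderResult render_page(std::vector<std::string> const& operators)
{
    RenderResult result;
    for (auto const& op : operators) {
        if (auto error = draw_operator(op))
            result.diagnostics.push_back(*error);
        else
            ++result.operators_drawn;
    }
    return result;
}
```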
Nico Weber
77e6dbab33 LibPDF: Fix symbol for text_next_line_show_string_set_spacing operator
It's `"`, not `''`.

Now the `text_next_line_show_string_set_spacing` gets called and logs
a TODO at page render time if `"` is used in a PDF:

    warning: Rendering of feature not supported:
        draw operation: text_next_line_show_string_set_spacing

It caused a parse error (also at page render time) previously:

    [parse_value @ .../LibPDF/Parser.cpp:104]
        Parser error at offset 611: Unexpected char """
2023-07-22 12:25:30 -04:00
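The distinction is between two separate PDF text operators: a single quote `'` moves to the next line and shows a string, while a double quote `"` additionally sets word and character spacing. A tiny lookup sketch; only `text_next_line_show_string_set_spacing` appears in the message above, and the other operator names and the function name are assumptions:

```cpp
#include <optional>
#include <string>
#include <string_view>

// Maps a text-showing operator symbol to a descriptive name.
// Names other than text_next_line_show_string_set_spacing are assumed.
std::optional<std::string> text_show_operator_name(std::string_view symbol)
{
    if (symbol == "Tj")
        return "text_show_string";
    if (symbol == "'")
        return "text_next_line_show_string";
    if (symbol == "\"")
        return "text_next_line_show_string_set_spacing";
    return std::nullopt;
}
```

With a table keyed on two single quotes instead of one double quote, the `"` symbol would never match, which fits the parse error shown above.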
Nico Weber
18b86b1868 LibPDF: Apply text matrix scale to character and word spacing 2023-07-22 12:24:29 -04:00
Nico Weber
e3cc05b935 LibPDF: Don't ignore word_spacing 2023-07-22 12:24:29 -04:00
Nico Weber
164c132928 LibPDF: Fix dumping of toplevel indirects
An indirect object starts `42 0 obj`, not `obj 42 0`.
2023-07-21 10:44:50 -04:00
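For reference, the header order is object number, generation number, then the `obj` keyword. A one-function sketch with a hypothetical name:

```cpp
#include <cstdint>
#include <string>

// Formats the opening line of an indirect object, e.g. "42 0 obj".
std::string indirect_object_header(uint32_t object_number, uint32_t generation)
{
    return std::to_string(object_number) + " " + std::to_string(generation) + " obj";
}
```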
Nico Weber
f956cd6e6a LibPDF: Fix an off-by-one in computing_a_hash_r6_and_later()
With this, `pdf` can print info for CIPA_DC-003-2020_E.pdf
(from https://cipa.jp/e/std/std-sec.html), as well as all other
files I've tried.

CIPA_DC-003-2020_E.pdf is special because it quits this loop after
exactly 64 iterations, at round_number 63.

While here, also update a comment to use the non-spec-comment style
I'm now using elsewhere in the file.
2023-07-21 11:55:20 +02:00
Nico Weber
f26783596d LibPDF: Implement StandardSecurityHandler::crypt for AESV3
With this, AESV3 support is complete and CIPA_DC-007-2021_E.pdf
can be opened :^)

(CIPA_DC-003-2020_E.pdf incorrectly cannot be opened yet. This
is due to a minor bug in computing_a_hash_r6_and_later() that
I'll fix a bit later. But except for this minor bug, all AESV3
files I've found so far seem to work.)
2023-07-21 11:55:20 +02:00
Nico Weber
12e77cba0a LibPDF: Move "7.6.2 General Encryption Algorithm" comment down a bit
The algorithm really only starts a bit later in the function,
so move the comment to there.
2023-07-21 11:55:20 +02:00
Nico Weber
6d0dbaf9d7 LibPDF: Extract aes helper in StandardSecurityHandler::crypt()
No behavior change, pure code move.

We'll use this for AESV3.
2023-07-21 11:55:20 +02:00
Nico Weber
9cbdb334ab LibPDF: Make try_provide_user_password() work for R6+ files
try_provide_user_password() calls compute_encryption_key_r6_and_later()
now. This checks both owner and user passwords. (For pre-R6 files,
owner password checking isn't yet implemented, as far as I can tell.)

With this, CIPA_DC-007-2021_E.pdf (or other AESV3-encrypted files)
successfully compute a file encryption key (...and then hit the
TODO() in StandardSecurityHandler::crypt() for AESV3, but it's
still good progress.)
2023-07-21 11:55:20 +02:00
Nico Weber
0428308420 LibPDF: Implement 7.6.4.3.3 Algorithm 2.A: Retrieve file encryption key
...for handlers of revision 6.

The spec for this algorithm has several quirks:

1. It describes how to authenticate a password as an owner password,
   but it redundantly inlines the description of algorithm 12 instead
   of referring to it. We just call that algorithm here.

2. It does _not_ describe how to authenticate a password as a user
   password before using the password to compute the file encryption
   key using an intermediate user key, even though the later step that
   computes the file encryption key refers to the password as
   "user password". I added a call to algorithm 11, which isn't in the
   spec, to check if the password is the user password. Maybe I'm
   misunderstanding the spec, but this looks like a spec bug to me.

3. It says "using AES-256 in ECB mode with an initialization vector
   of zero". ECB mode has no initialization vector. CBC mode with
   initialization vector of zero for message length 16 is the same
   as ECB mode though, so maybe that's meant? (In addition to the
   spec being a bit wobbly, using ECB in new software isn't
   recommended, but too late for that.)

SASLprep / stringprep still aren't implemented. For ASCII passwords
(including the important empty password), this is good enough.
2023-07-21 11:55:20 +02:00
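The claim in quirk 3 can be checked with any block cipher: CBC XORs the first block with the IV before encrypting, and XOR with zero changes nothing, so a single 16-byte block comes out the same as in ECB. The sketch below uses a toy permutation as a stand-in for AES-256, so it only illustrates the mode structure and is not real cryptography. All names are hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>

using Block = std::array<uint8_t, 16>;
using BlockCipher = std::function<Block(Block const&)>;

// ECB: the block is encrypted directly.
Block encrypt_ecb_single_block(BlockCipher const& cipher, Block const& plaintext)
{
    return cipher(plaintext);
}

// CBC: the first block is XORed with the IV, then encrypted.
Block encrypt_cbc_single_block(BlockCipher const& cipher, Block const& iv, Block const& plaintext)
{
    Block mixed {};
    for (size_t i = 0; i < mixed.size(); ++i)
        mixed[i] = static_cast<uint8_t>(plaintext[i] ^ iv[i]);
    return cipher(mixed);
}

// Toy stand-in for AES-256 so the sketch is self-contained. NOT a real cipher.
Block toy_cipher(Block const& input)
{
    Block output {};
    for (size_t i = 0; i < input.size(); ++i)
        output[i] = static_cast<uint8_t>(input[(i + 1) % input.size()] + 0x5a + i);
    return output;
}
```

Decryption mirrors this: with a zero IV, the XOR after the block decryption is also a no-op for the first block.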
Nico Weber
f8a3022ca2 LibPDF: Plumb OE, UE, Perms values to StandardSecurityHandler 2023-07-21 11:55:20 +02:00
Nico Weber
57768325cc LibPDF: Implement 7.6.4.4.11 Algorithm 12: Authenticating owner password
...for handlers of revision 6.

Since this adds U to the hash input, also trim the size of U and O to
48 bytes. The spec requires them to be 48 bytes, but all the newer PDFs
on https://cipa.jp/e/std/std-sec.html have 127 bytes -- 48 real bytes
and 79 nul padding bytes. These files were created by:

    Creator: Word 用 Acrobat PDFMaker 17
    Producer: Adobe PDF Library 15.0

and

    Creator: Word 用 Acrobat PDFMaker 17
    Producer: Adobe PDF Library 17.11.238
2023-07-21 11:55:20 +02:00
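Trimming the over-long entries can be as simple as truncating to 48 bytes before hashing. A sketch under the assumption that the padding always follows the 48 real bytes, as described above; the function and constant names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Size the spec requires for the U and O entries in revision 6 handlers.
constexpr size_t password_entry_size = 48;

// Drops trailing padding from a U or O entry that is longer than 48 bytes.
// Shorter or exact-length entries are returned unchanged.
std::vector<uint8_t> trim_password_entry(std::vector<uint8_t> entry)
{
    if (entry.size() > password_entry_size)
        entry.resize(password_entry_size);
    return entry;
}
```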
Nico Weber
8f6c67a71c LibPDF: Implement 7.6.4.4.10 Algorithm 11: Authenticating user password
...for handlers of revision 6.
2023-07-21 11:55:20 +02:00
Nico Weber
f23a394aac LibPDF: Stop using MUST in Encryption.cpp
...and use `release_value_but_fixme_should_propagate_errors()` instead,
as requested by mattco98.
2023-07-21 11:55:20 +02:00
Nico Weber
6caaffa134 LibPDF: Add a few FIXMEs to set_graphics_state_from_dict 2023-07-21 08:17:12 +02:00
Nico Weber
9283c939bb LibPDF: Include width in Type1Font glyph cache key
LibGfx's ScaledFont doesn't do this, but in ScaledFont m_x_scale and
m_y_scale are immutable once the class is created, so it can get away
with not doing it.

In Type1Font, `width` changes in different calls to
Type1Font::draw_glyph(), so we need to make it part of the cache key.

Fixes rendering of the word "Version" on the first page of
pdf_reference_1-7.pdf.
2023-07-21 07:01:09 +02:00
Matthew Olsson
5f8fd47214 LibPDF: Resize fonts when the text and line matrices change 2023-07-20 06:56:41 +01:00
Matthew Olsson
9a0e1dde42 LibPDF: Propagate errors from ColorSpace::color() 2023-07-20 06:56:41 +01:00
Matthew Olsson
e989008471 LibPDF: Use proper ICC profiles for ICCBasedColorSpace 2023-07-20 06:56:41 +01:00
Nico Weber
c4bad2186f LibPDF: Implement 7.6.4.3.4 Algorithm 2.B: Computing a hash
This is a step towards AESV3 support for PDF files.

The straightforward way of writing this with our APIs is pretty
allocation-heavy, but this code won't run all that often for the
regular "open PDF, check password" flow.
2023-07-19 21:26:55 +01:00
Nico Weber
7a48d59727 LibPDF: Simplify AESV2 code a bit
- `encrypt()` will always fill a multiple of block size,
  `decrypt()` might produce less data. But other than that,
  the middle span isn't modified even though it's a reference.
  So pass the ByteBuffer to assign() (kind of like before 5998072f15,
  but pass-by-move())

- In the encryption code path, assign a single buffer for IV and data
  instead of awkwardly copying the data around later.

Thanks to CxByte for suggesting most of this!

No intentional behavior change.
2023-07-18 18:48:57 +02:00
Lucas CHOLLET
4291288a31 LibGfx: Remove ImageDecoderPlugin::initialize()
No plugin is currently overriding the default implementation, which is a
no-op. So we can safely delete it.
2023-07-18 14:34:35 +01:00
Matthew Olsson
edd7de3c77 LibPDF: Fix incorrectly parsing subsections in xref stream
Subsections are generally not contiguous, however this logic assumed
that they were, and kept a persistent "entry_index" count while looping
through all subsections. This commit rewrites the logic to be more
straightforward; just loop through all of the subsections and handle
each one separately.
2023-07-18 00:51:23 +02:00
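The per-subsection loop can be illustrated by mapping entry positions to object numbers. A cross-reference stream's /Index array holds (first object number, count) pairs, and each pair restarts the numbering. This sketch covers only that numbering step with a hypothetical function name; it does not parse entry fields:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Returns the object number for each consecutive entry in an xref stream,
// handling every (first, count) subsection on its own.
std::vector<uint32_t> object_numbers_for_entries(std::vector<uint32_t> const& index_array)
{
    if (index_array.size() % 2 != 0)
        throw std::invalid_argument("/Index must hold (first, count) pairs");

    std::vector<uint32_t> object_numbers;
    for (size_t i = 0; i < index_array.size(); i += 2) {
        uint32_t first = index_array[i];
        uint32_t count = index_array[i + 1];
        // Numbering restarts at `first` for each subsection.
        for (uint32_t j = 0; j < count; ++j)
            object_numbers.push_back(first + j);
    }
    return object_numbers;
}
```

For an /Index of `[0 2 10 3]`, the five entries belong to objects 0, 1, 10, 11, and 12. A single running counter across subsections would wrongly number the last three as 2, 3, and 4.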