mirror of https://github.com/RGBCube/serenity synced 2025-05-25 15:55:07 +00:00

305 commits

Author SHA1 Message Date
Matthew Olsson
5f8fd47214 LibPDF: Resize fonts when the text and line matrices change 2023-07-20 06:56:41 +01:00
Matthew Olsson
9a0e1dde42 LibPDF: Propagate errors from ColorSpace::color() 2023-07-20 06:56:41 +01:00
Matthew Olsson
e989008471 LibPDF: Use proper ICC profiles for ICCBasedColorSpace 2023-07-20 06:56:41 +01:00
Nico Weber
c4bad2186f LibPDF: Implement 7.6.4.3.4 Algorithm 2.B: Computing a hash
This is a step towards AESV3 support for PDF files.

The straightforward way of writing this with our APIs is pretty
allocation-heavy, but this code won't run all that often for the
regular "open PDF, check password" flow.
2023-07-19 21:26:55 +01:00
Nico Weber
7a48d59727 LibPDF: Simplify AESV2 code a bit
- `encrypt()` will always fill a multiple of block size,
  `decrypt()` might produce less data. But other than that,
  the middle span isn't modified even though it's a reference.
  So pass the ByteBuffer to assign() (kind of like before 5998072f15,
  but pass-by-move())

- In the encryption code path, assign a single buffer for IV and data
  instead of awkwardly copying the data around later.

Thanks to CxByte for suggesting most of this!

No intentional behavior change.
2023-07-18 18:48:57 +02:00
Lucas CHOLLET
4291288a31 LibGfx: Remove ImageDecoderPlugin::initialize()
No plugin is currently overriding the default implementation, which is a
no-op. So we can safely delete it.
2023-07-18 14:34:35 +01:00
Matthew Olsson
edd7de3c77 LibPDF: Fix incorrectly parsing subsections in xref stream
Subsections are generally not contiguous, however this logic assumed
that they were, and kept a persistent "entry_index" count while looping
through all subsections. This commit rewrites the logic to be more
straightforward; just loop through all of the subsections and handle
each one separately.
2023-07-18 00:51:23 +02:00
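The per-subsection loop described above can be sketched as follows. This is a minimal illustration, not LibPDF's code: `Subsection` and `map_entries` are hypothetical names, and it assumes the PDF convention that each subsection is a (first object number, entry count) pair while the entries themselves are stored back to back in the decoded stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical type for illustration; this is not LibPDF's representation.
struct Subsection {
    uint32_t first_object_number;
    uint32_t entry_count;
};

// Returns (object number, row index in the decoded stream) pairs.
// Object numbers restart at each subsection's first object number,
// so gaps between subsections are preserved.
std::vector<std::pair<uint32_t, size_t>> map_entries(std::vector<Subsection> const& subsections)
{
    std::vector<std::pair<uint32_t, size_t>> result;
    size_t row = 0; // rows are consumed consecutively across subsections
    for (auto const& subsection : subsections) {
        for (uint32_t i = 0; i < subsection.entry_count; ++i)
            result.push_back({ subsection.first_object_number + i, row++ });
    }
    return result;
}
```

With subsections (0, 2) and (10, 1), the third row of the stream describes object 10, not object 2.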
Matthew Olsson
bfd8faedf9 LibPDF: Assert compressed xref's 2nd field is non-zero 2023-07-18 00:51:23 +02:00
Matthew Olsson
f9c1d11380 LibPDF: Do not crash when linearized length is incorrect
This is a perfectly valid situation, and in this case we should just
parse a standard non-linearized xref table.
2023-07-18 00:51:23 +02:00
Nico Weber
93b3f12680 LibPDF: Fix quadratic runtime in stream dumping
DeprecatedString::substring() makes a copy of the substring.
Instead, use a StringView, which can make substring views in constant
time.

Reduces time for `pdf --dump-contents image-based-pdf-sample.pdf` to
2.2s (from not completing for 1+ minutes).

That file contains a 221 kB jpeg.

Find it on the internet here:
https://nlsblog.org/wp-content/uploads/2020/06/image-based-pdf-sample.pdf
2023-07-14 09:50:30 -04:00
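The same copy-versus-view distinction exists in standard C++, which makes for a compact illustration. The sketch below uses `std::string_view` rather than AK's `StringView`, and `count_lines` is a hypothetical helper, not the dumping code from the commit.

```cpp
#include <cstddef>
#include <string_view>

// Counts lines by repeatedly narrowing a view onto the remaining text.
// std::string_view::substr() only adjusts a pointer and a length, so each
// step is constant time. Doing the same with std::string::substr() would
// copy the remainder on every iteration, which is quadratic overall.
size_t count_lines(std::string_view text)
{
    size_t lines = 0;
    while (!text.empty()) {
        ++lines;
        size_t newline = text.find('\n');
        if (newline == std::string_view::npos)
            break;
        text = text.substr(newline + 1); // no copy is made here
    }
    return lines;
}
```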
Nico Weber
d18f01d7d7 LibPDF: Simplify a loop
No behavior change.
2023-07-14 09:50:30 -04:00
Nico Weber
281e3158c0 LibPDF: Some preparatory work for AESV3
This detects AESV3, and copies over the spec comments explaining what
needs to be done, but doesn't actually do it yet.

AESV3 is technically PDF 2.0-only, but
https://cipa.jp/std/documents/download_e.html?CIPA_DC-007-2021_E has a
1.7 PDF that uses it.

Previously we'd claim that we need a password to decrypt it.
Now, we cleanly crash with a TODO() \o/
2023-07-14 06:34:03 +02:00
Nico Weber
ca433befa0 LibPDF: Add method to Document to dump a Page and all related objects
...except for the /Parent object, else we'd print all pages :)
2023-07-13 20:29:58 +02:00
Nico Weber
b4c5a7d1a0 LibPDF: Make Object::to_deprecated_string() look more like PDF source
- No , between array or dict elements
- `stream` goes in front of stream data, _after_ the stream dict

Also, print string contents as ASCII if the string data is mostly ASCII.
2023-07-13 20:29:58 +02:00
Nico Weber
c625ba34fe LibPDF: Implement set_flatness_tolerance
We now track it in the graphics state. It isn't used for anything yet.
Fixes the one thing that rendering the first 100 pages of
pdf_reference_1-7.pdf complains about.
2023-07-12 18:22:52 -04:00
Nico Weber
afb99a67b2 LibPDF: Tweak Page::page_contents() implementation for brevity
Also replace a FIXME with a spec comment that answers it.
2023-07-12 18:22:35 -04:00
Nico Weber
69c965b987 LibPDF: Move code to compute full page contents into Page
Pure code move, no behavior change.
2023-07-12 18:22:35 -04:00
Nico Weber
f4f8a6a1bf LibPDF: Move Page into its own file Page.h 2023-07-12 18:22:35 -04:00
Nico Weber
fe3612ebcb LibPDF: Fix off-by-one in Reader
With this, looking at page 2 of pdf_reference_1-7.pdf no longer crashes.

Why did it crash in the first place? Because due to this bug, CFF.cpp
failed to parse the font program for the font used to render the `®`
character. `Renderer::render()` adds all errors that are encountered
to an `errors` object but continues rendering. That meant that the
previous font was still active, and that didn't have a width for that
symbol in its width table.

SimpleFont::draw_string() falls back to get_glyph_width() if the width
table has no entry for a symbol. `Type1Font::get_glyph_width()`
always dereferences `m_font`, even if the font has
a font program (and m_font is hence nullptr).

With the off-by-one fixed, the second font is successfully installed
as current font, and the second font has a width entry for that symbol,
so the problem no longer occurs.
2023-07-12 14:19:14 -04:00
Nico Weber
117a5f1bd2 LibPDF: Remove an unused variable 2023-07-12 19:02:56 +02:00
Nico Weber
323d76fbb9 LibPDF: Make encrypted object streams work
There were two problems:
1. parse_compressed_object_with_index() parses indirect objects
   without going through Parser::parse_indirect_value(), so
   push_reference() / pop_reference() weren't called.
   Manually call them, both for the indirect object containing
   the object stream and for the indirect object within the
   object stream.
2. The indirect object within the object stream got decrypted
   twice: Once when the object stream data itself got decrypted,
   and then incorrectly a second time when the object data within
   the stream was read. To fix, disable encryption while parsing
   object stream data (since it's already decrypted).

The test is from http://opf-labs.org/format-corpus/pdfCabinetOfHorrors/
which according to readme.md at the same location is CC0.
2023-07-12 17:16:25 +02:00
Nico Weber
e94f1e38d0 LibPDF: Mark PDF::Error nodiscard
No behavior change. Prevents mistakes like the one fixed in 26de2fd0b2.
2023-07-12 17:03:14 +02:00
Nico Weber
5998072f15 LibPDF: Add support for AESV2 encryption 2023-07-12 06:28:15 +02:00
Nico Weber
67d8c8badb LibPDF: Use more direct method to access linearization dict
We know indirect_value_or_error.value contains an IndirectObject,
so there's no need to go through resolve().

No behavior change.
2023-07-12 06:28:15 +02:00
Nico Weber
39b2eed3f6 LibPDF: Do not crash on encrypted files that start unluckily
PDF files can be linearized. In that case, they start with a
"linearization dict" that stores the key `/Linearized` and the value
`1`. To check if a file is linearized, we just read the first dict, and
then checked if it has that key.

If the first object of a PDF was a stream with a compression filter
and the input PDF was encrypted and not linearized, then us trying to
decode the linearization dict could crash due to stream contents being
encrypted, decryption state not yet being initialized, and us trying
to decompress stream data before decrypting it.

To prevent this, disable decompression when parsing the first object
to determine if it's a linearization dictionary.

(A linearization dict never stores string values, so decryption
not yet being initialized is not a problem. Integer values aren't
encrypted in encrypted PDF files.)
2023-07-12 06:28:15 +02:00
Nico Weber
63670f27de LibPDF: Rename m_disable_encryption to m_enable_encryption
Double negation is confusing.

No behavior change.
2023-07-12 06:28:15 +02:00
Nico Weber
92d2895057 LibPDF: Remove a pointless template specialization
We can just have two functions with actual names instead of specializing
on a bool template parameter.

No behavior change.
2023-07-12 06:28:15 +02:00
Nico Weber
ea89053c12 LibPDF: Make PDF version accessible on Document 2023-07-11 13:49:17 -04:00
MacDue
e1cf868e6e LibGfx: Use AntiAliasingPainter::fill_path() for drawing font glyphs
Using the general AA painter fill_path() is indistinguishable from the
previous rasterizer, so this switch simply allows us to share more code.
2023-07-10 20:56:25 +02:00
Nico Weber
c5c940b1c9 LibPDF: Add accessor for the document's info dict
This dict contains some metadata in some files.

Newer files also contain XMP metadata, but it's recommended to
still include this dict as well, for compatibility with older readers.
And it's much less complex than XMP, so let's support it.
2023-07-10 17:49:07 +01:00
Nico Weber
826c0426f3 LibPDF: Fix two use-after-frees
Two lambdas were capturing locals that were out of scope by the
time the lambdas ran.

With this, `pdf` can successfully load and print the page count of
pdf_reference_1.7.pdf.
2023-07-10 17:48:15 +01:00
Nico Weber
6111a9f9d0 LibPDF: Make Reference store two u32s instead of one
Reference used to be clever and stored the index of a ref in 18 bits
and the generation in 14 bits, so that both fit into a single u32.

However:
- It set MAX_REF_INDEX incorrectly (the max value of an 18-bit number
  is `(1 << 18) - 1`, not `(1 << 19) - 1`)
- pdf_reference_1-7.pdf has 349223 objects, and that's larger
  than `(1 << 18) - 1` (which is 262143)

Since a Reference is stored in Value which is a Variant that also
stores a pointer, the size of Value is already 64-bit. So just don't
be clever here.

Makes pdf_reference_1-7.pdf get a bit further during decryption.
2023-07-10 17:48:15 +01:00
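The arithmetic in this message can be checked at compile time. In the sketch below, `max_value_for_bits` and the two-field `Reference` struct are illustrative stand-ins, not LibPDF's actual definitions; the object count is the one quoted in the commit message.

```cpp
#include <cstdint>

// Largest value representable in the given number of bits (bits < 32).
constexpr uint32_t max_value_for_bits(unsigned bits)
{
    return (uint32_t { 1 } << bits) - 1;
}

static_assert(max_value_for_bits(18) == 262143); // the real 18-bit limit
static_assert(max_value_for_bits(19) == 524287); // what the old constant computed
static_assert(349223 > max_value_for_bits(18));  // object count does not fit in 18 bits

// Illustrative unpacked layout: one full u32 per field.
struct Reference {
    uint32_t index;
    uint32_t generation;
};

// Two u32s occupy 64 bits, the size the message says Value already has.
static_assert(sizeof(Reference) == 8);
```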
Timothy Flynn
c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Nico Weber
93357a8b70 LibPDF: Fix a typo in a function name
...and while here, a comment typo too.
2023-07-05 18:42:39 +01:00
Ben Wiederhake
f866c80222 LibPDF: Avoid unnecessary HashMap copy, mark other copies 2023-05-19 22:33:57 +02:00
Ben Wiederhake
da394abe04 LibGfx+Fuzz: Convert ImageDecoder::initialize to ErrorOr
This prevents callers from accidentally discarding the result of
initialize(), which was the root cause of this OSS Fuzz bug:

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=55896&q=label%3AProj-serenity&sort=summary
2023-05-12 09:40:24 +01:00
Nico Weber
f56b897622 Everywhere: Fix a few typos
Some even user-visible!
2023-04-12 19:37:35 +02:00
Ben Wiederhake
560133a0c6 Everywhere: Remove unused DeprecatedString includes 2023-04-09 22:00:54 +02:00
Julian Offenhäuser
bdd5f36121 LibPDF: Load replacements for TrueTypeFonts without an embedded font
This previously only happened for Type 1 fonts.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
5deac3a7f5 LibPDF: Actually return an error when failing to load replacement fonts 2023-03-25 16:27:30 -06:00
Julian Offenhäuser
fec7ccf020 LibPDF: Ask OpenType font programs for glyph widths if needed
If the font dictionary didn't specify custom glyph widths, we would fall
back to the specified "missing width" (or 0 in most cases!), which meant
that we would draw glyphs on top of each other in a lot of cases, namely
for TrueTypeFonts or standard Type1Fonts with an OpenType fallback.

What we actually want to do in this case is ask the OpenType font for
the correct width.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
2b3a41be74 LibPDF: Remove the subroutine length limit for PS1 font programs
A limit of 1024 subroutines seemed like a sensible choice, but some
fonts actually do exceed it. We will now only assert that the specified
amount is positive.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
4ec01669fc LibPDF: Scale vector paths with the view
This ensures that lines have the correct size at every scale factor.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
731676c041 LibPDF: Accept floats as line dash pattern phases 2023-03-25 16:27:30 -06:00
Julian Offenhäuser
95a804bc4e LibPDF: Allow the page rotation to be inherited 2023-03-25 16:27:30 -06:00
Julian Offenhäuser
b90a794d78 LibPDF: Allow pages with no specified contents
The contents object may be omitted as per spec, which will just leave
the page blank.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
fde990ead8 LibPDF: Allow optional inheritable page attributes
Previously, get_inheritable_object would always try to find the object
and throw an error if it couldn't. The spec tells us that some page
attributes, like CropBox, are optional but also inheritable. Others,
like the media box and resources, are technically required by the spec,
but omitted by some documents.

In both cases, we are now able to search for inheritable objects and
find a suitable replacement if there wasn't one.
2023-03-25 16:27:30 -06:00
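An optional lookup of this kind can be sketched as a walk up the page tree that reports absence instead of an error. `PageTreeNode` and `find_inheritable` are hypothetical names for illustration, and attribute values are plain strings here; LibPDF's `get_inheritable_object` is not reproduced.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical page tree node: its own attributes plus a link to its parent.
struct PageTreeNode {
    std::map<std::string, std::string> attributes;
    PageTreeNode const* parent { nullptr };
};

// Searches the node and then its ancestors for the key. Returns an empty
// optional when nothing is found, leaving the fallback choice to the caller.
std::optional<std::string> find_inheritable(PageTreeNode const& node, std::string const& key)
{
    for (auto const* current = &node; current; current = current->parent) {
        auto it = current->attributes.find(key);
        if (it != current->attributes.end())
            return it->second;
    }
    return std::nullopt;
}
```

A caller could then use `find_inheritable(page, "CropBox").value_or(...)` to substitute a replacement when the attribute is missing everywhere.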
Julian Offenhäuser
320f5f91ab LibPDF: Ignore whitespace in the ASCII hex filter
The spec tells us that any amount of whitespace may appear between the
hex digits and that it should just be ignored.
2023-03-25 16:27:30 -06:00
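A whitespace-tolerant hex decoder along these lines might look like the sketch below. `decode_ascii_hex` is a hypothetical function, not LibPDF's filter; it assumes `>` ends the data and that a trailing lone digit is padded with a zero nibble, and it uses `std::isspace` as an approximation of PDF's whitespace set.

```cpp
#include <cctype>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Decodes hex digit pairs into bytes, skipping whitespace anywhere.
// Returns an empty optional on any other unexpected character.
std::optional<std::vector<uint8_t>> decode_ascii_hex(std::string_view input)
{
    std::vector<uint8_t> output;
    int pending = -1; // high nibble waiting for its partner, or -1
    for (unsigned char c : input) {
        if (c == '>')
            break; // end-of-data marker
        if (std::isspace(c))
            continue; // whitespace between digits is ignored
        if (!std::isxdigit(c))
            return std::nullopt;
        int nibble = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        if (pending < 0) {
            pending = nibble;
        } else {
            output.push_back(static_cast<uint8_t>((pending << 4) | nibble));
            pending = -1;
        }
    }
    if (pending >= 0)
        output.push_back(static_cast<uint8_t>(pending << 4)); // pad odd digit with 0
    return output;
}
```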
Julian Offenhäuser
3400779047 LibPDF: Pass the right point width to the font loader in TrueTypeFont 2023-03-22 09:04:00 +01:00
Julian Offenhäuser
fd78875662 LibPDF: Fix navigate_to_before_eof_marker() for PDFs not ending in EOL
The way this was factored before, we would miss the %%EOF marker if it
didn't have a valid end-of-line sequence after it.
2023-03-22 09:04:00 +01:00