serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-24 02:52:32 +00:00

Author	SHA1	Message	Date
Matthew Olsson	5f8fd47214	LibPDF: Resize fonts when the text and line matrices change	2023-07-20 06:56:41 +01:00
Matthew Olsson	9a0e1dde42	LibPDF: Propagate errors from ColorSpace::color()	2023-07-20 06:56:41 +01:00
Matthew Olsson	e989008471	LibPDF: Use proper ICC profiles for ICCBasedColorSpace	2023-07-20 06:56:41 +01:00
Nico Weber	c4bad2186f	LibPDF: Implement 7.6.4.3.4 Algorithm 2.B: Computing a hash This is a step towards AESV3 support for PDF files. The straight-forward way of writing this with our APIs is pretty allocation-heavy, but this code won't run all that often for the regular "open PDF, check password" flow.	2023-07-19 21:26:55 +01:00
Nico Weber	7a48d59727	LibPDF: Simplify AESV2 code a bit - `encrypt()` will always fill a multiple of block size, `decrypt()` might produce less data. But other than that, the middle span isn't modified even though it's a reference. So pass the ByteBuffer to assign() (kind of like before `5998072f15`, but pass-by-move()) - In the encryption code path, assign a single buffer for IV and data instead of awkwardly copying the data around later. Thanks to CxByte for suggesting most of this! No intentional behavior change.	2023-07-18 18:48:57 +02:00
Lucas CHOLLET	4291288a31	LibGfx: Remove `ImageDecoderPlugin::initialize()` No plugin is currently overriding the default implementation, which is a no-op. So we can safely delete it.	2023-07-18 14:34:35 +01:00
Matthew Olsson	edd7de3c77	LibPDF: Fix incorrectly parsing subsections in xref stream Subsections are generally not contiguous, however this logic assumed that they were, and kept a persistent "entry_index" count while looping through all subsections. This commit rewrites the logic to be more straightforward; just loop through all of the subsections and handle each one separately.	2023-07-18 00:51:23 +02:00
Matthew Olsson	bfd8faedf9	LibPDF: Assert compressed xref's 2nd field is non-zero	2023-07-18 00:51:23 +02:00
Matthew Olsson	f9c1d11380	LibPDF: Do not crash when linearized length is incorrect This is a perfectly valid situation, and in this case we should just parse a standard non-linearized xref table.	2023-07-18 00:51:23 +02:00
Nico Weber	93b3f12680	LibPDF: Fix quadratic runtime in stream dumping DeprecatedString::substring() makes a copy of the substring. Instead, use a StringView, which can make substring views in constant time. Reduces time for `pdf --dump-contents image-based-pdf-sample.pdf` to 2.2s (from not completing for 1+ minutes). That file contains a 221 kB jpeg. Find it on the internet here: https://nlsblog.org/wp-content/uploads/2020/06/image-based-pdf-sample.pdf	2023-07-14 09:50:30 -04:00
Nico Weber	d18f01d7d7	LibPDF: Simplify a loop No behavior change.	2023-07-14 09:50:30 -04:00
Nico Weber	281e3158c0	LibPDF: Some preparatory work for AESV3 This detects AESV3, and copies over the spec comments explaining what needs to be done, but doesn't actually do it yet. AESV3 is technically PDF 2.0-only, but https://cipa.jp/std/documents/download_e.html?CIPA_DC-007-2021_E has a 1.7 PDF that uses it. Previously we'd claim that we need a password to decrypt it. Now, we cleanly crash with a TODO() \o/	2023-07-14 06:34:03 +02:00
Nico Weber	ca433befa0	LibPDF: Add method to Document to dump a Page and all related objects ...except for the /Parent object, else we'd print all pages :)	2023-07-13 20:29:58 +02:00
Nico Weber	b4c5a7d1a0	LibPDF: Make Object::to_deprecated_string() look more like PDF source - No , between array or dict elements - `stream` goes in front of stream data, _after_ the stream dict Also, print string contents as ASCII if the string data is mostly ASCII.	2023-07-13 20:29:58 +02:00
Nico Weber	c625ba34fe	LibPDF: Implement set_flatness_tolerance We now track it in the graphics state. It isn't used for anything yet. Fixes the one thing that rendering the first 100 pages of pdf_reference_1-7.pdf complains about.	2023-07-12 18:22:52 -04:00
Nico Weber	afb99a67b2	LibPDF: Tweak Page::page_contents() implementation for brevity Also replace a FIXME with a spec comment that answers it.	2023-07-12 18:22:35 -04:00
Nico Weber	69c965b987	LibPDF: Move code to compute full page contents into Page Pure code move, no behavior change.	2023-07-12 18:22:35 -04:00
Nico Weber	f4f8a6a1bf	LibPDF: Move Page into its own file Page.h	2023-07-12 18:22:35 -04:00
Nico Weber	fe3612ebcb	LibPDF: Fix off-by-one in Reader With this, looking at page 2 of pdf_reference_1-7.pdf no longer crashes. Why did it crash in the first place? Because due to this bug, CFF.cpp failed to parse the font program for the font used to render the `®` character. `Renderer::render()` adds all errors that are encountered to an `errors` object but continues rendering. That meant that the previous font was still active, and that didn't have a width for that symbol in its width table. SimpleFont::draw_string() falls back to get_glyph_width() if there's no width entry for a symbol. `Type1Font::get_glyph_width()` always dereferences `m_font` in that method, even if the font has a font program (and m_font is hence nullptr). With the off-by-one fixed, the second font is successfully installed as current font, and the second font has a width entry for that symbol, so the problem no longer occurs.	2023-07-12 14:19:14 -04:00
Nico Weber	117a5f1bd2	LibPDF: Remove an unused variable	2023-07-12 19:02:56 +02:00
Nico Weber	323d76fbb9	LibPDF: Make encrypted object streams work There were two problems: 1. parse_compressed_object_with_index() parses indirect objects without going through Parser::parse_indirect_value(), so push_reference() / pop_reference() weren't called. Manually call them, both for the indirect object containing the object stream and for the indirect object within the object stream. 2. The indirect object within the object stream got decrypted twice: Once when the object stream data itself got decrypted, and then incorrectly a second time when the object data within the stream was read. To fix, disable encryption while parsing object stream data (since it's already decrypted). The test is from http://opf-labs.org/format-corpus/pdfCabinetOfHorrors/ which according to readme.md at the same location is CC0.	2023-07-12 17:16:25 +02:00
Nico Weber	e94f1e38d0	LibPDF: Mark PDF::Error nodiscard No behavior change. Prevents mistakes like the one fixed in 26de2fd0b2.	2023-07-12 17:03:14 +02:00
Nico Weber	5998072f15	LibPDF: Add support for AESV2 encryption	2023-07-12 06:28:15 +02:00
Nico Weber	67d8c8badb	LibPDF: Use more direct method to access linearization dict We know indirect_value_or_error.value contains an IndirectObject, so there's no need to go through resolve(). No behavior change.	2023-07-12 06:28:15 +02:00
Nico Weber	39b2eed3f6	LibPDF: Do not crash on encrypted files that start unluckily PDF files can be linearized. In that case, they start with a "linearization dict" that stores the key `/Linearized` and the value `1`. To check if a file is linearized, we just read the first dict, and then checked if it has that key. If the first object of a PDF was a stream with a compression filter and the input PDF was encrypted and not linearized, then us trying to decode the linearization dict could crash due to stream contents being encrypted, decryption state not yet being initialized, and us trying to decompress stream data before decrypting it. To prevent this, disable uncompression when parsing the first object to determine if it's a linearization dictionary. (A linearization dict never stores string values, so decryption not yet being initialized is not a problem. Integer values aren't encrypted in encrypted PDF files.)	2023-07-12 06:28:15 +02:00
Nico Weber	63670f27de	LibPDF: Rename m_disable_encryption to m_enable_encryption Double negation is confusing. No behavior change.	2023-07-12 06:28:15 +02:00
Nico Weber	92d2895057	LibPDF: Remove a pointless template specialization We can just have two functions with actual names instead of specializing on a bool template parameter. No behavior change.	2023-07-12 06:28:15 +02:00
Nico Weber	ea89053c12	LibPDF: Make PDF version accessible on Document	2023-07-11 13:49:17 -04:00
MacDue	e1cf868e6e	LibGfx: Use AntiAliasingPainter::fill_path() for drawing font glyphs Using the general AA painter fill_path() is indistinguishable from the previous rasterizer, so this switch simply allows us to share more code.	2023-07-10 20:56:25 +02:00
Nico Weber	c5c940b1c9	LibPDF: Add accessor for the document's info dict This dict contains some metadata in some files. Newer files also contain XMP metadata, but it's recommended to still include this dict as well, for compatibility with older readers. And it's much less complex than XMP, so let's support it.	2023-07-10 17:49:07 +01:00
Nico Weber	826c0426f3	LibPDF: Fix two use-after-frees Two lambdas were capturing locals that were out of scope by the time the lambdas ran. With this, `pdf` can successfully load and print the page count of pdf_reference_1.7.pdf.	2023-07-10 17:48:15 +01:00
Nico Weber	6111a9f9d0	LibPDF: Make Reference store two u32s instead of one Reference used to be clever and stored the index of a ref in 18 bits and the generation in 14 bits, so that both fit into a single u32. However: - It set MAX_REF_INDEX incorrectly (the max value of an 18-bit number is `(1 << 18) - 1`, not `(1 << 19) - 1` - pdf_reference_1-7.pdf has 349223 objects, and that's larger than `(1 << 18) - 1` (which is 262143) Since a Reference is stored in Value which is a Variant that also stores a pointer, the size of Value is already 64-bit. So just don't be clever here. Makes pdf_reference_1-7.pdf get a bit further during decryption.	2023-07-10 17:48:15 +01:00
Timothy Flynn	c911781c21	Everywhere: Remove needless trailing semi-colons after functions This is a new option in clang-format-16.	2023-07-08 10:32:56 +01:00
Nico Weber	93357a8b70	LibPDF: Fix a typo in a function name ...and while here, a comment typo too.	2023-07-05 18:42:39 +01:00
Ben Wiederhake	f866c80222	LibPDF: Avoid unnecessary HashMap copy, mark other copies	2023-05-19 22:33:57 +02:00
Ben Wiederhake	da394abe04	LibGfx+Fuzz: Convert ImageDecoder::initialize to ErrorOr This prevents callers from accidentally discarding the result of initialize(), which was the root cause of this OSS Fuzz bug: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=55896&q=label%3AProj-serenity&sort=summary	2023-05-12 09:40:24 +01:00
Nico Weber	f56b897622	Everywhere: Fix a few typos Some even user-visible!	2023-04-12 19:37:35 +02:00
Ben Wiederhake	560133a0c6	Everywhere: Remove unused DeprecatedString includes	2023-04-09 22:00:54 +02:00
Julian Offenhäuser	bdd5f36121	LibPDF: Load replacements for TrueTypeFonts without an embedded font This previously only happened for Type 1 fonts.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	5deac3a7f5	LibPDF: Actually return an error when failing to load replacement fonts	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	fec7ccf020	LibPDF: Ask OpenType font programs for glyph widths if needed If the font dictionary didn't specify custom glyph widths, we would fall back to the specified "missing width" (or 0 in most cases!), which meant that we would draw glyphs on top of each other in a lot of cases, namely for TrueTypeFonts or standard Type1Fonts with an OpenType fallback. What we actually want to do in this case is ask the OpenType font for the correct width.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	2b3a41be74	LibPDF: Remove the subroutine length limit for PS1 font programs A limit of 1024 subroutines seemed like a sensible choice, but some fonts actually do exceed it. We will now only assert that the specified amount is positive.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	4ec01669fc	LibPDF: Scale vector paths with the view This ensures that lines have the correct size at every scale factor.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	731676c041	LibPDF: Accept floats as line dash pattern phases	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	95a804bc4e	LibPDF: Allow the page rotation to be inherited	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	b90a794d78	LibPDF: Allow pages with no specified contents The contents object may be omitted as per spec, which will just leave the page blank.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	fde990ead8	LibPDF: Allow optional inheritable page attributes Previously, get_inheritable_object would always try to find the object and throw an error if it couldn't. The spec tells us that some page attributes, like CropBox, are optional but also inheritable. Others, like the media box and resources, are technically required by the spec, but omitted by some documents. In both cases, we are now able to search for inheritable objects and find a suitable replacement if there wasn't one.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	320f5f91ab	LibPDF: Ignore whitespace in the ASCII hex filter The spec tells us that any amount of whitespace may appear between the hex digits and that it should just be ignored.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	3400779047	LibPDF: Pass the right point width to the font loader in TrueTypeFont	2023-03-22 09:04:00 +01:00
Julian Offenhäuser	fd78875662	LibPDF: Fix navigate_to_before_eof_marker() for PDFs not ending in EOL The way this was factored before, we would miss the %%EOF marker if it didn't have a valid end-of-line sequence after it.	2023-03-22 09:04:00 +01:00
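
The bit-width arithmetic described in commit 6111a9f9d0 can be checked with a small standalone sketch. It uses only standard C++. All names (`pack`, `packed_index`, `WideReference`, the constants) are hypothetical, and the choice to place the index in the upper bits is an assumption; none of this is LibPDF's actual code.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not LibPDF's declarations.
constexpr uint32_t index_bits = 18;
constexpr uint32_t generation_bits = 14;

// Largest value an 18-bit field can hold, as stated in the commit message.
constexpr uint32_t max_18_bit_index = (1u << index_bits) - 1;
// The limit the commit message describes as incorrect.
constexpr uint32_t mistaken_limit = (1u << 19) - 1;
// Object count of pdf_reference_1-7.pdf quoted in the commit message.
constexpr uint32_t object_count = 349223;

static_assert(index_bits + generation_bits == 32);
static_assert(max_18_bit_index == 262143);
static_assert(mistaken_limit == 524287);
// The document has more objects than 18 bits can index...
static_assert(object_count > max_18_bit_index);
// ...yet a check against the mistaken limit would not notice.
static_assert(object_count <= mistaken_limit);

// Assumed layout: index in the upper 18 bits, generation in the lower 14.
constexpr uint32_t pack(uint32_t index, uint32_t generation)
{
    return (index << generation_bits) | (generation & ((1u << generation_bits) - 1));
}

constexpr uint32_t packed_index(uint32_t packed)
{
    return packed >> generation_bits;
}

constexpr uint32_t packed_generation(uint32_t packed)
{
    return packed & ((1u << generation_bits) - 1);
}

// The "don't be clever" alternative: two full-width fields.
struct WideReference {
    uint32_t index;
    uint32_t generation;
};
static_assert(sizeof(WideReference) == 8);
```

Under this assumed layout, an index that fits in 18 bits round-trips through `pack`, while index 349223 silently loses its top bit and comes back as 87079 (349223 - 262144). Two `u32` fields avoid the truncation entirely.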


305 commits
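
The quadratic behavior fixed in commit 93b3f12680 comes from copying the remainder of a buffer on every step. A rough sketch of the same pattern follows, using standard `std::string` and `std::string_view` as stand-ins for `DeprecatedString` and `StringView`. The function names and the space-separated tokenizing task are made up for illustration; the real stream-dumping code is not shown here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Copying variant: every substr() allocates and copies the rest of the
// input, so consuming n bytes costs O(n^2) in total.
std::size_t count_tokens_copying(std::string input)
{
    std::size_t count = 0;
    while (!input.empty()) {
        std::size_t end = input.find(' ');
        if (end == std::string::npos)
            end = input.size();
        if (end > 0)
            ++count;
        // Skip the token and the single space after it, if any.
        std::size_t next = end < input.size() ? end + 1 : end;
        input = input.substr(next); // copies the whole remainder
    }
    return count;
}

// View variant: substr() on a string_view only adjusts a pointer and a
// length, so each step is O(1) apart from the find().
std::size_t count_tokens_view(std::string_view input)
{
    std::size_t count = 0;
    while (!input.empty()) {
        std::size_t end = input.find(' ');
        if (end == std::string_view::npos)
            end = input.size();
        if (end > 0)
            ++count;
        std::size_t next = end < input.size() ? end + 1 : end;
        input = input.substr(next); // constant time, no copy
    }
    return count;
}
```

Both functions return the same counts; only the cost of advancing through the input differs, which is the distinction the commit message draws.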