serenity

mirror of https://github.com/RGBCube/serenity synced 2025-07-05 22:47:35 +00:00

Author	SHA1	Message	Date
MacDue	6088374ad2	LibPDF: Ensure all subpaths are closed before filling paths This lets us correctly draw figure 3.4 in pdf_reference_1-7.pdf.	2023-07-25 13:42:40 +02:00
Nico Weber	715b6f868f	LibPDF: Sketch out Type0 font support some more Type0 fonts can be either CFF-based or TrueType-based. Create a subclass for each, put in some spec text, and give each case a dedicated error code, so that `--debugging-stats` can tell me which branch is more common.	2023-07-25 12:10:36 +02:00
Nico Weber	5aab31dc40	LibPDF: Dedicated messages for Indexed and Pattern spaces Makes them easier to interpret in `pdf --debugging-stats` output.	2023-07-24 11:01:25 -04:00
Nico Weber	fad834a21c	LibPDF: Add smoke-and-mirror implementation of SeparationColorSpace None of the methods actually do anything, but we now create an actual SeparationColorSpace object for /Separation color spaces. This fixes a crash on page 810 of pdf_reference_1-7.pdf. Previously, we'd log a "separation color space not supported" error, which would lead to Renderer not updating its current color space. It'd stay a DeviceCMYK color space, which would then later assert when it got a 1-argument array as color (which now the SeparationColorSpace gets instead, which logs an "unimplemented" error for that instead of asserting).	2023-07-24 09:52:01 -04:00
Nico Weber	af5a7b9a51	LibPDF: Don't crash on encrypted files with streams with filter arrays Makes it possible to render more than 0 pages of CIPA_DC-003-2020_E.pdf	2023-07-24 09:50:45 -04:00
Nico Weber	532230c0e4	LibPDF: Extract a Document::read_filters() method No behavior change.	2023-07-24 09:50:45 -04:00
Nico Weber	ca1a98ba9f	LibPDF: Replace two more crashes with messages	2023-07-23 23:05:32 -04:00
Nico Weber	29c3a9c5f0	LibPDF: Don't crash on images without /Filter Fixes a crash rendering page 819 of ISO_32000-2-2020_sponsored.pdf which contains an uncompressed 2x2 1bpp grayscale bitmap.	2023-07-23 23:04:55 -04:00
Nico Weber	7dfa5fc1dc	LibPDF: Make JPEG decoding errors not assert Instead, they're now turned into a diagnostic like other rendering problems, looking like so: Internal error while processing PDF file: Unsupported chroma subsampling factors Makes us no longer crash rendering page 1141 of pdf_reference_1-7.pdf.	2023-07-23 23:04:25 -04:00
Nico Weber	7b825fb44b	LibPDF: Replace two TODO()s with Error returns That way, we render an incomplete page and log a message instead of crashing the viewer application. Lets us survive e.g. page 489 of pdf_reference_1-7.pdf.	2023-07-23 11:42:44 -04:00
Nico Weber	77e6dbab33	LibPDF: Fix symbol for text_next_line_show_string_set_spacing operator It's `"`, not `''`. Now the `text_next_line_show_string_set_spacing` gets called and logs a TODO at page render time if `"` is used in a PDF: warning: Rendering of feature not supported: draw operation: text_next_line_show_string_set_spacing It caused a parse error (also at page render time) previously: [parse_value @ .../LibPDF/Parser.cpp:104] Parser error at offset 611: Unexpected char """	2023-07-22 12:25:30 -04:00
Nico Weber	18b86b1868	LibPDF: Apply text matrix scale to character and word spacing	2023-07-22 12:24:29 -04:00
Nico Weber	e3cc05b935	LibPDF: Don't ignore word_spacing	2023-07-22 12:24:29 -04:00
Nico Weber	164c132928	LibPDF: Fix dumping of toplevel indirects An indirect object starts `42 0 obj`, not `obj 42 0`.	2023-07-21 10:44:50 -04:00
Nico Weber	f956cd6e6a	LibPDF: Fix an off-by-one in computing_a_hash_r6_and_later() With this, `pdf` can print info for CIPA_DC-003-2020_E.pdf (from https://cipa.jp/e/std/std-sec.html), as well as all other files I've tried. CIPA_DC-003-2020_E.pdf is special because it quits this loop after exactly 64 iterations, at round_number 63. While here, also update a comment to use the non-spec-comment style I'm now using elsewhere in the file.	2023-07-21 11:55:20 +02:00
Nico Weber	f26783596d	LibPDF: Implement StandardSecurityHandler::crypt for AESV3 With this, AESV3 support is complete and CIPA_DC-007-2021_E.pdf can be opened :^) (CIPA_DC-003-2020_E.pdf incorrectly cannot be opened yet. This is due to a minor bug in computing_a_hash_r6_and_later() that I'll fix a bit later. But except for this minor bug, all AESV3 files I've found so far seem to work.)	2023-07-21 11:55:20 +02:00
Nico Weber	12e77cba0a	LibPDF: Move "7.6.2 General Encryption Algorithm" comment down a bit The algorithm really only starts a bit later in the function, so move the comment to there.	2023-07-21 11:55:20 +02:00
Nico Weber	6d0dbaf9d7	LibPDF: Extract aes helper in StandardSecurityHandler::crypt() No behavior change, pure code move. We'll use this for AESV3.	2023-07-21 11:55:20 +02:00
Nico Weber	9cbdb334ab	LibPDF: Make try_provide_user_password() work for R6+ files try_provide_user_password() calls compute_encryption_key_r6_and_later() now. This checks both owner and user passwords. (For pre-R6 files, owner password checking isn't yet implemented, as far as I can tell.) With this, CIPA_DC-007-2021_E.pdf (or other AESV3-encrypted files) successfully compute a file encryption key (...and then hit the TODO() in StandardSecurityHandler::crypt() for AESV3, but it's still good progress.)	2023-07-21 11:55:20 +02:00
Nico Weber	0428308420	LibPDF: Implement 7.6.4.3.3 Algorithm 2.A: Retrieve file encryption key ...for handlers of revision 6. The spec for this algorithm has several quirks: 1. It describes how to authenticate a password as an owner password, but it redundantly inlines the description of algorithm 12 instead of referring to it. We just call that algorithm here. 2. It does _not_ describe how to authenticate a password as a user password before using the password to compute the file encryption key using an intermediate user key, even though the later step that computes the file encryption key refers to the password as "user password". I added a call to algorithm 11 to check if the password is the user password that isn't in the spec. Maybe I'm misunderstanding the spec, but this looks like a spec bug to me. 3. It says "using AES-256 in ECB mode with an initialization vector of zero". ECB mode has no initialization vector. CBC mode with initialization vector of zero for message length 16 is the same as ECB mode though, so maybe that's meant? (In addition to the spec being a bit wobbly, using ECB in new software isn't recommended, but too late for that.) SASLprep / stringprep still aren't implemented. For ASCII passwords (including the important empty password), this is good enough.	2023-07-21 11:55:20 +02:00
Nico Weber	f8a3022ca2	LibPDF: Plumb OE, UE, Perms values to StandardSecurityHandler	2023-07-21 11:55:20 +02:00
Nico Weber	57768325cc	LibPDF: Implement 7.6.4.4.11 Algorithm 12: Authenticating owner password ...for handlers of revision 6. Since this adds U to the hash input, also trim the size of U and O to 48 bytes. The spec requires them to be 48 bytes, but all the newer PDFs on https://cipa.jp/e/std/std-sec.html have 127 bytes -- 48 real bytes and 79 nul padding bytes. These files were created by: Creator: Word 用 Acrobat PDFMaker 17 Producer: Adobe PDF Library 15.0 and Creator: Word 用 Acrobat PDFMaker 17 Producer: Adobe PDF Library 17.11.238	2023-07-21 11:55:20 +02:00
Nico Weber	8f6c67a71c	LibPDF: Implement 7.6.4.4.10 Algorithm 11: Authenticating user password ...for handlers of revision 6.	2023-07-21 11:55:20 +02:00
Nico Weber	f23a394aac	LibPDF: Stop using MUST in Encryption.cpp ...and use `release_value_but_fixme_should_propagate_errors()` instead, as requested by mattco98.	2023-07-21 11:55:20 +02:00
Nico Weber	6caaffa134	LibPDF: Add a few FIXMEs to set_graphics_state_from_dict	2023-07-21 08:17:12 +02:00
Nico Weber	9283c939bb	LibPDF: Include `width` in Type1Font glyph cache key LibGfx's ScaledFont doesn't do this, but in ScaledFont m_x_scale and m_y_scale are immutable once the class is created, so it can get away with not doing it. In Type1Font, `width` changes in different calls to Type1Font::draw_glyph(), so we need to make it part of the cache key. Fixes rendering of the word "Version" on the first page of pdf_reference_1-7.pdf.	2023-07-21 07:01:09 +02:00
Matthew Olsson	5f8fd47214	LibPDF: Resize fonts when the text and line matrices change	2023-07-20 06:56:41 +01:00
Matthew Olsson	9a0e1dde42	LibPDF: Propagate errors from ColorSpace::color()	2023-07-20 06:56:41 +01:00
Matthew Olsson	e989008471	LibPDF: Use proper ICC profiles for ICCBasedColorSpace	2023-07-20 06:56:41 +01:00
Nico Weber	c4bad2186f	LibPDF: Implement 7.6.4.3.4 Algorithm 2.B: Computing a hash This is a step towards AESV3 support for PDF files. The straight-forward way of writing this with our APIs is pretty allocation-heavy, but this code won't run all that often for the regular "open PDF, check password" flow.	2023-07-19 21:26:55 +01:00
Nico Weber	7a48d59727	LibPDF: Simplify AESV2 code a bit - `encrypt()` will always fill a multiple of block size, `decrypt()` might produce less data. But other than that, the middle span isn't modified even though it's a reference. So pass the ByteBuffer to assign() (kind of like before `5998072f15`, but pass-by-move()) - In the encryption code path, assign a single buffer for IV and data instead of awkwardly copying the data around later. Thanks to CxByte for suggesting most of this! No intentional behavior change.	2023-07-18 18:48:57 +02:00
Lucas CHOLLET	4291288a31	LibGfx: Remove `ImageDecoderPlugin::initialize()` No plugin is currently overriding the default implementation, which is a no-op. So we can safely delete it.	2023-07-18 14:34:35 +01:00
Matthew Olsson	edd7de3c77	LibPDF: Fix incorrectly parsing subsections in xref stream Subsections are generally not contiguous, however this logic assumed that they were, and kept a persistent "entry_index" count while looping through all subsections. This commit rewrites the logic to be more straightforward; just loop through all of the subsections and handle each one separately.	2023-07-18 00:51:23 +02:00
Matthew Olsson	bfd8faedf9	LibPDF: Assert compressed xref's 2nd field is non-zero	2023-07-18 00:51:23 +02:00
Matthew Olsson	f9c1d11380	LibPDF: Do not crash when linearized length is incorrect This is a perfectly valid situation, and in this case we should just parse a standard non-linearized xref table.	2023-07-18 00:51:23 +02:00
Nico Weber	93b3f12680	LibPDF: Fix quadratic runtime in stream dumping DeprecatedString::substring() makes a copy of the substring. Instead, use a StringView, which can make substring views in constant time. Reduces time for `pdf --dump-contents image-based-pdf-sample.pdf` to 2.2s (from not completing for 1+ minutes). That file contains a 221 kB jpeg. Find it on the internet here: https://nlsblog.org/wp-content/uploads/2020/06/image-based-pdf-sample.pdf	2023-07-14 09:50:30 -04:00
Nico Weber	d18f01d7d7	LibPDF: Simplify a loop No behavior change.	2023-07-14 09:50:30 -04:00
Nico Weber	281e3158c0	LibPDF: Some preparatory work for AESV3 This detects AESV3, and copies over the spec comments explaining what needs to be done, but doesn't actually do it yet. AESV3 is technically PDF 2.0-only, but https://cipa.jp/std/documents/download_e.html?CIPA_DC-007-2021_E has a 1.7 PDF that uses it. Previously we'd claim that we need a password to decrypt it. Now, we cleanly crash with a TODO() \o/	2023-07-14 06:34:03 +02:00
Nico Weber	ca433befa0	LibPDF: Add method to Document to dump a Page and all related objects ...except for the /Parent object, else we'd print all pages :)	2023-07-13 20:29:58 +02:00
Nico Weber	b4c5a7d1a0	LibPDF: Make Object::to_deprecated_string() look more like PDF source - No , between array or dict elements - `stream` goes in front of stream data, _after_ the stream dict Also, print string contents as ASCII if the string data is mostly ASCII.	2023-07-13 20:29:58 +02:00
Nico Weber	c625ba34fe	LibPDF: Implement set_flatness_tolerance We now track it in the graphics state. It isn't used for anything yet. Fixes the one thing that rendering the first 100 pages of pdf_reference_1-7.pdf complains about.	2023-07-12 18:22:52 -04:00
Nico Weber	afb99a67b2	LibPDF: Tweak Page::page_contents() implementation for brevity Also replace a FIXME with a spec comment that answers it.	2023-07-12 18:22:35 -04:00
Nico Weber	69c965b987	LibPDF: Move code to compute full page contents into Page Pure code move, no behavior change.	2023-07-12 18:22:35 -04:00
Nico Weber	f4f8a6a1bf	LibPDF: Move Page into its own file Page.h	2023-07-12 18:22:35 -04:00
Nico Weber	fe3612ebcb	LibPDF: Fix off-by-one in Reader With this, looking at page 2 of pdf_reference_1-7.pdf no longer crashes. Why did it crash in the first place? Due to this bug, CFF.cpp failed to parse the font program for the font used to render the `®` character. `Renderer::render()` adds all errors that are encountered to an `errors` object but continues rendering. That meant that the previous font was still active, and that didn't have a width for that symbol in its width table. SimpleFont::draw_string() falls back to get_glyph_width() if there's no width entry for a symbol. `Type1Font::get_glyph_width()` always dereferences `m_font` in that method, even if the font has a font program (and m_font is hence nullptr). With the off-by-one fixed, the second font is successfully installed as current font, and the second font has a width entry for that symbol, so the problem no longer occurs.	2023-07-12 14:19:14 -04:00
Nico Weber	117a5f1bd2	LibPDF: Remove an unused variable	2023-07-12 19:02:56 +02:00
Nico Weber	323d76fbb9	LibPDF: Make encrypted object streams work There were two problems: 1. parse_compressed_object_with_index() parses indirect objects without going through Parser::parse_indirect_value(), so push_reference() / pop_reference() weren't called. Manually call them, both for the indirect object containing the object stream and for the indirect object within the object stream. 2. The indirect object within the object stream got decrypted twice: Once when the object stream data itself got decrypted, and then incorrectly a second time when the object data within the stream was read. To fix, disable encryption while parsing object stream data (since it's already decrypted). The test is from http://opf-labs.org/format-corpus/pdfCabinetOfHorrors/ which according to readme.md at the same location is CC0.	2023-07-12 17:16:25 +02:00
Nico Weber	e94f1e38d0	LibPDF: Mark PDF::Error nodiscard No behavior change. Prevents mistakes like the one fixed in 26de2fd0b2.	2023-07-12 17:03:14 +02:00
Nico Weber	5998072f15	LibPDF: Add support for AESV2 encryption	2023-07-12 06:28:15 +02:00
Nico Weber	67d8c8badb	LibPDF: Use more direct method to access linearization dict We know indirect_value_or_error.value contains an IndirectObject, so there's no need to go through resolve(). No behavior change.	2023-07-12 06:28:15 +02:00

331 commits