serenity

mirror of https://github.com/RGBCube/serenity synced 2025-05-31 19:48:12 +00:00

Author	SHA1	Message	Date
Nico Weber	c161b2d2f9	LibPDF: Extract ICCBasedColorSpace::sRGB() helper	2024-01-12 09:09:56 +01:00
Nico Weber	f7fc2df8ac	LibPDF: Simplify load_image() a tiny bit Images can't use Pattern color spaces, so we'll always have a Color. No behavior (or perf) change.	2024-01-10 23:26:57 +01:00
Nico Weber	df5451a889	LibPDF: Mark text rendering matrix dirty after changing it in text_begin A certain PDF used `9 0 0 9 474.54 700.6801 Tm` in one text object to set the text matrix to a matrix that scaled by 9. Then, after ending that text object, it had the following new text object which contained nothing that invalidated the text matrix: ``` BT /F1 7 Tf /DeviceRGB CS 0 0 0 SC 10 TL 86.37849 21.908 Td (Authorized licensed use limited to: ...) Tj ET ``` `BT` did reset it as required, but since we didn't mark the matrix as dirty, we never recomputed it and drew the additional text scaled up 9x.	2024-01-10 19:42:08 +01:00
Nico Weber	4fd5d450be	LibPDF: Add support for image masks An image mask is a 1-bit-per-pixel bitmap that's black where the current color should be painted, and white where it should be transparent (think: like ink). load_image() already converts images like this into 8-bit-per-pixel images that have 0xff, 0xff, 0xff in rgb for opaque (originally 0 bit) pixels and 0, 0, 0 in rgb for transparent pixels. So we just copy the image mask's image data into the alpha channel and replace rgb with the current color, and then draw it like a regular bitmap.	2024-01-10 09:10:11 +00:00
Nico Weber	e770cf06b0	LibPDF: Send jpeg data down the same path as all other data JPEG images now honor decode arrays and color spaces.	2024-01-10 09:39:00 +01:00
Nico Weber	f157cd50a1	LibPDF: Use mix() in SampledFunction::evaluate() No behavior change.	2024-01-04 21:12:23 +01:00
Nico Weber	e16345555b	LibPDF: Port 59b50fa43f8c2 to xref and object streams 0000440.pdf contains an xref stream object (at offset 3643676) starting: ``` 294 0 obj << /Type /XRef /Index [0 295] /Size 295 ``` and an object stream object (at offset 3640121) starting: ``` 230 0 obj << /Type /ObjStm /N 73 /First 614 ``` In both cases, the `obj` and the `<<` are separated by non-newline whitespace. `633e1632d0` made parse_indirect_value() tolerate this, but it updated neither parse_xref_stream() (which parses xref streams) nor parse_compressed_object_with_index() (which parses object streams), despite all three changes being part of #14873. Make parse_xref_stream() and parse_compressed_object_with_index() call parse_indirect_value() to pick up the fix over there. It's a bit less code too. (0000440.pdf is the only PDF in my 1000 test PDFs that this helps, somewhat surprisingly.)	2024-01-04 11:27:24 +01:00
Nico Weber	9d69c5d434	LibPDF: Tolerate trailing whitespace after %%EOF marker At first I tried implementing the quirk from PDF 1.7 Appendix H, 3.4.4, "File Trailer": """Acrobat viewers require only that the %%EOF marker appear somewhere within the last 1024 bytes of the file.""" This would've been like #22548 but at end-of-file instead of at start-of-file. This helped a bunch of files, but also broke a bunch of files that had more than 1024 bytes of stuff at the end, and it wouldn't have helped 0000059.pdf, which has over 40k of \0 bytes after the %%EOF. So just tolerate whitespace after the %%EOF line, and keep ignoring an arbitrary amount of other stuff after that like before. This helps: * 0000599.pdf One trailing \0 byte after %%EOF. Due to that byte, the is_linearized() check fails and we go down the non-linearized codepath. But with this fix, that code path succeeds. * 0000937.pdf Same. * 0000055.pdf Has one space followed by a \n after %%EOF * 0000059.pdf Has over 40kB of trailing \0 bytes The following files keep working with it: * 0000242.pdf 5586 bytes of trailing HTML * 0000336.pdf 5586 bytes of trailing HTML fragment * 0000136.pdf 2054 bytes of trailing space characters This one kind of only worked by accident before since it found the %%EOF block before the final %%EOF block. Maybe this is even an intentional XRefStm compat hack? Anyways, now it finds the final block instead. * 0000327.pdf 11044 bytes of trailing HTML	2024-01-04 11:19:15 +01:00
Nico Weber	2d12647e29	LibPDF: Add FIXME for "was linearized PDF incrementally updated" check It's pretty tricky to do, and also tricky with respect to skipping trailing bytes after %%EOF: The check requires knowing the full size of the PDF (which means web servers not sending content lengths are out), but that size has to be after stripping trailing bytes, which normal static file servers won't do. So PDF viewers would have to download the last couple bytes of the PDF unconditionally, then strip trailing bytes and use the count to figure out the final actual PDF size. Luckily, we don't incrementally download PDFs from the net but instead require all data to be available in one chunk, so it's not currently a problem.	2024-01-04 11:19:15 +01:00
Nico Weber	1b45c3e127	LibPDF: Tolerate whitespace after `xref` and `startxref` The spec isn't super clear on if this is allowed: """Each cross-reference section shall begin with a line containing the keyword xref. Following this line...""" """The two preceding lines shall contain, one per line and in order, the keyword startxref and...""" It kind of sounds like anything goes on both lines as long as they contain `xref` and `startxref`. In practice, both seem to always occur at the start of their line, but in 0000780.pdf (and nowhere else), there's one space after each keyword before the following linebreak, and this makes that file load.	2024-01-04 10:14:30 +01:00
Nico Weber	efb37f7252	LibPDF: Add Reader::consume_non_eol_whitespace()	2024-01-04 10:14:30 +01:00
Nico Weber	c59e08123b	LibPDF: Add a FIXME and a spec comment to Encoding::from_object()	2024-01-04 10:12:11 +01:00
Nico Weber	ad5fc0eda1	LibPDF: An Encoding's /Differences entry is optional Per "TABLE 5.11 Entries in an encoding dictionary", /Differences is optional. (Per "Encodings for TrueType Fonts" in 5.5.5 Character Encoding, nonsymbolic truetype fonts are even recommended to have "no Differences array." But in practice, most seem to have it.) Fixes crashes on: * 0000001.pdf * 0000574.pdf * 0000337.pdf All three don't render super great, but at least they no longer crash.	2024-01-04 10:12:11 +01:00
Nico Weber	0bb0c7dac2	LibPDF: Scan for PDF file start in first 1024 bytes Other readers do this too, and files depend on this. Fixes opening these four files from the PDFA 0000.zip dataset: * 0000015.pdf Starts with `C:\web\webeuncet\_cat\_docs\_publics\` before header * 0000408.pdf Starts with UTF-8 BOM * 0000524.pdf Starts with 867 bytes of HTML containing a PHP backtrace * 0000680.pdf Starts with `C:\web\webeuncet\_cat\_docs\_publics\` too	2024-01-03 10:12:35 +01:00
Nico Weber	9495f64f91	LibPDF: Improve hex string parsing A local (non-public) PDF I have lying around contains this in a page's operator stream: ``` [<00b4003e> 3 <002600480051> 3 <005700550044004f0003> -29 <00330044> 3 <0055> -3 <004e0040> 4 <0003> -29 <004c00560003> -31 <0057004b> 4 <00480003> -37 <0050 >] TJ ``` That is, there's a newline in a hexstring after a character. This led to `Parser error at offset 5184: Unexpected character`. The spec says in 3.2.3 String Objects, Hexadecimal Strings: """Each pair of hexadecimal digits defines one byte of the string. White-space characters (such as space, tab, carriage return, line feed, and form feed) are ignored.""" But we didn't ignore whitespace before or after a character, only in between the bytes. The spec also says: """If the final digit of a hexadecimal string is missing—that is, if there is an odd number of digits—the final digit is assumed to be 0.""" In that case, we were skipping the closing `>` twice -- or, more accurately, we ignored the character after it too. This has been wrong all the way back in #6974. Add a test that fails if either of the two changes isn't present.	2024-01-02 22:13:21 +01:00
Lucas CHOLLET	f389c1cdba	LibGfx+LibPDF: Use LibCompress' implementation of the PackBits decoder No need to have these three copies :^)	2023-12-27 17:40:11 +01:00
Shannon Booth	e2e7c4d574	Everywhere: Use to_number<T> instead of to_{int,uint,float,double} In a bunch of cases, this actually ends up simplifying the code as to_number will handle something such as: ``` Optional<I> opt; if constexpr (IsSigned<I>) opt = view.to_int<I>(); else opt = view.to_uint<I>(); ``` For us. The main goal here however is to have a single generic number conversion API between all of the String classes.	2023-12-23 20:41:07 +01:00
Nico Weber	b63eb4a4dd	LibPDF: Implement /Mask support with stream object argument	2023-12-23 20:39:11 +01:00
Nico Weber	a3507ef65b	LibPDF: Move error for /ImageMask out of load_image() ...and tweak load_image() to support loading mask images (which don't have a color space and are always 1 bit per pixel).	2023-12-23 20:39:11 +01:00
Nico Weber	3ad9782e25	LibPDF: Extract an apply_alpha_channel() function No behavior change.	2023-12-23 20:39:11 +01:00
Nico Weber	4bd11c8eb4	LibPDF: Show a 'rendering unsupported' error for images with /Mask key	2023-12-23 20:39:11 +01:00
Nico Weber	387fecea7f	LibPDF: Fix typo in a variable name No behavior change.	2023-12-23 10:10:24 +01:00
Nico Weber	6723552e95	LibPDF: Add a spec comment and remove a FIXME I think the ASCIIHexDecode / ASCII85Decode unfilter functions handle what this FIXME was about already.	2023-12-22 10:58:54 +01:00
Nico Weber	3d07684891	LibPDF: Extract Parser::parse_inline_image() Pure code move, no intended behavior change. The motivation is just to make Parser::parse_operators() less nested and more focused.	2023-12-22 10:58:54 +01:00
Nico Weber	6032c06f6b	Revert "LibPDF: Add basic tiled, coloured pattern rendering" This reverts commit `8ff87911a3`.	2023-12-21 19:24:56 +01:00
Nico Weber	7cb216c95b	Revert "LibPDF: Offset PaintStyle when painting so pattern overlaps..." This reverts commit `8c7fc4fe6c`.	2023-12-21 19:24:56 +01:00
Nico Weber	6de32e5359	LibPDF: Draw inline images The idea is to massage the inline image data into something that looks like a regular image, and then use the normal image drawing code: We translate the inline image abbreviations to the expanded version at rendering time, then unfilter (i.e. uncompress) the image data at rendering time, and then go down the usual image drawing path. Normal streams are unfiltered when they're first accessed, but inline image streams live in a page's drawing operators, and this fits the current approach of parsing a page's operators anew every time the page is rendered. (We also need to add some special-case handling for color spaces of inline images: Inline images can use named color spaces, while regular images always use direct color space objects.)	2023-12-20 12:45:16 -07:00
Nico Weber	d577d181e3	LibPDF: Clamp linear_srgb values in convert_to_srgb() This is very crude gamut mapping, but it's better than producing NaNs when passing negative values to powf(x, 1/2.2).	2023-12-20 12:45:07 +01:00
Nico Weber	022fce75a6	LibPDF: Get inline image data from parser to renderer We create an inline_image_end operator that has all the relevant data in a synthetic StreamObject. inline_image_end is still a RENDERER_TODO(), so no real behavior change. (Previously we'd call only inline_image_begin, so the string the todo message is about is now a bit different. But no interesting behavior change.)	2023-12-20 12:19:08 +01:00
Nico Weber	3285502ec6	LibPDF: Extract a Parser::unfilter_stream() method No behavior change.	2023-12-20 12:19:08 +01:00
Nico Weber	b21f867e88	LibPDF: Don't crash on images with empty filter arrays 0000967.pdf page 2 contains a bunch of inline images with empty filter arrays.	2023-12-20 12:19:08 +01:00
Nico Weber	13641693cb	LibPDF: Use make_object<>() to make objects No behavior change.	2023-12-20 12:19:08 +01:00
Ali Mohammad Pur	5e1499d104	Everywhere: Rename {Deprecated => Byte}String This commit un-deprecates DeprecatedString, and repurposes it as a byte string. As the null state has already been removed, there are no other particularly hairy blockers in repurposing this type as a byte string (what it _really_ is). This commit is auto-generated: $ xs=$(ack -l \bDeprecatedString\b\\|deprecated_string AK Userland \ Meta Ports Ladybird Tests Kernel) $ perl -pie 's/\bDeprecatedString\b/ByteString/g; s/deprecated_string/byte_string/g' $xs $ clang-format --style=file -i \ $(git diff --name-only \| grep \.cpp\\|\.h) $ gn format $(git ls-files '.gn' '.gni')	2023-12-17 18:25:10 +03:30
Nico Weber	f2f07c3a80	LibPDF: Replace `if (a) VERIFY(0)` with `VERIFY(!a)` No behavior change.	2023-12-16 12:39:56 +01:00
Nico Weber	ee74bc2538	LibPDF: Tolerate 0-sized Subrs in PS1 font subprograms This regressed in `2b3a41be74` in #18031. Fixes a crash rendering page 2 and onward of https://pyx-project.org/presentation_dantemv35_en.pdf	2023-12-16 12:39:56 +01:00
Nico Weber	11354dbf9e	LibPDF: Remember inline image stream bytes We still don't process inline images, but now we have the pieces we need for doing it (`map` and `stream_bytes`).	2023-12-11 10:50:39 +01:00
Nico Weber	cabc6a9d80	LibPDF: Add a comment that PDF 2.0 added a length key for inline images In practice, basically no file has it, since it was only added in 2.0, and 1.7 explicitly said "in particular, the Type, Subtype, and Length entries normally found in a stream or image dictionary are unnecessary."	2023-12-11 10:50:39 +01:00
Nico Weber	071f890847	LibPDF: Require whitespace in front of inline image marker EI Fixes a crash on page 3 of 0000450.pdf of 0000.zip, where we previously started interpreting the middle of an inline image content stream as operators, since it contained `EI` in its pixel data.	2023-12-11 10:50:39 +01:00
Nico Weber	27aae7e2b1	LibPDF: Parse inline image key-value pairs Not used for anything yet.	2023-12-11 10:50:39 +01:00
Nico Weber	0912896ae0	LibPDF: Extract Parser::parse_dict_contents_until() No behavior change.	2023-12-11 10:50:39 +01:00
Kyle Pereira	8c7fc4fe6c	LibPDF: Offset PaintStyle when painting so pattern overlaps properly	2023-12-10 16:44:24 +01:00
Kyle Pereira	8ff87911a3	LibPDF: Add basic tiled, coloured pattern rendering	2023-12-10 16:44:24 +01:00
Kyle Pereira	8191f2b47a	LibPDF: Add parameter for background color of render	2023-12-10 16:44:24 +01:00
Kyle Pereira	60c4803dd3	LibPDF: Pass Renderer to ColorSpace	2023-12-10 16:44:24 +01:00
Kyle Pereira	082a4197b6	LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces This is in anticipation of Pattern color space support which does not yield a simple color.	2023-12-10 16:44:24 +01:00
Kyle Pereira	e4b8d68039	LibPDF: Permit comments at the end of a stream	2023-12-10 16:44:24 +01:00
Nico Weber	8b50b689f9	LibPDF: Reject invalid "hival" values Doesn't fire on any of the PDFs I have, and seems like a good thing to check.	2023-12-07 08:10:40 +00:00
Nico Weber	43cd3d7dbd	LibPDF: Tolerate palettes that are one byte too long Fixes these errors from `Meta/test_pdf.py path/to/0000`, with 0000 being the unzipped 0000.zip from the PDF/A corpus: Malformed PDF file: Indexed color space lookup table doesn't match size, in 4 files, on 8 pages, 73 times path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x) path/to/0000/0000364.pdf 5 6 path/to/0000/0000918.pdf 5 path/to/0000/0000683.pdf 8	2023-12-07 08:10:40 +00:00
Nico Weber	832a065687	LibPDF: For low-bpp images, start scanlines on byte boundaries Required per spec, and we get slanted images without it. Fixes e.g. page 1 of 0000749.pdf.	2023-12-07 08:10:40 +00:00
Nico Weber	06b9633da5	LibPDF: For indexed images with 1, 2 or 4 bpp, do not repeat bit pattern When upsampling e.g. the 4-bit value 0b1101 to 8-bit, we used to repeat the value to fill the full 8-bits, e.g. 0b11011101. This maps RGB colors to 8-bit nicely, but is the wrong thing to do for palette indices. Stop doing this for palette indices. Fixes "Indexed color space index out of range" for 11 files in the PDF/A 0000.zip test set now that we correctly handle palette indices as of the previous commit: Malformed PDF file: Indexed color space lookup table doesn't match size, in 4 files, on 8 pages, 73 times path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x) path/to/0000/0000364.pdf 5 6 path/to/0000/0000918.pdf 5 path/to/0000/0000683.pdf 8	2023-12-07 08:10:40 +00:00
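
The hex string rules quoted in 9495f64f91 (whitespace is ignored anywhere inside the string, and a missing final digit is assumed to be 0) can be sketched roughly as follows. This is a minimal standalone illustration, not LibPDF's actual parser: `parse_hex_string()`, `is_hex_whitespace()` and `hex_digit_value()` are hypothetical names, and the whitespace set is only the five characters the spec quote lists.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: the whitespace characters named in the spec quote.
static bool is_hex_whitespace(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f';
}

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int hex_digit_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Hypothetical function (not LibPDF's API): parses "<...>" into bytes.
// Returns an empty optional for a missing '<', a missing '>', or a non-hex character.
std::optional<std::vector<unsigned char>> parse_hex_string(std::string_view input)
{
    if (input.empty() || input.front() != '<')
        return std::nullopt;

    std::vector<unsigned char> bytes;
    int pending_high_nibble = -1; // -1 means no digit is waiting for its partner

    for (std::size_t i = 1; i < input.size(); ++i) {
        auto c = static_cast<unsigned char>(input[i]);

        if (c == '>') {
            // Odd number of digits: the missing final digit is assumed to be 0.
            if (pending_high_nibble >= 0)
                bytes.push_back(static_cast<unsigned char>(pending_high_nibble << 4));
            return bytes; // '>' is consumed exactly once
        }

        // Whitespace is skipped before, between, and after digits.
        if (is_hex_whitespace(c))
            continue;

        int value = hex_digit_value(c);
        if (value < 0)
            return std::nullopt;

        if (pending_high_nibble < 0) {
            pending_high_nibble = value;
        } else {
            bytes.push_back(static_cast<unsigned char>((pending_high_nibble << 4) | value));
            pending_high_nibble = -1;
        }
    }

    return std::nullopt; // ran out of input before the closing '>'
}
```

With this sketch, `<0050` followed by a newline and `>` (the shape from the commit message) parses to the two bytes 0x00 0x50, and `<901FA>` parses to 0x90 0x1F 0xA0.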


549 commits