serenity

mirror of https://github.com/RGBCube/serenity synced 2025-07-12 07:47:34 +00:00

Author	SHA1	Message	Date
Nico Weber	f54b0e7c22	LibPDF: Don't accidentally put horizontal_scaling in places Fonts should have size font_size times total scaling. We tried to get that by computing text_rendering_matrix.x_scale() * font_size, but text_rendering_matrix.x_scale() also includes horizontal_scaling, which shouldn't be part of font size. Same for character_spacing and word_spacing. This is all a big mess that's caused by LibPDF using ScaledFont, which requires scaling to be part of the text type. I have an in-progress local branch that moves LibPDF to directly use VectorFont, which will hopefully make this (and other things) nicer. But first, let's get this right, and then make sure we don't regress it when things change :^)	2024-01-18 14:01:30 +01:00
Nico Weber	abda5e66f6	LibPDF: Scale delta_x by horizontal_scaling in Renderer::show_text() While PDFFont::draw_string() already returns a position scaled by horizontal_scaling, the division by text_rendering_matrix.x_scale() (which also contains the scaling factor) undid it. Reapply it. Fixes the horizontal layout of the line "should be the same on all lines: super" in Tests/LibPDF/text.pdf.	2024-01-18 14:01:30 +01:00
Nico Weber	470d1d8dcf	LibPDF: Fix order of parameter, text, and current transform matrix PDF spec 1.7 5.3.3 Text Space Details gives the correct multiplication order: parameters * textmatrix * ctm. We used to do text * ctm * parameters (AffineTransform::multiply() does left-multiplication). This only matters if `text_state().rise` is non-zero. In practice, it's almost always zero, in which case the parameter matrix is a diagonal matrix that commutes. Fixes the horizontal offset of "super" in Tests/LibPDF/text.pdf.	2024-01-18 14:01:30 +01:00
Nico Weber	6c65c18c40	LibPDF: Add spec ref to Renderer::calculate_text_rendering_matrix()	2024-01-18 14:01:30 +01:00
Nico Weber	13f007aadb	LibPDF: Tweak vertical position of truetype fonts The vertical coordinates for truetype fonts are different somehow. We compensated a bit for that; now we compensate some more. This is still not 100% perfect, but much better than before.	2024-01-17 08:44:07 +00:00
Nico Weber	1845a406ea	LibPDF: Add debug settings for clipping paths and images	2024-01-17 08:42:56 +00:00
Nico Weber	2d8a22f4b4	LibPDF: Clip images too Since we can't clip against a general path yet, this clips images against the bounding box of the current clip path as well. Clips for images are often rectangular, so this works out well. (We wastefully still decode and color-convert the entire image. In a follow-up, we could consider only converting the unclipped part.)	2024-01-17 08:42:56 +00:00
Nico Weber	5615a2691a	LibPDF: Extract activate_clip() / deactivate_clip() functions No behavior change.	2024-01-17 08:42:56 +00:00
MacDue	d55867e563	LibPDF: Fix paths with negatively sized `re` (rect) commands Turns out the width/height in a `re` command can be negative. This results in rectangles with different winding orders. For example, a negative width results in a reversed winding order. Previously, this was lost by passing the rect through an `AffineTransform` before constructing the path. So instead, this constructs the rect path, and then transforms the resulting path.	2024-01-16 21:31:20 +00:00
Nico Weber	0e91682283	LibPDF: Be more forgiving about trailing image data The predictor code assumed that all stream data is image data (...which would make sense: trailing data there is wasted space). But some PDFs have trailing data there, e.g. 0000257.pdf, so be forgiving about it.	2024-01-16 09:55:11 -05:00
Nico Weber	b34509edd2	LibPDF: Make `pdf --dump-contents` handle \r line endings better Previously, all page contents ended up overprinting a single line over and over for PDFs that used only `\r` as line ending. This is for example useful for 0000364.pdf.	2024-01-15 23:16:45 -07:00
Nico Weber	9f9dbb325b	LibPDF: Make prediction filters error on user-controlled alloc OOM	2024-01-15 23:06:06 -07:00
Nico Weber	93f5420282	LibPDF: Start implementing the TIFF predictor This codepath is separate from the predictor in the TIFF decoder. The TIFF decoder currently does bits->Color conversion before processing the predictor. That doesn't fit the PDF model where filters are processed before converting streams into bitmaps. If this code here ever grows to handle all cases, maybe we can move it over to the TIFF decoder and then make it do predictions before decoding to colors, to share this code. (TIFF prediction is pretty messy since it's bits-per-pixel-dependent. PNG prediction is always byte-based, which makes things easier.)	2024-01-15 23:06:06 -07:00
Nico Weber	9a93f677f4	LibPDF: Mark text rendering matrix as dirty after TJ numbers Mostly because I audited all places that assigned to `m_text_matrix` after #22760. This one is very difficult to trigger in practice. `show_text()` marks the text rendering matrix dirty already, so this only has an effect if the `TJ` array starts with a number, and the matrix isn't marked dirty going in. `Tm` caches the text rendering matrix, so I changed text.pdf to contain: ``` 1 0 0 1 45 130 Tm [ 200 (Hello) -2000 (World) ] TJ T* ``` This first sets an x offset of 5 (on top of the normal 40), and then undoes it (`200` is multiplied by font size (25) / -1000, and `200 * 25 / -1000` is -5). Before this change, the topmost "Hello World" ended up slightly indented. Likely no behavior change in practice, but makes the code easier to understand, and maybe it helps in the wild somewhere.	2024-01-15 08:39:04 +00:00
Nico Weber	f23f5dcd62	LibPDF: Mark text rendering matrix dirty for Td operator 0000342.pdf page 5 contains this snippet: ``` /T1_1 10.976 Tf 0 -31.643 TD (This)Tj 1 0 0 1 54 745.563 Tm 22.181 -31.643 Td [(vehicle)-270.926(uses)... ``` The `Tm` marked the text rendering matrix as dirty at the start, but it then calls calculate_text_rendering_matrix() almost in the next line, which recalculates the text rendering matrix and caches the new matrix. The `Td` used to not mark it as dirty, and we'd draw "vehicle" with an incorrect matrix.	2024-01-15 08:37:55 +00:00
Nico Weber	f4ee9a2333	LibPDF: Support drawing images with 16 bits per channel This uses the tried-and-true "throw away the lower 8 bits" technique for now. This lets us render Tests/LibPDF/wide-gamut-only.pdf.	2024-01-12 16:20:46 -07:00
Nico Weber	5f85aff036	LibPDF: Move ColorSpace::style() to take ReadonlySpan<float> All ColorSpace subclasses converted to float anyways, and this allows us to save lots of float->Value->float conversions during image color space processing. A bit faster: ``` N Min Max Median Avg Stddev x 50 0.99054313 1.0412271 0.99933481 1.0052408 0.012931916 + 50 0.97073889 1.0075941 0.97849107 0.98184034 0.0090329046 Difference at 95.0% confidence -0.0234004 +/- 0.00442595 -2.32785% +/- 0.440287% (Student's t, pooled s = 0.0111541) ```	2024-01-12 12:37:56 +00:00
Nico Weber	56a4af8d03	LibPDF: Don't reallocate Vectors in ICCBasedColorSpace all the time Microoptimization; according to ministat a bit faster: ``` N Min Max Median Avg Stddev x 50 1.0179932 1.0561159 1.0315337 1.0333617 0.0094757426 + 50 1.000875 1.0427601 1.0208509 1.0201902 0.01066116 Difference at 95.0% confidence -0.0131715 +/- 0.00400208 -1.27463% +/- 0.387287% (Student's t, pooled s = 0.0100859) ```	2024-01-12 12:37:56 +00:00
Nico Weber	cfd05b1a55	LibPDF: Use MatrixMatrixConversion when possible Reduces time spent rendering page 3 of 0000849.pdf from 1.32s to 1.13s on my machine. Also reduces the time to run Meta/test_pdf.py on 0000.zip (without 0000849.pdf) from 56s to 54s.	2024-01-12 09:09:56 +01:00
Nico Weber	c161b2d2f9	LibPDF: Extract ICCBasedColorSpace::sRGB() helper	2024-01-12 09:09:56 +01:00
Nico Weber	f7fc2df8ac	LibPDF: Simplify load_image() a tiny bit Images can't use Pattern color spaces, so we'll always have a Color. No behavior (or perf) change.	2024-01-10 23:26:57 +01:00
Nico Weber	df5451a889	LibPDF: Mark text rendering matrix dirty after changing it in text_begin A certain PDF that was drawing some text used `9 0 0 9 474.54 700.6801 Tm` to set the text matrix to a matrix that scaled by 9 in one text object. Then, after ending that text object, it had the following new text object which contained nothing that invalidated the text matrix: ``` BT /F1 7 Tf /DeviceRGB CS 0 0 0 SC 10 TL 86.37849 21.908 Td (Authorized licensed use limited to: ...) Tj ET ``` `BT` did reset it as required, but since we didn't mark the matrix as dirty, we never recomputed it and drew the additional text scaled up 9x.	2024-01-10 19:42:08 +01:00
Nico Weber	4fd5d450be	LibPDF: Add support for image masks An image mask is a 1-bit-per-pixel bitmap that's black where the current color should be painted, and white where it should be transparent (think: like ink). load_image() already converts images like this into 8-bit-per-pixel images that have 0xff, 0xff, 0xff in rgb for opaque (originally 0 bit) pixels and 0, 0, 0 in rgb for transparent pixels. So we just copy the image mask's image data into the alpha channel and replace rgb with the current color, and then draw it like a regular bitmap.	2024-01-10 09:10:11 +00:00
Nico Weber	e770cf06b0	LibPDF: Send jpeg data down the same path as all other data JPEG images now honor decode arrays and color spaces.	2024-01-10 09:39:00 +01:00
Nico Weber	f157cd50a1	LibPDF: Use mix() in SampledFunction::evaluate() No behavior change.	2024-01-04 21:12:23 +01:00
Nico Weber	e16345555b	LibPDF: Port 59b50fa43f8c2 to xref and object streams 0000440.pdf contains an xref stream object (at offset 3643676) starting: ``` 294 0 obj << /Type /XRef /Index [0 295] /Size 295 ``` and an object stream object (at offset 3640121) starting: ``` 230 0 obj << /Type /ObjStm /N 73 /First 614 ``` In both cases, the `obj` and the `<<` are separated by non-newline whitespace. `633e1632d0` made parse_indirect_value() tolerate this, but it updated neither parse_xref_stream() (which parses xref streams) nor parse_compressed_object_with_index() (which parses object streams), despite all three changes being part of #14873. Make parse_xref_stream() and parse_compressed_object_with_index() call parse_indirect_value() to pick up the fix over there. It's a bit less code too. (0000440.pdf is the only PDF in my 1000 test PDFs that this helps, somewhat surprisingly.)	2024-01-04 11:27:24 +01:00
Nico Weber	9d69c5d434	LibPDF: Tolerate trailing whitespace after %%EOF marker At first I tried implementing the quirk from PDF 1.7 Appendix H, 3.4.4, "File Trailer": """Acrobat viewers require only that the %%EOF marker appear somewhere within the last 1024 bytes of the file.""" This would've been like #22548 but at end-of-file instead of at start-of-file. This helped a bunch of files, but also broke a bunch of files that made more than 1024 bytes of stuff at the end, and it wouldn't have helped 0000059.pdf, which has over 40k of \0 bytes after the %%EOF. So just tolerate whitespace after the %%EOF line, and keep ignoring an arbitrary amount of other stuff after that like before. This helps: * 0000599.pdf One trailing \0 byte after %%EOF. Due to that byte, the is_linearized() check fails and we go down the non-linearized codepath. But with this fix, that code path succeeds. * 0000937.pdf Same. * 0000055.pdf Has one space followed by a \n after %%EOF * 0000059.pdf Has over 40kB of trailing \0 bytes The following files keep working with it: * 0000242.pdf 5586 bytes of trailing HTML * 0000336.pdf 5586 bytes of trailing HTML fragment * 0000136.pdf 2054 bytes of trailing space characters This one kind of only worked by accident before since it found the %%EOF block before the final %%EOF block. Maybe this is even an intentional XRefStm compat hack? Anyways, now it finds the final block instead. * 0000327.pdf 11044 bytes of trailing HTML	2024-01-04 11:19:15 +01:00
Nico Weber	2d12647e29	LibPDF: Add FIXME for "was linearized PDF incrementally updated" check It's pretty tricky to do, and also tricky with respect to skipping trailing bytes after %%EOF: The check requires knowing the full size of the PDF (which means web servers not sending content lengths are out), but that size has to be after stripping trailing bytes, which normal static file servers won't do. So PDF viewers would have to download the last couple bytes of the PDF unconditionally, then strip trailing bytes and use the count to figure out the final actual PDF size. Luckily, we don't incrementally download PDFs from the net but instead require all data to be available in one chunk, so it's not currently a problem.	2024-01-04 11:19:15 +01:00
Nico Weber	1b45c3e127	LibPDF: Tolerate whitespace after `xref` and `startxref` The spec isn't super clear on if this is allowed: """Each cross-reference section shall begin with a line containing the keyword xref. Following this line...""" """The two preceding lines shall contain, one per line and in order, the keyword startxref and...""" It kind of sounds like anything goes on both lines as long as they contain `xref` and `startxref`. In practice, both seem to always occur at the start of their line, but in 0000780.pdf (and nowhere else), there's one space after each keyword before the following linebreak, and this makes that file load.	2024-01-04 10:14:30 +01:00
Nico Weber	efb37f7252	LibPDF: Add Reader::consume_non_eol_whitespace()	2024-01-04 10:14:30 +01:00
Nico Weber	c59e08123b	LibPDF: Add a FIXME and a spec comment to Encoding::from_object()	2024-01-04 10:12:11 +01:00
Nico Weber	ad5fc0eda1	LibPDF: An Encoding's /Differences entry is optional Per "TABLE 5.11 Entries in an encoding dictionary", /Differences is optional. (Per "Encodings for TrueType Fonts" in 5.5.5 Character Encoding, nonsymbolic truetype fonts are even recommended to have "no Differences array." But in practice, most seem to have it.) Fixes crashes on: * 0000001.pdf * 0000574.pdf * 0000337.pdf All three don't render super great, but at least they no longer crash.	2024-01-04 10:12:11 +01:00
Nico Weber	0bb0c7dac2	LibPDF: Scan for PDF file start in first 1024 bytes Other readers do this too, and files depend on this. Fixes opening these four files from the PDFA 0000.zip dataset: * 0000015.pdf Starts with `C:\web\webeuncet\_cat\_docs\_publics\` before header * 0000408.pdf Starts with UTF-8 BOM * 0000524.pdf Starts with 867 bytes of HTML containing a PHP backtrace * 0000680.pdf Starts with `C:\web\webeuncet\_cat\_docs\_publics\` too	2024-01-03 10:12:35 +01:00
Nico Weber	9495f64f91	LibPDF: Improve hex string parsing A local (non-public) PDF I have lying around contains this in a page's operator stream: ``` [<00b4003e> 3 <002600480051> 3 <005700550044004f0003> -29 <00330044> 3 <0055> -3 <004e0040> 4 <0003> -29 <004c00560003> -31 <0057004b> 4 <00480003> -37 <0050 >] TJ ``` That is, there's a newline in a hexstring after a character. This led to `Parser error at offset 5184: Unexpected character`. The spec says in 3.2.3 String Objects, Hexadecimal Strings: """Each pair of hexadecimal digits defines one byte of the string. White-space characters (such as space, tab, carriage return, line feed, and form feed) are ignored.""" But we didn't ignore whitespace before or after a character, only in between the bytes. The spec also says: """If the final digit of a hexadecimal string is missing—that is, if there is an odd number of digits—the final digit is assumed to be 0.""" In that case, we were skipping the closing `>` twice -- or, more accurately, we ignored the character after it too. This has been wrong all the way back in #6974. Add a test that fails if either of the two changes isn't present.	2024-01-02 22:13:21 +01:00
Lucas CHOLLET	f389c1cdba	LibGfx+LibPDF: Use LibCompress' implementation of the PackBits decoder No need to have these three copies :^)	2023-12-27 17:40:11 +01:00
Shannon Booth	e2e7c4d574	Everywhere: Use to_number<T> instead of to_{int,uint,float,double} In a bunch of cases, this actually ends up simplifying the code as to_number will handle something such as: ``` Optional<I> opt; if constexpr (IsSigned<I>) opt = view.to_int<I>(); else opt = view.to_uint<I>(); ``` For us. The main goal here however is to have a single generic number conversion API between all of the String classes.	2023-12-23 20:41:07 +01:00
Nico Weber	b63eb4a4dd	LibPDF: Implement /Mask support with stream object argument	2023-12-23 20:39:11 +01:00
Nico Weber	a3507ef65b	LibPDF: Move error for /ImageMask out of load_image() ...and tweak load_image() to support loading mask images (which don't have a color space and are always 1 bit per pixel).	2023-12-23 20:39:11 +01:00
Nico Weber	3ad9782e25	LibPDF: Extract a apply_alpha_channel() function No behavior change.	2023-12-23 20:39:11 +01:00
Nico Weber	4bd11c8eb4	LibPDF: Show a 'rendering unsupported' error for images with /Mask key	2023-12-23 20:39:11 +01:00
Nico Weber	387fecea7f	LibPDF: Fix typo in a variable name No behavior change.	2023-12-23 10:10:24 +01:00
Nico Weber	6723552e95	LibPDF: Add a spec comment and remove a FIXME I think the ASCIIHexDecode / ASCII85Decode unfilter functions handle what this FIXME was about already.	2023-12-22 10:58:54 +01:00
Nico Weber	3d07684891	LibPDF: Extract Parser::parse_inline_image() Pure code move, no intended behavior change. The motivation is just to make Parser::parse_operators() less nested and more focused.	2023-12-22 10:58:54 +01:00
Nico Weber	6032c06f6b	Revert "LibPDF: Add basic tiled, coloured pattern rendering" This reverts commit `8ff87911a3`.	2023-12-21 19:24:56 +01:00
Nico Weber	7cb216c95b	Revert "LibPDF: Offset PaintStyle when painting so pattern overlaps..." This reverts commit `8c7fc4fe6c`.	2023-12-21 19:24:56 +01:00
Nico Weber	6de32e5359	LibPDF: Draw inline images The idea is to massage the inline image data into something that looks like a regular image, and then use the normal image drawing code: We translate the inline image abbreviations to the expanded version at rendering time, then unfilter (i.e. uncompress) the image data at rendering time, and then go down the usual image drawing path. Normal streams are unfiltered when they're first accessed, but inline image streams live in a page's drawing operators, and this fits the current approach of parsing a page's operators anew every time the page is rendered. (We also need to add some special-case handling for color spaces of inline images: Inline images can use named color spaces, while regular images always use direct color space objects.)	2023-12-20 12:45:16 -07:00
Nico Weber	d577d181e3	LibPDF: Clamp linear_srgb values in convert_to_srgb() This is very crude gamut mapping, but it's better than producing NaNs when passing negative values to powf(x, 1/2.2).	2023-12-20 12:45:07 +01:00
Nico Weber	022fce75a6	LibPDF: Get inline image data from parser to renderer We create an inline_image_end operator that has all the relevant data in a synthetic StreamObject. inline_image_end is still a RENDERER_TODO(), so no real behavior change. (Previously we'd call only inline_image_begin, so the string the todo message is about is now a bit different. But no interesting behavior change.)	2023-12-20 12:19:08 +01:00
Nico Weber	3285502ec6	LibPDF: Extract a Parser::unfilter_stream() method No behavior change.	2023-12-20 12:19:08 +01:00
Nico Weber	b21f867e88	LibPDF: Don't crash on images with empty filter arrays 0000967.pdf page 2 contains a bunch of inline images with empty filter arrays.	2023-12-20 12:19:08 +01:00
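
Commit 9495f64f91 above quotes two rules for hexadecimal strings from the spec: white-space characters are ignored, and a missing final digit is assumed to be 0. The following is a minimal sketch of those two rules only. `decode_hex_string()`, `is_pdf_whitespace()`, and `hex_digit_value()` are hypothetical names chosen for illustration; this is not LibPDF's parser code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: the white-space characters PDF treats as separators.
static bool is_pdf_whitespace(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Hypothetical helper: value of one hex digit, or -1 if `c` is not a hex digit.
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Decodes a hex string that starts at '<' and ends at the first '>'.
// Returns the decoded bytes, or nullopt if the input is malformed.
std::optional<std::string> decode_hex_string(std::string_view input)
{
    if (input.empty() || input.front() != '<')
        return std::nullopt;

    std::string bytes;
    int pending_high_nibble = -1; // -1 means "no digit waiting for a partner"

    for (std::size_t i = 1; i < input.size(); ++i) {
        char c = input[i];
        if (c == '>') {
            // Odd number of digits: the missing final digit is assumed to be 0.
            if (pending_high_nibble >= 0)
                bytes.push_back(static_cast<char>(pending_high_nibble << 4));
            return bytes;
        }
        // Whitespace is skipped wherever it appears, including right before '>'.
        if (is_pdf_whitespace(c))
            continue;
        int value = hex_digit_value(c);
        if (value < 0)
            return std::nullopt;
        if (pending_high_nibble < 0) {
            pending_high_nibble = value;
        } else {
            bytes.push_back(static_cast<char>((pending_high_nibble << 4) | value));
            pending_high_nibble = -1;
        }
    }
    return std::nullopt; // Ran out of input before the closing '>'.
}
```

With this sketch, an input like `<0050` followed by a newline and `>` (the shape that triggered the parser error in the commit message) decodes to the two bytes 0x00 0x50, and the closing `>` is consumed exactly once.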

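The arithmetic in commit 9a93f677f4 above (a `TJ` number of `200` at font size 25 gives `200 * 25 / -1000`, which is -5) can be written as a small helper. This is a sketch only: `tj_number_displacement()` is a hypothetical name, and the `horizontal_scaling` factor is an assumption based on the spec's text-space formula rather than on anything stated in that commit message.

```cpp
// Hypothetical helper, not LibPDF API: horizontal displacement caused by a
// number inside a TJ array. The number is expressed in thousandths of a unit
// of text space and is subtracted from the current position, hence the -1000.
double tj_number_displacement(double tj_number, double font_size, double horizontal_scaling = 1.0)
{
    // horizontal_scaling is 1.0 for the default scaling of 100% (assumption:
    // the caller passes the scaling as a fraction, not as a percentage).
    return tj_number * font_size / -1000.0 * horizontal_scaling;
}
```

For the test snippet in that commit, `tj_number_displacement(200, 25)` yields -5, which cancels the extra 5 units that `Tm` added on top of the usual x offset of 40.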

568 commits