serenity

mirror of https://github.com/RGBCube/serenity synced 2025-07-03 11:52:13 +00:00

Author	SHA1	Message	Date
Nico Weber	eb4632e08a	LibPDF: Give CFF built-in encoding and charset arrays an underlying type These arrays store SIDs ("String IDs"), so give them that type now that we have to_array() and it's easy to do. No behavior change.	2024-02-14 06:56:43 +01:00
Nico Weber	ddbcd901d1	LibPDF: Separate Type0 CMap errors No behavior change, just more granular "not implemented" diagnostics.	2024-02-13 19:46:31 +01:00
Nico Weber	8e50bbc9fb	LibPDF: Add string drawing code for Type0Fonts This is very similar to SimpleFont::draw_string() for now, but it'll become a bit different when we add support for vertical text. CIDFontType now only needs to draw single glyphs. Neither of the subclasses can do that yet, so no behavior change yet.	2024-02-13 19:46:18 +01:00
Nico Weber	eaa568210f	LibPDF: Split CCITT errors by group	2024-02-13 19:45:47 +01:00
Nico Weber	c201825cc8	LibPDF: Read CCITT decode params We don't do anything with them yet, so no behavior change.	2024-02-13 19:45:47 +01:00
Nico Weber	454a10774e	LibPDF: Let Filter::handle_lzw_and_flate_parameters() read decode params ...instead of reading them in Filter::decode() for all filters and then passing them around to only the LZW and flate filters. (EarlyChange is LZWDecode-only, so that's read there instead.) No behavior change.	2024-02-13 19:45:47 +01:00
Nico Weber	9875ce0c78	LibPDF: Reorder loops in SampledFunction::evaluate() Previously, we'd loop over the index of the output coordinate, for example for a CMYK->RGB function, we'd loop over RGB. For every output index, we'd then sample the function at the CMYK input point. Now, we sample at CMYK once and return a span for all outputs, since they're stored in contiguous memory. And we then loop over the outputs only to do weighting and mapping to the target range at the end. Reduces the runtime of (cd Tests/LibPDF; \ ../../Build/lagom/bin/BenchmarkPDF --benchmark_repetitions 5) from 235.6±2.3ms to 103.2±3.3ms on my system, and makes SampledFunction::evaluate() more similar to lerp_nd() in TagTypes.h.	2024-02-13 19:45:19 +01:00
Nico Weber	751185cb76	LibPDF: Scale default glyph width by font size and x scale This fixes rendering of commas in 0000941.pdf page 1. The commas use the default width, and without this they show up very large, covering the page. Also, it's nice that the code now looks like the regular case 4 lines further up.	2024-02-12 14:32:04 +00:00
Nico Weber	7ab4e53b99	LibPDF/CFF: Add code for fdselect parsing This is one of the two top dict entries we need for CID-keyed fonts. We don't send any CID-keyed font data into the CFF parser yet, so no behavior change.	2024-02-12 14:05:16 +01:00
Nico Weber	6ebddab448	LibPDF/CFF: Add enum values for CID-keyed font top dict entries No behavior change.	2024-02-12 14:05:16 +01:00
Nico Weber	6df0150671	LibPDF: Add some CIDFontType0C scaffolding No real behavior change. We don't actually load the CFF data yet (blocked on #23136 and some more), and we don't have drawing code yet, and Type0Font::draw_string() doesn't do any drawing yet. But it's a step in the right direction.	2024-02-12 13:59:00 +01:00
Nico Weber	8e7cb11856	LibPDF/CFF: Add enum values for remaining PrivDictOperators No behavior change, except that we now dbgln() if we see a PrivDictOperator we don't know about. (I haven't seen this in practice, but I found this useful while debugging things.)	2024-02-11 14:52:54 +01:00
Nico Weber	a91fecb17e	Revert "LibPDF: Don't over-read in charset formats 1 and 2" This reverts commit `52afa936c4`. No longer necessary after #23122 -- turns out things work better when you do them right. No behavior change.	2024-02-09 16:52:01 +00:00
Nico Weber	9bccb8c8d7	LibPDF: Make CFF::parse_charset() return SIDs ...and do string expansion at the call site. CID-keyed fonts treat the charset as CIDs instead of as SIDs, so having access to the SIDs in numeric form will be useful when we implement support for CID-keyed CFF fonts. No behavior change.	2024-02-09 13:57:23 +01:00
Nico Weber	9750261921	LibPDF: Rename charset to charset_names in CFF parser No behavior change.	2024-02-09 13:57:23 +01:00
Nico Weber	32f601f9a4	LibPDF: Fix small bug from #21452 I implemented CFF charset format 2 in `6f783929dd` with the note "I haven't seen this being used in the wild". Now that I have seen it (0000658.pdf), I can say that this has never worked, despite me claiming "it's easy to implement". But now it works!	2024-02-08 13:48:56 +00:00
Nico Weber	9fc47345ce	LibGfx+LibPDF: Make sample() functions take ReadonlySpan<> ...instead of Vector<>. No behavior (or performance) change.	2024-02-06 08:44:53 +01:00
Nico Weber	92a628c07c	LibPDF: Always treat `/Subtype /Image` as binary data when dumping Sometimes, the "is mostly text" heuristic fails for images. Before: Build/lagom/bin/pdf --render out.png ~/Downloads/0000/0000521.pdf \ --page 10 --dump-contents 2>&1 | wc -l 25709 After: Build/lagom/bin/pdf --render out.png ~/Downloads/0000/0000521.pdf \ --page 10 --dump-contents 2>&1 | wc -l 11376	2024-02-05 21:18:19 -05:00
Nico Weber	f562c470e2	LibGfx+LibPDF: Simpler and faster N-D linear sampling Previously, if we wanted to e.g. do linear interpolation in 2-D, we'd get a sample point like (1.3, 4.4), then get 4 samples around it at (1, 4), (2, 4), (1, 5), (2, 5), then reduce the 4 samples to 2 samples by computing the combined samples `0.3 * f(1, 4) + 0.7 * f(2, 4)` and `0.3 * f(1, 5) + 0.7 * f(2, 5)`, and then 1-D linearly blending between these two samples with the factor 0.4. In the end we'd multiply the first value by 0.3 * 0.4, the second by 0.7 * 0.4, the third by 0.3 * 0.6, and the fourth by 0.7 * 0.6, and then sum them all up. This requires computing and storing 2**N samples, followed by another 2**N iterations to combine the 2**N samples to a single value. (N is in practice either 4 or 3, so 2**N isn't super huge.) Instead, for every sample we can directly compute the product of weights and sum them up directly. This lets us omit the second loop and storing 2**N values, in exchange for doing an additional O(n) work to compute the product. Takes Build/lagom/bin/image --no-output --invert-cmyk \ --assign-color-profile \ Build/lagom/Root/res/icc/Adobe/CMYK/USWebCoatedSWOP.icc \ --convert-to-color-profile serenity-sRGB.icc \ cmyk.jpg from 3.42s to 3.08s on my machine, almost 10% faster (and less code). Here cmyk.jpg is a 2253x3080 cmyk jpeg, and USWebCoatedSWOP.icc is an mft2 profile with input tables with 256 samples and a 9x9x9x9 CLUT. The LibPDF change is covered by TEST_CASE(sampled) in LibPDF.cpp, and the LibGfx change is basically the same change as the one in LibPDF (where the test results don't change) and the output subjectively looks identical. So hopefully this causes indeed no behavior change :^)	2024-02-04 21:49:23 +01:00
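The single-pass weighting described in the commit above can be sketched roughly as follows. This is a Python illustration, not the LibGfx/LibPDF C++ code: `lerp_nd_sketch` and the `sample` callable are hypothetical names, the weights follow the standard linear-interpolation convention (fractional part toward the upper corner), and no clamping at the grid boundary is attempted.

```python
from itertools import product
from math import floor


def lerp_nd_sketch(sample, point):
    """Linearly interpolate `sample` at an N-D `point` in a single pass.

    `sample` is a hypothetical callable that takes a tuple of integer grid
    coordinates and returns a float. `point` is a sequence of floats.
    """
    base = [floor(x) for x in point]
    frac = [x - b for x, b in zip(point, base)]
    total = 0.0
    # Visit all 2**N corners of the grid cell that contains `point`.
    for offsets in product((0, 1), repeat=len(point)):
        weight = 1.0
        # O(N) work per corner: multiply the per-axis weights together.
        for offset, f in zip(offsets, frac):
            weight *= f if offset else 1.0 - f
        corner = tuple(b + o for b, o in zip(base, offsets))
        # Accumulate directly, so no list of 2**N samples is stored.
        total += weight * sample(corner)
    return total
```

For a function that is itself linear in each coordinate, such as `2 * x + 3 * y + 1`, the interpolated value matches the function exactly, which makes for a convenient sanity check.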
Nico Weber	955d73657e	LibPDF: Make `pdf --dump-contents` dump less binary data For pages containing images or embedded fonts, --dump-contents used to dump a ton of binary data. That isn't very useful, so stop doing it. Before: % time Build/lagom/bin/pdf --render out.png \ ~/Downloads/0000/0000711.pdf --dump-contents | wc -l 937972 Now: % time Build/lagom/bin/pdf --render out.png \ ~/Downloads/0000/0000711.pdf --dump-contents | wc -l 6566 Printing 7k lines is also much faster than printing 940k, 0.15s instead of 2s.	2024-02-03 08:26:29 +00:00
Nico Weber	9c762b9650	LibPDF+Meta: Use a CMYK ICC profile to convert CMYK to RGB CMYK data describes which inks a printer should use to print a color. If a screen should display a color that's supposed to look similar to what the printer produces, it results in a color very different to what Color::from_cmyk() produces. (It's also printer-dependent.) There are many ICC profiles describing printing processes. It doesn't matter too much which one we use -- most of them look somewhat similar, and they all look dramatically better than Color::from_cmyk(). This patch adds a function to download a zip file that Adobe offers on their web site. They even have a page for redistribution: https://www.adobe.com/support/downloads/iccprofiles/icc_eula_win_dist.html (That one leads to a broken download though, so this downloads the end-user version.) In case we have to move off this download at some point, there are also a whole bunch of profiles at https://www.color.org/registry/index.xalter that "may be used, embedded, exchanged, and shared without restriction". The adobe zip contains a whole bunch of other useful and fun profiles, so I went with it. For now, this only unzips the USWebCoatedSWOP.icc file though, and installs it in ${CMAKE_BINARY_DIR}/Root/res/icc/Adobe/CMYK/. In Serenity builds, this will make it to /res/icc/Adobe/CMYK in the disk image. And in lagom build, after #23016 this is the lagom res staging directory that tools can install via Core::ResourceImplementation. `pdf` and `MacPDF` already do that, `TestPDF` now does it too. The final piece is that LibPDF then loads the profile from there and uses it for DeviceCMYK color conversions. (Doing file access from the bowels of a library is a bit weird, especially in a system that has sandboxing built in. But LibGfx does that in FontDatabase too already, and LibPDF uses that, so it's not a new problem.)	2024-02-01 13:42:04 -07:00
Nico Weber	f840fb6b4e	LibPDF: Make DeviceCMYKColorSpace::the() fallible No behavior change.	2024-02-01 13:42:04 -07:00
Nico Weber	384c6cf0f9	LibPDF: Tweak vertical position of truetype fonts again See #22821 for a previous attempt. This attempt should settle things once and for all. The opentype render path adjusts by `-font_ascender * -y_scale` in Glyf::Glyph::append_simple_path(), so that's what we need to undo to draw at the font's baseline. (OpenType::Font::metrics() returns ascender scaled by y_scale already, so no need to have the scale here where we undo the shift.) Previously, we called `baseline()` which just returns the font's font size, which is pretty meaningless: https://tonsky.me/blog/font-size/ https://simoncozens.github.io/fonts-and-layout/opentype.html#vertical-metrics-hhea-and-os2 Also, conceptually it makes sense to translate up by the ascender to get from the upper edge of the glyph to the baseline.	2024-02-01 10:05:40 +01:00
Nico Weber	87112dcbdc	LibPDF: Return null for invalid refs, tolerate null objects as outline https://llvm.org/devmtg/2022-11/slides/TechTalk5-WhatDoesItTakeToRunLLVMBuildbots.pdf has an xref table that starts like so: ``` xref 0 214 0000000002 65535 f 0000924663 00000 n 0000000003 00000 f 0000000000 00000 f 0000000016 00000 n 0000000160 00000 n 0000000263 00000 n ``` This is a list of objects in the PDF file. The lines ending with 'f' mean that this object is "free", that is it's not stored in the file. In this file, objects 0, 2, 3 are free. For free objects, the first number is the offset of the next free object: Object 0 refers to object 2, 2 to 3, and 3 back to 0 (since it's the last free object). The lines ending with "n" are actual objects; here the first number is a byte offset to where that object is stored in the file. Furthermore, the file contains ``` /Outlines 2 0 R ``` in its root object, meaning that object 2 stores the page outlines. Since object 2 is set as free, there is no object 2. But the spec says that an invalid object reference is just the null object. This patch makes us return null objects for references to free objects, and it also makes us treat a null object as /Outlines value the same as not having /Outlines in the first place. Fixes #23023 -- we can now open that file. (We don't render it super well, but only for already-known reasons.) Since I found it a bit confusing: XRefTable has two related methods here: 1. has_object() returns if an object was explicitly listed in an xref table. The first number right after `xref` is the start index. So if an xref table were to start with `10`, we'd implicitly create 10 trailing objects for which has_object() would return false 2. is_object_in_use() returns true if an object that was in a table (i.e. one where has_object() returns true) was listed with 'n' and false if it was listed with 'f'. 
DocumentParser::parse_object_with_index() should probably return a null object for the `!has_object()` case as well instead of VERIFY()ing that has_object() is true. But I haven't seen this in the wild yet, so keeping as-is for now.	2024-01-31 12:10:19 -05:00
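The xref rules described in the commit above (free entries form a linked list, and a reference to a free object resolves to null) can be illustrated with a small Python sketch. The function names are hypothetical and this is not LibPDF's parser. As a simplification, the sketch also returns `None` for object numbers that are not listed at all, which the commit notes LibPDF does not do yet.

```python
def parse_xref_entries(start, entry_lines):
    """Map object number -> (first_number, generation, in_use).

    `start` is the first object number of the subsection; each entry line
    looks like '0000000016 00000 n'.
    """
    table = {}
    for i, line in enumerate(entry_lines):
        first, generation, kind = line.split()
        table[start + i] = (int(first), int(generation), kind == "n")
    return table


def resolve(table, object_number):
    """Return the byte offset of an in-use object, or None for the null object."""
    entry = table.get(object_number)
    if entry is None or not entry[2]:
        return None
    return entry[0]


def free_list(table):
    """Follow the chain of free objects starting at object 0."""
    chain = []
    current = 0
    # Stop once the chain loops back to an object we have already seen.
    while current not in chain:
        chain.append(current)
        current = table[current][0]
    return chain


entries = [
    "0000000002 65535 f",
    "0000924663 00000 n",
    "0000000003 00000 f",
    "0000000000 00000 f",
    "0000000016 00000 n",
]
table = parse_xref_entries(0, entries)
```

With the entries quoted in the commit message, `resolve(table, 2)` yields `None`, so `/Outlines 2 0 R` would be treated the same as a missing `/Outlines` key.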
Timothy Flynn	aa0a6d58b2	Userland: Remove LibCore dependency from libraries that do not use it	2024-01-22 08:48:34 -05:00
Nico Weber	a0462f495c	LibPDF+MacPDF: Clip text, and add a debug option for disabling it	2024-01-20 08:56:03 +01:00
Nico Weber	90fdf738a1	LibPDF: Alphabetize clip_ fields in RenderingPreferences No behavior change.	2024-01-20 08:56:03 +01:00
Nico Weber	66f8259a0b	LibPDF: Move ClipRAII to .h file No behavior change.	2024-01-20 08:56:03 +01:00
Tim Ledbetter	459fa8b840	LibPDF: Ensure that xref subsection numbers are u32 Previously, parsing an xref entry with a floating point subsection number would cause a crash.	2024-01-18 15:11:42 +01:00
Nico Weber	d2f3288666	LibPDF: Apply text matrix to each glyph's position We still don't apply it to the glyph itself, so they don't show up scaled or rotated, but they're at the right spot now. One big thing this here has going for it is that the final glyph position is now calculated with just `text_rendering_matrix.map(glyph_position)`. Also, character_spacing and word_spacing are now used unmodified in the SimpleFont::draw_string() loop. This also means we no longer have to undo a scale when updating the position in `Renderer::show_text()`. Most of the rest stays pretty yucky though. The root cause of many problems is that ScaledFont has its rendering size baked into the object. We want to render fonts at size font_size times scale from text matrix times scale from current transformation matrix (but not size from horizontal_scaling). So we have to make that the font_size, but then we have to undo that in a bunch of places to get the actual font size. This will eventually get better when LibPDF moves off ScaledFont.	2024-01-18 14:01:30 +01:00
Nico Weber	f54b0e7c22	LibPDF: Don't accidentally put horizontal_scaling in places Fonts should have size font_size times total scaling. We tried to get that by computing text_rendering_matrix.x_scale() * font_size, but text_rendering_matrix.x_scale() also includes horizontal_scaling, which shouldn't be part of font size. Same for character_spacing and word_spacing. This is all a big mess that's caused by LibPDF using ScaledFont, which requires scaling to be part of the text type. I have an in-progress local branch that moves LibPDF to directly use VectorFont, which will hopefully make this (and other things) nicer. But first, let's get this right, and then make sure we don't regress it when things change :^)	2024-01-18 14:01:30 +01:00
Nico Weber	abda5e66f6	LibPDF: Scale delta_x by horizontal_scaling in Renderer::show_text() While PDFFont::draw_string() already returns a position scaled by horizontal_scaling, the division by text_rendering_matrix.x_scale() (which also contains the scaling factor) undid it. Reapply it. Fixes the horizontal layout of the line "should be the same on all lines: super" in Tests/LibPDF/text.pdf.	2024-01-18 14:01:30 +01:00
Nico Weber	470d1d8dcf	LibPDF: Fix order of parameter, text, and current transform matrix PDF spec 1.7 5.3.3 Text Space Details gives the correct multiplication order: parameters * textmatrix * ctm. We used to do text * ctm * parameters (AffineTransform::multiply() does left-multiplication). This only matters if `text_state().rise` is non-zero. In practice, it's almost always zero, in which case the parameter matrix is a diagonal matrix that commutes. Fixes the horizontal offset of "super" in Tests/LibPDF/text.pdf.	2024-01-18 14:01:30 +01:00
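The multiplication order from the commit above can be tried out with plain 3x3 matrices. The following Python sketch assumes the row-vector convention used by the PDF specification (a point is a row `[x, y, 1]` multiplied on the left of the matrix). The helper names are hypothetical and this is not the `AffineTransform` API.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def parameter_matrix(font_size, horizontal_scaling, rise):
    """Text-state parameter matrix; the rise sits in the translation row."""
    return [[font_size * horizontal_scaling, 0, 0],
            [0, font_size, 0],
            [0, rise, 1]]


def text_rendering_matrix(params, text_matrix, ctm):
    """Spec order: parameters * text matrix * CTM."""
    return matmul(matmul(params, text_matrix), ctm)


def map_point(matrix, x, y):
    """Apply a matrix to the row vector [x, y, 1]."""
    return (x * matrix[0][0] + y * matrix[1][0] + matrix[2][0],
            x * matrix[0][1] + y * matrix[1][1] + matrix[2][1])


params = parameter_matrix(font_size=10, horizontal_scaling=1, rise=5)
text_matrix = [[1, 0, 0], [0, 1, 0], [100, 200, 1]]  # translate by (100, 200)
ctm = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]              # uniform scale by 2

spec_order = text_rendering_matrix(params, text_matrix, ctm)
other_order = matmul(matmul(text_matrix, ctm), params)
```

With a non-zero rise, the glyph-space origin maps to (200, 410) in the spec order, while multiplying in the other order places it somewhere else entirely.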
Nico Weber	6c65c18c40	LibPDF: Add spec ref to Renderer::calculate_text_rendering_matrix()	2024-01-18 14:01:30 +01:00
Nico Weber	13f007aadb	LibPDF: Tweak vertical position of truetype fonts The vertical coordinates for truetype fonts are different somehow. We compensated a bit for that; now we compensate some more. This is still not 100% perfect, but much better than before.	2024-01-17 08:44:07 +00:00
Nico Weber	1845a406ea	LibPDF: Add debug settings for clipping paths and images	2024-01-17 08:42:56 +00:00
Nico Weber	2d8a22f4b4	LibPDF: Clip images too Since we can't clip against a general path yet, this clips images against the bounding box of the current clip path as well. Clips for images are often rectangular, so this works out well. (We wastefully still decode and color-convert the entire image. In a follow-up, we could consider only converting the unclipped part.)	2024-01-17 08:42:56 +00:00
Nico Weber	5615a2691a	LibPDF: Extract activate_clip() / deactivate_clip() functions No behavior change.	2024-01-17 08:42:56 +00:00
MacDue	d55867e563	LibPDF: Fix paths with negatively sized `re` (rect) commands Turns out the width/height in a `re` command can be negative. This results in rectangles with different winding orders. For example, a negative width results in a reversed winding order. Previously, this was lost by passing the rect through an `AffineTransform` before constructing the path. So instead, this constructs the rect path, and then transforms the resulting path.	2024-01-16 21:31:20 +00:00
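The effect of a negative width or height on winding order can be checked with the shoelace formula. This is a Python sketch with hypothetical helper names, not the LibPDF path code. It builds the four corners in the order an `re` operator visits them and reports the signed area, whose sign encodes the winding direction.

```python
def rect_path(x, y, width, height):
    """Corners in the order `x y width height re` visits them."""
    return [(x, y), (x + width, y), (x + width, y + height), (x, y + height)]


def signed_area(points):
    """Shoelace formula: positive for counterclockwise in a y-up space."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        area += x0 * y1 - x1 * y0
    return area / 2
```

A rectangle type that normalizes to a positive width and height before the path is built would produce the same corners for `10 x 5` and `-10 x 5`, which is how the winding information gets lost.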
Nico Weber	0e91682283	LibPDF: Be more forgiving about trailing image data The predictor code assumed that all stream data is image data (...which would make sense: trailing data there is wasted space). But some PDFs have trailing data there, e.g. 0000257.pdf, so be forgiving about it.	2024-01-16 09:55:11 -05:00
Nico Weber	b34509edd2	LibPDF: Make `pdf --dump-contents` handle \r line endings better Previously, all page contents ended up overprinting a single line over and over for PDFs that used only `\r` as line ending. This is for example useful for 0000364.pdf.	2024-01-15 23:16:45 -07:00
Nico Weber	9f9dbb325b	LibPDF: Make prediction filters error on user-controlled alloc OOM	2024-01-15 23:06:06 -07:00
Nico Weber	93f5420282	LibPDF: Start implementing the TIFF predictor This codepath is separate from the predictor in the TIFF decoder. The TIFF decoder currently does bits->Color conversion before processing the predictor. That doesn't fit the PDF model where filters are processed before converting streams into bitmaps. If this code here ever grows to handle all cases, maybe we can move it over to the TIFF decoder and then make it do predictions before decoding to colors, to share this code. (TIFF prediction is pretty messy since it's bits-per-pixel-dependent. PNG prediction is always byte-based, which makes things easier.)	2024-01-15 23:06:06 -07:00
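For the simplest case, 8 bits per component, undoing TIFF horizontal differencing amounts to a running sum along each row. The sketch below is a Python illustration under that assumption. `undo_tiff_predictor_8bpc` is a hypothetical name, other bit depths are not handled, and bytes after the last complete row are passed through unchanged.

```python
def undo_tiff_predictor_8bpc(data, columns, colors):
    """Reverse TIFF predictor 2 for 8-bit components.

    Each component was stored as the difference from the same component of
    the pixel to its left, so decoding adds it back modulo 256.
    """
    row_bytes = columns * colors
    out = bytearray(data)
    # Only walk complete rows; any trailing bytes are left as they are.
    for row_start in range(0, len(out) - row_bytes + 1, row_bytes):
        # The first pixel of each row is stored verbatim.
        for i in range(row_start + colors, row_start + row_bytes):
            out[i] = (out[i] + out[i - colors]) & 0xFF
    return bytes(out)
```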
Nico Weber	9a93f677f4	LibPDF: Mark text rendering matrix as dirty after TJ numbers Mostly because I audited all places that assigned to `m_text_matrix` after #22760. This one is very difficult to trigger in practice. `show_text()` marks the text rendering matrix dirty already, so this only has an effect if the `TJ` array starts with a number, and the matrix isn't marked dirty going in. `Tm` caches the text rendering matrix, so I changed text.pdf to contain: ``` 1 0 0 1 45 130 Tm [ 200 (Hello) -2000 (World) ] TJ T* ``` This first sets an x offset of 5 (on top of the normal 40), and then undoes it (`200` is multiplied by font size (25) / -1000, and `200 * 25 / -1000` is -5). Before this change, the topmost "Hello World" ended up slightly indented. Likely no behavior change in practice, but makes the code easier to understand, and maybe it helps in the wild somewhere.	2024-01-15 08:39:04 +00:00
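The arithmetic in the commit above is small enough to check directly. This Python sketch uses a hypothetical helper, `tj_number_offset`, and treats horizontal scaling as a plain factor that defaults to 1.

```python
def tj_number_offset(number, font_size, horizontal_scaling=1.0):
    """Horizontal shift caused by a number inside a TJ array.

    Numbers are in thousandths of text space units and move the pen in the
    negative x direction.
    """
    return number * font_size / -1000 * horizontal_scaling


# `Tm` places the text at x = 45; the leading 200 shifts it back to 40.
start_x = 45 + tj_number_offset(200, font_size=25)
```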
Nico Weber	f23f5dcd62	LibPDF: Mark text rendering matrix dirty for Td operator 0000342.pdf page 5 contains this snippet: ``` /T1_1 10.976 Tf 0 -31.643 TD (This)Tj 1 0 0 1 54 745.563 Tm 22.181 -31.643 Td [(vehicle)-270.926(uses)... ``` The `Tm` marked the text rendering matrix as dirty at the start, but it then calls calculate_text_rendering_matrix() almost in the next line, which recalculates the text rendering matrix and caches the new matrix. The `Td` used to not mark it as dirty, and we'd draw "vehicle" with an incorrect matrix.	2024-01-15 08:37:55 +00:00
Nico Weber	f4ee9a2333	LibPDF: Support drawing images with 16 bits per channel This uses the tried-and-true "throw away the lower 8 bits" technique for now. This lets us render Tests/LibPDF/wide-gamut-only.pdf.	2024-01-12 16:20:46 -07:00
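Dropping the lower 8 bits of a 16-bit sample can be sketched in one line. This Python illustration assumes samples are stored high byte first, and the function name is hypothetical; it is not the LibPDF image code.

```python
def drop_low_bytes(data):
    """Keep only the high byte of each big-endian 16-bit sample."""
    return bytes(data[0::2])
```

This is equivalent to shifting each 16-bit value right by 8, so `0x1234` becomes `0x12`.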
Nico Weber	5f85aff036	LibPDF: Move ColorSpace::style() to take ReadonlySpan<float> All ColorSpace subclasses converted to float anyways, and this allows us to save lots of float->Value->float conversions during image color space processing. A bit faster: ``` N Min Max Median Avg Stddev x 50 0.99054313 1.0412271 0.99933481 1.0052408 0.012931916 + 50 0.97073889 1.0075941 0.97849107 0.98184034 0.0090329046 Difference at 95.0% confidence -0.0234004 +/- 0.00442595 -2.32785% +/- 0.440287% (Student's t, pooled s = 0.0111541) ```	2024-01-12 12:37:56 +00:00
Nico Weber	56a4af8d03	LibPDF: Don't reallocate Vectors in ICCBasedColorSpace all the time Microoptimization; according to ministat a bit faster: ``` N Min Max Median Avg Stddev x 50 1.0179932 1.0561159 1.0315337 1.0333617 0.0094757426 + 50 1.000875 1.0427601 1.0208509 1.0201902 0.01066116 Difference at 95.0% confidence -0.0131715 +/- 0.00400208 -1.27463% +/- 0.387287% (Student's t, pooled s = 0.0100859) ```	2024-01-12 12:37:56 +00:00
Nico Weber	cfd05b1a55	LibPDF: Use MatrixMatrixConversion when possible Reduces time spent rendering page 3 of 0000849.pdf from 1.32s to 1.13s on my machine. Also reduces the time to run Meta/test_pdf.py on 0000.zip (without 0000849.pdf) from 56s to 54s.	2024-01-12 09:09:56 +01:00
Nico Weber	c161b2d2f9	LibPDF: Extract ICCBasedColorSpace::sRGB() helper	2024-01-12 09:09:56 +01:00

598 commits