serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-28 17:42:06 +00:00

Author	SHA1	Message	Date
Nico Weber	83128d093e	LibPDF: Implement most of the spec algorithm for picking TrueType glyphs Non-CID-keyed fonts in PDFs have 8-bit codepoints which are mapped from bytes to character names via encoding. TrueType fonts don't index glyphs by name (Type1 fonts do), so the fix (codified in the spec) was to make a list of all possible glyph names and map those to (16-bit) unicode values, and then pass those into the truetype cmap. (As a fallback, we're supposed to look at the optional names in the font's "post" table. That part isn't implemented here yet.) (Note that this affects the behavior of fallback fonts for TrueType fonts, but not yet fallback fonts for Type1 fonts, and neither the behavior of the 14 built-in Type1 fonts (which we implement as fallback fonts), since the TrueType fallback in Type1Font.cpp does not use this algorithm yet. This will be fixed in a future patch.)	2024-02-25 15:15:20 +01:00
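The lookup chain described in 83128d093e (byte, then glyph name via the encoding, then a 16-bit Unicode value, then the TrueType cmap) can be sketched as below. This is only an illustration under stated assumptions: the `*Sketch` type aliases and `pick_truetype_glyph_sketch()` are hypothetical names, the tables are plain `std::map`s rather than LibPDF's real data structures, and the "post" table fallback is left out, as it is in the commit.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins for the three tables involved; not LibPDF types.
using EncodingSketch = std::map<uint8_t, std::string>;   // byte -> glyph name
using GlyphListSketch = std::map<std::string, uint16_t>; // glyph name -> 16-bit Unicode value
using CmapSketch = std::map<uint16_t, uint32_t>;         // Unicode value -> glyph ID

// Resolves one 8-bit code to a glyph ID, or nullopt if any step has no entry.
std::optional<uint32_t> pick_truetype_glyph_sketch(uint8_t byte,
    EncodingSketch const& encoding,
    GlyphListSketch const& glyph_list,
    CmapSketch const& cmap)
{
    auto name = encoding.find(byte);
    if (name == encoding.end())
        return std::nullopt;

    auto code_point = glyph_list.find(name->second);
    if (code_point == glyph_list.end())
        return std::nullopt;

    auto glyph_id = cmap.find(code_point->second);
    if (glyph_id == cmap.end())
        return std::nullopt;

    return glyph_id->second;
}
```

With an encoding that maps byte 0xE4 to "adieresis", a glyph list that maps "adieresis" to U+00E4, and a cmap entry for U+00E4, the function returns that cmap entry's glyph ID; a miss at any stage yields an empty optional.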
Nico Weber	207717982c	LibPDF: Read /Flags off font descriptors	2024-02-25 15:15:20 +01:00
Lucas CHOLLET	cb03ab4a5a	LibPDF: Handle the BlackIs1 parameter of the CCITTFaxDecode Filter	2024-02-24 16:24:45 -07:00
Lucas CHOLLET	6b3bab5c8a	LibPDF: Plug in the CCITTFaxDecode filter to our CCITT decoder We only call the decoder for Group 4 images. We do support Group 3 images, but let's wait to find a PDF with these before adding support.	2024-02-24 16:24:45 -07:00
Nico Weber	b258ba2767	LibPDF: Use decode_hex_digit() more For `:#xx` in names, we now also handle lower-case hex digits. The spec is silent on the case of these hex digits. Our previous check (isxdigit(), and now is_ascii_hex_digit()) lets through lower-case hex digits, so it seems better to handle them rather than computing e.g. `'a' - 'A' + 10` (== 42 -- off by 32!). I don't know if this has any visible effect on any files, but it's more correct, and less code, and the code looks more like the code in Filter::decode_ascii_hex().	2024-02-23 12:11:25 -05:00
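The off-by-32 mentioned in b258ba2767 is easy to reproduce. The sketch below is a hypothetical stand-in for a `decode_hex_digit()`-style helper (it is not the AK implementation, and the `_sketch` names are made up for illustration); it handles both cases explicitly, so a lower-case digit is never run through the upper-case formula.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: maps one ASCII hex digit (either case) to its value.
constexpr std::optional<uint8_t> decode_hex_digit_sketch(char c)
{
    if (c >= '0' && c <= '9')
        return static_cast<uint8_t>(c - '0');
    if (c >= 'A' && c <= 'F')
        return static_cast<uint8_t>(c - 'A' + 10);
    if (c >= 'a' && c <= 'f')
        return static_cast<uint8_t>(c - 'a' + 10);
    return std::nullopt;
}

// Hypothetical helper: decodes the two hex digits of a #xx escape in a PDF name.
constexpr std::optional<uint8_t> decode_name_escape_sketch(char high, char low)
{
    auto h = decode_hex_digit_sketch(high);
    auto l = decode_hex_digit_sketch(low);
    if (!h.has_value() || !l.has_value())
        return std::nullopt;
    return static_cast<uint8_t>((*h << 4) | *l);
}

// The failure mode from the commit message: the upper-case formula applied to 'a'.
static_assert('a' - 'A' + 10 == 42, "off by 32 from the intended value 10");
```

Here `decode_name_escape_sketch('2', 'f')` and `decode_name_escape_sketch('2', 'F')` both yield 0x2F, while a non-hex character yields an empty optional.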
Nico Weber	783b1d1c11	LibPDF: Use is_ascii_hex_digit() instead of isxdigit() See description of #7684 for motivation. Also, makes this code look more like the hex code in Filter::decode_ascii_hex(). No behavior change.	2024-02-23 12:11:25 -05:00
Nico Weber	c9234f35f1	LibPDF/CFF: Clear stack after "endchar" commands Both type 1 and type 2 spec tell us to do this. I haven't observed a difference from this, but I noticed it in the spec while I was touching this code. Probably good to do what the spec tells us to do.	2024-02-22 06:59:28 +01:00
Nico Weber	020c00ede2	LibPDF/CFF: Use offset in accented_character() data Without this, the dieresis above an a is all the way to the left instead of over the letter.	2024-02-22 06:59:28 +01:00
Nico Weber	12859dfde5	LibPDF/CFF: Treat endchar in type 2 as type 2 "seac" when requested With this, a character can be defined that uses two existing glyphs. This is useful for umlauts and the like, which then just need to reference e.g. the glyphs named "a" and "dieresis" and provide a translation. Makes umlauts appear on some PDFs using CFF type2 data in Type 1 fonts.	2024-02-22 06:59:28 +01:00
Nico Weber	cade76d240	LibPDF+LibGfx: Do not try to read "OS/2" table for PDFs It is sometimes truncated in fonts embedded in PDFs, and the data is not needed to render PDFs. 2 of my 1000 test PDFs used to complain "Could not load OS2 v1: Not enough data" and 1 "Could not load OS2 v2: Not enough data" before. Increases number of PDFs that render without diagnostics from 764 to 765 (and decreases the number of distinct error messages from 27 to 25).	2024-02-21 13:38:33 +01:00
Nico Weber	0dee94ef40	LibPDF+LibGfx: Do not try to read "hmtx" table for PDFs It is sometimes truncated in fonts embedded in PDFs, and the data is not needed to render PDFs. 26 of my 1000 test files complained "Could not load Hmtx: Not enough data" before. Increases number of PDFs that render without diagnostics from 743 to 764.	2024-02-21 13:38:33 +01:00
Nico Weber	5efe80af7f	LibPDF+LibGfx: Do not try to read "name" table for PDFs It is often missing in fonts embedded in PDFs. 75 of my 1000 test files complained "Font is missing Name" when trying to read fonts before. Increases number of PDFs that render without diagnostics from 682 to 743.	2024-02-21 13:38:33 +01:00
Nico Weber	41eca52b50	LibGfx/OpenType: Tweak Font::try_load_from_externally_owned_memory() It now takes an Options object instead of passing several default parameters. No behavior change.	2024-02-21 13:38:33 +01:00
Nico Weber	3b616b6af8	LibPDF: Use original error for failing ICC load	2024-02-21 13:37:08 +01:00
Nico Weber	fa95e5ec0e	LibPDF: Fix line drawing when line_width is 0 We used to skip lines with width 0. The correct behavior per spec is to draw them one pixel wide instead.	2024-02-21 10:30:57 +01:00
Nico Weber	1cb450e9a3	LibPDF: Give CFF Glyph 0 the name .notdef This is required by the CFF spec, and is consistent with what we do for the encoding 24 lines down. As far as I can tell, nothing in `Type1FontProgram::rasterize_glyph()` or in Type1Font.cpp implements the "If an encoding maps to a character name that does not exist in the Type 1 font program, the .notdef glyph is substituted." line from the PDF 1.7 spec (in 5.5.5 Character Encoding, Encodings for Type 1 Fonts) yet, so this does not yet have an effect.	2024-02-20 06:54:50 -05:00
Nico Weber	05a7482118	LibPDF/CFF: Add dbgln() when failing encoding bounds check	2024-02-20 08:43:10 +00:00
Nico Weber	4705d38fa7	LibPDF/CFF: Fix off-by-one when reading internal encoding We use `i - 1` to index these arrays, so that's what we should use for the bounds check as well.	2024-02-20 08:43:10 +00:00
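The point of 4705d38fa7 is that the bounds check has to test the same expression that is used as the index. A minimal sketch, assuming a one-based code `i` that is looked up at `i - 1` (the function name and the handling of `i == 0` are assumptions for illustration, not taken from the CFF parser):

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical one-based lookup: valid values of i are 1..N.
template<typename T, std::size_t N>
std::optional<T> lookup_one_based_sketch(std::array<T, N> const& table, std::size_t i)
{
    // Check i - 1, because i - 1 is what indexes the array.
    // Checking i itself would wrongly reject i == N, the last valid entry.
    if (i == 0 || i - 1 >= N)
        return std::nullopt;
    return table[i - 1];
}
```

For a three-element table, `i == 3` is accepted and returns the last element, while `i == 0` and `i == 4` are rejected.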
Nico Weber	012f6d46e7	LibPDF: Implement stream CIDToGIDMaps for Type0 CIDFontType2 fonts Of my 1000 test files, 73 have stream Type0 truetype fonts with stream CIDToGIDMaps. This makes that work. (With this patch, the number of files in my 1000 test files complaining "Font is missing Name" increases from 41 to 75, so a bit under half of the fonts using stream CIDToGIDMaps also have no 'name' table. So that's next.) Increases files without issues from 652 to 681.	2024-02-18 15:43:33 -05:00
Nico Weber	dde11e1757	LibPDF: Ignore unknown CFF operators https://adobe-type-tools.github.io/font-tech-notes/pdfs/5177.Type2.pdf says "The behavior of undefined operators is unspecified." but https://learn.microsoft.com/en-us/typography/opentype/spec/cff2 says "When an unrecognized operator is encountered, it is ignored and the stack is cleared." Some type 0 CIDFontType0C fonts (i.e. CID-keyed non-OpenType CFF fonts) depend on the latter, even though they're governed by the former spec. Fixes rendering of text in 0000521.pdf (e.g. page 10 or 5). The font there has a bunch of 0 opcodes for some reason.	2024-02-18 08:40:04 +00:00
Nico Weber	05f382fc6e	LibPDF: Add CIDFontType2::set_font_size() See #20084 commit 4. This does the same for truetype-based type0 fonts. Fixes font sizes on e.g. 1800-2017.pdf.	2024-02-17 16:08:48 +01:00
Nico Weber	f4a59246f5	LibPDF: Implement initial support for Type0 truetype fonts Disclaimers, similar to what's on #23202 (and most of the prerequisites mentioned there are needed for this too): * Only supports the `Identity-H` type0 cmap at the moment * Doesn't support vertical text yet * Only supports the `Identity` CIDToGIDMap at the moment (this one is a truetype-only thing)	2024-02-17 16:08:48 +01:00
Nico Weber	bd74447dba	LibPDF: Initial support for drawing CFF-based Type0 fonts Together with the already-merged #23122, #23128, #23135, #23136, #23162, and #23167, #23179, #23190, #23194 this adds initial support for rendering some CFF-based Type0 fonts :^) There's a long list of things that still need improving after this: * A small number of CFF programs contain the charstring command 0, which is invalid. Currently, this makes us reject the whole font. * Type1FontProgram::rasterize_glyph() is name-based. For CID-based fonts, we want a version that takes CIDs (character IDs) instead. For now, I'm printing the CID to a string and using that, yuck. (I looked into doing this nicely. I do want to do that, but I need to read up on how the `seac` type1 charstring command uses character names to identify parts of an accented character. Also, it looks like `seac`'s accented character handling moved over to `endchar` in type2 charstring commands (i.e. in CFF data), and it looks like we don't implement that at all. So I need to do more reading first, and I didn't want to block this on that.) * The name for the first string in name-based CFF fonts looks wrong; added a FIXME for that for now. * This supports the named Identity-H cmap only for now. Identity-H maps UTF16-BE values to glyph IDs with the identity function, and assumes it's horizontal text. Other named cmaps in my test files are UniJIS-UCS2-H, UniCNS-UCS2-H, Identity-V, UniGB-UCS2-H, UniKS-UCS2-H. (There are also 2 files using the stream-based cmaps instead of the name-based ones.) * In particular, we can't draw vertical text (`-V`) yet * Passing in the encoding to CFF::create() is awkward (it's nullptr for CID-keyed fonts), and it's also not necessary since `Type1Font::draw_glyph()` already does the "take encoding from PDF, and only from font if the PDF doesn't store one" dance. * This doesn't cache glyphs but re-rasterizes them each time. Easy to add, but maybe I want to look at rotation first. And things don't feel glacial as-is. * Type0Font::draw_glyph() is pretty similar to second half of Type1Font::draw_glyph()	2024-02-16 12:41:10 -05:00
Nico Weber	c9d48bbca4	LibPDF/CFF: Add a comment to CFF::parse_charset()	2024-02-16 12:41:10 -05:00
Nico Weber	5c8778a161	LibPDF/CFF: Compute per-glyph glyph width in CID-keyed fonts Make TopDict's defaultWidthX and nominalWidthX Optional<>s so that we can check if they're set per fdselect-selected font dict, and if so use the value from there in CID-keyed fonts. Otherwise, keep using the value in the top dict.	2024-02-16 12:41:10 -05:00
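The fallback rule in 5c8778a161 (use the value from the fdselect-selected font dict if it is set, otherwise the top dict's value) could look roughly like this. The struct and function names are hypothetical, the layout is simplified to one fdselect byte per glyph, and falling back to 0 when neither dict sets a value is an assumption based on the CFF defaults for defaultWidthX and nominalWidthX.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical dict shape: both widths are optional so "not set" is detectable.
struct WidthDictSketch {
    std::optional<float> default_width_x;
    std::optional<float> nominal_width_x;
};

struct GlyphWidthsSketch {
    float default_width_x;
    float nominal_width_x;
};

// Prefer the per-font-dict value, then the top dict value, then 0.
static float resolve_width_sketch(std::optional<float> per_font_dict, std::optional<float> top_level)
{
    if (per_font_dict.has_value())
        return *per_font_dict;
    return top_level.value_or(0.0f);
}

GlyphWidthsSketch widths_for_glyph_sketch(WidthDictSketch const& top_dict,
    std::vector<WidthDictSketch> const& fd_array,
    std::vector<uint8_t> const& fd_select,
    std::size_t glyph_id)
{
    // fd_select maps a glyph ID to an index into fd_array.
    WidthDictSketch selected; // stays empty if the glyph has no usable font dict
    if (glyph_id < fd_select.size() && fd_select[glyph_id] < fd_array.size())
        selected = fd_array[fd_select[glyph_id]];

    return {
        resolve_width_sketch(selected.default_width_x, top_dict.default_width_x),
        resolve_width_sketch(selected.nominal_width_x, top_dict.nominal_width_x),
    };
}
```

Because each width is resolved independently, a font dict that sets only defaultWidthX still inherits nominalWidthX from the top dict.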
Nico Weber	1d1e406b3a	LibPDF/CFF: Implement some special handling for CID-keyed fonts * FDArray, FDSelect must be present * Encoding must not be present * Charset maps from GID (Glyph ID) to CID (Character ID), instead of to character name	2024-02-15 12:32:31 +01:00
Nico Weber	7494f24430	LibPDF/CFF: Store if a font program is CID-keyed ...and reject CID-keyed font programs for Type1 fonts.	2024-02-15 12:32:31 +01:00
Nico Weber	bb7d29d007	LibPDF/CFF: Read font dicts pointed to by the fdarray offset The fdselect array (that we already read) maps each glyph ID to an fdarray index. The font dict at that index then stores information for that glyph. In practice, this is used to assign different defaultWidthX / nominalWidthX values to blocks of glyphs in CID-keyed fonts. We don't do anything yet with the data, and we also don't send data of CID-keyed CFFs into this parser either, so no behavior change.	2024-02-15 12:32:31 +01:00
Nico Weber	524a4f6256	LibPDF/CFF: Make parse_top_dict() return all top dicts This happens for CFFs that contain multiple fonts. This doesn't happen in practice, but the same code will be used for fdarray parsing, which will contain several dicts. No behavior change.	2024-02-15 12:32:31 +01:00
Nico Weber	9f1cf8babc	LibPDF/CFF: Extract parse_top_dict() function Pure code move, no behavior change.	2024-02-15 12:32:31 +01:00
Nico Weber	eb4632e08a	LibPDF: Give CFF built-in encoding and charset arrays an underlying type These arrays store SIDs ("String IDs"), so give them that type now that we have to_array() and it's easy to do. No behavior change.	2024-02-14 06:56:43 +01:00
Nico Weber	ddbcd901d1	LibPDF: Separate Type0 CMap errors No behavior change, just more granular "not implemented" diagnostics.	2024-02-13 19:46:31 +01:00
Nico Weber	8e50bbc9fb	LibPDF: Add string drawing code for Type0Fonts This is very similar to SimpleFont::draw_string() for now, but it'll become a bit different when we add support for vertical text. CIDFontType now only needs to draw single glyphs. Neither of the subclasses can do that yet, so no behavior change yet.	2024-02-13 19:46:18 +01:00
Nico Weber	eaa568210f	LibPDF: Split CCITT errors by group	2024-02-13 19:45:47 +01:00
Nico Weber	c201825cc8	LibPDF: Read CCITT decode params We don't do anything with them yet, so no behavior change.	2024-02-13 19:45:47 +01:00
Nico Weber	454a10774e	LibPDF: Let Filter::handle_lzw_and_flate_parameters() read decode params ...instead of reading them in Filter::decode() for all filters and then passing them around to only the LZW and flate filters. (EarlyChange is LZWDecode-only, so that's read there instead.) No behavior change.	2024-02-13 19:45:47 +01:00
Nico Weber	9875ce0c78	LibPDF: Reorder loops in SampledFunction::evaluate() Previously, we'd loop over the index of the output coordinate, for example for a CMYK->RGB function, we'd loop over RGB. For every output index, we'd then sample the function at the CMYK input point. Now, we sample at CMYK once and return a span for all outputs, since they're stored in contiguous memory. And we then loop over the outputs only to do weighting and mapping to the target range at the end. Reduces the runtime of (cd Tests/LibPDF; \ ../../Build/lagom/bin/BenchmarkPDF --benchmark_repetitions 5) from 235.6±2.3ms to 103.2±3.3ms on my system, and makes SampledFunction::evaluate() more similar to lerp_nd() in TagTypes.h.	2024-02-13 19:45:19 +01:00
Nico Weber	751185cb76	LibPDF: Scale default glyph width by font size and x scale This fixes rendering of commas in 0000941.pdf page 1. The commas use the default width, and without this they show up very large, covering the page. Also, it's nice that the code now looks like the regular case 4 lines further up.	2024-02-12 14:32:04 +00:00
Nico Weber	7ab4e53b99	LibPDF/CFF: Add code for fdselect parsing This is one of the two top dict entries we need for CID-keyed fonts. We don't send any CID-keyed font data into the CFF parser yet, so no behavior change.	2024-02-12 14:05:16 +01:00
Nico Weber	6ebddab448	LibPDF/CFF: Add enum values for CID-keyed font top dict entries No behavior change.	2024-02-12 14:05:16 +01:00
Nico Weber	6df0150671	LibPDF: Add some CIDFontType0C scaffolding No real behavior change. We don't actually load the CFF data yet (blocked on #23136 and some more), and we don't have drawing code yet, and Type0Font::draw_string() doesn't do any drawing yet. But it's a step in the right direction.	2024-02-12 13:59:00 +01:00
Nico Weber	8e7cb11856	LibPDF/CFF: Add enum values for remaining PrivDictOperators No behavior change, except that we now dbgln() if we see a PrivDictOperator we don't know about. (I haven't seen this in practice, but I found this useful while debugging things.)	2024-02-11 14:52:54 +01:00
Nico Weber	a91fecb17e	Revert "LibPDF: Don't over-read in charset formats 1 and 2" This reverts commit `52afa936c4`. No longer necessary after #23122 -- turns out things work better when you do them right. No behavior change.	2024-02-09 16:52:01 +00:00
Nico Weber	9bccb8c8d7	LibPDF: Make CFF::parse_charset() return SIDs ...and do string expansion at the call site. CID-keyed fonts treat the charset as CIDs instead of as SIDs, so having access to the SIDs in numeric form will be useful when we implement support for CID-keyed CFF fonts. No behavior change.	2024-02-09 13:57:23 +01:00
Nico Weber	9750261921	LibPDF: Rename charset to charset_names in CFF parser No behavior change.	2024-02-09 13:57:23 +01:00
Nico Weber	32f601f9a4	LibPDF: Fix small bug from #21452 I implemented CFF charset format 2 in `6f783929dd` with the note "I haven't seen this being used in the wild". Now that I have seen it (0000658.pdf), I can say that this has never worked, despite me claiming "it's easy to implement". But now it works!	2024-02-08 13:48:56 +00:00
Nico Weber	9fc47345ce	LibGfx+LibPDF: Make sample() functions take ReadonlySpan<> ...instead of Vector<>. No behavior (or performance) change.	2024-02-06 08:44:53 +01:00
Nico Weber	92a628c07c	LibPDF: Always treat `/Subtype /Image` as binary data when dumping Sometimes, the "is mostly text" heuristic fails for images. Before: Build/lagom/bin/pdf --render out.png ~/Downloads/0000/0000521.pdf \ --page 10 --dump-contents 2>&1 | wc -l 25709 After: Build/lagom/bin/pdf --render out.png ~/Downloads/0000/0000521.pdf \ --page 10 --dump-contents 2>&1 | wc -l 11376	2024-02-05 21:18:19 -05:00
Nico Weber	f562c470e2	LibGfx+LibPDF: Simpler and faster N-D linear sampling Previously, if we wanted to e.g. do linear interpolation in 2-D, we'd get a sample point like (1.3, 4.4), then get 4 samples around it at (1, 4), (2, 4), (1, 5), (2, 5), then reduce the 4 samples to 2 samples by computing the combined samples `0.3 * f(1, 4) + 0.7 * f(2, 4)` and `0.3 * f(1, 5) + 0.7 * f(2, 5)`, and then 1-D linearly blending between these two samples with the factor 0.4. In the end we'd multiply the first value by 0.3 * 0.4, the second by 0.7 * 0.4, the third by 0.3 * 0.6, and the fourth by 0.7 * 0.6, and then sum them all up. This requires computing and storing 2**N samples, followed by another 2**N iterations to combine the 2**N samples to a single value. (N is in practice either 4 or 3, so 2**N isn't super huge.) Instead, for every sample we can directly compute the product of weights and sum them up directly. This lets us omit the second loop and storing 2**N values, in exchange for doing an additional O(n) work to compute the product. Takes Build/lagom/bin/image --no-output --invert-cmyk \ --assign-color-profile \ Build/lagom/Root/res/icc/Adobe/CMYK/USWebCoatedSWOP.icc \ --convert-to-color-profile serenity-sRGB.icc \ cmyk.jpg from 3.42s to 3.08s on my machine, almost 10% faster (and less code). Here cmyk.jpg is a 2253x3080 cmyk jpeg, and USWebCoatedSWOP.icc is an mft2 profile with input tables with 256 samples and a 9x9x9x9 CLUT. The LibPDF change is covered by TEST_CASE(sampled) in LibPDF.cpp, and the LibGfx change is basically the same change as the one in LibPDF (where the test results don't change) and the output subjectively looks identical. So hopefully this causes indeed no behavior change :^)	2024-02-04 21:49:23 +01:00
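The single-pass approach from f562c470e2 (visit each of the 2**N surrounding samples once, multiply it by the product of its per-axis weights, and accumulate) can be sketched as follows. This is not the `lerp_nd()` from TagTypes.h: the function name, the `std::function` sampler, and the use of `double` are assumptions for illustration. The sketch uses the usual convention that the lower neighbour on an axis gets weight `1 - fraction` and the upper neighbour gets `fraction`.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical N-D multilinear interpolation over an integer grid.
// `sample` returns the grid value at an integer coordinate; a caller backed by
// a real table would need to clamp coordinates at the table's upper edge.
double lerp_nd_sketch(std::vector<double> const& point,
    std::function<double(std::vector<int> const&)> const& sample)
{
    std::size_t const n = point.size();
    std::vector<int> base(n);
    std::vector<double> fraction(n);
    for (std::size_t i = 0; i < n; ++i) {
        double const lower = std::floor(point[i]);
        base[i] = static_cast<int>(lower);
        fraction[i] = point[i] - lower;
    }

    double result = 0.0;
    std::vector<int> corner(n);
    // Bit i of `mask` selects the lower (0) or upper (1) neighbour on axis i.
    for (std::size_t mask = 0; mask < (std::size_t { 1 } << n); ++mask) {
        double weight = 1.0;
        for (std::size_t i = 0; i < n; ++i) {
            bool const upper = ((mask >> i) & 1) != 0;
            corner[i] = base[i] + (upper ? 1 : 0);
            weight *= upper ? fraction[i] : 1.0 - fraction[i];
        }
        // One multiply-add per corner; no intermediate array of 2**N samples.
        result += weight * sample(corner);
    }
    return result;
}
```

The inner loop is the extra O(N) work per sample that the commit message mentions; in exchange, nothing but the running sum has to be stored.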
Nico Weber	955d73657e	LibPDF: Make `pdf --dump-contents` dump less binary data For pages containing images or embedded fonts, --dump-contents used to dump a ton of binary data. That isn't very useful, so stop doing it. Before: % time Build/lagom/bin/pdf --render out.png \ ~/Downloads/0000/0000711.pdf --dump-contents | wc -l 937972 Now: % time Build/lagom/bin/pdf --render out.png \ ~/Downloads/0000/0000711.pdf --dump-contents | wc -l 6566 Printing 7k lines is also much faster than printing 940k, 0.15s instead of 2s.	2024-02-03 08:26:29 +00:00

628 commits