serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-31 07:02:44 +00:00

Author	SHA1	Message	Date
Nico Weber	98e272ce15	LibPDF: Silently ignore BX / EX operators See the added comment for reasoning.	2024-03-02 17:43:53 -05:00
Nico Weber	9e502dcfe4	LibPDF: Honor writing mode in TJ operator as well	2024-03-02 12:25:09 +01:00
Nico Weber	c69797fda9	LibPDF: Implement support for vertical text for Type0 For Identity-V only for now.	2024-03-02 12:25:09 +01:00
Nico Weber	6348a857ea	LibPDF: Prepare for more encodings than just Identity-H in Type0 code Introduces CIDIterator, an iterator type for iterating over CIDs. Also introduces Type0CMap which can return a CIDIterator given some bytes. The existing code of treating the bytes as an identity map of big-endian u16s is now implemented in IdentityType0CMap. No behavior change.	2024-03-02 12:25:09 +01:00
Nico Weber	b9a4689af3	LibPDF: In Type0Font, read metrics /DW2 and /W2 for vertical text Not used for anything yet.	2024-03-02 12:25:09 +01:00
Nico Weber	ef5d7b685d	LibPDF: In Type0::initialize(), move variable increment next to cause No behavior change.	2024-03-02 12:25:09 +01:00
Nico Weber	fc9b2440bd	LibPDF: Add some spec comments in Type0Font::initialize()	2024-03-02 12:25:09 +01:00
Nico Weber	004e47df88	LibPDF: Remove minor duplication in Renderer::text_show_string_array() This "regressed" in #10080 (back then, the branches were smaller). No behavior change.	2024-03-02 12:25:09 +01:00
Hendiadyoin1	773a280bdf	LibPDF: Use a struct for the subsection in parse_xref_stream	2024-03-01 14:05:53 -07:00
Hendiadyoin1	fe0fde2154	Userland+Tests: Remove unused <AK/Tuple.h> includes	2024-03-01 14:05:53 -07:00
Nico Weber	c3980eda9e	LibPDF: Give Type0 CIDFontType2 a ScaledFont instead of a Font ...with the same reasoning as the previous commit. No behavior change.	2024-03-01 17:56:59 +01:00
Nico Weber	f374ad50a1	LibPDF: Give TrueTypePainter a ScaledFont instead of a Font This will allow us to get at the font's glyphs as paths, which will eventually enable us to implement glyph rotation. We'll have to do our own caching then, but we can then hopefully share the caching across the Type0 / Type1 / TrueType codepaths. It also gives us access to a font's glyphs by glyph id, which will help us implementing looking up glyph ids by postscript name. (Else we'd have to plumb through a whole Painter::draw_glyph_by_postscript_name() API just for LibPDF.) No behavior change.	2024-03-01 17:56:59 +01:00
Nico Weber	5dad8b693e	LibPDF: Make PDFFont::replacement_for() return a ScaledFont We only want to load non-bitmap fallback fonts as PDF fallback fonts, so let's make the return type represent that. No behavior change.	2024-03-01 17:56:59 +01:00
Nico Weber	2bbdfe0fba	LibPDF: Treat "Oblique" as italic indicator The standard 14 fonts include e.g. "CourierBoldOblique" and "HelveticaOblique". Let's map them to italic fonts :^)	2024-03-01 14:17:42 +01:00
Nico Weber	8e3c54f203	LibPDF: Implement ZapfDingbats clause of the adobe glyph list algorithm Liberation Sans still doesn't have the vast majority of the Zapf Dingbats glyphs, but now we map the Zapf Dingbats names to good unicode values. So we only need to use a different font and all should work. (And Liberation Sans has _some_ of the glyphs, like 13 of the 223.) And we now render empty squares instead of wrong glyphs for the ones we don't have. I haven't seen any PDFs using ZapfDingbats in the wild, but they probably exist somewhere. (Tests/LibPDF/standard-14-fonts.pdf is a synthetic PDF using it.)	2024-03-01 14:17:42 +01:00
Nico Weber	2eb099aabe	LibPDF: Implement some of the AdobeGlyphList algorithm Turns out there's a spec that goes with the table. The big change here is that we can now map `uni1234` to 0x1234 and `u123456` to 0x123456. The parts where we split a name on `_` and map each component and the part where we're supposed to allow multiple groups of 4 after `uni` aren't implemented yet. The ZapfDingbats lookup is also still missing. I haven't seen this have an effect in practice, but it's easy to construct a PDF with a custom encoding where it would make a difference.	2024-03-01 14:17:42 +01:00
Nico Weber	9aa31157d5	LibPDF: Use right encoding for standard fonts Symbol and ZapfDingbats We use Liberation Sans for the actual glyph for these, and that's missing some (Symbol) / all (ZapfDingbats) of the glyphs we need for these two standard fonts (...or at least the mapping from name to glyph, not sure). But still, better rendering squares than completely incorrect glyphs. Our code deciding what to do when a value isn't found in an encoding, or when the name doesn't map to a glyph, also needs work, but that's mostly independent of this change. I think this is a nice small standalone progression.	2024-02-27 17:42:08 -05:00
Nico Weber	76105d5d7f	LibPDF: Resize images to the larger of image and mask dimensions Makes text show up on 0000646.pdf pages 87-92, which for some reason renders all text using 2x2 images with huge masks that contain rendered text outlines.	2024-02-27 17:39:13 -05:00
Nico Weber	472bc367d3	LibPDF: Do not have redundant variables for image size This way, the size of the bitmap cannot become out of sync with these variables. No behavior change.	2024-02-27 17:39:13 -05:00
Nico Weber	83d29b3e45	LibPDF: Hack around a FIXME in TrueTypePainter::get_glyph_width() This will need further thought once we implement support for the truetype 'post' table, but for now it's correct most of the time, and better than not doing it.	2024-02-27 07:02:27 +01:00
Nico Weber	448eaa2966	LibPDF: Let Type1Font use TrueTypePainter for standard fonts ...and for fallback fonts too. We use Liberation Sans (a truetype font) for standard and fallback fonts. So we should use the standard PDF algorithm for mapping bytes to truetype glyphs. TrueTypePainter knows how to do this. Makes the "fi" ligature in the title on page 1 of 5014.CIDFont_Spec.pdf or the dotless-i in the title of page 2 of ThinkingInPostScript.pdf show up. They use Helvetica and Times, and Helvetica and Symbol respectively (with -Bold variants).	2024-02-27 07:02:27 +01:00
Nico Weber	86a7753d65	LibPDF: Move TrueType painting into a new class No behavior change.	2024-02-27 07:02:27 +01:00
Nico Weber	84d1e3956f	LibPDF: Make truetype ascent adjustment more local It's only used in this function. No behavior change.	2024-02-27 07:02:27 +01:00
Nico Weber	03fab7089a	LibPDF+PDFViewer: Extract Renderer::apply_page_rotation() No behavior change.	2024-02-27 07:02:02 +01:00
Nico Weber	cafaaa0e76	LibPDF: Don't crash on zero-width characters in type1 fonts Since ScaledFont bakes the size of the font into the font type, we do the same for Type1 fonts, and then have to divide by the font height when figuring out what to scale by. For a target width of 0, chances are the source width is also 0, and we end up with NaN due to dividing 0 by 0. This then triggered the `VERIFY(isfinite(error))` in can_approximate_bezier_curve() in Painter.cpp. Check for this case and scale by 0 instead of dividing. It could happen that the denominator is 0 without the numerator being 0, but it's not clear what that's supposed to mean. In this case we'd end up with +inf/-inf, which would also trigger the assert. I haven't seen this case in practice, so let's not worry about that for now. (A nicer longer-term fix is probably to make LibPDF use VectorFont instead of ScaledFont, so that we don't have to bake the font size into the font type. Then we won't need this division at all. In the meantime, this fixes the crash.) Fixes a crash on page 66 of https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf Fixes a crash on page 37 of https://open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf Fixes crashes in `0000310.pdf`, `0000430.pdf`, `0000229.pdf`. Brings down the number of crashes on my 1000 file test set from 5 with 3 distinct stacks to 2 with 1 distinct stack. (The number went up from 3 crashes with 2 distinct stacks to 5/3 when we started rendering much more text when Type0 font support was added. This fixes the crashes we had before Type0 support.)	2024-02-27 07:01:05 +01:00
Nico Weber	83128d093e	LibPDF: Implement most of the spec algorithm for picking TrueType glyphs Non-CID-keyed fonts in PDFs have 8-bit codepoints which are mapped from bytes to character names via encoding. TrueType fonts don't index glyphs by name (Type1 fonts do), so the fix (codified in the spec) was to make a list of all possible glyph names and map those to (16-bit) unicode values, and then pass those into the truetype cmap. (As a fallback, we're supposed to look at the optional names in the font's "post" table. That part isn't implemented here yet.) (Note that this affects the behavior of fallback fonts for TrueType fonts, but not yet fallback fonts for Type1 fonts, and neither the behavior of the 14 built-in Type1 fonts (which we implement as fallback fonts), since the TrueType fallback in Type1Font.cpp does not use this algorithm yet. This will be fixed in a future patch.)	2024-02-25 15:15:20 +01:00
Nico Weber	207717982c	LibPDF: Read /Flags off font descriptors	2024-02-25 15:15:20 +01:00
Lucas CHOLLET	cb03ab4a5a	LibPDF: Handle the BlackIs1 parameter of the CCITTFaxDecode Filter	2024-02-24 16:24:45 -07:00
Lucas CHOLLET	6b3bab5c8a	LibPDF: Plug in the CCITTFaxDecode filter to our CCITT decoder We only call the decoder for Group 4 images. We do support Group 3 images, but let's wait to find a PDF with these before adding support.	2024-02-24 16:24:45 -07:00
Nico Weber	b258ba2767	LibPDF: Use decode_hex_digit() more For `:#xx` in names, we now also handle lower-case hex digits. The spec is silent on the case of these hex digits. Our previous check (isxdigit(), and now is_ascii_hex_digit()) lets through lower-case hex digits, so it seems better to handle them rather than computing e.g. `'a' - 'A' + 10` (== 42 -- off by 32!). I don't know if this has any visible effect on any files, but it's more correct, and less code, and the code looks more like the code in Filter::decode_ascii_hex().	2024-02-23 12:11:25 -05:00
Nico Weber	783b1d1c11	LibPDF: Use is_ascii_hex_digit() instead of isxdigit() See description of #7684 for motivation. Also, makes this code look more like the hex code in Filter::decode_ascii_hex(). No behavior change.	2024-02-23 12:11:25 -05:00
Nico Weber	c9234f35f1	LibPDF/CFF: Clear stack after "endchar" commands Both type 1 and type 2 spec tell us to do this. I haven't observed a difference from this, but I noticed it in the spec while I was touching this code. Probably good to do what the spec tells us to do.	2024-02-22 06:59:28 +01:00
Nico Weber	020c00ede2	LibPDF/CFF: Use offset in accented_character() data Without this, the dieresis above an a is all the way to the left instead of over the letter.	2024-02-22 06:59:28 +01:00
Nico Weber	12859dfde5	LibPDF/CFF: Treat endchar in type 2 as type 2 "seac" when requested With this, a character can be defined that uses two existing glyphs. This is useful for umlauts and the like, which then just need to reference e.g. the glyphs named "a" and "dieresis" and provide a translation. Makes umlauts appear on some PDFs using CFF type2 data in Type 1 fonts.	2024-02-22 06:59:28 +01:00
Nico Weber	cade76d240	LibPDF+LibGfx: Do not try to read "OS/2" table for PDFs It is sometimes truncated in fonts embedded in PDFs, and the data is not needed to render PDFs. 2 of my 1000 test PDFs used to complain "Could not load OS2 v1: Not enough data" and 1 "Could not load OS2 v2: Not enough data" before. Increases number of PDFs that render without diagnostics from 764 to 765 (and decreases the number of distinct error messages from 27 to 25).	2024-02-21 13:38:33 +01:00
Nico Weber	0dee94ef40	LibPDF+LibGfx: Do not try to read "hmtx" table for PDFs It is sometimes truncated in fonts embedded in PDFs, and the data is not needed to render PDFs. 26 of my 1000 test files complained "Could not load Hmtx: Not enough data" before. Increases number of PDFs that render without diagnostics from 743 to 764.	2024-02-21 13:38:33 +01:00
Nico Weber	5efe80af7f	LibPDF+LibGfx: Do not try to read "name" table for PDFs It is often missing in fonts embedded in PDFs. 75 of my 1000 test files complained "Font is missing Name" when trying to read fonts before. Increases number of PDFs that render without diagnostics from 682 to 743.	2024-02-21 13:38:33 +01:00
Nico Weber	41eca52b50	LibGfx/OpenType: Tweak Font::try_load_from_externally_owned_memory() It now takes an Options object instead of passing several default parameters. No behavior change.	2024-02-21 13:38:33 +01:00
Nico Weber	3b616b6af8	LibPDF: Use original error for failing ICC load	2024-02-21 13:37:08 +01:00
Nico Weber	fa95e5ec0e	LibPDF: Fix line drawing when line_width is 0 We used to skip lines with width 0. The correct behavior per spec is to draw them one pixel wide instead.	2024-02-21 10:30:57 +01:00
Nico Weber	1cb450e9a3	LibPDF: Give CFF Glyph 0 the name .notdef This is required by the CFF spec, and is consistent with what we do for the encoding 24 lines down. As far as I can tell, nothing in `Type1FontProgram::rasterize_glyph()` or in Type1Font.cpp implements the "If an encoding maps to a character name that does not exist in the Type 1 font program, the .notdef glyph is substituted." line from the PDF 1.7 spec (in 5.5.5 Character Encoding, Encodings for Type 1 Fonts) yet, so this does not yet have an effect.	2024-02-20 06:54:50 -05:00
Nico Weber	05a7482118	LibPDF/CFF: Add dbgln() when failing encoding bounds check	2024-02-20 08:43:10 +00:00
Nico Weber	4705d38fa7	LibPDF/CFF: Fix off-by-one when reading internal encoding We use `i - 1` to index these arrays, so that's what we should use for the bounds check as well.	2024-02-20 08:43:10 +00:00
Nico Weber	012f6d46e7	LibPDF: Implement stream CIDToGIDMaps for Type0 CIDFontType2 fonts Of my 1000 test files, 73 have stream Type0 truetype fonts with stream CIDToGIDMaps. This makes that work. (With this patch, the number of files in my 1000 test files complaining "Font is missing Name" increases from 41 to 75, so a bit under half of the fonts using stream CIDToGIDMaps also have no 'name' table. So that's next.) Increases files without issues from 652 to 681.	2024-02-18 15:43:33 -05:00
Nico Weber	dde11e1757	LibPDF: Ignore unknown CFF operators https://adobe-type-tools.github.io/font-tech-notes/pdfs/5177.Type2.pdf says "The behavior of undefined operators is unspecified." but https://learn.microsoft.com/en-us/typography/opentype/spec/cff2 says "When an unrecognized operator is encountered, it is ignored and the stack is cleared." Some type 0 CIDFontType0C fonts (i.e. CID-keyed non-OpenType CFF fonts) depend on the latter, even though they're governed by the former spec. Fixes rendering of text in 0000521.pdf (e.g. page 10 or 5). The font there has a bunch of 0 opcodes for some reason.	2024-02-18 08:40:04 +00:00
Nico Weber	05f382fc6e	LibPDF: Add CIDFontType2::set_font_size() See #20084 commit 4. This does the same for truetype-based type0 fonts. Fixes font sizes on e.g. 1800-2017.pdf.	2024-02-17 16:08:48 +01:00
Nico Weber	f4a59246f5	LibPDF: Implement initial support for Type0 truetype fonts Disclaimers, similar to what's on #23202 (and most of the prerequisites mentioned there are needed for this too): * Only supports the `Identity-H` type0 cmap at the moment * Doesn't support vertical text yet * Only supports the `Identity` CIDToGIDMap at the moment (this one is a truetype-only thing)	2024-02-17 16:08:48 +01:00
Nico Weber	bd74447dba	LibPDF: Initial support for drawing CFF-based Type0 fonts Together with the already-merged #23122, #23128, #23135, #23136, #23162, and #23167, #23179, #23190, #23194 this adds initial support for rendering some CFF-based Type0 fonts :^) There's a long list of things that still need improving after this: * A small number of CFF programs contain the charstring command 0, which is invalid. Currently, this makes us reject the whole font. * Type1FontProgram::rasterize_glyph() is name-based. For CID-based fonts, we want a version that takes CIDs (character IDs) instead. For now, I'm printing the CID to a string and using that, yuck. (I looked into doing this nicely. I do want to do that, but I need to read up on how the `seac` type1 charstring command uses character names to identify parts of an accented character. Also, it looks like `seac`'s accented character handling moved over to `endchar` in type2 charstring commands (i.e. in CFF data), and it looks like we don't implement that at all. So I need to do more reading first, and I didn't want to block this on that.) * The name for the first string in name-based CFF fonts looks wrong; added a FIXME for that for now. * This supports the named Identity-H cmap only for now. Identity-H maps UTF16-BE values to glyph IDs with the identity function, and assumes it's horizontal text. Other named cmaps in my test files are UniJIS-UCS2-H, UniCNS-UCS2-H, Identity-V, UniGB-UCS2-H, UniKS-UCS2-H. (There are also 2 files using the stream-based cmaps instead of the name-based ones.) * In particular, we can't draw vertical text (`-V`) yet * Passing in the encoding to CFF::create() is awkward (it's nullptr for CID-keyed fonts), and it's also not necessary since `Type1Font::draw_glyph()` already does the "take encoding from PDF, and only from font if the PDF doesn't store one" dance. * This doesn't cache glyphs but re-rasterizes them each time. Easy to add, but maybe I want to look at rotation first. And things don't feel glacial as-is. * Type0Font::draw_glyph() is pretty similar to second half of Type1Font::draw_glyph()	2024-02-16 12:41:10 -05:00
Nico Weber	c9d48bbca4	LibPDF/CFF: Add a comment to CFF::parse_charset()	2024-02-16 12:41:10 -05:00
Nico Weber	5c8778a161	LibPDF/CFF: Compute per-glyph glyph width in CID-keyed fonts Make TopDict's defaultWidthX and nominalWidthX Optional<>s so that we can check if they're set per fdselect-selected font dict, and if so use the value from there in CID-keyed fonts. Otherwise, keep using the value in the top dict.	2024-02-16 12:41:10 -05:00
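
Commit 2eb099aabe above describes mapping glyph names such as `uni1234` to 0x1234 and `u123456` to 0x123456. A minimal standalone sketch of that one step might look like the following. It is not the LibPDF code: the function names are hypothetical, it uses the C++ standard library rather than AK, and the digit-count rules (exactly 4 uppercase hex digits after `uni`, 4 to 6 after `u`) are an assumption based on a reading of the Adobe glyph list specification. As in the commit, splitting on `_` and multiple 4-digit groups after `uni` are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: parses a run of uppercase hex digits.
// Returns an empty optional if any character is not 0-9 or A-F.
static std::optional<uint32_t> parse_upper_hex(std::string_view digits)
{
    uint32_t value = 0;
    for (char c : digits) {
        uint32_t digit = 0;
        if (c >= '0' && c <= '9')
            digit = static_cast<uint32_t>(c - '0');
        else if (c >= 'A' && c <= 'F')
            digit = static_cast<uint32_t>(c - 'A') + 10;
        else
            return std::nullopt;
        value = value * 16 + digit;
    }
    return value;
}

// Hypothetical function: maps "uniXXXX" and "uXXXX".."uXXXXXX" names
// to a code point. Other names yield an empty optional, so a caller
// could fall back to a table lookup.
std::optional<uint32_t> code_point_from_glyph_name(std::string_view name)
{
    // "uni" followed by exactly 4 hex digits (assumed rule).
    if (name.size() == 7 && name.substr(0, 3) == "uni")
        return parse_upper_hex(name.substr(3));

    // "u" followed by 4 to 6 hex digits (assumed rule).
    if (name.size() >= 5 && name.size() <= 7 && name.substr(0, 1) == "u")
        return parse_upper_hex(name.substr(1));

    return std::nullopt;
}
```

A name like `union` starts with `uni` but is not followed by hex digits, so the sketch returns an empty optional for it instead of a code point.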


653 commits