serenity

mirror of https://github.com/RGBCube/serenity synced 2025-05-31 06:58:11 +00:00

Author	SHA1	Message	Date
Nico Weber	eb4632e08a	LibPDF: Give CFF built-in encoding and charset arrays an underlying type These arrays store SIDs ("String IDs"), so give them that type now that we have to_array() and it's easy to do. No behavior change.	2024-02-14 06:56:43 +01:00
Nico Weber	ddbcd901d1	LibPDF: Separate Type0 CMap errors No behavior change, just more granular "not implemented" diagnostics.	2024-02-13 19:46:31 +01:00
Nico Weber	8e50bbc9fb	LibPDF: Add string drawing code for Type0Fonts This is very similar to SimpleFont::draw_string() for now, but it'll become a bit different when we add support for vertical text. CIDFontType now only needs to draw single glyphs. Neither of the subclasses can do that yet, so no behavior change yet.	2024-02-13 19:46:18 +01:00
Nico Weber	751185cb76	LibPDF: Scale default glyph width by font size and x scale This fixes rendering of commas in 0000941.pdf page 1. The commas use the default width, and without this they show up very large, covering the page. Also, it's nice that the code now looks like the regular case 4 lines further up.	2024-02-12 14:32:04 +00:00
Nico Weber	7ab4e53b99	LibPDF/CFF: Add code for fdselect parsing This is one of the two top dict entries we need for CID-keyed fonts. We don't send any CID-keyed font data into the CFF parser yet, so no behavior change.	2024-02-12 14:05:16 +01:00
Nico Weber	6ebddab448	LibPDF/CFF: Add enum values for CID-keyed font top dict entries No behavior change.	2024-02-12 14:05:16 +01:00
Nico Weber	6df0150671	LibPDF: Add some CIDFontType0C scaffolding No real behavior change. We don't actually load the CFF data yet (blocked on #23136 and some more), and we don't have drawing code yet, and Type0Font::draw_string() doesn't do any drawing yet. But it's a step in the right direction.	2024-02-12 13:59:00 +01:00
Nico Weber	8e7cb11856	LibPDF/CFF: Add enum values for remaining PrivDictOperators No behavior change, except that we now dbgln() if we see a PrivDictOperator we don't know about. (I haven't seen this in practice, but I found this useful while debugging things.)	2024-02-11 14:52:54 +01:00
Nico Weber	a91fecb17e	Revert "LibPDF: Don't over-read in charset formats 1 and 2" This reverts commit `52afa936c4`. No longer necessary after #23122 -- turns out things work better when you do them right. No behavior change.	2024-02-09 16:52:01 +00:00
Nico Weber	9bccb8c8d7	LibPDF: Make CFF::parse_charset() return SIDs ...and do string expansion at the call site. CID-keyed fonts treat the charset as CIDs instead of as SIDs, so having access to the SIDs in numeric form will be useful when we implement support for CID-keyed CFF fonts. No behavior change.	2024-02-09 13:57:23 +01:00
Nico Weber	9750261921	LibPDF: Rename charset to charset_names in CFF parser No behavior change.	2024-02-09 13:57:23 +01:00
Nico Weber	32f601f9a4	LibPDF: Fix small bug from #21452 I implemented CFF charset format 2 in `6f783929dd` with the note "I haven't seen this being used in the wild". Now that I have seen it (0000658.pdf), I can say that this has never worked, despite me claiming "it's easy to implement". But now it works!	2024-02-08 13:48:56 +00:00
Nico Weber	384c6cf0f9	LibPDF: Tweak vertical position of truetype fonts again See #22821 for a previous attempt. This attempt should settle things once and for all. The opentype render path adjusts by `-font_ascender * -y_scale` in Glyf::Glyph::append_simple_path(), so that's what we need to undo to draw at the font's baseline. (OpenType::Font::metrics() returns ascender scaled by y_scale already, so no need to have the scale here where we undo the shift.) Previously, we called `baseline()` which just returns the font's font size, which is pretty meaningless: https://tonsky.me/blog/font-size/ https://simoncozens.github.io/fonts-and-layout/opentype.html#vertical-metrics-hhea-and-os2 Also, conceptually it makes sense to translate up by the ascender to get from the upper edge of the glyph to the baseline.	2024-02-01 10:05:40 +01:00
Nico Weber	d2f3288666	LibPDF: Apply text matrix to each glyph's position We still don't apply it to the glyph itself, so they don't show up scaled or rotated, but they're at the right spot now. One big thing this here has going for it is that the final glyph position is now calculated with just `text_rendering_matrix.map(glyph_position)`. Also, character_spacing and word_spacing are now used unmodified in the SimpleFont::draw_string() loop. This also means we no longer have to undo a scale when updating the position in `Renderer::show_text()`. Most of the rest stays pretty yucky though. The root cause of many problems is that ScaledFont has its rendering size baked into the object. We want to render fonts at size font_size times scale from text matrix times scale from current transformation matrix (but not size from horizontal_scaling). So we have to make that the font_size, but then we have to undo that in a bunch of places to get the actual font size. This will eventually get better when LibPDF moves off ScaledFont.	2024-01-18 14:01:30 +01:00
Nico Weber	f54b0e7c22	LibPDF: Don't accidentally put horizontal_scaling in places Fonts should have size font_size times total scaling. We tried to get that by computing text_rendering_matrix.x_scale() * font_size, but text_rendering_matrix.x_scale() also includes horizontal_scaling, which shouldn't be part of font size. Same for character_spacing and word_spacing. This is all a big mess that's caused by LibPDF using ScaledFont, which requires scaling to be part of the text type. I have an in-progress local branch that moves LibPDF to directly use VectorFont, which will hopefully make this (and other things) nicer. But first, let's get this right, and then make sure we don't regress it when things change :^)	2024-01-18 14:01:30 +01:00
Nico Weber	13f007aadb	LibPDF: Tweak vertical position of truetype fonts The vertical coordinates for truetype fonts are different somehow. We compensated a bit for that; now we compensate some more. This is still not 100% perfect, but much better than before.	2024-01-17 08:44:07 +00:00
Shannon Booth	e2e7c4d574	Everywhere: Use to_number<T> instead of to_{int,uint,float,double} In a bunch of cases, this actually ends up simplifying the code as to_number will handle something such as: ``` Optional<I> opt; if constexpr (IsSigned<I>) opt = view.to_int<I>(); else opt = view.to_uint<I>(); ``` For us. The main goal here however is to have a single generic number conversion API between all of the String classes.	2023-12-23 20:41:07 +01:00
Ali Mohammad Pur	5e1499d104	Everywhere: Rename {Deprecated => Byte}String This commit un-deprecates DeprecatedString, and repurposes it as a byte string. As the null state has already been removed, there are no other particularly hairy blockers in repurposing this type as a byte string (what it _really_ is). This commit is auto-generated: $ xs=$(ack -l \bDeprecatedString\b\\|deprecated_string AK Userland \ Meta Ports Ladybird Tests Kernel) $ perl -pie 's/\bDeprecatedString\b/ByteString/g; s/deprecated_string/byte_string/g' $xs $ clang-format --style=file -i \ $(git diff --name-only \| grep \.cpp\\|\.h) $ gn format $(git ls-files '.gn' '.gni')	2023-12-17 18:25:10 +03:30
Nico Weber	f2f07c3a80	LibPDF: Replace `if (a) VERIFY(0)` with `VERIFY(!a)` No behavior change.	2023-12-16 12:39:56 +01:00
Nico Weber	ee74bc2538	LibPDF: Tolerate 0-sized Subrs in PS1 font subprograms This regressed in `2b3a41be74` in #18031. Fixes a crash rendering page 2 and onward of https://pyx-project.org/presentation_dantemv35_en.pdf	2023-12-16 12:39:56 +01:00
Kyle Pereira	082a4197b6	LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces This is in anticipation of Pattern color space support which does not yield a simple color.	2023-12-10 16:44:24 +01:00
Nico Weber	29396415d5	LibPDF: Add an initial implementation of type 3 glyph rendering This is a very inefficient implementation: Every time a type 3 font glyph is drawn, we parse its operator stream and execute all the operators therein. We'll want to instead cache the glyphs in bitmaps (at least in most cases), like we do for other fonts. But it's a good first step, and all the coordinate math seems to work in the files I've tested. Good test files from pdfa dataset 0000.zip: - 0000559.pdf page 1 (and 2): Has a non-default font matrix; text appears mirrored if the font matrix isn't handled correctly - 0000425.pdf, page 1: Draws several glyphs in a single run; glyphs overlap if Renderer::render_type3_glyph() ignores the passed-in point - 0000211.pdf, any page: Uses type 3 glyphs for all text. Good perf test (already "reasonably fast") - 0000521.pdf, page 5 (or 7 or 16): The little red flag in the purple box is a type 3 font glyph, and it's colored (which in part means the first operator is `d0`, while all the other documents above use `d1`)	2023-11-17 19:47:53 +00:00
Nico Weber	14ddab5519	LibPDF: Stub out type3_font_set_glyph_width* Type 3 font glyphs begin with either `d0` or `d1`. If we bail out with an "unsupported" error on the very first operator in a glyph, we'll never paint the glyph. Just stub these out for now. We probably want to do more in here in the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).	2023-11-17 19:47:53 +00:00
Nico Weber	126a0be595	LibPDF: Pass Renderer to SimpleFont::draw_glyph() This makes it available in Type3Font::draw_glyph(). No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	bcc6439b5f	LibPDF: Pass Renderer to PDFFont::draw_string() It's a bit unfortunate that fonts need to know about the renderer, but type 3 fonts contain PDF drawing operators, so it's necessary. On the bright side, it makes it possible to pass fewer parameters around and compute things locally as needed. (As we implement more fonts, we'll probably want to create some functions to do these computations in a central place, eventually.) No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	e0c0864ddf	LibPDF: Load a few values off a type 3 font dictionary	2023-11-17 19:47:53 +00:00
Nico Weber	9632d8ee49	LibPDF: Make SimpleFont font matrix configurable Type 3 fonts can set it to a custom value.	2023-11-17 19:47:53 +00:00
Nico Weber	4cd1a2d319	LibPDF: Add some scaffolding for type 3 fonts	2023-11-17 19:47:53 +00:00
Nico Weber	7f999b1ff5	LibPDF: Sink m_base_font_name from PDFFont into subclasses /BaseFont is a required key for type 0, type 1, and truetype font dictionaries, but not for type 3 font dictionaries. This is mechanical; type 0 fonts don't even use this yet (but probably should). PDFFont::initialize() is now empty and could be removed, but maybe we'll put stuff there again later, so I'm leaving it around for a bit longer.	2023-11-17 19:47:53 +00:00
Nico Weber	6c1da5db54	LibPDF: Make SimpleFont::draw_glyph() fallible	2023-11-17 19:47:53 +00:00
Nico Weber	843e9daa8c	LibPDF: Remove unused PDFFont::type() This got added in #15270, but its one use then got removed again in #16150. No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	26fd29baf8	LibPDF: Give Type3 fonts a dedicated error message They're described in "5.5.4 Type 3 Fonts" in the PDF 1.7 spec, so we shouldn't `internal_error()` on them. They're just not implemented yet.	2023-11-17 19:47:53 +00:00
Nico Weber	1c2b0feb7b	LibPDF: Change how CFF optional width prefix is stored Per 5177.Type2.pdf 3.1 "Type 2 Charstring Organization", a glyph's charstring looks like: w? {hs* vs* cm* hm* mt subpath}? {mt subpath}* endchar The `w?` is the width of the glyph, but it's optional. So all possible commands after it (hstem* vstem* cntrmask hintmask moveto endchar) check if there's an extra number at the start and interpret it as a width, for the very first command we read. This was done by having an `is_first_command` local bool that got set to false after the first command. That didn't work with subrs: If the first command was a call to a subr that just pushed a bunch of numbers, then the second command after it is the actual first command. Instead, move that bool into the state. Set it to false the first time we try to read a width, since that means we just read a command that could've been prefixed by a width.	2023-11-14 10:10:34 +01:00
Tim Schumacher	a2f60911fe	AK: Rename GenericTraits to DefaultTraits This feels like a more fitting name for something that provides the default values for Traits.	2023-11-09 10:05:51 -05:00
Nico Weber	d24289eef4	LibPDF: Always log unhandled type 1 and type 2 font program opcodes This would've made it easy to see that we were missing flex opcodes for https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf	2023-11-01 11:40:16 -04:00
Nico Weber	e1a743f286	LibPDF: Implement type 2 flex, hflex, hflex1, flex1 operators This is the type 2 equivalent to type2 othersubr, from what I can tell. See "4.1 Path Construction Operators" in 5177.Type2.pdf, "The Type 2 Charstring Format". Makes text show up alright on https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf	2023-11-01 11:40:16 -04:00
Nico Weber	3e707efdfa	LibPDF: Move type1 subr 0 handling into othersubr handler https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf, 8.4 First Four Subrs Entries: """If Flex or hint replacement is used in a Type 1 font program, the first four entries in the Subrs array in the Private dictionary must be assigned charstrings that correspond to the following code sequences. If neither Flex nor hint replacement is used in the font program, then this requirement is removed, and the first Subrs entry may be a normal charstring subroutine sequence. The first four Subrs entries contain: Subrs entry number 0: 3 0 callothersubr pop pop setcurrentpoint return """ othersubr handler 0 gets three arguments: * The flex height (the distance after which the bezier splines are replaced with just straight lines) * The current position after the flex It pushes that position on the postscript stack, where predefined subr handler number 0 then pops it from. It then passes it to setcurrentpoint. In theory, we now correctly do that setcurrentpoint call, which we previously weren't. In practice, that setcurrentpoint call always receives the last point of the flex -- and our path api apparently gets confused when move_to() is called on it when the current point is already at that same location. So tweak the SetCurrentPoint handler to not set the current point on the path if it's already the path's current point, with a FIXME to figure out what exactly is happening in Gfx::Path. No big behavior change if flex is used, but this is more correct if it isn't. (This only works because our `return` handler is empty, else we would have to make the callothersubr handler start a call frame.)	2023-11-01 11:38:41 -04:00
Nico Weber	0bb8249780	LibPDF: Move type1 subr 1 and 2 handling into othersubr handler https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf, 8.4 First Four Subrs Entries: """If Flex or hint replacement is used in a Type 1 font program, the first four entries in the Subrs array in the Private dictionary must be assigned charstrings that correspond to the following code sequences. If neither Flex nor hint replacement is used in the font program, then this requirement is removed, and the first Subrs entry may be a normal charstring subroutine sequence. The first four Subrs entries contain: [...] Subrs entry number 1: 0 1 callothersubr return Subrs entry number 2: 0 2 callothersubr return """ So subr entry numbers 1 and 2 just call othersubr 1 and 2, which means we can just move the handling code over. No behavior change if flex is used, but more correct if it isn't. (This only works because our `return` handler is empty, else we would have to make the callothersubr handler start a call frame.)	2023-11-01 11:38:41 -04:00
Nico Weber	4cc24548f6	LibPDF: Call dbgln() for unimplemented flex opcodes	2023-10-28 13:28:05 -04:00
Nico Weber	e484fae8e1	LibPDF: Don't do special subr processing for type 2 CFFs This is a subset of #21484: Type 2 CFFs never use the special subrs, so stop doing them for type 2 at least for now. Fixes an assert in 0000064.pdf in 0000.zip in the pdfa dataset (a stack underflow because a subr is supposed to push a bunch of stuff, but instead it ran one of the built-in routines instead of the subr from the font file). As discussed in #21484, this isn't right for type 1 CFFs either, but just removing the code there regresses Tests/LibPDF/type1.pdf. A slightly more involved thing is needed there; I added a FIXME for that here.	2023-10-28 13:28:05 -04:00
Tim Ledbetter	b4296e1c9b	LibPDF: Don't use unsanitized values in error messages Previously, constructing error messages with unsanitized input could fail because error message strings must be UTF-8.	2023-10-26 11:05:32 +02:00
Nico Weber	5dd7639386	LibPDF: Tolerate indirect references in Type0 /W array Makes e.g. 0000236.pdf in 0000.zip in the pdfa dataset work.	2023-10-26 10:58:45 +02:00
Nico Weber	b928fadba7	LibPDF: Swap int and array branches in outline item reading No intended behavior change. It does have the effect that indirect object references now go down the array path instead of the number path. They still fall over there, but now that's easy to fix.	2023-10-26 10:58:45 +02:00
Nico Weber	11bee7a075	LibPDF: Don't crash on fixed-width type 1 fonts that use /MissingWidth Type 1 fonts usually have a m_font_program and no m_font -- they only have m_font if we're using a replacement font for the fonts that were built-in to PDFs before Acrobat 4.0 (and must still work to show existing files). However, SimpleFont::get_glyph_width() used to always return a float, which in Type1Font was only implemented if m_font was set. Per spec, we're supposed to just use /MissingWidth for fonts that are missing an entry in the descriptor's /Width array. However, for built-in fonts, no explicit /Width array is needed (PDF 1.7 spec, Appendix H.3, 5.5.1). So if we just always use /MissingWidth, then PDFs that use a built-in font draw all their text on top of each other (e.g. 000333.pdf from stillhq.com-pdfdb). So change get_glyph_width() to return Optional<float>, return it only in Type1Font if m_font is set, and use MissingWidth if it isn't set. That way, replacement fonts still return a width, and real fonts that are supposed to have /Width and use /MissingWidth for missing entries do what they're supposed to too, instead of crashing. From 20 (6%) to 16 (5%) crashes on the 300 first PDFs, and from 39 (7.8%) to 31 (6.2%) on the 500-random PDFs test.	2023-10-23 09:33:03 -04:00
Nico Weber	52afa936c4	LibPDF: Don't over-read in charset formats 1 and 2 `left` might be a number bigger than there are actually glyphs in the CFF. The spec says "The number of ranges is not explicitly specified in the font. Instead, software utilizing this data simply processes ranges until all glyphs in the font are covered." Apparently we have to check for this within each range as well. Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the pdfa dataset. Together with the previous commit: From 21 (7%) to 20 (6%) crashes on the 300 first PDFs, and from 41 (8.2%) to 39 (7.8%) on the 500-random PDFs test.	2023-10-23 09:31:11 -04:00
Nico Weber	58ff7b5336	LibPDF: Support offset size 3 in CFF index reading ...and replace template instantiations with a loop, to make this easily possible. Vaguely nice for code size as well. Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the pdfa dataset.	2023-10-23 09:31:11 -04:00
Nico Weber	3197f0cab6	LibPDF: Handle CFF fonts with charset format 0 and > 255 glyphs better We used to use an u8 as loop counter, which would overflow if there were more than 255 glyphs, producing hundreds of megabytes of `Couldn't find string for SID x, going with space` output in the process, while all data until the end of the CFF section got interpreted as SIDs, until a try_read() would finally fail. We now no longer fail miserably trying to render page 2 of 0000352.pdf of 0000.zip from the pdfa dataset. Fixes just one crash of the larger 500-document test set, but when I tweak test_pdf.py to print all stacks instead of just the top 5, it no longer produces 260 MB of output.	2023-10-23 09:31:11 -04:00
Nico Weber	0869ca5615	LibPDF: Add more CFF_DEBUG output	2023-10-23 09:31:11 -04:00
Nico Weber	04aec4a032	LibPDF: Don't log CFF Copyright tag as unknown	2023-10-21 21:04:02 +02:00
Nico Weber	095a2a17ed	LibPDF: Replace TODO()s in Type0Font code with Errors ...which causes us to not render these fonts instead of crashing. Reduces number of crashes on 300 random PDFs from the web (the first 300 from 0000.zip from https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/) from 64 (21%) to 42 (14%).	2023-10-20 10:33:59 -06:00
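
Commit 58ff7b5336 above mentions replacing per-size template instantiations with a loop so that offset size 3 works in CFF index reading. A minimal sketch of that idea follows, assuming only what the CFF format specifies: index offsets are big-endian and 1 to 4 bytes wide. The name `read_cff_offset` and its signature are hypothetical, use the standard library rather than AK types, and are not taken from LibPDF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not LibPDF's actual API): reads one big-endian offset
// that is `offset_size` bytes wide, starting at `position` in `data`.
// Returns std::nullopt for an unsupported width or an out-of-bounds read.
std::optional<std::uint32_t> read_cff_offset(std::vector<std::uint8_t> const& data,
    std::size_t position, unsigned offset_size)
{
    // CFF allows offset sizes 1 through 4; a loop handles all of them,
    // including 3, which has no matching fixed-width integer type.
    if (offset_size < 1 || offset_size > 4)
        return std::nullopt;

    // Refuse to read past the end of the buffer.
    if (position > data.size() || data.size() - position < offset_size)
        return std::nullopt;

    std::uint32_t value = 0;
    for (unsigned i = 0; i < offset_size; ++i)
        value = (value << 8) | data[position + i];
    return value;
}
```

With the bytes `01 02 03 04`, reading at position 0 with offset size 3 yields `0x010203`. A single loop replaces one instantiation per integer width, which is presumably why the commit calls it "vaguely nice for code size".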


145 commits