serenity

mirror of https://github.com/RGBCube/serenity synced 2025-05-31 03:48:13 +00:00

Author	SHA1	Message	Date
Nico Weber	4cc24548f6	LibPDF: Call dbgln() for unimplemented flex opcodes	2023-10-28 13:28:05 -04:00
Nico Weber	e484fae8e1	LibPDF: Don't do special subr processing for type 2 CFFs This is a subset of #21484: Type 2 CFFs never use the special subrs, so stop doing them for type 2 at least for now. Fixes an assert in 0000064.pdf in 0000.zip in the pdfa dataset (a stack underflow because a subr is supposed to push a bunch of stuff, but instead it ran one of the built-in routines instead of the subr from the font file). As discussed in #21484, this isn't right for type 1 CFFs either, but just removing the code there regresses Tests/LibPDF/type1.pdf. A slightly more involved thing is needed there; I added a FIXME for that here.	2023-10-28 13:28:05 -04:00
Tim Ledbetter	b4296e1c9b	LibPDF: Don't use unsanitized values in error messages Previously, constructing error messages with unsanitized input could fail because error message strings must be UTF-8.	2023-10-26 11:05:32 +02:00
Nico Weber	5dd7639386	LibPDF: Tolerate indirect references in Type0 /W array Makes e.g. 0000236.pdf in 0000.zip in the pdfa dataset work.	2023-10-26 10:58:45 +02:00
Nico Weber	b928fadba7	LibPDF: Swap int and array branches in outline item reading No intended behavior change. It does have the effect that indirect object references now go down the array path instead of the number path. They still fall over there, but now that's easy to fix.	2023-10-26 10:58:45 +02:00
Nico Weber	11bee7a075	LibPDF: Don't crash on fixed-width type 1 fonts that use /MissingWidth Type 1 fonts usually have a m_font_program and no m_font -- they only have m_font if we're using a replacement font for the fonts that were built-in to PDFs before Acrobat 4.0 (and must still work to show existing files). However, SimpleFont::get_glyph_width() used to always return a float, which in Type1Font was only implemented if m_font was set. Per spec, we're supposed to just use /MissingWidth for fonts that are missing an entry in the descriptor's /Width array. However, for built-in fonts, no explicit /Width array is needed (PDF 1.7 spec, Appendix H.3, 5.5.1). So if we just always use /MissingWidth, then PDFs that use a built-in font draw all their text on top of each other (e.g. 000333.pdf from stillhq.com-pdfdb). So change get_glyph_width() to return Optional<float>, return it only in Type1Font if m_font is set, and use MissingWidth if it isn't set. That way, replacement fonts still return a width, and real fonts that are supposed to have /Width and use /MissingWidth for missing entries do what they're supposed to too, instead of crashing. From 20 (6%) to 16 (5%) crashes on the 300 first PDFs, and from 39 (7.8%) to 31 (6.2%) on the 500-random PDFs test.	2023-10-23 09:33:03 -04:00
Nico Weber	52afa936c4	LibPDF: Don't over-read in charset formats 1 and 2 `left` might be a number bigger than there are actually glyphs in the CFF. The spec says "The number of ranges is not explicitly specified in the font. Instead, software utilizing this data simply processes ranges until all glyphs in the font are covered." Apparently we have to check for this within each range as well. Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the pdfa dataset. Together with the previous commit: From 21 (7%) to 20 (6%) crashes on the 300 first PDFs, and from 41 (8.2%) to 39 (7.8%) on the 500-random PDFs test.	2023-10-23 09:31:11 -04:00
Nico Weber	58ff7b5336	LibPDF: Support offset size 3 in CFF index reading ...and replace template instantiations with a loop, to make this easily possible. Vaguely nice for code size as well. Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the pdfa dataset.	2023-10-23 09:31:11 -04:00
Nico Weber	3197f0cab6	LibPDF: Handle CFF fonts with charset format 0 and > 255 glyphs better We used to use an u8 as loop counter, which would overflow if there were more than 255 glyphs, producing hundreds of megabytes of "Couldn't find string for SID x, going with space" output in the process, while all data until the end of the CFF section got interpreted as SIDs, until a try_read() would finally fail. We now no longer fail miserably trying to render page 2 of 0000352.pdf of 0000.zip from the pdfa dataset. Fixes just one crash of the larger 500-document test set, but when I tweak test_pdf.py to print all stacks instead of just the top 5, it no longer produces 260 MB of output.	2023-10-23 09:31:11 -04:00
Nico Weber	0869ca5615	LibPDF: Add more CFF_DEBUG output	2023-10-23 09:31:11 -04:00
Nico Weber	04aec4a032	LibPDF: Don't log CFF Copyright tag as unknown	2023-10-21 21:04:02 +02:00
Nico Weber	095a2a17ed	LibPDF: Replace TODO()s in Type0Font code with Errors ...which causes us to not render these fonts instead of crashing. Reduces number of crashes on 300 random PDFs from the web (the first 300 from 0000.zip from https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/) from 64 (21%) to 42 (14%).	2023-10-20 10:33:59 -06:00
Nico Weber	ebba24b848	LibPDF: Fix lookup of built-in Bold Italic strings Liberation*-BoldItalic.ttf apparently self-identifies as "Bold Italic", not "BoldItalic".	2023-10-19 16:52:49 -04:00
Nico Weber	3907374621	LibPDF: Implement support for callgsubr in CFF font programs Font programs are bytecode programs defining glyphs. If several glyphs share a piece of outline, that opcode sequence can be put in a subroutine ("subr") table and the definition of those glyphs can then call that subroutine by number, to reduce file size. CFF fonts can in theory contain multiple fonts, and so there's a global subr table shared by all the fonts in one CFF, and a local per-font subr table. We used to only implement the local subr table, now we implement both. (We only support one font per CFF, and at least in PDF files, that's all that's ever used. So a global subr table isn't very useful. But the spec explicitly allows it -- "Global subroutines may be used in a FontSet even if it only contains one font." -- and it happens in practice.)	2023-10-18 10:50:32 -04:00
Nico Weber	185573c03f	LibPDF: Implement subr_number biasing for CFF font programs	2023-10-18 10:50:32 -04:00
Nico Weber	4dc4de052a	LibPDF: Implement opcode 28 for CFF font programs	2023-10-18 10:50:32 -04:00
Nico Weber	44efff81b9	LibPDF: Remove a dbgln() call in CFF subrs decoding This code is a lot more reliable now than it used to be, and this dbgln() is quite noisy for some files. So let's remove it.	2023-10-18 10:43:51 -04:00
Nico Weber	46fd6fdfa3	LibPDF: Read Global subr data in CFF reader This was the last piece of data we didn't read yet. (We also don't yet support multiple fonts per CFF, but I haven't found a PDF using that yet.) We still don't do anything with it, but now we at least print a warning if this data is there and we ignore it.	2023-10-18 11:02:10 +02:00
Nico Weber	3be5719987	LibPDF: Rename `subroutines` to `local_subroutines` in CFF code	2023-10-18 11:02:10 +02:00
Nico Weber	9a0b559932	LibPDF: Tweak formatting of built-in CFF tables This makes the code look more like the pages in the spec. No behavior change, whitespace change only.	2023-10-18 11:00:17 +02:00
Nico Weber	f0e7fb7038	LibPDF: Make Subrs optional in PS1FontProgram https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf : "Using charstring subroutines is not a requirement of a Type 1 font program." And some versions of Computer Modern do in fact not contain a Subrs array. Together with #21473, makes Problemset.pdf from the pdffiles repo render ok instead of crashing.	2023-10-18 11:00:02 +02:00
Nico Weber	cb961101c7	LibPDF: Implement CFF built-in Standard and Expert encodings With this, all tables from the spec appendixes are in CFF.cpp. This fixes a crash reading page 2 (and onward) of 2ThestructureoftheCIE1997ColourAppearanceModelCIECAM97s.pdf in the pdffiles repo.	2023-10-17 10:21:38 +02:00
Nico Weber	eeada4678c	LibPDF: Postpone CFF encoding processing after Top DICT has been read The encoding offset defaults to 0, i.e. the Standard Encoding. That means reading the encoding only if the tag is present causes us to not read it if a font uses the Standard Encoding. Now, we always read an encoding, even if it's the (implicit) default one.	2023-10-17 10:21:38 +02:00
Nico Weber	1cfe639b6c	LibPDF: Implement CFF supplemental encoding The main encoding data maps glyph ID ("GID") to its codepoint. If a glyph has several codepoints, then a secondary table mapping codepoint to string ID ("SID") of the glyph's name is present. (A separate table associates each glyph with its name already.) I haven't seen this used in the wild, but the structure of the supplemental data is also going to be needed for built-in encodings.	2023-10-17 10:21:38 +02:00
Nico Weber	37daeae6fd	LibPDF: Add spec comments, dbgln_if()s to CFF's parse_encoding()	2023-10-17 10:21:38 +02:00
Nico Weber	007d7cdd53	LibPDF: Fix sign (and fixed point) in glyph decoding opcode 255 Two bugs: 1. We decoded a u32, not an i32 as the spec wants 2. (minor) Our fixed-point divisor was off by one Fixes text rendering in Bakke2010a.pdf in pdffiles, and rendering of other fonts with negative width adjustments from opcode 255. That PDF was produced by "Apple pstopdf" and uses font SFBX1200, which is apparently a variant of Computer Modern. So maybe this helps with lots of PDFs produced from TeX files, but I haven't checked that.	2023-10-16 08:33:35 +02:00
Nico Weber	96a4936567	LibPDF: Check for built-in CFF encodings Only prints a warning for them for now. Also warn on the not-yet-implemented encoding supplement.	2023-10-16 08:32:18 +02:00
Nico Weber	414a164850	LibPDF: Be louder about unimplemented CFF dict entries	2023-10-16 08:32:18 +02:00
Nico Weber	c825194fb9	LibPDF: Reject CFFs with more than one font The code assumes that there's just one Top DICT, so let's be loud when that isn't the case.	2023-10-16 08:32:18 +02:00
Nico Weber	6f783929dd	LibPDF: Implement support for CFF charset format 2 I haven't seen this being used in the wild (yet), but it's easy to implement, and with this we support all charset formats. So we can now mention if we see a format we don't know about.	2023-10-15 15:27:15 +02:00
Nico Weber	5b915fb15c	LibPDF: Add more spec comments to parse_charset()	2023-10-15 15:27:15 +02:00
Nico Weber	49275c4b17	LibPDF: Don't overflow SIDs in type 1 charset parsing first_sid has type SID (aka u16), so don't store it in an u8. This fixes (among other things) page 24 on the PDF 1.7 spec.	2023-10-15 15:27:15 +02:00
Nico Weber	23d6e9f577	LibPDF: Implement CFF built-in charsets ISOAdobe, Expert, Expert Subset	2023-10-15 09:33:34 +02:00
Nico Weber	8060957d8d	LibPDF: Use Appendix A instead of Appendix C for standard names From "10 String INDEX": "Further space saving is obtained by allocating commonly occurring strings to predefined SIDs. These strings, known as the standard strings, describe all the names used in the ISOAdobe and Expert character sets along with a few other strings common to Type 1 fonts. A complete list of standard strings is given in Appendix A. The client program will contain an array of standard strings with nStdStrings elements. Thus, the standard strings take SIDs in the range 0 to (nStdStrings-1)." And "13 Charsets" says that charsets store SIDs. Fixes all "Couldn't find string for SID $n, going with space" messages when going through the encoding pages (page 1010 and thereabouts) in the PDF 1.7 spec.	2023-10-15 09:33:34 +02:00
Nico Weber	aba787a441	LibPDF: Implement reading of CFF String Index Only really useful for reading SIDs in the Top DICT (copyright text etc), which we currently don't do. I haven't seen a difference from looking things up in the string table. The only real effect from the commit that I need is that it pulls a local resolve() lambda into a real function resolve_sid(), which I want to call in a future commit. But it makes things more spec-compliant, and if we ever want to read SIDs in metadata in the future, now we can.	2023-10-15 09:33:34 +02:00
Nico Weber	3c49d0dad3	LibPDF: Add a CFF_DEBUG toggle I'd like to put some debug prints behind this soon. No behavior change.	2023-10-15 07:14:29 +02:00
Nico Weber	2249e79630	LibPDF: Add two FIXMEs	2023-10-13 07:53:27 +02:00
Nico Weber	d451197d3d	LibPDF: Add spec comments to CFF	2023-10-13 07:53:27 +02:00
Nico Weber	349996f7f2	LibPDF: Don't crash on files with float CFF defaultWidthX We'd unconditionally get the int from a Variant<int, float> here, but PDFs often have a float for defaultWidthX and nominalWidthX. Fixes crash opening Bakke2010a.pdf from pdffiles (but while the file loads ok, it looks completely busted).	2023-10-12 19:43:57 +02:00
Andreas Kling	13db3c5ce0	LibGfx: Convert FontDatabase APIs to use FlyString	2023-09-06 11:29:03 -04:00
Nico Weber	934340d845	LibPDF: Add FIXME for CIDFontType2 creation Move some code only needed for CIDFontType2 creation into a new function and add a FIXME describing what needs to happen there.	2023-08-14 16:26:09 +02:00
Nico Weber	1c263eee61	LibPDF: Add spec comments and FIXMEs to Type0Font::draw_string()	2023-08-14 16:26:09 +02:00
Nico Weber	715b6f868f	LibPDF: Sketch out Type0 font support some more Type0 fonts can be either CFF-based or TrueType-based. Create a subclass for each, put in some spec text, and give each case a dedicated error code, so that `--debugging-stats` can tell me which branch is more common.	2023-07-25 12:10:36 +02:00
Nico Weber	e3cc05b935	LibPDF: Don't ignore word_spacing	2023-07-22 12:24:29 -04:00
Nico Weber	9283c939bb	LibPDF: Include `width` in Type1Font glyph cache key LibGfx's ScaledFont doesn't do this, but in ScaledFont m_x_scale and m_y_scale are immutable once the class is created, so it can get away with not doing it. In Type1Font, `width` changes in different calls to Type1Font::draw_glyph(), so we need to make it part of the cache key. Fixes rendering of the word "Version" on the first page of pdf_reference_1-7.pdf.	2023-07-21 07:01:09 +02:00
Matthew Olsson	5f8fd47214	LibPDF: Resize fonts when the text and line matrices change	2023-07-20 06:56:41 +01:00
Nico Weber	117a5f1bd2	LibPDF: Remove an unused variable	2023-07-12 19:02:56 +02:00
MacDue	e1cf868e6e	LibGfx: Use AntiAliasingPainter::fill_path() for drawing font glyphs Using the general AA painter fill_path() is indistinguishable from the previous rasterizer, so this switch simply allows us to share more code.	2023-07-10 20:56:25 +02:00
Timothy Flynn	c911781c21	Everywhere: Remove needless trailing semi-colons after functions This is a new option in clang-format-16.	2023-07-08 10:32:56 +01:00
Nico Weber	f56b897622	Everywhere: Fix a few typos Some even user-visible!	2023-04-12 19:37:35 +02:00
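
Note on the subr commits above (185573c03f, 3907374621): the Type 2 charstring format adds a bias to the operand of callsubr and callgsubr before using it as an index, and the bias depends only on how many subroutines the local or global table holds. The following is a minimal sketch of that rule, not the LibPDF implementation. The names subr_bias and resolve_subr_index are hypothetical and written against the standard library rather than AK.

```cpp
#include <cstddef>
#include <optional>

// Bias for a callsubr/callgsubr operand in a Type 2 charstring.
// It is chosen from the number of subroutines in the table being called into.
constexpr int subr_bias(std::size_t subr_count)
{
    if (subr_count < 1240)
        return 107;
    if (subr_count < 33900)
        return 1131;
    return 32768;
}

// Hypothetical helper: turns the operand popped from the charstring stack
// into a table index, or nullopt if the biased value is out of range.
std::optional<std::size_t> resolve_subr_index(int operand, std::size_t subr_count)
{
    long index = static_cast<long>(operand) + subr_bias(subr_count);
    if (index < 0 || static_cast<std::size_t>(index) >= subr_count)
        return std::nullopt;
    return static_cast<std::size_t>(index);
}
```

With a table of 10 subroutines, operand -107 selects entry 0 and operand -98 selects entry 9. The same function applies to the global and the local table, each with its own count.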


107 commits