serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-21 04:42:06 +00:00

Author	SHA1	Message	Date
Nico Weber	04aec4a032	LibPDF: Don't log CFF Copyright tag as unknown	2023-10-21 21:04:02 +02:00
Nico Weber	095a2a17ed	LibPDF: Replace TODO()s in Type0Font code with Errors ...which causes us to not render these fonts instead of crashing. Reduces number of crashes on 300 random PDFs from the web (the first 300 from 0000.zip from https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/) from 64 (21%) to 42 (14%).	2023-10-20 10:33:59 -06:00
Nico Weber	ebba24b848	LibPDF: Fix lookup of built-in Bold Italic strings Liberation*-BoldItalic.ttf apparently self-identifies as "Bold Italic", not "BoldItalic".	2023-10-19 16:52:49 -04:00
Nico Weber	3907374621	LibPDF: Implement support for callgsubr in CFF font programs Font programs are bytecode programs defining glyphs. If several glyphs share a piece of outline, that opcode sequence can be put in a subroutine ("subr") table and the definition of those glyphs can then call that subroutine by number, to reduce file size. CFF fonts can in theory contain multiple fonts, and so there's a global subr table shared by all the fonts in one CFF, and a local per-font subr table. We used to only implement the local subr table, now we implement both. (We only support one font per CFF, and at least in PDF files, that's all that's ever used. So a global subr table isn't very useful. But the spec explicitly allows it -- "Global subroutines may be used in a FontSet even if it only contains one font." -- and it happens in practice.)	2023-10-18 10:50:32 -04:00
Nico Weber	185573c03f	LibPDF: Implement subr_number biasing for CFF font programs	2023-10-18 10:50:32 -04:00
Nico Weber	4dc4de052a	LibPDF: Implement opcode 28 for CFF font programs	2023-10-18 10:50:32 -04:00
Nico Weber	44efff81b9	LibPDF: Remove a dbgln() call in CFF subrs decoding This code is a lot more reliable now than it used to be, and this dbgln() is quite noisy for some files. So let's remove it.	2023-10-18 10:43:51 -04:00
Nico Weber	46fd6fdfa3	LibPDF: Read Global subr data in CFF reader This was the last piece of data we didn't read yet. (We also don't yet support multiple fonts per CFF, but I haven't found a PDF using that yet.) We still don't do anything with it, but now we at least print a warning if this data is there and we ignore it.	2023-10-18 11:02:10 +02:00
Nico Weber	3be5719987	LibPDF: Rename `subroutines` to `local_subroutines` in CFF code	2023-10-18 11:02:10 +02:00
Nico Weber	9a0b559932	LibPDF: Tweak formatting of built-in CFF tables This makes the code look more like the pages in the spec. No behavior change, whitespace change only.	2023-10-18 11:00:17 +02:00
Nico Weber	f0e7fb7038	LibPDF: Make Subrs optional in PS1FontProgram https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf : "Using charstring subroutines is not a requirement of a Type 1 font program." And some versions of Computer Modern do in fact not contain a Subrs array. Together with #21473, makes Problemset.pdf from the pdffiles repo render ok instead of crashing.	2023-10-18 11:00:02 +02:00
Nico Weber	cb961101c7	LibPDF: Implement CFF built-in Standard and Expert encodings With this, all tables from the spec appendixes are in CFF.cpp. This fixes a crash reading page 2 (and onward) of 2ThestructureoftheCIE1997ColourAppearanceModelCIECAM97s.pdf in the pdffiles repo.	2023-10-17 10:21:38 +02:00
Nico Weber	eeada4678c	LibPDF: Postpone CFF encoding processing after Top DICT has been read The encoding offset defaults to 0, i.e. the Standard Encoding. That means reading the encoding only if the tag is present causes us to not read it if a font uses the Standard Encoding. Now, we always read an encoding, even if it's the (implicit) default one.	2023-10-17 10:21:38 +02:00
Nico Weber	1cfe639b6c	LibPDF: Implement CFF supplemental encoding The main encoding data maps glyph ID ("GID") to its codepoint. If a glyph has several codepoints, then a secondary table mapping codepoint to string ID ("SID") of the glyph's name is present. (A separate table associates each glyph with its name already.) I haven't seen this used in the wild, but the structure of the supplemental data is also going to be needed for built-in encodings.	2023-10-17 10:21:38 +02:00
Nico Weber	37daeae6fd	LibPDF: Add spec comments, dbgln_if()s to CFF's parse_encoding()	2023-10-17 10:21:38 +02:00
Nico Weber	007d7cdd53	LibPDF: Fix sign (and fixed point) in glyph decoding opcode 24 Two bugs: 1. We decoded a u32, not an i32 as the spec wants 2. (minor) Our fixed-point divisor was off by one Fixes text rendering in Bakke2010a.pdf in pdffiles, and rendering of other fonts with negative width adjustments from opcode 255. That PDF was produced by "Apple pstopdf" and uses font SFBX1200, which is apparently a variant of Computer Modern. So maybe this helps with lots of PDFs produced from TeX files, but I haven't checked that.	2023-10-16 08:33:35 +02:00
Nico Weber	96a4936567	LibPDF: Checking for built-in CFF encodings Only prints a warning for them for now. Also warn on the not-yet-implemented encoding supplement.	2023-10-16 08:32:18 +02:00
Nico Weber	414a164850	LibPDF: Be louder about unimplemented CFF dict entries	2023-10-16 08:32:18 +02:00
Nico Weber	c825194fb9	LibPDF: Reject CFFs with more than one font The code assumes that there's just one Top DICT, so let's be loud when that isn't the case.	2023-10-16 08:32:18 +02:00
Nico Weber	6f783929dd	LibPDF: Implement support for CFF charset format 2 I haven't seen this being used in the wild (yet), but it's easy to implement, and with this we support all charset formats. So we can now mention if we see a format we don't know about.	2023-10-15 15:27:15 +02:00
Nico Weber	5b915fb15c	LibPDF: Add more spec comments to parse_charset()	2023-10-15 15:27:15 +02:00
Nico Weber	49275c4b17	LibPDF: Don't overflow SIDs in type 1 charset parsing first_sid has type SID (aka u16), so don't store it in an u8. This fixes (among other things) page 24 on the PDF 1.7 spec.	2023-10-15 15:27:15 +02:00
Nico Weber	23d6e9f577	LibPDF: Implement CFF built-in charsets ISOAdobe, Expert, Expert Subset	2023-10-15 09:33:34 +02:00
Nico Weber	8060957d8d	LibPDF: Use Appendix A instead of Appendix C for standard names From "10 String INDEX": "Further space saving is obtained by allocating commonly occurring strings to predefined SIDs. These strings, known as the standard strings, describe all the names used in the ISOAdobe and Expert character sets along with a few other strings common to Type 1 fonts. A complete list of standard strings is given in Appendix A. The client program will contain an array of standard strings with nStdStrings elements. Thus, the standard strings take SIDs in the range 0 to (nStdStrings-1)." And "13 Charsets" says that charsets store SIDs. Fixes all "Couldn't find string for SID $n, going with space" messages when going through the encoding pages (page 1010 and thereabouts) in the PDF 1.7 spec.	2023-10-15 09:33:34 +02:00
Nico Weber	aba787a441	LibPDF: Implement reading of CFF String Index Only really useful for reading SIDs in the Top DICT (copyright text etc), which we currently don't do. I haven't seen a difference from looking things up in the string table. The only real effect from the commit that I need is that it pulls a local resolve() lambda into a real function resolve_sid(), which I want to call in a future commit. But it makes things more spec-compliant, and if we ever want to read SIDs in metadata in the future, now we can.	2023-10-15 09:33:34 +02:00
Nico Weber	3c49d0dad3	LibPDF: Add a CFF_DEBUG toggle I'd like to put some debug prints behind this soon. No behavior change.	2023-10-15 07:14:29 +02:00
Nico Weber	2249e79630	LibPDF: Add two FIXMEs	2023-10-13 07:53:27 +02:00
Nico Weber	d451197d3d	LibPDF: Add spec comments to CFF	2023-10-13 07:53:27 +02:00
Nico Weber	349996f7f2	LibPDF: Don't crash on files with float CFF defaultWidthX We'd unconditionally get the int from a Variant<int, float> here, but PDFs often have a float for defaultWidthX and nominalWidthX. Fixes crash opening Bakke2010a.pdf from pdffiles (but while the file loads ok, it looks completely busted).	2023-10-12 19:43:57 +02:00
Andreas Kling	13db3c5ce0	LibGfx: Convert FontDatabase APIs to use FlyString	2023-09-06 11:29:03 -04:00
Nico Weber	934340d845	LibPDF: Add FIXME for CIDFontType2 creation Move some code only needed for CIDFontType2 creation into a new function and add a FIXME describing what needs to happen there.	2023-08-14 16:26:09 +02:00
Nico Weber	1c263eee61	LibPDF: Add spec comments and FIXMEs to Type0Font::draw_string()	2023-08-14 16:26:09 +02:00
Nico Weber	715b6f868f	LibPDF: Sketch out Type0 font support some more Type0 fonts can be either CFF-based or TrueType-based. Create a subclass for each, put in some spec text, and give each case a dedicated error code, so that `--debugging-stats` can tell me which branch is more common.	2023-07-25 12:10:36 +02:00
Nico Weber	e3cc05b935	LibPDF: Don't ignore word_spacing	2023-07-22 12:24:29 -04:00
Nico Weber	9283c939bb	LibPDF: Include `width` in Type1Font glyph cache key LibGfx's ScaledFont doesn't do this, but in ScaledFont m_x_scale and m_y_scale are immutable once the class is created, so it can get away with not doing it. In Type1Font, `width` changes in different calls to Type1Font::draw_glyph(), so we need to make it part of the cache key. Fixes rendering of the word "Version" on the first page of pdf_reference_1-7.pdf.	2023-07-21 07:01:09 +02:00
Matthew Olsson	5f8fd47214	LibPDF: Resize fonts when the text and line matrices change	2023-07-20 06:56:41 +01:00
Nico Weber	117a5f1bd2	LibPDF: Remove an unused variable	2023-07-12 19:02:56 +02:00
MacDue	e1cf868e6e	LibGfx: Use AntiAliasingPainter::fill_path() for drawing font glyphs Using the general AA painter fill_path() is indistinguishable from the previous rasterizer, so this switch simply allows us to share more code.	2023-07-10 20:56:25 +02:00
Timothy Flynn	c911781c21	Everywhere: Remove needless trailing semi-colons after functions This is a new option in clang-format-16.	2023-07-08 10:32:56 +01:00
Nico Weber	f56b897622	Everywhere: Fix a few typos Some even user-visible!	2023-04-12 19:37:35 +02:00
Julian Offenhäuser	bdd5f36121	LibPDF: Load replacements for TrueTypeFonts without an embedded font This previously only happened for Type 1 fonts.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	5deac3a7f5	LibPDF: Actually return an error when failing to load replacement fonts	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	fec7ccf020	LibPDF: Ask OpenType font programs for glyph widths if needed If the font dictionary didn't specify custom glyph widths, we would fall back to the specified "missing width" (or 0 in most cases!), which meant that we would draw glyphs on top of each other in a lot of cases, namely for TrueTypeFonts or standard Type1Fonts with an OpenType fallback. What we actually want to do in this case is ask the OpenType font for the correct width.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	2b3a41be74	LibPDF: Remove the subroutine length limit for PS1 font programs A limit of 1024 subroutines seemed like a sensible choice, but some fonts actually do exceed it. We will now only assert that the specified amount is positive.	2023-03-25 16:27:30 -06:00
Julian Offenhäuser	3400779047	LibPDF: Pass the right point width to the font loader in TrueTypeFont	2023-03-22 09:04:00 +01:00
Rodrigo Tobar	4a20751ff6	LibPDF: Detect CFF encodings with supplements These are not yet actually parsed, but detecting them means we at least don't fail to understand the actual format value, which was causing some CFF fonts to fail to load.	2023-03-02 12:18:53 +01:00
Rodrigo Tobar	9bca62c5fa	LibPDF: Increase argument stack for Type1FontPrograms Type1 imposes a stack limit of 24 elements, but Type2 has a limit of 48. We are better off relaxing the limit of the former in favour of properly supporting the latter.	2023-03-02 12:18:53 +01:00
Rodrigo Tobar	de5e7b487c	LibPDF: Improve Type2 hint counting There were two issues with how we counted hints with Type2 CharString commands: the first was that we assumed a single hint per command, even though there are commands that accept multiple hints thanks to taking a variable number of operands; and secondly, the hintmask/ctrlmask commands can also take operands (i.e., hints) themselves in certain situations. This commit fixes these two issues by correctly counting hints in both cases. This in turn fixes cases when there were more than 8 hints in total, therefore a hintmask/ctrlmask command needed to read more than one byte past the operator itself.	2023-03-02 12:18:53 +01:00
Rodrigo Tobar	cb04e4e9da	LibPDF: Refactor Font classes The PDFFont class hierarchy was very simple (a top-level PDFFont class, followed by all the children classes that derived directly from it). While this design was good enough for some things, it didn't correctly model the actual organization of font types: PDF fonts are first divided between "simple" and "composite" fonts. The latter is the Type0 font, while the rest are all simple. * PDF fonts yield a glyph per "character code". Simple fonts' char codes are always 1 byte long, while Type0 char codes are of variable size. To this effect, this commit changes the hierarchy of Font classes, introducing a new SimpleFont class, deriving from PDFFont, and acting as the parent of Type1Font and TrueTypeFont, while Type0 still derives from PDFFont directly. This distinction allows us now to: * Model string rendering differently from simple and composite fonts: PDFFont now offers a generic draw_string method that takes a whole string to be rendered instead of a single char code. SimpleFont implements this as a loop over individual bytes of the string, with T1 and TT implementing draw_glyph for drawing a single char code. * Some common fields between T1 and TT fonts now live under SimpleFont instead of under PDFFont, where they previously resided. * Some other interfaces specific to SimpleFont have been cleaned up, with u16/u32 not appearing on these classes (or in PDFFont) anymore. * Type0Font's rendering still remains unimplemented. As part of this exercise I also took the chance to perform the following cleanups and restructurings: * Refactored the creation and initialisation of fonts. They are all centrally created at PDFFont::create, with a virtual "initialize" method that allows them to initialise their inner members in the correct order (parent first, child later) after creation. * Removed duplicated code. * Cleaned up some public interfaces: receive const refs, removed unnecessary ctor/dtors, etc. 
* Slightly changed how Type1 and TrueType fonts are implemented: if there's an embedded font that takes priority, otherwise we always look for a replacement. * This means we don't do anything special for the standard fonts. The only behavior previously associated to standard fonts was choosing an encoding, and even that was under questioning.	2023-02-24 20:16:50 +01:00
Rodrigo Tobar	c4507bb56e	LibPDF: Add more built-in SIDs The first iteration has enough SIDs to display simple documents, but when trying more and more documents we started to need more of these SIDs to be properly defined. This is a copy/paste exercise from the CFF document, which is tedious, so it will continue in small drops. This commit fills all the gaps until SID 228, which covers all the ISOAdobe space, and should be enough for most use cases. Since this is a continuous space starting at 0, we now use an Array instead of a Map to store these names, which should be more performant. Also to simplify things I've moved the Array out of the CFF class, making it a simpler static variable, which allows us to use template type deduction.	2023-02-13 00:23:17 +00:00
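
Note on the subr_number biasing mentioned in 185573c03f and 3907374621: the operand passed to callsubr/callgsubr is not a direct table index. The Type 2 charstring format adds a bias that depends only on the number of entries in the subr table being called. The following is a minimal sketch of that rule, not the LibPDF code; the names `subr_bias` and `resolve_subr_index` are hypothetical and do not come from the commits above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bias from the Type 2 charstring format. It depends only on how many
// subroutines the (local or global) table contains.
constexpr int32_t subr_bias(size_t subr_count)
{
    if (subr_count < 1240)
        return 107;
    if (subr_count < 33900)
        return 1131;
    return 32768;
}

// Hypothetical helper: turns the operand of callsubr/callgsubr into an index
// into the matching subr table. Returns -1 if the biased index is out of range.
inline int64_t resolve_subr_index(int32_t operand, size_t subr_count)
{
    int64_t index = static_cast<int64_t>(operand) + subr_bias(subr_count);
    if (index < 0 || static_cast<uint64_t>(index) >= subr_count)
        return -1;
    return index;
}
```

The same function would be applied twice under this sketch: once with the local table's size for callsubr and once with the global table's size for callgsubr, since each table has its own bias.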

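Note on the operand decoding touched by 4dc4de052a and 007d7cdd53: in Type 2 charstrings, byte 28 introduces a signed 16-bit integer and byte 255 introduces a signed 16.16 fixed-point number, both big-endian. The sketch below assumes the caller has already consumed the leading 28 or 255 byte; the function names are hypothetical and this is not the LibPDF implementation. It illustrates the two points from the fix: the raw value is reinterpreted as signed, and the divisor is 65536.

```cpp
#include <cassert>
#include <cstdint>

// Operand introduced by byte 28: the next two bytes form a big-endian,
// two's-complement 16-bit integer.
inline int32_t decode_operand_28(uint8_t const bytes[2])
{
    uint16_t raw = static_cast<uint16_t>((bytes[0] << 8) | bytes[1]);
    return static_cast<int16_t>(raw);
}

// Operand introduced by byte 255: the next four bytes form a big-endian,
// signed 16.16 fixed-point number. Reading it as unsigned would turn small
// negative values into huge positive ones.
inline double decode_operand_255(uint8_t const bytes[4])
{
    uint32_t raw = (static_cast<uint32_t>(bytes[0]) << 24)
        | (static_cast<uint32_t>(bytes[1]) << 16)
        | (static_cast<uint32_t>(bytes[2]) << 8)
        | static_cast<uint32_t>(bytes[3]);
    int32_t value = static_cast<int32_t>(raw);
    return static_cast<double>(value) / 65536.0; // 2^16, not 65535
}
```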

97 commits