1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-10-21 04:42:06 +00:00
Commit graph

97 commits

Author SHA1 Message Date
Nico Weber
04aec4a032 LibPDF: Don't log CFF Copyright tag as unknown 2023-10-21 21:04:02 +02:00
Nico Weber
095a2a17ed LibPDF: Replace TODO()s in Type0Font code with Errors
...which causes us to not render these fonts instead of crashing.

Reduces number of crashes on 300 random PDFs from the web (the first 300
from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 64 (21%) to 42 (14%).
2023-10-20 10:33:59 -06:00
Nico Weber
ebba24b848 LibPDF: Fix lookup of built-in Bold Italic strings
Liberation*-BoldItalic.ttf apparently self-identifies as "Bold Italic",
not "BoldItalic".
2023-10-19 16:52:49 -04:00
Nico Weber
3907374621 LibPDF: Implement support for callgsubr in CFF font programs
Font programs are bytecode programs defining glyphs. If several glyphs
share a piece of outline, that opcode sequence can be put in a
subroutine ("subr") table and the definition of those glyphs can then
call that subroutine by number, to reduce file size.

CFF fonts can in theory contain multiple fonts, and so there's a global
subr table shared by all the fonts in one CFF, and a local per-fornt
subr table.  We used to only implement the local subr table, now we
implement both.

(We only support one font per CFF, and at least in PDF files, that's
all that's ever used. So a global subr table isn't very useful.
But the spec explicitly allows it -- "Global subroutines may be used in
a FontSet even if it only contains one font." -- and it happens in
practice.)
2023-10-18 10:50:32 -04:00
Nico Weber
185573c03f LibPDF: Implement subr_number biasing for CFF font programs 2023-10-18 10:50:32 -04:00
Nico Weber
4dc4de052a LibPDF: Implement opcode 28 for CFF font programs 2023-10-18 10:50:32 -04:00
Nico Weber
44efff81b9 LibPDF: Remove a dbgln() call in CFF subrs decoding
This code is a lot more reliable now than it used to be, and this
dbgln() is quite noisy for some files. So let's remove it.
2023-10-18 10:43:51 -04:00
Nico Weber
46fd6fdfa3 LibPDF: Read Global subr data in CFF reader
This was the last piece of data we didn't read yet.
(We also don't yet support multiple fonts per CFF, but I haven't
found a PDF using that yet.)

We still don't do anything with it, but now we at least print a
warning if this data is there and we ignore it.
2023-10-18 11:02:10 +02:00
Nico Weber
3be5719987 LibPDF: Rename subroutines to local_subroutines in CFF code 2023-10-18 11:02:10 +02:00
Nico Weber
9a0b559932 LibPDF: Tweak formatting of built-in CFF tables
This makes the code look more like the pages in the spec.

No behavior change, whitespace change only.
2023-10-18 11:00:17 +02:00
Nico Weber
f0e7fb7038 LibPDF: Make Subrs optional in PS1FontProgram
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf :

"Using charstring subroutines is not a requirement of a Type 1
font program."

And some versions of Computer Modern do in fact not contain a Subrs
array.

Together with #21473, makes Problemset.pdf from the pdffiles repro
render ok instead of crashing.
2023-10-18 11:00:02 +02:00
Nico Weber
cb961101c7 LibPDF: Implement CFF built-in Standard and Expert encodings
With this, all tables from the spec appendixes are in CFF.cpp.

This fixes a crash reading page 2 (and onward) of
2ThestructureoftheCIE1997ColourAppearanceModelCIECAM97s.pdf in
the pdffiles repo.
2023-10-17 10:21:38 +02:00
Nico Weber
eeada4678c LibPDF: Postpone CFF encoding processing after Top DICT has been read
The encoding offset defaults to 0, i.e. the Standard Encoding.
That means reading the encoding only if the tag is present causes
us to not read it if a font uses the Standard Encoding.

Now, we always read an encoding, even if it's the (implicit) default
one.
2023-10-17 10:21:38 +02:00
Nico Weber
1cfe639b6c LibPDF: Implement CFF supplemental encoding
The main encoding data maps glyph ID ("GID") to its codepoint.
If a glyph has several codepoints, then a secondary table mapping
codepoint to string ID ("SID") of the glyph's name is present.

(A separate table associates each glyph with its name already.)

I haven't seen this used in the wild, but the structure of the
supplemental data is also going to be needed for built-in encodings.
2023-10-17 10:21:38 +02:00
Nico Weber
37daeae6fd LibPDF: Add spec comments, dbgln_if()s to CFF's parse_encoding() 2023-10-17 10:21:38 +02:00
Nico Weber
007d7cdd53 LibPDF: Fix sign (and fixed point) in glyph decoding opcode 24
Two bugs:

1. We decoded a u32, not an i32 as the spec wants
2. (minor) Our fixed-point divisor was off by one

Fixes text rendering in Bakke2010a.pdf in pdffiles, and rendering of
other fonts with negative width adjustments from optcode 255.
That PDF was produced by "Apple pstopdf" and uses font SFBX1200,
which is apparently a variant of Computer Modern. So maybe this
helps with lots of PDFs produced from TeX files, but I haven't
checked that.
2023-10-16 08:33:35 +02:00
Nico Weber
96a4936567 LibPDF: Checking for built-in CFF encodings
Only prints a warning for them for now.

Also warn on the not-yet-implemented encoding supplement.
2023-10-16 08:32:18 +02:00
Nico Weber
414a164850 LibPDF: Be louder about unimplemented CFF dict entries 2023-10-16 08:32:18 +02:00
Nico Weber
c825194fb9 LibPDF: Reject CFFs with more than one font
The code assumes that there's just one Top DICT, so let's be loud
when that isn't the case.
2023-10-16 08:32:18 +02:00
Nico Weber
6f783929dd LibPDF: Implement support for CFF charset format 2
I haven't seen this being used in the wild (yet), but it's easy
to implement, and with this we support all charset formats.

So we can now mention if we see a format we don't know about.
2023-10-15 15:27:15 +02:00
Nico Weber
5b915fb15c LibPDF: Add more spec comments to parse_charset() 2023-10-15 15:27:15 +02:00
Nico Weber
49275c4b17 LibPDF: Don't overflow SIDs in type 1 charset parsing
first_sid has type SID (aka u16), so don't store it in an u8.

This fixes (among other things) page 24 on the PDF 1.7 spec.
2023-10-15 15:27:15 +02:00
Nico Weber
23d6e9f577 LibPDF: Implement CFF built-in charsets ISOAdobe, Expert, Expert Subset 2023-10-15 09:33:34 +02:00
Nico Weber
8060957d8d LibPDF: Use Appendix A instead of Appendix C for standard names
From "10 String INDEX":

"Further space saving is obtained by allocating commonly occurring
strings to predefined SIDs. These strings, known as the standard
strings, describe all the names used in the ISOAdobe and Expert
character sets along with a few other strings common to Type 1 fonts. A
complete list of standard strings is given in Appendix A.  The client
program will contain an array of standard strings with nStoStrings
elements. Thus, the standard strings take SIDs in the range 0 to
(nStaStrings-1)."

And "13 Charsets" says that charsets store SIDs.

Fixes all

    "Couldn't find string for SID $n, going with space"

messages when going through the encoding pages (page 1010 and
thereabouts) in the PDF 1.7 spec.
2023-10-15 09:33:34 +02:00
Nico Weber
aba787a441 LibPDF: Implement reading of CFF String Index
Only really useful for reading SIDs in the Top DICT (copyright
text etc), which we currently don't do.

I haven't seen a difference from looking things up in the string
table. The only real effect from the commit that I need is that
it pulls a local resolve() labmda into a real function
resolve_sid(), which I want to call in a future commit.

But it makes things more spec-compliant, and if we ever want to
read SIDs in metadata in the future, now we can.
2023-10-15 09:33:34 +02:00
Nico Weber
3c49d0dad3 LibPDF: Add a CFF_DEBUG toggle
I'd like to put some debug prints behind this soon.

No behavior change.
2023-10-15 07:14:29 +02:00
Nico Weber
2249e79630 LibPDF: Add two FIXMEs 2023-10-13 07:53:27 +02:00
Nico Weber
d451197d3d LibPDF: Add spec comments to CFF 2023-10-13 07:53:27 +02:00
Nico Weber
349996f7f2 LibPDF: Don't crash on files with float CFF defaultWidthX
We'd unconditionally get the int from a Variant<int, float> here,
but PDFs often have a float for defaultWidthX and nominalWidthX.

Fixes crash opening Bakke2010a.pdf from pdffiles (but while the
file loads ok, it looks completely busted).
2023-10-12 19:43:57 +02:00
Andreas Kling
13db3c5ce0 LibGfx: Convert FontDatabase APIs to use FlyString 2023-09-06 11:29:03 -04:00
Nico Weber
934340d845 LibPDF: Add FIXME for CIDFontType2 creation
Move some code only needed for CIDFontType2 creation into a new
function and add a FIXME describing what needs to happen there.
2023-08-14 16:26:09 +02:00
Nico Weber
1c263eee61 LibPDF: Add spec comments and FIXMEs to Type0Font::draw_string() 2023-08-14 16:26:09 +02:00
Nico Weber
715b6f868f LibPDF: Sketch out Type0 font support some more
Type0 fonts can be either CFF-based or TrueType-based.
Create a subclass for each, put in some spec text, and
give each case a dedicated error code, so that `--debugging-stats`
can tell me which branch is more common.
2023-07-25 12:10:36 +02:00
Nico Weber
e3cc05b935 LibPDF: Don't ignore word_spacing 2023-07-22 12:24:29 -04:00
Nico Weber
9283c939bb LibPDF: Include width in Type1Font glyph cache key
LibGfx's ScaledFont doesn't do this, but in ScaledFont m_x_scale and
m_y_scale are immutable once the class is created, so it can get away
with not doing it.

In Type1Font, `width` changes in different calls to
Type1Font::draw_glyph(), so we need to make it part of the cache key.

Fixes rendering of the word "Version" on the first page of
pdf_reference_1-7.pdf.
2023-07-21 07:01:09 +02:00
Matthew Olsson
5f8fd47214 LibPDF: Resize fonts when the text and line matrices change 2023-07-20 06:56:41 +01:00
Nico Weber
117a5f1bd2 LibPDF: Remove an unused variable 2023-07-12 19:02:56 +02:00
MacDue
e1cf868e6e LibGfx: Use AntiAliasingPainter::fill_path() for drawing font glyphs
Using the general AA painter fill_path() is indistinguishable from the
previous rasterizer, so this switch simply allows us to share more code.
2023-07-10 20:56:25 +02:00
Timothy Flynn
c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Nico Weber
f56b897622 Everywhere: Fix a few typos
Some even user-visible!
2023-04-12 19:37:35 +02:00
Julian Offenhäuser
bdd5f36121 LibPDF: Load replacements for TrueTypeFonts without an embedded font
This previously only happened for Type 1 fonts.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
5deac3a7f5 LibPDF: Actually return an error when failing to load replacement fonts 2023-03-25 16:27:30 -06:00
Julian Offenhäuser
fec7ccf020 LibPDF: Ask OpenType font programs for glyph widths if needed
If the font dictionary didn't specify custom glyph widths, we would fall
back to the specified "missing width" (or 0 in most cases!), which meant
that we would draw glyphs on top of each other in a lot of cases, namely
for TrueTypeFonts or standard Type1Fonts with an OpenType fallback.

What we actually want to do in this case is ask the OpenType font for
the correct width.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
2b3a41be74 LibPDF: Remove the subroutine length limit for PS1 font programs
A limit of 1024 subroutines seemed like a sensible choice, but some
fonts actually do exceed it. We will now only assert that the specified
amount is positive.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
3400779047 LibPDF: Pass the right point width to the font loader in TrueTypeFont 2023-03-22 09:04:00 +01:00
Rodrigo Tobar
4a20751ff6 LibPDF: Detect CFF encodings with supplements
These are not yet actually parsed, but detecting them means we at least
don't fail to understand the *actual* format value, which was causing
some CFF fonts to fail to load.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
9bca62c5fa LibPDF: Increase argument stack for Type1FontPrograms
Type1 imposes a stack limit of 24 elements, but Type2 has a limit of 48.
We are better off relaxing the limit of the former in favour of properly
supporting the latter.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
de5e7b487c LibPDF: Improve Type2 hint counting
There were two issues with how we counted hints with Type2 CharString
commands: the first was that we assumed a single hint per command, even
though there are commands that accept multiple hints thanks to taking a
variable number of operands; and secondly, the hintmask/ctrlmask
commands can also take operands (i.e., hints) themselves in certain
situations.

This commit fixes these two issues by correctly counting hints in both
cases. This in turn fixes cases when there were more than 8 hints in
total, therefore a hintmask/ctrlmask command needed to read more than
one byte past the operator itself.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
cb04e4e9da LibPDF: Refactor *Font classes
The PDFFont class hierarchy was very simple (a top-level PDFFont class,
followed by all the children classes that derived directly from it).
While this design was good enough for some things, it didn't correctly
model the actual organization of font types:

 * PDF fonts are first divided between "simple" and "composite" fonts.
   The latter is the Type0 font, while the rest are all simple.
 * PDF fonts yield a glyph per "character code". Simple fonts char codes
   are always 1 byte long, while Type0 char codes are of variable size.

To this effect, this commit changes the hierarchy of Font classes,
introducing a new SimpleFont class, deriving from PDFFont, and acting as
the parent of Type1Font and TrueTypeFont, while Type0 still derives from
PDFFont directly. This distinction allows us now to:

 * Model string rendering differently from simple and composite fonts:
   PDFFont now offers a generic draw_string method that takes a whole
   string to be rendered instead of a single char code. SimpleFont
   implements this as a loop over individual bytes of the string, with
   T1 and TT implementing draw_glyph for drawing a single char code.
 * Some common fields between T1 and TT fonts now live under SimpleFont
   instead of under PDFfont, where they previously resided.
 * Some other interfaces specific to SimpleFont have been cleaned up,
   with u16/u32 not appearing on these classes (or in PDFFont) anymore.
 * Type0Font's rendering still remains unimplemented.

As part of this exercise I also took the chance to perform the following
cleanups and restructurings:

 * Refactored the creation and initialisation of fonts. They are all
   centrally created at PDFFont::create, with a virtual "initialize"
   method that allows them to initialise their inner members in the
   correct order (parent first, child later) after creation.
 * Removed duplicated code.
 * Cleaned up some public interfaces: receive const refs, removed
   unnecessary ctro/dtors, etc.
 * Slightly changed how Type1 and TrueType fonts are implemented: if
   there's an embedded font that takes priority, otherwise we always
   look for a replacement.
 * This means we don't do anything special for the standard fonts. The
   only behavior previously associated to standard fonts was choosing an
   encoding, and even that was under questioning.
2023-02-24 20:16:50 +01:00
Rodrigo Tobar
c4507bb56e LibPDF: Add more built-in SIDs
The first iteration has enough SIDs to display simple documents, but
when trying more and more documents we started to need more of these
SIDs to be properly defined. This is a copy/paste exercise from the CFF
document, which is tedious, so it will continue in small drops.

This commit fills all the gaps until SID 228, which covers all the
ISOAdobe space, and should be enough for most use cases. Since this is a
continuous space starting at 0, we now use an Array instead of a Map to
store these names, which should be more performant. Also to simplify
things I've moved the Array out of the CFF class, making it a simpler
static variable, which allows us to use template type deduction.
2023-02-13 00:23:17 +00:00