mirror of https://github.com/RGBCube/serenity synced 2025-09-13 13:27:34 +00:00

189 commits

Author SHA1 Message Date
Nico Weber
3be5719987 LibPDF: Rename subroutines to local_subroutines in CFF code 2023-10-18 11:02:10 +02:00
Nico Weber
9a0b559932 LibPDF: Tweak formatting of built-in CFF tables
This makes the code look more like the pages in the spec.

No behavior change, whitespace change only.
2023-10-18 11:00:17 +02:00
Nico Weber
f0e7fb7038 LibPDF: Make Subrs optional in PS1FontProgram
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf :

"Using charstring subroutines is not a requirement of a Type 1
font program."

And some versions of Computer Modern do in fact not contain a Subrs
array.

Together with #21473, this makes Problemset.pdf from the pdffiles repo
render correctly instead of crashing.
2023-10-18 11:00:02 +02:00
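As an illustration of treating Subrs as optional, here is a minimal sketch. It is not LibPDF's PS1FontProgram: `FontProgramSketch` and `subroutine()` are hypothetical names, and it uses `std::optional` in place of AK types. The point is that a missing Subrs array becomes a recoverable lookup failure rather than a crash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a Type 1 font program (not LibPDF's PS1FontProgram).
// Subrs is optional because Type 1 fonts are not required to use subroutines.
struct FontProgramSketch {
    using Charstring = std::vector<std::uint8_t>;
    std::optional<std::vector<Charstring>> subrs;

    // Resolves the operand of a callsubr. Returns nullptr if the font has no
    // Subrs array or the index is out of range, so the caller can report a
    // malformed glyph instead of crashing.
    Charstring const* subroutine(int index) const
    {
        if (!subrs.has_value())
            return nullptr;
        if (index < 0 || static_cast<std::size_t>(index) >= subrs->size())
            return nullptr;
        return &(*subrs)[static_cast<std::size_t>(index)];
    }
};
```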
Nico Weber
cb961101c7 LibPDF: Implement CFF built-in Standard and Expert encodings
With this, all tables from the spec appendixes are in CFF.cpp.

This fixes a crash reading page 2 (and onward) of
2ThestructureoftheCIE1997ColourAppearanceModelCIECAM97s.pdf in
the pdffiles repo.
2023-10-17 10:21:38 +02:00
Nico Weber
eeada4678c LibPDF: Postpone CFF encoding processing after Top DICT has been read
The encoding offset defaults to 0, i.e. the Standard Encoding.
That means reading the encoding only if the tag is present causes
us to not read it if a font uses the Standard Encoding.

Now, we always read an encoding, even if it's the (implicit) default
one.
2023-10-17 10:21:38 +02:00
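A minimal sketch of the decision this commit defers until after the Top DICT is read. In the CFF spec, the Top DICT Encoding operator is 16 with a default of 0: 0 selects the Standard Encoding, 1 the Expert Encoding, and any other value is an offset to custom encoding data. `TopDict`, `choose_encoding()` and `EncodingChoice` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model of a parsed Top DICT: operator number -> integer operand.
using TopDict = std::map<int, std::int32_t>;

// In the CFF Top DICT, Encoding is operator 16 and defaults to 0.
constexpr int encoding_operator = 16;

enum class EncodingKind { Standard, Expert, Custom };

struct EncodingChoice {
    EncodingKind kind;
    std::int32_t offset; // only meaningful for EncodingKind::Custom
};

// Called once the whole Top DICT has been read, so a missing Encoding entry
// still selects the implicit default instead of being skipped.
EncodingChoice choose_encoding(TopDict const& top_dict)
{
    auto it = top_dict.find(encoding_operator);
    std::int32_t value = it != top_dict.end() ? it->second : 0;
    if (value == 0)
        return { EncodingKind::Standard, 0 };
    if (value == 1)
        return { EncodingKind::Expert, 0 };
    return { EncodingKind::Custom, value };
}
```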
Nico Weber
1cfe639b6c LibPDF: Implement CFF supplemental encoding
The main encoding data maps glyph ID ("GID") to its codepoint.
If a glyph has several codepoints, then a secondary table mapping
codepoint to string ID ("SID") of the glyph's name is present.

(A separate table associates each glyph with its name already.)

I haven't seen this used in the wild, but the structure of the
supplemental data is also going to be needed for built-in encodings.
2023-10-17 10:21:38 +02:00
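Here is a sketch of the encoding layout this commit describes, following the CFF spec's description of encoding formats 0 and 1. If the high bit of the format byte is set, the main data is followed by a supplement: a Card8 count, then (Card8 code, Card16 SID) pairs. This is not LibPDF's parse_encoding(). `ParsedEncoding` and `parse_encoding()` here are hypothetical, and the sketch stores code-to-GID instead of GID-to-code, which carries the same information.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

struct ParsedEncoding {
    std::map<std::uint8_t, std::uint16_t> code_to_gid;           // main table
    std::map<std::uint8_t, std::uint16_t> supplement_code_to_sid; // extra codes -> glyph name SID
};

// Sketch of CFF encoding parsing: formats 0 and 1, plus the optional
// supplement signalled by the high bit of the format byte.
std::optional<ParsedEncoding> parse_encoding(std::vector<std::uint8_t> const& data)
{
    std::size_t pos = 0;
    auto read_u8 = [&](std::uint8_t& out) {
        if (pos >= data.size())
            return false;
        out = data[pos++];
        return true;
    };
    auto read_u16 = [&](std::uint16_t& out) { // big-endian Card16
        if (pos + 2 > data.size())
            return false;
        out = static_cast<std::uint16_t>((data[pos] << 8) | data[pos + 1]);
        pos += 2;
        return true;
    };

    ParsedEncoding result;
    std::uint8_t format = 0;
    if (!read_u8(format))
        return std::nullopt;
    std::uint16_t gid = 1; // GID 0 is .notdef and is never encoded
    if ((format & 0x7f) == 0) {
        std::uint8_t n_codes = 0;
        if (!read_u8(n_codes))
            return std::nullopt;
        for (int i = 0; i < n_codes; ++i) {
            std::uint8_t code = 0;
            if (!read_u8(code))
                return std::nullopt;
            result.code_to_gid[code] = gid++;
        }
    } else if ((format & 0x7f) == 1) {
        std::uint8_t n_ranges = 0;
        if (!read_u8(n_ranges))
            return std::nullopt;
        for (int i = 0; i < n_ranges; ++i) {
            std::uint8_t first = 0, n_left = 0;
            if (!read_u8(first) || !read_u8(n_left) || first + n_left > 255)
                return std::nullopt;
            for (int code = first; code <= first + n_left; ++code)
                result.code_to_gid[static_cast<std::uint8_t>(code)] = gid++;
        }
    } else {
        return std::nullopt; // unknown format
    }

    if (format & 0x80) { // supplement present
        std::uint8_t n_sups = 0;
        if (!read_u8(n_sups))
            return std::nullopt;
        for (int i = 0; i < n_sups; ++i) {
            std::uint8_t code = 0;
            std::uint16_t sid = 0;
            if (!read_u8(code) || !read_u16(sid))
                return std::nullopt;
            result.supplement_code_to_sid[code] = sid;
        }
    }
    return result;
}
```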
Nico Weber
37daeae6fd LibPDF: Add spec comments, dbgln_if()s to CFF's parse_encoding() 2023-10-17 10:21:38 +02:00
Nico Weber
007d7cdd53 LibPDF: Fix sign (and fixed point) in glyph decoding opcode 24
Two bugs:

1. We decoded a u32, not an i32 as the spec wants
2. (minor) Our fixed-point divisor was off by one

Fixes text rendering in Bakke2010a.pdf in pdffiles, and rendering of
other fonts with negative width adjustments from opcode 255.
That PDF was produced by "Apple pstopdf" and uses font SFBX1200,
which is apparently a variant of Computer Modern. So maybe this
helps with lots of PDFs produced from TeX files, but I haven't
checked that.
2023-10-16 08:33:35 +02:00
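Both bugs are easy to see in a sketch. It assumes the Type 2 charstring rule (Adobe Technical Note #5177) that an operand byte of 255 is followed by a two's-complement 16.16 fixed-point number. `decode_fixed_16_16()` is a hypothetical helper and does not appear in LibPDF.

```cpp
#include <cassert>
#include <cstdint>

// Decodes the four bytes that follow a 255 operand byte as a signed 16.16
// fixed-point number.
float decode_fixed_16_16(std::uint8_t const bytes[4])
{
    // Assemble the big-endian bytes, then reinterpret as signed: reading the
    // value as a u32 loses the sign of negative adjustments.
    std::uint32_t raw = (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16)
        | (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
    auto value = static_cast<std::int32_t>(raw); // two's complement (guaranteed since C++20)
    // 16 fractional bits: divide by 2^16 = 0x10000, not 0xffff.
    return static_cast<float>(value) / 65536.0f;
}
```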
Nico Weber
96a4936567 LibPDF: Checking for built-in CFF encodings
Only prints a warning for them for now.

Also warn on the not-yet-implemented encoding supplement.
2023-10-16 08:32:18 +02:00
Nico Weber
414a164850 LibPDF: Be louder about unimplemented CFF dict entries 2023-10-16 08:32:18 +02:00
Nico Weber
c825194fb9 LibPDF: Reject CFFs with more than one font
The code assumes that there's just one Top DICT, so let's be loud
when that isn't the case.
2023-10-16 08:32:18 +02:00
Nico Weber
6f783929dd LibPDF: Implement support for CFF charset format 2
I haven't seen this being used in the wild (yet), but it's easy
to implement, and with this we support all charset formats.

So we can now mention if we see a format we don't know about.
2023-10-15 15:27:15 +02:00
Nico Weber
5b915fb15c LibPDF: Add more spec comments to parse_charset() 2023-10-15 15:27:15 +02:00
Nico Weber
49275c4b17 LibPDF: Don't overflow SIDs in type 1 charset parsing
first_sid has type SID (aka u16), so don't store it in a u8.

This fixes (among other things) page 24 of the PDF 1.7 spec.
2023-10-15 15:27:15 +02:00
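The CFF spec defines three charset formats. Format 0 is a plain array of Card16 SIDs. Format 1 has ranges of (Card16 first SID, Card8 nLeft). Format 2 has ranges of (Card16 first SID, Card16 nLeft). The sketch below is hypothetical, not LibPDF's parse_charset(). It reads the first SID at full 16-bit width, which is what the u8 fix above is about, and the test uses SID 300, a value that would not fit in 8 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using SID = std::uint16_t;

// Returns one SID per GID; GID 0 is always .notdef (SID 0) and is not stored
// in the charset data. `glyph_count` includes .notdef.
std::optional<std::vector<SID>> parse_charset(std::vector<std::uint8_t> const& data, std::size_t glyph_count)
{
    std::size_t pos = 0;
    auto read = [&](std::size_t width, std::uint32_t& out) { // big-endian CardN
        if (pos + width > data.size())
            return false;
        out = 0;
        for (std::size_t i = 0; i < width; ++i)
            out = (out << 8) | data[pos++];
        return true;
    };

    std::vector<SID> sids { 0 };
    std::uint32_t format = 0;
    if (!read(1, format))
        return std::nullopt;
    if (format > 2)
        return std::nullopt; // unknown format
    while (sids.size() < glyph_count) {
        std::uint32_t first_sid = 0, n_left = 0;
        // The first SID is 16 bits wide in every format.
        if (!read(2, first_sid))
            return std::nullopt;
        if (format == 0) {
            sids.push_back(static_cast<SID>(first_sid));
            continue;
        }
        // Format 1 stores nLeft as Card8, format 2 as Card16.
        if (!read(format == 1 ? 1 : 2, n_left))
            return std::nullopt;
        for (std::uint32_t i = 0; i <= n_left && sids.size() < glyph_count; ++i)
            sids.push_back(static_cast<SID>(first_sid + i));
    }
    return sids;
}
```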
Nico Weber
23d6e9f577 LibPDF: Implement CFF built-in charsets ISOAdobe, Expert, Expert Subset 2023-10-15 09:33:34 +02:00
Nico Weber
8060957d8d LibPDF: Use Appendix A instead of Appendix C for standard names
From "10 String INDEX":

"Further space saving is obtained by allocating commonly occurring
strings to predefined SIDs. These strings, known as the standard
strings, describe all the names used in the ISOAdobe and Expert
character sets along with a few other strings common to Type 1 fonts. A
complete list of standard strings is given in Appendix A. The client
program will contain an array of standard strings with nStdStrings
elements. Thus, the standard strings take SIDs in the range 0 to
(nStdStrings-1)."

And "13 Charsets" says that charsets store SIDs.

Fixes all

    "Couldn't find string for SID $n, going with space"

messages when going through the encoding pages (page 1010 and
thereabouts) in the PDF 1.7 spec.
2023-10-15 09:33:34 +02:00
Nico Weber
aba787a441 LibPDF: Implement reading of CFF String Index
Only really useful for reading SIDs in the Top DICT (copyright
text etc), which we currently don't do.

I haven't seen a difference from looking things up in the string
table. The only real effect from the commit that I need is that
it pulls a local resolve() lambda into a real function
resolve_sid(), which I want to call in a future commit.

But it makes things more spec-compliant, and if we ever want to
read SIDs in metadata in the future, now we can.
2023-10-15 09:33:34 +02:00
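A minimal sketch of SID resolution as the quoted spec text describes it. SIDs below nStdStrings (391 in the CFF spec) name standard strings. Larger SIDs index the font's String INDEX at `sid - 391`. The function below is hypothetical and only shares its name with the resolve_sid() mentioned above. Its standard-string table is truncated to four entries for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using SID = std::uint16_t;

// The CFF spec defines 391 standard strings (SIDs 0..390).
constexpr SID n_std_strings = 391;

// Only the first few standard strings are listed here; a real table has 391.
constexpr std::array<char const*, 4> standard_strings_prefix { ".notdef", "space", "exclam", "quotedbl" };

std::optional<std::string> resolve_sid(SID sid, std::vector<std::string> const& string_index)
{
    if (sid < n_std_strings) {
        if (sid < standard_strings_prefix.size())
            return std::string(standard_strings_prefix[sid]);
        return std::nullopt; // omitted from this truncated table
    }
    std::size_t index = sid - n_std_strings; // custom strings follow the standard ones
    if (index >= string_index.size())
        return std::nullopt;
    return string_index[index];
}
```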
Nico Weber
3c49d0dad3 LibPDF: Add a CFF_DEBUG toggle
I'd like to put some debug prints behind this soon.

No behavior change.
2023-10-15 07:14:29 +02:00
Nico Weber
2249e79630 LibPDF: Add two FIXMEs 2023-10-13 07:53:27 +02:00
Nico Weber
d451197d3d LibPDF: Add spec comments to CFF 2023-10-13 07:53:27 +02:00
Nico Weber
349996f7f2 LibPDF: Don't crash on files with float CFF defaultWidthX
We'd unconditionally get the int from a Variant<int, float> here,
but PDFs often have a float for defaultWidthX and nominalWidthX.

Fixes crash opening Bakke2010a.pdf from pdffiles (but while the
file loads ok, it looks completely busted).
2023-10-12 19:43:57 +02:00
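LibPDF uses AK's Variant. As a rough analogue with `std::variant`, `std::get<int>` on a variant holding a float throws `std::bad_variant_access`, which corresponds to the crash described above. Visiting both alternatives avoids it. `DictOperand` and `operand_as_float()` are hypothetical names.

```cpp
#include <cassert>
#include <variant>

// A DICT operand may be encoded either as an integer or as a real number.
using DictOperand = std::variant<int, float>;

// Reads a numeric operand regardless of which alternative it holds.
float operand_as_float(DictOperand const& operand)
{
    return std::visit([](auto value) { return static_cast<float>(value); }, operand);
}
```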
Andreas Kling
13db3c5ce0 LibGfx: Convert FontDatabase APIs to use FlyString 2023-09-06 11:29:03 -04:00
Nico Weber
934340d845 LibPDF: Add FIXME for CIDFontType2 creation
Move some code only needed for CIDFontType2 creation into a new
function and add a FIXME describing what needs to happen there.
2023-08-14 16:26:09 +02:00
Nico Weber
1c263eee61 LibPDF: Add spec comments and FIXMEs to Type0Font::draw_string() 2023-08-14 16:26:09 +02:00
Nico Weber
715b6f868f LibPDF: Sketch out Type0 font support some more
Type0 fonts can be either CFF-based or TrueType-based.
Create a subclass for each, put in some spec text, and
give each case a dedicated error code, so that `--debugging-stats`
can tell me which branch is more common.
2023-07-25 12:10:36 +02:00
Nico Weber
e3cc05b935 LibPDF: Don't ignore word_spacing 2023-07-22 12:24:29 -04:00
Nico Weber
9283c939bb LibPDF: Include width in Type1Font glyph cache key
LibGfx's ScaledFont doesn't do this, but in ScaledFont m_x_scale and
m_y_scale are immutable once the class is created, so it can get away
with not doing it.

In Type1Font, `width` changes in different calls to
Type1Font::draw_glyph(), so we need to make it part of the cache key.

Fixes rendering of the word "Version" on the first page of
pdf_reference_1-7.pdf.
2023-07-21 07:01:09 +02:00
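The following sketch shows why the width must be part of the key. If the key were only the glyph, a rendering made at one width would be reused at another. Everything here is hypothetical: `GlyphCache`, `GlyphCacheKey`, and `RenderedGlyph`, which is an int placeholder standing in for a bitmap. The hash combining uses the common Boost-style mixing formula.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

using RenderedGlyph = int; // hypothetical placeholder for a rasterized bitmap

// The key includes the width because the same glyph is drawn at different widths.
struct GlyphCacheKey {
    std::string glyph_name;
    float width;
    bool operator==(GlyphCacheKey const& other) const
    {
        return glyph_name == other.glyph_name && width == other.width;
    }
};

struct GlyphCacheKeyHash {
    std::size_t operator()(GlyphCacheKey const& key) const
    {
        std::size_t seed = std::hash<std::string> {}(key.glyph_name);
        // Boost-style hash_combine to mix in the width.
        seed ^= std::hash<float> {}(key.width) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};

class GlyphCache {
public:
    // Returns the cached rendering for (name, width), calling `render` only on a miss.
    template<typename Render>
    RenderedGlyph const& get(std::string const& name, float width, Render render)
    {
        GlyphCacheKey key { name, width };
        auto it = m_entries.find(key);
        if (it == m_entries.end())
            it = m_entries.emplace(key, render(name, width)).first;
        return it->second;
    }

private:
    std::unordered_map<GlyphCacheKey, RenderedGlyph, GlyphCacheKeyHash> m_entries;
};
```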
Matthew Olsson
5f8fd47214 LibPDF: Resize fonts when the text and line matrices change 2023-07-20 06:56:41 +01:00
Nico Weber
117a5f1bd2 LibPDF: Remove an unused variable 2023-07-12 19:02:56 +02:00
MacDue
e1cf868e6e LibGfx: Use AntiAliasingPainter::fill_path() for drawing font glyphs
Using the general AA painter fill_path() is indistinguishable from the
previous rasterizer, so this switch simply allows us to share more code.
2023-07-10 20:56:25 +02:00
Timothy Flynn
c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Nico Weber
f56b897622 Everywhere: Fix a few typos
Some even user-visible!
2023-04-12 19:37:35 +02:00
Julian Offenhäuser
bdd5f36121 LibPDF: Load replacements for TrueTypeFonts without an embedded font
This previously only happened for Type 1 fonts.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
5deac3a7f5 LibPDF: Actually return an error when failing to load replacement fonts 2023-03-25 16:27:30 -06:00
Julian Offenhäuser
fec7ccf020 LibPDF: Ask OpenType font programs for glyph widths if needed
If the font dictionary didn't specify custom glyph widths, we would fall
back to the specified "missing width" (or 0 in most cases!), which meant
that we would draw glyphs on top of each other in a lot of cases, namely
for TrueTypeFonts or standard Type1Fonts with an OpenType fallback.

What we actually want to do in this case is ask the OpenType font for
the correct width.
2023-03-25 16:27:30 -06:00
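Here is the lookup order this commit describes, as a sketch. Widths from the font dictionary come first, then the OpenType font program, and only then the missing width. `FontProgramWidths` and `glyph_width()` are hypothetical and are not LibPDF or LibGfx APIs.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Hypothetical interface for an embedded or replacement OpenType font.
struct FontProgramWidths {
    std::map<unsigned, float> advance_widths; // char code -> width
    std::optional<float> width_for(unsigned code) const
    {
        auto it = advance_widths.find(code);
        if (it == advance_widths.end())
            return std::nullopt;
        return it->second;
    }
};

float glyph_width(unsigned code, std::map<unsigned, float> const& dict_widths,
    FontProgramWidths const* font_program, float missing_width)
{
    // 1. Widths given in the font dictionary take priority.
    if (auto it = dict_widths.find(code); it != dict_widths.end())
        return it->second;
    // 2. Otherwise ask the font program instead of defaulting straight away.
    if (font_program) {
        if (auto width = font_program->width_for(code))
            return *width;
    }
    // 3. Only then fall back to the missing width (often 0).
    return missing_width;
}
```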
Julian Offenhäuser
2b3a41be74 LibPDF: Remove the subroutine length limit for PS1 font programs
A limit of 1024 subroutines seemed like a sensible choice, but some
fonts actually do exceed it. We will now only assert that the specified
number of subroutines is positive.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
3400779047 LibPDF: Pass the right point width to the font loader in TrueTypeFont 2023-03-22 09:04:00 +01:00
Rodrigo Tobar
4a20751ff6 LibPDF: Detect CFF encodings with supplements
These are not yet actually parsed, but detecting them means we at least
don't fail to understand the *actual* format value, which was causing
some CFF fonts to fail to load.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
9bca62c5fa LibPDF: Increase argument stack for Type1FontPrograms
Type1 imposes a stack limit of 24 elements, but Type2 has a limit of 48.
We are better off relaxing the limit of the former in favour of properly
supporting the latter.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
de5e7b487c LibPDF: Improve Type2 hint counting
There were two issues with how we counted hints with Type2 CharString
commands. First, we assumed a single hint per command, even though some
commands accept multiple hints because they take a variable number of
operands. Second, the hintmask/cntrmask commands can also take operands
(i.e., hints) themselves in certain situations.

This commit fixes both issues by correctly counting hints in both cases.
This in turn fixes cases with more than 8 hints in total, where a
hintmask/cntrmask command needed to read more than one byte past the
operator itself.
2023-03-02 12:18:53 +01:00
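A sketch of the counting rule. Each stem hint consumes two operands, so a stem command with 2n operands declares n hints. A hintmask or cntrmask is followed by one mask bit per hint, rounded up to whole bytes, which is why more than 8 hints needs a second byte. `HintCounter` is hypothetical. It assumes the caller has already removed an optional leading width operand.

```cpp
#include <cassert>
#include <cstddef>

// Tracks how many stem hints a Type 2 charstring has declared so far.
struct HintCounter {
    std::size_t hints = 0;

    // Call for hstem/vstem/hstemhm/vstemhm, and for the implicit vstem
    // operands that may precede hintmask/cntrmask. `stem_operands` excludes
    // any leading width operand; each hint is a (position, extent) pair.
    void add_stems(std::size_t stem_operands) { hints += stem_operands / 2; }

    // Bytes of mask data following a hintmask or cntrmask operator.
    std::size_t mask_bytes() const { return (hints + 7) / 8; }
};
```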
Rodrigo Tobar
cb04e4e9da LibPDF: Refactor *Font classes
The PDFFont class hierarchy was very simple (a top-level PDFFont class,
followed by all the children classes that derived directly from it).
While this design was good enough for some things, it didn't correctly
model the actual organization of font types:

 * PDF fonts are first divided between "simple" and "composite" fonts.
   The latter is the Type0 font, while the rest are all simple.
 * PDF fonts yield a glyph per "character code". Simple fonts' char codes
   are always 1 byte long, while Type0 char codes are of variable size.

To this effect, this commit changes the hierarchy of Font classes,
introducing a new SimpleFont class, deriving from PDFFont, and acting as
the parent of Type1Font and TrueTypeFont, while Type0 still derives from
PDFFont directly. This distinction allows us now to:

 * Model string rendering differently for simple and composite fonts:
   PDFFont now offers a generic draw_string method that takes a whole
   string to be rendered instead of a single char code. SimpleFont
   implements this as a loop over individual bytes of the string, with
   T1 and TT implementing draw_glyph for drawing a single char code.
 * Some common fields between T1 and TT fonts now live under SimpleFont
   instead of under PDFFont, where they previously resided.
 * Some other interfaces specific to SimpleFont have been cleaned up,
   with u16/u32 not appearing on these classes (or in PDFFont) anymore.
 * Type0Font's rendering still remains unimplemented.

As part of this exercise I also took the chance to perform the following
cleanups and restructurings:

 * Refactored the creation and initialisation of fonts. They are all
   centrally created at PDFFont::create, with a virtual "initialize"
   method that allows them to initialise their inner members in the
   correct order (parent first, child later) after creation.
 * Removed duplicated code.
 * Cleaned up some public interfaces: receive const refs, removed
   unnecessary ctors/dtors, etc.
 * Slightly changed how Type1 and TrueType fonts are implemented: if
   there's an embedded font that takes priority, otherwise we always
   look for a replacement.
 * This means we don't do anything special for the standard fonts. The
   only behavior previously associated with standard fonts was choosing an
   encoding, and even that was questionable.
2023-02-24 20:16:50 +01:00
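The new hierarchy can be pictured with a stripped-down sketch. It is hypothetical and much simpler than the real classes: TrueTypeFont and all rendering state are omitted, and `Type1Font` just records char codes. The class names and the draw_string/draw_glyph split follow the commit message. A simple font turns a string into one char code per byte, while a composite Type0 font is left unimplemented, as in the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

class PDFFont {
public:
    virtual ~PDFFont() = default;
    // Draws a whole string; how bytes map to char codes depends on the font kind.
    virtual void draw_string(std::string const& string) = 0;
};

class SimpleFont : public PDFFont {
public:
    // Simple fonts: every byte of the string is one char code.
    void draw_string(std::string const& string) override
    {
        for (unsigned char byte : string)
            draw_glyph(byte);
    }

protected:
    virtual void draw_glyph(std::uint8_t char_code) = 0;
};

class Type1Font : public SimpleFont {
public:
    std::vector<std::uint8_t> drawn; // records char codes for illustration only

protected:
    void draw_glyph(std::uint8_t char_code) override { drawn.push_back(char_code); }
};

class Type0Font : public PDFFont {
public:
    // Composite fonts use variable-length char codes; still unimplemented here.
    void draw_string(std::string const&) override { }
};
```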
Rodrigo Tobar
c4507bb56e LibPDF: Add more built-in SIDs
The first iteration has enough SIDs to display simple documents, but
when trying more and more documents we started to need more of these
SIDs to be properly defined. This is a copy/paste exercise from the CFF
document, which is tedious, so it will continue in small batches.

This commit fills all the gaps until SID 228, which covers all the
ISOAdobe space, and should be enough for most use cases. Since this is a
continuous space starting at 0, we now use an Array instead of a Map to
store these names, which should be more performant. Also to simplify
things I've moved the Array out of the CFF class, making it a simpler
static variable, which allows us to use template type deduction.
2023-02-13 00:23:17 +00:00
Julian Offenhäuser
a2b57dd188 LibPDF: Return an error if we fail to load a replacement font 2023-02-12 10:55:37 +00:00
Julian Offenhäuser
4f4bd3793f LibPDF: Fix glyph sizing bug that caused incorrect spacing
When loading OpenType fonts, either as a replacement for the standard
14 fonts or an embedded one, we previously passed the font size as the
_point_ size to the loader class. The difference is subtle:
Gfx::ScaledFont uses the optional dpi parameter to convert a point size
into pixels.

This meant that our glyphs were a factor of about 1.333 too large,
causing them to overlap in places.
2023-02-10 15:37:51 +01:00
Julian Offenhäuser
152a8c5c43 LibPDF: Use more appropriate standard 14 replacement fonts
The mapping of standard font to replacement now looks like this:

Times New Roman -> Liberation Serif
Courier -> Liberation Mono
Helvetica, Arial -> Liberation Sans
2023-02-10 15:37:51 +01:00
Rodrigo Tobar
e4a7606b81 LibPDF: Construct accented characters with Type1 seac command
The seac command provides the base and accented character that are
needed to create an accented character glyph. Storing these values is
all that was left to properly support these composed glyphs.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
3eaa27f53a LibPDF: Add infrastructure for accented character glyphs
Type1 accented character glyphs are composed of two other glyphs in the
same font: a base glyph and an accent glyph, given as char codes in the
standard encoding. These two glyphs are then composed together to form
the accented character.

This commit adds the data structures to hold the information for
accented characters, and also the routine that composes the final glyph
path out of the two individual components. All glyphs must have been
loaded by the time this composition takes place, and thus a new
protected consolidate_glyphs() routine has been added to perform this
calculation.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
11a9bfd4b6 LibPDF: Turn Glyph into a class
Glyph was a simple structure, but it has already become more complex than
it was initially. Turning it into a class hides some of that complexity,
and makes it easier to understand to external eyes.

While doing this I also decided to remove the float + bool combo for
keeping track of the glyph's width, and replaced it with an Optional
instead.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
c084943457 LibPDF: Index Type1 glyphs by name, not char code
Storing glyphs indexed by char code in a Type1 Font Program binds a Font
Program instance to the particular Encoding that was used at Font
Program construction time. This makes it difficult to reuse Font Program
instances against different Encodings, which would be otherwise
possible.

This commit changes how we store the glyphs on Type1 Font Programs.
Instead of storing them on a map indexed by char code, the map is now
indexed by glyph name. In turn, when rendering a glyph we use the
Encoding object to turn the char code into a glyph name, which in turn
is used to index into the map of glyphs.

This is the first step towards reusability of Type1 Font Programs. It
also unlocks the ability to render glyphs that are described via the
"seac" command (standard encoding accented character), which requires
accessing the base and accent glyphs by name.
2023-02-08 19:47:15 +01:00
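The two-step lookup this commit introduces can be sketched briefly: the Encoding maps a char code to a glyph name, and the name indexes the font program's glyph map. Because the glyphs are keyed by name, one font program can serve several encodings. `Type1FontProgramSketch`, `Encoding`, and `lookup_glyph()` are hypothetical stand-ins, not LibPDF's classes.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

struct Glyph {
    float width;
};

// A font program owns glyphs keyed by name, independent of any encoding.
struct Type1FontProgramSketch {
    std::map<std::string, Glyph> glyphs;
};

// An encoding maps char codes to glyph names.
using Encoding = std::map<unsigned char, std::string>;

std::optional<Glyph> lookup_glyph(Type1FontProgramSketch const& program, Encoding const& encoding, unsigned char code)
{
    auto name = encoding.find(code); // char code -> glyph name
    if (name == encoding.end())
        return std::nullopt;
    auto glyph = program.glyphs.find(name->second); // glyph name -> glyph
    if (glyph == program.glyphs.end())
        return std::nullopt;
    return glyph->second;
}
```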
Rodrigo Tobar
596119cf3e LibPDF: Add placeholders for *flex Type2 commands
These should be implemented properly in the future, but for now we are
adding them as placeholders to avoid crashes.
2023-02-08 19:47:15 +01:00