mirror of https://github.com/RGBCube/serenity synced 2025-07-26 05:07:35 +00:00
156 commits

Author SHA1 Message Date
Nico Weber
dde11e1757 LibPDF: Ignore unknown CFF operators
https://adobe-type-tools.github.io/font-tech-notes/pdfs/5177.Type2.pdf
says "The behavior of undefined operators is unspecified." but
https://learn.microsoft.com/en-us/typography/opentype/spec/cff2
says "When an unrecognized operator is encountered, it is ignored and
the stack is cleared."

Some type 0 CIDFontType0C fonts (i.e. CID-keyed non-OpenType CFF fonts)
depend on the latter, even though they're governed by the former spec.

Fixes rendering of text in 0000521.pdf (e.g. page 10 or 5). The font
there has a bunch of 0 opcodes for some reason.
2024-02-18 08:40:04 +00:00
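The behavior dde11e1757 adopts from the CFF2 text ("ignored and the stack is cleared") can be sketched as below. This is a minimal illustration, not LibPDF's interpreter: `Token`, `RunResult`, and `run_charstring()` are hypothetical names, and only two real Type 2 opcodes (`rmoveto` = 21, `endchar` = 14) are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified charstring token: either an operand
// (a number pushed on the stack) or an operator byte.
struct Token {
    bool is_operator;
    int32_t value; // operand value, or the opcode if is_operator is true
};

// Type 2 charstring opcodes modeled in this sketch.
constexpr int32_t RMoveTo = 21;
constexpr int32_t EndChar = 14;

struct RunResult {
    int handled_operators = 0;
    int ignored_operators = 0;
};

RunResult run_charstring(std::vector<Token> const& tokens)
{
    RunResult result;
    std::vector<int32_t> operand_stack;
    for (auto const& token : tokens) {
        if (!token.is_operator) {
            operand_stack.push_back(token.value);
            continue;
        }
        switch (token.value) {
        case RMoveTo:
            // A real interpreter would consume dx and dy here.
            ++result.handled_operators;
            operand_stack.clear();
            break;
        case EndChar:
            ++result.handled_operators;
            return result;
        default:
            // Unknown operator (e.g. opcode 0): skip it and drop its operands
            // instead of rejecting the whole font.
            ++result.ignored_operators;
            operand_stack.clear();
            assert(operand_stack.empty());
            break;
        }
    }
    return result;
}
```

Feeding it `10 20 rmoveto 5 <opcode 0> endchar` counts two handled operators and one ignored one; the stray `5` is discarded together with the unknown opcode.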
Nico Weber
05f382fc6e LibPDF: Add CIDFontType2::set_font_size()
See #20084 commit 4. This does the same for truetype-based type0 fonts.

Fixes font sizes on e.g. 1800-2017.pdf.
2024-02-17 16:08:48 +01:00
Nico Weber
f4a59246f5 LibPDF: Implement initial support for Type0 truetype fonts
Disclaimers, similar to what's on #23202 (and most of the
prerequisites mentioned there are needed for this too):

* Only supports the `Identity-H` type0 cmap at the moment
* Doesn't support vertical text yet
* Only supports the `Identity` CIDToGIDMap at the moment
  (this one is a truetype-only thing)
2024-02-17 16:08:48 +01:00
Nico Weber
bd74447dba LibPDF: Initial support for drawing CFF-based Type0 fonts
Together with the already-merged #23122, #23128, #23135, #23136, #23162,
and #23167, #23179, #23190, #23194 this adds initial support for
rendering some CFF-based Type0 fonts :^)

There's a long list of things that still need improving after this:

* A small number of CFF programs contain the charstring command 0,
  which is invalid. Currently, this makes us reject the whole font.

* Type1FontProgram::rasterize_glyph() is name-based. For CID-based
  fonts, we want a version that takes CIDs (character IDs) instead.
  For now, I'm printing the CID to a string and using that, yuck.
  (I looked into doing this nicely. I do want to do that, but I
  need to read up on how the `seac` type1 charstring command uses
  character names to identify parts of an accented character.
  Also, it looks like `seac`'s accented character handling moved
  over to `endchar` in type2 charstring commands (i.e. in CFF data),
  and it looks like we don't implement that at all. So I need to do
  more reading first, and I didn't want to block this on that.)

* The name for the first string in name-based CFF fonts looks wrong;
  added a FIXME for that for now.

* This supports the named Identity-H cmap only for now. Identity-H
  maps UTF16-BE values to glyph IDs with the identity function, and
  assumes it's horizontal text. Other named cmaps in my test files are
  UniJIS-UCS2-H, UniCNS-UCS2-H, Identity-V, UniGB-UCS2-H, UniKS-UCS2-H.
  (There are also 2 files using the stream-based cmaps instead of the
  name-based ones.)

  * In particular, we can't draw vertical text (`-V`) yet

* Passing in the encoding to CFF::create() is awkward (it's nullptr
  for CID-keyed fonts), and it's also not necessary since
  `Type1Font::draw_glyph()` already does the "take encoding from PDF,
  and only from font if the PDF doesn't store one" dance.

* This doesn't cache glyphs but re-rasterizes them each time. Easy
  to add, but maybe I want to look at rotation first. And things
  don't feel glacial as-is.

* Type0Font::draw_glyph() is pretty similar to the second half of
  Type1Font::draw_glyph()
2024-02-16 12:41:10 -05:00
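The Identity-H mapping mentioned in bd74447dba (two-byte big-endian codes passed through unchanged) can be sketched as follows. `identity_h_decode()` is a hypothetical helper written for this illustration, not a LibPDF function, and dropping a trailing odd byte is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Decodes the bytes of a string shown with an Identity-H Type0 font:
// every code is two bytes, big-endian, and is used as-is.
std::vector<uint16_t> identity_h_decode(std::string_view bytes)
{
    std::vector<uint16_t> codes;
    codes.reserve(bytes.size() / 2);
    // Assumption of this sketch: a trailing odd byte is silently dropped.
    for (size_t i = 0; i + 1 < bytes.size(); i += 2) {
        auto high = static_cast<uint8_t>(bytes[i]);
        auto low = static_cast<uint8_t>(bytes[i + 1]);
        codes.push_back(static_cast<uint16_t>((high << 8) | low));
    }
    assert(codes.size() == bytes.size() / 2);
    return codes;
}
```

The other named cmaps listed in the commit would need a real lookup table in place of the pass-through.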
Nico Weber
c9d48bbca4 LibPDF/CFF: Add a comment to CFF::parse_charset() 2024-02-16 12:41:10 -05:00
Nico Weber
5c8778a161 LibPDF/CFF: Compute per-glyph glyph width in CID-keyed fonts
Make TopDict's defaultWidthX and nominalWidthX Optional<>s so that
we can check if they're set per fdselect-selected font dict, and
if so use the value from there in CID-keyed fonts. Otherwise, keep
using the value in the top dict.
2024-02-16 12:41:10 -05:00
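The fallback described in 5c8778a161 can be sketched like this. `FontDict`, `CidKeyedFont`, and `glyph_width()` are hypothetical stand-ins rather than LibPDF's types; the rule that a charstring width is a delta from nominalWidthX, with defaultWidthX used when the width is omitted, comes from the Type 2 charstring spec, and 0 is the spec default for both values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the parsed CFF structures.
struct FontDict {
    std::optional<float> default_width_x;
    std::optional<float> nominal_width_x;
};

struct CidKeyedFont {
    FontDict top_dict;              // values stored at the top level
    std::vector<FontDict> fd_array; // font dicts read from FDArray
    std::vector<uint8_t> fd_select; // glyph ID -> index into fd_array
};

float glyph_width(CidKeyedFont const& font, size_t glyph_id, std::optional<float> charstring_width_delta)
{
    assert(glyph_id < font.fd_select.size());
    FontDict const& fd = font.fd_array.at(font.fd_select[glyph_id]);

    // Prefer the value from the fdselect-selected font dict, then the top dict,
    // then the spec default of 0.
    float default_width = fd.default_width_x.value_or(font.top_dict.default_width_x.value_or(0.0f));
    float nominal_width = fd.nominal_width_x.value_or(font.top_dict.nominal_width_x.value_or(0.0f));

    // A width in the charstring is relative to nominalWidthX;
    // without one, the glyph uses defaultWidthX.
    if (charstring_width_delta.has_value())
        return nominal_width + *charstring_width_delta;
    return default_width;
}
```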
Nico Weber
1d1e406b3a LibPDF/CFF: Implement some special handling for CID-keyed fonts
* FDArray, FDSelect must be present
* Encoding must not be present
* Charset maps from GID (Glyph ID) to CID (Character ID),
  instead of to character name
2024-02-15 12:32:31 +01:00
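Since the charset of a CID-keyed font maps GID to CID (1d1e406b3a), code that starts from a CID would presumably want the reverse lookup. A minimal sketch, assuming the charset has already been expanded to one CID per glyph; `build_cid_to_gid()` is a hypothetical helper, not part of LibPDF. Per the CFF spec, GID 0 is not stored in the charset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// charset_cids[i] is the CID of glyph i + 1. Glyph 0 is implicit and
// corresponds to CID 0 in CID-keyed fonts.
std::unordered_map<uint16_t, uint16_t> build_cid_to_gid(std::vector<uint16_t> const& charset_cids)
{
    // Glyph IDs must fit in 16 bits.
    assert(charset_cids.size() < 65535);

    std::unordered_map<uint16_t, uint16_t> cid_to_gid;
    cid_to_gid.emplace(0, 0);
    for (size_t i = 0; i < charset_cids.size(); ++i) {
        // emplace() keeps the first mapping if a CID appears twice.
        cid_to_gid.emplace(charset_cids[i], static_cast<uint16_t>(i + 1));
    }
    return cid_to_gid;
}
```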
Nico Weber
7494f24430 LibPDF/CFF: Store if a font program is CID-keyed
...and reject CID-keyed font programs for Type1 fonts.
2024-02-15 12:32:31 +01:00
Nico Weber
bb7d29d007 LibPDF/CFF: Read font dicts pointed to by the fdarray offset
The fdselect array (that we already read) maps each glyph ID
to an fdarray index. The font dict at that index then stores
information for that glyph.

In practice, this is used to assign different defaultWidthX /
nominalWidthX values to blocks of glyphs in CID-keyed fonts.

We don't do anything yet with the data, and we also don't send
data of CID-keyed CFFs into this parser either, so no behavior
change.
2024-02-15 12:32:31 +01:00
Nico Weber
524a4f6256 LibPDF/CFF: Make parse_top_dict() return all top dicts
This happens for CFFs that contain multiple fonts. This doesn't
happen in practice, but the same code will be used for fdarray
parsing, which will contain several dicts.

No behavior change.
2024-02-15 12:32:31 +01:00
Nico Weber
9f1cf8babc LibPDF/CFF: Extract parse_top_dict() function
Pure code move, no behavior change.
2024-02-15 12:32:31 +01:00
Nico Weber
eb4632e08a LibPDF: Give CFF built-in encoding and charset arrays an underlying type
These arrays store SIDs ("String IDs"), so give them that type now
that we have to_array() and it's easy to do.

No behavior change.
2024-02-14 06:56:43 +01:00
Nico Weber
ddbcd901d1 LibPDF: Separate Type0 CMap errors
No behavior change, just more granular "not implemented" diagnostics.
2024-02-13 19:46:31 +01:00
Nico Weber
8e50bbc9fb LibPDF: Add string drawing code for Type0Fonts
This is very similar to SimpleFont::draw_string() for now, but
it'll become a bit different when we add support for vertical
text.

CIDFontType now only needs to draw single glyphs. Neither of the
subclasses can do that yet, so no behavior change yet.
2024-02-13 19:46:18 +01:00
Nico Weber
751185cb76 LibPDF: Scale default glyph width by font size and x scale
This fixes rendering of commas in 0000941.pdf page 1. The commas
use the default width, and without this they show up very large,
covering the page.

Also, it's nice that the code now looks like the regular case 4 lines
further up.
2024-02-12 14:32:04 +00:00
Nico Weber
7ab4e53b99 LibPDF/CFF: Add code for fdselect parsing
This is one of the two top dict entries we need for CID-keyed fonts.
We don't send any CID-keyed font data into the CFF parser yet,
so no behavior change.
2024-02-12 14:05:16 +01:00
Nico Weber
6ebddab448 LibPDF/CFF: Add enum values for CID-keyed font top dict entries
No behavior change.
2024-02-12 14:05:16 +01:00
Nico Weber
6df0150671 LibPDF: Add some CIDFontType0C scaffolding
No real behavior change. We don't actually load the CFF data yet
(blocked on #23136 and some more), and we don't have drawing code
yet, and Type0Font::draw_string() doesn't do any drawing yet.

But it's a step in the right direction.
2024-02-12 13:59:00 +01:00
Nico Weber
8e7cb11856 LibPDF/CFF: Add enum values for remaining PrivDictOperators
No behavior change, except that we now dbgln() if we see a
PrivDictOperator we don't know about. (I haven't seen this in
practice, but I found this useful while debugging things.)
2024-02-11 14:52:54 +01:00
Nico Weber
a91fecb17e Revert "LibPDF: Don't over-read in charset formats 1 and 2"
This reverts commit 52afa936c4.

No longer necessary after #23122 -- turns out things work
better when you do them right.

No behavior change.
2024-02-09 16:52:01 +00:00
Nico Weber
9bccb8c8d7 LibPDF: Make CFF::parse_charset() return SIDs
...and do string expansion at the call site.

CID-keyed fonts treat the charset as CIDs instead of as SIDs,
so having access to the SIDs in numeric form will be useful
when we implement support for CID-keyed CFF fonts.

No behavior change.
2024-02-09 13:57:23 +01:00
Nico Weber
9750261921 LibPDF: Rename charset to charset_names in CFF parser
No behavior change.
2024-02-09 13:57:23 +01:00
Nico Weber
32f601f9a4 LibPDF: Fix small bug from #21452
I implemented CFF charset format 2 in 6f783929dd with the note
"I haven't seen this being used in the wild". Now that I have
seen it (0000658.pdf), I can say that this has never worked,
despite me claiming "it's easy to implement".

But now it works!
2024-02-08 13:48:56 +00:00
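32f601f9a4 does not say what the bug was. For orientation only, this is how charset formats 1 and 2 differ according to the CFF spec: both store ranges of `first` SID plus `nLeft`, but `nLeft` is one byte in format 1 and two bytes in format 2. `expand_charset_ranges()` is a hypothetical function for this sketch and says nothing about how LibPDF's parser is structured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands the range data of a CFF charset in format 1 or 2 into one SID per glyph.
//   Format 1 range: first (2 bytes, big-endian), nLeft (1 byte)
//   Format 2 range: first (2 bytes, big-endian), nLeft (2 bytes, big-endian)
std::vector<uint16_t> expand_charset_ranges(std::vector<uint8_t> const& data, int format, size_t glyph_count)
{
    assert(format == 1 || format == 2);
    size_t const range_size = (format == 1) ? 3 : 4;

    std::vector<uint16_t> sids;
    size_t offset = 0;
    // Glyph 0 is implicit, so only glyph_count - 1 entries are stored.
    while (sids.size() + 1 < glyph_count && offset + range_size <= data.size()) {
        auto first = static_cast<uint16_t>((data[offset] << 8) | data[offset + 1]);
        size_t n_left = data[offset + 2];
        if (format == 2)
            n_left = (n_left << 8) | data[offset + 3];
        offset += range_size;

        // A range covers `first` and the n_left SIDs that follow it.
        for (size_t i = 0; i <= n_left && sids.size() + 1 < glyph_count; ++i)
            sids.push_back(static_cast<uint16_t>(first + i));
    }
    return sids;
}
```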
Nico Weber
384c6cf0f9 LibPDF: Tweak vertical position of truetype fonts again
See #22821 for a previous attempt. This attempt should settle
things once and for all.

The opentype render path adjusts by `-font_ascender * -y_scale` in
Glyf::Glyph::append_simple_path(), so that's what we need to undo
to draw at the font's baseline.

(OpenType::Font::metrics() returns ascender scaled by y_scale already,
so no need to have the scale here where we undo the shift.)

Previously, we called `baseline()` which just returns the font's
font size, which is pretty meaningless:

https://tonsky.me/blog/font-size/
https://simoncozens.github.io/fonts-and-layout/opentype.html#vertical-metrics-hhea-and-os2

Also, conceptually it makes sense to translate up by the ascender
to get from the upper edge of the glyph to the baseline.
2024-02-01 10:05:40 +01:00
Nico Weber
d2f3288666 LibPDF: Apply text matrix to each glyph's position
We still don't apply it to the glyph itself, so they don't show up
scaled or rotated, but they're at the right spot now.

One big thing this has going for it is that the final glyph
position is now calculated with just
`ext_rendering_matrix.map(glyph_position)`.

Also, character_spacing and word_spacing are now used unmodified
in the SimpleFont::draw_string() loop. This also means we no longer
have to undo a scale when updating the position in
`Renderer::show_text()`.

Most of the rest stays pretty yucky though. The root cause of many
problems is that ScaledFont has its rendering size baked into the
object. We want to render fonts at size font_size times scale from
text matrix times scale from current transformation matrix (but
not size from horizontal_scaling). So we have to make that the
font_size, but then we have to undo that in a bunch of places to
get the actual font size.

This will eventually get better when LibPDF moves off ScaledFont.
2024-01-18 14:01:30 +01:00
Nico Weber
f54b0e7c22 LibPDF: Don't accidentally put horizontal_scaling in places
Fonts should have size font_size times total scaling. We tried to
get that by computing text_rendering_matrix.x_scale() * font_size,
but text_rendering_matrix.x_scale() also includes
horizontal_scaling, which shouldn't be part of font size.

Same for character_spacing and word_spacing.

This is all a big mess that's caused by LibPDF using ScaledFont,
which requires scaling to be part of the text type. I have an
in-progress local branch that moves LibPDF to directly use VectorFont,
which will hopefully make this (and other things) nicer. But first,
let's get this right, and then make sure we don't regress it when
things change :^)
2024-01-18 14:01:30 +01:00
Nico Weber
13f007aadb LibPDF: Tweak vertical position of truetype fonts
The vertical coordinates for truetype fonts are different somehow.
We compensated a bit for that; now we compensate some more.

This is still not 100% perfect, but much better than before.
2024-01-17 08:44:07 +00:00
Shannon Booth
e2e7c4d574 Everywhere: Use to_number<T> instead of to_{int,uint,float,double}
In a bunch of cases, this actually ends up simplifying the code, as
to_number handles something like the following for us:

```
Optional<I> opt;
if constexpr (IsSigned<I>)
    opt = view.to_int<I>();
else
    opt = view.to_uint<I>();
```

The main goal here however is to have a single generic number conversion
API between all of the String classes.
2023-12-23 20:41:07 +01:00
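The idea behind e2e7c4d574, one generic conversion entry point instead of per-type functions, can be illustrated with a standard-library analogue. This is not AK's implementation: the `to_number` below is a sketch built on `std::from_chars`, it only handles integer types, and rejecting trailing characters is an assumption of the sketch.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>
#include <type_traits>

// Standard-library sketch of a single generic number conversion function.
template<typename T>
std::optional<T> to_number(std::string_view view)
{
    static_assert(std::is_integral_v<T>, "this sketch only handles integer types");

    T value {};
    char const* begin = view.data();
    char const* end = begin + view.size();
    auto [ptr, ec] = std::from_chars(begin, end, value);

    // Fail on parse errors, on out-of-range values, and on trailing characters.
    if (ec != std::errc {} || ptr != end)
        return std::nullopt;
    return value;
}
```

Signedness is picked up from `T`, so callers no longer need the `if constexpr` branch: `to_number<int>("-42")` yields a value, while `to_number<unsigned>("-42")` yields an empty optional.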
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Nico Weber
f2f07c3a80 LibPDF: Replace if (a) VERIFY(0) with VERIFY(!a)
No behavior change.
2023-12-16 12:39:56 +01:00
Nico Weber
ee74bc2538 LibPDF: Tolerate 0-sized Subrs in PS1 font subprograms
This regressed in 2b3a41be74 in #18031.

Fixes a crash rendering page 2 and onward of
https://pyx-project.org/presentation_dantemv35_en.pdf
2023-12-16 12:39:56 +01:00
Kyle Pereira
082a4197b6 LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces
This is in anticipation of Pattern color space support which does not
yield a simple color.
2023-12-10 16:44:24 +01:00
Nico Weber
29396415d5 LibPDF: Add an initial implementation of type 3 glyph rendering
This is a very inefficient implementation: Every time a type 3 font
glyph is drawn, we parse its operator stream and execute all the
operators therein.

We'll want to instead cache the glyphs in bitmaps (at least in most
cases), like we do for other fonts. But it's a good first step, and
all the coordinate math seems to work in the files I've tested.

Good test files from pdfa dataset 0000.zip:

- 0000559.pdf page 1 (and 2): Has a non-default font matrix;
  text appears mirrored if the font matrix isn't handled correctly

- 0000425.pdf, page 1: Draws several glyphs in a single run;
  glyphs overlap if Renderer::render_type3_glyph() ignores the
  passed-in point

- 0000211.pdf, any page: Uses type 3 glyphs for all text.
  Good perf test (already "reasonably fast")

- 0000521.pdf, page 5 (or 7 or 16): The little red flag in the
  purple box is a type 3 font glyph, and it's colored (which in part
  means the first operator is `d0`, while all the other documents above
  use `d1`)
2023-11-17 19:47:53 +00:00
Nico Weber
14ddab5519 LibPDF: Stub out type3_font_set_glyph_width*
Type 3 font glyphs begin with either `d0` or `d1`. If we bail out
with an "unsupported" error on the very first operator in a glyph,
we'll never paint the glyph.

Just stub these out for now. We probably want to do more in here in
the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).
2023-11-17 19:47:53 +00:00
Nico Weber
126a0be595 LibPDF: Pass Renderer to SimpleFont::draw_glyph()
This makes it available in Type3Font::draw_glyph().

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
bcc6439b5f LibPDF: Pass Renderer to PDFFont::draw_string()
It's a bit unfortunate that fonts need to know about the renderer,
but type 3 fonts contain PDF drawing operators, so it's necessary.

On the bright side, it makes it possible to pass fewer parameters
around and compute things locally as needed.

(As we implement more fonts, we'll probably want to create some
functions to do these computations in a central place, eventually.)

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
e0c0864ddf LibPDF: Load a few values off a type 3 font dictionary 2023-11-17 19:47:53 +00:00
Nico Weber
9632d8ee49 LibPDF: Make SimpleFont font matrix configurable
Type 3 fonts can set it to a custom value.
2023-11-17 19:47:53 +00:00
Nico Weber
4cd1a2d319 LibPDF: Add some scaffolding for type 3 fonts 2023-11-17 19:47:53 +00:00
Nico Weber
7f999b1ff5 LibPDF: Sink m_base_font_name from PDFFont into subclasses
/BaseFont is a required key for type 0, type 1, and truetype
font dictionaries, but not for type 3 font dictionaries.

This is mechanical; type 0 fonts don't even use this yet
(but probably should).

PDFFont::initialize() is now empty and could be removed,
but maybe we'll put stuff there again later, so I'm leaving
it around for a bit longer.
2023-11-17 19:47:53 +00:00
Nico Weber
6c1da5db54 LibPDF: Make SimpleFont::draw_glyph() fallible 2023-11-17 19:47:53 +00:00
Nico Weber
843e9daa8c LibPDF: Remove unused PDFFont::type()
This got added in #15270, but its one use then got removed again
in #16150.

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
26fd29baf8 LibPDF: Give Type3 fonts a dedicated error message
They're described in "5.5.4 Type 3 Fonts" in the PDF 1.7 spec, so we
shouldn't `internal_error()` on them. They're just not implemented yet.
2023-11-17 19:47:53 +00:00
Nico Weber
1c2b0feb7b LibPDF: Change how CFF optional width prefix is stored
Per 5177.Type2.pdf 3.1 "Type 2 Charstring Organization",
a glyph's charstring looks like:

    w? {hs* vs* cm* hm* mt subpath}? {mt subpath}* endchar

The `w?` is the width of the glyph, but it's optional. So all
possible commands after it (hstem* vstem* cntrmask hintmask
moveto endchar) check if there's an extra number at the start
and interpret it as a width, for the very first command we read.

This was done by having an `is_first_command` local bool that
got set to false after the first command. That didn't work with
subrs: If the first command was a call to a subr that just pushed
a bunch of numbers, then the second command after it is the actual
first command.

Instead, move that bool into the state. Set it to false the
first time we try to read a width, since that means we just read
a command that could've been prefixed by a width.
2023-11-14 10:10:34 +01:00
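The change in 1c2b0feb7b can be sketched as follows: the "first command" flag lives in the interpreter state rather than in a local variable, so a subr that only pushes numbers leaves it untouched. `GlyphState`, `maybe_read_width()`, `call_subr_pushing()`, and `rmoveto()` are hypothetical names for this sketch, and only an operator with a fixed operand count is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct GlyphState {
    std::vector<float> stack;
    std::optional<float> width;
    // Part of the state, so it survives calls into subrs.
    bool is_first_command = true;
};

// Called by every operator that may be preceded by the optional width.
// expected_args is the number of operands the operator itself takes.
void maybe_read_width(GlyphState& state, size_t expected_args)
{
    if (!state.is_first_command)
        return;
    // We just saw a command that could have carried a width, so no later
    // command may interpret a leading number as one.
    state.is_first_command = false;
    if (state.stack.size() > expected_args) {
        state.width = state.stack.front();
        state.stack.erase(state.stack.begin());
    }
}

// Stand-in for a subr that only pushes operands and runs no width-taking operator.
void call_subr_pushing(GlyphState& state, std::vector<float> const& numbers)
{
    state.stack.insert(state.stack.end(), numbers.begin(), numbers.end());
}

void rmoveto(GlyphState& state)
{
    maybe_read_width(state, 2);
    assert(state.stack.size() >= 2);
    // A real interpreter would move the current point by (dx, dy) here.
    state.stack.clear();
}
```

With a local flag that is cleared after the first command, the subr call would have consumed the "first command" slot and the `rmoveto` after it would have missed the width.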
Tim Schumacher
a2f60911fe AK: Rename GenericTraits to DefaultTraits
This feels like a more fitting name for something that provides the
default values for Traits.
2023-11-09 10:05:51 -05:00
Nico Weber
d24289eef4 LibPDF: Always log unhandled type 1 and type 2 font program opcodes
This would've made it easy to see that we were missing flex opcodes for
https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf
2023-11-01 11:40:16 -04:00
Nico Weber
e1a743f286 LibPDF: Implement type 2 flex, hflex, hflex1, flex1 operators
This is the type 2 equivalent to type1 othersubr, from what I can tell.

See "4.1 Path Construction Operators" in 5177.Type2.pdf,
"The Type 2 Charstring Format".

Makes text show up alright on
https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf
2023-11-01 11:40:16 -04:00
Nico Weber
3e707efdfa LibPDF: Move type1 subr 0 handling into othersubr handler
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf,
8.4 First Four Subrs Entries:

"""If Flex or hint replacement is used in a Type 1 font program, the
first four entries in the Subrs array in the Private dictionary must be
assigned charstrings that correspond to the following code sequences. If
neither Flex nor hint replacement is used in the font program, then this
requirement is removed, and the first Subrs entry may be a normal
charstring subroutine sequence. The first four Subrs entries contain:

Subrs entry number 0:
3 0 callothersubr pop pop setcurrentpoint return
"""

othersubr handler 0 gets three arguments:
* The flex height (the distance after which the bezier splines
  are replaced with just straight lines)
* The current position after the flex

It pushes that position on the postscript stack, where predefined subr
handler number 0 then pops it from. It then passes it to
setcurrentpoint.

In theory, we now correctly do that setcurrentpoint call, which we
previously weren't.

In practice, that setcurrentpoint call always receives the last point of
the flex -- and our path api apparently gets confused when move_to() is
called on it when the current point is already at that same location.

So tweak the SetCurrentPoint handler to not set the current point on
the path if it's already the path's current point, with a FIXME to
figure out what exactly is happening in Gfx::Path.

No big behavior change if flex is used, but this is more correct if it
isn't.

(This only works because our `return` handler is empty, else we would
have to make the callothersubr handler start a call frame.)
2023-11-01 11:38:41 -04:00
Nico Weber
0bb8249780 LibPDF: Move type1 subr 1 and 2 handling into othersubr handler
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf,
8.4 First Four Subrs Entries:

"""If Flex or hint replacement is used in a Type 1 font program, the
first four entries in the Subrs array in the Private dictionary must be
assigned charstrings that correspond to the following code sequences. If
neither Flex nor hint replacement is used in the font program, then this
requirement is removed, and the first Subrs entry may be a normal
charstring subroutine sequence. The first four Subrs entries contain:

[...]

Subrs entry number 1:
0 1 callothersubr return

Subrs entry number 2:
0 2 callothersubr return
"""

So subr entry numbers 1 and 2 just call othersubr 1 and 2, which
means we can just move the handling code over.

No behavior change if flex is used, but more correct if it isn't.

(This only works because our `return` handler is empty, else we would
have to make the callothersubr handler start a call frame.)
2023-11-01 11:38:41 -04:00
Nico Weber
4cc24548f6 LibPDF: Call dbgln() for unimplemented flex opcodes 2023-10-28 13:28:05 -04:00