This is very similar to SimpleFont::draw_string() for now, but
it'll become a bit different when we add support for vertical
text.
CIDFontType now only needs to draw single glyphs. Neither of the
subclasses can do that yet, so no behavior change yet.
This fixes rendering of commas in 0000941.pdf page 1. The commas
use the default width, and without this they show up very large,
covering the page.
Also, it's nice that the code now looks like the regular case 4 lines
further up.
This is one of the two top dict entries we need for CID-keyed fonts.
We don't send any CID-keyed font data into the CFF parser yet,
so no behavior change.
No real behavior change. We don't actually load the CFF data yet
(blocked on #23136 and some more), and we don't have drawing code
yet, and Type0Font::draw_string() doesn't do any drawing yet.
But it's a step in the right direction.
No behavior change, except that we now dbgln() if we see a
PrivDictOperator we don't know about. (I haven't seen this in
practice, but I found this useful while debugging things.)
...and do string expansion at the call site.
CID-keyed fonts treat the charset as CIDs instead of as SIDs,
so having access to the SIDs in numeric form will be useful
when we implement support for CID-keyed CFF fonts.
No behavior change.
I implemented CFF charset format 2 in 6f783929dd with the note
"I haven't seen this being used in the wild". Now that I have
seen it (0000658.pdf), I can say that this has never worked,
despite me claiming "it's easy to implement".
But now it works!
See #22821 for a previous attempt. This attempt should settle
things once and for all.
The opentype render path adjusts by `-font_ascender * -y_scale` in
Glyf::Glyph::append_simple_path(), so that's what we need to undo
to draw at the font's baseline.
(OpenType::Font::metrics() returns ascender scaled by y_scale already,
so no need to have the scale here where we undo the shift.)
Previously, we called `baseline()` which just returns the font's
font size, which is pretty meaningless:
https://tonsky.me/blog/font-size/
https://simoncozens.github.io/fonts-and-layout/opentype.html#vertical-metrics-hhea-and-os2
Also, conceptually it makes sense to translate up by the ascender
to get from the upper edge of the glyph to the baseline.
We still don't apply it to the glyph itself, so they don't show up
scaled or rotated, but they're at the right spot now.
One big thing this has going for it is that the final glyph
position is now calculated with just
`ext_rendering_matrix.map(glyph_position)`.
Also, character_spacing and word_spacing are now used unmodified
in the SimpleFont::draw_string() loop. This also means we no longer
have to undo a scale when updating the position in
`Renderer::show_text()`.
Most of the rest stays pretty yucky though. The root cause of many
problems is that ScaledFont has its rendering size baked into the
object. We want to render fonts at size font_size times scale from
text matrix times scale from current transformation matrix (but
not size from horizontal_scaling). So we have to make that the
font_size, but then we have to undo that in a bunch of places to
get the actual font size.
This will eventually get better when LibPDF moves off ScaledFont.
Fonts should have size font_size times total scaling. We tried to
get that by computing text_rendering_matrix.x_scale() * font_size,
but text_rendering_matrix.x_scale() also includes
horizontal_scaling, which shouldn't be part of font size.
Same for character_spacing and word_spacing.
This is all a big mess that's caused by LibPDF using ScaledFont,
which requires scaling to be part of the text type. I have an
in-progress local branch that moves LibPDF to directly use VectorFont,
which will hopefully make this (and other things) nicer. But first,
let's get this right, and then make sure we don't regress it when
things change :^)
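As a rough illustration of the intended split (not the actual LibPDF
code; the struct and function names below are made up for this sketch,
and the matrices are reduced to plain scale factors):

```cpp
#include <cassert>

// Hypothetical, heavily simplified text state: each matrix is reduced to a
// single uniform scale factor for the sake of the example.
struct TextStateSketch {
    float font_size;
    float horizontal_scaling; // 1.0 means 100%
    float text_matrix_scale;
    float ctm_scale;
};

// The size the font should be rendered at: font_size times total scaling,
// deliberately WITHOUT horizontal_scaling.
float rendered_font_size(TextStateSketch const& state)
{
    return state.font_size * state.text_matrix_scale * state.ctm_scale;
}

// horizontal_scaling is applied separately, only to horizontal extents.
float horizontal_extent_scale(TextStateSketch const& state)
{
    return rendered_font_size(state) * state.horizontal_scaling;
}
```

With font_size 12, horizontal_scaling 0.5, and scales 2 and 1.5, the
font size in this sketch is 36, while horizontal extents scale by 18.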
The vertical coordinates for truetype fonts are different somehow.
We compensated a bit for that; now we compensate some more.
This is still not 100% perfect, but much better than before.
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:
```
Optional<I> opt;
if constexpr (IsSigned<I>)
    opt = view.to_int<I>();
else
    opt = view.to_uint<I>();
```
for us.
The main goal here however is to have a single generic number conversion
API between all of the String classes.
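To illustrate the idea outside of AK, here is a minimal sketch of a
single generic conversion function built on the standard library's
std::from_chars. The name to_number_sketch is hypothetical, and this is
not how AK implements to_number:

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for a generic to_number<T>(); one template covers
// both signed and unsigned integer types.
template<typename T>
std::optional<T> to_number_sketch(std::string_view view)
{
    T value {};
    auto const* begin = view.data();
    auto const* end = begin + view.size();
    auto [ptr, ec] = std::from_chars(begin, end, value);
    // Reject empty input, out-of-range values, and trailing garbage.
    if (ec != std::errc {} || ptr != end)
        return std::nullopt;
    return value;
}
```

Callers pick the type at the call site, e.g. to_number_sketch<int>(view),
instead of branching on signedness themselves.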
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
This is a very inefficient implementation: Every time a type 3 font
glyph is drawn, we parse its operator stream and execute all the
operators therein.
We'll want to instead cache the glyphs in bitmaps (at least in most
cases), like we do for other fonts. But it's a good first step, and
all the coordinate math seems to work in the files I've tested.
Good test files from pdfa dataset 0000.zip:
- 0000559.pdf page 1 (and 2): Has a non-default font matrix;
text appears mirrored if the font matrix isn't handled correctly
- 0000425.pdf, page 1: Draws several glyphs in a single run;
glyphs overlap if Renderer::render_type3_glyph() ignores the
passed-in point
- 0000211.pdf, any page: Uses type 3 glyphs for all text.
Good perf test (already "reasonably fast")
- 0000521.pdf, page 5 (or 7 or 16): The little red flag in the
purple box is a type 3 font glyph, and it's colored (which in part
means the first operator is `d0`, while all the other documents above
use `d1`)
Type 3 font glyphs begin with either `d0` or `d1`. If we bail out
with an "unsupported" error on the very first operator in a glyph,
we'll never paint the glyph.
Just stub these out for now. We probably want to do more in here in
the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).
It's a bit unfortunate that fonts need to know about the renderer,
but type 3 fonts contain PDF drawing operators, so it's necessary.
On the bright side, it makes it possible to pass fewer parameters
around and compute things locally as needed.
(As we implement more fonts, we'll probably want to create some
functions to do these computations in a central place, eventually.)
No behavior change.
/BaseFont is a required key for type 0, type 1, and truetype
font dictionaries, but not for type 3 font dictionaries.
This is mechanical; type 0 fonts don't even use this yet
(but probably should).
PDFFont::initialize() is now empty and could be removed,
but maybe we'll put stuff there again later, so I'm leaving
it around for a bit longer.
Per 5177.Type2.pdf 3.1 "Type 2 Charstring Organization",
a glyph's charstring looks like:
w? {hs* vs* cm* hm* mt subpath}? {mt subpath}* endchar
The `w?` is the width of the glyph, but it's optional. So all
possible commands after it (hstem* vstem* cntrmask hintmask
moveto endchar) check if there's an extra number at the start
and interpret it as a width, for the very first command we read.
This was done by having an `is_first_command` local bool that
got set to false after the first command. That didn't work with
subrs: If the first command was a call to a subr that just pushed
a bunch of numbers, then the second command after it is the actual
first command.
Instead, move that bool into the state. Set it to false the
first time we try to read a width, since that means we just read
a command that could've been prefixed by a width.
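A minimal sketch of that approach, with made-up names and a heavily
simplified interpreter state (the real parser has more state and
different types):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified charstring interpreter state.
struct CharstringState {
    std::vector<float> stack;
    bool is_first_command { true }; // lives in the state, not in a local
    std::optional<float> width;
};

// How a command decides whether a leading width is present.
enum class Arity {
    Exact, // command takes exactly `expected` operands (e.g. a moveto)
    Even,  // command takes an even number of operands (e.g. stem hints)
};

// Called by every command that may be prefixed by a width.
void maybe_read_width(CharstringState& state, Arity arity, size_t expected)
{
    if (!state.is_first_command)
        return;
    // We just saw a command that could have carried a width, so no later
    // command can.
    state.is_first_command = false;
    bool has_width = arity == Arity::Even
        ? state.stack.size() % 2 == 1
        : state.stack.size() == expected + 1;
    if (has_width) {
        state.width = state.stack.front();
        state.stack.erase(state.stack.begin());
    }
}

// A subr that only pushes numbers never calls maybe_read_width(), so
// is_first_command is still true for the command that follows it.
void call_number_pushing_subr(CharstringState& state, std::vector<float> const& operands)
{
    state.stack.insert(state.stack.end(), operands.begin(), operands.end());
}
```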
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf,
8.4 First Four Subrs Entries:
"""If Flex or hint replacement is used in a Type 1 font program, the
first four entries in the Subrs array in the Private dictionary must be
assigned charstrings that correspond to the following code sequences. If
neither Flex nor hint replacement is used in the font program, then this
requirement is removed, and the first Subrs entry may be a normal
charstring subroutine sequence. The first four Subrs entries contain:
Subrs entry number 0:
3 0 callothersubr pop pop setcurrentpoint return
"""
othersubr handler 0 gets three arguments:
* The flex height (the distance after which the bezier splines
are replaced with just straight lines)
* The current position after the flex (two arguments: x and y)
It pushes that position on the postscript stack, where predefined subr
handler number 0 then pops it from. It then passes it to
setcurrentpoint.
In theory, we now correctly do that setcurrentpoint call, which we
previously weren't.
In practice, that setcurrentpoint call always receives the last point of
the flex -- and our path api apparently gets confused when move_to() is
called on it when the current point is already at that same location.
So tweak the SetCurrentPoint handler to not set the current point on
the path if it's already the path's current point, with a FIXME to
figure out what exactly is happening in Gfx::Path.
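The tweak amounts to something like the following sketch. PathSketch
and PointSketch are hypothetical stand-ins, not Gfx::Path or
Gfx::FloatPoint:

```cpp
#include <cassert>

struct PointSketch {
    float x { 0 };
    float y { 0 };
    bool operator==(PointSketch const& other) const { return x == other.x && y == other.y; }
};

// Hypothetical stand-in for a path type; it only tracks the current point
// and counts move_to() calls.
class PathSketch {
public:
    void move_to(PointSketch point)
    {
        m_current = point;
        ++m_move_count;
    }
    void line_to(PointSketch point) { m_current = point; }
    PointSketch current_point() const { return m_current; }
    int move_count() const { return m_move_count; }

private:
    PointSketch m_current;
    int m_move_count { 0 };
};

// Skip the move_to() if the path is already at the requested point.
void set_current_point(PathSketch& path, PointSketch point)
{
    if (path.current_point() == point)
        return;
    path.move_to(point);
}
```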
No big behavior change if flex is used, but this is more correct if it
isn't.
(This only works because our `return` handler is empty, else we would
have to make the callothersubr handler start a call frame.)
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf,
8.4 First Four Subrs Entries:
"""If Flex or hint replacement is used in a Type 1 font program, the
first four entries in the Subrs array in the Private dictionary must be
assigned charstrings that correspond to the following code sequences. If
neither Flex nor hint replacement is used in the font program, then this
requirement is removed, and the first Subrs entry may be a normal
charstring subroutine sequence. The first four Subrs entries contain:
[...]
Subrs entry number 1:
0 1 callothersubr return
Subrs entry number 2:
0 2 callothersubr return
"""
So subr entry numbers 1 and 2 just call othersubr 1 and 2, which
means we can just move the handling code over.
No behavior change if flex is used, but more correct if it isn't.
(This only works because our `return` handler is empty, else we would
have to make the callothersubr handler start a call frame.)
This is a subset of #21484: Type 2 CFFs never use the special subrs,
so stop doing them for type 2 at least for now.
Fixes an assert in 0000064.pdf in 0000.zip in the pdfa dataset
(a stack underflow because a subr is supposed to push a bunch of
stuff, but instead it ran one of the built-in routines instead of
the subr from the font file).
As discussed in #21484, this isn't right for type 1 CFFs either,
but just removing the code there regresses Tests/LibPDF/type1.pdf.
A slightly more involved thing is needed there; I added a FIXME
for that here.
No intended behavior change.
It does have the effect that indirect object references now go down
the array path instead of the number path. They still fall over there,
but now that's easy to fix.
Type 1 fonts usually have a m_font_program and no m_font -- they only
have m_font if we're using a replacement font for the fonts that
were built-in to PDFs before Acrobat 4.0 (and must still work to
show existing files).
However, SimpleFont::get_glyph_width() used to always return a
float, which in Type1Font was only implemented if m_font was set.
Per spec, we're supposed to just use /MissingWidth for fonts that
are missing an entry in the descriptor's /Width array. However, for
built-in fonts, no explicit /Width array is needed (PDF 1.7 spec,
Appendix H.3, 5.5.1). So if we just always use /MissingWidth,
then PDFs that use a built-in font draw all their text on top
of each other (e.g. 000333.pdf from stillhq.com-pdfdb).
So change get_glyph_width() to return Optional<float>, return
it only in Type1Font if m_font is set, and use MissingWidth
if it isn't set.
That way, replacement fonts still return a width, and real
fonts that are supposed to have /Width and use /MissingWidth
for missing entries do what they're supposed to too, instead
of crashing.
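A sketch of the resulting lookup order, using std::optional in place of
AK's Optional. FontSketch and its fields are invented for this example
and do not mirror the real class layout:

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

// Hypothetical simplified font.
struct FontSketch {
    // Explicit per-glyph widths given in the PDF, if any.
    std::unordered_map<unsigned, float> explicit_widths;
    float missing_width { 0 };
    // Only set when a replacement font stands in for a built-in font.
    std::optional<std::unordered_map<unsigned, float>> replacement_font_widths;

    // Returns a width only if a replacement font is present and knows the glyph.
    std::optional<float> get_glyph_width(unsigned char_code) const
    {
        if (!replacement_font_widths.has_value())
            return std::nullopt;
        auto it = replacement_font_widths->find(char_code);
        if (it == replacement_font_widths->end())
            return std::nullopt;
        return it->second;
    }
};

float glyph_width_for(FontSketch const& font, unsigned char_code)
{
    if (auto it = font.explicit_widths.find(char_code); it != font.explicit_widths.end())
        return it->second;
    if (auto width = font.get_glyph_width(char_code); width.has_value())
        return *width;
    // Real fonts without an entry fall back to the missing width.
    return font.missing_width;
}
```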
From 20 (6%) to 16 (5%) crashes on the 300 first PDFs, and from
39 (7.8%) to 31 (6.2%) on the 500-random PDFs test.
`left` might be a number bigger than there are actually glyphs in the
CFF.
The spec says "The number of ranges is not explicitly specified in the
font. Instead, software utilizing this data simply processes ranges
until all glyphs in the font are covered." Apparently we have to check
for this within each range as well.
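Roughly, the check looks like the sketch below. It assumes the ranges
have already been read into a vector, which the real parser does not
do (it reads them from the stream as it goes), and all names are
hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One charset range: `left` is the number of SIDs following `first_sid`.
struct CharsetRange {
    std::uint16_t first_sid;
    std::uint16_t left;
};

// Expands ranges into SIDs, stopping as soon as `sids_needed` entries exist,
// even in the middle of a range whose `left` claims more glyphs than exist.
std::vector<std::uint16_t> expand_charset_ranges(std::vector<CharsetRange> const& ranges, size_t sids_needed)
{
    std::vector<std::uint16_t> sids;
    for (auto const& range : ranges) {
        // A 32-bit counter cannot wrap around for any 16-bit `left`.
        for (std::uint32_t i = 0; i <= range.left && sids.size() < sids_needed; ++i)
            sids.push_back(static_cast<std::uint16_t>(range.first_sid + i));
        if (sids.size() >= sids_needed)
            break;
    }
    return sids;
}
```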
Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the
pdfa dataset.
Together with the previous commit:
From 21 (7%) to 20 (6%) crashes on the 300 first PDFs, and from
41 (8.2%) to 39 (7.8%) on the 500-random PDFs test.
...and replace template instantiations with a loop, to make this
easily possible.
Vaguely nice for code size as well.
Needed for example in 0000054.pdf and 0000354.pdf in 0000.zip in the
pdfa dataset.
We used to use a u8 as loop counter, which would overflow
if there were more than 255 glyphs, producing hundreds of megabytes
of
Couldn't find string for SID x, going with space
output in the process, while all data until the end of the CFF
section got interpreted as SIDs, until a try_read() would finally
fail.
We now no longer fail miserably trying to render page 2 of
0000352.pdf of 0000.zip from the pdfa dataset.
Fixes just one crash of the larger 500-document test set, but
when I tweak test_pdf.py to print all stacks instead of just the
top 5, it no longer produces 260 MB of output.
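For illustration, the wraparound itself can be shown in isolation. The
helper below is hypothetical and only demonstrates the counter
behavior, not the CFF parsing code:

```cpp
#include <cassert>
#include <cstdint>

// Returns the value a counter of type Counter holds after being
// incremented `steps` times, starting from zero.
template<typename Counter>
Counter count_up(unsigned steps)
{
    Counter counter = 0;
    for (unsigned i = 0; i < steps; ++i)
        ++counter; // an 8-bit counter wraps from 255 back to 0 here
    return counter;
}
```

An 8-bit counter never reaches a bound above 255, so a loop like
`for (u8 i = 0; i < glyph_count; ++i)` cannot terminate normally once
glyph_count exceeds 255; a wider counter type avoids that.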