mirror of https://github.com/RGBCube/serenity synced 2025-07-02 23:12:08 +00:00

637 commits

Author SHA1 Message Date
Nico Weber
9aa31157d5 LibPDF: Use right encoding for standard fonts Symbol and ZapfDingbats
We use Liberation Sans for the actual glyph for these, and that's
missing some (Symbol) / all (ZapfDingbats) of the glyphs we need
for these two standard fonts (...or at least the mapping from
name to glyph, not sure). But still, better rendering squares than
completely incorrect glyphs.

Our code deciding what to do when a value isn't found in an encoding,
or when the name doesn't map to a glyph, also needs work, but that's
mostly independent of this change. I think this is a nice small
standalone progression.
2024-02-27 17:42:08 -05:00
Nico Weber
76105d5d7f LibPDF: Resize images to the larger of image and mask dimensions
Makes text show up on 0000646.pdf pages 87-92, which for some reason
renders all text using 2x2 images with huge masks that contain
rendered text outlines.
2024-02-27 17:39:13 -05:00
Nico Weber
472bc367d3 LibPDF: Do not have redundant variables for image size
This way, the size of the bitmap cannot become out of sync with these
variables.

No behavior change.
2024-02-27 17:39:13 -05:00
Nico Weber
83d29b3e45 LibPDF: Hack around a FIXME in TrueTypePainter::get_glyph_width()
This will need further thought once we implement support for the
truetype 'post' table, but for now it's correct most of the time,
and better than not doing it.
2024-02-27 07:02:27 +01:00
Nico Weber
448eaa2966 LibPDF: Let Type1Font use TrueTypePainter for standard fonts
...and for fallback fonts too.

We use Liberation Sans (a truetype font) for standard and fallback
fonts. So we should use the standard PDF algorithm for mapping bytes
to truetype glyphs. TrueTypePainter knows how to do this.

Makes the "fi" ligature in the title on page 1 of 5014.CIDFont_Spec.pdf
or the dotless-i in the title of page 2 of ThinkingInPostScript.pdf
show up. They use Helvetica and Times, and Helvetica and Symbol
respectively (with -Bold variants).
2024-02-27 07:02:27 +01:00
Nico Weber
86a7753d65 LibPDF: Move TrueType painting into a new class
No behavior change.
2024-02-27 07:02:27 +01:00
Nico Weber
84d1e3956f LibPDF: Make truetype ascent adjustment more local
It's only used in this function.

No behavior change.
2024-02-27 07:02:27 +01:00
Nico Weber
03fab7089a LibPDF+PDFViewer: Extract Renderer::apply_page_rotation()
No behavior change.
2024-02-27 07:02:02 +01:00
Nico Weber
cafaaa0e76 LibPDF: Don't crash on zero-width characters in type1 fonts
Since ScaledFont bakes the size of the font into the font type, we
do the same for Type1 fonts, and then have to divide by the font height
when figuring out what to scale by. For a target width of 0, chances are
the source width is also 0, and we end up with NaN due to dividing
0 by 0. This then triggered the `VERIFY(isfinite(error))` in
can_approximate_bezier_curve() in Painter.cpp.

Check for this case and scale by 0 instead of dividing.
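
A minimal sketch of that check, assuming a hypothetical free function
`glyph_scale()` (the real code lives inside the Type1 font drawing path and
is shaped differently):

```cpp
// Hypothetical helper, not the LibPDF function: returns the factor to scale
// a glyph by so that it ends up `target_width` wide.
float glyph_scale(float target_width, float source_width)
{
    // A target width of 0 usually comes with a source width of 0 as well.
    // 0 / 0 would be NaN, so scale by 0 instead of dividing.
    if (target_width == 0.0f)
        return 0.0f;

    // A zero denominator with a non-zero numerator still yields +inf/-inf
    // here; as the message says, that case is left alone for now.
    return target_width / source_width;
}
```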

It could happen that the denominator is 0 without the numerator being 0,
but it's not clear what that's supposed to mean. In this case we'd end
up with +inf/-inf, which would also trigger the assert. I haven't seen
this case in practice, so let's not worry about that for now.

(A nicer longer-term fix is probably to make LibPDF use VectorFont
instead of ScaledFont, so that we don't have to bake the font size into
the font type. Then we won't need this division at all. In the meantime,
this fixes the crash.)

Fixes a crash on page 66 of
https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf

Fixes a crash on page 37 of
https://open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf

Fixes crashes in `0000310.pdf`, `0000430.pdf`, `0000229.pdf`.

Brings down the number of crashes on my 1000 file test set from
5 with 3 distinct stacks to 2 with 1 distinct stack.

(The number went up from 3 crashes with 2 distinct stacks to 5/3 when we
started rendering much more text when Type0 font support was added.
This fixes the crashes we had before Type0 support.)
2024-02-27 07:01:05 +01:00
Nico Weber
83128d093e LibPDF: Implement most of the spec algorithm for picking TrueType glyphs
Non-CID-keyed fonts in PDFs have 8-bit codepoints which are mapped from
bytes to character names via encoding.

TrueType fonts don't index glyphs by name (Type1 fonts do), so the fix
(codified in the spec) was to make a list of all possible glyph names
and map those to (16-bit) unicode values, and then pass those into the
truetype cmap.
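
The lookup chain described above can be sketched roughly like this. All three
tables are tiny hypothetical stand-ins (the real encoding, glyph name list,
and cmap are LibPDF/LibGfx types), and `glyph_id_for_byte()` is an
illustrative name only:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in tables for illustration.
using Encoding = std::unordered_map<uint8_t, std::string>;       // byte -> glyph name
using GlyphNameList = std::unordered_map<std::string, uint16_t>; // glyph name -> Unicode
using Cmap = std::unordered_map<uint16_t, uint32_t>;             // Unicode -> glyph ID

// byte -> name (encoding) -> 16-bit Unicode value (name list) -> glyph (cmap).
// Returns an empty optional as soon as one of the three steps has no entry.
std::optional<uint32_t> glyph_id_for_byte(uint8_t byte, Encoding const& encoding,
    GlyphNameList const& names, Cmap const& cmap)
{
    auto name = encoding.find(byte);
    if (name == encoding.end())
        return std::nullopt;

    auto code_point = names.find(name->second);
    if (code_point == names.end())
        return std::nullopt;

    auto glyph = cmap.find(code_point->second);
    if (glyph == cmap.end())
        return std::nullopt;

    return glyph->second;
}
```

What to do when a step fails (the 'post' table fallback, .notdef) is
deliberately left out of the sketch.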

(As a fallback, we're supposed to look at the optional names in the
font's "post" table. That part isn't implemented here yet.)

(Note that this affects the behavior of fallback fonts for TrueType
fonts, but not yet fallback fonts for Type1 fonts, and neither the
behavior of the 14 built-in Type1 fonts (which we implement as
fallback fonts), since the TrueType fallback in Type1Font.cpp does
not use this algorithm yet. This will be fixed in a future patch.)
2024-02-25 15:15:20 +01:00
Nico Weber
207717982c LibPDF: Read /Flags off font descriptors 2024-02-25 15:15:20 +01:00
Lucas CHOLLET
cb03ab4a5a LibPDF: Handle the BlackIs1 parameter of the CCITTFaxDecode Filter 2024-02-24 16:24:45 -07:00
Lucas CHOLLET
6b3bab5c8a LibPDF: Plug in the CCITTFaxDecode filter to our CCITT decoder
We only call the decoder for Group 4 images. We do support Group 3
images, but let's wait to find a PDF with these before adding support.
2024-02-24 16:24:45 -07:00
Nico Weber
b258ba2767 LibPDF: Use decode_hex_digit() more
For `:#xx` in names, we now also handle lower-case hex digits.
The spec is silent on the case of these hex digits.
Our previous check (isxdigit(), and now is_ascii_hex_digit()) lets
through lower-case hex digits, so it seems better to handle them
rather than computing e.g. `'a' - 'A' + 10` (== 42 -- off by 32!).
I don't know if this has any visible effect on any files, but it's
more correct, and less code, and the code looks more like the code
in Filter::decode_ascii_hex().
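
For illustration, a hex digit decoder that handles both cases could look like
the sketch below. `hex_digit_value()` is a hypothetical stand-in, not the
actual decode_hex_digit() implementation or its signature:

```cpp
#include <optional>

// Hypothetical stand-in: maps '0'-'9', 'A'-'F' and 'a'-'f' to 0..15,
// and returns an empty optional for everything else.
std::optional<unsigned> hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return static_cast<unsigned>(c - '0');
    if (c >= 'A' && c <= 'F')
        return static_cast<unsigned>(c - 'A' + 10);
    // Lower-case digits get their own branch. Running them through the
    // upper-case formula gives 'a' - 'A' + 10 == 42 instead of 10.
    if (c >= 'a' && c <= 'f')
        return static_cast<unsigned>(c - 'a' + 10);
    return std::nullopt;
}
```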
2024-02-23 12:11:25 -05:00
Nico Weber
783b1d1c11 LibPDF: Use is_ascii_hex_digit() instead of isxdigit()
See description of #7684 for motivation.

Also, makes this code look more like the hex code in
Filter::decode_ascii_hex().

No behavior change.
2024-02-23 12:11:25 -05:00
Nico Weber
c9234f35f1 LibPDF/CFF: Clear stack after "endchar" commands
Both type 1 and type 2 spec tell us to do this.

I haven't observed a difference from this, but I noticed it in the
spec while I was touching this code. Probably good to do what the
spec tells us to do.
2024-02-22 06:59:28 +01:00
Nico Weber
020c00ede2 LibPDF/CFF: Use offset in accented_character() data
Without this, the dieresis above an a is all the way to the left
instead of over the letter.
2024-02-22 06:59:28 +01:00
Nico Weber
12859dfde5 LibPDF/CFF: Treat endchar in type 2 as type 2 "seac" when requested
With this, a character can be defined that uses two existing glyphs.
This is useful for umlauts and the like, which then just need to
reference e.g. the glyphs named "a" and "dieresis" and provide a
translation.

Makes umlauts appear on some PDFs using CFF type2 data in Type 1
fonts.
2024-02-22 06:59:28 +01:00
Nico Weber
cade76d240 LibPDF+LibGfx: Do not try to read "OS/2" table for PDFs
It is sometimes truncated in fonts embedded in PDFs, and the data
is not needed to render PDFs. 2 of my 1000 test PDFs used to
complain "Could not load OS2 v1: Not enough data" and 1
"Could not load OS2 v2: Not enough data" before.

Increases number of PDFs that render without diagnostics from
764 to 765 (and decreases the number of distinct error messages
from 27 to 25).
2024-02-21 13:38:33 +01:00
Nico Weber
0dee94ef40 LibPDF+LibGfx: Do not try to read "hmtx" table for PDFs
It is sometimes truncated in fonts embedded in PDFs, and the data
is not needed to render PDFs. 26 of my 1000 test files complained
"Could not load Hmtx: Not enough data" before.

Increases number of PDFs that render without diagnostics from
743 to 764.
2024-02-21 13:38:33 +01:00
Nico Weber
5efe80af7f LibPDF+LibGfx: Do not try to read "name" table for PDFs
It is often missing in fonts embedded in PDFs. 75 of my 1000 test
files complained "Font is missing Name" when trying to read fonts
before.

Increases number of PDFs that render without diagnostics from
682 to 743.
2024-02-21 13:38:33 +01:00
Nico Weber
41eca52b50 LibGfx/OpenType: Tweak Font::try_load_from_externally_owned_memory()
It now takes an Options object instead of passing several default
parameters.

No behavior change.
2024-02-21 13:38:33 +01:00
Nico Weber
3b616b6af8 LibPDF: Use original error for failing ICC load 2024-02-21 13:37:08 +01:00
Nico Weber
fa95e5ec0e LibPDF: Fix line drawing when line_width is 0
We used to skip lines with width 0. The correct behavior per spec
is to draw them one pixel wide instead.
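
As a sketch (the helper name is hypothetical, and the real renderer works on
its own stroke state rather than a bare float):

```cpp
// Hypothetical helper: picks the stroke width to use in device pixels.
float effective_line_width(float line_width_in_pixels)
{
    // A width of 0 means "thinnest line the device can render",
    // i.e. one device pixel, not "draw nothing".
    if (line_width_in_pixels == 0.0f)
        return 1.0f;
    return line_width_in_pixels;
}
```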
2024-02-21 10:30:57 +01:00
Nico Weber
1cb450e9a3 LibPDF: Give CFF Glyph 0 the name .notdef
This is required by the CFF spec, and is consistent with what we do for
the encoding 24 lines down.

As far as I can tell, nothing in `Type1FontProgram::rasterize_glyph()`
or in Type1Font.cpp implements the "If an encoding maps to a character
name that does not exist in the Type 1 font program, the .notdef glyph
is substituted." line from the PDF 1.7 spec (in 5.5.5 Character
Encoding, Encodings for Type 1 Fonts) yet, so this does not yet have an
effect.
2024-02-20 06:54:50 -05:00
Nico Weber
05a7482118 LibPDF/CFF: Add dbgln() when failing encoding bounds check 2024-02-20 08:43:10 +00:00
Nico Weber
4705d38fa7 LibPDF/CFF: Fix off-by-one when reading internal encoding
We use `i - 1` to index these arrays, so that's what we should use
for the bounds check as well.
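
The general pattern, sketched with a hypothetical four-entry table (not the
actual CFF encoding arrays):

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical lookup into a table that is indexed with `i - 1`.
std::optional<int> lookup_one_based(std::array<int, 4> const& table, std::size_t i)
{
    // Check the same expression that is used as the index below.
    // Checking `i >= table.size()` instead would wrongly reject i == 4.
    if (i == 0 || i - 1 >= table.size())
        return std::nullopt;
    return table[i - 1];
}
```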
2024-02-20 08:43:10 +00:00
Nico Weber
012f6d46e7 LibPDF: Implement stream CIDToGIDMaps for Type0 CIDFontType2 fonts
Of my 1000 test files, 73 have stream Type0 truetype fonts with stream
CIDToGIDMaps. This makes that work.
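
For context: per the PDF spec, a stream CIDToGIDMap stores one big-endian
2-byte glyph index per CID, at bytes 2*c and 2*c + 1. A rough sketch of the
lookup, with a hypothetical function name and with "return glyph 0 when the
CID is past the end of the stream" as an assumption of this sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper operating on the already-decoded stream bytes.
uint16_t glyph_id_for_cid(std::vector<uint8_t> const& map, uint32_t cid)
{
    std::size_t offset = static_cast<std::size_t>(cid) * 2;
    // Assumption: CIDs outside the map fall back to glyph 0.
    if (offset + 1 >= map.size())
        return 0;
    // High-order byte first.
    return static_cast<uint16_t>((map[offset] << 8) | map[offset + 1]);
}
```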

(With this patch, the number of files in my 1000 test files complaining
"Font is missing Name" increases from 41 to 75, so a bit under half of
the fonts using stream CIDToGIDMaps also have no 'name' table. So that's
next.)

Increases files without issues from 652 to 681.
2024-02-18 15:43:33 -05:00
Nico Weber
dde11e1757 LibPDF: Ignore unknown CFF operators
https://adobe-type-tools.github.io/font-tech-notes/pdfs/5177.Type2.pdf
says "The behavior of undefined operators is unspecified." but
https://learn.microsoft.com/en-us/typography/opentype/spec/cff2
says "When an unrecognized operator is encountered, it is ignored and
the stack is cleared."

Some type 0 CIDFontType0C fonts (i.e. CID-keyed non-OpenType CFF fonts)
depend on the latter, even though they're governed by the former spec.
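
The "ignore and clear the stack" behavior, sketched with a hypothetical
miniature interpreter that knows a single made-up operator (this is not the
charstring interpreter in LibPDF):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical miniature interpreter: operator 1 adds all operands to
// `total`; every other operator counts as unrecognized.
struct MiniInterpreter {
    std::vector<int> stack;
    int total = 0;
    int ignored_operators = 0;

    void push(int value) { stack.push_back(value); }

    void execute(uint8_t op)
    {
        if (op == 1) {
            for (int value : stack)
                total += value;
        } else {
            // Unrecognized operator: skip it instead of rejecting the font.
            ++ignored_operators;
        }
        // Either way, the pending operands are dropped.
        stack.clear();
    }
};
```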

Fixes rendering of text in 0000521.pdf (e.g. page 10 or 5). The font
there has a bunch of 0 opcodes for some reason.
2024-02-18 08:40:04 +00:00
Nico Weber
05f382fc6e LibPDF: Add CIDFontType2::set_font_size()
See #20084 commit 4. This does the same for truetype-based type0 fonts.

Fixes font sizes on e.g. 1800-2017.pdf.
2024-02-17 16:08:48 +01:00
Nico Weber
f4a59246f5 LibPDF: Implement initial support for Type0 truetype fonts
Disclaimers, similar to what's on #23202 (and most of the
prerequisites mentioned there are needed for this too):

* Only supports the `Identity-H` type0 cmap at the moment
* Doesn't support vertical text yet
* Only supports the `Identity` CIDToGIDMap at the moment
  (this one is a truetype-only thing)
2024-02-17 16:08:48 +01:00
Nico Weber
bd74447dba LibPDF: Initial support for drawing CFF-based Type0 fonts
Together with the already-merged #23122, #23128, #23135, #23136, #23162,
and #23167, #23179, #23190, #23194 this adds initial support for
rendering some CFF-based Type0 fonts :^)

There's a long list of things that still need improving after this:

* A small number of CFF programs contain the charstring command 0,
  which is invalid. Currently, this makes us reject the whole font.

* Type1FontProgram::rasterize_glyph() is name-based. For CID-based
  fonts, we want a version that takes CIDs (character IDs) instead.
  For now, I'm printing the CID to a string and using that, yuck.
  (I looked into doing this nicely. I do want to do that, but I
  need to read up on how the `seac` type1 charstring command uses
  character names to identify parts of an accented character.
  Also, it looks like `seac`'s accented character handling moved
  over to `endchar` in type2 charstring commands (i.e. in CFF data),
  and it looks like we don't implement that at all. So I need to do
  more reading first, and I didn't want to block this on that.)

* The name for the first string in name-based CFF fonts looks wrong;
  added a FIXME for that for now.

* This supports the named Identity-H cmap only for now. Identity-H
  maps UTF16-BE values to glyph IDs with the identity function, and
  assumes it's horizontal text. Other named cmaps in my test files are
  UniJIS-UCS2-H, UniCNS-UCS2-H, Identity-V, UniGB-UCS2-H, UniKS-UCS2-H.
  (There are also 2 files using the stream-based cmaps instead of the
  name-based ones.)

  * In particular, we can't draw vertical text (`-V`) yet

* Passing in the encoding to CFF::create() is awkward (it's nullptr
  for CID-keyed fonts), and it's also not necessary since
  `Type1Font::draw_glyph()` already does the "take encoding from PDF,
  and only from font if the PDF doesn't store one" dance.

* This doesn't cache glyphs but re-rasterizes them each time. Easy
  to add, but maybe I want to look at rotation first. And things
  don't feel glacial as-is.

* Type0Font::draw_glyph() is pretty similar to the second half of
  Type1Font::draw_glyph()
2024-02-16 12:41:10 -05:00
Nico Weber
c9d48bbca4 LibPDF/CFF: Add a comment to CFF::parse_charset() 2024-02-16 12:41:10 -05:00
Nico Weber
5c8778a161 LibPDF/CFF: Compute per-glyph glyph width in CID-keyed fonts
Make TopDict's defaultWidthX and nominalWidthX Optional<>s so that
we can check if they're set per fdselect-selected font dict, and
if so use the value from there in CID-keyed fonts. Otherwise, keep
using the value in the top dict.
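
Roughly, the fallback could be sketched as below. `WidthDict` and
`glyph_width()` are hypothetical names, the final fallback of 0 is the CFF
default for both entries, and the "nominalWidthX plus width operand, else
defaultWidthX" rule is the one from the Type 2 charstring spec:

```cpp
#include <optional>

// Hypothetical, trimmed-down dict holding only the two width entries.
struct WidthDict {
    std::optional<float> default_width_x;
    std::optional<float> nominal_width_x;
};

// `width_operand` is the optional width value at the start of a charstring.
float glyph_width(WidthDict const& top_dict, WidthDict const& font_dict,
    std::optional<float> width_operand)
{
    // Prefer the fdselect-selected font dict, then the top dict, then 0.
    float default_width = font_dict.default_width_x.value_or(top_dict.default_width_x.value_or(0.0f));
    float nominal_width = font_dict.nominal_width_x.value_or(top_dict.nominal_width_x.value_or(0.0f));

    if (width_operand.has_value())
        return nominal_width + *width_operand;
    return default_width;
}
```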
2024-02-16 12:41:10 -05:00
Nico Weber
1d1e406b3a LibPDF/CFF: Implement some special handling for CID-keyed fonts
* FDArray, FDSelect must be present
* Encoding must not be present
* Charset maps from GID (Glyph ID) to CID (Character ID),
  instead of to character name
2024-02-15 12:32:31 +01:00
Nico Weber
7494f24430 LibPDF/CFF: Store if a font program is CID-keyed
...and reject CID-keyed font programs for Type1 fonts.
2024-02-15 12:32:31 +01:00
Nico Weber
bb7d29d007 LibPDF/CFF: Read font dicts pointed to by the fdarray offset
The fdselect array (that we already read) maps each glyph ID
to an fdarray index. The font dict at that index then stores
information for that glyph.
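
As a sketch of that two-step lookup, using hypothetical types (the parsed
data in LibPDF is laid out differently):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-font-dict data.
struct FontDict {
    float default_width_x;
    float nominal_width_x;
};

// glyph ID -> fdarray index (via fdselect) -> font dict.
// Returns nullptr if either index is out of range.
FontDict const* font_dict_for_glyph(std::vector<uint8_t> const& fdselect,
    std::vector<FontDict> const& fdarray, std::size_t glyph_id)
{
    if (glyph_id >= fdselect.size())
        return nullptr;
    std::size_t index = fdselect[glyph_id];
    if (index >= fdarray.size())
        return nullptr;
    return &fdarray[index];
}
```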

In practice, this is used to assign different defaultWidthX /
nominalWidthX values to blocks of glyphs in CID-keyed fonts.

We don't do anything yet with the data, and we also don't send
data of CID-keyed CFFs into this parser either, so no behavior
change.
2024-02-15 12:32:31 +01:00
Nico Weber
524a4f6256 LibPDF/CFF: Make parse_top_dict() return all top dicts
This happens for CFFs that contain multiple fonts. This doesn't
happen in practice, but the same code will be used for fdarray
parsing, which will contain several dicts.

No behavior change.
2024-02-15 12:32:31 +01:00
Nico Weber
9f1cf8babc LibPDF/CFF: Extract parse_top_dict() function
Pure code move, no behavior change.
2024-02-15 12:32:31 +01:00
Nico Weber
eb4632e08a LibPDF: Give CFF built-in encoding and charset arrays an underlying type
These arrays store SIDs ("String IDs"), so give them that type now
that we have to_array() and it's easy to do.

No behavior change.
2024-02-14 06:56:43 +01:00
Nico Weber
ddbcd901d1 LibPDF: Separate Type0 CMap errors
No behavior change, just more granular "not implemented" diagnostics.
2024-02-13 19:46:31 +01:00
Nico Weber
8e50bbc9fb LibPDF: Add string drawing code for Type0Fonts
This is very similar to SimpleFont::draw_string() for now, but
it'll become a bit different when we add support for vertical
text.

CIDFontType now only needs to draw single glyphs. Neither of the
subclasses can do that yet, so no behavior change yet.
2024-02-13 19:46:18 +01:00
Nico Weber
eaa568210f LibPDF: Split CCITT errors by group 2024-02-13 19:45:47 +01:00
Nico Weber
c201825cc8 LibPDF: Read CCITT decode params
We don't do anything with them yet, so no behavior change.
2024-02-13 19:45:47 +01:00
Nico Weber
454a10774e LibPDF: Let Filter::handle_lzw_and_flate_parameters() read decode params
...instead of reading them in Filter::decode() for all filters and
then passing them around to only the LZW and flate filters.

(EarlyChange is LZWDecode-only, so that's read there instead.)

No behavior change.
2024-02-13 19:45:47 +01:00
Nico Weber
9875ce0c78 LibPDF: Reorder loops in SampledFunction::evaluate()
Previously, we'd loop over the index of the output coordinate,
for example for a CMYK->RGB function, we'd loop over RGB. For
every output index, we'd then sample the function at the CMYK
input point.

Now, we sample at CMYK once and return a span for all outputs,
since they're stored in contiguous memory. And we then loop
over the outputs only to do weighting and mapping to the target
range at the end.
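
The shape of the new loop, as a heavily simplified sketch: one input
dimension, no interpolation between sample points, and hypothetical names
throughout (`SampledTable`, `outputs_at()`, this `evaluate()`):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical table: `output_count` contiguous values per sample point.
struct SampledTable {
    std::vector<float> samples;
    std::size_t output_count;

    // One lookup yields a pointer to all outputs of a sample point.
    float const* outputs_at(std::size_t sample_index) const
    {
        return samples.data() + sample_index * output_count;
    }
};

// Look up the sample point once, then loop over the outputs only to map
// each value from [0, 1] to [range_min, range_max].
std::vector<float> evaluate(SampledTable const& table, std::size_t sample_index,
    float range_min, float range_max)
{
    float const* outputs = table.outputs_at(sample_index);
    std::vector<float> result(table.output_count);
    for (std::size_t i = 0; i < table.output_count; ++i)
        result[i] = range_min + outputs[i] * (range_max - range_min);
    return result;
}
```

The old shape would have called the lookup once per output index instead.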

Reduces the runtime of

      (cd Tests/LibPDF; \
          ../../Build/lagom/bin/BenchmarkPDF --benchmark_repetitions 5)

from 235.6±2.3ms to 103.2±3.3ms on my system, and makes
SampledFunction::evaluate() more similar to lerp_nd() in TagTypes.h.
2024-02-13 19:45:19 +01:00
Nico Weber
751185cb76 LibPDF: Scale default glyph width by font size and x scale
This fixes rendering of commas in 0000941.pdf page 1. The commas
use the default width, and without this they show up very large,
covering the page.

Also, it's nice that the code now looks like the regular case 4 lines
further up.
2024-02-12 14:32:04 +00:00
Nico Weber
7ab4e53b99 LibPDF/CFF: Add code for fdselect parsing
This is one of the two top dict entries we need for CID-keyed fonts.
We don't send any CID-keyed font data into the CFF parser yet,
so no behavior change.
2024-02-12 14:05:16 +01:00
Nico Weber
6ebddab448 LibPDF/CFF: Add enum values for CID-keyed font top dict entries
No behavior change.
2024-02-12 14:05:16 +01:00
Nico Weber
6df0150671 LibPDF: Add some CIDFontType0C scaffolding
No real behavior change. We don't actually load the CFF data yet
(blocked on #23136 and some more), and we don't have drawing code
yet, and Type0Font::draw_string() doesn't do any drawing yet.

But it's a step in the right direction.
2024-02-12 13:59:00 +01:00