The Compact Font Format (specified in Adobe's Technical Note #5176) is
used by PDF's Type1C fonts to store their data. While similar in spirit
to PS1 Type 1 Font Programs, it was designed for a more compact
representation and thus a smaller size, at the cost of increased
complexity. It also shares most of the charstring encoding logic, which
is why the CFF class also inherits from Type1FontProgram.
This initial implementation is still lacking many details, e.g.:
* It doesn't include all the built-in CFF SIDs
* It doesn't support CFF-provided SIDs (defaults those glyphs to the
space character)
* It needs more checks in general
The Type1FontProgram logic was based on the Adobe Type 1 Font Format; in
particular, it implemented the CharStrings Dictionary section
(charstring decoding, and most commands). In the case of Type1, these
charstrings are read from a PS1 dictionary, with one entry per character
in the font's charset. This has served us well for Type1 font rendering.
When implementing Type1C font rendering, this wasn't enough. Type1C PDF
fonts are specified in embedded CFF (Compact Font Format) streams, which
also contain a charstring dictionary with an entry for each character in
the font's charset. These entries can be slightly different from those
in a PS1 Font Program though: depending on a flag in the CFF, the
entries will be encoded either in the original charstring format from
the Adobe Type 1 Font Format, or in the "Type 2 Charstring Format"
(Adobe's Technical Note #5177). This new format is for the most part a
superset of the original, with small differences, all in the name of
making the representation as compact as possible:
* The glyph's width is not specified via a separate command; instead
it's an optional additional argument to the first command of the
charstring stream (and even then, it's only the *difference* to a
nominal character width specified in the CFF).
* The interpretation of a 4-byte number is different from Type 1: in
Type 1 this is a 4-byte two's-complement integer, whereas in Type 2
it's a fixed-point number with 16 bits of fractional part.
* Many commands accept a variable number of arguments, so they can draw
more than one line/curve in a single go. These are all
backwards-compatible with Type 1's commands.
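As a rough sketch of the 4-byte number difference, the operand that
follows a 255 prefix byte could be decoded as shown below. The name
decode_four_byte_operand and the is_type2 flag are hypothetical and are
not the names used in this patch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not the patch's code): decodes the four bytes that
// follow a 255 prefix byte in a charstring.
// Type 1: big-endian 32-bit two's-complement integer.
// Type 2: big-endian 16.16 fixed-point number (16 fractional bits).
inline double decode_four_byte_operand(std::array<std::uint8_t, 4> const& bytes, bool is_type2)
{
    // Assemble the big-endian bit pattern first.
    std::uint32_t raw = (std::uint32_t { bytes[0] } << 24)
        | (std::uint32_t { bytes[1] } << 16)
        | (std::uint32_t { bytes[2] } << 8)
        | std::uint32_t { bytes[3] };

    // Reinterpret the pattern as a signed value.
    auto value = static_cast<std::int32_t>(raw);

    // In Type 2 the low 16 bits are the fractional part.
    if (is_type2)
        return static_cast<double>(value) / 65536.0;
    return static_cast<double>(value);
}
```

The same four bytes therefore produce different operands depending on
the charstring type, which is why the decoder has to know which format
it is reading.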
All these changes are implemented in this patch in a
backwards-compatible way. To select between Type 1 and Type 2 behavior,
a new parameter indicates which one is desired when decoding the
charstring stream.
I also took the chance to centralise some logic that was previously
duplicated across the parse_glyph function. Common lambdas capture the
logic for moving to, or drawing a line/curve to a given point and
updating the glyph state. Similarly, some command logic, including
reading parameters, is shared by several commands. Finally, I've
re-organised the cases in the main switch to group together related
commands.
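To illustrate the shared-lambda idea together with variable-length
arguments, here is a minimal sketch of how an rlineto-style command
might be handled. Point, GlyphState and apply_rlineto are made-up names
for this example, and std::vector stands in for whatever containers the
real parse_glyph uses.

```cpp
#include <cstddef>
#include <vector>

struct Point {
    double x;
    double y;
};

// Hypothetical glyph state: the current point plus the points visited.
struct GlyphState {
    Point current { 0, 0 };
    std::vector<Point> path;
};

// Handles a relative line-to command. A Type 1 charstring passes exactly
// one (dx, dy) pair; a Type 2 charstring may pass several pairs at once.
inline void apply_rlineto(GlyphState& state, std::vector<double> const& args)
{
    // Shared helper: draw a line relative to the current point and
    // update the glyph state in one place.
    auto line_to = [&state](double dx, double dy) {
        state.current.x += dx;
        state.current.y += dy;
        state.path.push_back(state.current);
    };

    // Consume the arguments pairwise; a trailing unpaired value is ignored.
    for (std::size_t i = 0; i + 1 < args.size(); i += 2)
        line_to(args[i], args[i + 1]);
}
```

With a single pair this behaves like the Type 1 command, so the same
code path can serve both charstring formats.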
We are planning to add support for reading Type1 fonts from CFF, and
therefore much of the logic already found in PS1FontProgram will be
useful for representing the Type1 fonts read from CFF.
This commit moves the PS1-independent bits of PS1FontProgram into a new
Type1FontProgram base class that can be used as the base for CFF-based
Type1 fonts in the future. The Type1Font class uses this new type now
instead of storing a PS1FontProgram pointer. While doing this
refactoring I also took care of making some minor adjustments to the
PS1FontProgram API, namely:
* Its create() method is static and returns a
NonnullRefPtr<Type1FontProgram>.
* Many (all?) of the parse_* methods are now static.
* Added const where possible.
Notably, the Type1FontProgram currently also contains the code that
parses the CharString data from the PS1 program. This logic is very
similar in CFF files, so after some minor adjustments later on it should
be possible to reuse most of it.
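The resulting shape could look roughly like the sketch below. It is not
the actual class layout: the *Sketch names, the charstring map and the
use of std::shared_ptr in place of NonnullRefPtr are all assumptions
made for the sake of a self-contained example.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using CharStringMap = std::map<std::string, std::vector<std::uint8_t>>;

// Format-independent base: owns the glyph charstrings.
class Type1FontProgramSketch {
public:
    virtual ~Type1FontProgramSketch() = default;

    bool has_glyph(std::string const& name) const
    {
        return m_charstrings.count(name) != 0;
    }

protected:
    void add_charstring(std::string name, std::vector<std::uint8_t> data)
    {
        m_charstrings[std::move(name)] = std::move(data);
    }

private:
    CharStringMap m_charstrings;
};

// PS1-specific front end: only knows how to fill the base class.
class PS1FontProgramSketch final : public Type1FontProgramSketch {
public:
    // Static factory that hands back the base type, so callers do not
    // depend on where the glyph data came from.
    static std::shared_ptr<Type1FontProgramSketch> create(CharStringMap const& parsed)
    {
        std::shared_ptr<PS1FontProgramSketch> program(new PS1FontProgramSketch);
        for (auto const& entry : parsed)
            program->add_charstring(entry.first, entry.second);
        return program;
    }

private:
    PS1FontProgramSketch() = default;
};
```

A future CFF reader would then be a second subclass that fills the same
base class from a different container format.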
This might not be an issue at the moment, but moved-from objects are
usually in an unspecified but valid state, meaning that we shouldn't
read from them.
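The pattern being avoided can be sketched with standard library types
(the real code uses AK types, and store_and_measure is a made-up
function): anything needed from the object is read before the move, not
after it.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Appends `name` to `sink` and returns the length of the stored name.
inline std::size_t store_and_measure(std::vector<std::string>& sink, std::string name)
{
    // Read what we need *before* moving.
    std::size_t length = name.size();

    sink.push_back(std::move(name));

    // `name` is now valid but unspecified; calling name.size() here would
    // compile, but the result is not something we should rely on.
    return length;
}
```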
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, and B) free up
the name for a new FlyString class that is tied to String.
Previously, we would assume that all standard 14 fonts use a
TrueTypeFont dictionary. Now we render them in Type1Font as well,
given that it doesn't contain a PostScript font program.
This command is meant to print a Standard Encoding Accented Character.
It's not critical to implement it yet, but if we want to render more
documents we need to handle the instruction, even if we simply
ignore it.
Fonts with the encoding name "WinAnsiEncoding" should render missing
characters above character code 040 (octal) as a "bullet" character.
This patch adds Encoding::should_map_to_bullet(char_code) which is then
called by char_code_to_code_point() to check if the given char code
should be displayed as a bullet instead.
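A minimal sketch of that check is shown below. Only the two method names
come from this patch; the EncodingSketch struct, its members, the
fallback for unmapped codes in other encodings and the use of U+2022 for
the bullet are assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>

struct EncodingSketch {
    std::string name;
    // Character code -> Unicode code point, for the codes the encoding defines.
    std::map<std::uint16_t, char32_t> code_points;

    // True when a WinAnsiEncoding font has no entry for a code above 040 (octal).
    bool should_map_to_bullet(std::uint16_t char_code) const
    {
        return name == "WinAnsiEncoding"
            && char_code > 040
            && code_points.find(char_code) == code_points.end();
    }

    char32_t char_code_to_code_point(std::uint16_t char_code) const
    {
        if (should_map_to_bullet(char_code))
            return U'\u2022'; // BULLET

        auto it = code_points.find(char_code);
        if (it != code_points.end())
            return it->second;

        // Assumption for this sketch: pass unmapped codes through unchanged.
        return static_cast<char32_t>(char_code);
    }
};
```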
I didn't have a good way to test this, so I've only verified that it
works by manually overriding inputs to the function during the rendering
stage.
This takes care of a FIXME in the Annex D part of the PDF specification.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
It was previously the job of the renderer to create fonts, load
replacements for the standard 14 fonts and to pass the font size back
to the PDFFont when asking for glyph widths.
Now, the renderer tells the font its size at creation, as it doesn't
change throughout the life of the font. The PDFFont itself is now
responsible for deciding whether or not it needs to use a replacement
font, which still is Liberation Serif for now.
This means that we can now render embedded TrueType fonts as well :^)
It also makes the renderer's job much simpler and leads to a much
cleaner API design.
We would previously pass this function a Unicode code point, which is
not actually what we want here.
Instead, we want the "raw" code point, with the font itself deciding
whether or not it needs to be re-mapped.
This same mistake in terminology applied to PS1FontProgram.
This gives much better visual results than painting the path directly.
It also has the nice side effect that Type 1 fonts will now look much
more similar to TrueType fonts, which use the same class :^)
In addition, we can now cache glyph bitmaps for repeated use.
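The caching idea can be sketched as a map from (glyph, size) to a shared
bitmap that is filled on first use. GlyphBitmap, GlyphCache and the
rasterizer callback are hypothetical stand-ins and do not reflect the
actual classes involved.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for a rasterized glyph: dimensions plus per-pixel coverage.
struct GlyphBitmap {
    int width;
    int height;
    std::vector<std::uint8_t> coverage;
};

class GlyphCache {
public:
    using Rasterizer = std::function<GlyphBitmap(std::uint32_t glyph, float size)>;

    explicit GlyphCache(Rasterizer rasterize)
        : m_rasterize(std::move(rasterize))
    {
    }

    // Returns the cached bitmap, rasterizing only on the first request.
    std::shared_ptr<GlyphBitmap const> get(std::uint32_t glyph, float size)
    {
        auto key = std::make_pair(glyph, size);
        auto it = m_bitmaps.find(key);
        if (it != m_bitmaps.end())
            return it->second;

        std::shared_ptr<GlyphBitmap const> bitmap
            = std::make_shared<GlyphBitmap>(m_rasterize(glyph, size));
        m_bitmaps.emplace(key, bitmap);
        return bitmap;
    }

private:
    Rasterizer m_rasterize;
    std::map<std::pair<std::uint32_t, float>, std::shared_ptr<GlyphBitmap const>> m_bitmaps;
};
```

Since the renderer fixes the font size when the font is created, a real
cache could probably key on the glyph alone; the size is kept in the key
here only to keep the sketch general.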
Previously we would draw all text, no matter what font type, as
Liberation Serif, which resulted in things like ugly character spacing.
We now have partial support for drawing Type 1 glyphs, which are part of
a PostScript font program. We completely ignore hinting for now, which
results in ugly-looking characters at low resolutions, but gain support
for a large number of typefaces, including most of the default fonts
used in TeX.
A PDFFont can now be asked for its specific type and whether it is part
of the standard 14 fonts. It now also contains a method to draw a
glyph, which is stubbed-out for now.
This will be useful for the renderer to take into consideration when
drawing text, since we don't include replacements for the standard set
of fonts yet, but still want to make use of embedded fonts when
available.