1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-14 09:24:57 +00:00
Commit graph

29 commits

Author SHA1 Message Date
Nico Weber
83128d093e LibPDF: Implement most of the spec algorithm for picking TrueType glyphs
Non-CID-keyed fonts in PDFs have 8-bit codepoints which are mapped from
bytes to character names via encoding.

TrueType fonts don't index glyphs by name (Type1 fonts do), so the fix
(codified in the spec) was to make a list of all possible glyph names
and map those to (16-bit) unicode values, and then pass those into the
truetype cmap.

(As a fallback, we're supposed to look at the optional names in the
font's "post" table. That part isn't implemented here yet.)

(Note that this affects the behavior of fallback fonts for TrueType
fonts, but not yet fallback fonts for Type1 fonts, and neither the
behavior of the 14 built-in Type1 fonts (which we implement as
fallback fonts), since the TrueType fallback in Type1Font.cpp does
not use this algorithm yet. This will be fixed in a future patch.)
2024-02-25 15:15:20 +01:00
Nico Weber
9c762b9650 LibPDF+Meta: Use a CMYK ICC profile to convert CMYK to RGB
CMYK data describes which inks a printer should use to print a color.
If a screen should display a color that's supposed to look similar
to what the printer produces, it results in a color very different
to what Color::from_cmyk() produces. (It's also printer-dependent.)

There are many ICC profiles describing printing processes. It doesn't
matter too much which one we use -- most of them look somewhat
similar, and they all look dramatically better than Color::from_cmyk().

This patch adds a function to download a zip file that Adobe offers
on their web site. They even have a page for redistribution:
https://www.adobe.com/support/downloads/iccprofiles/icc_eula_win_dist.html

(That one leads to a broken download though, so this downloads the
end-user version.)

In case we have to move off this download at some point, there are also
a whole bunch of profiles at https://www.color.org/registry/index.xalter
that "may be used, embedded, exchanged, and shared without restriction".

The adobe zip contains a whole bunch of other useful and fun profiles,
so I went with it.

For now, this only unzips the USWebCoatedSWOP.icc file though, and
installs it in ${CMAKE_BINARY_DIR}/Root/res/icc/Adobe/CMYK/. In
Serenity builds, this will make it to /res/icc/Adobe/CMYK in the
disk image. And in lagom build, after #23016 this is the
lagom res staging directory that tools can install via
Core::ResourceImplementation. `pdf` and `MacPDF` already do that,
`TestPDF` now does it too.

The final piece is that LibPDF then loads the profile from there
and uses it for DeviceCMYK color conversions.

(Doing file access from the bowels of a library is a bit weird,
especially in a system that has sandboxing built in. But LibGfx does
that in FontDatabase too already, and LibPDF uses that, so it's not a
new problem.)
2024-02-01 13:42:04 -07:00
Timothy Flynn
aa0a6d58b2 Userland: Remove LibCore dependency from libraries that do not use it 2024-01-22 08:48:34 -05:00
Nico Weber
4cd1a2d319 LibPDF: Add some scaffolding for type 3 fonts 2023-11-17 19:47:53 +00:00
Nico Weber
9204252d02 LibPDF: Add scaffolding for function objects
See PDF 1.7 Spec, "3.9 Functions".
2023-11-06 10:01:05 +01:00
Nico Weber
69c965b987 LibPDF: Move code to compute full page contents into Page
Pure code move, no behavior change.
2023-07-12 18:22:35 -04:00
Rodrigo Tobar
cb04e4e9da LibPDF: Refactor *Font classes
The PDFFont class hierarchy was very simple (a top-level PDFFont class,
followed by all the children classes that derived directly from it).
While this design was good enough for some things, it didn't correctly
model the actual organization of font types:

 * PDF fonts are first divided between "simple" and "composite" fonts.
   The latter is the Type0 font, while the rest are all simple.
 * PDF fonts yield a glyph per "character code". Simple fonts char codes
   are always 1 byte long, while Type0 char codes are of variable size.

To this effect, this commit changes the hierarchy of Font classes,
introducing a new SimpleFont class, deriving from PDFFont, and acting as
the parent of Type1Font and TrueTypeFont, while Type0 still derives from
PDFFont directly. This distinction allows us now to:

 * Model string rendering differently from simple and composite fonts:
   PDFFont now offers a generic draw_string method that takes a whole
   string to be rendered instead of a single char code. SimpleFont
   implements this as a loop over individual bytes of the string, with
   T1 and TT implementing draw_glyph for drawing a single char code.
 * Some common fields between T1 and TT fonts now live under SimpleFont
   instead of under PDFfont, where they previously resided.
 * Some other interfaces specific to SimpleFont have been cleaned up,
   with u16/u32 not appearing on these classes (or in PDFFont) anymore.
 * Type0Font's rendering still remains unimplemented.

As part of this exercise I also took the chance to perform the following
cleanups and restructurings:

 * Refactored the creation and initialisation of fonts. They are all
   centrally created at PDFFont::create, with a virtual "initialize"
   method that allows them to initialise their inner members in the
   correct order (parent first, child later) after creation.
 * Removed duplicated code.
 * Cleaned up some public interfaces: receive const refs, removed
   unnecessary ctro/dtors, etc.
 * Slightly changed how Type1 and TrueType fonts are implemented: if
   there's an embedded font that takes priority, otherwise we always
   look for a replacement.
 * This means we don't do anything special for the standard fonts. The
   only behavior previously associated to standard fonts was choosing an
   encoding, and even that was under questioning.
2023-02-24 20:16:50 +01:00
Rodrigo Tobar
c4b45a82cd LibPDF: Add initial CFF parsing
The Compat Font Format specification (Adobe's Technical Note #5176) is
used by PDF's Type1C fonts to store their data. While being similar in
spirit to PS1 Type 1 Font Programs, it was designed for a more compact
representation and thus space reduction (but an increment on
complexity). It also shares most of the charstring encoding logic, which
is why the CFF class also inherits from Type1FontProgram.

This initial implementation is still lacking many details, e.g.:

 * It doesn't include all the built-in CFF SIDs
 * It doesn't support CFF-provided SIDs (defaults those glyphs to the
   space character)
 * More checks in general
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
416585f75a LibPDF: Add new Type1FontProgram base class
We are planning to add support for CFF fonts to read Type1 fonts, and
therefore much of the logic already found in PS1FontProgram will be
useful for representing the Type1 fonts read from CFF.

This commit moves the PS1-independent bits of PS1FontProgram into a new
Type1FontProgram base class that can be used as the base for CFF-based
Type1 fonts in the future. The Type1Font class uses this new type now
instead of storing a PS1FontProgram pointer. While doing this
refactoring I also took care of making some minor adjustments to the
PS1FontProgram API, namely:

 * Its create() method is static and returns a
   NonnullRefPtr<Type1FontProgram>.
 * Many (all?) of the parse_* methods are now static.
 * Added const where possible.

Notably, the Type1FontProgram also contains at the moment the code that
parses the CharString data from the PS1 program. This logic is very
similar in CFF files, so after some minor adjustments later on it should
be possible to reuse most of it.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
2331fe5e68 LibPDF: Add first interpolation methods
Interpolation is needed in more than one place, and I couldn't find a
central place where I could borrow a readily available interpolation
routine, so I've implemented the first simple interpolation object. More
will follow for more complex scenarios.
2022-12-10 10:49:03 +01:00
Tim Schumacher
ce2f1b845f Everywhere: Mark dependencies of most targets as PRIVATE
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.

Also included are changes to readd now missing dependencies to tools
that actually need them.
2022-11-01 14:49:09 +00:00
Tim Schumacher
7834e26ddb Everywhere: Explicitly link all binaries against the LibC target
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and feature missing symbols.

This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.

Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
2022-11-01 14:49:09 +00:00
Julian Offenhäuser
b14f0950a5 LibPDF: Add very basic support for Adobe Type 1 font rendering
Previously we would draw all text, no matter what font type, as
Liberation Serif, which results in things like ugly character spacing.

We now have partial support for drawing Type 1 glyphs, which are part of
a PostScript font program. We completely ignore hinting for now, which
results in ugly looking characters at low resolutions, but gain support
for a large number of typefaces, including most of the default fonts
used in TeX.
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
4887aacec7 LibPDF: Move document-specific parsing functionality into its own class
The Parser class is now a generic PDF object parser, of which the new
DocumentParser class derives. DocumentParser now takes over all
functions relating to linearization, pages, xref and trailer handling.

This allows the use of multiple parsers in the same document's
context, which will be needed in order to handle PDF object streams.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
9f4659cc63 LibPDF: Move consume and match helper functions to the Reader class 2022-09-17 10:07:14 +01:00
Matthew Olsson
4d0f74a15c LibPDF: Add Type0 and TrueType fonts 2022-03-31 18:10:45 +02:00
Matthew Olsson
5f9d35909d LibPDF: Move font files into their own directory 2022-03-31 18:10:45 +02:00
Matthew Olsson
5b316462b2 LibPDF: Add implementation of the Standard security handler
Security handlers manage encryption and decription of PDF files. The
standard security handler uses RC4/MD5 to perform its crypto (AES as
well, but that is not yet implemented).
2022-03-29 02:52:57 +02:00
Matthew Olsson
0624472768 LibPDF: Add initial support for Type1 fonts
This is enough to get a char code -> code point mapping
2022-03-29 02:52:57 +02:00
Matthew Olsson
8441fa2bc4 LibPDF: Add support for builtin and custom Encodings 2022-03-29 02:52:57 +02:00
Ben Wiederhake
edc0cd29f8 LibPDF: Break weird dependency cycle
Old situation:
Object.h defines Object
Object.h defines ArrayObject
ArrayObject requires the definition of Object
ArrayObject requires the definition of Value
Value.h defines Value
Value requires the definition of Object

Therefore, a file with the single line "#include <Value.h>" used to
raise compilation errors; certainly not something that one might expect
from a library.

This patch splits up the definitions in Object.h to break the cycle.
Now, Object.h only defines Object, Value.h still only defines Value (and
includes Object.h), and the new header ObjectDerivatives.h defines
ArrayObject (and includes both Object.h and Value.h).
2021-09-20 17:39:36 +04:30
Matthew Olsson
7b4e36bf88 LibPDF: Split ColorSpace into a different class for each color space
While unnecessary at the moment, this will allow for more fine-grained
control when complex color spaces get added.
2021-06-12 22:45:01 +04:30
Matthew Olsson
78f3bad7e6 LibPDF: Pre-initialize common FlyStrings in CommonNames.h 2021-05-25 00:24:09 +04:30
Matthew Olsson
67b65dffa8 LibPDF: Handle string encodings
Strings can be encoded in either UTF16-BE or UTF8. In either case,
there are a few initial bytes which specify the encoding that must
be checked and also removed from the final string.
2021-05-25 00:24:09 +04:30
Matthew Olsson
477e3946e5 LibPDF: Add support for stream filters
This commit also splits up StreamObject into PlainTextStreamObject and
EncodedStreamObject, which is essentially just a stream object which
does not own its bytes vs one which does.
2021-05-25 00:24:09 +04:30
Matthew Olsson
4479c1bff0 LibPDF: Add a bitmap renderer
This commit adds the Renderer class, which is responsible for rendering
a page into a Gfx::Bitmap. There are many improvements to make here,
but this is a great start!
2021-05-18 16:35:23 +02:00
Matthew Olsson
8c745ad0d9 LibPDF: Parse page structures
This commit introduces the ability to parse the document catalog dict,
as well as the page tree and individual pages. Pages obviously aren't
fully parsed, as we won't care about most of the fields until we
start actually rendering PDFs.

One of the primary benefits of the PDF format is laziness. PDFs are
not meant to be parsed all at once, and the same is true for pages.
When a Document is constructed, it builds a map of page number to
object index, but it does not fetch and parse any of the pages. A page
is only parsed when a caller requests that particular page (and is
cached going forwards).

Additionally, this commit also adds an object_cast function which
logs bad casts if DEBUG_PDF is set. Additionally, utility functions
were added to ArrayObject and DictObject to get all types of objects
from the collections to avoid having to manually cast.
2021-05-10 10:32:39 +02:00
Matthew Olsson
72f693e9ed LibPDF: Add a basic parser and Document structure
This commit adds a parser as well as the Reader class, which serves
as a utility to aid in reading the PDF both forwards and in reverse.
The parser currently is capable of reading xref tables, as well as
all values. We don't really do anything with any of this information,
however.
2021-05-10 10:32:39 +02:00
Matthew Olsson
a8f5b6aaa3 LibPDF: Create basic object structure
This commit is the start of LibPDF, and introduces some basic structure
objects. This emulates LibJS's Value structure, where Value is a simple
class that can contain a pointer to a more complex Object class with
more data. All of the basic PDF objects have a representation.
2021-05-10 10:32:39 +02:00