serenity

mirror of https://github.com/RGBCube/serenity synced 2025-05-14 09:24:57 +00:00

Author	SHA1	Message	Date
Nico Weber	83128d093e	LibPDF: Implement most of the spec algorithm for picking TrueType glyphs Non-CID-keyed fonts in PDFs have 8-bit codepoints which are mapped from bytes to character names via an encoding. TrueType fonts don't index glyphs by name (Type1 fonts do), so the fix (codified in the spec) was to make a list of all possible glyph names and map those to (16-bit) Unicode values, and then pass those into the TrueType cmap. (As a fallback, we're supposed to look at the optional names in the font's "post" table. That part isn't implemented here yet.) (Note that this affects the behavior of fallback fonts for TrueType fonts, but not yet that of fallback fonts for Type1 fonts, nor that of the 14 built-in Type1 fonts (which we implement as fallback fonts), since the TrueType fallback in Type1Font.cpp does not use this algorithm yet. This will be fixed in a future patch.)	2024-02-25 15:15:20 +01:00
Nico Weber	9c762b9650	LibPDF+Meta: Use a CMYK ICC profile to convert CMYK to RGB CMYK data describes which inks a printer should use to print a color. If a screen should display a color that's supposed to look similar to what the printer produces, it results in a color very different from what Color::from_cmyk() produces. (It's also printer-dependent.) There are many ICC profiles describing printing processes. It doesn't matter too much which one we use -- most of them look somewhat similar, and they all look dramatically better than Color::from_cmyk(). This patch adds a function to download a zip file that Adobe offers on their web site. They even have a page for redistribution: https://www.adobe.com/support/downloads/iccprofiles/icc_eula_win_dist.html (That one leads to a broken download though, so this downloads the end-user version.) In case we have to move off this download at some point, there are also a whole bunch of profiles at https://www.color.org/registry/index.xalter that "may be used, embedded, exchanged, and shared without restriction". The Adobe zip contains a whole bunch of other useful and fun profiles, so I went with it. For now, this only unzips the USWebCoatedSWOP.icc file though, and installs it in ${CMAKE_BINARY_DIR}/Root/res/icc/Adobe/CMYK/. In Serenity builds, this will make it to /res/icc/Adobe/CMYK in the disk image. And in Lagom builds, after #23016 this is the Lagom res staging directory that tools can install via Core::ResourceImplementation. `pdf` and `MacPDF` already do that, `TestPDF` now does it too. The final piece is that LibPDF then loads the profile from there and uses it for DeviceCMYK color conversions. (Doing file access from the bowels of a library is a bit weird, especially in a system that has sandboxing built in. But LibGfx does that in FontDatabase too already, and LibPDF uses that, so it's not a new problem.)	2024-02-01 13:42:04 -07:00
Timothy Flynn	aa0a6d58b2	Userland: Remove LibCore dependency from libraries that do not use it	2024-01-22 08:48:34 -05:00
Nico Weber	4cd1a2d319	LibPDF: Add some scaffolding for type 3 fonts	2023-11-17 19:47:53 +00:00
Nico Weber	9204252d02	LibPDF: Add scaffolding for function objects See PDF 1.7 Spec, "3.9 Functions".	2023-11-06 10:01:05 +01:00
Nico Weber	69c965b987	LibPDF: Move code to compute full page contents into Page Pure code move, no behavior change.	2023-07-12 18:22:35 -04:00
Rodrigo Tobar	cb04e4e9da	LibPDF: Refactor Font classes The PDFFont class hierarchy was very simple (a top-level PDFFont class, followed by all the children classes that derived directly from it). While this design was good enough for some things, it didn't correctly model the actual organization of font types: * PDF fonts are first divided between "simple" and "composite" fonts. The latter is the Type0 font, while the rest are all simple. * PDF fonts yield a glyph per "character code". Simple fonts' char codes are always 1 byte long, while Type0 char codes are of variable size. To this effect, this commit changes the hierarchy of Font classes, introducing a new SimpleFont class, deriving from PDFFont, and acting as the parent of Type1Font and TrueTypeFont, while Type0 still derives from PDFFont directly. This distinction allows us now to: * Model string rendering differently for simple and composite fonts: PDFFont now offers a generic draw_string method that takes a whole string to be rendered instead of a single char code. SimpleFont implements this as a loop over individual bytes of the string, with T1 and TT implementing draw_glyph for drawing a single char code. * Some common fields between T1 and TT fonts now live under SimpleFont instead of under PDFFont, where they previously resided. * Some other interfaces specific to SimpleFont have been cleaned up, with u16/u32 not appearing on these classes (or in PDFFont) anymore. * Type0Font's rendering still remains unimplemented. As part of this exercise I also took the chance to perform the following cleanups and restructurings: * Refactored the creation and initialisation of fonts. They are all centrally created at PDFFont::create, with a virtual "initialize" method that allows them to initialise their inner members in the correct order (parent first, child later) after creation. * Removed duplicated code. * Cleaned up some public interfaces: receive const refs, removed unnecessary ctors/dtors, etc. * Slightly changed how Type1 and TrueType fonts are implemented: if there's an embedded font, that takes priority; otherwise we always look for a replacement. * This means we don't do anything special for the standard fonts. The only behavior previously associated with standard fonts was choosing an encoding, and even that was in question.	2023-02-24 20:16:50 +01:00
Rodrigo Tobar	c4b45a82cd	LibPDF: Add initial CFF parsing The Compact Font Format specification (Adobe's Technical Note #5176) is used by PDF's Type1C fonts to store their data. While being similar in spirit to PS1 Type 1 Font Programs, it was designed for a more compact representation and thus space reduction (but an increase in complexity). It also shares most of the charstring encoding logic, which is why the CFF class also inherits from Type1FontProgram. This initial implementation is still lacking many details, e.g.: * It doesn't include all the built-in CFF SIDs * It doesn't support CFF-provided SIDs (defaults those glyphs to the space character) * More checks in general	2023-01-25 15:40:11 +01:00
Rodrigo Tobar	416585f75a	LibPDF: Add new Type1FontProgram base class We are planning to add support for CFF fonts to read Type1 fonts, and therefore much of the logic already found in PS1FontProgram will be useful for representing the Type1 fonts read from CFF. This commit moves the PS1-independent bits of PS1FontProgram into a new Type1FontProgram base class that can be used as the base for CFF-based Type1 fonts in the future. The Type1Font class uses this new type now instead of storing a PS1FontProgram pointer. While doing this refactoring I also took care of making some minor adjustments to the PS1FontProgram API, namely: * Its create() method is static and returns a NonnullRefPtr<Type1FontProgram>. * Many (all?) of the parse_* methods are now static. * Added const where possible. Notably, the Type1FontProgram also contains at the moment the code that parses the CharString data from the PS1 program. This logic is very similar in CFF files, so after some minor adjustments later on it should be possible to reuse most of it.	2023-01-25 15:40:11 +01:00
Rodrigo Tobar	2331fe5e68	LibPDF: Add first interpolation methods Interpolation is needed in more than one place, and I couldn't find a central place where I could borrow a readily available interpolation routine, so I've implemented the first simple interpolation object. More will follow for more complex scenarios.	2022-12-10 10:49:03 +01:00
Tim Schumacher	ce2f1b845f	Everywhere: Mark dependencies of most targets as PRIVATE Otherwise, we end up propagating those dependencies into targets that link against that library, which creates unnecessary link-time dependencies. Also included are changes to re-add now-missing dependencies to tools that actually need them.	2022-11-01 14:49:09 +00:00
Tim Schumacher	7834e26ddb	Everywhere: Explicitly link all binaries against the LibC target Even though the toolchain implicitly links against -lc, it does not know where it should get LibC from except for the sysroot. In the case of Clang this causes it to pick up the LibC stub instead, which might be slightly outdated and feature missing symbols. This is currently not an issue that manifests because we pass through the dependency on LibC and other libraries by accident, which causes CMake to link against the LibC target (instead of just the library), and thus points the linker at the build output directory. Since we are looking to fix that in the upcoming commits, let's make sure that everything will still be able to find the proper LibC first.	2022-11-01 14:49:09 +00:00
Julian Offenhäuser	b14f0950a5	LibPDF: Add very basic support for Adobe Type 1 font rendering Previously we would draw all text, no matter what font type, as Liberation Serif, which results in things like ugly character spacing. We now have partial support for drawing Type 1 glyphs, which are part of a PostScript font program. We completely ignore hinting for now, which results in ugly looking characters at low resolutions, but gain support for a large number of typefaces, including most of the default fonts used in TeX.	2022-10-16 17:44:54 +02:00
Julian Offenhäuser	4887aacec7	LibPDF: Move document-specific parsing functionality into its own class The Parser class is now a generic PDF object parser, from which the new DocumentParser class derives. DocumentParser now takes over all functions relating to linearization, pages, xref and trailer handling. This allows the use of multiple parsers in the same document's context, which will be needed in order to handle PDF object streams.	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	9f4659cc63	LibPDF: Move consume and match helper functions to the Reader class	2022-09-17 10:07:14 +01:00
Matthew Olsson	4d0f74a15c	LibPDF: Add Type0 and TrueType fonts	2022-03-31 18:10:45 +02:00
Matthew Olsson	5f9d35909d	LibPDF: Move font files into their own directory	2022-03-31 18:10:45 +02:00
Matthew Olsson	5b316462b2	LibPDF: Add implementation of the Standard security handler Security handlers manage encryption and decryption of PDF files. The standard security handler uses RC4/MD5 to perform its crypto (AES as well, but that is not yet implemented).	2022-03-29 02:52:57 +02:00
Matthew Olsson	0624472768	LibPDF: Add initial support for Type1 fonts This is enough to get a char code -> code point mapping	2022-03-29 02:52:57 +02:00
Matthew Olsson	8441fa2bc4	LibPDF: Add support for builtin and custom Encodings	2022-03-29 02:52:57 +02:00
Ben Wiederhake	edc0cd29f8	LibPDF: Break weird dependency cycle Old situation: Object.h defines Object Object.h defines ArrayObject ArrayObject requires the definition of Object ArrayObject requires the definition of Value Value.h defines Value Value requires the definition of Object Therefore, a file with the single line "#include <Value.h>" used to raise compilation errors; certainly not something that one might expect from a library. This patch splits up the definitions in Object.h to break the cycle. Now, Object.h only defines Object, Value.h still only defines Value (and includes Object.h), and the new header ObjectDerivatives.h defines ArrayObject (and includes both Object.h and Value.h).	2021-09-20 17:39:36 +04:30
Matthew Olsson	7b4e36bf88	LibPDF: Split ColorSpace into a different class for each color space While unnecessary at the moment, this will allow for more fine-grained control when complex color spaces get added.	2021-06-12 22:45:01 +04:30
Matthew Olsson	78f3bad7e6	LibPDF: Pre-initialize common FlyStrings in CommonNames.h	2021-05-25 00:24:09 +04:30
Matthew Olsson	67b65dffa8	LibPDF: Handle string encodings Strings can be encoded in either UTF-16BE or UTF-8. In either case, there are a few initial bytes which specify the encoding that must be checked and also removed from the final string.	2021-05-25 00:24:09 +04:30
Matthew Olsson	477e3946e5	LibPDF: Add support for stream filters This commit also splits up StreamObject into PlainTextStreamObject and EncodedStreamObject, which is essentially just a stream object which does not own its bytes vs one which does.	2021-05-25 00:24:09 +04:30
Matthew Olsson	4479c1bff0	LibPDF: Add a bitmap renderer This commit adds the Renderer class, which is responsible for rendering a page into a Gfx::Bitmap. There are many improvements to make here, but this is a great start!	2021-05-18 16:35:23 +02:00
Matthew Olsson	8c745ad0d9	LibPDF: Parse page structures This commit introduces the ability to parse the document catalog dict, as well as the page tree and individual pages. Pages obviously aren't fully parsed, as we won't care about most of the fields until we start actually rendering PDFs. One of the primary benefits of the PDF format is laziness. PDFs are not meant to be parsed all at once, and the same is true for pages. When a Document is constructed, it builds a map of page number to object index, but it does not fetch and parse any of the pages. A page is only parsed when a caller requests that particular page (and is cached from then on). Additionally, this commit adds an object_cast function which logs bad casts if DEBUG_PDF is set. Utility functions were also added to ArrayObject and DictObject to get all types of objects from the collections to avoid having to manually cast.	2021-05-10 10:32:39 +02:00
Matthew Olsson	72f693e9ed	LibPDF: Add a basic parser and Document structure This commit adds a parser as well as the Reader class, which serves as a utility to aid in reading the PDF both forwards and in reverse. The parser currently is capable of reading xref tables, as well as all values. We don't really do anything with any of this information, however.	2021-05-10 10:32:39 +02:00
Matthew Olsson	a8f5b6aaa3	LibPDF: Create basic object structure This commit is the start of LibPDF, and introduces some basic structure objects. This emulates LibJS's Value structure, where Value is a simple class that can contain a pointer to a more complex Object class with more data. All of the basic PDF objects have a representation.	2021-05-10 10:32:39 +02:00
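
The glyph lookup described in commit 83128d093e (byte, then glyph name via the encoding, then Unicode value, then the TrueType cmap) might be sketched roughly as follows. This is only an illustration under stated assumptions: the `Encoding`, `GlyphList` and `Cmap` aliases and the `pick_truetype_glyph` function are hypothetical stand-ins built on the C++ standard library, not LibPDF's actual classes or API, and the "post" table fallback is left out, as it is in the commit.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in tables; none of these are LibPDF types.
using Encoding = std::unordered_map<std::uint8_t, std::string>; // byte -> glyph name
using GlyphList = std::unordered_map<std::string, char16_t>;    // glyph name -> Unicode value
using Cmap = std::unordered_map<char16_t, std::uint16_t>;       // Unicode value -> glyph index

// Resolves an 8-bit character code to a TrueType glyph index,
// or returns std::nullopt if any of the three lookups misses.
std::optional<std::uint16_t> pick_truetype_glyph(std::uint8_t code,
    Encoding const& encoding, GlyphList const& glyph_list, Cmap const& cmap)
{
    // Step 1: the encoding maps the byte to a glyph name.
    auto name = encoding.find(code);
    if (name == encoding.end())
        return std::nullopt;

    // Step 2: the glyph name list maps the name to a 16-bit Unicode value.
    auto unicode = glyph_list.find(name->second);
    if (unicode == glyph_list.end())
        return std::nullopt; // a "post" table fallback would go here

    // Step 3: the font's cmap maps the Unicode value to a glyph index.
    auto glyph = cmap.find(unicode->second);
    if (glyph == cmap.end())
        return std::nullopt;
    return glyph->second;
}
```

A caller would build the three tables from the PDF encoding, a glyph name list and the font's cmap, then call `pick_truetype_glyph` once per byte of the string being drawn.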

29 commits
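
The string-encoding check described in commit 67b65dffa8 (inspect a few leading bytes, then drop them from the final string) could look something like the sketch below. The marker bytes are the standard byte order marks, FE FF for UTF-16BE and EF BB BF for UTF-8; the commit message itself does not spell them out. The `TextEncoding`, `MarkedString` and `strip_encoding_marker` names are hypothetical and are not taken from LibPDF.

```cpp
#include <string_view>

// Hypothetical names for illustration; not LibPDF's API.
enum class TextEncoding {
    Utf16BE,
    Utf8,
    Unmarked, // no recognized marker at the start of the string
};

struct MarkedString {
    TextEncoding encoding;
    std::string_view payload; // the input with any marker bytes removed
};

// Checks the leading bytes for an encoding marker and strips it.
MarkedString strip_encoding_marker(std::string_view bytes)
{
    constexpr std::string_view utf16be_marker { "\xFE\xFF", 2 };
    constexpr std::string_view utf8_marker { "\xEF\xBB\xBF", 3 };

    // substr() clamps the length, so short inputs are handled safely.
    if (bytes.substr(0, utf16be_marker.size()) == utf16be_marker)
        return { TextEncoding::Utf16BE, bytes.substr(utf16be_marker.size()) };
    if (bytes.substr(0, utf8_marker.size()) == utf8_marker)
        return { TextEncoding::Utf8, bytes.substr(utf8_marker.size()) };
    return { TextEncoding::Unmarked, bytes };
}
```

The returned payload is only a view into the original bytes; converting it to the final string (for example, decoding the UTF-16BE code units) is a separate step that this sketch does not attempt.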