1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-10-24 05:02:33 +00:00
Commit graph

238 commits

Author SHA1 Message Date
Sam Atkins
2db168acc1 LibTextCodec+Everywhere: Port Decoders to new Strings 2023-02-19 17:15:47 +01:00
Lucas CHOLLET
856d0202f2 LibGfx: Rename JPGLoader to JPEGLoader
The patch also contains modifications on several classes, functions or
files that are related to the `JPGLoader`.

Renaming include:
 - JPGLoader{.h, .cpp}
 - JPGImageDecoderPlugin
 - JPGLoadingContext
 - JPG_DEBUG
 - decode_jpg
 - FuzzJPGLoader.cpp
 - Few string literals or texts
2023-02-18 23:56:24 +01:00
Sam Atkins
d6075ef5b5 LibTextCodec+Everywhere: Make TextCodec::decoder_for() take a StringView
We don't need a full String/DeprecatedString inside this function, so we
might as well not force users to create one.
2023-02-15 12:48:26 -05:00
Rodrigo Tobar
c4507bb56e LibPDF: Add more built-in SIDs
The first iteration has enough SIDs to display simple documents, but
when trying more and more documents we started to need more of these
SIDs to be properly defined. This is a copy/paste exercise from the CFF
document, which is tedious, so it will continue in small drops.

This commit fills all the gaps until SID 228, which covers all the
ISOAdobe space, and should be enough for most use cases. Since this is a
continuous space starting at 0, we now use an Array instead of a Map to
store these names, which should be more performant. Also to simplify
things I've moved the Array out of the CFF class, making it a simpler
static variable, which allows us to use template type deduction.
2023-02-13 00:23:17 +00:00
Julian Offenhäuser
1f27c47973 LibPDF: Check for end of stream in Reader::matches_regular_character()
The way this was set up before, this function would return "true" if
the underlying stream had ended, which would cause us to try to read
past the end in some edge cases.
2023-02-12 10:55:37 +00:00
Julian Offenhäuser
a2b57dd188 LibPDF: Return an error if we fail to load a replacement font 2023-02-12 10:55:37 +00:00
Julian Offenhäuser
96064ec5af LibPDF: Allow filter DecodeParms array entries to be null
Filters will use the default values in this case.
2023-02-12 10:55:37 +00:00
Julian Offenhäuser
34350ee9e7 LibPDF: Allow reading documents with incremental updates
The PDF spec allows incremental changes of a document by appending a
new XRef table and file trailer to it. These will only contain the
changed objects and will point back to the previous change, forming an
arbitrarily long chain of XRef sections and file trailers.

Every one of those XRef sections may be encoded as an XRef stream as
well, in which case the trailer is part of the stream dictionary as
usual. To make this easier, I made it so every XRef table may "own" a
trailer. This means that the main file trailer is now part of the main
XRef table.
2023-02-12 10:55:37 +00:00
Julian Offenhäuser
4f4bd3793f LibPDF: Fix glyph sizing bug that caused incorrect spacing
When loading OpenType fonts, either as a replacement for the standard
14 fonts or an embedded one, we previously passed the font size as the
_point_ size to the loader class. The difference is quite subtle, being
that Gfx::ScaledFont uses the optional dpi parameter to convert the
input from inches to pixels.

This meant that our glyphs were exactly 1.333% too large, causing them
to overlap in places.
2023-02-10 15:37:51 +01:00
Julian Offenhäuser
152a8c5c43 LibPDF: Use more appropriate standard 14 replacement fonts
The mapping of standard font to replacement now looks like this:

Times New Roman -> Liberation Serif
Courier -> Liberation Mono
Helvetica, Arial -> Liberation Sans
2023-02-10 15:37:51 +01:00
MacDue
63b11030f0 Everywhere: Use ReadonlySpan<T> instead of Span<T const> 2023-02-08 19:15:45 +00:00
Rodrigo Tobar
e4a7606b81 LibPDF: Construct accented characters with Type1 seac command
The seac command provides the base and accented character that are
needed to create an accented character glyph. Storing these values is
all that was left to properly support these composed glyphs.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
3eaa27f53a LibPDF: Add infrastructure for accented character glyphs
Type1 accented character glyphs are composed of two other glyphs in the
same font: a base glyph and an accent glyph, given as char codes in the
standard encoding. These two glyphs are then composed together to form
the accented character.

This commit adds the data structures to hold the information for
accented characters, and also the routine that composes the final glyph
path out of the two individual components. All glyphs must have been
loaded by the time this composition takes place, and thus a new
protected consolidate_glyphs() routine has been added to perform this
calculation.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
11a9bfd4b6 LibPDF: Turn Glyph into a class
Glyph was a simple structure, but even now it's become more complex that
it was initially. Turning it into a class hides some of that complexity,
and make sit easier to understand to external eyes.

While doing this I also decided to remove the float + bool combo for
keeping track of the glyph's width, and replaced it with an Optional
instead.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
c084943457 LibPDF: Index Type1 glyphs by name, not char code
Storing glyphs indexed by char code in a Type1 Font Program binds a Font
Program instance to the particular Encoding that was used at Font
Program construction time. This makes it difficult to reuse Font Program
instances against different Encodings, which would be otherwise
possible.

This commit changes how we store the glyphs on Type1 Font Programs.
Instead of storing them on a map indexed by char code, the map is now
indexed by glyph name. In turn, when rendering a glyph we use the
Encoding object to turn the char code into a glyph name, which in turn
is used to index into the map of glyphs.

This is the first step towards reusability of Type1 Font Programs. It
also unlocks the ability to render glyphs that are described via the
"seac" command (standard encoding accented character), which requires
accessing the base and accent glyphs by name.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
596119cf3e LibPDF: Add placeholders for *flex Type2 commands
These should be implemented properly in the future, but for now we are
adding the as placeholders to avoid crashes.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
64bbe431b5 LibPDF: Add char_code -> name mapping function
We already keep both mappings internally, now it's time to actually use
it.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
82bd854d6f LibPDF: Account for other endings of PS1 Encoding array 2023-02-08 19:47:15 +01:00
Rodrigo Tobar
a533ea7ae6 LibPDF: Improve stream parsing
When parsing streams we rely on a /Length item being defined in the
stream's dictionary to know how much data comprises the stream. Its
value is usually a direct value, but it can be indirect. There was
however a contradiction in the code: the condition that allowed it to
read and use the /Length value required it to be a direct value, but the
actual code using the value would have worked with indirect ones. This
meant that indirect /Length values triggered the fallback, "manual"
stream parsing code.

On the other hand, this latter code was also buggy, because it relied on
the "endstream" keyword to appear on a separate line, which isn't always
the case.

This commit both fixes the bug in the manual stream parsing scenario,
while also allowing for indirect /Length values to be used to parse
streams more directly and avoid the manual approach. The main caveat to
this second change is that for a brief period of time the Document is
not able to resolve references (i.e., before the xref table itself is
not parsed). Any parsing happening before that (e..g, the linearization
dictionary) must therefore use the manual stream parsing approach.
2023-02-08 19:47:15 +01:00
Tim Schumacher
220fbcaa7e AK: Remove the fallible constructor from FixedMemoryStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
261d62438f AK: Remove the fallible constructor from LittleEndianInputBitStream 2023-02-08 17:44:32 +00:00
Rodrigo Tobar
82bac7e665 LibPDF: Fix clipping of painting operations
While the clipping logic was correct (current v/s new clipping path),
the clipping path contents weren't. This commit fixed that.

We calculate the clipping path in two places: when we set it to be the
whole page at graphics state creation time, and when we perform clipping
path intersection to calculate a new clipping path. The clipping path is
then used to limit painting by passing it to the painter (more
precisely, but passing its bounding box to the painter, as the latter
doesn't support arbitrary path clipping). For this last point the
clipping path must be in device coordinates.

There was however a mix of coordinate systems involved in the creation,
update and usage of the clipping path:

 * The initial values of the path (i.e., the whole page) were in user
   coordinates.
 * Clipping path intersection was performed against m_current_path,
   which is in device coordinates.
 * To perform the clipping operation, the current clipping path was
   assumed to be in user coordinates.

This mix resulted in the clipping not working correctly depending on the
zoom level at which one visualised a page.

This commit fixes the issue by always keeping track of the clipping path
in device coordinates. This means that the initial full-page contents
are now converted to device coordinates before putting them in the
graphics state, and that no mapping is performed when applied the
clipping to the painter.
2023-02-04 12:29:57 +01:00
Rodrigo Tobar
286e3e6872 LibPDF: Simplify Encoding to align with simple font requirements
All "Simple Fonts" in PDF (all but Type0 fonts) have the property that
glyphs are selected with single byte character codes. This means that
the Encoding objects should use u8 for representing these character
codes. Moreover, and as mentioned in a previous commit, there is no need
to store the unicode code point associated with a character (which was
in turn wrongly associated to a glyph).

This commit greatly simplifies the Encoding class. Namely it:

 * Removes the unnecessary CharDescriptor class.
 * Changes the internal maps to be u8 -> FlyString and vice-versa,
   effectively providing two-way lookups.
 * Adds a new method to set a two-way u8 -> FlyString mapping and uses
   it in all possible places.
 * Simplified the creation of Encoding objects.
 * Changes how the WinAnsi special treatment for bullet points is
   implemented.
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
fb0c3a9e18 LibPDF: Stop calculating code points for glyphs
When rendering text, a sequence of bytes corresponds to a glyph, but not
necessarily to a character. This misunderstanding permeated through the
Encoding through to the Font classes, which were all trying to calculate
such values. Moreover, this was done only to identify "space"
characters/glyphs, which were getting a special treatment (e.g., avoid
rendering). Spaces are not special though -- there might be fonts that
render something for them -- and thus should not be skipped
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
7c42d6c737 LibPDF: Fix ZapfDingbat's char codes
The initial values were fine, but those starting at 100 were wrong: they
are all octal values, but since they were missing an initial 0 they were
interpreted as decimals.
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
2f773b3c5c LibPDF: Stop storing unicode code points in Encoding
In PDF's fonts, encoding objects are used to translate bytes into fonts'
glyphs. Glyphs (in the fonts we currently support) organise their glyphs
in such a way that they are accessed by name, and thus encoding
translate between a byte sequence and a glyph name.

Note that an no point this translation includes a Unicode character, and
therefore assigning a character to a glyph in the Encoding object is the
wrong thing to do. Moreover, using the code point for this character
during the byte-sequence-to-glyph translation sequence is double-wrong.

This commit removes the characters associated to each translation in the
built-in Encoding objects. In order to keep commits short and sweet, I'm
currently simply removing the character from the enumeration, leaving
the old structure this information was held on intact. Instead, I'm
filling the "code_point" member with a zero, and filling both mappings
(which will be changed later on too) with the glyph name and the
associated char code.
2023-02-02 14:50:38 +01:00
Tim Schumacher
093cf428a3 AK: Move memory streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
2470dd3bb5 AK: Move bit streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
ae64b68717 AK: Deprecate the old AK::Stream
This also removes a few cases where the respective header wasn't
actually required to be included.
2023-01-29 19:16:44 -07:00
Sam Atkins
bc1504c794 LibPDF: Remove declarations for non-existent methods 2023-01-27 20:33:18 +00:00
Tim Schumacher
82a152b696 LibGfx: Remove try_ prefix from bitmap creation functions
Those don't have any non-try counterpart, so we might as well just omit
it.
2023-01-26 20:24:37 +00:00
Rodrigo Tobar
3588709986 LibPDF: Load Type1C fonts when found
Now that our CFF parser is working we can load Type1C fonts in PDF,
which are backed by a CFF stream.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
c4b45a82cd LibPDF: Add initial CFF parsing
The Compat Font Format specification (Adobe's Technical Note #5176) is
used by PDF's Type1C fonts to store their data. While being similar in
spirit to PS1 Type 1 Font Programs, it was designed for a more compact
representation and thus space reduction (but an increment on
complexity). It also shares most of the charstring encoding logic, which
is why the CFF class also inherits from Type1FontProgram.

This initial implementation is still lacking many details, e.g.:

 * It doesn't include all the built-in CFF SIDs
 * It doesn't support CFF-provided SIDs (defaults those glyphs to the
   space character)
 * More checks in general
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
1ec4ad5eb6 LibPDF: Add name -> char code conversion in Encoding
This is an operation that was already being done (sub-optimally) in
PS1FontProgram, so we are replacing that. We will use this during CFF
parsing too.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
c592b889bf LibPDF: Add Reader::try_read for easier error propagation
This will allow us to use TRY(reader.try_read) instead of having to
verify the result of reader.remaining() before calling read.read().
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
1b90ea7d3a LibPDF: Augment Type11FontProgram with Type2 capabilities
The Type1FontProgram logic was based on the Adobe Type 1 Font Format; in
particular, it implemented the CharStrings Dictionary section
(charstring decoding, and most commands). In the case of Type1, these
charstrings are read from a PS1 diciontary, with one entry per character
in the font's charset. This has served us well for Type1 font rendering.

When implementing Type1C font rendering, this wasn't enough. Type1C PDF
fonts are specified in embedded CFF (Compact Font File) streams, which
also contain a charstring dictionary with an entry for each character in
the font's charset. These entries can be slightly different from those
in a PS1 Font Program though: depending on a flag in the CFF, the
entries will be encoded either in the original charstring format from
the Adobe Type 1 Font Format, or in the "Type 2 Charstring Format"
(Adobe's Technical Note #1577). This new format is for the most part a
super-set of the original, with small differences, all in the name of
making the representation as compact as possible:

 * The glyph's width is not specified via a separate command; instead
   it's an optional additional argument to the first command of the
   charstring stream (and even then, it's only the *difference* to a
   nominal character width specified in the CFF).
 * The interpretation of a 4-byte number is different from Type 1: in
   Type 1 this is a 4-byte unsigned integer, whereas in Type 1 it's a
   fixed decimal with 16 bits of fractional part.
 * Many commands accept a variable set of arguments, so they can draw
   more than one line/curve on a single go. These are all
   retro-compatible with Type 1's commands.

All these changes are implemented in this patch in a
backwards-compatible way. To ensure Type 1/2 behavior is accessed, a new
parameter indicates which behavior is desired when decoding the
charstring stream.

I also took the chance to centralise some logic that was previously
duplicated across the parse_glyph function. Common lambdas capture the
logic for moving to, or drawing a line/curve to a given point and
updating the glyph state. Similarly, some command logic, including
reading parameters, are shared by several commands. Finally, I've
re-organised the cases in the main switch to group together related
commands.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
f06de0fa07 LibPDF: Remove unused member 2023-01-25 15:40:11 +01:00
Rodrigo Tobar
416585f75a LibPDF: Add new Type1FontProgram base class
We are planning to add support for CFF fonts to read Type1 fonts, and
therefore much of the logic already found in PS1FontProgram will be
useful for representing the Type1 fonts read from CFF.

This commit moves the PS1-independent bits of PS1FontProgram into a new
Type1FontProgram base class that can be used as the base for CFF-based
Type1 fonts in the future. The Type1Font class uses this new type now
instead of storing a PS1FontProgram pointer. While doing this
refactoring I also took care of making some minor adjustments to the
PS1FontProgram API, namely:

 * Its create() method is static and returns a
   NonnullRefPtr<Type1FontProgram>.
 * Many (all?) of the parse_* methods are now static.
 * Added const where possible.

Notably, the Type1FontProgram also contains at the moment the code that
parses the CharString data from the PS1 program. This logic is very
similar in CFF files, so after some minor adjustments later on it should
be possible to reuse most of it.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
e751ec2089 LibPDF: Avoid reading fields from moved-from Data object
This might not be an issue at the moment, but moved-from objects are
usually in a unspecifed but valid state, meaning that we shouldn't read
from them.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
bfeca4ebb3 LibPDF: Record base font name read from document
This will be useful for debugging, or if we later on want to show all
the fonts found in the document in an organised manner.
2023-01-25 15:40:11 +01:00
Tim Schumacher
b1bfeb391e LibPDF: Use Core::Stream to parse the page offset hint table 2023-01-21 00:45:33 +00:00
Liav A
57e19a7e56 LibGfx: Re-structure the whole initialization pattern for image decoders
When trying to figure out the correct implementation, we now have a very
strong distinction on plugins that are well suited for sniffing, and
plugins that need a MIME type to be chosen.

Instead of having multiple calls to non-static virtual sniff methods for
each Image decoding plugin, we have 2 static methods for each
implementation:
1. The sniff method, which in contrast to the old method, gets a
    ReadonlyBytes parameter and ensures we can figure out the result
    with zero heap allocations for most implementations.
2. The create method, which just creates a new instance so we don't
    expose the constructor to everyone anymore.

In addition to that, we have a new virtual method called initialize,
which has a per-implementation initialization pattern to actually ensure
each implementation can construct a decoder object, and then have a
correct context being applied to it for the actual decoding.
2023-01-20 15:13:31 +00:00
Timothy Flynn
f3db548a3d AK+Everywhere: Rename FlyString to DeprecatedFlyString
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
2023-01-09 23:00:24 +00:00
Julian Offenhäuser
2a70ea3ee7 LibPDF: Propagate errors in PDFFont::create() 2023-01-09 22:54:36 +00:00
Julian Offenhäuser
ac31b1bda3 LibPDF: Make glyphs from standard 14 fonts show up in Type1Font
Previously, we would assume that all standard 14 fonts use a
TrueTypeFont dictionary. Now we render them in Type1Font as well,
given that it doesn't contain a PostScript font program.
2023-01-09 22:54:36 +00:00
Julian Offenhäuser
a37f3390dc LibPDF: Allow numbers to start with whitespace 2023-01-09 22:54:36 +00:00
Rodrigo Tobar
a5620fd41f LibPDF: Load destinations from Catalogue -> Names -> Dests name tree
PDF allows for named destinations to be provided as string. These can be
either found in the Dests dictionary in the document catalogue (as
already implemented), or in the Name Tree specified by the Dests key in
the Names dictionary of the document catalogue (missing).

This commit adds this missing case. Once the named destination is found
in the name tree, its value is interpreted just like in the first case,
so a new utility method encapsulates the common behavior.
2023-01-06 18:06:41 +01:00
Rodrigo Tobar
5420261347 LibPDF: Implement name tree lookups
Name Trees are hierarchical, string-keyed, sorted-by-key dictionary
structures in PDF where each node (except the root) specifies the bounds
of the values it holds, and either its kids (more nodes) or the
key/value pairs it contains.

This commit implements a series of lookup calls for finding a key in
such name trees. This implementation follows the tree as needed on each
lookup, but if that becomes inefficient in the long run we can switch to
creating a HashMap with all the contents, which as a drawback will
require more memory.
2023-01-06 18:06:41 +01:00
Rodrigo Tobar
8c79f0e0cf LibPDF: Add more utility methods to {Dict,Array}Object
Being both of them containers, these classes already offered a set of
methods to retrieve an inner element by key or index, respectively, with
different methods for the different subtypes of the PDF::Object type
returning the element cast to the correct type pointer. On top of
that, DictObject offered an additional method to obtain an element as an
Object pointer.

While these methods were useful, they have some shortcomings:

 * They always take a Document pointer to first perform an object
   resolution, in case the element is a Reference. This is not always
   necessary though, as there are values that are always meant to be
   immediate, and hence the resolution lookup adds overhead.
 * There was no easy way to get an individual Object element from an
   ArrayObject like there is in DictObject. This makes it difficult to
   obtain such values, as one first needs to call dict.get() to get a
   Value, then cast it manually to a NonnullRefPtr<Object>.

This commit fixes these two issues by:

 * Adding a new method that returns an Object for a given index.
 * Adding overloads for this new method, and all the existing methods
   described above, that do *not* take a Document, and therefore do
   *not* perform an object resolution lookup.
2023-01-06 18:06:41 +01:00
Rodrigo Tobar
0e1c858f90 LibPDF: Move casting code to its own cast_to function
This functionality was previously part of the resolve_to() Document
method, and thus only available only when resolving objects through the
Document class. There are many use cases where this casting can be used,
but no resolution is needed.

This commit moves this functionality into a new cast_to function, and
makes the resolve_to function call it internally. With this new function
in place we can now offer new versions of DictObject::get_* and
ArrayObject::get_*_at that don't perform Document resolution
unnecessarily when not required.
2023-01-06 18:06:41 +01:00