When trying to figure out the correct implementation, we now have a very
strong distinction on plugins that are well suited for sniffing, and
plugins that need a MIME type to be chosen.
Instead of having multiple calls to non-static virtual sniff methods for
each Image decoding plugin, we have 2 static methods for each
implementation:
1. The sniff method, which in contrast to the old method, gets a
ReadonlyBytes parameter and ensures we can figure out the result
with zero heap allocations for most implementations.
2. The create method, which just creates a new instance so we don't
expose the constructor to everyone anymore.
In addition to that, we have a new virtual method called initialize,
which has a per-implementation initialization pattern to actually ensure
each implementation can construct a decoder object, and then have a
correct context being applied to it for the actual decoding.
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
Previously, we would assume that all standard 14 fonts use a
TrueTypeFont dictionary. Now we render them in Type1Font as well,
given that it doesn't contain a PostScript font program.
PDF allows for named destinations to be provided as string. These can be
either found in the Dests dictionary in the document catalogue (as
already implemented), or in the Name Tree specified by the Dests key in
the Names dictionary of the document catalogue (missing).
This commit adds this missing case. Once the named destination is found
in the name tree, its value is interpreted just like in the first case,
so a new utility method encapsulates the common behavior.
Name Trees are hierarchical, string-keyed, sorted-by-key dictionary
structures in PDF where each node (except the root) specifies the bounds
of the values it holds, and either its kids (more nodes) or the
key/value pairs it contains.
This commit implements a series of lookup calls for finding a key in
such name trees. This implementation follows the tree as needed on each
lookup, but if that becomes inefficient in the long run we can switch to
creating a HashMap with all the contents, which as a drawback will
require more memory.
Being both of them containers, these classes already offered a set of
methods to retrieve an inner element by key or index, respectively, with
different methods for the different subtypes of the PDF::Object type
returning the element cast to the correct type pointer. On top of
that, DictObject offered an additional method to obtain an element as an
Object pointer.
While these methods were useful, they have some shortcomings:
* They always take a Document pointer to first perform an object
resolution, in case the element is a Reference. This is not always
necessary though, as there are values that are always meant to be
immediate, and hence the resolution lookup adds overhead.
* There was no easy way to get an individual Object element from an
ArrayObject like there is in DictObject. This makes it difficult to
obtain such values, as one first needs to call dict.get() to get a
Value, then cast it manually to a NonnullRefPtr<Object>.
This commit fixes these two issues by:
* Adding a new method that returns an Object for a given index.
* Adding overloads for this new method, and all the existing methods
described above, that do *not* take a Document, and therefore do
*not* perform an object resolution lookup.
This functionality was previously part of the resolve_to() Document
method, and thus only available only when resolving objects through the
Document class. There are many use cases where this casting can be used,
but no resolution is needed.
This commit moves this functionality into a new cast_to function, and
makes the resolve_to function call it internally. With this new function
in place we can now offer new versions of DictObject::get_* and
ArrayObject::get_*_at that don't perform Document resolution
unnecessarily when not required.
Destination arrays contain a page number, a mode name, and parameters
specific to that mode. In many cases these parameters can be set to
"null", which our code wasn't taking into consideration.
This commit parses these parameters taking into account whether they are
null or actual numbers, and stores them as Optional<float> instead of
plain floats. The parameters are not yet used anywhere else other than
when formatting a Destination object, so the change is fairly small.
Before this patch, the generation of the encryption key was not working
correctly since the lifetime of the underlying data was too short,
same inputs would give random encryption keys.
Fixes#16668
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:
\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
When an attempt is made to provide the user password to a
SecurityHandler a user gets back a boolean result indicating success or
failure on the attempt. However, the SecurityHandler is left in a state
where it thinks it has a user password, regardless of the outcome of the
attempt. This confuses the rest of the system, which continues as if the
provided password is correct, resulting in garbled content.
This commit fixes the situation by resetting the internal fields holding
the encryption key (which is used to determine whether a user password
has been successfully provided) in case of a failed attempt.
With the StandardSecurityHandler the Length item in the Encryption
dictionary is optional, and needs to be given only if the encryption
algorithm (V) is other than 1; otherwise we can assume a length of 40
bits for the encryption key.
The Value previously stored corresponded to a Reference to a Page object
in the PDF document. This isn't useful information, since what we want
to display at the end of the day is the page an outline item refers to.
This commit changes the page member on OutlineItem to be a Optional<u32>
(some destinations don't necessarily refer to a Page), which we resolve
while building OutlineItems.
While OutlineItem had a parent field, it was never populated nor used.
This commit populates it when possible (no parent means the OutlineItem
is a top-level item).
Instead of calling TODO(), which will abort the program, we now return
an Error specifying that we haven't implemented the drawing operation
yet. This will now nicely trickle up all the way through to the
PDFViewer, which will then notify its clients about the problem.
The current rendering routine aborts as soon as an error is found during
rendering, which potentially severely limits the contents we show on
screen. Moreover, whenever an error happens the PDFViewer widget shows
an error dialog, and doesn't display the bitmap that has been painted so
far.
This commit improves the situation in both fronts, implementing
rendering now with a best-effort approach. Firstly, execution of
operations isn't halted after an operand results in an error, but
instead execution of all operations is always attempted, and all
collected errors are returned in bulk. Secondly, PDFViewer now always
displays the resulting bitmap, regardless of error being produced or
not. To communicate errors, an on_render_errors callback has been added
so clients can subscribe to these events and handle them as appropriate.
This will be used to perform a best-effort rendering, where an error in
rendering won't abort the whole rendering operation, but instead will be
stored for later reference while rendering continues.
The code parsing comments parsed only a single line of comments, but
callers assumed they parsed all comments that appeared contiguously in a
block. The latter is an easier to understand API, so this commit changes
the parse_comment function to parse entire blocks of comments instead of
single lines.
While the Outline Items making up the document's Outline have all sorts
of cross-references (parent, first/last chlid, next/previous sibling,
etc), not all documents out there have fully-consistent references. Our
implementation already discarded some of that information too (e.g.,
/Parent and /Prev were never read), and trusted that /First and /Next
were good enough to traverse the whole hierarchy.
Where the current implementation failed was in assuming that /Last was
also a good source of information. There are documents out there were
/Last also points to dead ends, and were therefore causing a crash when
we verified that the last child found on a chain was the /Last child
declared by the parent. To fix this I'm simply removing the check, and
simplifying the function call to remove any references to /Last. This
way we affirm our commitment to /First and /Next as the main sources of
information.
This command is meant to print an Standard Encoding Accented Character.
It's not critical to implement it yet, but if we want to render more
documents we need to handle the instruction, even if simply ignore it.
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.
std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
After adding support for XObject Form rendering, the next was to display
XObject images. This commit adds this initial support,
Images come in many shapes and forms: encodings: color spaces, bits per
component, width, height, etc. This initial support is constrained to
the color spaces we currently support, to images that use 8 bits per
component, to images that do *not* use the JPXDecode filter, and that
are not Masks. There are surely other constraints that aren't considered
in this initial support, so expect breakage here and there.
In addition to supporting images, we also support applying an alpha mask
(SMask) on them. Additionally, a new rendering preference allows to skip
image loading and rendering altogether, instead showing an empty
rectangle as a placeholder (useful for when actual images are not
supported). Since RenderingPreferences is becoming a bit more complex,
we add a hash option that will allow us to keep track of different
preferences (e.g., in a HashMap).
Interpolation is needed in more than one place, and I couldn't find a
central place where I could borrow a readily available interpolation
routine, so I've implemented the first simple interpolation object. More
will follow for more complex scenarios.
Arrays of float numbers are common in many PDF objects, and thus to
avoid code repetition I'm introducing a new method to ArrayObject that
will return exactly that.
ColorSpaces now can tell users how many components they expect, and the
default decode array that should be used when converting unit bit
sequences into color space component input values during image
rendering.
ColorSpaces can be specified in two ways: with a stream as operands of
the color space operations (CS/cs), or as a separate PDF object, which
is then referred to by other means (e.g., from Image XObjects and other
entities). These two modes of addressing a ColorSpace are slightly
different and need to be addressed separately. However, the current
implementation embedded the full logic of the first case in the routine
that created ColorSpace objects.
This commit refactors the creation of ColorSpace to support both cases.
First, a new ColorSpaceFamily class encapsulates the static aspects of a
family, like its name or whether color space construction never requires
parameters. Then we define the supported ColorSpaceFamily objects.
On top of this also sit a breakage on how ColorSpaces are created. Two
methods are now offered: one only providing construction of no-argument
color spaces (and thus taking a simple name), and another taking an
ArrayObject, hence used to create ColorSpaces requiring arguments.
Finally, on top of *that* two ways to get a color space in the Renderer
are made available: the first creates a ColorSpace with a name and a
Resources dictionary, and another takes an Object. These model the two
addressing modes described above.
Fonts with the encoding name "WinAnsiEncoding" should render missing
characters above character code 040 (octal) as a "bullet" character.
This patch adds Encoding::should_map_to_bullet(char_code) which is then
called by char_code_to_code_point() to check if the given char code
should be displayed as a bullet instead.
I didn't have a good way to test this, so I've only verified that it
works by manually overriding inputs to the function during the rendering
stage.
This takes care of a FIXME in the Annex D part of the PDF specification.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This implementation currently handles Form XObjects only, skipping
image XObjects. When rendering an XObject, its resources are passed to
the underlying operations so they use those instead of the Page's.
Operators usually assume that the resources its operations will require
will be the Page's. This assumption breaks however when XObjects with
their own resources come into the picture (and maybe other cases too).
In that case, the XObject's resources take precedence, but they should
also contain the Page's resources. Because of this, one can safely use
the XObject resources alone when given, and default to the Page's if
not.
This commit adds all operator calls an extra argument with optional
resources, which will be fed by XObjects as necessary.
Resources can come from other sources (e.g., XObjects), and since the
only attribute we are reading from Page are its resources it makes sense
to receive resources instead. That way we'll be able to pass down
arbitrary resources that are not necessarily declared at the page level.
Paths rendering was buggy because the map() function that translates
points from user space to bitmap space applied the vertical flip
conversion that the current transformation matrix already considers;
Hence, all paths were upside down. The only exception was the "re"
instruction, which manually adjusted the Y coordinate of its points to
be flipped again (and had a FIXME saying that this should be
unnecessary).
This commit fixes the map() function that maps userspace points to
bitmap coordinates. The "re" operator implementation has also been
simplified creating a rectangle first and mapping *that* instead of
mapping each point individually.
A new struct allows users to specify specific rendering preferences that
the Renderer class might use to paint some Document elements onto the
target bitmap. The first toggle allows rendering (or not) the clipping
paths on a page, which is useful for debugging.