1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-10-26 16:02:37 +00:00
Commit graph

20 commits

Author SHA1 Message Date
Julian Offenhäuser
fd78875662 LibPDF: Fix navigate_to_before_eof_marker() for PDFs not ending in EOL
The way this was factored before, we would miss the %%EOF marker if it
didn't have a valid end-of-line sequence after it.
2023-03-22 09:04:00 +01:00
Julian Offenhäuser
93062e2b78 LibPDF: Be more cautious of errors when looking for linearization dict
We would previously assume that, following the header, there must be a
valid PDF object that could be a linearization dict.

However, if the file is not linearized, this is not necessarily true.
We now try to detect if there even is an object, and don't treat
parsing errors as fatal.
2023-03-22 09:04:00 +01:00
Julian Offenhäuser
6c0f7d83bb LibPDF: Don't treat a broken document header as a fatal error
As the current goal is to make our best effort loading documents, we
might as well ignore a broken header and power through, giving the user
a warning.
2023-03-22 09:04:00 +01:00
Julian Offenhäuser
34350ee9e7 LibPDF: Allow reading documents with incremental updates
The PDF spec allows incremental changes of a document by appending a
new XRef table and file trailer to it. These will only contain the
changed objects and will point back to the previous change, forming an
arbitrarily long chain of XRef sections and file trailers.

Every one of those XRef sections may be encoded as an XRef stream as
well, in which case the trailer is part of the stream dictionary as
usual. To make this easier, I made it so every XRef table may "own" a
trailer. This means that the main file trailer is now part of the main
XRef table.
2023-02-12 10:55:37 +00:00
MacDue
63b11030f0 Everywhere: Use ReadonlySpan<T> instead of Span<T const> 2023-02-08 19:15:45 +00:00
Tim Schumacher
220fbcaa7e AK: Remove the fallible constructor from FixedMemoryStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
261d62438f AK: Remove the fallible constructor from LittleEndianInputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
093cf428a3 AK: Move memory streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
2470dd3bb5 AK: Move bit streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
ae64b68717 AK: Deprecate the old AK::Stream
This also removes a few cases where the respective header wasn't
actually required to be included.
2023-01-29 19:16:44 -07:00
Tim Schumacher
b1bfeb391e LibPDF: Use Core::Stream to parse the page offset hint table 2023-01-21 00:45:33 +00:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Julian Offenhäuser
d1bc89e30b LibPDF: Try to repair XRef tables with broken indices
An XRef table usually starts with an object number of zero. While it
could technically start at any other number, this is a tell-tale sign
of a broken table.

For the "broken" documents I encountered, this always meant that some
objects must have been removed from the start of the table, without
updating the following indices. When this is the case, the document is
not able to be read normally.

However, most other PDF parsers seem to know of this quirk and fix the
XRef table automatically.

Likewise, we now check for this exact case, and if it matches up with
what we expect, we update the XRef table such that all object numbers
match the actual objects found in the file again.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
4b1a72ff7a LibPDF: Fix loop condition in parse_xref_stream()
We previously compared two unrelated values to determine if we parsed
the xref table to completion. We now check if we added every subsection
instead, and double check to make sure we never read past the end.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
a17a23a3f0 LibPDF: Make some variable names in parse_xref_stream() more clear
I found these to be a bit misleading.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
77f5f7a6f4 LibPDF: Support parsing page tree nodes that are in object streams
conditionally_parse_page_tree_node used to assume that the xref table
contained a byte offset, even for compressed objects. It now uses the
common facilities for parsing objects, at the expense of some
performance.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
563d91b6c4 LibPDF: Implement loading compressed objects from object streams
Now, whenever the xref table points to a compressed object,
parse_object_with_index will look it up in the corresponding object
stream as if it were a regular object.

With this, our parser gains the bare minimum support for xref streams.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
f9beff7b5e LibPDF: Initial work on parsing xref streams
Since PDF version 1.5, a document may omit the xref table in favor of
a new kind of xref stream object. This is used to reference so-called
"compressed" objects that are part of an object stream.

With this patch we are able to parse this new kind of xref object, but
we'll have to implement object streams to use them correctly.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
4887aacec7 LibPDF: Move document-specific parsing functionality into its own class
The Parser class is now a generic PDF object parser, of which the new
DocumentParser class derives. DocumentParser now takes over all
functions relating to linearization, pages, xref and trailer handling.

This allows the use of multiple parsers in the same document's
context, which will be needed in order to handle PDF object streams.
2022-09-17 10:07:14 +01:00