mirror of https://github.com/RGBCube/serenity synced 2025-07-28 19:17:44 +00:00

LibPDF: Parse page structures

This commit introduces the ability to parse the document catalog dict,
as well as the page tree and individual pages. Pages aren't fully
parsed yet, as we won't care about most of their fields until we
start actually rendering PDFs.

One of the primary benefits of the PDF format is laziness. PDFs are
not meant to be parsed all at once, and the same is true for pages.
When a Document is constructed, it builds a map of page number to
object index, but it does not fetch and parse any of the pages. A page
is only parsed when a caller requests that particular page, and the
parsed page is cached from then on.
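
The lazy lookup described above can be sketched roughly as follows. This is
a minimal illustration, not the LibPDF code: LazyDocument, Page, parse_page,
and parse_count are hypothetical names, and the standard library stands in
for the AK containers used in Serenity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed page; not the LibPDF type.
struct Page {
    uint32_t object_index;
};

// Hypothetical document that parses pages only on demand.
class LazyDocument {
public:
    // Construction only records which object holds each page.
    explicit LazyDocument(std::vector<uint32_t> page_object_indices)
        : m_page_object_indices(std::move(page_object_indices))
    {
    }

    size_t page_count() const { return m_page_object_indices.size(); }

    // Number of times a page has actually been parsed (for illustration).
    size_t parse_count() const { return m_parse_count; }

    // Parses the page on first request, then serves it from the cache.
    std::shared_ptr<Page> get_page(size_t page_number)
    {
        assert(page_number < m_page_object_indices.size());

        auto cached = m_pages.find(page_number);
        if (cached != m_pages.end())
            return cached->second;

        auto page = parse_page(m_page_object_indices[page_number]);
        m_pages.emplace(page_number, page);
        return page;
    }

private:
    // Placeholder for fetching and parsing the page object.
    std::shared_ptr<Page> parse_page(uint32_t object_index)
    {
        ++m_parse_count;
        return std::make_shared<Page>(Page { object_index });
    }

    std::vector<uint32_t> m_page_object_indices;
    std::unordered_map<size_t, std::shared_ptr<Page>> m_pages;
    size_t m_parse_count { 0 };
};
```

Requesting the same page twice returns the same cached object, and pages
that are never requested are never parsed.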

This commit also adds an object_cast function which logs bad casts
if DEBUG_PDF is set. In addition, utility functions were added to
ArrayObject and DictObject that fetch an element as a specific object
type, so callers do not have to cast manually.
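
A checked cast that logs failures might look something like the sketch
below. Everything here is an assumption made for illustration: the names
object_cast_sketch and DEBUG_PDF_SKETCH, the simplified object hierarchy,
and the choice to return a null pointer on a bad cast are not taken from
the commit, which only states that bad casts are logged when DEBUG_PDF is
set.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Hypothetical debug switch standing in for DEBUG_PDF.
#ifndef DEBUG_PDF_SKETCH
#    define DEBUG_PDF_SKETCH 1
#endif

// Simplified, hypothetical object hierarchy; not the LibPDF classes.
struct Object {
    virtual ~Object() = default;
    virtual const char* type_name() const = 0;
};

struct NameObject : Object {
    const char* type_name() const override { return "name"; }
};

struct DictObject : Object {
    const char* type_name() const override { return "dict"; }
};

// Casts to T and, when the debug switch is on, logs a failed cast.
// Returning nullptr on failure is an assumption of this sketch.
template<typename T>
std::shared_ptr<T> object_cast_sketch(const std::shared_ptr<Object>& object)
{
    auto result = std::dynamic_pointer_cast<T>(object);
    if (DEBUG_PDF_SKETCH && object && !result)
        std::fprintf(stderr, "bad cast from object of type '%s'\n", object->type_name());
    return result;
}
```

A typed getter on a collection could then wrap such a cast so that callers
ask for, say, a dict directly instead of casting the element themselves.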
Matthew Olsson 2021-05-08 14:57:49 -07:00 committed by Andreas Kling
parent 72f693e9ed
commit 8c745ad0d9
11 changed files with 320 additions and 6 deletions


@@ -27,6 +27,8 @@ public:
};
XRefTableAndTrailer parse_last_xref_table_and_trailer();
NonnullRefPtr<IndirectValue> parse_indirect_value_at_offset(size_t offset);
private:
bool parse_header();
XRefTable parse_xref_table();
@@ -48,6 +50,7 @@ private:
Value parse_value();
Value parse_possible_indirect_value_or_ref();
NonnullRefPtr<IndirectValue> parse_indirect_value(int index, int generation);
NonnullRefPtr<IndirectValue> parse_indirect_value();
Value parse_number();
NonnullRefPtr<NameObject> parse_name();
NonnullRefPtr<StringObject> parse_string();
@@ -60,6 +63,8 @@ private:
bool matches_eol() const;
bool matches_whitespace() const;
bool matches_number() const;
bool matches_delimiter() const;
bool matches_regular_character() const;
void consume_eol();
bool consume_whitespace();