With this, looking at page 2 of pdf_reference_1-7.pdf no longer crashes.
Why did it crash in the first place? Because due to this bug, CFF.cpp
failed to parse the font program for the font used to render the `®`
character. `Renderer::render()` adds all errors that are encounterd
to an `errors` object but continues rendering. That meant that the
previous font was still active, and that didn't have a width for that
symbol in its width table.
SimpleFont::draw_string() falls back to get_glyph_width() if there's
no entry for a character for a symbol. `Type1Font::get_glyph_width()`
always dereferences `m_font` in that method, even if the font has
a font program (and m_font is hence nullptr).
With the off-by-one fixed, the second font is successfully installed
as current font, and the second font has a width entry for that symbol,
so the problem no longer occurs.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This is a big step, as most PDFs which are downloaded online will be
linearized. Pretty much the only difference is that the xref structure
is slightly different.
This commit introduces the ability to parse the document catalog dict,
as well as the page tree and individual pages. Pages obviously aren't
fully parsed, as we won't care about most of the fields until we
start actually rendering PDFs.
One of the primary benefits of the PDF format is laziness. PDFs are
not meant to be parsed all at once, and the same is true for pages.
When a Document is constructed, it builds a map of page number to
object index, but it does not fetch and parse any of the pages. A page
is only parsed when a caller requests that particular page (and is
cached going forwards).
Additionally, this commit also adds an object_cast function which
logs bad casts if DEBUG_PDF is set. Additionally, utility functions
were added to ArrayObject and DictObject to get all types of objects
from the collections to avoid having to manually cast.
This commit adds a parser as well as the Reader class, which serves
as a utility to aid in reading the PDF both forwards and in reverse.
The parser currently is capable of reading xref tables, as well as
all values. We don't really do anything with any of this information,
however.