LibPDF: Tolerate trailing whitespace after %%EOF marker


At first I tried implementing the quirk from PDF 1.7 Appendix H,
3.4.4, "File Trailer": "Acrobat viewers require only that the %%EOF
marker appear somewhere within the last 1024 bytes of the file."
This would've been like #22548, but at end-of-file instead of at
start-of-file.

This helped a bunch of files, but it also broke a bunch of files that
had more than 1024 bytes of stuff at the end, and it wouldn't have
helped 0000059.pdf, which has over 40 kB of \0 bytes after the %%EOF.
So just tolerate whitespace after the %%EOF line, and keep ignoring
an arbitrary amount of other stuff after that, like before.

This helps:
* 0000599.pdf
  One trailing \0 byte after %%EOF. Due to that byte, the
  is_linearized() check fails and we go down the non-linearized
  codepath. But with this fix, that code path succeeds.
* 0000937.pdf
  Same.
* 0000055.pdf
  Has one space followed by a \n after %%EOF
* 0000059.pdf
  Has over 40kB of trailing \0 bytes

The following files keep working with it:
* 0000242.pdf
  5586 bytes of trailing HTML
* 0000336.pdf
  5586 bytes of trailing HTML fragment
* 0000136.pdf
  2054 bytes of trailing space characters
  This one kind of only worked by accident before, since it found
  the %%EOF block before the final %%EOF block. Maybe this is
  even an intentional XRefStm compat hack? Anyway, now it
  finds the final block instead.
* 0000327.pdf
  11044 bytes of trailing HTML


Nico Weber, 2024-01-03 17:56:16 -05:00, committed by Andreas Kling
parent 2d12647e29
commit 9d69c5d434

1 changed file with 1 addition and 0 deletions

Userland/Libraries/LibPDF/DocumentParser.cpp

@@ -726,6 +726,7 @@ bool DocumentParser::navigate_to_before_eof_marker()

    while (!m_reader.done()) {
        m_reader.consume_eol();
        m_reader.consume_whitespace();
        if (m_reader.matches("%%EOF")) {
            m_reader.move_by(5);
            return true;
