From 9d69c5d434ffe6466c135748e87c8ebc735c4580 Mon Sep 17 00:00:00 2001
From: Nico Weber
Date: Wed, 3 Jan 2024 17:56:16 -0500
Subject: [PATCH] LibPDF: Tolerate trailing whitespace after %%EOF marker

At first I tried implementing the quirk from PDF 1.7 Appendix H,
3.4.4, "File Trailer":

    "Acrobat viewers require only that the %%EOF marker appear
    somewhere within the last 1024 bytes of the file."

This would've been like #22548 but at end-of-file instead of at
start-of-file.

This helped a bunch of files, but also broke a bunch of files that
had more than 1024 bytes of stuff at the end, and it wouldn't have
helped 0000059.pdf, which has over 40k of \0 bytes after the %%EOF.

So just tolerate whitespace after the %%EOF line, and keep ignoring
an arbitrary amount of other stuff after that like before.

This helps:

* 0000599.pdf
  One trailing \0 byte after %%EOF. Due to that byte, the
  is_linearized() check fails and we go down the non-linearized
  codepath. But with this fix, that code path succeeds.
* 0000937.pdf
  Same.
* 0000055.pdf
  Has one space followed by a \n after %%EOF.
* 0000059.pdf
  Has over 40kB of trailing \0 bytes.

The following files keep working with it:

* 0000242.pdf
  5586 bytes of trailing HTML.
* 0000336.pdf
  5586 bytes of trailing HTML fragment.
* 0000136.pdf
  2054 bytes of trailing space characters.
  This one kind of only worked by accident before, since it found
  the %%EOF block before the final %%EOF block. Maybe this is even
  an intentional XRefStm compat hack? Anyway, now it finds the
  final block instead.
* 0000327.pdf
  11044 bytes of trailing HTML.
---
 Userland/Libraries/LibPDF/DocumentParser.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Userland/Libraries/LibPDF/DocumentParser.cpp b/Userland/Libraries/LibPDF/DocumentParser.cpp
index 0c07e563b1..92ae52db1e 100644
--- a/Userland/Libraries/LibPDF/DocumentParser.cpp
+++ b/Userland/Libraries/LibPDF/DocumentParser.cpp
@@ -726,6 +726,7 @@ bool DocumentParser::navigate_to_before_eof_marker()
     while (!m_reader.done()) {
         m_reader.consume_eol();
+        m_reader.consume_whitespace();
         if (m_reader.matches("%%EOF")) {
             m_reader.move_by(5);
             return true;
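The patch itself is a single line, but the scanning behaviour it changes is easier to see in isolation. What follows is a minimal standalone C++ sketch, not LibPDF code. `find_eof_marker`, `is_pdf_whitespace` and `is_eol` are hypothetical names. The sketch works on a `std::string_view` instead of LibPDF's reader. It assumes the backwards, line-by-line scan implied by the hunk (`move_by(5)` landing *before* the marker), and it assumes the white-space set from the PDF specification, which includes NUL, so trailing `\0` bytes count as white space.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// PDF white-space characters (PDF 1.7, 7.2.2): NUL, HT, LF, FF, CR, SP.
// Assumption: the reader's whitespace check uses this same set.
constexpr bool is_pdf_whitespace(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

constexpr bool is_eol(char c)
{
    return c == '\n' || c == '\r';
}

// Hypothetical helper: scans backwards from the end of `data`, one line at a
// time, and returns the offset of the last "%%EOF" that is followed only by
// white space on its line. Lines ending in anything else (e.g. trailing HTML)
// are skipped, however many of them there are.
std::optional<std::size_t> find_eof_marker(std::string_view data)
{
    constexpr std::string_view marker = "%%EOF";
    std::size_t end = data.size(); // one past the last byte still to examine

    while (end > 0) {
        // Skip EOLs and any other white space (spaces, NULs, ...) backwards.
        while (end > 0 && is_pdf_whitespace(data[end - 1]))
            --end;

        // Does the remaining text end with the marker?
        if (end >= marker.size() && data.substr(end - marker.size(), marker.size()) == marker) {
            std::size_t pos = end - marker.size();
            assert(pos + marker.size() <= data.size()); // sanity check on the offset
            return pos;
        }

        // Not a marker: move back over the rest of this line, stopping just
        // after the previous EOL (the next iteration skips that EOL).
        while (end > 0 && !is_eol(data[end - 1]))
            --end;
    }
    return std::nullopt;
}
```

The first inner loop plays the role of the EOL skip plus the added `consume_whitespace()` call. Before the patch only `consume_eol()` ran at that point, so a trailing space or `\0` on the `%%EOF` line was enough to hide the marker. The last loop keeps the old tolerance for arbitrary trailing lines, such as the HTML in 0000242.pdf.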