Mirror of https://github.com/RGBCube/serenity (synced 2025-07-25 06:37:43 +00:00)
LibPDF: Tolerate trailing whitespace after %%EOF marker

At first I tried implementing the quirk from PDF 1.7 Appendix H,
3.4.4, "File Trailer": "Acrobat viewers require only that the
%%EOF marker appear somewhere within the last 1024 bytes of the file."

This would've been like #22548 but at end-of-file instead of at
start-of-file.

This helped a bunch of files, but also broke a bunch of files that
had more than 1024 bytes of stuff at the end, and it wouldn't have
helped 0000059.pdf, which has over 40k of \0 bytes after the %%EOF.

So just tolerate whitespace after the %%EOF line, and keep ignoring
an arbitrary amount of other stuff after that like before.

This helps:
* 0000599.pdf
  One trailing \0 byte after %%EOF. Due to that byte,
  the is_linearized() check fails and we go down the
  non-linearized codepath. But with this fix, that code
  path succeeds.
* 0000937.pdf
  Same.
* 0000055.pdf
  Has one space followed by a \n after %%EOF
* 0000059.pdf
  Has over 40kB of trailing \0 bytes

The following files keep working with it:
* 0000242.pdf
  5586 bytes of trailing HTML
* 0000336.pdf
  5586 bytes of trailing HTML fragment
* 0000136.pdf
  2054 bytes of trailing space characters
  This one kind of only worked by accident before since it
  found the %%EOF block before the final %%EOF block.
  Maybe this is even an intentional XRefStm compat hack?
  Anyways, now it finds the final block instead.
* 0000327.pdf
  11044 bytes of trailing HTML
parent 2d12647e29
commit 9d69c5d434
1 changed file with 1 addition and 0 deletions
@@ -726,6 +726,7 @@ bool DocumentParser::navigate_to_before_eof_marker()

     while (!m_reader.done()) {
         m_reader.consume_eol();
+        m_reader.consume_whitespace();
         if (m_reader.matches("%%EOF")) {
             m_reader.move_by(5);
             return true;