mirror of https://github.com/RGBCube/serenity synced 2025-07-25 09:47:35 +00:00
serenity/Userland/Libraries/LibPDF
Nico Weber 9d69c5d434 LibPDF: Tolerate trailing whitespace after %%EOF marker
At first I tried implementing the quirk from PDF 1.7 Appendix H,
3.4.4, "File Trailer": """Acrobat viewers require only that the %%EOF
marker appear somewhere within the last 1024 bytes of the file."""
This would've been like #22548 but at end-of-file instead of at
start-of-file.

This helped a bunch of files, but also broke a bunch of files that
had more than 1024 bytes of stuff at the end, and it wouldn't have
helped 0000059.pdf, which has over 40k of \0 bytes after the %%EOF.
So just tolerate whitespace after the %%EOF line, and keep ignoring
an arbitrary amount of other stuff after that like before.

This helps:
* 0000599.pdf
  One trailing \0 byte after %%EOF. Due to that byte, the
  is_linearized() check fails and we go down the non-linearized
  codepath. But with this fix, that code path succeeds.
* 0000937.pdf
  Same.
* 0000055.pdf
  Has one space followed by a \n after %%EOF
* 0000059.pdf
  Has over 40kB of trailing \0 bytes

The following files keep working with it:
* 0000242.pdf
  5586 bytes of trailing HTML
* 0000336.pdf
  5586 bytes of trailing HTML fragment
* 0000136.pdf
  2054 bytes of trailing space characters
  This one kind of only worked by accident before since it found
  the %%EOF block before the final %%EOF block. Maybe this is
  even an intentional XRefStm compat hack? Anyways, now it
  finds the final block instead.
* 0000327.pdf
  11044 bytes of trailing HTML
2024-01-04 11:19:15 +01:00
..
Fonts Everywhere: Use to_number<T> instead of to_{int,uint,float,double} 2023-12-23 20:41:07 +01:00
CMakeLists.txt LibPDF: Add some scaffolding for type 3 fonts 2023-11-17 19:47:53 +00:00
ColorSpace.cpp Revert "LibPDF: Add basic tiled, coloured pattern rendering" 2023-12-21 19:24:56 +01:00
ColorSpace.h Revert "LibPDF: Add basic tiled, coloured pattern rendering" 2023-12-21 19:24:56 +01:00
CommonNames.cpp AK+Everywhere: Rename FlyString to DeprecatedFlyString 2023-01-09 23:00:24 +00:00
CommonNames.h LibPDF: Show a 'rendering unsupported' error for images with /Mask key 2023-12-23 20:39:11 +01:00
Document.cpp LibPDF: Scan for PDF file start in first 1024 bytes 2024-01-03 10:12:35 +01:00
Document.h LibPDF: Draw inline images 2023-12-20 12:45:16 -07:00
DocumentParser.cpp LibPDF: Tolerate trailing whitespace after %%EOF marker 2024-01-04 11:19:15 +01:00
DocumentParser.h LibPDF: Scan for PDF file start in first 1024 bytes 2024-01-03 10:12:35 +01:00
Encoding.cpp LibPDF: Add a FIXME and a spec comment to Encoding::from_object() 2024-01-04 10:12:11 +01:00
Encoding.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Encryption.cpp Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Encryption.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Error.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Filter.cpp LibGfx+LibPDF: Use LibCompress' implementation of the PackBits decoder 2023-12-27 17:40:11 +01:00
Filter.h LibPDF: Let decode_png_prediction() call LibGfx's unfilter_scanline() 2023-11-17 19:09:50 +01:00
Forward.h Everywhere: Remove unused includes of AK/StdLibExtras.h 2023-01-02 20:27:20 -05:00
Function.cpp LibPDF: Do less work in SampledFunction::evaluate()'s inner loop 2023-12-02 22:26:13 +01:00
Function.h LibPDF: Add scaffolding for function objects 2023-11-06 10:01:05 +01:00
Interpolation.cpp LibPDF: Add first interpolation methods 2022-12-10 10:49:03 +01:00
Interpolation.h LibPDF: Add first interpolation methods 2022-12-10 10:49:03 +01:00
Object.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
ObjectDerivatives.cpp Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
ObjectDerivatives.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Operator.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Page.cpp LibPDF: Don't accidentally form new tokens on pages with contents arrays 2023-10-23 13:23:54 -04:00
Page.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Parser.cpp LibPDF: Improve hex string parsing 2024-01-02 22:13:21 +01:00
Parser.h LibPDF: Extract Parser::parse_inline_image() 2023-12-22 10:58:54 +01:00
Reader.cpp LibPDF: Add Reader::consume_non_eol_whitespace() 2024-01-04 10:14:30 +01:00
Reader.h LibPDF: Add Reader::consume_non_eol_whitespace() 2024-01-04 10:14:30 +01:00
Reference.h LibPDF: Make Reference store two u32s instead of one 2023-07-10 17:48:15 +01:00
Renderer.cpp LibPDF: Implement /Mask support with stream object argument 2023-12-23 20:39:11 +01:00
Renderer.h LibPDF: Move error for /ImageMask out of load_image() 2023-12-23 20:39:11 +01:00
Value.cpp Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Value.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
XRefTable.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30