LibPDF: Improve hex string parsing

A local (non-public) PDF I have lying around contains this in a page's operator stream:

```
[<00b4003e> 3 <002600480051> 3 <005700550044004f0003> -29 <00330044> 3 <0055> -3 <004e0040> 4 <0003> -29 <004c00560003> -31 <0057004b> 4 <00480003> -37 <0050
>] TJ
```

That is, there's a newline in a hexstring after a character. This led to `Parser error at offset 5184: Unexpected character`.

The spec says in 3.2.3 String Objects, Hexadecimal Strings:

"""Each pair of hexadecimal digits defines one byte of the string. White-space characters (such as space, tab, carriage return, line feed, and form feed) are ignored."""

But we didn't ignore whitespace before or after a character, only in between the bytes.

The spec also says:

"""If the final digit of a hexadecimal string is missing—that is, if there is an odd number of digits—the final digit is assumed to be 0."""

In that case, we were skipping the closing `>` twice -- or, more accurately, we ignored the character after it too. This has been wrong all the way back in #6974.

Add a test that fails if either of the two changes isn't present.
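
For illustration, the two spec rules quoted above can be sketched as a small standalone decoder. This is not the LibPDF code: `decode_hex_string_body`, `is_hex_string_whitespace` and `hex_digit_value` are hypothetical names, the input is assumed to start just after the opening `<`, and only the five whitespace characters named in the quote are skipped.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: the whitespace characters listed in the quoted spec text.
static bool is_hex_string_whitespace(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f';
}

// Hypothetical helper: value of a hex digit, or -1 if `c` is not one.
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Decodes everything after the opening '<' up to and including the closing '>'.
// Returns std::nullopt on an unexpected character or a missing '>'.
std::optional<std::string> decode_hex_string_body(std::string_view input)
{
    std::string bytes;
    int pending_high_nibble = -1; // -1 means no digit is waiting for its partner

    for (char c : input) {
        // Whitespace is ignored everywhere: before, between and after digits.
        if (is_hex_string_whitespace(c))
            continue;

        if (c == '>') {
            // Odd number of digits: the missing final digit is assumed to be 0.
            if (pending_high_nibble >= 0)
                bytes.push_back(static_cast<char>(pending_high_nibble * 16));
            // '>' is consumed exactly once, here.
            return bytes;
        }

        int value = hex_digit_value(c);
        if (value < 0)
            return std::nullopt;

        if (pending_high_nibble < 0) {
            pending_high_nibble = value;
        } else {
            bytes.push_back(static_cast<char>(pending_high_nibble * 16 + value));
            pending_high_nibble = -1;
        }
    }

    return std::nullopt; // ran out of input before seeing '>'
}
```

With this sketch, the body `0050` followed by a newline and `>` decodes to the two bytes 0x00 0x50, and an odd-length body such as `901FA>` decodes to 0x90 0x1F 0xA0.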
2025-07-28 20:57:44 +00:00 · 2024-01-01 19:31:27 -05:00 · 9495f64f91
commit 9495f64f91
parent d11c7a19da
2 changed files with 22 additions and 1 deletions
--- a/Userland/Libraries/LibPDF/Parser.cpp
+++ b/Userland/Libraries/LibPDF/Parser.cpp
@@ -352,6 +352,7 @@ PDFErrorOr<ByteString> Parser::parse_hex_string()
    StringBuilder builder;

    while (true) {
+        m_reader.consume_whitespace();
        if (m_reader.matches('>')) {
            m_reader.consume();
            return builder.to_byte_string();
@@ -364,7 +365,6 @@ PDFErrorOr<ByteString> Parser::parse_hex_string()
                if (ch == '>') {
                    // The hex string contains an odd number of characters, and the last character
                    // is assumed to be '0'
-                    m_reader.consume();
                    hex_value *= 16;
                    builder.append(static_cast<char>(hex_value));
                    return builder.to_byte_string();