mirror of https://github.com/RGBCube/serenity synced 2025-07-25 03:07:43 +00:00
serenity/Userland/Libraries/LibPDF
Nico Weber 87112dcbdc LibPDF: Return null for invalid refs, tolerate null objects as outline
https://llvm.org/devmtg/2022-11/slides/TechTalk5-WhatDoesItTakeToRunLLVMBuildbots.pdf
has an xref table that starts like so:

```
xref
0 214
0000000002 65535 f
0000924663 00000 n
0000000003 00000 f
0000000000 00000 f
0000000016 00000 n
0000000160 00000 n
0000000263 00000 n
```

This is a list of objects in the PDF file. A line ending in 'f' marks
an object as "free", meaning it is not stored in the file. In this
file, objects 0, 2, and 3 are free. For a free object, the first number
is the object number of the next free object: object 0 points to
object 2, 2 to 3, and 3 back to 0 (since it's the last free object).
A line ending in 'n' is an actual object; there the first number is
the byte offset at which that object is stored in the file.
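
To make the free-list chain concrete, here is a minimal C++ sketch that
parses entry lines like the ones above and walks the chain starting at
object 0. The names `XRefEntry`, `parse_entries` and `free_objects` are
made up for illustration; this is not how LibPDF itself parses xref
tables.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one xref table line; not LibPDF's own type.
struct XRefEntry {
    uint64_t first_field; // byte offset for 'n', next free object number for 'f'
    uint32_t generation;
    bool in_use;          // true for 'n', false for 'f'
};

// Parses whitespace-separated entries of the form "0000000002 65535 f".
std::vector<XRefEntry> parse_entries(std::string const& text)
{
    std::vector<XRefEntry> entries;
    std::istringstream stream(text);
    uint64_t first_field = 0;
    uint32_t generation = 0;
    char kind = 0;
    while (stream >> first_field >> generation >> kind)
        entries.push_back({ first_field, generation, kind == 'n' });
    return entries;
}

// Follows the linked list of free entries from object 0 until it loops back to 0.
std::vector<uint64_t> free_objects(std::vector<XRefEntry> const& entries)
{
    std::vector<uint64_t> result;
    uint64_t index = 0;
    do {
        if (index >= entries.size() || entries[index].in_use)
            break; // malformed chain: stop instead of reading out of bounds
        result.push_back(index);
        index = entries[index].first_field;
    } while (index != 0 && result.size() <= entries.size());
    return result;
}
```

Fed the first few entries of the table above, `free_objects()` visits
objects 0, 2 and 3, in that order.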

Furthermore, the file contains

```
/Outlines
2
0
R
```

in its root object, meaning that object 2 stores the page outlines.

Since object 2 is marked as free, there is no object 2. But the spec
says that a reference to an undefined object is not an error: it is
treated as a reference to the null object.

This patch makes us return a null object for a reference to a free
object, and it also makes us treat a null /Outlines value the same as
not having /Outlines in the first place.
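
The behavior can be sketched roughly as follows. This is a simplified
model under assumed names (`Null`, `Reference`, `Value`, `Document`,
`resolve`, `outlines`), not the actual LibPDF code: references to free
or unknown objects resolve to null, and a null /Outlines is reported
the same way as a missing one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical simplified model; these are not LibPDF's actual types.
struct Null { };
struct Reference { uint32_t index; };
using Value = std::variant<Null, int, std::string, Reference>;

struct Document {
    std::vector<bool> in_use;             // false for xref entries marked 'f'
    std::map<uint32_t, Value> objects;    // stored objects, keyed by object number
    std::map<std::string, Value> catalog; // the root dictionary

    // A reference to a free or unknown object resolves to null instead of failing.
    Value resolve(Value const& value) const
    {
        auto const* reference = std::get_if<Reference>(&value);
        if (!reference)
            return value;
        if (reference->index >= in_use.size() || !in_use[reference->index])
            return Null {};
        auto it = objects.find(reference->index);
        return it == objects.end() ? Value { Null {} } : it->second;
    }

    // Empty both when /Outlines is missing and when it resolves to null.
    std::optional<Value> outlines() const
    {
        auto it = catalog.find("Outlines");
        if (it == catalog.end())
            return std::nullopt;
        Value resolved = resolve(it->second);
        if (std::holds_alternative<Null>(resolved))
            return std::nullopt;
        return resolved;
    }
};
```

With `/Outlines 2 0 R` and object 2 marked free, `outlines()` in this
sketch comes back empty, just as if the key were absent.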

Fixes #23023 -- we can now open that file. (We don't render it super
well, but only for already-known reasons.)

Since I found it a bit confusing: XRefTable has two related methods
here:

1. has_object() returns whether an object was explicitly listed in an
   xref table. The first number right after `xref` is the start
   index. So if an xref table were to start with `10`, we'd implicitly
   create 10 placeholder entries (objects 0 through 9) for which
   has_object() would return false.
2. is_object_in_use() returns true if an object that was in a table
   (i.e. one where has_object() returns true) was listed with 'n' and
   false if it was listed with 'f'.
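
The difference between the two methods can be illustrated with a toy
table. `SketchXRefTable` and `add_section` are hypothetical names, and
the real XRefTable may store its entries differently; only the
has_object() / is_object_in_use() semantics described above are
modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified table; LibPDF's real XRefTable may differ in detail.
class SketchXRefTable {
public:
    // Adds a subsection starting at object number `start`. `in_use_flags`
    // holds one flag per listed entry: true for 'n', false for 'f'.
    void add_section(size_t start, std::vector<bool> const& in_use_flags)
    {
        // Entries below `start` that were never listed stay as placeholders.
        if (m_entries.size() < start + in_use_flags.size())
            m_entries.resize(start + in_use_flags.size());
        for (size_t i = 0; i < in_use_flags.size(); ++i)
            m_entries[start + i] = Entry { true, in_use_flags[i] };
    }

    // True only for entries that some xref section explicitly listed.
    bool has_object(size_t index) const
    {
        return index < m_entries.size() && m_entries[index].listed;
    }

    // True only for listed entries that were marked 'n'.
    bool is_object_in_use(size_t index) const
    {
        return has_object(index) && m_entries[index].in_use;
    }

private:
    struct Entry {
        bool listed;
        bool in_use;
    };
    std::vector<Entry> m_entries; // resize() value-initializes both flags to false
};
```

After `add_section(10, { true, false })`, objects 0 through 9 are not
listed, object 10 is listed and in use, and object 11 is listed but
free.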

DocumentParser::parse_object_with_index() should probably return a null
object for the `!has_object()` case as well instead of VERIFY()ing
that has_object() is true. But I haven't seen this in the wild yet,
so keeping as-is for now.
2024-01-31 12:10:19 -05:00
Fonts LibPDF: Apply text matrix to each glyph's position 2024-01-18 14:01:30 +01:00
CMakeLists.txt Userland: Remove LibCore dependency from libraries that do not use it 2024-01-22 08:48:34 -05:00
ColorSpace.cpp LibPDF: Move ColorSpace::style() to take ReadonlySpan<float> 2024-01-12 12:37:56 +00:00
ColorSpace.h LibPDF: Move ColorSpace::style() to take ReadonlySpan<float> 2024-01-12 12:37:56 +00:00
CommonNames.cpp AK+Everywhere: Rename FlyString to DeprecatedFlyString 2023-01-09 23:00:24 +00:00
CommonNames.h LibPDF: Show a 'rendering unsupported' error for images with /Mask key 2023-12-23 20:39:11 +01:00
Document.cpp LibPDF: Return null for invalid refs, tolerate null objects as outline 2024-01-31 12:10:19 -05:00
Document.h LibPDF: Draw inline images 2023-12-20 12:45:16 -07:00
DocumentParser.cpp LibPDF: Return null for invalid refs, tolerate null objects as outline 2024-01-31 12:10:19 -05:00
DocumentParser.h LibPDF: Scan for PDF file start in first 1024 bytes 2024-01-03 10:12:35 +01:00
Encoding.cpp LibPDF: Add a FIXME and a spec comment to Encoding::from_object() 2024-01-04 10:12:11 +01:00
Encoding.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Encryption.cpp Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Encryption.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Error.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Filter.cpp LibPDF: Be more forgiving about trailing image data 2024-01-16 09:55:11 -05:00
Filter.h LibPDF: Start implementing the TIFF predictor 2024-01-15 23:06:06 -07:00
Forward.h Everywhere: Remove unused includes of AK/StdLibExtras.h 2023-01-02 20:27:20 -05:00
Function.cpp LibPDF: Use mix() in SampledFunction::evaluate() 2024-01-04 21:12:23 +01:00
Function.h LibPDF: Add scaffolding for function objects 2023-11-06 10:01:05 +01:00
Interpolation.cpp LibPDF: Add first interpolation methods 2022-12-10 10:49:03 +01:00
Interpolation.h LibPDF: Add first interpolation methods 2022-12-10 10:49:03 +01:00
Object.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
ObjectDerivatives.cpp LibPDF: Make pdf --dump-contents handle \r line endings better 2024-01-15 23:16:45 -07:00
ObjectDerivatives.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Operator.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Page.cpp LibPDF: Don't accidentally form new tokens on pages with contents arrays 2023-10-23 13:23:54 -04:00
Page.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Parser.cpp LibPDF: Improve hex string parsing 2024-01-02 22:13:21 +01:00
Parser.h LibPDF: Extract Parser::parse_inline_image() 2023-12-22 10:58:54 +01:00
Reader.cpp LibPDF: Add Reader::consume_non_eol_whitespace() 2024-01-04 10:14:30 +01:00
Reader.h LibPDF: Add Reader::consume_non_eol_whitespace() 2024-01-04 10:14:30 +01:00
Reference.h LibPDF: Make Reference store two u32s instead of one 2023-07-10 17:48:15 +01:00
Renderer.cpp LibPDF+MacPDF: Clip text, and add a debug option for disabling it 2024-01-20 08:56:03 +01:00
Renderer.h LibPDF+MacPDF: Clip text, and add a debug option for disabling it 2024-01-20 08:56:03 +01:00
Value.cpp Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
Value.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30
XRefTable.h Everywhere: Rename {Deprecated => Byte}String 2023-12-17 18:25:10 +03:30