In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:
```
Optional<I> opt;
if constexpr (IsSigned<I>)
opt = view.to_int<I>();
else
opt = view.to_uint<I>();
```
For us.
The main goal here however is to have a single generic number conversion
API between all of the String classes.
The idea is to massage the inline image data into something that
looks like a regular image, and then use the normal image drawing code:
We translate the inline image abbreviations to the expanded version at
rendering time, then unfilter (i.e. uncompress) the image data at
rendering time, and the go down the usual image drawing path.
Normal streams are unfiltered when they're first accessed, but
inline image streams live in a page's drawing operators, and this
fits the current approach of parsing a page's operators anew
every time the page is rendered.
(We also need to add some special-case handling for color spaces
of inline images: Inline images can use named color spaces, while
regular images always use direct color space objects.)
We create a inline_image_end operator that has all the relevant data
in a synthetic StreamObject.
inline_image_end is still a RENDERER_TODO(), so no real behavior
change. (Previously we'd call only inline_image_begin, so string the
todo message is about is now a bit different. But no interesting
behavior change.)
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
In practice, basically no file has it, since it was only added in 2.0,
and 1.7 explicitly said "in particular, the Type, Subtype, and Length
entries normally found in a stream or image dictionary are unnecessary."
Fixes a crash on page 3 of 0000450.pdf of 0000.zip, where we previously
started interpreting the middle of an inline image content stream as
operators, since it contained `EI` in its pixel data.
Fixes these errors from `Meta/test_pdf.py path/to/0000`, with
0000 being 0000.zip from the PDF/A corpus in unzipped:
Malformed PDF file: Indexed color space lookup table doesn't
match size, in 4 files, on 8 pages, 73 times
path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x)
path/to/0000/0000364.pdf 5 6
path/to/0000/0000918.pdf 5
path/to/0000/0000683.pdf 8
When upsampling e.g. the 4-bit value 0b1101 to 8-bit, we used to repeat
the value to fill the full 8-bits, e.g. 0b11011101. This maps RGB colors
to 8-bit nicely, but is the wrong thing to do for palette indices.
Stop doing this for palette indices.
Fixes "Indexed color space index out of range" for 11 files in the
PDF/A 0000.zip test set now that we correctly handle palette indices
as of the previous commit:
Malformed PDF file: Indexed color space lookup table doesn't match
size, in 4 files, on 8 pages, 73 times
path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x)
path/to/0000/0000364.pdf 5 6
path/to/0000/0000918.pdf 5
path/to/0000/0000683.pdf 8
Previously, we were scaling palette indices from 0..(palette_size - 1)
to 0..255 before using them as index into the palette. Instead, do not
scale palette indices before using them as indices.
(Renderer::load_image() uses `component_value_decoders.empend(
.0f, 255.0f, dmin, dmax)`, so to get an identity mapping, we have to
return `0, 255` from IndexedColorSpace::default_decode()).
Fixes rendering of the gradient on page 5 of 0000277.pdf.
Gfx::ICC::Profile's current API takes bytes, so we need to do some
contortions for LAB values to go through.
This will probably become nicer once we implement all the backward
transforms in Gfx::ICC::Profile, but for now let's hack it in
on the LibPDF side.
Makes colors in 0000651.pdf looks good, especially on pages 1 and 7-12.
If one profile uses PCSXYZ and the other PCSLAB as connection space,
we now do the necessary XYZ/LAB conversion.
With this and the previous commits, we can now convert from profiles
that use PCSLAB with mAB, such as stress.jpeg from
https://littlecms.com/blog/2020/09/09/browser-check/ :
% Build/lagom/icc --name sRGB --reencode-to serenity-sRGB.icc
% Build/lagom/bin/image -o out.png \
--convert-to-color-profile serenity-sRGB.icc \
~/src/jpegfiles/stress.jpeg
Instead of recomputing the left index and the float amount in that
interval for each coordinate all the time, do it once when we
preprocess the input coordinates.
One line less, faster, and arguably easier to read.
No behavior change.
Using `min()` to guarantee the left index is never == `size() - 1`,
even for an interpolation value of 1.0, is less code, and arguably
easier to understand as well.
No behavior change.
Since all parents held a reference pointer to their children, and all
children held reference pointers to their parents, both objects would
never get free'd once the document was no longer being used.
Fixes ossfuzz-63833.
This method takes bytes as input and decompress everything to a
ByteBuffer. It uses two control codes (clear and end of data) as
described in the GIF, TIFF and PDF specifications.
Page 1 of 0000277.pdf does:
BT 22 0 0 22 59 28 Tm /TT2 1 Tf
(Presented at Photonics West OPTO, February 17, 2016) Tj ET
BT 32 0 0 32 269 426 Tm /TT1 1 Tf
(Robert W. Boyd) Tj ET
BT 22 0 0 22 253 357 Tm /TT2 1 Tf
(Department of Physics and) Tj ET
BT 22 0 0 22 105 326 Tm /TT2 1 Tf
(Max-Planck Centre for Extreme and Quantum Photonics) Tj ET
Every line begins a text operation, then updates the font matrix,
selects a font (TT2, TT1, TT2, TT1), draws some text and ends the text
operation.
`Tm` (which sets the font matrix) contains a scale, and uses that
to update the font size of the currently-active font (cf #20084).
But in this file, we `Tm` first and `Tf` (font selection) second,
so this updates the size of the old font. So when we pull it out
of the cache again on line 3, it would still have the old size
from the `Tm` on line 2.
(The whole text scaling logic in LibPDF imho needs a rethink; the
current approach also causes issues with zero-width glyphs which
currently lead to divisions by zero. But that's for another PR.)
Fixes another regression from c8510b58a3 (which I've accidentally
referred to by 2340e834cd in another commit).
SMasks are greyscale images that get used as alpha channel for a
different image.
JPEGs in PDFs are stored as streams with /DCTDecode filters, and
we have a separate code path for loading those in the PDF renderer.
That code path just calls our JPEG decoder, which creates bitmaps
with format BGRx8888.
So when we process an SMask for such a bitmap, we have to change
the bitmap's format to BGRA8888 in addition to setting alpha values
on all pixels.
Per 1.7 spec 3.8.1, there are multiple logical text string types:
* text strings
* ASCII strings
* byte strings
Text strings can be in UTF-16BE, PDFDocEncoding, or (since PDF 2.0)
UTF-8.
But byte strings shouldn't be converted but treated as binary
data.
This makes us no longer convert strings used for drawing page text.
TABLE 5.6 "Text-showing operators" lists the operands for text-showing
operators as just "string", not "text string" (even though these strings
confusingly are called "text strings" in the body text), so not doing
this there is correct (and matches other viewers).
We also no longer incorrectly convert strings used for cypto data
(such as passwords), if they start with an UTF-16BE or UTF-8 marker.
No behavior change for outlines and info dict entries.
https://pdfa.org/understanding-utf-8-in-pdf-2-0/ has a good overview of
this.
(ASCII strings only contain ASCII characters and behave the same
anyways.)