CMYK data describes which inks a printer should use to print a color.
If a screen should display a color that's supposed to look similar
to what the printer produces, it results in a color very different
to what Color::from_cmyk() produces. (It's also printer-dependent.)
There are many ICC profiles describing printing processes. It doesn't
matter too much which one we use -- most of them look somewhat
similar, and they all look dramatically better than Color::from_cmyk().
This patch adds a function to download a zip file that Adobe offers
on their web site. They even have a page for redistribution:
https://www.adobe.com/support/downloads/iccprofiles/icc_eula_win_dist.html
(That one leads to a broken download though, so this downloads the
end-user version.)
In case we have to move off this download at some point, there are also
a whole bunch of profiles at https://www.color.org/registry/index.xalter
that "may be used, embedded, exchanged, and shared without restriction".
The adobe zip contains a whole bunch of other useful and fun profiles,
so I went with it.
For now, this only unzips the USWebCoatedSWOP.icc file though, and
installs it in ${CMAKE_BINARY_DIR}/Root/res/icc/Adobe/CMYK/. In
Serenity builds, this will make it to /res/icc/Adobe/CMYK in the
disk image. And in lagom build, after #23016 this is the
lagom res staging directory that tools can install via
Core::ResourceImplementation. `pdf` and `MacPDF` already do that,
`TestPDF` now does it too.
The final piece is that LibPDF then loads the profile from there
and uses it for DeviceCMYK color conversions.
(Doing file access from the bowels of a library is a bit weird,
especially in a system that has sandboxing built in. But LibGfx does
that in FontDatabase too already, and LibPDF uses that, so it's not a
new problem.)
A local (non-public) PDF I have lying around contains this in
a page's operator stream:
```
[<00b4003e> 3 <002600480051> 3 <005700550044004f0003> -29
<00330044> 3 <0055> -3 <004e0040> 4 <0003> -29 <004c00560003> -31
<0057004b> 4 <00480003> -37 <0050
>] TJ
```
That is, there's a newline in a hexstring after a character.
This led to `Parser error at offset 5184: Unexpected character`.
The spec says in 3.2.3 String Objects, Hexadecimal Strings:
"""Each pair of hexadecimal digits defines one byte of the string.
White-space characters (such as space, tab, carriage return, line feed,
and form feed) are ignored."""
But we didn't ignore whitespace before or after a character, only
in between the bytes.
The spec also says:
"""If the final digit of a hexadecimal string is missing—that is, if
there is an odd number of digits—the final digit is assumed to be 0."""
In that case, we were skipping the closing `>` twice -- or, more
accurately, we ignored the character after it too. This has been
wrong all the way back in #6974.
Add a test that fails if either of the two changes isn't present.
Having some rendering test coverage is motivated by #22362, but this
test wouldn't have found the crashes over there (since colorspaces.pdf
does not contain pattern color spaces). Still, good to have some
in-repo test coverage of PDF rendering.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
Manually added an Outlines dict with three items, one each for
every text string encoding in its title.
(Preview.app apparently can't handle UTF-8 in outlines either.)
I didn't find example code for this and the AI assistant did very
poorly on this as well. So I had to write it all by myself!
It can be much more efficient I think, but I think the overall
shape is maybe roughly fine.
* SampledFunction now keeps the StreamObject it gets data from alive
(doesn't matter too much in practice, but does matter in the test,
where nothing else keeps the stream alive).
* If a sample is an integer, we would previously sample that value
twice and then divide by zero when interpolating. Make sure to
sample 1 unit apart.
There were two problems:
1. parse_compressed_object_with_index() parses indirect objects
without going through Parser::parse_indirect_value(), so
push_reference() / pop_reference() weren't called.
Manually call them, both for the indirect object containing
the object stream and for the indirect object within the
object stream.
2. The indirect object within the object stream got decrypted
twice: Once when the object stream data itself got decrypted,
and then incorrectly a second time when the object data within
the stream was read. To fix, disable encryption while parsing
object stream data (since it's already decrypted).
The test is from http://opf-labs.org/format-corpus/pdfCabinetOfHorrors/
which according to readme.md at the same location is CC0.
I created this by typing "sup" into TextEdit.app on macOS 13.4,
hitting Cmd-P to bring up the print dialog, clicked the PDF button
at the bottom, changed Title and Author to "sup", clicked
"Security Options…", and checked "Require password to open document"
(with password "sup").
This file tests several things:
- It has a compressed stream as first object. This used to make the
linearization dict detection logic assert.
- It uses AES as encryption key using version 4 of the encryption
dict. This used to not be implemented.
Note that in some cases (in particular SQL::Result and PDFErrorOr),
there is no Formatter defined for the error type, hence TRY_OR_FAIL
cannot work as-is. Furthermore, this commit leaves untouched the places
where MUST could be replaced by TRY_OR_FAIL.
Inspired by:
https://github.com/SerenityOS/serenity/pull/18710#discussion_r1186892445
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
Security handlers manage encryption and decription of PDF files. The
standard security handler uses RC4/MD5 to perform its crypto (AES as
well, but that is not yet implemented).
Add a unit test for each sample pdf file that currently exists in the
anon user's `~/Document/pdf` directory.
- linear.pdf
- non-linearized.pdf
- complex.pdf
Each test ensures that the pdf document is parsed and that the page
count is the expected one.