serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-22 04:12:06 +00:00

Author	SHA1	Message	Date
Nico Weber	eb1c99bd72	LibPDF+LibGfx: Make SMasks on jpeg images work SMasks are greyscale images that get used as alpha channel for a different image. JPEGs in PDFs are stored as streams with /DCTDecode filters, and we have a separate code path for loading those in the PDF renderer. That code path just calls our JPEG decoder, which creates bitmaps with format BGRx8888. So when we process an SMask for such a bitmap, we have to change the bitmap's format to BGRA8888 in addition to setting alpha values on all pixels.	2023-11-23 12:13:03 +01:00
Nico Weber	57e2b5ef59	LibPDF+Tests: Correctly decode text strings without explicit encoding	2023-11-22 09:08:06 -07:00
Nico Weber	e39a790c82	LibPDF: Stop converting encodings in object parser Per 1.7 spec 3.8.1, there are multiple logical text string types: * text strings * ASCII strings * byte strings Text strings can be in UTF-16BE, PDFDocEncoding, or (since PDF 2.0) UTF-8. But byte strings shouldn't be converted but treated as binary data. This makes us no longer convert strings used for drawing page text. TABLE 5.6 "Text-showing operators" lists the operands for text-showing operators as just "string", not "text string" (even though these strings confusingly are called "text strings" in the body text), so not doing this there is correct (and matches other viewers). We also no longer incorrectly convert strings used for crypto data (such as passwords), if they start with a UTF-16BE or UTF-8 marker. No behavior change for outlines and info dict entries. https://pdfa.org/understanding-utf-8-in-pdf-2-0/ has a good overview of this. (ASCII strings only contain ASCII characters and behave the same anyways.)	2023-11-22 09:08:06 -07:00
Nico Weber	14bcb5219d	LibPDF: Tolerate comments before drawing operators Necessary to be able to render https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf	2023-11-22 08:56:43 +00:00
Nico Weber	9e8cf4fc1a	LibPDF: Tolerate comment after last dict item Necessary to be able to open https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf	2023-11-22 08:56:43 +00:00
Nico Weber	4440452f92	LibPDF: Support images with 1, 2, 4 bits per pixel They just get upsampled to 8 bits per pixel images.	2023-11-18 07:33:15 +00:00
Nico Weber	bfe27228a3	LibPDF+LibGfx: Don't invert CMYK channels in JPEG data in PDFs This is a hack: Ideally we'd have a CMYK Bitmap pixel format, and we'd convert to rgb at blit time. Then we could also apply color profiles (which for CMYK images are CMYK-based). Also, the colors for our CMYK->RGB conversion are off for PDFs, and we have distinct codepaths for this in Gfx::Color (for paths) and JPEGs. So when we fix that, we'll have to fix it in two places. But this doesn't require a lot of code and it's a huge visual progression, so let's go with it for now.	2023-11-17 22:32:40 +00:00
Nico Weber	bd7ae7f91e	LibPDF: Consistently asciibetize CommonNames.h The file wasn't quite decided if it wanted to sort by ascii value or by case folding. Now it uses ascii value, thanks to vim's `:'<,'>sort`. No behavior change.	2023-11-17 20:27:42 +00:00
Nico Weber	29396415d5	LibPDF: Add an initial implementation of type 3 glyph rendering This is a very inefficient implementation: Every time a type 3 font glyph is drawn, we parse its operator stream and execute all the operators therein. We'll want to instead cache the glyphs in bitmaps (at least in most cases), like we do for other fonts. But it's a good first step, and all the coordinate math seems to work in the files I've tested. Good test files from pdfa dataset 0000.zip: - 0000559.pdf page 1 (and 2): Has a non-default font matrix; text appears mirrored if the font matrix isn't handled correctly - 0000425.pdf, page 1: Draws several glyphs in a single run; glyphs overlap if Renderer::render_type3_glyph() ignores the passed-in point - 0000211.pdf, any page: Uses type 3 glyphs for all text. Good perf test (already "reasonably fast") - 0000521.pdf, page 5 (or 7 or 16): The little red flag in the purple box is a type 3 font glyph, and it's colored (which in part means the first operator is `d0`, while all the other documents above use `d1`)	2023-11-17 19:47:53 +00:00
Nico Weber	14ddab5519	LibPDF: Stub out type3_font_set_glyph_width* Type 3 font glyphs begin with either `d0` or `d1`. If we bail out with an "unsupported" error on the very first operator in a glyph, we'll never paint the glyph. Just stub these out for now. We probably want to do more in here in the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).	2023-11-17 19:47:53 +00:00
Nico Weber	54c98a46d8	LibPDF: Correctly parse the d0 and d1 operators They are the first operator in a type 3 charproc. Operator.h already knew about them, but we didn't manage to parse them, since they're the only two operators that contain a digit.	2023-11-17 19:47:53 +00:00
Nico Weber	5513f8bbe3	LibPDF: Move ScopedState from a function on Renderer into Renderer No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	126a0be595	LibPDF: Pass Renderer to SimpleFont::draw_glyph() This makes it available in Type3Font::draw_glyph(). No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	bcc6439b5f	LibPDF: Pass Renderer to PDFFont::draw_string() It's a bit unfortunate that fonts need to know about the renderer, but type 3 fonts contain PDF drawing operators, so it's necessary. On the bright side, it makes it possible to pass fewer parameters around and compute things locally as needed. (As we implement more fonts, we'll probably want to create some functions to do these computations in a central place, eventually.) No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	e0c0864ddf	LibPDF: Load a few values off a type 3 font dictionary	2023-11-17 19:47:53 +00:00
Nico Weber	9632d8ee49	LibPDF: Make SimpleFont font matrix configurable Type 3 fonts can set it to a custom value.	2023-11-17 19:47:53 +00:00
Nico Weber	4cd1a2d319	LibPDF: Add some scaffolding for type 3 fonts	2023-11-17 19:47:53 +00:00
Nico Weber	7f999b1ff5	LibPDF: Sink m_base_font_name from PDFFont into subclasses /BaseFont is a required key for type 0, type 1, and truetype font dictionaries, but not for type 3 font dictionaries. This is mechanical; type 0 fonts don't even use this yet (but probably should). PDFFont::initialize() is now empty and could be removed, but maybe we'll put stuff there again later, so I'm leaving it around for a bit longer.	2023-11-17 19:47:53 +00:00
Nico Weber	6c1da5db54	LibPDF: Make SimpleFont::draw_glyph() fallible	2023-11-17 19:47:53 +00:00
Nico Weber	843e9daa8c	LibPDF: Remove unused PDFFont::type() This got added in #15270, but its one use then got removed again in #16150. No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	26fd29baf8	LibPDF: Give Type3 fonts a dedicated error message They're described in "5.5.4 Type 3 Fonts" in the PDF 1.7 spec, so we shouldn't `internal_error()` on them. They're just not implemented yet.	2023-11-17 19:47:53 +00:00
Nico Weber	5eaa403ddf	LibPDF: Use font dictionary object as cache key, not resource name In the main page contents, /T0 might refer to a different font than it might refer to in an XObject. So don't use the `Tf` argument as font cache key. Instead, use the address of the font dictionary object. Fixes false cache sharing, and also allows us to share cache entries if the same font dict is referred to by two different names. Fixes a regression from 2340e834cd (but keeps the speed-up intact).	2023-11-17 19:14:39 +01:00
Nico Weber	443b3eac77	LibPDF: Let decode_png_prediction() call LibGfx's unfilter_scanline() It's less code, but it also fixes a bug: The implementation in Filter.cpp used to use the previous byte as reference value, while we're supposed to use the value of the previous channel as reference (at least when a pixel is larger than one byte).	2023-11-17 19:09:50 +01:00
Nico Weber	145ade3a86	LibPDF: Remove a needless AK:: qualification No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	0416a07d56	LibPDF: Make filter byte not part of row in decode_png_prediction() No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	b763960fc2	LibPDF: Convert decode_png_prediction to use spans No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	588d6fab22	LibGfx+LibPDF: Create filter_type() for converting u8 to FilterType ...and use it in LibPDF. No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	7e4fe8e610	LibPDF: Use PNG::paeth_predictor() in png decoding path No behavior change. Ideally, the PDF code would just call a function PNGLoader to do the PNG unfiltering, but let's first try to make the implementations look more similar.	2023-11-17 19:09:50 +01:00
Lucas CHOLLET	1e8004734f	LibPDF: Don't consider the End of Data code as normal ASCII85 input Data encoded with ASCII85 is terminated with the EOD code 0x7E3E. This should not be considered as normal input but rather discarded.	2023-11-14 10:15:15 +01:00
Lucas CHOLLET	59a6d4b7bc	LibPDF: Factorize duplicated code in `Filter::decode_ascii85()`	2023-11-14 10:15:15 +01:00
Lucas CHOLLET	2fe0647c68	LibPDF: Handle pdf-specific white spaces correctly in ASCII85 We were previously only looking at the space character, but PDF white space is a superset of ASCII spaces.	2023-11-14 10:15:15 +01:00
Lucas CHOLLET	db08fe12ec	LibPDF: Implement `Reader::is_[eol, whitespace](char)` These two static members are now used to implement respective `matches_` methods but will also be useful to provide a global implementation of the specified concept of whitespace.	2023-11-14 10:15:15 +01:00
Lucas CHOLLET	dac703a0b8	LibPDF: Avoid an unnecessary copy in `Filter::decode_ascii85()`	2023-11-14 10:15:15 +01:00
Nico Weber	9b022239c3	LibPDF: Apply all offsets of TJ operator TJ acts on a list of either strings or numbers. The strings are drawn, and the numbers are treated as offsets. Previously, we'd only apply the last-seen number as offset when we saw a string. That had the effect of us ignoring all but the last number in front of a string, and ignoring numbers at the end of the list. Now, we apply all numbers as offsets. Our rendering of Tests/LibPDF/text.pdf now matches other PDF viewers.	2023-11-14 10:11:09 +01:00
Nico Weber	1c2b0feb7b	LibPDF: Change how CFF optional width prefix is stored Per 5177.Type2.pdf 3.1 "Type 2 Charstring Organization", a glyph's charstring looks like: w? {hs* vs* cm* hm* mt subpath}? {mt subpath}* endchar The `w?` is the width of the glyph, but it's optional. So all possible commands after it (hstem* vstem* cntrmask hintmask moveto endchar) check if there's an extra number at the start and interpret it as a width, for the very first command we read. This was done by having an `is_first_command` local bool that got set to false after the first command. That didn't work with subrs: If the first command was a call to a subr that just pushed a bunch of numbers, then the second command after it is the actual first command. Instead, move that bool into the state. Set it to false the first time we try to read a width, since that means we just read a command that could've been prefixed by a width.	2023-11-14 10:10:34 +01:00
Lucas CHOLLET	9e4d697d23	LibPDF: Detect DCT images correctly Images can have multiple filters, each one of them is processed sequentially. Only the last one will be relevant for the image format (DCT or JPXDecode), so use the last filter instead of the first one to detect that property.	2023-11-13 10:30:34 -05:00
Nico Weber	f882a3ae37	LibPDF: In ColorSpace creation code, use resolve_to() more For valid PDFs, this makes no difference. For invalid PDFs, we now assert during the cast in resolve_to() instead of returning a PDFError. However, most PDFs are valid, and even for invalid PDFs, we'd previously keep the old color space around when getting the PDF error and then usually assert later when the old color space got passed a color with an unexpected number of components (since the components were for the new color space). Doesn't affect any of the > 2000 PDFs I use for testing locally, is less code, and should make for less surprising asserts when it does happen.	2023-11-13 10:29:26 -05:00
Lucas CHOLLET	9bc25db9a3	LibPDF: Add support for the LZW filter This allows us to decode the first page of ThinkingInPostScript.pdf :^)	2023-11-13 14:23:23 +01:00
Lucas CHOLLET	048ef11136	LibPDF: Factorize flate parameters handling to its own function This part will be shared with the LZW filter, so let's factorize it.	2023-11-13 14:23:23 +01:00
Nico Weber	bbde3cbc90	LibPDF: Tolerate an indirect object as dict for CIE-based color spaces Namely, for CalGrayColorSpace, CalRGBColorSpace, LabColorSpace. Fixes a crash rendering any page of Adobe's 5014.CIDFont_Spec.pdf (which uses CalRGBColorSpace with an indirect dict: The dict is object `92 0`, and many color spaces are inline objects referring to it).	2023-11-13 07:12:05 -05:00
Nico Weber	f4a847894f	LibPDF: Make SampledFunction::evaluate() work for n-dimensional input I didn't find example code for this and the AI assistant did very poorly on this as well. So I had to write it all by myself! It can be much more efficient I think, but I think the overall shape is maybe roughly fine.	2023-11-12 07:55:04 +01:00
Nico Weber	a9ef65e64a	LibPDF: For multi-output SampledFunctions, fix output colors For N outputs, the outputs aren't stored in N independent planes. Instead, N output values are stored right next to each other in the stream data.	2023-11-11 08:55:37 +01:00
Nico Weber	ec739460e0	LibPDF: Add test for SampledFunction and fix bugs found by it * SampledFunction now keeps the StreamObject it gets data from alive (doesn't matter too much in practice, but does matter in the test, where nothing else keeps the stream alive). * If a sample is an integer, we would previously sample that value twice and then divide by zero when interpolating. Make sure to sample 1 unit apart.	2023-11-11 08:55:37 +01:00
Nico Weber	323ba7404c	LibPDF: Implement SampledFunction::evaluate() for some sampled functions Things now work for functions that are all of: * linear * 1-D input * 8 bits per sample	2023-11-10 15:03:30 +00:00
Nico Weber	fd1876441a	LibPDF: Implement SampledFunction::create()	2023-11-10 15:03:30 +00:00
Nico Weber	cd9f4655ec	LibPDF: Tweak implementation of postscript `roll` op Since positive offsets roll to the right, it makes more sense to do the big reverse first. Gets rid of an awkward minus sign. No behavior change.	2023-11-10 14:45:38 +01:00
Nico Weber	b23ed86889	LibPDF: Implement StitchingFunction::evaluate()	2023-11-10 14:45:16 +01:00
Nico Weber	ba34ddeb21	LibPDF: Implement StitchingFunction creation	2023-11-10 14:45:16 +01:00
Nico Weber	5af6e1c042	LibPDF: Implement DeviceNColorSpace	2023-11-09 23:33:49 +01:00
Nico Weber	0f07049935	LibPDF: Add ColorSpaceFamily::operator== No behavior change.	2023-11-09 23:33:49 +01:00
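
The bug described in commit 443b3eac77 (using the previous byte instead of the matching byte of the previous pixel as the reference value) can be illustrated with the PNG "Sub" filter. The following is a minimal sketch, not the LibGfx `unfilter_scanline()` implementation; the helper name `unfilter_sub` and its signature are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LibGfx or LibPDF): undoes the PNG "Sub"
// filter in place for one scanline. The filter-type byte is assumed to have
// been stripped from the row already.
inline void unfilter_sub(std::vector<std::uint8_t>& row, std::size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    // The first pixel has no left neighbor, so its bytes stay unchanged.
    for (std::size_t i = bytes_per_pixel; i < row.size(); ++i) {
        // The reference is the same channel of the previous pixel, that is
        // the byte bytes_per_pixel positions back, not the byte at i - 1.
        row[i] = static_cast<std::uint8_t>(row[i] + row[i - bytes_per_pixel]);
    }
}
```

With one byte per pixel the two interpretations coincide, which is presumably why the difference only shows up for pixels larger than one byte.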

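
Commit 9b022239c3 changes `TJ` handling so that every number in the array is applied as an offset. The sketch below models that behavior under simplifying assumptions: horizontal writing, a fixed advance per character, and no horizontal scaling or character spacing. `TJElement` and `layout_tj` are hypothetical names and do not appear in LibPDF.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one TJ array element: either a string to show or a
// numeric adjustment.
struct TJElement {
    bool is_string;
    std::string text;  // used when is_string is true
    double adjustment; // used otherwise, in thousandths of a text space unit
};

// Walks a TJ array and returns the x position at which each string starts.
// `x` is updated to the position after the last element.
inline std::vector<double> layout_tj(std::vector<TJElement> const& elements,
    double font_size, double glyph_advance, double& x)
{
    std::vector<double> starts;
    for (auto const& element : elements) {
        if (element.is_string) {
            starts.push_back(x);
            x += glyph_advance * static_cast<double>(element.text.size());
        } else {
            // Every number moves the position: consecutive numbers add up,
            // and trailing numbers are applied too. The value is subtracted,
            // so negative numbers move to the right.
            x -= element.adjustment / 1000.0 * font_size;
        }
    }
    return starts;
}
```

For an array such as `[-500 -500 (ab) 250]`, applying only the last number seen before the string would place the string at 0.5 units instead of 1.0 and would drop the trailing 250 entirely.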

490 commits
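
Commit cd9f4655ec mentions implementing the PostScript `roll` operator with the "big reverse" first. One common way to do that is the three-reversal rotation, sketched here on a stack of integers whose top is the back of the vector. This is an illustration under those assumptions, not the LibPDF code; the function name and signature are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of PostScript `n j roll`: rotates the top n stack
// elements by j positions. Positive j moves elements toward the top, so
// `1 2 3  3 1 roll` leaves `3 1 2` on the stack.
inline void roll(std::vector<int>& stack, std::size_t n, long j)
{
    assert(n <= stack.size());
    if (n == 0)
        return;
    long const count = static_cast<long>(n);
    // Normalize j into [0, n) so that negative offsets work as well.
    auto const shift = static_cast<std::ptrdiff_t>(((j % count) + count) % count);
    auto first = stack.end() - static_cast<std::ptrdiff_t>(n);
    // Reverse the whole window first, then each of the two parts.
    std::reverse(first, stack.end());
    std::reverse(first, first + shift);
    std::reverse(first + shift, stack.end());
}
```

Doing the full reverse first yields a rotation to the right by `shift`, which matches the direction of a positive `j` without needing to negate the offset.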