serenity

mirror of https://github.com/RGBCube/serenity synced 2025-05-31 11:18:11 +00:00

Author	SHA1	Message	Date
Nico Weber	071f890847	LibPDF: Require whitespace in front of inline image marker EI Fixes a crash on page 3 of 0000450.pdf of 0000.zip, where we previously started interpreting the middle of an inline image content stream as operators, since it contained `EI` in its pixel data.	2023-12-11 10:50:39 +01:00
Nico Weber	27aae7e2b1	LibPDF: Parse inline image key-value pairs Not used for anything yet.	2023-12-11 10:50:39 +01:00
Nico Weber	0912896ae0	LibPDF: Extract Parser::parse_dict_contents_until() No behavior change.	2023-12-11 10:50:39 +01:00
Kyle Pereira	8c7fc4fe6c	LibPDF: Offset PaintStyle when painting so pattern overlaps properly	2023-12-10 16:44:24 +01:00
Kyle Pereira	8ff87911a3	LibPDF: Add basic tiled, coloured pattern rendering	2023-12-10 16:44:24 +01:00
Kyle Pereira	8191f2b47a	LibPDF: Add parameter for background color of render	2023-12-10 16:44:24 +01:00
Kyle Pereira	60c4803dd3	LibPDF: Pass Renderer to ColorSpace	2023-12-10 16:44:24 +01:00
Kyle Pereira	082a4197b6	LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces This is in anticipation of Pattern color space support which does not yield a simple color.	2023-12-10 16:44:24 +01:00
Kyle Pereira	e4b8d68039	LibPDF: Permit comments at the end of a stream	2023-12-10 16:44:24 +01:00
Nico Weber	8b50b689f9	LibPDF: Reject invalid "hival" values Doesn't fire on any of the PDFs I have, and seems like a good thing to check.	2023-12-07 08:10:40 +00:00
Nico Weber	43cd3d7dbd	LibPDF: Tolerate palettes that are one byte too long Fixes these errors from `Meta/test_pdf.py path/to/0000`, with 0000 being 0000.zip from the PDF/A corpus in unzipped form: Malformed PDF file: Indexed color space lookup table doesn't match size, in 4 files, on 8 pages, 73 times path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x) path/to/0000/0000364.pdf 5 6 path/to/0000/0000918.pdf 5 path/to/0000/0000683.pdf 8	2023-12-07 08:10:40 +00:00
Nico Weber	832a065687	LibPDF: For low-bpp images, start scanlines on byte boundaries Required per spec, and we get slanted images without it. Fixes e.g. page 1 of 0000749.pdf.	2023-12-07 08:10:40 +00:00
Nico Weber	06b9633da5	LibPDF: For indexed images with 1, 2 or 4 bpp, do not repeat bit pattern When upsampling e.g. the 4-bit value 0b1101 to 8-bit, we used to repeat the value to fill the full 8-bits, e.g. 0b11011101. This maps RGB colors to 8-bit nicely, but is the wrong thing to do for palette indices. Stop doing this for palette indices. Fixes "Indexed color space index out of range" for 11 files in the PDF/A 0000.zip test set now that we correctly handle palette indices as of the previous commit: Malformed PDF file: Indexed color space lookup table doesn't match size, in 4 files, on 8 pages, 73 times path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x) path/to/0000/0000364.pdf 5 6 path/to/0000/0000918.pdf 5 path/to/0000/0000683.pdf 8	2023-12-07 08:10:40 +00:00
Nico Weber	8733ba2734	LibPDF: Fix decoding of IndexedColorSpace for palette sizes != 255 Previously, we were scaling palette indices from 0..(palette_size - 1) to 0..255 before using them as index into the palette. Instead, do not scale palette indices before using them as indices. (Renderer::load_image() uses `component_value_decoders.empend( .0f, 255.0f, dmin, dmax)`, so to get an identity mapping, we have to return `0, 255` from IndexedColorSpace::default_decode()). Fixes rendering of the gradient on page 5 of 0000277.pdf.	2023-12-06 15:32:13 +01:00
Nico Weber	4cb0593daf	LibPDF: Convert LAB values to bytes differently Gfx::ICC::Profile's current API takes bytes, so we need to do some contortions for LAB values to go through. This will probably become nicer once we implement all the backward transforms in Gfx::ICC::Profile, but for now let's hack it in on the LibPDF side. Makes colors in 0000651.pdf look good, especially on pages 1 and 7-12.	2023-12-05 11:36:44 -05:00
Nico Weber	b2a1130556	LibGfx/ICC: Implement conversion between different connection spaces If one profile uses PCSXYZ and the other PCSLAB as connection space, we now do the necessary XYZ/LAB conversion. With this and the previous commits, we can now convert from profiles that use PCSLAB with mAB, such as stress.jpeg from https://littlecms.com/blog/2020/09/09/browser-check/ : % Build/lagom/icc --name sRGB --reencode-to serenity-sRGB.icc % Build/lagom/bin/image -o out.png \ --convert-to-color-profile serenity-sRGB.icc \ ~/src/jpegfiles/stress.jpeg	2023-12-04 08:02:36 +00:00
Nico Weber	1c88b82dfc	LibPDF: Do less work in SampledFunction::evaluate()'s inner loop Instead of recomputing the left index and the float amount in that interval for each coordinate all the time, do it once when we preprocess the input coordinates. One line less, faster, and arguably easier to read. No behavior change.	2023-12-02 22:26:13 +01:00
Nico Weber	54883b7d41	LibPDF: Remove get_bounds lambda in SampledFunction::evaluate() Using `min()` to guarantee the left index is never == `size() - 1`, even for an interpolation value of 1.0, is less code, and arguably easier to understand as well. No behavior change.	2023-12-02 22:26:13 +01:00
Nico Weber	d9fd72007e	LibPDF: Add a spec comment to SampledFunction::sample()	2023-12-02 22:26:13 +01:00
Idan Horowitz	aad5c58996	LibPDF: Eliminate reference cycle between OutlineItem parent/children Since all parents held a reference pointer to their children, and all children held reference pointers to their parents, both objects would never get free'd once the document was no longer being used. Fixes ossfuzz-63833.	2023-12-02 22:23:53 +01:00
Lucas CHOLLET	2a5cb5becb	LibCompress: Add `LZWDecoder::decode_all()` This method takes bytes as input and decompresses everything to a ByteBuffer. It uses two control codes (clear and end of data) as described in the GIF, TIFF and PDF specifications.	2023-12-01 12:58:14 +01:00
Nico Weber	f34da6396f	LibPDF: Update font size after getting font from cache Page 1 of 0000277.pdf does: BT 22 0 0 22 59 28 Tm /TT2 1 Tf (Presented at Photonics West OPTO, February 17, 2016) Tj ET BT 32 0 0 32 269 426 Tm /TT1 1 Tf (Robert W. Boyd) Tj ET BT 22 0 0 22 253 357 Tm /TT2 1 Tf (Department of Physics and) Tj ET BT 22 0 0 22 105 326 Tm /TT2 1 Tf (Max-Planck Centre for Extreme and Quantum Photonics) Tj ET Every line begins a text operation, then updates the font matrix, selects a font (TT2, TT1, TT2, TT1), draws some text and ends the text operation. `Tm` (which sets the font matrix) contains a scale, and uses that to update the font size of the currently-active font (cf #20084). But in this file, we `Tm` first and `Tf` (font selection) second, so this updates the size of the old font. So when we pull it out of the cache again on line 3, it would still have the old size from the `Tm` on line 2. (The whole text scaling logic in LibPDF imho needs a rethink; the current approach also causes issues with zero-width glyphs which currently lead to divisions by zero. But that's for another PR.) Fixes another regression from `c8510b58a3` (which I've accidentally referred to by 2340e834cd in another commit).	2023-11-26 19:05:13 -05:00
Nico Weber	eb1c99bd72	LibPDF+LibGfx: Make SMasks on jpeg images work SMasks are greyscale images that get used as alpha channel for a different image. JPEGs in PDFs are stored as streams with /DCTDecode filters, and we have a separate code path for loading those in the PDF renderer. That code path just calls our JPEG decoder, which creates bitmaps with format BGRx8888. So when we process an SMask for such a bitmap, we have to change the bitmap's format to BGRA8888 in addition to setting alpha values on all pixels.	2023-11-23 12:13:03 +01:00
Nico Weber	57e2b5ef59	LibPDF+Tests: Correctly decode text strings without explicit encoding	2023-11-22 09:08:06 -07:00
Nico Weber	e39a790c82	LibPDF: Stop converting encodings in object parser Per 1.7 spec 3.8.1, there are multiple logical text string types: * text strings * ASCII strings * byte strings Text strings can be in UTF-16BE, PDFDocEncoding, or (since PDF 2.0) UTF-8. But byte strings shouldn't be converted but treated as binary data. This makes us no longer convert strings used for drawing page text. TABLE 5.6 "Text-showing operators" lists the operands for text-showing operators as just "string", not "text string" (even though these strings confusingly are called "text strings" in the body text), so not doing this there is correct (and matches other viewers). We also no longer incorrectly convert strings used for crypto data (such as passwords), if they start with a UTF-16BE or UTF-8 marker. No behavior change for outlines and info dict entries. https://pdfa.org/understanding-utf-8-in-pdf-2-0/ has a good overview of this. (ASCII strings only contain ASCII characters and behave the same anyways.)	2023-11-22 09:08:06 -07:00
Nico Weber	14bcb5219d	LibPDF: Tolerate comments before drawing operators Necessary to be able to render https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf	2023-11-22 08:56:43 +00:00
Nico Weber	9e8cf4fc1a	LibPDF: Tolerate comment after last dict item Necessary to be able to open https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf	2023-11-22 08:56:43 +00:00
Nico Weber	4440452f92	LibPDF: Support images with 1, 2, 4 bits per pixel They just get upsampled to 8 bits per pixel images.	2023-11-18 07:33:15 +00:00
Nico Weber	bfe27228a3	LibPDF+LibGfx: Don't invert CMYK channels in JPEG data in PDFs This is a hack: Ideally we'd have a CMYK Bitmap pixel format, and we'd convert to rgb at blit time. Then we could also apply color profiles (which for CMYK images are CMYK-based). Also, the colors for our CMYK->RGB conversion are off for PDFs, and we have distinct codepaths for this in Gfx::Color (for paths) and JPEGs. So when we fix that, we'll have to fix it in two places. But this doesn't require a lot of code and it's a huge visual progression, so let's go with it for now.	2023-11-17 22:32:40 +00:00
Nico Weber	bd7ae7f91e	LibPDF: Consistently asciibetize CommonNames.h The file wasn't quite decided if it wanted to sort by ascii value or by case folding. Now it uses ascii value, thanks to vim's `:'<,'>sort`. No behavior change.	2023-11-17 20:27:42 +00:00
Nico Weber	29396415d5	LibPDF: Add an initial implementation of type 3 glyph rendering This is a very inefficient implementation: Every time a type 3 font glyph is drawn, we parse its operator stream and execute all the operators therein. We'll want to instead cache the glyphs in bitmaps (at least in most cases), like we do for other fonts. But it's a good first step, and all the coordinate math seems to work in the files I've tested. Good test files from pdfa dataset 0000.zip: - 0000559.pdf page 1 (and 2): Has a non-default font matrix; text appears mirrored if the font matrix isn't handled correctly - 0000425.pdf, page 1: Draws several glyphs in a single run; glyphs overlap if Renderer::render_type3_glyph() ignores the passed-in point - 0000211.pdf, any page: Uses type 3 glyphs for all text. Good perf test (already "reasonably fast") - 0000521.pdf, page 5 (or 7 or 16): The little red flag in the purple box is a type 3 font glyph, and it's colored (which in part means the first operator is `d0`, while all the other documents above use `d1`)	2023-11-17 19:47:53 +00:00
Nico Weber	14ddab5519	LibPDF: Stub out type3_font_set_glyph_width* Type 3 font glyphs begin with either `d0` or `d1`. If we bail out with an "unsupported" error on the very first operator in a glyph, we'll never paint the glyph. Just stub these out for now. We probably want to do more in here in the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).	2023-11-17 19:47:53 +00:00
Nico Weber	54c98a46d8	LibPDF: Correctly parse the d0 and d1 operators They are the first operator in a type 3 charproc. Operator.h already knew about them, but we didn't manage to parse them, since they're the only two operators that contain a digit.	2023-11-17 19:47:53 +00:00
Nico Weber	5513f8bbe3	LibPDF: Move ScopedState from a function on Renderer into Renderer No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	126a0be595	LibPDF: Pass Renderer to SimpleFont::draw_glyph() This makes it available in Type3Font::draw_glyph(). No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	bcc6439b5f	LibPDF: Pass Renderer to PDFFont::draw_string() It's a bit unfortunate that fonts need to know about the renderer, but type 3 fonts contain PDF drawing operators, so it's necessary. On the bright side, it makes it possible to pass fewer parameters around and compute things locally as needed. (As we implement more fonts, we'll probably want to create some functions to do these computations in a central place, eventually.) No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	e0c0864ddf	LibPDF: Load a few values off a type 3 font dictionary	2023-11-17 19:47:53 +00:00
Nico Weber	9632d8ee49	LibPDF: Make SimpleFont font matrix configurable Type 3 fonts can set it to a custom value.	2023-11-17 19:47:53 +00:00
Nico Weber	4cd1a2d319	LibPDF: Add some scaffolding for type 3 fonts	2023-11-17 19:47:53 +00:00
Nico Weber	7f999b1ff5	LibPDF: Sink m_base_font_name from PDFFont into subclasses /BaseFont is a required key for type 0, type 1, and truetype font dictionaries, but not for type 3 font dictionaries. This is mechanical; type 0 fonts don't even use this yet (but probably should). PDFFont::initialize() is now empty and could be removed, but maybe we'll put stuff there again later, so I'm leaving it around for a bit longer.	2023-11-17 19:47:53 +00:00
Nico Weber	6c1da5db54	LibPDF: Make SimpleFont::draw_glyph() fallible	2023-11-17 19:47:53 +00:00
Nico Weber	843e9daa8c	LibPDF: Remove unused PDFFont::type() This got added in #15270, but its one use then got removed again in #16150. No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	26fd29baf8	LibPDF: Give Type3 fonts a dedicated error message They're described in "5.5.4 Type 3 Fonts" in the PDF 1.7 spec, so we shouldn't `internal_error()` on them. They're just not implemented yet.	2023-11-17 19:47:53 +00:00
Nico Weber	5eaa403ddf	LibPDF: Use font dictionary object as cache key, not resource name In the main page contents, /T0 might refer to a different font than it might refer to in an XObject. So don't use the `Tf` argument as font cache key. Instead, use the address of the font dictionary object. Fixes false cache sharing, and also allows us to share cache entries if the same font dict is referred to by two different names. Fixes a regression from 2340e834cd (but keeps the speed-up intact).	2023-11-17 19:14:39 +01:00
Nico Weber	443b3eac77	LibPDF: Let decode_png_prediction() call LibGfx's unfilter_scanline() It's less code, but it also fixes a bug: The implementation in Filter.cpp used to use the previous byte as reference value, while we're supposed to use the value of the previous channel as reference (at least when a pixel is larger than one byte).	2023-11-17 19:09:50 +01:00
Nico Weber	145ade3a86	LibPDF: Remove a needless AK:: qualification No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	0416a07d56	LibPDF: Make filter byte not part of row in decode_png_prediction() No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	b763960fc2	LibPDF: Convert decode_png_prediction to use spans No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	588d6fab22	LibGfx+LibPDF: Create filter_type() for converting u8 to FilterType ...and use it in LibPDF. No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	7e4fe8e610	LibPDF: Use PNG::paeth_predictor() in png decoding path No behavior change. Ideally, the PDF code would just call a function PNGLoader to do the PNG unfiltering, but let's first try to make the implementations look more similar.	2023-11-17 19:09:50 +01:00
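
Note on 06b9633da5: a minimal sketch of the difference that commit describes between widening a low-bit-depth color sample and widening a palette index. The helper names `replicate_bits()` and `widen_palette_index()` are hypothetical and are not taken from LibPDF; the sketch assumes 1, 2 or 4 bits per component.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LibPDF code): widen an n-bit color sample
// (n = 1, 2 or 4) to 8 bits by repeating its bit pattern, so that the
// largest n-bit value maps to 255.
static uint8_t replicate_bits(uint8_t value, int bits_per_component)
{
    uint8_t result = 0;
    for (int shift = 8 - bits_per_component; shift >= 0; shift -= bits_per_component)
        result |= static_cast<uint8_t>(value << shift);
    return result;
}

// Hypothetical helper (not LibPDF code): a palette index is only
// zero-extended. Its numeric value must stay the same, because it is
// used to look up an entry in the palette.
static uint8_t widen_palette_index(uint8_t value)
{
    return value;
}
```

With these helpers, the 4-bit value 0b1101 becomes 0b11011101 (221) as a color sample but stays 13 as a palette index. Repeating the bits of an index would point far past the end of a 16-entry palette, which matches the "index out of range" errors mentioned in the commit message.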


662 commits
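
Note on 443b3eac77: a minimal sketch of the reference-byte rule behind the bug that commit mentions, shown for the PNG "Sub" filter only. In the PNG specification, each byte is reconstructed from the corresponding byte of the pixel to its left, that is, the byte `bytes_per_pixel` positions earlier, not simply the previous byte. The function `unfilter_sub()` is a hypothetical illustration and is not LibGfx's `unfilter_scanline()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not LibGfx code): undo the PNG "Sub" filter in place.
// `row` holds one scanline without the leading filter-type byte.
// The first pixel has no left neighbor, so its bytes are left unchanged.
static void unfilter_sub(std::vector<uint8_t>& row, size_t bytes_per_pixel)
{
    for (size_t i = bytes_per_pixel; i < row.size(); ++i)
        row[i] = static_cast<uint8_t>(row[i] + row[i - bytes_per_pixel]);
}
```

For a 3-byte RGB pixel format, the filtered row `{10, 20, 30, 1, 2, 3}` reconstructs to `{10, 20, 30, 11, 22, 33}`. Using a distance of 1 instead of 3 would add each byte to a different channel and give wrong colors whenever a pixel is larger than one byte.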