1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-20 07:05:08 +00:00
Commit graph

10 commits

Author SHA1 Message Date
Nico Weber
af20ebe4a0 LibGfx/JBIG2: Scan for page size of page "1"
Sounds like the spec guarantees that that's the number of the first
page.

(In practice, all but one of all jbig2 files I've found contain just
page 1. PDFs almost always contain just page 1, and very rarely a
page 0 for globally shared parameters.)
2024-03-10 10:10:55 -04:00
Nico Weber
8f4930f2df LibGfx/JBIG2: Scan for the first PageInformation segment and decode it
This allows `file` to correctly print the dimensions of a .jbig2 file,
and it allows us to write a test that covers much of all the code
written so far.
2024-03-09 16:01:22 +01:00
Nico Weber
1eaaa8c3e9 LibPDF+LibGfx: Support JBIG2s with /JBIG2Globals set
Several ramifications:

* /JBIG2Globals is an indirect reference, which means we now need
  a Document for unfiltering. (Technically, other decode parameters
  can also be indirect objects and we should use the Document to
  resolve() those too, but in practice it only seems to be needed
  for /JBIG2Globals.)

* Since /JBIG2Globals are so rare, we just parse once for each
  image that use them, and decode_embedded() now receives a
  Vector<ReadonlyBytes> with all sections of sequences of
  segments.

* Internally, decode_segment_headers() is now called several times
  for embedded JBIG2s with multiple such sections (e.g. PDFs with
  /JBIG2Globals).

* That means `data` is now no longer part of JBIG2LoadingContext
  and things get slightly reshuffled due to this.

This completes the LibPDF part of JBIG2 support. Once LibGfx
implements actual decoding of JBIG2s, things should start to
Just Work in PDFs.
2024-03-09 16:01:22 +01:00
Nico Weber
09ca66cb8b LibGfx/JBIG2: Scan for end of immediate generic regions of unknown size
Found e.g. in 0000033.pdf (both pages).
2024-03-09 16:01:22 +01:00
Nico Weber
379ef45688 LibGfx/JBIG2: Store location of segment data bodies
They're in different places for Sequential/Embedded (right after
the header) and RandomAccess (which has all headers first, followed
by all data bits next).

We don't do anything with the data yet, but now everything's in
place to actually process segment data.
2024-03-09 16:01:22 +01:00
Nico Weber
953f6c5d9b LibPDF+LibGfx: Pass jbig2-filtered data to JBIG2ImageDecoderPlugin
Except for /JBIG2Globals, which we bail out on for now. In my 1000
files, 13 use JBIG2, and of those, 2 use JBIG2Globals (0000372.pdf e.g.
page 11 and 0000857.pdf e.g. page 1), and only one (the latter) of the
two uses the same JBIG2Globals stream for more than a single image.

JBIG2ImageDecoderPlugin cannot decode the data yet, so no behavior
change, but with `#define JBIG2_DEBUG 1` at the top of that file,
it now prints segment header info for PDFs containing JBIG2 data :^)
2024-03-09 16:01:22 +01:00
Nico Weber
b1fdc33a22 LibGfx/JBIG2: Decode all segment headers
With `#define JBIG2_DEBUG 1` at the top of the file:

    % Build/lagom/bin/image --no-output \
       .../JBIG2_ConformanceData-A20180829/F01_200_TT10.jb2
    JBIG2LoadingContext: Organization: 0 (Sequential)
    JBIG2LoadingContext: Number of pages: 1
    Segment number: 0
    Segment type: 48
    Referred to segment count: 0
    Segment page association: 1
    Segment data length: 19
    Segment number: 1
    Segment type: 39
    Referred to segment count: 0
    Segment page association: 1
    Segment data length: 12666
    Segment number: 2
    Segment type: 49
    Referred to segment count: 0
    Segment page association: 1
    Segment data length: 0
    Runtime error: JBIG2ImageDecoderPlugin: Draw the rest of the owl
2024-03-09 16:01:22 +01:00
Nico Weber
177664cfae LibGfx/JBIG2: Add an initial decode_segment_header()
With `#define JBIG2_DEBUG 1` at the top of the file:

    % Build/lagom/bin/image --no-output \
        .../JBIG2_ConformanceData-A20180829/F01_200_TT10.jb2
    JBIG2LoadingContext: Organization: 0 (Sequential)
    JBIG2LoadingContext: Number of pages: 1
    Segment number: 0
    Segment type: 48
    Referred to segment count: 0
    Segment page association: 1
    Segment data length: 19
    Runtime error: JBIG2ImageDecoderPlugin: Draw the rest of the owl
2024-03-09 16:01:22 +01:00
Nico Weber
5cefcad2fe LibGfx/JBIG2: Decode the file header
Running `image` with `#define JBIG2_DEBUG 1` now prints number of pages.
2024-03-09 16:01:22 +01:00
Nico Weber
58838db445 LibGfx: Add the start of a JBIG2 loader
JBIG2 is infamous for two things:

1. It's used in xerox scanners were it falsifies scanned numbers:

https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning

2. It was allegedly used in an iOS zero day, in a very cool way:

https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html

Needless to say, we need support for it in Serenity. (...because it's
used in PDF files.)

This adds all the scaffolding, but no actual implementation yet.

It's enough for `file` to print the mime type of .jb2 files, but `image`
can't do anything with the files yet.
2024-03-09 16:01:22 +01:00