Bug 1893 - tiff2pdf produces unreadable file if TIFF is compressed with JPEG
Status: RESOLVED LATER
Product: libtiff
Component: default
Version: 4.0.0
Platform: PC Linux
Importance: P2 normal
Keywords: migrated_to_gitlab
Reported: 2008-06-07 21:40
Modified: 2019-10-01 14:13


Attachments
8x8 grayscale JPEG-compressed TIFF (649 bytes, image/tiff)
2008-06-07 21:44, Jay Berkenbilt
200x200 grayscale JPEG-compressed TIFF (1.23 KB, image/tiff)
2008-06-07 21:45, Jay Berkenbilt
PDF generated from gray-200x200.tiff (1.99 KB, application/pdf)
2008-06-07 21:45, Jay Berkenbilt
Debian user's original TIFF file from Debian bug report (235.63 KB, text/plain)
2008-06-07 22:38, Jay Berkenbilt
PDF generated from gray-200x200.tiff -- updated to 4.0.0 beta 2 (1.98 KB, application/pdf)
2008-06-07 22:41, Jay Berkenbilt



Description From 2008-06-07 21:40:03
This bug was filed in the previous Bugzilla.  I'm resubmitting it here.

This is Debian bug 425778.  For the original bug report, please see
http://bugs.debian.org/425778.  It is a complete and well-written bug report.

In that report, the user has a specific JPEG-compressed TIFF which, when
converted to PDF, results in a PDF with a bad image.  I tried to create a
smaller TIFF file, with interesting results.  The first one was just an 8x8
gray square.  Running it through tiff2pdf resulted in a core dump:

tiff2pdf -o gray-8x8.pdf gray-8x8.tiff

The second one was a 200x200 gray square.  It displays properly in Acrobat
Reader, with warnings, but the PDF file is invalid.  I'm attaching my two TIFF
files and the broken PDF.  For the original file, please see the Debian bug
report or grab the TIFF file from

http://eppesuigoccas.homedns.org/~giuseppe/libtiff-tools.tiff2pdf.bug.tar.bz2
------- Comment #1 From 2008-06-07 21:44:39 -------
Created an attachment (id=232)
8x8 grayscale JPEG-compressed TIFF
------- Comment #2 From 2008-06-07 21:45:19 -------
Created an attachment (id=233)
200x200 grayscale JPEG-compressed TIFF
------- Comment #3 From 2008-06-07 21:45:56 -------
Created an attachment (id=234)
PDF generated from gray-200x200.tiff
------- Comment #4 From 2008-06-07 22:01:19 -------
Testing this with 4.0.0 beta 2, I still see the crash on the 8x8 image, but the
200x200 image generates a valid PDF file.  The color is not correct, but I will
be reporting that problem in a separate bug report.
------- Comment #5 From 2008-06-07 22:36:50 -------
Never mind about the separate bug report.  The original user's tiff file still
doesn't work with 4.0.0 beta 2.  I will attach the user's file here.
------- Comment #6 From 2008-06-07 22:38:37 -------
Created an attachment (id=235)
Debian user's original TIFF file from Debian bug report
------- Comment #7 From 2008-06-07 22:39:51 -------
I'm replacing gray-200x200.pdf with one generated by 4.0.0 beta 2.  The file no
longer has a corrupted xref table, but it shows as all white instead of gray.
------- Comment #8 From 2008-06-07 22:41:23 -------
Created an attachment (id=236)
PDF generated from gray-200x200.tiff -- updated to 4.0.0 beta 2
------- Comment #9 From 2009-12-29 01:17:57 -------
See bug 2135.  That may help.
------- Comment #10 From 2010-08-29 16:11:41 -------
This problem, or something much like it, also exists in 3.9.4 according to
https://bugzilla.redhat.com/show_bug.cgi?id=628261

I thought at first that tiff2pdf might be choking on YCbCr input, but it fails
in the same way with grayscale JPEG input.
------- Comment #11 From 2010-08-29 18:41:30 -------
I looked into the tiff2pdf.c source code, and found that it's not surprising
that it fails to convert most JPEG-compressed TIFFs; rather, the astonishing
thing is that there are any cases at all where it appears to work even a little
bit :-(

The basic problem is that it's got an unworkable scheme for sewing together
multiple JPEG-compressed strips into a single output JPEG stream. 
t2p_process_jpeg_strip tries to do that by editorializing on the strip marker
contents, but it does so quite incompetently.  It will end up emitting SOI and
EOI markers for each strip, not just one pair, which I think is the core thing
that's making most readers fall over; although emitting DHT and DQT markers in
the midst of compressed data isn't legal per spec either, and it also fails
entirely on markers longer than 255 bytes, and it won't work at all if the
incoming data uses restart markers, and there are probably some other bugs in
that comment-free excuse for code as well.  Now these points (other than the
restart issue) could probably be fixed up with a little bit of hacking, but it
is still fundamentally Not Gonna Work unless all the strips use identical
DQT/DHT definitions --- an assumption explicitly outlawed in TIFF Tech Note #2.
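
To make the marker problems above concrete, here is a minimal diagnostic sketch in Python.  It is not taken from tiff2pdf.c; the names scan_markers and segment are made up for illustration.  It walks a JPEG byte stream, reads each segment length as the 16-bit big-endian value the JPEG format uses (so segments longer than 255 bytes are handled), and reports every marker, which makes repeated SOI/EOI pairs or DQT/DHT segments after the first scan easy to spot.

```python
import struct

# Markers that stand alone and carry no length field:
# TEM (0x01), RST0-RST7 (0xD0-0xD7), SOI (0xD8), EOI (0xD9).
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def scan_markers(data):
    """Return a list of (marker_code, segment_length) found in data.

    segment_length is the 16-bit length field (which counts itself but
    not the 0xFF marker bytes), or 0 for standalone markers.
    """
    markers = []
    i, n = 0, len(data)
    while i + 1 < n:
        if data[i] != 0xFF:
            i += 1                      # entropy-coded data or stray byte
            continue
        code = data[i + 1]
        if code == 0x00:
            i += 2                      # 0xFF00 is a stuffed data byte
            continue
        if code == 0xFF:
            i += 1                      # fill byte before a marker
            continue
        if code in STANDALONE:
            markers.append((code, 0))
            i += 2
            continue
        if i + 4 > n:
            break                       # truncated segment header
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        markers.append((code, length))
        i += 2 + length                 # skip the marker and its payload
    return markers


def segment(code, payload):
    """Build one marker segment (helper for constructing test input)."""
    return b"\xFF" + bytes([code]) + struct.pack(">H", len(payload) + 2) + payload


# A synthetic "strip": SOI, DQT, a DHT longer than 255 bytes, SOS, data, EOI.
strip = (b"\xFF\xD8" + segment(0xDB, bytes(65)) + segment(0xC4, bytes(416))
         + segment(0xDA, bytes(6)) + b"\x12\xFF\x00\x34" + b"\xFF\xD9")

# Two strips glued together naively give two SOI/EOI pairs in one stream.
glued = scan_markers(strip + strip)
soi_count = sum(1 for code, _ in glued if code == 0xD8)
```

Running scan_markers over the image stream extracted from a generated PDF should show at a glance whether more than one SOI/EOI pair was emitted.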

I'm not real sure if it's worth the marginal hacking to make it work when that
assumption does hold, which it probably does for the vast majority of
real-world JPEG TIFFs.  A proper fix would involve restructuring so that each
strip is emitted as a separate PDF image object with only minimal modification
of the JPEG datastreams, the way tiles are handled.  I find this code
sufficiently unreadable that I haven't tried hard to see what that would take.

BTW, the tile case is hardly problem-free either, see bug #1960.
------- Comment #12 From 2010-08-30 00:09:54 -------
I was using single strip TIFFs which is probably why it's working for me with
3.9.4.
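
For anyone wanting to check whether a given file is in the single-strip case, the following is a rough sketch that reads only the first IFD of a classic (non-BigTIFF) file and reports the compression scheme and the number of strips or tiles.  The function name first_ifd_summary is hypothetical; the tag numbers (259 Compression, 273 StripOffsets, 324 TileOffsets) and the JPEG compression value 7 come from the TIFF specification.  It assumes a well-formed file and does no bounds checking.

```python
import struct


def first_ifd_summary(data):
    """Summarize compression and strip/tile counts of the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (n_entries,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])

    # Compression defaults to 1 (none) when the tag is absent.
    summary = {"compression": 1, "strips": 0, "tiles": 0}
    for k in range(n_entries):
        start = ifd_offset + 2 + 12 * k          # each entry is 12 bytes
        tag, _ftype, count = struct.unpack(order + "HHI", data[start:start + 8])
        if tag == 259:                           # Compression (SHORT, inline)
            (summary["compression"],) = struct.unpack(
                order + "H", data[start + 8:start + 10])
        elif tag == 273:                         # StripOffsets: one per strip
            summary["strips"] = count
        elif tag == 324:                         # TileOffsets: one per tile
            summary["tiles"] = count
    return summary


# Tiny in-memory example: little-endian header, one IFD with two entries.
example = (b"II" + struct.pack("<HI", 42, 8)
           + struct.pack("<H", 2)
           + struct.pack("<HHIHH", 259, 3, 1, 7, 0)   # Compression = 7 (JPEG)
           + struct.pack("<HHII", 273, 4, 3, 100)     # three strips
           + struct.pack("<I", 0))                    # no next IFD
info = first_ifd_summary(example)
multi_strip_jpeg = info["compression"] == 7 and info["strips"] > 1
```

Going by comment #11, files for which multi_strip_jpeg is true are the ones most likely to hit this bug.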
------- Comment #13 From 2010-12-12 17:39:10 -------
Until this multi-strip/tile problem is fixed, using the -n option can be a
workaround (although recompressing with JPEG is a problem; see bug 2150).
------- Comment #14 From 2016-08-02 04:28:13 -------
(In reply to comment #11)
> The basic problem is that it's got an unworkable scheme for sewing together
> multiple JPEG-compressed strips into a single output JPEG stream.

Hmm, isn't the right thing to do to embed each tile/strip as a separate image
object in the PDF and position them next to each other on the page? This is
probably the only way to go without recompression, but it will have
interpolation artifacts on the tile boundaries.

Opinions?
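
As a starting point for discussion, here is a small Python sketch of the placement arithmetic for the one-image-object-per-strip approach.  It is only an illustration, not code from tiff2pdf: the function names and the /Im1, /Im2, ... resource names are hypothetical, and it assumes the image fills the whole page.  TIFF strips run top to bottom while PDF user space has its origin at the bottom left, so each strip's bottom edge is measured from the top of the page.

```python
def strip_placements(image_height, rows_per_strip, page_width, page_height):
    """Return (x, y, width, height) in page units for each strip, top first."""
    scale = page_height / image_height        # page units per pixel row
    placements = []
    row = 0
    while row < image_height:
        rows = min(rows_per_strip, image_height - row)   # last strip may be short
        bottom = page_height - (row + rows) * scale
        placements.append((0.0, bottom, page_width, rows * scale))
        row += rows
    return placements


def content_stream(placements):
    """Emit one 'q ... cm /ImN Do Q' line per strip."""
    lines = []
    for k, (x, y, w, h) in enumerate(placements, start=1):
        # The cm matrix scales the unit square to w x h and moves it to (x, y).
        lines.append(f"q {w:.4f} 0 0 {h:.4f} {x:.4f} {y:.4f} cm /Im{k} Do Q")
    return "\n".join(lines)


# 200 rows in strips of 64 rows on a 200x200 page: three full strips plus
# a short one of 8 rows at the bottom.
placements = strip_placements(200, 64, 200, 200)
stream = content_stream(placements)
```

Because adjacent rectangles share exact edges here, any seams a viewer shows would come from image interpolation at the strip boundaries rather than from gaps in the layout.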
------- Comment #15 From 2019-10-01 14:13:54 -------
Bugzilla is no longer used for tracking libtiff issues. Remaining open tickets,
such as this one, have been migrated to the libtiff GitLab instance at
https://gitlab.com/libtiff/libtiff/issues .

The migrated tickets have their summary prefixed with [BZ#XXXX] where XXXX is
the initial Bugzilla issue number.