Monday, September 19, 2022

[FIXED] How do I create a better quality image converting a .pdf to a .jpg with Imagick/PHP?

Issue

I currently have a single-page PDF (http://reljac.com/so_1/all.pdf) which is a basic scan of several paper receipts. If you look at the PDF, the text is clear and legible. The original is a scan of an 8.5" x 11" sheet of paper (this shouldn't matter).

I've created a very simple file to convert that PDF into a .jpg using this code:

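<?php
    // Read page 1 of the PDF. No resolution is set before reading,
    // so Ghostscript renders the page at its default 72 dpi.
    $im = new Imagick('all.pdf[0]');
    $im->setImageFormat('jpg');
    $im->setImageCompression(Imagick::COMPRESSION_LOSSLESSJPEG);
    $im->setImageCompressionQuality(80);
    header('Content-Type: image/jpeg');
    echo $im; // Imagick's string conversion returns the encoded image blob
?>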

When I run that (http://reljac.com/so_1/pdf_jpg.php) the resulting image is illegible.

I'm working off of two servers at the moment, one tells me:

Version: ImageMagick 6.2.8 10/06/10 Q16 file:/usr/share/ImageMagick-6.2.8/doc/index.html

the other:

Version: ImageMagick 6.6.0-4 2012-05-02 Q16 http://www.imagemagick.org

Both servers produce a .jpg of similar quality.

I've changed several of the settings, and I've also tried adding $im->scaleImage(600,0);

Nothing seems to make anything more legible. I'd like the end result to be a legible .jpg of the original PDF - it does not have to fill the screen, it just needs to be legible. The original PDFs may be different sizes so I need to keep in mind that the source is not always 8.5" x 11".

Is there anything else I can do to enhance the quality of the resulting image or is this the best I should expect? Do I need to process these files in some other way to get a better image?

UPDATE: Based on @VadimR's answer, I'm now using the following:

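<?php
$src = 'all.pdf';
$src_parts = pathinfo($src);
$root = $src_parts['filename'];

// Extract the embedded scan as-is (produces e.g. all-000.pbm).
shell_exec('pdfimages ' . escapeshellarg($src) . ' ' . escapeshellarg($root));
// Downsample to 25% and sharpen slightly (same settings as in the answer).
shell_exec('convert ' . escapeshellarg($root . '-000.pbm') . ' -resize 25% -sharpen 2 ' . escapeshellarg($root . '.jpg'));

// Send the resulting JPEG to the browser.
$myImage = imagecreatefromjpeg($root . '.jpg');
header("Content-type: image/jpeg");
imagejpeg($myImage);
imagedestroy($myImage);

// Remove the intermediate bitmap.
unlink($root . '-000.pbm');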

That results in a nice, legible image.


Solution

ImageMagick delegates PDF rendering to Ghostscript, so when troubleshooting, report not only the ImageMagick (IM) version but, where relevant, the Ghostscript (GS) version too. Second, I think it's better to start on the command line, and only once acceptable quality is achieved, move the settings into PHP code.
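Both versions can be collected from PHP in one place. This is a minimal sketch, assuming a Unix-like server where the Ghostscript binary is on the PATH as `gs` and `shell_exec()` is not disabled; `delegate_versions()` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: report the ImageMagick build used by the Imagick
// extension together with the Ghostscript version that renders PDFs for it.
// Assumes a Unix-like shell and a Ghostscript binary named `gs` on the PATH.
function delegate_versions(): array
{
    // Imagick::getVersion() returns an array whose 'versionString' key holds
    // a string such as "ImageMagick 6.6.0-4 2012-05-02 Q16 ...".
    $imagemagick = class_exists('Imagick')
        ? Imagick::getVersion()['versionString']
        : null;

    // `gs --version` prints just the version number; stderr is discarded.
    $gs = shell_exec('gs --version 2>/dev/null');
    $ghostscript = (is_string($gs) && trim($gs) !== '') ? trim($gs) : null;

    return ['imagemagick' => $imagemagick, 'ghostscript' => $ghostscript];
}

print_r(delegate_versions());
```

A `null` entry means that component could not be found from PHP, which is itself useful when the two servers behave differently.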

A command that gives acceptable quality (more or less):

convert -density 300 all.pdf out.jpg

Here we set the rendering resolution to 300 dpi. Note that this is not the same as

convert all.pdf -density 300 out.jpg

because there the PDF is rendered at the default 72 dpi, and the resulting low-quality image is then simply tagged as 300 dpi (i.e. without resampling).
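In Imagick the same ordering rule applies: the resolution has to be set on an empty object before the PDF is read. The sketch below assumes the Imagick extension and Ghostscript are installed. `render_pdf_page()` and `density_for_width()` are hypothetical helpers, and the JPEG quality of 85 is an arbitrary choice. Since the source PDFs vary in size, the second helper picks a density from the page width in PDF points (72 pt = 1 inch) and a desired pixel width.

```php
<?php
// Hypothetical helper: resolution (dpi) needed so that a page $pageWidthPt
// PDF points wide (72 pt = 1 inch) renders at least $targetWidthPx pixels wide.
function density_for_width(float $pageWidthPt, int $targetWidthPx): int
{
    return (int) ceil($targetWidthPx * 72 / $pageWidthPt);
}

// Hypothetical helper: rough PHP counterpart of
//   convert -density N in.pdf[page] out.jpg
// Assumes the Imagick extension and Ghostscript are installed.
function render_pdf_page(string $pdf, string $jpg, int $dpi = 300, int $page = 0): void
{
    $im = new Imagick();
    // Like -density placed before the input file: this must be set BEFORE
    // the PDF is read, otherwise the page is rendered at the 72 dpi default.
    $im->setResolution($dpi, $dpi);
    $im->readImage($pdf . '[' . $page . ']');
    $im->setImageFormat('jpeg');
    $im->setImageCompression(Imagick::COMPRESSION_JPEG);
    $im->setImageCompressionQuality(85); // arbitrary; tune as needed
    $im->writeImage($jpg);
    $im->clear();
}
```

For an 8.5" x 11" page (612 pt wide), `density_for_width(612, 2550)` returns 300, so `render_pdf_page('all.pdf', 'out.jpg', density_for_width(612, 2550))` corresponds to the first command above. How the page width is obtained for other documents is left open here.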

But I think a better approach may be to extract the scanned images as-is, i.e. without any transformation:

pdfimages all.pdf all

That gives an all-000.pbm image: 1 bit per sample, 3424 x 4400 px. I definitely can't agree that the "text is clear and legible"; some digits can only be guessed.
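From PHP, the extraction step can be wrapped so that file names are shell-escaped and the generated files are collected afterwards. This is a sketch under assumptions: `pdfimages` (from poppler) is installed and on the PATH, and it names its output `<root>-000.pbm`, `<root>-001.pbm` and so on, as in the example above. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: build the pdfimages command with every argument
// escaped, so names containing spaces or shell metacharacters are safe.
function pdfimages_command(string $pdf, string $root): string
{
    return 'pdfimages ' . escapeshellarg($pdf) . ' ' . escapeshellarg($root);
}

// Hypothetical helper: run pdfimages and return the files it produced
// (e.g. all-000.pbm, all-001.pbm, ...), sorted by name.
function extract_pdf_images(string $pdf, string $root): array
{
    exec(pdfimages_command($pdf, $root), $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("pdfimages failed with exit code $status");
    }
    // Match "<root>-NNN.<ext>"; glob() may return false on some systems.
    $files = glob($root . '-[0-9][0-9][0-9]*.*') ?: [];
    sort($files);
    return $files;
}
```

Returning the list of files instead of hard-coding `-000.pbm` keeps the code working when a PDF contains more than one embedded image.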

Then use the convert command to resample the image and perhaps improve it, e.g.:

convert all-000.pbm -resize 25% -sharpen 2 out.jpg
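For the 8.5" x 11" page in the question, the 3424 x 4400 px scan is roughly 400 dpi (4400 px / 11 in = 400), so a 25% resize brings it to about 100 dpi, or 856 x 1100 px. Below is a sketch of doing the same resample-and-sharpen step with Imagick instead of shelling out. It assumes the Imagick extension is installed. The Lanczos filter choice and the reading of `-sharpen 2` as radius 2 with a default sigma of 1 are assumptions, so the output may not be pixel-identical to `convert`. `scaled_size()` and `downsample_scan()` are hypothetical names.

```php
<?php
// Hypothetical helper: pixel size after scaling by a percentage,
// rounded to the nearest pixel and never below 1 px.
function scaled_size(int $width, int $height, float $percent): array
{
    $f = $percent / 100;
    return [max(1, (int) round($width * $f)), max(1, (int) round($height * $f))];
}

// Hypothetical helper: approximate Imagick counterpart of
//   convert in.pbm -resize 25% -sharpen 2 out.jpg
function downsample_scan(string $in, string $out, float $percent = 25.0): void
{
    $im = new Imagick($in);
    [$w, $h] = scaled_size($im->getImageWidth(), $im->getImageHeight(), $percent);
    // A filtered reduction turns the 1-bit pixels into gray levels.
    $im->resizeImage($w, $h, Imagick::FILTER_LANCZOS, 1);
    // Assumption: `-sharpen 2` means radius 2 with sigma defaulting to 1.
    $im->sharpenImage(2, 1);
    $im->setImageFormat('jpeg');
    $im->writeImage($out);
    $im->clear();
}
```

Usage would be `downsample_scan('all-000.pbm', 'out.jpg');`, after which the JPEG can be sent to the browser as in the question's update.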


Answered By - user2846289
Answer Checked By - Marie Seifert (PHPFixing Admin)
