Issue
I currently have a single-page PDF (http://reljac.com/so_1/all.pdf), which is a basic scan of several paper receipts. If you look at the PDF, the text is clear and legible. The original is a scan of an 8.5" x 11" sheet of paper (which shouldn't matter).
I've created a very simple file to convert that PDF into a .jpg using this code:
<?php
$im = new imagick('all.pdf[0]');
$im->setImageFormat('jpg');
$im->setImageCompression(imagick::COMPRESSION_LOSSLESSJPEG);
$im->setImageCompressionQuality(80);
header('Content-Type: image/jpeg');
echo $im;
?>
When I run that (http://reljac.com/so_1/pdf_jpg.php) the resulting image is illegible.
I'm working off of two servers at the moment. One reports:
Version: ImageMagick 6.2.8 10/06/10 Q16 file:/usr/share/ImageMagick-6.2.8/doc/index.html
the other:
Version: ImageMagick 6.6.0-4 2012-05-02 Q16 http://www.imagemagick.org
Both servers produce a .jpg of similar quality.
I've changed several of the settings including:
$im->setImageCompressionQuality(40);
$im->setImageCompressionQuality(100);
$im->setImageCompressionQuality(80);
$im->setImageCompression(imagick::COMPRESSION_JPEG);
(various others from http://www.php.net/manual/en/imagick.constants.php)
I've tried adding $im->scaleImage(600,0);
Nothing seems to make anything more legible. I'd like the end result to be a legible .jpg of the original PDF - it does not have to fill the screen, it just needs to be legible. The original PDFs may be different sizes so I need to keep in mind that the source is not always 8.5" x 11".
Is there anything else I can do to enhance the quality of the resulting image or is this the best I should expect? Do I need to process these files in some other way to get a better image?
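UPDATE: Based on @VadimR's answer, I'm now using the following:
$src = 'all.pdf';
$src_parts = pathinfo($src);
// Extract the embedded scan as-is; pdfimages writes <filename>-000.pbm
shell_exec('pdfimages ' . escapeshellarg($src) . ' ' . escapeshellarg($src_parts['filename']));
// Downsample and sharpen the 1-bit scan, then save it as a JPEG
shell_exec('convert ' . escapeshellarg($src_parts['filename'] . '-000.pbm') . ' -resize 25% -sharpen -2 ' . escapeshellarg($src_parts['filename'] . '.jpg'));
$myImage = imagecreatefromjpeg($src_parts['filename'] . '.jpg');
header("Content-type: image/jpeg");
imagejpeg($myImage);
imagedestroy($myImage);
// Remove the intermediate file without spawning another shell
unlink($src_parts['filename'] . '-000.pbm');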
That results in a nice, legible image.
Solution
ImageMagick delegates PDF rendering to Ghostscript, so when troubleshooting, state the Ghostscript version as well as the ImageMagick version if necessary. Second, I think it's better to start on the command line and move the command into PHP code only once the quality is acceptable.
A command that gives acceptable quality (more or less):
convert -density 300 all.pdf out.jpg
Here we set the rendering resolution to 300 dpi. Note that this is not the same as
convert all.pdf -density 300 out.jpg
because in that case the page is rendered at 72 dpi, and the poor-quality result is then merely tagged as 300 dpi (i.e. without resampling).
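The same ordering matters in the PHP script from the question: the resolution has to be set before the PDF is read. A minimal sketch, assuming the Imagick extension is installed; renderPdfPage() is a hypothetical helper name, and the 300 dpi and quality 90 defaults are illustrative values only:

```php
<?php
// Sketch only: render one PDF page at a chosen density and return JPEG bytes.
// renderPdfPage() is a hypothetical helper, not part of Imagick itself.
function renderPdfPage(string $pdfPath, int $page = 0, int $dpi = 300, int $quality = 90): string
{
    $im = new Imagick();

    // Set the resolution BEFORE reading, like `convert -density 300 all.pdf ...`.
    // Setting it after the read would not re-render the page.
    $im->setResolution($dpi, $dpi);

    // "[0]" selects the first page, as in the original script.
    $im->readImage(sprintf('%s[%d]', $pdfPath, $page));

    $im->setImageFormat('jpeg');
    $im->setImageCompression(Imagick::COMPRESSION_JPEG);
    $im->setImageCompressionQuality($quality);

    $blob = $im->getImageBlob();
    $im->clear();

    return $blob;
}
```

To serve the result, the script would send header('Content-Type: image/jpeg') and then echo renderPdfPage('all.pdf'). Whether 300 dpi is enough depends on the scan, so the value is worth tuning on the command line first.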
However, I think a better approach is to extract the scan as-is, i.e. without any transformation:
pdfimages all.pdf all
That gives the image all-000.pbm: 1 bit per sample, 3424 x 4400 px. I definitely can't agree that the "text is clear and legible"; some digits can only be guessed.
Then use the convert command to resample it and maybe try to improve it, e.g.
convert all-000.pbm -resize 25% -sharpen 2 out.jpg
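The pixel sizes involved explain the difference in quality. The following sketch works through the arithmetic, assuming the scan covers the full 8.5" x 11" page mentioned in the question; the function names are hypothetical helpers introduced only for this illustration:

```php
<?php
// Back-of-the-envelope numbers for the sizes discussed above.
// ASSUMPTION: the scan covers the full 8.5" x 11" page.

// Pixel size of a page rendered at a given density (dpi).
function pixelsAtDensity(float $widthIn, float $heightIn, float $dpi): array
{
    return [(int) round($widthIn * $dpi), (int) round($heightIn * $dpi)];
}

// Effective resolution of an embedded scan, derived from its pixel size.
function effectiveDpi(int $widthPx, int $heightPx, float $widthIn, float $heightIn): array
{
    return [(int) round($widthPx / $widthIn), (int) round($heightPx / $heightIn)];
}

// Pixel size after a percentage resize such as `-resize 25%`.
function resizedBy(int $widthPx, int $heightPx, float $percent): array
{
    return [(int) round($widthPx * $percent / 100), (int) round($heightPx * $percent / 100)];
}

$default = pixelsAtDensity(8.5, 11, 72);      // render at 72 dpi
$at300   = pixelsAtDensity(8.5, 11, 300);     // render at 300 dpi
$scanDpi = effectiveDpi(3424, 4400, 8.5, 11); // the extracted all-000.pbm
$final   = resizedBy(3424, 4400, 25);         // after -resize 25%

printf("72 dpi render:  %d x %d px\n", ...$default);
printf("300 dpi render: %d x %d px\n", ...$at300);
printf("scan density:   about %d x %d dpi\n", ...$scanDpi);
printf("after 25%%:      %d x %d px\n", ...$final);
```

Under that assumption, the 72 dpi render has only 612 x 792 px to represent a scan that was stored at roughly 400 dpi, while the extracted image resized to 25% still has 856 x 1100 px.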
Answered By - user2846289 Answer Checked By - Marie Seifert (PHPFixing Admin)