Again, just documenting some code for myself in case I find myself in this situation 10 years from now, and happen to be googling my blog for how to convert a PDF to a JPG/JPEG/anything really.
Let’s start off where this ordeal started… with this simple line of code:
convert source.pdf output.jpg
Jam that into a PHP exec statement, and you’ve got yourself some basic PDF to JPG conversion going on… However, there will be some issues. The first one I ran into was quality (okay, so honestly the first one was the mistake of trying to use the Imagick() object in PHP and assuming it had all the power of the command-line version easily accessible. It doesn’t, as far as I can tell). Quality was easy enough to fix. The setting that made the difference for me was density, which sets the resolution (in DPI) the PDF is rasterized at; the default is only 72:
convert -density 300 source.pdf output.jpg
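Since this command ends up inside an exec call, it can help to wrap it so the density is a parameter and gets checked before it reaches the shell. A rough sketch, not the original setup: the function name pdf_to_jpg and the digits-only check are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): rasterize a PDF to JPG
# at a given density. Usage: pdf_to_jpg SOURCE.pdf OUTPUT.jpg [DENSITY]
pdf_to_jpg() {
    src=$1
    out=$2
    density=${3:-300}   # DPI; 300 is the value used in the post

    # Reject anything that is not purely digits before passing it on.
    case $density in
        ''|*[!0-9]*)
            echo "pdf_to_jpg: density must be a whole number" >&2
            return 2
            ;;
    esac

    # -density has to come before the input file, because it controls how
    # the PDF is read (rasterized), not how the JPG is written.
    convert -density "$density" "$src" "$out"
}
```

Calling pdf_to_jpg source.pdf output.jpg runs the same command as above; a third argument such as 150 trades quality for speed and file size.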
That worked great… Until the client uploaded a PDF they had cropped in Adobe Acrobat. Strangely, the ImageMagick output still showed white where the client had cropped. Rather than explain to the client how PDFs have a trimbox, cropbox, bleedbox, and artbox that can all be “cropped”, I decided the best course of action was to tell ImageMagick to use the cropbox instead of its default (which, as far as I can tell, is the full mediabox rather than the trimbox):
convert -define pdf:use-cropbox=true -density 300 source.pdf output.jpg
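To see which boxes a given PDF actually defines before blaming the converter, pdfinfo (from the same Poppler/Xpdf tools as pdftops) can print them with its -box option. A small sketch, assuming pdfinfo is installed and that it prints each box on a line starting with the box name; the function names filter_boxes and show_boxes are made up for illustration.

```shell
#!/bin/sh
# Keep only the page-box lines from `pdfinfo -box` output read on stdin.
# Assumes each box is reported on its own line, e.g. "CropBox:  36 36 576 756".
filter_boxes() {
    grep -E '^(MediaBox|CropBox|BleedBox|TrimBox|ArtBox):'
}

# Hypothetical convenience wrapper: print the boxes of a PDF file.
show_boxes() {
    pdfinfo -box "$1" | filter_boxes
}
```

If show_boxes source.pdf reports a CropBox smaller than the MediaBox, the file was cropped the way Acrobat does it, and the use-cropbox define is the one that matters.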
And, again, the people rejoiced… Until the client managed to find a way to really stump me. They uploaded a PDF that contained a particular shade of green. This green went from being a nice, tree-like green to an insanely bright neon green when converted from a PDF to a JPG. I knew this was most likely a color profile issue… which in the past has always proved to be a problem for me.
Color profiles have this fun way of not always behaving the way you want them to. I tried various things to get the color profiles to behave consistently upon conversion, but no matter what I did, nothing worked for every file. So I got a little more creative… I decided to try some in-between conversions… I ended up finding something that worked by sending it through a PostScript (PS) file. Here are the final 2 lines I’m now using:
pdftops -paper letter -expand source.pdf inbetween.ps
convert -density 300 inbetween.ps output.jpg
pdftops handles the cropbox on its own, so no need for the cropbox flag anymore. I’m not sure the density flag is still needed, but I left it in anyway.
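The two commands above can be chained so the second only runs if the first succeeds and the in-between file gets cleaned up either way. A sketch under a few assumptions: the function name pdf_to_jpg_via_ps is hypothetical, mktemp is available, and the ps: prefix is added so ImageMagick treats the extensionless temp file as PostScript.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the original post) around the two-step
# PDF -> PS -> JPG conversion. Usage: pdf_to_jpg_via_ps SOURCE.pdf OUTPUT.jpg
pdf_to_jpg_via_ps() {
    src=$1
    out=$2

    # Temporary PostScript file; the template name is arbitrary.
    ps=$(mktemp "${TMPDIR:-/tmp}/inbetween.XXXXXX") || return 1

    # Run convert only if pdftops succeeded.
    if pdftops -paper letter -expand "$src" "$ps" &&
       convert -density 300 "ps:$ps" "$out"; then
        status=0
    else
        status=1
    fi

    # Remove the in-between file whether or not the conversion worked.
    rm -f "$ps"
    return "$status"
}
```

From PHP, the exit status of this function (or of a script that calls it) is what exec hands back in its result code argument, so a failed pdftops step does not silently produce an empty JPG.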
Comments on this entry are closed.
First off, hurrah, a new post!
ImageMagick sometimes feels like ffmpeg… you’re pretty sure it will do the thing, but man it takes a while to put together an effective incantation.
FWIW, my current favorite use of ffmpeg is this: ffmpeg -i input.mp4 -c:a copy -vn -sn output.m4a
I use that one on songs I have downloaded from YouTube: it essentially demuxes out the audio track… so no re-encode. Very fast and no loss in quality!
And I totally agree about Imagick… the worst documentation I have ever seen. Having an image object to manipulate is nice, but it is down the rabbit hole if you want to do something they haven’t thought of.