A collection of command line solutions for different PDF‑manipulation use‑cases, such as:
Due to the nature of the topic, this post is (and probably will remain) a work‑in‑progress.
sudo port install mupdf
Example use‑case: Obvious.
Conclusion: Always optimize PDF in Acrobat PDF Optimizer, then squeeze it with cpdf squeeze!
cpdf -squeeze "in.pdf" [-squeeze-no-recompress] -o "out.pdf"
pdfcpu optimize "in.pdf" "out.pdf"
mutool clean -gggg -l -d -z -s "in.pdf" "out.pdf"
mutool clean has many options and it takes some experimentation to see what actually shrinks the PDF size:
‑gggg (Garbage collect unused objects / compact xref table / merge duplicate objects / check streams for duplication): the first three g's do not affect the file size for me, and the fourth sometimes makes it a bit larger and sometimes a bit smaller. I usually use the whole ‑gggg parameter.
‑l (Linearize PDF): this makes the PDF ready for “Fast web view”, but makes it slightly larger. Note that MuPDF 1.26.0 removes linearisation support, so it does not really make sense to use it.
‑d (Decompress streams): this decompresses the whole file, which makes it larger – but when combined with ‑z (Deflate uncompressed streams), it allows better PDF compression.
‑z (Deflate uncompressed streams): this is the most important switch, and when combined with ‑d (Decompress streams) it lowers PDF size even more.
‑f (Compress font streams): no effect for me.
‑i (Compress image streams): no effect for me.
‑c (Clean content streams): no effect for me.
‑s (Sanitize content streams): this actually lowers file size for me.
‑AA (Recreate appearance streams for annotations): no effect for me (and I am not sure what it really does – at least while there are no annotations, it does not affect PDF file size at all).
pdftk compress is not suitable for the task of really shrinking the PDF size to a minimum, as the developer himself claims:
pdftk "in.pdf" output "out.pdf" compress
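Since the effect of each tool and switch varies from file to file, it helps to compare the resulting sizes directly. The following is a minimal sketch of such a comparison, not part of any of the tools above; the function names (file_size, smallest_file, size_report) are made up for illustration, and it assumes you have written each tool's output to its own file.

```shell
#!/bin/sh
# Print the size of a file in bytes (works with both BSD and GNU wc).
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Print the path of the smallest of the given files.
smallest_file() {
    best=""
    best_size=""
    for f in "$@"; do
        [ -f "$f" ] || continue
        size=$(file_size "$f")
        if [ -z "$best" ] || [ "$size" -lt "$best_size" ]; then
            best=$f
            best_size=$size
        fi
    done
    printf '%s\n' "$best"
}

# Print "size<TAB>path" for every existing candidate, smallest first.
size_report() {
    for f in "$@"; do
        [ -f "$f" ] && printf '%s\t%s\n' "$(file_size "$f")" "$f"
    done | sort -n
}
```

For example, after saving the outputs under hypothetical names such as out_cpdf.pdf, out_pdfcpu.pdf and out_mutool.pdf, running size_report "in.pdf" "out_cpdf.pdf" "out_pdfcpu.pdf" "out_mutool.pdf" lists them from smallest to largest.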
N/A (not tested yet)
Acrobat PDF Optimizer has tons of options. After thorough testing, the best settings usually seem to be these:
Example use‑case: you have (scanned) pages where each PDF page contains two physical pages, and want to crop those into two.
Conclusion: Use cpdf ‑chop. MuPDF has some problems and combining pdftk with manual cropping in Adobe Acrobat is tedious.
mutool poster -x 2 "in.pdf" "out.pdf"
Note that mutool poster sometimes sets both mediabox and cropbox – the latter seems unnecessary. More importantly, mutool poster causes problems when mutool trim is used afterwards – for some reason, it leaves the PDF completely empty. This does not happen when mutool trim is used after cpdf ‑chop.
cpdf -chop "2 1" "in.pdf" -o "out.pdf"
Unlike MuPDF, cpdf ‑chop only sets mediabox, which is enough.
Moreover, cpdf can also remove page labels, if needed:
cpdf -chop "2 1" "in.pdf" AND -remove-page-labels -o "dstNoLabels.pdf"
pdftk shuffle && cpdf -mediabox:
pdftk A=in.pdf shuffle A A output "srcShuffled.pdf"
cpdf -mediabox "0mm 0mm a5landscape" "srcShuffled.pdf" odd -o "srcOdd.pdf"
cpdf -mediabox "148.5mm 0mm a5landscape" "srcOdd.pdf" even -o "out.pdf"
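The 148.5mm offset used for the even pages is half the width of an A4 landscape page (297mm). For other page sizes the two rectangles can be computed instead of looked up. This is only a sketch of that arithmetic: half_boxes is a made-up helper name, and it assumes that cpdf accepts an explicit "x y w h" rectangle with mm units in place of a paper-size keyword, so check the cpdf manual before relying on it.

```shell
#!/bin/sh
# Print the two "x y w h" rectangles (in mm) that split a page of the
# given width and height into a left half and a right half.
# Usage: half_boxes WIDTH_MM HEIGHT_MM
half_boxes() {
    LC_ALL=C awk -v w="$1" -v h="$2" 'BEGIN {
        half = w / 2
        printf "0mm 0mm %gmm %gmm\n", half, h        # left half
        printf "%gmm 0mm %gmm %gmm\n", half, half, h # right half
    }'
}

# half_boxes 297 210 prints:
#   0mm 0mm 148.5mm 210mm
#   148.5mm 0mm 148.5mm 210mm
```

The first line would go into the -mediabox call for the odd pages and the second into the call for the even pages.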
The second step might also be done in Adobe Acrobat “Crop pages” function.
N/A (not tested yet)
See the MuPDF documentation on the different PDF page boxes ([media|crop|art|trim|bleed]box).
Mediabox is a “physical” size of the page, while other boxes are in a way only “virtual”: they specify which content should be visible at what point and in which cases, but they do not alter real PDF page size.
cpdf -mediabox (/cropbox/artbox/trimbox/bleedbox):
cpdf -mediabox "0mm 0mm a4portrait" "srcA3.pdf" -o "dstA4.pdf"
Note that adjusting page sizes by cropping the mediabox is (to my knowledge) the only way to do it without altering the PDF content in any way (as explained in the Coherent PDF manual). Cropping other boxes might lead to the PDF structure being changed.
Note that Acrobat won't let you crop MediaBox, only other boxes (duh!).
N/A (not tested yet)
Example use‑case: You have cropped some pages but you want to actually remove the content, since otherwise it is only hidden but remains in PDF – this can be seen when you inspect the PDF in Adobe Acrobat via “Edit PDF” and zoom out the page – the cropped content will be selectable, although not visible, since it is out of the page margins.
According to its author, it seems that cpdf is not going to support this feature.
mutool trim -o "out.pdf" -b cropbox "in.pdf"
Warning: In some PDFs, I have seen mutool trim also cut out some content which was actually visible on‑page (that is, inside mediabox & cropbox). I don't know whether it was because the PDF itself was malformed, or whether it is a problem with mutool – but just in case, better avoid it and use Acrobat below.
N/A (not tested yet)
Preview app (or any PDFKit‑based PDF viewer, such as Skim) cannot search/copy text from PDFs generated by Calibre. This is because a PDF created this way uses CID fonts, something which Preview app, being based on Apple's PDFKit, does not support.
This leads to a situation where such PDF files can be properly shown and visually read, but trying to select (and copy) or search the text inside the PDF fails. See these sources for reference:
This can be fixed by converting/running the PDF file through the following GhostScript command:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH \
   -dPrinted=false -dQUIET -sOutputFile="out.pdf" "in.pdf"
This was mentioned here.
GhostScript can be installed e.g. by MacPorts:
sudo port install ghostscript
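If a whole folder of PDFs is affected, the Ghostscript command above can be wrapped in a small loop. The sketch below is one possible way to do that; the function names and the "_fixed" output suffix are assumptions made for illustration, while the gs options are exactly the ones shown above. Setting DRY_RUN=1 only prints the commands, which is handy for checking the file names first.

```shell
#!/bin/sh
# Re-write one PDF through Ghostscript's pdfwrite device, using the same
# options as the single-file command. With DRY_RUN=1 the command is only
# printed, not executed.
fix_cid_pdf() {
    src=$1
    dst=$2
    set -- gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
        -dNOPAUSE -dBATCH -dPrinted=false -dQUIET \
        -sOutputFile="$dst" "$src"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Convert every PDF in a directory, writing "<name>_fixed.pdf" next to it.
# Files that already carry the (hypothetical) "_fixed" suffix are skipped.
fix_cid_dir() {
    for f in "$1"/*.pdf; do
        [ -f "$f" ] || continue
        case $f in
            *_fixed.pdf) continue ;;
        esac
        fix_cid_pdf "$f" "${f%.pdf}_fixed.pdf"
    done
}
```

A typical run would be DRY_RUN=1 fix_cid_dir "$HOME/Books" to preview, then the same call without DRY_RUN to convert; the directory name here is just an example.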
Example use‑case: There are some cases when you need to see or edit the actual text contents of the PDF. For example, there are some metadata at the level of individual pages (like ) which no existing program will actually clean (Acrobat “Find hidden information” lists them, but keeps them there when you select to remove them).
So directly editing text contents of the PDF might come in handy.
Conclusion:
cpdf -decompress [-no-preserve-objstm] "in.pdf" -o "out.pdf"
As mentioned in the manual, -no-preserve-objstm will remove data from separate object streams and put it back into the normal flow of the PDF, which should make the PDF easier to edit directly.
Warning: After processing a PDF using this command (with or without the ‑no‑preserve‑objstm switch), I have not been able to use Adobe Acrobat's “Optimize” or “Compare documents” function on it ever again – no matter how much tinkering, document‑processing, transforming and cleaning I did on the PDF. So I consider this method to be unreliable if you want to maintain Adobe Acrobat compatibility (which I always do).
mutool clean -d "in.pdf" "out.pdf"
The developer specifically describes mutool clean as the command to “make a PDF file human editable” and to “expand compressed streams”, so this is the command to go with.
There is one additional switch which deals with PDF decompression: ‑a, which ASCII‑hex‑encodes binary streams. This safely encodes binary streams so that there should be no problems when editing the PDF in a text editor. However, it forces almost all streams in the PDF to be encoded, which makes the whole PDF human‑unreadable, so it is not really helpful.
Of all tested tools, mutool clean is the only one which maintains original object reference numbers (e.g. /Metadata 22 0 R referencing object #22).
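Whether a tool kept the object numbers can be checked roughly from the command line by listing the "N G obj" headers of two decompressed files and comparing them. This is a sketch under stated assumptions: list_objects and same_object_ids are made-up names, the check only sees objects whose header starts a line in plain text (objects packed inside object streams are invisible to it), and binary stream data could in rare cases produce a false match.

```shell
#!/bin/sh
# List the "N G obj" headers that start a line in a decompressed PDF,
# sorted by object number and generation number, without duplicates.
list_objects() {
    LC_ALL=C grep -a -o -E '^[0-9]+ [0-9]+ obj' "$1" | sort -u -k1,1n -k2,2n
}

# Succeed (exit status 0) if both files declare exactly the same set of
# object headers, fail otherwise.
same_object_ids() {
    a=$(list_objects "$1")
    b=$(list_objects "$2")
    [ "$a" = "$b" ]
}
```

For instance, same_object_ids "out.pdf" "out_qpdf.pdf" && echo "same numbering" compares the mutool and qpdf outputs from this section; a difference suggests that at least one of the tools renumbered the objects.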
Warning: After processing a PDF using this command (with or without the ‑a switch), I have not been able to use Adobe Acrobat's “Optimize” or “Compare documents” function on it ever again – no matter how much tinkering, document‑processing, transforming and cleaning I did on the PDF. So I consider this method to be unreliable if you want to maintain Adobe Acrobat compatibility (which I always do).
pdfcpu does not seem to support decompressing PDFs.
pdftk "in.pdf" output "out.pdf" uncompress
PDFtk separates individual PDF dictionary elements by newlines (0A). E.g. a page definition looks like this:
<< /pdftk_PageNum 7 /Metadata 23 0 R /Rotate 0 /Resources 24 0 R /Type /Page /Parent 25 0 R /Contents 26 0 R /MediaBox [0 0 370.158 591.26] /CropBox [0 0 370.158 591.26] >>
It also adds its own elements (e.g. /pdftk_PageNum).
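Once the file is uncompressed, entries like these can be found with ordinary text tools before opening the file in an editor. A small sketch, assuming each key and its value sit on the same line as in the example above; page_boxes and pdftk_keys are made-up helper names.

```shell
#!/bin/sh
# Print every page-box entry (key plus its [x0 y0 x1 y1] array) found in
# an uncompressed PDF, in order of appearance.
page_boxes() {
    LC_ALL=C grep -a -o -E '/(Media|Crop|Art|Trim|Bleed)Box *\[[^]]*\]' "$1"
}

# Print the distinct keys that pdftk added to the file (e.g. /pdftk_PageNum).
pdftk_keys() {
    LC_ALL=C grep -a -o -E '/pdftk_[A-Za-z_]+' "$1" | sort -u
}
```

Running page_boxes "out.pdf" on the page definition shown above would print its /MediaBox and /CropBox entries, one per line.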
qpdf --stream-data=uncompress:
qpdf "in.pdf" --stream-data=uncompress "out_qpdf.pdf"
This is effectively equivalent to using both --compress-streams=n and --decode-level=generalized.
There is also another switch, --qdf, which
QPDF currently does not support maintaining the original object ID.