This is an old revision of the document!


Command-line tools for PDF processing

A collection of command-line solutions for different PDF manipulation use cases, such as:

  • PDF splitting (“explode” one multi-page PDF into a set of single-page PDFs)
  • cropping PDF pages

TODO

General: Overview of PDF-processing and manipulation tools

MuPDF (mutool)

PDFtk server (pdftk)

Coherent PDF (cpdf)

pdfcpu

Other (non-tested) tools

Case #000: Minimize PDF size

Example use case: obvious.

cpdf

cpdf -squeeze:

cpdf -squeeze "src.pdf" [-squeeze-no-recompress] -o "dst.pdf"

pdfcpu

pdfcpu optimize:

pdfcpu optimize "src.pdf" "dst.pdf"

MuPDF (mutool)

mutool clean:

mutool clean -gggg -l -d -z -s "src.pdf" "dst.pdf"

mutool clean has many options, and it takes some experimentation to see which ones actually shrink the PDF:

  • gggg (Garbage collect unused objects / compact xref table / merge duplicate objects / check streams for duplication): the first three g's do not affect the file size for me, and the fourth sometimes makes the file a bit larger and sometimes a bit smaller. I usually use the whole gggg parameter.
  • l (Linearize PDF): this makes the PDF ready for “Fast web view”, but also slightly larger. Note that MuPDF 1.26.0 removes linearisation support, so it does not really make sense to use it.
  • d (Decompress streams): this decompresses the whole file, which makes it larger, but when combined with z (Deflate uncompressed streams) it allows better PDF compression.
  • z (Deflate uncompressed streams): this is the most important switch, and when combined with d (Decompress streams) it lowers the PDF size even more.
  • f (Compress font streams): no effect for me
  • i (Compress image streams): no effect for me
  • c (Clean content streams): no effect for me
  • s (Sanitize content streams): this actually lowers the file size for me
  • AA (Recreate appearance streams for annotations): no effect for me (and I am not sure what it really does; at least when there are no annotations, it does not affect the PDF file size at all)
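Since the effect of each option depends on the file, the experimentation described above can be scripted. The following is only a minimal sketch: compare_clean_options is a hypothetical helper (not part of MuPDF), and the list of option sets is just an example taken from the switches discussed above.

```shell
# Hypothetical helper: run "mutool clean" with several option sets and
# print "size<TAB>options" for each, smallest result first.
compare_clean_options() {
    src=$1
    tmpdir=$(mktemp -d) || return 1
    # Example option sets; edit this list to match what you want to compare.
    for opts in "-z" "-d -z" "-gggg -d -z" "-gggg -d -z -s"; do
        # $opts is intentionally unquoted so it splits into separate flags.
        if mutool clean $opts "$src" "$tmpdir/out.pdf" >/dev/null 2>&1; then
            # Arithmetic expansion strips any padding wc may add.
            size=$(( $(wc -c < "$tmpdir/out.pdf") ))
            printf '%d\t%s\n' "$size" "$opts"
        fi
    done | sort -n
    rm -rf "$tmpdir"
}
```

Calling compare_clean_options "src.pdf" lists the output size in bytes for every option set that mutool accepted, so the best combination for a particular file is on the first line.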

Case #001: Split each page of PDF into several pages (posterisation)

Example use case: you have a (scanned) PDF where each page contains two physical pages, and you want to split each of them into two separate pages.

MuPDF (mutool)

mutool poster:

mutool poster -x 2 "src.pdf" "dst.pdf"

Coherent PDF (cpdf)

cpdf -chop:

cpdf -chop "1 2" "src.pdf" -o "dst.pdf"

Indirect way: Duplicate each page, then crop out left or right half on odd and even pages

pdftk shuffle && cpdf -mediabox:

pdftk A=src.pdf shuffle A A output "srcDup.pdf"
cpdf -mediabox "0mm 0mm a5landscape" "srcDup.pdf" odd -o "srcOdd.pdf"
cpdf -mediabox "148.5mm 0mm a5landscape" "srcOdd.pdf" even -o "dst.pdf"

The second step can also be done with the “Crop pages” function in Adobe Acrobat.
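When the page size is not a named paper format, the two boxes can be computed instead. cpdf box arguments have the form "minx miny width height", so the right half starts at half the page width. A small sketch, assuming the page dimensions are known in millimetres; half_boxes is a hypothetical helper name:

```shell
# half_boxes WIDTH_MM HEIGHT_MM
# Prints two box strings in "minx miny width height" form, one per line:
# first the left half, then the right half of a page of the given size.
half_boxes() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v w="$1" -v h="$2" 'BEGIN {
        half = w / 2
        printf "0mm 0mm %.1fmm %.1fmm\n", half, h
        printf "%.1fmm 0mm %.1fmm %.1fmm\n", half, half, h
    }'
}

# For a 297 mm x 210 mm sheet:
# half_boxes 297 210
#   0mm 0mm 148.5mm 210.0mm
#   148.5mm 0mm 148.5mm 210.0mm
```

The first line is the box for the odd (left-half) pages and the second for the even (right-half) pages after duplicating each page.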

Case #002: Crop pages of PDF

See the MuPDF documentation on the different PDF page boxes ([media|crop|art|trim|bleed]box).

The MediaBox is the “physical” size of the page, while the other boxes are in a way only “virtual”: they specify which content should be visible at what point and in which cases, but they do not alter the real PDF page size.
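To see which boxes a file actually defines before cropping, the page information printed by cpdf can be filtered. This is a sketch only: show_boxes is a hypothetical helper, and the "Page N:", "MediaBox:" and "CropBox:" line prefixes are an assumption about the output format of cpdf -page-info, which may differ between versions.

```shell
# Hypothetical helper: print only the page headers and the MediaBox and
# CropBox lines from the per-page information reported by cpdf.
# Assumes cpdf labels the lines "Page N:", "MediaBox:" and "CropBox:".
show_boxes() {
    cpdf -page-info "$1" | grep -E '^(Page [0-9]+:|MediaBox:|CropBox:)'
}
```

If the CropBox is smaller than the MediaBox, the page has hidden content outside the visible area.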

Coherent PDF (cpdf)

cpdf -mediabox (/cropbox/artbox/trimbox/bleedbox):

cpdf -mediabox "0mm 0mm a4portrait" "srcA3.pdf" -o "dstA4.pdf"

Note that adjusting page sizes by cropping the MediaBox is (to my knowledge) the only way to do it without altering the PDF content in any way (as explained in the Coherent PDF manual). Cropping other boxes might lead to the PDF structure being changed.

Adobe Acrobat

Note that Acrobat won't let you crop MediaBox, only other boxes (duh!).

  1. Go to “Edit PDF” and then “Crop pages” function.

Case #003: Remove cropped content from PDF

Example use case: you have cropped some pages, but you want to actually remove the cropped-out content. Otherwise it is only hidden and remains in the PDF: when you inspect the PDF in Adobe Acrobat via “Edit PDF” and zoom out, the cropped content is still selectable, although not visible, because it lies outside the page margins.

According to its author, it seems that cpdf is not going to support this feature.

MuPDF (mutool)

mutool trim:

mutool trim -o "dst.pdf" -b cropbox "src.pdf"
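Cropping and trimming can be chained so that the hidden content never ends up in the final file. A minimal sketch, assuming cpdf sets the CropBox and mutool then discards everything outside it; crop_and_trim is a hypothetical wrapper and the intermediate file name is arbitrary:

```shell
# crop_and_trim SRC BOX DST
# Sets the CropBox of SRC to BOX ("minx miny width height") with cpdf,
# then lets mutool trim remove the content outside that box.
crop_and_trim() {
    src=$1; box=$2; dst=$3
    tmpdir=$(mktemp -d) || return 1
    # Run the trim step only if the crop step succeeded.
    cpdf -cropbox "$box" "$src" -o "$tmpdir/cropped.pdf" &&
        mutool trim -o "$dst" -b cropbox "$tmpdir/cropped.pdf"
    rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}
```

For example, crop_and_trim "src.pdf" "0mm 0mm 148.5mm 210mm" "dst.pdf" would keep only the content inside the given box.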

Adobe Acrobat

  1. Download custom user script CropBoxFix
  2. Import it into Acrobat Preflight (by double-clicking)
  3. Run preflight and the script

blog/odborny/2025-05-07-command-line_tools_for_pdf_processing.1746708233.txt.gz · Last modified: 2025/05/08 14:43 by Róbert Toth