This is an older version of the document!


Command-line tools for PDF processing

A collection of command-line solutions for different PDF manipulation use cases, such as:

  • PDF splitting (“explode” one multipage PDF into a set of single-page PDFs)
  • cropping PDF pages

TODO

General: Overview of PDF-processing and manipulation tools

MuPDF (mutool)

PDFtk server (pdftk)

Coherent PDF (cpdf)

pdfcpu

Other (non-tested) tools

Case #000: Minimize PDF size

Example use case: obvious.

cpdf

cpdf -squeeze:

cpdf -squeeze "src.pdf" [-squeeze-no-recompress] -o "dst.pdf"

pdfcpu

pdfcpu optimize:

pdfcpu optimize "src.pdf" "dst.pdf"
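To see whether either tool actually reduced the file, it helps to compare the sizes before and after. The helper below is only a minimal sketch: `size_report` is a hypothetical name (it is not part of cpdf or pdfcpu) and it relies on nothing but `wc`.

```shell
# size_report SRC DST
# Prints the byte sizes of SRC and DST and the percentage saved.
# A negative percentage means DST came out larger than SRC.
size_report() {
    # The arithmetic expansion strips the padding some wc versions add.
    before=$(( $(wc -c < "$1") ))
    after=$(( $(wc -c < "$2") ))
    [ "$before" -gt 0 ] || { echo "empty source file: $1" >&2; return 1; }
    echo "$before -> $after bytes ($(( (before - after) * 100 / before ))% saved)"
}

# Possible usage, assuming cpdf is installed:
#   cpdf -squeeze "src.pdf" -o "dst.pdf" && size_report "src.pdf" "dst.pdf"
```

The percentage uses integer arithmetic, so it is truncated rather than rounded.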

Case #001: Split each page of PDF into several pages (posterisation)

Example use case: you have (scanned) pages where each PDF page contains two physical pages, and you want to split each of them into two.

MuPDF (mutool)

mutool poster:

mutool poster -x 2 "src.pdf" "dst.pdf"

Coherent PDF (cpdf)

cpdf -chop:

cpdf -chop "1 2" "src.pdf" -o "dst.pdf"

Indirect way: duplicate each page, then crop to the left half on odd pages and to the right half on even pages

pdftk shuffle && cpdf -mediabox:

pdftk A=src.pdf shuffle A A output srcDup.pdf
cpdf -mediabox "0mm 0mm a5landscape" "srcDup.pdf" odd -o "srcOdd.pdf"
cpdf -mediabox "148.5mm 0mm a5landscape" "srcOdd.pdf" even -o "dst.pdf"

The second step can also be done with the “Crop pages” function in Adobe Acrobat.
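The x offsets in the cropping commands follow from the size of the spread: the left half starts at x = 0 and the right half starts at half the page width. As a rough sketch, a hypothetical helper `half_boxes` (not part of cpdf) can compute both boxes in the explicit "x y width height" form, assuming you know the page width and height in millimetres:

```shell
# half_boxes WIDTH_MM HEIGHT_MM
# Prints two box strings ("x y width height", in mm):
# line 1 is the left half of the page, line 2 is the right half.
half_boxes() {
    LC_ALL=C awk -v w="$1" -v h="$2" 'BEGIN {
        half = w / 2
        printf "%gmm 0mm %gmm %gmm\n", 0, half, h      # left half
        printf "%gmm 0mm %gmm %gmm\n", half, half, h   # right half
    }'
}

# Example for an A4 landscape spread (297 x 210 mm):
#   half_boxes 297 210
#   0mm 0mm 148.5mm 210mm
#   148.5mm 0mm 148.5mm 210mm
```

Each printed line is meant to be passed as the argument of `cpdf -mediabox`, the first for the odd pages and the second for the even pages.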

Case #002: Crop pages of PDF

See the MuPDF documentation on the different PDF page boxes ([media|crop|art|trim|bleed]box).

The MediaBox is the “physical” size of the page, while the other boxes are in a way only “virtual”: they specify which content should be visible at what point and in which cases, but they do not alter the real PDF page size.

Coherent PDF (cpdf)

cpdf -mediabox (/cropbox/artbox/trimbox/bleedbox):

cpdf -mediabox "0mm 0mm a4portrait" "srcA3.pdf" -o "dstA4.pdf"

Note that adjusting page sizes by cropping the MediaBox is (to my knowledge) the only way to do it without altering the PDF content in any way (as explained in the Coherent PDF manual). Cropping other boxes might change the PDF structure.
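Inside the PDF, page boxes are stored in points (1/72 inch), so values reported by PDF inspection tools will usually not be in millimetres. A small conversion sketch can help when comparing them with the sizes used in the command above; `mm_to_pt` is a hypothetical helper name, not a cpdf or mutool feature.

```shell
# mm_to_pt MILLIMETRES
# Converts millimetres to PDF points (1 pt = 1/72 inch, 1 inch = 25.4 mm).
mm_to_pt() {
    LC_ALL=C awk -v mm="$1" 'BEGIN { printf "%.3f\n", mm * 72 / 25.4 }'
}

# A4 portrait is 210 x 297 mm:
#   mm_to_pt 210   # prints 595.276
#   mm_to_pt 297   # prints 841.890
```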

Adobe Acrobat

Note that Acrobat won't let you crop the MediaBox, only the other boxes (duh!).

  1. Go to “Edit PDF” and then “Crop pages” function.

Case #003: Remove cropped content from PDF

Example use case: you have cropped some pages and want to actually remove the cropped content, since otherwise it is only hidden and remains in the PDF. You can see this by inspecting the PDF in Adobe Acrobat via “Edit PDF” and zooming out: the cropped content is still selectable, although not visible, because it lies outside the page margins.

According to its author, it seems that cpdf is not going to support this feature.

MuPDF (mutool)

mutool trim:

mutool trim -o "dst.pdf" -b cropbox "src.pdf"
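For a whole folder of cropped PDFs, the same command can be wrapped in a loop. The following is a minimal sketch, not a tested production script: `trim_all` and the `MUTOOL` override variable are names made up for this example, and setting `MUTOOL=echo` gives a dry run that only prints the arguments each `mutool` call would receive.

```shell
# trim_all SRC_DIR DST_DIR
# Runs "mutool trim -b cropbox" on every *.pdf in SRC_DIR and writes the
# results under the same file names into DST_DIR (created if missing).
trim_all() {
    src_dir=$1
    dst_dir=$2
    mkdir -p "$dst_dir"
    for f in "$src_dir"/*.pdf; do
        [ -e "$f" ] || continue   # the glob matched nothing
        "${MUTOOL:-mutool}" trim -o "$dst_dir/$(basename "$f")" -b cropbox "$f"
    done
}

# Possible usage:
#   trim_all cropped trimmed               # real run, requires mutool
#   MUTOOL=echo; trim_all cropped trimmed  # dry run
```

Writing into a separate directory keeps the source files untouched, which is useful because the trimmed content cannot be recovered afterwards.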

Adobe Acrobat

  1. Download custom user script CropBoxFix
  2. Import it into Acrobat Preflight (by double-clicking)
  3. Run Preflight and then the script


blog/odborny/2025-05-07-command-line_tools_for_pdf_processing.1746699722.txt.gz · Last modified: 2025/05/08 12:22 by Róbert Toth