blog:odborny:2025-05-07-command-line_tools_for_pdf_processing
Differences
Here you can see the differences between the selected revision and the current version of this page.
| Previous revision (both sides) · Previous revision · Next revision | Previous revision | ||
| blog:odborny:2025-05-07-command-line_tools_for_pdf_processing [2026/01/19 11:39] – Róbert Toth | blog:odborny:2025-05-07-command-line_tools_for_pdf_processing [2026/04/30 10:56] (current) – [1. Minimize PDF size] Róbert Toth | ||
|---|---|---|---|
| Line 6: | Line 6: | ||
| * … | * … | ||
| - | TODO | + | Due to the nature of the topic, this post is (and probably will remain) a work-in-progress. |
| ===== General: Overview of PDF-processing and manipulation tools ===== | ===== General: Overview of PDF-processing and manipulation tools ===== | ||
| ==== Coherent PDF (cpdf) ==== | ==== Coherent PDF (cpdf) ==== | ||
| - | * **Download: | + | * **Download: |
| * **Changelog: | * **Changelog: | ||
| * **Manual: | * **Manual: | ||
| Line 24: | Line 25: | ||
| ==== pdfcpu ==== | ==== pdfcpu ==== | ||
| * **Download: | * **Download: | ||
| - | * **Changelog: | + | * **Changelog: |
| * **Manual: | * **Manual: | ||
| Line 31: | Line 32: | ||
| * **Changelog: | * **Changelog: | ||
| * **Manual: | * **Manual: | ||
| + | |||
| + | ==== QPDF ==== | ||
| + | * **Download: | ||
| + | * **Changelog: | ||
| + | * **Manual: | ||
| < | < | ||
| Line 41: | Line 47: | ||
| </ | </ | ||
| - | ==== Other (non-tested) tools ==== | + | ==== Other (non-/not-yet-tested) tools ==== |
| - | * **QPDF: | + | TODO |
| - | * … | + | |
| < | < | ||
| <!-- | <!-- | ||
| Line 50: | Line 55: | ||
| </ | </ | ||
| - | ===== Minimize PDF size ===== | + | |
| + | ===== - Minimize PDF size ===== | ||
| **Example use-case:** Obvious. | **Example use-case:** Obvious. | ||
| Line 94: | Line 100: | ||
| < | < | ||
| pdftk " | pdftk " | ||
| + | </ | ||
| + | |||
| + | ==== QPDF ==== | ||
| + | [[https:// | ||
| + | * **'' | ||
| + | * **'' | ||
| + | * **'' | ||
| + | * **'' | ||
| + | * **'' | ||
| + | So the resulting command is: | ||
| + | < | ||
| + | qpdf " | ||
| </ | </ | ||
| Line 101: | Line 119: | ||
| ; <color blue> | ; <color blue> | ||
| ; <color blue> | ; <color blue> | ||
| - | ; <color blue> | + | ; <color blue> |
| ; <color blue> | ; <color blue> | ||
| ; <color blue> | ; <color blue> | ||
| - | ===== Split each page of PDF into several pages (posterisation) ===== | + | |
| + | ===== - Split each page of PDF into several pages (posterisation) ===== | ||
| **Example use-case:** you have (scanned) pages where each PDF page contains two physical pages, and want to crop those into two. | **Example use-case:** you have (scanned) pages where each PDF page contains two physical pages, and want to crop those into two. | ||
| Line 139: | Line 158: | ||
| The second step can also be done with Adobe Acrobat's "Crop pages" function. | The second step can also be done with Adobe Acrobat's "Crop pages" function. | ||
| + | ==== QPDF ==== | ||
| + | N/A (not tested yet; TODO) | ||
| - | ===== Crop pages of PDF ===== | + | |
| + | ===== - Crop pages of PDF ===== | ||
| See MuPDF documentation on [[https:// | See MuPDF documentation on [[https:// | ||
| Line 155: | Line 177: | ||
| Note that Acrobat won't let you crop MediaBox, only other boxes (duh!). | Note that Acrobat won't let you crop MediaBox, only other boxes (duh!). | ||
| - Go to "Edit PDF" and then use the "Crop pages" function. | - Go to "Edit PDF" and then use the "Crop pages" function. | ||
| + | |||
| + | ==== QPDF ==== | ||
| + | N/A (not tested yet; TODO) | ||
| - | ===== Remove cropped content from PDF ===== | + | ===== - Remove cropped content from PDF ===== |
| **Example use-case:** You have cropped some pages and want to actually remove the cropped content. Otherwise it is only hidden and remains in the PDF: if you inspect the file in Adobe Acrobat via "Edit PDF" and zoom out, the cropped content is still selectable, although not visible, because it lies outside the page margins. | **Example use-case:** You have cropped some pages and want to actually remove the cropped content. Otherwise it is only hidden and remains in the PDF: if you inspect the file in Adobe Acrobat via "Edit PDF" and zoom out, the cropped content is still selectable, although not visible, because it lies outside the page margins. | ||
| Line 175: | Line 200: | ||
| - Run preflight and the script | - Run preflight and the script | ||
| + | ==== QPDF ==== | ||
| + | N/A (not tested yet; TODO) | ||
| - | ===== Text in Calibri-created PDF files cannot be copied/ | + | |
| + | ===== - Text in Calibri-created PDF files cannot be copied/ | ||
| The Preview app (or any PDFKit-based PDF viewer, such as Skim) cannot search or copy text from PDFs generated by Calibri. This is because a PDF created this way uses CID fonts, something which the Preview app based on Apple' | The Preview app (or any PDFKit-based PDF viewer, such as Skim) cannot search or copy text from PDFs generated by Calibri. This is because a PDF created this way uses CID fonts, something which the Preview app based on Apple' | ||
| Line 197: | Line 225: | ||
| - | ===== Decompress the whole PDF for editing in text editor ===== | + | ===== - Decompress the whole PDF for editing in text editor ===== |
| **Example use-case:** There are some cases when you need to see or edit the actual text contents of the PDF. For example, there is some metadata at the level of individual pages (like ) which no existing program will actually clean (Acrobat "Find hidden information" | **Example use-case:** There are some cases when you need to see or edit the actual text contents of the PDF. For example, there is some metadata at the level of individual pages (like ) which no existing program will actually clean (Acrobat "Find hidden information" | ||
| Line 211: | Line 239: | ||
| </ | </ | ||
| As mentioned in the manual, '' | As mentioned in the manual, '' | ||
| + | |||
| + | <color red> | ||
| ==== MuPDF (mutool) ==== | ==== MuPDF (mutool) ==== | ||
| [[https:// | [[https:// | ||
| < | < | ||
| - | mutool clean -d [-a] " | + | mutool clean -d " |
| </ | </ | ||
| - | '' | + | '' |
| - | * **'' | + | |
| - | * **'' | + | There is one additional switch |
| + | |||
| + | Of all tested tools, '' | ||
| + | |||
| + | <color red> | ||
| ==== pdfcpu ==== | ==== pdfcpu ==== | ||
| - | [[https://pdfcpu.io/ | + | pdfcpu |
| - | < | + | |
| - | pdfcpu TODO " | + | |
| - | </ | + | |
| ==== PDFtk server (pdftk) ==== | ==== PDFtk server (pdftk) ==== | ||
| Line 232: | Line 263: | ||
| pdftk " | pdftk " | ||
| </ | </ | ||
| + | PDFtk separates individual PDF dictionary elements by newlines ('' | ||
| + | < | ||
| + | << | ||
| + | / | ||
| + | /Metadata 23 0 R | ||
| + | /Rotate 0 | ||
| + | /Resources 24 0 R | ||
| + | /Type /Page | ||
| + | /Parent 25 0 R | ||
| + | /Contents 26 0 R | ||
| + | /MediaBox [0 0 370.158 591.26] | ||
| + | /CropBox [0 0 370.158 591.26] | ||
| + | >> | ||
| + | </ | ||
| + | It also adds its own elements (e.g. ''/ | ||
| + | ==== QPDF ==== | ||
| + | [[https:// | ||
| + | < | ||
| + | qpdf " | ||
| + | </ | ||
| + | This is effectively equivalent to using both [[https:// | ||
| + | |||
| + | There is also another switch, '' | ||
| + | |||
| + | QPDF currently [[https:// | ||
| + | |||
| + | |||
| + | ===== New cases to come… ===== | ||
| < | < | ||
| + | <!-- ——————————————————————————————————————————————————————————————————————————————————————————————— | ||
| + | ——————————————————————————————————————————————————————————————————————————————————————————————— | ||
| <!-- New Case Template BEGIN | <!-- New Case Template BEGIN | ||
| - | ===== New Case TODO ===== | + | ===== - New Case TODO ===== |
| **Example use-case:** TODO. | **Example use-case:** TODO. | ||
| Line 264: | Line 325: | ||
| < | < | ||
| pdftk " | pdftk " | ||
| + | </ | ||
| + | |||
| + | ==== QPDF ==== | ||
| + | [[https:// | ||
| + | < | ||
| + | qpdf " | ||
| </ | </ | ||
| -->< | -->< | ||
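
The decompression commands in the diff above are cut off in this view, so the following is only a minimal sketch of how the three invocations could be wrapped in one helper. The function name `decompress_pdf` and the file names are hypothetical. The options are assumptions taken from each tool's documented interface (`mutool clean -d`, the pdftk `uncompress` operation, and QPDF's `--qdf` mode with `--object-streams=disable`), not necessarily the exact command lines the post uses.

```shell
#!/bin/sh
# Hypothetical helper: decompress a PDF so it can be read in a text editor.
# Usage: decompress_pdf TOOL INPUT.pdf OUTPUT.pdf
# TOOL is one of: mutool, pdftk, qpdf (the tool must be installed separately).
decompress_pdf() {
    tool=$1
    src=$2
    dst=$3
    case "$tool" in
        mutool)
            # -d asks mutool to decompress all streams
            mutool clean -d "$src" "$dst"
            ;;
        pdftk)
            # "uncompress" is an output option and comes after the output file
            pdftk "$src" output "$dst" uncompress
            ;;
        qpdf)
            # QDF mode writes an editable layout; object streams are disabled
            # so that every object appears as plain text in the file
            qpdf --qdf --object-streams=disable "$src" "$dst"
            ;;
        *)
            printf 'unknown tool: %s\n' "$tool" >&2
            return 2
            ;;
    esac
}
```

A call such as `decompress_pdf qpdf input.pdf output.pdf` would then produce the editable copy. The return code 2 for an unknown tool name is a choice made for this sketch only.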
blog/odborny/2025-05-07-command-line_tools_for_pdf_processing.1768819146.txt.gz · Last modified: 2026/01/19 11:39 by Róbert Toth
