Differences

Here you can see the differences between the selected revision and the current version of this page.

--- blog:odborny:2025-05-07-command-line_tools_for_pdf_processing [2026/01/19 18:10] – Róbert Toth
+++ blog:odborny:2025-05-07-command-line_tools_for_pdf_processing [2026/01/20 17:42] (current) – [6. Decompress the whole PDF for editing in text editor] Róbert Toth
@@ Line 230: / Line 230: @@
 </code>
 As the manual mentions, ''[[https://www.coherentpdf.com/cpdfmanual/cpdfmanualch1.html#x5-360001.12|-no-preserve-objstm]]'' removes data from separate object streams and puts it back into the normal flow of the PDF, which should make the PDF easier to edit directly.
+<color red>**Warning:**</color> After processing a PDF using this command (with or without the ''-no-preserve-objstm'' switch), I have not been able to use Adobe Acrobat's "Optimize" or "Compare documents" function on it ever again – no matter how much tinkering, document-processing, transforming and cleaning I did on the PDF. So I consider this method to be unreliable if you want to maintain Adobe Acrobat compatibility (which I always do).
 ==== MuPDF (mutool) ====
@@ Line 239: / Line 241: @@
 There is one additional switch that deals with PDF decompression: **''-a''**, which ASCII-hex-encodes binary streams. This encodes binary streams safely, so there should be no problems when editing the PDF in a text editor. However, it forces almost //**all**// streams in the PDF to be encoded, which makes the whole PDF unreadable to humans, so it is not really helpful.
+Of all the tools tested, ''mutool clean'' is the only one that preserves the original object reference numbers (e.g. ''/Metadata 22 0 R'' referencing object #22).
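A rough way to check this on your own files is to list the indirect references (''N G R'') in the input and in the output and compare the two lists. The sketch below is only a heuristic and assumes GNU grep: the helper name ''list_refs'' is made up for illustration, the pattern can also match bytes inside binary streams, and references stored inside compressed object streams are not visible to grep.

```shell
# list_refs is a hypothetical helper, not part of any PDF tool.
# It prints every distinct "N G R" indirect reference found in a file.
#   -a  treat the (partly binary) PDF as text
#   -o  print only the matching part of each line
#   -w  require whole-word matches, so "0 0 RG" (a colour operator) is skipped
list_refs() {
  grep -a -o -w -E '[0-9]+ [0-9]+ R' "$1" | sort -u
}

# Example (bash): show references that differ between input and output.
# diff <(list_refs "in.pdf") <(list_refs "out.pdf")
```

An empty diff only suggests that the same set of reference numbers occurs in both files; it does not prove that each number still points to the same object.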
+<color red>**Warning:**</color> After processing a PDF using this command (with or without the ''-a'' switch), I have not been able to use Adobe Acrobat's "Optimize" or "Compare documents" function on it ever again – no matter how much tinkering, document-processing, transforming and cleaning I did on the PDF. So I consider this method to be unreliable if you want to maintain Adobe Acrobat compatibility (which I always do).
 ==== pdfcpu ====
@@ Line 248: / Line 254: @@
 pdftk "in.pdf" output "out.pdf" uncompress
 </code>
+PDFtk separates individual PDF dictionary elements by newlines (''0A''). E.g. a page definition looks like this:
+<code>
+<<
+/pdftk_PageNum 7
+/Metadata 23 0 R
+/Rotate 0
+/Resources 24 0 R
+/Type /Page
+/Parent 25 0 R
+/Contents 26 0 R
+/MediaBox [0 0 370.158 591.26]
+/CropBox [0 0 370.158 591.26]
+>>
+</code>
+It also adds its own elements (e.g. ''/pdftk_PageNum'').
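To see which keys PDFtk added to a particular file, you can search the uncompressed output for names starting with ''/pdftk_''. This is a minimal sketch, assuming GNU grep. The function name ''list_pdftk_keys'' is hypothetical, and ''/pdftk_PageNum'' is the only such key confirmed by the example above.

```shell
# list_pdftk_keys is a hypothetical helper, not a PDFtk feature.
# It prints each distinct name that starts with "/pdftk_" in the given file.
list_pdftk_keys() {
  grep -a -o -E '/pdftk_[A-Za-z0-9_]+' "$1" | sort -u
}

# Example: list_pdftk_keys "out.pdf"
```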
 ==== QPDF ====
@@ Line 254: / Line 276: @@
 qpdf "in.pdf" --stream-data=uncompress "out_qpdf.pdf"
 </code>
+This is effectively equivalent to using both [[https://qpdf.readthedocs.io/en/stable/cli.html#option-compress-streams|--compress-streams=n]] and [[https://qpdf.readthedocs.io/en/stable/cli.html#option-decode-level|--decode-level=generalized]].
+There is also another switch, ''[[https://qpdf.readthedocs.io/en/stable/cli.html#option-qdf|--qdf]]'', which TODO
+QPDF currently [[https://github.com/qpdf/qpdf/issues/339|does not support]] preserving the original object IDs.
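The equivalence with ''--stream-data=uncompress'' described above can be written out as an explicit command. This is a sketch built only from the two switches linked there; the wrapper name ''uncompress_explicit'' and the file names are placeholders.

```shell
# uncompress_explicit is a hypothetical wrapper; $1 is the input PDF, $2 the output PDF.
# Same intent as: qpdf "in.pdf" --stream-data=uncompress "out_qpdf.pdf"
uncompress_explicit() {
  qpdf "$1" --compress-streams=n --decode-level=generalized "$2"
}

# Example: uncompress_explicit "in.pdf" "out_qpdf.pdf"
```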
+===== New cases to come… =====
 <html>
+<!-- ———————————————————————————————————————————————————————————————————————————————————————————————
+———————————————————————————————————————————————————————————————————————————————————————————————  -->
 <!-- New Case Template BEGIN
 ===== - New Case TODO =====