blog:odborny:2025-05-07-command-line_tools_for_pdf_processing

Differences

This shows the differences between the selected revision and the current version of the page.

blog:odborny:2025-05-07-command-line_tools_for_pdf_processing [2026/01/19 18:10] Róbert Toth
blog:odborny:2025-05-07-command-line_tools_for_pdf_processing [2026/01/20 17:42] (current) – [6. Decompress the whole PDF for editing in text editor] Róbert Toth
Line 230: Line 230:
 </code> </code>
 As the manual mentions, ''[[https://www.coherentpdf.com/cpdfmanual/cpdfmanualch1.html#x5-360001.12|-no-preserve-objstm]]'' removes data from separate object streams and puts it back into the normal flow of the PDF, which should make the PDF easier to edit directly.
 +
 +<color red>**Warning:**</color> After processing a PDF using this command (with or without the ''-no-preserve-objstm'' switch), I have not been able to use Adobe Acrobat's "Optimize" or "Compare documents" function on it ever again – no matter how much tinkering, document-processing, transforming and cleaning I did on the PDF. So I consider this method to be unreliable if you want to maintain Adobe Acrobat compatibility (which I always do).
  
 ==== MuPDF (mutool) ====
Line 239: Line 241:
  
 There is one additional switch that deals with PDF decompression: **''-a''**, which ASCII-hex-encodes binary streams. This encodes binary streams safely, so editing the PDF in a text editor should cause no problems. However, it forces almost //**all**// streams in the PDF to be encoded, which makes the whole PDF unreadable to humans, so it is not really helpful.
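
To illustrate why ASCII-hex encoding hurts readability, here is a minimal Python sketch of the encoding that the PDF ''/ASCIIHexDecode'' filter expects: two hex digits per byte, terminated by ''>''. The helper names and the sample content stream are illustrative assumptions, and the exact layout that ''mutool'' writes (digit case, line breaks) may differ.

```python
import binascii


def ascii_hex_encode(data: bytes) -> bytes:
    """Encode bytes in the form a PDF /ASCIIHexDecode stream stores them."""
    # Two hex digits per byte, closed by the '>' end-of-data marker.
    # Uppercase is a choice made here; decoders accept either case.
    return binascii.hexlify(data).upper() + b">"


def ascii_hex_decode(encoded: bytes) -> bytes:
    """Reverse the encoding: whitespace is ignored, '>' ends the data."""
    hex_part = encoded.split(b">", 1)[0]
    hex_digits = b"".join(hex_part.split())
    if len(hex_digits) % 2:
        # An odd number of digits means the last digit is followed by an implied 0.
        hex_digits += b"0"
    return binascii.unhexlify(hex_digits)


# Hypothetical page content stream, readable as plain text before encoding.
content = b"BT /F1 12 Tf (Hi) Tj ET"
encoded = ascii_hex_encode(content)
print(encoded.decode("ascii"))
```

Even a short, plain-text content stream becomes an opaque run of hex digits, which is why the switch is of little use when the goal is to read and edit the PDF by hand.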
 +
 +Of all tested tools, ''mutool clean'' is the only one which maintains original object reference numbers (e.g. ''/Metadata 22 0 R'' referencing object #22).
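
One rough way to check whether a tool kept the object numbers is to collect the indirect references from the uncompressed output and compare them with those of the original. The following Python sketch assumes both files are already decompressed and that the references of interest sit directly after a dictionary key; the function name and the sample byte strings are hypothetical.

```python
import re

# An indirect reference is "<object number> <generation number> R".
# This pattern only catches references that directly follow a dictionary key,
# so references inside arrays (e.g. /Kids [4 0 R]) are not reported.
REF_PATTERN = re.compile(rb"/(\w+)\s+(\d+)\s+(\d+)\s+R\b")


def find_references(pdf_bytes: bytes) -> set:
    """Return a set of (key, object number, generation) tuples."""
    return {
        (m.group(1).decode("ascii"), int(m.group(2)), int(m.group(3)))
        for m in REF_PATTERN.finditer(pdf_bytes)
    }


# Hypothetical fragments of the same page object before and after processing.
before = b"<< /Type /Page /Metadata 22 0 R /Parent 3 0 R >>"
after = b"<< /Type /Page /Metadata 23 0 R /Parent 3 0 R >>"

# References that exist in the original but not in the processed file.
renumbered = find_references(before) - find_references(after)
print(sorted(renumbered))
```

An empty difference suggests the numbering survived; any entries listed were renumbered or dropped. In practice the input would be read with ''open(path, "rb").read()''.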
 +
 +<color red>**Warning:**</color> After processing a PDF using this command (with or without the ''-a'' switch), I have not been able to use Adobe Acrobat's "Optimize" or "Compare documents" function on it ever again – no matter how much tinkering, document-processing, transforming and cleaning I did on the PDF. So I consider this method to be unreliable if you want to maintain Adobe Acrobat compatibility (which I always do).
  
 ==== pdfcpu ====
Line 248: Line 254:
 pdftk "in.pdf" output "out.pdf" uncompress
 </code> </code>
 +PDFtk separates individual PDF dictionary elements by newlines (''0A''). E.g. a page definition looks like this:
 +<code>
 +<<
 +/pdftk_PageNum 7
 +/Metadata 23 0 R
 +/Rotate 0
 +/Resources 24 0 R
 +/Type /Page
 +/Parent 25 0 R
 +/Contents 26 0 R
 +/MediaBox [0 0 370.158 591.26]
 +/CropBox [0 0 370.158 591.26]
 +>>
 +</code>
 +It also adds its own elements (e.g. ''/pdftk_PageNum'').
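
Because each entry sits on its own line, a dictionary in this layout can be picked apart with simple line-based processing. The Python sketch below is only an illustration under that assumption: it handles a flat dictionary like the one above and would not cope with nested dictionaries or values that span several lines. The function name is hypothetical.

```python
def parse_flat_dict(block: str) -> dict:
    """Map each key of a one-entry-per-line PDF dictionary to its raw value."""
    entries = {}
    for line in block.splitlines():
        line = line.strip()
        if not line.startswith("/"):
            continue  # skip "<<", ">>" and blank lines
        key, _, value = line[1:].partition(" ")
        entries[key] = value
    return entries


# The page definition shown above, as written by pdftk.
page = """<<
/pdftk_PageNum 7
/Metadata 23 0 R
/Rotate 0
/Resources 24 0 R
/Type /Page
/Parent 25 0 R
/Contents 26 0 R
/MediaBox [0 0 370.158 591.26]
/CropBox [0 0 370.158 591.26]
>>"""

entries = parse_flat_dict(page)

# Keys that pdftk added on its own and that were not in the source PDF.
tool_added = sorted(k for k in entries if k.startswith("pdftk_"))
print(tool_added)
print(entries["MediaBox"])
```

Filtering on the ''pdftk_'' prefix is a quick way to spot the entries that would need removing before treating the file as a faithful copy of the original.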
 +
  
 ==== QPDF ====
Line 254: Line 276:
 qpdf "in.pdf" --stream-data=uncompress "out_qpdf.pdf"
 </code> </code>
 +This is effectively equivalent to using both [[https://qpdf.readthedocs.io/en/stable/cli.html#option-compress-streams|--compress-streams=n]] and [[https://qpdf.readthedocs.io/en/stable/cli.html#option-decode-level|--decode-level=generalized]].
 +
 +There is also another switch, ''[[https://qpdf.readthedocs.io/en/stable/cli.html#option-qdf|--qdf]]'', which TODO
 +
 +QPDF currently [[https://github.com/qpdf/qpdf/issues/339|does not support]] preserving the original object IDs.
 +
  
 +===== New cases to come… =====
 <html> <html>
 +<!-- ———————————————————————————————————————————————————————————————————————————————————————————————
 +———————————————————————————————————————————————————————————————————————————————————————————————  -->
 <!-- New Case Template BEGIN <!-- New Case Template BEGIN
 ===== - New Case TODO ===== ===== - New Case TODO =====
blog/odborny/2025-05-07-command-line_tools_for_pdf_processing.txt · Last modified: 2026/01/20 17:42 by Róbert Toth