How To Convert A PDF File To Markdown (With Images) In Linux
If you work with technical docs, wikis, or static site generators, you’ve probably run into this:
You have a PDF, but you really need it as Markdown — with images intact.
Good news: Linux has everything you need to make it happen.
Below is a step-by-step guide that works for both text-heavy PDFs and PDFs packed with diagrams, charts, and screenshots.
Step 1 – Install the Required Tools
We’ll use two main tools:
- pdftohtml (from poppler-utils) – Extracts text and images from PDFs.
- pandoc – Converts between document formats, including HTML to Markdown.
Install them with:
sudo apt update && sudo apt install poppler-utils pandoc
For Fedora/RHEL-based systems:
sudo dnf install poppler-utils pandoc
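Before moving on, it can help to confirm that both tools are actually on your PATH. The snippet below is a minimal sketch; missing_tools is a hypothetical helper name, not something shipped with either package.

```shell
# Print the name of every given tool that cannot be found on PATH.
# missing_tools is a hypothetical helper, not part of poppler-utils or pandoc.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools pdftohtml pandoc)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
else
    # 2>&1 because some tools print version info to stderr
    pdftohtml -v 2>&1 | head -n 1
    pandoc --version | head -n 1
fi
```

If nothing is reported as missing, you should see one version line per tool.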
Step 2 – Convert PDF to HTML (Preserving Images)
First, turn your PDF into HTML while keeping the images:
pdftohtml -c -noframes -p myfile.pdf output.html
Flags explained:
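- -c → Generates "complex" output that keeps the layout as close as possible to the PDF.
- -noframes → Writes one HTML file instead of splitting the content into multiple HTML frames.
- -p → Rewrites links to .pdf files as links to .html files. It is not what keeps the images: pdftohtml extracts them by default unless you pass -i.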
This will give you:
- output.html → Your PDF in HTML format.
- Image files extracted from the PDF. pdftohtml typically writes these next to output.html, named after the output prefix, rather than into a separate folder.
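Image file names vary with the pdftohtml version and flags, so rather than guessing, you can ask the HTML which images it points at. This is a rough sketch that assumes images are referenced through double-quoted src attributes; list_html_images is a hypothetical helper name.

```shell
# Print each distinct path referenced by a src="..." attribute in an HTML file.
# list_html_images is a hypothetical helper; it assumes double-quoted attributes.
list_html_images() {
    grep -o 'src="[^"]*"' "$1" | sed 's/^src="//; s/"$//' | sort -u
}

# Usage, after running the pdftohtml command above:
# list_html_images output.html
```

Every path it prints should exist relative to the HTML file. If the list is empty, the PDF may contain no images, or they may have been skipped.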
Step 3 – Convert HTML to Markdown
Now that you have HTML, use Pandoc to convert it to Markdown:
pandoc output.html -f html -t markdown -o myfile.md
This will create myfile.md with Markdown syntax.
If the HTML referenced images, the Markdown will contain image links to the extracted image files.
Step 4 – Organize Images for Your Markdown
Make sure the images stay at the same location relative to your Markdown file.
Pandoc keeps the relative paths from the HTML, so if your Markdown says, for example:
![](output_images/image1.png)
…then output_images/image1.png must exist relative to the directory that contains your .md file.
If you plan to upload this to a site or Git repo, keep the images folder alongside the Markdown.
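A quick way to catch broken image links before publishing is to check every image target against the filesystem. The sketch below handles only simple inline ![alt](path) links without spaces in the path; check_md_images is a hypothetical helper name.

```shell
# Report image targets in a Markdown file that do not exist relative to it.
# check_md_images is a hypothetical helper; it only understands simple
# inline links of the form ![alt](path) with no spaces in the path.
check_md_images() {
    md=$1
    base=$(dirname "$md")
    status=0
    for target in $(grep -o '!\[[^]]*]([^)]*)' "$md" \
                    | sed 's/^!\[[^]]*](\([^) ]*\).*/\1/'); do
        case $target in
            http://*|https://*) continue ;;   # remote images are not checked
        esac
        if [ ! -f "$base/$target" ]; then
            echo "missing: $target"
            status=1
        fi
    done
    return $status
}

# Usage:
# check_md_images myfile.md && echo "all images found"
```

The function returns a non-zero status when any image is missing, so it can be used in a script or a pre-commit check.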
Step 5 – Clean Up the Markdown (Optional)
PDF → HTML → Markdown isn’t always perfect. You might see:
- Extra line breaks
- Odd spacing
- Overly long lines
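To tidy up, you can round-trip the file through Pandoc, which normalizes block spacing and list formatting:
pandoc myfile.md -f markdown -t markdown --wrap=preserve -o myfile_clean.md
Note that --wrap=preserve keeps the existing line breaks. To re-wrap overly long lines instead, swap it for --wrap=auto --columns=80.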
Or open it in your favorite Markdown editor (like Typora, Obsidian, or VS Code) and do some manual cleanup.
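For the extra line breaks and odd spacing in particular, a small text filter is sometimes enough. The following is a sketch, not a full Markdown-aware formatter: tidy_markdown is a hypothetical helper that strips trailing whitespace and squeezes runs of blank lines into one. It also touches code blocks and removes trailing-space hard line breaks, so review the result before overwriting anything.

```shell
# Strip trailing spaces/tabs and collapse consecutive blank lines into one.
# tidy_markdown is a hypothetical helper; it is not Markdown-aware.
tidy_markdown() {
    awk '{ sub(/[ \t]+$/, "") }
         /^$/ { if (!blank) print; blank = 1; next }
         { blank = 0; print }' "$1"
}

# Usage (writes to a new file, leaving the original untouched):
# tidy_markdown myfile.md > myfile_clean.md
```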
Bonus – One-Liner Command
If you want to chain everything into one line:
pdftohtml -c -noframes -p myfile.pdf temp.html && pandoc temp.html -f html -t markdown -o myfile.md
pdftohtml still writes the extracted images as before, so keep them together with temp.html and myfile.md.
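If you convert PDFs regularly, the same chain can be wrapped in a function that gives each document its own directory. This is a sketch under a few assumptions: pdf_to_md is a hypothetical name, the output directory defaults to the PDF's base name, and pdftohtml is assumed to write its images next to the HTML file so that the relative links in the Markdown keep working.

```shell
# Convert one PDF to Markdown inside its own output directory.
# pdf_to_md is a hypothetical wrapper around the two commands above.
pdf_to_md() {
    pdf=$1
    stem=$(basename "$pdf" .pdf)
    outdir=${2:-$stem}              # default: directory named after the PDF
    mkdir -p "$outdir" || return 1
    pdftohtml -c -noframes -p "$pdf" "$outdir/$stem.html" &&
        pandoc "$outdir/$stem.html" -f html -t markdown -o "$outdir/$stem.md"
}

# Example: convert every PDF in the current directory.
# for f in ./*.pdf; do [ -e "$f" ] && pdf_to_md "$f"; done
```

Keeping the HTML, the images, and the Markdown in one directory per document makes it easy to move or commit the result as a unit.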
Final Thoughts
Converting PDFs to Markdown on Linux is straightforward once you know the toolchain.
By using pdftohtml to preserve images and pandoc to do the format conversion, you get a clean Markdown file and all your images neatly saved for reuse.
If you need perfect formatting for publishing, expect to do a little cleanup — but this method will save you hours compared to manually copying and pasting.