How To Convert A PDF File To Markdown (With Images) In Linux

If you work with technical docs, wikis, or static site generators, you’ve probably run into this:
You have a PDF, but you really need it as Markdown — with images intact.

Good news: Linux has everything you need to make it happen.

Below is a step-by-step guide that works for both text-heavy PDFs and PDFs packed with diagrams, charts, and screenshots.

Step 1 – Install the Required Tools

We’ll use two main tools:

pdftohtml (from poppler-utils) – Extracts text and images from PDFs.
pandoc – Converts between document formats, including HTML to Markdown.

On Debian/Ubuntu-based systems, install them with:

sudo apt update && sudo apt install poppler-utils pandoc

For Fedora/RHEL-based systems:

sudo dnf install poppler-utils pandoc
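
Before moving on, it is worth confirming that both commands actually ended up on your PATH. Here is a minimal sketch of such a check; the function name missing_tools is made up for this example and is not part of either package:

```shell
# Print the name of every command given as an argument that cannot be
# found on PATH. "missing_tools" is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools pdftohtml pandoc)
if [ -n "$missing" ]; then
    # Unquoted on purpose so several names are joined on one line.
    echo "Not installed:" $missing >&2
else
    echo "pdftohtml and pandoc are both available"
fi
```

If anything is reported as not installed, re-run the install command for your distribution before continuing.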

Step 2 – Convert PDF to HTML (Preserving Images)

First, turn your PDF into HTML while keeping the images:

pdftohtml -c -noframes -p myfile.pdf output.html

Flags explained:

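-c → Generates "complex" output that keeps the page layout as close to the original as possible.
-noframes → Writes a single HTML file instead of a frameset split across several files.
-p → Rewrites links that point to .pdf files so they point to .html files instead. It has no effect on images and can be omitted.

Images are exported by default; only the -i flag (ignore images) would skip them.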

This will give you:

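output.html → Your PDF in HTML format.
Image files → Written next to output.html, with names based on the output name (such as output001.png; the exact pattern depends on the options and the poppler version).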
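
Since the exact file names can differ between poppler versions, it helps to look at what the conversion really wrote. A small sketch, assuming the generated files share the output name "output" as a prefix; the function name list_generated is made up for this example:

```shell
# List regular files in a directory whose names start with a given prefix.
# "list_generated" is a hypothetical helper; "output" matches the output
# name used in the pdftohtml command above.
list_generated() {
    dir=$1
    prefix=$2
    find "$dir" -maxdepth 1 -type f -name "${prefix}*" | sort
}

list_generated . output
```

Run it in the directory where you ran pdftohtml. You should see output.html plus whatever image files were produced.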

Step 3 – Convert HTML to Markdown

Now that you have HTML, use Pandoc to convert it to Markdown:

pandoc output.html -f html -t markdown -o myfile.md

This will create myfile.md with Markdown syntax.
If the HTML referenced images, the Markdown will contain image links to the extracted image files.

Step 4 – Organize Images for Your Markdown

Make sure the image files stay where the Markdown expects them.
Pandoc copies the relative paths from the HTML unchanged, so if your Markdown says:

![Figure 1](output001.png)

…then output001.png must sit in the same directory as your .md file.

If you plan to upload this to a site or Git repo, add the image files together with the Markdown so the links keep working.
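
To catch broken links before you publish, you can check that every image the Markdown refers to is really there. This is a rough sketch that only understands the plain ![alt](path) form without titles; the function name missing_images is made up for this example:

```shell
# Print each image path referenced in a Markdown file that does not exist
# relative to that file's directory. Remote http/https links are skipped.
# "missing_images" is a hypothetical helper name.
missing_images() {
    md_file=$1
    md_dir=$(dirname "$md_file")
    # grep pulls out each ![alt](path) link, sed keeps only the path.
    grep -o '!\[[^]]*]([^)]*)' "$md_file" |
        sed 's/^!\[[^]]*](\(.*\))$/\1/' |
        while IFS= read -r path; do
            case $path in
                http://*|https://*) continue ;;
            esac
            [ -f "$md_dir/$path" ] || printf '%s\n' "$path"
        done
}

if [ -f myfile.md ]; then
    missing_images myfile.md
fi
```

No output means every local image link resolves. Any path that is printed needs its file moved back into place or its link corrected.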

Step 5 – Clean Up the Markdown (Optional)

PDF → HTML → Markdown isn’t always perfect. You might see:

Extra line breaks
Odd spacing
Overly long lines

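To normalize the Markdown and re-wrap long lines at 80 columns, you can run it back through Pandoc:

pandoc myfile.md -f markdown -t markdown --wrap=auto --columns=80 -o myfile_clean.md

(Using --wrap=preserve instead keeps the existing line breaks, so it will not shorten long lines.)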

Or open it in your favorite Markdown editor (like Typora, Obsidian, or VS Code) and do some manual cleanup.
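
For the extra line breaks and odd spacing, a couple of standard text tools go a long way. Here is a minimal sketch; the function name tidy_markdown and the output file name are made up for this example. Note the assumption: it strips trailing spaces, which is only safe if your Markdown does not rely on two trailing spaces for hard line breaks.

```shell
# Strip trailing whitespace from every line, then collapse runs of blank
# lines into a single blank line. "tidy_markdown" is a hypothetical helper.
# Assumption: the file does not use two trailing spaces as a line break.
tidy_markdown() {
    sed 's/[[:space:]]*$//' "$1" | cat -s
}

if [ -f myfile.md ]; then
    tidy_markdown myfile.md > myfile_tidy.md
fi
```

The result goes to a new file, so you can diff it against the original before replacing anything.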

Bonus – One-Liner Command

If you want to chain everything into one line:

pdftohtml -c -noframes myfile.pdf temp.html && pandoc temp.html -f html -t markdown -o myfile.md

pdftohtml still writes the image files next to temp.html, so keep them together with myfile.md.
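
If you convert PDFs regularly, the chain can be wrapped in a small script that takes file names as arguments. This is only a sketch using the same -c and -noframes options as above; the function names md_name_for and pdf_to_md are made up for this example:

```shell
# Derive the Markdown file name from a PDF path: report.pdf -> report.md
# "md_name_for" is a hypothetical helper name.
md_name_for() {
    printf '%s.md\n' "${1%.pdf}"
}

# Convert one PDF to Markdown via an intermediate HTML file that is
# named after the PDF. "pdf_to_md" is a hypothetical helper name.
pdf_to_md() {
    src=$1
    base=${src%.pdf}
    pdftohtml -c -noframes "$src" "$base.html" &&
        pandoc "$base.html" -f html -t markdown -o "$(md_name_for "$src")"
}

# Convert every PDF passed on the command line, reporting any failures.
for pdf in "$@"; do
    pdf_to_md "$pdf" || echo "Failed: $pdf" >&2
done
```

Saved as, say, pdf2md.sh, it could be run as sh pdf2md.sh report.pdf notes.pdf. The intermediate HTML and image files are left in place so the image links keep working.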

Final Thoughts

Converting PDFs to Markdown on Linux is straightforward once you know the toolchain.
By using pdftohtml to preserve images and pandoc to do the format conversion, you get a clean Markdown file and all your images neatly saved for reuse.

If you need perfect formatting for publishing, expect to do a little cleanup — but this method will save you hours compared to manually copying and pasting.

Mark Vincent

