How To Convert A PDF File To Markdown (With Images) In Linux
If you work with technical docs, wikis, or static site generators, you’ve probably run into this:
You have a PDF, but you really need it as Markdown — with images intact.
Good news: Linux has everything you need to make it happen.
Below is a step-by-step guide that works for both text-heavy PDFs and PDFs packed with diagrams, charts, and screenshots.
Step 1 – Install the Required Tools
We’ll use two main tools:
- pdftohtml (from poppler-utils) – Extracts text and images from PDFs.
- pandoc – Converts between document formats, including HTML to Markdown.
Install them with:
sudo apt update && sudo apt install poppler-utils pandoc
For Fedora/RHEL-based systems:
sudo dnf install poppler-utils pandoc
Step 2 – Convert PDF to HTML (Preserving Images)
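Before converting anything, it may be worth confirming that both tools are actually on your PATH. The snippet below is a minimal sketch; missing_tools is a hypothetical helper name, not something shipped with poppler-utils or pandoc.

```shell
#!/bin/sh
# Print the names of any given commands that are not found on PATH.
# missing_tools is a hypothetical helper, defined here for illustration.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space before printing.
    printf '%s\n' "${missing# }"
}

result=$(missing_tools pdftohtml pandoc)
if [ -n "$result" ]; then
    echo "Missing: $result"
else
    echo "pdftohtml and pandoc are both installed."
fi
```

If anything is reported missing, rerun the install command for your distribution.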
First, turn your PDF into HTML while keeping the images:
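pdftohtml -c -noframes -p myfile.pdf output.html
Flags explained:
-c → Generates "complex" output that keeps the layout as close to the PDF as possible.
-noframes → Avoids splitting content into multiple HTML frames.
-p → Rewrites links that point to .pdf files so they point to .html files instead. It does not affect images: pdftohtml extracts them by default unless you pass -i.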
This will give you:
- output.html → Your PDF in HTML format.
- Image files → Written next to output.html and named after it (for example output001.png; exact names depend on your pdftohtml version and mode).
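To see which image files the generated HTML actually references, you can pull the src attributes out of its img tags. This is a rough sketch that assumes double-quoted src attributes, which is what pdftohtml normally emits; list_html_images is a hypothetical helper name.

```shell
#!/bin/sh
# Print every image path referenced by an <img ... src="..."> tag
# in an HTML file, one per line, without duplicates.
# Assumes double-quoted src attributes.
list_html_images() {
    grep -oi '<img[^>]*>' "$1" \
        | grep -oi 'src="[^"]*"' \
        | cut -d'"' -f2 \
        | sort -u
}

# Only runs when output.html exists in the current directory.
if [ -f output.html ]; then
    list_html_images output.html
fi
```

Each path printed should correspond to a file sitting next to output.html.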
Step 3 – Convert HTML to Markdown
Now that you have HTML, use Pandoc to convert it to Markdown:
pandoc output.html -f html -t markdown -o myfile.md
This will create myfile.md with Markdown syntax.
If the HTML referenced images, the Markdown will contain image links to the extracted image files.
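A quick way to confirm those image links resolve is to check each referenced path against the disk. The sketch below assumes plain inline links of the form ![alt](path) with no title text; missing_md_images is a hypothetical helper name.

```shell
#!/bin/sh
# Print each image path referenced in a Markdown file that does not
# exist relative to that file's directory.
# Assumes inline image links of the form ![alt](path) without titles.
missing_md_images() {
    md_file=$1
    md_dir=$(dirname "$md_file")
    grep -o '!\[[^]]*]([^)]*)' "$md_file" \
        | sed 's/^!\[[^]]*](\([^)]*\))$/\1/' \
        | sort -u \
        | while IFS= read -r path; do
            [ -e "$md_dir/$path" ] || printf '%s\n' "$path"
        done
}

# Only runs when myfile.md exists in the current directory.
if [ -f myfile.md ]; then
    missing_md_images myfile.md
fi
```

No output means every linked image was found.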
Step 4 – Organize Images for Your Markdown
Make sure the images stay where the Markdown links expect them, relative to your Markdown file.
Pandoc keeps the relative paths from the HTML, so if your Markdown says:
![](output_images/image1.png)
…then output_images/image1.png should be next to your .md file.
If you plan to upload this to a site or Git repo, keep the images folder alongside the Markdown.
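If you would rather collect loose image files into a single subfolder, the move and the link rewrite have to happen together. Here is a hedged sketch: relocate_images is a hypothetical helper, and it assumes the image names share a simple prefix (such as output) with no characters that are special to sed.

```shell
#!/bin/sh
# Move PNG/JPG files that sit next to a Markdown file into a subfolder
# and rewrite the matching links.
# Usage: relocate_images FILE.md PREFIX TARGET_DIR
# Assumes PREFIX and TARGET_DIR contain no sed metacharacters.
relocate_images() {
    md_file=$1
    prefix=$2
    target=$3
    md_dir=$(dirname "$md_file")
    mkdir -p "$md_dir/$target"
    for img in "$md_dir/$prefix"*.png "$md_dir/$prefix"*.jpg; do
        [ -e "$img" ] || continue   # skip glob patterns that matched nothing
        mv "$img" "$md_dir/$target/"
    done
    # Turn ](PREFIX...) into ](TARGET_DIR/PREFIX...) in every link.
    sed "s|](\($prefix[^)]*\))|]($target/\1)|g" "$md_file" > "$md_file.tmp" \
        && mv "$md_file.tmp" "$md_file"
}

# Example call (commented out because it modifies files):
# relocate_images myfile.md output images
```

Work on a copy the first time, since the function edits the Markdown file in place.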
Step 5 – Clean Up the Markdown (Optional)
PDF → HTML → Markdown isn’t always perfect. You might see:
- Extra line breaks
- Odd spacing
- Overly long lines
To tidy up, you can run:
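pandoc myfile.md -f markdown -t markdown --wrap=preserve -o myfile_clean.md
Note that --wrap=preserve keeps the existing line breaks. To rewrap overly long lines, use --wrap=auto together with --columns=80 instead.
Or open it in your favorite Markdown editor (like Typora, Obsidian, or VS Code) and do some manual cleanup.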
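For the extra line breaks and odd spacing, a lighter pass with standard text tools can also help. The sketch below strips trailing whitespace and squeezes runs of blank lines into one; tidy_markdown is a hypothetical helper name. One caveat: two trailing spaces mean a hard line break in Markdown, so this pass removes those breaks too.

```shell
#!/bin/sh
# Strip trailing whitespace from every line, then collapse consecutive
# blank lines into a single blank line. Writes the result to stdout.
# Caveat: also removes Markdown's two-space hard line breaks.
tidy_markdown() {
    sed 's/[[:space:]]*$//' "$1" | cat -s
}

# Only runs when myfile.md exists in the current directory.
if [ -f myfile.md ]; then
    tidy_markdown myfile.md > myfile_tidy.md
fi
```

Writing to a separate file keeps the original around for comparison.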
Bonus – One-Liner Command
If you want to chain everything into one line:
pdftohtml -c -noframes -p myfile.pdf temp.html && pandoc temp.html -f html -t markdown -o myfile.md
pdftohtml still writes the extracted images to disk, so keep them alongside myfile.md.
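If you convert PDFs regularly, the same chain can be wrapped in a small function that derives the output names from the input. This is only a sketch using the flags from this guide; pdf_to_md and md_name_for are hypothetical names, and the input is assumed to end in .pdf.

```shell
#!/bin/sh
# Derive the Markdown file name from a PDF path,
# e.g. docs/report.pdf becomes docs/report.md.
md_name_for() {
    printf '%s\n' "${1%.pdf}.md"
}

# Convert one PDF to Markdown via an intermediate HTML file.
# The && means pandoc only runs if pdftohtml succeeded.
pdf_to_md() {
    pdf=$1
    base=${pdf%.pdf}
    pdftohtml -c -noframes -p "$pdf" "$base.html" \
        && pandoc "$base.html" -f html -t markdown -o "$(md_name_for "$pdf")"
}

# Example call (commented out because it needs a real PDF):
# pdf_to_md myfile.pdf
```

The intermediate HTML file is left in place so the image links it shares with the Markdown stay easy to inspect.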
Final Thoughts
Converting PDFs to Markdown on Linux is straightforward once you know the toolchain.
By using pdftohtml to preserve images and pandoc to do the format conversion, you get a clean Markdown file and all your images neatly saved for reuse.
If you need perfect formatting for publishing, expect to do a little cleanup — but this method will save you hours compared to manually copying and pasting.

