in Python by (50.2k points)
How can I extract all images from a PDF document at their native resolution and in their native format? Basically, I want to extract TIFF as TIFF, JPEG as JPEG, etc., without any resampling. The layout is not important to me.

1 Answer

0 votes
by (107k points)

You can use the PyMuPDF module (imported as fitz) for this. Note that the code below keeps each image's native resolution but saves every image as a .png file, so the original format (JPEG, TIFF, etc.) is not preserved:

import fitz  # PyMuPDF

doc = fitz.open("file.pdf")
for i in range(len(doc)):
    # Each entry describes one image on page i; its first item is the xref
    for img in doc.get_page_images(i):  # getPageImageList in old releases
        xref = img[0]
        pix = fitz.Pixmap(doc, xref)
        if pix.n - pix.alpha >= 4:  # CMYK: convert to RGB first
            pix = fitz.Pixmap(fitz.csRGB, pix)
        pix.save("p%s-%s.png" % (i, xref))  # writePNG in old releases
        pix = None  # release the pixmap's memory
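Since the loop above re-encodes everything as PNG, here is a minimal sketch of one way to get closer to the "JPEG as JPEG" goal, assuming a recent PyMuPDF release where Document.extract_image(xref) returns a dict with the raw image bytes ("image") and a suggested file extension ("ext"). The names image_filename and extract_native_images are hypothetical helpers made up for this example, not part of PyMuPDF. As far as I know, images stored as JPEG come back as the original JPEG bytes, while images stored in an encoding that has no standalone file format may still be handed back as PNG, so check the extensions you get.

```python
from pathlib import Path


def image_filename(page_index, xref, ext):
    """Build an output name such as 'p0-12.jpeg' (hypothetical naming scheme)."""
    return "p%s-%s.%s" % (page_index, xref, ext)


def extract_native_images(pdf_path, out_dir="."):
    """Write each embedded image using the bytes and extension PyMuPDF reports."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    seen = set()  # xrefs already saved, since one image can appear on many pages
    doc = fitz.open(pdf_path)
    for i in range(len(doc)):
        for img in doc.get_page_images(i):
            xref = img[0]
            if xref in seen:
                continue
            seen.add(xref)
            info = doc.extract_image(xref)  # assumed keys: "ext" and "image"
            target = out / image_filename(i, xref, info["ext"])
            target.write_bytes(info["image"])  # bytes written as returned, no Pixmap step
            written.append(target)
    doc.close()
    return written
```

Calling extract_native_images("file.pdf", "images") should return the list of paths it wrote. Because no Pixmap is created, there is no colorspace conversion or resampling step in this version.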
