in Python by (280 points)

Hello, I'm currently using Python 3.4 and I need a suggestion for extracting the entire text from a PDF so that I can use it for text processing.

The suggestions I'm finding are all for Python 2.7, but I need one that works with Python 3.4.

Thank you in advance :)

1 Answer

0 votes
by (26.4k points)

For Python 3.4, there's a module called PyPDF2. It is not part of the standard library, so you have to install it before you can work with PDFs in Python 3.4.

PyPDF2 extracts the text of a page and returns it as a Python string.

Run this command in the command line to install it:

pip install PyPDF2

The module name is case sensitive in the import statement: the 'y' is lowercase and all the other letters are uppercase, so make sure you write it exactly as PyPDF2.

>>> import PyPDF2

>>> pdfFileObj = open('my_file.pdf','rb')     #'rb' for read binary mode

>>> pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

>>> pdfReader.numPages

56

>>> pageObj = pdfReader.getPage(9)          #9 is the zero-based page index, i.e. the tenth page

>>> pageObj.extractText()

In the above code, the last statement returns the text extracted from the page at index 9 of the document my_file.pdf. Page indexes start at 0, so this is the tenth page.
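Since the question asks for the entire text rather than a single page, the same calls can be put in a loop. The following is only a minimal sketch, assuming the same PyPDF2 API used above (PdfFileReader, numPages, getPage, extractText). The function names extract_all_text and extract_text_from_file are made up for illustration, and newer releases of the library rename these calls, so it may need adjusting there.

```python
def extract_all_text(pdf_reader):
    """Join the text of every page of an already-opened reader.

    Assumes the reader exposes numPages and getPage(), and that each
    page exposes extractText(), as in the session above.
    """
    pages = []
    for index in range(pdf_reader.numPages):   # page indexes start at 0
        page = pdf_reader.getPage(index)
        pages.append(page.extractText())
    return "\n".join(pages)                    # one string for the whole document


def extract_text_from_file(path):
    """Hypothetical convenience wrapper: open a PDF and return all of its text."""
    import PyPDF2  # third-party; imported here so the helper above has no hard dependency

    # The file must stay open while the pages are being read.
    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfFileReader(pdf_file)
        return extract_all_text(reader)
```

Calling extract_text_from_file('my_file.pdf') would then give one string that can be passed on to the text processing step. How clean the text is depends on how the PDF was produced.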
