I'm trying to extract the text from a PDF file using Python.
Using the PyPDF2 module, I tried the following code:
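import PyPDF2

# PDFs are binary files, so open the file in 'rb' mode
pdf_file = open('sample.pdf', 'rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)
number_of_pages = read_pdf.getNumPages()
page = read_pdf.getPage(0)  # first page (pages are zero-indexed)
page_content = page.extractText()
print(page_content)
pdf_file.close()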
But when I run the code, I get the output below, which does not match the text in the PDF document I included:
!"#$%#$%&%$&'()*%+,-%./01'*23%4
5'%1$#26%3/%7/))/8%&)/26%8#3"%3"*%313/9#&)
%
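To show that this is more than a formatting difference, here is a rough check of how much of the extracted string consists of letters. This is only a sketch: the helper name looks_garbled and the 0.5 threshold are my own assumptions and are not part of PyPDF2.

```python
def looks_garbled(text, min_letter_ratio=0.5):
    """Heuristic: True if too few of the non-space characters are letters.

    Hypothetical helper, not a PyPDF2 function; the threshold is arbitrary.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        # Nothing was extracted at all, so treat it as unusable.
        return True
    letters = sum(1 for ch in visible if ch.isalpha())
    return letters / float(len(visible)) < min_letter_ratio


# First line of the output shown above.
sample = '!"#$%#$%&%$&\'()*%+,-%./01\'*23%4'
print(looks_garbled(sample))
print(looks_garbled("How to extract the text from the PDF document?"))
```

The first line of my output contains no letters at all, so it is flagged, while an ordinary English sentence is not.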
How can I extract the text from the PDF document correctly?