
I am trying to extract text from images.

Originally the images are colored, with the text in white. After some further processing, the text is now black and all other pixels are white (with some noise). Here is an example:

Now when I run OCR on it with pytesseract (Tesseract), I still don't get any text.

Is there any way to extract text from colored images?

1 Answer


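Try the code below:

from PIL import Image
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image from disk (cv2.imread returns None if the path is wrong)
image = cv2.imread(args["image"])
if image is None:
    raise SystemExit("Could not read image: " + args["image"])
cv2.imshow("Original", image)

# apply an "average" blur with a 3x3 kernel to smooth out the noise
blurred = cv2.blur(image, (3, 3))
cv2.imshow("Blurred_image", blurred)

# OpenCV stores channels as BGR; convert to RGB before handing the array to PIL
img = Image.fromarray(cv2.cvtColor(blurred, cv2.COLOR_BGR2RGB))

# run Tesseract OCR on the blurred image
text = pytesseract.image_to_string(img, lang='eng')
print(text)

# keep the preview windows open until a key is pressed
cv2.waitKey(0)
cv2.destroyAllWindows()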


Result I obtained: "Stay: in an Overwoter Bungalow $3»"

The output should improve further if you find the contours in the image and remove the unnecessary blobs (the leftover noise) before running OCR.
