Image classification in Python

Question

asked Jul 23, 2019 in Machine Learning by ParasSharma1 (19k points)

I'm looking for a method of classifying scanned pages that consist largely of text.

Here are the particulars of my problem. I have a large collection of scanned documents and need to detect the presence of certain kinds of pages within these documents. I plan to "burst" the documents into their component pages (each of which is an individual image) and classify each of these images as either "A" or "B". But I can't figure out the best way to do this.

More details:

I have numerous examples of "A" and "B" images (pages), so I can do supervised learning.
It's unclear to me how best to extract features from these images for training. For example, what should those features be?
The pages are occasionally rotated slightly, so it would be great if the classification were somewhat insensitive to rotation and (to a lesser extent) scaling.
I'd like a cross-platform solution, ideally in pure Python or using common libraries.
I've thought about using OpenCV, but this seems like a "heavyweight" solution.

1 Answer

Anurag · Answer 1 · 2019-07-23T12:38:52+0000

There are three steps to solving your problem.

Feature Extraction - Since you have a large dataset to draw on, I would recommend the SIFT/SURF class of features used in object detection. These descriptors are designed to tolerate changes in scale and rotation, which suits your slightly rotated pages. Harris corners and similar detectors should also be suitable.
Classifier Selection - Here you can use a Random Forest classifier. The concept is simple to understand, and the model is flexible and non-parametric. It has very few parameters to tune, and you can select them automatically (for example, with a cross-validated search) during supervised training.
Implementation - Image processing implemented entirely in pure Python is unlikely to be fast. I recommend combining OpenCV for feature detection with R for the statistical work and classifiers.
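The three steps above can be sketched in Python as follows. This is only a rough sketch under several assumptions, not a tested pipeline for your data. Because you asked for Python, scikit-learn stands in for R here. The bag-of-visual-words step (`build_vocabulary` and `bow_histogram`) is an addition of mine: SIFT yields a different number of descriptors per page, while a Random Forest needs one fixed-length vector per page. All function names, the vocabulary size, and the parameter grid are hypothetical choices, and the code assumes opencv-python 4.4 or later (where SIFT is in the main module) plus scikit-learn.

```python
def extract_descriptors(image_path, max_features=500):
    """Return SIFT descriptors (one 128-value row per keypoint) for a page image."""
    import cv2  # assumption: opencv-python >= 4.4

    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(image_path)
    sift = cv2.SIFT_create(nfeatures=max_features)
    _keypoints, descriptors = sift.detectAndCompute(image, None)
    # detectAndCompute returns None for descriptors when no keypoints are found.
    return [] if descriptors is None else descriptors.tolist()


def build_vocabulary(all_descriptors, n_words=50):
    """Cluster descriptors pooled from many training pages into 'visual words'."""
    from sklearn.cluster import KMeans

    kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0)
    kmeans.fit(all_descriptors)
    return kmeans.cluster_centers_.tolist()


def bow_histogram(descriptors, vocabulary):
    """Turn one page's descriptors into a fixed-length, normalised histogram.

    Written in plain Python for clarity; vectorise with NumPy for real data.
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Index of the vocabulary word closest to d (squared Euclidean distance).
        nearest = min(
            range(len(vocabulary)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(d, vocabulary[i])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


def train_classifier(histograms, labels):
    """Fit a Random Forest, choosing its parameters by cross-validated search."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Hypothetical grid; adjust the values to your dataset.
    grid = {"n_estimators": [100, 300], "max_features": ["sqrt", "log2"]}
    search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5)
    search.fit(histograms, labels)
    return search.best_estimator_
```

In use, you would call `extract_descriptors` on every training page, pool the results into `build_vocabulary`, convert each page to a histogram with `bow_histogram`, and pass the histograms with their "A"/"B" labels to `train_classifier`. A new page is classified by building its histogram against the same vocabulary and calling `predict` on the returned model.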

Hope this answer helps you!
