How to get better OCR results (without confusing digits for letters)

Carl Youel on 6 Jul 2021
Commented: Carl Youel on 6 Jul 2021
Hello all,
I'm trying to use OCR to determine the axes scale on a graph:
(I want to be able to extract the numbers "0, 32000, 4000, etc." on the y-axis, and "-50, 50, 150, etc." on the x-axis)
My initial attempt is this code:
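% Run OCR on the cropped axes image, treating the text as a single block
detect = ocr(justAxes, 'TextLayout', "Block");
% Draw each word's bounding box, labelled with the word and its confidence
Iocr = insertObjectAnnotation(justAxes, 'rectangle', ...
    detect.WordBoundingBoxes, ...
    detect.Words + " " + detect.WordConfidences);
figure; imshow(Iocr);
% Keep the recognized words for later parsing
words_string = detect.Words;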
Which gives me this result:
The results aren't bad, but I'm wondering if there is any preprocessing I can do to keep the OCR from misreading digits as letters (e.g. the '50' as 'so', the '8000' as 'sooo', and the '0' as 'o'). Can I somehow bias the OCR toward digits rather than letters? Or do I have to preprocess the image further in some way?

Accepted Answer

Image Analyst on 6 Jul 2021
Your digits need to be at least 20 pixels high, as stated in the help. I had the same trouble when the image chunk I passed in contained numbers only 10 or 12 pixels high: a human could tell what they were, but the ocr() function misidentified them. I called imresize() on each image chunk to bring it up to 20 pixels high, and then the numbers were identified correctly. If that doesn't work, write back and attach your code and image.
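For anyone landing here later, the resize step could look roughly like the sketch below. This is only an illustration, not the exact code used above: the synthetic test image, the variable names, and the 12-pixel text height are assumptions, so measure the height in your own image. The 'CharacterSet' restriction to digits and the minus sign is an optional extra that the answer did not mention; it is one way to address the question about favoring digits over letters.

```matlab
% Sketch: upscale an image chunk so its text is at least 20 px tall, then OCR it.
% Requires Image Processing Toolbox and Computer Vision Toolbox.

% Synthetic stand-in for an axis-label crop (hypothetical; use your own image).
blank    = 255 * ones(40, 160, 'uint8');
justAxes = insertText(blank, [5 5], '8000', 'FontSize', 12, ...
    'BoxOpacity', 0, 'TextColor', 'black');

textHeightPx = 12;   % assumed digit height in the original crop, measured by hand
minHeightPx  = 20;   % minimum text height mentioned in the answer

% Only enlarge; never shrink a crop whose text is already tall enough.
scale    = max(1, minHeightPx / textHeightPx);
enlarged = imresize(justAxes, scale, 'bicubic');

% Optional (assumption, not part of the answer): limit recognition to
% digits and the minus sign so that '0' cannot come back as 'o'.
detect = ocr(enlarged, 'TextLayout', 'Block', ...
    'CharacterSet', '0123456789-');

disp(detect.Words);
```

Note that the bounding boxes returned for the enlarged image are in enlarged-image coordinates, so divide detect.WordBoundingBoxes by scale if you want to draw them on the original crop.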
1 Comment
Carl Youel on 6 Jul 2021
Ah yes, that did it. Thanks for the help, much appreciated!
