Word segmentation based on projection histogram?

Nguyen Hien on 3 Aug 2015
Commented: Ayush Gupta on 1 Apr 2018
Hi all,
I am working on an OCR project and am stuck at word segmentation. My basic approach uses the horizontal projection of a segmented line: I look for the gaps between each falling edge and the next rising edge. The problem is that I cannot tell word spaces from character spaces. In other words, I cannot automatically find the right threshold for cropping out a word. Any help would be appreciated. Thank you, guys. By the way, how can I contact Mr. Image Analyst directly? Here is my work so far:
% Read in an image
close all; clear all;
I = imread('C:\Users\Nguyen Duy Hien\Desktop\bible.jpg');
% Convert to a grayscale image
I = rgb2gray(I);
level = graythresh(I);
% Binarization using the Otsu threshold from graythresh
BW = im2bw(I, level);
%BW = imadjust(I);
% Smooth the image
h = fspecial('gaussian', [3 1], 0.8);
BW = imfilter(BW, h);
BW = ~BW;                       % invert so that text pixels are 1
BWedge = edge(uint8(BW));
BW = imfill(BWedge, 'holes');
figure(1), imshow(BW)
% --- Line segmentation
pV = sum(BW, 1);                % column sums
pH = sum(BW, 2);                % row sums
figure(2), plot(pH)
figure(3), plot(pV)
lines = pH > 0;
% Detect rising and falling edges of the row profile.
% Note: despite their names, these hold row indices of the text lines.
d = diff(lines);
startingColumns = find(d > 0);
endingColumns = find(d < 0);
subImage = {};
n = length(startingColumns);
space = {};
y = [];
count = 1;
for k = 1 : n
    % Crop text line k
    subImage{k} = BW(startingColumns(k):endingColumns(k), :);
    figure(4)
    subplot(n, 1, k), imshow(subImage{k})
    % Column sums of the cropped line
    pHline{k} = sum(subImage{k}, 1);
    figure(5)
    subplot(n, 1, k), plot(pHline{k})
    lineN = pHline{k} > 0;
    % Rising and falling edges along the line.
    % Note: despite their names, these hold column indices.
    a = diff(lineN);
    startingRow = find(a > 0);
    endingRow = find(a < 0);
    buf_end = [];
    buf_start = [startingRow(1)];
    m = length(startingRow) - 1;
    % --- Word segmentation
    for j = 1 : m
        % Width of the gap between blob j and blob j+1
        space{j} = startingRow(j+1) - endingRow(j);
        A = cell2mat(space);
        y = [y, max(A)];
        % Treat the gap as a word break if it lies strictly between the
        % smallest and largest running maxima seen so far.
        % This is the threshold test that does not separate words reliably.
        if min(y) < space{j} && max(y) > space{j}
            buf_end = [buf_end; endingRow(j)];
            buf_start = [buf_start; startingRow(j+1)];
        end
    end
    buf_end = [buf_end; endingRow(end)];
    o = length(buf_end);
    % Crop and display each word of line k
    for i = 1 : o
        word{i} = subImage{k}(:, buf_start(i):buf_end(i));
        wordarr{count} = word{i};
        figure, imshow(wordarr{count})
        %figure(6), subplot(o,n,count),imshow(wordarr{count})
        count = count + 1;
    end
end
2 comments
Walter Roberson on 4 Aug 2015
Image Analyst does not wish to be contacted privately. He responds to some posts, if it amuses him to do so.
sayar chit on 14 Nov 2017
Hi Sir! I am studying image segmentation of printed documents. Line segmentation and word segmentation work well, but I cannot segment the words into characters. Can anyone help me? My inputs are word images, and I want to split each one into characters as follows: မ,ိ,ှု,င,်,း,တ,ိ,ု,က,်,၍


Answers (3)

Nguyen Hien on 4 Aug 2015
Thank you guys so much for your help. Fortunately, I have figured out the solution.
2 comments
somanath prakash on 3 Apr 2017
Nguyen Hien, please let me have the solution!
Ayush Gupta on 1 Apr 2018
Hey, can you send the solution?



Image Analyst on 4 Aug 2015
You just did contact me directly, as direct as it gets. Sorry, I don't do private consulting, and besides, OCR is not even my field. I'd refer you to either the Computer Vision System Toolbox or, if that doesn't work, Vision Bib: http://www.visionbib.com/bibliography/contentschar.html#OCR,%20Document%20Analysis%20and%20Character%20Recognition%20Systems
Besides, you didn't even attach your image, so we can't try your code, and I couldn't detect problems like yours just by looking over the code and imagining what it would do with an image. Sorry, but if it's major algorithm development, we just don't have the time for that here. If it's something quick, like a few minutes to correct syntax or logic flow, then maybe we can help.

Walter Roberson on 4 Aug 2015
There is no fixed number of pixels that can be used to define the difference between spacing between characters and spacing between words. Some languages do not have spacing between words. And the spacing between characters on a very large sign could be larger than the total length of a word on a smaller sign.
You need to examine the relative distance between centroids, perhaps as compared to the width of the blobs.
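As a rough illustration of that idea, here is a minimal sketch rather than a tested solution. It assumes the blobs of one text line can be taken from the column projection (regionprops would give 2-D centroids and bounding boxes instead), it expresses each gap relative to the median blob width, and it uses a hypothetical cutoff breakRatio = 0.6 that would need tuning for a real font and language. The test line is synthetic, not the asker's image.

```matlab
% Synthetic text line (assumed test data): 1 = ink, 0 = background.
% Five blobs of width 4; 1-column gaps inside words, a 5-column gap between words.
lineImg = false(5, 30);
inkCols = [2:5, 7:10, 12:15, 21:24, 26:29];
lineImg(:, inkCols) = true;

% Columns that contain any ink.
hasInk = sum(lineImg, 1) > 0;

% Pad with zeros so blobs touching the image border are still detected.
d = diff([0, hasInk, 0]);
blobStart = find(d == 1);          % first column of each blob
blobEnd   = find(d == -1) - 1;     % last column of each blob

% Blob widths and horizontal centroids.
blobWidth = blobEnd - blobStart + 1;
centroid  = (blobStart + blobEnd) / 2;

% Gap between neighbours: centroid distance minus the two half-widths.
gap = diff(centroid) - (blobWidth(1:end-1) + blobWidth(2:end)) / 2;

% Make the gap scale-independent by comparing it to the typical blob width.
relGap = gap / median(blobWidth);

% Assumed cutoff: a gap wider than 60% of a typical blob starts a new word.
breakRatio = 0.6;
isWordBreak = relGap > breakRatio;

% Word extents in columns.
wordStart = blobStart([true, isWordBreak]);   % [2 21] for this test line
wordEnd   = blobEnd([isWordBreak, true]);     % [15 29] for this test line

for w = 1 : numel(wordStart)
    fprintf('word %d: columns %d to %d\n', w, wordStart(w), wordEnd(w));
end
```

Because the gaps are divided by the blob width, the same cutoff applies to small and large print. It still cannot help with scripts that do not put spaces between words.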
