How to find the percentage of similarity between two arrays?
Suppose x=[1 0 1 0], y=[1 1 1 0]. If I compare the individual elements of x with y, the longest match (I have to start from the beginning of x) is at the 3rd and 4th elements of the second array, so the percentage of matching is 50%. How do I write MATLAB code for this?
Important: the point is what percentage of x, starting from the 1st element and continuing serially, matches y.
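To make the definition in the question concrete, here is a minimal sketch (not taken from any of the answers) that slides x along y, counts how many leading elements of x agree at each start position, and keeps the longest run. It uses only basic indexing; the names bestLen, bestPos and pct are illustrative.

```matlab
% Question's example: what leading part of x occurs contiguously in y?
x = [1 0 1 0];
y = [1 1 1 0];
nx = numel(x);
ny = numel(y);
bestLen = 0;   % length of the longest prefix of x found in y
bestPos = [];  % start index of that prefix in y (first one on ties)
for iy = 1:ny
    k = 0;
    % extend the match while elements agree and y still has elements left
    while k < nx && iy + k <= ny && x(k + 1) == y(iy + k)
        k = k + 1;
    end
    if k > bestLen
        bestLen = k;
        bestPos = iy;
    end
end
pct = 100 * bestLen / nx;
fprintf('Longest prefix: %d of %d elements (%g%%) at y(%d)\n', bestLen, nx, pct, bestPos);
```

For the data in the question it finds 2 of 4 elements (50%) starting at y(3), which is the expected answer.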
Accepted Answer
Jan, 14 Mar 2017 (edited by Jan, 14 Mar 2017)
I want to share my guess as well; perhaps it matches your needs:
% Method 1:
x  = [1 1 0 0]; 
y  = [1 1 1 0 1 1 0 1];
nx = length(x);
ny = length(y);
yy = [y, nan(1, nx - 1)];  % Append nans to compare the last values
p  = 100 * ones(1, ny);    % Pre-allocation: a full match is 100% [EDITED, was: zeros()]
for iy = 1:ny              % Loop over substrings of y
  match = find(yy(iy:iy + nx - 1) ~= x, 1);   % Index of the first mismatch
  if ~isempty(match)       % A mismatch is found
    p(iy) = 100 * (match - 1) / nx;
  end
end
[maxP, maxPos] = max(p)    % Highest in % value and index
If this solves your problem, it is time to search for optimizations.
[EDITED2] Simplified:
% Method 2:
nx = length(x);
ny = length(y);
yy = [y, nan(1, nx - 1)];  % Append nans to compare the last values
p  = zeros(1, ny);         % Pre-allocation (here zeros() is fine)
for iy = 1:ny              % Loop over substrings of y
  p(iy) = find([(yy(iy:iy + nx - 1) ~= x), true], 1) - 1;
end
[maxP, maxPos] = max(100 * p / nx)  % Highest value in % and index
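The key line of Method 2 is the appended `true`: it acts as a sentinel, so FIND always returns an index, and a window without any mismatch yields nx leading matches. A small check with two hand-picked windows (w1 and w2 are illustrative names):

```matlab
% Sentinel trick from Method 2: number of leading matches of a window
x  = [1 1 0 0];
w1 = [1 1 1 0];           % first mismatch at element 3
w2 = [1 1 0 0];           % no mismatch at all
lead1 = find([(w1 ~= x), true], 1) - 1;   % the sentinel is never reached
lead2 = find([(w2 ~= x), true], 1) - 1;   % FIND stops at the sentinel
fprintf('%d %d\n', lead1, lead2);
```

Without the sentinel, FIND would return an empty array for w2, which is exactly the case Method 1 has to handle separately.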
If y is long, it might be cheaper to iterate over the prefixes of x instead. STRFIND operates on double vectors directly, as long as they are row vectors, so:
% Method 3:
p  = zeros(size(y));
nx = length(x);
for ix = 1:nx                       % Loop over prefixes of x, shortest first
  p(strfind(y, x(1:ix))) = ix;      % STRFIND accepts double vectors
end
[maxP, maxPos] = max(100 * p / nx);
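Method 3 relies on STRFIND accepting numeric row vectors. If you prefer not to depend on that, a minimal variant converts both vectors to character strings first. This sketch assumes the data contain only 0 and 1, so that adding '0' maps them to the characters '0' and '1'; xs and ys are illustrative names.

```matlab
% Method 3 with an explicit conversion to char (assumes 0/1 data)
x  = [1 1 0 0];
y  = [1 1 1 0 1 1 0 1];
xs = char(x + '0');           % '1100'
ys = char(y + '0');           % '11101101'
nx = numel(x);
p  = zeros(size(y));
for ix = 1:nx                 % prefixes of x, shortest first
  p(strfind(ys, xs(1:ix))) = ix;   % longer prefixes overwrite shorter ones
end
[maxP, maxPos] = max(100 * p / nx);
```

With this data the prefix [1 1 0] occurs at positions 2 and 5 of y, and MAX returns the first, so maxP is 75 and maxPos is 2.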
Because only the longest match is wanted, it is cheaper to start with the complete x and stop the loop when the first match is found:
% Method 4:
maxP = 0;
for ix = length(x):-1:1           % Start with complete x
  maxPos = strfind(y, x(1:ix));   % Search in y
  if ~isempty(maxPos)             % Success 
    maxP = 100 * ix / length(x);
    break;
  end
end
Now maxPos contains all indices of the occurrences of the longest substring.
Sorry for posting multiple versions. I thought seeing the steps of development might be interesting.
22 comments
Image Analyst, 13 Oct 2022
@Emu, I don't have your data, but if it's a column vector (vertical), try
yy = [whoSpeakCoder; nan(nx - 1, 1)];  % Append nans to compare the last values
% or else you could make yy a row vector instead if you want:
yy = [reshape(whoSpeakCoder, 1, []), nan(1, nx - 1)];  % Append nans to compare the last values
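Why the orientation matters: [a, b] concatenates horizontally and needs row vectors, while [a; b] needs columns. A short illustration of turning a column vector into a row before padding (c is an illustrative variable):

```matlab
c  = [1; 1; 0];              % column vector
r1 = reshape(c, 1, []);      % row vector, as in the comment above
r2 = c(:).';                 % equivalent: force a column, then transpose
yy = [r1, nan(1, 2)];        % horizontal NaN padding now works
```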
More Answers (4)
John BG, 14 Mar 2017 (edited by John BG, 16 Mar 2017)
Hi Adithya,
OK, got it.
1.
The percentages are implicit in the mask applied to x.
Let
x2=[1 1 0 0];             % pattern to find
y2=[1 1 1 0 1 1 0 1]      % signal
Then:
if the sought string is [1 1 0 0], the percentage is 100%
if the sought string is [1 1 0], the percentage is 75%
if the sought string is [1 0 0], the percentage is 75%
if the sought string is [1 1], the percentage is 50%
if the sought string is [1 0], the percentage is 50%
if the sought string is [0 0], the percentage is 50%
It's agreed that single bits are not sought, right?
So my suggestion is to address the percentage first.
The basic processing is:
clear all
x2=[1 1 0 0];             % pattern
y2=[1 1 1 0 1 1 0 1]      % signal     
maskx=[1:3]                % assign percentage here
                           % but only continuous bits, ok?
x=num2str(x2(maskx));
y=num2str(y2);
x(x==' ')=[];y(y==' ')=[];
n=strfind(y,x)
2.
Sweeping all percentages:
x = [1 1 0 0]                  % pattern
y = randi([0 1], 1, 20)        % random binary signal
N = numel(x);
v = {};                        % contiguous index windows of x, longest first
for q = 1:N-1
    for s = 1:q
        v{end+1} = s:N-q+s;    % window of length N-q+1 starting at index s
    end
end
stry = num2str(y);
stry(stry == ' ') = []         % y as a string of '0'/'1' characters
for k = 1:numel(v)
    Lx = v{k};
    pc = numel(Lx) / N * 100;  % percentage of x covered by this window
    strx = num2str(x(Lx));
    strx(strx == ' ') = [];
    n = strfind(stry, strx);   % all start positions of the window in y
    fprintf('sample %s with percentage %g%% has %d match(es). location in y: %s\n', ...
        strx, pc, numel(n), num2str(n));
end
3.
Example:
stry =
01000101100010101001
sample 1100 with percentage 100% has 1 match(es). location in y: 3
sample 110 with percentage 75% has 1 match(es). location in y: 3
sample 100 with percentage 75% has 2 match(es). location in y: 4  14
sample 11 with percentage 50% has 2 match(es). location in y: 3  19
sample 10 with percentage 50% has 6 match(es). location in y: 4   8  10  12  14  17
sample 00 with percentage 50% has 4 match(es). location in y: 1   5   6  15
Regards,
John BG
4 comments
Jan, 16 Mar 2017
The conversion to a string can be omitted:
x2 = [1 1 0 0];
y2 = [1 1 1 0 1 1 0 1];
strfind(y2, x2(1:3))   % search the first 3 elements of x2 in y2
John BG, 16 Mar 2017 (edited by John BG, 16 Mar 2017)
Applying my updated script to
x = [1 0 1 0 1 1 1];
y = [1 1 0 0 0 0 0 0 0];
gives:
1  1  0  0  0  0  0  0  0
ans =
sample 1010111 with percentage 100% has 0 match(es). location in y: 
ans =
sample 101011 with percentage 85.7143% has 0 match(es). location in y: 
ans =
sample 010111 with percentage 85.7143% has 0 match(es). location in y: 
ans =
sample 10101 with percentage 71.4286% has 0 match(es). location in y: 
ans =
sample 01011 with percentage 71.4286% has 0 match(es). location in y: 
ans =
sample 10111 with percentage 71.4286% has 0 match(es). location in y: 
ans =
sample 1010 with percentage 57.1429% has 0 match(es). location in y: 
ans =
sample 0101 with percentage 57.1429% has 0 match(es). location in y: 
ans =
sample 1011 with percentage 57.1429% has 0 match(es). location in y: 
ans =
sample 0111 with percentage 57.1429% has 0 match(es). location in y: 
ans =
sample 101 with percentage 42.8571% has 0 match(es). location in y: 
ans =
sample 010 with percentage 42.8571% has 0 match(es). location in y: 
ans =
sample 101 with percentage 42.8571% has 0 match(es). location in y: 
ans =
sample 011 with percentage 42.8571% has 0 match(es). location in y: 
ans =
sample 111 with percentage 42.8571% has 0 match(es). location in y: 
ans =
sample 10 with percentage 28.5714% has 1 match(es). location in y: 2
ans =
sample 01 with percentage 28.5714% has 0 match(es). location in y: 
ans =
sample 10 with percentage 28.5714% has 1 match(es). location in y: 2
ans =
sample 01 with percentage 28.5714% has 0 match(es). location in y: 
ans =
sample 11 with percentage 28.5714% has 1 match(es). location in y: 1
ans =
sample 11 with percentage 28.5714% has 1 match(es). location in y: 1
Result:
only two subsequences, 10 and 11, occur in y, each with exactly one match (10 at location 2 and 11 at location 1), so there is only one match percentage: 28.57% (2 of 7 elements).
John BG
John BG, 12 Mar 2017
Hi Aditya,
the following solves this question and also your other question, which was closed as a duplicate (not by me).
clear all; clc; close all;
x = [1 1 1];                           % pattern
y = randi([0 1], 1, 20)                % random binary signal
% y = [1 1 1 0 1 1 0 1]
maskx = 1:3;                           % indices of x to search for
p = x(maskx);
r = conv(fliplr(p), y)                 % flipped pattern: conv acts as a correlation
n = find(r == max(r))
if max(r) == sum(p)                    % only sync on a full peak (zeros in p are not checked)
    sync_position = n                  % sync positions in the correlation
    sync_start = sort(n - numel(p) + 1)     % sync start indices in y
    percentage_match = numel(p) / numel(x) * 100
else
    disp('no match')
end
y
Please let me know whether this answer solves your question; script_10.m was sent by email.
regards
John BG
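The peak test in the script above only checks that the ones of the pattern line up with ones in y, so it cannot tell a 0 in the pattern from a 1 in y. A common alternative (not part of the answer above) is to map 0/1 to -1/+1 before correlating: the flipped-pattern convolution then reaches numel(p) only where every element matches, because partial overlaps sum over fewer than numel(p) terms. A minimal sketch assuming binary data; pb, yb and starts are illustrative names.

```matlab
% Exact-match search via bipolar correlation (assumes 0/1 data)
p  = [1 1 0];                  % pattern
y  = [1 1 1 0 1 1 0 1];        % signal
pb = 2 * p - 1;                % 0 -> -1, 1 -> +1
yb = 2 * y - 1;
m  = numel(p);
r  = conv(fliplr(pb), yb);     % r(n) = sum_j pb(j) * yb(n - m + j)
n  = find(r == m);             % only a full match reaches m
starts = n - m + 1;            % start indices of the matches in y
```

Here starts is [2 5], the same positions STRFIND reports for [1 1 0] in this y.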
2 comments
John BG, 14 Mar 2017
Aditya,
you are right: with a for loop some chains go missing.
It's even easier without a for loop:
clear all
x2=[1 1 0 0];             % pattern
y2=[1 1 1 0 1 1 0 1]      % signal     
% y=randi([0 1],1,20)
maskx=[1:3]                % assign percentage here
                           % but only continuous bits, ok?
x=num2str(x2(maskx));
y=num2str(y2);
x(x==' ')=[];y(y==' ')=[];
n=strfind(y,x)             % find pattern x in signal y
Please confirm
Regards
John BG
Image Analyst, 19 Mar 2017
I'm not sure what the question is because, after reading most of the replies, it seems that Aditya has been changing it (specifically the sample data and the size of the sample data), but one measure of similarity is the Sørensen–Dice coefficient.
When it's applied to binary images, I believe it requires the images to be of the same size.
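For two binary vectors of the same length, the Sørensen–Dice coefficient is 2|X∩Y| / (|X| + |Y|), where |X| is the number of ones in x. Unlike the other answers, it compares elements position by position, without shifting. A minimal sketch with the data from the question (diceCoef is an illustrative name):

```matlab
% Sørensen–Dice coefficient of two equal-length binary vectors
x = logical([1 0 1 0]);
y = logical([1 1 1 0]);
assert(numel(x) == numel(y), 'Vectors must have the same length');
overlap  = nnz(x & y);                        % elements that are 1 in both
diceCoef = 2 * overlap / (nnz(x) + nnz(y));   % NaN if both are all zeros
fprintf('Dice = %g\n', diceCoef);
```

For x = [1 0 1 0] and y = [1 1 1 0] this gives 2*2/(2+3) = 0.8.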