Ignore Deletions with Edit Distances (String Editing)

Marcel Dorer on 22 Apr 2016
Closed: MATLAB Answer Bot on 20 Aug 2021
Hi, I'm trying to compare two strings with a function based on Miguel Castro's EditDist.m. The function works well, but in my case I need to ignore some of the deletions, namely all those at the beginning and at the end of the string.
For example, when I compare the two strings 'XXXXMatlabXXXX' and 'YYMatlabYY', the first two 'X' and the last two 'X', which would be deletions, shouldn't count towards the edit distance (which should be 4 in this case). Basically, one of the two strings is surrounded by a random number of random characters that should be ignored. Deletions after the first insertion, replacement, or matching character should be counted normally, at least until only a tail of deletions is left.
Help would be really appreciated!
Here is the relevant part of the function I'm using:
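for i = 1:n1
    D(i+1,1) = D(i,1) + DelCost;   % delete s1(1:i) to reach the empty string
end
for j = 1:n2
    D(1,j+1) = D(1,j) + InsCost;   % insert s2(1:j) starting from the empty string
end
for i = 1:n1
    for j = 1:n2
        if s1(i) == s2(j)
            Repl = 0;
        else
            Repl = ReplCost;
        end
        % Stepping from D(i,j+1) consumes s1(i), which is a deletion;
        % stepping from D(i+1,j) produces s2(j), which is an insertion.
        D(i+1,j+1) = min([D(i,j)+Repl, D(i,j+1)+DelCost, D(i+1,j)+InsCost]);
    end
end
d = D(n1+1,n2+1);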
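
One way the fragment above might be adapted to the stated requirement is to make leading and trailing deletions free inside the table itself. The following is only a sketch of that idea, not part of EditDist.m: the function name editDistFreeEnds and the unit costs are assumptions made for illustration, and s1 is assumed to be the string that carries the surrounding characters.

```matlab
function d = editDistFreeEnds(s1, s2)
% Hypothetical variant: deletions at the start and end of s1 cost nothing.
% Unit costs are an assumption; s1 is assumed to be the "wrapped" string.
DelCost = 1; InsCost = 1; ReplCost = 1;
n1 = length(s1);
n2 = length(s2);
% D(i+1,j+1) is the cost of turning some suffix of s1(1:i) into s2(1:j).
% The first column stays 0, so leading characters of s1 can be dropped for free.
D = zeros(n1+1, n2+1);
for j = 1:n2
    D(1,j+1) = D(1,j) + InsCost;   % insertions are still charged
end
for i = 1:n1
    for j = 1:n2
        if s1(i) == s2(j)
            Repl = 0;
        else
            Repl = ReplCost;
        end
        D(i+1,j+1) = min([D(i,j)+Repl, D(i,j+1)+DelCost, D(i+1,j)+InsCost]);
    end
end
% Taking the smallest entry of the last column lets s2 end anywhere in s1,
% which makes the trailing deletions of s1 free as well.
d = min(D(:, n2+1));
end
```

With these assumptions, editDistFreeEnds('XXXXMatlabXXXX', 'YYMatlabYY') returns 4, because only the four X-to-Y replacements are charged. Deletions between the first and last aligned characters are still counted normally.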

Answers (1)

Arnab Sen on 26 Apr 2016
Edited: Arnab Sen on 27 Apr 2016
Hello Marcel,
I am assuming that between two strings s1 and s2, s1 is known to be the one which is wrapped with some redundant characters.
Now, let's dig into what D(i,j) means in the script. It is the cost of converting the prefix s1(1:i) into the prefix s2(1:j), and vice versa when the insertion and deletion costs are equal. (In the code this value is stored at D(i+1,j+1), because row 1 and column 1 stand for the empty prefix.) Now, let's assume that all characters of s1 after the kth index are redundant. Then
D(n1,n2) = D(k,n2) + (n1-k)*DelCost.
So the task is simple: we need to find the value of k. The following code snippet should do that:
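i = n1;
% The table has an extra first row and column, so the cost for s1(1:i) sits in row i+1.
while i > 0 && D(i+1,n2+1) - D(i,n2+1) == DelCost
    i = i - 1;
end
k = i;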
So, the last (n1-k) chars are redundant in s1.
Now we need to find the redundant characters at the front of s1. For this we can create another table (say X), where
X(i,j) = the cost of converting the suffix s1(i:n1) into the suffix s2(j:n2), and adopt a similar approach.
A simpler approach is to reverse both strings (call them s1r and s2r), call the edit distance again, and repeat the same workflow. The redundant characters at the end of s1r are the redundant characters at the front of the original string s1.
Finally, subtract DelCost*(total number of redundant characters in s1) from the original output.
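
The steps above can be put together roughly as follows. This is a minimal sketch under stated assumptions rather than a tested drop-in replacement: the names editDistTrimmed, editTable, and countTrailingDeletions are hypothetical, all costs are assumed to be 1, and the clamp on nHead is an extra safeguard that is not part of the description above.

```matlab
function d = editDistTrimmed(s1, s2)
% Hypothetical driver for the workflow: s1 is assumed to be the wrapped string.
DelCost = 1; InsCost = 1; ReplCost = 1;        % assumed unit costs
D  = editTable(s1, s2, DelCost, InsCost, ReplCost);
Dr = editTable(s1(end:-1:1), s2(end:-1:1), DelCost, InsCost, ReplCost);
nTail = countTrailingDeletions(D, DelCost);    % redundant characters at the end of s1
nHead = countTrailingDeletions(Dr, DelCost);   % redundant characters at the front of s1
% Added safeguard: never discount the same character of s1 twice.
nHead = min(nHead, length(s1) - nTail);
d = D(end, end) - DelCost * (nHead + nTail);
end

function n = countTrailingDeletions(D, DelCost)
% Walk up the last column while each step costs exactly one deletion.
n1 = size(D, 1) - 1;
i = n1;
while i > 0 && D(i+1, end) - D(i, end) == DelCost
    i = i - 1;
end
n = n1 - i;                                    % this is n1 - k
end

function D = editTable(s1, s2, DelCost, InsCost, ReplCost)
% Full edit-distance table; D(i+1,j+1) is the cost of s1(1:i) -> s2(1:j).
n1 = length(s1);
n2 = length(s2);
D = zeros(n1+1, n2+1);
for i = 1:n1
    D(i+1,1) = D(i,1) + DelCost;
end
for j = 1:n2
    D(1,j+1) = D(1,j) + InsCost;
end
for i = 1:n1
    for j = 1:n2
        if s1(i) == s2(j)
            Repl = 0;
        else
            Repl = ReplCost;
        end
        D(i+1,j+1) = min([D(i,j)+Repl, D(i,j+1)+DelCost, D(i+1,j)+InsCost]);
    end
end
end
```

For the strings from the question, editDistTrimmed('XXXXMatlabXXXX', 'YYMatlabYY') starts from a full distance of 8, finds two redundant characters at each end, and returns 4. The equality test against DelCost assumes integer costs; with fractional costs a tolerance would be safer.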
2 Comments
Marcel Dorer on 26 Apr 2016
Thanks a lot for the answer, it was pretty helpful and I understand the principle. There is only 1 thing I fail to understand:
{
i--;
}
I'm no matlab expert and I have to admit that I've never seen an expression like that. If I try to use that part in matlab a bracket error occurs. I'd really appreciate if you could explain this a little more!
Arnab Sen on 26 Apr 2016
Edited: Arnab Sen on 26 Apr 2016
Hi,
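You are correct. MATLAB does not recognize i--; that operator is common in languages like C, C++, and Java. MATLAB also does not use curly braces to delimit a loop body, which is what triggers the bracket error; a while loop is closed with end instead. Please read the loop body as
i = i - 1;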
I have edited the original answer as well accordingly. Thanks for pointing this out.
Please accept the answer if this helps.
