Matching based on the first word

Danielle Leblance on 10 Nov 2017
Edited: Jan on 10 Nov 2017
Hi,
I have two cell arrays, A and B, that each contain company names. For example, A contains "Biotech Capital Corp" while B contains the same company listed differently, as "BIOTECH CAP CORP". Can I match A and B based on the first word of the text? That is, can I write code that reports that "Biotech Capital Corp" matches "BIOTECH CAP CORP" because the first word of both strings is "biotech", ignoring case?

Accepted Answer

Jan on 10 Nov 2017
Edited: Jan on 10 Nov 2017
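A_list = {'Biotech Capital Corp', 'Apple something', 'tesla something else'};
B_list = {'Volvo Car AB', 'BIOTECH CAP CORP', 'TESLA by Elon Musk'};
% First word of each name (strtok splits at the first space)
A1 = strtok(A_list, ' ');
B1 = strtok(B_list, ' ');
% Case-insensitive membership test on the first words
[LiA, LocB] = ismember(lower(A1), lower(B1)); % Or the other way around
% Alternative: CStrAinBP is not a built-in MATLAB function (it appears to come
% from the File Exchange); the 'i' flag presumably makes the comparison ignore case
[AInd, BInd] = CStrAinBP(strtok(A_list), strtok(B_list), 'i')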
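
The outputs of ismember can be turned into a readable list of matched pairs. The following is only a minimal sketch using the same sample lists; the names idxA and idxB are illustrative and not part of the answer above. Note that ismember reports a single position in B for each item of A, so duplicate first words in B_list would not all be listed.

```matlab
A_list = {'Biotech Capital Corp', 'Apple something', 'tesla something else'};
B_list = {'Volvo Car AB', 'BIOTECH CAP CORP', 'TESLA by Elon Musk'};

% First word of each name, lower-cased for a case-insensitive comparison
A1 = lower(strtok(A_list));
B1 = lower(strtok(B_list));

% LiA(k) is true when A1{k} occurs in B1; LocB(k) is its index in B1 (0 if none)
[LiA, LocB] = ismember(A1, B1);

idxA = find(LiA);   % positions in A_list that have a match (illustrative name)
idxB = LocB(LiA);   % corresponding positions in B_list (illustrative name)

% Print each matched pair side by side
for k = 1:numel(idxA)
    fprintf('%s  <->  %s\n', A_list{idxA(k)}, B_list{idxB(k)});
end
```

With the sample data, the first and third items of A_list are paired with the second and third items of B_list.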

More Answers (2)

per isakson on 10 Nov 2017
Edited: per isakson on 10 Nov 2017
Yes, try this
match = cssm()
match =
0 1 0
0 0 0
0 0 1
The A_list items are on the rows and the B_list items on the columns, i.e.
  • the first item in A_list matches the second item in B_list.
  • the third item in A_list matches the third item in B_list.
where
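function match = cssm()
    % match(j,k) is true when B_list{k} starts with the first word of A_list{j}
    A_list = {'Biotech Capital Corp', 'Apple something', 'tesla something else'};
    B_list = {'Volvo Car AB', 'BIOTECH CAP CORP', 'TESLA by Elon Musk'};
    match = false( length( A_list ), length( B_list ) );
    for jj = 1 : length( A_list )
        % First word of the current A_list item
        a1  = regexp( A_list{jj}, '\<\w+\>', 'once', 'match' );
        % Whole-word pattern for that word
        xpr = sprintf( '\\<%s\\>', a1 );
        % Start positions of every case-insensitive occurrence in each B_list item
        cac = regexpi( B_list, xpr );
        % Count a match only if the first occurrence starts at position 1;
        % pos(1) keeps the test scalar when the word occurs more than once
        match( jj, : ) = cellfun( @(pos) not(isempty(pos)) && pos(1)==1, cac );
    end
end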

KSSV on 10 Nov 2017
Read about strcmpi. It compares strings while ignoring case.
  1 Comment
Danielle Leblance on 10 Nov 2017
I don't think it will work, since I am not comparing two complete strings. Rather, I would like to compare the first word of each string.
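
For what it's worth, strcmpi can still be used once the first words have been extracted. The following is a minimal sketch, assuming the sample lists from the other answers; it combines strtok with strcmpi and is not something proposed in the thread itself.

```matlab
A_list = {'Biotech Capital Corp', 'Apple something', 'tesla something else'};
B_list = {'Volvo Car AB', 'BIOTECH CAP CORP', 'TESLA by Elon Musk'};

% Keep only the first word of every name
A1 = strtok(A_list);
B1 = strtok(B_list);

% match(j,k) is true when the first words of A_list{j} and B_list{k} agree,
% ignoring case; strcmpi compares one char vector against every cell of B1
match = false(numel(A1), numel(B1));
for j = 1:numel(A1)
    match(j, :) = strcmpi(A1{j}, B1);
end

disp(match)
```

Unlike ismember, this keeps every matching pair, so repeated first words in either list show up as several true entries in the same row or column.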
