ismember for table rows gives error for NaN and string

Nghi Truong on 30 Jan 2019
Edited: Guillaume on 30 Jan 2019
I am trying to check whether a row from one table can be found among the rows of another table.
The row I want to find may have more or fewer columns than the table I search in. To handle this, I build a string array of compatible indices and take the corresponding subset of both the row and the table. I was unable to find a better way to deal with this issue.
One condition I need is that if the row has more columns, the table should be padded with "NaN" or otherwise empty columns, so that ismember always returns 0. Since I do not know the type of the missing table column, I cannot use something like "zero". I thought NaN, as missing data (literally "not a number"), would make sense. But, as it turns out, it doesn't work either.
The issue is that the behavior of ismember depends on whether the table variable is numeric or string. If it is numeric, it works with NaN. If it is a string, it fails with the error
"Error using tabular/ismember (line 37)
Unable to merge the 'c' variables in A and B.
Caused by:
Error using union (line 110)
Second argument must be a string array, character vector, or cell array of character vectors."
If ismember is implemented for tables, I think it should work whether the table contains strings or numbers. In the end, isn't that the use case for tables? Otherwise, I wonder what sort of default element I could use to represent missing data in a table, given that NaN is not implemented for strings.
Is there a more generic NaN?
Here is a minimal example.
Note how this works correctly:
% This builds a 1-row table, and a 2-row table with fewer columns
clear row row2 secondTable
row.a=1;
row.b="test";
row.c=3;
row=struct2table(row);
row2.a=1;
row2.b="test";
%row2.c=3 % - condition for ismember=1
secondTable=struct2table(row2);
secondTable(2,:)=cell2table({2,"hello"});
% Since c does not exist, replace it with NaN
secondTable.c=NaN(height(secondTable),1);
% This will not find a match
[tf,idx]=ismember(row,secondTable,'rows')   % tf instead of exist, to avoid shadowing the built-in exist
However, the following throws an error, because c is now a string variable:
% This builds a 1-row table, and a 2-row table with fewer columns
clear row row2 secondTable
row.a=1;
row.b="test";
row.c="A string";
row=struct2table(row);
row2.a=1;
row2.b="test";
%row2.c="A string"; % - condition for ismember=1
secondTable=struct2table(row2);
secondTable(2,:)=cell2table({2,"hello"});
% Since c does not exist, replace it with NaN
secondTable.c=NaN(height(secondTable),1);
% Error
[tf,idx]=ismember(row,secondTable,'rows')   % tf instead of exist, to avoid shadowing the built-in exist
3 Comments
Nghi Truong on 30 Jan 2019
intersect seems to use the same backend and errors out the same way as ismember
Bob Thompson on 30 Jan 2019
Alright, just curious.
You could try doing an if check for the class of your element, and then implement a string 'NaN' instead of a double NaN.
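Bob's suggestion can be sketched as follows. This is a minimal sketch that rebuilds the question's failing case with the table constructor (instead of struct2table) and only distinguishes string from numeric variables; other types such as cell arrays of char vectors, datetime or categorical would need further branches.

```matlab
% Reconstruction of the question's failing case: c is a string in row,
% and secondTable has no c variable yet.
row = table(1, "test", "A string", 'VariableNames', {'a', 'b', 'c'});
secondTable = table([1; 2], ["test"; "hello"], 'VariableNames', {'a', 'b'});

n = height(secondTable);
if isstring(row.c)
    % String variable: pad with the text "NaN" so both c variables are strings
    secondTable.c = repmat("NaN", n, 1);
else
    % Numeric variable: pad with double NaN, as in the question
    secondTable.c = NaN(n, 1);
end

% No type clash any more; ismember now runs and finds no match here
tf = ismember(row, secondTable)
```

One caveat of this design: the padding is ordinary text, so a row whose c variable genuinely contains "NaN" would be reported as found.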


Accepted Answer

Guillaume on 30 Jan 2019
Edited: Guillaume on 30 Jan 2019
"One condition I need is that if the row has more columns, I want the table to be amended by "NaN" or otherwise empty columns, such that ismember always shows 0"
In that case, it is simpler to compare the number of columns and not bother calling ismember at all. In fact, ismember requires both tables to have exactly the same variable names, so you could just compare both sets of variable names:
if ~isempty(setxor(table1.Properties.VariableNames, table2.Properties.VariableNames))
    % variable names don't match: no row of table1 can be in table2
    result = false(height(table1), 1);
else
    result = ismember(table1, table2);
end
Or you could just call ismember and trap the error that is raised when the number of variables or their names don't match:
try
    result = ismember(table1, table2);
catch
    % ismember failed (e.g. mismatched variables): treat every row as not found
    result = false(height(table1), 1);
end
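The two snippets above treat any variable mismatch as "not found". If the row may also have fewer variables than the table (as in the question), one option is to compare only the row's own variables. A minimal sketch, assuming a single-row row table; the name rowInTable is hypothetical, not a MATLAB function:

```matlab
% Hypothetical helper for a single-row table `row`:
% - if row has a variable that T lacks, the result is false
%   (&& short-circuits, so T(:, ...) is never indexed with an unknown name)
% - otherwise T is reduced to row's variables, in row's order, before ismember
rowInTable = @(row, T) ...
    all(ismember(row.Properties.VariableNames, T.Properties.VariableNames)) ...
    && ismember(row, T(:, row.Properties.VariableNames));

secondTable = table([1; 2], ["test"; "hello"], [3; 4], ...
    'VariableNames', {'a', 'b', 'c'});

rowFewer = table(1, "test", 'VariableNames', {'a', 'b'});
rowMore  = table(1, "test", 3, "extra", 'VariableNames', {'a', 'b', 'c', 'd'});

tfFewer = rowInTable(rowFewer, secondTable)   % a=1, b="test" is row 1 of secondTable
tfMore  = rowInTable(rowMore, secondTable)    % secondTable has no variable d
```

Because && needs scalar operands, this only works for one row at a time; for multi-row inputs the same logic would be written with an if/else as in the first snippet.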
"I wonder what sort of default element I could use to set up missing data in a table, given that NaN is not implemented for strings"
NaN is a numeric value. It can be used to indicate missing numbers but it does not make any sense for strings. If a variable is a string type, filling it with NaN is a very bad idea as the variable is then a mix of strings and numbers.
MATLAB has a specific missing indicator for strings (<missing>), as long as you really mean string and not char array. For char arrays, the only thing you can use is an empty char array '', which unfortunately is indistinguishable from a genuinely empty char array.
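The difference can be checked with ismissing. A minimal sketch; it uses a cell array of char vectors rather than a plain char array, on the assumption that text table variables are often stored that way, and for that type '' is the standard missing value:

```matlab
% String array: only <missing> counts as missing, the empty string "" does not
s = ["abc"; string(missing); ""];
tfString = ismissing(s)   % [false; true; false]

% Cell array of char vectors: the empty char vector '' counts as missing,
% so it cannot be told apart from a value that is legitimately empty
c = {'abc'; ''};
tfCell = ismissing(c)     % [false; true]
```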
For the types that support it you can use the missing function to generate an array of missing values. This gets converted to the proper missing indicator (NaN, NaT, <undefined>, <missing>) depending on the type.
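Applied to the question, a minimal sketch: build the padding column with the same class as row.c, then overwrite it with missing. This assumes row.c has a type that supports missing (e.g. double, string, datetime, categorical); it is not intended for cell arrays of char vectors.

```matlab
% Reconstruction of the question's failing case (c is a string in row)
row = table(1, "test", "A string", 'VariableNames', {'a', 'b', 'c'});
secondTable = table([1; 2], ["test"; "hello"], 'VariableNames', {'a', 'b'});

% Same class as row.c, one element per row of secondTable
col = repmat(row.c, height(secondTable), 1);
% Assigning missing converts to the type's own indicator:
% NaN for double, <missing> for string, NaT for datetime, <undefined> for categorical
col(:) = missing;
secondTable.c = col;

% Both c variables are now strings, so there is no type clash;
% the padded c values do not equal "A string", so no row matches
tf = ismember(row, secondTable)
```

The same lines work unchanged when row.c is numeric, which avoids the per-type if/else that a hard-coded placeholder would need.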
