MATLAB Answers

Transform NaN into number

Hi everyone,
I have data that is organized in structures which look like this:
rating.pretest.s1
rating.pretest.s2
...
and in those structures on the last level (s1, s2...) I have numbers or NaN where a participant failed to push a button in time. Since this counts as a wrong answer I would like to assign all the NaNs the number 1. What I tried is
rating(isnan(rating)) = 1
but I got the error message: Undefined function "isnan" for input arguments of type "struct". I have read some answers to similar questions, but none of them seemed applicable since I'm using a structure (or they were too complicated for me to understand, since I'm still a beginner). Is there an alternative I could use? Thank you in advance!

  2 Comments

Walter Roberson on 23 May 2019
Is rating a scalar or non-scalar structure? Is rating.pretest a scalar or non-scalar structure?
For example is there a possibility of a rating(3).pretest(7).s1 ?
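If you are unsure, a quick check along these lines may help. This is only a sketch: the sample values and the nonScalar variable are made up for illustration and are not taken from your data.

```matlab
% Hypothetical sample data mirroring the layout described in the question.
rating.pretest.s1 = [4; 1; NaN; 3];
rating.pretest.s2 = [5; 2; 0; NaN];

% A scalar structure has exactly one element (size 1-by-1).
ratingIsScalar  = isscalar(rating);
pretestIsScalar = isscalar(rating.pretest);

% For contrast, a non-scalar structure array: two elements, same field.
nonScalar(1).s1 = [1; 2];
nonScalar(2).s1 = [3; NaN];
nonScalarSize = size(nonScalar);
```

If both isscalar calls return true, there is only one rating and one pretest to process, so no indexing such as rating(3).pretest(7) is needed.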
Rose Potter on 23 May 2019
I think it's not but I'm not quite familiar with that term. It looks like this: rating is a 1x1 structure that contains two 1x1 structures, pretest and posttest. Each of those contains several fields (s1, s2, ...) which contain 56 numbers each

Accepted Answer

Steven Lord on 23 May 2019
Edited: Steven Lord on 23 May 2019
For this application, I'd use fillmissing. It does the same job as logical indexing with the output of isnan, but I find it makes the intent of the code clearer to the reader. In the example below I assign the output of fillmissing to a different struct so you can compare "before" and "after", but you could assign the output back to the struct you processed if you prefer. Let's define some sample data:
rating.pretest.s1 = [4;1;NaN;3];
rating.pretest.s2 = [5;2;0;NaN];
Now let's fill in the missing values with the constant 1. I used 'UniformOutput', false to return the output as a struct array with the same fields as the input.
fillingfunction = @(x) fillmissing(x, 'constant', 1);
rating2.pretest = structfun(fillingfunction, rating.pretest, ...
'UniformOutput', false);
Compare the before and after.
rating.pretest.s1
rating2.pretest.s1
rating.pretest.s2
rating2.pretest.s2
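The question also mentions a posttest structure alongside pretest. A minimal sketch of how the same approach might be extended to both, assuming every top-level field of rating is a scalar struct of numeric vectors; the posttest values and the variable name phases are invented for illustration:

```matlab
% Hypothetical sample data; the posttest values are made up.
rating.pretest.s1  = [4; 1; NaN; 3];
rating.pretest.s2  = [5; 2; 0; NaN];
rating.posttest.s1 = [NaN; 2; 3; 4];
rating.posttest.s2 = [1; NaN; NaN; 5];

fillingfunction = @(x) fillmissing(x, 'constant', 1);

% Loop over the top-level fields (pretest, posttest) and fill each one.
rating2 = rating;
phases = fieldnames(rating);
for k = 1:numel(phases)
    rating2.(phases{k}) = structfun(fillingfunction, rating.(phases{k}), ...
        'UniformOutput', false);
end
```

The dynamic field reference rating.(phases{k}) avoids writing one structfun call per top-level field.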
FYI you might want to use table arrays to store your data.
pretestScores = table(rating.pretest.s1, rating.pretest.s2, ...
'VariableNames', {'Test1', 'Test2'}, ...
'RowNames', {'Alice', 'Bob', 'Charlie', 'Doug'})
posttestScores = table(rating.pretest.s1+2, rating.pretest.s2+3, ...
'VariableNames', {'Test1', 'Test2'}, ...
'RowNames', {'Alice', 'Bob', 'Charlie', 'Doug'})
You can even store one or more table arrays inside another table!
allScores = table(pretestScores, posttestScores)
allScores.Properties.RowNames = pretestScores.Properties.RowNames
Though if you were to build that allScores table you probably wouldn't want to put the student names as the RowNames of the inner table arrays as well, as that looks kind of funny. You can trim them if you want.
allScores.pretestScores.Properties.RowNames = {}
allScores.posttestScores.Properties.RowNames = {}

  3 Comments

Rose Potter on 23 May 2019
Thank you so much, this actually works! What would be the benefit of a table as opposed to my structure? I do see that it looks odd and I haven't seen others use it (probably for a reason), but that reason is not clear to me. Also, I would still need to create those structures before I can put them into a table, right?
Steven Lord on 23 May 2019
One nice feature of table arrays is the ability to use names to index into them rather than numbers. The following works to check Alice's score on test 1 regardless of the order of the rows. I don't need to keep a separate list of student names, check which entry in rating.pretest.s1 is Alice's, and retrieve it.
pretestScores{'Alice', 'Test1'}
As for building the struct before building the table, not necessarily. I built them because I believe someone else did. [Assuming you're using a sufficiently new release the VariableNames and RowNames can be either cell arrays containing char arrays or string arrays.]
scores1 = [4; 1; NaN; 3];
scores2 = [5; 2; 0; NaN];
studentnames = ["Alice"; "Bob"; "Charlie"; "Doug"];
pretestScores = table(scores1, scores2, ...
'VariableNames', {'Test1', 'Test2'}, ...
'RowNames', studentnames)
If you do this, filling in the NaN values is even easier. The fillmissing function can accept a table array as its first input.
pretestScoresFilled = fillmissing(pretestScores, 'constant', 1)
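To confirm that the fill did what was intended, something like the following could be used. It is a sketch that repeats the sample data above so it runs on its own; the variable names missingBefore, missingAfter and charlieTest1 are made up, and the row names are given as a cell array of char vectors so that it does not depend on string support for RowNames.

```matlab
scores1 = [4; 1; NaN; 3];
scores2 = [5; 2; 0; NaN];
studentnames = {'Alice'; 'Bob'; 'Charlie'; 'Doug'};
pretestScores = table(scores1, scores2, ...
    'VariableNames', {'Test1', 'Test2'}, ...
    'RowNames', studentnames);
pretestScoresFilled = fillmissing(pretestScores, 'constant', 1);

% ismissing returns a logical array with one entry per table element.
missingBefore = nnz(ismissing(pretestScores));
missingAfter  = nnz(ismissing(pretestScoresFilled));

% Name-based lookup of a value that was NaN before filling.
charlieTest1 = pretestScoresFilled{'Charlie', 'Test1'};
```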
Rose Potter on 23 May 2019
That was very helpful and well explained, thanks a lot for taking the time!

More Answers (2)

Jos (10584) on 23 May 2019
This function recursively looks at all fields of the structure and replaces any NaNs by a value. Also works for structure arrays:
function S = structNaN2num(S, value)
% Replace every NaN in the numeric fields of S (nested to any depth) with value.
if isstruct(S)
    % recursively look at all fields of each element of structure array S
    for k = 1:numel(S)
        S(k) = structfun(@(x) structNaN2num(x, value), S(k), 'UniformOutput', false);
    end
elseif isnumeric(S)
    S(isnan(S)) = value; % replace NaNs with a value
end
end
Use it like this:
clear a
a.x1 = 1 ; a.x2 = NaN ; a.x3.y1 = NaN ; a.x3.y2 = [1 2 NaN]
a(2) = a(1) % structure array
a(2).x3.y2 = [NaN 2 3]
b = structNaN2num(a, 999)

  1 Comment

Rose Potter on 23 May 2019
Thanks a lot, this works perfectly!


Stephen Cobeldick on 23 May 2019
Edited: Stephen Cobeldick on 23 May 2019
Your original description failed to mention several things, including the size of the numeric vector and also that the numeric vector is (pointlessly) nested inside a scalar cell array. But once we know the exact data structure, it is very easy to loop over those fields and change the NaN values to whatever you want:
>> load('rating.pre.67.mat')
>> fld = fieldnames(rating.pre)
fld =
's63'
's64'
's65'
's66'
's67'
>> rating.pre.s64{1}
ans =
5
NaN
5
5
5
... more lines here
4
5
5
>> for k = 1:numel(fld), idx = isnan(rating.pre.(fld{k}){1}); rating.pre.(fld{k}){1}(idx) = 1; end
>> rating.pre.s64{1}
ans =
5
1
5
5
5
5
5
... more lines here
4
5
5
>>
Your data structure is far too complex, e.g.:
  • those scalar cell arrays appear to be entirely superfluous,
  • forcing the meta-data into fieldnames makes your code slower and more complex,
  • nested structures are not very convenient to work with.
I would recommend looking at using much simpler and more efficient data organisation, e.g. with a simple non-scalar structure, or a table.

  2 Comments

Rose Potter on 23 May 2019
This works as well, thanks a lot for the suggestion!
By forcing meta-data into field names, do you mean the fact that I have the participant numbers in the structure? Unfortunately I didn't know how else to do that. I'd like to have an array, but I thought I wouldn't be able to see the participant number anywhere then. Could you tell me why nested structures are not good? Or maybe you know what I could look up in the MATLAB documentation in order to learn how to create these variables in a better way?
Stephen Cobeldick on 23 May 2019
"By forcing meta-data into field names you mean the fact that I have the participant numbers in the structure?"
Yes: data and code are two separate paradigms that are best kept separated. Using (meta-)data for fieldnames makes your code fragile because it is susceptible to any external changes in those IDs (e.g. consider what your code would do if the ID format changes to include characters that are invalid for fieldnames). Read this for a detailed explanation:
"Unfortunately I didn't know how else to do that, I'd like to have an array but I thought I won't be able to see the participant number anywhere then."
You can easily store the ID in a cell array, or a non-scalar structure, e.g.:
S(1).data = [...];
S(1).ID = 's63';
S(2).data = [...];
S(2).ID = 's64';
...
If the ID is actually a numeric value (and you just added the 's' to make it a valid fieldname) then simply storing it in a numeric array (using indexing) would be by far the simplest and most efficient solution.
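As a rough sketch of how existing data could be moved into such a non-scalar structure: the sample values below are made up, and the layout assumes each field holds a plain numeric vector (no scalar cell array around it). The variable names ids and numericIDs are likewise only illustrative.

```matlab
% Hypothetical data in the layout from the question (IDs as field names).
rating.pre.s63 = [5; NaN; 5];
rating.pre.s64 = [4; 5; NaN];

% Cell-array inputs to struct create one element per cell.
ids = fieldnames(rating.pre);
S = struct('ID', ids, 'data', struct2cell(rating.pre));

% Replace NaN with 1 in the data of every element.
for k = 1:numel(S)
    S(k).data(isnan(S(k).data)) = 1;
end

% Numeric IDs, if the 's' prefix was only added to form a valid fieldname.
numericIDs = str2double(strrep(ids, 's', ''));
```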
Remember that you do not have to put all of the meta-data and test-data into the same array, it might be more convenient to use two (or more) arrays of classes that better suit the meta-data and the test-data (but of course using exactly the same indexing, so there is no ambiguity about how their elements correspond). For example:
data = [... all numeric data... ]
IDs = {... cell array of IDs ... }
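A small sketch of that arrangement, with made-up values and assuming every participant has the same number of trials so the data fit into one matrix (s64data is just an illustrative name):

```matlab
% Hypothetical data: one column per participant, one row per trial.
data = [5 4; NaN 5; 5 NaN];
IDs  = {'s63', 's64'};

% With a plain numeric array, the logical indexing from the question works.
data(isnan(data)) = 1;

% Look up one participant's column by ID.
s64data = data(:, strcmp(IDs, 's64'));
```

With this layout the original attempt, rating(isnan(rating)) = 1, works directly because isnan is applied to a numeric array rather than a struct.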
"Could you tell me why nested structures are not good?"
Simply because accessing their contents leads to quite bulky code which is often not easy to follow, as generally they require lots of loops to process. If the nested structures are not a good representation of how the data are actually related and arranged, then this makes the code much more complex than it needs to be (and thus slower, buggier, etc.).
They can often be replaced by simpler data arrangements using indexing (e.g. non-scalar structures or tables).
"Or maybe you know what I could look up in the Matlab documentation in order to learn how to create these variables in a better way?"
It comes with practice and lots of reading.
A general rule of thumb is to use the simplest data class that will hold your data.
