Confusion about performance improvement by memory preallocation

I have a complex task of extracting relevant measurement data from a stack of mat files (each has 237 variables). I have 30 mat files; files 9 to 18 are relevant (I cannot delete the others, let's just accept this here). In each relevant file, only a few data points shall be extracted (for each variable); their indices are given by a separate file (containing the measurement times).
The meas_times.mat looks like this, where each column corresponds to a file (9:18) and the rows contain start and stop times in alternating order.
0 0 0 0 0 0 0 0 315 0
0 0 0 0 0 0 0 0 352 0
0 0 0 0 56 0 26 0 373 0
0 0 0 0 115 0 45 0 394 0
0 0 0 0 141 121 65 515 476 1
0 0 0 0 200 180 104 574 511 8
0 0 0 0 471 201 132 609 837 44
0 0 0 0 530 260 191 611 860 95
0 0 0 0 610 721 443 664 881 109
0 0 0 0 669 780 502 720 910 154
0 461 0 0 671 796 521 737 961 174
0 520 0 0 730 821 580 796 990 187
0 591 0 171 1216 1072 711 1126 1071 580
0 650 0 230 1275 1105 770 1185 1130 610
0 981 21 276 1311 1197 773 1241 1171 667
0 1040 80 335 1336 1256 832 1300 1200 695
508 1061 121 721 1629 1376 1390 1324 1311 783
567 1120 142 780 1662 1435 1449 1383 1340 842
601 1721 151 816 1677 1451 1482 1411 1721 1151
660 1780 188 875 1736 1510 1541 1470 1780 1210
Example:
Of file 9 (1st column), I only want data from index 508-567 and 601-660 in each variable. For file 10, 21-80 etc. ...
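For illustration, the start/stop pairs for one column can be pulled out with nonzeros (a sketch, assuming the matrix from meas_times.mat is loaded as B, as in the scripts below):

```matlab
% Sketch: recover the start/stop index pairs for file 9 (1st column),
% assuming the matrix from meas_times.mat is loaded as the variable B.
v = nonzeros(B(:,1));   % [508; 567; 601; 660]
starts = v(1:2:end);    % 508, 601
stops  = v(2:2:end);    % 567, 660
```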
My first solution with consecutive for-loops was:
tic
%% initialize
clear
clc
load('meas_times.mat')
times = B;
%% define path
path = 'C:\Users\felix\Documents\HeatFlow\Messkampagne Itterbeck\Messdaten\';
list = dir(path);
%% create empty struct t to collect relevant data in
t = struct(load(sprintf('%s%s',path,list(9).name)));
fn = fieldnames(t);
for a = 1:numel(fn)
    t.(fn{a}) = [];
end
%% filter relevant data
for b = 9:18
    % extract files 9 to 18, b is their position in the folder
    Dateiname = sprintf('%s%s',path,list(b).name);
    s = struct(load(Dateiname)); % this is the b-th file as struct
    fn = fieldnames(s);
    if fn{1} == "Abtastrate_1_HzZeit_1_Hz_" % removes a duplicate variable that some files have
        s = rmfield(s,'Abtastrate_1_HzZeit_1_Hz_');
        fn = fn(2:end);
    end
    v = nonzeros(times(:,b-8)); % extract the measurement times that correspond to the b-th file
    for z = 1:115 % 1-Hz variables from row 1 to row 115
        temp = []; % empty dummy array
        for i = 1:2:length(v)
            temp = vertcat(temp,s.(fn{z})(v(i):v(i+1)));
        end
        t.(fn{z}) = vertcat(t.(fn{z}),temp); % write extracted values into the z-th variable of the new struct
    end
    for z = 116:154 % 2-Hz variables up to row 154
        % preprocessing: mean of 2 Hz
        for i = 1:2:numel(s.(fn{z}))-1
            s.(fn{z})(i:i+1) = mean(s.(fn{z})(i:i+1));
        end
        s.(fn{z})(2:2:end) = [];
        temp = []; % empty dummy array
        for i = 1:2:length(v)
            temp = vertcat(temp,s.(fn{z})(v(i):v(i+1)));
        end
        t.(fn{z}) = vertcat(t.(fn{z}),temp); % write extracted values into the z-th variable of the new struct
    end
end
toc
However, this ran very slowly, because the last variables in each file contain a few million values (damn 1-kHz loggers). Matlab gave me the hint that the temp array was changing size with every loop iteration and that this would hurt performance. So I completely rewrote the whole script, introduced functions, and preallocated memory. However, now the whole operation takes three times as long. Here is the "improved" script. Have I made any obvious mistake?
tic
%% initialize
clear
clc
load('meas_times.mat')
times = B;
%% define path
path = 'C:\Users\felix\Documents\HeatFlow\Messkampagne Itterbeck\Messdaten\';
list = dir(path);
%% create empty struct t to collect relevant data in
t = struct(load(sprintf('%s%s',path,list(9).name)));
fn = fieldnames(t);
for a = 1:numel(fn)
    t.(fn{a}) = [];
end
%% filter relevant data
for b = 9:18
    % extract files 9 to 18, b is their position in the folder
    filename = sprintf('%s%s',path,list(b).name);
    s = struct(load(filename)); % this is the b-th file as struct
    fn = fieldnames(s);
    if fn{1} == "Abtastrate_1_HzZeit_1_Hz_" % removes a duplicate variable that some files have
        s = rmfield(s,'Abtastrate_1_HzZeit_1_Hz_');
        fn = fn(2:end);
    end
    v = nonzeros(times(:,b-8)); % extract the measurement times that correspond to the b-th file
    w = [[0,0]';diff(v)+1]; % important for indexing later on
    for z = 1:154 % 154 variables in the struct s
        switch z
            case num2cell(1:115)
                hz = 1;
                t.(fn{z}) = vertcat(t.(fn{z}),selectdata(hz,z,v,w,s,fn));
            case num2cell(116:154)
                hz = 2;
                t.(fn{z}) = vertcat(t.(fn{z}),selectdata(hz,z,v,w,s,fn));
        end
    end
end
toc

function temp = selectdata(hz,z,v,w,s,fn)
    % preprocessing
    for i = 1:numel(s.(fn{z}))/hz
        s.(fn{z})(i:i+(hz-1)) = mean(s.(fn{z})(i:i+(hz-1)));
        s.(fn{z})(i+1:i+(hz-1)) = [];
    end
    % empty dummy array
    temp = zeros(sum(w(1:2:end)),1);
    % extract data into temp
    for i = 1:2:length(v)
        temp(sum(w(i:-2:1))+1:sum(w(i+2:-2:1)),1) = s.(fn{z})(v(i):v(i+1));
    end
end
Feel free to ask questions. I do not want help on my project; I want to understand why the code got SLOWER when I followed Matlab's instructions and changed the code so that it preallocated memory.
The 1st code took 9.7 seconds, the 2nd code approximately 31 seconds.

9 comments

Not sure, but your code doesn't look clean: a lot of hard-coded indexing, bad indentation.
But instead of
temp = [];
for i = 1:2:length(v)
    temp = vertcat(temp,s.(fn{z})(v(i):v(i+1)));
end
You could do (and put it in a function)
start = v(1:2:end);
stop = v(2:2:end);
idx = arrayfun(@(start,stop) start:stop, start, stop, 'unif', 0);
idx = cat(2, idx{:});
temp = s.(fn{z})(idx);
Same thing for the outer loop on files: instead of growing with vertcat in a loop, store the data in an array of structs and extract the data for each struct. Then, after all that is done, concatenate the fields of the struct array to form a single struct.
Hello Bruno,
thank you for your reply. I am going to check that today. I didn't know arrayfun; it sounds like a great solution. In MATLAB, you learn new things every day, haha.
Since I am eager to learn: what do you mean by "bad indentation"? It is just the standard indentation, isn't it?
For the hard-coded indexing: the script really only needs to be applied in this one case, so a more general solution would make it unnecessarily big. However, it would look cleaner for sure, so I might make a general backup with parameterized indexing for similar applications in the future; who knows what's coming up next. Thanks!
Be careful with arrayfun and cellfun. They hide a loop, but they perform worse than a well-written loop.
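A well-written loop doing the same single-statement indexing might look like this (a sketch using the variables v, s, fn and z from the question; the segment lengths are computed up front so the index vector can be preallocated):

```matlab
% Build the full index vector once, with preallocation, then index
% the channel in a single statement instead of growing temp in a loop.
start = v(1:2:end);
stop  = v(2:2:end);
len   = stop - start + 1;        % length of each segment
idx   = zeros(sum(len), 1);      % preallocated index vector
pos   = 1;
for k = 1:numel(start)
    idx(pos:pos+len(k)-1) = start(k):stop(k);
    pos = pos + len(k);
end
temp = s.(fn{z})(idx);           % one indexing statement, no growing array
```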
Good morning Rik,
thanks for the advice. In that case, arrayfun may not be my solution and I need to improve the loop itself, since this is about performance.
Please note that each mat-file contains about 20-25 million values and I must loop through 10 of them. Right now, the code can get this done in approximately 30 minutes, but I would like it to be faster.
Wait, you don't seem to get why my method could improve things. The main point is NOT replacing the for-loop with arrayfun; the main point is replacing the loop of indexing and concatenating your data with a SINGLE indexing statement (the last statement). The concatenation loop is the costly part.
But other than that, Rik is right: arrayfun/cellfun is always slower than a for-loop. Do not apply it to a long array (which is not the case here). But its usage here is NOT my main point.
Your indentation is bad. Look at this: where is the indentation of the inner statements?
for b = 9:18
% extract files from 8th to 18th, b is their position in the folder
filename = sprintf('%s%s',path,list(b).name);
...
w = [[0,0]';diff(v)+1]; % important for indexing later on
I just wonder why you care about speed if it only has to be applied once. Just save the result in another file; the next time you need it, read the result.
Felix Schönig on 11 Dec 2020 (edited 11 Dec 2020)
Hello Bruno, you are right, the b-loop was not indented; that is fixed now. Thanks.
I must admit I don't understand the arguments of the arrayfun in your first post. I am trying to understand this now and will edit this answer later on. Thank you for your patience and help!
Edit: It's true what you say, and I already have the results. However, I want to get better and improve. It completely puzzled me how the code execution time got longer when I followed Matlab's preallocation advice (I have no experience with code performance optimization), so I want to understand what makes it so slow. Give me a little time and I will try to understand and implement your proposal.
Bruno Luong on 11 Dec 2020 (edited 11 Dec 2020)
What is costly is growing an array in a for-loop.
I believe deleting elements of an array is less costly (but still costly) if the array isn't shared.
Your second method has not reduced the number of growing/shrinking operations on arrays.
You still have the upper level, the file loop, to optimize (see my comment about the array of structs), but let me let you digest my suggestion on building temp first.
Your input and Jan's are what I needed. I have understood your use of arrayfun and the anonymous function handle (didn't know that before); it's beautiful! :-) I also reworked the outer loop to use the fields of a struct array. Is this what you had in mind?
path = 'somepath';
list = dir(path);
for b = 9:18 % bad hard-coded indexing, I know :>
    filename = sprintf('%s%s',path,list(b).name);
    tempS = load(filename);
    fn = fieldnames(tempS);
    if fn{1} == "Abtastrate_1_HzZeit_1_Hz_" % removes a duplicate variable that some files have
        tempS = rmfield(tempS,'Abtastrate_1_HzZeit_1_Hz_');
        fn = fn(2:end);
    end
    u(b-8) = tempS;
end
This gives me a 1x10 struct which I can now work on using indexing as you recommended. For your information:
This field concatenation alone still takes 4.97 seconds; it's really a big bunch of data. I will try to improve the speed with the indexing method in a function and report later.
Edit: I would mark your help via "Accept this answer", but it is only a comment. Because of this I will accept Jan's answer. Still, a big thank you to you! You really helped me.
Yes, that's what I had in mind. There might be something about the allocation of u you could improve, but it's not a big deal and it's a detail you can work on later.
Now, what you could do outside this loop on u to build a single group should go something like this (you need to look up MATLAB's comma-separated list syntax to understand the code):
% test data, replace with your u array of structure
u(1) = struct('a', [0;1], 'b', [2;3])
u(2) = struct('a', [4], 'b', [5;6;7])
f = fieldnames(u);
data = cellfun(@(f) vertcat(u.(f)), f, 'unif', 0);
sarg = [f,data].';
sall = struct(sarg{:})


Accepted Answer

Try to replace this part of the first solution:
for z = 1:115 % 1-Hz variables from row 1 to row 115
    temp = []; % empty dummy array
    for i = 1:2:length(v)
        temp = vertcat(temp,s.(fn{z})(v(i):v(i+1)));
    end
    t.(fn{z}) = vertcat(t.(fn{z}),temp); % write extracted values into the z-th variable of the new struct
end
by:
for z = 1:115 % 1-Hz variables from row 1 to row 115
    lenv = numel(v);
    tmpC = cell(lenv / 2, 1); % preallocated cell array, one entry per interval
    sfnz = s.(fn{z}); % cheap shared data copy instead of repeated indexing
    for k = 1:lenv / 2
        idx = 2 * k - 1;
        tmpC{k} = sfnz(v(idx):v(idx + 1));
    end
    t.(fn{z}) = vertcat(t.(fn{z}), tmpC{:}); % write extracted values into the z-th variable of the new struct
end
The iterative growing or shrinking of arrays is extremely expensive. So avoid things like this:
s.(fn{z})(i+1:i+(hz-1)) = [];
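For the 2-Hz preprocessing in particular, the pairwise mean can be computed in one vectorized step, with no in-loop deletion at all (a sketch, not part of the original answer; it assumes the channel is stored as a column vector):

```matlab
% Vectorized pairwise mean: fold the vector into a 2-by-N/2 matrix,
% average each column, and return a column vector with one value per pair.
x = s.(fn{z});
x = x(1:2*floor(numel(x)/2));              % drop a trailing odd sample, if any
s.(fn{z}) = mean(reshape(x, 2, []), 1).';  % one mean per pair, as a column
```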
A tiny example:
x = [];
for k = 1:1e6
    x(k) = k;
end
This does not request 8 MB (8 bytes per double), but sum(1:1e6)*8 bytes, because in each iteration a new array is created and the old contents are copied. That means more than 4 TB of memory traffic! Of course this is slow. With pre-allocation and without a growing array, Matlab requests the expected 8 MB only: x = zeros(1, 1e6). The same effect applies to iterative shrinking.
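The preallocated counterpart, for comparison:

```matlab
% Preallocated version: the 8 MB are requested once, the loop only fills it.
x = zeros(1, 1e6);
for k = 1:1e6
    x(k) = k;
end
```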
In Matlab R2018b, vertcat still has some potential for improvement. It might be an idea to test the speed with https://www.mathworks.com/matlabcentral/fileexchange/28916-cell2vec .

1 comment

Hello Jan,
great example, I'm beginning to understand! I just tested and compared your code in a copy; now, however, I am confused again: it is approximately half a second slower than my "bad first solution" (5.3 s vs 5.7 s). Any idea why?
However, you made a clear point about the growing/shrinking of arrays. I will rework my code to avoid this. I have a lot to digest right now, still busy with Bruno's hint.
Thank you!


More Answers (0)

Version

R2020a
