can someone suggest to make this code faster

4 views (last 30 days)
Can someone suggest me a trick to make this code faster.
"all_DOY" = 1000*15000*23 ; %(i,j,k) such that k dimension is the day of the year whic is random, while (i,j) are my pixel location starting from (1,1) to (1000,15000). The values of day of the year (DOY) in the kth dimension rranges range from -1 to 366
"all_variable" = 1000*15000*23; This is my variable value for (i,j,k) for the (i,j) position and k dimension i.e., day of the year in the "all_DOY" matrix. The values in the kth dimension range from -3000 to 15000
all_variable_new = 1000*15000*366
I'm trying to update the "all_variable_new" such that, my k dimension is from 1:366 (days of the year). for each day of the year obtained from "all_DOY" matrix, the variable value is updated in the "all_variable_new" for that respective (i,J,k) from "all_variable"
all_variable_new=zeros(1000,15000,366,'int16');
all_variable = load('all_variable.mat');
all_DOY = load('all_DOY.mat');
tic
for ii =1: size(all_DOY,1) % for each ii
for jj = 1:size(all_DOY,2) % for each jj
idx = squeeze(all_DOY(ii,jj,:)); % sequeez k'th dimension. idk is the DOY which is also the position value in all_variable_new
variable = squeeze(all_variable(ii,jj,:)); % obtain the variable values
variable(idx==-1)=[]; % if -1, remove
idx(idx==-1)=[]; % if -1 then remove the values
Ind=idx;
if isempty(Ind)
all_variable_new(ii,jj,:) = NaN; % if there are no idx values then replace with NAN
else
for mm = Ind
all_variable_new(ii,jj,mm) = variable; % for the day of the year values obtained, replace those positions with variable values.
end
end
clear idx Ind mm
end
end
toc
  4 Comments
nlm
nlm on 10 Mar 2020
  • What do the inputs represent?
  • It looks like you have 3D variables, what does each dimension represent? and if within one dimension, the rows/columns/pages represent different things, what are these?
"all_DOY" : is a 3D matrix of size 1000*15000*23 where (i,j) represent pixel position, while k represent day of the year (DOY). the values of DOY randomly varies in the range of -1 to 366.
My variable is the temperature data.
"all_variable": is a 3D matrix (i,j,k), where (i,j) represent pixel position, while k represent temperature. Temperature ranges from -3000 to 10000.
"all_variable_new": is a 3D matrix of size 1000*15000*366 and I want to make a 3D matrix (i,j,), where (i,j) represent pixel position, and store the temperature values sequentially according to the DOY values obatined from "all_DOY" .
  • Why is all_variable of class int16?
to reduce the memory of the variable.
  • What is the purpose of the code? What does it try to achieve?
I want to make a "all_variable_new", such that the temperature variable is arranged according to the DOY.

Sign in to comment.

Accepted Answer

Guillaume
Guillaume on 10 Mar 2020
Edited: Guillaume on 10 Mar 2020
If I understood correctly:
pixels_dayofyear = load('all_doy.mat');
pixels_temperature = load('all_variable.mat');
assert(isequal(size(pixels_dayofyear), size(pixels_temperature)), 'Size of matrices doesn''t match!');
%sort each pixel (row, column) temperature (pages) according to the corresponding day of year.
%temperature for invalid days of year (-1) are all replaced by 0 at the end of the page.
pixels_dayofyear(pixels_dayofyear == -1) = NaN; %replace by a value that is sorted last. Could be Inf instead of NaN
[~, order] = sort(pixels_dayofyear, 3); %sort the day of year of each pixel. Get the new order
[rows, cols, ~] = ndgrid(1:size(pixels_dayofyear, 1), 1:size(pixels_dayofyear, 2), 1:size(pixels_dayofyear, 3)); %create row and column indices replicated along the correct dimensions for sub2ind
newpixels_temperature = pixels_temperature(sub2ind(size(pixels_temperature), rows, cols, order)); %reorder temperature according to the sorting of the days
newpixels_temperature(isnan(pixels_dayofyear)) = 0; %replace invalid values by 0
  26 Comments
Guillaume
Guillaume on 13 Mar 2020
Edited: Guillaume on 13 Mar 2020
"While interpolating the data, it is required to have all the data even if it is same DOY, instead of mean." I'm not saying it doesn't exist, but I'm not aware of an interpolation method that can cope with two different values at the same point.
"dividing it into 10 rows, makes it hard to keep count of data blocks" I don't see why it would be hard. It's certainly an easy way to get rid of memory issue. At the same time, it also makes it possible to parallelise the processing (but beware of memory usage!) since the processing of the blocks is completely independent of each other. On a supercomputer, I assume that would be a better use of the resources. So, yes, Instead of splitting by year, I'd split by satellite coordinates and process blocks of size MxNx(10*23) -> MxNx366 where M and N are such that the data fits comfortably in memory.
Addendum: Assuming than M and N are reasonable so that the below fits in memory, then this is how I'd implement the transformation from the MxNx(10*23) to MxNx366 matrix. This directly interpolates the MxNx(10*23) data without ever creating the matrix of 0s:
%inputs:
%pixels_dayofyear, a M x N x day matrix indicating the day of measurement of the respective location in pixels_temperature
%pixels_temperature, a M x N x day matrix indicating the temperature on the respective day in pixels_dayofyear
[M, N, days] = size(pixels_dayofyear);
[inrows, incols, ~] = ndgrid(1:M, 1:N, 1:days); %create matrices of locations for each temperature
isvalid = pixels_dayofyear > -1
temperature_interpolant = scatteredinterpolant(M(isvalid), N(isvalid), pixels_dayofyear(isvalid), pixels_temperature(isvalid)); %create interpolant using locations and days as coordinates
%the line below creates 3 double matrices of size M x N x 366. They will use a lot of memory for large M and N!
[qrows, qcols, qday] = ndgrid(1:M, 1:N, 1:366); %matrices of all query locations and days
interpolated_temperature = reshape(temperature_interpolant(qrows, qcols, qday), M, N, 366); %interpolate at all query points and reshape into 3D matrix
Beware that qrows, qcols and qday will require M*N*366*8 bytes of memory. ScatteredInterpolant only accepts doubles so there's no way to save memory there by using a smaller type. However, these 3 arrays can be used unchanged for all the blocks (assuming they're all the same size).

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by