Applying vectorization techniques to speedup the performance of dividing a 3D matrix by a 2D matrix
1 vue (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Matthew Kehoe
le 29 Juil 2021
Commenté : Matthew Kehoe
le 31 Juil 2021
I'm working on removing a for loop in my Matlab code to improve performance. My original code has one for loop (from j=1:Nx) that is harmful to performance (in my production code, this for loop is processed over 20 million times if I test large simulations). I am curious if I can remove this for loop through vectorization, repmat, or a similar technique. My original Matlab implementation is given below.
clc; clear all;
% Test Data
% I'm trying to remove the for loop for j in the code below
N = 10;
M = 10;
Nx = 32; % Ny=Nx=Nz
Nz = 32;
Ny = 32;
Fnmhat = rand(Nx,Nz+1);
Jnmhat = rand(Nx,1);
xi_n_m_hat = rand(Nx,N+1,M+1);
Uhat = zeros(Nx,Nz+1);
Uhat_2 = zeros(Nx,Nz+1);
identy = eye(Ny+1,Ny+1);
p = rand(Nx,1);
gammap = rand(Nx,1);
D = rand(Nx+1,Ny+1);
D2 = rand(Nx+1,Ny+1);
D_start = D(1,:);
D_end = D(end,:);
gamma = 1.5;
alpha = 0; % this could be non-zero
ntests = 100;
% Original Code (Partially vectorized)
tic
for n=0:N
for m=0:M
b = Fnmhat.';
alphaalpha = 1.0;
betabeta = 0.0; % this could be non-zero
gammagamma = gamma*gamma - p.^2 - 2*alpha.*p; % size (Nx,1)
d_min = 1.0;
n_min = 0.0; % this could be non-zero
r_min = xi_n_m_hat(:,n+1,m+1);
d_max = -1i.*gammap;
n_max = 1.0;
r_max = Jnmhat;
A = alphaalpha*D2 + betabeta*D + permute(gammagamma,[3,2,1]).*identy;
A(end,:,:) = repmat(n_min*D_end,[1,1,Nx]);
b(end,:) = r_min;
A(end,end,:) = A(end,end,:) + d_min;
A(1,:,:) = repmat(n_max*D_start,[1,1,Nx]);
A(1,1,:) = A(1,1,:) + permute(d_max,[2,3,1]);
b(1,:) = r_max;
% Non-vectorized code - can this part be vectorized?
for j=1:Nx
utilde = linsolve(A(:,:,j),b(:,j)); % A\b
Uhat(j,:) = utilde.';
end
end
end
toc
Here is my attempt at vectorizing the code (and removing the for loop for j).
% Same test data as original code
% New Code (completely vectorized but incorrect)
tic
for n=0:N
for m=0:M
b = Fnmhat.';
alphaalpha = 1.0;
betabeta = 0.0; % this could be non-zero
gammagamma = gamma*gamma - p.^2 - 2*alpha.*p; % size (Nx,1)
d_min = 1.0;
n_min = 0.0; % this could be non-zero
r_min = xi_n_m_hat(:,n+1,m+1);
d_max = -1i.*gammap;
n_max = 1.0;
r_max = Jnmhat;
A2 = alphaalpha*D2 + betabeta*D + permute(gammagamma,[3,2,1]).*identy;
A2(end,:,:) = repmat(n_min*D_end,[1,1,Nx]);
b(end,:) = r_min;
A2(end,end,:) = A2(end,end,:) + d_min;
A2(1,:,:) = repmat(n_max*D_start,[1,1,Nx]);
A2(1,1,:) = A2(1,1,:) + permute(d_max,[2,3,1]);
b(1,:) = r_max;
% Non-vectorized code - can this part be vectorized?
%for j=1:Nx
% utilde_2 = linsolve(A2(:,:,j),b(:,j)); % A2\b
% Uhat_2(j,:) = utilde_2.';
%end
% My attempt - this doesn't work since I don't loop through the index j
% in repmat
utilde_2 = squeeze(repmat(linsolve(A2(:,:,Nx),b(:,Nx)),[1,1,Nx]));
utilde_2 = utilde_2(:,1);
Uhat_2 = squeeze(repmat(utilde_2',[1,1,Nx]));
Uhat_2 = Uhat_2';
end
end
toc
diff = norm(Uhat - Uhat_2,inf); % is 0 if correct
I'm curious if repmat (or a different builtin Matlab function) can speed up this part of the code:
for j=1:Nx
utilde = linsolve(A(:,:,j),b(:,j)); % A\b
Uhat(j,:) = utilde.';
end
Is the for loop for j absolutely necessary or can it be removed?
Réponse acceptée
Bruno Luong
le 29 Juil 2021
If you have C compilers the fatest methods are perhaps mmx and MultipleQR avaikable on FEX
9 commentaires
Bruno Luong
le 30 Juil 2021
MMX and MultipleQRSolve make parallel loop on page, MATLAB for-loop make parallize on the algorithm of linsolve.
That's why the matrix size matters, and possibly the number of the physical processor cores.
MultipleQRSolve is less efficient for larg matrix because my implementation of QR is less efficient than the Lapack function called by MMX and MATLAB native for-loop.
Plus de réponses (2)
Matt J
le 29 Juil 2021
Modifié(e) : Matt J
le 29 Juil 2021
Another idea.
clc; clear all;
% Test Data
% I'm trying to remove the for loop for j in the code below
N = 10;
M = 10;
Nx = 32; % Ny=Nx=Nz
Nz = 32;
Ny = 32;
AA=kron(speye(Nx),ones(Nx+1));
map=logical(AA);
% Original Code (Partially vectorized)
tic
for n=0:N
for m=0:M
....
%Vectorized code
AA(map)=A(:);
Uhat=reshape(AA\b(:),Nx+1,Nx).';
end
end
toc
5 commentaires
Bruno Luong
le 29 Juil 2021
AA(map)=A2(:)
Well, I know someone who is wonder why (:) can be much slower than reshape. ;-)
Matt J
le 29 Juil 2021
Yeah, I didn't see that A was complex-valued. So,
AA(map)=rehape(A,[],1);
Uhat=reshape(AA\b(:),Nx+1,Nx).';
Voir également
Catégories
En savoir plus sur Particle & Nuclear Physics dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!