Using GPU on multiple nested loops

The following code is slow for large ncnt (typically >2000), and I want to run the outermost (iplane) loop on my GPU.
Can you give me a hint? (I have an NVIDIA RTX 8000.)
nxs=101;
nys=101;
nzs=101;
ncnt=100;
xnmin=-1.0;
xnmax= 1.0;
ynmin=-1.0;
ynmax= 1.0;
znmin=-1.0;
znmax= 1.0;
ispin=1;    % placeholder (assumed value): spin index set elsewhere in the full program
Vcell=1.0;  % placeholder (assumed value): unit-cell volume set elsewhere in the full program
coefftmp=complex(rand(1,ncnt));
igalltmp=rand(3,ncnt);
vktmp=rand(1,3);
wfrtmp=complex(zeros(nxs,nys,nzs));
tic
for iplane=1:ncnt % GPU loop
    % ee = exp(2*pi*i) is 1 up to rounding, so ekx, eky and ekz below are all ~1
    ee=exp(2.*pi*complex(0.,1.));
    vkg=vktmp+double(igalltmp(:,iplane)');
    ekx=ee^vkg(1);
    eky=ee^vkg(2);
    ekz=ee^vkg(3);
    coefft=coefftmp(iplane);
    for iz=1:nzs
        z=znmin+(znmax-znmin)*double(iz-1)/double(nzs-1);
        ekzz=ekz^z;
        for iy=1:nys
            y=ynmin+(ynmax-ynmin)*double(iy-1)/double(nys-1);
            ekyy=eky^y;
            for ix=1:nxs
                x=xnmin+(xnmax-xnmin)*double(ix-1)/double(nxs-1);
                ekxx=ekx^x;
                wfrtmp(ix,iy,iz)=wfrtmp(ix,iy,iz)+coefft*ekxx*ekyy*ekzz;
            end
        end
    end
end
wfr(ispin,:,:,:)=wfrtmp/sqrt(Vcell);
toc

4 Comments

Walter Roberson on 26 Aug 2019
It is not clear what the question is?
Jhinhwan Lee on 26 Aug 2019
Sorry. It's updated now.
If you want to use the GPU, you will have to rewrite your code in vectorized form.
I suggest you consider
zvec = linspace(znmin, znmax, nzs);
yvec = linspace(ynmin, ynmax, nys);
xvec = linspace(xnmin, xnmax, nxs);
[X, Y, Z] = ndgrid(xvec, yvec, zvec);
before any looping. After that you can do things like
coeff .* ekz.^Z .* eky.^Y .* ekx.^X
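A minimal sketch of this suggestion applied to the question's loop, assuming the question's variable names and random test data. The grids are built once, and each plane then costs one vectorized statement. The sketch writes the whole phase inside a single exp instead of using `ekx.^X`-style powers: MATLAB evaluates powers of a complex base with the principal branch of the logarithm, and `exp(2*pi*1i)` is 1 up to rounding, so powers of it stay at 1.

```matlab
% Sizes, limits and random test data as in the question
nxs = 101; nys = 101; nzs = 101;
ncnt = 100;
xnmin = -1.0; xnmax = 1.0;
ynmin = -1.0; ynmax = 1.0;
znmin = -1.0; znmax = 1.0;
coefftmp = complex(rand(1, ncnt));
igalltmp = rand(3, ncnt);
vktmp    = rand(1, 3);

% Build the coordinate vectors and grids once, before any looping
xvec = linspace(xnmin, xnmax, nxs);
yvec = linspace(ynmin, ynmax, nys);
zvec = linspace(znmin, znmax, nzs);
[X, Y, Z] = ndgrid(xvec, yvec, zvec);    % each array is nxs-by-nys-by-nzs

wfrtmp = complex(zeros(nxs, nys, nzs));
tic
for iplane = 1:ncnt
    vkg = vktmp + igalltmp(:, iplane).';  % 1-by-3 wave vector of this plane
    % One vectorized statement replaces the ix, iy and iz loops; the whole
    % phase sits inside exp, so no complex power (and no branch cut) is involved
    wfrtmp = wfrtmp + coefftmp(iplane) * ...
        exp(2i*pi*(vkg(1)*X + vkg(2)*Y + vkg(3)*Z));
end
toc
```

Each of X, Y and Z holds 101^3 (about one million) doubles, roughly 8 MB apiece, so memory is not a concern at this grid size.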
Thanks! I did essentially the same thing, and it is now more than ten times faster.
I also found it better to include X, Y and Z in the argument of the exp function: for kx = 1, 2, 3, ..., ekx = exp(2.*pi*complex(0.,1.)*kx) = 1, so ekx^x == 1 no matter what x is, whereas exp(2.*pi*complex(0.,1.)*kx*x) depends on x (unless kx = 0), as expected.
coeff*exp(2.*pi*complex(0.,1.)*(kx*X+ky*Y+kz*Z))
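The effect described in this comment can be checked in a few lines. MATLAB computes `a^b` for complex `a` through the principal logarithm, whose imaginary part lies in (-π, π]. Raising `exp(2*pi*1i*k)` to the power `x` therefore uses the phase 2πk wrapped into that interval, which differs from `exp(2*pi*1i*k*x)` whenever 2πk lies outside it (apart from special values of x). The values `k = 1.3` and `x = 0.7` below are arbitrary choices for illustration.

```matlab
ee = exp(2*pi*1i);   % mathematically 1; numerically about 1 - 2.4e-16i
k  = 1.3;            % arbitrary non-integer wave-vector component
x  = 0.7;            % arbitrary coordinate

% Power form from the question: ee^k stays (numerically) at 1
powerQuestion = (ee^k)^x;

% Power form with the phase computed first: the principal logarithm
% wraps the phase 2*pi*k into (-pi, pi] before the power is taken
powerWrapped = exp(2*pi*1i*k)^x;

% Whole phase inside exp: no wrapping occurs
direct = exp(2*pi*1i*k*x);

vals = [powerQuestion; powerWrapped; direct];
fprintf('%+.4f %+.4fi\n', [real(vals) imag(vals)].');
```

Only the last form gives the intended plane wave for every k and x, which is why the exp form above is the one to vectorize.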

Answers (1)

Raunak Gupta on 30 Aug 2019

Hi,
To speed up the code, first vectorize the three loops inside the main loop, since they are independent of each other. As mentioned in the comments, you can use linspace and ndgrid to do the exponentiation for all three variables without explicit loops.
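Because exp(2πi(kx·x + ky·y + kz·z)) is the product of three one-dimensional exponentials, the full grids are not strictly needed: each plane can be computed from three coordinate vectors oriented along dimensions 1, 2 and 3, combined with implicit expansion (MATLAB R2016b or later). This factorization is an alternative to the ndgrid approach, sketched under the same assumptions as before (the question's variable names, random test data).

```matlab
% Sizes, limits and random test data as in the question
nxs = 101; nys = 101; nzs = 101;
ncnt = 100;
xnmin = -1.0; xnmax = 1.0;
ynmin = -1.0; ynmax = 1.0;
znmin = -1.0; znmax = 1.0;
coefftmp = complex(rand(1, ncnt));
igalltmp = rand(3, ncnt);
vktmp    = rand(1, 3);

% Coordinate vectors oriented along dimensions 1, 2 and 3
xvec = linspace(xnmin, xnmax, nxs).';                  % nxs-by-1
yvec = linspace(ynmin, ynmax, nys);                    % 1-by-nys
zvec = reshape(linspace(znmin, znmax, nzs), 1, 1, []); % 1-by-1-by-nzs

wfrtmp = complex(zeros(nxs, nys, nzs));
for iplane = 1:ncnt
    vkg = vktmp + igalltmp(:, iplane).';
    % exp(2*pi*i*(kx*x+ky*y+kz*z)) = exp(2*pi*i*kx*x)*exp(2*pi*i*ky*y)*exp(2*pi*i*kz*z),
    % so only nxs+nys+nzs exponentials are needed per plane
    ex = exp(2i*pi*vkg(1)*xvec);
    ey = exp(2i*pi*vkg(2)*yvec);
    ez = exp(2i*pi*vkg(3)*zvec);
    % Implicit expansion builds the nxs-by-nys-by-nzs product
    wfrtmp = wfrtmp + coefftmp(iplane) * (ex .* ey .* ez);
end
```

The result matches the grid version up to rounding, but each plane needs 303 exponentials instead of about one million (101^3), leaving the elementwise multiply-add as the main cost.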
The steps above only vectorize the code; to actually use the GPU, create the initial arrays with gpuArray, which may speed up the code significantly. The functions used in your code are supported for gpuArray; for any other function, you can check the list of all supported functions here.
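A sketch of the gpuArray step, using per-plane one-dimensional exponentials combined by implicit expansion and assuming Parallel Computing Toolbox for the GPU path. `canUseGPU` (MATLAB R2019b or later) decides whether the coordinate vectors are moved to the device, so the same script also runs on the CPU; `zeros(..., 'like', xvec)` makes the accumulator the same kind of array as `xvec`, and `gather` copies the result back to host memory. `ncnt` is kept at 100 here; the question's slow case corresponds to 2000 or more.

```matlab
% Sizes, limits and random test data as in the question
nxs = 101; nys = 101; nzs = 101;
ncnt = 100;                           % raise to 2000+ to reproduce the slow case
xnmin = -1.0; xnmax = 1.0;
ynmin = -1.0; ynmax = 1.0;
znmin = -1.0; znmax = 1.0;
coefftmp = complex(rand(1, ncnt));    % kept on the host on purpose
igalltmp = rand(3, ncnt);
vktmp    = rand(1, 3);

xvec = linspace(xnmin, xnmax, nxs).';                  % nxs-by-1
yvec = linspace(ynmin, ynmax, nys);                    % 1-by-nys
zvec = reshape(linspace(znmin, znmax, nzs), 1, 1, []); % 1-by-1-by-nzs

useGPU = canUseGPU();   % false without a supported GPU or Parallel Computing Toolbox
if useGPU
    % Move the coordinate vectors to the device once; exp, .* and +
    % applied to them then run on the GPU
    xvec = gpuArray(xvec);
    yvec = gpuArray(yvec);
    zvec = gpuArray(zvec);
end

% Accumulator of the same kind as xvec (gpuArray or ordinary double);
% it becomes complex on the first addition
wfrtmp = zeros(nxs, nys, nzs, 'like', xvec);
tic
for iplane = 1:ncnt
    vkg = vktmp + igalltmp(:, iplane).';
    ex = exp(2i*pi*vkg(1)*xvec);
    ey = exp(2i*pi*vkg(2)*yvec);
    ez = exp(2i*pi*vkg(3)*zvec);
    wfrtmp = wfrtmp + coefftmp(iplane) * (ex .* ey .* ez);
end
if useGPU
    wfrtmp = gather(wfrtmp);   % copy the result back to host memory
end
toc
```

Keeping `coefftmp` and `igalltmp` on the host is a design choice: reading a single element of a gpuArray requires a device-to-host transfer, whereas here only the per-plane arrays live on the GPU. `gather` also waits for queued GPU work to finish, so `toc` measures the actual computation.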