MATLAB GPU: arrayfun with indexing

Hi
I am new to MATLAB GPU computing and have made some initial tests. Now I am looking to parallelize the following code.
for i=1:n % where n ~ 1'000'000 and a, b, c are of size ~300'000x1
currindices = indices(24,i);
a(currindices ) = a(currindices ) + A(24x24)*(b(currindices )+B(24x24)*c(currindices ));
end
In a test I parallelized this code without any of the indices by using arrayfun, and it worked well. That is, I just had the following code in a function that was called by arrayfun:
for i=1:n
a=a+A*(b+B*c)
end
I wonder how to deal with the indexing of the vectors and whether arrayfun still makes sense. The matrices A and B are constant. I read that indexing is rather slow on a GPU.
What would be the best way to parallelize the above code?
Thanks for any help. This whole parallelization does not come naturally to me yet.
BR

6 Comments

Walter Roberson on 22 Oct 2017
Edited: Walter Roberson on 24 Oct 2017
? currindices appears to be unused before you assign to it.
Markus Ess on 22 Oct 2017
Sorry, that was a mistake. The indexing should use currindices. I fixed the code in the sample.
Joss Knight on 24 Oct 2017
I'm not sure what language you've written your code in so it's difficult to interpret. What is A(24x24)? And if this were MATLAB code then indices(24,i) would be a scalar. But then your algebra doesn't make sense.
Markus Ess on 24 Oct 2017
Edited: Walter Roberson on 24 Oct 2017
It wasn't meant to be real code. It is just to show that A is of size 24x24 and that I read 24 values for currindices. So currindices is indices(:,i) in MATLAB code, and the multiplication with A and B is simply that.
for i=1:n %;where n~1'000'000 and a, b,c of size ~300'000x1
currindices = indices(:,i);
a(currindices ) = a(currindices ) + A*(b(currindices )+B*c(currindices ));
end
Well, one of the things I learnt anyway is that I have to use pagefun. The problem is still the indexing.
However, my main feeling is that I have to rewrite the math anyway for an optimal parallelization.
I don't think you need pagefun. Can't you just do this with indexing and matrix multiplication? It seems indices is the correct shape, namely 24-by-n. So b(indices) and c(indices) return 24-by-n, the multiplies return 24-by-n, and the addition works.
a(indices) = a(indices) + A * (b(indices) + B * c(indices));
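As a quick check of the single-statement form, here is a minimal sketch that compares it against the original loop on small random data. The sizes, the random data, and the names aLoop, aVec and maxDiff are assumptions made for illustration; the indices are built so that no element appears twice, which is the case where the two forms should agree.

```matlab
% Small stand-in sizes (assumptions); the thread mentions n ~ 1e6.
rng(0);
m = 24;  n = 50;  N = m*n;
A = rand(m);  B = rand(m);
a = rand(N,1);  b = rand(N,1);  c = rand(N,1);

% Every index appears exactly once, so no two columns touch the same element.
indices = reshape(randperm(N), m, n);

% Reference: the loop from the question.
aLoop = a;
for i = 1:n
    idx = indices(:,i);
    aLoop(idx) = aLoop(idx) + A*(b(idx) + B*c(idx));
end

% Vectorized: b(indices) and c(indices) are m-by-n, so A*(...) is m-by-n too.
aVec = a;
aVec(indices) = aVec(indices) + A*(b(indices) + B*c(indices));

% Expect agreement up to floating-point round-off.
maxDiff = max(abs(aVec - aLoop));
```

If a GPU and Parallel Computing Toolbox are available, the same statement should work on arrays that were first moved to the device with gpuArray, since indexing and matrix multiplication are both supported there; that part is not exercised in this sketch.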
If the indices repeat this may not work as you intended, because some elements of a will get one of the answers and not another. You might have to use accumarray in that case.
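% accumarray takes (subs, vals, sz). Accumulate only the increments, so that
% repeated indices add up and untouched elements of a keep their value.
delta = A * (b(indices) + B * c(indices));
a = a + accumarray(indices(:), delta(:), size(a));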
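To illustrate the repeated-index case, the sketch below builds indices whose columns overlap and compares the loop with an accumarray-based update. It assumes, as in the question, that b, c, A and B do not change inside the loop, so each column's increment is independent of a; it also assumes the indices within one column are distinct. The sizes and the names delta, aLoop, aAcc and maxDiff are made up for the example. Note that accumarray expects its arguments in the order (subs, vals, sz).

```matlab
rng(1);
m = 24;  n = 200;  N = 300;          % small stand-in sizes (assumptions)
A = rand(m);  B = rand(m);
a = rand(N,1);  b = rand(N,1);  c = rand(N,1);

% Each column holds m distinct indices, but different columns overlap.
indices = zeros(m, n);
for i = 1:n
    indices(:,i) = randperm(N, m);
end

% Reference: the loop from the question.
aLoop = a;
for i = 1:n
    idx = indices(:,i);
    aLoop(idx) = aLoop(idx) + A*(b(idx) + B*c(idx));
end

% The increments do not depend on a, so compute them all at once (m-by-n)...
delta = A*(b(indices) + B*c(indices));
% ...and sum the contributions that land on the same element of a.
aAcc = a + accumarray(indices(:), delta(:), size(a));

% Expect agreement up to round-off (the summation order differs).
maxDiff = max(abs(aAcc - aLoop));
```

Adding the accumulated increments to a, instead of accumulating a(indices) itself, keeps elements that never appear in indices unchanged and avoids counting the old value of a more than once.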
Markus Ess on 31 Oct 2017
Got it. At least on the CPU the multiplication is 10 times faster than the for loop. Anyway, I now need to rewrite the code and see how that could work on a GPU.
thanks!
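For anyone who wants to reproduce this kind of CPU comparison, a rough timing sketch follows. The problem size and variable names are assumptions, the indices contain no repeats, and tic/toc is used only for simplicity; timeit (or gputimeit for GPU code) would give more reliable numbers. The measured ratio depends on the machine and the sizes, so no particular speedup is implied.

```matlab
rng(2);
m = 24;  n = 2000;  N = m*n;          % stand-in sizes (assumptions)
A = rand(m);  B = rand(m);
a = rand(N,1);  b = rand(N,1);  c = rand(N,1);
indices = reshape(randperm(N), m, n); % no repeated indices

% Time the loop from the question.
aLoop = a;
tic;
for i = 1:n
    idx = indices(:,i);
    aLoop(idx) = aLoop(idx) + A*(b(idx) + B*c(idx));
end
tLoop = toc;

% Time the single vectorized statement.
aVec = a;
tic;
aVec(indices) = aVec(indices) + A*(b(indices) + B*c(indices));
tVec = toc;

fprintf('loop: %.4g s, vectorized: %.4g s\n', tLoop, tVec);
maxDiff = max(abs(aVec - aLoop));
```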

Answers (0)