BLAS or LAPACK in CUDA kernel

Hi, I need to compute x = A\b several hundred million times, along with some other trivial arithmetic, for an A that is 4x4 and dense. I was thinking about writing a small CUDA kernel, called from MATLAB, to do this, but I don't know how I would call something like DGETRS or SGETRS from within a thread. CUBLAS, MAGMA, and similar libraries seem to parallelize this operation for a single, massive A, but I don't see how they would help me. Is this possible?
Thanks!

Accepted Answer

James Lebak on 5 Jun 2013


You're correct that you can't call DGETRS or SGETRS directly from a kernel, since those are CPU-side routines. The CUDA 5 version of CUBLAS adds a batched LU factorization API and the ability to call BLAS routines from device code; either might be helpful. In R2013a you can call the host-side CUBLAS routines from a GPU MEX file.
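For matrices as small as 4x4, one alternative that the answer above does not mention is to skip the library entirely and have each thread solve its own system with a hand-written routine. The following is only a minimal sketch under several assumptions: single precision, column-major storage as in MATLAB, each A stored as 16 contiguous floats, and Gaussian elimination with partial pivoting standing in for the LAPACK routines. The names `solve4x4`, `solve4x4_kernel`, and `solve4x4_batch_host` are hypothetical. The `HD` macro lets the same file build as plain C++ for checking on the CPU.

```cpp
#include <assert.h>
#include <math.h>

// Compiles as plain C++ or, under nvcc, as device-callable code.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Solve one dense 4x4 system A x = b by Gaussian elimination with partial
// pivoting. A is column-major (assumed layout). Returns false if a pivot
// is exactly zero, i.e. the matrix is singular.
HD inline bool solve4x4(const float* A, const float* b, float* x) {
    float M[4][5];  // local augmented matrix [A | b]
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) M[r][c] = A[c * 4 + r];
        M[r][4] = b[r];
    }
    for (int k = 0; k < 4; ++k) {
        // Pick the row with the largest magnitude entry in column k.
        int p = k;
        for (int r = k + 1; r < 4; ++r)
            if (fabsf(M[r][k]) > fabsf(M[p][k])) p = r;
        if (M[p][k] == 0.0f) return false;
        if (p != k) {
            for (int c = 0; c < 5; ++c) {
                float t = M[k][c]; M[k][c] = M[p][c]; M[p][c] = t;
            }
        }
        // Eliminate column k from the rows below the pivot.
        for (int r = k + 1; r < 4; ++r) {
            float f = M[r][k] / M[k][k];
            for (int c = k; c < 5; ++c) M[r][c] -= f * M[k][c];
        }
    }
    // Back substitution on the upper-triangular system.
    for (int r = 3; r >= 0; --r) {
        float s = M[r][4];
        for (int c = r + 1; c < 4; ++c) s -= M[r][c] * x[c];
        x[r] = s / M[r][r];
    }
    return true;
}

#ifdef __CUDACC__
// One thread per system: thread i solves A_i x_i = b_i.
__global__ void solve4x4_kernel(const float* A, const float* b, float* x,
                                int* ok, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) ok[i] = solve4x4(A + 16 * i, b + 4 * i, x + 4 * i) ? 1 : 0;
}
#endif

// Host loop with the same indexing, handy for validating the kernel output.
inline void solve4x4_batch_host(const float* A, const float* b, float* x,
                                int* ok, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i)
        ok[i] = solve4x4(A + 16 * i, b + 4 * i, x + 4 * i) ? 1 : 0;
}
```

With this layout, neighbouring threads read addresses 16 floats apart, so global memory accesses are not coalesced; interleaving the systems (element j of every A stored together) would likely be faster but is left out to keep the sketch short. For double precision, swap `float` for `double` and `fabsf` for `fabs`.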

1 Comment

Rodrigo on 5 Jun 2013
Thanks. I guess I'll have to get the new version of MATLAB from my department and play around with the new CUDA toolbox.
