Is it possible to use cuRAND with feval (Parallel Computing Toolbox)?
Hi,
I am trying to call feval (Parallel Computing Toolbox) on a kernel that uses the cuRAND library (<http://developer.nvidia.com/curand>). I need to pass feval an argument of type curandState, which is required to initialize the cuRAND random number generators.
I have something similar to:
K=parallel.gpu.CUDAKernel('kernel.ptx','kernel.cu');
[arg_out]=feval(K,arg_in, state);
"state" must be a curandState variable.
I tried to trick MATLAB by passing a plain number instead:
[arg_out]=feval(K,arg_in, 1);
But I got the following error message:
Error using iParseToken (line 259)
Unsupported type in argument specification "curandState * state".
Error in C:\Program Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\handleKernelArgs.p>iParseCPrototype (line 181)
Error in C:\Program Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\handleKernelArgs.p>handleKernelArgs (line 70)
I have not found any information on Google. Could anyone please help me?
Thank you in advance.
María.
Accepted Answer
More Answers (2)
Edric Ellis
on 1 Feb 2012
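For what it's worth, here is some example CUDA code, plus the MATLAB code that drives it, to show how one might use cuRAND. The idea is to pass the generator state to the kernel as an array of bytes (unsigned char *), which feval does support, and to copy it into and out of a local curandState inside the kernel. First off, here's the CUDA code: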
#include <curand_kernel.h>

// Number of bytes of cuRAND state held per thread.
const size_t stateSize = sizeof( curandState );

// Byte-wise copy of one state between a local curandState and the byte array.
__device__ void copyState( void * out, void const * in ) {
    unsigned char * outc = static_cast< unsigned char * >( out );
    unsigned char const * inc = static_cast< unsigned char const * >( in );
    for ( size_t i = 0; i < stateSize; ++i ) {
        outc[i] = inc[i];
    }
}

// Report the state size so that MATLAB can allocate enough bytes per thread.
__global__ void returnStateSize( unsigned int * value ) {
    value[0] = stateSize;
}

// Initialize one generator per thread and store its state in the byte array.
__global__ void initState( unsigned char * stateArray ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    curandState state;
    curand_init( 1234, idx, 0, &state ); // seed 1234, subsequence idx, offset 0
    copyState( stateArray + idx * stateSize, &state );
}

// Draw one uniform double per thread, then save the advanced state.
__global__ void generate( double * x, unsigned char * stateArray ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    curandState state;
    copyState( &state, stateArray + idx * stateSize );
    x[idx] = curand_uniform_double( &state );
    copyState( stateArray + idx * stateSize, &state );
}
And here's some MATLAB code which uses that:
import parallel.gpu.GPUArray;
% Get the number of bytes per thread of state.
stateSizeK = parallel.gpu.CUDAKernel( 'userand.ptx', 'userand.cu', 'returnStateSize' );
stateSz = double( gather( feval( stateSizeK, zeros( 'uint32' ) ) ) );
% Set up the random state
initK = parallel.gpu.CUDAKernel( 'userand.ptx', 'userand.cu', 'initState' );
initK.ThreadBlockSize = 256;
initK.GridSize = 10;
randState = feval( initK, GPUArray.zeros( stateSz, 256*10, 'uint8' ) );
genK = parallel.gpu.CUDAKernel( 'userand.ptx', 'userand.cu', 'generate' );
genK.ThreadBlockSize = 256;
genK.GridSize = 10;
% Generate some random numbers
[rand1, randState] = feval( genK, GPUArray.zeros(1, 256*10), randState );
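The load, draw, store pattern used by the kernels can also be tried on the host, without a GPU. The following is only a minimal sketch under stated assumptions: FakeState, nextValue, initStateHost and generateHost are hypothetical stand-ins that are not part of cuRAND or MATLAB, and the generator step is a simple linear congruential update rather than cuRAND's algorithm. What it is meant to show is that each thread's state occupies stateSize consecutive bytes starting at idx * stateSize (one column of the stateSz-by-2560 uint8 array, since MATLAB stores arrays column by column), and that writing the state back after each draw is what makes a second call continue the sequence instead of repeating it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for curandState; any trivially copyable struct works.
struct FakeState {
    std::uint32_t s;
};

const std::size_t stateSize = sizeof(FakeState);

// Illustrative generator step (a simple LCG), NOT cuRAND's algorithm.
std::uint32_t nextValue(FakeState* st) {
    st->s = st->s * 1664525u + 1013904223u;
    return st->s;
}

// Host analogue of initState: one state per "thread", stored as raw bytes.
void initStateHost(std::vector<unsigned char>& stateArray, std::size_t nThreads) {
    stateArray.assign(stateSize * nThreads, 0);
    for (std::size_t idx = 0; idx < nThreads; ++idx) {
        FakeState st{static_cast<std::uint32_t>(1234 + idx)};
        // Thread idx owns bytes [idx * stateSize, (idx + 1) * stateSize).
        std::memcpy(stateArray.data() + idx * stateSize, &st, stateSize);
    }
}

// Host analogue of generate: load the state, draw once, store the state back.
std::vector<std::uint32_t> generateHost(std::vector<unsigned char>& stateArray) {
    assert(stateArray.size() % stateSize == 0);
    const std::size_t nThreads = stateArray.size() / stateSize;
    std::vector<std::uint32_t> x(nThreads);
    for (std::size_t idx = 0; idx < nThreads; ++idx) {
        FakeState st;
        std::memcpy(&st, stateArray.data() + idx * stateSize, stateSize);
        x[idx] = nextValue(&st);
        // Without this write-back, every call would return the same numbers.
        std::memcpy(stateArray.data() + idx * stateSize, &st, stateSize);
    }
    return x;
}
```

Calling generateHost twice on the same byte buffer returns two different sets of values, which corresponds to passing the randState returned by one feval call into the next.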