Using a templated CUDA kernel via MATLAB

Alexandros Iliopoulos
Alexandros Iliopoulos on 23 Jan 2013
Hello,
Is it possible to use a C++-style templated CUDA kernel via MATLAB's GPU Computing interface?
For example, consider the following (useless) toy code:
template<typename T>
__global__ void get_nans(T*, const int*);

template<>
__global__ void get_nans<double>(double* out, const int* dims)
{
    const int tx = blockIdx.x*blockDim.x + threadIdx.x;
    const int ty = blockIdx.y*blockDim.y + threadIdx.y;
    if ((tx < dims[1]) && (ty < dims[0]))
        out[tx*dims[0] + ty] = nan("");   // nan() takes a const char* tag, not 0
}

template<>
__global__ void get_nans<float>(float* out, const int* dims)
{
    const int tx = blockIdx.x*blockDim.x + threadIdx.x;
    const int ty = blockIdx.y*blockDim.y + threadIdx.y;
    if ((tx < dims[1]) && (ty < dims[0]))
        out[tx*dims[0] + ty] = nanf("");  // likewise for the float overload
}
I then compile this into PTX code, but when I try to instantiate the kernel object in MATLAB I get the following error:
>> k = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'get_nans.cu' );
Error using handleKernelArgs (line 61)
Found multiple matching entries in the PTX code. Matches found:
_Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_
_Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_
Thank you,
Alex

Accepted Answer

Narfi
Narfi on 29 Jan 2013
Alex,
Allow me to quote from the help for parallel.gpu.CUDAKernel and try to parse it for you. (If you have suggestions for how to improve the wording, please let me know!)
"If specified, FUNC must be a string that unambiguously defines the appropriate kernel entry name in the PTX file. If FUNC is omitted, the PTX file must contain only a single entry point."
In your case, the get_nans.cu defines two global functions:
  1. get_nans<double>
  2. get_nans<float>
and the get_nans.ptx defines the corresponding two entry points:
  1. _Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_ (for the double function)
  2. _Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_ (for the float function)
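For reference, the PTX file is produced with nvcc -ptx get_nans.cu, and each kernel instantiation shows up in it as a ".visible .entry <mangled-name>" line, so the entry names can be listed with ordinary shell tools. A sketch, run here against an illustrative excerpt (the two names are the ones from your error message):

```shell
# Illustrative excerpt of the PTX that `nvcc -ptx get_nans.cu` produces;
# each kernel instantiation becomes a ".visible .entry" declaration.
cat > get_nans_excerpt.ptx <<'EOF'
.visible .entry _Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_(
.visible .entry _Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_(
EOF

# List the mangled entry-point names.
grep -o '_Z[A-Za-z0-9_]*' get_nans_excerpt.ptx
```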
When you create the parallel.gpu.CUDAKernel, it is ambiguous whether you want to invoke the double or the float function. Therefore, you must provide the name of the entry point you want to use and construct either the double or the float version:
kDouble = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'get_nans.cu', '_Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_' );
kFloat  = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'get_nans.cu', '_Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_' );
Now, this almost works, but not quite because the parser in parallel.gpu.CUDAKernel cannot parse the template function definition. Therefore, we stop using this way of constructing the CUDAKernel:
KERN = parallel.gpu.CUDAKernel(PTXFILE, CUFILE, FUNC)
and use this one instead:
KERN = parallel.gpu.CUDAKernel(PTXFILE, CPROTO, FUNC)
We then end up with:
kDouble = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'double* out, const int* dims', '_Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_');
kFloat = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'float* out, const int* dims', '_Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_');
Does this make sense?
Best,
Narfi

More Answers (0)
