CUDA: number of tasks exceeds number of threads times blocks

Robert on 23 Jan 2013

https://fr.mathworks.com/matlabcentral/answers/59638-cuda-number-of-tasks-exceed-number-of-threads-times-blocks

I have a problem when my number of tasks exceeds the total number of available threads. Let's imagine I want to add two vectors of length 100,000.

MATLAB code:

N = 100*1000;

a = double(-(1:N));
b = double(2*(1:N));
a_gpu = gpuArray(a);            % create array on the GPU
b_gpu = gpuArray(b);            % create array on the GPU
c_gpu = gpuArray(zeros(1, N));  % create array on the GPU
k = parallel.gpu.CUDAKernel('add.ptx', 'add.cu');
k.ThreadBlockSize = 100;
k.GridSize = [100, 1];
o = feval(k, a_gpu, b_gpu, c_gpu);

I know that I could increase the ThreadBlockSize and GridSize, but that is not what I want to know. Imagine my vector were much longer: with the settings above there are only 100 x 100 = 10,000 threads for 100,000 elements.

My CUDA code looks like this:

__global__ void add( double *a, double *b, double *c) { 
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    a[tid] = a[tid] + b[tid];
    tid += blockDim.x * gridDim.x;
}

In the last line I try to force the program to really go to the end of my vector, by using the same threads a second, third, ... time. That is what I read in the book "CUDA by Example".
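For comparison, the pattern from "CUDA by Example" wraps the work in a loop, so the index update is actually used on the next pass; in the kernel above the update is the final statement and therefore has no effect. A minimal sketch, assuming an extra length argument n (not part of the original kernel) and const inputs so that only c is written. The host main is just a standalone check with the same launch configuration (100 blocks of 100 threads) and is not needed for MATLAB:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride version of the kernel (sketch).
// Assumption: n, the element count, is an added parameter.
__global__ void add(const double *a, const double *b, double *c, int n)
{
    // Global index of this thread within the whole grid.
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    // Total number of launched threads; each pass jumps ahead by this much.
    int stride = blockDim.x * gridDim.x;

    // Each thread handles tid, tid + stride, tid + 2*stride, ... while in range.
    while (tid < n) {
        c[tid] = a[tid] + b[tid];
        tid += stride;
    }
}

int main()
{
    const int n = 100 * 1000;
    double *a, *b, *c;

    // Managed memory keeps the host code short; error checking is omitted.
    cudaMallocManaged(&a, n * sizeof(double));
    cudaMallocManaged(&b, n * sizeof(double));
    cudaMallocManaged(&c, n * sizeof(double));

    // Same data as the MATLAB script: a = -(1:N), b = 2*(1:N).
    for (int i = 0; i < n; ++i) {
        a[i] = -(i + 1.0);
        b[i] = 2.0 * (i + 1.0);
        c[i] = 0.0;
    }

    // 100 blocks of 100 threads: 10,000 threads for 100,000 elements.
    add<<<100, 100>>>(a, b, c, n);
    cudaDeviceSynchronize();

    // Every element of c should equal its 1-based index.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != i + 1.0) {
            ++mismatches;
        }
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return mismatches == 0 ? 0 : 1;
}
```

With this signature, the MATLAB call would presumably become `c_gpu = feval(k, a_gpu, b_gpu, c_gpu, N);` after recompiling add.ptx, since const pointer arguments are treated as inputs only and c is the single output.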

But for some reason it does not work when I call it from MATLAB. If I use it with only C and CUDA, it works.

What is wrong with my code? What is the usual way to handle the case where the number of tasks is larger than the maximum thread block size times the grid size? I could use the other grid dimension too, but how do I avoid this problem in general?
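On the second grid dimension: it raises the number of threads but does not remove the limit on its own, so a bounds-checked loop is still needed for arbitrary lengths. A rough sketch of how the two could be combined, where the kernel name add2d and the length argument n are hypothetical and not from the original code:

```cuda
// Hypothetical variant for a 2-D grid of 1-D blocks (sketch only).
// Assumptions: add2d and n are illustrative names; inputs are const.
__global__ void add2d(const double *a, const double *b, double *c, int n)
{
    // Flatten the 2-D block coordinates into a single block number.
    int block = blockIdx.x + blockIdx.y * gridDim.x;
    // Global index of this thread across the whole grid.
    int tid = threadIdx.x + block * blockDim.x;
    // Total number of threads over both grid dimensions.
    int stride = blockDim.x * gridDim.x * gridDim.y;

    // Bounds-checked loop: safe whether the grid is larger or smaller than n.
    while (tid < n) {
        c[tid] = a[tid] + b[tid];
        tid += stride;
    }
}
```

Because of the `tid < n` check, the grid does not have to divide the vector length evenly, and the same kernel should work for any GridSize, including [100, 1].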

Thanks a lot

Robert