Very simple CUDA PTX code: memory speed
Hello, I made a simple CUDA kernel to measure the global memory transfer speed on the CUDA processors:
__global__ void SR2add(float* dataout, const float* datain, int size) {
    int mindex = blockIdx.x*blockDim.x + threadIdx.x;
    if (mindex >= size)
        return;
    dataout[mindex] = datain[mindex];
}
The MATLAB function I wrote for it:
function GPU_MemBandTest()
    import parallel.gpu.GPUArray
    xsize=1024;
    ysize=768;
    vectorsize=xsize*ysize;
    threadpblock=1024;
    k=parallel.gpu.CUDAKernel('MemBandTest.ptx', 'MemBandTest.cu');
    k.ThreadBlockSize=[threadpblock,1,1];
    k.GridSize=[ceil(vectorsize/threadpblock),1];
    ddatain=parallel.gpu.GPUArray.zeros(vectorsize,1,'single');
    dataout=rand(vectorsize,1,'single');
    ddataout=GPUArray(dataout);
    tic
    for i=1:1000
        [ddataout]=feval(k,ddataout,ddatain,vectorsize);
    end
    % toc returns seconds; over 1000 launches this equals ms per launch
    time=toc;
    disp(['ms time= ' num2str(time)])
    % bytes written per launch divided by time per launch, in GB/s
    disp([num2str(vectorsize*4/(time*10^6)) 'GB/s'])
end
I got ms time= 0.73629 and 4.2724GB/s as the result. I would like to ask: 1. Am I doing the measurement correctly? 2. Is there anything I can do to speed up this simple code, or is this the expected result for this kernel in MATLAB?
I have MATLAB R2011a, CUDA Toolkit 3.2, and a GT 425M device with the newest driver installed for it.
If I use float* datain instead of const float* datain, the execution time goes up to 2.4 ms.
3. What could explain this?
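For reference, the arithmetic behind the figures quoted above can be reproduced without a GPU. The snippet below is only a sketch: the variable names are made up for illustration, and the "read+write" line rests on the assumption that both the load from datain and the store to dataout should count as memory traffic, which the original formula does not do.

```matlab
% Worked arithmetic for the figures reported in the post (no GPU needed).
vectorsize   = 1024*768;   % elements per launch, from the post
bytesPerElem = 4;          % single precision
nLaunches    = 1000;       % loop count in GPU_MemBandTest
totalTime    = 0.73629;    % seconds returned by toc, as reported

timePerLaunch = totalTime/nLaunches;                      % seconds per launch
oneWayGBs  = vectorsize*bytesPerElem/timePerLaunch/1e9;   % same formula as the post
bothWayGBs = 2*oneWayGBs;  % assumption: count the read and the write

fprintf('per launch: %.5f ms\n', timePerLaunch*1e3);
fprintf('one way:    %.4f GB/s\n', oneWayGBs);
fprintf('read+write: %.4f GB/s\n', bothWayGBs);

% Variant without const: 2.4 ms per launch, as reported
slowGBs = vectorsize*bytesPerElem/2.4e-3/1e9;
fprintf('without const, one way: %.4f GB/s\n', slowGBs);
```

The one-way figure matches the 4.2724GB/s printed by the function, since dividing by time*10^6 is the same as dividing the per-launch time by 10^9.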
Thanks to anyone who helps,
Gaszton