gpuDevice command very slow
I am running CUDA kernels using Parallel Computing Toolbox in R2012a, and I recently upgraded to a 600 series (Kepler) GPU. To set up the CUDA kernel, we query the maximum number of threads per block using:
gpu_han = gpuDevice(1);
k = parallel.gpu.CUDAKernel('gpu_tfm_linear_arb.ptx', 'gpu_tfm_linear_arb.cu');
k.ThreadBlockSize = gpu_han.MaxThreadsPerBlock;
This now executes very slowly (on the order of 2 minutes). If I instead set ThreadBlockSize manually to the card's maximum (1024 in this case), it executes in 0.1 s.
This used to run quickly with a 400 series card. Any help is gratefully received.
More Answers (2)
Andrei Pokrovsky
on 15 Sep 2016
Edited: Andrei Pokrovsky on 15 Sep 2016
3 votes
Try setting these env vars:
export CUDA_CACHE_MAXSIZE=2147483647
export CUDA_CACHE_DISABLE=0
This cured the problem on my GTX1080.
https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-understand-fat-binaries-jit-caching/
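Variables set with export only apply to the current shell and the processes started from it, so MATLAB has to be launched from a shell where they are set. A minimal sketch for Linux, assuming a bash-style shell that reads ~/.bashrc (the helper name persist_cuda_cache_settings is hypothetical): it appends the two settings to a startup file only if they are not already there. Note that 2147483647 bytes is 2^31 - 1, just under 2 GiB.

```shell
#!/bin/sh
# Hypothetical helper: append the CUDA JIT cache settings to a shell startup
# file (default ~/.bashrc, an assumption), skipping lines already present, so
# MATLAB started from a new shell inherits them.
persist_cuda_cache_settings() {
  rc_file="${1:-$HOME/.bashrc}"
  for setting in 'export CUDA_CACHE_MAXSIZE=2147483647' \
                 'export CUDA_CACHE_DISABLE=0'; do
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$setting" "$rc_file" 2>/dev/null; then
      printf '%s\n' "$setting" >> "$rc_file"
    fi
  done
}
```

Usage: run persist_cuda_cache_settings once, open a new terminal, confirm with env | grep CUDA_CACHE, then start matlab from that terminal. Running the helper again does not duplicate the lines.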
Anthony
on 17 Jun 2013
0 votes
2 comments
Edric Ellis
on 18 Jun 2013
The cache is not stored where the program lives. This page from NVIDIA has all the gory details, including this:
- on Windows, %APPDATA%\NVIDIA\ComputeCache,
- on MacOS, $HOME/Library/Application\ Support/NVIDIA/ComputeCache,
- on Linux, ~/.nv/ComputeCache
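To check where the cache actually lives and how large it has grown, a minimal sketch (the function name cuda_cache_dir is hypothetical) can print the per-OS default listed above, unless NVIDIA's CUDA_CACHE_PATH environment variable overrides it. It assumes that any system other than macOS or Linux is a Windows POSIX shell such as Git Bash, where APPDATA is set.

```shell
#!/bin/sh
# Hypothetical helper: print the CUDA JIT compute cache directory.
# CUDA_CACHE_PATH, if set, overrides the per-OS default location.
cuda_cache_dir() {
  if [ -n "${CUDA_CACHE_PATH:-}" ]; then
    printf '%s\n' "$CUDA_CACHE_PATH"
    return 0
  fi
  case "$(uname -s)" in
    Darwin) printf '%s\n' "$HOME/Library/Application Support/NVIDIA/ComputeCache" ;;
    Linux)  printf '%s\n' "$HOME/.nv/ComputeCache" ;;
    # Assumption: anything else is a Windows POSIX shell with APPDATA set.
    *)      printf '%s\n' "$APPDATA/NVIDIA/ComputeCache" ;;
  esac
}
```

Usage: du -sh "$(cuda_cache_dir)" reports the current cache size, which can be compared with CUDA_CACHE_MAXSIZE. The quotes matter because the macOS path contains a space.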
Anthony
on 12 Jul 2013