CPU-gpuArray transfer speeds on modern GPUs

For a long time, the conventional wisdom has been that, when computing on the GPU, one should minimize CPU-GPU transfers, because they are a significant bottleneck. I have a comparatively old GPU (GTX 1080 Ti), on which the absolute transfer time for a typical 3D image is significant, though not terrible.
>> A=rand(500,500,500); tic; B=gather(gpuArray(A)); r=B(1)+1; toc
Elapsed time is 0.636720 seconds.
I am wondering whether this is still a prominent problem on newer GPUs, and whether it might diminish to a non-issue as GPU technology progresses. Basically, my question is a request for people with newer GPUs to run the benchmark above and report their results.
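The one-liner above times the upload and the download together. The array holds 500^3 = 1.25e8 doubles, i.e. 1e9 bytes (1 GB), so a 0.64 s round trip corresponds to roughly 2 GB / 0.64 s, or about 3 GB/s averaged over both directions. The minimal sketch below, assuming the Parallel Computing Toolbox and a supported NVIDIA GPU, times each direction separately with gputimeit, which calls the function several times and synchronises the GPU, so one-off first-call costs have less influence on the result. The variable names are illustrative only.

```matlab
% Minimal sketch (assumption: Parallel Computing Toolbox + supported GPU).
% Times host-to-device and device-to-host transfers separately.
A = rand(500,500,500);          % 1.25e8 doubles on the host (~1 GB)
G = gpuArray(A);                % pre-uploaded copy used for the download test

% gputimeit runs each function handle several times and waits for the GPU,
% so the reported time is a typical value rather than a single cold run.
tUpload   = gputimeit(@() gpuArray(A));
tDownload = gputimeit(@() gather(G));

bytes = numel(A) * 8;           % 8 bytes per double-precision element
fprintf('Upload:   %.3f s (%.2f GB/s)\n', tUpload,   bytes / tUpload   / 1e9);
fprintf('Download: %.3f s (%.2f GB/s)\n', tDownload, bytes / tDownload / 1e9);
```

Reporting GB/s rather than seconds makes results from different machines easier to compare, since the array size is fixed.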

7 comments

Raymond Norris
Here are a couple of results from runs on HPC clusters (so factor in other potential system "noise"):
  • V100-32GB
Elapsed time is 0.682715 seconds. % When run a 2nd time 0.549020 seconds
  • P100-16GB
Elapsed time is 1.504278 seconds. % When run a 2nd time 0.580932 seconds
Here are a couple of runs from Cloud Center on AWS:
  • A10G-24GB
Elapsed time is 1.132200 seconds. % When run a 2nd time 0.482009 seconds
  • M60-8GB
Elapsed time is 1.747954 seconds. % When run a 2nd time 1.133121 seconds
Matt J
Matt J on 29 Aug 2023
Thanks @Raymond Norris, although the fact that they are remote clusters seems to make matters worse.
Raymond Norris
Raymond Norris on 29 Aug 2023
They are remote, but for the HPC systems I ran MATLAB on a compute node, and for AWS I ran it on the VM itself.
Joss Knight
Joss Knight on 30 Aug 2023
The data transfer time is commonly not the issue; usually it is the cost of GPU synchronisation. Every time you gather your data (which occurs automatically to display or plot so isn't always explicit) you force the computation of the gathered variable to complete. In some workflows heavily dependent on asynchronous execution this can seriously damage performance.
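The synchronisation effect described in this comment can be illustrated with a small experiment. The minimal sketch below assumes the Parallel Computing Toolbox and a supported NVIDIA GPU; the 4000x4000 size is an arbitrary choice. Because GPU operations in MATLAB are queued asynchronously, timing a multiply with tic/toc alone may capture only the launch, whereas wait(gpuDevice) or a gather forces the work to finish. Gathering a single element moves only 8 bytes, so in that case the time should be dominated by waiting for the computation rather than by the transfer itself.

```matlab
% Minimal sketch (assumption: Parallel Computing Toolbox + supported GPU).
% Shows that gather forces pending GPU work to complete.
dev = gpuDevice;                  % handle to the currently selected GPU
G = rand(4000, 'gpuArray');       % 4000x4000 random matrix created on the GPU
wait(dev);                        % ensure G is ready before timing starts

% 1) Launch only: MATLAB may return before the multiply has finished.
tic;
H = G * G;
tQueued = toc;
wait(dev);                        % drain the queue before the next test

% 2) Launch plus completion, synchronising explicitly.
tic;
H = G * G;
wait(dev);
tFinished = toc;

% 3) gather of one element (8 bytes) still waits for the whole multiply.
tic;
H = G * G;
h = gather(H(1));
tGathered = toc;

fprintf('Queued only:       %.4f s\n', tQueued);
fprintf('With wait(dev):    %.4f s\n', tFinished);
fprintf('With gather(H(1)): %.4f s\n', tGathered);
```

On many systems the first figure is expected to be noticeably smaller than the other two, but the exact numbers depend on the GPU and driver.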
Bruno Luong
Bruno Luong on 30 Aug 2023
Does the memory type or bus matter at all, or is it mainly the sync cost?
Matt J
Matt J on 30 Aug 2023
"The data transfer time is commonly not the issue; usually it is the cost of GPU synchronisation.... In some workflows heavily dependent on asynchronous execution this can seriously damage performance."
It's hard to imagine, though, how my benchmark test depends on asynchronous execution. All it does is transfer data.
Bruno Luong
Bruno Luong on 30 Aug 2023
NVIDIA GeForce RTX 3060 Laptop GPU
  • Elapsed time is 0.464324 seconds.


Answers (0)

Categories

Parallel Computing

Version

R2021a

Asked:

29 Aug 2023

Commented:

30 Aug 2023
