Any suggestions for upgrading desktop GPU for doing CUDA computing?

Hi,
We are currently running Matlab on a regular office desktop workstation with an Nvidia Quadro P4000, which we regularly use for GPU computing. We are now looking to add a second workstation, but since the P4000 is almost 6 years old, I would welcome advice on what model would be a good (Matlab-compatible) GPU to buy for this kind of general-purpose use these days.
Our current GPU has 1792 CUDA cores and 8 GB of RAM, so we would ideally want more than that on the new one. Cost should be under around $2500 if possible.
I have read about Tesla cards, but these seem to be aimed more at server racks, and some cards seem to have additional cooling requirements (our current card does not, as far as I'm aware). So I would like a little advice before purchasing.
(The Nvidia T1000 looks similar, but its number of CUDA cores seems lower than the P4000's. The RTX series, e.g. the A2000, has a lot of cores, but I'm not sure whether it's complicated to install in a desktop machine.)
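In case it helps frame the question, this is roughly how we confirm what Matlab sees on the current machine (a sketch using the Parallel Computing Toolbox; output shown is from memory, not exact):

```matlab
% Query the GPU that MATLAB currently sees (requires Parallel Computing Toolbox).
d = gpuDevice;                                         % select/inspect the default CUDA device
fprintf('Name: %s\n', d.Name);                         % e.g. 'Quadro P4000'
fprintf('Compute capability: %s\n', d.ComputeCapability);
fprintf('Total memory: %.1f GB\n', d.TotalMemory / 1e9);

% On newer MATLAB releases, gpuDeviceTable lists all detected devices:
% gpuDeviceTable
```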
Thank you

9 comments

One option to consider would be the Nvidia Quadro RTX 6000. It has 4608 CUDA cores and 24 GB of RAM, which is more than the P4000, and it supports Matlab's GPU computing capabilities. Another option would be the Nvidia GeForce RTX 3080, which has 8704 CUDA cores and 10 GB of RAM. The GeForce RTX series are high-end gaming graphics cards, but they can also be used for general-purpose GPU computing.
In terms of cost, both the Quadro RTX 6000 and the GeForce RTX 3080 are above $2500, and the GeForce RTX 3080 is currently in high demand and may be difficult to find at a reasonable price.
It's always recommended to check the system requirements and recommended graphics card specifications for Matlab before purchasing to ensure compatibility and optimal performance.
Walter Roberson
Walter Roberson on 2 Feb 2023
Edited: Walter Roberson on 2 Feb 2023
The Quadro P4000 has 1:32 single:double precision performance -- it is not designed for efficient double precision. We could guess that means you are not concerned about double precision, but it is also possible that you simply didn't pay attention to that factor, or did not know that more efficient cards are possible. How much double precision GPU work do you expect to do?
Are you doing Deep Learning work? If I understand correctly, the RTX series have advanced chips (but I do not know if MATLAB is able to take advantage of them.)
Hi Walter, thanks for your reply.
You are correct on both counts - yes, we are concerned with double precision, and yes, we didn't pay attention to how efficient this particular card is at single vs. double precision. We are not doing deep learning - our work typically involves manipulating large matrices (complex doubles, at least around 6 GB of memory just to store the matrix on the GPU) and performing operations such as Fourier transforms, point-wise multiplication, etc.
What exactly does "1:32 single:double precision performance" mean? And what should we be on the lookout for to know whether it's having an effect?
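To make the workload concrete, here is a stripped-down sketch of the kind of thing we run (matrix size and data are illustrative, not our actual code):

```matlab
% Illustrative sketch of our typical GPU workload. Sizes are made up;
% the real matrices are complex doubles occupying several GB.
N = 8192;                                    % 8192x8192 complex double ~ 1 GB
A = gpuArray(complex(rand(N), rand(N)));     % complex double matrix on the GPU
B = gpuArray(complex(rand(N), rand(N)));

F = fft2(A);                 % 2-D Fourier transform on the GPU
C = F .* B;                  % point-wise multiplication
result = gather(ifft2(C));   % bring the result back to host memory
```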
I was looking at an upgrade of the P4000, such as the P6000 for instance, but now I'm hesitant to go for that in light of your single/double comment.
Thanks
@Walter Roberson actually I think I found that you answered this before. It relates to the speed of calculations, if I understand right (as opposed to loss of precision / inaccuracy). In light of this, could you suggest a model that would be a good compromise?
1:32 in this context means that single precision is 32 times faster than double precision -- or, to phrase it the other way, double precision runs at 1/32 of the single-precision rate that is normally quoted.
Your current Quadro P4000 has a double precision rate of 165.6 Gflops.
Some of the other models have hardware assist for double precision.
Quadro GP100 (2016) -- double precision 5168 Gflops (half the single-precision rate)
Quadro GV100 (2018) -- double precision 7400 Gflops (half the single-precision rate)
Quadro RTX 3000, 4000, 5000 (2019) -- over 10000 Gflops (18980 for the 5000)
Tesla P100 and V100 accelerator cards -- 5000 to 7000 Gflops -- but these are add-ons for Tesla-class GPUs
A100 accelerator card -- 9700 Gflops -- add-on card for Axxx workstations
H100 accelerator card -- over 25000 Gflops -- add-on card
Some of the items simply do not have double precision listed.
So, for example, if you got the Quadro RTX 5000, it would be over 100 times faster for double precision than the card you have.
But by the same token, if you got a used GP100 (2016, actually older than your current card), the double precision would already be almost 32 times faster -- which means that if you are willing to go for used devices, you might be able to stay within your budget and be much faster than you are now.
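One rough way to see this ratio on a given card is to time the same operation in both precisions, for example with gputimeit (an illustrative sketch, not a rigorous benchmark):

```matlab
% Rough single- vs double-precision comparison on the current GPU.
N = 4096;
As = gpuArray.rand(N, 'single');   % single-precision matrix
Ad = gpuArray.rand(N, 'double');   % double-precision matrix

ts = gputimeit(@() As * As);       % time a single-precision matrix multiply
td = gputimeit(@() Ad * Ad);       % time a double-precision matrix multiply
fprintf('double/single time ratio: %.1f\n', td / ts);
```

On a card with no hardware double-precision assist, this ratio should come out near the quoted 1:32; on something like a GP100 it should be close to 2.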
Thomas Barrett
Thomas Barrett on 2 Feb 2023
Edited: Thomas Barrett on 2 Feb 2023
That's helpful @Walter Roberson thank you. I wasn't aware of such a specification on these cards.
I'm confused on one thing though - just to make sure I understand how to compare the specifications:
For double precision, the link you sent shows the following:
  • 165.6 Gflops for my current card Quadro P4000 (2017)
  • 359.0 Gflops for the Quadro 5000 (2011)
  • 277.3 Gflops for the Quadro P5000 (2016)
  • 348.5 Gflops for the Quadro RTX 5000 (2018)
  • 296.6 Gflops for the Quadro RTX 5000 (2019)
So these 5000 models are around twice as fast as my current card. But you say "Quadro RTX 5000 would be over 100 times faster". Am I missing something?
Cheers
Ah, I accidentally looked at the Half Precision rates for those Quadro RTX cards!!!
So that would seem to leave the Quadro GV100 as the fastest for double precision by a fair bit. Unfortunately prices appear to be in the range of $US5000. The GP100, not as fast but still far better than the others that are listed, appears to be going for less than $US3000.
@Walter Roberson that's great, thanks. At least I wasn't misunderstanding something.
If I compare the Quadro GP100 (2016) to, for example, a current model like the RTX A4000 (2021), something in the specs confuses me:
  • The more modern RTX A4000 has 6144 CUDA cores and a double precision rate of 599 Gflops.
  • The older GP100 has fewer CUDA cores at 3584, but a much faster double precision rate of 5168 Gflops.
Before we started this discussion, I assumed that more CUDA cores would be better for our calculations, but now I'm not sure if it's an important spec to look at, and I should be focusing on the double precision rate?
Thanks for your patience


Answers (1)

Walter Roberson
Walter Roberson on 3 Feb 2023
Moved: Edric Ellis on 11 Aug 2025
NVIDIA implements double precision in four different ways.
  • Most of their systems implement double precision in software by using the internal single-precision cores to emulate it. It takes 32 single-precision instructions to emulate one double-precision instruction, so the double precision rate is 1/32 of the single precision rate on those devices.
  • Some of their older boards (I think it was some of the earlier Quadro, not sure anymore) had a hardware assist for double precision that permitted double precision to execute at 1/24 of the single precision rate.
  • Some of the boards, such as the GP100, have a hardware assist that permits double precision at 1/2 of the single precision rate.
  • Recently, double precision has been implemented in Tensor Cores; I do not know anything about the implementation of that. Those are for high-end systems, such as the H100 accelerator card for their workstations.
The list price of the RTX 3090 is pretty much the same as your budget; some places are still charging a premium for it, but other places are apparently currently discounting it (because the RTX 4090 is out). Depending on the exact model, the RTX 3090 double precision rate is roughly 500 gigaflops (the Wikipedia list columns show teraflops for that series, which is why it looks slower at first glance). That is still roughly 10 times slower than the GP100. It is roughly 3 times faster than what you have now -- but the GP100 (roughly twice the price, and so outside your original budget) is more than 30 times as fast as your current card. Unless, that is, I am badly misreading the tables.
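For what it's worth, those ratios follow directly from the listed rates (numbers taken from this thread; real-world throughput will of course depend on the workload):

```matlab
% Back-of-envelope speedups from the double-precision rates quoted above (Gflops).
p4000   = 165.6;   % current card (Quadro P4000)
rtx3090 = 500;     % approximate, per the Wikipedia tables
gp100   = 5168;    % Quadro GP100

fprintf('RTX 3090 vs P4000:  %.1fx\n', rtx3090 / p4000);   % ~3x
fprintf('GP100 vs P4000:     %.1fx\n', gp100 / p4000);     % ~31x
fprintf('GP100 vs RTX 3090:  %.1fx\n', gp100 / rtx3090);   % ~10x
```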
