As you've seen, gpuDevice() gives you information about your GPU. This is what I get for mine:
CUDADevice with properties:
Name: 'NVIDIA GeForce RTX 3070'
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
AvailableMemory: 7.2955e+09
KernelExecutionTimeout: 1
The important parameter here is AvailableMemory. I have 7.2955e+09 bytes (you have rather more!). What does this mean in terms of matrix size?
A double precision number is 8 bytes, so in theory I can fit 7.2955e+09/8 = 911,937,500 doubles on the card. This is my hard limit; nothing I can do about it. There simply isn't the capacity on my GPU to hold more than that, so consider it an upper bound. In terms of a square matrix it's roughly 30,000 x 30,000, since sqrt(911,937,500) ≈ 30,198.
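That arithmetic can be done directly in MATLAB from the device object (a sketch; the variable names are just for illustration):

```
d = gpuDevice();                       % query the current GPU
maxDoubles = d.AvailableMemory / 8;    % 8 bytes per double-precision number
maxSquare  = floor(sqrt(maxDoubles));  % largest N such that an N-by-N matrix fits
fprintf('Up to %d doubles, i.e. about a %d-by-%d square matrix\n', ...
    floor(maxDoubles), maxSquare, maxSquare);
```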
Let's transfer a matrix that big to my GPU and see if I'm successful.
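The transfer looks something like this (a sketch; the exact size I used is an assumption, chosen as roughly the largest square matrix that fits):

```
N = 30198;            % assumed size: close to the upper bound worked out above
A = gpuArray(rand(N)); % build the matrix on the host and transfer it to the GPU
gpuDevice()            % check the device again to see how much memory is left
```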
CUDADevice with properties:
Name: 'NVIDIA GeForce RTX 3070'
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
AvailableMemory: 110592
KernelExecutionTimeout: 1
It worked! And I had 110592 bytes left over.
However, the useful limit will be rather lower than this. If I stuff my card full of data, there's no room left for any GPU algorithm to do its computation. Even adding 1 to all the elements of a GPU array this big is too much, so clearly matrix addition isn't done completely in place.
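The failing operation was along these lines (a sketch, reusing the full-size gpuArray A from above):

```
B = A + 1;  % elementwise add needs scratch space for the result,
            % which a completely full card cannot provide
```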
Out of memory on device. To view more detail about available memory on the GPU,
use 'gpuDevice()'. If the problem persists, reset the GPU by calling
'gpuDevice(1)'.
I can at least do something, though. The sum command works, for example, even though the answer isn't very interesting in this case.
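A sketch of that, again using the full-size gpuArray A: a reduction like sum collapses the matrix down to a scalar, so it needs far less working memory than an elementwise operation.

```
s = sum(sum(A));  % sum over columns, then over the resulting row vector;
                  % the intermediate is only 1-by-N, so it fits easily
```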
How much memory you need to do computations depends on the algorithms involved but hopefully you can use this thinking as a starting point for what you can expect to squeeze onto your GPU.