As you've seen, gpuDevice() gives you information about your GPU. This is what I get for mine:
CUDADevice with properties:
Name: 'NVIDIA GeForce RTX 3070'
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
AvailableMemory: 7.2955e+09
KernelExecutionTimeout: 1
The important parameter here is AvailableMemory. I have 7.2955e+09 bytes (you have rather more!). What does this mean in terms of matrix size?
A double precision number is 8 bytes, so in theory I can fit 7.2955e+09/8 = 911,937,500 doubles on the card. This is my hard limit; nothing I can do about it. There simply isn't the capacity on my GPU to hold more than that, so consider it an upper bound. In terms of a square matrix it's roughly 30,000 x 30,000, since sqrt(911,937,500) ≈ 30,198.
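That arithmetic can be done directly in MATLAB from the device object (a sketch; the variable names are just for illustration):

```
d = gpuDevice();                       % query the current GPU
maxDoubles = d.AvailableMemory / 8;    % 8 bytes per double-precision number
maxSquare  = floor(sqrt(maxDoubles));  % largest N such that an N-by-N matrix fits
fprintf('Up to %d doubles, i.e. about a %d-by-%d square matrix\n', ...
    floor(maxDoubles), maxSquare, maxSquare);
```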
Let's transfer a matrix that big to my GPU and see if I'm successful.
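The transfer looks something like this (a sketch; the exact size I used is an assumption, chosen as roughly the largest square matrix that fits):

```
N = 30198;            % assumed size: close to the upper bound worked out above
A = gpuArray(rand(N)); % build the matrix on the host and transfer it to the GPU
gpuDevice()            % check the device again to see how much memory is left
```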
CUDADevice with properties:
Name: 'NVIDIA GeForce RTX 3070'
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
AvailableMemory: 110592
KernelExecutionTimeout: 1
It worked! And I had 110592 bytes left over.
However, the useful limit will be rather lower than this. If I stuff my card full of data, there's no room left for any GPU algorithm to do its computation. Even adding 1 to all the elements of a GPU array this big is too much, so clearly matrix addition isn't done completely in place.
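The failing operation was along these lines (a sketch, reusing the full-size gpuArray A from above):

```
B = A + 1;  % elementwise add needs scratch space for the result,
            % which a completely full card cannot provide
```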
Out of memory on device. To view more detail about available memory on the GPU,
use 'gpuDevice()'. If the problem persists, reset the GPU by calling
'gpuDevice(1)'.
I can at least do something, though. The sum command works, for example, even though the answer isn't very interesting in this case.
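A sketch of that, again using the full-size gpuArray A: a reduction like sum collapses the matrix down to a scalar, so it needs far less working memory than an elementwise operation.

```
s = sum(sum(A));  % sum over columns, then over the resulting row vector;
                  % the intermediate is only 1-by-N, so it fits easily
```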
How much memory you need to do computations depends on the algorithms involved but hopefully you can use this thinking as a starting point for what you can expect to squeeze onto your GPU.