Max size for efficient vectorization?

Wouter
Wouter on 20 Mar 2019
Commented: Walter Roberson on 20 Mar 2019
As far as I understand, vectorization can greatly speed up code because of its parallelization properties.
If you have, say, n separate samples to process, a classical for loop implies that the computational time grows linearly: t = c*n for some constant c.
By vectorizing, t remains virtually constant as a function of n, as all n samples fit together in the shift register of the CPU.
However, this only works as long as n <= N, for some N that determines the maximum number of floats that can enter the register simultaneously. For n > N, computation will be sequential again, and thus the time will again grow linearly.
My question is: is there a simple way to find this N?
In addition: am I correct that the advantage of using gpuArray is that it can have a larger N, while the constant c is also larger? Are there similar commands to retrieve these parameters of the GPU?
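
One empirical way to look for such a limit is to time an operation over a range of sizes and watch the per-element cost; a knee in the curve, if any, hints at a cache boundary. A minimal sketch, assuming timeit is available (R2013b or later) and using sin as an arbitrary workload:

% Measure per-element time as a function of n; sizes are arbitrary.
sizes = round(logspace(2, 7, 12));        % n from 1e2 to 1e7
tPerElement = zeros(size(sizes));
for k = 1:numel(sizes)
    x = rand(sizes(k), 1);
    tPerElement(k) = timeit(@() sin(x)) / sizes(k);
end
loglog(sizes, tPerElement, '-o');
xlabel('n (elements)');
ylabel('time per element (s)');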

Accepted Answer

Walter Roberson
Walter Roberson on 20 Mar 2019
Edited: Walter Roberson on 20 Mar 2019
? The x64 architecture has shift instructions, but no shift register.
What the x64 architecture does have is cache lines -- groups of bytes that are loaded together even if not explicitly needed by an instruction, on the theory that there is a good chance the nearby bytes will be needed as well. On the most common Intel x64 implementations, those cache lines are 64 bytes -- just eight doubles.
Anything beyond that is due to primary or secondary cache, which tends to be more model-specific.
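
A rough way to see cache lines from inside MATLAB is to sum the same number of doubles either contiguously or with a stride of eight (one double per 64-byte line). A sketch; note that MATLAB's indexing materializes a temporary copy, so this measures strided memory reads rather than the cache in isolation, and absolute numbers vary per machine:

% Sum N doubles, contiguous vs. one element per 64-byte cache line.
N = 1e6;
x = rand(8*N, 1);
tContig = timeit(@() sum(x(1:N)));        % touches ~N/8 cache lines
tStride = timeit(@() sum(x(1:8:end)));    % touches ~N cache lines
fprintf('contiguous: %.4g s, strided: %.4g s\n', tContig, tStride);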
  2 comments
Wouter
Wouter on 20 Mar 2019
Thanks for your answer. Yes, I must have been mistaken about the term 'shift register'.
So, long story short: vectors retain efficiency as long as they fit in cache memory as a whole, but within MATLAB there is no simple way to find out this size except by trial and error?
Walter Roberson
Walter Roberson on 20 Mar 2019
For large enough matrices, for many of the common operations, MATLAB calls into LAPACK, BLAS, or MKL (Intel Math Kernel Library), which are high-efficiency, multiprocessor-aware routines that know about caches and automatically take cache efficiency into account. Working in blocks (for cache efficiency) can end up using more calculations than the theoretical minimum, but the practical reduction in time can be large. Those routines can do a better job of block processing than you could hope to do yourself in MATLAB, so if you have larger arrays it is typically better to let MATLAB take care of the details rather than trying to split the work up yourself. (However, if you are using Parallel Computing Toolbox, there is still room for you to use your knowledge of the problem to balance loads between workers more efficiently.)
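
To see the effect, one can race the built-in multiply against a handwritten loop. A sketch, where naiveMultiply is a hypothetical helper written here only for the comparison (run as a script; local functions in scripts require R2016b or later):

% Built-in (BLAS/MKL-backed) multiply vs. a naive triple loop.
n = 256;                                  % arbitrary illustration size
A = rand(n); B = rand(n);
tBuiltin = timeit(@() A * B);
tNaive = timeit(@() naiveMultiply(A, B));
fprintf('built-in: %.4g s, naive loop: %.4g s\n', tBuiltin, tNaive);

function C = naiveMultiply(A, B)
% Textbook O(n^3) multiply with no blocking or threading.
n = size(A, 1);
C = zeros(n);
for i = 1:n
    for j = 1:n
        s = 0;
        for k = 1:n
            s = s + A(i, k) * B(k, j);
        end
        C(i, j) = s;
    end
end
end
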
CUDA is quite a different architecture, where what matters is not so much cache but rather that the same instruction is applied to all locations. Cache does exist, but conditional operations cause the unselected processors to be suspended, which can hurt processing performance more.
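
As for retrieving the GPU's parameters: gpuDevice returns an object with properties such as Name, TotalMemory, and MultiprocessorCount. A sketch of timing the same element-wise operation on CPU and GPU (requires Parallel Computing Toolbox and a supported CUDA device; sizes are arbitrary):

% Time sin() on CPU vs. GPU and query device properties.
n = 1e7;
x = rand(n, 1);
g = gpuArray(x);
tCpu = timeit(@() sin(x));
tGpu = gputimeit(@() sin(g));             % gputimeit synchronizes the GPU
d = gpuDevice;                            % device properties, e.g. d.Name
fprintf('CPU: %.4g s, GPU: %.4g s on %s\n', tCpu, tGpu, d.Name);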


More Answers (0)
