Fast implementation of max-plus matrix multiplication
Afficher commentaires plus anciens
I am trying to implement a fast max-plus algebra multiplication of square matrices in MATLAB. The max-plus multiplication ⊗ between two matrices
returns the matrix
such that
.
returns the matrix
such that A naive implementation of the max-plus multiplication in MATLAB is:
function C = mp_prod(A,B)
n = size(A,1);
C = -inf*ones(n);
for i = 1:n
for j = 1:n
for k = 1:n
C(i,j) = max(C(i,j), A(i,k) + B(k,j));
end
end
end
However, this is not particularly fast. The fastest implementation I could come up with is:
function C = fast_mp_prod(A,B)
n = size(A,1);
C = -inf*ones(n);
A = transpose(A);
for i = 1:n
C(i,:) = max(A(:,i) + B);
end
which takes about 0.18 seconds to multiply two 100×100 matrices 100 times on my computer. Since performing the same test using a "standard" multiplication takes about 0.004 seconds (45 times faster), I was wondering if there were a way to speed up the code and obtain more comparable timings using MATLAB.
If this were not possible, would it be worth to try to create a package similar to LAPACK for max-plus algebra to get faster performances? Or is this the maximum achievable speed for reasons like "the CPU is inherently slower to calculate the maximum than the product of two numbers"?
4 commentaires
Of course finding the maximum of a vector with 100 elements is slower than the multiplication of two numbers: You have to access 100 elements in the memory and perform 99 comparisons.
-inf(n) might be faster than -inf * ones(n), but this is not the bottleneck of the code. In the fast version C=zeros(n) would be sufficient.
Davide Zorzenon
le 13 Avr 2021
Modifié(e) : Davide Zorzenon
le 13 Avr 2021
Jan
le 13 Avr 2021
The naive max() implementation contains a branch. This slows down the processing in general, because it impedes the pipelining in the CPU, which decides for one of the branches using heuristics. In case of max() about half of the predictions are false.
The matrix multiplication is performed in highly optimized library. Even the naive implementation in Matlab profits from Matlab's JIT acceleration, which does not handle max() with the same efficiency.
Do you have a C compiler installed?
Davide Zorzenon
le 13 Avr 2021
Réponse acceptée
Plus de réponses (1)
Bruno Luong
le 13 Avr 2021
Modifié(e) : Bruno Luong
le 13 Avr 2021
function C = mp_prod(A,B)
m=size(A,1);
n=size(B,2);
AA=reshape(A,m,1,[]);
BB=reshape(B.',1,n,[]);
C=max(AA+BB,[],3);
tic/toc result
>> A=rand(100);
>> B=rand(100);
>> tic; C = mp_prod(A,B); toc
Elapsed time is 0.005206 seconds.
8 commentaires
Davide Zorzenon
le 13 Avr 2021
Bruno Luong
le 13 Avr 2021
Modifié(e) : Bruno Luong
le 13 Avr 2021
On my computer the later code (working major column) is slightly faster (about 10%).
>> A=rand(100);
>> B=rand(100);
>> tic; for i = 1:1000; fast_mp_prod(A,B); end; toc
Elapsed time is 3.127958 seconds.
>> tic; for i = 1:1000; bl_mp_prod(A,B); end; toc
Elapsed time is 2.723747 seconds.
>>
major column code
function C = bl_mp_prod(A,B)
m=size(A,1);
n=size(B,2);
A=reshape(A.',[],m,1);
B=reshape(B,[],1,n);
C=A+B;
C=max(C,[],1);
C=reshape(C,[m n]);
Davide Zorzenon
le 13 Avr 2021
Bruno Luong
le 14 Avr 2021
Modifié(e) : Bruno Luong
le 14 Avr 2021
Mine is R2021a, the hardware is pretty old 7 year old PC Windows 8.
Davide Zorzenon
le 14 Avr 2021
@Davide Zorzenon: Which machine and Matlab version are you using?
The different timings can mean, that Davide's setup is more efficient for the loop, or Bruno's setup is more efficient for the vectorized solution.
Davide Zorzenon
le 14 Avr 2021
Catégories
En savoir plus sur Parallel Computing Toolbox dans Centre d'aide et File Exchange
Produits
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!
