My mex file is slower than my original matlab equivalent
Mohammad Shojaei Arani
18 Jul 2022
Commented: Mohammad Shojaei Arani on 19 Jul 2022
Hello friends,
I need to calculate some quantities of a linear-algebra type; they are merely element-wise products and quotients of vectors. The following is an example:
EZ=[(1.0./Ds0.^2.*(Ds0.*(Dm0.*4.0+Dm0.*Ds1.^2.*2.0-Ds0.*(Ds1.*2.0+Dm0.*Ds2.*2.0+Dm1.*Ds1.*2.0-Dm2.*Ds0)+Dm0.*Dm1.*2.0)-Dm0.^2.*Ds1.*2.0))./4.0;
(1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;
(Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0)];
where Ds0, Ds1, Ds2, Dm0, Dm1, Dm2 are 1-by-n vectors. When I do the calculations using matlabFunction (attached), it is fast. However, I am not satisfied, since I really need to do such calculations thousands of times (if not millions of times). To overcome this, I decided to give MEX a try. Unfortunately, the equivalent MEX file (which I made with MATLAB Coder) is 2-3 times slower (I could not upload it here).
Is there any hope of creating a MEX file from this function that is much faster? I hope so!
Thanks for your help in advance,
Babak
7 comments
Bruno Luong
18 Jul 2022
Edited: Bruno Luong on 18 Jul 2022
So you don't think simplifying the expression matters? I think the contrary. Every time a GPU array is involved in the expression, there is a whole data transfer from CPU to GPU, and you might have 200 such terms in your expression. I won't even try to count them or to understand your code, since it is such an unreadable and messy expression.
If you take a raw, unsimplified expression like yours, throw it at the computer and ask why it doesn't get faster, you need to think at a much lower level about how it works.
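The idea behind Bruno's point can be sketched as follows: move each full-size input to the GPU once, before any arithmetic, so that no CPU array has to be copied inside the expression, and copy the result back once at the end. This sketch assumes MATLAB with Parallel Computing Toolbox (`gpuArray`, `gather`, `gpuDeviceCount`) and silently falls back to ordinary CPU arrays otherwise; whether the GPU path is actually faster for vectors of this size is not measured here.

```matlab
% Sketch (assumption): transfer each input to the GPU once, evaluate the
% unchanged expression there, and copy the result back once.
n = 1e4;
Ds0 = rand(1, n); Ds1 = rand(1, n); Ds2 = rand(1, n);
Dm0 = rand(1, n); Dm1 = rand(1, n); Dm2 = rand(1, n);

% Use the GPU only if Parallel Computing Toolbox and a device are available.
try
    useGpu = gpuDeviceCount > 0;
catch
    useGpu = false;
end

if useGpu
    % One host-to-device copy per input, done before any arithmetic.
    Ds0 = gpuArray(Ds0); Ds1 = gpuArray(Ds1); Ds2 = gpuArray(Ds2);
    Dm0 = gpuArray(Dm0); Dm1 = gpuArray(Dm1); Dm2 = gpuArray(Dm2);
end

% The expression from the question; all full-size operands now live on the
% same device, and the numeric constants are scalars.
EZ = [(1.0./Ds0.^2.*(Ds0.*(Dm0.*4.0+Dm0.*Ds1.^2.*2.0-Ds0.*(Ds1.*2.0+Dm0.*Ds2.*2.0+Dm1.*Ds1.*2.0-Dm2.*Ds0)+Dm0.*Dm1.*2.0)-Dm0.^2.*Ds1.*2.0))./4.0;
      (1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;
      (Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0)];

if useGpu
    EZ = gather(EZ);   % one device-to-host copy of the 3-by-n result
end
```

The `try`/`catch` keeps the same script usable on machines without the toolbox, so the CPU and GPU timings can be compared on identical code.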
Accepted Answer
Jan
18 Jul 2022
Just some experiments. You can gain some clarity, but hardly improve the speed, with these simplifications. I've also tried a loop version.
n = 1e4;
Ds0 = rand(1, n);
Ds1 = rand(1, n);
Ds2 = rand(1, n);
Dm0 = rand(1, n);
Dm1 = rand(1, n);
Dm2 = rand(1, n);
% Version 1: the generated expression, unchanged
tic;
for rep = 1:1e4
EZ = [(1.0./Ds0.^2.*(Ds0.*(Dm0.*4.0+Dm0.*Ds1.^2.*2.0-Ds0.*(Ds1.*2.0+Dm0.*Ds2.*2.0+Dm1.*Ds1.*2.0-Dm2.*Ds0)+Dm0.*Dm1.*2.0)-Dm0.^2.*Ds1.*2.0))./4.0;
(1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;
(Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0)];
end
toc
% Version 2: constants factored out, squares computed once
tic;
for rep = 1:1e4
Ds0_2 = Ds0 .* Ds0;
Dm0_2 = Dm0 .* Dm0;
EZ2 = [(1 ./ Ds0_2 .* (Ds0 .* (Dm0 * 2 + Dm0 .* Ds1 .^ 2 - ...
Ds0 .* (Ds1 + Dm0 .* Ds2 + Dm1 .* Ds1 - Dm2 .* Ds0 ./ 2) + ...
Dm0 .* Dm1) - Dm0_2 .* Ds1)) / 2; ...
1 ./ Ds0_2 .* (Ds0 .* (Ds0 .* (Dm1 + Ds1 .^ 2 / 4 - Ds0 .* Ds2 / 2 + 1) - ...
Dm0 .* Ds1 * 2) + Dm0_2);
(Dm0 * 3 - Ds0 .* Ds1 * 1.5) ./ Ds0];
end
toc
% Version 3: scalar loop over the elements
tic;
for rep = 1:1e4
EZ3 = zeros(3, n);
for k = 1:n
a = Ds0(k);
b = Dm0(k);
c = Ds1(k);
d = Dm1(k);
e = Ds2(k);
EZ3(1, k) = (1 / a^2 * (a * (b * 2 + b * c ^ 2 - ...
a * (c + b * e + d * c - Dm2(k) * a / 2) + b * d) - b^2 * c)) / 2;
EZ3(2, k) = (a * (a * (d + c ^ 2 / 4 - a * e / 2 + 1) - b * c * 2) + b^2) / a^2;
EZ3(3, k) = b * 3 / a - c * 1.5;
end
end
toc
% Largest absolute differences to the original result
max(abs(EZ(:) - EZ2(:)))
max(abs(EZ(:) - EZ3(:)))
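The last two lines report the largest absolute difference. Because the entries contain terms such as `Dm0 ./ Ds0` and `1 ./ Ds0.^2`, which become large when `Ds0` is close to 0, an absolute difference can look big even when only the last bits differ. A minimal sketch of a relative (normwise) comparison, using row 3 of `EZ` and the same row written as in the loop version `EZ3`; the helper `relDiff` is a hypothetical name introduced only for this illustration:

```matlab
% Hypothetical helper: largest absolute difference, scaled by the largest
% magnitude in the reference result.
relDiff = @(ref, alt) max(abs(ref(:) - alt(:))) / max(abs(ref(:)));

n = 1e4;
Ds0 = rand(1, n); Ds1 = rand(1, n); Dm0 = rand(1, n);

% Row 3 as generated (EZ) and as written in the loop version (EZ3).
row3Ref = (Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0);
row3Alt = Dm0 * 3 ./ Ds0 - Ds1 * 1.5;

absErr = max(abs(row3Ref - row3Alt));
relErr = relDiff(row3Ref, row3Alt);
fprintf('max abs diff: %g, relative diff: %g\n', absErr, relErr);
```

A relative difference of a few multiples of `eps` indicates that the rewritten form agrees with the original up to rounding.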
5 comments
Jan
18 Jul 2022
@Mohammad Shojaei Arani: The rules are straightforward:
- Avoid repeated work. If a calculation appears repeatedly, compute it once and store it in a temporary variable.
- Reduce calls to expensive functions: exp, power, trigonometric functions, factorial, ...
- Combine operations, but keep in mind that the result can be influenced by rounding effects. E.g. 1/a*b takes more time than b/a, but the result can be slightly different.
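These three rules can be illustrated on row 2 of `EZ`. Expanding the generated formula by hand gives Dm1 + Ds1^2/4 - Ds0*Ds2/2 + 1 - 2*Dm0*Ds1/Ds0 + Dm0^2/Ds0^2, in which the ratio Dm0/Ds0 appears twice. The sketch below computes that ratio once (the name `r` is illustrative), uses products instead of `.^2`, and checks how often `1 ./ Ds0 .* Dm0` and `Dm0 ./ Ds0` differ after rounding. Whether the rewrite is measurably faster depends on MATLAB's own optimizations, so treat it as a pattern rather than a benchmark.

```matlab
n = 1e4;
Ds0 = rand(1, n); Ds1 = rand(1, n); Ds2 = rand(1, n);
Dm0 = rand(1, n); Dm1 = rand(1, n);

% Row 2 exactly as generated, for reference.
row2Ref = (1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;

% Rule 1: compute the repeated ratio once.
r = Dm0 ./ Ds0;
% Rule 2: products instead of .^2. Rule 3: constants combined
% (4*Dm1/4 -> Dm1, 4/4 -> 1, 8/4 -> 2).
row2 = Dm1 + Ds1 .* Ds1 / 4 - Ds0 .* Ds2 / 2 + 1 - 2 * r .* Ds1 + r .* r;

% Rule 3 caveat: 1/a*b and b/a are rounded differently.
nDiffer = nnz(1 ./ Ds0 .* Dm0 ~= Dm0 ./ Ds0);
relErr = max(abs(row2 - row2Ref)) / max(abs(row2Ref));
fprintf('%d of %d entries differ between 1./Ds0.*Dm0 and Dm0./Ds0\n', nDiffer, n);
fprintf('relative difference of the rewritten row: %g\n', relErr);
```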
Clear code reduces the time needed for debugging:
- Spaces around operators.
- Compact variable names.
- Be careful with parentheses that are not required.
- Avoid element-wise operators if the calculation does not need them. 3.0.*2.0 is harder to read than 3 * 2.
Bruno's point is important: the results of numerically unstable functions can be influenced massively by simplifications. A basic example:
1e17 + 1 - 1e17
1e17 - 1e17 + 1
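Why the two lines differ: MATLAB evaluates `+` and `-` from left to right, and near 1e17 neighbouring doubles are 16 apart, so the 1 disappears when it is added to 1e17 first. A minimal check:

```matlab
spacing = eps(1e17);   % distance from 1e17 to the next larger double: 16
a = 1e17 + 1 - 1e17;   % (1e17 + 1) rounds back to 1e17, so a is 0
b = 1e17 - 1e17 + 1;   % (1e17 - 1e17) is exactly 0, so b is 1
fprintf('eps(1e17) = %g, a = %g, b = %g\n', spacing, a, b);
```

This prints `eps(1e17) = 16, a = 0, b = 1`: the same three numbers, reordered, give different results.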