Accuracy problems using mldivide on GPU

9 vues (au cours des 30 derniers jours)
Damian
Damian le 4 Jan 2023
Commenté : Damian le 9 Jan 2023
Hello all,
I want to solve a system of complex linear equations A*x=B on the GPU. However, when I compare the solution with the results of the same computation on the CPU, there seems to be a relatively large error.
Here is a minimal working example:
rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
for k = length(A):-1:1
x_CPU(:,:,k) = A(:,:,k) \ B(:,:,k);
end
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagefun(@mldivide, g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all'); % relative error
disp(max_error);
On my machine, I get
max_error = 1.0197e+03
which means that the results are unusable.
In a last-ditch attempt I tried using complex() both in gpuArray() and in pagefun(), as described here, but it changes nothing.
I am running Matlab R2021a with the latest updates. GPU driver is also the latest version (527.27).
ans =
CUDADevice with properties:
Name: 'Quadro P2200'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 5.3685e+09
AvailableMemory: 4.3222e+09
MultiprocessorCount: 10
ClockRateKHz: 1493000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
I would be very grateful for ideas and suggestions on how to solve the problem. Thanks in advance!
  1 commentaire
Bruno Luong
Bruno Luong le 9 Jan 2023
Again the issue only occurs with A complex and not square.

Connectez-vous pour commenter.

Réponse acceptée

Matt J
Matt J le 9 Jan 2023
Confirmed. But it doesn't seem to be mldivide that's the problem per se, but rather pagefun:
load data
for k = length(A):-1:1
x_CPU(:,:,k) = A(:,:,k) \ B(:,:,k);
end
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
%g_x = pagefun(@mldivide, g_A, g_B);
for k = length(A):-1:1
g_x(:,:,k) = g_A(:,:,k) \ g_B(:,:,k);
end
x_GPU = gather(g_x);
nrm=@(z) vecnorm(z(:,:),inf,1);
%% comparison
max_error = max(nrm(x_GPU - x_CPU) ./ nrm(x_CPU)) % relative error
max_error =
7.4269e-20
  8 commentaires
Damian
Damian le 9 Jan 2023
Alright, thanks.

Connectez-vous pour commenter.

Plus de réponses (2)

Joss Knight
Joss Knight le 7 Jan 2023
Modifié(e) : Joss Knight le 7 Jan 2023
The answer is not less accurate, it is just as inaccurate. What you have are 60001 massively overdetermined systems generated by completely random data. The least squares solution to this is going to be all over the place, in other words, there will be no good answer. If you look at the relative residuals you'll find there's little difference in the (lack of) success of either CPU or GPU.
%%
rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
x_CPU = pagemldivide(A,B);
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagemldivide(g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all'); % relative error
disp(max_error);
res = abs(sqrt(sum((pagemtimes(A, x_CPU)-B).^2,1)));
g_res = abs(sqrt(sum((pagemtimes(g_A, g_x)-g_B).^2,1)));
rhsNorm = abs(sqrt(sum(B.^2,1)));
relres = reshape(res./rhsNorm, 2, []);
g_relres = reshape(g_res./rhsNorm, 2, []);
average_cpu_relative_residual = mean(relres, 'all')
average_gpu_relative_residual = mean(g_relres, 'all')
std_cpu = std(relres(:))
std_gpu = std(g_relres(:))
To which I got the answers
1.0197e+03
average_cpu_relative_residual =
0.9979
average_gpu_relative_residual =
1.0543
std_cpu =
0.1935
std_gpu =
0.2664
It seems the GPU solution is marginally worse, but given that the input was nonsense anyway I wouldn't read too much into it.
It's possible that you intended for the system matrix A to be square and only B to be 60x2 (i.e. 2 right-hand-sides), in which case with random data you'd get something much better behaved. I got:
1.3056e-11
average_cpu_relative_residual =
1.8911e-14
average_gpu_relative_residual =
1.6029e-14
std_cpu =
3.6059e-14
std_gpu =
3.1088e-14
  3 commentaires
Damian
Damian le 9 Jan 2023
You are right about the quality of the data. Since I know that, I did not expect a perfect fit, so in my case 7.5% residual over the entire dataset is good enough.
Thank you for your suggestions and your help!

Connectez-vous pour commenter.


Walter Roberson
Walter Roberson le 9 Jan 2023
Modifié(e) : Matt J le 9 Jan 2023
It is a known bug that should be fixed soon that affects page left and right divisions for complex numbers
  2 commentaires
Bruno Luong
Bruno Luong le 9 Jan 2023
I agree with Walter it looks pretty much similar cause (wrong detection of matrix type and decomposition) for GPU and CPU.

Connectez-vous pour commenter.

Catégories

En savoir plus sur GPU Computing in MATLAB dans Help Center et File Exchange

Produits


Version

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by