Accuracy problems using mldivide on GPU

Question

2 votes

Hello all,

I want to solve a system of complex linear equations A*x=B on the GPU. However, when I compare the solution with the results of the same computation on the CPU, there seems to be a relatively large error.

Here is a minimal working example:

rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
for k = length(A):-1:1
    x_CPU(:,:,k) = A(:,:,k) \ B(:,:,k);
end
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagefun(@mldivide, g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all');  % relative error
disp(max_error);

On my machine, I get

max_error = 1.0197e+03

which means that the results are unusable.

In a last-ditch attempt I tried using complex() both in gpuArray() and in pagefun(), as described here, but it changes nothing.

I am running Matlab R2021a with the latest updates. GPU driver is also the latest version (527.27).

ans = 
  CUDADevice with properties:
                      Name: 'Quadro P2200'
                     Index: 1
         ComputeCapability: '6.1'
            SupportsDouble: 1
             DriverVersion: 11.6000
            ToolkitVersion: 11
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 5.3685e+09
           AvailableMemory: 4.3222e+09
       MultiprocessorCount: 10
              ClockRateKHz: 1493000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
           DeviceAvailable: 1
            DeviceSelected: 1

I would be very grateful for ideas and suggestions on how to solve the problem. Thanks in advance!

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Bruno Luong le 9 Jan 2023

Again the issue only occurs with A complex and not square.

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Follow Question

Answer 1

Matt J le 9 Jan 2023

Ouvrir dans MATLAB Online

1 vote

Confirmed. But it doesn't seem to be mldivide that's the problem per se, but rather pagefun:

load data
for k = length(A):-1:1
    x_CPU(:,:,k) = A(:,:,k) \ B(:,:,k);
end
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
%g_x = pagefun(@mldivide, g_A, g_B);
for k = length(A):-1:1
    g_x(:,:,k) = g_A(:,:,k) \ g_B(:,:,k);
end
x_GPU = gather(g_x);
nrm=@(z) vecnorm(z(:,:),inf,1);
%% comparison
max_error = max(nrm(x_GPU - x_CPU) ./ nrm(x_CPU))  % relative error

max_error =

7.4269e-20

8 commentaires
Afficher 6 commentaires plus anciens Masquer 6 commentaires plus anciens

Bruno Luong le 9 Jan 2023

Modifié(e) : Bruno Luong le 9 Jan 2023

Ouvrir dans MATLAB Online

Some alternative woakaround ways on GPU, but CPU is still fatest with this test on my PC

rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
% for k = length(A):-1:1
%     x_CPU(:,:,k) = A(:,:,k) \ B(:,:,k);
% end
tic
x_CPU = pagemldivide(A, B);
toc 
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
tic
g_x = pagefun(@mldivide, g_A, g_B);
x_GPU = gather(g_x); % BUGGY ISSUE
toc
% Least square formulation
tic
g_sx = pagefun(@mldivide, pagefun(@mtimes, pagefun(@ctranspose, g_A), g_A), pagefun(@mtimes, pagefun(@ctranspose, g_A), g_B));
x_recGPU = gather(g_sx);
toc
% Real matrix formulation
tic
rA = real(A);
iA = imag(A);
g_AA = gpuArray([rA -iA; iA rA]);
g_BB = gpuArray([real(B); imag(B)]);
g_xx = pagefun(@mldivide, g_AA, g_BB);
g_xx = g_xx(1:2,:,:) + 1i*g_xx(3:4,:,:);
xx_GPU = gather(g_xx);
toc
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all');  % relative error
disp(max_error);
max_recerror = max(abs((x_recGPU - x_CPU) ./ x_CPU), [], 'all');  % relative error
disp(max_recerror);
max_xxerror = max(abs((xx_GPU - x_CPU) ./ x_CPU), [], 'all');  % relative error
disp(max_xxerror);
plot(squeeze(pagenorm(x_CPU-x_GPU,'fro')./pagenorm(x_CPU)))

Matt J le 9 Jan 2023

Modifié(e) : Matt J le 9 Jan 2023

In pageloop in this example, unlike in your first one, you pull A and B from the GPU, use mldivide pagewise (on the CPU) and write it back to the GPU. Is it possible that you mixed up gather() and gpuArray()?

No, that was deliberate. That is what you would have to do to interrupt and then resume your GPU workflow.

Damian le 9 Jan 2023

Alright, thanks.

Connectez-vous pour commenter.

Answer 2

Joss Knight le 7 Jan 2023

Modifié(e) : Joss Knight le 7 Jan 2023

Ouvrir dans MATLAB Online

1 vote

The answer is not less accurate, it is just as inaccurate. What you have are 60001 massively overdetermined systems generated by completely random data. The least squares solution to this is going to be all over the place, in other words, there will be no good answer. If you look at the relative residuals you'll find there's little difference in the (lack of) success of either CPU or GPU.

%%
rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
x_CPU = pagemldivide(A,B);
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagemldivide(g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all');  % relative error
disp(max_error);
res = abs(sqrt(sum((pagemtimes(A, x_CPU)-B).^2,1)));
g_res = abs(sqrt(sum((pagemtimes(g_A, g_x)-g_B).^2,1)));
rhsNorm = abs(sqrt(sum(B.^2,1)));
relres = reshape(res./rhsNorm, 2, []);
g_relres = reshape(g_res./rhsNorm, 2, []);
average_cpu_relative_residual = mean(relres, 'all')
average_gpu_relative_residual = mean(g_relres, 'all')
std_cpu = std(relres(:))
std_gpu = std(g_relres(:))

To which I got the answers

   1.0197e+03
   
average_cpu_relative_residual =
    0.9979
average_gpu_relative_residual =
    1.0543
std_cpu =
    0.1935
std_gpu =
    0.2664
    

It seems the GPU solution is marginally worse, but given that the input was nonsense anyway I wouldn't read too much into it.

It's possible that you intended for the system matrix A to be square and only B to be 60x2 (i.e. 2 right-hand-sides), in which case with random data you'd get something much better behaved. I got:

   1.3056e-11
   
average_cpu_relative_residual =
   1.8911e-14
average_gpu_relative_residual =
   1.6029e-14
std_cpu =
   3.6059e-14
std_gpu =
   3.1088e-14

3 commentaires
Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

Joss Knight le 9 Jan 2023

Well, I'll admit that numerical analysis is not my speciality (at doctorate level, anyway) but I'm fairly sure that 17% residual indicates that the data is still very poorly conditioned. Since it's 2-D data why not plot some of the data along with the best fit lines to get an idea of how poorly the GPU is doing. It may be that you can fix the problem with some basic outlier rejection, which would be sensible anyway if you have some bad outliers.

cublas's batch QR routines may have some issues, of course, with correct handling of poorly conditioned complex systems, so we'll take a look at that. In the meantime you could try using a pseudo-inverse as Matt J shows.

Damian le 9 Jan 2023

You are right about the quality of the data. Since I know that, I did not expect a perfect fit, so in my case 7.5% residual over the entire dataset is good enough.

Thank you for your suggestions and your help!

Connectez-vous pour commenter.

Answer 3

Walter Roberson le 9 Jan 2023

Modifié(e) : Matt J le 9 Jan 2023

1 vote

It is a known bug that should be fixed soon that affects page left and right divisions for complex numbers

https://www.mathworks.com/matlabcentral/answers/1882867-weird-behavior-for-pagemrdivide-matlab-2022b-vs-2022a#comment_2532847

2 commentaires
Afficher Aucune Masquer Aucune

Damian le 9 Jan 2023

Modifié(e) : Damian le 9 Jan 2023

I am not sure if this applies here since I am still on 2021a, so pagemldivide does not exist for me (yet). But this is something I will keep in mind when upgrading.

Thank you for the info!

Bruno Luong le 9 Jan 2023

I agree with Walter it looks pretty much similar cause (wrong detection of matrix type and decomposition) for GPU and CPU.

Connectez-vous pour commenter.

Accuracy problems using mldivide on GPU

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Réponse acceptée

8 commentaires
Afficher 6 commentaires plus anciens Masquer 6 commentaires plus anciens

Plus de réponses (2)

3 commentaires
Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

2 commentaires
Afficher Aucune Masquer Aucune

Catégories

Produits

Version

Tags

Community Treasure Hunt

Accuracy problems using mldivide on GPU

1 commentaire Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Réponse acceptée

8 commentaires Afficher 6 commentaires plus anciens Masquer 6 commentaires plus anciens

Plus de réponses (2)

3 commentaires Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

2 commentaires Afficher Aucune Masquer Aucune

Catégories

Produits

Version

Tags

Voir également

Community Treasure Hunt

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

8 commentaires
Afficher 6 commentaires plus anciens Masquer 6 commentaires plus anciens

3 commentaires
Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

2 commentaires
Afficher Aucune Masquer Aucune