Matrix product and inversion speed-up
Hi,
My question is simple. I have a function whose most computationally intensive line is this single matrix operation:
iF = iR - iRZ*(iP - Z' *iRZ )\iRZ';
The dimensions of the objects involved vary at each call, but they are typically of these orders of magnitude:
iR -> diagonal, 40k x 40k (stored as sparse)
iRZ -> 40k x 100 (stored as sparse)
Z -> 40k x 100
iP -> 100 x 100
Is there something one can do to speed up this code?
Thanks
8 comments
James Tursa
on 18 Jun 2020
Does everything vary from call to call, or do only some of the variables change?
John D'Errico
on 18 Jun 2020
You may think it simple, but big problems tend to be computationally intensive, no matter how badly you want things otherwise.
Odds are though, the product
iRZ*(iP - Z' *iRZ )\iRZ'
may not be as sparse as are the components. This sort of thing looks like it will have MASSIVE amounts of fill-in. In that case, these will be huge matrices, no matter how you want to solve it. In fact, in that case, you may see that using sparse storage may be costing you time here. So I would look to see exactly how sparse it is.
Anyway, far too often, I see people think they have a sparse matrix that is only 50% non-zero. So when you say something is sparse, how sparse is it? And most importantly, what are you looking at in terms of that result above?
Angelo Cuzzola
on 19 Jun 2020
Edited: Angelo Cuzzola on 19 Jun 2020
Bjorn Gustavsson
on 19 Jun 2020
Maybe this is no longer relevant, but once upon a time expressions with matrix multiplications were not evaluated in a clever order (i.e., with the smallest maximum size of the intermediate products). For some application I had to guide MATLAB to do the multiplications in an order that did not blow up in size. If that is still an issue, you could try to compare these versions with yours:
iF = iR - iRZ*((iP - Z' *iRZ )\iRZ');
iF = iR - (iRZ*(iP - Z' *iRZ ))\iRZ';
One of them might have a much smaller footprint.
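Worth noting here: in MATLAB, * and \ share the same precedence and associate left to right, so the unparenthesized line in the question already parses as the second version above. A quick check with small stand-in matrices (not the question's data):
A = rand(3); B = rand(3); C = rand(3);
isequal(A*B\C, (A*B)\C)   % logical 1: * and \ associate left to right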
Angelo Cuzzola
on 19 Jun 2020
Edited: Angelo Cuzzola on 19 Jun 2020
Christine Tobler
on 19 Jun 2020
What are you doing with the matrix iF in the next step? Since this is a dense 1e4 x 1e4 matrix, it will be comparatively expensive to compute, and avoiding storing it explicitly altogether would probably be the fastest option.
For example, if your next step is to do matrix-vector multiplication with iF, you could replace
y = iF*x
with
y = iR*x - iRZ*((iP - Z' *iRZ )\(iRZ'*x));
For some other possible operations, you will need to store the complete iF as a dense matrix - it depends on that next step.
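To make that concrete, here is a minimal sketch that wraps the implicit operator in a function handle (names follow the question; x is an assumed input vector), so downstream code such as an iterative solver can apply iF without ever forming the dense 40k x 40k matrix:
S = iP - Z'*iRZ;                          % small 100x100 system, formed once
applyiF = @(x) iR*x - iRZ*(S\(iRZ'*x));   % y = iF*x without ever forming iF
y = applyiF(x);                           % same result as y = iF*x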
Angelo Cuzzola
on 19 Jun 2020
Answers (2)
John D'Errico
on 19 Jun 2020
Edited: John D'Errico on 19 Jun 2020
What you seem not to appreciate is that 90-95% zeros is NOT very sparse, not when you are talking about matrix factorizations (i.e., backslash). Backslash does not compute an inverse, although the inverse of a general, not-terribly-sparse matrix will probably be nearly full as well.
Inside this expression:
iRZ*(iP - Z' *iRZ )\iRZ'
we see a 100x100 matrix in the parens. (Roughly, since you say the sizes change somewhat arbitrarily.) That matrix controls a LOT, since it will likely be fairly non-sparse, given its components. And once that 100x100 matrix is decomposed using a matrix factorization, it likely becomes close to a full matrix, or a product of a full lower and upper triangular pair.
If you then pre-multiply that result by iRZ and post-multiply by iRZ', you also get something fairly full.
My question in the comments was to ask what is the sparsity of those submatrices. That is, compute
A = iP - Z' *iRZ ;
nnz(A)/numel(A)
You might also then try for that same A:
[L,U] = lu(A);
nnz(L)/numel(L)
nnz(U)/numel(U)
If each of those matrices is nearly 50% non-zero, then they are essentially full triangular matrices.
Next, for the parent computation:
A = iRZ*(iP - Z' *iRZ )\iRZ';
nnz(A)/numel(A)
Those are the things you need to do. You should not be surprised to see a great deal of fill-in. In fact, consider yourself lucky if that does not happen.
You might consider sparsity-enhancing permutations applied to the inner 100x100 matrix, using tools like colamd, dmperm, etc. (Sorry, but it has been many years since I was actively using those tools. They can help a great deal when used properly.)
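As a rough sketch of that idea (hedged: symamd expects a sparse matrix and is best suited to symmetric patterns; amd or colamd are alternatives for unsymmetric ones):
S = sparse(iP - Z'*iRZ);   % the inner square matrix, kept sparse
p = symamd(S);             % fill-reducing ordering
[L1,U1] = lu(S(p,p));      % factor the permuted matrix
nnz(L1) + nnz(U1)          % compare against the factors of lu(S) directly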
Remember that once things start to become too filled in, a sparse matrix computation can start to become LESS efficient than a full one would have been. You might test if making that inner square matrix a full matrix helps too.
Are these things important? Yes.
For example:
A = rand(100);
As = sparse(A);
B = sprand(10000,100,0.01);
So B is 99% sparse. A is essentially a full matrix. And remember that even if A is only fairly full, a factorization of A will probably be quite close to full anyway.
Consider the term
C = A\B';
nnz(C)/numel(C)
ans =
0.6299
It does not matter whether or not A is sparse: backslash created a matrix that is 63% non-zero. That is effectively full in the world of sparse matrices. In fact, look at what happens with
Cs = As\B';
whos C Cs
  Name        Size                Bytes  Class     Attributes
  C         100x10000           8000000  double
  Cs        100x10000          10158408  double    sparse
So that 63% non-zero matrix used roughly 25% MORE space to store than the comparable full version.
Now compare times:
timeit(@() B*(A\B'))
ans =
0.3445460668125
timeit(@() B*(As\B'))
ans =
1.5696587328125
Finally, consider this:
Bf = full(B);
timeit(@() Bf*(A\Bf'))
ans =
0.1513420138125
And those times were only for a 10000x100 problem, and a sparse version of B that was 99% sparse, not just 90-95%.
As long as I could store that final result in memory as a full matrix, I actually gained by just keeping the computation completely full. To be useful, sparse matrices need to be seriously sparse. As well, sparsity can disappear quickly.
So if that inner part is full, or even close to being full, then sparsity of B does not help. In fact using sparse matrices may be LESS efficient, using more storage in those products, and using more time. And you need to look at the sparsity of the computed results. At least consider fill-in reducing permutations, as it is fill-in that can be a killer for sparse matrices.
Bjorn Gustavsson
le 19 Juin 2020
My comment still seems relevant. A simple command-line test gives:
tic,AA = ones(3e4,2)*(ones(2,3e4)*ones(3e4,1));toc
% Elapsed time is 0.001027 seconds.
tic,BB = (ones(3e4,2)*ones(2,3e4))*ones(3e4,1);toc
% Elapsed time is 2.031760 seconds.
isequal(AA,BB)
% logical 1
If this is run inside a function, where JIT optimization can work its wonders, the multiplication order might be chosen a bit more cunningly. But when you know the sizes of the matrices to be multiplied together, you can guide the order of the multiplications so that you avoid outer products that produce large intermediate matrices which then vanish into a smaller matrix or vector at the end.
HTH
2 comments
Angelo Cuzzola
on 19 Jun 2020
Edited: Angelo Cuzzola on 19 Jun 2020
Bjorn Gustavsson
on 19 Jun 2020
That's a downer, I hoped that inverting a 100x100 matrix wouldn't be that expensive. Perhaps you can make some use of Factorize.
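For reference, Factorize is Tim Davis's File Exchange package for reusing factorizations; MATLAB's built-in decomposition object (R2017b and later) supports the same factor-once, solve-many pattern. A minimal sketch, with b1 and b2 as hypothetical right-hand sides:
S = iP - Z'*iRZ;         % the small 100x100 inner system
dS = decomposition(S);   % factor once
x1 = dS \ b1;            % each solve reuses the stored factors
x2 = dS \ b2;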