Sparse gpuArray indexing limitations
Afficher commentaires plus anciens
When trying to extract a sub-matrix of a sparse gpuArray, I get,
>> S=gpuArray(sprand(100,100,0.1));
>> S(1:5,1:5)
Sparse gpuArray matrices only support referencing whole rows or whole columns.
OK, but here, I am indexing a block of complete rows and get the same error,
>> S(1:5,:)
Sparse gpuArray matrices only support referencing whole rows or whole columns.
Aside from this, what is the internal hurdle that makes subsrefing a sparse matrix so hard on the GPU? This has been a limitation for quite a while now.
1 commentaire
Walter Roberson
il y a environ 2 heures
I do not have easy access to a gpu system at this time.
I wonder if
S(1,:)
S(:,1)
would work? That is, perhaps the limit is on indexing more than one complete row or column ?
Réponses (1)
It's all about how the GPU stores the sparse matrix; it uses Compressed Sparse Column (CSC) or Compressed Sparse Row (CSR) layouts and since zeros are not stored, storage is represented by three arrays:
- Values: The actual non-zero numbers.
- Inner Indices: The exact row indices (for CSC) or column indices (for CSR).
- Outer Pointers: A map specifying the physical offset index where each new column or row starts in the arrays above.
Because different rows/columns have a varying number of non-zero elements, the distance between elements is completely dynamic. there is no fixed "stride" to allow for direct addressing without the lookup.
Consequently, trying to pull a sub-matrix (e.g., A(2:5, 2:5)), requires thousands of parallel threads to fetch that data simultaneously. To get to the initial element A(2, 2) a GPU thread must:
- Look up the Outer Pointer for column 2.
- Read the start pointer for column 3.
- Linear-search through every non-zero index inside that block just to see if row 2 exists.
Because the data are tightly packed, a thread handling row 3 cannot know where its data begins until the threads handling rows 0, 1, and 2 have resolved how many elements they contain. This creates a strictly sequential data dependency and forcing a massively parallel architecture like a GPU to execute a sequential pointer-chasing search destroys performance, making arbitrary sub-array slicing structurally infeasible.
The reason MATLAB allows full rows or full columns (depending on if it is utilizing CSR or CSC under the hood) is because the starting and ending bounds of a full slice are explicitly defined by the Outer Pointers array. The GPU can read the single pointer for the requested column and instantly copy that entire contiguous chunk of memory into a new vector.
I don't know if the MATLAB implementation documents whether CSR or CSC is used or if it may depend upon the structure of the sparsity, perhaps. I could see that when creating a particular sparse GPU array, it must choose one or the other and it then could run into the indexing problem if one asks for the opposite slice direction.
I wonder if the example might be an illustration of that; if you try the other direction
S(:,1:5)
on the same array it would succeed?
In short, GPUs are good if you can load 'em up and then they can operate on the data in parallel; anything else is likely to be nothing but a bottleneck that totally defeats the purpose.
Catégories
En savoir plus sur Matrix Indexing dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!