Efficiently populating an array without for loops
Hi Everyone,
I have a list of data with 10,000,000 rows and 3 columns. The columns hold the shape, size, and color of an object, each encoded as an integer index. There are 100 shapes, 100 sizes, and 50 colors.
I want to create a 100x100x50 matrix that stores the count of each object type, essentially a histogram over the unique (shape, size, color) combinations.
My code below is too slow to run because of the for loops. Does anyone know a way to do the same operation with direct matrix operations? It seems these comparisons should be relatively fast, but they are extremely slow in MATLAB the way I am doing it.
ObjectTypes = zeros(100,100,50);
for Shape = 1:100
    for Size = 1:100
        for Color = 1:50
            % Count the rows matching this shape, size, and color
            ObjectTypes(Shape,Size,Color) = nnz(MyData(:,1) == Shape & MyData(:,2) == Size & MyData(:,3) == Color);
        end
    end
end
Accepted Answer
Geoff
on 27 May 2012
Hah... So here is an alternative that runs in O(N) time, with a single pass over the rows:
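ObjectTypes = zeros(100,100,50);   % preallocate the count array
for n = 1:size(MyData,1)
    row = MyData(n, 1:3);          % [shape, size, color] of this object
    % Increment the bin for this combination
    ObjectTypes(row(1),row(2),row(3)) = ObjectTypes(row(1),row(2),row(3)) + 1;
end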
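The same single pass can also be done without an explicit loop. The sketch below is a swapped-in technique, not something posted in the thread: it converts each row to a linear index with sub2ind and tallies those indices with accumarray. The small MyData matrix is hypothetical; the 100x100x50 size comes from the question.

```matlab
% Hypothetical sample data: each row is [shape, size, color]
MyData = [1 1 1; 2 3 4; 1 1 1; 100 100 50];
dims = [100, 100, 50];

% One linear index per row into a 100x100x50 array
lin = sub2ind(dims, MyData(:,1), MyData(:,2), MyData(:,3));

% Count how often each linear index occurs; the size argument makes the
% output cover every bin, even ones that never occur in the data
counts = accumarray(lin, 1, [prod(dims), 1]);

% Reshape the flat counts back into shape x size x color
ObjectTypes = reshape(counts, dims);
```

Here ObjectTypes(1,1,1) is 2, because the first and third rows are the same object type.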
More Answers (2)
Geoff
on 27 May 2012
Yeah, that's searching through all of your data every time you do the == comparisons: 100 x 100 x 50 = 500,000 passes over 10,000,000 rows. The way I do this kind of thing when populating a matrix from database results is to have the data sorted by the grouping variables, and then use diff and find to get the range of rows for each group.
So start with this:
MyData = sortrows(MyData);
Then grab the begin and end index of each group of equal values in column one:
% Partition by shape
begin1 = [1; 1+find(diff(MyData(:,1)))];
end1 = [begin1(2:end)-1; size(MyData,1)];
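To see what the diff/find trick produces, here is a small worked example on a hypothetical, already sorted first column (col1 stands in for MyData(:,1)):

```matlab
% Hypothetical sorted first column, as it would look after sortrows
col1 = [1; 1; 2; 2; 2; 5];

% diff is nonzero where the value changes; find returns the last index of
% each run except the final one, so adding 1 gives where the next run starts
begin1 = [1; 1 + find(diff(col1))];   % [1; 3; 6]

% Each run ends just before the next one begins; the last run ends at the
% final row
end1 = [begin1(2:end) - 1; numel(col1)];   % [2; 5; 6]
```

So rows 1-2 hold shape 1, rows 3-5 hold shape 2, and row 6 holds shape 5.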
Now you can combine these into a loop variable, so each pass through the loop gives you a 2x1 vector containing the start and end of one range. You then do the same thing again with column 2. Finally, I use accumarray to count up all the colours for a given size and shape:
ObjectTypes = zeros(100,100,50);   % preallocate the count array
% Process the Shape partitions
for r1 = [begin1, end1]'
    Shape = MyData(r1(1), 1);      % Single Shape
    % Partition by Size within this Shape's rows
    idx1 = r1(1):r1(2);
    col2 = MyData(idx1, 2);
    begin2 = [1; 1+find(diff(col2))];
    end2 = [begin2(2:end)-1; numel(col2)];
    % Process the Size partitions
    for r2 = [begin2, end2]'
        Size = col2(r2(1));        % Single Size
        % r2 is relative to col2, so shift it back into MyData's row numbers
        idx2 = (r1(1)-1) + (r2(1):r2(2));
        % Count up all the Color occurrences for this Shape and Size
        Color = MyData(idx2, 3);
        colorCount = accumarray(Color, 1);
        ObjectTypes(Shape, Size, 1:max(Color)) = colorCount;
    end
end
I would hope this is faster than your current loop, although there are probably clever ways to use accumarray without all the looping guff I've done. Apologies if there are errors in this code. I just hacked it straight into my web browser =)
Walter Roberson
on 27 May 2012
Are the numbers for shape, size, and color consecutive integers, each starting from 1? If they are, the code can be reduced to
ObjectTypes = accumarray(MyData, 1);
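One caveat with the one-liner (an addition, not part of the original answer): accumarray sizes its output from the largest subscript in each column, so if, for example, color 50 never occurs, the result has fewer than 50 pages. Passing the size as the third argument pins the dimensions. A sketch with hypothetical data:

```matlab
% Hypothetical data: rows are [shape, size, color]; color 50 never appears
MyData = [1 1 1; 1 1 1; 3 2 7];

% Without a size argument, the output is only max(MyData) in each dimension
small = accumarray(MyData, 1);                      % 3x2x7

% With an explicit size, every shape/size/color bin exists
ObjectTypes = accumarray(MyData, 1, [100, 100, 50]);
```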
If not, you can create the consecutive integers by using the three-output version of unique():
% The third output maps each value to its position among the sorted unique values
[ushape, ~, shapeidx] = unique(MyData(:,1));
[usize, ~, sizeidx] = unique(MyData(:,2));
[ucolor, ~, coloridx] = unique(MyData(:,3));
ObjectTypes = accumarray([shapeidx(:), sizeidx(:), coloridx(:)], 1);
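To read the counts back in terms of the original codes, the unique outputs can be combined with find and ind2sub. A minimal sketch on small hypothetical data; the MyData values and the summary variable are illustrative, not from the thread:

```matlab
% Hypothetical data whose codes do not start at 1 and have gaps
MyData = [10 5 2; 30 7 9; 10 5 2];

% Re-index each column to consecutive integers 1..k
[ushape, ~, shapeidx] = unique(MyData(:,1));
[usize,  ~, sizeidx]  = unique(MyData(:,2));
[ucolor, ~, coloridx] = unique(MyData(:,3));
ObjectTypes = accumarray([shapeidx(:), sizeidx(:), coloridx(:)], 1);

% Linear indices of the nonempty bins, converted to (shape, size, color) positions
lin = find(ObjectTypes);
[shapePos, sizePos, colorPos] = ind2sub(size(ObjectTypes), lin);

% One row per nonempty bin: [shape code, size code, color code, count]
summary = [ushape(shapePos), usize(sizePos), ucolor(colorPos), ObjectTypes(lin)];
```

For this data, summary is [10 5 2 2; 30 7 9 1].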