# Vectorizing the addition of a vector of values with associated indices to an existing vector, when the vector of indices has repeated values.

4 views (last 30 days)
Nathan Neeteson on 8 Jan 2020
Commented: David Goodmanson on 10 Jan 2020
The situation is, you have a vector A, a vector i, and a vector x. The vector A is your base vector you want to add some values to. The vector x is the values you want to add to A, and the vector i is the indices of where you want the values in x to be added in A. So A is 1-by-n, and x and i are both 1-by-m.
You can execute this in two ways as far as I can tell, by either using a for loop to iterate through x (and i) and adding the values where appropriate, or you can use vectorized indexing to do the entire operation in one line. However, the vectorized indexing only works if the index vector contains only unique values. If the vector i contains multiple values that are the same, the two different ways of doing the operation give different results. It seems that what happens is Matlab just adds to A(i) whatever value in x corresponds to the last instance of that value of i in the vector, rather than adding them all cumulatively.
My question is, can anyone suggest a solution to this problem that is more efficient than just using a for loop? Is there an efficient function that can give me a unique version of the vector i with a corresponding new vector x' where all of the values that correspond to the same index (same value in i) have been added together, such that I can use a vectorized operation and get the same result as when using a for loop?
Here is some code that demonstrates the issue as simply as possible:
clear all; close all; clc;
% first, an example of it working when the index vector has no repeated
% values
A = rand(1,5);
A_vector = A;
A_forloop = A;
indices = [1 2 3];
x = rand(size(indices));
A_vector(indices) = A_vector(indices) + x;
for i = 1:length(indices)
A_forloop(indices(i)) = A_forloop(indices(i)) + x(i);
end
all(A_vector==A_forloop) % check that all values are the same
% second, an example of it not working when the index vector has repeated
% values
A = rand(1,5);
A_vector = A;
A_forloop = A;
indices = [1 1 1 2 3];
x = rand(size(indices));
A_vector(indices) = A_vector(indices) + x;
for i = 1:length(indices)
A_forloop(indices(i)) = A_forloop(indices(i)) + x(i);
end
all(A_vector==A_forloop) % check that all values are the same
The first check gives a true value, the second check gives a false value, because the vectorization of the procedure does not give the same result as the for loop when the vector i has repeated values.

David Goodmanson on 9 Jan 2020
Edited: David Goodmanson on 9 Jan 2020
Hi Nathan,
sparse matrices automatically do what you want. You have full matrices, and sparse operations are on the slow side when operating on full matrices, but unless the dimensions are quite large, this should not be a problem. For example,
A = 1:12;
x = 1:10;
ind = [1 2 4 4 5 9 9 2 2 2];
sA = size(A);
B = full(sparse(ones(size(ind)),ind,x,sA(1),sA(2)))
A = A + B
See the help on sparse for the input scheme.

Nathan Neeteson on 9 Jan 2020
Hi David,
Thank you for your suggestion. This does give the same result. However, in the script below I played around with the number of elements in A and x (N, M) and found that if they are both very small, the sparse method is just slightly faster than the for loop, but if N>~10000 (around the scale of the A equivalent in my problem) then the sparse method becomes slower, and it gets worse as you scale the problem up.
forlooptime = [];
sparsetime = [];
Narr = 10.^[1:8];
for N = Narr
M = 50;
A = rand(1,N);
A_vector = A;
A_forloop = A;
A_sparse = A;
indices = 1+round((N-1)*rand([1 M]));
x = rand(size(indices));
A_vector(indices) = A_vector(indices) + x;
tic
for i = 1:length(indices)
A_forloop(indices(i)) = A_forloop(indices(i)) + x(i);
end
forlooptime(end+1) = toc
tic
sA = size(A);
dA = full(sparse(ones(size(indices)),indices,x,sA(1),sA(2)));
A_sparse = A_sparse + dA;
sparsetime(end+1) = toc
sqrt(sum((A_vector-A_forloop).^2)) % check that all values are the same
sqrt(sum((A_sparse-A_forloop).^2))
end
figure()
hold on
grid on
plot(Narr,forlooptime,'k-')
plot(Narr,sparsetime,'k--')
xlabel('length(A)')
ylabel('clock time [s]')
legend({'for loop','full(sparse())'},'location','northeast')
set(gca,'XScale','log')
set(gca,'YScale','log')
hold off So the solution works but does not provide an improvement in computational efficiency over using a for loop (the sparse method also scales worse than the for loop as you increase M). Unless I am doing something wrong and there is another way to code it so that it is faster?
Still, thank you for taking the time to give a suggestion. It was worth looking into whether this method could be faster.
David Goodmanson on 10 Jan 2020
Hi Nathan,
Never know until you try, thanks for checking that out.