Speed up my randperm code

3 vues (au cours des 30 derniers jours)
Jorre Goossens
Jorre Goossens le 20 Août 2017
Réponse apportée : Jan le 20 Août 2017
Hi guys
For a GUI I'm making, I want to generate a 6x5 matrix which contains a randperm of about 20 numbers (the 20 isn't fixed). I found out that half of my computation time goes into this function (it's inside a for-loop). Is there a way to make this code more efficient? Thanks in advance!!
The way the code works is that it determines the range of the randompermutation 'm' by determining the amount of rows of the matrix Dnum. In the matrix 'aantal' is listed how many numbers of the randperm needs to be written to each row of my 6x5 matrix. I know my explanation isn't so clear, but maybe an example will help more:
In case 'm' is equal to 20 and 'aantal' is [4 4 4 3 3 2], the final 6x5 matrix could look something like this: [ 7 11 8 5 0; 14 1 6 18 0; 20 2 3 17 0; 4 19 12 0 0; 9 16 13 0 0; 10 15 0 0 0 ]
function [mat]=Generate(Dnum,aantal)
[mat]=zeros(6,5);
[m,~]=size(Dnum);
if isequal(m,sum(aantal))
a=randperm(m);
b1=aantal(1);
mat(1,1:aantal(1))=a(1:b1);
b2=b1+aantal(2);
mat(2,1:aantal(2))=a(b1+1:b2);
b3=b2+aantal(3);
mat(3,1:aantal(3))=a(b2+1:b3);
b4=b3+aantal(4);
mat(4,1:aantal(4))=a(b3+1:b4);
b5=b4+aantal(5);
mat(5,1:aantal(5))=a(b4+1:b5);
b6=b5+aantal(6);
mat(6,1:aantal(6))=a(b5+1:b6);
end

Réponse acceptée

Jan
Jan le 20 Août 2017
randperm(m,m) is faster than randpern(m). This reduces the runtime of the Generate() function by 20% already.
You can find a faster C-Mex function Shuffle in the FileExchange. But for small inputs like m=20, the overhead of calling a C-Mex function removes the advantage.
But you can avoid other unneeded code details:
function mat = Generate(Dnum, aantal)
mat = zeros(6, 5);
m = size(Dnum, 1);
if m == sum(aantal)
a = randperm(m,m);
b = cumsum(aantal);
mat(1, 1:aantal(1)) = a(1:b(1));
mat(2, 1:aantal(2)) = a(b(1)+1:b(2));
mat(3, 1:aantal(3)) = a(b(2)+1:b(3));
mat(4, 1:aantal(4)) = a(b(3)+1:b(4));
mat(5, 1:aantal(5)) = a(b(4)+1:b(5));
mat(6, 1:aantal(6)) = a(b(5)+1:b(6));
end

Plus de réponses (1)

Walter Roberson
Walter Roberson le 20 Août 2017
randperm(n) is (or was) implemented as
[~, output] = sort( rand(1, n) );
This involves one call to rand() to produce n random numbers; the theoretic minimum would be that perhaps it could be done by generating n-1 random numbers instead (since the position of the last entry is "pinned" by the position of the others.)
However, when you call a random number generator with replacement, you need to compare every number to every other number, to check for duplicates, and you need to have a strategy for dealing with the possibility that a duplicate was found. If you were to do randi(n, 1, n) then for larger n it would be virtually certain that you would get a duplicate most of the time, so you could end up spinning around a lot re-generating entries hoping to just happen to be a permutation. By the time n = 9, the probability of generating a permutation out of that kind of randi is less than 1/1000 . Therefore, although the theory might say you can generate in n-1 random numbers, in practice that is a bust for vectorized code.
The sort() approach side-steps the uniqueness problems nicely (at the risk of a very small bias in the ordering under the condition that two of the randomly generated double precision numbers were exactly equal, as the sort() might not break the tie randomly.) The sort takes time proportional to n*log(n).
Is it theoretically possible to do better? Yes. You could use the virtual "draw a ball without replacement" approach:
balls = 1:n;
output = zeros(1, n);
for K = 1 : n-1
ball_idx = randi(length(balls), 1, 1);
output(K) = balls(ball_idx);
balls(ball_idx) = [];
end
output(n) = balls; %only one left
However, this is only more efficient under the theoretical situation where memory allocation is free: deleting a ball at a time in practice requires a bunch of memory re-allocations and data copying. And, of course, in practice, looping is slower then vectorization.
But you could benchmark to figure out which is faster for your situation. I am fairly confident you will find that randperm() is faster.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by