For loop over permutations of 1:n with very large n
8 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Davide Zorzenon
le 25 Avr 2023
Commenté : Walter Roberson
le 27 Avr 2023
Question
Take the following instruction:
for i = 1:1e18
% do something
end
now, probably the execution of the code will take a looong time. I am not concerned with this (see section Context to see why), but with the following problem: consider the instruction
for i = randperm(1e18)
% do something
end
Running the code above returns:
Error using randperm
Requested array exceeds the maximum possible variable size.
My question is: how to perform a for loop over a permutation of the elements of vector 1:n where n is a very large number, without incurring in memory issues? Moreover, how does Matlab perform the first for loop without incurring in such issues?
Context
I am building a simple program with a GUI to solve integer optimization problems. The for loop is used to evaluate the optimization function in the whole search space (which can be very large), and the GUI has a button that the user can press to "break" the loop; when the user does this, the program returns a suboptimal solution of the optimization problem. The second for loop would thus represent a simple implementation of a random search.
2 commentaires
Bruno Luong
le 27 Avr 2023
Modifié(e) : Bruno Luong
le 27 Avr 2023
for i = 1:1e18
% do something
end
Even that simple loop there is a already a problem, since MATLAB cannot store distinct integers up to 1e18, which is beyond the capacity of 53 bits mantissa coding. You don't loop over all integers up to 1e18 as you would expected.
Réponse acceptée
Steven Lord
le 25 Avr 2023
How long does your loop body take to execute? If it's one loop body execution per second, running all 1e18 of those iterations would take:
y = years(seconds(1e18))
31 billion years. According to Wikipedia's timeline of the far future Earth likely will have been destroyed about 23 billion years before the end of execution of that loop, when the Sun expands into a red giant.
If you could do a thousand loop body executions per second:
y = years(seconds(1e18/1000))
you're looking at "only" 3 million years. But it'll take fewer days to run the last couple iterations of that loop, since a day on Earth is expected to be one minute longer than it is today.
You can't exactly represent all the numbers between 1 and 1e18 as double precision numbers. So your attempt to choose a random integer-valued double in that range will fail.
1e18 > flintmax
Going to uint64 would help.
1e18 > double(intmax('uint64'))
So you could use randi to generate two 32-bit integers and typecast the result to uint64. [randi does not let you directly generate 64-bit integers.]
x = typecast(randi([intmin('uint32'), intmax('uint32')], 1, 2, 'uint32'), 'uint64')
Reject any trials where x is greater than:
u = uint64(10)^18
The odds of repeating the same value of x over a reasonable amount of time is likely very, very small.
8 commentaires
Walter Roberson
le 25 Avr 2023
Suppose that you had a random permutation generator that could generate additional numbers incrementally rather than having to generate the entire sequence at once (which would have a heavy memory penalty).
Now, at any given time, how does the algorithm know whether it has generated a particular value already if there is not at least a binary "already used" map, size proportional to log2(n) ? (so, 2^57 bytes in the case of n = 1e18) ?
The answer is that it would have to be an algorithm that cannot generate a previous value, and so does not need to keep track of where it has been.
Do such algorithms exist? Well, yes and no. In the case where the n to generate over is a prime, then you can use a https://en.wikipedia.org/wiki/Linear_congruential_generator Linear Congruential Generator . With proper n and proper selection of parameters, such generators are certain not to repeat until they reach the end of the sequence. The problem is that the random statistics are Not Good. For example if you plot the values over a 2D box, you will see distinct lines being filled out, rather than the space being filled randomly.
Are there alternatives? Maybe. The discussion at https://en.wikipedia.org/wiki/Mersenne_Twister refers to some alternatives. Mersenne Twister itself is not suitable for such a purpose: it uses a state vector of 624 words, and any one word can (will!!) repeat before the end of the overall period; when used as a pseudo-random number generator, it is deliberate that MT can repeat individual entries: in most cases when a permulation is not being generated, if the generator cannot produce duplicates before the end of the period, then you would be able to use that fact to tell that you are not generating true random integers. (With true random integers, the more integers you generate, the more likely it becomes that you will generate an integer that has already been generated.)
Plus de réponses (1)
Bruno Luong
le 27 Avr 2023
Modifié(e) : Bruno Luong
le 27 Avr 2023
If you don't care about finding optimal solution, why not just do
n = 1e15; % 1e18 has problem of coding as in my comment above
for j = 1:n
i = randi(n);
% do something with i
fprintf('i=%d\n',i);
end
This is more or less has periodicity as Walter comment of LCG (yes behind rand, randi, randperm is Merssene Twister), but collision occurs when compress in smaller range 1:n. But you say you want to find suboptimal solution,so collision should only have negligible impact to runing time and not on randomness covering.
5 commentaires
Bruno Luong
le 27 Avr 2023
Modifié(e) : Bruno Luong
le 27 Avr 2023
Remember my warning: you cannot store in standard matlab double any integer larger than 2^53 ~ 1e16. Let alone doing any meaningful arithmetics and computing prime beyond that.
Voir également
Catégories
En savoir plus sur Historical Contests dans Help Center et File Exchange
Produits
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!