For loop over permutations of 1:n with very large n

8 vues (au cours des 30 derniers jours)
Davide Zorzenon
Davide Zorzenon le 25 Avr 2023
Commenté : Walter Roberson le 27 Avr 2023
Question
Take the following instruction:
for i = 1:1e18
% do something
end
now, probably the execution of the code will take a looong time. I am not concerned with this (see section Context to see why), but with the following problem: consider the instruction
for i = randperm(1e18)
% do something
end
Running the code above returns:
Error using randperm
Requested array exceeds the maximum possible variable size.
My question is: how to perform a for loop over a permutation of the elements of vector 1:n where n is a very large number, without incurring in memory issues? Moreover, how does Matlab perform the first for loop without incurring in such issues?
Context
I am building a simple program with a GUI to solve integer optimization problems. The for loop is used to evaluate the optimization function in the whole search space (which can be very large), and the GUI has a button that the user can press to "break" the loop; when the user does this, the program returns a suboptimal solution of the optimization problem. The second for loop would thus represent a simple implementation of a random search.
  2 commentaires
Davide Zorzenon
Davide Zorzenon le 26 Avr 2023
See here for a related question
Bruno Luong
Bruno Luong le 27 Avr 2023
Modifié(e) : Bruno Luong le 27 Avr 2023
for i = 1:1e18
% do something
end
Even that simple loop there is a already a problem, since MATLAB cannot store distinct integers up to 1e18, which is beyond the capacity of 53 bits mantissa coding. You don't loop over all integers up to 1e18 as you would expected.

Connectez-vous pour commenter.

Réponse acceptée

Steven Lord
Steven Lord le 25 Avr 2023
How long does your loop body take to execute? If it's one loop body execution per second, running all 1e18 of those iterations would take:
y = years(seconds(1e18))
y = 3.1689e+10
31 billion years. According to Wikipedia's timeline of the far future Earth likely will have been destroyed about 23 billion years before the end of execution of that loop, when the Sun expands into a red giant.
If you could do a thousand loop body executions per second:
y = years(seconds(1e18/1000))
y = 3.1689e+07
you're looking at "only" 3 million years. But it'll take fewer days to run the last couple iterations of that loop, since a day on Earth is expected to be one minute longer than it is today.
You can't exactly represent all the numbers between 1 and 1e18 as double precision numbers. So your attempt to choose a random integer-valued double in that range will fail.
1e18 > flintmax
ans = logical
1
Going to uint64 would help.
1e18 > double(intmax('uint64'))
ans = logical
0
So you could use randi to generate two 32-bit integers and typecast the result to uint64. [randi does not let you directly generate 64-bit integers.]
x = typecast(randi([intmin('uint32'), intmax('uint32')], 1, 2, 'uint32'), 'uint64')
x = uint64 11258641691898087842
Reject any trials where x is greater than:
u = uint64(10)^18
u = uint64 1000000000000000000
The odds of repeating the same value of x over a reasonable amount of time is likely very, very small.
  8 commentaires
Walter Roberson
Walter Roberson le 25 Avr 2023
Suppose that you had a random permutation generator that could generate additional numbers incrementally rather than having to generate the entire sequence at once (which would have a heavy memory penalty).
Now, at any given time, how does the algorithm know whether it has generated a particular value already if there is not at least a binary "already used" map, size proportional to log2(n) ? (so, 2^57 bytes in the case of n = 1e18) ?
The answer is that it would have to be an algorithm that cannot generate a previous value, and so does not need to keep track of where it has been.
Do such algorithms exist? Well, yes and no. In the case where the n to generate over is a prime, then you can use a https://en.wikipedia.org/wiki/Linear_congruential_generator Linear Congruential Generator . With proper n and proper selection of parameters, such generators are certain not to repeat until they reach the end of the sequence. The problem is that the random statistics are Not Good. For example if you plot the values over a 2D box, you will see distinct lines being filled out, rather than the space being filled randomly.
Are there alternatives? Maybe. The discussion at https://en.wikipedia.org/wiki/Mersenne_Twister refers to some alternatives. Mersenne Twister itself is not suitable for such a purpose: it uses a state vector of 624 words, and any one word can (will!!) repeat before the end of the overall period; when used as a pseudo-random number generator, it is deliberate that MT can repeat individual entries: in most cases when a permulation is not being generated, if the generator cannot produce duplicates before the end of the period, then you would be able to use that fact to tell that you are not generating true random integers. (With true random integers, the more integers you generate, the more likely it becomes that you will generate an integer that has already been generated.)
Davide Zorzenon
Davide Zorzenon le 26 Avr 2023
@Walter Roberson: thank you for your insightful comment. This is exactly what I was looking for.

Connectez-vous pour commenter.

Plus de réponses (1)

Bruno Luong
Bruno Luong le 27 Avr 2023
Modifié(e) : Bruno Luong le 27 Avr 2023
If you don't care about finding optimal solution, why not just do
n = 1e15; % 1e18 has problem of coding as in my comment above
for j = 1:n
i = randi(n);
% do something with i
fprintf('i=%d\n',i);
end
This is more or less has periodicity as Walter comment of LCG (yes behind rand, randi, randperm is Merssene Twister), but collision occurs when compress in smaller range 1:n. But you say you want to find suboptimal solution,so collision should only have negligible impact to runing time and not on randomness covering.
  5 commentaires
Bruno Luong
Bruno Luong le 27 Avr 2023
Modifié(e) : Bruno Luong le 27 Avr 2023
Remember my warning: you cannot store in standard matlab double any integer larger than 2^53 ~ 1e16. Let alone doing any meaningful arithmetics and computing prime beyond that.
Walter Roberson
Walter Roberson le 27 Avr 2023
https://en.m.wikipedia.org/wiki/Linear-feedback_shift_register

Connectez-vous pour commenter.

Catégories

En savoir plus sur Historical Contests dans Help Center et File Exchange

Produits

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by