Effacer les filtres
Effacer les filtres

Generate large nchoosek in batches?

20 vues (au cours des 30 derniers jours)
Peta
Peta le 22 Sep 2016
Modifié(e) : John D'Errico le 22 Sep 2016
I would like to generate all possible combinations of a binary vector, I’ve tried using nchoosek and it does what I want it to do, but I run out of memory if I have a k value of more than about 4.
Is there any way that I can put the nchoosek in a loop and trade speed for memory usage by generating the combinations in batches instead of all at once?
Say I need
nchoosek(1:200,5)
is there a clever way to evaluate say the first 20% of the combinations, do what I need to do with them and then throw them away before I generate the next 20% of the nchoosek(1:200,5) sequence?
Thanks.

Réponse acceptée

John D'Errico
John D'Errico le 22 Sep 2016
Modifié(e) : John D'Errico le 22 Sep 2016
No. Nchoosek is not written in any way to allow you to choose only some reduced subset. In fact, the set of 200 bit binary numbers has a HUGE number of subsets with 5 bits set. I debated showing you how to write such a code, but then you would immediately want to compute combinations of 6 or 7 or 50 of the elements from that set in batches, and you would very quickly overwhelm even a batched version.
The set of combinations that you want to generate gets HUGE, and does so very quickly.
I'll relent just a tiny bit though. The set that you want to generate is simple enough to compute.
Consider if bit 1 is set. Then you need to generate all sets of 199 bits, with exactly 4 of them chosen. You already know how to do this. If bit 1is not set, then you need to generate all possible subsets of 5 bits, choosing from 199 bits. So you can do the whole thing recursively, until the sets become small enough to generate.
  2 commentaires
Peta
Peta le 22 Sep 2016
Isn’t the total number of combinations just:
200! / (5! (200 - 5)!) which would roughly be 2 535 650 040?
Under 3 billion in other words. A pathetically small, almost negligible, number compared to say for example the U.S. national debt etc.?
Surely this is something that a computer in the year 2016 should be able to do without even breaking a sweat?
John D'Errico
John D'Errico le 22 Sep 2016
Modifié(e) : John D'Errico le 22 Sep 2016
Sigh. People think their computers are infinitely large and infinitely fast. TV and the movies spoil them.
It is true that
nchoosek(200,5)
ans =
2.5357e+09
So 2.5 billion combinations.
But each such combination generates one row of a 2.5 billion by 5 matrix.
So there are
nchoosek(200,5)*5
ans =
1.2678e+10
elements in the array you are trying to generate. But worse, each of those elements is a double. So 8 bytes per element.
nchoosek(200,5)*5*8
ans =
1.0143e+11
So roughly 100 gigabytes of RAM to store. Usually more that that, since every operation you do with an array sometimes forces the allocation of temporary memory to store some modification of that array. You should always try to keep several times as much memory available as your largest array to be safe, and even then, you are usually pushing your luck, because memory becomes easily fragmented, preventing huge arrays to be stored.
So do you have several hundred gigabytes of addressable (AND CONTIGUOUS) RAM on your machine? Remember that if MATLAB needs to go deeply into virtual memory here, using your hard disk, expect things to get REALLY SLOW, even if it does not just give up for lack of memory.
A = nchoosek(1:50,5);
B = nchoosek(uint8(1:50),5);
whos A B
Name Size Bytes Class Attributes
A 2118760x5 84750400 double
B 2118760x5 10593800 uint8
So you can save just a wee bit by using uint8, since a uint8 can store the integers 1:200. That might just make your current problem solvable. But tomorrow you will be back again, wailing about how to compute nchoosek(uint8(1:200),6), and why do you run out of memory.

Connectez-vous pour commenter.

Plus de réponses (1)

Stephen23
Stephen23 le 22 Sep 2016
  1 commentaire
Peta
Peta le 22 Sep 2016
Hm, is this really the same thing as nchoosek? It seems the function you linked only calculates the combinations of 2 rows of a matrix.
I don’t see how I would call that function to achieve the same effect as nchoosek

Connectez-vous pour commenter.

Catégories

En savoir plus sur Loops and Conditional Statements dans Help Center et File Exchange

Produits

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by