Permutations of string without repetitions
Afficher commentaires plus anciens
I want to create all possible permutations of a string without duplicates. For example:
Input: 'AA' should give output: 'AA'
Input: 'AB' should give outputs: 'AB','BA'
Input: 'ABA' should give outputs: 'ABA' 'AAB' 'BAA'
My use case string can be as long as 25 characters so speed is crucial. Any help would be greatly appreciated.
2 commentaires
"My use case string can be as long as 25 characters"
In the worst case you could have 25 different characters, which means there could be 25! different results... that is 1.6e25 results (or sixteen septillion in words), which would require more than three hundred yobi bytes (or 300 YiB) of data storage.
Where do you plan on storing this data?
Although beginners often think that a computer is infinitely powerful and has infinite memory, in reality they don't. It is easy, in just a few lines, to define a practically unsolvable problem. You need to reconsider your algorithm.
Raunak Bhattacharyya
le 5 Mai 2016
Réponses (4)
Roger Stafford
le 4 Mai 2016
As Stephen has warned you, twenty-five characters is a dangerous number to apply this problem to. Let’s talk about some more sensible number such as twelve characters. Its worst case is 12! = 479001600. Suppose for the sake of discussion that three of the string are A’s, two are B’s, five are C’s, and two are D’s, totaling twelve characters altogether. The total number of possible different strings of such a set is:
12!/(3!*2!*5!*2!) = 166320
This can be expressed as;
12!/(3!*9!) * 9!/(2!*7!) * 7!/(5!*2!) =
220 * 36 * 21 = 166320
This fact gives us a clue as to how this problem can be done. The first step is to find out how many combinations we can form out of twelve characters in which three are A’s and nine are otherwise where we don’t distinguish between the three other characters. The matlab function ‘nchoosek’ will solve that problem for us with 220 ways. At the next stage, for each of these 220 possibilities we ask how many ways can we let two of the remaining nine be B’s and the other seven be otherwise. Again ‘nchoosek’ gives us the answer with 36 ways. Finally we ask, for each possible choice of the A’s and B’s, how many ways are there of choosing five C’s and two D’s out of the remaining seven, and again ‘nchoosek’ comes to the rescue and gives 21 possibilities.
As I see it, this is a problem that calls for recursion, though I will not lay out the details. First you would have to use probably the ‘unique’ and ‘histc’ functions to find the number of different characters and how many there were of each kind. Then each recursive step would use ‘nchoosek’ to work out combinations within combinations to go through all the possibilities. It will take some careful thinking to write such a recursion correctly.
I hope I have given you enough information to inspire you to solve your problem in a generalized manner. I regret that I do not have enough time to lay out all the necessary details. In the following Answers contributions I have given examples where a fixed number of different objects was given with a fixed number for each. The recursion routines you would write would have to generalize this:
http://www.mathworks.com/matlabcentral/answers/84215
http://www.mathworks.com/matlabcentral/answers/69232
http://www.mathworks.com/matlabcentral/answers/112670
http://www.mathworks.com/matlabcentral/answers/140986
John D'Errico
le 6 Mai 2016
Modifié(e) : John D'Errico
le 6 Mai 2016
Since there are only two possible elements in the string...
In the case of 2 distinct elements in the "string", assume a string of length N, with k of those elements as 'A', and N-k elements as 'B'.
So, there are nchoosek(N,k) distinct strings that satisfy the requirement.
Lets solve the problem using a sample string,
str = 'AAAABB';
N = numel(str);
k = sum(str == 'A');
So there are k=4 'A' elements, and N-k=2 of the 'B' elements. Now think about what this does for you?
C = nchoosek(1:N,N-k);
Think about that as an array that indicates the locations of the 'B' elements.
nc = size(C,1);
allstrs = accumarray([repmat((1:nc)',N-k,1),C(:)],1);
ab = 'AB';
allstrs = ab(allstrs + 1);
So the list of combinations is:
allstrs
allstrs =
BBAAAA
BABAAA
BAABAA
BAAABA
BAAAAB
ABBAAA
ABABAA
ABAABA
ABAAAB
AABBAA
AABABA
AABAAB
AAABBA
AAABAB
AAAABB
Since this was done constructively, it will be fairly efficient. I did not generate a large list, then delete those that were not appropriate, which would have been far less efficient. Of course, you can still VERY easily create a problem where even this scheme will fail miserably.
For example, a string of length 100, where there are 50 A's and 50 B's?
nchoosek(100,50)
Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only accurate to 15 digits
> In nchoosek (line 92)
ans =
1.00891344545564e+29
Its just not going to happen anyway you will try to solve it. That is simply too large of a set to generate on any computer you will find in the foreseeable future.
3 commentaires
Raunak Bhattacharyya
le 8 Mai 2016
@Raunak Bhattacharyya: there are several FEX submissions which loop over combinations one-at-a-time, without generating them all in one matrix:
You can use one these to replace nchoosek in John D'Errico's answer (with other changes required too: loop, etc).
Stephen23
le 4 Mai 2016
>> unique(perms('AA'),'rows')
ans =
AA
>> unique(perms('AB'),'rows')
ans =
AB
BA
>> unique(perms('ABA'),'rows')
ans =
AAB
ABA
BAA
Azzi Abdelmalek
le 4 Mai 2016
str='ABA'
out=unique(str(perms(1:numel(str))),'rows')
Catégories
En savoir plus sur Structures dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!