Divide a data set into 4 parts so that the sum of each part is 1/4th of the total
Nagendra Reddy
on 16 Jun 2019
Commented: Image Analyst
on 16 Jun 2019
I want to divide a data set into four groups such that the sum of the elements of each group is approximately the same.
For example: [10, 5, 1, 20, 5, 22, 4, 15]
For the above data set: sum of all the elements = 82
So I want this data set to be divided into 4 groups such that the sum of the elements of each group is almost the same (close to 82/4 = 20.5).
One such possibility is:
Set 1: 10, 5, 4, 1
Set 2: 20
Set 3: 22
Set 4: 15, 5
How do I set this up?
Accepted Answer
Image Analyst
on 16 Jun 2019
Edited: Image Analyst
on 16 Jun 2019
I'd just sort them, take the CDF (the normalized cumulative sum), and look for where it crosses 25%, 50%, and 75%:
c = cumsum(sort(data, 'ascend'));
c = c / c(end); % Normalize from 0 to 1
c25 = find(c>0.25, 1, 'first');
c50 = find(c>0.5, 1, 'first');
c75 = find(c>0.75, 1, 'first');
At least that's one way that might work, though it would work best for lots of data rather than just a few elements like you have.
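To see why a handful of elements gives only a rough split, it may help to print the normalized cumulative sum for the sample data from the question. This is only a sketch: the snippet above leaves `data` undefined, so filling it in with the question's values is an assumption.

```matlab
% Sample data taken from the question (an assumption; the snippet above
% does not define data).
data = [10, 5, 1, 20, 5, 22, 4, 15];
c = cumsum(sort(data, 'ascend'));
c = c / c(end);                   % Normalize from 0 to 1
disp(c)                           % Fraction of the total reached after each sorted element
c25 = find(c > 0.25, 1, 'first')  % First index where the running fraction exceeds 25%
c50 = find(c > 0.5, 1, 'first')   % Same for 50%
c75 = find(c > 0.75, 1, 'first')  % Same for 75%
```

With only 8 elements the running fraction jumps from about 0.18 straight to about 0.30, so no cut point can land near exactly 25%.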
4 Comments
Image Analyst
on 16 Jun 2019
Try this:
data = [10, 5, 1, 20, 5, 22, 4, 15]
sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end); % Normalize from 0 to 1
c25 = find(c < 0.25, 1, 'last')
c50 = find(c < 0.5, 1, 'last')
c75 = find(c < 0.75, 1, 'last')
group1 = sortedc(1:c25);
group2 = sortedc(c25+1:c50);
group3 = sortedc(c50+1:c75);
group4 = sortedc(c75+1:end);
sumOfGroup1 = sum(group1)
sumOfGroup2 = sum(group2)
sumOfGroup3 = sum(group3)
sumOfGroup4 = sum(group4)
fprintf('The sum of group 1 is %d = %.5f%%\n', sumOfGroup1, 100 * sumOfGroup1 / sum(sortedc));
fprintf('The sum of group 2 is %d = %.5f%%\n', sumOfGroup2, 100 * sumOfGroup2 / sum(sortedc));
fprintf('The sum of group 3 is %d = %.5f%%\n', sumOfGroup3, 100 * sumOfGroup3 / sum(sortedc));
fprintf('The sum of group 4 is %d = %.5f%%\n', sumOfGroup4, 100 * sumOfGroup4 / sum(sortedc));
You get
group1 =
1 4 5 5
group2 =
10 15
group3 =
20
group4 =
22
The sum of group 1 is 15 = 18.29268%
The sum of group 2 is 25 = 30.48780%
The sum of group 3 is 20 = 24.39024%
The sum of group 4 is 22 = 26.82927%
but for a much larger set, the split comes out much closer to 25% each:
numElements = 100000;
maxValue = 99;
data = randi(maxValue, 1, numElements);
sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end); % Normalize from 0 to 1
c25 = find(c < 0.25, 1, 'last')
c50 = find(c < 0.5, 1, 'last')
c75 = find(c < 0.75, 1, 'last')
group1 = sortedc(1:c25);
group2 = sortedc(c25+1:c50);
group3 = sortedc(c50+1:c75);
group4 = sortedc(c75+1:end);
sumOfGroup1 = sum(group1)
sumOfGroup2 = sum(group2)
sumOfGroup3 = sum(group3)
sumOfGroup4 = sum(group4)
fprintf('The sum of group 1 is %d = %.5f%%\n', sumOfGroup1, 100 * sumOfGroup1 / sum(sortedc));
fprintf('The sum of group 2 is %d = %.5f%%\n', sumOfGroup2, 100 * sumOfGroup2 / sum(sortedc));
fprintf('The sum of group 3 is %d = %.5f%%\n', sumOfGroup3, 100 * sumOfGroup3 / sum(sortedc));
fprintf('The sum of group 4 is %d = %.5f%%\n', sumOfGroup4, 100 * sumOfGroup4 / sum(sortedc));
The sum of group 1 is 1250676 = 24.99972%
The sum of group 2 is 1250679 = 24.99978%
The sum of group 3 is 1250651 = 24.99922%
The sum of group 4 is 1250755 = 25.00129%
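The same CDF split can be written as a loop so that the number of groups is not hard-coded, and the quality of the split can be summarized as a single number. The following is a minimal sketch, not part of the code above: the names `numGroups`, `edges`, `groupSums`, `percentages`, and `meanAbsDev` are made up for illustration, and the guard for an empty `find` result is an added assumption.

```matlab
% Sketch: the CDF split from above, written as a loop over numGroups.
data = [10, 5, 1, 20, 5, 22, 4, 15];   % Small example from the question
numGroups = 4;                          % Hypothetical parameter
sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end);                         % Normalize from 0 to 1
edges = zeros(1, numGroups + 1);        % edges(k+1) is the last index of group k
for k = 1 : numGroups - 1
    idx = find(c < k / numGroups, 1, 'last');
    if isempty(idx)
        idx = 0;                        % No element below this fraction, so the group is empty
    end
    edges(k + 1) = idx;
end
edges(end) = numel(sortedc);
groupSums = zeros(1, numGroups);
for k = 1 : numGroups
    groupSums(k) = sum(sortedc(edges(k) + 1 : edges(k + 1)));
end
percentages = 100 * groupSums / sum(sortedc);
meanAbsDev = mean(abs(percentages - 100 / numGroups));
fprintf('Group sums: %s\n', mat2str(groupSums));
fprintf('Mean absolute deviation from %.1f%%: %.4f%%\n', 100 / numGroups, meanAbsDev);
```

For the 8-element example this reproduces the group sums 15, 25, 20, and 22 shown earlier, with a mean absolute deviation of roughly 3.66 percentage points from 25%.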
If the CDF method is not accurate enough for your small groups, then I think one approach you might take is to try every single permutation (every possible way of assigning the elements to the four groups) and check which one has the smallest average absolute deviation from 25%. I don't have code for that and probably won't write any. I'm assuming you gave a very small set of data just as a simple example and that your actual data is much larger. Good luck.
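For a data set as small as the one in the question, the exhaustive search described above is feasible. What follows is a rough sketch under stated assumptions, not code from the answer: it enumerates assignments of elements to groups (rather than literal permutations), all variable names are made up for illustration, and the number of assignments grows as 4^n, so it is only practical for a handful of elements.

```matlab
% Brute-force sketch for a small data set: try every assignment of elements
% to numGroups groups and keep the one whose group sums deviate least from
% an even split. All names (labels, bestLabels, bestDev, ...) are hypothetical.
data = [10, 5, 1, 20, 5, 22, 4, 15];
numGroups = 4;
n = numel(data);
target = sum(data) / numGroups;      % Ideal sum per group (20.5 here)
bestDev = inf;
bestLabels = [];
for code = 0 : numGroups^n - 1       % 4^8 = 65536 assignments for this example
    % Read code as an n-digit base-numGroups number; digit i is the group of element i.
    labels = mod(floor(code ./ numGroups.^(0 : n - 1)), numGroups) + 1;
    groupSums = accumarray(labels(:), data(:), [numGroups, 1]);
    dev = mean(abs(groupSums - target));   % Average absolute deviation from the target sum
    if dev < bestDev
        bestDev = dev;
        bestLabels = labels;
    end
end
for k = 1 : numGroups
    members = data(bestLabels == k);
    fprintf('Group %d: %s (sum %d)\n', k, mat2str(members), sum(members));
end
fprintf('Mean absolute deviation: %.5f%% of the total\n', 100 * bestDev / sum(data));
```

Because 22 has to sit in some group, the best achievable group sums for this data are 20, 20, 20, and 22, which is the same quality as the split proposed in the question.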