Divide a data set into 4 parts so that the sum of each part is 1/4th of the total

I want to divide a data set into four groups such that the sum of the elements in each group is approximately the same.
For example: [10, 5, 1, 20, 5, 22, 4, 15]
For this data set, the sum of all the elements is 82, so each group should sum to roughly 82/4 = 20.5.
One such possibility is
Set 1: 10, 5, 4, 1
Set 2: 20
Set 3: 22
Set 4: 15, 5
How do I set this up?

Accepted Answer

Image Analyst on 16 Jun 2019
Edited: Image Analyst on 16 Jun 2019
I'd just sort them, take the cumulative sum (the CDF), and look for where it crosses 25%, 50%, and 75% of the total:
c = cumsum(sort(data, 'ascend'));
c = c / c(end); % Normalize from 0 to 1
c25 = find(c>0.25, 1, 'first');
c50 = find(c>0.5, 1, 'first');
c75 = find(c>0.75, 1, 'first');
At least that's one way that might work, though it would work best for lots of data rather than just a few elements like you have.
4 Comments
Nagendra Reddy on 16 Jun 2019
Edited: Nagendra Reddy on 16 Jun 2019
I don't understand which 4 sets your code is suggesting. Could you please tell me?
If I am not wrong, it is suggesting the following 3 sets:
1, 4, 5, 5, 10
15, 20
22
Image Analyst on 16 Jun 2019
Try this:
data = [10, 5, 1, 20, 5, 22, 4, 15]
sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end); % Normalize from 0 to 1
c25 = find(c < 0.25, 1, 'last')
c50 = find(c < 0.5, 1, 'last')
c75 = find(c < 0.75, 1, 'last')
group1 = sortedc(1:c25);
group2 = sortedc(c25+1:c50);
group3 = sortedc(c50+1:c75);
group4 = sortedc(c75+1:end);
sumOfGroup1 = sum(group1)
sumOfGroup2 = sum(group2)
sumOfGroup3 = sum(group3)
sumOfGroup4 = sum(group4)
fprintf('The sum of group 1 is %d = %.5f%%\n', sumOfGroup1, 100 * sumOfGroup1 / sum(sortedc));
fprintf('The sum of group 2 is %d = %.5f%%\n', sumOfGroup2, 100 * sumOfGroup2 / sum(sortedc));
fprintf('The sum of group 3 is %d = %.5f%%\n', sumOfGroup3, 100 * sumOfGroup3 / sum(sortedc));
fprintf('The sum of group 4 is %d = %.5f%%\n', sumOfGroup4, 100 * sumOfGroup4 / sum(sortedc));
You get
group1 =
1 4 5 5
group2 =
10 15
group3 =
20
group4 =
22
The sum of group 1 is 15 = 18.29268%
The sum of group 2 is 25 = 30.48780%
The sum of group 3 is 20 = 24.39024%
The sum of group 4 is 22 = 26.82927%
but for a much larger set, the group sums come out much closer to 25% each:
numElements = 100000;
maxValue = 99;
data = randi(maxValue, 1, numElements);
sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end); % Normalize from 0 to 1
c25 = find(c < 0.25, 1, 'last')
c50 = find(c < 0.5, 1, 'last')
c75 = find(c < 0.75, 1, 'last')
group1 = sortedc(1:c25);
group2 = sortedc(c25+1:c50);
group3 = sortedc(c50+1:c75);
group4 = sortedc(c75+1:end);
sumOfGroup1 = sum(group1)
sumOfGroup2 = sum(group2)
sumOfGroup3 = sum(group3)
sumOfGroup4 = sum(group4)
fprintf('The sum of group 1 is %d = %.5f%%\n', sumOfGroup1, 100 * sumOfGroup1 / sum(sortedc));
fprintf('The sum of group 2 is %d = %.5f%%\n', sumOfGroup2, 100 * sumOfGroup2 / sum(sortedc));
fprintf('The sum of group 3 is %d = %.5f%%\n', sumOfGroup3, 100 * sumOfGroup3 / sum(sortedc));
fprintf('The sum of group 4 is %d = %.5f%%\n', sumOfGroup4, 100 * sumOfGroup4 / sum(sortedc));
The sum of group 1 is 1250676 = 24.99972%
The sum of group 2 is 1250679 = 24.99978%
The sum of group 3 is 1250651 = 24.99922%
The sum of group 4 is 1250755 = 25.00129%
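The two listings above repeat the same steps once per quartile. As a rough sketch, the same CDF split can be written for any number of groups. The names numGroups, groupIdx, and groupSums are my own and not part of the original answer, and I am assuming that a value landing exactly on a threshold should go to the higher group, as it does with the c < 0.25 tests above.

```matlab
data = [10, 5, 1, 20, 5, 22, 4, 15];
numGroups = 4;                          % assumed name: how many groups to form

sortedc = sort(data, 'ascend');
c = cumsum(sortedc);
c = c / c(end);                         % Normalize from 0 to 1

% Label each sorted element by the band of the CDF it falls in.
% The last element has c == 1, so cap its label at numGroups.
groupIdx = min(floor(c * numGroups) + 1, numGroups);

% Sum the elements carrying each label (an unused label would sum to 0).
groupSums = accumarray(groupIdx(:), sortedc(:), [numGroups, 1]);

for g = 1 : numGroups
    fprintf('The sum of group %d is %d = %.5f%%\n', ...
        g, groupSums(g), 100 * groupSums(g) / sum(sortedc));
end
```

For the 8-element example this reproduces the sums 15, 25, 20, and 22 shown earlier. Only data and numGroups need to change to run it on a larger set.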
If the CDF method is not accurate enough for your small groups, then I think one approach you might take is to try every single permutation (every possible way of assigning the elements to the four groups) and check which one has group sums with the smallest average absolute deviation from 25% of the total. I don't have code for that and probably won't write any. I'm assuming you gave a very small set of data just as a simple example and that your actual data is much larger. Good luck.
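For a set as small as the one in the question, the exhaustive check described above might look like the sketch below. It is one interpretation of that idea, not code from the answer: it enumerates every assignment of elements to groups by counting in base numGroups, and the names numGroups, target, bestCost, bestLabels, and bestSums are hypothetical. The loop runs numGroups^n times (65536 for 8 elements and 4 groups), so it is only practical for a handful of elements.

```matlab
data = [10, 5, 1, 20, 5, 22, 4, 15];
numGroups = 4;
n = numel(data);
target = sum(data) / numGroups;         % ideal sum per group (20.5 here)

bestCost = inf;
bestLabels = [];
for k = 0 : numGroups^n - 1
    % The base-numGroups digits of k give one group label (1..numGroups)
    % per element, so every possible assignment is visited exactly once.
    labels = mod(floor(k ./ numGroups.^(0:n-1)), numGroups) + 1;
    groupSums = accumarray(labels(:), data(:), [numGroups, 1]);
    cost = mean(abs(groupSums - target));   % average absolute deviation
    if cost < bestCost
        bestCost = cost;
        bestLabels = labels;
    end
end

bestSums = accumarray(bestLabels(:), data(:), [numGroups, 1]);
for g = 1 : numGroups
    fprintf('Group %d: %s (sum = %d)\n', ...
        g, mat2str(data(bestLabels == g)), bestSums(g));
end
```

For the example data the best achievable group sums are 20, 20, 20, and 22, which is what the grouping proposed in the question gives: the 22 has to sit in a group by itself, and the remaining 60 splits evenly three ways. Which group number each set receives depends on the enumeration order.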
