How to split data set into multiple bins and perform condition statement on bins
I have a logical data set that I am trying to divide into 31 bins, but my data points do not distribute evenly across the bins. I need the 31 bins so that I can run an if statement that counts the total number of ones in each bin and compares that count to a condition.
Answers (1)
Use the discretize() command to assign bin labels to each of your data points.
11 comments
Matt J
on 24 Mar 2022
Also, splitapply() is useful for applying a function on data with a common bin label.
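As a minimal sketch of that idea, assuming each data point already has a bin label (the labels below are made up for illustration and are not from the original post), splitapply() can sum the ones in each bin:

```matlab
% Hypothetical data and bin labels, stored as column vectors.
Data      = [0 0 0 1 0 1 1 0 1 1 1 0]';
binLabels = [1 1 1 1 2 2 2 2 3 3 3 3]';

% Apply sum() separately to the elements that share a bin label.
numOnes = splitapply(@sum, Data, binLabels);
```

Note that splitapply() expects the labels to cover every integer from 1 up to the largest label, so it is best suited to cases where no bin is empty.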
Keaton Looper
on 24 Mar 2022
Matt J
on 24 Mar 2022
Which discretize() command would I use?
The one at the link I gave you.
Will it do it even if my data can’t be distributed evenly between the bins?
You will see on the documentation page, that you can specify the bin boundaries.
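One possible way to do this, sketched under the assumption that the bins should be consecutive runs of the data (the question does not actually say so), is to discretize the element positions using explicitly specified edges for 31 bins. The example data and variable names are illustrative only.

```matlab
% Illustrative logical data; replace with the real data set.
Data = mod(1:100, 3) == 0;
N = numel(Data);
numBins = 31;

% 32 edges define 31 bins over the positions 1..N.
% When N is not a multiple of 31, bin sizes differ by at most one element.
edges = linspace(1, N + 1, numBins + 1);
binLabels = discretize(1:N, edges);

% Count the ones in each bin; the size argument keeps all 31 rows.
numOnes = accumarray(binLabels(:), double(Data(:)), [numBins 1]);
```

With this choice of edges every position falls inside some bin, so the data does not need to divide evenly by 31.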
Keaton Looper
on 25 Mar 2022
Keaton Looper
on 25 Mar 2022
Let’s say I have a data set Data = [0 0 0 1 0 1 1 0 1 1 1 0] and I wanted to split that data set into 3 smaller data sets.
According to what rule are the elements to be assigned to different bins in this example? I thought that was the essence of your question. If I were to assign them to bins randomly, I could proceed as follows:
Data = [0 0 0 1 0 1 1 0 1 1 1 0];
binLabels=randi(3,1,numel(Data))       % random bin label (1 to 3) for each element
numOnes=accumarray(binLabels',Data')   % number of ones in each bin
As a check on this, we can form the 3 groups by doing,
groups = splitapply(@(x){x},Data,binLabels )
You can see that the per-group tallies in numOnes are accurate.
and if it is greater than or equal to 1 then count that event as a one. Then count the total number of occurrences.
Simply do,
overallCount = sum(numOnes>=1)
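Since the original question asked for an if statement, here is a hedged sketch of the same count written as an explicit loop. The bin labels and the name threshold are assumptions chosen for illustration; the one-line vectorized form should give the same answer.

```matlab
Data      = [0 0 0 1 0 1 1 0 1 1 1 0];
binLabels = [1 1 1 1 2 2 2 2 3 3 3 3];   % hypothetical labels
threshold = 1;                           % condition: at least this many ones

overallCount = 0;
for k = 1:max(binLabels)
    % Number of ones among the elements assigned to bin k.
    onesInBin = sum(Data(binLabels == k));
    if onesInBin >= threshold
        overallCount = overallCount + 1;
    end
end
```

The loop makes it easy to swap in a more elaborate condition per bin, at the cost of being longer than the accumarray() version.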
Keaton Looper
on 25 Mar 2022
Keaton Looper
on 25 Mar 2022
So my example should give me a count of 3.
No, that would depend on how the data is split into subsets. Even without randomization, a subset may contain no ones, as in the following example, where the subsets are sequential.
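Data = [0 0 0 1 0 1 1 0 1 1 1 0];
binLabels=[1 1 1 2 2 2 2 3 3 3 3 3];   % sequential subsets; the first contains no ones
numOnes=accumarray(binLabels',Data')   % number of ones in each subset
overallCount = sum(numOnes>=1)         % number of subsets with at least one 1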
Once again, because you haven't specified the details of the processing with enough care and detail, we get a result you don't expect. Perhaps you meant for the subsets to be interleaved. That does give your expected result:
binLabels=[1 2 3 1 2 3 1 2 3 1 2 3];   % interleaved subsets
numOnes=accumarray(binLabels',Data')   % number of ones in each subset
overallCount = sum(numOnes>=1)         % number of subsets with at least one 1
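For longer data sets, writing the labels out by hand is impractical. The following is a rough sketch of how both kinds of labels might be generated for K subsets; the formulas and variable names are assumptions, not something given in the thread. The sequential rule here produces runs of equal length, so it differs from the hand-written sequential labels above.

```matlab
Data = [0 0 0 1 0 1 1 0 1 1 1 0];
N = numel(Data);
K = 3;   % number of subsets

% Sequential: consecutive runs of (nearly) equal length.
seqLabels = ceil((1:N) * K / N);

% Interleaved: elements are dealt out to the subsets in turn.
intLabels = mod(0:N-1, K) + 1;

% Number of subsets containing at least one 1, for each rule.
countSeq = sum(accumarray(seqLabels', Data') >= 1);
countInt = sum(accumarray(intLabels', Data') >= 1);
```

Which rule is appropriate depends entirely on how the subsets are meant to be defined, which is the point made above.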