
Extracting data from histogram plots

Haley Royer on 10 Mar 2023
Edited: Adam Danz on 11 Mar 2023
Hello. I'm trying to process some data from some chemical analyses I did a while ago. I have 3 types of data: particle diameter, nitrogen content (%), and sulfur content (%). I've already managed to organize the particle diameter data into a histogram plot with something like 50 bins. Now, I'd like to figure out the average nitrogen and sulfur content of the particles in each bin. I'm not sure how to do this, though, and I haven't found any obvious tutorials to explain how to do this. Any advice?

Accepted Answer

Adam Danz on 10 Mar 2023
Edited: Adam Danz on 11 Mar 2023
3 methods to group data and compute mean for each group
Each method deals with empty bins differently.
discretize + splitapply
Use discretize to assign each value to one of the bins used in the histogram, then use splitapply to compute the mean of each group. Note that splitapply requires every bin to contain at least one data point; otherwise it throws an error.
Example: compute the mean of data in bins defined by edges.
rng default % for reproducibility of this demo
data = rand(1,100)*100;
edges = 0:10:100;
binID = discretize(data,edges)
binID = 1×100
9 10 2 10 7 1 3 6 10 10 2 10 10 5 9 2 5 10 8 10 7 1 9 10 7 8 8 4 7 2
a = splitapply(@mean,data,binID)
a = 1×10
5.3838 15.2780 26.0259 35.6310 46.5284 55.4195 66.1338 75.5438 83.3041 94.1885
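For the original question, the bins come from one variable (particle diameter) while the averages are taken over other variables (nitrogen and sulfur content). The following is a minimal sketch under assumptions: the vectors diameter, nitrogen, and sulfur are hypothetical names filled with synthetic values, and the edges are chosen arbitrarily. Zeros in the measured values are not a problem here, because only the group numbers passed as the last argument must be positive integers.

```matlab
rng default                           % reproducible synthetic data
n = 200;
diameter = rand(n,1)*100;             % hypothetical particle diameters
nitrogen = max(randn(n,1) + 1, 0);    % hypothetical N content (%), contains zeros
sulfur   = max(randn(n,1) + 0.5, 0);  % hypothetical S content (%), contains zeros

edges = 0:10:100;                     % use the same edges as the histogram
binID = discretize(diameter, edges);  % bin index per particle, from diameter only

% Average the other variables within each diameter bin.
% splitapply needs every bin to be occupied, which holds for this data.
meanN = splitapply(@mean, nitrogen, binID);
meanS = splitapply(@mean, sulfur,   binID);
```

Here meanN(k) and meanS(k) are the mean nitrogen and sulfur content of the particles whose diameter falls in bin k.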
discretize + groupsummary
Use discretize to group each value into the bins and then groupsummary to compute the mean of each group. When working with vectors, the first two arguments must be column vectors.
Note that the output vector skips empty bins. Request the second output of groupsummary to identify which bins are represented in the first output.
s = groupsummary(data(:),binID(:),'mean')
s = 10×1
5.3838 15.2780 26.0259 35.6310 46.5284 55.4195 66.1338 75.5438 83.3041 94.1885
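As a sketch of how the second output can be used, the example below requests the group labels from groupsummary and uses them to place each mean back into a vector with one element per bin. The data values are synthetic and chosen so that several bins are empty; the variable name binMeans is introduced only for illustration.

```matlab
data  = [12 18 47 52 58 91]';         % synthetic values; bins 1, 3, 4, 7, 8, 9 are empty
edges = 0:10:100;
binID = discretize(data, edges);      % bin index of each value

% s holds one mean per occupied bin, g holds the matching bin numbers
[s, g] = groupsummary(data, binID, 'mean');

% Expand to one entry per bin, leaving NaN where a bin has no data
binMeans = NaN(numel(edges)-1, 1);
binMeans(g) = s;
```

Using NaN as the placeholder keeps empty bins distinguishable from bins whose mean is genuinely 0.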
discretize + accumarray
Use discretize to group each value into the bins and then accumarray to compute the mean of all bins.
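Note that empty bins are represented by a 0, which is accumarray's default fill value, so an empty bin cannot be told apart from a bin whose mean is actually 0.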
m = accumarray(binID(:),data,[],@mean)
m = 10×1
5.3838 15.2780 26.0259 35.6310 46.5284 55.4195 66.1338 75.5438 83.3041 94.1885
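If zeros are valid measurements, one possible variation is to pass an explicit output size and a fill value to accumarray, so that empty bins come back as NaN and trailing empty bins are not dropped. This is a minimal sketch with synthetic data; nBins is a name introduced here for illustration.

```matlab
data  = [12 18 47 52 58]';            % synthetic; bins 1, 3, 4 and 7 to 10 get no values
edges = 0:10:100;
binID = discretize(data, edges);      % bin index of each value
nBins = numel(edges) - 1;

% The size argument forces one row per bin; the last argument fills empty bins with NaN
m = accumarray(binID, data, [nBins 1], @mean, NaN);
```

Without the size argument the output length equals the largest bin index that occurs, so empty bins at the upper end would be missing from m.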
Comparison of these methods when some bins are empty
data = randn(100,1)+10; % expected range: ~6 : ~13
edges = 0:3:15; % 5 bins but the first two will be empty
binID = discretize(data, edges);
m = accumarray(binID,data,[],@mean)
m = 5×1
0 0 8.4699 10.1766 12.7170
s = groupsummary(data,binID(:),'mean')
s = 3×1
8.4699 10.1766 12.7170
a = splitapply(@mean,data,binID)
Error using splitapply
For N groups, every integer between 1 and N must occur at least once in the vector of group numbers.
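One way to keep using splitapply when some bins are empty is to renumber the occupied bins with findgroups first. The sketch below assumes the same kind of data as in the comparison above (the random values are regenerated, so the numbers will differ from the output shown) and also drops any values that fall outside the edges, for which discretize returns NaN. The names valid, occupied, and binMeans are introduced for illustration.

```matlab
rng default
data  = randn(100,1) + 10;            % synthetic data, roughly 6 to 14
edges = 0:3:15;                       % 5 bins; the first two stay empty
binID = discretize(data, edges);

valid = ~isnan(binID);                        % ignore values outside the edges, if any
[G, occupied] = findgroups(binID(valid));     % G runs 1..K over the occupied bins only
a = splitapply(@mean, data(valid), G);        % one mean per occupied bin

% Map the means back to the original bin numbers, NaN for empty bins
binMeans = NaN(numel(edges)-1, 1);
binMeans(occupied) = a;
```

The vector occupied lists the original bin numbers that contain data, which makes it possible to match each element of a to its bin.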
7 Comments
Haley Royer on 11 Mar 2023
Hi again. I've run into another issue. When trying to use splitapply, I get the following error:
"Group numbers must be a vector of positive integers, and cannot be a sparse vector."
My understanding is that because I have values in my column that are zero, splitapply cannot be used. Some of the particles I'm looking at don't have nitrogen or sulfur, but I still have to average a group such as 0 0 0 5.0 2.5. Any way to get around this?
Adam Danz on 11 Mar 2023
Let's keep it civil here.
As you mentioned, if one of the bins has no values, then splitapply won't work.
I'll add alternatives to my answer.
