How to create a boxplot from a PDF?
Hello!
I have a somewhat embarrassing question, but my colleagues and I have not been able to figure it out for several days. Mental block ^^ So I would appreciate some help!
I have a PDF of my data called pdfxcor (598x1), which resembles a normal distribution when I plot it against an x-axis representing the molecular weight of my data (called pixelweight (598x1)).
plot(pixelweight,pdfxcor)
boxplot(pdfxcor)
I want to display the distribution as a boxplot positioned at the correct molecular weight.
Thanks for your patience! :)
Jette
Accepted Answer
Teja Muppirala
on 23 Apr 2013
How about something like this? Generate the CDF from your data as Tom suggested, invert it, use the inverted CDF to generate a bunch of samples that follow your distribution, and send those to BOXPLOT:
%% Just making some data that resembles yours
x = linspace(1000,12000,598);
P = normpdf(x,5800,1800);
figure, plot(x,P), title('PDF');
%% Generate the CDF (cumulative sum, normalized to end at 1)
C = cumsum(P);
C = C/C(end);
figure, plot(x,C), title('CDF');
%% Sample linearly along the inverse CDF to get a bunch of points
% that have the same distribution as your data
BigNumber = 100000;
p = interp1(C,x,linspace(C(1),C(end),BigNumber));
figure, hist(p,100); % Confirm p indeed has your distribution
figure, h = boxplot(p);
delete(findobj(h,'tag','Outliers')) % Hide the outliers
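Since the question asks for the box to sit at the correct molecular weight, one possible variation is to draw the boxplot horizontally underneath the PDF so both share the same x-axis. This is only a sketch under assumptions: it reuses the stand-in data from above (replace x and P with pixelweight and pdfxcor), it needs the Statistics Toolbox for normpdf and boxplot, and the two-panel layout is my own choice, not something from the original question.

```matlab
% Stand-in data as above; replace x and P with pixelweight and pdfxcor
x = linspace(1000, 12000, 598);
P = normpdf(x, 5800, 1800);

% CDF and evenly spaced samples along its inverse
C = cumsum(P);
C = C / C(end);
p = interp1(C, x, linspace(C(1), C(end), 100000));

figure
ax1 = subplot(2,1,1);
plot(x, P);
ylabel('PDF');

ax2 = subplot(2,1,2);
% Horizontal box so its position reads off the molecular-weight axis;
% an empty 'Symbol' suppresses the outlier markers
boxplot(p, 'Orientation', 'horizontal', 'Symbol', '');
xlabel('Molecular weight');

% Keep both panels on the same x-range
linkaxes([ax1 ax2], 'x');
xlim(ax1, [x(1) x(end)]);
```

Note that interp1 needs the CDF values to be distinct, so a measured PDF containing zeros may need its repeated CDF values removed first.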
4 comments
Tom Lane
on 23 Apr 2013
It looks like your distribution is not symmetric. The normal distribution is symmetric, so it would not resemble the histogram in that respect.
More Answers (1)
Tom Lane
on 22 Apr 2013
The boxplot shows the median, lower quartile, and upper quartile. You may be able to calculate these for your pdf. For example, if you have the pdf as a numeric vector, you might compute cumsum on the vector, then divide by the last value to impose the correct probability normalization, then interpolate.
The boxplot also shows a notion of the range of the data, and sometimes outliers. These are harder to extend to a pdf. You could decide that you want to compute the 1% and 99% points as in the previous paragraph, and use those to represent the end points of the range. You could decide not to show outliers.
Plotting these as lines or points will be relatively simple. It would be more of a challenge to plot them in exactly the way that the boxplot function does.
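The steps described above could be sketched roughly as follows, using only base MATLAB functions. The data here is a hypothetical stand-in (replace x and P with pixelweight and pdfxcor), the 1% and 99% points are the whisker choice suggested above, and the box height and line colors are arbitrary assumptions for illustration.

```matlab
% Hypothetical stand-in data; replace with pixelweight and pdfxcor
x = linspace(1000, 12000, 598).';
P = exp(-0.5*((x - 5800)/1800).^2);   % unnormalized bell curve

% Cumulative sum, divided by the last value so it ends at 1
C = cumsum(P);
C = C / C(end);

% interp1 needs distinct sample points, so drop repeated CDF values
[Cu, iu] = unique(C);
xu = x(iu);

% Interpolate the 1%, 25%, 50%, 75% and 99% points
q = interp1(Cu, xu, [0.01 0.25 0.50 0.75 0.99]);

% Draw a horizontal box by hand on the molecular-weight axis
yc = 0;      % vertical center of the box
hgt = 0.2;   % half-height of the box
figure; hold on
plot([q(2) q(4) q(4) q(2) q(2)], yc + hgt*[-1 -1 1 1 -1], 'b-'); % quartile box
plot([q(3) q(3)], yc + hgt*[-1 1], 'r-');                        % median
plot([q(1) q(2)], [yc yc], 'k-');                                % lower whisker
plot([q(4) q(5)], [yc yc], 'k-');                                % upper whisker
hold off
ylim([-1 1]);
xlabel('Molecular weight');
```

Because the quantiles are read directly off the CDF, no resampling is needed, at the cost of drawing the box manually instead of with the boxplot function.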