How should I try to fit this probability distribution?

I have plotted the kernel density estimate for my scatter of data. One tail is fairly Gaussian in appearance; the other is skewed. I have tried fitting a gamma distribution, a skew-normal distribution, a log-normal distribution, and a high-order polynomial, but all of these are very poor fits. I would like to be able to reconstruct the PDF so that I can extract the left-tail value corresponding to a CDF value of about 1E-17. Any ideas on how to proceed, perhaps even without explicitly fitting some type of PDF?
5 comments
Ameer Hamza
on 15 June 2020
What is your goal? Do you want to generate random numbers using this distribution?
Jeff Miller
on 15 June 2020
How many data points do you have?
Joseph Fustero
on 15 June 2020
John D'Errico
on 15 June 2020
extract the left tail value corresponding to a CDF value of about 1E-17.
This is incredibly optimistic, given that you have only 10000 points in the sample, and all the more so since you have no idea what distribution would make sense here. I.e., the left-hand side "looks" like X, and the right-hand side looks sort of...
When that is the case, a nonparametric scheme would seem most logical. (NEVER USE A HIGH-ORDER POLYNOMIAL!!!!!! Certainly not for something like this.) Regardless, you have no realistic chance of predicting that far into the tail. Essentially, there is little you can meaningfully predict beyond roughly the 0.0001 point in either tail, since past that point you are extrapolating. And going down to 1e-17 is a serious amount of extrapolation: 13 powers of 10 past the point where you can do something intelligent?
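A minimal Python sketch (standard library only) of the purely empirical route shows why it cannot reach 1e-17. It assumes, hypothetically, 10,000 standard-normal draws in place of the real data, and the helper `empirical_quantile` is illustrative rather than anything from the thread. The inverse empirical CDF can only return one of the observed values, so every CDF level below 1/n = 1e-4 collapses onto the sample minimum, which sits far from the true 1e-17 quantile of the generating distribution (about -8.5).

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(sample, p):
    """Inverse empirical CDF: the smallest sample value x with F_n(x) >= p."""
    xs = sorted(sample)
    k = max(1, math.ceil(p * len(xs)))  # 1-based rank of the order statistic
    return xs[k - 1]


# Hypothetical stand-in for the poster's data: 10,000 standard-normal draws.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Every CDF level below 1/n = 1e-4 returns the same value: the sample minimum.
for p in (1e-5, 1e-10, 1e-17):
    print(p, empirical_quantile(data, p))

# The true 1e-17 quantile of the generating distribution, for comparison.
print("true:", NormalDist().inv_cdf(1e-17))
```

Any method that goes further into the tail than the sample minimum has to add an assumption about the tail's shape; the data alone cannot supply it.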
Jeff Miller
on 16 June 2020
I certainly agree with John that a nonparametric scheme has no chance here since the number of data points is far too small to estimate the tiny percentile point that you want to identify. Ergo, the only way to proceed is by making some parametric assumption.
Since you are only interested in the left tail, I wonder whether you should be thinking about fitting just the lower tail instead of the whole distribution. If the part of the observed distribution below (say) 3 is well described by a normal distribution truncated at 3, for example, then you could use statistics from the truncated sample to estimate the parameters of the truncated normal distribution, and from there extract the percentile point you want.
It's very seat-of-the-pants and rests on untestable assumptions, but it may be better than nothing, and you don't have any good options.
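The truncated-normal idea can be sketched in Python (standard library only) under explicit, hypothetical assumptions. Below a threshold c the data are taken to follow a normal shape N(mu, sigma^2) truncated at c; (mu, sigma) are fitted by maximum likelihood to the points at or below c, using a crude compass search here in place of a proper optimizer; and the overall CDF in the tail is modelled as w * Phi((x - mu)/sigma) / Phi((c - mu)/sigma), where w is the observed fraction of data at or below c. All function names are illustrative, and the demo uses synthetic standard-normal data so the correct answer (about -8.5) is known.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def truncnorm_loglik(n, s1, s2, mu, sigma, c):
    """Log-likelihood (up to a constant) of a sample summarized by
    n, s1 = sum(x), s2 = sum(x**2) under N(mu, sigma^2) truncated to (-inf, c]."""
    if sigma <= 0:
        return -math.inf
    mass = NormalDist(mu, sigma).cdf(c)  # P(X <= c) for the untruncated normal
    if mass <= 0:
        return -math.inf
    ss = s2 - 2 * mu * s1 + n * mu * mu  # sum of (x - mu)^2
    return -n * math.log(sigma) - ss / (2 * sigma**2) - n * math.log(mass)


def fit_truncated_normal(tail, c, tol=1e-7):
    """Maximum-likelihood (mu, sigma) for data assumed to be N(mu, sigma^2)
    truncated above at c, found with a simple compass (pattern) search."""
    n, s1, s2 = len(tail), sum(tail), sum(x * x for x in tail)
    mu, sigma = fmean(tail), stdev(tail)  # crude start; truncation biases both
    best = truncnorm_loglik(n, s1, s2, mu, sigma, c)
    step = sigma
    while step > tol:
        moved = False
        for dmu, dsig in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = truncnorm_loglik(n, s1, s2, mu + dmu, sigma + dsig, c)
            if cand > best:
                best, mu, sigma, moved = cand, mu + dmu, sigma + dsig, True
                break
        if not moved:
            step /= 2  # no axis move improves the fit: refine the step size
    return mu, sigma


def lower_tail_quantile(p, mu, sigma, c, w):
    """x with F(x) = p, assuming F(x) = w * Phi((x-mu)/sigma) / Phi((c-mu)/sigma)
    for x <= c, where w is the fraction of all data at or below c (needs p < w)."""
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(p * dist.cdf(c) / w)


# Hypothetical synthetic data: standard normal, so the true answer is known.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
c = 0.0  # hypothetical threshold, playing the role of the "(say) 3" above
tail = [x for x in data if x <= c]
w = len(tail) / len(data)
mu, sigma = fit_truncated_normal(tail, c)
x_p = lower_tail_quantile(1e-17, mu, sigma, c, w)
print(mu, sigma, x_p, NormalDist().inv_cdf(1e-17))
```

The result is only as good as the truncated-normal assumption; nothing in a finite sample can confirm that this shape persists all the way down to a CDF of 1e-17.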
Answers (1)
Ameer Hamza
on 15 June 2020
Edited: Ameer Hamza
on 15 June 2020
2 votes
See the MATLAB documentation on fitting nonparametric probability distributions (i.e., where the PDF is not expressed explicitly): https://www.mathworks.com/help/stats/nonparametric-and-empirical-probability-distributions.html
Empirical: https://www.mathworks.com/help/stats/ecdf.html
Piecewise-linear: https://www.mathworks.com/help/stats/piecewise-distributions.html
Generalized Pareto: https://www.mathworks.com/help/stats/generalized-pareto-distribution.html
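The generalized Pareto link points to the usual peaks-over-threshold way of extrapolating a tail. Below is a rough Python sketch under stated assumptions: the excesses u - x of the points below a threshold u are assumed to follow a GPD, fitted here by the method of moments (a simple stand-in for maximum likelihood that is only valid for shape xi < 1/2), and the overall tail CDF is modelled as w times the GPD survival function, with w the fraction of data below u. The function names, the threshold u = -2, and the synthetic data (negated exponential draws, whose exact 1e-17 quantile is ln(1e-17), about -39.1) are all hypothetical.

```python
import math
import random
from statistics import fmean, variance


def fit_gpd_moments(excesses):
    """Method-of-moments (xi, sigma) for a generalized Pareto distribution.
    Only meaningful when the true shape xi < 1/2 (finite variance)."""
    m = fmean(excesses)
    r = m * m / variance(excesses)  # equals 1 - 2*xi for a GPD
    xi = 0.5 * (1.0 - r)
    sigma = 0.5 * m * (1.0 + r)     # equals m * (1 - xi)
    return xi, sigma


def gpd_lower_tail_quantile(p, u, xi, sigma, w):
    """x with F(x) = p, assuming F(x) = w * (1 + xi*(u - x)/sigma)**(-1/xi) for x < u,
    i.e. the excesses u - x follow a GPD and w = P(X < u). Needs 0 < p < w."""
    log_ratio = math.log(w / p)  # positive when p < w
    if abs(xi) < 1e-12:          # xi -> 0 limit: exponential tail
        return u - sigma * log_ratio
    return u - sigma / xi * math.expm1(xi * log_ratio)


# Hypothetical data whose left tail is exactly exponential: X = -E, E ~ Exp(1).
rng = random.Random(2)
data = [-rng.expovariate(1.0) for _ in range(100_000)]
u = -2.0                                   # hypothetical threshold
excesses = [u - x for x in data if x < u]  # how far each tail point lies below u
w = len(excesses) / len(data)
xi, sigma = fit_gpd_moments(excesses)
x_p = gpd_lower_tail_quantile(1e-17, u, xi, sigma, w)
print(xi, sigma, x_p, math.log(1e-17))
```

With a fit like this, small changes in the estimated shape xi move the extrapolated 1e-17 quantile a long way, which is the extrapolation problem raised in the comments.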