How should I try to fit this probability distribution?

Question

I have plotted the kernel density estimate for my scatter of data. One tail is fairly Gaussian in appearance, the other is skewed. I have tried fitting a gamma distribution, a skew-normal distribution, a log-normal distribution, as well as a high-order polynomial, but all of these are very poor fits. I would like to reconstruct the PDF so that I can extract the left-tail value corresponding to a CDF value of about 1E-17. Any ideas on how to proceed, perhaps even without explicitly fitting some type of PDF?

5 comments

John D'Errico on 15 June 2020

"extract the left tail value corresponding to a CDF value of about 1E-17."

This is incredibly optimistic, given that you had only 10000 points in the sample, and all the more so since you have no idea what distribution would make sense here. I.e., the left hand side "looks" like X, and the right hand side looks sort of...

When that is the case, a non-parametric scheme would seem most logical. (NEVER USE A HIGH ORDER POLYNOMIAL! Certainly not for something like this.) Regardless, you have no realistic chance of predicting that far into the tail. With 10000 points, there is essentially little meaningful you can predict beyond around the 0.0001 point in either tail, since past that you are extrapolating. And going down to 1e-17 is a serious amount of extrapolation: 13 powers of 10 past the point where you can do something intelligent.
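To put rough numbers on that argument, here is a minimal Python sketch using only the standard library. The standard normal sample is made up purely for illustration (it is not the poster's data), and the variable names are my own. It shows that an empirical CDF from 10000 points moves in steps of 1/10000, and that the smallest value you would typically observe sits nowhere near the 1e-17 quantile.

```python
import math
import random
from statistics import NormalDist

n = 10_000         # sample size quoted in the thread
target_p = 1e-17   # tail probability the question asks for

# An empirical CDF built from n points moves in steps of 1/n, so it cannot
# resolve tail probabilities smaller than that.
ecdf_resolution = 1.0 / n
decades_below_resolution = math.log10(ecdf_resolution) - math.log10(target_p)

# Illustration on synthetic standard normal data (an assumption, not the
# poster's data): compare the sample minimum with the true 1e-17 quantile.
rng = random.Random(1)
sample_min = min(rng.gauss(0.0, 1.0) for _ in range(n))
true_quantile = NormalDist().inv_cdf(target_p)

print(f"ECDF resolution:           {ecdf_resolution:g}")
print(f"decades of extrapolation:  {decades_below_resolution:.1f}")
print(f"smallest observed value:   {sample_min:.2f}")
print(f"true 1e-17 quantile:       {true_quantile:.2f}")
```

Even when the distribution is known exactly, the data never come close to the region of interest, so anything said about the 1e-17 point comes from the assumed model, not from the sample.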

Jeff Miller on 16 June 2020

I certainly agree with John that a nonparametric scheme has no chance here since the number of data points is far too small to estimate the tiny percentile point that you want to identify. Ergo, the only way to proceed is by making some parametric assumption.

Since you are only interested in the left tail, I wonder whether you should be thinking about fitting just the lower tail instead of the whole distribution. If the observed distribution below (say) 3 is well described by a normal distribution truncated at 3, for example, then you might use statistics from the truncated sample to estimate the parameters of the truncated normal distribution, and from there extract the percentile point you want.

It's very seat-of-the-pants and rests on untestable assumptions, but it may be better than nothing, and you don't have any good options.
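One way the truncated-normal idea might look in practice is sketched below in Python, standard library only. Everything specific here is an assumption for illustration: the synthetic data (Gaussian on the left, stretched on the right), the cut at 3, and the helper names `fit_upper_truncated_normal` and `left_tail_quantile`, which are hypothetical. A plain compass search stands in for a proper optimizer, and the tail model is rescaled by the observed fraction of points below the cut.

```python
import math
import random
from statistics import NormalDist, fmean

STD = NormalDist()  # standard normal, used for cdf and inv_cdf


def fit_upper_truncated_normal(tail, cut):
    """Maximum-likelihood (mu, sigma) for a normal seen only below `cut`."""
    m = len(tail)
    xbar = fmean(tail)
    ss = sum((x - xbar) ** 2 for x in tail)

    def nll(mu, log_sigma):
        # Negative log-likelihood, up to an additive constant.
        sigma = math.exp(log_sigma)
        mass = STD.cdf((cut - mu) / sigma)  # model probability of X < cut
        if mass <= 0.0:
            return math.inf
        sq = ss + m * (xbar - mu) ** 2      # equals sum of (x - mu)^2
        return m * log_sigma + sq / (2.0 * sigma**2) + m * math.log(mass)

    # Compass search over (mu, log sigma): try a step along each axis,
    # keep any improvement, halve the step when nothing improves.
    mu, log_sigma = xbar, 0.5 * math.log(ss / m)
    best, step = nll(mu, log_sigma), 0.5
    while step > 1e-7:
        improved = False
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = nll(mu + d_mu, log_sigma + d_ls)
            if trial < best:
                mu, log_sigma, best = mu + d_mu, log_sigma + d_ls, trial
                improved = True
        if not improved:
            step /= 2.0
    return mu, math.exp(log_sigma)


def left_tail_quantile(p, mu, sigma, cut, frac_below):
    """x whose full-distribution CDF equals p (requires p < frac_below)."""
    # Assumed tail model below the cut:
    #   F(x) = frac_below * Phi((x - mu) / sigma) / Phi((cut - mu) / sigma)
    mass = STD.cdf((cut - mu) / sigma)
    return mu + sigma * STD.inv_cdf(p / frac_below * mass)


# Synthetic stand-in data (an assumption): normal below 3, stretched above.
rng = random.Random(0)
n, cut = 10_000, 3.0
data = []
for _ in range(n):
    z = rng.gauss(0.0, 1.0)
    data.append(3.0 + (z if z < 0.0 else 2.5 * z))

tail = [x for x in data if x < cut]
mu_hat, sigma_hat = fit_upper_truncated_normal(tail, cut)
x_hat = left_tail_quantile(1e-17, mu_hat, sigma_hat, cut, len(tail) / n)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}, x(1e-17) = {x_hat:.3f}")
```

The fit only uses the count, mean, and sum of squares of the points below the cut, which is the sense in which statistics from the truncated sample determine the parameters. The extrapolated value is only as good as the assumption that the normal shape continues far beyond the data, and small errors in sigma are multiplied by roughly 8.5 at this depth, so treat the result as a guess with wide uncertainty.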
