
How to fit lognormal distribution to a dataset which contains some zero values?

43 views (last 30 days)
Payel
Payel on 10 Aug 2023
Edited: dpb on 13 Aug 2023
How to fit lognormal distribution to a dataset which contains some zero values?
  4 comments
Walter Roberson
Walter Roberson on 11 Aug 2023
Exact zeros for rainfall values are common. The overall dataset cannot be lognormal. To get any further with a lognormal distribution you would have to start doing calculations based upon absolute humidity or relative humidity measured multiple times over the day so that you could calculate "available water"
Image Analyst
Image Analyst on 12 Aug 2023
Please upload a screenshot of your distribution plotted.


Answers (2)

Walter Roberson
Walter Roberson on 10 Aug 2023
Edited: Walter Roberson on 10 Aug 2023
Don't do that?
There are a small number of possibilities in that situation:
  1. That the log-normal distribution is just the wrong model for the system and you should be choosing a different model instead.
  2. That the zeros are place holders for errors in the data. In such a case those measurements should be removed before trying to fit the data
  3. That the zeros are round-off for small measurements, perhaps due to limited precision of sensors. You will not be able to learn anything useful from those measurements, so you should remove them before trying to fit the data
  4. That the zeros are caused by noise in the system. In such a case, the log-normal model is not going to apply, but you might be able to obtain an approximation by removing the zeros (and negatives) before trying to fit the data
  5. That the zeros are correct points, representing locations where the parameters are negative infinity. I would imagine that there are several papers to be written about the physics of such a system, which would probably have deep connections to Bose-Einstein Condensates and to Planck Distances...
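For the cases above where dropping the zeros is appropriate, a minimal sketch of the fit might look like the following. The data are synthetic, and the values of mu, sigma, and the 10% placeholder rate are assumptions chosen only for illustration. The zeros here are inserted independently of the underlying value, as in case 2; if the zeros replace small values instead, deleting them biases the estimates.

```matlab
% Hypothetical data: lognormal draws with some exact zeros mixed in.
n = 10000;
x = exp(1 + 0.5*randn(n,1));     % lognormal with mu = 1, sigma = 0.5
x(rand(n,1) < 0.1) = 0;          % about 10% replaced by placeholder zeros

xpos = x(x > 0);                 % keep strictly positive values only

% For a lognormal, the MLE is the mean and standard deviation of log(x)
mu_hat    = mean(log(xpos));
sigma_hat = std(log(xpos), 1);   % second argument 1 normalizes by N
```

With the Statistics and Machine Learning Toolbox, lognfit(xpos) or fitdist(xpos,'Lognormal') should give essentially the same estimates.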
  1 comment
John D'Errico
John D'Errico on 10 Aug 2023
Edited: John D'Errico on 10 Aug 2023
Be careful.
If the zeros are just low values that were "rounded" off to zero, then simply removing them will be a problem. Essentially you are biasing the estimate, since they SHOULD have been really small values. You are now estimating the parameters of a censored sample.
If that is the case, then you probably need to use MLE for a left censored sample.
A comparable example might be to estimate the distribution parameters of a normal distribution, but where all of the negative numbers were simply discarded. For example:
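n = 100000;
x = randn(n,1);    % standard normal sample, true parameters (0,1)
x(x<0) = [];       % discard every negative value
mean(x)            % sample mean of what is left
ans = 0.8001
var(x)             % sample variance of what is left
ans = 0.3656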
As you should expect, any attempt to estimate the normal parameters (which here should be (0,1)) will fail, unless you treat this properly as a censored sample.
The point being, you want to understand where the zeros are coming from, and deal with them properly.
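A left-censored maximum likelihood fit could be sketched as below, under the assumption that every zero stands for a value that fell below a known detection limit L. The data are synthetic, and mu, sigma, L, and all variable names are illustrative choices, not values from the question. Each zero contributes log P(X < L) to the likelihood, and each observed value contributes the lognormal log-density. The sketch uses only base MATLAB functions (erfc, fminsearch).

```matlab
% Synthetic example: lognormal data where values below a detection
% limit L were recorded as 0. mu, sigma and L are assumed values.
mu = 0; sigma = 1; L = 0.5; n = 100000;
x = exp(mu + sigma*randn(n,1));
x(x < L) = 0;                        % small values "rounded" to zero

obs   = x(x > 0);                    % fully observed values
ncens = sum(x == 0);                 % number of left-censored values

% Naive fit after simply deleting the zeros (biased upward in mu)
mu_naive    = mean(log(obs));
sigma_naive = std(log(obs));

% Standard normal CDF written with erfc, so no toolbox is needed
Phi = @(z) 0.5*erfc(-z/sqrt(2));

% Negative log-likelihood; p = [mu, log(sigma)] keeps sigma positive.
nll = @(p) -( ncens*log(Phi((log(L) - p(1))/exp(p(2)))) ...
            + sum(-log(obs) - p(2) - 0.5*log(2*pi) ...
                  - 0.5*((log(obs) - p(1))/exp(p(2))).^2) );

% Minimize, starting from the naive estimates
p = fminsearch(nll, [mu_naive, log(sigma_naive)]);
mu_hat    = p(1);
sigma_hat = exp(p(2));
```

Comparing mu_naive with mu_hat shows the size of the bias from deleting the zeros. For real data, L would have to come from knowledge of the measurement process, such as the sensor resolution.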



dpb
dpb on 11 Aug 2023
Edited: dpb on 13 Aug 2023
One analysis technique for daily rainfall modeling divides the problem into two parts -- a "wet-day" model that predicts rainfall amounts for those days that rainfall occurs and an independent Markov or stochastic renewal model to predict the occurrence of the zero-rainfall days.
The Pearson Type-3 or the two-parameter gamma distributions have been able to do a reasonable job of modeling point location rainfall for wet-day amount predictions. There's extensive literature in the subject field...
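As a rough illustration of the two-part idea, the sketch below simulates a daily series and then estimates each part separately. Everything in it is an assumption made for the example: the first-order two-state Markov chain for occurrence, the transition probabilities, the gamma shape and scale, and the variable names. The gamma fit uses the method of moments so that it runs in base MATLAB; it is not a maximum likelihood fit.

```matlab
% Synthetic daily series (all parameter values are assumptions).
n   = 50000;
p01 = 0.3;                 % P(wet today | dry yesterday)
p11 = 0.6;                 % P(wet today | wet yesterday)
wet = false(n,1);
u   = rand(n,1);
for t = 2:n
    if wet(t-1)
        wet(t) = u(t) < p11;
    else
        wet(t) = u(t) < p01;
    end
end
% Gamma(shape 2, scale 5) amounts, built as a sum of two exponentials
amount = 5*(-log(rand(n,1)) - log(rand(n,1)));
rain   = amount .* wet;    % zero on dry days

% Part 1: occurrence model, estimated from transition counts
isWet   = rain > 0;
prev    = isWet(1:end-1);
next    = isWet(2:end);
p01_hat = sum(~prev & next) / sum(~prev);
p11_hat = sum( prev & next) / sum( prev);

% Part 2: wet-day amounts, gamma fit by the method of moments
w         = rain(isWet);
shape_hat = mean(w)^2 / var(w);
scale_hat = var(w) / mean(w);
```

With the Statistics and Machine Learning Toolbox, gamfit(w) would give maximum likelihood estimates of the shape and scale instead.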

Release

R2022a
