Generate normally distributed sample from data

Andrea C on 8 Dec 2019
Commented: Andrea C on 9 Dec 2019
Hi,
I have an array with many (>800000) rows. I want to select 51 values from one column to build a new array of 51 normally distributed values. The values range from 0 to 10.
How can I do that?
Thanks,
Andrea

Accepted Answer

Thiago Henrique Gomes Lobato
Edited: Thiago Henrique Gomes Lobato on 8 Dec 2019
I need to be careful not to start a discussion about how one actually defines a normal distribution, but assuming you don't need the data to satisfy an exact definition of normality, you can use the Anderson-Darling test. The idea is to randomly sample 51 points from your array and then check whether they look normal. To make it more robust, you can simply keep the sample with the highest p-value:
rng(33)
ArraySize = 80000;
A = rand(ArraySize,1);          % not normal
A(500:1000) = randn(501,1);     % normal
Founded = 0;
MaxIter = 1000;
Maxp = 0;
Ite = 1;
while ~Founded && Ite<MaxIter
    SampledIndex = randperm(ArraySize,51);  % Sample from your array
    Asampled = A(SampledIndex);
    [h,p] = adtest(Asampled);               % Check if normal
    % You can in theory uncomment the next line to stop at the first sample
    % that passes; I believe, however, that looking at the max p is more robust
    %Founded = ~h; % h is 0 when the null hypothesis (the data are normal) cannot be rejected
    if p>Maxp                               % Save the sample that got the closest
        BestAsoFar = Asampled;
        Maxp = p;
    end
    Ite = Ite+1;
end
histogram(BestAsoFar)
2 Comments
Walter Roberson on 8 Dec 2019
This looks like it cherry-picks samples to find a subset that is approximately normally distributed?
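A rough way to see the concern, as a sketch only: run the same kind of search on data that contains nothing normal at all and compare the p-value of a single random draw with the best p-value over many draws. The array A, the seed, and NTrials = 200 are assumptions made up for this illustration; adtest requires the Statistics and Machine Learning Toolbox and may warn when a p-value falls outside its tabulated range.

```matlab
rng(1)                                  % arbitrary seed, for reproducibility only
A = 10*rand(80000,1);                   % purely uniform data on 0 to 10 (hypothetical)
NTrials = 200;                          % assumed number of random draws
p = zeros(NTrials,1);
for k = 1:NTrials
    S = A(randperm(numel(A),51));       % random 51-point subset, no repeated indices
    [~,p(k)] = adtest(S);               % Anderson-Darling p-value for this subset
end
% The maximum over many draws can never be lower than a single draw
fprintf('first draw p = %.4f, best of %d draws p = %.4f\n', p(1), NTrials, max(p))
```

If the best p-value comes out much higher than a typical single draw, the selected subset says more about the search than about the underlying data.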
Andrea C on 9 Dec 2019
Great, it works.
This is exactly what I was looking for!


More Answers (1)

Walter Roberson on 8 Dec 2019
You can only do that if the column already contains normally distributed samples. In that case you could use randperm() to select the indices to extract.
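A minimal sketch of that randperm() approach, assuming the column really is normally distributed. The variable Data is a hypothetical stand-in for the real column, and its mean and standard deviation are invented for the example.

```matlab
rng(0)                                % fixed seed, for reproducibility only
Data = 5 + 1.5*randn(800000,1);       % hypothetical stand-in for the real column
N = 51;
idx = randperm(numel(Data), N);       % N distinct row indices, no repeats
Sample = Data(idx);                   % random subset, drawn from the same distribution
```

Because the indices are chosen without looking at the values, the subset is an unbiased sample of whatever distribution the column has.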
However, values in the range 0 to 10 are not normally distributed: normally distributed values have infinite tails in both directions. When you have a fixed finite range such as 0 to 10, then the closest you can get is a Beta distribution.
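As a rough sketch of the bounded alternative: a Beta sample lies in [0,1], so multiplying it by 10 gives values in the range 0 to 10. The shape parameters a = b = 4 are an assumption chosen only to get a symmetric, bell-like shape, not something taken from the data; betarnd requires the Statistics and Machine Learning Toolbox.

```matlab
rng(0)                             % fixed seed, for reproducibility only
a = 4; b = 4;                      % assumed shape parameters (symmetric, bell-like)
X = 10 * betarnd(a, b, 51, 1);     % Beta sample on [0,1], rescaled to [0,10]
histogram(X)                       % bounded on both sides, unlike a normal sample
```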

Release

R2019a
