Pick a value with some probability

Question

Francesco Pio on 21 May 2023


https://fr.mathworks.com/matlabcentral/answers/1969574-pick-a-value-with-some-probability

Edited: John D'Errico on 22 May 2023

Hello everyone. Suppose we have a Gaussian distribution of people's heights: values near the average are the most probable, and values farther from it are less probable. In the example, x contains some height samples and y contains the probability with which each element occurs.

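x = [1.50 1.60 1.70 1.70 1.70 1.80 1.90];   % height samples
mu = mean(x);                               % sample mean
s = std(x);                                 % sample standard deviation
y = normpdf(x, mu, s);                      % normal density evaluated at each sample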

Suppose we want to draw a random value from the Gaussian distribution while respecting these different probabilities. How can I do that?


John D'Errico on 21 May 2023

Edited: John D'Errico on 21 May 2023

I think you don't understand what a Gaussian distribution means. Computing the mean and standard deviation of those points does not imply that a Gaussian with those parameters has the same distribution as that set of heights. The mean and variance will be known, but those points do not follow a Gaussian.

As such, you are not taking those probabilities into account IF you use that normal PDF in y.

If instead, what you really want to do is sample from the given distribution, then you could use a discrete distribution, with the specific probabilities.
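A minimal sketch of what such a discrete distribution could look like, using base MATLAB only. The variable names (vals, counts, p, edges, sample), the fixed seed and the number of draws are illustrative assumptions, not part of the original question. It derives the probabilities from how often each height occurs in x and then samples by inverting the cumulative probabilities.

```matlab
% Sketch: sample from the empirical (discrete) distribution of x.
x = [1.50 1.60 1.70 1.70 1.70 1.80 1.90];

% Distinct heights and how often each one occurs in x.
[vals, ~, idx] = unique(x);          % vals = [1.5 1.6 1.7 1.8 1.9]
counts = accumarray(idx(:), 1).';    % counts = [1 1 3 1 1]
p = counts / sum(counts);            % empirical probabilities, summing to 1

% Inverse-CDF sampling: each uniform draw u selects the first value whose
% cumulative probability is at least u.
rng(0);                              % fixed seed (assumption, for repeatability)
n = 10;                              % number of draws (illustrative)
edges = cumsum(p);
edges(end) = 1;                      % guard against round-off in the last edge
u = rand(1, n);
k = arrayfun(@(ui) find(ui <= edges, 1, 'first'), u);
sample = vals(k);
disp(sample)
```

Every entry of sample is one of the original heights, and 1.70 is drawn with probability 3/7 while each other height is drawn with probability 1/7.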


Answer 1

Image Analyst on 22 May 2023


https://fr.mathworks.com/matlabcentral/answers/1969574-pick-a-value-with-some-probability#answer_1240964


Try randn with your desired mean and standard deviation:

r = mu + s * randn(1000, 1); % 1000 random numbers
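As a rough check of what this line produces, the sketch below repeats it with the data from the question and compares the sample moments with the targets. The fixed seed and the printed summary are assumptions added for illustration.

```matlab
x = [1.50 1.60 1.70 1.70 1.70 1.80 1.90];
mu = mean(x);
s  = std(x);

rng(1);                          % fixed seed, an assumption made for repeatability
r = mu + s * randn(1000, 1);     % 1000 draws from a normal with mean mu and std s

% The sample moments should be close to mu and s, but not equal to them.
fprintf('target mean %.4f, sample mean %.4f\n', mu, mean(r));
fprintf('target std  %.4f, sample std  %.4f\n', s, std(r));

% The draws are continuous, so it is very unlikely that any of them
% coincides exactly with a height that appears in x.
nOnGrid = sum(ismember(r, x));
fprintf('%d of %d draws equal one of the original heights\n', nOnGrid, numel(r));
```

Note that this samples from a normal distribution with the same mean and standard deviation as the data, not from the data themselves.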


Answer 2

John D'Errico on 22 May 2023


https://fr.mathworks.com/matlabcentral/answers/1969574-pick-a-value-with-some-probability#answer_1241644

Edited: John D'Errico on 22 May 2023

Please stop asking the same question. There is NO way to know what distribution any set of data comes from. You can use tools to fit a distribution to some data, but that does not prove it is the true distribution. And even when you do, for example, fit a normal distribution to your data, that won't ensure that random samples from that normal distribution have the same distribution as your data.

You CAN decide to use a discrete distribution. Here, for example, you have a sample where 1.70 arises 3 times as often as the others.

help datasample
 DATASAMPLE Randomly sample from data, with or without replacement.
    Y = DATASAMPLE(DATA,K) returns K observations sampled uniformly at random,
    with replacement, from the data in DATA.  If DATA is a vector, then Y is a
    vector containing K elements selected from DATA. If DATA is a matrix, then
    Y is a matrix containing K rows selected from DATA.  If DATA is an N-D
    array, DATASAMPLE samples along its first non-singleton dimension.  DATA
    may be a dataset array or table. Because the sample is taken with replacement, the
    observations that DATASAMPLE selects from DATA may be repeated in Y.
 
    Y = DATASAMPLE(DATA,K,DIM) returns a sample taken along dimension DIM of
    DATA. For example, if DATA is a matrix and DIM is 2, Y contains a
    selection of DATA's columns.  If DATA is a dataset array or table and DIM is 2, Y
    contains a selection of DATA's variables.  Use DIM to ensure sampling
    along a specific dimension regardless of whether DATA is a vector, matrix
    or N-dimensional array.
 
    Y = DATASAMPLE(DATA,K, 'PARAM1',val1, 'PARAM2',val2, ...) or Y =
    DATASAMPLE(DATA,K,DIM, 'PARAM1',val1, 'PARAM2',val2, ...) specifies
    optional parameter name/value pairs to control how DATASAMPLE creates the
    sample.  Parameters are:
 
       'Replace' - select the sample with replacement if REPLACE is true (the
                   default), or without replacement if REPLACE is false.  When
                   sampling without replacement, the observations that
                   DATASAMPLE selects from DATA are unique.
 
       'Weights' - create a weighted sample using the positive weights in
                   the vector W.
 
    [Y,I] = DATASAMPLE(...) returns an index vector indicating which values
    were sampled from DATA.  For example, Y = DATA(I) if DATA is a vector,
    Y = DATA(I,:) if DATA is a matrix, etc.
 
    DATASAMPLE uses RANDPERM and RANDI to generate random values and therefore
    changes the state of MATLAB's global random number generator.  Control
    that generator using RNG.
  
    Y = DATASAMPLE(S,...) uses the random number stream S for random number
    generation.
 
    Examples:
 
    Draw five unique values from the integers 1:10.
       y = datasample(1:10,5,'Replace',false)
 
    Generate a random sequence of the characters ACGT, with replacement,
    according to specified probabilities.
       seq = datasample('ACGT',48,'Weights',[0.15 0.35 0.35 0.15])
 
    Select a random subset of columns from a data matrix.
       X = randn(10,1000);
       Y = datasample(X,5,2,'Replace',false)
 
    Resample observations from a dataset array to create a bootstrap
    replicate dataset.
       load hospital
       y = datasample(hospital,size(hospital,1))
    
    Use the second output to sample "in parallel" from two data vectors.
       x1 = randn(100,1);
       x2 = randn(100,1);
       [y1,i] = datasample(x1,10)
       y2 = x2(i)
 
    See also RAND, RANDI, RANDPERM, RNG.

    Documentation for datasample
       doc datasample

    Other uses of datasample

       tall/datasample

x = [1.50 1.60 1.70 1.70 1.70 1.80 1.90];
xsam = datasample(x,20)
xsam = 1×20
    1.7000    1.7000    1.7000    1.8000    1.5000    1.8000    1.7000    1.8000    1.8000    1.8000    1.7000    1.7000    1.6000    1.6000    1.9000    1.7000    1.7000    1.6000    1.7000    1.9000

So the vector xsam contains ONLY elements which lie in the original data set. And in the long run they will appear with the same relative frequencies as in x. It is my guess this is what you want.

histogram(datasample(x,1000000),'Normalization','pdf')
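To make the "same relative frequency" point concrete, here is a small sketch that compares two ways of calling datasample: uniformly over the seven entries of x, and with explicit 'Weights' over the five distinct heights. It assumes the Statistics and Machine Learning Toolbox is available; the variable names, seed and sample size are illustrative.

```matlab
% Requires the Statistics and Machine Learning Toolbox (datasample).
x = [1.50 1.60 1.70 1.70 1.70 1.80 1.90];

[vals, ~, idx] = unique(x);                 % distinct heights
counts = accumarray(idx(:), 1).';           % [1 1 3 1 1]
p = counts / sum(counts);                   % empirical probabilities

rng(2);                                     % fixed seed (assumption)
n = 1e5;
a = datasample(x, n);                       % uniform over the 7 entries of x
b = datasample(vals, n, 'Weights', p);      % weighted over the 5 distinct values

% Relative frequency of each distinct value in both samples.
freqA = arrayfun(@(v) mean(a == v), vals);
freqB = arrayfun(@(v) mean(b == v), vals);

% Rows: value, empirical probability, frequency in a, frequency in b.
disp([vals; p; freqA; freqB]);
```

Both approaches should give frequencies near 1/7 for most heights and near 3/7 for 1.70; the weighted form is the one to use when the probabilities are specified directly rather than implied by repeated entries.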

But if you use a normal approximation to that distribution, the relative frequencies of the data will not match the original data set at all well.

[muhat,sigmahat] = normfit(x)

muhat = 1.7000

sigmahat = 0.1291

fplot(@(x) normpdf(x,muhat,sigmahat),[1.4,2])

hold on

histogram(x,5,'Normalization','pdf')

As you can see, a normal pdf fits that data like I fit into the suit I wore when I got married.
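The mismatch can also be put in numbers. The sketch below compares the empirical probability of each height with the probability the fitted normal assigns to a small bin around that height. The bin half-width h = 0.05 is an assumption (half the spacing between the heights), and the normal CDF is written with erfc so that no toolbox is needed.

```matlab
x = [1.50 1.60 1.70 1.70 1.70 1.80 1.90];
mu = mean(x);
sigma = std(x);

% Empirical probability of each distinct height.
[vals, ~, idx] = unique(x);
pEmp = accumarray(idx(:), 1).' / numel(x);

% Standard normal CDF expressed through erfc (base MATLAB).
Phi = @(z) 0.5 * erfc(-z / sqrt(2));

% Probability the fitted normal puts on a bin of half-width h around each value.
h = 0.05;   % assumed half-width: half the 0.10 spacing of the data
pNorm = Phi((vals + h - mu) / sigma) - Phi((vals - h - mu) / sigma);

% Rows: value, empirical probability, probability under the fitted normal.
disp([vals; pEmp; pNorm]);
```

Under these assumptions the fitted normal gives the bin around 1.70 a probability of roughly 0.30, against 3/7 (about 0.43) in the data, and it also places some probability outside the range of observed heights.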
