Random vector v from a uniform distribution on (0,1) with sum(v)=1

Hello,
How can I generate a uniformly distributed random vector whose elements sum to 1?
Thank you

Accepted Answer

Too many people think that generating a uniform sample and then normalizing by the sum will produce a uniform sample. In fact, this is NOT true.
A good way to visualize this is to generate such a sample in the 2-d case. For example, suppose we do it the wrong way first:
xy = rand(100,2);
plot(xy(:,1),xy(:,2),'.')
Now, let's do the sum projection that virtually everyone proposes. (Yes, it is the obvious choice. Now we will see why it is the wrong approach.)
xys = bsxfun(@rdivide,xy,sum(xy,2));
hold on
plot(xys(:,1),xys(:,2),'ro')
axis equal
axis square
The sum-projected points lie along the diagonal line. Note the distribution seems to be biased towards the middle of the line. A uniform sample would have points uniformly distributed along that line.
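The bias is easy to quantify. As a sketch (the variable names here are illustrative, not from the thread), one can histogram the position of each normalized point along the line, parameterized by t = x1/(x1+x2):

```matlab
% Position along the line x1+x2 = 1, parameterized by t = x1/(x1+x2).
xy = rand(100000,2);
t = xy(:,1) ./ sum(xy,2);
edges = 0:0.02:1;
counts = histc(t, edges);
plot(edges(1:end-1), counts(1:end-1), 'b.-')
% The counts peak at t = 0.5 and fall off toward 0 and 1. For iid
% uniform x1, x2 the density of t works out to 1/(2*(1-t)^2) for
% t <= 1/2 and 1/(2*t^2) for t >= 1/2: it is 2 at the center and
% only 1/2 at the ends, so the projected sample is clearly not flat.
```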
In a low number of dimensions there are some nice tricks to generate a sample that is indeed uniform. I tend to use Roger Stafford's submission to the file exchange, randfixedsum. It is efficient, and works in any number of dimensions.
figure
xyr = randfixedsum(2,100,1,0,1)';
plot(xyr(:,1),xyr(:,2),'ro')
axis equal
axis square

17 comments

I think this is a good illustration of something that is counter-intuitive, though I don't know that it really explains why this counter-intuitive result happens. Why does a sample taken from a uniform distribution, divided by a constant, end up no longer uniform?
For small dimensions you are right; however, in larger dimensions there is no such problem. Still, the information you provided was something I didn't know, and it is very useful.
Thank you.
Youssef Khmou on 14 Mar 2014
Edited: Youssef Khmou on 14 Mar 2014
Image Analyst's question is reasonable; in fact, intuition suggests that multiplication by a constant yields a sort of translation (of the segment [a,b] of U)....
To John D'Errico: why can't we say that nothing happens to the distribution, that it stays the same, and that multiplication merely flattens/widens or narrows/heightens it?
Isn't the reason that you don't get a uniform distribution basically that the chance of getting a value close to the mean is higher? You can compare it to throwing two dice and calculating the sum of their values. The chance of getting a 7 is much higher than of getting a 12, because you can get a 7 with a 3 and a 4, or a 2 and a 5, etc., while you can only get a 12 with two 6s. See: http://calculus-geometry.hubpages.com/hub/How-to-Compute-the-Probability-of-Rolling-a-Sum-with-Two-Dice
Exactly, Paul. In the limiting case, with X a vector of uniformly distributed components, sum(X) approaches a normal distribution.
Paul - That is one way of looking at it. Another way of looking at it is to look at the first plot I generated. There are fewer points in the 2-d original sample that map into a point on one of the ends of the line than those that map to a point at the center.
Or for another way that is asymptotically what Walter suggested, consider that the sum of two uniform random variables has a triangular distribution.
In any case, the point is that a normalization scheme of dividing by the sum will not yield a uniform sample in the projected subspace.
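As a quick numerical sketch of the triangular distribution mentioned above (variable names are illustrative):

```matlab
% The sum of two independent U(0,1) variables has a triangular
% density on (0,2), peaking at 1: the convolution of two boxes.
s = sum(rand(100000,2), 2);
edges = 0:0.05:2;
counts = histc(s, edges);
plot(edges(1:end-1), counts(1:end-1), 'ro-')
% The counts rise roughly linearly up to s = 1, then fall back down.
```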
I'm curious why a single dimension, uniformly distributed array of values ceases to be uniform when scaled with a linear transformation. Relative spacing between points is preserved. The "shape" of the data is unchanged, just the scale changes, no?
Your example is averaging pairs of points in a two-dimensional distribution, which will obviously tend toward the mean of the distribution. I don't see how that would happen when no averaging is being done.
Would you mind explaining?
The PDF of a sum of two random variables is the convolution of the two individual PDFs. So if you take two uniform variables and convolve them, you get a triangle, which you can see in the red circles in John's plot above. Of course, by the Central Limit Theorem, if you do it for tons of rv's you get a normal distribution, as Walter mentioned. You can observe this triangle for the simple case of the sum of a pair of dice, as in the sample below.
% Roll a pair of dice a million times.
d = randi(6, 1000000,2);
s = sum(d,2); % Sum the two dice
% Get distribution of the sums.
edges = 2:12;
counts = histc(s, edges);
% Plot distribution of the sum, which will be a triangle
% which is the convolution of two uniformly distributed rv's.
plot(counts, 'ro-')
But when you just have a single rv and just divide the values by the sum so that they sum to 1 instead of what they used to sum to, I don't think the shape of the PDF will change. This little script seems to back that up:
% Roll one die 100,000 times.
d = randi(6, 100000, 1);
% Find the sum
theSum = sum(d)
% Get distribution of the rolls.
edges = 1:6;
counts = histc(d, edges);
max(counts)
% Plot distribution of the uniformly distributed rv's.
subplot(1,2,1);
plot(counts, 'ro-')
ylim([0, 1.5*max(counts)]);
title('Original', 'FontSize', 15);
grid on;
% Normalize by dividing by theSum so that new sum will = 1
d2 = d / theSum;
% Find the sum
theSum2 = sum(d2)
% Get distribution of the rolls.
edges2 = [1:6] / theSum;
counts2 = histc(d2, edges2);
max(counts2)
% Plot distribution of the uniformly distributed rv's.
subplot(1,2,2);
plot(counts2, 'ro-')
ylim([0, 1.5*max(counts2)]);
title('Normalized', 'FontSize', 15);
grid on;
% d and d2 are different but the PDFs (counts and counts2) are the same.
So I guess I can see Benjamin's point and would like clarification from John.
Matt J on 14 May 2014
Edited: Matt J on 14 May 2014
Another way to understand it (in 2D) is to remember that the random vector [x1,x2]=rand(1,2) is drawn uniformly from the unit square, but the line segments intersecting the unit square are not all of equal length. The length of these segments affects the probability mass of each outcome for the normalized random vector v=[x1,x2]/(x1+x2).
As an example, consider the case where v=[.5,.5]. This result for v is obtained by any x1=x2, i.e. any pair on the main diagonal of the unit square, which has length sqrt(2).
Conversely, for v=[.8,.2], any pair (x1,x2) in the unit square and on the line x2=x1/4 will achieve this v. However, this line segment only has length 1.0308<sqrt(2). I.e., it has lower probability mass than v=[0.5,0.5] does.
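These chord lengths can be checked with a small sketch (the formula below is a straightforward geometric derivation, not something posted in the thread):

```matlab
% Chord length of the line x2 = x1*(1-a)/a inside the unit square,
% where a = v(1) is the first coordinate of the normalized vector v.
a = 0.8;                 % corresponds to v = [.8, .2]
m = (1-a)/a;             % slope of the line through the origin
if m <= 1
    L = sqrt(1 + m^2);   % line exits the square through x1 = 1
else
    L = sqrt(1 + 1/m^2); % line exits the square through x2 = 1
end
disp(L)                  % 1.0308 for a = 0.8; sqrt(2) for a = 0.5
```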
I'm curious why a single dimension, uniformly distributed array of values ceases to be uniform when scaled with a linear transformation. Relative spacing between points is preserved. The "shape" of the data is unchanged, just the scale changes, no?
@Ben,
I'm not sure where the notion that this is a linear transformation is coming from. We start with a sequence of i.i.d uniformly distributed variables x(i), i=1...N and transform to the non-i.i.d variables
v(j)=x(j)/(x(1)+x(2)+...x(N)), j=1,2,...N
The right hand side of the above is a highly nonlinear, coupled function of the x(i).
Perhaps the idea was to view 1/sum(x) like a simple scaling constant? We can't. It is a random variable, dependent in part on the very x(j) that we are scaling.
I'm not sure that it is correct to view x as a random variable, since x is fully known and unchanging once it has been computed. I believe at that point, it is merely a distribution and can be scaled without affecting uniformity.
I've had some trouble finding online documentation specifically regarding this, but here's an excerpt (and link) from a JMP page on the subject:
*****
Random Uniform
Generates random numbers uniformly between 0 and 1. This means that any number between 0 and 1 is as likely to be generated as any other. The result is an approximately even distribution. You can shift the distribution and change its range with constants. For example, 5 + Random Uniform()*20 generates uniform random numbers between 5 and 25.
*****
The MATLAB documentation claims that rand() produces an approximately uniform distribution. It would stand to reason that this distribution should also maintain its uniformity if shifted or scaled. Intentionally selecting a scaling factor a posteriori which results in the sum of the elements of the distribution equalling 1 does not appear to be a special case, and should still be a linear transformation.
However, what has occurred to me is that the process of scaling would alter the range of the distribution such that the range is no longer (0,1). If the range does not need to be maintained, my suggestion should be valid. If the range must be maintained, then another approach would be required.
Based on the original question, it does not appear to me that maintaining the range is a requirement.
I'm not sure that it is correct to view x as a random variable, since x is fully known and unchanging once it has been computed
It is not unchanging, because multiple realizations of x are to be computed. The "constant" you propose to scale/shift by is not a true constant because it is derived from x and therefore varies with realization also.
It is true, however, that if you scale/shift a uniform random variable by a (realization-independent) constant, the result will also be a uniform random variable, though with a different range, as you noted.
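That realization-independent case is easy to demonstrate; here is a sketch using the shift/scale from the JMP excerpt above:

```matlab
% Scaling and shifting by true constants preserves uniformity;
% only the interval changes: 5 + 20*U(0,1) is uniform on (5,25).
x = rand(100000,1);
y = 5 + 20*x;
edges = 5:1:25;
counts = histc(y, edges);
plot(edges(1:end-1), counts(1:end-1), 'ro-')
% The counts are flat across (5,25), up to sampling noise.
```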
To look at it another way, what would you say is the distribution of x in the following?
u=rand;
v=randn;
x=u*v;
Is x normally distributed because u can be viewed as a constant scale factor? Or is it uniformly distributed because v can be viewed as a constant scale factor?
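For what it's worth, a numerical sketch suggests the answer is "neither"; the kurtosis check here is my own illustration, not from the thread:

```matlab
% The product of a uniform and an independent normal variable is
% neither uniform nor normal. One quick check: its kurtosis.
u = rand(100000,1);
v = randn(100000,1);
x = u .* v;
k = mean(x.^4) / mean(x.^2)^2
% k comes out near 5.4; a normal variable gives 3, and a uniform
% variable gives 1.8, so x matches neither parent distribution.
```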
Consider the set of points that gets mapped to any point along the line. There are simply MORE points that get mapped to the midpoint of the line, than those that get mapped to an end point of the line.
This must tell you that the distribution of points obtained by the renormalizing scheme is NOT uniform along the projected line.
There are schemes that DO generate a uniform distribution along that line, and they are absolutely trivial to write, at least in low numbers of dimensions. For example, in two dimensions,
A = [0 1];
B = [1 0];
t = rand(1000,1);
P = (1-t)*A + t*B;
Here each row of the array P can be interpreted as a point in 2-dimensions. Those points have the property that they MUST sum to 1. And most importantly, they are clearly uniformly distributed along that line. Any such point on the line is as likely to result as any other, to the extent that the function rand produces truly uniform pseudo-random deviates, something The MathWorks has spent a fair amount of effort to ensure.
Note that in higher dimensions there are also schemes much like the one I show; however, Roger's randfixedsum is well written, fast, and simply the best tool to use.
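For reference, one classical higher-dimensional construction (not necessarily what randfixedsum does internally) uses the spacings of sorted uniform draws:

```matlab
% The gaps between n-1 sorted U(0,1) draws, padded with 0 and 1, are
% uniformly distributed over the simplex {x : x >= 0, sum(x) = 1}.
n = 5;                                   % dimension
m = 1000;                                % number of random vectors
u = sort(rand(m, n-1), 2);               % sorted uniforms, row-wise
P = diff([zeros(m,1), u, ones(m,1)], 1, 2);
% Each row of P is nonnegative and sums to 1, and the rows are
% uniform over the simplex, unlike the divide-by-sum scheme.
```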
Let me start by saying that I appreciate this discussion and I hope you'll be patient enough with me to continue.
@Matt, to clarify what I was saying, I was referencing x as the total distribution, not as individual points within the distribution. Once generated and scaled, a single distribution would need to remain unchanged, or the sum would change. It would be impossible to scale each point in the distribution as it was generated, because the sum of all generated points would be unknown.
@John, I generated a uniform distribution and scaled it to sum up to 1. I plotted each of these distributions (original and scaled) in their original order to compare shape and density. I then sorted each distribution and plotted them to look for grouping of points towards the mean of the distribution. The plots are posted below. I do not see any change of shape, density, or uniformity. The unsorted plots are of 10,000 points and the sorted plots are of 1,000 (for clear visualization). Is the effect you are describing too subtle to notice at these point densities or am I missing something?
[Image: Original order plot. Unscaled on left, scaled on right.]
[Image: Sorted plot, unscaled.]
[Image: Sorted plot, scaled.]
Code that generated the above plots:
points = rand(10000,1);
s = sum(points);
spoints = points ./ s;
scatter(1:10000,points);
figure;
scatter(1:10000,spoints);
points = rand(1000,1);
s = sum(points);
spoints = points ./ s;
p1sort = sort(points);
p2sort = sort(spoints);
scatter(1:1000,p1sort,1);
figure;
scatter(1:1000,p2sort,1);
Matt J on 15 May 2014
Edited: Matt J on 15 May 2014
@Ben,
The shape of spoints, when plotted, is not what is germane to the posted topic. The spoints vector you've generated is just one draw from the set S={x| sum(x)=1}. The idea of the post is to draw multiple such vectors from S repeatedly and in a uniformly randomized manner (uniformly over S).
If that's the case, I concede the point. I had interpreted the post to be asking for a single vector with uniform distribution and a total sum of 1 derived from a uniform distribution with range (0,1). I was assuming @jimaras was simply asking for a way to convert a uniform distribution (perhaps generated using the rand function) into another uniform distribution with a total sum of 1.
Further, @John stated that my approach does not yield a uniformly distributed result. I suppose this is true if you are trying to maintain uniformity in the (0,1) range, but that did not seem to be his argument. Within the new range of the scaled distribution, I believe I have shown that uniformity is maintained.
I rely on shifting and scaling pseudo-random numbers in some of my work and I felt it was important to understand if my methods were in fact impacting the uniformity of those numbers. So far, it does not seem to be the case.
I appreciate your and @John's willingness to discuss this topic at length.


More Answers (1)

You could use rand() to create a uniform distribution then divide each element by the sum.
v = rand(10,1);
vSum = sum(v);
v = v ./ vSum;

3 comments

John D'Errico on 14 Mar 2014
Edited: John D'Errico on 14 Mar 2014
Except that this does NOT yield a uniformly distributed result. It is a common mistake that people make. See the answer I'm posting for an explanation.
John,
Your answer does not explain why my suggestion would not work. Please read my comment on your answer and explain it for me. I would like to understand why this approach is not valid.
Read my answer, which does show that the simple renormalizing scheme fails to yield a uniform result.
A good way to look at it is if you think of projecting the domain from a square region onto a diagonal straight line crossing the square, you can see that the ends of the line will have fewer points that can contribute to those regions.
Your renormalizing scheme is a terribly common mistake. After all, it is simple, and at first glance it seems to get the job done. It is only when you look more carefully at the actual distribution along the line that you see it is wrong. Wrong here means non-uniform.

