Calculating group statistics using frequency weights

Question

0 votes

I have a large table of survey data with about 3 Mio. observations and 120 variables. The survey also contains a variable called "weights" with integers between 250 and 250'000. These weights are intended to render the sample representative. Hence I have to weight every observation for every calculation. I need a number of (weighted) group means, medians and percentiles.

How can I calculate summary statistics and group statistics, weighting the observations with the frequency weights?

Considering the size of the dataset, I am looking for a solution that works for a large dataset of dimensions I mentioned above. (If I needed the unweighted statistics, it can easily and efficiently be done with the groupsummary() function, I have however not found an option allowing for the weights.)

Thanks for your help

Chris

5 commentaires
Afficher 3 commentaires plus anciens Masquer 3 commentaires plus anciens

clauper le 17 Déc 2019

Thanks for your comment. I have updated the question to give more context.

Adam Danz le 18 Déc 2019

I added an answer that shows how to scale your inputs according to your weights.

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Follow Question

Answer 1

Adam Danz le 17 Déc 2019

Ouvrir dans MATLAB Online

0 votes

Since you integer weights are much too large to merely replicate values based on the weights, you can scale your data according to your weights. This is only 1 of many interpretations of applying weights.

There are several ways around this and the best method depends on how you're using the weights, what those weights mean, and the bounds of those weights. That's something you'll need to think about.

Here's my proposal.

% Create demo data
x = randi(10,1,20);               % Main data:  20 random integers
w = randi(24750,size(x))+250;     % random weights between 250 and 250000

This is the part you'll need to consider. The idea is to normalize your weights between [0,1] but keep in mind that a weight of 0 will completely eliminate a value.

% Normalize the weights
% If you know the upper and lower limits of the weights (Safer than alternative)
knownWeightBounds = [250,250000]; 
% Or maybe use
knownWeightBounds = [0,250000]; 
% or maybe
knownWeightBounds = [min(w), max(w)]; 
% Scale the weights to 0:1
wNorm = (w-knownWeightBounds(1))/range(knownWeightBounds); 

Now scale your data according to the weights.

% Scale the data by normalized weights. 
xScaled = x .* wNorm;
% compute whatever stats you want on xScaled
mu = mean(xScaled)

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Connectez-vous pour commenter.

Calculating group statistics using frequency weights

5 commentaires
Afficher 3 commentaires plus anciens Masquer 3 commentaires plus anciens

Réponses (1)

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Catégories

Tags

Community Treasure Hunt

Calculating group statistics using frequency weights

5 commentaires Afficher 3 commentaires plus anciens Masquer 3 commentaires plus anciens

Réponses (1)

0 commentaires Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Catégories

Tags

Voir également

Community Treasure Hunt

5 commentaires
Afficher 3 commentaires plus anciens Masquer 3 commentaires plus anciens

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens