resample data based on a particular variable

Boram Lim on 4 May 2018
Commented: Boram Lim on 4 May 2018
I have a large dataset like the one below. I want to randomly resample it based on 'id' to produce a dataset of the same size. Since the data has 5 ids, I would like to sample 5 ids with replacement and build a new dataset from all the rows of each sampled id.
id value var1 var2
1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16
For this data, the desired output could look like the following (because the ids are sampled with replacement, an id can appear more than once):
id value var1 var2
2 5
2 6
2 7
4 11
4 12
4 13
3 8
3 9
3 10
2 5
2 6
2 7
1 1
1 2
1 3
1 4
2 comments
KSSV on 4 May 2018
What is the difference between the two datasets? They are the same... in the second one you have repeated id 2.
Boram Lim on 4 May 2018
I want to randomly resample the data based on the id variable.


Answers (1)

KSSV on 4 May 2018
A = [1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16 ];
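id = A(:,1) ;              % grouping variable
N = max(id) ;              % number of ids (assumes the ids are 1..N)
idx = randi(N, N, 1) ;     % draw N ids with replacement (randperm would only shuffle them)
iwant = cell(N,1) ;
for i = 1:N
    iwant{i} = A(id==idx(i),:) ;   % all rows belonging to the i-th drawn id
end
iwant = cell2mat(iwant)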
1 comment
Boram Lim on 4 May 2018
Thank you for your answer. However, is there a simple way to do this without a for-loop? My data has around 10 million rows, and this has to be done several times.
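One possible loop-free approach, given here only as a sketch: it assumes the rows are already grouped by id, as in the example data, and the names first, counts, pick, rows, and B are illustrative rather than taken from the thread. The idea is to record where each id block starts and how long it is, draw blocks with randi, and expand the drawn blocks into row indices with repelem (available since R2015a).

```matlab
% Example data from the question: column 1 is id, column 2 is value.
A = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; ...
     4 11; 4 12; 4 13; 5 14; 5 15; 5 16];
id = A(:,1);

% Assumption: rows with the same id are contiguous (sort by id first if not).
[uid, first] = unique(id, 'stable');      % first row of each id block
counts = diff([first; numel(id) + 1]);    % number of rows in each block
K = numel(uid);                           % number of distinct ids

% Draw K blocks with replacement.
pick = randi(K, K, 1);
len  = counts(pick);                      % length of each drawn block
strt = first(pick);                       % starting row of each drawn block

% Expand the drawn blocks into row indices without a loop.
blockOffset = cumsum([0; len(1:end-1)]);              % rows emitted before each block
within = (1:sum(len))' - repelem(blockOffset, len);   % position 1..len inside each block
rows   = repelem(strt, len) + within - 1;

B = A(rows, :);                           % resampled dataset
```

Since the resampling has to be repeated several times, first and counts only need to be computed once; each replicate then costs one randi call plus the index expansion. Note that the number of rows in B can differ from that of A when the id blocks have different lengths.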
