'reshape' by identifiers

Hi, I am trying to reshape data by two identifiers: date and id. The original data I have can be simplified as follows:
date id v1
2000 1 99
2001 1 84
1997 2 74
1998 2 89
1999 2 48
2000 2 43
2001 2 45
2002 2 49
I need to turn the original data into the matrix below:
date id1 id2
1997 . 74
1998 . 89
1999 . 48
2000 99 43
2001 84 45
2002 . 49
I used to use a nested for loop that goes through every element in the original data and copies it into the matrix I need to generate if both date and id match. But now the original data has over 3 million rows, and it took more than 30 minutes to build the matrix I want. I don't think the plain reshape function can solve this problem. Can anyone help me solve it? Thank you.
Minsoo

Accepted Answer

Walter Roberson on 25 Jun 2011

0 votes

This should probably be much faster:
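% Map each date to its row index (yearidx) in the sorted list of unique dates
[uyear, m, yearidx] = unique(Matrix(:,1));
% One row per unique date: column 1 is the date, columns 2 and 3 hold ids 1 and 2
OutMat = nan(length(uyear),3);
OutMat(:,1) = uyear(:);
% Convert (row, column) pairs to linear indices and assign all values at once
OutMat(sub2ind(size(OutMat), yearidx, 1 + Matrix(:,2))) = Matrix(:,3);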
(Function corrected as the output of the previous iteration required a reshape())
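The answer above hard-codes three columns, which fits ids 1 and 2. A minimal sketch of the same indexing idea for an arbitrary set of ids follows; it is an illustration, not part of the original answer. The names M, udate, uid, wide and out are hypothetical, and it assumes the dense result (unique dates by unique ids) fits in memory.

```matlab
% Sample data from the question: columns are date, id, v1
M = [2000 1 99
     2001 1 84
     1997 2 74
     1998 2 89
     1999 2 48
     2000 2 43
     2001 2 45
     2002 2 49];

% Row index for each date and column index for each id
[udate, ~, r] = unique(M(:,1));
[uid,   ~, c] = unique(M(:,2));

% One row per unique date, one column per unique id, NaN where no value exists
wide = nan(numel(udate), numel(uid));
wide(sub2ind(size(wide), r, c)) = M(:,3);

% Prepend the date column; uid lists the ids in column order
out = [udate, wide];
```

Because the column index comes from unique rather than from the id value itself, the ids do not have to be consecutive integers starting at 1. If the same (date, id) pair occurs more than once, only one of the values is kept.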

More Answers (2)

Andrei Bobrov on 25 Jun 2011

0 votes

L = M(:,2)==1;
M2 = M(~L,:);
M2(ismember(M2(:,1),M(L,1)),2) = M(L,3)

3 Comments

Walter Roberson on 25 Jun 2011
M = [2000 1 99
2001 1 84
1997 2 74
1998 2 89
1999 2 48
2000 2 43
2001 2 45
2002 2 49
2003 1 55];
>> L = M(:,2)==1;
>> M2 = M(~L,:);
>> M2(ismember(M2(:,1),M(L,1)),2) = M(L,3)
??? Subscripted assignment dimension mismatch.
If you use the original matrix from the question instead of this slightly augmented one,
M =[ 2000 1 99
2001 1 84
1997 2 74
1998 2 89
1999 2 48
2000 2 43
2001 2 45
2002 2 49]
>> L = M(:,2)==1;
>> M2 = M(~L,:);
>> M2(ismember(M2(:,1),M(L,1)),2) = M(L,3)
M2 =
1997 2 74
1998 2 89
1999 2 48
2000 99 43
2001 84 45
2002 2 49
Notice the 2's left in column 2 :(
Andrei Bobrov on 25 Jun 2011
Hi Walter!
I agree with you; my variant only answers the specific question as asked.
The leftover 2 can easily be replaced by any number or NaN: M2(M2(:,2)==2,2) = NaN;
Walter Roberson on 25 Jun 2011
But not if the id that _should_ go in the second column is 2.
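The two problems raised in these comments (the size mismatch on the augmented matrix and the leftover 2s) could perhaps be avoided as sketched below. This is an illustration under stated assumptions, not Andrei's code: it clears column 2 to NaN before filling it, and uses the second output of ismember to align values. The variable names tf, loc and vals are hypothetical.

```matlab
% Augmented data from the comment above: columns are date, id, v1
M = [2000 1 99
     2001 1 84
     1997 2 74
     1998 2 89
     1999 2 48
     2000 2 43
     2001 2 45
     2002 2 49
     2003 1 55];

L  = M(:,2) == 1;        % rows belonging to id 1
M2 = M(~L,:);            % rows belonging to id 2
M2(:,2) = NaN;           % clear the id column so no sentinel value is left behind

% tf marks id-2 dates that also occur for id 1; loc gives their position in M(L,1)
[tf, loc] = ismember(M2(:,1), M(L,1));
vals = M(L,3);
M2(tf,2) = vals(loc(tf));
```

Note that this still keeps only the dates for which id 2 has a row, so the 2003 observation of id 1 is dropped; building the row list from the unique dates, as in the accepted answer, does not have that limitation.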


Minsoo Kim on 25 Jun 2011

0 votes

Hi Walter and Andrei! Thank you very much for your helpful answers. Yes, Andrei's answer can easily be completed by deleting the leftover numbers in each column. I accepted Walter's solution because it can be applied to my original data, with about 3 million observations and 20K unique ids, without a for loop. Thank you very much.
Minsoo
