# how to select particular rows from a matrix

31 views (last 30 days)
ektor on 6 Mar 2017
Commented: ektor on 7 Mar 2017
Dear all,
I have a panel data set(countries, years) of dimension 20000 by 200. My goal is
1) to choose those countries that correspond to a specific period (1970-2013) AND
2) for which period there are no NaN values Is there a way to do that? Here is a sample matrix Many thanks in advance.
A={ 'country' 'year' 'v1' 'v2' 'v3' 'v4' 'v5' 'v6' 'v7' 'v8' 'v9' 'v10' 'v11' 'v12'
'Albania' [1919] [0] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1920] [1] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1921] [0] [0] [0] [0] [0] [ 0] [1] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1922] [0] [0] [0] [0] [0] [ 0] [1] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1923] [0] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1924] [1] [0] [1] [0] [0] [ 0] [2] [1] [ NaN] [ 1] [ 2] [ 0]
'Albania' [1933] [0] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
};

Image Analyst on 6 Mar 2017
Try converting it to a table to make it easier:
A={ 'country' 'year' 'v1' 'v2' 'v3' 'v4' 'v5' 'v6' 'v7' 'v8' 'v9' 'v10' 'v11' 'v12'
'Albania' [1919] [0] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1920] [1] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1921] [0] [0] [0] [0] [0] [ 0] [1] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1922] [0] [0] [0] [0] [0] [ 0] [1] [0] [ 42] [ 1] [ 0] [ 0]
'Albania' [1923] [0] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1924] [1] [0] [1] [0] [0] [ 0] [2] [1] [ NaN] [ 1] [ 2] [ 0]
'Albania' [1933] [0] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
};
t = cell2table(A(2:end,:), 'VariableNames', A(1,:))
yearsInRange = t.year >= 1921 & t.year <= 1923
nonNanRows = ~isnan(t.v9)
% Find out where both are true:
% year in range, and v9 column is not nan
rowsToKeep = yearsInRange & nonNanRows;
% Extract only those rows
newTable = t(rowsToKeep,:)
Echoed to the command window, you can see the steps:
t =
country year v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12
_________ ____ __ __ __ __ __ __ __ __ ___ ___ ___ ___
'Albania' 1919 0 0 0 0 0 0 0 0 NaN 1 0 0
'Albania' 1920 1 0 0 0 0 0 0 0 NaN 1 0 0
'Albania' 1921 0 0 0 0 0 0 1 0 NaN 1 0 0
'Albania' 1922 0 0 0 0 0 0 1 0 42 1 0 0
'Albania' 1923 0 0 0 0 0 0 0 0 NaN 1 0 0
'Albania' 1924 1 0 1 0 0 0 2 1 NaN 1 2 0
'Albania' 1933 0 0 0 0 0 0 0 0 NaN 1 0 0
yearsInRange =
7×1 logical array
0
0
1
1
1
0
0
nonNanRows =
7×1 logical array
0
0
0
1
0
0
0
newTable =
country year v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12
_________ ____ __ __ __ __ __ __ __ __ __ ___ ___ ___
'Albania' 1922 0 0 0 0 0 0 1 0 42 1 0 0
>>

ektor on 6 Mar 2017
Hi, what worries me is that the command yearsInRange will not give me the interval [1921-1923] FOR EACH COUNTRY. For some countries this interval does not exist. So I want to exclude those countries and keep only those countries for which the interval [1921-1923] exists. I want to have a balanced panel data. Is there a solution to that? Many thanks
Image Analyst on 7 Mar 2017
Sure it will. Here is proof:
A={ 'country' 'year' 'v1' 'v2' 'v3' 'v4' 'v5' 'v6' 'v7' 'v8' 'v9' 'v10' 'v11' 'v12'
'Albania' [1919] [0] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1920] [1] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1921] [0] [0] [0] [0] [0] [ 0] [1] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1922] [0] [0] [0] [0] [0] [ 0] [1] [0] [ 42] [ 1] [ 0] [ 0]
'Albania' [1923] [0] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Albania' [1924] [1] [0] [1] [0] [0] [ 0] [2] [1] [ NaN] [ 1] [ 2] [ 0]
'Albania' [1933] [0] [0] [0] [0] [0] [ 0] [0] [0] [ NaN] [ 1] [ 0] [ 0]
'Canada' [1922] [1] [0] [1] [0] [0] [ 0] [2] [1] [ 11] [ 1] [ 2] [ 0]
'USA' [1923] [0] [0] [0] [0] [0] [ 0] [0] [0] [ 12] [ 1] [ 0] [ 0]
'Germany' [1922] [1] [0] [1] [0] [0] [ 0] [2] [1] [ 13] [ 1] [ 2] [ 0]
'Belgium' [1923] [0] [0] [0] [0] [0] [ 0] [0] [0] [ 14] [ 1] [ 0] [ 0]
};
t = cell2table(A(2:end,:), 'VariableNames', A(1,:))
yearsInRange = t.year >= 1921 & t.year <= 1923
nonNanRows = ~isnan(t.v9)
% Find out where both are true:
% year in range, and v9 column is not nan
rowsToKeep = yearsInRange & nonNanRows;
% Extract only those rows
newTable = t(rowsToKeep,:)
And look at all the countries it found:
newTable =
country year v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12
_________ ____ __ __ __ __ __ __ __ __ __ ___ ___ ___
'Albania' 1922 0 0 0 0 0 0 1 0 42 1 0 0
'Canada' 1922 1 0 1 0 0 0 2 1 11 1 2 0
'USA' 1923 0 0 0 0 0 0 0 0 12 1 0 0
'Germany' 1922 1 0 1 0 0 0 2 1 13 1 2 0
'Belgium' 1923 0 0 0 0 0 0 0 0 14 1 0 0
so explain why you said that - why you think it's getting countries outside the range or not including countries inside the year range. For example you can give me a cell array that proves what you said.
ektor on 7 Mar 2017
So I was confused! Yes, it is fine. Many thanks!

Rik on 6 Mar 2017
You can use ismember to select the country, then you can convert those rows (excluding the first column) to a matrix with cell2mat. With logical indexing you can select the correct range of years, and with isnan you can find out the rows with NaN values.
I think you should be able to figure out what to do with this pointer. If you have trouble writing the code, don't hesitate to reply.

ektor on 6 Mar 2017
Dear Rik,thank you for this. I am not sure how to choose the years. For example, I am interested in the 1945-2015 time period. However, if I set, for example
x>=1945&x<=2015
I might get counties with time period 1989-2009. But I want countries over 1945-2015. My logical indexing is wrong. Any suggestion on that?
Rik on 7 Mar 2017
I would say that your data is not complete then, because there is nothing wrong with your logical indexing.
I also like to add that the solution from Image Analyst is more elegant and will be easier to understand if you come back to your code after a year. You do need to make sure that the all machines you run your code on have 2013b or later, because older versions don't have the table data structure.