# Extract Data from Cell Array with specified key word in other cell

Flo Mo on 14 Nov 2016
Edited: Flo Mo on 15 Nov 2016
I have this cell array (first 10 columns):
'year' 2013 'day' 96 'minute' 50 'sec' 35 [] NaN
'year' 2013 'day' 96 'hour' 11 'minute' 24 'sec' 31
'year' 2013 'day' 96 'hour' 14 'minute' 52 'sec' 34
'year' 2013 'day' 96 'minute' 35 'sec' 5 [] NaN
and want to get this matrix:
2013 96 0 50 35
2013 96 11 24 31
2013 96 14 52 34
2013 96 0 35 5
I have already produced logical matrices with
distribution(:, :, 1) = strcmp(cell_array, 'year');
distribution(:, :, 2) = strcmp(cell_array, 'day');
... etc
looking like this:
distribution(:, :, 1) =
[1 0 0 0 0 0 0 0 0 0;
1 0 0 0 0 0 0 0 0 0;
1 0 0 0 0 0 0 0 0 0;
etc
and then with:
distribution = circshift(distribution, [0 1]);
I get the desired logical masks for each key word, but I cannot manage to build a new matrix like the one shown above. Whenever the loop (cycling through each column) encounters a "0" because there is no entry (as in the first row for "hour"), it aborts with a dimension mismatch.
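To make the mismatch concrete, here is a minimal sketch on the four sample rows from the question. The variable names (`is_hour_key`, `is_hour_val`, `rows_with_hour`, `hours`) are illustrative and not part of the original code. It assumes no key word sits in the last column, since `circshift` would otherwise wrap the mask around to column 1.

```matlab
% Sample data from the question (first 10 columns)
cell_array = {
    'year', 2013, 'day', 96, 'minute', 50, 'sec',    35, [],    NaN;
    'year', 2013, 'day', 96, 'hour',   11, 'minute', 24, 'sec', 31;
    'year', 2013, 'day', 96, 'hour',   14, 'minute', 52, 'sec', 34;
    'year', 2013, 'day', 96, 'minute', 35, 'sec',    5,  [],    NaN};

% strcmp returns false for cells that do not contain text
is_hour_key = strcmp(cell_array, 'hour');
% Shift one column to the right: the mask now marks the value after the key
is_hour_val = circshift(is_hour_key, [0 1]);

% Only two of the four rows contain 'hour', so only two values come back
hour_vals = cell2mat(cell_array(is_hour_val));   % [11; 14]

% Write the values only into the rows that actually have the key;
% the remaining rows keep the default of 0
rows_with_hour = any(is_hour_val, 2);
hours = zeros(size(cell_array, 1), 1);
hours(rows_with_hour) = hour_vals;               % [0; 11; 14; 0]
```

Assigning two values to a four-row column is what triggers the dimension mismatch; indexing the target with `rows_with_hour` avoids it. Note that the values line up with the rows here only because both 'hour' entries sit in the same column.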
What I had in mind was something like this (not real code):
cell_array * distribution(:,:,1) = [2013, 2013, 2013 ... ]
but this obviously doesn't work.
Any help appreciated for a solution without a for loop ;)
Fl0

Walter Roberson on 14 Nov 2016
You show
2013 96 50 0 35
as the first line of output. Should that not be
2013 96 0 50 35
?
Walter Roberson on 14 Nov 2016
It is not clear to me how those could all be in columns if you have a different number of entries on each line. Are there padding entries on the shorter lines, or is each row a cell array?
Flo Mo on 15 Nov 2016
You are right about the first line; I'll correct it as soon as possible. There are more columns to the right in the input cell array. Where there is no entry, it is at least filled with NaN. I'll add the NaN to the question as well.

Flo Mo on 15 Nov 2016
Edited: Flo Mo on 15 Nov 2016
I figured it out:
% Preallocate one row per record and one column per key word
d_str = zeros(size(cell_array, 1), 5);
% One logical page per key word, true where the cell holds that key
distribution(:, :, 1) = strcmp(cell_array, 'year');
distribution(:, :, 2) = strcmp(cell_array, 'day');
distribution(:, :, 3) = strcmp(cell_array, 'hour');
distribution(:, :, 4) = strcmp(cell_array, 'minute');
distribution(:, :, 5) = strcmp(cell_array, 'sec');
% Shift one column to the right so each mask marks the value after its key
distribution = circshift(distribution, [0 1]);
% Transpose masks and data so that logical indexing returns values row by row
distribution_perm = permute(distribution,[2 1 3]);
cell_array_perm = permute(cell_array,[2 1]);
% Fill each column, writing only to the rows that contain the key word
d_str(logical(sum(distribution(:,:,1),2)),1) = cell2mat(cell_array_perm(distribution_perm(:,:,1)))';
d_str(logical(sum(distribution(:,:,2),2)),2) = cell2mat(cell_array_perm(distribution_perm(:,:,2)))';
d_str(logical(sum(distribution(:,:,3),2)),3) = cell2mat(cell_array_perm(distribution_perm(:,:,3)))';
d_str(logical(sum(distribution(:,:,4),2)),4) = cell2mat(cell_array_perm(distribution_perm(:,:,4)))';
d_str(logical(sum(distribution(:,:,5),2)),5) = cell2mat(cell_array_perm(distribution_perm(:,:,5)))';
This uses the cell_array given in the question. I chose not to use a loop because the input cell array has 30000 cols. The permutation is needed because logical indexing extracts the values column by column. Without it the results would be wrong: for example, a 'sec' value that sits further left in the cell array would come first in the resulting matrix, regardless of which row it belongs to.
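For anyone who wants to try the approach on the sample rows, the following is a compact sketch of the same idea, not the exact code above. It replaces the five repeated lines with a short loop over the key words (five iterations, independent of the number of records). The names `key_names`, `val_mask` and `expected` are introduced here for illustration, and the sketch assumes each key word appears at most once per row and never in the last column.

```matlab
% Sample data from the question (first 10 columns)
cell_array = {
    'year', 2013, 'day', 96, 'minute', 50, 'sec',    35, [],    NaN;
    'year', 2013, 'day', 96, 'hour',   11, 'minute', 24, 'sec', 31;
    'year', 2013, 'day', 96, 'hour',   14, 'minute', 52, 'sec', 34;
    'year', 2013, 'day', 96, 'minute', 35, 'sec',    5,  [],    NaN};

key_names = {'year', 'day', 'hour', 'minute', 'sec'};
d_str = zeros(size(cell_array, 1), numel(key_names));

% Transposed copy: indexing it column by column visits the original row by row
cell_array_perm = cell_array.';

for k = 1:numel(key_names)
    % Mask of the value cells: one column to the right of each key
    val_mask = circshift(strcmp(cell_array, key_names{k}), [0 1]);
    % Rows that contain the key receive the values, in row order
    d_str(any(val_mask, 2), k) = cell2mat(cell_array_perm(val_mask.'));
end

% Desired matrix from the question
expected = [2013 96  0 50 35;
            2013 96 11 24 31;
            2013 96 14 52 34;
            2013 96  0 35  5];
disp(isequal(d_str, expected))
```

Indexing `cell_array` directly with `val_mask` would return the 'minute' values as 50, 35, 24, 52 (column 6 first, then column 8), which is the ordering problem the permutation solves.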

#### 1 Comment

Flo Mo on 15 Nov 2016
As soon as I improve the code, I'll post it here for everyone to see. I just realized that the permutation can be done once at the beginning, with the dimensions swapped back at the end.
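One possible reading of that improvement, sketched on the sample rows: transpose the cell array once, build the masks on the transposed data (shifting down one row instead of right one column), collect the result with one row per key word, and transpose back at the end. This is a guess at what the improved code could look like, not the author's posted version; `C`, `key_names`, `val_mask` and `out` are hypothetical names.

```matlab
% Sample data from the question (first 10 columns)
cell_array = {
    'year', 2013, 'day', 96, 'minute', 50, 'sec',    35, [],    NaN;
    'year', 2013, 'day', 96, 'hour',   11, 'minute', 24, 'sec', 31;
    'year', 2013, 'day', 96, 'hour',   14, 'minute', 52, 'sec', 34;
    'year', 2013, 'day', 96, 'minute', 35, 'sec',    5,  [],    NaN};

key_names = {'year', 'day', 'hour', 'minute', 'sec'};

% Permute once up front: each column of C is one original record
C = cell_array.';
out = zeros(numel(key_names), size(C, 2));

for k = 1:numel(key_names)
    % Keys and values are now stacked vertically, so shift down by one row
    val_mask = circshift(strcmp(C, key_names{k}), [1 0]);
    % Column-major indexing of C already returns the values record by record
    out(k, any(val_mask, 1)) = cell2mat(C(val_mask)).';
end

% Swap the dimensions back: one row per record, one column per key word
d_str = out.';
```

With this layout only a single transpose of the data and a single transpose of the result are needed, instead of permuting every mask.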