Vertically Concatenating Timetables of Different Sizes

John on 14 Mar 2023
Edited: Stephen23 on 15 Mar 2023
As the title states, I want to vertically concatenate timetables of different sizes. More specifically, I need to combine n timetables, each with one row but with a different number of variables (columns). The variable names of the timetables overlap, so some data are shared between them; however, some timetables may also contain variables that the others do not.
The goal is to vertically concatenate the timetables so that all shared data are aligned by variable name. If a timetable lacks a variable that another timetable has, that variable should be filled with NaN in the row that comes from the smaller timetable. In the example below, the three timetables hold 40, 36 and 30 measurements respectively (each measurement is a Data_/Units_ variable pair, alongside Exp_ID), and I want a resulting timetable with 3 rows that covers all 40 measurements.
Method 1:
Plain vertical concatenation in a loop. This obviously does not work because the numbers of columns differ.
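A minimal sketch of why Method 1 fails, using two hypothetical one-row timetables (tt1, tt2) instead of the generated data: vertical concatenation of timetables requires every input to have the same variables, so MATLAB raises an error as soon as the variable lists differ.

```matlab
% Two hypothetical one-row timetables; tt2 lacks Data_1
tt1 = timetable(datetime(2023,1,1), 1, 10, 'VariableNames', {'Exp_ID','Data_1'});
tt2 = timetable(datetime(2023,2,21), 2, 'VariableNames', {'Exp_ID'});

try
    ttAll = [tt1; tt2];   % same operation as Method 1
    failed = false;
catch err
    failed = true;        % vertcat refuses inputs with different variables
    disp(err.message)
end
```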
Method 2:
I tried a solution from another post (outerjoin) that does exactly what I want, but with tables instead of timetables. This did not give the correct result: the output is 3x215, and I don't understand why. It also appends “All” to the variable names, which does not make sense to me, because I specify “MergeKeys” as true, which according to the documentation should leave the key variable names alone. It comes close, though: it concatenates the datetimes correctly and fills the empty data/units with NaNs. I also tried specifying the “Keys” variables to align on, but this does not work because some timetables do not contain these variables, and MATLAB's outerjoin does not create missing key variables and fill them with NaNs when it encounters this.
Method 3:
I tried the synchronize function, since it is specifically designed for timetables. I get exactly the same resulting timetable as with Method 2. The synchronize documentation and the "Combine Timetables and Synchronize Their Data" documentation give examples that, as far as I can see, do the same thing as my example (except that I have more columns and different variable names), so I am even more confused about why this does not produce the result I want.
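A small sketch, assuming two hypothetical one-row timetables that share the variable names Exp_ID and Data_1, of what synchronize does with same-named variables: it keeps every variable from every input (renaming duplicates so the names stay unique) instead of merging them into one column. This is consistent with the 3x215 size reported for Methods 2 and 3: each generated timetable has 2*szList(i)+1 variables, i.e. 81, 73 and 61, and 81 + 73 + 61 = 215.

```matlab
% Hypothetical timetables with identical variable names
tt1 = timetable(datetime(2023,1,1), 1, 10, 'VariableNames', {'Exp_ID','Data_1'});
tt2 = timetable(datetime(2023,2,21), 2, 20, 'VariableNames', {'Exp_ID','Data_1'});

% Default behavior: union of the row times, missing entries filled with NaN
ttS = synchronize(tt1, tt2);
disp(ttS.Properties.VariableNames)   % four (renamed) variables, not two
disp(size(ttS))
```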
So either I am not using Methods 2 and 3 correctly, or I have to add another for-loop before the concatenation that creates the missing variables in the smaller timetables and fills them with NaNs, using the variable names of the largest timetable.
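The for-loop idea described above can work. A possible sketch, assuming one-row timetables held in a cell array tt (hypothetical example data below, not the generated data): collect the union of variable names, add each absent variable to the timetables that lack it, filled with the missing value of that variable's type (NaN for double, <missing> for string; char has no missing value, so a blank is used), put the variables in the same order everywhere, then vertcat. Other types, such as integers or cell arrays, are not handled by this sketch.

```matlab
% Hypothetical example data: one-row timetables with overlapping variables
tt1 = timetable(datetime(2023,1,1), 1, 10, 'VariableNames', {'Exp_ID','Data_1'});
tt1.Units_1 = "a";                      % a string variable only tt1 has
tt2 = timetable(datetime(2023,2,21), 2, 20, 'VariableNames', {'Exp_ID','Data_2'});
tt = {tt1, tt2};

% Union of all variable names, in order of first appearance
allNames = {};
for i = 1:numel(tt)
    newNames = setdiff(tt{i}.Properties.VariableNames, allNames, 'stable');
    allNames = [allNames, newNames(:)']; %#ok<AGROW>
end

% Add every absent variable, filled with a missing value of the right type
for i = 1:numel(tt)
    absent = setdiff(allNames, tt{i}.Properties.VariableNames);
    for k = 1:numel(absent)
        name = absent{k};
        hasIt = cellfun(@(t) ismember(name, t.Properties.VariableNames), tt);
        src = tt{find(hasIt, 1)};       % a timetable that has this variable
        fill = src.(name)(1, :);        % one row of it, to copy its type
        if ischar(fill)
            fill(:) = ' ';              % char has no missing value: use blanks
        else
            fill(:) = missing;          % NaN for double, <missing> for string
        end
        tt{i}.(name) = repmat(fill, height(tt{i}), 1);
    end
    tt{i} = tt{i}(:, allNames);         % same variable order in every timetable
end

ttAll = sortrows(vertcat(tt{:}))        % one row per input, sorted by time
```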
% Create 3 timetables of different sizes and random variable names with nonzero intersections
varTypes = {'string'};
doub = 'double';
str = 'string';
szList = [40,36,30];   % number of Data/Units pairs in each timetable
ind = {'A','B','C'};
TimeDim = [datetime('2023-01-01 20:14:58'),datetime('2023-02-21 22:13:04'),datetime('2023-03-11 10:12:58')];
for i = 1:length(szList)
    % Variable types: Exp_ID, then one double/string pair per measurement
    for j = 1:szList(i)
        varTypes{end+1} = doub;
        varTypes{end+1} = str;
    end
    ArrayList.ind{i} = varTypes;
    varTypes = {'string'};
end
clear i j
for i = 1:length(szList)
    % One-row timetable with 2*szList(i)+1 variables
    ttSaved.ind{i} = timetable('Size',[1 2*szList(i)+1],'VariableTypes',ArrayList.ind{1,i},'RowTimes',TimeDim(i));
    ttSaved.ind{i} = renamevars(ttSaved.ind{i},'Var1','Exp_ID');
    ttSaved.ind{i}.Exp_ID = i;
end
clear i
% Label each variable and populate the timetables with data
for i = 1:length(ttSaved.ind)
    choice = randperm(szList(i));   % random measurement numbers for this timetable
    for j = 1:szList(i)
        ttSaved.ind{i} = renamevars(ttSaved.ind{i},sprintf('Var%d',2*j),sprintf('Data_%d',choice(j)));
        ttSaved.ind{i} = renamevars(ttSaved.ind{i},sprintf('Var%d',2*j+1),sprintf('Units_%d',choice(j)));
    end
    Names = ttSaved.ind{i}.Properties.VariableNames;
    for k = 1:szList(i)
        ttSaved.ind{i}.(Names{2*k}) = rand*100;          % Add data
        ttSaved.ind{i}.(Names{2*k+1}) = char(choice(k)); % Add units
    end
end
clear i j k
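% Method 1: plain vertical concatenation
% (errors, because the timetables have different numbers of variables)
ttAll = [];
for i = 1:length(ttSaved.ind)
    ttAll = [ttAll ; ttSaved.ind{i}]; %#ok<AGROW>
end
clear i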
% Method 2
ttAll = ttSaved.ind{1};
AllMeas = ttSaved.ind{1}.Properties.VariableNames;
for i = 2:length(ttSaved.ind)
    ttAll = outerjoin(ttAll,ttSaved.ind{i},'Keys',AllMeas,'MergeKeys', true);
end
clear i
% Method 3
ttAll2 = ttSaved.ind{1};
for i = 2:length(ttSaved.ind)
    ttAll2 = synchronize(ttAll2,ttSaved.ind{i});
end
clear i
Note: This post is somewhat of a continuation of a previous post, but since I marked that one as answered, and this one is different enough and uses the same data structures as my actual data, I needed to create a new question.

Accepted Answer

Stephen23 on 14 Mar 2023
Edited: Stephen23 on 15 Mar 2023
"The goal here is to vertically concatenate each timetable so that all the data within the intersection are aligned by column/dimension name type."
First let's create some fake data; here are three tables in a cell array:
T = array2table(rand(3,7));
C = {...
T(1,[1,3,5,7]),...
T(2,[2,4,5,6,7]),...
T(3,[1,2,3])};
C{:}
ans = 1×4 table
     Var1       Var3       Var5        Var7
    _______    _______    ________    _______
    0.12972    0.41597    0.090679    0.25583

ans = 1×5 table
     Var2       Var4       Var5       Var6       Var7
    _______    _______    _______    _______    _______
    0.59592    0.97832    0.13188    0.36775    0.82012

ans = 1×3 table
     Var1       Var2       Var3
    _______    _______    _______
    0.86471    0.33433    0.68387
Now let's concatenate them using OUTERJOIN. The trick is to provide a key (or keys) that uniquely identifies each row (this could be a time, a row name, etc.). Note how the THISROW variable guarantees at least one common key, which is required for joining tables:
T = C{1};
T.ThisRow = 1; % uniquely identify 1st row.
T = T(:,[end,1:end-1]); % ensure THISROW is the 1st column, to ensure the correct row order.
for i = 2:numel(C)
    V = C{i};
    V.ThisRow = i; % uniquely identify ith row.
    T = outerjoin(T,V, 'MergeKeys',true);
end
display(T)
T = 3×8 table
    ThisRow     Var1       Var3       Var5        Var7       Var2       Var4       Var6
    _______    _______    _______    ________    _______    _______    _______    _______
       1       0.12972    0.41597    0.090679    0.25583        NaN        NaN        NaN
       2           NaN        NaN     0.13188    0.82012    0.59592    0.97832    0.36775
       3       0.86471    0.68387         NaN        NaN    0.33433        NaN        NaN
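To see why the row identifier matters, here is a sketch with two hypothetical one-row tables A and B whose only common variable X happens to hold the same value: without THISROW, X is the only key, so the two rows match and are merged into one; with THISROW they can never match and stay separate.

```matlab
% Hypothetical tables sharing only X, with equal X values
A = table(5, 1, 'VariableNames', {'X','Y'});
B = table(5, 2, 'VariableNames', {'X','Z'});

% Only common variable X is the key; equal values -> rows are merged
J1 = outerjoin(A, B, 'MergeKeys', true);

% Add a unique row identifier, as in the loop above
A.ThisRow = 1;
B.ThisRow = 2;
J2 = outerjoin(A, B, 'MergeKeys', true);

disp(height(J1))   % 1 row
disp(height(J2))   % 2 rows, Y and Z padded with NaN
```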
Optionally, if you want a particular variable order (here alphabetical):
[~,X] = sort(T.Properties.VariableNames);
T = T(:,X)
T = 3×8 table
    ThisRow     Var1       Var2       Var3       Var4        Var5       Var6       Var7
    _______    _______    _______    _______    _______    ________    _______    _______
       1       0.12972        NaN    0.41597        NaN    0.090679        NaN    0.25583
       2           NaN    0.59592        NaN    0.97832     0.13188    0.36775    0.82012
       3       0.86471    0.33433    0.68387        NaN         NaN        NaN        NaN
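The alphabetical sort works here because the names only go up to Var7; with ten or more variables it would put Var10 before Var2 (and, for the question's data, Data_10 before Data_2). A possible sketch that orders by the trailing number instead, assuming every name except ThisRow ends in digits, shown on a hypothetical 4-variable table:

```matlab
% Hypothetical table whose names are not in numeric order
T = array2table([1 2 3 4], 'VariableNames', {'ThisRow','Var10','Var2','Var1'});

names = T.Properties.VariableNames;
% Trailing integer of each name ('' -> NaN when there is none, e.g. ThisRow)
num = str2double(regexp(names, '\d+$', 'match', 'once'));
num(isnan(num)) = -Inf;          % keep names without a number (ThisRow) first
[~, X] = sort(num);
T = T(:, X)
```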
Use REMOVEVARS if you want to get rid of THISROW.
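Regarding timetables: one way to reuse this approach without listing the row times as a key is to convert each timetable to a table with timetable2table (the row times become an ordinary variable, so they simply join the default common-name keys), run the same OUTERJOIN loop, remove THISROW, and convert back with table2timetable. A sketch, assuming hypothetical one-row timetables with distinct row times:

```matlab
% Hypothetical one-row timetables with overlapping variables
tt = {timetable(datetime(2023,1,1), 1, 10, 20, 'VariableNames', {'Exp_ID','Data_1','Data_2'}), ...
      timetable(datetime(2023,2,21), 2, 30, 'VariableNames', {'Exp_ID','Data_2'}), ...
      timetable(datetime(2023,3,11), 3, 40, 'VariableNames', {'Exp_ID','Data_3'})};

% Row times become an ordinary variable named Time
T = timetable2table(tt{1});
T.ThisRow = 1;                      % uniquely identify 1st row
for i = 2:numel(tt)
    V = timetable2table(tt{i});
    V.ThisRow = i;                  % uniquely identify ith row
    T = outerjoin(T, V, 'MergeKeys', true);   % keys: all common names, incl. Time
end
T = removevars(T, 'ThisRow');
ttAll = table2timetable(T);         % first datetime variable (Time) becomes row times
ttAll = sortrows(ttAll)             % order rows by time
```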
3 comments
John on 15 Mar 2023
Edited: John on 15 Mar 2023
Interesting, so you used the intersection of the variable names to define the key variables for outerjoin, but also manually added 'Time', since these are timetables and the row times are not part of the variable names list by default. This works in my case and is implemented like this:
ttAll = ttSaved.ind{1};
for i = 2:length(ttSaved.ind)
    Int = intersect(ttAll.Properties.VariableNames,ttSaved.ind{i}.Properties.VariableNames);
    Int{end+1} = 'Time';
    ttAll = outerjoin(ttAll,ttSaved.ind{i}, 'Keys',Int, 'MergeKeys',true);
end
clear i
disp(ttAll)
Sorted = sortrows(ttAll,'Time')
I was close with my Method 2, just missing the intersection part. So why did outerjoin modify the variable names after the timetables were joined, while in this case they were left unmodified? Thank you for your help and the knowledge provided for both of my questions @Stephen23
Stephen23 on 15 Mar 2023
Edited: Stephen23 on 15 Mar 2023
The main difference is actually that I uniquely specified each row. I realized later that using INTERSECT is superfluous (for tables), because by default OUTERJOIN uses all common variable names as keys. So we can simplify the code (see modified answer).
Not sure if that also works for timetables, because of the need(?) to specify the TIME.
"so why did the outerjoin function modify the variable names after the timetables were joined, while in this case, they were unmodified?"
Because OUTERJOIN keeps all variables/columns that are not keys, and makes no attempt to merge them. By specifying the keys as only the variables/columns of the first table, you overrode the default behavior of using all common names as keys (or the equivalent using INTERSECT). Because you explicitly told it NOT to use any new variables/columns as keys, OUTERJOIN helpfully renamed them and kept them all for you...
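A minimal sketch of this behavior, using two hypothetical one-row tables A and B that share the variable X: with the default keys (all common names), X is a key and gets merged; when the keys are restricted to ID, X is a non-key variable in both inputs, so OUTERJOIN keeps both copies under new, suffixed names.

```matlab
% Hypothetical tables sharing ID and X
A = table(1, 10, 'VariableNames', {'ID','X'});
B = table(2, 20, 'VariableNames', {'ID','X'});

% Default keys: all common variables (ID and X) -> X is merged
J1 = outerjoin(A, B, 'MergeKeys', true);
disp(J1.Properties.VariableNames)   % ID and X only

% Keys restricted to ID -> X from each input is kept and renamed
J2 = outerjoin(A, B, 'Keys', 'ID', 'MergeKeys', true);
disp(J2.Properties.VariableNames)   % ID plus two renamed copies of X
```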

Categories

Preprocessing Data

Version

R2020b