How can I add a new variable to a tall table when it depends on information in another, smaller tall table?

I have a tall table called tt_train, and one of its variables is named store_nbr. I also have another tall table called tt_stores with unique store_nbr values. I want to add a new variable tt_train.store_type that contains the store type available in tt_stores. Since tt_train has more rows than tt_stores, store_type must be looked up according to the store_nbr in each row of tt_train.
In a normal table, I would do this:
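tbl_train.store_type = NaN(height(tbl_train), 1);  % preallocate the new variable
for i = 1:size(tbl_train,1)
    % look up the store type whose store_nbr matches row i
    tbl_train.store_type(i) = tbl_stores.store_type(tbl_stores.store_nbr == tbl_train.store_nbr(i));
end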
Since indexing is not possible for tall tables, I do not know how to proceed in this case, or how to save the new tall table to disk.
I am new to big data, but I am an experienced MATLAB user.
Thanks for your help.

Answers (1)

Edric Ellis on 12 Dec 2017
You can use the join or innerjoin function on a tall table to do this. Here's a simple example. I'm using innerjoin here because my simple info table doesn't have rows for every Origin value in the data, and innerjoin keeps only the rows that have a match.
% Create a tall table
varnames = {'ArrDelay', 'DepDelay', 'Origin'};
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
'SelectedVariableNames', varnames);
tt = tall(ds);
% Create a non-tall table of information
info = table({'LAX'; 'SJC'; 'BUR'}, [1;2;3], ...
'VariableNames', {'Airport', 'SomeProperty'});
% Use 'innerjoin' to add information
jt = innerjoin(tt, info, 'LeftKeys', 'Origin', 'RightKeys', 'Airport');
% Display results
gather(head(jt))
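
Applied to the tables in the question, the same idea might look like the sketch below. The small in-memory tables are hypothetical stand-ins for tt_train and tt_stores (the unit_sales variable and the output folder are made up for illustration). It assumes tt_stores is small enough to gather into memory, which sidesteps any restriction on joining two tall inputs, and it uses write to save the result, since the question also asks about saving to disk.

```matlab
% Hypothetical stand-ins for the tall tables in the question.
mapreducer(0);  % evaluate tall arrays in the local MATLAB session (no parallel pool)
tt_train  = tall(table([1; 2; 1; 3], [10; 20; 30; 40], ...
    'VariableNames', {'store_nbr', 'unit_sales'}));
tt_stores = tall(table([1; 2; 3], {'A'; 'B'; 'A'}, ...
    'VariableNames', {'store_nbr', 'store_type'}));

% Assumption: the lookup table is small, so bring it into memory first.
stores = gather(tt_stores);

% Add store_type to each row of tt_train by matching on store_nbr.
% innerjoin drops rows of tt_train whose store_nbr is missing from stores.
tt_joined = innerjoin(tt_train, stores, 'Keys', 'store_nbr', ...
    'RightVariables', 'store_type');

% Save the joined tall table to disk. tempname is only a placeholder
% for a new, empty output folder.
outputFolder = tempname;
write(outputFolder, tt_joined);

% Later, the saved files can be read back as a tall table.
tt_reloaded = tall(datastore(outputFolder));
```

The row order of the joined table is not guaranteed to match tt_train, so sort on a key afterwards if the order matters.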
