Speed up renamecats/categorical on multiple columns
I have a huge CSV file of about 16 GB with over 9k columns. Each column is initially filled with codes (either integer or string), and I have a code book that gives the code and its meaning for each column. What I'm trying to do is translate the table so that it ends up with readable text instead of codes.
I can use either categorical or renamecats to "translate" them, but the issue is that looping through all these columns takes a substantially long time. I'm wondering whether there is a way to speed this up.
See the example below:
% Raw table: each variable holds codes
tbl = table(["a1", "b2", "c3", "d4", "e5"]', ...
    ["123", "234", "345", "456", "567"]', ...
    'VariableNames', {'A', 'B'});
% Code book: one Code/Meaning table per variable
dictionary.A = table(["a1", "b2", "c3", "d4", "e5"]', ...
    ["apple", "banana", "cat", "dog", "elephant"]', ...
    'VariableNames', {'Code', 'Meaning'});
dictionary.B = table(["123", "234", "345", "456", "567"]', ...
    ["East", "West", "North", "South", "Middle"]', ...
    'VariableNames', {'Code', 'Meaning'});
% Translate each column: codes become categories named by their meaning
Vars = tbl.Properties.VariableNames;
for iC = 1:width(tbl)
    tbl.(iC) = categorical(tbl.(iC), dictionary.(Vars{iC}).Code, ...
        dictionary.(Vars{iC}).Meaning);
end
Is it possible to avoid this loop, or are there any suggestions to speed this up (considering that I have over 500k rows and 9k columns)?
Thank you!
Answers (1)
Campion Loong
on 9 Oct 2020
Hi Peng,
It seems you have the Dictionary code book to boot, and you already know which sets of codes go with which field/name in the Dictionary (i.e. you can designate "VariableNames" in the first table(...) call).
In this case, why not create the table with categorical to begin with:
tbl = table(categorical(["a1"; "b2"; "c3"; "d4"; "e5"], dictionary.A.Code, dictionary.A.Meaning), ...
    categorical(["123"; "234"; "345"; "456"; "567"], dictionary.B.Code, dictionary.B.Meaning), ...
    'VariableNames', {'A', 'B'});
This needs no loop, and it is faster and much more readable.
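With thousands of columns the categorical(...) calls cannot be typed out by hand, so here is a minimal sketch of the same idea driven by the code book's field names. It assumes the raw codes are available per variable in a struct, here called raw, which is a hypothetical name and not something from the original post. Note that it still loops over the variables to build the columns; it only avoids assigning into an existing table one column at a time, and whether that is actually faster would have to be timed on the real data.

```matlab
% Code book, as in the question: one Code/Meaning table per variable.
dictionary.A = table(["a1"; "b2"; "c3"; "d4"; "e5"], ...
    ["apple"; "banana"; "cat"; "dog"; "elephant"], ...
    'VariableNames', {'Code', 'Meaning'});
dictionary.B = table(["123"; "234"; "345"; "456"; "567"], ...
    ["East"; "West"; "North"; "South"; "Middle"], ...
    'VariableNames', {'Code', 'Meaning'});

% Raw codes per variable (hypothetical container, assumed for this sketch).
raw.A = ["a1"; "b2"; "c3"; "d4"; "e5"];
raw.B = ["123"; "234"; "345"; "456"; "567"];

% Build every column as a categorical first ...
vars = fieldnames(dictionary);
cols = cell(1, numel(vars));
for k = 1:numel(vars)
    v = vars{k};
    cols{k} = categorical(raw.(v), dictionary.(v).Code, dictionary.(v).Meaning);
end

% ... then assemble the table in a single call.
tbl = table(cols{:}, 'VariableNames', vars');
```

The result should match the table built by the explicit two-column call above: tbl.A and tbl.B are categorical, with the meanings as category names.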