Trying to arrange big data

harel yadid on 3 Mar 2021
Commented: harel yadid on 9 Mar 2021
Hi everyone, I'm having trouble reaching maximum speed when reading huge CSV files, and I would like to hear your ideas.
I'm using a datastore to pre-arrange 1910 headers. I then want to store the output in a specific struct and read the variables individually, because not all 1910 headers contain data. My main problem is that the parts of every header are separated by '_', which makes them hard to parse.
For example:
FruitDs = datastore('Fruits.csv'); % shortened here; the real call works fine
NumOfHeaders = length(FruitDs.VariableNames); % e.g. headers: "Apples_colour_S1_red_dated" "Apples_colour_S1_red_dated" "Apples_colour_S_green_dated" ...
for n = 1:NumOfHeaders
    if contains(FruitDs.VariableNames{n}, 'Apples')
        tmp = FruitDs.VariableNames{n};
        A = strfind(tmp, '_');
        tmp(A([1,3])) = '.'; % turn the 1st and 3rd underscores into struct separators
        FruitDs.SelectedVariableNames = FruitDs.VariableNames(n);
        ApplesData = readall(FruitDs); % reads only the selected variable
        eval([tmp ' = ApplesData;']); % the struct I need: out.Apples.colour_XXXXX holds all data for that specific apple
        Fruits.Apples = Apples; % store the struct created by eval
    end
end
This code works fine, so my questions are as follows:
  1. Is there a faster way to do it?
  2. Is there a smart, fast way to avoid reading empty headers? (1910 x 390000 can be too much, and not all of the columns contain data; I filled the empty ones with NA in the datastore.)
  3. In some cases the headers differ only by a number, and I do want to separate them, for example "Apples_colour_S1_...." and "Apples_colour_S2_...". Is there a way to avoid a second loop (one that runs over all the SX)?
Thanks in advance.

Accepted Answer

Walter Roberson on 3 Mar 2021
Don't use eval there; use setfield. Split the name at '_' into a cell array and use cell expansion to pass the parts as field names: https://www.mathworks.com/help/matlab/ref/setfield.html
  3 Comments
Walter Roberson on 3 Mar 2021
out = struct;
S = "Apples_colour_S_green_dated";
parts = split(S, '_');
out = setfield(out, parts{:}, 12345);
out.Apples.colour.S.green
ans = struct with fields:
dated: 12345
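A minimal sketch of how the setfield approach above might be applied to every column in one loop. It assumes the data is already in a table (T below is a hypothetical stand-in for the output of readall) and that columns filled with NA come back as NaN. Under those assumptions, headers that differ only in the SX part end up in separate fields without a second loop, and all-missing columns are skipped.

```matlab
% Hypothetical stand-in for the table returned by readall(FruitDs)
T = table([1;2;3], [NaN;NaN;NaN], [4;5;6], ...
    'VariableNames', {'Apples_colour_S1_red_dated', ...
                      'Apples_colour_S2_red_dated', ...
                      'Pears_colour_S1_green_dated'});

out = struct;
names = T.Properties.VariableNames;
for n = 1:numel(names)
    col = T.(names{n});
    if all(ismissing(col))            % skip columns that hold no data
        continue
    end
    parts = strsplit(names{n}, '_');  % cell array of nested field names
    out = setfield(out, parts{:}, col);
end
```

Here out.Apples.colour.S1.red.dated holds the first column, and no S2 field is created because that column is entirely missing.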
harel yadid on 9 Mar 2021
thank you very much!
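On the speed question from the original post, one possible improvement (not discussed in the thread) is to avoid calling readall once per header, since each call reads the file again. The sketch below selects all matching variables first and reads once. The temporary CSV file and its contents are hypothetical stand-ins for Fruits.csv, and 'TreatAsMissing' with 'NA' is an assumption about how the empty columns were marked.

```matlab
% Write a tiny CSV to a temporary file (hypothetical stand-in for Fruits.csv)
csvFile = [tempname '.csv'];
T = table([1;2], [3;4], [5;6], 'VariableNames', ...
    {'Apples_colour_S1_red_dated', ...
     'Apples_colour_S2_red_dated', ...
     'Pears_colour_S1_green_dated'});
writetable(T, csvFile);

% Create the datastore, treating 'NA' entries as missing values
ds = tabularTextDatastore(csvFile, 'TreatAsMissing', 'NA');

% Select every 'Apples_' variable at once instead of one per loop iteration
isApple = startsWith(ds.VariableNames, 'Apples_');
ds.SelectedVariableNames = ds.VariableNames(isApple);

applesData = readall(ds);   % a single pass over the file
delete(csvFile);            % clean up the temporary file
```

The resulting table can then be split into a nested struct by header name, so the file is only parsed once per fruit rather than once per header.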

Version

R2020a
