Hello,
I have a huge CSV file and it has data stored as follows.
Column A stores ID numbers and the Column B stores dates (yyyymmdd hhmmss). Other columns are random.
I want to seperate the data based on ID numbers (Column A) and store them in different *.csv or *.txt files. The problem is that the number of data sets per one ID number is also random.
Ex: ID 102030 has 3 data while 125125 has 4 data
My CSV file has over 10,000 data rows, which makes is quite difficult to do this manually.
Is there a way to use MATLAB to sort this out? I know this is a big task but it would be a great help if anyone can at least point me in the right direction.
Thank you in advance!

 Réponse acceptée

Star Strider
Star Strider le 3 Mar 2020

0 votes

It woiuld be necessary to convert column B to strings in order for datetime to parse it, however that is not difficult. Grouping them is strraightforward, using either findgroups or the third output of unique. The accumarray function might be the best way to segregate them.
Example —
a = rand(10, 5); % Create Matrix
A = [[3 3 3 4 4 6 6 6 6 7]' a]; % Create Matrix
G = findgroups(A(:,1));
M = accumarray(G, (1:size(A,1)).', [], @(x){A(x,:)});
Cell_1 = [M{1}]
Cell_3 = [M{3}]
This does not include the datetime calls, howefver that would be straightforward if the date and time numbers needed to be converted.

5 commentaires

Jake
Jake le 3 Mar 2020
Thank you very much.
Although the column B stores Date/Time data, it can be treated as just another number. Because I will be changing it in the CSV file. I added that detail just to clarify. Sorry if that was a confusion.
So, I should print seperate csv or txt files after this approach?
(sorry, I'm still a bit new)
My pleasure!
I would do this for all of them:
for k = 1:numel(M)
writetable(array2table([M{k}]), sprintf('ID%6d.csv',M{k}(1,1)))
end
Choose appropriate names for the files (I used the ‘ID’ here, assuming that none of the ‘ID’ numbers have more than six digits, so you will need to change the format descriptor if they do not) and then use the writetable function to write the files, as I did here.
I tested that and it works. (Now I have to delete those files!)
Jake
Jake le 3 Mar 2020
Thanks! Your approach works perfectly on the dummy sample.
However, I figured you used A = [[3 3 3 4 4 6 6 6 6 7]' a]; because there are 3 data for first ID, 2 data for second ID, 4 data for third ID and one data for the last ID (as from my sample). But I don't know how many data are there per one ID in the actual data set. (There are over 10,000 rows :( )
Would that make a difference?
Jake
Jake le 3 Mar 2020
Perfect!
It works without a single error. You made the day.
Thanks a lot!
Star Strider
Star Strider le 3 Mar 2020
As always, my pleasure!
That should not make any difference, even if the ‘ID’ values are not contiguous in the original file. The tables will have them in the order they appear in the original file, and the individual files will be the same as the cell arrays accumarray created, that are then created as tables.
Note that the tables also give you the ability to label the columns (variables) to be what you want. See the documentation on table (and associated documentation) for those options.
Also, I would put them all in the same folder/subdirectory so you can find them easily. Use fullfile to create the appropriate filenames (including the subdirectory path), and use addpath to add that path to your MATLAB search path. See What Is the MATLAB Search Path? for those details, if you are not familiar with them.

Connectez-vous pour commenter.

Plus de réponses (0)

Catégories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by