Reading and combining multiple .txt files

Hi everyone
I have received a number of data files related to ship speed. Each file covers 1 day, with readings taken at 1-minute intervals. I would like to combine these files into one data set covering a period of operations, and then analyse the time spent in different speed ranges over that period.
My first issue is how to combine the files. Each .txt file contains a header on the first row, which I would like to exclude from the data set. The files are named by date, e.g. 01-Nov-03, so they are not in a purely numerical order. Eventually I would like to be able to combine any number of files, so I can extend the evaluation period as required.
I studied Matlab at university but am a little rusty now, so any help is greatly appreciated :)
Best regards, Henry

4 Comments

Hi Henry,
Could you give some more information on the text files? Are they just a column vector of (60*24 =) 1440 speed values? Aside from the header, are there any other rows, etc.?
As for the file names, a better approach might be to use the date the file was created as the identifier. It would need to be unique for each file, though, or it could cause some robustness problems!
Christopher
Mathieu NOE on 28 Mar 2022
Hello,
Why not share a .txt file (if it is not confidential) and sketch which data you're interested in?
Henry Allen on 28 Mar 2022
Hi Christopher. Thanks for the reply.
In each .txt file, there are 17 columns and 1441 rows (inc. headers). I am interested in the 13th column (initial column = 0), speed through the water. Each row is a collection of different data taken at 1-minute intervals.
Re. file names, the creation dates are the same for all files, I presume due to how they were copied across to me.
Best regards, Henry
Henry Allen on 28 Mar 2022
Example .txt files attached.
Thanks, Henry


Answers (1)

Mathieu NOE on 28 Mar 2022
Hello again,
Try this. We can expand on it for further processing or saving...
natsortfiles is a very useful tool to make sure files are sorted in natural order (which MATLAB does not do by default).
clc
clearvars

% List every .txt file in the current folder and its subfolders
S = dir('**/*.txt');

% Sort the file list in natural order.
% natsortfiles is available from the File Exchange (it is not part of MATLAB):
% https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort?s_tid=srchtitle
S = natsortfiles(S); % natsortfiles now works on the entire structure

legstr = cell(1, numel(S)); % legend strings, one per file
figure(1), hold on
for k = 1:numel(S)
    disp(S(k).name) % display file name in command window (for info only, you can remove / comment this line)
    F = fullfile(S(k).folder, S(k).name);
    %S(k).data = load(F); % or READTABLE or whatever.
    out = readmatrix(F, "NumHeaderLines", 1); % skip the header row
    % Store the 13th column. If the column count in the question starts
    % at 0, this would be column 14 in MATLAB's 1-based indexing.
    S(k).data = out(:,13);
    % plot (for fun)
    legstr{k} = S(k).name; % legend string
    plot(S(k).data);
end
legend(legstr);

% Take a look in the structure S: it contains all of your file data and
% the corresponding filenames, just as you require.
% For example, the 2nd filename and its data:
% S(2).name
% S(2).data
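
To get from the per-file structure to the single combined data set asked for in the question, the data fields can be stacked into one column. This is only a minimal sketch: the small structure below is a made-up stand-in for the S built by the loop above, and allSpeed is a hypothetical variable name.

```matlab
% Made-up stand-in for the structure S built by the loop above:
% two short "days" of speed readings stored as column vectors.
S = struct('name', {'a.txt', 'b.txt'}, 'data', {[1; 2; 3], [4; 5]});

% S.data expands to a comma-separated list, so vertcat stacks every
% file's column vector in the order the files appear in S.
allSpeed = vertcat(S.data);
```

The order of allSpeed follows the order of S, so any sorting of the file list has to happen before this step.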

3 Comments

Stephen23 on 28 Mar 2022
Edited: Stephen23 on 28 Mar 2022
Note that NATSORTFILES will not sort dates written with month-name abbreviations into chronological order: e.g. "Feb" will come before "Jan", which is unlikely to be the desired order.
Even if the abbreviated month names somehow sorted into the correct order, the date units in those filenames are written from smallest unit to biggest unit, so the dates would still not sort into chronological order (instead you would get a big group of all the 1sts of the month, then a big group of all the 2nds of the month, etc.). Basically, that very unfortunate date format makes processing the files more complex.
The simplest solution by far is to change the filenames to use an ISO 8601 date format (year, then month, then day, all numeric); then a trivial character sort will return the filenames in chronological order. To sort the existing filenames into chronological order, parsing them into DATETIME would probably be the simplest and most efficient way.
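
A minimal sketch of the DATETIME approach, assuming the filenames follow a dd-MMM-yy pattern with English month abbreviations (as the 01-Nov-03 example in the question suggests). The file names below are made up for illustration, and the variable names are hypothetical.

```matlab
% Made-up file names in the assumed dd-MMM-yy pattern.
names = {'01-Nov-03.txt', '15-Jan-04.txt', '02-Feb-03.txt', '01-Jan-03.txt'};

% Strip the folder and extension so only the date text remains.
[~, stems] = cellfun(@fileparts, names, 'UniformOutput', false);

% Parse the date text; the locale is fixed so that "Nov", "Jan", etc.
% are read as English month abbreviations on any machine.
dates = datetime(stems, 'InputFormat', 'dd-MMM-yy', 'Locale', 'en_US');

% Sort chronologically and reorder the names to match.
[datesSorted, order] = sort(dates);
namesSorted = names(order);
```

With the structure returned by dir, the same idea would be to parse {S.name} and then reorder with S = S(order). Two-digit years are resolved relative to a pivot year, so it is worth checking that the parsed years are the intended ones.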
Henry Allen on 28 Mar 2022
Thank you, this is all a very good start :) @Mathieu NOE I have tried to run the script myself and it creates the directory listing of the files, S. However, the error "Unrecognized function or variable 'natsortfiles'" then appears, so I am unable to get the same plot. That plot would be very helpful: I can manually limit the files in the directory to the time period I wish to visualise, and the plot would be a great accompaniment to the summary statistics (I am anticipating just using a COUNTIF-style function to get the % of time spent in different speed bands).
@Stephen Yes, I really don't think the file names, or indeed the creation dates being the same for all files, help much, and with 3000+ files I don't plan to change the names manually! In some respects, for the summary statistics the date on which a speed is registered is not so important, as I just wish to count the readings within each speed band and divide by the total data set size to give the % of time operating in each band. I could manually limit the .txt files in the directory to the time period I wish to evaluate.
Thanks, Henry
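
The band percentages described in this comment can be sketched with histcounts. This is only an illustration: the speed values and the band edges below are made up, and speed stands in for the combined speed column.

```matlab
% Made-up speed readings standing in for the combined data set.
speed = [0.2; 3.5; 7.9; 8.1; 12.4; 12.6; 14.9; 16.3];

% Hypothetical band edges: [0,5), [5,10), [10,15) and [15,Inf].
edges = [0 5 10 15 Inf];

% Count the readings in each band, then convert to a percentage of
% the total number of readings.
counts = histcounts(speed, edges);
pct = 100 * counts / numel(speed);
```

histcounts does not count NaN values, so if some readings are missing, dividing by sum(counts) instead of numel(speed) gives percentages of the valid readings only.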
Mathieu NOE on 28 Mar 2022
Hi again,
You can simply remove or comment out the line with natsortfiles for the time being.
In the future, you may want to download that useful function from the File Exchange.
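
As an alternative to commenting the line out, the call could be guarded so that the script runs whether or not natsortfiles is installed. A minimal sketch, assuming the .txt files are in the current folder:

```matlab
% List the .txt files in the current folder.
S = dir('*.txt');

% Only call natsortfiles if it is found on the MATLAB path as a file;
% otherwise keep the order returned by dir.
if ~isempty(S) && exist('natsortfiles', 'file') == 2
    S = natsortfiles(S);
end
```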
