Regarding reading mulitple csv file with specific rows

I have .csv 30 files with name inst0.csv to inst30.csv. First 24 row contain unnecessary data and I do not want remove it manually and due to unnecessary data I am facing problem with importing data with header issue.
I want to read all data including header Scan,Time,101 <M1_RW1> (C),102 <M1_RW2> (C),103 <M1_RS1> (C),104 <M1_RS2> (C),105 <M1_RN2> (C),106 <M1_RD2> (C),107 <M1_RW3> (C),108 <M1_RW4>
those data will find row 25 or 26 but the problem is the Header row of Scan and time keep changing for every file and I want to be read and process the data having header of scan time for multiple files. So how to do it?
Also,How to get all the files from folder and read it as table using drop down in live editor?

 Réponse acceptée

Read the headerline containing the number of channels in the dataset so you can compute the number of headerlines before the dataset of interest.
I just did the following at the command line--
for i=1:numel(d)
fid=fopen(d(i).name,'r');
nhdr=8+cell2mat(textscan(fid,'Total Channels:%f',1,'headerlines',5,'delimiter',','));
fid=fclose(fid);
opt{i}=detectImportOptions(d(i).name,'headerlines',nhdr);
preview(d(i).name,opt{i})
end
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions
property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
ans =
8×18 table
Scan Time x101_M1_RW1__C_ x102_M1_RW2__C_ x103_M1_RS1__C_ x104_M1_RS2__C_ x105_M1_RN2__C_ x106_M1_RD2__C_ x107_M1_RW3__C_ x108_M1_RW4__C_ x109_M1_RT3__C_ x201_M1_RM2__C_ x203_M1_RN1__C_ x204_M1_RD1__C_ x205_M1_RR1__C_ x206_M1_RSD1__C_ x207_M1_RSD2__C_ x208_M1_RF1__C_
____ ___________________________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ ________________ ________________ _______________
1 {'2020-12-10 14:02:35:214'} 9.9e+37 27.855 27.884 27.841 28.037 27.857 27.707 27.769 9.9e+37 -9.9e+37 9.9e+37 -9.9e+37 27.864 -9.9e+37 -9.9e+37 9.9e+37
2 {'2020-12-10 14:02:38:097'} 9.9e+37 27.853 27.885 27.839 28.036 27.855 27.705 27.769 9.9e+37 9.9e+37 -9.9e+37 -9.9e+37 27.859 -9.9e+37 -9.9e+37 9.9e+37
3 {'2020-12-10 14:02:40:145'} 9.9e+37 27.851 27.884 27.836 28.035 27.854 27.704 27.767 9.9e+37 9.9e+37 9.9e+37 -9.9e+37 27.859 -9.9e+37 -9.9e+37 9.9e+37
4 {'2020-12-10 14:02:42:368'} 9.9e+37 27.851 27.883 27.836 28.035 27.851 27.703 27.765 9.9e+37 9.9e+37 9.9e+37 -9.9e+37 27.858 -9.9e+37 -9.9e+37 9.9e+37
5 {'2020-12-10 14:02:44:414'} 9.9e+37 27.851 27.88 27.836 28.033 27.853 27.704 27.763 9.9e+37 9.9e+37 9.9e+37 -9.9e+37 27.858 -9.9e+37 -9.9e+37 9.9e+37
6 {'2020-12-10 14:02:46:461'} 9.9e+37 27.846 27.879 27.832 28.032 27.853 27.703 27.763 9.9e+37 9.9e+37 9.9e+37 -9.9e+37 27.858 -9.9e+37 -9.9e+37 9.9e+37
7 {'2020-12-10 14:02:48:509'} 9.9e+37 27.849 27.879 27.833 28.029 27.852 27.701 27.761 9.9e+37 9.9e+37 9.9e+37 -9.9e+37 27.858 -9.9e+37 -9.9e+37 9.9e+37
8 {'2020-12-10 14:02:50:557'} 9.9e+37 27.847 27.88 27.832 28.03 27.852 27.699 27.761 9.9e+37 9.9e+37 9.9e+37 -9.9e+37 27.857 -9.9e+37 -9.9e+37 9.9e+37
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions
property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
ans =
8×19 table
Scan Time x101_M1_RW1__C_ x102_M1_RW2__C_ x103_M1_RS1__C_ x104_M1_RS2__C_ x105_M1_RN2__C_ x106_M1_RD2__C_ x107_M1_RW3__C_ x108_M1_RW4__C_ x109_M1_RT3__C_ x201_M1_RM2__C_ x202_M1_RM1__C_ x203_M1_RN1__C_ x204_M1_RD1__C_ x205_M1_RR1__C_ x206_M1_RSD1__C_ x207_M1_RSD2__C_ x208_M1_RF1__C_
____ ___________________________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ ________________ ________________ _______________
1 {'2020-12-10 14:11:13:851'} 9.9e+37 27.623 27.652 27.607 27.814 27.678 27.483 27.544 9.9e+37 9.9e+37 9.9e+37 9.9e+37 9.9e+37 27.776 9.9e+37 9.9e+37 9.9e+37
2 {'2020-12-10 14:11:15:827'} 9.9e+37 27.623 27.651 27.607 27.81 27.681 27.483 27.542 9.9e+37 9.9e+37 9.9e+37 9.9e+37 9.9e+37 27.777 9.9e+37 9.9e+37 9.9e+37
3 {'2020-12-10 14:11:17:827'} 9.9e+37 27.622 27.65 27.606 27.812 27.678 27.481 27.542 9.9e+37 9.9e+37 9.9e+37 9.9e+37 9.9e+37 27.778 9.9e+37 9.9e+37 9.9e+37
4 {'2020-12-10 14:11:19:827'} 9.9e+37 27.623 27.65 27.606 27.813 27.676 27.482 27.541 9.9e+37 9.9e+37 9.9e+37 9.9e+37 9.9e+37 27.777 9.9e+37 9.9e+37 9.9e+37
5 {'2020-12-10 14:11:21:827'} 9.9e+37 27.62 27.649 27.602 27.81 27.678 27.479 27.539 9.9e+37 9.9e+37 9.9e+37 9.9e+37 9.9e+37 27.777 9.9e+37 9.9e+37 9.9e+37
6 {'2020-12-10 14:11:23:827'} 9.9e+37 27.621 27.648 27.602 27.808 27.675 27.48 27.54 9.9e+37 9.9e+37 9.9e+37 9.9e+37 9.9e+37 27.777 9.9e+37 9.9e+37 9.9e+37
7 {'2020-12-10 14:11:25:827'} 9.9e+37 27.619 27.647 27.601 27.808 27.674 27.478 27.537 9.9e+37 9.9e+37 9.9e+37 9.9e+37 9.9e+37 27.776 9.9e+37 9.9e+37 9.9e+37
8 {'2020-12-10 14:11:27:827'} 9.9e+37 27.618 27.646 27.602 27.809 27.676 27.477 27.539 9.9e+37 9.9e+37 9.9e+37 9.9e+37 9.9e+37 27.776 9.9e+37 9.9e+37 9.9e+37
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions
property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
ans =
8×19 table
Scan Time x101_M1_RW1__C_ x102_M1_RW2__C_ x103_M1_RS1__C_ x104_M1_RS2__C_ x105_M1_RN2__C_ x106_M1_RD2__C_ x107_M1_RW3__C_ x108_M1_RW4__C_ x109_M1_RT3__C_ x201_M1_RM2__C_ x202_M1_RM1__C_ x203_M1_RN1__C_ x204_M1_RD1__C_ x205_M1_RR1__C_ x206_M1_RSD1__C_ x207_M1_RSD2__C_ x208_M1_RF1__C_
____ ___________________________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ ________________ ________________ _______________
1 {'2020-12-10 14:49:56:530'} 9.9e+37 26.805 26.834 26.793 27.056 27.001 26.658 26.727 9.9e+37 27.367 27.417 26.644 27.241 27.302 9.9e+37 9.9e+37 9.9e+37
2 {'2020-12-10 14:49:58:507'} 9.9e+37 26.803 26.831 26.789 27.057 27 26.656 26.728 9.9e+37 27.367 27.418 26.644 27.24 27.302 9.9e+37 9.9e+37 9.9e+37
3 {'2020-12-10 14:50:00:507'} 9.9e+37 26.799 26.831 26.786 27.054 26.999 26.652 26.724 9.9e+37 27.367 27.418 26.643 27.24 27.302 9.9e+37 9.9e+37 9.9e+37
4 {'2020-12-10 14:50:02:507'} 9.9e+37 26.795 26.828 26.785 27.056 26.998 26.647 26.723 9.9e+37 27.366 27.417 26.643 27.24 27.302 9.9e+37 9.9e+37 9.9e+37
5 {'2020-12-10 14:50:04:507'} 9.9e+37 26.791 26.827 26.785 27.053 26.999 26.642 26.72 9.9e+37 27.366 27.417 26.643 27.24 27.302 9.9e+37 9.9e+37 9.9e+37
6 {'2020-12-10 14:50:06:507'} 9.9e+37 26.786 26.824 26.782 27.052 26.996 26.639 26.72 9.9e+37 27.365 27.416 26.642 27.239 27.301 9.9e+37 9.9e+37 9.9e+37
7 {'2020-12-10 14:50:08:507'} 9.9e+37 26.78 26.821 26.779 27.05 26.997 26.631 26.716 9.9e+37 27.365 27.416 26.641 27.239 27.301 9.9e+37 9.9e+37 9.9e+37
8 {'2020-12-10 14:50:10:507'} 9.9e+37 26.771 26.821 26.774 27.051 26.996 26.626 26.716 9.9e+37 27.365 27.415 26.641 27.238 27.301 9.9e+37 9.9e+37 9.9e+37
>>
Fixup the import object to read the datetime format.
You're on your own with the live editor; never used it.

17 commentaires

Arshey Dhangekar
Arshey Dhangekar le 27 Mai 2021
Modifié(e) : Arshey Dhangekar le 27 Mai 2021
What 'd' is defined here? could you paste the whole code so that I can understand better
Also How can I store those 30 .csv files in matlab and call it one by one for plotting data
d=dir('inst0*.csv');
You have each data file in turn in the loop, just use readtable instead of preview and save the result into a cell array is the most simplest coding-wise--
tInst{i}=readtable(d(i).name,opt{i});
Do whatever you want with each table in turn inside the loop or save the whole cell array of tables as a mat file to load and process later.
That's all the code needed--the power of MATLAB in action! :)
Hello I got error of "Array indices must be positive integers or logical values" for tInst{i}=readtable(d(i).name,opt{i});
clc
clear all
d=dir('inst0*.csv')
tInst{i}=readtable(d(i).name,opt{i});
for i=1:numel(d)
fid=fopen(d(i).name,'r');
nhdr=8+cell2mat(textscan(fid,'Total Channels:%f',1,'headerlines',5,'delimiter',','));
fid=fclose(fid);
opt{i}=detectImportOptions(d(i).name,'headerlines',nhdr);
preview(d(i).name,opt{i})
end
Because you put it outside the loop, i isn't yet defined...reread what I wrote--"...use readtable instead of preview...".
THINK about what you're trying to do, don't just blindly cut 'n paste. You can still preview the data if you want, but it's only that; displaying the first 8 records of the read table so isn't of any other use--I just did that to show that it does work as wanted. If you still want to see all that stuff scroll off the window anyway, replace preview with head after read the table. See the doc for head for rest of syntax.
Hello,
Thank you it works for me. But I got error No data is available to preview with the import options provided on preview(d(i).name,opt{i}). How can I read remaining datas from those excel file because every file has different ending rows that can see in csv files. From Header to till end I want read data and want to know which file I doing data processing?
dpb
dpb le 27 Mai 2021
Modifié(e) : dpb le 27 Mai 2021
Well, again, without seeing the code you actually used, how are we to know? And we don't know what is in all 30 files; only what was in each of the three.
Again, preview is what it says it is; it ONLY displays a smidgen of the beginning data contained in the file so you can observe/give a peek at what the table contains; it is nothing more or less. For the files you uploaded we see
for i=1:numel(d),fid=fopen(d(i).name,'r');nhdr=8+cell2mat(textscan(fid,'Total Channels:%f',1,'headerlines',5,'delimiter',','));fid=fclose(fid);
opt{i}=detectImportOptions(d(i).name,'headerlines',nhdr);tInst{i}=readtable(d(i).name,opt{i});end
>> cellfun(@height,tInst)
ans =
15 8 1333
>>
which shows those three files read 15, 8, and 1333 records, respectively. If there's something different in the ends of the files besides the data format of the section read, then code will have to be written to read whatever that might be; you can't expect magic.
>> [head(tInst{3}); tail(tInst{3})]
ans =
16×19 table
Scan Time x101_M1_RW1__C_ x102_M1_RW2__C_ x103_M1_RS1__C_ x104_M1_RS2__C_ x105_M1_RN2__C_ x106_M1_RD2__C_ x107_M1_RW3__C_ x108_M1_RW4__C_ x109_M1_RT3__C_ x201_M1_RM2__C_ x202_M1_RM1__C_ x203_M1_RN1__C_ x204_M1_RD1__C_ x205_M1_RR1__C_ x206_M1_RSD1__C_ x207_M1_RSD2__C_ x208_M1_RF1__C_
____ ___________________________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ ________________ ________________ _______________
1 {'2020-12-10 14:49:56:530'} 9.9e+37 26.805 26.834 26.793 27.056 27.001 26.658 26.727 9.9e+37 27.367 27.417 26.644 27.241 27.302 9.9e+37 9.9e+37 9.9e+37
2 {'2020-12-10 14:49:58:507'} 9.9e+37 26.803 26.831 26.789 27.057 27 26.656 26.728 9.9e+37 27.367 27.418 26.644 27.24 27.302 9.9e+37 9.9e+37 9.9e+37
3 {'2020-12-10 14:50:00:507'} 9.9e+37 26.799 26.831 26.786 27.054 26.999 26.652 26.724 9.9e+37 27.367 27.418 26.643 27.24 27.302 9.9e+37 9.9e+37 9.9e+37
4 {'2020-12-10 14:50:02:507'} 9.9e+37 26.795 26.828 26.785 27.056 26.998 26.647 26.723 9.9e+37 27.366 27.417 26.643 27.24 27.302 9.9e+37 9.9e+37 9.9e+37
5 {'2020-12-10 14:50:04:507'} 9.9e+37 26.791 26.827 26.785 27.053 26.999 26.642 26.72 9.9e+37 27.366 27.417 26.643 27.24 27.302 9.9e+37 9.9e+37 9.9e+37
6 {'2020-12-10 14:50:06:507'} 9.9e+37 26.786 26.824 26.782 27.052 26.996 26.639 26.72 9.9e+37 27.365 27.416 26.642 27.239 27.301 9.9e+37 9.9e+37 9.9e+37
7 {'2020-12-10 14:50:08:507'} 9.9e+37 26.78 26.821 26.779 27.05 26.997 26.631 26.716 9.9e+37 27.365 27.416 26.641 27.239 27.301 9.9e+37 9.9e+37 9.9e+37
8 {'2020-12-10 14:50:10:507'} 9.9e+37 26.771 26.821 26.774 27.051 26.996 26.626 26.716 9.9e+37 27.365 27.415 26.641 27.238 27.301 9.9e+37 9.9e+37 9.9e+37
1326 {'2020-12-10 15:34:06:507'} 9.9e+37 33.727 33.603 33.616 33.356 31.936 33.532 33.654 9.9e+37 35.87 37.264 33.372 36.277 36.58 9.9e+37 9.9e+37 9.9e+37
1327 {'2020-12-10 15:34:08:507'} 9.9e+37 33.725 33.601 33.614 33.355 31.937 33.53 33.653 9.9e+37 35.869 37.262 33.373 36.272 36.578 9.9e+37 9.9e+37 9.9e+37
1328 {'2020-12-10 15:34:10:507'} 9.9e+37 33.72 33.601 33.612 33.356 31.936 33.526 33.649 9.9e+37 35.868 37.26 33.374 36.267 36.578 9.9e+37 9.9e+37 9.9e+37
1329 {'2020-12-10 15:34:12:507'} 9.9e+37 33.719 33.598 33.612 33.355 31.936 33.523 33.646 9.9e+37 35.867 37.258 33.375 36.262 36.576 9.9e+37 9.9e+37 9.9e+37
1330 {'2020-12-10 15:34:14:507'} 9.9e+37 33.718 33.596 33.608 33.358 31.936 33.522 33.643 9.9e+37 35.867 37.256 33.375 36.257 36.575 9.9e+37 9.9e+37 9.9e+37
1331 {'2020-12-10 15:34:16:507'} 9.9e+37 33.715 33.595 33.605 33.357 31.937 33.519 33.639 9.9e+37 35.865 37.254 33.376 36.252 36.575 9.9e+37 9.9e+37 9.9e+37
1332 {'2020-12-10 15:34:18:507'} 9.9e+37 33.711 33.594 33.605 33.354 31.933 33.515 33.639 9.9e+37 35.864 37.252 33.377 36.248 36.573 9.9e+37 9.9e+37 9.9e+37
1333 {'2020-12-10 15:34:20:507'} 9.9e+37 33.709 33.591 33.603 33.355 31.935 33.512 33.636 9.9e+37 35.863 37.249 33.378 36.243 36.572 9.9e+37 9.9e+37 9.9e+37
>>
shows what is in the third table of the sizable number of records; the data in each table will be whatever is in each file that follows the same format...if there's an empty file, we can't control that.
Hello,
All 30 files have same data with same headings I uploaded only 3 files out of 30 which having same data. I apologizes I haven't mentioned about it.
dpb
dpb le 27 Mai 2021
Modifié(e) : dpb le 27 Mai 2021
That's not the issue -- the above code will process all files in the location that begin with the "Inst0" filename and are .csv extensions.
As the above exercise shows, readtable will read all the records in the file from the given data section; the files you uploaded have the precise number of records indicated in them--I looked to check. If your other files are the same, then I have no doubt readtable will read them, too.
Again, preview does NOT save anything; it just shows you a small section of the beginning of the file -- it really is of no help for what you're trying to do other than a visible clue things are working -- it is readtable as shown that will actually return the data.
Then, you use each of those tables stored in a cell array to access the data -- remember you must dereference a cell array with the {} to see the content; read the documentation and "Getting Started" section on basic MATLAB syntax for use of cell arrays and the section on accessing data from tables on the table data class doc for using tables.
The error message about "No data is available to preview..." may be that one on of the files doesn't have any data in the section of interest; we don't know is the point of the original comment. Only you can tell whether that is so or something else happened.
Or, of course, you could have a mistake in your code that you ran but since you didn't post that, we can't tell whether that's a possibility or not.
" readtable will read all the records in the file from the given data section; ...Then, you use each of those tables stored in a cell array to access the data -"
Unless, of course, your intent is to catenate all the data from each file into one (potentially very large) table rather than to keep each as an identifiable table corresponding to the input file?
The same base code outline will work for that purpose, but one would then add each subsequent table to the existing from the previous loop.
Hello,
Could you explain in code where can I make changes so that It imports data from main header to the end of data of every csv file?
You need to make NO changes to do that -- that's what I've been telling you.
Look at what you get to see -- I showed you
>> cellfun(@height,tInst)
ans =
15 8 1333
>>
After the loop finished, the above at the command line shows it read the data in each of the sample files and returned the number of records in each matching the input file length.
Have you done that to see what you did get?
Hello,
I really appreciate you. I am not still understanding. Could just tell me for one file to acess all data from Main Header starts from (Scan, time) currently I am using readtable to read csv file however it retrieves unnessary data which I do not want. So could suggest code for that? In csv file you will file raw data if you read directly that I don't want.
tableT=readtable("inst0 138.221.155.178 11_19_2020 06_55_37 1.csv");
dpb
dpb le 27 Mai 2021
Modifié(e) : dpb le 27 Mai 2021
tableT=readtable("inst0 138.221.155.178 11_19_2020 06_55_37 1.csv");
is NOT using the code I provided you to read the header and determine the number of sensors and thereby compute the lines to skip. So, of course it's going to try to read the full file; what else could/should it do with no instruction otherwise?
Why aren't you using the code I gave you instead?
If you run it as written, you'll end up with a cell array of 30 tables, each containing the data for the file as requested and as I showed for the three sample files.
The only thing without seeing other files would be the possibility of one or more files having no sensors and so there being no data which to read -- if that is the case, then add a logic condition that skips trying to read that file if so --
for i=1:numel(d)
fid=fopen(d(i).name,'r');
nhdr=8+cell2mat(textscan(fid,'Total Channels:%f',1,'headerlines',5,'delimiter',','));
fid=fclose(fid);
if nhdr==8 % skip no data in file cases if that occurs (dpb conjecture)
warning(['No data. Skipped file: ' d(i).name])
continue
end
opt{i}=detectImportOptions(d(i).name,'headerlines',nhdr);
preview(d(i).name,opt{i})
end
Hello,
I am using your code but could you just tell for only one csv file to extract data that can be implemented to import all data from main header till end of rows of each rows? I want extract all the main data not just first 8 entries
That's EXACTLY what the code DOES! TRY IT AND LOOK!!!!
To get the full table replace the call to preview with readtable.
If all the files have the same format, you don't need to call detectImportOptions each time. This will save time.
opts = detectImportOptions(d(1).name,"Delimiter",","); % already know it's csv, avoid detecting it
T = {};
for i=1:numel(d)
fid = fopen(d(i).name,'r');
nhdr = 8 + cell2mat(textscan(fid,'Total Channels:%f',1,'HeaderLines',5,'Delimiter',','));
fid = fclose(fid);
if nhdr == 8 % skip no data in file cases if that occurs (dpb conjecture)
warning(['No data. Skipped file: ' d(i).name])
continue
end
opts.DataLines = [nhdr+2 inf];
opts.VariableNamesLine = nhdr+1;
T{i} = readtable(d(i).name,opts)
end
Arshey Dhangekar
Arshey Dhangekar le 28 Mai 2021
Modifié(e) : Arshey Dhangekar le 31 Mai 2021
Hello thank you so much It is working for me.
How can we use loop for ploting grpah for two columns?
I want to plot graph x vs y for all columns present in table ?
How can I do it for multiple csv file name of "WT_201119.csv" contain 52 main header with column
where I have to do changes? similar to above one csv file
Because I got error
'DataLines' must be specified as a positive integer or a 2-column matrix of positive integers."
And Kudos to Jeremy Hughe!!

Connectez-vous pour commenter.

Plus de réponses (0)

Produits

Version

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by