Removing Excess Data From a .csv File using Textscan
Afficher commentaires plus anciens
Good afternoon, all. I'm hoping someone with a keen eye can help. I have a .csv file I'm reading into MATLAB using textscan. Here's an example of my code:
fid = fopen('file.csv');
fstring = '%f %*14s %f %f/%f %f:%f:%f %f/%f %f:%f:%f %*1s %*s %*4s %f/%f %f:%f:%f %f %f %*s %*s %*s %*s %f %f %f %f %f %f %f %f %f %f %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s';
fstring = strcat(fstring, '%*[^\n]');
line = fgetl(fid);
data_matrix = [];
while (line ~= -1)
data = textscan(fid,fstring,'MultipleDelimsAsOne',0, 'Delimiter',',');
B = cell2mat(data);
data_matrix = [data_matrix;B];
line = fgetl(fid);
end
fclose(fid);
The goal is to get rid of the extraneous columns so I have a m-by-n matrix of numerical data. 99% of the time, this technique works flawlessly for what I need it to do, but this time, there's an issue I can't put my finger on. What I want to do is delete the last 10 columns, which correspond to the last 10 %*s entries in 'fstring'; however this isn't happening. Everywhere else there's a %*s, the data gets deleted like I want except for the columns in question. Of those 10 columns, the first 2 contain data and the rest are blank. Neither columns 1/10 or 2/10 contain data of fixed length, so I can't use %*'number_of_characters's as I did at the beginning of fstring.
As an alternative, I substituted %*[\n] for the repeated %*s entries per MathWorks' textscan help page, but I get the same error. For what it's worth, if I manually delete the last 10 columns in the source file (and subsequently don't include the multiple %*s entries in fstring), everything works just fine.
If I left anything unclear, don't hesitate to call me on it and I'll do my best to clarify.
3 commentaires
Walter Roberson
le 30 Avr 2018
Could you show a couple of sample lines of input?
Jeremy
le 30 Avr 2018
Walter Roberson
le 30 Avr 2018
Modifié(e) : Walter Roberson
le 30 Avr 2018
What you have named Example.csv is actually a .xlsx file and not a .csv file at all.
Réponse acceptée
Plus de réponses (2)
per isakson
le 30 Avr 2018
Modifié(e) : per isakson
le 1 Mai 2018
As Walter Roberson noted, the file Example.csv, which you attached, is an Excel-file. textscan cannot read Excel-files.
I made a little test:
- opened the file Example.csv with Excel and saved the content in a csv-file, CsvExample.csv.
- imported CsvExample.csv with your code.
I got the expected result. No problems.
In response to comment
Output from running your code.
>> data = cssm
data =
1.0e+06 *
Columns 1 through 15
9.2271 0.0000 0.0001 0.0020 0.0000 0.0000 0.0000 0.0001 0.0020 0.0000 0.0000 0.0000 0.0001 0.0020 0.0000
9.2297 0.0000 0.0001 0.0020 0.0000 0.0000 0.0001 0.0001 0.0020 0.0000 0.0000 0.0000 0.0001 0.0020 0.0000
Columns 16 through 29
0.0000 0.0000 0.0011 0.0000 0.0000 0.0002 0.0012 0.0000 1.5571 0.1049 9.2271 0.0000 0.0000 0.0000
0.0000 0.0000 0.0011 0.0000 0.0000 0.0005 0.0014 0.0000 1.5570 0.1041 9.2297 0.0000 0.0000 -0.0000
>> data(:,27:29)
ans =
1.0000 0.0319 0.0000
1.0000 0.1440 -0.0067
where
function data_matrix = cssm
fid = fopen('CsvExample.csv');
fstring = '%f%*14s%f%f/%f%f:%f:%f%f/%f%f:%f:%f%*1s%*s%*4s%f/%f%f:%f:%f%f%f%*s%*s%*s%*s%f%f%f%f%f%f%f%f%f%f%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s';
fstring = strcat(fstring, '%*[^\n]');
line = fgetl(fid);
data_matrix = [];
while (line ~= -1)
data = textscan(fid,fstring,'MultipleDelimsAsOne',false, 'Delimiter',',');
B = cell2mat(data);
data_matrix = [data_matrix;B];
line = fgetl(fid);
end
fclose(fid);
end
and where CsvExample.csv is a text file, which contains
A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,AA,AB,AC,AD,AE,AF,AG,AH,AI
9227077,Associated-INT,2,120/2018 17:46:46,120/2018 17:48:23,U,,TRUE,120/2018 18:01:39,1114,5,,,,,9.756394,197.121261,1212.91292,0.207306,1557117,104934,9227077,1,0.031927,0.000001,1114 ADS_TheSpacePlace_05,2f36bbaa-83e7-a4b2-a827-ce108a04421d,,,,,,,,
9229705,Associated-INT,2,120/2018 17:14:53,120/2018 17:17:19,U,,TRUE,120/2018 17:32:22,1117,5,,,,,10.552097,470.651237,1448.617523,4.710569,1557048,104053,9229705,1,0.143954,-0.006698,1117 ADS_TheSpacePlace_08,ea02655b-040e-a6fd-d7b7-d61148cabad3,,,,,,,,
"[...] the result was only 25 columns wide versus 35?"
No, I get 29 columns. The specifier %f appears 29 times in the format string, fstring.
>> fstring = '%f%*14s%f%f/%f%f:%f:%f%f/%f%f:%f:%f%*1s%*s%*4s%f/%f%f:%f:%f%f%f%*s%*s%*s%*s%f%f%f%f%f%f%f%f%f%f%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s';
>> pos = strfind( fstring, '%f' );
>> length( pos )
ans =
29
>>
"the website wouldn't let me upload the .xlsx version"
That's to prevent the user to open the downloaded file with a click. However, changing the extension doesn't change the format of the file. Example.csv is a binary file with a special format that is used by Excel.
9 commentaires
Jeremy
le 30 Avr 2018
Walter Roberson
le 1 Mai 2018
You can zip xlsx files and attach the zip.
But it isn't worth uploading now; people can rename it to xlsx extension if they still want to look at it.
per isakson
le 1 Mai 2018
Modifié(e) : per isakson
le 1 Mai 2018
@Jeremy, I added a response to the my answer.
Jeremy
le 2 Mai 2018
Walter Roberson
le 2 Mai 2018
? Are you working with a .xlsx file or are you working with a csv file? The file you had attached earlier was xlsx and the readtable() based code I attached works on it. If you are working with csv file then I will need a sample of that.
per isakson
le 2 Mai 2018
@Walter, I'm reading a proper csv-file. I opened the attached file with Excel, which required some "persuasion", and SaveAs a csv-file.
Walter Roberson
le 2 Mai 2018
I am concerned, though, that if I save as csv that the result might not be the same as the original csv, so if the original source is csv then it would be better to post some of it.
per isakson
le 2 Mai 2018
Modifié(e) : per isakson
le 2 Mai 2018
Obviously, something differs between my, CsvExample.csv, and the csv-file that @Jeremy tries to read. I attach mine here. That is a sample with two rows. Maybe the real file contains thousand rows and the error occurs while reading the fifth.
@Jeremy, Add 'ReturnOnError',false to the textscan statement. That should give some information on why textscan fails.
Jeremy
le 2 Mai 2018
Sarah Palfreyman
le 30 Avr 2018
0 votes
Catégories
En savoir plus sur Text Files dans Centre d'aide et File Exchange
Produits
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!