Read a column which has comma seperated values
9 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Hi,
I have a .csv file which contains 5 columns - the first represents time, second to fifth columns are numbers, among the these columns the third column contains number separated with 'comma'. I was able to read the data, if there is no comma, but how to read the data with comma??
My .csv looks like this:
0:00:02 6 984 186.8 11.76
0:00:03 7 984 186.8 11.76
0:00:04 7 984 186.8 11.76
0:00:05 7 1,123 183.2 10.2
0:00:06 6 1,123 183.2 10.2
0:00:07 6 1,123 183.2 10.2
0:00:08 4 1,123 183.2 10.2
Here is my code to read the .csv file:
fid = fopen('file.csv');
allData = textscan(fid,'%s %d %d %d %*d','delimiter',',','collectoutput',1,'headerlines',1);
fclose(fid);
My code was able to read the first three line only.
Is there a way to read the comma separated values using textscan??
2 commentaires
per isakson
le 9 Avr 2012
I'm stubborn. My code below reads all seven lines! The ",", which I'm convinced is a "thousand separator", must be handled by special code.
per isakson
le 9 Avr 2012
textscan reads "comma separated values" without problems. However, your example is not "comma separated values". The extension "csv" doesn't change that.
Réponses (3)
per isakson
le 8 Avr 2012
The problem is most likely a 'thousand separator'. "[...]contains number separated with 'comma'." That is not two number separated by 'comma', but one number larger the one thousand.
0:00:02 6 984 186.8 11.76
0:00:03 7 984 186.8 11.76
0:00:04 7 984 186.8 11.76
0:00:05 7 1,123 183.2 10.2
0:00:06 6 1,123 183.2 10.2
0:00:07 6 1,123 183.2 10.2
0:00:08 4 1,123 183.2 10.2
See my answere to Data extraction from .txt file. This provides three alternative solutions. The last one offers the opportunity to operate on the cell array, cac, before parsing. Add the line
cac = strrep( cac{:}, ',', '' );
to remove the "thousand separator" (','). This illustrates one way to do it. You will need to edit the code. This smells a bit, but I don't think there is a better way.
--- EDIT ---
Your sample text contains three different separators - I assume
- decimal separator, "."
- thousand separator, ","
- list separator, " " --- EDIT --- 'Delimiter'
Matlab don't recognize the "thousand separator" regardless of what character is used to represent it. It has to be removed.
In addition "0:00:02" causes problems. There is only one character for hour.
This code reads the text and parses the numeric data. Time must be parsed in some dirty way. How come it is allowed to use a format like this.
fid = fopen( '.\cssm.txt', 'r' );
cac = textscan(fid,'%s','Whitespace','',Delimiter','\n','Headerlines',1 );
sts = fclose( fid );
cac = cac{:};
cac = strrep( cac, ',', '' ); % remove the "thousand separator"
cac = strtrim( cac ); % not needed(?)
str = transpose( char( cac ) );
str = cat( 1, str, repmat( ' ', 1, size(str,2) ) );
ca2 = textscan( str, '%s%f%f%f%f' );
num = [ ca2{2:end} ];
6 commentaires
per isakson
le 9 Avr 2012
Two different types of separators cannot be represented with the same character. Matlab cannot handle that. Since "dot" is used as decimal separator it cannot be used as list separator in the same file.
If there is one header line in the file add "'Headerlines', 1" to the first textscan call.
I cannot believe that "1,123" should be interpreted as two values!
per isakson
le 9 Avr 2012
I seems impossible to use continuation lines in the code part. "..." just disappears
Image Analyst
le 8 Avr 2012
Why not use a space as the delimiter? Or alternatively, try dlmread() with a space as the delimiter? It might have trouble with the colons in the time though (not sure).
3 commentaires
Image Analyst
le 8 Avr 2012
So did you pass it a space: ' '??? Or did you pass it no characters or multiple characters?
Peter Wittenberg
le 8 Avr 2012
What I have done for similar circumstances (and I have seen even worse) is to use the importdata function. Now importdata will read your whole set of data in as a string
myData = importdata(fullfile(mypathname, fileToBeConverted),',');
What happens now is that you have a big set of data. Now
remain = myData(2:end,:)
will cut out a single header line (change it if you have more). Now it is time to parse the data.
[dateStrin, remain] = strtok(remain,','); %#ok<*STTOK>
time=datenum(dateStrin,'HH:MM:SS.FFF');
What this does is to pull in your datestring with all those ugly colons and format it into a MATLAB date string. You'll need to adjust it for the day on which your events occurred and you may need to fiddle with it a bit more. Now we need to look out for all those other variables. We do that with strtok. The last set of code already removed the data string from remain, so we're ready to move ahead.
[myfirstnumvar, remain] = strtok(remain,',');
firstvar=str2num(char(myfirstnumvar)); %#ok<*ST2NM>
Now we've done the same thing with the first numeric variable. We go ahead like this for each of the columns that you have. while this is a modification from code I have actually written and successfully used, you will proba bly need small modifications to get yours to work The basic concept is to take the whole mess in as a string and then go after each column separately. You will need to remove the comma from one of your string variables. Some might ask why I produced something elaborate using successive string tokens. The basic reason is that the formats of the data changed (even worse than yours), so I need to read in the data as strings and have the flexibility to make my changes to each variable for its format. It's not really that hard to do wthis way. You are beyond using MATLAB's automated data reading to read in your whole file without incident, so this approach takes everything as strings and allows you to do your own parsing differently for each variable.
2 commentaires
Voir également
Catégories
En savoir plus sur Text Files dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!