Hello,
I have a CSV file that looks like this:
* text here
* more text...
1,20,3,4
2,30,4,5
* text again
3,4,6,7
*text
And so it goes on.
How do I read the CSV file and get only the numeric data? Every line that starts with a "*" followed by text should be discarded.
Thank you.

Accepted Answer

dpb
dpb on 12 Feb 2015

0 votes

doc textscan % NB: optional 'commentstyle' parameter

8 Comments

Daniel
Daniel on 12 Feb 2015
Edited: dpb on 12 Feb 2015
Okay. I created TestFile.csv with the data and text as in my question.
Now my code is:
fileID=fopen('TestFile.csv')
N=4
cdata=textscan(fileID,'%f %f %f %f', ...
N,'CollectOutput',1,'CommentStyle','*')
I get:
cdata =
[1x4 double]
I can't figure out how to get the data from each column in "cdata".
Thank you.
dpb
dpb on 12 Feb 2015
For these cases where there's no need for a cell array at all I wrap textscan in cell2mat as--
cdata=cell2mat(textscan(fileID,'%f %f %f %f', ...
N,'CollectOutput',1,'CommentStyle','*'));
In general you dereference a cell array with the "curlies" as
cdata{:}
for the full array or "nested indexing" of
cdata{1}(r,c)
for a given array element.
See the doc on cell arrays for the fuller details.
But the short story here is that there's no need for the cell array, and it's unfortunate there's not a way to tell textscan to forgo the needless creation of one when it isn't needed.
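To make the curly-brace versus parenthesis distinction concrete, here is a minimal sketch. It assumes textscan is given the text directly as a character vector rather than a file ID, and the variable names (txt, C, M, col2, x, P) are only illustrative. It also passes 'Delimiter', ',' since the sample data is comma-separated.

```matlab
% Sample text standing in for the file contents (two data rows only).
txt = sprintf('1,20,3,4\n2,30,4,5\n');

% With 'CollectOutput',1 the four %f columns are merged into one matrix,
% so C is a 1x1 cell holding a 2x4 double.
C = textscan(txt, '%f %f %f %f', 'Delimiter', ',', 'CollectOutput', 1);

M    = C{1};        % curly braces return the contents: the 2x4 matrix
col2 = C{1}(:, 2);  % nested indexing: second column of that matrix
x    = C{1}(2, 3);  % single element at row 2, column 3

P = C(1);           % parentheses return a 1x1 cell, not the matrix
```

Wrapping the call in cell2mat, as above in the thread, gives the same matrix as C{1} in this single-cell case.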
Daniel
Daniel on 17 Feb 2015
Edited: Daniel on 17 Feb 2015
Thank you! My cdata looks like below when I use cell2mat:
cdata =
1 NaN NaN NaN
"1" is from row 1, column 1 of my TestFile.csv. I thought it could be a bad CSV file, but I tried opening other files too and got the same result.
Am I using the wrong formatSpec?
dpb
dpb on 17 Feb 2015
Dunno...you don't show what you did in context...w/ the sample file copied into a text file here the example worked fine. NaN indicates a conversion of something not recognizable as a number so perhaps there's an embedded hidden character in the file or somesuch???
Daniel
Daniel on 18 Feb 2015
Okay. There should not be any hidden characters in the file. That is confirmed.
This is my script:
---
fileID=fopen('TestFile.csv')
N=4
cdata=cell2mat(textscan(fileID,'%f %f %f %f',N,'CollectOutput',1,'CommentStyle','*'))
---
And this is the result from Matlab:
---
fileID = 8
N = 4
cdata = 1 NaN NaN NaN
---
And you have the exact same thing and it works for you? That is strange.
Thanks anyway!
dpb
dpb on 18 Feb 2015
Edited: dpb on 18 Feb 2015
Ayup...
>> type test.csv
* text here
* more text...
1,20,3,4
2,30,4,5
* text again
3,4,6,7
*text
>> fid=fopen('test.csv');
>> cell2mat(textscan(fid,repmat('%f',1,4),'delimiter',',', ...
'commentstyle','*', ...
'collectoutput',1))
ans =
1 20 3 4
2 30 4 5
3 4 6 7
>>
ADDENDUM
Oh, I see it isn't the exact same thing; you don't need/want the repeat count specifier. That tells it to apply the format string N times, but your file isn't consistent so it breaks when it finds a non-numeric form. It would possibly work that way if 'commentstyle' were to force the whole file to be processed, the comment lines removed, then that file processed, but textscan works sequentially, not globally, simply skipping a line beginning with the comment character when it finds one and trying to convert the next line.
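For anyone wanting to reproduce the working call end to end, here is a self-contained sketch. It writes the sample data from the question to a temporary file (the tempname-based file name is an assumption for illustration only) and then reads it back. Note that this call differs from the earlier script in two respects: it passes 'Delimiter', ',' and it omits the repeat count N.

```matlab
% Write the sample file from the question to a temporary location.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '* text here\n* more text...\n1,20,3,4\n2,30,4,5\n');
fprintf(fid, '* text again\n3,4,6,7\n*text\n');
fclose(fid);

% Read only the numeric rows: lines starting with '*' are treated as
% comments, fields are split on commas, and no repeat count is given,
% so textscan keeps applying the format until the end of the file.
fid = fopen(fname, 'r');
data = cell2mat(textscan(fid, '%f %f %f %f', ...
                         'Delimiter', ',', ...
                         'CommentStyle', '*', ...
                         'CollectOutput', 1));
fclose(fid);
delete(fname);   % clean up the temporary file

disp(data)
```

The result should be the same 3-by-4 matrix shown in the session above.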
Daniel
Daniel on 20 Feb 2015
Edited: Daniel on 20 Feb 2015
Thank you for your help! It works fine now. So if I had five columns instead of four, I would write "1,5" in the repmat call. Now I get how it works.
dpb
dpb on 20 Feb 2015
Ayup; it's the silly way C implemented its format strings, ignoring the long-existing pattern used in Fortran wherein there can be a repeat specifier. Just to show they were smarter, the implementers reversed the order of the width field and the conversion type so there's no way to now write a repeat count unambiguously. In Fortran FORMAT it would be 4F8.0; in Matlab, which uses C i/o libraries, one has to use repmat to double up or write them all explicitly. On the newsgroup am working with a guy at this instant with a 159-column file...writing %f 159 separate times is rather painful, as his initial plea noted, until one either has the "a-ha!" moment one's self or somebody shows you the trick (S Lord pointed it out to me years ago; I had never thought of repmat for strings for the purpose despite complaining for years). At one time I wrote a mex file that accepted Fortran FORMAT strings and used the Fortran i/o and passed the values back. Unfortunately I lost the source in the retirement move and haven't had the gumption to re-invent it since.
OK, enough geezer stories/griping... :)
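The repmat trick for an arbitrary column count can be sketched as follows. The names nCols, fmt, and txt are illustrative, and the five-column sample text is made up for the example rather than taken from the thread.

```matlab
% Build a format string with one %f per column instead of typing each one.
nCols = 5;
fmt = repmat('%f', 1, nCols);   % gives '%f%f%f%f%f'

% Hypothetical five-column data, passed to textscan as a character vector.
txt = sprintf('1,2,3,4,5\n6,7,8,9,10\n');
data = cell2mat(textscan(txt, fmt, 'Delimiter', ',', 'CollectOutput', 1));

disp(fmt)
disp(data)
```

Changing nCols to 159 would produce the format string for the 159-column file mentioned above without writing %f out by hand.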


Asked: 12 Feb 2015

Commented: dpb on 20 Feb 2015
