reading mixed format csv data with empty value '-'

1 vue (au cours des 30 derniers jours)
Jiali
Jiali le 18 Déc 2014
Modifié(e) : per isakson le 19 Déc 2014
The data is mixed formatted. I just list the float number parts to show you my problem.
1 1 100.3 -45000 -0.23 0.2555 600000
2 1 100.3 -45000 -0.23 0.2555 800000
3 1 100.3 -45000 -0.23 0.2555 800000
4 1 - - - - -
5 1 - - - - -
I can not delete the empty lines since I want to know their location. But when I use
textscan (fid,'%f %f %f %f %f %f %f),
I have trouble with class of every column. And If I use
'TreatAsEmpty', '-'
inside textscan, all the negative value will be read as wrong. Does anyone have any suggestions?

Réponses (1)

per isakson
per isakson le 19 Déc 2014
Modifié(e) : per isakson le 19 Déc 2014
If the file together with the parsed result fits in memory try
>> out = cssm('cssm.txt')
out =
1.0e+05 *
0.0000 0.0000 0.0010 -0.4500 -0.0000 0.0000 6.0000
0.0000 0.0000 0.0010 -0.4500 -0.0000 0.0000 8.0000
0.0000 0.0000 0.0010 -0.4500 -0.0000 0.0000 8.0000
0.0000 0.0000 NaN NaN NaN NaN NaN
0.0001 0.0000 NaN NaN NaN NaN NaN
>>
where
function out = cssm( filespec )
str = fileread( filespec );
str = regexprep( str, '(?<=\s)\-(?=\s)', 'nan' );
out = cell2mat(textscan(str,'%f%f%f%f%f%f%f','CollectOutput',true ));
end
and where cssm.txt contains
1 1 100.3 -45000 -0.23 0.2555 600000
2 1 100.3 -45000 -0.23 0.2555 800000
3 1 100.3 -45000 -0.23 0.2555 800000
4 1 - - - - -
5 1 - - - - -
&nbsp
&nbsp
cssm.m above fails if the last character of the file is -. To fix that replace
str = regexprep( str, '(?<=\s)\-(?=\s)', 'nan' );
by
str = regexprep( str, '(?<=\s)\-(?=\s|$)', 'nan' );
to match the character, -, when it is the last character of the string.

Catégories

En savoir plus sur Large Files and Big Data dans Help Center et File Exchange

Produits

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by