An efficient way (in terms of speed and consistency) to parse a big text file with textscan

I have a text file consisting of approximately 500,000 lines. The header and part of the data are shown below.
#dP2021 4 3 0 0 0.00000000 288 u+U IGb14 FIT GFZ
## 2151 518400.00000000 300.00000000 59307 0.0000000000000
++ 10 10 10 6 10 6 10 8 8 8 8 8 8 8 8 8 8
++ 8 8 8 8 8 8 8 8 8 8 8 8 8 6 6 8 10
%c M cc GPS ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc
%i 0 0 0 0 0 0 0 0 0
%i 0 0 0 0 0 0 0 0 0
/* GeoForschungsZentrum Potsdam
* 2021 4 3 0 0 0.00000000
PC01 -34381.586112 24435.438444 69.245923 -596.854622
PE02 4493.250988 41924.015694 -226.819605 790.650809
PG03 -14754.803607 39520.337126 -938.295010 -436.165931
PG04 -39584.473454 14533.059977 -388.137635 370.305833
* 2021 4 3 0 5 0.00000000
PC01 -34381.437242 24436.228124 74.813357 -596.843988
PE02 4493.541869 41922.959643 -254.934261 790.641523
PG03 -14753.360421 39519.882073 -951.586932 -436.156224
PG04 -39584.568840 14533.349312 -380.297839 370.469467
I need to count, separately, all the PC[0-9][0-9], PE[0-9][0-9], and PG[0-9][0-9] strings in the first column of the data section, which follows the header section and the date lines. What is an efficient way to do this using textscan?
5 Comments
sermet OGUTCU on 15 May 2021
Dear @Jan, the output format is not important, but it will be created as a string array such as:
output=["PC" "120";"PG" "200";"PE" "110"]
Sulaymon Eshkabilov on 15 May 2021
You can try fscanf(), which works similarly to textscan() and uses largely the same format specifiers.


Accepted Answer

Jan on 15 May 2021
Edited: Jan on 15 May 2021
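Str = fileread(FileName);         % read the whole file into one char vector
C = strsplit(Str, '\n');          % split into a cell array of lines
nPC = sum(strncmp(C, 'PC', 2));   % count lines starting with 'PC'
nPG = sum(strncmp(C, 'PG', 2));   % count lines starting with 'PG'
nPE = sum(strncmp(C, 'PE', 2));   % count lines starting with 'PE'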
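The comments ask for the result as a string array, and the question specifies a prefix followed by two digits. As a minimal sketch (not part of the original answer), the counts could be collected with a regular expression and assembled into that layout. The sample cell array `C` below stands in for the output of `strsplit`, and the names `prefixes`, `counts`, and `output` are chosen here for illustration only.

```matlab
% Sample lines standing in for C = strsplit(fileread(FileName), '\n')
C = {'* 2021 4 3 0 0 0.00000000', ...
     'PC01 -34381.586112 24435.438444 69.245923 -596.854622', ...
     'PE02 4493.250988 41924.015694 -226.819605 790.650809', ...
     'PG03 -14754.803607 39520.337126 -938.295010 -436.165931', ...
     'PG04 -39584.473454 14533.059977 -388.137635 370.305833'};

prefixes = ["PC" "PG" "PE"];            % order follows the requested output
counts = zeros(numel(prefixes), 1);
for k = 1:numel(prefixes)
    % Line must start with the prefix followed by two digits, e.g. PC01
    pattern = char("^" + prefixes(k) + "\d{2}");
    hits = regexp(C, pattern, 'once');  % one cell per line, empty if no match
    counts(k) = sum(~cellfun(@isempty, hits));
end

% Two-column string array: prefix in column 1, count in column 2
output = [prefixes(:), string(counts)];
```

The regular expression is stricter than `strncmp(C, 'PC', 2)`, which would also count any other line that happens to begin with the same two letters. It is likely somewhat slower, so `strncmp` remains a reasonable choice when the file is known to be clean.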
If the file does not fit into your RAM:
fid = fopen(FileName, 'r');
nPC = 0;
nPG = 0;
nPE = 0;
while ~feof(fid)
    s = fgets(fid);  % read one line at a time
    if strncmp(s, 'PC', 2)
        nPC = nPC + 1;
    elseif strncmp(s, 'PG', 2)
        nPG = nPG + 1;
    elseif strncmp(s, 'PE', 2)
        nPE = nPE + 1;
    end
end
fclose(fid);
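Since the question asks specifically about textscan, here is a minimal sketch of the same count done with it, reading each line as one string and then comparing prefixes. It is an illustration rather than a benchmarked recommendation: it writes a small temporary sample file so that it runs on its own, and the variable names `L` and `txt` are chosen here. Whether it is faster than `fileread` plus `strsplit` would need to be measured on the real file.

```matlab
% Write a small sample file so the sketch is self-contained
FileName = [tempname '.txt'];
fid = fopen(FileName, 'w');
fprintf(fid, '%s\n', ...
    '#dP2021 4 3 0 0 0.00000000 288 u+U IGb14 FIT GFZ', ...
    '* 2021 4 3 0 0 0.00000000', ...
    'PC01 -34381.586112 24435.438444 69.245923 -596.854622', ...
    'PE02 4493.250988 41924.015694 -226.819605 790.650809', ...
    'PG03 -14754.803607 39520.337126 -938.295010 -436.165931', ...
    'PG04 -39584.473454 14533.059977 -388.137635 370.305833');
fclose(fid);

% Read every line as a single string: newline is the only delimiter and
% no character is treated as whitespace, so spaces inside a line survive
fid = fopen(FileName, 'r');
L = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
txt = L{1};                          % cell array with one entry per line

% Header and date lines start with other characters and are not counted
nPC = sum(strncmp(txt, 'PC', 2));
nPG = sum(strncmp(txt, 'PG', 2));
nPE = sum(strncmp(txt, 'PE', 2));

delete(FileName);                    % remove the temporary sample file
```

Because only the first two characters of each line matter, no header lines need to be skipped explicitly, which keeps the approach independent of the header length.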
