Importing specific rows of Data from Text file

pramit Sood on 16 Dec 2014
Commented: pramit Sood on 10 Jan 2015
Hi,
I have some sensor data in a very large text (.dat) file. Some of the relevant data from this file needs to be analyzed and plotted with MATLAB.
An example of the data:
C 0 0.001 -0.02 24.09 4.64 -100.00 -100.00
C 0 1.005 0.29 24.09 4.43 -100.00 -100.00
C 0 2.009 -0.34 24.09 8.26 -100.00 -100.00
C 0 3.014 -0.18 24.06 6.06 -100.00 -100.00
C 0 4.018 0.07 24.06 5.61 -100.00 -100.00
C 0 5.022 0.02 24.09 4.92 -100.00 -100.00
C 0 6.026 0.34 24.12 4.28 -100.00 -100.00
C 0 7.030 -0.46 24.09 8.37 -100.00 -100.00
C 0 8.034 -0.23 24.09 7.50 -100.00 -100.00
R 0 60.275 -0.157674 -0.006891 0.000000 ......
All I want to import into MATLAB and analyze are the rows that start with the letter 'R', which stands for Result. The 'Result' rows follow a pattern in this big text file: an 'R' row occurs every 160 rows.
How can I import only these 'Result' rows into MATLAB, either interactively or programmatically? I would deeply appreciate a detailed answer, as I am at an intermediate level of MATLAB programming.
Thank you so much in advance! Pramit
  1 comment
per isakson on 7 Jan 2015
"very large text (.dat) file": how large is that? The size is important. Does the entire file fit in memory? The total time needed to read the file might be a problem.


Accepted Answer

per isakson on 7 Jan 2015
Edited: per isakson on 8 Jan 2015
If the entire file fits in memory, try this code:
>> num = cssm()
num =
0 60.2750 -0.1577 -0.0069 0
0 60.2750 -0.1577 -0.0069 0
0 60.2750 -0.1577 -0.0069 0
0 60.2750 -0.1577 -0.0069 0
where
function out = cssm()
% read the entire file to one cell array with one row per cell
fid = fopen( 'cssm.txt', 'r' );
cac = textscan( fid, '%s', 'Delimiter', '\n' );
[~] = fclose( fid );
% find rows which begin with 'R'.
isR = cellfun( @(str) strncmp(strtrim(str),'R',1), cac{:} );
% extract the rows beginning with 'R'
rlt = cac{:}(isR);
% join all rows with results to one long string separated by '\n'
one_str = strjoin( rlt, '\n' );
% parse the string.
result = textscan( one_str, '%c%f%f%f%f%f', 'CollectOutput',true );
% make sure that only results are included in the output
assert( strcmp( unique(result{1}), 'R' ) ...
, 'Non-result rows included in result' )
out = result{2};
end
and where cssm.txt contains
C 0 6.026 0.34 24.12 4.28 -100.00 -100.00
C 0 7.030 -0.46 24.09 8.37 -100.00 -100.00
C 0 8.034 -0.23 24.09 7.50 -100.00 -100.00
R 0 60.275 -0.157674 -0.006891 0.000000
C 0 6.026 0.34 24.12 4.28 -100.00 -100.00
C 0 7.030 -0.46 24.09 8.37 -100.00 -100.00
C 0 8.034 -0.23 24.09 7.50 -100.00 -100.00
R 0 60.275 -0.157674 -0.006891 0.000000
C 0 6.026 0.34 24.12 4.28 -100.00 -100.00
C 0 7.030 -0.46 24.09 8.37 -100.00 -100.00
C 0 8.034 -0.23 24.09 7.50 -100.00 -100.00
R 0 60.275 -0.157674 -0.006891 0.000000
C 0 6.026 0.34 24.12 4.28 -100.00 -100.00
C 0 7.030 -0.46 24.09 8.37 -100.00 -100.00
C 0 8.034 -0.23 24.09 7.50 -100.00 -100.00
R 0 60.275 -0.157674 -0.006891 0.000000
... and an alternative, which is an order of magnitude faster
function out = faster()
% read the entire file to one string
str = fileread( 'cssm.txt' );
% find start and end indices of all the "rows" beginning with 'R'
% ([^\n\r] excludes only line terminators; parentheses and '|' are literals inside a character class)
xpr = '(?<=\s)R[^\n\r]+[\n\r]{1,2}';
[ix1,ix2] = regexp( str, xpr, 'start', 'end' );
% extract the "rows" beginning with 'R'
isi = false(1,length(str));
for ii = 1:length(ix1)
isi(ix1(ii):ix2(ii))=true;
end
one_str = str(isi);
% parse the string.
result = textscan( one_str, '%c%f%f%f%f%f', 'CollectOutput',true );
% make sure that only results are included in the output
assert( strcmp( unique(result{1}), 'R' ) ...
, 'Non-result rows included in result' )
out = result{2};
end
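Both functions above hold the whole file in memory. If the file is too large for that, one option is to read it one line at a time and keep only the 'R' rows. The following is a minimal sketch, not a tuned implementation: it first writes a small temporary sample file (hypothetical data copied from the rows quoted in this thread) so that it can be run as-is, and it assumes every 'R' row has the same number of numeric fields.

```matlab
% Write a small sample file so the sketch can be run as-is.
% (Hypothetical stand-in for the real .dat file.)
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'C 0 7.030 -0.46 24.09 8.37 -100.00 -100.00\n');
fprintf(fid, 'R 0 60.275 -0.157674 -0.006891 0.000000\n');
fprintf(fid, 'C 0 8.034 -0.23 24.09 7.50 -100.00 -100.00\n');
fprintf(fid, 'R 0 60.275 -0.157674 -0.006891 0.000000\n');
fclose(fid);

% Read line by line; only one line is held in memory at a time.
fid = fopen(filename, 'r');
rows = {};                          % one numeric row vector per 'R' line
tline = fgetl(fid);
while ischar(tline)                 % fgetl returns -1 at end of file
    tline = strtrim(tline);
    if strncmp(tline, 'R', 1)
        % parse the numbers that follow the leading 'R'
        rows{end+1} = sscanf(tline(2:end), '%f').'; %#ok<AGROW>
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(filename);

% Stack the rows; this assumes all 'R' rows have the same number of fields.
out = vertcat(rows{:});
```

Reading line by line is usually slower than the two functions above, so it is mainly worth considering when memory is the limiting factor.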
  1 comment
pramit Sood on 10 Jan 2015
Thank you so much!!
Regards Pramit


More Answers (3)

Shoaibur Rahman on 18 Dec 2014
I think the following code will serve your purpose. I assume that the text file is named textFile.txt and is saved in your working directory; otherwise, add the file path.
A few notes on the code (if you have questions, please feel free to contact me):
  • cellData is your text data as a cell array.
  • The first for loop finds the first 'R' row of your data.
  • The second set of for loops generates a matrix ResultData that contains all your result data, so you can use that matrix for further analysis. Each row of ResultData corresponds to one result row in the original text file, without the leading R.
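filename = 'textFile.txt';   % add the full path if the file is not in the working directory
delimiter = ' ';
formatSpec = '%s%f%f%f%f%f%f%f%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'MultipleDelimsAsOne', false, 'ReturnOnError', false);
fclose(fileID);
DataIndex = 2:8;
% convert the numeric columns to cells so that all columns can be concatenated
dataArray(DataIndex) = cellfun(@(x) num2cell(x), dataArray(DataIndex), 'UniformOutput', false);
cellData = [dataArray{1:end-1}];
% find the first row that starts with 'R'
for k = 1:size(cellData,1)
    if strcmp(cellData{k,1}, 'R')
        RstartRow = k;
        break
    end
end
% the 'R' rows repeat every 160 rows
R_rows = RstartRow:160:size(cellData,1);
ResultData = zeros(length(R_rows), size(cellData,2)-1);
for k = 1:length(R_rows)
    for kk = 2:size(cellData,2)
        % index with R_rows(k), not k, so that the 'R' rows are copied
        ResultData(k,kk-1) = cellData{R_rows(k),kk};
    end
end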
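If the 160-row spacing cannot be relied on, the 'R' rows can also be selected by comparing the first column directly. This is a minimal sketch using a small stand-in for cellData with hypothetical values; with the real data, cellData would come from the textscan call above.

```matlab
% Small stand-in for cellData (hypothetical values):
% the row letter in column 1, the numbers in the columns after it.
cellData = {'C', 0, 7.030; ...
            'R', 0, 60.275; ...
            'C', 0, 8.034; ...
            'R', 0, 60.275};

% Logical index of the rows whose first column is exactly 'R'.
isR = strcmp(cellData(:,1), 'R');

% Numeric matrix with one row per 'R' row, without the letter column.
ResultData = cell2mat(cellData(isR, 2:end));
```

This replaces both for loops and does not depend on the position of the first 'R' row.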
  3 comments
Shoaibur Rahman on 28 Dec 2014
Hi,
Thank you. Let's go through this together, starting with a simple example:
out = cellfun(@mean, {1:10,1:5})
This computes the mean of the two vectors 1:10 and 1:5. Each output is a scalar of the same type, so 'UniformOutput' can stay true, which is the default.
Now, dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'MultipleDelimsAsOne', false, 'ReturnOnError', false); returns a dataArray whose elements differ in size and type (both cell and double).
To convert all of them into cells, we use a function handle defined by @(x) num2cell(x), where x is a dummy variable that receives each element of dataArray(DataIndex).
We also want to do this only with the data we are interested in, which is selected by dataArray(DataIndex).
Finally, because each output is nonscalar and may be of a different size, we set 'UniformOutput' to false.
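The difference between the two settings can be tried out on a small example. This is a sketch with made-up column data standing in for dataArray(DataIndex):

```matlab
% Two numeric columns of different lengths (hypothetical data).
cols = {[1; 2; 3], [4; 5]};

% Default 'UniformOutput' = true: the function must return one scalar
% per input cell, and the results are collected in a numeric array.
m = cellfun(@mean, cols);

% num2cell returns a cell array for each input, so the results must be
% collected in a cell array by setting 'UniformOutput' to false.
c = cellfun(@(x) num2cell(x), cols, 'UniformOutput', false);
```

Here m is a 1-by-2 double, while c is a 1-by-2 cell whose elements are themselves cell arrays (3-by-1 and 2-by-1).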
pramit Sood on 7 Jan 2015
Hey! Thanks for your great explanation! :)
I really need to discuss my final code (which I created using your code as a reference), as it is not giving me proper results.
I am attaching the code as a text file. Could you please help me understand where the problem lies? I have gone through it five times and I can't find it. :(
It would be great if you could help.
Regards Pramit



Sean de Wolski on 16 Dec 2014
You'll have to use textscan which provides an option for skipping rows. If you provide a small file (1000 rows or so) we can probably help out more.

Sudharsana Iyengar on 18 Dec 2014
Edited: Sudharsana Iyengar on 18 Dec 2014
I don't know if this will help. When I looked at your sensor data, the first column consisted of strings while the remaining columns consisted of numbers. You can do one of the following.
1) Open your data file in Excel, sort it in ascending or descending order, and pick out the values manually.
2) Replace the string values with their ASCII codes. Your data contained C and M. The ASCII value for C is 01000011, for M it is 01001101, and for R it is 01010010. After making this transformation, you can import your data into MATLAB.
The data will be stored as a matrix, with the first column holding the ASCII values and the remaining 7 holding numbers. You can then use the following code.
Your data will be stored as untitled.
j=1; k=1; l=1;
for i = 1:size(untitled,1)
    if untitled(i,1) == 01000011      % 'C' rows
        B(j,:) = untitled(i,2:8); j = j+1;   % store the remaining 7 columns in a separate variable
    end
    if untitled(i,1) == 01001101      % 'M' rows
        C(k,:) = untitled(i,2:8); k = k+1;
    end
    if untitled(i,1) == 01010010      % 'R' rows
        D(l,:) = untitled(i,2:8); l = l+1;
    end
end
This will create three variables B, C and D holding the C, M and R rows separately. Let me know if this was helpful.
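One detail to keep in mind with the ASCII approach: a number such as 01010010 written in MATLAB code is read as a decimal literal, not as a binary one. The comparisons above therefore only match if the first column of the file was replaced by exactly these digit strings before the import. A short sketch of the difference, with variable names chosen here for illustration:

```matlab
codeR  = double('R');        % ASCII code of 'R' as a decimal number
bitsR  = dec2bin(codeR, 8);  % the same code as an 8-character binary string
x      = 01010010;           % numeric literal: leading zero ignored, read as decimal
isSame = (x == codeR);       % compares the decimal literal with the ASCII code
```

Here codeR is 82 and bitsR is '01010010', while x is the decimal number 1010010, so isSame is false.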
