MATLAB Answers

Code To Extract Data From .txt File to Excel

39 views (last 30 days)
Jack Morgan
Jack Morgan on 16 Mar 2020
Commented: Jack Morgan on 19 Mar 2020
Hi,
I have several text files with two columns and usually 2000+ rows (picture below). I would like to have some code to extract the value from the bottom row of the second column and put it into an Excel file in a certain place. Each of the text files has a different heading/name (same format as picture below though) and I would like to put the value from the text file into Excel under these headings. If necessary, I can create the headings myself and would be happy to just have code which outputs the same heading's value in the same column each time.
Currently I am doing this manually which takes a long time and will need doing lots more in the future.
Does anyone know how to do this and could help me? Happy to answer more questions or give more details.
Many thanks

  4 Comments

Show 1 older comment
Jack Morgan
Jack Morgan on 17 Mar 2020
Update on the code using three different files as an example.
clc
clear
% 1 Moment Z
fid = fopen('b2-moment-z-rfile.out','rt');
linenum1 = textscan(fid, '%f %*f', 'HeaderLines',3);
linenum2 = length(linenum1{1}) + 2
fclose(fid);
fid = fopen('b2-moment-z-rfile.out','rt');
data = textscan(fid, '%*f %f', 'HeaderLines', linenum2);
celldisp(data)
fclose(fid);
% 2 Moment Z
fid = fopen('b1-moment-z-rfile.out','rt');
linenum1 = textscan(fid, '%f %*f', 'HeaderLines',3);
linenum2 = length(linenum1{1}) + 2
fclose(fid);
fid = fopen('b1-moment-z-rfile.out','rt');
data = [data textscan(fid, '%*f %f', 'HeaderLines', linenum2)];
celldisp(data)
fclose(fid);
% 3 Moment Z
fid = fopen('b3-moment-z-rfile.out','rt');
linenum1 = textscan(fid, '%f %*f', 'HeaderLines',3);
linenum2 = length(linenum1{1}) + 2
fclose(fid);
fid = fopen('b3-moment-z-rfile.out','rt');
data = [data textscan(fid, '%*f %f', 'HeaderLines', linenum2)];
celldisp(data)
fclose(fid);
This creates the array of the results below but each number has brackets around it which I have to manually delete and therefore lose anytime I saved using the code.
1×3 cell array
{[1106763]} {[1108402]} {[1039780]}
Does anyone know how to get rid of these brackets? I tried using erasePunctuation and isstrprop but could not get them to work. Can anyone help me?
Guillaume
Guillaume on 17 Mar 2020
Firstly, if you end up copy/pasting the same code and doing slight tweaks to each copy, there's always a better of doing that. Computers are very good at doing repetitive tasks, so why are you doing the repetition yourself.
Now, with regards to your question, can you attach two different example of the files your dealing with. We don't need the full files, the first 10 rows would suffice. Screenshots are not very useful, we can't import them in matlab. Actual files are much better.
Note that for a number of years now, matlab has functions that make importing text files very easy, see readmatrix or readtable. They're usually much easier to use than textscan, they open and close the file for you, usually figure out the header, delimiter and format themselves, and output directly a matrix or table.
"A lot of the files have unique names so I will have to copy the code and change the names for the text file referenced". Again, copy-pasting and modifying code is the wrong approach. Let the computer do the repetition. The basics would be:
  • write a generic function that takes the full filename as an input and does whatever processing that needs to be done.
  • Obtain the list of file some way, possibly just a dir call with the appropriate filter. At worse, input the list of file as a cell array
  • Loop over the list of file, calling the function for each file.
"This creates the array of the results below but each number has brackets [...] Does anyone know how to get rid of these brackets? I tried using erasePunctuation and isstrprop but could not get them to work"
I'm afraid you're lacking some basic understanding of matlab here. I'm not the free course matlab onramp covers cell arrays, but if you haven't done that course, I'd recommend you go through it anyway. What you show is a cell array containing matrices. The brackets are the way matlab shows to you that the cell array contains matrices. The brackets are not part of the data. They're only for display. Trying to use erasePunctuation on something that is not text at all is meaningless.
In any case, if you use readmatrix you'd be dealing with matrices directly. Please attach examples so I can show you better code.
Jack Morgan
Jack Morgan on 17 Mar 2020
Thank you for your reply, it is really helpful. What I have done currently is mostly what I’ve picked up googling things.
I am going to be away from my computer for the next few hours so can’t access MATLAB but I’ve attached three files. These three have very similar names but the others won’t have. All of them will be similar though in that column 1 will be a count from 1 to n and column 2 will be the data. I am only interested in the final entry.
Thank you if you’re able to help at all.

Sign in to comment.

Accepted Answer

Guillaume
Guillaume on 17 Mar 2020
As we don't yet know how to identify the files to be processed, nor how exactly the data is to be exported to excel, the below just focuses on the file import:
filelist = {'b1-moment-z-rfile.txt', 'b2-moment-z-rfile.txt', 'b3-moment-z-rfile.txt'}; %it's not clear yet how this is to be obtained
datatoexport = cell(size(filelist)); %container for data to export as this is also not clear
for filenum = 1:numel(filelist)
wholecontent = readtable(filelist{filenum}); %read whole file, readtable automatically detects formatting and header and uses the header to name the table variables
datatoexport{filenum} = wholecontent(end, :); %keep last row of table (header always come with it)
end
With the above, you will get a warning for each file that matlab modifies the header to make it valid variable names (it removes the ( and " and replaces the - by _). The warning can be turned off with
warning('off', 'MATLAB:table:ModifiedAndSavedVarnames');
or matlab can be told not to do the replacement (thus keeping the (" in the variable names) with the 'PreserveVariableNames', false option of readtable.
Once details of how to identify files and what export is desired, the above can be modified accordingly.

  13 Comments

Jack Morgan
Jack Morgan on 18 Mar 2020
great thank you! New error now:
Error using readmatrix (line 148)
The extension '.out' is not a known text or spreadsheet extension. To specify the file type, use 'FileType' with either 'spreadsheet' or 'text'.
e.g opts = detectImportOptions(name,'FileType','text')
I am not sure where exactly to use FileType though, to be honest most of the code is beyond me now.
Guillaume
Guillaume on 18 Mar 2020
Oh, of course, well just do as it tells you, adds 'FileType', 'text' to the detectImportOptions calls.
[...] well yes, the error message is a bit misleading since there's no detectImportOptions in the code. It's in the readmatrix call that it needs to be added. See edits.
Jack Morgan
Jack Morgan on 19 Mar 2020
This worked fantastic, much better than I expected. Thank you for all your help.

Sign in to comment.

More Answers (2)

Kevin Chng
Kevin Chng on 17 Mar 2020
Accept my answer if it is working for you. Thanks
fid = fopen('12.txt','rt');
indata = textscan(fid, '%f %f', 'HeaderLines',3);
fclose(fid);
%your data is in indata

  2 Comments

Jack Morgan
Jack Morgan on 17 Mar 2020
Hi,
I changed the '12.txt' part to the name of my file and I am getting an error on line 2.
Jack Morgan
Jack Morgan on 17 Mar 2020
So with the following I get the value from the last row (in this case 2500) as indata.
fid = fopen('b2-moment-z-rfile.out','rt');
indata = textscan(fid, '%f %f', 'HeaderLines',2502);
fclose(fid);
%your data is in indata
What I would like is to repeat this for several text files and all the data be in a way I can copy it into Excel in one go.
Also worth noting the number of rows may change so is there a way to account for this?

Sign in to comment.


Image Analyst
Image Analyst on 17 Mar 2020
To process a sequence of files, place your code in the middle of a for loop over files.
For code samples, see the FAQ: FAQ#How_can_I_process_a_sequence_of_files?

  2 Comments

Jack Morgan
Jack Morgan on 17 Mar 2020
A lot of the files have unique names so I will have to copy the code and change the names for the text file referenced. The issue is the output - as it is in the latest code i posted indata gets overwritten each time. Something like an array where each new result is added to a new column would be good.
Walter Roberson
Walter Roberson on 17 Mar 2020
Is there any pattern that can be used to distinguish the files you want to process from the other ones you do not want to process in the same directory? For example do you want to process all of the *moment-z-rfile.txt files? If so then you can use
dinfo = dir('*moment-z-rfile.txt');
filenames = {dinfo.name};
numfiles = length(filenames);
for K = 1 : numfiles
thisfile = filenames{K};
[folder, basename, ext] = fileparts(thisfile);
outfile = fileparts(folder, [basename '.out']);
....
end

Sign in to comment.

Sign in to answer this question.


Translated by