Removing line numbers when file data is saved in a cell array

I previously asked a question here on MathWorks about how to remove the line numbers from a file. In that case, I was creating smaller files from a larger one and removing the line number from each row. My question today is how to adapt that code when I am creating a cell array instead of new text files. In the code I used previously, the removal of the line numbers is built into an fprintf statement, and I now wish to apply the same approach to a cell array. The code I had been using was:
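str = fgetl(fid); % read one line of the file, without its line break
[row,~,~,idx] = sscanf(str,'%f'); % read the leading line number; idx is the index of the next character (the ';')
fprintf(f2d,'%s\n',str(idx+1:end)); % write the rest of the line, i.e. ignore the line number of each row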
Obviously this code was part of a larger program. My problem is fitting the '%s\n',str(idx+1:end) part into the code that generates the cell array. Is it possible to keep using textscan? fgetl and sscanf already read the data, which makes textscan superfluous. Since there is no longer a command that creates text files, I am unsure where '%s\n',str(idx+1:end) can go. textscan seems to be a much faster way of reading the data than fgetl, as it does not need to go through the data line by line. Is there a way to combine the line-number removal with textscan, or must I develop a new way for the code to remove the line numbers?
filePattern = fullfile(myFolder, '*.asc'); % Call all files with '.asc' from the chosen folder
Files = dir(filePattern); % list folder contents
finishCell = cell(numel(Files), 1); % one cell per file
for K = 1 : numel(Files) % for all files in the folder
baseFileName = Files(K).name;
FileName = fullfile(myFolder, baseFileName);
fid = fopen(FileName); % open the file from chosen folder
str = fgetl(fid); % read the first line only, without its line break
[row,~,~,idx] = sscanf(str,'%f'); % meant to ignore the line number, but only the first line is parsed and the result is unused
Cell = textscan( fid, '%f', 'delimiter', ';'); % scan the remaining lines (line numbers included)
fclose(fid); % close file from chosen folder
data = cell2mat(Cell); % convert the cell data to matrix
N = 1024; % Number of numbers per row
Finish0 = reshape(data, N, [])'; % reshape the data into the correct format
finishCell{K} = Finish0;
end
Essentially, is it possible to remove line numbers from files and then save them in a cell array without making new text files?
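A sketch of one way to do this while keeping textscan, assuming every row has the layout lineNumber;v1;...;vN: the `%*f` specifier makes textscan read a field and discard it, so a format of `%*f` followed by one `%f` per value drops the line number during parsing, and no intermediate text file is needed. To stay self-contained, the sketch writes two small hypothetical files to a temporary folder; with the real data, `myFolder` would come from uigetdir and `nVals` would be 1024. Plain `sort` gives character order, not numeric order, for names such as test10/test2.

```matlab
% Hypothetical test folder with two small files (line number first, then nVals values)
nVals = 3;                                  % assumption: 1024 in the real data
myFolder = tempname;
mkdir(myFolder);
fid = fopen(fullfile(myFolder, 'test1.asc'), 'wt');
fprintf(fid, '1;1;2;3\n2;4;5;6\n');
fclose(fid);
fid = fopen(fullfile(myFolder, 'test2.asc'), 'wt');
fprintf(fid, '1;7;8;9\n');
fclose(fid);

S = dir(fullfile(myFolder, '*.asc'));
names = sort({S.name});                     % character order, not numeric order
fmt = ['%*f' repmat('%f', 1, nVals)];       % '%*f' skips the line number, each '%f' keeps a value
finishCell = cell(numel(names), 1);
for k = 1:numel(names)
    F = fullfile(myFolder, names{k});       % pass the folder as well as the file name
    [fid, msg] = fopen(F, 'rt');
    assert(fid >= 3, msg);                  % stop with fopen's own message if opening failed
    C = textscan(fid, fmt, 'Delimiter', ';');
    fclose(fid);
    finishCell{k} = [C{:}];                 % rows-by-nVals numeric matrix, line numbers removed
end

% Remove the temporary demo files
delete(fullfile(myFolder, 'test1.asc'));
delete(fullfile(myFolder, 'test2.asc'));
rmdir(myFolder);
```

Each cell of finishCell then holds one numeric matrix with the line-number column already removed, so no reshape step is needed afterwards.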

4 Comments

Stephen23 on 12 Jun 2017
Edited: Stephen23 on 12 Jun 2017
"Essentially, is it possible to remove line numbers from files and then save them in a cell array without making new text files?"
Possibly. But before you get everyone excited about how trivial this is, please add links to your earlier questions on this topic, and state the size of the files.
https://uk.mathworks.com/matlabcentral/answers/340012-removing-the-line-number-from-a-numerical-matrix
Previous question on this subject. The files are each about 7 to 10 MB. Originally I had one huge file that I needed to split into smaller files; this is where I used the line-number removal before. Now I am working with smaller original source files: rather than one huge file containing hundreds of tests, each file is an individual test. This means there is still one number too many on each row. You previously helped me remove the line number when creating new files; now the data needs to be saved straight into a cell array, and this is where my difficulty with adapting the old code arose.
Stephen23 on 12 Jun 2017
Edited: Stephen23 on 12 Jun 2017
@Aaron Smith: I think you should approach this as a new task, rather than trying to adapt the code that I gave you earlier. Your original file was, if my memory serves me correctly, a text file of about 200 megabytes, with 1025*1024 values in each row. Processing all 200 MB by importing it as numeric data proved to be troublesome, which is why I showed you how to split it into smaller files by reading and writing each line as text: not particularly fast, but it avoided all of the "out of memory" errors.
Now you have smaller files that could conceivably be handled by some numeric importation operator, and depending on what your goals are this might be a better solution for the smaller files.
Summary: do not adapt the old code (it had a very different purpose). Trying the standard MATLAB numeric importation functions would be worthwhile. Or using tall arrays.
Could you please:
  1. upload two cut-down (not full size) files in a new comment.
  2. describe how you want the data to be once it is in MATLAB memory: do you want it as numeric data, or kept as string data?
I have attached two cut-down files, basically a few rows of data from each of two files. I need the data to be numeric, saved as matrices inside a cell array, so that I do not need to create new text files. This will help save memory and avoid cluttering my desktop with multiple similar versions of the same data. The code I added in the original question is doing what I need it to do; I just need there to be one less value on each row.


Answers (1)

Stephen23 on 13 Jun 2017
Edited: Stephen23 on 26 Apr 2021
This simple code works on the cut-down files; you can try it on the full-size files and see what happens. I used dlmread to read the entire numeric array (this is fast and efficient), and then simply ignored the first column (easily done using indexing):
P = 'absolute/relative path to where the files are saved';
S = dir(fullfile(P,'cut down *.txt'));
S = natsortfiles(S); % optional, see below.
for k = 1:numel(S)
M = dlmread(fullfile(P,S(k).name),';');
S(k).data = M(:,2:end);
end
Giving:
size(S)
ans = 1×2
2 1
size(S(1).data)
ans = 1×2
8 1025
size(S(2).data)
ans = 1×2
15 1025
S(1).data
ans = 8×1025
765.44 0 0 1148.2 765.44 382.72 382.72 1148.2 382.72 1148.2 1913.6 382.72 765.44 382.72 382.72 1148.2 382.72 0 0 382.72 1148.2 0 765.44 382.72 765.44 382.72 382.72 765.44 382.72 382.72 0 382.72 382.72 3827.2 0 382.72 382.72 765.44 382.72 765.44 765.44 765.44 382.72 0 765.44 765.44 382.72 765.44 382.72 1148.2 382.72 765.44 765.44 382.72 1148.2 382.72 0 1530.9 382.72 382.72 382.72 0 382.72 382.72 765.44 382.72 382.72 0 765.44 382.72 382.72 382.72 0 382.72 765.44 382.72 0 382.72 382.72 382.72 1148.2 0 0 765.44 765.44 1148.2 765.44 1148.2 382.72 382.72 0 0 0 382.72 382.72 0 382.72 0 382.72 382.72 765.44 382.72 765.44 382.72 382.72 382.72 382.72 382.72 0 22038 765.44 765.44 382.72 382.72 382.72 765.44 382.72 382.72 0 765.44 765.44 0 0 382.72 765.44 0 1148.2 382.72 382.72 382.72 382.72 0 382.72 0 1148.2 0 1148.2 382.72 382.72 765.44 382.72 382.72 382.72 0 765.44 1530.9 382.72 765.44 0 382.72 382.72 382.72 765.44 765.44 382.72 382.72 382.72 0 1148.2 765.44 382.72 382.72 765.44 0 382.72 382.72 765.44 382.72 765.44 765.44 382.72 382.72 0 765.44 382.72 765.44 765.44 382.72 765.44 765.44 0 1530.9 0 382.72 765.44 765.44 382.72 382.72 382.72 382.72 382.72 382.72 0 382.72 765.44 0 0 765.44 382.72 765.44 765.44 382.72 382.72 765.44 382.72 382.72 382.72 765.44 382.72 0 0 0 382.72 382.72 0 382.72 382.72 382.72 765.44 382.72 382.72 765.44 382.72 382.72 1148.2 382.72 765.44 765.44 765.44 765.44 765.44 382.72 1148.2 765.44 382.72 765.44 1148.2 382.72 382.72 765.44
Note that I used my FEX submission natsortfiles to sort the filenames into numeric order. You can download natsortfiles here:
The files I used for testing are attached here:

20 Comments

Thanks Stephen. This is what my code looks like now after integrating yours:
myFolder = uigetdir('C:\Users\c13459232\Documents\MATLAB'); % Generate command window to choose a folder
if ~isdir(myFolder) % if the directory is not a valid path
errorMessage = sprintf('Error: the following folder does not exist: \n%s', myFolder); % print this error message
uiwait(warndlg(errorMessage)); % block the execution of program and wait to resume
return;
end
filePattern = fullfile(myFolder, '*.asc'); % Call all files with '.asc' from the chosen folder
Files = dir(filePattern); % list folder contents
N = natsortfiles({Files.name});
finishCell = cell(size(N));
for K = 1 : numel(N) % for all files in the folder
baseFileName = Files(K).name;
FileName = fullfile(myFolder, baseFileName);
fid = fopen(FileName); % open the file from chosen folder
Cell = dlmread(N{k},';');
fclose(fid); % close file from chosen folder
data = cell2mat(Cell); % convert the cell data to matrix
N = 1024; % Number of numbers per row
Finish0 = reshape(data, N, [])'; % reshape the data into the correct format
finishCell{K} = Cell(:,2:end);
end
There is an error occurring:
Undefined function 'natsortfiles' for input arguments of type 'cell'.
I suspect this is because I simply downloaded your files from the File Exchange. Where do these files need to be saved? In the MATLAB program files? Thanks for your help.
Stephen23 on 13 Jun 2017
Edited: Stephen23 on 26 Apr 2021
Unzip them and put them into the current directory, or anywhere on the MATLAB path.
Your code is very confusing. For example:
  • Why do you use fopen and generate a file ID fid when this is never used by anything? My (working and tested) code did not use fopen. Why do you need it?
  • Note that dlmread returns a numeric matrix, therefore it is pointless to apply cell2mat to its output. Did I use cell2mat anywhere?
  • A dialog box is NOT a "command window" in MATLAB terminology (or any other language that I have ever used).
  • You call natsortfiles to return a sorted cell array of file names, but inside the loop you instead access the name field of the unsorted structure returned by dir.
  • Using Cell as a variable name is a bad idea because it is so similar to the inbuilt cell command, and it is totally misleading (the output of dlmread is a numeric array, not a cell array).
  • I have no idea what that reshape thing at the end is supposed to do: it makes no sense to me and appears to be totally unused. Can you explain that please?
In any case, here is a significantly simplified version of your code, with all the unused bits (e.g. fopen) removed (untested):
P = 'absolute/relative path to where the files are saved';
S = dir(fullfile(P,'*.asc'));
S = natsortfiles(S);
C = cell(size(S));
for k = 1:numel(S)
F = fullfile(P,S(k).name);
M = dlmread(F,';');
C{k} = M(:,2:end);
end
You might like to actually try the code and read the documentation for the functions that you are using (e.g. dlmread) before making random unnecessary changes.
fopen is a holdover from when textscan was used. I just quickly combined the two pieces of code to test them, so I had not yet gone over it to remove the unnecessary parts. reshape and cell2mat are also left over from the previous code, where they were used.
An out of memory error occurs when running your code:
Error using dlmread (line 139)
Out of memory. Type HELP MEMORY for your options.
Is this likely because of the number of files that need to be read? This is what fgetl was introduced to combat previously.
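A rough way to answer this before reading anything, as a worked estimate rather than a measurement: however the numbers look in the text file, each value becomes an 8-byte double in memory, so each matrix needs about rows × 1024 × 8 bytes, and a cell array holding every file needs the sum over all files. The row and file counts below are hypothetical placeholders, not figures from this thread.

```matlab
% Hypothetical counts: replace with the real values for your data
nRows  = 1400;   % assumption: rows per file
nCols  = 1024;   % values kept per row after dropping the line number
nFiles = 100;    % assumption: number of .asc files in the folder

bytesPerMatrix = nRows * nCols * 8;          % 8 bytes per double
totalBytes     = nFiles * bytesPerMatrix;    % all matrices held in one cell array
fprintf('One matrix: %.2f MB, all files: %.2f GB\n', ...
        bytesPerMatrix / 1024^2, totalBytes / 1024^3);
```

With these placeholder numbers each matrix is about 11 MB and 100 files come to roughly 1.1 GB, before any temporary copies made while importing.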
@Aaron Smith: well, it was worth a try. If you are getting an "out of memory" error then you can try importing the files line-by-line, but I suspect that you might get an "out-of-memory" error just trying to hold all of that data in memory. It would be worth reading this:
You should be prepared to change your algorithm if it is not possible to store all of those matrices in memory.
But first let's try Plan B, reading line-by-line:
S = dir('cut down *.txt');
N = natsortfiles({S.name});
C = cell(size(N));
D = cell(size(N));
for k = 1:numel(N)
fid = fopen(N{k},'rt');
while ~feof(fid)
vec = sscanf(fgetl(fid),'%f;');
C{k}(end+1,:) = vec(2:end);
end
fclose(fid);
%M = dlmread(N{k},';');
%D{k} = M(:,2:end-1);
end
Tested on the same files as in my answer.
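If line-by-line reading is kept, here is a sketch of one refinement (not tested on the real files): the loop above grows C{k} by one row per line with C{k}(end+1,:) = ..., which makes MATLAB reallocate and copy the whole matrix each time and can become slow for large files. Storing each row in a cell array and concatenating once at the end is typically much cheaper, and printing a count every few hundred lines shows that the loop is still making progress. The demo file and `reportEvery` are hypothetical.

```matlab
% Hypothetical demo file: "lineNumber;v1;v2;v3" on each row
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
for r = 1:5
    fprintf(fid, '%d;%d;%d;%d\n', r, 10*r, 10*r + 1, 10*r + 2);
end
fclose(fid);

reportEvery = 2;                   % assumption: use e.g. 500 for the real files
[fid, msg] = fopen(fname, 'rt');
assert(fid >= 3, msg);
rows = {};                         % one row vector per line, concatenated once at the end
n = 0;
while true
    str = fgetl(fid);
    if ~ischar(str), break; end    % fgetl returns -1 at the end of the file
    vec = sscanf(str, '%f;');      % column vector: line number followed by the values
    n = n + 1;
    rows{n, 1} = vec(2:end).';     % drop the line number, store as a row vector
    if mod(n, reportEvery) == 0
        fprintf('%d lines read\n', n);
    end
end
fclose(fid);
delete(fname);
M = vertcat(rows{:});              % n-by-3 numeric matrix
```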
This code does not work due to an error with feof.
Error using feof
Invalid file identifier. Use fopen to generate a valid file identifier.
This has been a persistent error whenever I have used a while loop, and I'm unclear on exactly why it occurs. fid has been specified, so the invalidity of the file identifier is confusing. Would reading the data line by line not take an extremely long time? Also, there should be enough memory for the data, as I previously split a large file containing all of these files and then created a cell array from it, and there was no memory error.
You never check that your fopen call succeeded before using the file identifier it returns in feof. Call fopen with two output arguments, and if the first output argument is -1 (indicating fopen failed) display the second output (which will be a message explaining why fopen failed.)
For what reason would fopen fail?
Common reasons why fopen fails:
  • the file does not exist. I know, you will immediately say "but I know the file exists!". You have to stop thinking like you, and start thinking like your computer. If you tell it to open the file 'A/BBB.txt' it will look for that file in that location. MATLAB cannot magically know that the file is actually somewhere else, or that the name is spelled slightly differently, or that you have changed directory (beginners seem to tend to write slow and unreliable code using cd ). Really really really check that the path and filenames are correct: use exist to check that the file can be seen by MATLAB, or try using dir to get the file names in that directory, then use double to really check if the names are what you expect (i.e. no hidden characters. Yes, it happens, particularly when copy-and-pasting). Providing the wrong name/path is a common mistake that everyone denies doing.
  • The file is already open by another application. This can include applications that you think have closed but might still be running somehow in the background... Office products do this sometimes.
  • Permission is not granted by the OS for you to open the file. Depending on the OS there can be various reasons why this happens.
  • If the file is on a Windows symbolic link (symlink), this may cause problems (in my experience anyway).
  • etc, etc
Search this forum for "fopen invalid" and you will get many threads discussing this. As Steven Lord pointed out, you should return the second fopen output too, one simple way is:
[fid,msg] = fopen(filename,'rt');
assert(fid>=3,msg)
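Several of the checks listed above can be scripted. A minimal sketch, assuming a hypothetical path held in F: it asks exist whether MATLAB can see the file, prints the character codes of the name so hidden characters become visible, and calls fopen with its second output so the reason for a failure is shown.

```matlab
F = fullfile(tempdir, 'no_such_file_here.asc');   % hypothetical path that does not exist

fprintf('exist returned %d (2 means a file was found)\n', exist(F, 'file'));
fprintf('%d ', double(F));                        % character codes reveal hidden or unexpected characters
fprintf('\n');

[fid, msg] = fopen(F, 'rt');
if fid < 0
    fprintf('fopen failed: %s\n', msg);           % msg explains why fopen failed
else
    fclose(fid);
end
```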
Aaron Smith on 16 Jun 2017
Edited: Aaron Smith on 16 Jun 2017
I am not specifying files for the code; rather, I am selecting a folder from which to take the files:
myFolder = uigetdir('C:\Users\c13459232\Documents\MATLAB'); % Generate command window to choose a folder
if ~isdir(myFolder) % if the directory is not a valid path
errorMessage = sprintf('Error: the following folder does not exist: \n%s', myFolder); % print this error message
uiwait(warndlg(errorMessage)); % block the execution of program and wait to resume
return;
end
S = dir(fullfile(myFolder,'*.asc'));
N = natsortfiles({S.name});
C = cell(size(N));
D = cell(size(N));
for k = 1:numel(N)
fid = fopen((N{k}),'rt');
while ~feof(fid)
vec = sscanf(fgetl(fid),'%f;');
C{k}(end+1,:) = vec(2:end);
end
fclose(fid);
%M = dlmread(N{k},';');
%D{k} = M(:,2:end-1);
end
Would this impact the fopen error? I have shut down my desktop to ensure nothing else is using the files in question. I used exist to check whether the first file in my folder exists, and the answer was 0. This usually means that the files exist (as opposed to -1). When I specified the type of file to open, rather than the folder to take the files from, fid never appeared in the workspace.
Stephen23 on 16 Jun 2017
Edited: Stephen23 on 27 Jun 2017
"I used exist to check if the first file in my folder exists, the answer from which was 0. This usually means that the files exist (opposed to -1)."
That is not how exist works. I am sure that you are capable of finding and reading the documentation of functions that you are using, and I encourage you to do so. (first hint: a zero output means that the file was not found) (second hint: you should call exist with the second optional argument)
"Would this impact the fopen error?"
Clearly yes. Have a look at your code: you correctly use fullfile to make the path string (with the folder and file match string) and provide this to dir:
dir(fullfile(myFolder,'*.asc'));
But then inside the loop you call fopen without any folder information at all (and so MATLAB will only look in the current directory):
fid = fopen((N{k}),'rt');
If you had looked in N you would have seen that it only has filenames in it, and no path information: do you expect fopen to magically know the path where the file is located? I did explain common reasons why fopen cannot find a file; you might like to read that comment.
The solution is to provide fopen the path, just like you did with dir:
fid = fopen(fullfile(myFolder,N{k}),'rt');
PS: try same thing with exist too!
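A small sketch of why the folder has to be passed: it writes a file with a hypothetical unique name into a temporary folder and asks exist about the bare name and about the full path. Unless the current folder happens to be that folder, only the full path is found (exist(...,'file') returns 2 for a file and 0 when nothing is found), and the same applies to fopen and dlmread.

```matlab
myFolder = tempname;                      % hypothetical data folder, not the current folder
mkdir(myFolder);
[~, base] = fileparts(tempname);          % unique hypothetical file name
name = [base '.asc'];
fid = fopen(fullfile(myFolder, name), 'wt');
fprintf(fid, '1;10;20\n');
fclose(fid);

bareResult = exist(name, 'file');                     % searches only the current folder and the path
fullResult = exist(fullfile(myFolder, name), 'file'); % looks exactly where the file is
fprintf('bare name: %d, full path: %d\n', bareResult, fullResult);

[fid, msg] = fopen(fullfile(myFolder, name), 'rt');   % full path, as suggested for fopen
assert(fid >= 3, msg);
fclose(fid);
delete(fullfile(myFolder, name));
rmdir(myFolder);
```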
I used path to check whether the folders I need were on the path, and it appeared that they were not. I then used pathtool to add the necessary folders to the path, and this still resulted in the same fopen error: invalid permission.
Stephen23 on 27 Jun 2017
Edited: Stephen23 on 27 Jun 2017
@Aaron Smith: did you actually try the solution that I explained in my comment? Here it is again, just in case you missed it "The solution is to provide fopen the path, just like you did with dir:", and then I gave you an example of this. I did not change the MATLAB Search Path anywhere, and did not even mention it anywhere in my comment. Changing the Search Path will not resolve your problem.
I did try this and it did not change the result at all.
Stephen23 on 27 Jun 2017
Edited: Stephen23 on 27 Jun 2017
@Aaron Smith: please run this and tell me what msg is:
[fid,msg] = fopen(fullfile(myFolder,N{k}),'rt')
and also what this displays:
exist(fullfile(myFolder,N{k}),'file')
and show this string too:
fullfile(myFolder,N{k})
myFolder = uigetdir('C:\Users\c13459232\Documents\MATLAB'); % Generate command window to choose a folder
if ~isdir(myFolder) % if the directory is not a valid path
errorMessage = sprintf('Error: the following folder does not exist: \n%s', myFolder); % print this error message
uiwait(warndlg(errorMessage)); % block the execution of program and wait to resume
return;
end
S = dir(fullfile(myFolder,'*.asc'));
N = natsortfiles({S.name});
C = cell(size(N));
D = cell(size(N));
for k = 1:numel(N)
[fid,msg] = fopen(fullfile(myFolder,N{k}),'rt');
while ~feof(fid)
vec = sscanf(fgetl(fid),'%f;');
C{k}(end+1,:) = vec(2:end);
end
fclose(fid);
M = dlmread(N{k},';');
D{k} = M(:,2:end-1);
end
Error using dlmread (line 139)
Out of memory. Type HELP MEMORY for your options.
When I tried this code with the dlmread and D{k} lines commented out, Matlab just continued running. Nothing happened but it would not carry out any other actions. The exist line also generates an answer of 2.
Stephen23 on 28 Jun 2017
Edited: Stephen23 on 28 Jun 2017
@Aaron Smith: it is not clear what you want. Thus far:
  • You have not shown the three specific pieces of information that I explicitly requested in my last comment.
  • You are running code that contains a basic error that I have explained in detail in earlier comments how to fix ( dlmread without path info).
  • You are running code that calls dlmread, which results in an "out of memory error". In the comments above I commented out dlmread because it causes an out-of-memory error, and showed you how to import that file using another method (reading the file line-by-line). Why are you uncommenting dlmread?
  • Your last code seems to run correctly, except for the fact that you pointlessly uncommented dlmread (which I had commented out).
  • "When I tried this code with the dlmread and D{k} lines commented out, Matlab just continued running. Nothing happened but it would not carry out any other actions" Of course reading that file into memory will be slow. It is a large file. Do you expect it to take one nanosecond to read into memory? You could print a line counter if you are interested in how quickly the file is being read. I suspect that you would be much better off using tall arrays.
It is not clear what you want now, given that the code seems to run correctly (apart from you deciding to use dlmread again, which we already know throws an error on a file that large, and I had commented out in my code).
I don't mean that the action did not complete quickly enough with dlmread commented out. I mean that nothing happened for 20 minutes, 30 minutes, 40 minutes. It showed no signs of doing anything. I will look into using tall arrays instead because if this is just the code working then it takes far too long to ever be usable.
Stephen23 on 28 Jun 2017
Edited: Stephen23 on 28 Jun 2017
@Aaron Smith: one easy alternative to reading line-by-line is to read blocks of data using textscan:
tip: put disp(N{k}) on the first line of the for loop to see how fast it is processing the files.
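A minimal sketch of block reading, assuming the same lineNumber;v1;...;vN layout: the third input of textscan is a repeat count, so each call parses at most `blockRows` rows, and `%*f` drops the line number as it goes. The file name, its contents and the block size are hypothetical; the empty-block check stops the loop if textscan finds nothing more to parse before feof becomes true.

```matlab
% Hypothetical demo file: 5 rows of "lineNumber;v1;v2"
fname = [tempname '.asc'];
fid = fopen(fname, 'wt');
for r = 1:5
    fprintf(fid, '%d;%d;%d\n', r, r, -r);
end
fclose(fid);

nVals     = 2;                              % assumption: 1024 in the real files
blockRows = 2;                              % assumption: rows parsed per textscan call
fmt = ['%*f' repmat('%f', 1, nVals)];       % skip the line number, keep nVals values

[fid, msg] = fopen(fname, 'rt');
assert(fid >= 3, msg);
blocks = {};
while ~feof(fid)
    C = textscan(fid, fmt, blockRows, 'Delimiter', ';');
    blk = [C{:}];
    if isempty(blk), break; end             % nothing left to parse
    blocks{end+1, 1} = blk;                 % keep each block; concatenate once at the end
end
fclose(fid);
delete(fname);
M = vertcat(blocks{:});                     % 5-by-2 numeric matrix
```

A larger block size means fewer textscan calls but more memory per call, so for the real files something in the hundreds or thousands of rows may be a reasonable starting point.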
File = uigetfile('C:\Users\c13459232\Documents\MATLAB\Fixing this\Bulk');
fid = fopen(File);
N = 1025;
formatspec = '%f';
k= 0;
C = cell(size(k));
while ~feof(fid)
k = k+1;
vec = textscan(fid,formatSpec,N,'Delimiter',';');
C{k}(end+1,:) = vec(2:end);
end
fclose(fid);
Error using feof
Invalid file identifier. Use fopen to generate a valid file identifier.
I tried using the textscan in blocks method and combining it with the code you wrote to create the cell array while removing the line numbers.
I am not sure how I could remove the line numbers on the data when using tall arrays, unless I do it beforehand and then save the data as tall arrays. This still encounters the problem of running out of memory before the data can be fully processed and formatted
Stephen23 on 29 Jun 2017
Edited: Stephen23 on 26 Apr 2021
@Aaron Smith: the error that you show above has exactly the same cause as all the other times that you have shown this error. You need to pass the path data to any function that you want to use to open/read that file.
(hint one: uigetfile has three outputs. hint two: read the documentation for functions you are using).


Edited: 26 Apr 2021
