Line numbers when files are being saved in cell array
I previously asked a question here on MathWorks about how to remove the line numbers from a file. In that case, I was creating smaller files from a larger one and removing the line number from each row. My question today is how I might adapt that code when I am creating a cell array rather than new text files. I have the code that I used previously, but the command for removing the line numbers is built into an fprintf statement. I now wish to take this same code and apply it to a cell array. The code I had been using was:
str = fgetl(fid); % read the file line by line removing line breaks
[row,~,~,idx] = sscanf(str,'%f');
fprintf(f2d,'%s\n',str(idx+1:end)); %ignore the line number of each row
Obviously this code was more integrated into the original code. My problem arises when trying to insert the '%s\n',str(idx+1:end) part into the generation of the cell array. Is it possible to continue using textscan, given that fgetl and sscanf are already reading the data, which makes textscan superfluous? Since there is no longer a command that creates text files, I am unsure where '%s\n',str(idx+1:end) can go. It seems that textscan is a much faster way of reading the data than fgetl, as it does not need to go through the data line by line. Is there a way to combine the line number removal with textscan, or must I develop a new way for the code to remove the line numbers?
filePattern = fullfile(myFolder, '*.asc'); % Call all files with '.asc' from the chosen folder
Files = dir(filePattern); % list folder contents
finishCell = cell(length(Files));
for K = 1 : length(Files) % for all files in the folder
baseFileName = Files(K).name;
FileName = fullfile(myFolder, baseFileName);
fid = fopen(FileName); % open the file from chosen folder
str = fgetl(fid); % read the file line by line removing line breaks
[row,~,~,idx] = sscanf(str,'%f'); % ignore the line number of each row
Cell = textscan( fid, '%f', 'delimiter', ';'); % scanning data from files
fclose(fid); % close file from chosen folder
data = cell2mat(Cell); % convert the cell data to matrix
N = 1024; % Number of numbers per row
Finish0 = reshape(data, N, [])'; % reshape the data into the correct format
finishCell{K} = Finish0;
end
Essentially, is it possible to remove line numbers from files and then save them in a cell array without making new text files?
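For reference, the part of the old snippet that removes the line number does not depend on fprintf: it is only the indexing str(idx+1:end). A minimal sketch of that idea, assuming a hypothetical sample line in which the line number is followed by semicolon-separated values (the sample string and the single-cell finishCell are made up for illustration):

```matlab
% Hypothetical sample line: a line number followed by ';'-separated values.
str = '17;0.5;0.25;4';

% The fourth output of sscanf is the index of the next unread character.
% '%f' reads the leading line number and stops at the first ';'.
[row,~,~,idx] = sscanf(str,'%f');

% Parse the remainder as numbers instead of printing it with fprintf.
vals = sscanf(str(idx+1:end),'%f;').';   % row vector of the data values

% Store the parsed row in a cell array (one cell per file in the real code).
finishCell = cell(1,1);
finishCell{1}(end+1,:) = vals;
```

Here the text after the line number is converted with a second sscanf call and appended as a row, so no intermediate text file is needed.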
4 comments
"Essentially, is it possible to remove line numbers from files and then save them in a cell array without making new text files?"
Possibly. But before you get everyone excited about how trivial this is, please put links to your earlier questions on this topic, and make a note of the size of the file.
Aaron Smith
on 12 Jun 2017
@Aaron Smith: I think you should approach this as a new task, rather than trying to adapt the code that I gave you earlier. Your original file was, if my memory serves me correctly, a text file of about 200 megabytes, with 1025*1024 values in each row. Processing all 200 MB by importing it as numeric data proved to be troublesome, which is why I showed you how to split it into smaller files by reading and writing each line as text: not particularly fast, but it avoided all of the "out of memory" errors.
Now you have smaller files that could conceivably be handled by some numeric importation operator, and depending on what your goals are this might be a better solution for the smaller files.
Summary: do not adapt the old code (it had a very different purpose). Trying the standard MATLAB numeric importation functions would be worthwhile. Or using tall arrays.
Could you please:
- upload two cut-down (not full size) files in a new comment.
- describe how you want the data to be once it is in MATLAB memory: do you want it as numeric data, or kept as string data?
Aaron Smith
on 13 Jun 2017
Answers (1)
This simple code works on the cut-down files, you can try it on the full-size files and see what happens. I used dlmread to read the entire numeric array (this is fast and efficient), and then simply ignore the first column (easily using indexing):
P = 'absolute/relative path to where the files are saved';
S = dir(fullfile(P,'cut down *.txt'));
S = natsortfiles(S); % optional, see below.
for k = 1:numel(S)
M = dlmread(fullfile(P,S(k).name),';');
S(k).data = M(:,2:end);
end
Giving:
size(S)
size(S(1).data)
size(S(2).data)
S(1).data
Note that I used my FEX submission natsortfiles to sort the filenames into numeric order. You can download natsortfiles here:
The files I used for testing are attached here:
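To see the column removal in isolation, here is a small self-contained sketch. The temporary file and its contents are hypothetical stand-ins for the real data files, and it assumes rows without a trailing semicolon:

```matlab
% Write a small hypothetical test file: line number first, then the data.
F = [tempname,'.txt'];
fid = fopen(F,'wt');
fprintf(fid,'1;10;20;30\n2;40;50;60\n');
fclose(fid);

M = dlmread(F,';');      % full numeric matrix, line numbers in column 1
data = M(:,2:end);       % drop the line-number column by indexing
delete(F);               % remove the temporary file
```

In newer MATLAB releases readmatrix is the recommended replacement for dlmread; the indexing step stays the same.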
20 comments
Aaron Smith
on 13 Jun 2017
Your code is very confusing. For example:
- Why do you use fopen and generate a file ID fid when this is never used by anything? My (working and tested) code did not use fopen. Why do you need it?
- Note that dlmread returns a numeric matrix, therefore it is pointless to apply cell2mat to its output. Did I use cell2mat anywhere?
- A dialog box is NOT a "command window" in MATLAB terminology (or any other language that I have ever used).
- You call natsortfiles to return a sorted cell array of file names, but inside the loop you instead access the name field of the unsorted structure returned by dir.
- Using Cell as a variable name is a bad idea because it is so similar to the inbuilt cell command, and it is totally misleading (as the output of dlmread is a numeric array, not a cell array).
- I have no idea what that reshape thing at the end is supposed to do: it makes no sense to me and appears to be totally unused. Can you explain that, please?
In any case, here is a significantly simplified version of your code, with all the unused bits (e.g. fopen) removed (untested):
P = 'absolute/relative path to where the files are saved';
S = dir(fullfile(P,'*.asc'));
S = natsortfiles(S);
C = cell(size(S));
for k = 1:numel(S)
F = fullfile(P,S(k).name);
M = dlmread(F,';');
C{k} = M(:,2:end);
end
You might like to actually try the code and read the documentation for the functions that you are using (e.g. dlmread) before making random unnecessary changes.
Aaron Smith
on 13 Jun 2017
Stephen23
on 13 Jun 2017
@Aaron Smith: well, it was worth a try. If you are getting an "out of memory" error then you can try importing the files line-by-line, but I suspect that you might get an "out-of-memory" error just trying to hold all of that data in memory. It would be worth reading this:
You should be prepared to change your algorithm if it is not possible to store all of those matrices in memory.
But first let's try Plan B, reading line-by-line:
S = dir('cut down *.txt');
N = natsortfiles({S.name});
C = cell(size(N));
D = cell(size(N));
for k = 1:numel(N)
fid = fopen(N{k},'rt');
while ~feof(fid)
vec = sscanf(fgetl(fid),'%f;');
C{k}(end+1,:) = vec(2:end);
end
fclose(fid);
%M = dlmread(N{k},';');
%D{k} = M(:,2:end-1);
end
Tested on the same files as in my answer.
Aaron Smith
on 14 Jun 2017
Steven Lord
on 14 Jun 2017
You never check that your fopen call succeeded before using the file identifier it returns in feof. Call fopen with two output arguments, and if the first output argument is -1 (indicating fopen failed) display the second output (which will be a message explaining why fopen failed.)
Aaron Smith
on 14 Jun 2017
Stephen23
on 14 Jun 2017
Common reasons why fopen fails:
- the file does not exist. I know, you will immediately say "but I know the file exists!". You have to stop thinking like you, and start thinking like your computer. If you tell it to open the file 'A/BBB.txt' it will look for that file in that location. MATLAB cannot magically know that the file is actually somewhere else, or that the name is spelled slightly differently, or that you have changed directory (beginners seem to tend to write slow and unreliable code using cd ). Really really really check that the path and filenames are correct: use exist to check that the file can be seen by MATLAB, or try using dir to get the file names in that directory, then use double to really check if the names are what you expect (i.e. no hidden characters. Yes, it happens, particularly when copy-and-pasting). Providing the wrong name/path is a common mistake that everyone denies doing.
- The file is already open by another application. This can include applications that you think have closed but might still be running somehow in the background... Office products do this sometimes.
- Permission is not granted by the OS for you to open the file. Depending on the OS there can be various reasons why this happens.
- If the file is on a Windows symbolic link (symlink) this may cause problems (in my experience anyway).
- etc, etc
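The filename check from the first point can be sketched roughly as follows. The folder and file are hypothetical and are created only so that the example runs on its own:

```matlab
% Hypothetical folder and file, created only for this demonstration.
P = tempname;
mkdir(P);
fid = fopen(fullfile(P,'data 1.txt'),'wt');
fclose(fid);

S = dir(fullfile(P,'*.txt'));   % what MATLAB actually sees in that folder
name = S(1).name;
codes = double(name);           % character codes expose hidden characters

% Codes outside the printable ASCII range 32..126 deserve a closer look.
suspicious = codes(codes < 32 | codes > 126);

delete(fullfile(P,name));
rmdir(P);
```

An empty suspicious vector means the name contains only ordinary printable characters; anything else is worth comparing against the name you typed.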
Search this forum for "fopen invalid" and you will get many threads discussing this. As Steven Lord pointed out, you should return the second fopen output too, one simple way is:
[fid,msg] = fopen(filename,'rt');
assert(fid>=3,msg)
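If you would rather report the problem than throw an error, the explicit check that Steven Lord describes could look like this sketch. The filename is a hypothetical one that is assumed not to exist:

```matlab
% Hypothetical name of a file that should not exist.
filename = fullfile(tempdir,'no_such_file_for_fopen_demo.asc');

[fid,msg] = fopen(filename,'rt');
if fid == -1
    % msg holds the system's explanation of why the file could not be opened.
    fprintf('Could not open "%s": %s\n', filename, msg);
else
    fclose(fid);
end
```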
Aaron Smith
on 16 Jun 2017
Edited: Aaron Smith
on 16 Jun 2017
"I used exist to check if the first file in my folder exists, the answer from which was 0. This usually means that the files exist (opposed to -1)."
That is not how exist works. I am sure that you are capable of finding and reading the documentation of functions that you are using, and I encourage you to do so. (first hint: a zero output means that the file was not found) (second hint: you should call exist with the second optional argument)
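A short sketch of the values that exist returns when called with the second argument, using a hypothetical temporary file:

```matlab
F = [tempname,'.txt'];                 % hypothetical temporary file name
before = exist(F,'file');              % 0: nothing with that name was found

fid = fopen(F,'wt');
fclose(fid);
after = exist(F,'file');               % 2: a file exists at that path

folder = exist(tempdir,'dir');         % 7: the name refers to a folder
delete(F);
```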
"Would this impact the fopen error?"
Clearly yes. Have a look at your code: you correctly use fullfile to make the path string (with the folder and file match string) and provide this to dir:
dir(fullfile(myFolder,'*.asc'));
But then inside the loop you call fopen without any folder information at all (and so MATLAB will only look in the current directory):
fid = fopen((N{k}),'rt');
If you had looked in N you would have seen that it only has filenames in it, and no path information: do you expect fopen to magically know the path where the file is located? I did explain common ways that fopen cannot find a file, you might like to read that comment.
The solution is to provide fopen the path, just like you did with dir:
fid = fopen(fullfile(myFolder,N{k}),'rt');
PS: try same thing with exist too!
Aaron Smith
on 27 Jun 2017
@Aaron Smith: did you actually try the solution that I explained in my comment? Here it is again, just in case you missed it "The solution is to provide fopen the path, just like you did with dir:", and then I gave you an example of this. I did not change the MATLAB Search Path anywhere, and did not even mention it anywhere in my comment. Changing the Search Path will not resolve your problem.
Aaron Smith
on 27 Jun 2017
@Aaron Smith: please run this and tell me what msg is:
[fid,msg] = fopen(fullfile(myFolder,N{k}),'rt')
and also what this displays:
exist(fullfile(myFolder,N{k}),'file')
and show this string too:
fullfile(myFolder,N{k})
Aaron Smith
on 28 Jun 2017
@Aaron Smith: it is not clear what you want. Thus far:
- You have not shown the three specific pieces of information that I explicitly requested in my last comment.
- You are running code that contains a basic error that I have explained in detail in earlier comments how to fix ( dlmread without path info).
- You are running code that calls dlmread, which results in an "out of memory" error. In the comments above I commented out dlmread because it causes an out-of-memory error, and showed you how to import that file using another method (reading the file line-by-line). Why are you uncommenting dlmread?
- Your last code seems to run correctly, except for the fact that you pointlessly uncommented dlmread (which I had commented out).
- "When I tried this code with the dlmread and D{k} lines commented out, Matlab just continued running. Nothing happened but it would not carry out any other actions" Of course reading that file into memory will be slow. It is a large file. Do you expect it to take one nanosecond to read into memory? You could print a line counter if you are interested in how quickly the file is being read. I suspect that you would be much better off using tall arrays.
It is not clear what you want now, given that the code seems to run correctly (apart from you deciding to use dlmread again, which we already know throws an error on a file that large, and I had commented out in my code).
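A line counter for the line-by-line approach could be sketched like this. The small temporary file and the reporting interval are assumptions for illustration; for the real files a much larger interval would be more sensible:

```matlab
% Hypothetical small file standing in for one of the large data files.
F = [tempname,'.txt'];
fid = fopen(F,'wt');
fprintf(fid,'%d;1;2;3;\n',1:5);        % five rows, each starting with its line number
fclose(fid);

[fid,msg] = fopen(F,'rt');
assert(fid>=3,msg)
rows = {};
n = 0;
while true
    str = fgetl(fid);
    if ~ischar(str), break, end        % fgetl returns -1 at end of file
    vec = sscanf(str,'%f;');
    n = n+1;
    rows{n} = vec(2:end).';            % skip the line number
    if mod(n,2)==0                     % report every 2nd line (assumed interval)
        fprintf('read %d lines\n',n);
    end
end
fclose(fid);
data = vertcat(rows{:});               % one numeric matrix for the whole file
delete(F);
```

Collecting the rows in a cell and concatenating once at the end avoids resizing a large numeric matrix on every iteration.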
Aaron Smith
on 28 Jun 2017
@Aaron Smith: one easy alternative to reading line-by-line is to read blocks of data using textscan:
tip: put disp(N{k}) on the first line of the for loop to see how fast it is processing the files.
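A minimal sketch of block-wise reading with textscan, not necessarily the code originally posted with this comment. It assumes a hypothetical file with a line number followed by nCols values per row and no trailing semicolon; nCols, blockSize and the file itself are made-up values:

```matlab
% Hypothetical small file: line number, then nCols values per row.
nCols = 3;
F = [tempname,'.txt'];
fid = fopen(F,'wt');
fprintf(fid,'%d;10;20;30\n',1:7);
fclose(fid);

fmt = ['%*f',repmat('%f',1,nCols)];    % '%*f' skips the line number
blockSize = 3;                         % rows per block (assumed value)
blocks = {};

[fid,msg] = fopen(F,'rt');
assert(fid>=3,msg)
while ~feof(fid)
    % Apply the format at most blockSize times, i.e. read up to blockSize rows.
    tmp = textscan(fid,fmt,blockSize,'Delimiter',';','CollectOutput',true);
    if isempty(tmp{1}), break, end     % nothing read: avoid an endless loop
    blocks{end+1} = tmp{1};
end
fclose(fid);
data = vertcat(blocks{:});
delete(F);
```

Each call continues from where the previous one stopped, so only one block of text needs to be parsed at a time.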
Aaron Smith
on 29 Jun 2017
@Aaron Smith: the error that you show above has exactly the same cause as all the other times that you have shown this error. You need to pass the path data to any function that you want to use to open/read that file.
(hint one: uigetfile has three outputs. hint two: read the documentation for functions you are using).