Read an input file- process it line by line

697 views (last 30 days)
Hi I'm trying to figure out how to process an input file, where you check to see if each line is a word. But I don't have any clue on how to begin doing so. I would use something like "load" I think? Then I would save that word in a variable.

Accepted Answer

Walter Roberson
Walter Roberson on 26 Nov 2011
%at this point, insert the code to initialize the variable you will be storing the words in
%then
fid = fopen('YourFile.txt','rt');
while true
thisline = fgetl(fid);
if ~ischar(thisline); break; end %end of file
%now check whether the string in thisline is a "word", and store it if it is.
%then
end
fclose(fid);
%you have now loaded all of the data.
  3 Comments
Walter Roberson
Walter Roberson on 26 Nov 2011
"while true" is an infinite loop. The loop will be exited by the "break" statement in the "if" inside the loop.
fgetl() returns a string ("string" is defined as an array of type char) *except* when fgetl() encounters end of file. The test is thus to see whether end of file has been it. It is not a test for a "non-word". You have not defined what a "word" or "non-word" is, so you will have to put that in at the point I show the comment.
feof(fid) only ever becomes true when there is no input remaining _and_ a read operation then attempts to read input that is not there. feof() does not look forward to see whether there is more input or not: it just reports on whether the eof flag has been set already by a read operation on an empty buffer having tried and failed to get input. If you had the case where, for example, you had a file that ended in XYZPDQ with no end-of-line indicator, then when a read operation got to that line, it would read the data that is there but *not* set the end-of-line indicator (in this case the buffer was not empty), and feof would not report true until the *next* read attempt found the buffer empty and nothing more to fetch.
This is how the C language standards and POSIX *define* end of file processing.
Because of this, you must test the result of each read operation, to see whether that was the read operation that found end-of-file.
If feof(fid) is true then reading from fid would fail, but feof() is not predictive, so the fact that feof() is false does not mean that a read will succeed.

Sign in to comment.

More Answers (1)

Raymond Chiu
Raymond Chiu on 9 May 2018
See: https://www.mathworks.com/help/matlab/ref/feof.html
fid = fopen('badpoem.txt');
while ~feof(fid)
tline = fgetl(fid);
disp(tline)
end
fclose(fid);
  2 Comments
Walter Roberson
Walter Roberson on 7 Jul 2020
Caution: feof() does not reliably predict end-of-file.
The underlying C and POSIX libraries are explicit that feof is not set until an attempt is made to read past end of file and the attempt fails -- you have to try to pick up the next envelope before you find out that there is no next envelope.
MATLAB documents that at least under some conditions, feof() will be set if there is no further input, even if you have not tried to read the input.
With regards to using fgets() or fgetl(), if the file happens to end in a newline character, then the fgets() or fgetl() logically ends there, before you have looked to see if there are any more characters. If MATLAB can tell you feof() at that point, then it could only do so by "peeking ahead", which is not strictly forbidden but can lead to subtle confusion about the state and would not be recommended by POSIX.
With regards to using fgets() or fgetl(), if the file does not happen to end in a newline character (which is entirely valid), then fgets() or fgetl() would not know they had finished reading until they got told end-of-file by the I/O system, and it is normal and expected the feof() would become true in that case.
When you using fread(), or when you use I/O on a serial port or on a number of other kinds of device, or on memory-mapped files, it is a violation of POSIX for feof() to "predict" end of file before it has been specifically signalled (devices) or before an attempt has been made to read past end of file (fread). I think I have encountered cases where MATLAB is violating that POSIX policy, but I am not certain of that.
TLDR: after using fgets() or fgetl() you should always check that what you got back was not the end of file indicator, instead of relying on feof() already being true if there is no more to read:
fid = fopen('badpoem.txt');
while true
tline = fgetl(fid);
if ~ischar(tline); break; end %end of file
disp(tline);
end
Note: in the case of a line that has no characters other than the end-of-line markers, then the result of fgetl() will be the empty character array, ''

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by