Make xlsread stop when encounter NaN
4 views (last 30 days)
chlor thanks on 12 Mar 2020
When I used xlsread I noticed it is retrieving all the cells including the empty ones, in my case the NaN ones. This is taking a much longer time to run while getting information that I don’t need, how do I make xlsread to stop retrieving when it sees a NaN or empty cell?
Walter Roberson on 13 Mar 2020
.xls files are quite different internally than .xlsm or .xlsx files. .xls files are binary files . .xlsm and .xlsx files, on the other hand, are really ZIP files that contain a directory structure of multiple .xml text files that represent all of the cell entries as text format.
The way .xlsm and .xlsx files are read is to unzip them and read the text xml files, and then use text processing tools (such as regexp()) to extract the pieces of text. Then the extracted values are examined and a decision is made whether to convert them to numeric form or to leave them as text. This is done after all the hard work has been done to pull them out of the XML text file.
Because the entire XML text file is processed, you have already done the work of reading in the cells that are going to become nan, and you would not save much time by stopping as soon as you hi t something that would be converted to nan. It would probably take the tools longer to do that, because they would not be able to run the tight loops of already-compiled code. Stopping to make a decision at each point at the MATLAB level is often slower than letting everything be processed in compiled code and having a look at the results later.
xlsread() internally gets back a cell array of character vectors, and post-processes that to turn it into numeric, after it has all been read. It attempts to convert them to numeric. It creates the "raw" (third) output as a cell array containing the numeric form of anything that could be converted to numeric, and the original character vector otherwise. Then it finds all of the leading and trailing rows and columns that entirely converted to nan (non-numeric) and removes those and returns the rest as the first output -- so for example if you had a file that contained
A B C D
10 20 30 40
A B C D
A B C D
then the first and third A B C D row would get removed because they entirely converts to nan and are on the outside, but the second A B C D row has numeric data after it and cannot be removed without distorting the shape. The rectangular matrix
10 20 30 40
50 nan nan nan
60 nan nan nan
nan nan nan nan
80 nan nan nan
would be returned in the first parameter.
If you are using xlsread then unless you specified a range, you will never see trailing rows or columns of entirely nan in the first output: they would be trimmed out of that particular output (but the original data would still be present in the "raw" output)