MATLAB readmatrix inconsistently reading CSV files

I'm using MATLAB's readmatrix function to read data from a CSV file and store it in a variable. The CSV files are identical in format, with several lines of text at the start before the data begins at line 21. However, readmatrix seems to behave inconsistently: sometimes it captures all the text at the start of the CSV and stores it as NaN, and other times it ignores these first 20 lines and returns only the data. Why is this? What is a better way to do this?

7 Comments

"Why is this?"
Because the files are different.
"What is a better way to do this?"
Specify the format, e.g. using
and modifying the options based on prior knowledge about the file format.
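A minimal sketch of specifying the format from prior knowledge, assuming three numeric, comma-separated columns with data starting on line 21 (the column count and the sample file are hypothetical). Rather than relying on detection, it builds an options object with delimitedTextImportOptions and sets the layout by hand:

```matlab
% Hypothetical sample file: 20 '!' header lines, then 3 numeric columns
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
assert(fid > 0, 'Could not create the sample file.');
fprintf(fid, '! header line %d\n', 1:20);
fprintf(fid, '%g,%g,%g\n', [1 2 3; 4 5 6].');
fclose(fid);

% Describe the known layout explicitly; nothing is auto-detected here
opts = delimitedTextImportOptions('NumVariables', 3);
opts.Delimiter = ',';
opts.DataLines = [21 Inf];                      % data always starts on line 21
opts.VariableTypes = repmat({'double'}, 1, 3);  % all columns are numeric

data = readmatrix(fileName, opts);
disp(data)
```

Because every property comes from what is known about the format, stray text or delimiters in lines 1 to 20 cannot change the result.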
As per the question, the files are not different. They follow the exact same format, the only change is the value of the data which starts at line 21.
Can you show small samples of two CSV files that are formatted exactly the same but are interpreted by readmatrix differently? Can you also show the exact readmatrix command you're executing that interpreted those two files differently?
Stephen23 on 24 Aug 2023
Edited: Stephen23 on 24 Aug 2023
"As per the question, the files are not different."
They are different, even if only in the data values. Click the paperclip button to upload two files that show different behavior when imported using READMATRIX.
Also show your exact code where you import them.
Unfortunately, it's on a machine that isn't connected to the internet, which makes that difficult. The code I'm using is essentially:
fileName = 'my file path';
fData = readmatrix(fileName);
fileName = 'different file path';
oData = readmatrix(fileName);
oData skips the text at the start of the CSV and goes straight to the data at line 21, whereas fData fills the first 20 rows with NaN. I'm looking at these CSV files now and they look virtually identical: essentially 20 lines of text, each starting with '!'.
I have now found a workaround by explicitly specifying the range of interest; however, I'm still curious as to why this was happening.
Update: I have just opened my csv files in a text editor. Whilst the headers look identical in Excel, in the text editor there are a number of comma delimiters after most lines on one of the files. Perhaps this explains the different behaviour.
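A hypothetical reproduction of this explanation: it writes two files with identical data, except that file B's header lines end in two extra commas. The comma count and the three-column layout are assumptions, and what readmatrix detects by default for each file may differ between releases, so the sketch only prints those sizes. Stating the header length explicitly makes both files read identically:

```matlab
% Identical data in both files; only the header lines differ
data = [1 2 3; 4 5 6; 7 8 9];
files = {[tempname '.csv'], [tempname '.csv']};
suffixes = {'', ',,'};          % file B: trailing commas on every header line

for f = 1:2
    fid = fopen(files{f}, 'w');
    assert(fid > 0, 'Could not create a sample file.');
    for k = 1:20
        fprintf(fid, '! header line %d%s\n', k, suffixes{f});
    end
    fprintf(fid, '%g,%g,%g\n', data.');
    fclose(fid);
end

% Default detection: compare the shapes readmatrix chooses for each file
disp(size(readmatrix(files{1})))
disp(size(readmatrix(files{2})))

% Explicit header length: both files give the same matrix
a = readmatrix(files{1}, 'NumHeaderLines', 20);
b = readmatrix(files{2}, 'NumHeaderLines', 20);
disp(isequal(a, b))
```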
Stephen23 on 24 Aug 2023
Edited: Stephen23 on 24 Aug 2023
"I have just opened my csv files in a text editor. Whilst the headers look identical in Excel, in the text editor there are a number of comma delimiters after most lines on one of the files. Perhaps this explains the different behaviour."
Yes, differences between the files are most likely the cause.
Of course, the algorithm used by READTABLE et al. is not perfect (there is no such thing), and it cannot read minds: what is obvious to a human is not obvious to a machine. It is always possible to trick or confuse an algorithm with the right combination of data or whatever; such things are mathematically unavoidable.
Note that relying on what files "look like" in MS Excel is the number one mistake to avoid: MS Excel mangles data in all sorts of horrible ways that are invisible from inside Excel, e.g. adding or changing delimiters. It can also change data without any warning:
If you want reliable data processing, do NOT open and save text files using MS Excel. It is a great tool for Excel spreadsheets... but for anything else... beware of dragons!
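In the same spirit of checking the raw text rather than Excel's rendering, here is a small sketch that looks for trailing delimiters on the '!' header lines. The '!' prefix comes from the question; the sample file and the check itself are illustrative:

```matlab
% Hypothetical sample: three '!' header lines that end in stray commas
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
assert(fid > 0, 'Could not create the sample file.');
fprintf(fid, '! header line %d,,\n', 1:3);
fprintf(fid, '%g,%g,%g\n', [1 2 3; 4 5 6].');
fclose(fid);

% Read the file as plain text, exactly as a text editor would show it
textLines = splitlines(string(fileread(fileName)));
textLines(textLines == "") = [];   % drop the empty element after the final newline

isHeader = startsWith(textLines, "!");
hasTrailingComma = endsWith(textLines(isHeader), ",");
fprintf('%d of %d header lines end with a delimiter\n', ...
    nnz(hasTrailingComma), nnz(isHeader));
```

Running the same check on two files that "look identical" in Excel is a quick way to find the kind of difference described in the update above.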


Accepted Answer

Steven Lord on 24 Aug 2023


If you know exactly how many header lines your file contains, I would specify the NumHeaderLines name-value argument in your readmatrix call.
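For the files in the question (20 header lines, data from line 21), that call could look like the following minimal sketch; the sample file and its three numeric columns are assumptions:

```matlab
% Hypothetical sample: 20 '!' header lines, data from line 21
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
assert(fid > 0, 'Could not create the sample file.');
fprintf(fid, '! header line %d\n', 1:20);
fprintf(fid, '%g,%g,%g\n', [1 2 3; 4 5 6].');
fclose(fid);

% Tell readmatrix exactly how many lines to skip instead of letting it guess
data = readmatrix(fileName, 'NumHeaderLines', 20);
disp(data)
```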
Alternatively, you can create a file import options object using detectImportOptions. Once it's created, check that the properties specifying where the data is located (either DataRange or DataLines) and where any variable metadata is located (VariableNamesLine, VariableDescriptionsLine, VariableUnitsLine, or the corresponding Range properties for SpreadsheetImportOptions) match where you expect the data and metadata to be, based on the expected format of the files. Once you've confirmed that they match, pass that import options object into readmatrix as the opts input argument.
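A sketch of that workflow, assuming (hypothetically) data starting on line 21 in three columns. The expected values, the warning, and the fallback to NumHeaderLines when detection disagrees are illustrative choices, not part of the answer, which instead suggests contacting Technical Support in that case:

```matlab
% Hypothetical sample file: 20 '!' header lines, then 5 rows of 3 numbers
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
assert(fid > 0, 'Could not create the sample file.');
fprintf(fid, '! header line %d\n', 1:20);
fprintf(fid, '%g,%g,%g\n', reshape(1:15, 3, 5));
fclose(fid);

expectedFirstDataLine = 21;   % known from the file format
expectedNumColumns = 3;

% Inspect what detection decided before trusting it
opts = detectImportOptions(fileName);
disp(opts.DataLines)          % where detection thinks the data is
disp(opts.VariableNamesLine)  % 0 means no variable-names line was detected

if opts.DataLines(1) ~= expectedFirstDataLine || ...
        numel(opts.VariableNames) ~= expectedNumColumns
    warning('Detected layout differs from the expected format; using NumHeaderLines instead.');
    opts = detectImportOptions(fileName, 'NumHeaderLines', expectedFirstDataLine - 1);
end

data = readmatrix(fileName, opts);
disp(data)
```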
If the import options properties don't match what you expect, and reviewing the file doesn't indicate to you why MATLAB is detecting the values for those properties that it is, please send a sample data file that demonstrates this behavior to Technical Support using this link along with the import options object and describe the results you expect. It's possible that you've identified a bug or an ambiguous edge case in the import options detection algorithm.

Version

R2021a
