Why does 'readtable' generate NaN values for text data imported from a file in R2020a or later?

Question

MathWorks Support Team le 23 Nov 2021

0
Lien

Utiliser le lien direct vers cette question

https://fr.mathworks.com/matlabcentral/answers/1593704-why-does-readtable-generate-nan-values-for-text-data-imported-from-a-file-in-r2020a-or-later

Modifié(e) : MathWorks Support Team le 23 Nov 2021

Réponse acceptée : MathWorks Support Team

Ouvrir dans MATLAB Online

I'm reading in an Excel spreadsheet file that has columns containing numerical and text data.

Depending on the order of the values, 'readtable' appears to choose different data types.

The data in the spreadsheet is:

COL_1     |   COL_2
TEXT_1    |   2
TEXT_2    |   TEXT
-1        |   2

'readtable' returns the following data:

>> readtable('MixedInputs.xlsx')
ans =
  3×2 table
     COL_1        COL_2
    ________      _____
    {'TEXT_1'}       2 
    {'TEXT_2'}     NaN 
    {'-1'  }         2 

COL_1 imports all data as character vectors.

COL_2 has broken data 'NaN' where "TEXT" should be read.

Is this expected behavior? I was expecting 'readtable' to revert all data to character vectors or strings, for any column containing mixed data types.

Connectez-vous pour répondre à cette question.

Answer 1

MathWorks Support Team le 23 Nov 2021

5
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/1593704-why-does-readtable-generate-nan-values-for-text-data-imported-from-a-file-in-r2020a-or-later#answer_838259

Modifié(e) : MathWorks Support Team le 23 Nov 2021

Ouvrir dans MATLAB Online

This is the expected behavior in R2020a or later. The 'readtable' function determined that the data type appropriate for the values detected in the second column is a numeric array. However "TEXT" is not numeric. Then, 'readtable' fills in the gaps with appropriate missing values — a 'NaN' value for numeric variables, and an empty character vector for text.

Option 1:

To get a table with character vector columns or string columns, use the 'setvartype' file import option:

filename = 'MixedInputs.xlsx';
opts = detectImportOptions(filename);
opts = setvartype(opts,'char');  % or 'string'
T = readtable(filename,opts)
T =
  3×2 table
     COL_1         COL_2  
    ________      ________
    {'TEXT_1'}    {'2'   }
    {'TEXT_2'}    {'TEXT'}
    {'-1'  }      {'2'   }

Refer to the following documentation page for more information regarding 'setvartype':

https://www.mathworks.com/help/matlab/ref/matlab.io.text.delimitedtextimportoptions.setvartype.html

Option 2:

If your table data has text headers, as is the case here, you can restore the default behavior from previous releases by specifying the ('Format','auto') name-value pair argument. 'readtable' then reads the headers as data, and as a result, it converts all the table variables to text:

>> T = readtable('MixedInputs.xlsx','Format','auto')
T =
  3×2 table
     COL_1         COL_2  
    ________      ________
    {'TEXT_1'}    {'2'   }
    {'TEXT_2'}    {'TEXT'}
    {'-1'  }      {'2'   }

Refer to the following documentation page for more information regarding name-value pair arguments for text files:

https://www.mathworks.com/help/matlab/ref/readtable.html#mw_de8dba79-6cba-4cea-924b-516d8873a6e3

Option 3:

If there are strings and numbers mixed in a single column, then 'readtable' may not be the right function. 'readcell' might be better to represent the data as mixed cell array:

>> C = readcell('MixedInputs.xlsx')
C =
  4×2 cell array
    {'COL_1'}      {'COL_2'}
    {'TEXT_1' }    {[    2]}
    {'TEXT_2' }    {'TEXT' }
    {[   -1]}      {[    2]}

Refer to the following documentation page for more information about 'readcell':

https://www.mathworks.com/help/matlab/ref/readcell.html

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Connectez-vous pour commenter.

Why does 'readtable' generate NaN values for text data imported from a file in R2020a or later?

Réponse acceptée

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Plus de réponses (0)

Voir également

Catégories

Tags

Produits

Version

Community Treasure Hunt

Why does 'readtable' generate NaN values for text data imported from a file in R2020a or later?

Réponse acceptée

0 commentaires Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Plus de réponses (0)

Voir également

Catégories

Tags

Produits

Version

Community Treasure Hunt

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens